imsciences 0.9.5.5__tar.gz → 0.9.5.7__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: imsciences
- Version: 0.9.5.5
+ Version: 0.9.5.7
  Summary: IMS Data Processing Package
  Author: IMS
  Author-email: cam@im-sciences.com
@@ -24,6 +24,7 @@ Requires-Dist: yfinance
  Requires-Dist: holidays
  Requires-Dist: google-analytics-data
  Requires-Dist: geopandas
+ Requires-Dist: geopy

  # IMS Package Documentation

@@ -49,6 +50,7 @@ Table of Contents
  5. [Data Pulling](#data-pulling)
  6. [Installation](#installation)
  7. [License](#license)
+ 8. [Roadmap](#roadmap)

  ---

@@ -249,14 +251,14 @@ ims_vis = datavis()
  - **Example**: `pull_ga('GeoExperiment-31c5f5db2c39.json', '111111111', '2023-10-15', 'United Kingdom', ['totalUsers', 'newUsers'])`

  ## 2. `process_itv_analysis`
- - **Description**: Pull in GA4 data for geo experiments.
- - **Usage**: `process_itv_analysis(self, raw_df, itv_path, cities_path, media_spend_path, output_path, group1, group2)`
- - **Example**: `process_itv_analysis(df, 'itv regional mapping.csv', 'Geo_Mappings_with_Coordinates.xlsx', 'IMS.xlsx', 'itv_for_test_analysis_itvx.csv', ['West', 'Westcountry', 'Tyne Tees'], ['Central Scotland', 'North Scotland'])`
+ - **Description**: Processes region-level data for geo experiments by mapping ITV regions, grouping selected metrics, merging with media spend data, and saving the result.
+ - **Usage**: `process_itv_analysis(self, raw_df, itv_path, cities_path, media_spend_path, output_path, test_group, control_group, columns_to_aggregate, aggregator_list)`
+ - **Example**: `process_itv_analysis(df, 'itv regional mapping.csv', 'Geo_Mappings_with_Coordinates.xlsx', 'IMS.xlsx', 'itv_for_test_analysis_itvx.csv', ['West', 'Westcountry', 'Tyne Tees'], ['Central Scotland', 'North Scotland'], ['newUsers', 'transactions'], ['sum', 'sum'])`

  ## 3. `process_city_analysis`
- - **Description**: Processes city-level data for geo experiments by grouping user metrics, merging with media spend data, and saving the result.
- - **Usage**: `process_city_analysis(raw_df, spend_df, output_path, group1, group2, response_column)`
- - **Example**: `process_city_analysis(df, spend, output, ['Barnsley'], ['Aberdeen'], 'newUsers')`
+ - **Description**: Processes city-level data for geo experiments by grouping selected metrics, merging with media spend data, and saving the result.
+ - **Usage**: `process_city_analysis(raw_df, spend_df, output_path, test_group, control_group, columns_to_aggregate, aggregator_list)`
+ - **Example**: `process_city_analysis(df, spend, output, ['Barnsley'], ['Aberdeen'], ['newUsers', 'transactions'], ['sum', 'sum'])`

  ---

@@ -343,3 +345,11 @@ pip install imsciences
  This project is licensed under the MIT License. ![License](https://img.shields.io/badge/license-MIT-blue.svg)

  ---
+
+ ## Roadmap
+
+ - [Fixes]: Naming conventions are inconsistent and have changed from previous seasonality tools (e.g. 'seas_nyd' is now 'seas_new_years_day', 'week_1' is now 'seas_1').
+ - [Fixes]: Naming conventions can be inconsistent within the data pull (some variables are suffixed 'gb', some 'uk', and others have no suffix); global holidays/events (Christmas, Easter, Halloween, etc.) are similarly inconsistent, with some carrying a regional suffix and others not.
+ - [Additions]: Add new data pulls for more macro and seasonal variables.
+
+ ---
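The headline change in this release is the new calling convention for the two geo-experiment helpers: the old `group1`/`group2`/`response_column` parameters become `test_group`, `control_group`, `columns_to_aggregate`, and `aggregator_list`. A minimal sketch of a call under the new signature (the import path and file names here are assumptions, not taken from the package; the signature itself comes from the diff):

```python
import pandas as pd
from imsciences import geoprocessing  # assumed import path for the class shown below

geo = geoprocessing()

raw = pd.read_csv('ga4_city_export.csv')   # hypothetical export with 'date', 'city', and metric columns
spend = pd.read_csv('media_spend.csv')     # hypothetical file with 'date', 'geo', 'cost'

# Each entry in aggregator_list applies to the matching entry in columns_to_aggregate.
result = geo.process_city_analysis(
    raw, spend, 'city_analysis.csv',
    ['Barnsley'],                  # test_group
    ['Aberdeen'],                  # control_group
    ['newUsers', 'transactions'],  # columns_to_aggregate
    ['sum', 'sum'],                # aggregator_list
)
```

The ITV variant's docstring also now reflects that it returns the merged DataFrame rather than `None`, so the saved CSV and the in-memory result stay in sync.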
@@ -22,6 +22,7 @@ Table of Contents
  5. [Data Pulling](#data-pulling)
  6. [Installation](#installation)
  7. [License](#license)
+ 8. [Roadmap](#roadmap)

  ---

@@ -222,14 +223,14 @@ ims_vis = datavis()
  - **Example**: `pull_ga('GeoExperiment-31c5f5db2c39.json', '111111111', '2023-10-15', 'United Kingdom', ['totalUsers', 'newUsers'])`

  ## 2. `process_itv_analysis`
- - **Description**: Pull in GA4 data for geo experiments.
- - **Usage**: `process_itv_analysis(self, raw_df, itv_path, cities_path, media_spend_path, output_path, group1, group2)`
- - **Example**: `process_itv_analysis(df, 'itv regional mapping.csv', 'Geo_Mappings_with_Coordinates.xlsx', 'IMS.xlsx', 'itv_for_test_analysis_itvx.csv', ['West', 'Westcountry', 'Tyne Tees'], ['Central Scotland', 'North Scotland'])`
+ - **Description**: Processes region-level data for geo experiments by mapping ITV regions, grouping selected metrics, merging with media spend data, and saving the result.
+ - **Usage**: `process_itv_analysis(self, raw_df, itv_path, cities_path, media_spend_path, output_path, test_group, control_group, columns_to_aggregate, aggregator_list)`
+ - **Example**: `process_itv_analysis(df, 'itv regional mapping.csv', 'Geo_Mappings_with_Coordinates.xlsx', 'IMS.xlsx', 'itv_for_test_analysis_itvx.csv', ['West', 'Westcountry', 'Tyne Tees'], ['Central Scotland', 'North Scotland'], ['newUsers', 'transactions'], ['sum', 'sum'])`

  ## 3. `process_city_analysis`
- - **Description**: Processes city-level data for geo experiments by grouping user metrics, merging with media spend data, and saving the result.
- - **Usage**: `process_city_analysis(raw_df, spend_df, output_path, group1, group2, response_column)`
- - **Example**: `process_city_analysis(df, spend, output, ['Barnsley'], ['Aberdeen'], 'newUsers')`
+ - **Description**: Processes city-level data for geo experiments by grouping selected metrics, merging with media spend data, and saving the result.
+ - **Usage**: `process_city_analysis(raw_df, spend_df, output_path, test_group, control_group, columns_to_aggregate, aggregator_list)`
+ - **Example**: `process_city_analysis(df, spend, output, ['Barnsley'], ['Aberdeen'], ['newUsers', 'transactions'], ['sum', 'sum'])`

  ---

@@ -315,4 +316,12 @@ pip install imsciences

  This project is licensed under the MIT License. ![License](https://img.shields.io/badge/license-MIT-blue.svg)

- ---
+ ---
+
+ ## Roadmap
+
+ - [Fixes]: Naming conventions are inconsistent and have changed from previous seasonality tools (e.g. 'seas_nyd' is now 'seas_new_years_day', 'week_1' is now 'seas_1').
+ - [Fixes]: Naming conventions can be inconsistent within the data pull (some variables are suffixed 'gb', some 'uk', and others have no suffix); global holidays/events (Christmas, Easter, Halloween, etc.) are similarly inconsistent, with some carrying a regional suffix and others not.
+ - [Additions]: Add new data pulls for more macro and seasonal variables.
+
+ ---
@@ -26,14 +26,14 @@ class geoprocessing:
          print(" - Example: pull_ga('GeoExperiment-31c5f5db2c39.json', '111111111', '2023-10-15', 'United Kingdom', ['totalUsers', 'newUsers'])")

          print("\n2. process_itv_analysis")
-         print(" - Description: Pull in GA4 data for geo experiments.")
-         print(" - Usage: process_itv_analysis(self, raw_df, itv_path, cities_path, media_spend_path, output_path, group1, group2)")
-         print(" - Example:process_itv_analysis(df,'itv regional mapping.csv', 'Geo_Mappings_with_Coordinates.xlsx', 'IMS.xlsx', 'itv_for_test_analysis_itvx.csv', ['West', 'Westcountry', 'Tyne Tees'], ['Central Scotland', 'North Scotland'])")
+         print(" - Description: Processes region-level data for geo experiments by mapping ITV regions, grouping selected metrics, merging with media spend data, and saving the result.")
+         print(" - Usage: process_itv_analysis(raw_df, itv_path, cities_path, media_spend_path, output_path, test_group, control_group, columns_to_aggregate, aggregator_list)")
+         print(" - Example: process_itv_analysis(df, 'itv_regional_mapping.csv', 'Geo_Mappings_with_Coordinates.xlsx', 'IMS.xlsx', 'itv_for_test_analysis_itvx.csv', ['West', 'Westcountry', 'Tyne Tees'], ['Central Scotland', 'North Scotland'], ['newUsers', 'transactions'], ['sum', 'sum'])")

          print("\n3. process_city_analysis")
-         print(" - Description: Processes city-level data for geo experiments by grouping user metrics, merging with media spend data, and saving the result.")
-         print(" - Usage: process_city_analysis(raw_df, spend_df, output_path, group1, group2, response_column)")
-         print(" - Example:process_city_analysis(df, spend, output, ['Barnsley'], ['Aberdeen'], 'newUsers')")
+         print(" - Description: Processes city-level data for geo experiments by grouping selected metrics, merging with media spend data, and saving the result.")
+         print(" - Usage: process_city_analysis(raw_data, spend_data, output_path, test_group, control_group, columns_to_aggregate, aggregator_list)")
+         print(" - Example: process_city_analysis(df, spend, 'output.csv', ['Barnsley'], ['Aberdeen'], ['newUsers', 'transactions'], ['sum', 'mean'])")

      def pull_ga(self, credentials_file, property_id, start_date, country, metrics):
          """
@@ -137,23 +137,28 @@ class geoprocessing:
              logging.error(f"An unexpected error occurred: {e}")
              raise

-     def process_itv_analysis(self, raw_df, itv_path, cities_path, media_spend_path, output_path, group1, group2):
+     def process_itv_analysis(self, raw_df, itv_path, cities_path, media_spend_path, output_path, test_group, control_group, columns_to_aggregate, aggregator_list):
          """
          Process ITV analysis by mapping geos, grouping data, and merging with media spend.

          Parameters:
-             raw_df (pd.DataFrame): Raw input data containing 'geo', 'newUsers', 'totalRevenue', and 'date'.
+             raw_df (pd.DataFrame): Raw input data containing columns such as 'geo', plus any metrics to be aggregated.
              itv_path (str): Path to the ITV regional mapping CSV file.
              cities_path (str): Path to the Geo Mappings Excel file.
              media_spend_path (str): Path to the media spend Excel file.
              output_path (str): Path to save the final output CSV file.
              group1 (list): List of geo regions for group 1.
              group2 (list): List of geo regions for group 2.
+             columns_to_aggregate (list): List of columns in `raw_df` that need aggregation.
+             aggregator_list (list): List of aggregation operations (e.g. ["sum", "mean", ...]) for corresponding columns.

          Returns:
-             None
+             pd.DataFrame: The final merged and aggregated DataFrame.
          """
-         # Load and preprocess data
+
+         # -----------------------
+         # 1. Load and preprocess data
+         # -----------------------
          itv = pd.read_csv(itv_path).dropna(subset=['Latitude', 'Longitude'])
          cities = pd.read_excel(cities_path).dropna(subset=['Latitude', 'Longitude'])

@@ -163,59 +168,114 @@ class geoprocessing:
          itv_gdf = gpd.GeoDataFrame(itv, geometry='geometry')
          cities_gdf = gpd.GeoDataFrame(cities, geometry='geometry')

-         # Perform spatial join to match geos
-         joined_gdf = gpd.sjoin_nearest(itv_gdf, cities_gdf, how='inner', distance_col='distance')
+         # -----------------------
+         # 2. Perform spatial join to match geos
+         # -----------------------
+         joined_gdf = gpd.sjoin_nearest(
+             itv_gdf,
+             cities_gdf,
+             how='inner',
+             distance_col='distance'
+         )
          matched_result = joined_gdf[['ITV Region', 'geo']].drop_duplicates(subset=['geo'])

          # Handle unmatched geos
          unmatched_geos = set(cities_gdf['geo']) - set(matched_result['geo'])
          unmatched_cities_gdf = cities_gdf[cities_gdf['geo'].isin(unmatched_geos)]
-         nearest_unmatched_gdf = gpd.sjoin_nearest(unmatched_cities_gdf, itv_gdf, how='inner', distance_col='distance')
+
+         nearest_unmatched_gdf = gpd.sjoin_nearest(
+             unmatched_cities_gdf,
+             itv_gdf,
+             how='inner',
+             distance_col='distance'
+         )

          unmatched_geo_mapping = nearest_unmatched_gdf[['geo', 'ITV Region', 'Latitude_right', 'Longitude_right']]
          unmatched_geo_mapping.columns = ['geo', 'ITV Region', 'Nearest_Latitude', 'Nearest_Longitude']

          matched_result = pd.concat([matched_result, unmatched_geo_mapping[['geo', 'ITV Region']]])

-         # Group and filter data
+         # -----------------------
+         # 3. Merge with raw data
+         # -----------------------
          merged_df = pd.merge(raw_df, matched_result, on='geo', how='left')
-         merged_df = merged_df[merged_df["geo"] != "(not set)"].drop(columns=['geo'])
-         merged_df = merged_df.rename(columns={'ITV Region': 'geo', 'newUsers': 'response'})

-         grouped_df = merged_df.groupby(['date', 'geo'], as_index=False).agg({'response': 'sum', 'totalRevenue': 'sum'})
-         filtered_df = grouped_df[grouped_df['geo'].isin(group1 + group2)].copy()
+         # Remove rows where geo is "(not set)"
+         merged_df = merged_df[merged_df["geo"] != "(not set)"]
+
+         # Replace 'geo' column with 'ITV Region'
+         # - We'll keep the "ITV Region" naming for clarity, but you can rename if you like.
+         merged_df = merged_df.drop(columns=['geo'])
+         merged_df = merged_df.rename(columns={'ITV Region': 'geo'})
+
+         # -----------------------
+         # 4. Group and aggregate
+         # -----------------------
+         # Build the dictionary for aggregation: {col1: agg1, col2: agg2, ...}
+         aggregation_dict = dict(zip(columns_to_aggregate, aggregator_list))

-         assignment_map = {city: 1 for city in group1}
-         assignment_map.update({city: 2 for city in group2})
+         # Perform the groupby operation
+         grouped_df = merged_df.groupby(['date', 'geo'], as_index=False).agg(aggregation_dict)
+
+         # -----------------------
+         # 5. Filter for test & control groups
+         # -----------------------
+         filtered_df = grouped_df[grouped_df['geo'].isin(test_group + control_group)].copy()
+
+         assignment_map = {city: 1 for city in test_group}
+         assignment_map.update({city: 2 for city in control_group})
          filtered_df['assignment'] = filtered_df['geo'].map(assignment_map)

-         # Merge with media spend data
+         # -----------------------
+         # 6. Merge with media spend
+         # -----------------------
          media_spend_df = pd.read_excel(media_spend_path).rename(columns={'Cost': 'cost'})
-         analysis_df = pd.merge(filtered_df, media_spend_df, on=['date', 'geo'], how='left')
+
+         # Merge on date and geo
+         analysis_df = pd.merge(
+             filtered_df,
+             media_spend_df,
+             on=['date', 'geo'],
+             how='left'
+         )
+
+         # Fill missing cost with 0
          analysis_df['cost'] = analysis_df['cost'].fillna(0)

-         # Save the final output
+         # -----------------------
+         # 7. Save to CSV
+         # -----------------------
          analysis_df.to_csv(output_path, index=False)
-
-         return analysis_df
+
+         return analysis_df

-     def process_city_analysis(self, raw_data, spend_data, output_path, group1, group2, response_column):
+     def process_city_analysis(self, raw_data, spend_data, output_path, test_group, control_group, columns_to_aggregate, aggregator_list):
          """
-         Process city analysis by grouping data, analyzing user metrics, and merging with spend data.
+         Process city-level analysis by grouping data, applying custom aggregations,
+         and merging with spend data.

          Parameters:
-             raw_data (str or pd.DataFrame): Raw input data as a file path (CSV/XLSX) or DataFrame.
-             spend_data (str or pd.DataFrame): Spend data as a file path (CSV/XLSX) or DataFrame.
-             output_path (str): Path to save the final output file (CSV or XLSX).
-             group1 (list): List of city regions for group 1.
-             group2 (list): List of city regions for group 2.
-             response_column (str): Column name to be used as the response metric.
+             raw_data (str or pd.DataFrame):
+                 - Raw input data as a file path (CSV/XLSX) or a DataFrame.
+                 - Must contain 'date' and 'city' columns, plus any columns to be aggregated.
+             spend_data (str or pd.DataFrame):
+                 - Spend data as a file path (CSV/XLSX) or a DataFrame.
+                 - Must contain 'date', 'geo', and 'cost' columns.
+             output_path (str):
+                 - Path to save the final output file (CSV or XLSX).
+             test_group (list):
+                 - List of city regions to be considered "Test Group" or "Group 1".
+             control_group (list):
+                 - List of city regions to be considered "Control Group" or "Group 2".
+             columns_to_aggregate (list):
+                 - List of columns to apply aggregation to, e.g. ['newUsers', 'transactions'].
+             aggregator_list (list):
+                 - List of corresponding aggregation functions, e.g. ['sum', 'mean'].
+                 - Must be the same length as columns_to_aggregate.

          Returns:
-             pd.DataFrame: Processed DataFrame.
+             pd.DataFrame: The final merged, aggregated DataFrame.
          """
-         import pandas as pd
-         import os

          def read_file(data):
              """Helper function to handle file paths or return DataFrame directly."""
@@ -239,39 +299,82 @@ class geoprocessing:
              else:
                  raise ValueError("Unsupported file type. Please use a CSV or XLSX file.")

-         # Read data
+         # -----------------------
+         # 1. Read and validate data
+         # -----------------------
          raw_df = read_file(raw_data)
-         spend_df = read_file(spend_data)
-
-         # Ensure necessary columns are present
-         required_columns = {'date', 'city', response_column}
-         if not required_columns.issubset(raw_df.columns):
-             raise ValueError(f"Input DataFrame must contain the following columns: {required_columns}")
-
+         raw_df = raw_df.rename(columns={'city': 'geo'})
+         spend_df = read_file(spend_data).rename(columns={'Cost': 'cost'})
+
+         # Columns we minimally need in raw_df
+         required_columns = {'date', 'geo'}
+         # Ensure the columns to aggregate are there
+         required_columns = required_columns.union(set(columns_to_aggregate))
+         missing_in_raw = required_columns - set(raw_df.columns)
+         if missing_in_raw:
+             raise ValueError(
+                 f"The raw data is missing the following required columns: {missing_in_raw}"
+             )
+
+         # Validate spend data
          spend_required_columns = {'date', 'geo', 'cost'}
-         if not spend_required_columns.issubset(spend_df.columns):
-             raise ValueError(f"Spend DataFrame must contain the following columns: {spend_required_columns}")
-
+         missing_in_spend = spend_required_columns - set(spend_df.columns)
+         if missing_in_spend:
+             raise ValueError(
+                 f"The spend data is missing the following required columns: {missing_in_spend}"
+             )
+
+         # -----------------------
+         # 2. Clean and prepare spend data
+         # -----------------------
          # Convert cost column to numeric after stripping currency symbols and commas
-         spend_df['cost'] = spend_df['cost'].replace('[^\d.]', '', regex=True).astype(float)
-
-         # Rename and process input DataFrame
-         raw_df = raw_df.rename(columns={'city': 'geo', response_column: 'response'})
-
-         # Filter and group data
-         filtered_df = raw_df[raw_df['geo'].isin(group1 + group2)].copy()
-
-         grouped_df = filtered_df.groupby(['date', 'geo'], as_index=False).agg({'response': 'sum'})
-
-         assignment_map = {city: 1 for city in group1}
-         assignment_map.update({city: 2 for city in group2})
+         spend_df['cost'] = (
+             spend_df['cost']
+             .replace('[^\\d.]', '', regex=True)
+             .astype(float)
+         )
+
+         # -----------------------
+         # 3. Prepare raw data
+         # -----------------------
+         # Filter only the relevant geos
+         filtered_df = raw_df[raw_df['geo'].isin(test_group + control_group)].copy()
+
+         # -----------------------
+         # 4. Group and aggregate
+         # -----------------------
+         # Create a dictionary of {col: agg_function}
+         if len(columns_to_aggregate) != len(aggregator_list):
+             raise ValueError(
+                 "columns_to_aggregate and aggregator_list must have the same length."
+             )
+         aggregation_dict = dict(zip(columns_to_aggregate, aggregator_list))
+
+         # Perform groupby using the aggregator dictionary
+         grouped_df = filtered_df.groupby(['date', 'geo'], as_index=False).agg(aggregation_dict)
+
+         # -----------------------
+         # 5. Map groups (Test vs. Control)
+         # -----------------------
+         assignment_map = {city: 1 for city in test_group}
+         assignment_map.update({city: 2 for city in control_group})
          grouped_df['assignment'] = grouped_df['geo'].map(assignment_map)

-         # Merge with spend data
-         merged_df = pd.merge(grouped_df, spend_df, on=['date', 'geo'], how='left')
+         # -----------------------
+         # 6. Merge with spend data
+         # -----------------------
+         merged_df = pd.merge(
+             grouped_df,
+             spend_df,  # has date, geo, cost
+             on=['date', 'geo'],
+             how='left'
+         )
+
+         # Fill missing cost with 0
          merged_df['cost'] = merged_df['cost'].fillna(0)

-         # Save the final output
+         # -----------------------
+         # 7. Write out results
+         # -----------------------
          write_file(merged_df, output_path)

-         return merged_df
+         return merged_df
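The recurring idiom in both refactored methods is building an aggregation mapping from the two new list parameters and handing it to `groupby(...).agg(...)`. A self-contained pandas illustration with made-up data (the column names mirror the README examples):

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2023-10-15', '2023-10-15', '2023-10-16'],
    'geo':  ['Barnsley', 'Barnsley', 'Aberdeen'],
    'newUsers': [10, 5, 7],
    'transactions': [2, 1, 3],
})

columns_to_aggregate = ['newUsers', 'transactions']
aggregator_list = ['sum', 'mean']

# {'newUsers': 'sum', 'transactions': 'mean'}: one aggregator per column, matched by position
aggregation_dict = dict(zip(columns_to_aggregate, aggregator_list))
grouped = df.groupby(['date', 'geo'], as_index=False).agg(aggregation_dict)
print(grouped)
#          date       geo  newUsers  transactions
# 0  2023-10-15  Barnsley        15           1.5
# 1  2023-10-16  Aberdeen         7           3.0
```

Worth noting: `dict(zip(...))` silently truncates to the shorter list, and in this release only `process_city_analysis` guards against mismatched lengths; `process_itv_analysis` does not.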
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: imsciences
- Version: 0.9.5.5
+ Version: 0.9.5.7
  Summary: IMS Data Processing Package
  Author: IMS
  Author-email: cam@im-sciences.com
@@ -24,6 +24,7 @@ Requires-Dist: yfinance
  Requires-Dist: holidays
  Requires-Dist: google-analytics-data
  Requires-Dist: geopandas
+ Requires-Dist: geopy

  # IMS Package Documentation

@@ -49,6 +50,7 @@ Table of Contents
  5. [Data Pulling](#data-pulling)
  6. [Installation](#installation)
  7. [License](#license)
+ 8. [Roadmap](#roadmap)

  ---

@@ -249,14 +251,14 @@ ims_vis = datavis()
  - **Example**: `pull_ga('GeoExperiment-31c5f5db2c39.json', '111111111', '2023-10-15', 'United Kingdom', ['totalUsers', 'newUsers'])`

  ## 2. `process_itv_analysis`
- - **Description**: Pull in GA4 data for geo experiments.
- - **Usage**: `process_itv_analysis(self, raw_df, itv_path, cities_path, media_spend_path, output_path, group1, group2)`
- - **Example**: `process_itv_analysis(df, 'itv regional mapping.csv', 'Geo_Mappings_with_Coordinates.xlsx', 'IMS.xlsx', 'itv_for_test_analysis_itvx.csv', ['West', 'Westcountry', 'Tyne Tees'], ['Central Scotland', 'North Scotland'])`
+ - **Description**: Processes region-level data for geo experiments by mapping ITV regions, grouping selected metrics, merging with media spend data, and saving the result.
+ - **Usage**: `process_itv_analysis(self, raw_df, itv_path, cities_path, media_spend_path, output_path, test_group, control_group, columns_to_aggregate, aggregator_list)`
+ - **Example**: `process_itv_analysis(df, 'itv regional mapping.csv', 'Geo_Mappings_with_Coordinates.xlsx', 'IMS.xlsx', 'itv_for_test_analysis_itvx.csv', ['West', 'Westcountry', 'Tyne Tees'], ['Central Scotland', 'North Scotland'], ['newUsers', 'transactions'], ['sum', 'sum'])`

  ## 3. `process_city_analysis`
- - **Description**: Processes city-level data for geo experiments by grouping user metrics, merging with media spend data, and saving the result.
- - **Usage**: `process_city_analysis(raw_df, spend_df, output_path, group1, group2, response_column)`
- - **Example**: `process_city_analysis(df, spend, output, ['Barnsley'], ['Aberdeen'], 'newUsers')`
+ - **Description**: Processes city-level data for geo experiments by grouping selected metrics, merging with media spend data, and saving the result.
+ - **Usage**: `process_city_analysis(raw_df, spend_df, output_path, test_group, control_group, columns_to_aggregate, aggregator_list)`
+ - **Example**: `process_city_analysis(df, spend, output, ['Barnsley'], ['Aberdeen'], ['newUsers', 'transactions'], ['sum', 'sum'])`

  ---

@@ -343,3 +345,11 @@ pip install imsciences
  This project is licensed under the MIT License. ![License](https://img.shields.io/badge/license-MIT-blue.svg)

  ---
+
+ ## Roadmap
+
+ - [Fixes]: Naming conventions are inconsistent and have changed from previous seasonality tools (e.g. 'seas_nyd' is now 'seas_new_years_day', 'week_1' is now 'seas_1').
+ - [Fixes]: Naming conventions can be inconsistent within the data pull (some variables are suffixed 'gb', some 'uk', and others have no suffix); global holidays/events (Christmas, Easter, Halloween, etc.) are similarly inconsistent, with some carrying a regional suffix and others not.
+ - [Additions]: Add new data pulls for more macro and seasonal variables.
+
+ ---
@@ -9,3 +9,4 @@ yfinance
  holidays
  google-analytics-data
  geopandas
+ geopy
@@ -8,7 +8,7 @@ def read_md(file_name):
          return f.read()
      return ''

- VERSION = '0.9.5.5'
+ VERSION = '0.9.5.7'
  DESCRIPTION = 'IMS Data Processing Package'
  LONG_DESCRIPTION = read_md('README.md')

@@ -24,7 +24,7 @@ setup(
      packages=find_packages(),
      install_requires=[
          "pandas", "plotly", "numpy", "fredapi", "xgboost", "scikit-learn",
-         "bs4", "yfinance", "holidays", "google-analytics-data", "geopandas",
+         "bs4", "yfinance", "holidays", "google-analytics-data", "geopandas", "geopy"
      ],
      keywords=['data processing', 'apis', 'data analysis', 'data visualization', 'machine learning'],
      classifiers=[
2 files without changes