PyPI - eqcctpro - Versions diffs - 0.5.5__tar.gz → 0.5.7__tar.gz - Mend

eqcctpro 0.5.5.tar.gz → 0.5.7.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of eqcctpro might be problematic.

Files changed (10)

{eqcctpro-0.5.5 → eqcctpro-0.5.7}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: eqcctpro
-Version: 0.5.5
+Version: 0.5.7
 Summary: EQCCTPro: A powerful seismic event detection toolkit
 Author-email: Constantinos Skevofilax <constantinos.skevofilax@austin.utexas.edu>, Victor Salles <victor.salles@beg.utexas.edu>
 Project-URL: Homepage, https://pypi.org/project/eqcctpro/
@@ -126,20 +126,25 @@ For additional details and package updates, visit the **EQCCTPro PyPI page**:
 ### **Using Sample Waveform Data**
 To understand how **EQCCTPro** works, it is **highly recommended** to use provided sample seismic waveform data as the data source when testing the package.
-Sample seismic waveform data from 50 TexNet stations have provided in the repository under `sample_1_minute_data.zip`.
+1-minute long sample seismic waveforms from 229 TexNet stations have been provided in the repository under the `230_stations_1_min_dt.zip` file.
 ### **Step 1: Unzip the Sample Waveform Data**
 After downloading the `.zip` file through the GitHub methods above, run:
 ```sh
-[skevofilaxc] unzip sample_1_minute_data.zip
+[skevofilaxc] unzip 230_stations_1_min_dt.zip
 ```
 ### **Step 2: Check and Understand the Directory Structure**
-The extracted data will contain multiple station directories:
+The extracted data contains a timechunk subdirectory, which in turn contains multiple station directories:
 ```sh
-[skevofilaxc sample_1_minute_data]$ ls
-AT01  CF01  DG05  EF54  EF76   HBVL  MB09  MB21   MID02  ODSA  PB16  PB25  PB35  PB52  PH02  SM03  WB11
-BB01  CT02  DG09  EF63  FOAK4  HNDO  MB13  MB25   MID03  PB04  PB17  PB26  PB39  PB54  PL01  SMWD  WB12
-BP01  DB02  EF02  EF75  FW13   MB06  MB19  MID01  MO01   PB11  PB18  PB34  PB42  PECS  SM02  WB06
+[skevofilaxc 230_stations_1_min_dt]$ ls
+20241215T120000Z_20241215T120100Z
+[skevofilaxc 230_stations_1_min_dt]$ cd 20241215T120000Z_20241215T120100Z
+237B  BP01  CT02  DG02  DG10  EE04  EF07  EF54  EF63  EF69  EF77   FOAK3  FW06  FW14  HBVL  LWM2  MB05  MB12  MB19   MBBB3  MID03  NM01  OG02  PB05  PB11  PB19  PB26  PB34  PB41  PB51  PB57  PH03  SA06  SGCY  SN02  SN10  WB03  WB09  YK01
+435B  BRDY  CV01  DG04  DKNS  EF02  EF08  EF56  EF64  EF71  ELG6   FOAK4  FW07  FW15  HNDO  LWM3  MB06  MB13  MB21   MBBB5  MLDN   NM02  OG04  PB06  PB12  PB21  PB28  PB35  PB42  PB52  PB58  PL01  SA07  SM01  SN03  SNAG  WB04  WB10
+ALPN  BW01  CW01  DG05  DRIO  EF03  EF09  EF58  EF65  EF72  ET02   FW01   FW09  GV01  HP01  MB01  MB07  MB15  MB22   MBBB6  MNHN   NM03  OZNA  PB07  PB14  PB22  PB29  PB37  PB43  PB53  PB59  PLPT  SA09  SM02  SN04  TREL  WB05  WB11
+APMT  CF01  DB02  DG06  DRZT  EF04  EF51  EF59  EF66  EF74  FLRS   FW02   FW11  GV02  HP02  MB02  MB08  MB16  MB25   MG01   MO01   ODSA  PB01  PB08  PB16  PB23  PB30  PB38  PB44  PB54  PCOS  POST  SAND  SM03  SN07  VHRN  WB06  WB12
+AT01  CRHG  DB03  DG07  EE02  EF05  EF52  EF61  EF67  EF75  FOAK1  FW04   FW12  GV03  INDO  MB03  MB09  MB17  MBBB1  MID01  NGL01  OE01  PB03  PB09  PB17  PB24  PB32  PB39  PB46  PB55  PECS  SA02  SD01  SM04  SN08  VW01  WB07  WTFS
+BB01  CT01  DB04  DG09  EE03  EF06  EF53  EF62  EF68  EF76  FOAK2  FW05   FW13  GV04  LWM1  MB04  MB11  MB18  MBBB2  MID02  NGL02  OG01  PB04  PB10  PB18  PB25  PB33  PB40  PB47  PB56  PH02  SA04  SE01  SMWD  SN09  WB02  WB08  WW01
 ```
 Each subdirectory contains **mSEED** files of different waveform components:
 ```sh
@@ -175,14 +180,29 @@ To process mSEED from various seismic stations, use the **EQCCTMSeedRunner** cla
 **EQCCTMSeedRunner** enables users to process multiple mSEED files from a given input directory, which consists of timechunk directories that each contain station directories, formatted as follows:
 ```sh
-[skevofilaxc sample_1_minute_data]$ ls
-AT01  CF01  DG05  EF54  EF76   HBVL  MB09  MB21   MID02  ODSA  PB16  PB25  PB35  PB52  PH02  SM03  WB11
-BB01  CT02  DG09  EF63  FOAK4  HNDO  MB13  MB25   MID03  PB04  PB17  PB26  PB39  PB54  PL01  SMWD  WB12
-BP01  DB02  EF02  EF75  FW13   MB06  MB19  MID01  MO01   PB11  PB18  PB34  PB42  PECS  SM02  WB06
+[skevofilaxc 230_stations_1_min_dt]$ ls
+20241215T120000Z_20241215T120100Z
+[skevofilaxc 230_stations_1_min_dt]$ cd 20241215T120000Z_20241215T120100Z
+237B  BP01  CT02  DG02  DG10  EE04  EF07  EF54  EF63  EF69  EF77   FOAK3  FW06  FW14  HBVL  LWM2  MB05  MB12  MB19   MBBB3  MID03  NM01  OG02  PB05  PB11  PB19  PB26  PB34  PB41  PB51  PB57  PH03  SA06  SGCY  SN02  SN10  WB03  WB09  YK01
+435B  BRDY  CV01  DG04  DKNS  EF02  EF08  EF56  EF64  EF71  ELG6   FOAK4  FW07  FW15  HNDO  LWM3  MB06  MB13  MB21   MBBB5  MLDN   NM02  OG04  PB06  PB12  PB21  PB28  PB35  PB42  PB52  PB58  PL01  SA07  SM01  SN03  SNAG  WB04  WB10
+ALPN  BW01  CW01  DG05  DRIO  EF03  EF09  EF58  EF65  EF72  ET02   FW01   FW09  GV01  HP01  MB01  MB07  MB15  MB22   MBBB6  MNHN   NM03  OZNA  PB07  PB14  PB22  PB29  PB37  PB43  PB53  PB59  PLPT  SA09  SM02  SN04  TREL  WB05  WB11
+APMT  CF01  DB02  DG06  DRZT  EF04  EF51  EF59  EF66  EF74  FLRS   FW02   FW11  GV02  HP02  MB02  MB08  MB16  MB25   MG01   MO01   ODSA  PB01  PB08  PB16  PB23  PB30  PB38  PB44  PB54  PCOS  POST  SAND  SM03  SN07  VHRN  WB06  WB12
+AT01  CRHG  DB03  DG07  EE02  EF05  EF52  EF61  EF67  EF75  FOAK1  FW04   FW12  GV03  INDO  MB03  MB09  MB17  MBBB1  MID01  NGL01  OE01  PB03  PB09  PB17  PB24  PB32  PB39  PB46  PB55  PECS  SA02  SD01  SM04  SN08  VW01  WB07  WTFS
+BB01  CT01  DB04  DG09  EE03  EF06  EF53  EF62  EF68  EF76  FOAK2  FW05   FW13  GV04  LWM1  MB04  MB11  MB18  MBBB2  MID02  NGL02  OG01  PB04  PB10  PB18  PB25  PB33  PB40  PB47  PB56  PH02  SA04  SE01  SMWD  SN09  WB02  WB08  WW01
 ```
-Where each subdirectory is named after station code. If you wish to use create your own input directory with custom waveform mSEED files, **please follow the above naming convention.** Otherwise, EQCCTPro will **not** work.
+Each station subdirectory is named after its station code. If you wish to create your own input directory with custom waveform mSEED files, **please follow the above naming conventions.** Otherwise, EQCCTPro will **not** work.
+Create one subdirectory per timechunk (parent directories) and, inside each, one subdirectory per station (child directories), named by station code as shown above.
+Each timechunk directory is named `<window start>_<window end>` in UTC and spans from the **start of its timechunk minus the waveform overlap** to the **end of that timechunk**, where each timechunk is `timechunk_dt` minutes long.
-Within each subdirectory, such as PB35, it is made up of mSEED files of different poses (EX. N, E, Z):
+For example:
+```sh
+[skevofilaxc 230_stations_2hr_1_hr_dt]$ ls
+20241215T115800Z_20241215T130000Z  20241215T125800Z_20241215T140000Z
+```
+Here, each timechunk is 1 hour long and the waveform overlap is 2 minutes. Hence, `20241215T115800Z_20241215T130000Z` spans from `11:58:00 to 13:00:00 UTC on Dec 15, 2024` and `20241215T125800Z_20241215T140000Z` spans from `12:58:00 to 14:00:00 UTC on Dec 15, 2024`.
+Each station subdirectory, such as PB35, contains mSEED files for the seismometer's different components (e.g., N, E, Z):
 ```sh
 [skevofilaxc PB35]$ ls
 TX.PB35.00.HH1__20241215T115800Z__20241215T120100Z.mseed  TX.PB35.00.HHZ__20241215T115800Z__20241215T120100Z.mseed
@@ -207,17 +227,22 @@ eqcct_runner = EQCCTMSeedRunner(
     S_threshold=0.02,
     p_model_filepath='/path/to/model_p.h5',
     s_model_filepath='/path/to/model_s.h5',
-    number_of_concurrent_predictions=5,
+    number_of_concurrent_station_predictions=5,
+    number_of_concurrent_timechunk_predictions=2,
     best_usecase_config=True,
     csv_dir='/path/to/csv',
     selected_gpus=[0],
     set_vram_mb=24750,
-    specific_stations='AT01, BP01, DG05'
-)
+    specific_stations='AT01, BP01, DG05',
+    start_time='2024-12-14 12:00:00',
+    end_time='2024-12-15 12:00:00',
+    timechunk_dt=1,
+    waveform_overlap=2)
 eqcct_runner.run_eqcctpro()
 ```
-**EQCCTMseedRunner** has multiple input paramters that need to be configured and are defined below:
+**EQCCTMseedRunner** has multiple input parameters that need to be configured and are defined below:
 - **`use_gpu (bool)`: True or False**
   - Tells Ray to use either the GPU(s) (True) or CPUs (False) on your computer to process the waveforms in the entire workflow
@@ -232,7 +257,7 @@ eqcct_runner.run_eqcctpro()
     - "I want this program to run only on these specific cores."
 - **`input_dir (str)`**
   - Directory path to the mSEED directory
-  - EX. `/home/skevofilaxc/my_work_directory/eqcct/eqcctpro/sample_1_minute_data`
+  - EX. `/home/skevofilaxc/my_work_directory/eqcct/eqcctpro/230_stations_1_min_dt`
 - **`output_dir (str)`**
   - Directory path to where the output picks and logs will be sent
   - Does not need to exist; it will be created if it does not exist
@@ -249,13 +274,16 @@ eqcct_runner.run_eqcctpro()
   - Filepath to where the P EQCCT detection model is stored
 - **`s_model_filepath (str)`**
   - Filepath to where the S EQCCT detection model is stored
-- **`number_of_concurrent_predictions (int)`**
+- **`number_of_concurrent_station_predictions (int)`**
   - The number of concurrent EQCCT detection tasks that can run simultaneously on the allocated resources
-  - EX. if number_of_concurrent_predictions = 5, there will be up to 5 EQCCT instances analyzing 5 different waveforms at the sametime
-  - Best to use the optimal amount for your hardware, which can be identified using **EvaluateSystem** (below)
+  - EX. if number_of_concurrent_station_predictions = 5, up to 5 EQCCT instances can simultaneously analyze waveforms from 5 distinct seismic stations
+  - To find the optimal value for your hardware, use the **EvaluateSystem** class (described below)
+- **`number_of_concurrent_timechunk_predictions (int)`: default = None**
+  - The number of timechunks processed in parallel
+  - Processes multiple timechunks in parallel instead of sequentially, which can substantially reduce total runtime
 - **`best_usecase_config (bool)`: default = False**
-  - If True, will override inputted cpu_id_list, number_of_concurrent_predictions, intra_threads, inter_threads values for the best overall usecase configurations
-  - Best overall usecase configurations are defined as the best overall input configurations that minimize runtime while doing the most amount of processing with your available hardware
+  - If True, overrides the provided cpu_id_list, number_of_concurrent_station_predictions, intra_threads, and inter_threads values with the best overall use-case configuration
+  - The best overall use-case configuration is the input configuration that minimizes runtime while processing the most data with your available hardware
   - Can only be used if EvaluateSystem has been run
 - **`csv_dir (str)`**
   - Directory path containing the CSVs output by EvaluateSystem, which hold the trial data used to find the best_usecase_config
@@ -268,12 +296,26 @@ eqcct_runner.run_eqcctpro()
   - Must be a realistic value based on your hardware's physical VRAM; if it exceeds the available memory, the code will fail with an **OutOfMemoryError**
 - **`specific_stations (str)`: default = None**
   - Comma-separated string listing the only stations you want to analyze
-  - EX. Out of the 50 sample stations in `sample_1_minute_data`, if I only want to analyze AT01, BP01, DG05, then specific_stations='AT01, BP01, DG05'.
+  - EX. Out of the 229 sample stations in `230_stations_1_min_dt`, if you only want to analyze AT01, BP01, DG05, then specific_stations='AT01, BP01, DG05'.
   - Removes the need to move station directories around; all stations can stay in one input directory
-- **`cpu_id_list (list)`: default = [1]**
-  - List that defines which specific CPU cores that sched_setaffinity will allocate for executing the current EQCCTPro process.
-  - Allows for specific allocation and limitation of CPUs for a given EQCCTPro process
-    - "I want this program to run only on these specific cores."
+- **`start_time (str)`: default = None**
+  - The start of the analysis period
+  - EX. 2024-12-14 12:00:00
+  - Must use the format `YYYY-MM-DD HH:MM:SS`
+  - Used to split the analysis period into timechunks
+  - Also recorded in the EvaluateSystem results CSV file, so the analysis period can be noted for future review
+- **`end_time (str)`: default = None**
+  - The end of the analysis period
+  - EX. 2024-12-15 12:00:00
+  - Must use the format `YYYY-MM-DD HH:MM:SS`
+  - Used to split the analysis period into timechunks
+  - Also recorded in the EvaluateSystem results CSV file, so the analysis period can be noted for future review
+- **`timechunk_dt (int)`: default = None**
+  - The length of each timechunk, in minutes
+  - EX. if timechunk_dt = 10 and the analysis period is 30 minutes, three 10-minute timechunks are created
+- **`waveform_overlap (int)`: default = None**
+  - The overlap, in minutes, between consecutive timechunks: each timechunk's waveforms start this many minutes before the timechunk itself
 ---
@@ -295,10 +337,10 @@ eval_gpu = EvaluateSystem(
                 S_threshold=0.02,
                 p_model_filepath='/path/to/model_p.h5',
                 s_model_filepath='/path/to/model_s.h5',
-                stations2use=2,
                 cpu_id_list=[0,1],
                 set_vram_mb=24750,
-                selected_gpus=[0]
+                selected_gpus=[0],
+                stations2use=2
 )
 eval_gpu.evaluate()
 ```
@@ -318,17 +360,24 @@ eval_cpu = EvaluateSystem(
                 S_threshold=0.02,
                 p_model_filepath='/path/to/model_p.h5',
                 s_model_filepath='/path/to/model_s.h5',
-                stations2use=12,
-                cpu_id_list=range(87,102),
-                starting_amount_of_stations=2,
+                cpu_id_list=range(0,49),
+                min_cpu_amount=20,
+                cpu_test_step_size=1,
+                stations2use=50,
+                starting_amount_of_stations=25,
                 station_list_step_size=1,
-                min_cpu_amount=15,
-                min_conc_predictions=2,
-                conc_predictions_step_size=1)
+                min_conc_stations=25,
+                conc_station_tasks_step_size=5,
+                start_time='2024-12-15 12:00:00',
+                end_time='2024-12-15 14:00:00',
+                conc_timechunk_tasks_step_size=1,
+                timechunk_dt=30,
+                waveform_overlap=2,
+                tmp_dir='/path/to/tmp')
 eval_cpu.evaluate()
 ```
-**EvaluateSystem** will iterate through different combinations of CPU(s), Concurrent Predictions, and Workloads (stations), as well as GPU(s), and the amount of VRAM (MB) each Concurrent Prediction can use.
+**EvaluateSystem** iterates through different combinations of CPUs, concurrent timechunk and station tasks, and GPUs, as well as the amount of VRAM (MB) each concurrent prediction can use.
 **EvaluateSystem** can take a long time, depending on the number of CPUs/GPUs, the amount of VRAM available, and the total workload being tested. However, once the testing has been done for most if not all use cases,
 the trial data is available and can be used to identify the optimal parallelization configuration for **EQCCTMSeedRunner**, getting the maximum amount of processing out of your system in the shortest amount of time.
@@ -336,14 +385,14 @@ The following input parameters need to be configurated for **EvaluateSystem** to
 - **`mode (str)`**
   - Can be either `cpu` or `gpu`
-  - Tells `EvaluateSystem` which configuration trials should it iterate through
+  - Tells `EvaluateSystem` whether its trials should run on CPUs (`cpu`) or GPUs (`gpu`)
 - **`intra_threads (int)`: default = 1**
   - Controls how many intra-parallelism threads TensorFlow can use
 - **`inter_threads (int)`: default = 1**
   - Controls how many inter-parallelism threads TensorFlow can use
 - **`input_dir (str)`**
   - Directory path to the mSEED directory
-  - EX. /home/skevofilaxc/my_work_directory/eqcct/eqcctpro/sample_1_minute_data
+  - EX. /home/skevofilaxc/my_work_directory/eqcct/eqcctpro/230_stations_1_min_dt
 - **`output_dir (str)`**
   - Directory path to where the output picks and logs will be sent
   - Does not need to exist; it will be created if it does not exist
@@ -363,14 +412,20 @@ The following input parameters need to be configurated for **EvaluateSystem** to
   - Filepath to where the P EQCCT detection model is stored
 - **`s_model_filepath (str)`**
   - Filepath to where the S EQCCT detection model is stored
-- **`stations2use (int)`: default = None**
-  - Controls the maximum amount of stations EvaluateSystem can use in its trial iterations
-  - Sample data has been provided so that the maximum is 50, however, if using custom data, configure for your specific usecase
 - **`cpu_id_list (list)`: default = [1]**
   - List of the specific CPU cores that sched_setaffinity will allocate for executing the current EQCCTPro process; this is also **the maximum number of cores EvaluateSystem can use in its trial iterations**
   - Allows for specific allocation and limitation of CPUs for a given EQCCTPro process
     - "I want this program to run only on these specific cores."
   - Must contain at least 1 CPU when using GPUs: Ray needs CPUs to manage the Raylets (concurrent tasks), even though the waveforms themselves are processed on the GPU
+- **`min_cpu_amount (int)`: default = 1**
+  - The minimum number of CPUs to start the trials with
+  - By default, trials start with 1 CPU and iterate up to the maximum allocated
+  - Can be set to a different starting point, e.g., start at 15 CPUs and iterate up to a maximum of 25
+- **`cpu_test_step_size (int)`: default = 1**
+  - The step size the trials use when iterating from `min_cpu_amount` to `len(cpu_id_list)`
+- **`stations2use (int)`: default = None**
+  - Controls the maximum number of stations EvaluateSystem can use in its trial iterations
+  - Set this according to the number of stations in your data (the provided sample data contains 229 stations); if using custom data, configure it for your specific use case
 - **`starting_amount_of_stations (int)`: default = 1**
   - When evaluating your system, you can set the number of stations the test starts with
   - By default, the test starts with 1 station
@@ -378,16 +433,34 @@ The following input parameters need to be configurated for **EvaluateSystem** to
   - You can set a step size for the station list that is generated
   - For example, if the step size is set to 10 and you start with 50 stations with a maximum of 100, the list would be: [50, 60, 70, 80, 90, 100]
   - Setting it to 1 uses the default sequence: steps of 1 from 1 to 10, then steps of 5 up to `stations2use`
-- **`min_cpu_amount (int)`: default = 1**
-  - Is the minimum amount of CPUs you want to start your trials with
-  - By default, trials will start iterating with 1 CPU up to the maximum allocated
-  - Can now set a value as the starting point, such as 15 CPUs up to the maximum of for instance 25
-- **`min_conc_predictions (int)`: default = 1**
-  - Is the minimum amount of concurrent predictions you want each trial iteration to start with
+- **`min_conc_stations (int)`: default = 1**
+  - The minimum number of concurrent station predictions each trial iteration starts with
   - By default, if `min_conc_stations` and `conc_station_tasks_step_size` are set to 1, a custom step size iteration is applied to test the 50 sample waveforms. The sequence follows: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, n+5, 50].
-- **`conc_predictions_step_size (int)`: default = 1**
-  - Is the concurrent predictions step size you want each trial iteration to iterate with with
+- **`conc_station_tasks_step_size (int)`: default = 1**
+  - The step size for the number of concurrent station predictions between trial iterations
   - By default, if `min_conc_stations` and `conc_station_tasks_step_size` are set to 1, a custom step size iteration is applied to test the 50 sample waveforms. The sequence follows: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, n+5, 50]
+- **`start_time (str)`: default = None**
+  - The start of the analysis period
+  - EX. 2024-12-14 12:00:00
+  - Must use the format `YYYY-MM-DD HH:MM:SS`
+  - Used to split the analysis period into timechunks
+  - Also recorded in the EvaluateSystem results CSV file, so the analysis period can be noted for future review
+- **`end_time (str)`: default = None**
+  - The end of the analysis period
+  - EX. 2024-12-15 12:00:00
+  - Must use the format `YYYY-MM-DD HH:MM:SS`
+  - Used to split the analysis period into timechunks
+  - Also recorded in the EvaluateSystem results CSV file, so the analysis period can be noted for future review
+- **`conc_timechunk_tasks_step_size (int)`: default = 1**
+  - The step size for the number of concurrent timechunk predictions between trial iterations
+- **`timechunk_dt (int)`: default = None**
+  - The length of each timechunk, in minutes
+  - EX. if timechunk_dt = 10 and the analysis period is 30 minutes, three 10-minute timechunks are created
+- **`waveform_overlap (int)`: default = None**
+  - The overlap, in minutes, between consecutive timechunks: each timechunk's waveforms start this many minutes before the timechunk itself
+- **`tmp_dir (str)`: default = 1**
+  - A temporary directory to store all temp files produced by EQCCTPro
+  - Eases system cleanup and avoids writing to the system's default temporary directory
 - **`set_vram_mb (float)`**
   - Value of the maximum amount of VRAM EQCCTPro can use
   - Must be a realistic value based on your hardware's physical VRAM; if it exceeds the available memory, the code will fail with an OutOfMemoryError
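The timechunk layout introduced in this release (one directory per timechunk, named `<window start>_<window end>` in UTC, with each window beginning `waveform_overlap` minutes before its timechunk) can be sketched in Python as below. This is a minimal illustration, not EQCCTPro's own code: the helpers `timechunk_dir_names` and `create_input_skeleton` are hypothetical, and clipping a final partial timechunk at `end_time` is an assumption the release notes do not confirm. It reproduces the documented 1-hour, 2-minute-overlap example.

```python
import tempfile
from datetime import datetime, timedelta
from pathlib import Path

# Input format documented for start_time / end_time: YYYY-MM-DD HH:MM:SS
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"
# Format of each half of a timechunk directory name, e.g. 20241215T115800Z
DIR_TIME_FORMAT = "%Y%m%dT%H%M%SZ"


def timechunk_dir_names(start_time, end_time, timechunk_dt, waveform_overlap):
    """Return the expected timechunk directory names for an analysis period.

    Hypothetical helper. timechunk_dt and waveform_overlap are in minutes.
    Each window starts waveform_overlap minutes before its timechunk and
    ends at the end of that timechunk.
    """
    start = datetime.strptime(start_time, TIME_FORMAT)
    end = datetime.strptime(end_time, TIME_FORMAT)
    step = timedelta(minutes=timechunk_dt)
    overlap = timedelta(minutes=waveform_overlap)

    names = []
    chunk_start = start
    while chunk_start < end:
        # Assumption: a final partial timechunk is clipped at end_time
        chunk_end = min(chunk_start + step, end)
        window_start = chunk_start - overlap
        names.append(
            f"{window_start.strftime(DIR_TIME_FORMAT)}_{chunk_end.strftime(DIR_TIME_FORMAT)}"
        )
        chunk_start += step
    return names


def create_input_skeleton(input_dir, chunk_names, stations):
    """Create empty <input_dir>/<timechunk>/<station>/ directories.

    Hypothetical helper: the station mSEED files still have to be copied
    into each station directory afterwards.
    """
    root = Path(input_dir)
    for chunk in chunk_names:
        for station in stations:
            (root / chunk / station).mkdir(parents=True, exist_ok=True)
    return root


if __name__ == "__main__":
    names = timechunk_dir_names("2024-12-15 12:00:00", "2024-12-15 14:00:00", 60, 2)
    print(names)
    # ['20241215T115800Z_20241215T130000Z', '20241215T125800Z_20241215T140000Z']
    with tempfile.TemporaryDirectory() as tmp:
        root = create_input_skeleton(tmp, names, ["AT01", "PB35"])
        print(sorted(p.relative_to(root).as_posix() for p in root.glob("*/*")))
```

The skeleton only creates empty directories; each station directory still needs its mSEED files, named as in the PB35 listing.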

{eqcctpro-0.5.5 → eqcctpro-0.5.7}/README.md RENAMED

@@ -93,20 +93,25 @@ For additional details and package updates, visit the **EQCCTPro PyPI page**:
 ### **Using Sample Waveform Data**
 To understand how **EQCCTPro** works, it is **highly recommended** to use provided sample seismic waveform data as the data source when testing the package.
-Sample seismic waveform data from 50 TexNet stations have provided in the repository under `sample_1_minute_data.zip`.
+1-minute long sample seismic waveforms from 229 TexNet stations have been provided in the repository under the `230_stations_1_min_dt.zip` file.
 ### **Step 1: Unzip the Sample Waveform Data**
 After downloading the `.zip` file through the GitHub methods above, run:
 ```sh
-[skevofilaxc] unzip sample_1_minute_data.zip
+[skevofilaxc] unzip 230_stations_1_min_dt.zip
 ```
 ### **Step 2: Check and Understand the Directory Structure**
-The extracted data will contain multiple station directories:
+The extracted data contains a timechunk subdirectory, which in turn contains multiple station directories:
 ```sh
-[skevofilaxc sample_1_minute_data]$ ls
-AT01  CF01  DG05  EF54  EF76   HBVL  MB09  MB21   MID02  ODSA  PB16  PB25  PB35  PB52  PH02  SM03  WB11
-BB01  CT02  DG09  EF63  FOAK4  HNDO  MB13  MB25   MID03  PB04  PB17  PB26  PB39  PB54  PL01  SMWD  WB12
-BP01  DB02  EF02  EF75  FW13   MB06  MB19  MID01  MO01   PB11  PB18  PB34  PB42  PECS  SM02  WB06
+[skevofilaxc 230_stations_1_min_dt]$ ls
+20241215T120000Z_20241215T120100Z
+[skevofilaxc 230_stations_1_min_dt]$ cd 20241215T120000Z_20241215T120100Z
+237B  BP01  CT02  DG02  DG10  EE04  EF07  EF54  EF63  EF69  EF77   FOAK3  FW06  FW14  HBVL  LWM2  MB05  MB12  MB19   MBBB3  MID03  NM01  OG02  PB05  PB11  PB19  PB26  PB34  PB41  PB51  PB57  PH03  SA06  SGCY  SN02  SN10  WB03  WB09  YK01
+435B  BRDY  CV01  DG04  DKNS  EF02  EF08  EF56  EF64  EF71  ELG6   FOAK4  FW07  FW15  HNDO  LWM3  MB06  MB13  MB21   MBBB5  MLDN   NM02  OG04  PB06  PB12  PB21  PB28  PB35  PB42  PB52  PB58  PL01  SA07  SM01  SN03  SNAG  WB04  WB10
+ALPN  BW01  CW01  DG05  DRIO  EF03  EF09  EF58  EF65  EF72  ET02   FW01   FW09  GV01  HP01  MB01  MB07  MB15  MB22   MBBB6  MNHN   NM03  OZNA  PB07  PB14  PB22  PB29  PB37  PB43  PB53  PB59  PLPT  SA09  SM02  SN04  TREL  WB05  WB11
+APMT  CF01  DB02  DG06  DRZT  EF04  EF51  EF59  EF66  EF74  FLRS   FW02   FW11  GV02  HP02  MB02  MB08  MB16  MB25   MG01   MO01   ODSA  PB01  PB08  PB16  PB23  PB30  PB38  PB44  PB54  PCOS  POST  SAND  SM03  SN07  VHRN  WB06  WB12
+AT01  CRHG  DB03  DG07  EE02  EF05  EF52  EF61  EF67  EF75  FOAK1  FW04   FW12  GV03  INDO  MB03  MB09  MB17  MBBB1  MID01  NGL01  OE01  PB03  PB09  PB17  PB24  PB32  PB39  PB46  PB55  PECS  SA02  SD01  SM04  SN08  VW01  WB07  WTFS
+BB01  CT01  DB04  DG09  EE03  EF06  EF53  EF62  EF68  EF76  FOAK2  FW05   FW13  GV04  LWM1  MB04  MB11  MB18  MBBB2  MID02  NGL02  OG01  PB04  PB10  PB18  PB25  PB33  PB40  PB47  PB56  PH02  SA04  SE01  SMWD  SN09  WB02  WB08  WW01
 ```
 Each subdirectory contains **mSEED** files of different waveform components:
 ```sh
@@ -142,14 +147,29 @@ To process mSEED from various seismic stations, use the **EQCCTMSeedRunner** cla
 **EQCCTMSeedRunner** enables users to process multiple mSEED files from a given input directory, which consists of timechunk directories that each contain station directories, formatted as follows:
 ```sh
-[skevofilaxc sample_1_minute_data]$ ls
-AT01  CF01  DG05  EF54  EF76   HBVL  MB09  MB21   MID02  ODSA  PB16  PB25  PB35  PB52  PH02  SM03  WB11
-BB01  CT02  DG09  EF63  FOAK4  HNDO  MB13  MB25   MID03  PB04  PB17  PB26  PB39  PB54  PL01  SMWD  WB12
-BP01  DB02  EF02  EF75  FW13   MB06  MB19  MID01  MO01   PB11  PB18  PB34  PB42  PECS  SM02  WB06
+[skevofilaxc 230_stations_1_min_dt]$ ls
+20241215T120000Z_20241215T120100Z
+[skevofilaxc 230_stations_1_min_dt]$ cd 20241215T120000Z_20241215T120100Z
+237B  BP01  CT02  DG02  DG10  EE04  EF07  EF54  EF63  EF69  EF77   FOAK3  FW06  FW14  HBVL  LWM2  MB05  MB12  MB19   MBBB3  MID03  NM01  OG02  PB05  PB11  PB19  PB26  PB34  PB41  PB51  PB57  PH03  SA06  SGCY  SN02  SN10  WB03  WB09  YK01
+435B  BRDY  CV01  DG04  DKNS  EF02  EF08  EF56  EF64  EF71  ELG6   FOAK4  FW07  FW15  HNDO  LWM3  MB06  MB13  MB21   MBBB5  MLDN   NM02  OG04  PB06  PB12  PB21  PB28  PB35  PB42  PB52  PB58  PL01  SA07  SM01  SN03  SNAG  WB04  WB10
+ALPN  BW01  CW01  DG05  DRIO  EF03  EF09  EF58  EF65  EF72  ET02   FW01   FW09  GV01  HP01  MB01  MB07  MB15  MB22   MBBB6  MNHN   NM03  OZNA  PB07  PB14  PB22  PB29  PB37  PB43  PB53  PB59  PLPT  SA09  SM02  SN04  TREL  WB05  WB11
+APMT  CF01  DB02  DG06  DRZT  EF04  EF51  EF59  EF66  EF74  FLRS   FW02   FW11  GV02  HP02  MB02  MB08  MB16  MB25   MG01   MO01   ODSA  PB01  PB08  PB16  PB23  PB30  PB38  PB44  PB54  PCOS  POST  SAND  SM03  SN07  VHRN  WB06  WB12
+AT01  CRHG  DB03  DG07  EE02  EF05  EF52  EF61  EF67  EF75  FOAK1  FW04   FW12  GV03  INDO  MB03  MB09  MB17  MBBB1  MID01  NGL01  OE01  PB03  PB09  PB17  PB24  PB32  PB39  PB46  PB55  PECS  SA02  SD01  SM04  SN08  VW01  WB07  WTFS
+BB01  CT01  DB04  DG09  EE03  EF06  EF53  EF62  EF68  EF76  FOAK2  FW05   FW13  GV04  LWM1  MB04  MB11  MB18  MBBB2  MID02  NGL02  OG01  PB04  PB10  PB18  PB25  PB33  PB40  PB47  PB56  PH02  SA04  SE01  SMWD  SN09  WB02  WB08  WW01
 ```
-Where each subdirectory is named after station code. If you wish to use create your own input directory with custom waveform mSEED files, **please follow the above naming convention.** Otherwise, EQCCTPro will **not** work.
+Each station subdirectory is named after its station code. If you wish to create your own input directory with custom waveform mSEED files, **please follow the above naming conventions.** Otherwise, EQCCTPro will **not** work.
+Create one subdirectory per timechunk (parent directories) and, inside each, one subdirectory per station (child directories), named by station code as shown above.
+Each timechunk directory is named `<window start>_<window end>` in UTC and spans from the **start of its timechunk minus the waveform overlap** to the **end of that timechunk**, where each timechunk is `timechunk_dt` minutes long.
-Within each subdirectory, such as PB35, it is made up of mSEED files of different poses (EX. N, E, Z):
+For example:
+```sh
+[skevofilaxc 230_stations_2hr_1_hr_dt]$ ls
+20241215T115800Z_20241215T130000Z  20241215T125800Z_20241215T140000Z
+```
+Here, each timechunk is 1 hour long and the waveform overlap is 2 minutes. Hence, `20241215T115800Z_20241215T130000Z` spans from `11:58:00 to 13:00:00 UTC on Dec 15, 2024` and `20241215T125800Z_20241215T140000Z` spans from `12:58:00 to 14:00:00 UTC on Dec 15, 2024`.
+Each station subdirectory, such as PB35, contains mSEED files for the seismometer's different components (e.g., N, E, Z):
 ```sh
 [skevofilaxc PB35]$ ls
 TX.PB35.00.HH1__20241215T115800Z__20241215T120100Z.mseed  TX.PB35.00.HHZ__20241215T115800Z__20241215T120100Z.mseed
@@ -174,17 +194,22 @@ eqcct_runner = EQCCTMSeedRunner(
     S_threshold=0.02,
     p_model_filepath='/path/to/model_p.h5',
     s_model_filepath='/path/to/model_s.h5',
-    number_of_concurrent_predictions=5,
+    number_of_concurrent_station_predictions=5,
+    number_of_concurrent_timechunk_predictions=2,
     best_usecase_config=True,
     csv_dir='/path/to/csv',
     selected_gpus=[0],
     set_vram_mb=24750,
-    specific_stations='AT01, BP01, DG05'
-)
+    specific_stations='AT01, BP01, DG05',
+    start_time='2024-12-14 12:00:00',
+    end_time='2024-12-15 12:00:00',
+    timechunk_dt=1,
+    waveform_overlap=2)
 eqcct_runner.run_eqcctpro()
 ```
-**EQCCTMseedRunner** has multiple input paramters that need to be configured and are defined below:
+**EQCCTMseedRunner** has multiple input parameters that need to be configured and are defined below:
 - **`use_gpu (bool)`: True or False**
   - Tells Ray to use either the GPU(s) (True) or CPUs (False) on your computer to process the waveforms in the entire workflow
@@ -199,7 +224,7 @@ eqcct_runner.run_eqcctpro()
     - "I want this program to run only on these specific cores."
 - **`input_dir (str)`**
   - Directory path to the mSEED directory
-  - EX. `/home/skevofilaxc/my_work_directory/eqcct/eqcctpro/sample_1_minute_data`
+  - EX. `/home/skevofilaxc/my_work_directory/eqcct/eqcctpro/230_stations_1_min_dt`
 - **`output_dir (str)`**
   - Directory path to where the output picks and logs will be sent
   - Does not need to exist; it will be created if it does not exist
@@ -216,13 +241,16 @@ eqcct_runner.run_eqcctpro()
   - Filepath to where the P EQCCT detection model is stored
 - **`s_model_filepath (str)`**
   - Filepath to where the S EQCCT detection model is stored
-- **`number_of_concurrent_predictions (int)`**
+- **`number_of_concurrent_station_predictions (int)`**
   - The number of concurrent EQCCT detection tasks that can run simultaneously on the allocated resources
-  - EX. if number_of_concurrent_predictions = 5, there will be up to 5 EQCCT instances analyzing 5 different waveforms at the sametime
-  - Best to use the optimal amount for your hardware, which can be identified using **EvaluateSystem** (below)
+  - EX. if number_of_concurrent_station_predictions = 5, up to 5 EQCCT instances can simultaneously analyze waveforms from 5 distinct seismic stations
+  - To use the optimal parameter value for this param, use the **EvaluateSystem** class (can be found below)
+- **`number_of_concurrent_timechunk_predictions (int)`: default = None**
+  - The number of timechunks processed in parallel
+  - Processes multiple timechunks in parallel instead of sequentially, which can substantially reduce total runtime
 - **`best_usecase_config (bool)`: default = False**
-  - If True, will override inputted cpu_id_list, number_of_concurrent_predictions, intra_threads, inter_threads values for the best overall usecase configurations
-  - Best overall usecase configurations are defined as the best overall input configurations that minimize runtime while doing the most amount of processing with your available hardware
+  - If True, overrides the supplied cpu_id_list, number_of_concurrent_station_predictions, intra_threads, and inter_threads values with the best overall use-case configuration
+  - The best overall use-case configuration is the input configuration that minimizes runtime while processing the most data with your available hardware
   - Can only be used if EvaluateSystem has been run
 - **`csv_dir (str)`**
   - Directory path containing the CSVs output by EvaluateSystem, which hold the trial data used to find the best_usecase_config
@@ -235,12 +263,26 @@ eqcct_runner.run_eqcctpro()
   - Must be a realistic value based on your hardware's physical memory; if it exceeds the available memory, the run will fail with an **OutOfMemoryError**
 - **`specific_stations (str)`: default = None**
   - Comma-separated string listing the only stations you want to analyze
-  - EX. Out of the 50 sample stations in `sample_1_minute_data`, if I only want to analyze AT01, BP01, DG05, then specific_stations='AT01, BP01, DG05'.
+  - EX. Out of the 229 sample stations in `230_stations_1_min_dt`, to analyze only AT01, BP01, and DG05, set specific_stations='AT01, BP01, DG05'.
   - Lets you keep all stations in one input directory instead of moving station directories around to select them
-- **`cpu_id_list (list)`: default = [1]**
-  - List that defines which specific CPU cores that sched_setaffinity will allocate for executing the current EQCCTPro process.
-  - Allows for specific allocation and limitation of CPUs for a given EQCCTPro process
-    - "I want this program to run only on these specific cores."
+- **`start_time (str)`: default = None**
+  - The start of the analysis period
+  - EX. 2024-12-14 12:00:00
+  - Must use the format `YYYY-MM-DD HH:MM:SS`
+  - Used to split the analysis period into timechunks
+  - Also recorded by the EvaluateSystem class in its results CSV so the analysis period can be identified when reviewing results later
+- **`end_time (str)`: default = None**
+  - The end of the analysis period
+  - EX. 2024-12-15 12:00:00
+  - Must use the format `YYYY-MM-DD HH:MM:SS`
+  - Used to split the analysis period into timechunks
+  - Also recorded by the EvaluateSystem class in its results CSV so the analysis period can be identified when reviewing results later
+- **`timechunk_dt (int)`: default = None**
+  - The length of each timechunk, in minutes
+  - EX. if timechunk_dt = 10 and the analysis period is 30 minutes, three 10-minute timechunks will be created
+- **`waveform_overlap (int)`: default = None**
+  - The overlap, in minutes, added before the start of each timechunk so that consecutive timechunks share waveform data
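The `specific_stations` filter can be pictured as follows. This is a minimal sketch, not EQCCTPro's implementation. `parse_specific_stations` and `select_station_dirs` are hypothetical helpers. The sketch assumes station codes are comma-separated, as in the `'AT01, BP01, DG05'` example.

```python
import os


def parse_specific_stations(specific_stations):
    """Turn 'AT01, BP01, DG05' into ['AT01', 'BP01', 'DG05'] (hypothetical helper)."""
    if not specific_stations:
        return None  # None means "analyze every station"
    return [code.strip() for code in specific_stations.split(",") if code.strip()]


def select_station_dirs(timechunk_dir, specific_stations=None):
    """List the station subdirectories of one timechunk directory, optionally filtered."""
    wanted = parse_specific_stations(specific_stations)
    # Only directories count as stations; stray files are ignored
    stations = sorted(
        name for name in os.listdir(timechunk_dir)
        if os.path.isdir(os.path.join(timechunk_dir, name))
    )
    if wanted is None:
        return stations
    wanted_set = set(wanted)
    return [s for s in stations if s in wanted_set]
```

Filtering by name means all station directories can stay in one place. Only the requested ones are passed on for analysis.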
 ---
@@ -262,10 +304,10 @@ eval_gpu = EvaluateSystem(
                 S_threshold=0.02,
                 p_model_filepath='/path/to/model_p.h5',
                 s_model_filepath='/path/to/model_s.h5',
-                stations2use=2,
                 cpu_id_list=[0,1],
                 set_vram_mb=24750,
-                selected_gpus=[0]
+                selected_gpus=[0],
+                stations2use=2
 )
 eval_gpu.evaluate()
 ```
@@ -285,17 +327,24 @@ eval_cpu = EvaluateSystem(
                 S_threshold=0.02,
                 p_model_filepath='/path/to/model_p.h5',
                 s_model_filepath='/path/to/model_s.h5',
-                stations2use=12,
-                cpu_id_list=range(87,102),
-                starting_amount_of_stations=2,
+                cpu_id_list=range(0,49),
+                min_cpu_amount=20,
+                cpu_test_step_size=1,
+                stations2use=50,
+                starting_amount_of_stations=25,
                 station_list_step_size=1,
-                min_cpu_amount=15,
-                min_conc_predictions=2,
-                conc_predictions_step_size=1)
+                min_conc_stations=25,
+                conc_station_tasks_step_size=5,
+                start_time='2024-12-15 12:00:00',
+                end_time='2024-12-15 14:00:00',
+                conc_timechunk_tasks_step_size=1,
+                timechunk_dt=30,
+                waveform_overlap=2,
+                tmp_dir=tmp_dir)
 eval_cpu.evaluate()
 ```
-**EvaluateSystem** will iterate through different combinations of CPU(s), Concurrent Predictions, and Workloads (stations), as well as GPU(s), and the amount of VRAM (MB) each Concurrent Prediction can use.
+**EvaluateSystem** iterates through different combinations of CPU counts, concurrent timechunk and station tasks, GPU(s), and the amount of VRAM (MB) each concurrent prediction can use.
 **EvaluateSystem** can take a long time, depending on the number of CPUs/GPUs, the amount of VRAM available, and the total workload being tested. However, once the testing has been done for most if not all use cases,
 the trial data can be used to identify the optimal parallelization configuration for **EQCCTMSeedRunner**, getting the most processing out of your system in the shortest amount of time.
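Below is a minimal sketch of how a CPU-mode trial grid like this could be enumerated with `itertools.product`. The function `trial_grid` and its default station and concurrency values are hypothetical. Skipping trials that have more concurrent station tasks than stations is an assumption of this sketch, not documented EvaluateSystem behavior.

```python
from itertools import product


def trial_grid(cpu_id_list, min_cpu_amount=1, cpu_test_step_size=1,
               station_counts=(25, 50), conc_station_counts=(25, 50),
               conc_timechunk_counts=(1, 2)):
    """Yield hypothetical (cpus, stations, conc_station_tasks, conc_timechunk_tasks) trials."""
    # CPU counts march from min_cpu_amount up to len(cpu_id_list)
    cpu_counts = range(min_cpu_amount, len(cpu_id_list) + 1, cpu_test_step_size)
    for cpus, stations, conc_st, conc_tc in product(
            cpu_counts, station_counts, conc_station_counts, conc_timechunk_counts):
        # Assumption: running more concurrent station tasks than stations is pointless
        if conc_st > stations:
            continue
        yield cpus, stations, conc_st, conc_tc
```

With `cpu_id_list=range(0,49)` and `min_cpu_amount=20`, the CPU counts run from 20 to 49. Even a handful of station and concurrency values then multiplies into well over a hundred trials, which is why the evaluation can take a long time.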
@@ -303,14 +352,14 @@ The following input parameters need to be configurated for **EvaluateSystem** to
 - **`mode (str)`**
   - Can be either `cpu` or `gpu`
-  - Tells `EvaluateSystem` which configuration trials should it iterate through
+  - Tells `EvaluateSystem` whether the trials should run on CPUs or GPUs
 - **`intra_threads (int)`: default = 1**
   - Controls how many intra-parallelism threads TensorFlow can use
 - **`inter_threads (int)`: default = 1**
   - Controls how many inter-parallelism threads TensorFlow can use
 - **`input_dir (str)`**
   - Directory path to the mSEED directory
-  - EX. /home/skevofilaxc/my_work_directory/eqcct/eqcctpro/sample_1_minute_data
+  - EX. /home/skevofilaxc/my_work_directory/eqcct/eqcctpro/230_stations_1_min_dt
 - **`output_dir (str)`**
   - Directory path to where the output picks and logs will be sent
   - Does not need to exist beforehand; it will be created if missing
@@ -330,14 +379,20 @@ The following input parameters need to be configurated for **EvaluateSystem** to
   - Filepath to where the P EQCCT detection model is stored
 - **`s_model_filepath (str)`**
   - Filepath to where the S EQCCT detection model is stored
-- **`stations2use (int)`: default = None**
-  - Controls the maximum amount of stations EvaluateSystem can use in its trial iterations
-  - Sample data has been provided so that the maximum is 50, however, if using custom data, configure for your specific usecase
 - **`cpu_id_list (list)`: default = [1]**
   - List that defines which specific CPU cores sched_setaffinity will allocate to the current EQCCTPro process; it also **sets the maximum number of cores EvaluateSystem can use in its trial iterations**
   - Allows for specific allocation and limitation of CPUs for a given EQCCTPro process
     - "I want this program to run only on these specific cores."
   - Must contain at least 1 CPU when using GPUs: Ray needs CPUs to manage the Raylets (concurrent tasks), even though the waveforms themselves are processed on the GPU
+- **`min_cpu_amount (int)`: default = 1**
+  - The minimum number of CPUs the trials start with
+  - By default, trials iterate from 1 CPU up to the maximum allocated
+  - Set a higher value to skip small configurations, e.g. start at 15 CPUs and iterate up to a maximum of 25
+- **`cpu_test_step_size (int)`: default = 1**
+  - The step size used when iterating the CPU count from `min_cpu_amount` to `len(cpu_id_list)`
+- **`stations2use (int)`: default = None**
+  - Controls the maximum number of stations EvaluateSystem can use in its trial iterations
+  - The provided sample data contains 229 stations; if using custom data, set this for your specific use case
 - **`starting_amount_of_stations (int)`: default = 1**
   - Sets the number of stations the system evaluation starts with
   - Defaults to 1 station
@@ -345,16 +400,34 @@ The following input parameters need to be configurated for **EvaluateSystem** to
   - You can set a step size for the station list that is generated
   - For example, if the step size is set to 10 and you start with 50 stations with a maximum of 100, the list will be: [50, 60, 70, 80, 90, 100]
   - Setting it to 1 uses the default sequence: a step size of 1 from 1 to 10, then a step size of 5 up to `stations2use`
-- **`min_cpu_amount (int)`: default = 1**
-  - Is the minimum amount of CPUs you want to start your trials with
-  - By default, trials will start iterating with 1 CPU up to the maximum allocated
-  - Can now set a value as the starting point, such as 15 CPUs up to the maximum of for instance 25
-- **`min_conc_predictions (int)`: default = 1**
-  - Is the minimum amount of concurrent predictions you want each trial iteration to start with
+- **`min_conc_stations (int)`: default = 1**
+  - The minimum number of concurrent station predictions each trial iteration starts with
   - By default, if `min_conc_stations` and `conc_station_tasks_step_size` are set to 1, a custom step sequence is applied. For the 50-station case, the sequence is: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, ..., 50].
-- **`conc_predictions_step_size (int)`: default = 1**
-  - Is the concurrent predictions step size you want each trial iteration to iterate with with
+- **`conc_station_tasks_step_size (int)`: default = 1**
+  - The step size for the number of concurrent station predictions tested in each trial iteration
   - By default, if `min_conc_stations` and `conc_station_tasks_step_size` are set to 1, a custom step sequence is applied. For the 50-station case, the sequence is: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, ..., 50]
+- **`start_time (str)`: default = None**
+  - The start of the analysis period
+  - EX. 2024-12-14 12:00:00
+  - Must use the format `YYYY-MM-DD HH:MM:SS`
+  - Used to split the analysis period into timechunks
+  - Also recorded by the EvaluateSystem class in its results CSV so the analysis period can be identified when reviewing results later
+- **`end_time (str)`: default = None**
+  - The end of the analysis period
+  - EX. 2024-12-15 12:00:00
+  - Must use the format `YYYY-MM-DD HH:MM:SS`
+  - Used to split the analysis period into timechunks
+  - Also recorded by the EvaluateSystem class in its results CSV so the analysis period can be identified when reviewing results later
+- **`conc_timechunk_tasks_step_size (int)`: default = 1**
+  - The step size for the number of concurrent timechunk predictions tested in each trial iteration
+- **`timechunk_dt (int)`: default = None**
+  - The length of each timechunk, in minutes
+  - EX. if timechunk_dt = 10 and the analysis period is 30 minutes, three 10-minute timechunks will be created
+- **`waveform_overlap (int)`: default = None**
+  - The overlap, in minutes, added before the start of each timechunk so that consecutive timechunks share waveform data
+- **`tmp_dir (str)`: default = 1**
+  - A temporary directory for all temporary files produced by EQCCTPro
+  - Eases system cleanup and avoids writing to the system's default temporary directory
 - **`set_vram_mb (float)`**
   - The maximum amount of VRAM (MB) EQCCTPro can use
   - Must be a realistic value based on your hardware's physical memory; if it exceeds the available memory, the run will fail with an OutOfMemoryError
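The station-list rules above, and the similar default sequence for concurrent station predictions, can be sketched as below. `station_counts` is a hypothetical helper. Two behaviors are assumptions of this sketch: how EvaluateSystem combines a starting value with the default sequence, and whether it always appends the maximum.

```python
def station_counts(start, maximum, step):
    """Sketch of the station-count list described above (assumed behavior)."""
    if start > maximum:
        raise ValueError("start must not exceed maximum")
    if step == 1:
        # Default rule: every value from 1 to 10, then increments of 5
        counts = list(range(1, min(10, maximum) + 1)) + list(range(15, maximum + 1, 5))
        counts = [c for c in counts if c >= start]
    else:
        # Custom rule: fixed increments from the starting amount
        counts = list(range(start, maximum + 1, step))
    if not counts or counts[-1] != maximum:
        counts.append(maximum)  # assumption: always test the full workload
    return counts
```

For example, `station_counts(50, 100, 10)` gives the `[50, 60, 70, 80, 90, 100]` list from the step-size example. `station_counts(1, 50, 1)` gives 1 through 10 followed by 15, 20, ..., 50.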

{eqcctpro-0.5.5 → eqcctpro-0.5.7}/eqcctpro.egg-info/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: eqcctpro
-Version: 0.5.5
+Version: 0.5.7
 Summary: EQCCTPro: A powerful seismic event detection toolkit
 Author-email: Constantinos Skevofilax <constantinos.skevofilax@austin.utexas.edu>, Victor Salles <victor.salles@beg.utexas.edu>
 Project-URL: Homepage, https://pypi.org/project/eqcctpro/
@@ -126,20 +126,25 @@ For additional details and package updates, visit the **EQCCTPro PyPI page**:
 ### **Using Sample Waveform Data**
 To understand how **EQCCTPro** works, it is **highly recommended** to use provided sample seismic waveform data as the data source when testing the package.
-Sample seismic waveform data from 50 TexNet stations have provided in the repository under `sample_1_minute_data.zip`.
+One-minute sample seismic waveforms from 229 TexNet stations are provided in the repository in the `230_stations_1_min_dt.zip` file.
 ### **Step 1: Unzip the Sample Waveform Data**
 After downloading the `.zip` file through the GitHub methods above, run:
 ```sh
-[skevofilaxc] unzip sample_1_minute_data.zip
+[skevofilaxc] unzip 230_stations_1_min_dt.zip
 ```
 ### **Step 2: Check and Understand the Directory Structure**
-The extracted data will contain multiple station directories:
+The extracted data contains timechunk subdirectories, each made up of multiple station directories:
 ```sh
-[skevofilaxc sample_1_minute_data]$ ls
-AT01  CF01  DG05  EF54  EF76   HBVL  MB09  MB21   MID02  ODSA  PB16  PB25  PB35  PB52  PH02  SM03  WB11
-BB01  CT02  DG09  EF63  FOAK4  HNDO  MB13  MB25   MID03  PB04  PB17  PB26  PB39  PB54  PL01  SMWD  WB12
-BP01  DB02  EF02  EF75  FW13   MB06  MB19  MID01  MO01   PB11  PB18  PB34  PB42  PECS  SM02  WB06
+[skevofilaxc 230_stations_1_min_dt]$ ls
+20241215T120000Z_20241215T120100Z
+[skevofilaxc 230_stations_1_min_dt]$ cd 20241215T120000Z_20241215T120100Z
+237B  BP01  CT02  DG02  DG10  EE04  EF07  EF54  EF63  EF69  EF77   FOAK3  FW06  FW14  HBVL  LWM2  MB05  MB12  MB19   MBBB3  MID03  NM01  OG02  PB05  PB11  PB19  PB26  PB34  PB41  PB51  PB57  PH03  SA06  SGCY  SN02  SN10  WB03  WB09  YK01
+435B  BRDY  CV01  DG04  DKNS  EF02  EF08  EF56  EF64  EF71  ELG6   FOAK4  FW07  FW15  HNDO  LWM3  MB06  MB13  MB21   MBBB5  MLDN   NM02  OG04  PB06  PB12  PB21  PB28  PB35  PB42  PB52  PB58  PL01  SA07  SM01  SN03  SNAG  WB04  WB10
+ALPN  BW01  CW01  DG05  DRIO  EF03  EF09  EF58  EF65  EF72  ET02   FW01   FW09  GV01  HP01  MB01  MB07  MB15  MB22   MBBB6  MNHN   NM03  OZNA  PB07  PB14  PB22  PB29  PB37  PB43  PB53  PB59  PLPT  SA09  SM02  SN04  TREL  WB05  WB11
+APMT  CF01  DB02  DG06  DRZT  EF04  EF51  EF59  EF66  EF74  FLRS   FW02   FW11  GV02  HP02  MB02  MB08  MB16  MB25   MG01   MO01   ODSA  PB01  PB08  PB16  PB23  PB30  PB38  PB44  PB54  PCOS  POST  SAND  SM03  SN07  VHRN  WB06  WB12
+AT01  CRHG  DB03  DG07  EE02  EF05  EF52  EF61  EF67  EF75  FOAK1  FW04   FW12  GV03  INDO  MB03  MB09  MB17  MBBB1  MID01  NGL01  OE01  PB03  PB09  PB17  PB24  PB32  PB39  PB46  PB55  PECS  SA02  SD01  SM04  SN08  VW01  WB07  WTFS
+BB01  CT01  DB04  DG09  EE03  EF06  EF53  EF62  EF68  EF76  FOAK2  FW05   FW13  GV04  LWM1  MB04  MB11  MB18  MBBB2  MID02  NGL02  OG01  PB04  PB10  PB18  PB25  PB33  PB40  PB47  PB56  PH02  SA04  SE01  SMWD  SN09  WB02  WB08  WW01
 ```
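To check that an extracted or custom input directory has the expected timechunk/station layout, a short script like the following can list it. `describe_input_dir` is a hypothetical helper, not part of EQCCTPro. It only assumes the two-level layout shown above.

```python
import os


def describe_input_dir(input_dir):
    """Map each timechunk directory to the sorted station directories inside it."""
    layout = {}
    for timechunk in sorted(os.listdir(input_dir)):
        chunk_path = os.path.join(input_dir, timechunk)
        if not os.path.isdir(chunk_path):
            continue  # ignore stray files such as an archive left next to the data
        # Each child directory of a timechunk is expected to be a station code
        layout[timechunk] = sorted(
            s for s in os.listdir(chunk_path)
            if os.path.isdir(os.path.join(chunk_path, s))
        )
    return layout
```

For the sample data, `describe_input_dir('230_stations_1_min_dt')` should return a single key, `20241215T120000Z_20241215T120100Z`, mapped to the station codes listed above.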
 Each subdirectory contains **mSEED** files of different waveform components:
 ```sh
@@ -175,14 +180,29 @@ To process mSEED from various seismic stations, use the **EQCCTMSeedRunner** cla
 **EQCCTMSeedRunner** enables users to process multiple mSEED files from a given input directory, which consists of timechunk directories containing station directories, formatted as follows:
 ```sh
-[skevofilaxc sample_1_minute_data]$ ls
-AT01  CF01  DG05  EF54  EF76   HBVL  MB09  MB21   MID02  ODSA  PB16  PB25  PB35  PB52  PH02  SM03  WB11
-BB01  CT02  DG09  EF63  FOAK4  HNDO  MB13  MB25   MID03  PB04  PB17  PB26  PB39  PB54  PL01  SMWD  WB12
-BP01  DB02  EF02  EF75  FW13   MB06  MB19  MID01  MO01   PB11  PB18  PB34  PB42  PECS  SM02  WB06
+[skevofilaxc 230_stations_1_min_dt]$ ls
+20241215T120000Z_20241215T120100Z
+[skevofilaxc 230_stations_1_min_dt]$ cd 20241215T120000Z_20241215T120100Z
+237B  BP01  CT02  DG02  DG10  EE04  EF07  EF54  EF63  EF69  EF77   FOAK3  FW06  FW14  HBVL  LWM2  MB05  MB12  MB19   MBBB3  MID03  NM01  OG02  PB05  PB11  PB19  PB26  PB34  PB41  PB51  PB57  PH03  SA06  SGCY  SN02  SN10  WB03  WB09  YK01
+435B  BRDY  CV01  DG04  DKNS  EF02  EF08  EF56  EF64  EF71  ELG6   FOAK4  FW07  FW15  HNDO  LWM3  MB06  MB13  MB21   MBBB5  MLDN   NM02  OG04  PB06  PB12  PB21  PB28  PB35  PB42  PB52  PB58  PL01  SA07  SM01  SN03  SNAG  WB04  WB10
+ALPN  BW01  CW01  DG05  DRIO  EF03  EF09  EF58  EF65  EF72  ET02   FW01   FW09  GV01  HP01  MB01  MB07  MB15  MB22   MBBB6  MNHN   NM03  OZNA  PB07  PB14  PB22  PB29  PB37  PB43  PB53  PB59  PLPT  SA09  SM02  SN04  TREL  WB05  WB11
+APMT  CF01  DB02  DG06  DRZT  EF04  EF51  EF59  EF66  EF74  FLRS   FW02   FW11  GV02  HP02  MB02  MB08  MB16  MB25   MG01   MO01   ODSA  PB01  PB08  PB16  PB23  PB30  PB38  PB44  PB54  PCOS  POST  SAND  SM03  SN07  VHRN  WB06  WB12
+AT01  CRHG  DB03  DG07  EE02  EF05  EF52  EF61  EF67  EF75  FOAK1  FW04   FW12  GV03  INDO  MB03  MB09  MB17  MBBB1  MID01  NGL01  OE01  PB03  PB09  PB17  PB24  PB32  PB39  PB46  PB55  PECS  SA02  SD01  SM04  SN08  VW01  WB07  WTFS
+BB01  CT01  DB04  DG09  EE03  EF06  EF53  EF62  EF68  EF76  FOAK2  FW05   FW13  GV04  LWM1  MB04  MB11  MB18  MBBB2  MID02  NGL02  OG01  PB04  PB10  PB18  PB25  PB33  PB40  PB47  PB56  PH02  SA04  SE01  SMWD  SN09  WB02  WB08  WW01
 ```
-Where each subdirectory is named after station code. If you wish to use create your own input directory with custom waveform mSEED files, **please follow the above naming convention.** Otherwise, EQCCTPro will **not** work.
+Each station subdirectory is named after its station code. If you want to create your own input directory with custom waveform mSEED files, **please follow the above naming conventions.** Otherwise, EQCCTPro will **not** work.
+Create one subdirectory per timechunk (parent directories) and, inside each, one subdirectory per station (child directories). The station directories should be named as shown above.
+Each timechunk directory is named after the span it covers: from the **timechunk's start time minus the waveform overlap** to the **timechunk's end time**, where each timechunk is `timechunk_dt` minutes long.
-Within each subdirectory, such as PB35, it is made up of mSEED files of different poses (EX. N, E, Z):
+For example:
+```sh
+[skevofilaxc 230_stations_2hr_1_hr_dt]$ ls
+20241215T115800Z_20241215T130000Z  20241215T125800Z_20241215T140000Z
+```
+Here, each timechunk is 1 hour long and the waveform overlap is 2 minutes. Hence `20241215T115800Z_20241215T130000Z` spans `11:58:00 to 13:00:00 UTC on Dec 15, 2024`, and `20241215T125800Z_20241215T140000Z` spans `12:58:00 to 14:00:00 UTC on Dec 15, 2024`.
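The naming rule can be reproduced with Python's `datetime` module. The sketch uses a hypothetical helper, `timechunk_dir_names`, and makes three assumptions:
- timestamps are UTC;
- inputs use the `YYYY-MM-DD HH:MM:SS` format of `start_time`/`end_time`;
- a final chunk shorter than `timechunk_dt` is truncated at `end_time`.

It reproduces the two directory names in the example above.

```python
from datetime import datetime, timedelta

FMT_IN = "%Y-%m-%d %H:%M:%S"   # start_time / end_time format from the parameter docs
FMT_DIR = "%Y%m%dT%H%M%SZ"     # format seen in the timechunk directory names


def timechunk_dir_names(start_time, end_time, timechunk_dt, waveform_overlap):
    """Sketch: name each timechunk directory <chunk start - overlap>_<chunk end> (assumed UTC)."""
    start = datetime.strptime(start_time, FMT_IN)
    end = datetime.strptime(end_time, FMT_IN)
    step = timedelta(minutes=timechunk_dt)
    overlap = timedelta(minutes=waveform_overlap)
    names = []
    chunk_start = start
    while chunk_start < end:
        # Assumption: the last chunk is cut off at end_time
        chunk_end = min(chunk_start + step, end)
        names.append(f"{(chunk_start - overlap).strftime(FMT_DIR)}_{chunk_end.strftime(FMT_DIR)}")
        chunk_start = chunk_end
    return names
```

`timechunk_dir_names('2024-12-15 12:00:00', '2024-12-15 14:00:00', 60, 2)` returns `['20241215T115800Z_20241215T130000Z', '20241215T125800Z_20241215T140000Z']`.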
+Each station subdirectory, such as PB35, contains mSEED files for the seismometer's different components (EX. N, E, Z):
 ```sh
 [skevofilaxc PB35]$ ls
 TX.PB35.00.HH1__20241215T115800Z__20241215T120100Z.mseed  TX.PB35.00.HHZ__20241215T115800Z__20241215T120100Z.mseed
@@ -207,17 +227,22 @@ eqcct_runner = EQCCTMSeedRunner(
     S_threshold=0.02,
     p_model_filepath='/path/to/model_p.h5',
     s_model_filepath='/path/to/model_s.h5',
-    number_of_concurrent_predictions=5,
+    number_of_concurrent_station_predictions=5,
+    number_of_concurrent_timechunk_predictions=2,
     best_usecase_config=True,
     csv_dir='/path/to/csv',
     selected_gpus=[0],
     set_vram_mb=24750,
-    specific_stations='AT01, BP01, DG05'
-)
+    specific_stations='AT01, BP01, DG05',
+    start_time='2024-12-14 12:00:00',
+    end_time='2024-12-15 12:00:00',
+    timechunk_dt=1,
+    waveform_overlap=2)
 eqcct_runner.run_eqcctpro()
 ```
-**EQCCTMseedRunner** has multiple input paramters that need to be configured and are defined below:
+**EQCCTMSeedRunner** has multiple input parameters that must be configured; they are defined below:
 - **`use_gpu (bool)`: True or False**
   - Tells Ray to use either the GPU(s) (True) or CPUs (False) on your computer to process the waveforms in the entire workflow
@@ -232,7 +257,7 @@ eqcct_runner.run_eqcctpro()
     - "I want this program to run only on these specific cores."
 - **`input_dir (str)`**
   - Directory path to the mSEED directory
-  - EX. `/home/skevofilaxc/my_work_directory/eqcct/eqcctpro/sample_1_minute_data`
+  - EX. `/home/skevofilaxc/my_work_directory/eqcct/eqcctpro/230_stations_1_min_dt`
 - **`output_dir (str)`**
   - Directory path to where the output picks and logs will be sent
   - Does not need to exist beforehand; it will be created if missing
@@ -249,13 +274,16 @@ eqcct_runner.run_eqcctpro()
   - Filepath to where the P EQCCT detection model is stored
 - **`s_model_filepath (str)`**
   - Filepath to where the S EQCCT detection model is stored
-- **`number_of_concurrent_predictions (int)`**
+- **`number_of_concurrent_station_predictions (int)`**
   - The number of EQCCT detection tasks that can run simultaneously on the allocated resources
-  - EX. if number_of_concurrent_predictions = 5, there will be up to 5 EQCCT instances analyzing 5 different waveforms at the sametime
-  - Best to use the optimal amount for your hardware, which can be identified using **EvaluateSystem** (below)
+  - EX. if number_of_concurrent_station_predictions = 5, up to 5 EQCCT instances can simultaneously analyze waveforms from 5 distinct seismic stations
+  - To find the optimal value for this parameter, use the **EvaluateSystem** class (described below)
+- **`number_of_concurrent_timechunk_predictions (int)`: default = None**
+  - The number of timechunks processed in parallel
+  - Processes multiple timechunks in parallel instead of sequentially, which can substantially reduce total runtime
 - **`best_usecase_config (bool)`: default = False**
-  - If True, will override inputted cpu_id_list, number_of_concurrent_predictions, intra_threads, inter_threads values for the best overall usecase configurations
-  - Best overall usecase configurations are defined as the best overall input configurations that minimize runtime while doing the most amount of processing with your available hardware
+  - If True, overrides the supplied cpu_id_list, number_of_concurrent_station_predictions, intra_threads, and inter_threads values with the best overall use-case configuration
+  - The best overall use-case configuration is the input configuration that minimizes runtime while processing the most data with your available hardware
   - Can only be used if EvaluateSystem has been run
 - **`csv_dir (str)`**
   - Directory path containing the CSVs output by EvaluateSystem, which hold the trial data used to find the best_usecase_config
@@ -268,12 +296,26 @@ eqcct_runner.run_eqcctpro()
   - Must be a realistic value based on your hardware's physical memory; if it exceeds the available memory, the run will fail with an **OutOfMemoryError**
 - **`specific_stations (str)`: default = None**
   - Comma-separated string listing the only stations you want to analyze
-  - EX. Out of the 50 sample stations in `sample_1_minute_data`, if I only want to analyze AT01, BP01, DG05, then specific_stations='AT01, BP01, DG05'.
+  - EX. Out of the 229 sample stations in `230_stations_1_min_dt`, to analyze only AT01, BP01, and DG05, set specific_stations='AT01, BP01, DG05'.
   - Lets you keep all stations in one input directory instead of moving station directories around to select them
-- **`cpu_id_list (list)`: default = [1]**
-  - List that defines which specific CPU cores that sched_setaffinity will allocate for executing the current EQCCTPro process.
-  - Allows for specific allocation and limitation of CPUs for a given EQCCTPro process
-    - "I want this program to run only on these specific cores."
+- **`start_time (str)`: default = None**
+  - The start of the analysis period
+  - EX. 2024-12-14 12:00:00
+  - Must use the format `YYYY-MM-DD HH:MM:SS`
+  - Used to split the analysis period into timechunks
+  - Also recorded by the EvaluateSystem class in its results CSV so the analysis period can be identified when reviewing results later
+- **`end_time (str)`: default = None**
+  - The end of the analysis period
+  - EX. 2024-12-15 12:00:00
+  - Must use the format `YYYY-MM-DD HH:MM:SS`
+  - Used to split the analysis period into timechunks
+  - Also recorded by the EvaluateSystem class in its results CSV so the analysis period can be identified when reviewing results later
+- **`timechunk_dt (int)`: default = None**
+  - The length of each timechunk, in minutes
+  - EX. if timechunk_dt = 10 and the analysis period is 30 minutes, three 10-minute timechunks will be created
+- **`waveform_overlap (int)`: default = None**
+  - The overlap, in minutes, added before the start of each timechunk so that consecutive timechunks share waveform data
 ---
@@ -295,10 +337,10 @@ eval_gpu = EvaluateSystem(
                 S_threshold=0.02,
                 p_model_filepath='/path/to/model_p.h5',
                 s_model_filepath='/path/to/model_s.h5',
-                stations2use=2,
                 cpu_id_list=[0,1],
                 set_vram_mb=24750,
-                selected_gpus=[0]
+                selected_gpus=[0],
+                stations2use=2
 )
 eval_gpu.evaluate()
 ```
@@ -318,17 +360,24 @@ eval_cpu = EvaluateSystem(
                 S_threshold=0.02,
                 p_model_filepath='/path/to/model_p.h5',
                 s_model_filepath='/path/to/model_s.h5',
-                stations2use=12,
-                cpu_id_list=range(87,102),
-                starting_amount_of_stations=2,
+                cpu_id_list=range(0,49),
+                min_cpu_amount=20,
+                cpu_test_step_size=1,
+                stations2use=50,
+                starting_amount_of_stations=25,
                 station_list_step_size=1,
-                min_cpu_amount=15,
-                min_conc_predictions=2,
-                conc_predictions_step_size=1)
+                min_conc_stations=25,
+                conc_station_tasks_step_size=5,
+                start_time='2024-12-15 12:00:00',
+                end_time='2024-12-15 14:00:00',
+                conc_timechunk_tasks_step_size=1,
+                timechunk_dt=30,
+                waveform_overlap=2,
+                tmp_dir=tmp_dir)
 eval_cpu.evaluate()
 ```
-**EvaluateSystem** will iterate through different combinations of CPU(s), Concurrent Predictions, and Workloads (stations), as well as GPU(s), and the amount of VRAM (MB) each Concurrent Prediction can use.
+**EvaluateSystem** iterates through different combinations of CPU counts, concurrent timechunk and station tasks, GPU(s), and the amount of VRAM (MB) each concurrent prediction can use.
 **EvaluateSystem** can take a long time, depending on the number of CPUs/GPUs, the amount of VRAM available, and the total workload being tested. However, once the testing has been done for most if not all use cases,
 the trial data can be used to identify the optimal parallelization configuration for **EQCCTMSeedRunner**, getting the most processing out of your system in the shortest amount of time.
@@ -336,14 +385,14 @@ The following input parameters need to be configurated for **EvaluateSystem** to
 - **`mode (str)`**
   - Can be either `cpu` or `gpu`
-  - Tells `EvaluateSystem` which configuration trials should it iterate through
+  - Tells `EvaluateSystem` whether the trials should run on CPUs or GPUs
 - **`intra_threads (int)`: default = 1**
   - Controls how many intra-parallelism threads TensorFlow can use
 - **`inter_threads (int)`: default = 1**
   - Controls how many inter-parallelism threads TensorFlow can use
 - **`input_dir (str)`**
   - Directory path to the mSEED directory
-  - EX. /home/skevofilaxc/my_work_directory/eqcct/eqcctpro/sample_1_minute_data
+  - EX. /home/skevofilaxc/my_work_directory/eqcct/eqcctpro/230_stations_1_min_dt
 - **`output_dir (str)`**
   - Directory path to where the output picks and logs will be sent
   - Does not need to exist beforehand; it will be created if missing
@@ -363,14 +412,20 @@ The following input parameters need to be configurated for **EvaluateSystem** to
   - Filepath to where the P EQCCT detection model is stored
 - **`s_model_filepath (str)`**
   - Filepath to where the S EQCCT detection model is stored
-- **`stations2use (int)`: default = None**
-  - Controls the maximum amount of stations EvaluateSystem can use in its trial iterations
-  - Sample data has been provided so that the maximum is 50, however, if using custom data, configure for your specific usecase
 - **`cpu_id_list (list)`: default = [1]**
   - List that defines which specific CPU cores sched_setaffinity will allocate to the current EQCCTPro process; it also **sets the maximum number of cores EvaluateSystem can use in its trial iterations**
   - Allows for specific allocation and limitation of CPUs for a given EQCCTPro process
     - "I want this program to run only on these specific cores."
   - Must contain at least 1 CPU when using GPUs: Ray needs CPUs to manage the Raylets (concurrent tasks), even though the waveforms themselves are processed on the GPU
+- **`min_cpu_amount (int)`: default = 1**
+  - The minimum number of CPUs the trials start with
+  - By default, trials iterate from 1 CPU up to the maximum allocated
+  - Set a higher value to skip small configurations, e.g. start at 15 CPUs and iterate up to a maximum of 25
+- **`cpu_test_step_size (int)`: default = 1**
+  - The step size used when iterating the CPU count from `min_cpu_amount` to `len(cpu_id_list)`
+- **`stations2use (int)`: default = None**
+  - Controls the maximum number of stations EvaluateSystem can use in its trial iterations
+  - The provided sample data contains 229 stations; if using custom data, set this for your specific use case
 - **`starting_amount_of_stations (int)`: default = 1**
   - Sets the number of stations the system evaluation starts with
   - Defaults to 1 station
@@ -378,16 +433,34 @@ The following input parameters need to be configurated for **EvaluateSystem** to
   - You can set a step size for the station list that is generated
   - For example, if the step size is set to 10 and you start with 50 stations with a maximum of 100, the list will be: [50, 60, 70, 80, 90, 100]
   - Setting it to 1 uses the default sequence: a step size of 1 from 1 to 10, then a step size of 5 up to `stations2use`
-- **`min_cpu_amount (int)`: default = 1**
-  - Is the minimum amount of CPUs you want to start your trials with
-  - By default, trials will start iterating with 1 CPU up to the maximum allocated
-  - Can now set a value as the starting point, such as 15 CPUs up to the maximum of for instance 25
-- **`min_conc_predictions (int)`: default = 1**
-  - Is the minimum amount of concurrent predictions you want each trial iteration to start with
+- **`min_conc_stations (int)`: default = 1**
+  - The minimum number of concurrent station predictions each trial iteration starts with
   - By default, if `min_conc_stations` and `conc_station_tasks_step_size` are set to 1, a custom step sequence is applied. For the 50-station case, the sequence is: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, ..., 50].
-- **`conc_predictions_step_size (int)`: default = 1**
-  - Is the concurrent predictions step size you want each trial iteration to iterate with with
+- **`conc_station_tasks_step_size (int)`: default = 1**
+  - The step size by which the number of concurrent station predictions increases between trial iterations
   - By default, if `min_conc_stations` and `conc_station_tasks_step_size` are set to 1, a custom step size iteration is applied to test the 50 sample waveforms. The sequence is: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, ..., 50].
+- **`start_time (str)`: default = None**
+  - The start of the analysis time window
+  - Example: `2024-12-14 12:00:00`
+  - Must follow the format `YYYY-MM-DD HH:MM:SS`
+  - Used to split the analysis time window into a list of timechunks
+  - Also recorded by the `EvaluateSystem()` class in the results CSV file, so the analysis time window can be identified when reviewing results later
+- **`end_time (str)`: default = None**
+  - The end of the analysis time window
+  - Example: `2024-12-15 12:00:00`
+  - Must follow the format `YYYY-MM-DD HH:MM:SS`
+  - Used to split the analysis time window into a list of timechunks
+  - Also recorded by the `EvaluateSystem()` class in the results CSV file, so the analysis time window can be identified when reviewing results later
+- **`conc_timechunk_tasks_step_size (int)`: default = 1**
+  - The step size by which the number of concurrent timechunk predictions increases between trial iterations
+- **`timechunk_dt (int)`: default = None**
+  - The length of each timechunk, in minutes
+  - Example: if `timechunk_dt = 10` and the analysis period is 30 minutes, three 10-minute timechunks are created
+- **`waveform_overlap (int)`: default = None**
+  - The duration (in minutes) by which each waveform overlaps with the others
+- **`tmp_dir (str)`: default = 1**
+  - A temporary directory for all temporary files produced by EQCCTPro
+  - Simplifies system cleanup and avoids writing to the system's default temporary directory
 - **`set_vram_mb (float)`**
   - The maximum amount of VRAM (in MB) EQCCTPro can use
   - Must be based on your GPU's physical memory; if it exceeds the available memory, the code will fail with an OutOfMemoryError
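The trial sequences described in the parameter list above can be illustrated with a short Python sketch. It is not EQCCTPro's implementation: `cpu_trial_counts`, `default_custom_steps` and `station_trial_counts` are hypothetical helpers, and two behaviours are assumptions rather than documented facts: every sequence ends exactly at the configured maximum, and a starting amount combined with a step size of 1 simply filters the default scheme.

```python
def cpu_trial_counts(min_cpu_amount: int, max_cpus: int, cpu_test_step_size: int = 1) -> list[int]:
    """CPU counts to test, marching from min_cpu_amount up to max_cpus.

    max_cpus plays the role of len(cpu_id_list) (hypothetical helper).
    """
    if min_cpu_amount < 1 or cpu_test_step_size < 1:
        raise ValueError("min_cpu_amount and cpu_test_step_size must be >= 1")
    return list(range(min_cpu_amount, max_cpus + 1, cpu_test_step_size))


def default_custom_steps(maximum: int) -> list[int]:
    """Default scheme: step of 1 from 1 to 10, then step of 5.

    Assumption: the configured maximum is always the last entry.
    """
    if maximum < 1:
        raise ValueError("maximum must be >= 1")
    seq = list(range(1, min(10, maximum) + 1))
    seq += list(range(15, maximum + 1, 5))
    if seq[-1] != maximum:
        seq.append(maximum)
    return seq


def station_trial_counts(start: int, maximum: int, step: int = 1) -> list[int]:
    """Station counts to test (hypothetical helper).

    step == 1 falls back to the default scheme; assumption: the starting
    amount then only filters out values below it.
    """
    if start < 1 or step < 1 or start > maximum:
        raise ValueError("need 1 <= start <= maximum and step >= 1")
    if step == 1:
        return [n for n in default_custom_steps(maximum) if n >= start]
    seq = list(range(start, maximum + 1, step))
    if seq[-1] != maximum:
        seq.append(maximum)  # assumption: always test the maximum itself
    return seq


if __name__ == "__main__":
    print(cpu_trial_counts(15, 25, 5))         # [15, 20, 25]
    print(station_trial_counts(50, 100, 10))   # [50, 60, 70, 80, 90, 100]
    print(default_custom_steps(50))
```

`station_trial_counts(50, 100, 10)` reproduces the README's example station list, and `default_custom_steps(50)` reproduces the default concurrency sequence for the 50 sample waveforms.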

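The README does not spell out exactly how `start_time`, `end_time`, `timechunk_dt` and `waveform_overlap` combine, so the following is only a minimal Python sketch of one plausible interpretation, not EQCCTPro's code. The function `build_timechunks` is hypothetical, and the way the overlap is applied (each chunk starting `waveform_overlap` minutes before the previous one ends, with the last chunk clipped to `end_time`) is an assumption.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # YYYY-MM-DD HH:MM:SS, the format the README requires


def build_timechunks(start_time: str, end_time: str, timechunk_dt: int,
                     waveform_overlap: int = 0) -> list[tuple[datetime, datetime]]:
    """Split [start_time, end_time] into timechunk_dt-minute chunks (hypothetical helper).

    Assumption: each chunk after the first starts waveform_overlap minutes
    before the previous chunk ended; the last chunk is clipped to end_time.
    """
    start = datetime.strptime(start_time, TIME_FORMAT)
    end = datetime.strptime(end_time, TIME_FORMAT)
    if end <= start:
        raise ValueError("end_time must be after start_time")
    if timechunk_dt <= 0 or not 0 <= waveform_overlap < timechunk_dt:
        raise ValueError("need timechunk_dt > 0 and 0 <= waveform_overlap < timechunk_dt")

    chunk_len = timedelta(minutes=timechunk_dt)
    overlap = timedelta(minutes=waveform_overlap)
    chunks = []
    chunk_start = start
    while True:
        chunk_end = min(chunk_start + chunk_len, end)
        chunks.append((chunk_start, chunk_end))
        if chunk_end == end:
            return chunks
        # Step back by the overlap so consecutive chunks share waveform data;
        # overlap < chunk length guarantees the loop always moves forward.
        chunk_start = chunk_end - overlap


if __name__ == "__main__":
    for chunk_start, chunk_end in build_timechunks("2024-12-14 12:00:00", "2024-12-14 12:30:00", 10):
        print(chunk_start, "->", chunk_end)
```

With the README's example values (`timechunk_dt = 10` over a 30-minute window and no overlap), it yields three 10-minute chunks.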
{eqcctpro-0.5.5 → eqcctpro-0.5.7}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "eqcctpro"
-version = "0.5.5"
+version = "0.5.7"
 description = "EQCCTPro: A powerful seismic event detection toolkit"
 readme = "README.md"
 requires-python = ">=3.10.14"

{eqcctpro-0.5.5 → eqcctpro-0.5.7}/eqcctpro/__init__.py RENAMED

File without changes

{eqcctpro-0.5.5 → eqcctpro-0.5.7}/eqcctpro.egg-info/SOURCES.txt RENAMED

File without changes

{eqcctpro-0.5.5 → eqcctpro-0.5.7}/eqcctpro.egg-info/dependency_links.txt RENAMED

File without changes

{eqcctpro-0.5.5 → eqcctpro-0.5.7}/eqcctpro.egg-info/requires.txt RENAMED

File without changes

{eqcctpro-0.5.5 → eqcctpro-0.5.7}/eqcctpro.egg-info/top_level.txt RENAMED

File without changes

{eqcctpro-0.5.5 → eqcctpro-0.5.7}/setup.cfg RENAMED

File without changes
