eqcctpro 0.6.2__py3-none-any.whl → 0.7.0__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,541 +0,0 @@
1
- Metadata-Version: 2.4
2
- Name: eqcctpro
3
- Version: 0.6.2
4
- Summary: EQCCTPro: A powerful seismic event detection toolkit
5
- Author-email: Constantinos Skevofilax <constantinos.skevofilax@austin.utexas.edu>, Victor Salles <victor.salles@beg.utexas.edu>
6
- Project-URL: Homepage, https://pypi.org/project/eqcctpro/
7
- Project-URL: Repository, https://github.com/ut-beg-texnet/eqcct/tree/main/eqcctpro
8
- Project-URL: Issues, https://github.com/ut-beg-texnet/eqcct/tree/main/eqcctpro/issues
9
- Project-URL: Documentation, https://github.com/ut-beg-texnet/eqcct/blob/main/eqcctpro/README.md
10
- Requires-Python: >=3.10.14
11
- Description-Content-Type: text/markdown
12
- Requires-Dist: numpy==1.26.4
13
- Requires-Dist: pandas==2.2.3
14
- Requires-Dist: matplotlib==3.10.0
15
- Requires-Dist: obspy==1.4.1
16
- Requires-Dist: progress==1.6
17
- Requires-Dist: psutil==6.1.1
18
- Requires-Dist: ray==2.42.1
19
- Requires-Dist: schedule==1.2.2
20
- Requires-Dist: sdnotify==0.3.2
21
- Requires-Dist: tensorflow<2.19,>=2.15
22
- Requires-Dist: tensorflow-estimator<2.19,>=2.15
23
- Requires-Dist: tensorflow-io-gcs-filesystem==0.37.1
24
- Requires-Dist: tensorboard==2.15.2
25
- Requires-Dist: tensorboard-data-server==0.7.2
26
- Requires-Dist: silence-tensorflow==1.2.3
27
- Requires-Dist: scipy==1.15.1
28
- Requires-Dist: protobuf==4.25.8
29
- Requires-Dist: grpcio==1.70.0
30
- Requires-Dist: absl-py==2.1.0
31
- Requires-Dist: h5py==3.12.1
32
- Requires-Dist: pynvml==12.0.0
33
-
34
- # **EQCCTPro: powerful seismic event detection toolkit**
35
-
36
- EQCCTPro is a high-performance seismic event detection and processing framework that leverages deep-learning (DL) pickers, such as EQCCT, to process seismic data efficiently. It enables users to fully exploit the computational resources of their hardware for simultaneous seismic waveform processing, achieving real-time performance by identifying and using the optimal computational configuration for that hardware. More information about the development, capabilities, and real-world applications of EQCCTPro can be found in our research publication here.
37
-
38
- ## **Features**
39
- - Supports both CPU and GPU execution
40
- - Configurable parallelism execution for optimized performance
41
- - Includes tools for evaluating system performance to identify optimal use-case configurations
42
- - Automatic selection of best use-case configurations
43
- - Efficient handling of large-scale seismic data
44
-
45
- # **Installation Guide**
46
- There are **two installation methods** for EQCCTPro:
47
-
48
- 1. **Method 1: Install EQCCTPro out of the box** (for experienced users)
49
- 2. **Method 2: Install EQCCTPro with sample waveform data** (recommended for first-time users)
50
-
51
- It is **highly recommended** that first-time users pull the `EQCCTPro` folder, which includes sample waveform data and code to help them get acquainted with **EQCCTPro**.
52
-
53
- ---
54
-
55
- ## **Method 1: Install EQCCTPro (No Sample Data)**
56
- This method installs only the EQCCTPro package **without** the sample waveform data.
57
-
58
- ### **Step 1: Create a Clean Conda Environment for the Install**
59
- EQCCTPro **requires Python 3.10.14 or higher as well as compatible TensorFlow packages (`tensorflow>=2.15,<2.19`)**. If you have a clean working environment, you can simply run `pip install eqcctpro`. However, if your environment already contains other packages, it is highly recommended to create a new conda environment so that the necessary packages can be installed safely without conflicts. You can create a new conda environment with the correct Python version using the following commands:
60
-
61
- ```sh
62
- conda create --name yourenvironmentname python=3.10.14 -y
63
- conda activate yourenvironmentname
64
- python3 --version
65
- ```
66
- Expected output:
67
- ```
68
- Python 3.10.14
69
- ```
70
-
71
- After activating your new conda environment, run the following command:
72
- ```sh
73
- pip install eqcctpro
74
- ```
75
- You will now have access to EQCCTPro and its functionality. However, you will not have immediate access to the provided sample waveform data for testing. You can obtain the waveform data either by downloading the .zip file from the repository or by following Step 2 below.
76
-
77
- ### **Step 2 (Optional): Pull the EQCCTPro Folder**
78
- Although not required, **it is highly recommended** to pull the `EQCCTPro` folder to gain access to sample waveform data for testing.
79
-
80
- ```sh
81
- mkdir my_work_directory
82
- cd my_work_directory
83
- git clone --depth 1 --filter=tree:0 https://github.com/ut-beg-texnet/eqcct.git --sparse
84
- cd eqcct
85
- git sparse-checkout set eqcctpro
86
- ```
87
-
88
- ---
89
-
90
- ## **Method 2: Install EQCCTPro with Sample Data (Recommended for First-Time Users)**
91
- This method sets up EQCCTPro **with a pre-created conda environment and sample waveform data**.
92
-
93
- ### **Step 1: Clone the EQCCTPro Repository**
94
- ```sh
95
- mkdir my_work_directory
96
- cd my_work_directory
97
- git clone --depth 1 --filter=tree:0 https://github.com/ut-beg-texnet/eqcct.git --sparse
98
- cd eqcct
99
- git sparse-checkout set eqcctpro
100
- ```
101
-
102
- ### **Step 2: Create and Activate the Conda Environment**
103
- A **pre-configured conda environment** is included in the repository to handle all dependencies.
104
-
105
- ```sh
106
- conda env create -f environment.yml
107
- conda activate eqcctpro
108
- ```
109
-
110
- ### **Step 3: Install EQCCTPro**
111
- After activating the environment, install the EQCCTPro package:
112
- ```sh
113
- pip install eqcctpro
114
- ```
115
-
116
- This will install any remaining dependencies needed for **EQCCTPro**.
117
-
118
- ---
119
-
120
- ## **More Information**
121
- For additional details and package updates, visit the **EQCCTPro PyPI page**:
122
- 🔗 [EQCCTPro on PyPI](https://pypi.org/project/eqcctpro/)
123
-
124
- ---
125
-
126
- ### **Using Sample Waveform Data**
127
- To understand how **EQCCTPro** works, it is **highly recommended** to use the provided sample seismic waveform data as the data source when testing the package.
128
-
129
- 1-minute long sample seismic waveforms from 229 TexNet stations have been provided in the repository under the `230_stations_1_min_dt.zip` file.
130
-
131
- ### **Step 1: Unzip the Sample Waveform Data**
132
- After downloading the `.zip` file through the GitHub methods above, run:
133
- ```sh
134
- unzip 230_stations_1_min_dt.zip
135
- ```
136
- ### **Step 2: Check and Understand the Directory Structure**
137
- The extracted data contains timechunk subdirectories, each comprising multiple station directories:
138
- ```sh
139
- [skevofilaxc 230_stations_1_min_dt]$ ls
140
- 20241215T120000Z_20241215T120100Z
141
- [skevofilaxc 230_stations_1_min_dt]$ cd 20241215T120000Z_20241215T120100Z
142
- 237B BP01 CT02 DG02 DG10 EE04 EF07 EF54 EF63 EF69 EF77 FOAK3 FW06 FW14 HBVL LWM2 MB05 MB12 MB19 MBBB3 MID03 NM01 OG02 PB05 PB11 PB19 PB26 PB34 PB41 PB51 PB57 PH03 SA06 SGCY SN02 SN10 WB03 WB09 YK01
143
- 435B BRDY CV01 DG04 DKNS EF02 EF08 EF56 EF64 EF71 ELG6 FOAK4 FW07 FW15 HNDO LWM3 MB06 MB13 MB21 MBBB5 MLDN NM02 OG04 PB06 PB12 PB21 PB28 PB35 PB42 PB52 PB58 PL01 SA07 SM01 SN03 SNAG WB04 WB10
144
- ALPN BW01 CW01 DG05 DRIO EF03 EF09 EF58 EF65 EF72 ET02 FW01 FW09 GV01 HP01 MB01 MB07 MB15 MB22 MBBB6 MNHN NM03 OZNA PB07 PB14 PB22 PB29 PB37 PB43 PB53 PB59 PLPT SA09 SM02 SN04 TREL WB05 WB11
145
- APMT CF01 DB02 DG06 DRZT EF04 EF51 EF59 EF66 EF74 FLRS FW02 FW11 GV02 HP02 MB02 MB08 MB16 MB25 MG01 MO01 ODSA PB01 PB08 PB16 PB23 PB30 PB38 PB44 PB54 PCOS POST SAND SM03 SN07 VHRN WB06 WB12
146
- AT01 CRHG DB03 DG07 EE02 EF05 EF52 EF61 EF67 EF75 FOAK1 FW04 FW12 GV03 INDO MB03 MB09 MB17 MBBB1 MID01 NGL01 OE01 PB03 PB09 PB17 PB24 PB32 PB39 PB46 PB55 PECS SA02 SD01 SM04 SN08 VW01 WB07 WTFS
147
- BB01 CT01 DB04 DG09 EE03 EF06 EF53 EF62 EF68 EF76 FOAK2 FW05 FW13 GV04 LWM1 MB04 MB11 MB18 MBBB2 MID02 NGL02 OG01 PB04 PB10 PB18 PB25 PB33 PB40 PB47 PB56 PH02 SA04 SE01 SMWD SN09 WB02 WB08 WW01
148
- ```
149
- Each subdirectory contains **mSEED** files of different waveform components:
150
- ```sh
151
- [skevofilaxc PB35]$ ls
152
- TX.PB35.00.HH1__20241215T115800Z__20241215T120100Z.mseed TX.PB35.00.HHZ__20241215T115800Z__20241215T120100Z.mseed
153
- TX.PB35.00.HH2__20241215T115800Z__20241215T120100Z.mseed
154
- ```
155
- EQCCT (i.e., the ML model) requires at least one pose (component) per station for detection, but using multiple poses improves detection of the direction of the P and S waves.
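Before pointing EQCCTPro at a directory, it can help to confirm the timechunk / station / mSEED layout described above. The sketch below uses only the Python standard library; `describe_input_dir` is a hypothetical helper for illustration, not part of the EQCCTPro API.

```python
import tempfile
from pathlib import Path


def describe_input_dir(input_dir):
    """Return {timechunk_dir: {station_dir: [mSEED file names]}} for an input directory."""
    layout = {}
    for chunk in sorted(p for p in Path(input_dir).iterdir() if p.is_dir()):
        # Each timechunk directory holds one subdirectory per station code.
        layout[chunk.name] = {
            station.name: sorted(f.name for f in station.glob("*.mseed"))
            for station in sorted(s for s in chunk.iterdir() if s.is_dir())
        }
    return layout


if __name__ == "__main__":
    # Build a tiny throwaway tree that mimics the sample layout, then describe it.
    with tempfile.TemporaryDirectory() as tmp:
        station_dir = Path(tmp) / "20241215T120000Z_20241215T120100Z" / "PB35"
        station_dir.mkdir(parents=True)
        (station_dir / "TX.PB35.00.HHZ__20241215T115800Z__20241215T120100Z.mseed").touch()
        print(describe_input_dir(tmp))
```

Calling `describe_input_dir('/path/to/230_stations_1_min_dt')` on the unzipped sample data should list one timechunk directory with one entry per station directory.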
156
-
157
- You have successfully installed EQCCTPro and set up the required sample waveform dataset for testing.
158
-
159
- ---
160
- ### **Using EQCCTPro**
161
- There are three main capabilities of EQCCTPro:
162
- 1. **Process mSEED data from singular or multiple seismic stations using either CPUs or GPUs**
163
- 2. **Evaluate your system to identify the parallelization configurations that minimize runtime on your hardware**
164
- 3. **Identify and return the optimal parallelization configurations for both specific and general use cases, for both CPU (a) and GPU (b) applications**
165
-
166
-
167
- These capabilities are achieved using the following core functions:
168
-
169
- - **EQCCTMSeedRunner** (for processing mSEED data)
170
-
171
- - **EvaluateSystem** (for system evaluation)
172
-
173
- - **OptimalCPUConfigurationFinder** (for CPU configuration optimization)
174
-
175
- - **OptimalGPUConfigurationFinder** (for GPU configuration optimization)
176
-
177
- ---
178
- ### **Processing mSEED data using EQCCTPro (EQCCTMSeedRunner)**
179
- To process mSEED data from various seismic stations, use the **EQCCTMSeedRunner** class.
180
- **EQCCTMSeedRunner** enables users to process multiple mSEED files from a given input directory, which consists of station directories formatted as follows:
181
-
182
- ```sh
183
- [skevofilaxc 230_stations_1_min_dt]$ ls
184
- 20241215T120000Z_20241215T120100Z
185
- [skevofilaxc 230_stations_1_min_dt]$ cd 20241215T120000Z_20241215T120100Z
186
- 237B BP01 CT02 DG02 DG10 EE04 EF07 EF54 EF63 EF69 EF77 FOAK3 FW06 FW14 HBVL LWM2 MB05 MB12 MB19 MBBB3 MID03 NM01 OG02 PB05 PB11 PB19 PB26 PB34 PB41 PB51 PB57 PH03 SA06 SGCY SN02 SN10 WB03 WB09 YK01
187
- 435B BRDY CV01 DG04 DKNS EF02 EF08 EF56 EF64 EF71 ELG6 FOAK4 FW07 FW15 HNDO LWM3 MB06 MB13 MB21 MBBB5 MLDN NM02 OG04 PB06 PB12 PB21 PB28 PB35 PB42 PB52 PB58 PL01 SA07 SM01 SN03 SNAG WB04 WB10
188
- ALPN BW01 CW01 DG05 DRIO EF03 EF09 EF58 EF65 EF72 ET02 FW01 FW09 GV01 HP01 MB01 MB07 MB15 MB22 MBBB6 MNHN NM03 OZNA PB07 PB14 PB22 PB29 PB37 PB43 PB53 PB59 PLPT SA09 SM02 SN04 TREL WB05 WB11
189
- APMT CF01 DB02 DG06 DRZT EF04 EF51 EF59 EF66 EF74 FLRS FW02 FW11 GV02 HP02 MB02 MB08 MB16 MB25 MG01 MO01 ODSA PB01 PB08 PB16 PB23 PB30 PB38 PB44 PB54 PCOS POST SAND SM03 SN07 VHRN WB06 WB12
190
- AT01 CRHG DB03 DG07 EE02 EF05 EF52 EF61 EF67 EF75 FOAK1 FW04 FW12 GV03 INDO MB03 MB09 MB17 MBBB1 MID01 NGL01 OE01 PB03 PB09 PB17 PB24 PB32 PB39 PB46 PB55 PECS SA02 SD01 SM04 SN08 VW01 WB07 WTFS
191
- BB01 CT01 DB04 DG09 EE03 EF06 EF53 EF62 EF68 EF76 FOAK2 FW05 FW13 GV04 LWM1 MB04 MB11 MB18 MBBB2 MID02 NGL02 OG01 PB04 PB10 PB18 PB25 PB33 PB40 PB47 PB56 PH02 SA04 SE01 SMWD SN09 WB02 WB08 WW01
192
- ```
193
- Each subdirectory is named after its station code. If you wish to create your own input directory with custom waveform mSEED files, **please follow the above naming conventions.** Otherwise, EQCCTPro will **not** work.
194
- Create subdirectories for each timechunk (parent directories) and for each station (child directories). The station directories should be named as shown above.
195
- Each timechunk directory spans from the **start of its timechunk minus the waveform overlap** to the **end of its timechunk**, based on the defined timechunk duration.
196
-
197
- For example:
198
- ```sh
199
- [skevofilaxc 230_stations_2hr_1_hr_dt]$ ls
200
- 20241215T115800Z_20241215T130000Z 20241215T125800Z_20241215T140000Z
201
- ```
202
- Each timechunk is 1 hour long, and the waveform overlap is 2 minutes. Hence, `20241215T115800Z_20241215T130000Z` spans from `11:58:00 to 13:00:00 UTC on Dec 15, 2024` and `20241215T125800Z_20241215T140000Z` spans from `12:58:00 to 14:00:00 UTC on Dec 15, 2024`.
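The naming rule above can be reproduced in a few lines. This is a minimal sketch, assuming the analysis period is split into consecutive `timechunk_dt`-minute windows and each directory name starts `waveform_overlap` minutes early; `timechunk_dir_names` is a hypothetical helper, and clamping a final partial chunk to `end_time` is an assumption rather than documented EQCCTPro behavior.

```python
from datetime import datetime, timedelta

FMT_IN = "%Y-%m-%d %H:%M:%S"  # format of start_time / end_time
FMT_DIR = "%Y%m%dT%H%M%SZ"    # format used in timechunk directory names


def timechunk_dir_names(start_time, end_time, timechunk_dt, waveform_overlap):
    """Build timechunk directory names; each name starts waveform_overlap minutes early."""
    start = datetime.strptime(start_time, FMT_IN)
    end = datetime.strptime(end_time, FMT_IN)
    step = timedelta(minutes=timechunk_dt)
    overlap = timedelta(minutes=waveform_overlap)
    names = []
    chunk_start = start
    while chunk_start < end:
        chunk_end = min(chunk_start + step, end)  # assumption: clamp the last chunk
        names.append(f"{(chunk_start - overlap).strftime(FMT_DIR)}_{chunk_end.strftime(FMT_DIR)}")
        chunk_start = chunk_end
    return names


print(timechunk_dir_names("2024-12-15 12:00:00", "2024-12-15 14:00:00", 60, 2))
```

With the 1-hour chunks and 2-minute overlap from the example above, this yields the two directory names listed there.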
203
-
204
-
205
- Each station subdirectory, such as PB35, is made up of mSEED files from the seismometer's different poses (components, e.g. N, E, Z):
206
- ```sh
207
- [skevofilaxc PB35]$ ls
208
- TX.PB35.00.HH1__20241215T115800Z__20241215T120100Z.mseed TX.PB35.00.HHZ__20241215T115800Z__20241215T120100Z.mseed
209
- TX.PB35.00.HH2__20241215T115800Z__20241215T120100Z.mseed
210
- ```
211
- EQCCT needs only one pose for detection to occur; however, more poses allow for better detection of the direction of the P and S waves.
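The mSEED file names above appear to follow a `NETWORK.STATION.LOCATION.CHANNEL__START__END.mseed` pattern. The sketch below, a hypothetical helper rather than part of EQCCTPro, splits a name into those parts so you can check which components (poses) a station directory contains before running detection.

```python
from datetime import datetime
from typing import NamedTuple

NAME_TIME_FORMAT = "%Y%m%dT%H%M%SZ"  # e.g. 20241215T115800Z


class MSeedName(NamedTuple):
    network: str
    station: str
    location: str
    channel: str
    start: datetime
    end: datetime


def parse_mseed_name(filename):
    """Split NET.STA.LOC.CHA__START__END.mseed into its parts (assumed naming pattern)."""
    stem = filename.removesuffix(".mseed")
    seed_id, start, end = stem.split("__")
    network, station, location, channel = seed_id.split(".")
    return MSeedName(network, station, location, channel,
                     datetime.strptime(start, NAME_TIME_FORMAT),
                     datetime.strptime(end, NAME_TIME_FORMAT))


files = [
    "TX.PB35.00.HH1__20241215T115800Z__20241215T120100Z.mseed",
    "TX.PB35.00.HH2__20241215T115800Z__20241215T120100Z.mseed",
    "TX.PB35.00.HHZ__20241215T115800Z__20241215T120100Z.mseed",
]
# Which components does this station directory provide?
print(sorted(parse_mseed_name(f).channel for f in files))
```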
212
-
213
- After setting up your own waveform directory (or using the provided sample directory) and installing EQCCTPro, import **EQCCTMSeedRunner** as shown below:
214
-
215
- ```python
216
- from eqcctpro import EQCCTMSeedRunner
217
-
218
- eqcct_runner = EQCCTMSeedRunner(
219
- use_gpu=False,
220
- intra_threads=1,
221
- inter_threads=1,
222
- cpu_id_list=[0,1,2,3,4],
223
- input_dir='/path/to/mseed',
224
- output_dir='/path/to/outputs',
225
- log_filepath='/path/to/outputs/eqcctpro.log',
226
- P_threshold=0.001,
227
- S_threshold=0.02,
228
- p_model_filepath='/path/to/model_p.h5',
229
- s_model_filepath='/path/to/model_s.h5',
230
- number_of_concurrent_station_predictions=5,
231
- number_of_concurrent_timechunk_predictions=2,
232
- best_usecase_config=True,
233
- csv_dir='/path/to/csv',
234
- selected_gpus=[0],
235
- set_vram_mb=24750,
236
- specific_stations='AT01, BP01, DG05',
237
- start_time='2024-12-14 12:00:00',
238
- end_time='2024-12-15 12:00:00',
239
- timechunk_dt=1,
240
- waveform_overlap=2)
241
-
242
- eqcct_runner.run_eqcctpro()
243
- ```
244
-
245
- **EQCCTMSeedRunner** has multiple input parameters that need to be configured; they are defined below:
246
-
247
- - **`use_gpu (bool)`: True or False**
248
- - Tells Ray to use either the GPU(s) (True) or CPUs (False) on your computer to process the waveforms in the entire workflow
249
- - Further specification of which GPU(s) and CPU(s) are provided in the parameters below
250
- - **`intra_threads (int)`: default = 1**
251
- - Controls how many intra-parallelism threads Tensorflow can use
252
- - **`inter_threads (int)`: default = 1**
253
- - Controls how many inter-parallelism threads Tensorflow can use
254
- - **`cpu_id_list (list)`: default = [1]**
255
- - List of the specific CPU cores that `sched_setaffinity` will allocate for executing the current EQCCTPro process
256
- - Allows for specific allocation and limitation of CPUs for a given EQCCTPro process
257
- - "I want this program to run only on these specific cores."
258
- - **`input_dir (str)`**
259
- - Directory path to the mSEED directory
260
- - EX. `/home/skevofilaxc/my_work_directory/eqcct/eqcctpro/230_stations_1_min_dt`
261
- - **`output_dir (str)`**
262
- - Directory path to where the output picks and logs will be sent
263
- - Does not need to exist beforehand; it will be created if it does not exist
264
- - Recommended to be in the same working directory as the input directory for convenience
265
- - **`log_filepath (str)`**
266
- - Filepath to where the EQCCTPro log will be written to and stored
267
- - Does not need to exist beforehand; it will be created if it does not exist
268
- - Recommended to be **in** the **output directory** and called **eqcctpro.log**, however the name can be changed for your own purposes
269
- - **`P_threshold (float)`: default = 0.001**
270
- - Probability threshold above which a P-wave probability is considered a P arrival
271
- - **`S_threshold (float)`: default = 0.02**
272
- - Probability threshold above which an S-wave probability is considered an S arrival
273
- - **`p_model_filepath (str)`**
274
- - Filepath to where the P EQCCT detection model is stored
275
- - **`s_model_filepath (str)`**
276
- - Filepath to where the S EQCCT detection model is stored
277
- - **`number_of_concurrent_station_predictions (int)`**
278
- - The number of concurrent EQCCT detection tasks that can happen simultaneously on a given number of resources
279
- - EX. if number_of_concurrent_station_predictions = 5, up to 5 EQCCT instances can simultaneously analyze waveforms from 5 distinct seismic stations
280
- - To find the optimal value for this parameter, use the **EvaluateSystem** class (described below)
281
- - **`number_of_concurrent_timechunk_predictions (int)`: default = None**
282
- - The number of timechunks running in parallel
283
- - Avoids processing timechunks sequentially by processing multiple timechunks in parallel, which can substantially reduce runtime
284
- - **`best_usecase_config (bool)`: default = False**
285
- - If True, overrides the supplied `cpu_id_list`, `number_of_concurrent_predictions`, `intra_threads`, and `inter_threads` values with the best overall use-case configuration
286
- - The best overall use-case configuration is the input configuration that minimizes runtime while processing the most data with your available hardware
287
- - Can only be used if EvaluateSystem has been run
288
- - **`csv_dir (str)`**
289
- - Directory path containing the CSVs output by EvaluateSystem, which hold the trial data used to find the best_usecase_config
290
- - The script looks for specific files, which exist only if EvaluateSystem has been run
291
- - **`selected_gpus (list)`: default = None**
292
- - List of GPU IDs on your computer you want to use if `use_gpu = True`
293
- - Non-existent GPU IDs will cause the code to exit
294
- - **`set_vram_mb (float)`**
295
- - Value of the maximum amount of VRAM EQCCTPro can use
296
- - Must be a realistic value based on your hardware's physical VRAM; if it exceeds the available memory, the code will fail with an **OutOfMemoryError**
297
- - **`specific_stations (str)`: default = None**
298
- - String containing the comma-separated list of the only stations you want to analyze
299
- - EX. Out of the 229 sample stations in `230_stations_1_min_dt`, if you only want to analyze AT01, BP01, and DG05, set specific_stations='AT01, BP01, DG05'.
300
- - Removes the need to move station directories around to be used as input, can contain all stations in one directory for access
301
- - **`start_time (str)`: default = None**
302
- - The start time of the analysis period
303
- - EX. 2024-12-14 12:00:00
304
- - Must follow the convention YYYY-MM-DD HH:MM:SS
305
- - Used to create a list of defined timechunks from the defined analysis timeframe
306
- - Must be the exact start time of the analysis period, without subtracting the waveform overlap (e.g., for a 2-minute overlap, enter 12:00:00 rather than 11:58:00)
307
- - Also used in the EvaluateSystem() class to help users note the analysis timeframe in the results CSV file for future result review
308
- - **`end_time (str)`: default = None**
309
- - The end time of the analysis period
310
- - EX. 2024-12-15 12:00:00
311
- - Must follow the convention YYYY-MM-DD HH:MM:SS
312
- - Used to create a list of defined timechunks from the defined analysis timeframe
313
- - Must be the exact end time of the analysis time period
314
- - Also used in the EvaluateSystem() class to help users note the analysis timeframe in the results CSV file for future result review
315
- - **`timechunk_dt (int)`: default = None**
316
- - The length of each timechunk, in minutes
317
- - EX. timechunk_dt = 10 and the analysis period is 30 minutes, then three 10-minute long timechunks will be created
318
- - **`waveform_overlap (int)`: default = None**
319
- - The duration, in minutes, of waveform overlap prepended to each timechunk (each timechunk starts this many minutes before its analysis window)
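Several of the parameters above are plain strings with a required format. The sketch below shows one way to validate them before constructing `EQCCTMSeedRunner`; both helpers are hypothetical, and splitting `specific_stations` on commas is an assumption based on the `'AT01, BP01, DG05'` example rather than EQCCTPro's own parsing code.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # YYYY-MM-DD HH:MM:SS, as described above


def parse_specific_stations(specific_stations):
    """Split e.g. 'AT01, BP01, DG05' into ['AT01', 'BP01', 'DG05'] (None means all stations)."""
    if specific_stations is None:
        return None
    return [code.strip() for code in specific_stations.split(",") if code.strip()]


def analysis_window_minutes(start_time, end_time):
    """Validate both timestamps and return the analysis period length in minutes."""
    start = datetime.strptime(start_time, TIME_FORMAT)  # raises ValueError on a bad format
    end = datetime.strptime(end_time, TIME_FORMAT)
    if end <= start:
        raise ValueError("end_time must be later than start_time")
    return (end - start).total_seconds() / 60


print(parse_specific_stations("AT01, BP01, DG05"))
print(analysis_window_minutes("2024-12-14 12:00:00", "2024-12-15 12:00:00"))
```

With the runner example's `start_time` and `end_time`, the analysis period is 1440 minutes, so `timechunk_dt=1` corresponds to 1440 one-minute timechunks.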
320
-
321
-
322
- ---
323
-
324
- ### **Evaluating Your System's Runtime Performance Capabilities**
325
- To evaluate your system’s runtime performance capabilities for both your CPU(s) and GPU(s), use the **EvaluateSystem** class, which evaluates your system autonomously:
326
-
327
- ```python
328
- from eqcctpro import EvaluateSystem
329
-
330
- eval_gpu = EvaluateSystem(
331
- mode='gpu',
332
- intra_threads=1,
333
- inter_threads=1,
334
- input_dir='/path/to/mseed',
335
- output_dir='/path/to/outputs',
336
- log_filepath='/path/to/outputs/eqcctpro.log',
337
- csv_dir='/path/to/csv',
338
- P_threshold=0.001,
339
- S_threshold=0.02,
340
- p_model_filepath='/path/to/model_p.h5',
341
- s_model_filepath='/path/to/model_s.h5',
342
- cpu_id_list=[0,1],
343
- set_vram_mb=24750,
344
- selected_gpus=[0],
345
- stations2use=2
346
- )
347
- eval_gpu.evaluate()
348
- ```
349
-
350
- ```python
351
- from eqcctpro import EvaluateSystem
352
-
353
- eval_cpu = EvaluateSystem(
354
- mode='cpu',
355
- intra_threads=1,
356
- inter_threads=1,
357
- input_dir='/path/to/mseed',
358
- output_dir='/path/to/outputs',
359
- log_filepath='/path/to/outputs/eqcctpro.log',
360
- csv_dir='/path/to/csv',
361
- P_threshold=0.001,
362
- S_threshold=0.02,
363
- p_model_filepath='/path/to/model_p.h5',
364
- s_model_filepath='/path/to/model_s.h5',
365
- cpu_id_list=list(range(0, 49)),
366
- min_cpu_amount=20,
367
- cpu_test_step_size=1,
368
- stations2use=50,
369
- starting_amount_of_stations=25,
370
- station_list_step_size=1,
371
- min_conc_stations=25,
372
- conc_station_tasks_step_size=5,
373
- start_time='2024-12-15 12:00:00',
374
- end_time='2024-12-15 14:00:00',
375
- conc_timechunk_tasks_step_size=1,
376
- timechunk_dt=30,
377
- waveform_overlap=2,
378
- tmp_dir='/path/to/tmp')
379
- eval_cpu.evaluate()
380
- ```
381
-
382
- **EvaluateSystem** will iterate through different combinations of CPU(s), Concurrent Timechunk and Station Tasks, as well as GPU(s), and the amount of VRAM (MB) each Concurrent Prediction can use.
383
- **EvaluateSystem** can take considerable time, depending on the number of CPUs/GPUs, the amount of VRAM available, and the total workload to be tested. However, once the testing has been done for most, if not all, use cases,
384
- the trial data will be available and can be used to identify the optimal input parallelization configurations for **EQCCTMSeedRunner**, getting the maximum amount of processing out of your system in the shortest amount of time.
385
-
386
- The following input parameters need to be configured for **EvaluateSystem** to evaluate your system based on your desired use of EQCCTPro:
387
-
388
- - **`mode (str)`**
389
- - Can be either `cpu` or `gpu`
390
- - Tells `EvaluateSystem` which compute backend its trials should iterate with
391
- - **`intra_threads (int)`: default = 1**
392
- - Controls how many intra-parallelism threads Tensorflow can use
393
- - **`inter_threads (int)`: default = 1**
394
- - Controls how many inter-parallelism threads Tensorflow can use
395
- - **`input_dir (str)`**
396
- - Directory path to the mSEED directory
397
- - EX. /home/skevofilaxc/my_work_directory/eqcct/eqcctpro/230_stations_1_min_dt
398
- - **`output_dir (str)`**
399
- - Directory path to where the output picks and logs will be sent
400
- - Does not need to exist beforehand; it will be created if it does not exist
401
- - Recommended to be in the same working directory as the input directory for convenience
402
- - **`log_filepath (str)`**
403
- - Filepath to where the EQCCTPro log will be written to and stored
404
- - Does not need to exist beforehand; it will be created if it does not exist
405
- - Recommended to be **in** the **output directory** and called **eqcctpro.log**, however the name can be changed for your own purposes
406
- - **`csv_dir (str)`**
407
- - Directory path where the CSVs output by EvaluateSystem will be saved
408
- - Does not need to exist beforehand; it will be created if it does not exist
409
- - **`P_threshold (float)`: default = 0.001**
410
- - Probability threshold above which a P-wave probability is considered a P arrival
411
- - **`S_threshold (float)`: default = 0.02**
412
- - Probability threshold above which an S-wave probability is considered an S arrival
413
- - **`p_model_filepath (str)`**
414
- - Filepath to where the P EQCCT detection model is stored
415
- - **`s_model_filepath (str)`**
416
- - Filepath to where the S EQCCT detection model is stored
417
- - **`cpu_id_list (list)`: default = [1]**
418
- - List of the specific CPU cores that `sched_setaffinity` will allocate for executing the current EQCCTPro process; this **is also the maximum number of cores EvaluateSystem can use in its trial iterations**
419
- - Allows for specific allocation and limitation of CPUs for a given EQCCTPro process
420
- - "I want this program to run only on these specific cores."
421
- - Must contain at least 1 CPU when using GPUs (Ray needs CPUs to manage the Raylets, i.e., the concurrent tasks, while the waveform processing itself is done on the GPU)
422
- - **`min_cpu_amount (int)`: default = 1**
423
- - The minimum number of CPUs to start the trials with
424
- - By default, trials start with 1 CPU and iterate up to the maximum allocated
425
- - You can set a different starting point, e.g., start at 15 CPUs and iterate up to a maximum of 25
426
- - **`cpu_test_step_size (int)`: default = 1**
427
- - The step size with which the trials march from `min_cpu_amount` to `len(cpu_id_list)`
428
- - **`stations2use (int)`: default = None**
429
- - Controls the maximum number of stations EvaluateSystem can use in its trial iterations
430
- - Sample data has been provided so that the maximum is 50, however, if using custom data, configure for your specific usecase
431
- - **`starting_amount_of_stations (int)`: default = 1**
432
- - For evaluating your system, you have the option to set a starting amount of stations you want to use in the test
433
- - By default, the test starts with 1 station, but this is configurable
434
- - **`station_list_step_size (int)`: default = 1**
435
- - You can set a step size for the station list that is generated
436
- - For example, if the step size is set to 10 and you start with 50 stations with a maximum of 100, the list will be: [50, 60, 70, 80, 90, 100]
437
- - A value of 1 applies the default scheme: a step size of 1 from 1 to 10, then a step size of 5 up to `stations2use`
438
- - **`min_conc_stations (int)`: default = 1**
439
- - The minimum number of concurrent station predictions each trial iteration starts with
440
- - By default, if `min_conc_stations` and `conc_station_tasks_step_size` are set to 1, a custom step size iteration is applied to test the 50 sample waveforms. The sequence follows: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, n+5, 50].
441
- - **`conc_station_tasks_step_size (int)`: default = 1**
442
- - The concurrent station predictions step size each trial iteration iterates with
443
- - By default, if `min_conc_stations` and `conc_station_tasks_step_size` are set to 1, a custom step size iteration is applied to test the 50 sample waveforms. The sequence follows: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, n+5, 50].
444
- - **`start_time (str)`: default = None**
445
- - The start time of the analysis period
446
- - EX. 2024-12-14 12:00:00
447
- - Must follow the convention YYYY-MM-DD HH:MM:SS
448
- - Used to create a list of defined timechunks from the defined analysis timeframe
449
- - Must be the exact start time of the analysis period, without subtracting the waveform overlap (e.g., for a 2-minute overlap, enter 12:00:00 rather than 11:58:00)
450
- - Also used in the EvaluateSystem() class to help users note the analysis timeframe in the results CSV file for future result review
451
- - **`end_time (str)`: default = None**
452
- - The end time of the analysis period
453
- - EX. 2024-12-15 12:00:00
454
- - Must follow the convention YYYY-MM-DD HH:MM:SS
455
- - Used to create a list of defined timechunks from the defined analysis timeframe
456
- - Must be the exact end time of the analysis time period
457
- - Also used in the EvaluateSystem() class to help users note the analysis timeframe in the results CSV file for future result review
458
- - **`conc_timechunk_tasks_step_size (int)`: default = 1**
459
- - Is the concurrent timechunk predictions step size you want each trial iteration to iterate with
460
- - **`timechunk_dt (int)`: default = None**
461
- - The length of each timechunk, in minutes
462
- - EX. timechunk_dt = 10 and the analysis period is 30 minutes, then three 10-minute long timechunks will be created
463
- - **`waveform_overlap (int)`: default = None**
464
- - The duration, in minutes, of waveform overlap prepended to each timechunk (each timechunk starts this many minutes before its analysis window)
465
- - **`tmp_dir (str)`: default = 1**
466
- - A temporary directory to store all temp files produced by EQCCTPro
467
- - Used to help ease system cleanup and to not write to system's default temporary directory
468
- - **`set_vram_mb (float)`**
469
- - Value of the maximum amount of VRAM EQCCTPro can use
470
- - Must be a realistic value based on your hardware's physical VRAM; if it exceeds the available memory, the code will fail with an OutOfMemoryError
471
- - **`selected_gpus (list)`: default = None**
472
- - List of GPU IDs on your computer you want to use if `mode = 'gpu'`
473
- - Non-existing GPU IDs will cause the code to exit
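The step-size parameters above can be sketched as simple `range` calls. This is a minimal sketch assuming the default scheme (steps of 1 up to 10, then steps of 5) applies when both the starting value and the step size are 1, as the descriptions above suggest; the function names are hypothetical and EvaluateSystem's actual implementation may differ.

```python
def default_sequence(maximum):
    """Default scheme: 1..10 in steps of 1, then steps of 5 up to `maximum`."""
    return list(range(1, min(10, maximum) + 1)) + list(range(15, maximum + 1, 5))


def stepped_sequence(start, maximum, step):
    """Counts to test, e.g. starting_amount_of_stations -> stations2use by station_list_step_size."""
    if start == 1 and step == 1:
        return default_sequence(maximum)
    return list(range(start, maximum + 1, step))


def cpu_trial_counts(cpu_id_list, min_cpu_amount=1, cpu_test_step_size=1):
    """CPU counts marching from min_cpu_amount up to len(cpu_id_list)."""
    return list(range(min_cpu_amount, len(cpu_id_list) + 1, cpu_test_step_size))


print(stepped_sequence(1, 50, 1))
print(stepped_sequence(50, 100, 10))
print(len(cpu_trial_counts(list(range(0, 49)), min_cpu_amount=20)))
```

With the CPU example's `cpu_id_list` of 49 cores and `min_cpu_amount=20`, the CPU counts run from 20 to 49, i.e. 30 values.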
474
-
475
-
476
- ---
477
- ### **Finding Optimal CPU/GPU Configurations**
478
- After running **EvaluateSystem**, you can use either the **OptimalCPUConfigurationFinder** or the **OptimalGPUConfigurationFinder** to determine the best CPU or GPU configuration (respectively) for your specific use case:
479
-
480
- ```python
481
- from eqcctpro import OptimalCPUConfigurationFinder, OptimalGPUConfigurationFinder
482
-
483
- csv_filepath = '/path/to/csv'
484
-
485
- cpu_finder = OptimalCPUConfigurationFinder(csv_filepath)
486
- best_cpu_config = cpu_finder.find_best_overall_usecase()
487
- print(best_cpu_config)
488
-
489
- optimal_cpu_config = cpu_finder.find_optimal_for(cpu=3, station_count=2)
490
- print(optimal_cpu_config)
491
-
492
- gpu_finder = OptimalGPUConfigurationFinder(csv_filepath)
493
- best_gpu_config = gpu_finder.find_best_overall_usecase()
494
- print(best_gpu_config)
495
-
496
- optimal_gpu_config = gpu_finder.find_optimal_for(num_cpus=1, gpu_list=[0], station_count=1)
497
- print(optimal_gpu_config)
498
- ```
499
- Both **OptimalCPUConfigurationFinder** and **OptimalGPUConfigurationFinder** provide two methods:
500
-
501
- 1. **`find_best_overall_usecase`**
502
- - Returns the best overall usecase configuration
503
- - Considers the middle 50% of the tested CPU counts, favoring moderate, balanced CPU usage, and selects the configuration that processes the most stations in the minimum runtime
504
- 2. **`find_optimal_for`**
505
- - Returns the parallelization configuration (EX. concurrent predictions, intra/inter thread counts, VRAM, etc.) for a given number of CPU(s)/GPU(s) and stations
506
- - Enables users to quickly identify which input parameters should be used for the given amount of resources and workload they have for the minimum runtime possible on their computer
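The "middle 50% of CPUs" heuristic behind `find_best_overall_usecase` can be sketched as follows. The trial keys (`cpus`, `stations`, `runtime_s`) and the quartile cut are assumptions for illustration; the real EvaluateSystem CSV columns and OptimalCPUConfigurationFinder logic may differ.

```python
def best_overall(trials):
    """Pick the trial processing the most stations fastest, within the middle CPU counts.

    Each trial is a dict with the hypothetical keys 'cpus', 'stations', 'runtime_s'.
    """
    cpu_counts = sorted({t["cpus"] for t in trials})
    n = len(cpu_counts)
    # Roughly the middle 50% of the distinct CPU counts that were tested.
    low, high = cpu_counts[n // 4], cpu_counts[(3 * n - 1) // 4]
    middle = [t for t in trials if low <= t["cpus"] <= high]
    # Most stations first; among those, the shortest runtime.
    return min(middle, key=lambda t: (-t["stations"], t["runtime_s"]))


trials = [  # made-up trial rows for illustration only
    {"cpus": 1, "stations": 50, "runtime_s": 90.0},
    {"cpus": 2, "stations": 50, "runtime_s": 60.0},
    {"cpus": 4, "stations": 50, "runtime_s": 40.0},
    {"cpus": 4, "stations": 25, "runtime_s": 20.0},
    {"cpus": 6, "stations": 50, "runtime_s": 35.0},
    {"cpus": 8, "stations": 50, "runtime_s": 30.0},
]
print(best_overall(trials))
```

In this made-up data the 8-CPU trial is fastest, but it falls outside the middle band of CPU counts, so the 6-CPU trial is selected.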
507
-
508
- An input CSV directory path must be passed for the classes to use as a reference point:
509
- - **`csv_filepath (str)`**
510
- - Directory path where the CSVs output by EvaluateSystem are stored
511
-
512
- Using **OptimalCPUConfigurationFinder.find_best_overall_usecase()**, no input parameters are needed. It returns the best use-case parameters.
513
-
514
- For **OptimalCPUConfigurationFinder.find_optimal_for()**, the function requires two input parameters:
515
- - **`cpu (int)`**
516
- - The number of CPU(s) you want to use in your application
517
- - **`station_count (int)`**
518
- - The number of station(s) you want to use in your application
519
-
520
- **OptimalCPUConfigurationFinder.find_optimal_for()** returns a trial data point containing the minimum runtime for your input parameters.
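A `find_optimal_for`-style lookup amounts to filtering the trial rows by CPU and station count and keeping the one with the minimum runtime. The sketch below uses the standard `csv` module; `optimal_for` is a hypothetical helper, and the column names and sample rows are made up for illustration rather than taken from EvaluateSystem's actual CSV schema.

```python
import csv
import io


def optimal_for(rows, cpu, station_count):
    """Return the trial row with the lowest runtime for the given CPU and station count."""
    matches = [r for r in rows
               if int(r["cpus"]) == cpu and int(r["stations"]) == station_count]
    if not matches:
        raise ValueError(f"no trial with cpus={cpu} and stations={station_count}")
    return min(matches, key=lambda r: float(r["runtime_s"]))


# Made-up rows standing in for an EvaluateSystem results CSV (hypothetical columns).
SAMPLE = """cpus,stations,conc_station_tasks,runtime_s
3,2,1,14.2
3,2,2,9.8
4,2,2,8.1
"""
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(optimal_for(rows, cpu=3, station_count=2))
```

To run it against real results, replace `io.StringIO(SAMPLE)` with an opened results file and adjust the column names to whatever the CSV actually contains.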
521
-
522
- Similar to **OptimalCPUConfigurationFinder.find_best_overall_usecase()**, **OptimalGPUConfigurationFinder.find_best_overall_usecase()** returns the best use-case parameters and needs no input parameters.
523
-
524
- For **OptimalGPUConfigurationFinder.find_optimal_for()**, the function requires three input parameters:
525
- - **`num_cpus (int)`**
526
- - The number of CPU(s) you want to use in your application
527
- - **`gpu_list (list)`**
528
- - The specific GPU ID(s) you want to use in your application
529
- - Useful if you have multiple GPUs available and want to use/dedicate a specific one to using EQCCTPro
530
- - **`station_count (int)`**
531
- - The number of station(s) you want to use in your application
532
-
533
- ## **Configuration**
534
- The `environment.yml` file specifies the dependencies required to run EQCCTPro. Ensure you have the correct versions installed by using the provided conda environment setup.
535
-
536
- ## **License**
537
- EQCCTPro is provided under an open-source license. See LICENSE for details.
538
-
539
- ## **Contact**
540
- For inquiries or issues, please contact constantinos.skevofilax@austin.utexas.edu or victor.salles@beg.utexas.edu.
541
-
@@ -1,5 +0,0 @@
1
- eqcctpro/__init__.py,sha256=JK27ZrLxVDNHsdorp7UAislI8haH23rZhnEivVM7hgA,141
2
- eqcctpro-0.6.2.dist-info/METADATA,sha256=lAvGe6qc2skJvJ65-4cMjS_ad8PxCN93vBvglb8XDmc,31595
3
- eqcctpro-0.6.2.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
4
- eqcctpro-0.6.2.dist-info/top_level.txt,sha256=u0cu2JdF7Z0ob7y4XdUCLoSGp_xOudAYz-fbsQ-B1yY,9
5
- eqcctpro-0.6.2.dist-info/RECORD,,