eqcctpro 0.3__tar.gz → 0.4.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

This version of eqcctpro might be problematic.

@@ -0,0 +1,340 @@
Metadata-Version: 2.2
Name: eqcctpro
Version: 0.4.1
Description-Content-Type: text/markdown
Requires-Dist: numpy==1.26.4
Requires-Dist: pandas==2.2.3
Requires-Dist: matplotlib==3.10.0
Requires-Dist: obspy==1.4.1
Requires-Dist: progress==1.6
Requires-Dist: psutil==6.1.1
Requires-Dist: ray==2.42.1
Requires-Dist: schedule==1.2.2
Requires-Dist: sdnotify==0.3.2
Requires-Dist: tensorflow<2.19,>=2.15
Requires-Dist: tensorflow-estimator<2.19,>=2.15
Requires-Dist: tensorflow-io-gcs-filesystem==0.37.1
Requires-Dist: tensorboard==2.15.2
Requires-Dist: tensorboard-data-server==0.7.2
Requires-Dist: silence-tensorflow==1.2.3
Requires-Dist: scipy==1.15.1
Requires-Dist: protobuf==4.25.6
Requires-Dist: grpcio==1.70.0
Requires-Dist: absl-py==2.1.0
Requires-Dist: h5py==3.12.1
Requires-Dist: pynvml==12.0.0
Dynamic: description
Dynamic: description-content-type
Dynamic: requires-dist

# EQCCTPro: powerful seismic event detection toolkit

EQCCTPro is a high-performance seismic event detection and processing framework that uses EQCCT to process seismic data efficiently. It lets users make full use of their computing resources when processing many seismic waveforms simultaneously, achieving real-time performance by identifying and using the optimal computational configuration for their hardware. More information about the development, capabilities, and real-world applications of EQCCTPro can be found in our research publication.

## Features
- Supports both CPU and GPU execution
- Configurable parallel execution for optimized performance
- Includes tools for evaluating system performance to find optimal use-case configurations
- Automatic selection of the best use-case configuration
- Efficient handling of large-scale seismic data

## Installation
To install the necessary dependencies, create and activate a conda environment:

```sh
[skevofilaxc] conda env create -f environment.yml
[skevofilaxc] conda activate eqcctpro
```

You need the `environment.yml` file from the `eqcct` repository to run these commands. You can download the entire `eqcct` repository, or download only its `eqcctpro` subdirectory with the following commands:

```sh
[skevofilaxc] mkdir my_work_directory
[skevofilaxc] cd my_work_directory
[skevofilaxc] git clone --depth 1 --filter=tree:0 https://github.com/ut-beg-texnet/eqcct.git --sparse
[skevofilaxc] cd eqcct
[skevofilaxc] git sparse-checkout set eqcctpro
```
After creating and activating the conda environment, install the **eqcctpro Python package**:
```sh
[skevofilaxc] pip install eqcctpro
```
More information on the package can be found on our PyPI project page: [eqcctpro](https://pypi.org/project/eqcctpro/).

## Creating a Test Workspace Environment
We highly recommend creating a test workspace first to understand how EQCCTPro works.
Sample seismic waveform data from 50 TexNet stations is provided in the eqcctpro repository as `sample_1_minute_data.zip`.

After downloading the .zip file, either individually or with the git commands above, unzip it:
```sh
[skevofilaxc] unzip sample_1_minute_data.zip
```
Its contents will look like this:
```sh
[skevofilaxc sample_1_minute_data]$ ls
AT01 CF01 DG05 EF54 EF76 HBVL MB09 MB21 MID02 ODSA PB16 PB25 PB35 PB52 PH02 SM03 WB11
BB01 CT02 DG09 EF63 FOAK4 HNDO MB13 MB25 MID03 PB04 PB17 PB26 PB39 PB54 PL01 SMWD WB12
BP01 DB02 EF02 EF75 FW13 MB06 MB19 MID01 MO01 PB11 PB18 PB34 PB42 PECS SM02 WB06
```
Each subdirectory is named after a station code and contains mSEED files for the station's different components (channels):
```sh
[skevofilaxc PB35]$ ls
TX.PB35.00.HH1__20241215T115800Z__20241215T120100Z.mseed TX.PB35.00.HHZ__20241215T115800Z__20241215T120100Z.mseed
TX.PB35.00.HH2__20241215T115800Z__20241215T120100Z.mseed
```
EQCCT needs only one component to perform detection; however, additional components allow better detection of the direction of the P and S waves.

You are now set up for testing.

## Usage
EQCCTPro has three main capabilities:
1. Process mSEED data from one or more seismic stations using either CPUs or GPUs
2. Evaluate your system to identify the parallelization configurations that give the minimum runtime on your hardware
3. Identify and return the optimal parallelization configurations, for both specific and general use cases, for CPU (a) and GPU (b) applications

These capabilities are provided by the following classes, respectively:
**EQCCTMSeedRunner (1)**, **EvaluateSystem (2)**, **OptimalCPUConfigurationFinder (3a)**, **OptimalGPUConfigurationFinder (3b)**.

### Processing mSEED data using EQCCTPro (EQCCTMSeedRunner)
To process mSEED data from multiple seismic stations, use the **EQCCTMSeedRunner** class.
**EQCCTMSeedRunner** processes multiple mSEED files from a given input directory. The input directory is made up of station directories, such as:

```sh
[skevofilaxc sample_1_minute_data]$ ls
AT01 CF01 DG05 EF54 EF76 HBVL MB09 MB21 MID02 ODSA PB16 PB25 PB35 PB52 PH02 SM03 WB11
BB01 CT02 DG09 EF63 FOAK4 HNDO MB13 MB25 MID03 PB04 PB17 PB26 PB39 PB54 PL01 SMWD WB12
BP01 DB02 EF02 EF75 FW13 MB06 MB19 MID01 MO01 PB11 PB18 PB34 PB42 PECS SM02 WB06
```
Each subdirectory is named after a station code. If you create your own input directory with custom data, **follow the same naming convention**; otherwise, EQCCTPro will **not** work.

Each station subdirectory, such as PB35, contains mSEED files for the different components (e.g., N, E, Z):
```sh
[skevofilaxc PB35]$ ls
TX.PB35.00.HH1__20241215T115800Z__20241215T120100Z.mseed TX.PB35.00.HHZ__20241215T115800Z__20241215T120100Z.mseed
TX.PB35.00.HH2__20241215T115800Z__20241215T120100Z.mseed
```
EQCCT needs only one component to perform detection; however, additional components allow better detection of the direction of the P and S waves.
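Because EQCCTPro will not work if this layout is not followed, it can help to check a custom input directory before a run. The sketch below is not part of EQCCTPro: it assumes the `NET.STA.LOC.CHA__STARTTIME__ENDTIME.mseed` filename pattern inferred from the sample files, and `scan_input_dir` is a hypothetical helper. It reports the components found in each station directory and can optionally keep only a comma-separated subset of stations, written in the same format as the `specific_stations` parameter.

```python
import re
from pathlib import Path

# Assumed filename pattern, inferred from the sample data:
# NET.STA.LOC.CHA__STARTTIME__ENDTIME.mseed
# e.g. TX.PB35.00.HHZ__20241215T115800Z__20241215T120100Z.mseed
MSEED_NAME = re.compile(
    r"^(?P<net>[^.]+)\.(?P<sta>[^.]+)\.(?P<loc>[^.]*)\.(?P<cha>[^._]+)"
    r"__(?P<start>\d{8}T\d{6}Z)__(?P<end>\d{8}T\d{6}Z)\.mseed$"
)


def scan_input_dir(input_dir, stations=None):
    """Map each station directory name to the sorted channel codes inside it.

    Hypothetical helper. `stations` is an optional comma-separated string
    such as 'AT01, BP01'; EQCCTPro's own parsing may differ.
    """
    wanted = None
    if stations:
        wanted = {s.strip() for s in stations.split(",") if s.strip()}

    layout = {}
    for station_dir in sorted(Path(input_dir).iterdir()):
        if not station_dir.is_dir():
            continue  # only station subdirectories are expected
        if wanted is not None and station_dir.name not in wanted:
            continue
        channels = []
        for f in station_dir.glob("*.mseed"):
            m = MSEED_NAME.match(f.name)
            # Reject names that break the pattern or name another station
            if m is None or m.group("sta") != station_dir.name:
                raise ValueError(f"Unexpected file name: {f}")
            channels.append(m.group("cha"))
        layout[station_dir.name] = sorted(channels)
    return layout
```

For example, `scan_input_dir('/path/to/mseed', stations='AT01, BP01, DG05')` would return the channel codes for just those three stations and raise `ValueError` on any `.mseed` file that does not follow the convention.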

After setting up your own waveform directory (or using the provided sample data) and installing eqcctpro, import **EQCCTMSeedRunner** as shown below:

```python
from eqcctpro import EQCCTMSeedRunner

eqcct_runner = EQCCTMSeedRunner(
    use_gpu=False,                  # process on CPUs; set True to use GPUs
    intra_threads=1,
    inter_threads=1,
    cpu_id_list=[0, 1, 2, 3, 4],    # CPU cores the process is pinned to
    input_dir='/path/to/mseed',
    output_dir='/path/to/outputs',
    log_filepath='/path/to/outputs/eqcctpro.log',
    P_threshold=0.001,
    S_threshold=0.02,
    p_model_filepath='/path/to/model_p.h5',
    s_model_filepath='/path/to/model_s.h5',
    number_of_concurrent_predictions=5,
    best_usecase_config=True,       # overrides CPU/thread/concurrency settings using the EvaluateSystem CSVs in csv_dir
    csv_dir='/path/to/csv',
    selected_gpus=[0],              # GPU IDs, used when use_gpu=True
    set_vram_mb=24750,
    specific_stations='AT01, BP01, DG05'
)
eqcct_runner.run_eqcctpro()
```

**EQCCTMSeedRunner** has several input parameters that need to be configured:

- **`use_gpu (bool)`: True or False**
  - Tells Ray to use either the GPU(s) (True) or the CPUs (False) on your computer to process the waveforms throughout the workflow
  - Which GPU(s) and CPU(s) are used is specified by the parameters below
- **`intra_threads (int)`: default = 1**
  - Controls how many intra-op parallelism threads TensorFlow can use
- **`inter_threads (int)`: default = 1**
  - Controls how many inter-op parallelism threads TensorFlow can use
- **`cpu_id_list (list)`: default = [1]**
  - List of the specific CPU cores that `sched_setaffinity` will allocate for executing the current EQCCTPro process
  - Allows specific allocation and limitation of the CPUs used by a given EQCCTPro process
  - In other words: "I want this program to run only on these specific cores."
- **`input_dir (str)`**
  - Path to the input mSEED directory
  - E.g., `/home/skevofilaxc/my_work_directory/eqcct/eqcctpro/sample_1_minute_data`
- **`output_dir (str)`**
  - Directory where the output picks and logs will be written
  - Created if it doesn't already exist
  - Recommended to be in the same working directory as the input directory, for convenience
- **`log_filepath (str)`**
  - Path of the file the EQCCTPro log will be written to
  - Created if it doesn't already exist
  - Recommended to be **in** the **output directory** and named **eqcctpro.log**, although you can choose a different name
- **`P_threshold (float)`: default = 0.001**
  - P probabilities above this threshold are considered P arrivals
- **`S_threshold (float)`: default = 0.02**
  - S probabilities above this threshold are considered S arrivals
- **`p_model_filepath (str)`**
  - Path to the EQCCT P-wave detection model
- **`s_model_filepath (str)`**
  - Path to the EQCCT S-wave detection model
- **`number_of_concurrent_predictions (int)`**
  - The number of EQCCT detection tasks that can run simultaneously on the allocated resources
  - E.g., if `number_of_concurrent_predictions = 5`, up to 5 EQCCT instances will analyze 5 different waveforms at the same time
  - Best set to the optimal value for your hardware, which can be identified using **EvaluateSystem** (below)
- **`best_usecase_config (bool)`: default = False**
  - If True, overrides the provided `cpu_id_list`, `number_of_concurrent_predictions`, `intra_threads`, and `inter_threads` values with the best overall use-case configuration
  - The best overall use-case configuration is the input configuration that minimizes runtime while processing the most data with your available hardware
  - Can only be used after **EvaluateSystem** has been run
- **`csv_dir (str)`**
  - Directory containing the CSVs output by **EvaluateSystem**, which hold the trial data used to find the best use-case configuration
  - The script looks for specific files that exist only after **EvaluateSystem** has been run
- **`selected_gpus (list)`: default = None**
  - List of the GPU IDs on your computer to use when `use_gpu = True`
  - Non-existent GPU IDs will cause the code to exit
- **`set_vram_mb (float)`**
  - Maximum amount of VRAM (in MB) EQCCTPro can use
  - Must be a realistic value based on your GPU's physical memory; if it exceeds the available memory, the code will fail with an **OutOfMemoryError**
- **`specific_stations (str)`: default = None**
  - A comma-separated string listing the only stations you want to analyze
  - E.g., out of the 50 sample stations in `sample_1_minute_data`, to analyze only AT01, BP01, and DG05, set `specific_stations='AT01, BP01, DG05'`
  - Removes the need to move station directories around; all stations can stay in one input directory
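To illustrate what `P_threshold` and `S_threshold` control, the sketch below turns a per-sample probability trace into picks: samples above the threshold are kept, and each contiguous run is reported at its peak. This is a simplified, hypothetical picker for illustration only; EQCCT's actual post-processing may differ, and `pick_arrivals` is not part of EQCCTPro.

```python
def pick_arrivals(probabilities, threshold):
    """Return (index, probability) for the peak of each run above `threshold`.

    Illustrative only: a contiguous run of samples whose probability exceeds
    the threshold is treated as one candidate arrival, picked at its maximum.
    """
    picks = []
    best = None  # (index, prob) of the peak in the current run
    for i, p in enumerate(probabilities):
        if p > threshold:
            if best is None or p > best[1]:
                best = (i, p)
        elif best is not None:
            picks.append(best)  # run ended: record its peak
            best = None
    if best is not None:
        picks.append(best)  # run that reaches the end of the trace
    return picks


# A lower threshold (e.g. P_threshold=0.001) admits weaker candidates
# than a higher one (e.g. S_threshold=0.02).
trace = [0.0, 0.0005, 0.004, 0.03, 0.01, 0.0, 0.015, 0.0]
print(pick_arrivals(trace, 0.001))  # [(3, 0.03), (6, 0.015)]
print(pick_arrivals(trace, 0.02))   # [(3, 0.03)]
```

Lowering a threshold trades more detections for more false picks, which is why the two thresholds are exposed as separate parameters.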

### Evaluating Your System's Runtime Performance Capabilities
To evaluate the runtime performance capabilities of your CPU(s) and GPU(s), use the **EvaluateSystem** class, which evaluates your system automatically:

```python
from eqcctpro import EvaluateSystem

eval_gpu = EvaluateSystem(
    mode='gpu',                     # 'cpu' or 'gpu': which trials to run
    intra_threads=1,
    inter_threads=1,
    input_dir='/path/to/mseed',
    output_dir='/path/to/outputs',
    log_filepath='/path/to/outputs/eqcctpro.log',
    csv_dir='/path/to/csv',         # trial CSVs are written here
    P_threshold=0.001,
    S_threshold=0.02,
    p_model_filepath='/path/to/model_p.h5',
    s_model_filepath='/path/to/model_s.h5',
    stations2use=2,                 # maximum number of stations used in the trials
    cpu_id_list=[0, 1],             # maximum set of cores used in the trials
    set_vram_mb=24750,
    selected_gpus=[0]
)
eval_gpu.evaluate()
```
**EvaluateSystem** iterates through different combinations of CPUs, concurrent predictions, and workloads (stations), as well as GPUs and the amount of VRAM (MB) each concurrent prediction can use.
**EvaluateSystem** takes time, depending on the number of CPUs/GPUs, the amount of VRAM available, and the total workload to be tested. However, the testing only needs to be done once to cover most, if not all, use cases;
afterwards, the trial data can be used to identify the optimal input parallelization configuration for **EQCCTMSeedRunner**, getting the maximum amount of processing out of your system in the shortest amount of time.
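As a rough picture of the search space described above, the sketch below enumerates a CPU trial grid over core counts, concurrent predictions, and station counts. The specific ranges (every core count up to `len(cpu_id_list)`, concurrent predictions never exceeding the core count, station counts up to `stations2use`) are assumptions for illustration, not EQCCTPro's actual trial schedule, and `cpu_trial_grid` is a hypothetical name.

```python
from itertools import product


def cpu_trial_grid(cpu_id_list, stations2use, station_step=1):
    """Yield hypothetical (num_cpus, concurrent_predictions, num_stations) trials.

    Assumptions (for illustration only): every core count from 1 to
    len(cpu_id_list) is tried, concurrent predictions never exceed the
    core count, and station counts step from 1 up to stations2use.
    """
    cpu_counts = range(1, len(cpu_id_list) + 1)
    station_counts = range(1, stations2use + 1, station_step)
    for n_cpus, n_stations in product(cpu_counts, station_counts):
        for concurrent in range(1, n_cpus + 1):
            yield n_cpus, concurrent, n_stations


trials = list(cpu_trial_grid(cpu_id_list=[0, 1], stations2use=2))
print(len(trials))  # 6
print(trials[:3])   # [(1, 1, 1), (1, 1, 2), (2, 1, 1)]
```

Because the number of trials grows multiplicatively with each dimension, evaluations on machines with many cores or with large `stations2use` values take correspondingly longer.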

The following input parameters need to be configured for **EvaluateSystem** to evaluate your system according to how you intend to use EQCCTPro:

- **`mode (str)`**
  - Either `cpu` or `gpu`
  - Tells **EvaluateSystem** which configuration trials to iterate through
- **`intra_threads (int)`: default = 1**
  - Controls how many intra-op parallelism threads TensorFlow can use
- **`inter_threads (int)`: default = 1**
  - Controls how many inter-op parallelism threads TensorFlow can use
- **`input_dir (str)`**
  - Path to the input mSEED directory
  - E.g., `/home/skevofilaxc/my_work_directory/eqcct/eqcctpro/sample_1_minute_data`
- **`output_dir (str)`**
  - Directory where the output picks and logs will be written
  - Created if it doesn't already exist
  - Recommended to be in the same working directory as the input directory, for convenience
- **`log_filepath (str)`**
  - Path of the file the EQCCTPro log will be written to
  - Created if it doesn't already exist
  - Recommended to be **in** the **output directory** and named **eqcctpro.log**, although you can choose a different name
- **`csv_dir (str)`**
  - Directory where the CSVs output by **EvaluateSystem** will be saved
  - Created if it doesn't already exist
- **`P_threshold (float)`: default = 0.001**
  - P probabilities above this threshold are considered P arrivals
- **`S_threshold (float)`: default = 0.02**
  - S probabilities above this threshold are considered S arrivals
- **`p_model_filepath (str)`**
  - Path to the EQCCT P-wave detection model
- **`s_model_filepath (str)`**
  - Path to the EQCCT S-wave detection model
- **`stations2use (int)`: default = None**
  - Maximum number of stations **EvaluateSystem** can use in its trial iterations
  - The sample data contains 50 stations, so 50 is the maximum for it; if you use custom data, set this for your specific use case
- **`cpu_id_list (list)`: default = [1]**
  - List of the specific CPU cores that `sched_setaffinity` will allocate for executing the current EQCCTPro process; this is also **the maximum set of cores EvaluateSystem can use in its trial iterations**
  - Allows specific allocation and limitation of the CPUs used by a given EQCCTPro process
  - In other words: "I want this program to run only on these specific cores."
  - Must contain at least 1 CPU when using GPUs (Ray needs CPUs to manage the concurrent tasks, even though the waveform processing itself is done on the GPU)
- **`set_vram_mb (float)`**
  - Maximum amount of VRAM (in MB) EQCCTPro can use
  - Must be a realistic value based on your GPU's physical memory; if it exceeds the available memory, the code will fail with an **OutOfMemoryError**
- **`selected_gpus (list)`: default = None**
  - List of the GPU IDs on your computer to use when `mode = 'gpu'`
  - Non-existent GPU IDs will cause the code to exit
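`cpu_id_list` is applied through CPU affinity. The sketch below illustrates that mechanism with Python's standard `os.sched_setaffinity` and `os.sched_getaffinity`, which exist only on some Unix platforms such as Linux. It is not EQCCTPro's own code, and `pin_to_cores` is a hypothetical helper. On Linux, child processes created afterwards inherit the affinity mask.

```python
import os


def pin_to_cores(cpu_id_list):
    """Restrict the current process (pid 0) to the given CPU cores.

    Hypothetical helper illustrating what cpu_id_list controls. Returns the
    affinity set actually in effect after pinning.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("CPU affinity is not supported on this platform")
    requested = set(cpu_id_list)
    if not requested:
        raise ValueError("cpu_id_list must contain at least one CPU ID")
    # Only cores this process is currently allowed to use can be requested
    missing = requested - os.sched_getaffinity(0)
    if missing:
        raise ValueError(f"CPU IDs not available to this process: {sorted(missing)}")
    os.sched_setaffinity(0, requested)
    return os.sched_getaffinity(0)
```

For example, `pin_to_cores([0, 1, 2, 3, 4])` mirrors the effect described for `cpu_id_list=[0,1,2,3,4]` in the **EQCCTMSeedRunner** example.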

### Finding Optimal CPU/GPU Configurations
After running **EvaluateSystem**, you can use **OptimalCPUConfigurationFinder** or **OptimalGPUConfigurationFinder** to determine the best CPU or GPU configuration, respectively, for your specific use case:

```python
from eqcctpro import OptimalCPUConfigurationFinder, OptimalGPUConfigurationFinder

# Directory containing the CSVs written by EvaluateSystem
csv_filepath = '/path/to/csv'

# Best overall CPU configuration, and the optimum for 3 CPUs processing 2 stations
cpu_finder = OptimalCPUConfigurationFinder(csv_filepath)
best_cpu_config = cpu_finder.find_best_overall_usecase()
print(best_cpu_config)

optimal_cpu_config = cpu_finder.find_optimal_for(cpu=3, station_count=2)
print(optimal_cpu_config)

# Best overall GPU configuration, and the optimum for 1 CPU + GPU 0 processing 1 station
gpu_finder = OptimalGPUConfigurationFinder(csv_filepath)
best_gpu_config = gpu_finder.find_best_overall_usecase()
print(best_gpu_config)

optimal_gpu_config = gpu_finder.find_optimal_for(num_cpus=1, gpu_list=[0], station_count=1)
print(optimal_gpu_config)
```
Each class provides two methods:

1. **`find_best_overall_usecase`**
   - Returns the best overall use-case configuration
   - Uses the middle 50% of CPUs for moderate, balanced CPU usage, with the maximum number of stations processed in the minimum runtime
2. **`find_optimal_for`**
   - Returns the parallelization configuration (e.g., concurrent predictions, intra/inter thread counts, VRAM) for a given number of CPUs/GPUs and stations
   - Lets you quickly identify which input parameters to use for your available resources and workload to achieve the minimum runtime on your computer
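The two selection strategies can be sketched over trial records as follows. This is one possible reading of the descriptions above, not EQCCTPro's implementation: the column names (`num_cpus`, `stations`, `concurrent_predictions`, `runtime_s`) and the interpretation of "middle 50% of CPUs" as the central half of the tested CPU counts are assumptions, and the rows are toy values for the example only.

```python
import csv
import io

# Toy trial rows with hypothetical column names; EvaluateSystem's real CSVs
# may use different files and columns.
TRIALS_CSV = """num_cpus,stations,concurrent_predictions,runtime_s
1,1,1,9.0
2,2,2,8.5
3,2,2,6.0
3,2,3,5.5
4,2,4,5.0
4,1,2,3.0
"""


def load_trials(text):
    """Parse CSV text into dicts with numeric fields converted."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for r in rows:
        for key in ("num_cpus", "stations", "concurrent_predictions"):
            r[key] = int(r[key])
        r["runtime_s"] = float(r["runtime_s"])
    return rows


def find_optimal_for(rows, cpu, station_count):
    """Fastest trial for an exact CPU count and station count, or None."""
    matches = [r for r in rows if r["num_cpus"] == cpu and r["stations"] == station_count]
    return min(matches, key=lambda r: r["runtime_s"]) if matches else None


def find_best_overall(rows):
    """Among the middle 50% of tested CPU counts, prefer the most stations,
    then the shortest runtime (assumed interpretation)."""
    counts = sorted({r["num_cpus"] for r in rows})
    lo, hi = len(counts) // 4, len(counts) - len(counts) // 4
    middle = set(counts[lo:hi])
    candidates = [r for r in rows if r["num_cpus"] in middle]
    return min(candidates, key=lambda r: (-r["stations"], r["runtime_s"]))


rows = load_trials(TRIALS_CSV)
print(find_optimal_for(rows, cpu=3, station_count=2)["concurrent_predictions"])  # 3
print(find_best_overall(rows)["num_cpus"])  # 3
```

Restricting the overall choice to the middle of the CPU range keeps the recommendation away from both the single-core and the all-cores extremes, which matches the "moderate, balanced CPU usage" goal stated above.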

Both classes take the path of the CSV directory to use as their reference:
- **`csv_filepath (str)`**
  - Path of the directory containing the CSVs output by **EvaluateSystem**

**OptimalCPUConfigurationFinder.find_best_overall_usecase()** takes no input parameters and returns the best use-case parameters.

**OptimalCPUConfigurationFinder.find_optimal_for()** requires two input parameters:
- **`cpu (int)`**
  - The number of CPUs you want to use in your application
- **`station_count (int)`**
  - The number of stations you want to process in your application

**OptimalCPUConfigurationFinder.find_optimal_for()** returns the trial data point with the minimum runtime for your input parameters.

Like its CPU counterpart, **OptimalGPUConfigurationFinder.find_best_overall_usecase()** takes no input parameters and returns the best use-case parameters.

**OptimalGPUConfigurationFinder.find_optimal_for()** requires three input parameters:
- **`cpu (int)`**
  - The number of CPUs you want to use in your application
- **`gpu_list (list)`**
  - The specific GPU ID(s) you want to use in your application
  - Useful if you have multiple GPUs and want to dedicate a specific one to EQCCTPro
- **`station_count (int)`**
  - The number of stations you want to process in your application

## Configuration
The `environment.yml` file specifies the dependencies required to run EQCCTPro. Ensure you have the correct versions installed by using the provided conda environment setup.
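To confirm that an environment matches the exact versions pinned in the package metadata (the `Requires-Dist` lines at the top of this file), you can compare them with the standard-library `importlib.metadata.version`. The sketch below checks only exact `==` pins, compares version strings literally, and skips ranges such as `tensorflow<2.19,>=2.15`; `check_pins` is a hypothetical helper and the pin list is a small excerpt.

```python
from importlib.metadata import PackageNotFoundError, version

# Excerpt of exact pins from the eqcctpro 0.4.1 metadata (Requires-Dist)
PINS = {
    "numpy": "1.26.4",
    "pandas": "2.2.3",
    "obspy": "1.4.1",
    "ray": "2.42.1",
    "scipy": "1.15.1",
}


def check_pins(pins, get_version=version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (want, have) in check_pins(PINS).items():
        print(f"{name}: expected {want}, found {have or 'not installed'}")
```

An empty result means every listed pin matches; any entry points to a package that was installed outside the provided conda environment or at a different version.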

## License
EQCCTPro is provided under an open-source license. See LICENSE for details.

## Contact
For inquiries or issues, please contact constantinos.skevofilax@austin.utexas.edu.

@@ -0,0 +1,311 @@
1
+ # EQCCTPro: powerful seismic event detection toolkit
2
+
3
+ EQCCTPro is a high-performace seismic event detection and processing framework that leverages EQCCT to process seismic data efficiently. It enables users to fully leverage the computational ability of their computing resources for maximum performance for simultaneous seismic waveform processing, achieving real-time performance by identifying and utilizing the optimal computational configurations for their hardware. More information about the development, capabilities, and real-world applications about EQCCTPro can be read about in our research publication here.
4
+
5
+ ## Features
6
+ - Supports both CPU and GPU execution
7
+ - Configurable parallelism execution for optimized performance
8
+ - Includes tools for evaluating system performance for optimal usecase configurations
9
+ - Automatic selection of best-usecase configurations
10
+ - Efficient handling of large-scale seismic data
11
+
12
+ ## Installation
13
+ To install the necessary dependencies, create a conda environment using:
14
+
15
+ ```sh
16
+ [skevofilaxc] conda env create -f environment.yml
17
+ [skevofilaxc] conda activate eqcctpro
18
+ ```
19
+
20
+ You can get the `environment.yml` file from the `eqcct` repository. You can download the entire eqcct repository or download only the `eqcctpro` repository using the following command:
21
+
22
+ ```sh
23
+ [skevofilaxc] mkdir my_work_directory
24
+ [skevofilaxc] cd my_work_directory
25
+ [skevofilaxc] git clone --depth 1 --filter=tree:0 https://github.com/ut-beg-texnet/eqcct.git --sparse
26
+ [skevofilaxc] cd eqcct
27
+ [skevofilaxc] git sparse-checkout set eqcctpro
28
+ ```
29
+ After creating and activating the conda environment, install the **eqcctpro Python package** using the following command:
30
+ ```sh
31
+ [skevofilaxc] pip install eqcctpro
32
+ ```
33
+ More information on the package can be found at our PyPi project link [eqcctpro](https://pypi.org/project/eqcctpro/).
34
+
35
+ ## Creating a Test Workspace Environment
36
+ It's highly suggested to create a workspace environment to first understand how eqcctpro works.
37
+ Sample seismic waveform data from 50 TexNet stations have provided in the eqcctpro repository under the `sample_1_minute_data.zip` file.
38
+
39
+ After downloading the .zip file, either individually or through the git pull methods, run the following command to unzip it:
40
+ ```sh
41
+ [skevofilaxc] unzip sample_1_minute_data.zip
42
+ ```
43
+ It's contents will look like:
44
+ ```sh
45
+ [skevofilaxc sample_1_minute_data]$ ls
46
+ AT01 CF01 DG05 EF54 EF76 HBVL MB09 MB21 MID02 ODSA PB16 PB25 PB35 PB52 PH02 SM03 WB11
47
+ BB01 CT02 DG09 EF63 FOAK4 HNDO MB13 MB25 MID03 PB04 PB17 PB26 PB39 PB54 PL01 SMWD WB12
48
+ BP01 DB02 EF02 EF75 FW13 MB06 MB19 MID01 MO01 PB11 PB18 PB34 PB42 PECS SM02 WB06
49
+ ```
50
+ Where each subdirectory is named after station code, and is made up of mSEED files of different poses:
51
+ ```sh
52
+ [skevofilaxc PB35]$ ls
53
+ TX.PB35.00.HH1__20241215T115800Z__20241215T120100Z.mseed TX.PB35.00.HHZ__20241215T115800Z__20241215T120100Z.mseed
54
+ TX.PB35.00.HH2__20241215T115800Z__20241215T120100Z.mseed
55
+ ```
56
+ EQCCT only needs one pose for the detection to occur, however more poses allow for better detection of the direction of the P and S waves.
57
+
58
+ You are now set up for testing.
59
+ ## Usage
60
+ There are three main capabilities of EQCCTPro:
61
+ 1. Process mSEED data from singular or multiple seismic stations using either CPUs or GPUs
62
+ 2. Evaluate your system to identify the optimal parallelization configurations needed to get the minimum runtime performance out of your system
63
+ 3. Identify and return back the optimal parallelization configurations for both specific and general-use usecases for both CPU (a) and GPU applications (b)
64
+
65
+ These capabilities are achieved by the following functions in order respect to the above descriptions:
66
+ **EQCCTMSeedRunner (1)**, **EvaluateSystem (2)**, **OptimalCPUConfigurationFinder (3a)**, **OptimalGPUConfigurationFinder (3b)**.
67
+
68
+ ### Processing mSEED data using EQCCTPro (EQCCTMSeedRunner)
69
+ To use EQCCTPro to process mSEED from various seismic stations, use the **EQCCTMSeedRunner** class.
70
+ **EQCCTMSeedRunner** enables users to process multiple mSEED from a given input directory. The input directory is made up of station directories such as:
71
+
72
+ ```sh
73
+ [skevofilaxc sample_1_minute_data]$ ls
74
+ AT01 CF01 DG05 EF54 EF76 HBVL MB09 MB21 MID02 ODSA PB16 PB25 PB35 PB52 PH02 SM03 WB11
75
+ BB01 CT02 DG09 EF63 FOAK4 HNDO MB13 MB25 MID03 PB04 PB17 PB26 PB39 PB54 PL01 SMWD WB12
76
+ BP01 DB02 EF02 EF75 FW13 MB06 MB19 MID01 MO01 PB11 PB18 PB34 PB42 PECS SM02 WB06
77
+ ```
78
+ Where each subdirectory is named after station code. If you wish to use create your own input directory with custom information, **please follow the above naming convention.** Otherwise, EQCCTPro will **not** work.
79
+
80
+ Within each subdirectory, such as PB35, it is made up of mSEED files of different poses (EX. N, E, Z):
81
+ ```sh
82
+ [skevofilaxc PB35]$ ls
83
+ TX.PB35.00.HH1__20241215T115800Z__20241215T120100Z.mseed TX.PB35.00.HHZ__20241215T115800Z__20241215T120100Z.mseed
84
+ TX.PB35.00.HH2__20241215T115800Z__20241215T120100Z.mseed
85
+ ```
86
+ EQCCT only needs one pose for the detection to occur, however more poses allow for better detection of the direction of the P and S waves.
87
+
88
+ After setting up or utilizing the provided sample waveform directory, and install eqcctpro, import **EQCCTMseedRunner** as show below:
89
+
90
+ ```python
91
+ from eqcctpro import EQCCTMSeedRunner
92
+
93
+ eqcct_runner = EQCCTMSeedRunner(
94
+ use_gpu=False,
95
+ intra_threads=1,
96
+ inter_threads=1,
97
+ cpu_id_list=[0,1,2,3,4],
98
+ input_dir='/path/to/mseed',
99
+ output_dir='/path/to/outputs',
100
+ log_filepath='/path/to/outputs/eqcctpro.log',
101
+ P_threshold=0.001,
102
+ S_threshold=0.02,
103
+ p_model_filepath='/path/to/model_p.h5',
104
+ s_model_filepath='/path/to/model_s.h5',
105
+ number_of_concurrent_predictions=5,
106
+ best_usecase_config=True,
107
+ csv_dir='/path/to/csv',
108
+ selected_gpus=[0],
109
+ set_vram_mb=24750,
110
+ specific_stations='AT01, BP01, DG05'
111
+ )
112
+ eqcct_runner.run_eqcctpro()
113
+ ```
114
+
115
+ **EQCCTMseedRunner** has multiple input paramters that need to be configured and are defined below:
116
+
117
+ - **`use_gpu (bool)`: True or False**
118
+ - Tells Ray to use either the GPU(s) (True) or CPUs (False) on your computer to process the waveforms in the entire workflow
119
+ - Further specification of which GPU(s) and CPU(s) are provided in the parameters below
120
+ - **`intra_threads (int)`: default = 1**
121
+ - Controls how many intra-parallelism threads Tensorflow can use
122
+ - **`inter_threads (int)`: default = 1**
123
+ - Controls how many inter-parallelism threads Tensorflow can use
124
+ - **`cpu_id_list (list)`: default = [1]**
125
+ - List that defines which specific CPU cores that sched_setaffinity will allocate for executing the current EQCCTPro process.
126
+ - Allows for specific allocation and limitation of CPUs for a given EQCCTPro process
127
+ - "I want this program to run only on these specific cores."
128
+ - **`input_dir (str)`**
129
+ - Directory path to the the mSEED directory
130
+ - EX. `/home/skevofilaxc/my_work_directory/eqcct/eqcctpro/sample_1_minute_data`
131
+ - **`output_dir (str)`**
132
+ - Directory path to where the output picks and logs will be sent
133
+ - Doesn't need to exist, will be created if doesn't exist
134
+ - Recommended to be in the same working directory as the input directory for convience
135
+ - **`log_filepath (str)`**
136
+ - Filepath to where the EQCCTPro log will be written to and stored
137
+ - Doesn't need to exist, will be created if doesn't exist
138
+ - Recommended to be **in** the **output directory** and called **eqcctpro.log**, however the name can be changed for your own purposes
139
+ - **`P_threshold (float)`: default = 0.001**
140
+ - Threshold in which the P probabilities above it will be considered as P arrival
141
+ - **`S_threshold (float)`: default = 0.02**
142
+ - Threshold in which the S probabilities above it will be considered as S arrival
143
+ - **`p_model_filepath (str)`**
144
+ - Filepath to where the P EQCCT detection model is stored
145
+ - **`s_model_filepath (str)`**
146
+ - Filepath to where the S EQCCT detection model is stored
147
+ - **`number_of_concurrent_predictions (int)`**
148
+ - The number of concurrent EQCCT detection tasks that can happen simultaneously on a given number of resources
149
+ - EX. if number_of_concurrent_predictions = 5, there will be up to 5 EQCCT instances analyzing 5 different waveforms at the sametime
150
+ - Best to use the optimal amount for your hardware, which can be identified using **EvaluateSystem** (below)
151
+ - **`best_usecase_config (bool)`: default = False**
152
+ - If True, will override inputted cpu_id_list, number_of_concurrent_predictions, intra_threads, inter_threads values for the best overall usecase configurations
153
+ - Best overall usecase configurations are defined as the best overall input configurations that minimize runtime while doing the most amount of processing with your available hardware
154
+ - Can only be used if EvaluateSystem has been run
155
+ - **`csv_dir (str)`**
156
+ - Directory path containing the CSV's outputted by EvaluateSystem that contain the trial data that will be used to find the best_usecase_config
157
+ - Script will look for specific files, will only exist if EvaluateSystem has been run
158
+ - **`selected_gpus (list)`: default = None**
159
+ - List of GPU IDs on your computer you want to use if `use_gpu = True`
160
+ - None existing GPU IDs will cause the code to exit
161
+ - **`set_vram_mb (float)`**
162
+ - Value of the maximum amount of VRAM EQCCTPro can use
163
+ - Must be a real value that is based on your hardware's physical memory space, if it exceeds the space the code will break due to **OutOfMemoryError**
164
+ - **`specific_stations (str)`: default = None**
165
+ - String that contains the "list" of stations you want to only analyze
166
+ - EX. Out of the 50 sample stations in `sample_1_minute_data`, if I only want to analyze AT01, BP01, DG05, then specific_stations='AT01, BP01, DG05'.
167
+ - Removes the need to move station directories around to be used as input, can contain all stations in one directory for access
168
+ - **`cpu_id_list (list)`: default = [1]**
169
+ - List that defines which specific CPU cores that sched_setaffinity will allocate for executing the current EQCCTPro process.
170
+ - Allows for specific allocation and limitation of CPUs for a given EQCCTPro process
171
+ - "I want this program to run only on these specific cores."
172
+ ### Evaluating Your Systems Runtime Performance Capabilites
173
+ To evaluate your system’s runtime performance capabilites for both your CPU(s) and GPU(s), the **EvaluateSystem** class allows you to autonomously evaluate your system:
174
+
175
+ ```python
176
+ from eqcctpro import EvaluateSystem
177
+
178
+ eval_gpu = EvaluateSystem(
179
+ mode='gpu',
180
+ intra_threads=1,
181
+ inter_threads=1,
182
+ input_dir='/path/to/mseed',
183
+ output_dir='/path/to/outputs',
184
+ log_filepath='/path/to/outputs/eqcctpro.log',
185
+ csv_dir='/path/to/csv',
186
+ P_threshold=0.001,
187
+ S_threshold=0.02,
188
+ p_model_filepath='/path/to/model_p.h5',
189
+ s_model_filepath='/path/to/model_s.h5',
190
+ stations2use=2,
191
+ cpu_id_list=[0,1],
192
+ set_vram_mb=24750,
193
+ selected_gpus=[0]
194
+ )
195
+ eval_gpu.evaluate()
196
+ ```
197
+ **EvaluateSystem** iterates through different combinations of CPU(s), concurrent predictions, and workloads (stations), as well as GPU(s) and the amount of VRAM (MB) each concurrent prediction can use.
198
+ **EvaluateSystem** can take a while, depending on the number of CPUs/GPUs, the amount of VRAM available, and the total workload to be tested. However, a single run typically covers most, if not all, use cases;
199
+ afterwards, the trial data can be used to identify the optimal parallelization configuration for **EQCCTMSeedRunner**, getting the most processing out of your system in the shortest amount of time.
200
+
201
+ Configure the following input parameters so that **EvaluateSystem** evaluates your system according to how you intend to use EQCCTPro:
202
+
203
+ - **`mode (str)`**
204
+ - Can be either `cpu` or `gpu`
205
+ - Tells `EvaluateSystem` which set of configuration trials to iterate through
206
+ - **`intra_threads (int)`: default = 1**
207
+ - Controls how many intra-op parallelism threads TensorFlow can use
208
+ - **`inter_threads (int)`: default = 1**
209
+ - Controls how many inter-op parallelism threads TensorFlow can use
210
+ - **`input_dir (str)`**
211
+ - Path to the mSEED input directory
212
+ - EX. `/home/skevofilaxc/my_work_directory/eqcct/eqcctpro/sample_1_minute_data`
213
+ - **`output_dir (str)`**
214
+ - Directory where the output picks and logs will be written
215
+ - Created automatically if it does not exist
216
+ - Recommended to be in the same working directory as the input directory for convenience
217
+ - **`log_filepath (str)`**
218
+ - File path where the EQCCTPro log will be written
219
+ - Created automatically if it does not exist
220
+ - Recommended to be placed **in** the **output directory** and named **eqcctpro.log**, although any name works
221
+ - **`csv_dir (str)`**
222
+ - Directory where the CSVs produced by EvaluateSystem will be saved
223
+ - Created automatically if it does not exist
224
+ - **`P_threshold (float)`: default = 0.001**
225
+ - P probabilities above this threshold are considered P arrivals
226
+ - **`S_threshold (float)`: default = 0.02**
227
+ - S probabilities above this threshold are considered S arrivals
228
+ - **`p_model_filepath (str)`**
229
+ - Path to the EQCCT P-wave detection model
230
+ - **`s_model_filepath (str)`**
231
+ - Path to the EQCCT S-wave detection model
232
+ - **`stations2use (int)`: default = None**
233
+ - Maximum number of stations EvaluateSystem can use in its trial iterations
234
+ - The provided sample data contains 50 stations, so the maximum is 50; if you use custom data, set it for your specific use case
235
+ - **`cpu_id_list (list)`: default = [1]**
236
+ - List of the specific CPU cores to which `sched_setaffinity` pins the current EQCCTPro process; it also **sets the maximum number of cores EvaluateSystem can use in its trial iterations**
237
+ - Restricts a given EQCCTPro process to a specific set of CPUs
238
+ - "I want this program to run only on these specific cores."
239
+ - Must contain at least 1 CPU even when using GPUs: Ray needs CPUs to manage the Raylets (concurrent tasks), while the waveform processing itself is done on the GPU
240
+ - **`set_vram_mb (float)`**
241
+ - Maximum amount of VRAM (in MB) EQCCTPro can use
242
+ - Must not exceed your GPU's physical memory; if it does, the code will fail with an OutOfMemoryError
243
+ - **`selected_gpus (list)`: default = None**
244
+ - List of GPU IDs on your computer you want to use if `mode = 'gpu'`
245
+ - Non-existing GPU IDs will cause the code to exit
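`P_threshold` and `S_threshold` act as cut-offs on the per-sample probabilities produced by the P and S models: only values above the threshold count toward an arrival. The sketch below illustrates that idea on a plain list of probabilities, returning the peak sample of each run of above-threshold values. The peak-per-run rule and the name `pick_arrivals` are assumptions for illustration, not EQCCTPro's picking logic.

```python
from typing import List, Optional, Sequence, Tuple


def pick_arrivals(probabilities: Sequence[float], threshold: float) -> List[Tuple[int, float]]:
    """Return (sample_index, probability) for the peak of each run above threshold.

    Hypothetical illustration only; not EQCCTPro's implementation.
    """
    picks: List[Tuple[int, float]] = []
    best_idx: Optional[int] = None  # peak index within the current above-threshold run
    for i, p in enumerate(probabilities):
        if p > threshold:
            if best_idx is None or p > probabilities[best_idx]:
                best_idx = i
        elif best_idx is not None:
            # The run just ended: record its peak and reset.
            picks.append((best_idx, probabilities[best_idx]))
            best_idx = None
    if best_idx is not None:  # a run that lasts until the end of the trace
        picks.append((best_idx, probabilities[best_idx]))
    return picks
```

With the default `S_threshold=0.02`, `pick_arrivals([0.0, 0.01, 0.3, 0.05, 0.0], 0.02)` yields a single pick at sample 2. Lowering the threshold admits weaker, noisier peaks; raising it keeps only confident ones.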
246
+
247
+ ### Finding Optimal CPU/GPU Configurations
248
+ After running **EvaluateSystem**, use **OptimalCPUConfigurationFinder** or **OptimalGPUConfigurationFinder** to determine the best CPU or GPU configuration (respectively) for your specific use case:
249
+
250
+ ```python
251
+ from eqcctpro import OptimalCPUConfigurationFinder, OptimalGPUConfigurationFinder
252
+
253
+ csv_filepath = '/path/to/csv'
254
+
255
+ cpu_finder = OptimalCPUConfigurationFinder(csv_filepath)
256
+ best_cpu_config = cpu_finder.find_best_overall_usecase()
257
+ print(best_cpu_config)
258
+
259
+ optimal_cpu_config = cpu_finder.find_optimal_for(cpu=3, station_count=2)
260
+ print(optimal_cpu_config)
261
+
262
+ gpu_finder = OptimalGPUConfigurationFinder(csv_filepath)
263
+ best_gpu_config = gpu_finder.find_best_overall_usecase()
264
+ print(best_gpu_config)
265
+
266
+ optimal_gpu_config = gpu_finder.find_optimal_for(num_cpus=1, gpu_list=[0], station_count=1)
267
+ print(optimal_gpu_config)
268
+ ```
269
+ **OptimalCPUConfigurationFinder** and **OptimalGPUConfigurationFinder** each provide two methods:
270
+
271
+ 1. **`find_best_overall_usecase`**
272
+ - Returns the best overall use-case configuration
273
+ - Uses the middle 50% of CPUs for moderate, balanced CPU usage, and selects the configuration that processes the most stations in the minimum runtime
274
+ 2. **`find_optimal_for`**
275
+ - Returns the parallelization configuration (EX. concurrent predictions, intra/inter thread counts, VRAM, etc.) for a given number of CPU(s)/GPU(s) and stations
276
+ - Lets you quickly identify which input parameters give the minimum runtime on your computer for the resources and workload you have
277
+
278
+ Both classes take the path of the CSV directory as their reference point:
279
+ - **`csv_filepath (str)`**
280
+ - Directory containing the CSVs produced by EvaluateSystem
281
+
282
+ **OptimalCPUConfigurationFinder.find_best_overall_usecase()** takes no input parameters and returns the best use-case parameters.
283
+
284
+ For **OptimalCPUConfigurationFinder.find_optimal_for()**, the function requires two input parameters:
285
+ - **`cpu (int)`**
286
+ - The number of CPU(s) you want to use in your application
287
+ - **`station_count (int)`**
288
+ - The number of station(s) you want to use in your application
289
+
290
+ **OptimalCPUConfigurationFinder.find_optimal_for()** returns the trial data point with the minimum runtime for your input parameters.
291
+
292
+ Like **OptimalCPUConfigurationFinder.find_best_overall_usecase()**, **OptimalGPUConfigurationFinder.find_best_overall_usecase()** takes no input parameters and returns the best use-case parameters.
293
+
294
+ For **OptimalGPUConfigurationFinder.find_optimal_for()**, the function requires three input parameters:
295
+ - **`num_cpus (int)`**
296
+ - The number of CPU(s) you want to use in your application
297
+ - **`gpu_list (list)`**
298
+ - The specific GPU ID(s) you want to use in your application
299
+ - Useful if you have multiple GPUs and want to dedicate a specific one to EQCCTPro
300
+ - **`station_count (int)`**
301
+ - The number of station(s) you want to use in your application
302
+
303
+ ## Configuration
304
+ The `environment.yml` file specifies the dependencies required to run EQCCTPro. Ensure you have the correct versions installed by using the provided conda environment setup.
305
+
306
+ ## License
307
+ EQCCTPro is provided under an open-source license. See LICENSE for details.
308
+
309
+ ## Contact
310
+ For inquiries or issues, please contact constantinos.skevofilax@austin.utexas.edu.
311
+
@@ -0,0 +1,340 @@
1
+ Metadata-Version: 2.2
2
+ Name: eqcctpro
3
+ Version: 0.4.1
4
+ Description-Content-Type: text/markdown
5
+ Requires-Dist: numpy==1.26.4
6
+ Requires-Dist: pandas==2.2.3
7
+ Requires-Dist: matplotlib==3.10.0
8
+ Requires-Dist: obspy==1.4.1
9
+ Requires-Dist: progress==1.6
10
+ Requires-Dist: psutil==6.1.1
11
+ Requires-Dist: ray==2.42.1
12
+ Requires-Dist: schedule==1.2.2
13
+ Requires-Dist: sdnotify==0.3.2
14
+ Requires-Dist: tensorflow<2.19,>=2.15
15
+ Requires-Dist: tensorflow-estimator<2.19,>=2.15
16
+ Requires-Dist: tensorflow-io-gcs-filesystem==0.37.1
17
+ Requires-Dist: tensorboard==2.15.2
18
+ Requires-Dist: tensorboard-data-server==0.7.2
19
+ Requires-Dist: silence-tensorflow==1.2.3
20
+ Requires-Dist: scipy==1.15.1
21
+ Requires-Dist: protobuf==4.25.6
22
+ Requires-Dist: grpcio==1.70.0
23
+ Requires-Dist: absl-py==2.1.0
24
+ Requires-Dist: h5py==3.12.1
25
+ Requires-Dist: pynvml==12.0.0
26
+ Dynamic: description
27
+ Dynamic: description-content-type
28
+ Dynamic: requires-dist
29
+
30
+ # EQCCTPro: powerful seismic event detection toolkit
31
+
32
+ EQCCTPro is a high-performance seismic event detection and processing framework that uses EQCCT to process seismic data efficiently. It lets users fully exploit the computational resources of their hardware for simultaneous seismic waveform processing, achieving real-time performance by identifying and using the optimal computational configuration for that hardware. More information about the development, capabilities, and real-world applications of EQCCTPro can be found in our research publication here.
33
+
34
+ ## Features
35
+ - Supports both CPU and GPU execution
36
+ - Configurable parallel execution for optimized performance
37
+ - Includes tools for evaluating system performance to find optimal use-case configurations
38
+ - Automatic selection of best use-case configurations
39
+ - Efficient handling of large-scale seismic data
40
+
41
+ ## Installation
42
+ To install the necessary dependencies, create a conda environment using:
43
+
44
+ ```sh
45
+ [skevofilaxc] conda env create -f environment.yml
46
+ [skevofilaxc] conda activate eqcctpro
47
+ ```
48
+
49
+ You can get the `environment.yml` file from the `eqcct` repository. You can download either the entire eqcct repository or only the `eqcctpro` subdirectory, using the following commands:
50
+
51
+ ```sh
52
+ [skevofilaxc] mkdir my_work_directory
53
+ [skevofilaxc] cd my_work_directory
54
+ [skevofilaxc] git clone --depth 1 --filter=tree:0 https://github.com/ut-beg-texnet/eqcct.git --sparse
55
+ [skevofilaxc] cd eqcct
56
+ [skevofilaxc] git sparse-checkout set eqcctpro
57
+ ```
58
+ After creating and activating the conda environment, install the **eqcctpro Python package** using the following command:
59
+ ```sh
60
+ [skevofilaxc] pip install eqcctpro
61
+ ```
62
+ More information on the package can be found at our PyPI project page [eqcctpro](https://pypi.org/project/eqcctpro/).
63
+
64
+ ## Creating a Test Workspace Environment
65
+ We highly recommend creating a test workspace first to understand how EQCCTPro works.
66
+ Sample seismic waveform data from 50 TexNet stations has been provided in the eqcctpro repository in the `sample_1_minute_data.zip` file.
67
+
68
+ After downloading the .zip file, either individually or through the git commands above, run the following command to unzip it:
69
+ ```sh
70
+ [skevofilaxc] unzip sample_1_minute_data.zip
71
+ ```
72
+ Its contents will look like:
73
+ ```sh
74
+ [skevofilaxc sample_1_minute_data]$ ls
75
+ AT01 CF01 DG05 EF54 EF76 HBVL MB09 MB21 MID02 ODSA PB16 PB25 PB35 PB52 PH02 SM03 WB11
76
+ BB01 CT02 DG09 EF63 FOAK4 HNDO MB13 MB25 MID03 PB04 PB17 PB26 PB39 PB54 PL01 SMWD WB12
77
+ BP01 DB02 EF02 EF75 FW13 MB06 MB19 MID01 MO01 PB11 PB18 PB34 PB42 PECS SM02 WB06
78
+ ```
79
+ Each subdirectory is named after a station code and contains mSEED files for different poses (components):
80
+ ```sh
81
+ [skevofilaxc PB35]$ ls
82
+ TX.PB35.00.HH1__20241215T115800Z__20241215T120100Z.mseed TX.PB35.00.HHZ__20241215T115800Z__20241215T120100Z.mseed
83
+ TX.PB35.00.HH2__20241215T115800Z__20241215T120100Z.mseed
84
+ ```
85
+ EQCCT needs only one pose for detection to occur; however, more poses allow better detection of the direction of the P and S waves.
86
+
87
+ You are now set up for testing.
88
+ ## Usage
89
+ EQCCTPro has three main capabilities:
90
+ 1. Process mSEED data from one or more seismic stations using either CPUs or GPUs
91
+ 2. Evaluate your system to identify the parallelization configurations that achieve the minimum runtime on your hardware
92
+ 3. Identify and return the optimal parallelization configurations for both specific and general use cases, for CPU (a) and GPU (b) applications
93
+
94
+ These capabilities are provided by the following classes, respectively:
95
+ **EQCCTMSeedRunner (1)**, **EvaluateSystem (2)**, **OptimalCPUConfigurationFinder (3a)**, **OptimalGPUConfigurationFinder (3b)**.
96
+
97
+ ### Processing mSEED data using EQCCTPro (EQCCTMSeedRunner)
98
+ To process mSEED data from multiple seismic stations, use the **EQCCTMSeedRunner** class.
99
+ **EQCCTMSeedRunner** processes multiple mSEED files from a given input directory, which is made up of station directories such as:
100
+
101
+ ```sh
102
+ [skevofilaxc sample_1_minute_data]$ ls
103
+ AT01 CF01 DG05 EF54 EF76 HBVL MB09 MB21 MID02 ODSA PB16 PB25 PB35 PB52 PH02 SM03 WB11
104
+ BB01 CT02 DG09 EF63 FOAK4 HNDO MB13 MB25 MID03 PB04 PB17 PB26 PB39 PB54 PL01 SMWD WB12
105
+ BP01 DB02 EF02 EF75 FW13 MB06 MB19 MID01 MO01 PB11 PB18 PB34 PB42 PECS SM02 WB06
106
+ ```
107
+ Each subdirectory is named after its station code. If you create your own input directory with custom data, **please follow the above naming convention**; otherwise, EQCCTPro will **not** work.
108
+
109
+ Each subdirectory, such as PB35, contains mSEED files for different poses (EX. N, E, Z):
110
+ ```sh
111
+ [skevofilaxc PB35]$ ls
112
+ TX.PB35.00.HH1__20241215T115800Z__20241215T120100Z.mseed TX.PB35.00.HHZ__20241215T115800Z__20241215T120100Z.mseed
113
+ TX.PB35.00.HH2__20241215T115800Z__20241215T120100Z.mseed
114
+ ```
115
+ EQCCT needs only one pose for detection to occur; however, more poses allow better detection of the direction of the P and S waves.
116
+
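The sample file names above follow the pattern `NETWORK.STATION.LOCATION.CHANNEL__START__END.mseed`. If you assemble a custom input directory, a quick check like the one below can confirm your files follow the same layout before you run EQCCTPro. The pattern is inferred from the sample file names shown here; whether EQCCTPro requires exactly this file naming is an assumption, and `parse_mseed_name` is a hypothetical helper.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple, Optional

# Pattern inferred from the sample files, e.g.
# TX.PB35.00.HH1__20241215T115800Z__20241215T120100Z.mseed
MSEED_NAME = re.compile(
    r"^(?P<network>[^.]+)\.(?P<station>[^.]+)\.(?P<location>[^.]*)\.(?P<channel>[^._]+)"
    r"__(?P<start>\d{8}T\d{6}Z)__(?P<end>\d{8}T\d{6}Z)\.mseed$"
)


class MSeedName(NamedTuple):
    network: str
    station: str
    location: str
    channel: str
    start: datetime
    end: datetime


def _utc(stamp: str) -> datetime:
    """Parse a compact timestamp such as 20241215T115800Z as UTC."""
    return datetime.strptime(stamp, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)


def parse_mseed_name(filename: str) -> Optional[MSeedName]:
    """Split a sample-style mSEED file name into its parts; return None if it does not match."""
    m = MSEED_NAME.match(filename)
    if m is None:
        return None
    return MSeedName(m["network"], m["station"], m["location"], m["channel"],
                     _utc(m["start"]), _utc(m["end"]))
```

For instance, for each file `p` inside a station directory, `parse_mseed_name(p.name).station == p.parent.name` checks that the station code in the file name matches the directory it sits in.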
117
+ After setting up your own waveform directory (or using the provided sample data) and installing eqcctpro, import **EQCCTMSeedRunner** as shown below:
118
+
119
+ ```python
120
+ from eqcctpro import EQCCTMSeedRunner
121
+
122
+ eqcct_runner = EQCCTMSeedRunner(
123
+ use_gpu=False,
124
+ intra_threads=1,
125
+ inter_threads=1,
126
+ cpu_id_list=[0,1,2,3,4],
127
+ input_dir='/path/to/mseed',
128
+ output_dir='/path/to/outputs',
129
+ log_filepath='/path/to/outputs/eqcctpro.log',
130
+ P_threshold=0.001,
131
+ S_threshold=0.02,
132
+ p_model_filepath='/path/to/model_p.h5',
133
+ s_model_filepath='/path/to/model_s.h5',
134
+ number_of_concurrent_predictions=5,
135
+ best_usecase_config=True,
136
+ csv_dir='/path/to/csv',
137
+ selected_gpus=[0],
138
+ set_vram_mb=24750,
139
+ specific_stations='AT01, BP01, DG05'
140
+ )
141
+ eqcct_runner.run_eqcctpro()
142
+ ```
143
+
144
+ **EQCCTMSeedRunner** has multiple input parameters that need to be configured; they are defined below:
145
+
146
+ - **`use_gpu (bool)`: True or False**
147
+ - Tells Ray to use either the GPU(s) (True) or the CPUs (False) on your computer to process the waveforms throughout the workflow
148
+ - Which GPU(s) and CPU(s) are used is specified by the parameters below
149
+ - **`intra_threads (int)`: default = 1**
150
+ - Controls how many intra-op parallelism threads TensorFlow can use
151
+ - **`inter_threads (int)`: default = 1**
152
+ - Controls how many inter-op parallelism threads TensorFlow can use
153
+ - **`cpu_id_list (list)`: default = [1]**
154
+ - List of the specific CPU cores to which `sched_setaffinity` pins the current EQCCTPro process
155
+ - Restricts a given EQCCTPro process to a specific set of CPUs
156
+ - "I want this program to run only on these specific cores."
157
+ - **`input_dir (str)`**
158
+ - Path to the mSEED input directory
159
+ - EX. `/home/skevofilaxc/my_work_directory/eqcct/eqcctpro/sample_1_minute_data`
160
+ - **`output_dir (str)`**
161
+ - Directory where the output picks and logs will be written
162
+ - Created automatically if it does not exist
163
+ - Recommended to be in the same working directory as the input directory for convenience
164
+ - **`log_filepath (str)`**
165
+ - File path where the EQCCTPro log will be written
166
+ - Created automatically if it does not exist
167
+ - Recommended to be placed **in** the **output directory** and named **eqcctpro.log**, although any name works
168
+ - **`P_threshold (float)`: default = 0.001**
169
+ - P probabilities above this threshold are considered P arrivals
170
+ - **`S_threshold (float)`: default = 0.02**
171
+ - S probabilities above this threshold are considered S arrivals
172
+ - **`p_model_filepath (str)`**
173
+ - Path to the EQCCT P-wave detection model
174
+ - **`s_model_filepath (str)`**
175
+ - Path to the EQCCT S-wave detection model
176
+ - **`number_of_concurrent_predictions (int)`**
177
+ - Number of EQCCT detection tasks that can run simultaneously on the given resources
178
+ - EX. If `number_of_concurrent_predictions = 5`, up to 5 EQCCT instances will analyze 5 different waveforms at the same time
179
+ - Best to use the optimal amount for your hardware, which can be identified using **EvaluateSystem** (below)
180
+ - **`best_usecase_config (bool)`: default = False**
181
+ - If True, overrides the supplied `cpu_id_list`, `number_of_concurrent_predictions`, `intra_threads`, and `inter_threads` values with the best overall use-case configuration
182
+ - The best overall use-case configuration is the input configuration that minimizes runtime while processing the largest workload on your available hardware
183
+ - Can only be used after **EvaluateSystem** has been run
184
+ - **`csv_dir (str)`**
185
+ - Directory containing the CSVs written by **EvaluateSystem**; their trial data is used to find the `best_usecase_config`
186
+ - EQCCTPro looks for specific files that exist only after **EvaluateSystem** has been run
187
+ - **`selected_gpus (list)`: default = None**
188
+ - List of GPU IDs on your computer you want to use if `use_gpu = True`
189
+ - Non-existent GPU IDs will cause the code to exit
190
+ - **`set_vram_mb (float)`**
191
+ - Maximum amount of VRAM (in MB) EQCCTPro can use
192
+ - Must not exceed your GPU's physical memory; if it does, the code will fail with an **OutOfMemoryError**
193
+ - **`specific_stations (str)`: default = None**
194
+ - Comma-separated string listing the only stations you want to analyze
195
+ - EX. To analyze only AT01, BP01, and DG05 out of the 50 sample stations in `sample_1_minute_data`, set `specific_stations='AT01, BP01, DG05'`
196
+ - Lets you keep all stations in one input directory instead of moving station directories around
197
+ - **`cpu_id_list (list)`: default = [1]**
198
+ - List of the specific CPU cores to which `sched_setaffinity` pins the current EQCCTPro process
199
+ - Restricts a given EQCCTPro process to a specific set of CPUs
200
+ - "I want this program to run only on these specific cores."
201
+ ### Evaluating Your System's Runtime Performance Capabilities
202
+ To evaluate the runtime performance capabilities of both your CPU(s) and GPU(s), use the **EvaluateSystem** class, which benchmarks your system automatically:
203
+
204
+ ```python
205
+ from eqcctpro import EvaluateSystem
206
+
207
+ eval_gpu = EvaluateSystem(
208
+ mode='gpu',
209
+ intra_threads=1,
210
+ inter_threads=1,
211
+ input_dir='/path/to/mseed',
212
+ output_dir='/path/to/outputs',
213
+ log_filepath='/path/to/outputs/eqcctpro.log',
214
+ csv_dir='/path/to/csv',
215
+ P_threshold=0.001,
216
+ S_threshold=0.02,
217
+ p_model_filepath='/path/to/model_p.h5',
218
+ s_model_filepath='/path/to/model_s.h5',
219
+ stations2use=2,
220
+ cpu_id_list=[0,1],
221
+ set_vram_mb=24750,
222
+ selected_gpus=[0]
223
+ )
224
+ eval_gpu.evaluate()
225
+ ```
226
+ **EvaluateSystem** iterates through different combinations of CPU(s), concurrent predictions, and workloads (stations), as well as GPU(s) and the amount of VRAM (MB) each concurrent prediction can use.
227
+ **EvaluateSystem** can take a while, depending on the number of CPUs/GPUs, the amount of VRAM available, and the total workload to be tested. However, a single run typically covers most, if not all, use cases;
228
+ afterwards, the trial data can be used to identify the optimal parallelization configuration for **EQCCTMSeedRunner**, getting the most processing out of your system in the shortest amount of time.
229
+
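The sweep described above is essentially a grid search over resources and workload. A minimal sketch of enumerating such a grid for a CPU-mode run with `itertools.product` follows. The ranges used here (1 to `len(cpu_id_list)` CPUs, 1 to `stations2use` stations, and at most one concurrent prediction per CPU) are assumptions; the actual values EvaluateSystem tests may differ, and `cpu_trial_grid` is a hypothetical helper.

```python
from itertools import product
from typing import Iterator, List, Tuple


def cpu_trial_grid(cpu_id_list: List[int], stations2use: int) -> Iterator[Tuple[int, int, int]]:
    """Yield hypothetical (num_cpus, concurrent_predictions, num_stations) trials.

    Assumes concurrent predictions never exceed the number of CPUs in use;
    the real EvaluateSystem grid may differ.
    """
    max_cpus = len(cpu_id_list)
    for num_cpus, num_stations in product(range(1, max_cpus + 1), range(1, stations2use + 1)):
        for concurrent in range(1, num_cpus + 1):
            yield num_cpus, concurrent, num_stations
```

Every trial is a full prediction pass, and the number of trials grows multiplicatively with each dimension, which is why EvaluateSystem can take a while on machines with many cores or stations.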
230
+ Configure the following input parameters so that **EvaluateSystem** evaluates your system according to how you intend to use EQCCTPro:
231
+
232
+ - **`mode (str)`**
233
+ - Can be either `cpu` or `gpu`
234
+ - Tells `EvaluateSystem` which set of configuration trials to iterate through
235
+ - **`intra_threads (int)`: default = 1**
236
+ - Controls how many intra-op parallelism threads TensorFlow can use
237
+ - **`inter_threads (int)`: default = 1**
238
+ - Controls how many inter-op parallelism threads TensorFlow can use
239
+ - **`input_dir (str)`**
240
+ - Path to the mSEED input directory
241
+ - EX. `/home/skevofilaxc/my_work_directory/eqcct/eqcctpro/sample_1_minute_data`
242
+ - **`output_dir (str)`**
243
+ - Directory where the output picks and logs will be written
244
+ - Created automatically if it does not exist
245
+ - Recommended to be in the same working directory as the input directory for convenience
246
+ - **`log_filepath (str)`**
247
+ - File path where the EQCCTPro log will be written
248
+ - Created automatically if it does not exist
249
+ - Recommended to be placed **in** the **output directory** and named **eqcctpro.log**, although any name works
250
+ - **`csv_dir (str)`**
251
+ - Directory where the CSVs produced by EvaluateSystem will be saved
252
+ - Created automatically if it does not exist
253
+ - **`P_threshold (float)`: default = 0.001**
254
+ - P probabilities above this threshold are considered P arrivals
255
+ - **`S_threshold (float)`: default = 0.02**
256
+ - S probabilities above this threshold are considered S arrivals
257
+ - **`p_model_filepath (str)`**
258
+ - Path to the EQCCT P-wave detection model
259
+ - **`s_model_filepath (str)`**
260
+ - Path to the EQCCT S-wave detection model
261
+ - **`stations2use (int)`: default = None**
262
+ - Maximum number of stations EvaluateSystem can use in its trial iterations
263
+ - The provided sample data contains 50 stations, so the maximum is 50; if you use custom data, set it for your specific use case
264
+ - **`cpu_id_list (list)`: default = [1]**
265
+ - List of the specific CPU cores to which `sched_setaffinity` pins the current EQCCTPro process; it also **sets the maximum number of cores EvaluateSystem can use in its trial iterations**
266
+ - Restricts a given EQCCTPro process to a specific set of CPUs
267
+ - "I want this program to run only on these specific cores."
268
+ - Must contain at least 1 CPU even when using GPUs: Ray needs CPUs to manage the Raylets (concurrent tasks), while the waveform processing itself is done on the GPU
269
+ - **`set_vram_mb (float)`**
270
+ - Maximum amount of VRAM (in MB) EQCCTPro can use
271
+ - Must not exceed your GPU's physical memory; if it does, the code will fail with an OutOfMemoryError
272
+ - **`selected_gpus (list)`: default = None**
273
+ - List of GPU IDs on your computer you want to use if `mode = 'gpu'`
274
+ - Non-existing GPU IDs will cause the code to exit
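`cpu_id_list` is applied with `sched_setaffinity`, which Python exposes as `os.sched_setaffinity` (Linux only). The sketch below pins the current process to the requested cores and returns the resulting affinity mask. `pin_to_cores` is a hypothetical helper, and its up-front checks (non-empty list, cores currently available to the process) are additions for illustration rather than documented EQCCTPro behavior.

```python
import os
from typing import Iterable, Set


def pin_to_cores(cpu_id_list: Iterable[int]) -> Set[int]:
    """Restrict the calling process (pid 0) to the given CPU cores. Linux only."""
    requested = set(cpu_id_list)
    if not requested:
        raise ValueError("cpu_id_list must contain at least one core")
    # Cores this process is currently allowed to run on.
    allowed = os.sched_getaffinity(0)
    missing = requested - allowed
    if missing:
        raise ValueError(f"cores not available to this process: {sorted(missing)}")
    os.sched_setaffinity(0, requested)
    return os.sched_getaffinity(0)
```

Calling `pin_to_cores([0, 1])` mirrors the `cpu_id_list=[0,1]` used in the EvaluateSystem example above: afterwards, the calling process is scheduled only on cores 0 and 1.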
275
+
276
+ ### Finding Optimal CPU/GPU Configurations
277
+ After running **EvaluateSystem**, use **OptimalCPUConfigurationFinder** or **OptimalGPUConfigurationFinder** to determine the best CPU or GPU configuration (respectively) for your specific use case:
278
+
279
+ ```python
280
+ from eqcctpro import OptimalCPUConfigurationFinder, OptimalGPUConfigurationFinder
281
+
282
+ csv_filepath = '/path/to/csv'
283
+
284
+ cpu_finder = OptimalCPUConfigurationFinder(csv_filepath)
285
+ best_cpu_config = cpu_finder.find_best_overall_usecase()
286
+ print(best_cpu_config)
287
+
288
+ optimal_cpu_config = cpu_finder.find_optimal_for(cpu=3, station_count=2)
289
+ print(optimal_cpu_config)
290
+
291
+ gpu_finder = OptimalGPUConfigurationFinder(csv_filepath)
292
+ best_gpu_config = gpu_finder.find_best_overall_usecase()
293
+ print(best_gpu_config)
294
+
295
+ optimal_gpu_config = gpu_finder.find_optimal_for(num_cpus=1, gpu_list=[0], station_count=1)
296
+ print(optimal_gpu_config)
297
+ ```
298
+ **OptimalCPUConfigurationFinder** and **OptimalGPUConfigurationFinder** each provide two methods:
299
+
300
+ 1. **`find_best_overall_usecase`**
301
+ - Returns the best overall use-case configuration
302
+ - Uses the middle 50% of CPUs for moderate, balanced CPU usage, and selects the configuration that processes the most stations in the minimum runtime
303
+ 2. **`find_optimal_for`**
304
+ - Returns the parallelization configuration (EX. concurrent predictions, intra/inter thread counts, VRAM, etc.) for a given number of CPU(s)/GPU(s) and stations
305
+ - Lets you quickly identify which input parameters give the minimum runtime on your computer for the resources and workload you have
306
+
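To make the selection rule above concrete, here is a sketch over in-memory trial records: keep trials whose CPU count lies in the middle 50% (interquartile range) of the tested CPU counts, then prefer the most stations processed and break ties by the shortest runtime. The field names (`num_cpus`, `num_stations`, `runtime_s`), the use of `statistics.quantiles`, and the helper name `best_overall` are assumptions; the actual EvaluateSystem CSV columns and the exact rule inside `find_best_overall_usecase` may differ.

```python
from statistics import quantiles
from typing import Dict, List


def best_overall(trials: List[Dict[str, float]]) -> Dict[str, float]:
    """Hypothetical 'best overall' pick: middle-50% CPU counts, most stations, then fastest."""
    cpu_counts = sorted({t["num_cpus"] for t in trials})
    pool = list(trials)
    if len(cpu_counts) >= 2:
        # Quartile cut points of the distinct CPU counts tested.
        q1, _, q3 = quantiles(cpu_counts, n=4, method="inclusive")
        middle = [t for t in trials if q1 <= t["num_cpus"] <= q3]
        pool = middle or pool  # fall back to all trials if the band is empty
    # Most stations first; among those, the shortest runtime.
    return min(pool, key=lambda t: (-t["num_stations"], t["runtime_s"]))
```

Because trials at the lowest and highest CPU counts are excluded, a faster run on the most cores can lose to a slower run in the middle band; this trades raw speed for leaving cores free, in line with the moderate, balanced CPU usage described above.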
307
+ Both classes take the path of the CSV directory as their reference point:
308
+ - **`csv_filepath (str)`**
309
+ - Directory containing the CSVs produced by EvaluateSystem
310
+
311
+ **OptimalCPUConfigurationFinder.find_best_overall_usecase()** takes no input parameters and returns the best use-case parameters.
312
+
313
+ For **OptimalCPUConfigurationFinder.find_optimal_for()**, the function requires two input parameters:
314
+ - **`cpu (int)`**
315
+ - The number of CPU(s) you want to use in your application
316
+ - **`station_count (int)`**
317
+ - The number of station(s) you want to use in your application
318
+
319
+ **OptimalCPUConfigurationFinder.find_optimal_for()** returns the trial data point with the minimum runtime for your input parameters.
320
+
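The CPU lookup described above amounts to filtering the trial data for the requested CPU and station counts and returning the fastest row. A minimal sketch with the standard `csv` module follows; it reads a single CSV file, and the column names `num_cpus`, `num_stations`, and `runtime_s`, as well as the helper name `optimal_for`, are assumptions rather than the actual EvaluateSystem schema or API.

```python
import csv
from typing import Dict, Optional


def optimal_for(csv_path: str, cpu: int, station_count: int) -> Optional[Dict[str, str]]:
    """Return the trial row with the minimum runtime for the given CPU and station counts."""
    with open(csv_path, newline="") as f:
        rows = [
            row for row in csv.DictReader(f)
            if int(row["num_cpus"]) == cpu and int(row["num_stations"]) == station_count
        ]
    if not rows:
        return None  # no trial was run for this combination
    return min(rows, key=lambda row: float(row["runtime_s"]))
```

Returning `None` when no matching trial exists makes it explicit that the lookup can only answer for combinations EvaluateSystem actually tested.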
321
+ Like **OptimalCPUConfigurationFinder.find_best_overall_usecase()**, **OptimalGPUConfigurationFinder.find_best_overall_usecase()** takes no input parameters and returns the best use-case parameters.
322
+
323
+ For **OptimalGPUConfigurationFinder.find_optimal_for()**, the function requires three input parameters:
324
+ - **`num_cpus (int)`**
325
+ - The number of CPU(s) you want to use in your application
326
+ - **`gpu_list (list)`**
327
+ - The specific GPU ID(s) you want to use in your application
328
+ - Useful if you have multiple GPUs and want to dedicate a specific one to EQCCTPro
329
+ - **`station_count (int)`**
330
+ - The number of station(s) you want to use in your application
331
+
332
+ ## Configuration
333
+ The `environment.yml` file specifies the dependencies required to run EQCCTPro. Ensure you have the correct versions installed by using the provided conda environment setup.
334
+
335
+ ## License
336
+ EQCCTPro is provided under an open-source license. See LICENSE for details.
337
+
338
+ ## Contact
339
+ For inquiries or issues, please contact constantinos.skevofilax@austin.utexas.edu.
340
+
@@ -1,7 +1,6 @@
1
1
  README.md
2
2
  setup.py
3
3
  eqcctpro/__init__.py
4
- eqcctpro/eqcctpro.py
5
4
  eqcctpro.egg-info/PKG-INFO
6
5
  eqcctpro.egg-info/SOURCES.txt
7
6
  eqcctpro.egg-info/dependency_links.txt
@@ -7,8 +7,8 @@ psutil==6.1.1
7
7
  ray==2.42.1
8
8
  schedule==1.2.2
9
9
  sdnotify==0.3.2
10
- tensorflow==2.15.1
11
- tensorflow-estimator==2.15.0
10
+ tensorflow<2.19,>=2.15
11
+ tensorflow-estimator<2.19,>=2.15
12
12
  tensorflow-io-gcs-filesystem==0.37.1
13
13
  tensorboard==2.15.2
14
14
  tensorboard-data-server==0.7.2
@@ -8,9 +8,12 @@ os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
8
8
  os.environ["NVIDIA_LOG_LEVEL"] = "ERROR"
9
9
  os.environ["CUDA_MODULE_LOADING"] = "LAZY"
10
10
 
11
+ with open("README.md", "r", encoding="utf-8") as f:
12
+ description = f.read()
13
+
11
14
  setup(
12
15
  name="eqcctpro",
13
- version="0.3",
16
+ version="0.4.1",
14
17
  packages=find_packages(),
15
18
  install_requires=[
16
19
  "numpy==1.26.4",
@@ -22,8 +25,8 @@ setup(
22
25
  "ray==2.42.1",
23
26
  "schedule==1.2.2",
24
27
  "sdnotify==0.3.2",
25
- "tensorflow==2.15.1",
26
- "tensorflow-estimator==2.15.0",
28
+ "tensorflow>=2.15,<2.19", # Updated TensorFlow constraint
29
+ "tensorflow-estimator>=2.15,<2.19", # Updated TensorFlow Estimator constraint
27
30
  "tensorflow-io-gcs-filesystem==0.37.1",
28
31
  "tensorboard==2.15.2",
29
32
  "tensorboard-data-server==0.7.2",
@@ -35,4 +38,6 @@ setup(
35
38
  "h5py==3.12.1",
36
39
  "pynvml==12.0.0",
37
40
  ],
41
+ long_description=description,
42
+ long_description_content_type="text/markdown"
38
43
  )
eqcctpro-0.3/PKG-INFO DELETED
@@ -1,25 +0,0 @@
1
- Metadata-Version: 2.2
2
- Name: eqcctpro
3
- Version: 0.3
4
- Requires-Dist: numpy==1.26.4
5
- Requires-Dist: pandas==2.2.3
6
- Requires-Dist: matplotlib==3.10.0
7
- Requires-Dist: obspy==1.4.1
8
- Requires-Dist: progress==1.6
9
- Requires-Dist: psutil==6.1.1
10
- Requires-Dist: ray==2.42.1
11
- Requires-Dist: schedule==1.2.2
12
- Requires-Dist: sdnotify==0.3.2
13
- Requires-Dist: tensorflow==2.15.1
14
- Requires-Dist: tensorflow-estimator==2.15.0
15
- Requires-Dist: tensorflow-io-gcs-filesystem==0.37.1
16
- Requires-Dist: tensorboard==2.15.2
17
- Requires-Dist: tensorboard-data-server==0.7.2
18
- Requires-Dist: silence-tensorflow==1.2.3
19
- Requires-Dist: scipy==1.15.1
20
- Requires-Dist: protobuf==4.25.6
21
- Requires-Dist: grpcio==1.70.0
22
- Requires-Dist: absl-py==2.1.0
23
- Requires-Dist: h5py==3.12.1
24
- Requires-Dist: pynvml==12.0.0
25
- Dynamic: requires-dist
eqcctpro-0.3/README.md DELETED
@@ -1 +0,0 @@
1
- hello!
@@ -1,64 +0,0 @@
-
- import os
- from predictor import EQCCTMSeedRunner, EvaluateSystem, OptimalCPUConfigurationFinder, OptimalGPUConfigurationFinder
- input_mseed_directory_path = '/home/skevofilaxc/eqcctpro/mseed/20241215T115800Z_20241215T120100Z'
- output_pick_directory_path = '/home/skevofilaxc/eqcctpro/outputs'
- log_file_path = '/home/skevofilaxc/eqcctpro/outputs/eqcctpro.log'
- csv_filepath = '/home/skevofilaxc/eqcctpro/csv'
-
- # Can run EQCCT on a given input dir on GPU or CPU
- # Can also specify the number of stations you want to use as well
-
- # eqcct_runner = EQCCTMSeedRunner(use_gpu=True,
- # intra_threads=1,
- # inter_threads=1,
- # cpu_id_list=[0,1,2,3,4],
- # input_dir=input_mseed_directory_path,
- # output_dir=output_pick_directory_path,
- # log_filepath=log_file_path,
- # P_threshold=0.001,
- # S_threshold=0.02,
- # p_model_filepath='/home/skevofilaxc/model/ModelPS/test_trainer_024.h5',
- # s_model_filepath='/home/skevofilaxc/model/ModelPS/test_trainer_021.h5',
- # number_of_concurrent_predictions=5,
- # best_usecase_config=True,
- # csv_dir=csv_filepath,
- # selected_gpus=[0],
- # set_vram_mb=24750,
- # specific_stations='AT01, BP01, DG05')
-
- # eqcct_runner.run_eqcctpro()
-
-
- # eval_gpu = EvaluateSystem('gpu',
- # intra_threads=1,
- # inter_threads=1,
- # input_dir=input_mseed_directory_path,
- # output_dir=output_pick_directory_path,
- # log_filepath=log_file_path,
- # csv_dir=csv_filepath,
- # P_threshold=0.001,
- # S_threshold=0.02,
- # p_model_filepath='/home/skevofilaxc/model/ModelPS/test_trainer_024.h5',
- # s_model_filepath='/home/skevofilaxc/model/ModelPS/test_trainer_021.h5',
- # stations2use=2,
- # cpu_id_list=range(0,1,2),
- # set_vram_mb=24750,
- # selected_gpus=[0])
- # eval_gpu.evaluate() # This triggers evaluate_gpu() if mode is 'gpu'
-
- cpu_finder = OptimalCPUConfigurationFinder(csv_filepath)
- best_cpu_config = cpu_finder.find_best_overall_usecase()
- print(best_cpu_config)
-
-
- optimal_cpu_config = cpu_finder.find_optimal_for(cpu=3, station_count=2)
- print(optimal_cpu_config)
-
-
- gpu_finder = OptimalGPUConfigurationFinder(csv_filepath)
- best_gpu_config = gpu_finder.find_best_overall_usecase()
- print(best_gpu_config)
-
- optimal_gpu_config = gpu_finder.find_optimal_for(num_cpus=1, gpu_list=[0], station_count=1)
- print(optimal_gpu_config)
@@ -1,25 +0,0 @@
- Metadata-Version: 2.2
- Name: eqcctpro
- Version: 0.3
- Requires-Dist: numpy==1.26.4
- Requires-Dist: pandas==2.2.3
- Requires-Dist: matplotlib==3.10.0
- Requires-Dist: obspy==1.4.1
- Requires-Dist: progress==1.6
- Requires-Dist: psutil==6.1.1
- Requires-Dist: ray==2.42.1
- Requires-Dist: schedule==1.2.2
- Requires-Dist: sdnotify==0.3.2
- Requires-Dist: tensorflow==2.15.1
- Requires-Dist: tensorflow-estimator==2.15.0
- Requires-Dist: tensorflow-io-gcs-filesystem==0.37.1
- Requires-Dist: tensorboard==2.15.2
- Requires-Dist: tensorboard-data-server==0.7.2
- Requires-Dist: silence-tensorflow==1.2.3
- Requires-Dist: scipy==1.15.1
- Requires-Dist: protobuf==4.25.6
- Requires-Dist: grpcio==1.70.0
- Requires-Dist: absl-py==2.1.0
- Requires-Dist: h5py==3.12.1
- Requires-Dist: pynvml==12.0.0
- Dynamic: requires-dist