eqcctpro 0.6.2__py3-none-any.whl → 0.7.0__py3-none-any.whl

@@ -0,0 +1,312 @@
1
+ Metadata-Version: 2.4
2
+ Name: eqcctpro
3
+ Version: 0.7.0
4
+ Summary: EQCCTPro: A Powerful Seismic Event Detection & Performance Optimization Toolkit
5
+ Author-email: Constantinos Skevofilax <constantinos.skevofilax@austin.utexas.edu>, Victor Salles <victor.salles@beg.utexas.edu>
6
+ Project-URL: Homepage, https://pypi.org/project/eqcctpro/
7
+ Project-URL: Repository, https://github.com/ut-beg-texnet/eqcct/tree/main/eqcctpro
8
+ Project-URL: Issues, https://github.com/ut-beg-texnet/eqcct/tree/main/eqcctpro/issues
9
+ Project-URL: Documentation, https://github.com/ut-beg-texnet/eqcct/blob/main/eqcctpro/README.md
10
+ Requires-Python: >=3.10.14
11
+ Description-Content-Type: text/markdown
12
+ Requires-Dist: numpy==1.26.4
13
+ Requires-Dist: pandas==2.2.3
14
+ Requires-Dist: matplotlib==3.10.0
15
+ Requires-Dist: obspy==1.4.1
16
+ Requires-Dist: psutil==6.1.1
17
+ Requires-Dist: ray==2.42.1
18
+ Requires-Dist: tensorflow==2.20.0
19
+ Requires-Dist: keras==3.12.0
20
+ Requires-Dist: tensorboard==2.20.0
21
+ Requires-Dist: tensorboard-data-server==0.7.2
22
+ Requires-Dist: silence-tensorflow==1.2.3
23
+ Requires-Dist: scipy==1.15.1
24
+ Requires-Dist: protobuf==6.33.4
25
+ Requires-Dist: h5py==3.12.1
26
+ Requires-Dist: pynvml==12.0.0
27
+ Requires-Dist: torch==2.5.1
28
+ Requires-Dist: seisbench==0.10.2
29
+ Requires-Dist: nvidia-cudnn-cu12==9.17.1.4
30
+ Requires-Dist: requests==2.32.3
31
+ Requires-Dist: rich==13.9.4
32
+ Requires-Dist: shapely==2.1.0
33
+
34
+ # **EQCCTPro: A Powerful Seismic Event Detection & Performance Optimization Toolkit**
35
+
36
+ EQCCTPro is a high-performance seismic event detection and processing framework designed to bridge the gap between deep learning models and large-scale seismic data processing. It natively supports **EQCCT** (TensorFlow) and the **SeisBench** ecosystem (PyTorch), including models like **PhaseNet**, **EQTransformer**, **GPD**, and **CRED**.
37
+
38
+ EQCCTPro is engineered for **real-time performance**, identifying the optimal parallelization configurations for your specific hardware (CPU and multi-GPU) to minimize runtime and maximize station throughput. EQCCTPro has allowed seismic networks, such as the Texas Seismological Research Group (TexNet), to run their DL picking model EQCCT operationally, in real time, across a network of more than 250 seismic stations. More information on the architecture and application of EQCCTPro can be found in our upcoming publication [here](https://github.com/ut-beg-texnet/eqcct/blob/main/eqcctpro/OptimizedEQCCT_Paper.pdf).
39
+
40
+ ## **Features**
41
+ - **Multi-Model Support**: Integrated with **EQCCT** and **SeisBench** (PhaseNet, EQTransformer, etc.).
42
+ - **Hybrid Parallelism**: Optimized for both CPU-only and Multi-GPU environments using Ray.
43
+ - **Intelligent Benchmarking**: Automated system evaluation with 20% step-size concurrency testing and redundancy filtering.
44
+ - **Advanced VRAM Management**: Per-worker memory slicing and aggregate pool safety caps to prevent OOM errors.
45
+ - **Automated Dataset Creation**: Workflow-ready data retrieval and denoising via FDSNWS connection.
46
+ - **Resource Selection**: Fine-grained control over CPU affinity binding and specific GPU selection.
47
+
48
+ # **Installation Guide**
49
+
50
+ EQCCTPro requires a specific dependency stack to ensure compatibility between TensorFlow, PyTorch, and CUDA libraries.
51
+
52
+ ### **Requirements**
53
+ - **Python**: 3.10.14+
54
+ - **TensorFlow**: 2.20.0
55
+ - **PyTorch**: 2.5.1 + cu121
56
+ - **SeisBench**: 0.10.2
57
+ - **NVIDIA Driver**: Compatible with CUDA 12.1+
58
+
59
+ ### **Standard Installation (Recommended)**
60
+ The easiest way to install EQCCTPro, together with its sample data and all dependencies, is through the provided `environment.yml` file:
61
+
62
+ ```sh
63
+ # Clone the repository
64
+ git clone https://github.com/ut-beg-texnet/eqcct.git
65
+ cd eqcct/eqcctpro
66
+
67
+ # Create and activate the environment
68
+ conda env create -f environment.yml
69
+ conda activate eqcctpro
70
+ ```
71
+
72
+ ### **Pip installation**
73
+ EQCCTPro is also published on PyPI, where it can be found [here](https://pypi.org/project/eqcctpro/).
74
+ You can install the EQCCTPro package via:
75
+
76
+ ```sh
77
+ pip install eqcctpro
78
+ ```
79
+
80
+ ---
81
+
82
+ # Understanding the Waveform Data Input Style to EQCCTPro
83
+
84
+ 1-minute long sample seismic waveforms from 229 TexNet stations have been provided in the repository under the `230_stations_1_min_dt.zip` file to help users understand the EQCCTPro waveform input style.
85
+
86
+ After downloading the `.zip` file from the repository, run:
87
+ ```sh
88
+ unzip 230_stations_1_min_dt.zip
89
+ ```
90
+
91
+ Inside the unzipped folder, we can see a single timechunk subdirectory, which contains 229 station subdirectories, each holding three-component waveforms:
92
+
93
+ ```sh
94
+ [skevofilaxc 230_stations_1_min_dt]$ ls
95
+ 20241215T120000Z_20241215T120100Z
96
+
97
+ [skevofilaxc 230_stations_1_min_dt]$ cd 20241215T120000Z_20241215T120100Z
98
+ 237B BP01 CT02 DG02 DG10 EE04 EF07 EF54 EF63 EF69 EF77 FOAK3 FW06 FW14
99
+ HBVL LWM2 MB05 MB12 MB19 MBBB3 MID03 NM01 OG02 PB05 PB11 PB19 PB26 PB34
100
+ PB41 PB51 PB57 PH03 SA06 SGCY SN02 SN10 WB03 WB09 YK01
101
+ 435B BRDY CV01 DG04 DKNS EF02 EF08 EF56 EF64 EF71 ELG6 FOAK4 FW07 FW15
102
+ HNDO LWM3 MB06 MB13 MB21 MBBB5 MLDN NM02 OG04 PB06 PB12 PB21 PB28 PB35
103
+ PB42 PB52 PB58 PL01 SA07 SM01 SN03 SNAG WB04 WB10
104
+ ALPN BW01 CW01 DG05 DRIO EF03 EF09 EF58 EF65 EF72 ET02 FW01 FW09 GV01
105
+ HP01 MB01 MB07 MB15 MB22 MBBB6 MNHN NM03 OZNA PB07 PB14 PB22 PB29 PB37
106
+ PB43 PB53 PB59 PLPT SA09 SM02 SN04 TREL WB05 WB11
107
+ APMT CF01 DB02 DG06 DRZT EF04 EF51 EF59 EF66 EF74 FLRS FW02 FW11 GV02
108
+ HP02 MB02 MB08 MB16 MB25 MG01 MO01 ODSA PB01 PB08 PB16 PB23 PB30 PB38
109
+ PB44 PB54 PCOS POST SAND SM03 SN07 VHRN WB06 WB12
110
+ AT01 CRHG DB03 DG07 EE02 EF05 EF52 EF61 EF67 EF75 FOAK1 FW04 FW12 GV03
111
+ INDO MB03 MB09 MB17 MBBB1 MID01 NGL01 OE01 PB03 PB09 PB17 PB24 PB32 PB39
112
+ PB46 PB55 PECS SA02 SD01 SM04 SN08 VW01 WB07 WTFS
113
+ BB01 CT01 DB04 DG09 EE03 EF06 EF53 EF62 EF68 EF76 FOAK2 FW05 FW13 GV04
114
+ LWM1 MB04 MB11 MB18 MBBB2 MID02 NGL02 OG01 PB04 PB10 PB18 PB25 PB33 PB40
115
+ PB47 PB56 PH02 SA04 SE01 SMWD SN09 WB02 WB08 WW01
116
+
117
+ [skevofilaxc PB35]$ ls
118
+ TX.PB35.00.HH1__20241215T115800Z__20241215T120100Z.mseed TX.PB35.00.HHZ__20241215T115800Z__20241215T120100Z.mseed
120
+ TX.PB35.00.HH2__20241215T115800Z__20241215T120100Z.mseed
121
+ ```
122
+ EQCCT requires at least one pose per station for detection, but
123
+ using multiple poses enhances P and S wave directionality.
124
+
125
+ Each station subdirectory is named after its station code. If you wish to create your own input directory with custom mSEED waveform files, **please follow the above naming conventions.** Otherwise, EQCCTPro will **not** work.
126
+ Create subdirectories for each timechunk (sub-parent directories) and for each station (child directories). The station directories should be named as shown above. Each timechunk directory spans from the **start of the analysis period minus the waveform overlap** to the **end of the analysis period**, based on the defined timechunk duration.
127
+
128
+ For example:
129
+ ```sh
130
+ [skevofilaxc 230_stations_2hr_1_hr_dt]$ ls
131
+ 20241215T115800Z_20241215T130000Z 20241215T125800Z_20241215T140000Z
132
+ ```
133
+ Here each timechunk is 1 hour long and the waveform overlap is 2 minutes. Hence, `20241215T115800Z_20241215T130000Z` spans from `11:58:00 to 13:00:00 UTC on Dec 15, 2024` and `20241215T125800Z_20241215T140000Z` spans from `12:58:00 to 14:00:00 UTC on Dec 15, 2024`.
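
The naming rule above can be sketched in a few lines of Python. This is only an illustration of the convention as described in this README; `timechunk_dir_names` is a hypothetical helper, not part of the EQCCTPro API, and it assumes the final chunk is simply truncated at the end of the analysis period.

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y%m%dT%H%M%SZ"  # timestamp format used in the directory names

def timechunk_dir_names(start, end, chunk_minutes, overlap_minutes):
    """Return timechunk directory names for the analysis period [start, end].

    Hypothetical helper: each chunk starts `overlap_minutes` before its
    nominal start and ends at its nominal end.
    """
    chunk = timedelta(minutes=chunk_minutes)
    overlap = timedelta(minutes=overlap_minutes)
    names = []
    chunk_start = start
    while chunk_start < end:
        chunk_end = min(chunk_start + chunk, end)  # assumption: truncate the last chunk
        names.append(f"{(chunk_start - overlap).strftime(FMT)}_{chunk_end.strftime(FMT)}")
        chunk_start = chunk_end
    return names

names = timechunk_dir_names(
    datetime(2024, 12, 15, 12, 0, tzinfo=timezone.utc),
    datetime(2024, 12, 15, 14, 0, tzinfo=timezone.utc),
    chunk_minutes=60,
    overlap_minutes=2,
)
print(names)
# ['20241215T115800Z_20241215T130000Z', '20241215T125800Z_20241215T140000Z']
```

For a 12:00 to 14:00 UTC analysis period, this reproduces the two directory names listed in the example.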
134
+
135
+ ## **Dataset creation using a FDSNWS connection**
136
+
137
+ With the help of [Donavin97](https://github.com/Donavin97), it is now possible to create the necessary dataset structure from your own data using the
138
+ provided `create_dataset.py` script.
139
+
140
+ `create_dataset.py` can:
141
+ 1. Retrieve waveform data from a user-defined FDSNWS web service.
142
+ 2. Select data according to network, station, channel, and location codes.
143
+ 3. Define time chunks according to the user's requirements.
144
+ 4. Automatically download the data and create the folder structure required by EQCCTPro.
145
+ 5. Optionally denoise the data using SeisBench as the backend.
146
+
147
+ An example is provided below:
148
+ ```sh
149
+ python create_dataset.py -h
150
+
151
+ usage: create_dataset.py [-h] [--start START] [--end END] [--networks NETWORKS]
152
+ [--stations STATIONS] [--locations LOCATIONS]
153
+ [--channels CHANNELS] [--host HOST] [--output OUTPUT] [--chunk
154
+ CHUNK] [--denoise]
155
+
156
+ Download FDSN waveforms in equal-time chunks.
157
+
158
+ options:
159
+ -h, --help show this help message and exit
160
+ --start START Start time, e.g. 2024-12-03T00:00:00Z
161
+ --end END End time, e.g. 2024-12-03T02:00:00Z
162
+ --networks NETWORKS Comma-separated network codes or *
163
+ --stations STATIONS Comma-separated station codes or *
164
+ --locations LOCATIONS
165
+ Comma-separated location codes or *
166
+ --channels CHANNELS Comma-separated channel codes or *
167
+ --host HOST FDSNWS base URL
168
+ --output OUTPUT Base output directory
169
+ --chunk CHUNK Chunk size in minutes. Splits start→end into N windows.
170
+ --denoise If set, apply seisbench.DeepDenoiser to each chunk.
171
+ ```
172
+
173
+ An example to download waveforms from a local FDSNWS server is given below:
174
+ ```sh
175
+ python create_dataset.py --start 2025-10-31T00:00 --end 2025-10-31T04:00 --networks TX \
176
+   --stations "*" --locations "*" --channels "HH?,HN?" --host http://localhost:8080 \
177
+   --output waveforms_directory --chunk 60
178
+ ```
179
+
180
+ The resulting output folder contains the data to be processed by EQCCTPro.
181
+
182
+ **Note:** Make sure that you set the same chunk size in the download script and in EQCCTPro itself.
183
+ For example, if you set a time chunk of 20 minutes in the download script, also use 20 minutes as the chunk size when calling EQCCTPro. Otherwise, the data may be processed erroneously.
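
One way to catch a mismatch early is to compare the span encoded in a timechunk directory name with the settings you plan to pass to EQCCTPro. The sketch below is an assumption-laden illustration: `chunk_matches` is a hypothetical helper, and it assumes directory names follow the `START_END` convention shown earlier, with the overlap included at the start of each chunk. A shorter final chunk would legitimately fail this check.

```python
from datetime import datetime, timedelta

FMT = "%Y%m%dT%H%M%SZ"  # timestamp format used in the directory names

def chunk_matches(dir_name, timechunk_dt, waveform_overlap):
    """Return True if the directory spans timechunk_dt + waveform_overlap minutes.

    Hypothetical helper, not part of EQCCTPro.
    """
    start_str, end_str = dir_name.split("_")
    span = datetime.strptime(end_str, FMT) - datetime.strptime(start_str, FMT)
    return span == timedelta(minutes=timechunk_dt + waveform_overlap)

print(chunk_matches("20241215T115800Z_20241215T130000Z", timechunk_dt=60, waveform_overlap=2))  # True
print(chunk_matches("20241215T115800Z_20241215T130000Z", timechunk_dt=20, waveform_overlap=2))  # False
```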
184
+
185
+ ---
186
+
187
+ # **1. Processing mSEED Data (RunEQCCTPro)**
188
+
189
+ The `RunEQCCTPro` class is the primary interface for running seismic detection on your data. It handles model loading (TensorFlow or PyTorch), waveform segmenting, and parallel pick generation.
190
+
191
+ ### **Example: Running SeisBench PhaseNet on GPU**
192
+ ```python
193
+ from eqcctpro import RunEQCCTPro
194
+
195
+ runner = RunEQCCTPro(
196
+     model_type='seisbench',             # 'eqcct' or 'seisbench'
197
+     seisbench_parent_model='PhaseNet',  # SeisBench class
198
+     seisbench_child_model='original',   # Pretrained version
199
+     Detection_threshold=0.3,            # SeisBench detection threshold
200
+     use_gpu=True,
201
+     selected_gpus=[0, 1],               # Use multiple GPUs
202
+     vram_mb=2500,                       # VRAM budget per station task
203
+     number_of_concurrent_station_predictions=10,
204
+     number_of_concurrent_timechunk_predictions=2,
205
+     start_time='2024-12-15 12:00:00',
206
+     end_time='2024-12-15 13:00:00',
207
+     timechunk_dt=30,
208
+     waveform_overlap=2
209
+ )
210
+
211
+ runner.run_eqcctpro()
212
+ ```
213
+
214
+ ### **Parameter Definitions**
215
+
216
+ #### **Model Configuration**
217
+ - **`model_type (str)`**: Choice of `'eqcct'` (for the original EQCCT model) or `'seisbench'` (for SeisBench-based models).
218
+ - **`seisbench_parent_model (str)`**: (SeisBench only) The model architecture (e.g., `PhaseNet`, `EQTransformer`).
219
+ - **`seisbench_child_model (str)`**: (SeisBench only) The pretrained weights (e.g., `original`, `stead`, `ethz`).
220
+ - **`Detection_threshold (float)`**: (SeisBench only) The probability threshold for detection traces. Default: `0.3`.
221
+ - **`P_threshold (float)`**: (EQCCT only) Arrival probability threshold for P-waves. Default: `0.001`.
222
+ - **`S_threshold (float)`**: (EQCCT only) Arrival probability threshold for S-waves. Default: `0.02`.
223
+ - **`p_model_filepath / s_model_filepath (str)`**: (EQCCT only) Paths to the `.h5` model files.
224
+
225
+ #### **Hardware & Parallelism**
226
+ - **`use_gpu (bool)`**: Enables GPU acceleration.
227
+ - **`selected_gpus (list)`**: List of GPU indices (e.g., `[0, 1]`) to utilize.
228
+ - **`vram_mb (float)`**: The hard VRAM limit allocated to **each** station prediction task.
229
+ - **`cpu_id_list (list)`**: Specific CPU core IDs to bind the process to (e.g., `range(0, 16)`).
230
+ - **`intra_threads (int)`**: Number of intra-op parallelism threads TensorFlow can use. Default: `1`.
231
+ - **`inter_threads (int)`**: Number of inter-op parallelism threads TensorFlow can use. Default: `1`.
232
+ - **`number_of_concurrent_station_predictions (int)`**: How many stations to process in parallel per timechunk.
233
+ - **`number_of_concurrent_timechunk_predictions (int)`**: How many timechunks to process in parallel.
234
+
235
+ #### **Workflow & Data**
236
+ - **`input_dir / output_dir (str)`**: Paths for input mSEED files and output pick results.
237
+ - **`start_time / end_time (str)`**: Analysis window (Format: `YYYY-MM-DD HH:MM:SS`).
238
+ - **`timechunk_dt (int)`**: Duration of each processing chunk in minutes.
239
+ - **`waveform_overlap (int)`**: Overlap between chunks in minutes to ensure no events are missed at boundaries.
240
+ - **`best_usecase_config (bool)`**: If `True`, overrides parallelism settings with the optimal values found by `EvaluateSystem`.
241
+
242
+ ---
243
+
244
+ # **2. System Evaluation (EvaluateSystem)**
245
+
246
+ Before running large-scale production jobs, use `EvaluateSystem` to benchmark your hardware. It autonomously runs trials across different concurrency levels to find the "sweet spot" for your system.
247
+
248
+ ### **Key Benchmark Optimizations**
249
+ - **20% Step Size**: Automatically tests station concurrency at 20%, 40%, 60%, 80%, and 100% levels.
250
+ - **Redundancy Filtering**: Skips configurations that are already in the results CSV, allowing for interrupted evaluations to resume instantly.
251
+ - **GPU Resource Slicing**: Dynamically calculates per-task VRAM limits based on an aggregate pool.
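
To make the first two points concrete, here is a rough sketch of how 20% concurrency steps and redundancy filtering could be computed. It is not the EQCCTPro implementation: `concurrency_levels` and `pending_levels` are hypothetical names, and rounding up to whole stations is an assumption.

```python
def concurrency_levels(max_stations, step_percent=20):
    """Station concurrency levels at 20%, 40%, ..., 100% of max_stations.

    Hypothetical helper: rounds up to whole stations and drops duplicates.
    """
    levels = []
    for percent in range(step_percent, 101, step_percent):
        # Integer ceiling of max_stations * percent / 100, at least 1
        level = max(1, (max_stations * percent + 99) // 100)
        if level not in levels:
            levels.append(level)
    return levels

def pending_levels(levels, already_tested):
    """Skip levels that already have a result, so a run can resume."""
    return [level for level in levels if level not in already_tested]

levels = concurrency_levels(100)
print(levels)                                           # [20, 40, 60, 80, 100]
print(pending_levels(levels, already_tested={20, 40}))  # [60, 80, 100]
```

In the real tool, the set of already-tested configurations comes from the results CSV; a plain set stands in for it here.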
252
+
253
+ ### **Example: Evaluating GPU Performance**
254
+ ```python
255
+ from eqcctpro import EvaluateSystem
256
+
257
+ eval_gpu = EvaluateSystem(
258
+     eval_mode='gpu',
259
+     model_type='seisbench',
260
+     seisbench_parent_model='PhaseNet',
261
+     seisbench_child_model='original',
262
+     selected_gpus=[0, 1],
263
+     max_vram_mb=48000,           # Total VRAM pool to test across all GPUs
264
+     gpu_vram_safety_cap=0.95,    # Reserve 5% VRAM for system stability
265
+     stations2use=100,            # Max stations to test
266
+     cpu_id_list=range(0, 8),     # CPUs available for Ray management
267
+     input_dir='/path/to/mseed',
268
+     csv_dir='/path/to/results'
269
+ )
270
+ eval_gpu.evaluate()
271
+ ```
272
+
273
+ ### **Evaluation Parameters**
274
+ - **`eval_mode (str)`**: `'cpu'` or `'gpu'`.
275
+ - **`max_vram_mb (float)`**: The total aggregate VRAM budget across all GPUs for the evaluation. If not provided, it is calculated from physical VRAM.
276
+ - **`gpu_vram_safety_cap (float)`**: The fraction of VRAM (0.0 to 1.0) EQCCTPro is allowed to use. Default: `0.95`.
277
+ - **`stations2use (int)`**: The maximum number of stations to test in the benchmark.
278
+ - **`min_cpu_amount / cpu_test_step_size (int)`**: Controls the iterative testing of CPU core counts.
279
+ - **`starting_amount_of_stations / station_list_step_size (int)`**: Controls the iterative testing of total workload size.
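
As a back-of-the-envelope illustration of how `max_vram_mb` and `gpu_vram_safety_cap` relate to a per-task budget, the sketch below splits the capped pool evenly across concurrent tasks. The even split and the helper name `per_task_vram_mb` are assumptions for illustration; the actual slicing logic inside EQCCTPro may differ.

```python
def per_task_vram_mb(max_vram_mb, safety_cap, concurrent_tasks):
    """Evenly split the usable VRAM pool across concurrent station tasks.

    Hypothetical helper, not part of EQCCTPro.
    """
    if not 0.0 < safety_cap <= 1.0:
        raise ValueError("safety_cap must be in (0.0, 1.0]")
    if concurrent_tasks < 1:
        raise ValueError("concurrent_tasks must be at least 1")
    usable_mb = max_vram_mb * safety_cap  # VRAM EQCCTPro is allowed to use
    return usable_mb / concurrent_tasks

# Approximate per-task budget for a 48000 MB pool, 95% cap, 20 concurrent tasks
print(round(per_task_vram_mb(48000, 0.95, 20)))
```

A value computed this way is only a starting point for `vram_mb` in `RunEQCCTPro`; the benchmark results remain the authoritative guide.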
280
+
281
+ ---
282
+
283
+ # **3. Finding Optimal Configurations**
284
+
285
+ Once the evaluation is complete, use the configuration finders to extract the best settings. Results are now automatically grouped by the model used during testing.
286
+
287
+ ```python
288
+ from eqcctpro import OptimalGPUConfigurationFinder
289
+
290
+ # results_dir should contain 'gpu_test_results.csv'
291
+ finder = OptimalGPUConfigurationFinder(results_dir='/path/to/results')
292
+
293
+ # 1. Get the fastest overall config for a balanced workload
294
+ best_config = finder.find_best_overall_usecase()
295
+
296
+ # 2. Get the optimal config for a specific resource limit
297
+ # Example: What is the fastest way to process 50 stations using 4 CPUs and GPU 0?
298
+ specific_config = finder.find_optimal_for(num_cpus=4, gpu_list=[0], station_count=50)
299
+ ```
300
+
301
+ More examples of how to use EQCCTPro with different SeisBench and EQCCT models are available in the `run.py` file.
302
+
303
+ ---
304
+
305
+
306
+ # **License & Citation**
307
+ EQCCTPro is provided under an open-source license. If you use this software in your research, please cite our work:
308
+ [Optimized EQCCT Paper](https://github.com/ut-beg-texnet/eqcct/blob/main/eqcctpro/OptimizedEQCCT_Paper.pdf) (Currently in Review).
309
+
310
+ # **Contact**
311
+ **Constantinos Skevofilax**: constantinos.skevofilax@austin.utexas.edu
312
+ **Victor Salles**: victor.salles@beg.utexas.edu
@@ -0,0 +1,10 @@
1
+ eqcctpro/__init__.py,sha256=nFHoI_nYKfH3taZznoU55-icf0aPgTjadFgsKoO_ARc,312
2
+ eqcctpro/eqcct_tf_models.py,sha256=hg1ULpCJdbregL6LWeja2OK81UNoIXMbBlqcADJPcaY,14832
3
+ eqcctpro/functionality.py,sha256=8NQKQ00TVk--WJcIzPxrEcvTX1kFUNf4cnGKHGMpHPM,86036
4
+ eqcctpro/parallelization.py,sha256=RNNEedMmPGjjM9hCsTRjtQUILlpbYIpAf5uzkLnlOwE,51935
5
+ eqcctpro/seisbench_models.py,sha256=psg2aodfV12cuPKsw5XTZO-lWMlE4dNArHDLH0rHHU0,10249
6
+ eqcctpro/tools.py,sha256=pMZ55uJsCmVioFgezffvtYPVz0Ei1f74atjOlpxEaY4,46609
7
+ eqcctpro-0.7.0.dist-info/METADATA,sha256=ZzNj3iiXfWGz8kCfkadJvQ0iim8iDmhEAOv4BPVFO4k,16107
8
+ eqcctpro-0.7.0.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
9
+ eqcctpro-0.7.0.dist-info/top_level.txt,sha256=u0cu2JdF7Z0ob7y4XdUCLoSGp_xOudAYz-fbsQ-B1yY,9
10
+ eqcctpro-0.7.0.dist-info/RECORD,,