goesgcp 1.0.3__tar.gz → 1.0.5__tar.gz

@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: goesgcp
- Version: 1.0.3
+ Version: 1.0.5
  Summary: A package to download and process GOES-16/17 data
  Home-page: https://github.com/helvecioneto/goesgcp
  Author: Helvecio B. L. Neto
@@ -46,28 +46,26 @@ The script uses the `argparse` module for handling command-line arguments. Below
  goesgcp [OPTIONS]
  ```

- | Option | Description |
- |----------------------|-----------------------------------------------------------------------------|
- | `--satellite` | Name of the satellite (e.g., goes16). |
- | `--product` | Name of the satellite product (e.g., ABI-L2-CMIPF). |
- | `--var_name` | Variable name to extract (e.g., CMI). |
- | `--channel` | Channel to use (e.g., 13). | |
- | `--between_minutes` | Filter data between these minutes (default: `[0, 60]`). |
- | `--output_path` | Path for saving output files (default: `output/`). | |
- | `--lat_min` | Minimum latitude of the bounding box (default: `-56`). |
- | `--lat_max` | Maximum latitude of the bounding box (default: `35`). |
- | `--lon_min` | Minimum longitude of the bounding box (default: `-116`). |
- | `--lon_max` | Maximum longitude of the bounding box (default: `-25`). |
- | `--max_attempts` | Number of attempts to download a file before logging a failure (default: `3`).|
+ | Option | Description |
+ |----------------------|----------------------------------------------------------------------------|
+ | `--satellite` | Name of the satellite (e.g., goes16). |
+ | `--product` | Name of the satellite product (e.g., ABI-L2-CMIPF). |
+ | `--var_name` | Variable name to extract (e.g., CMI). |
+ | `--channel` | Channel to use (e.g., 13). |
+ | `--output` | Path for saving output files (default: `output/`). |
+ | `--lat_min` | Minimum latitude of the bounding box (default: `-56`). |
+ | `--lat_max` | Maximum latitude of the bounding box (default: `35`). |
+ | `--lon_min` | Minimum longitude of the bounding box (default: `-116`). |
+ | `--lon_max` | Maximum longitude of the bounding box (default: `-25`). |
+ | `--resolution` | Set the reprojet data resolution in degree (default: `-0.045`). |

  ### Examples

- To download and process recent data for the GOES-16 satellite, ABI-L2-CMIPF product, variable CMI, and channel 13, run the following command:
+ To download most 3 recent data for the GOES-16 satellite, ABI-L2-CMIPF product, variable CMI, and channel 13, run the following command:

  ```bash
- goesgcp --satellite goes16 --product ABI-L2-CMIPF --var_name CMI --channel 13 --recent 3 --output_path "output/"
+ goesgcp --satellite goes16 --product ABI-L2-CMIPF --var_name CMI --channel 13 --recent 3 --output "output/"
  ```

  ### Credits
- All the credit goes to the original author of the **goes2go** library.
  And this is a otimization by Helvecio Neto - 2025
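
The option table above maps onto `argparse` definitions. As a minimal sketch, the parser below declares only the options whose `parser.add_argument` lines are visible in the 1.0.5 source hunk of this diff (`--recent`, the four bounding-box options, `--resolution`, `--output`), using the defaults from those lines rather than the ones listed in the README table. The helper name `build_parser` is hypothetical, and the remaining options (`--satellite`, `--product`, `--var_name`, `--channel`) are left out because their definitions are not shown here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical helper: a subset of the goesgcp options shown in this diff."""
    parser = argparse.ArgumentParser(prog="goesgcp")
    parser.add_argument('--recent', type=int, default=3,
                        help='Number of recent files to download')
    # Geographic bounding box (defaults copied from the 1.0.5 source hunk)
    parser.add_argument('--lat_min', type=float, default=-81.3282,
                        help='Minimum latitude of the bounding box')
    parser.add_argument('--lat_max', type=float, default=81.3282,
                        help='Maximum latitude of the bounding box')
    parser.add_argument('--lon_min', type=float, default=-156.2995,
                        help='Minimum longitude of the bounding box')
    parser.add_argument('--lon_max', type=float, default=6.2995,
                        help='Maximum longitude of the bounding box')
    parser.add_argument('--resolution', type=float, default=0.03208,
                        help='Resolution of the output file')
    parser.add_argument('--output', type=str, default='output/',
                        help='Path for saving output files')
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is testable.
args = build_parser().parse_args(["--recent", "3", "--output", "output/"])
custom = build_parser().parse_args(["--lat_min", "-56", "--lat_max", "35"])
```

Negative values such as `-56` are accepted as option arguments here because none of the declared option names look like negative numbers.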
@@ -21,28 +21,26 @@ The script uses the `argparse` module for handling command-line arguments. Below
  goesgcp [OPTIONS]
  ```

- | Option | Description |
- |----------------------|-----------------------------------------------------------------------------|
- | `--satellite` | Name of the satellite (e.g., goes16). |
- | `--product` | Name of the satellite product (e.g., ABI-L2-CMIPF). |
- | `--var_name` | Variable name to extract (e.g., CMI). |
- | `--channel` | Channel to use (e.g., 13). | |
- | `--between_minutes` | Filter data between these minutes (default: `[0, 60]`). |
- | `--output_path` | Path for saving output files (default: `output/`). | |
- | `--lat_min` | Minimum latitude of the bounding box (default: `-56`). |
- | `--lat_max` | Maximum latitude of the bounding box (default: `35`). |
- | `--lon_min` | Minimum longitude of the bounding box (default: `-116`). |
- | `--lon_max` | Maximum longitude of the bounding box (default: `-25`). |
- | `--max_attempts` | Number of attempts to download a file before logging a failure (default: `3`).|
+ | Option | Description |
+ |----------------------|----------------------------------------------------------------------------|
+ | `--satellite` | Name of the satellite (e.g., goes16). |
+ | `--product` | Name of the satellite product (e.g., ABI-L2-CMIPF). |
+ | `--var_name` | Variable name to extract (e.g., CMI). |
+ | `--channel` | Channel to use (e.g., 13). |
+ | `--output` | Path for saving output files (default: `output/`). |
+ | `--lat_min` | Minimum latitude of the bounding box (default: `-56`). |
+ | `--lat_max` | Maximum latitude of the bounding box (default: `35`). |
+ | `--lon_min` | Minimum longitude of the bounding box (default: `-116`). |
+ | `--lon_max` | Maximum longitude of the bounding box (default: `-25`). |
+ | `--resolution` | Set the reprojet data resolution in degree (default: `-0.045`). |

  ### Examples

- To download and process recent data for the GOES-16 satellite, ABI-L2-CMIPF product, variable CMI, and channel 13, run the following command:
+ To download most 3 recent data for the GOES-16 satellite, ABI-L2-CMIPF product, variable CMI, and channel 13, run the following command:

  ```bash
- goesgcp --satellite goes16 --product ABI-L2-CMIPF --var_name CMI --channel 13 --recent 3 --output_path "output/"
+ goesgcp --satellite goes16 --product ABI-L2-CMIPF --var_name CMI --channel 13 --recent 3 --output "output/"
  ```

  ### Credits
- All the credit goes to the original author of the **goes2go** library.
  And this is a otimization by Helvecio Neto - 2025
@@ -4,10 +4,10 @@ import xarray as xr
  import argparse
  import sys
  import tqdm
- from concurrent.futures import ThreadPoolExecutor
+ from multiprocessing import Pool
  from google.cloud import storage
  from datetime import datetime, timedelta, timezone
- from pyproj import CRS
+ from pyproj import CRS, Transformer



@@ -64,71 +64,124 @@ def get_recent_files(connection, bucket_name, base_prefix, pattern, min_files):
      # Return only the names of the most recent files, according to the minimum requested
      return [file[0] for file in files[:min_files]]

- def download_file(connection, bucket_name, blob_name, local_path):
-     """Downloads a file from a GCP bucket."""
-     bucket = connection.bucket(bucket_name)
-     blob = bucket.blob(blob_name)
-     blob.download_to_filename(local_path)

  def crop_reproject(file, output):
      """
      Crops and reprojects a GOES-16 file to EPSG:4326.
      """
+     # Open the file
+     ds = xr.open_dataset(file, engine='netcdf4')

-
-     ds = xr.open_dataset(file)
      # Select only var_name and goes_imager_projection
      ds = ds[[var_name, "goes_imager_projection"]]
+
      # Get projection
      sat_height = ds["goes_imager_projection"].attrs["perspective_point_height"]
      ds = ds.assign_coords({
          "x": ds["x"].values * sat_height,
          "y": ds["y"].values * sat_height,
      })
-     # Set CRS
+     # Set CRS from goes_imager_projection
      crs = CRS.from_cf(ds["goes_imager_projection"].attrs)
      ds = ds.rio.write_crs(crs)

-     # Reproject to EPSG:4326 using parallel processing
-     ds = ds.rio.reproject(dst_crs="EPSG:4326",
-                           resolution=(resolution, resolution),
-                           num_threads=-1)
+     # Try to reduce the size of the dataset
+     try:
+         # Create a transformer
+         transformer = Transformer.from_crs(CRS.from_epsg(4326), crs)
+         # Calculate the margin
+         margin_ratio = 0.40 # 40% margin
+
+         # Get the bounding box
+         min_x, min_y = transformer.transform(lat_min, lon_min)
+         max_x, max_y = transformer.transform(lat_max, lon_max)
+
+         # Calculate the range
+         x_range = abs(max_x - min_x)
+         y_range = abs(max_y - min_y)
+
+         margin_x = x_range * margin_ratio
+         margin_y = y_range * margin_ratio
+
+         # Expand the bounding box
+         min_x -= margin_x
+         max_x += margin_x
+         min_y -= margin_y
+         max_y += margin_y
+
+         # Select the region
+         if ds["y"].values[0] > ds["y"].values[-1]: # Eixo y decrescente
+             ds_ = ds.sel(x=slice(min_x, max_x), y=slice(max_y, min_y))
+         else: # Eixo y crescente
+             ds_ = ds.sel(x=slice(min_x, max_x), y=slice(min_y, max_y))
+         # Sort by y
+         if ds_["y"].values[0] > ds_["y"].values[-1]:
+             ds_ = ds_.sortby("y")
+         # Assign to ds
+         ds = ds_
+     except:
+         pass
+
+     # Reproject to EPSG:4326
+     ds = ds.rio.reproject("EPSG:4326", resolution=resolution)

      # Rename lat/lon coordinates
      ds = ds.rename({"x": "lon", "y": "lat"})

-     # # Crop using lat/lon coordinates, in parallel
-     ds = ds.rio.clip_box(minx=lon_min, miny=lat_min, maxx=lon_max, maxy=lat_max)
+     # Add resolution to attributes
+     ds[var_name].attrs['resolution'] = "x={:.2f} y={:.2f} degree".format(resolution, resolution)

-     # Remove any previous file
-     if pathlib.Path(f'{output}{file.split("/")[-1]}.nc').exists():
-         pathlib.Path(f'{output}{file.split("/")[-1]}.nc').unlink()
+     # Crop using lat/lon coordinates, in parallel
+     ds = ds.rio.clip_box(minx=lon_min, miny=lat_min, maxx=lon_max, maxy=lat_max)

      # Add comments
-     ds[var_name].attrs['comments'] = 'Cropped and reprojected to EPSG:4326 by helvecioblneto@gmail.com'
+     ds[var_name].attrs['comments'] = 'Cropped and reprojected to EPSG:4326 by goesgcp'

-     # # Save as netcdf
-     ds.to_netcdf(f'{output}{file.split("/")[-1]}')
+     # Add global metadata comments
+     ds.attrs['comments'] = "Data processed by goesgcp, author: Helvecio B. L. Neto (helvecioblneto@gmail.com)"
+
+     # Save as netcdf overwriting the original file
+     ds.to_netcdf(f'{output}{file.split("/")[-1]}', mode='w', format='NETCDF4_CLASSIC')

-     # Remove original file
-     pathlib.Path(file).unlink()
+     # Close the dataset
+     ds.close()

      return



+ def download_file(args):
+     """Downloads a file from a GCP bucket."""
+
+     bucket_name, blob_name, local_path = args
+
+     # Create a client
+     bucket = storage_client.bucket(bucket_name)
+     blob = bucket.blob(blob_name)
+
+     # Download the file
+     blob.download_to_filename(local_path, timeout=120)
+
+     # Crop and reproject the file
+     crop_reproject(local_path, output_path)
+
+     # Remove the file
+     pathlib.Path(local_path).unlink()
+
+
+
  def main():

      global output_path, var_name, \
          lat_min, lat_max, lon_min, lon_max, \
-         max_attempts, parallel, recent, resolution
+         max_attempts, parallel, recent, resolution, storage_client

      epilog = """
      Example usage:

-     - To download recent files from the GOES-16 satellite for the ABI-L2-CMIPF product, extracting the CMI variable from channel 13, in the last 30 minutes:
+     - To download recent 10 files from the GOES-16 satellite for the ABI-L2-CMIPF product:

-     goesgcp --satellite goes16 --product ABI-L2-CMIP --domain F --var_name CMI --channel 13 --recent 10 --output_path "output/"
+     goesgcp --satellite goes16 --product ABI-L2-CMIP --recent 10 --output_path "output/"
      """


@@ -146,11 +199,11 @@ def main():
      parser.add_argument('--recent', type=int, default=3, help='Number of recent files to download')

      # Geographic bounding box
-     parser.add_argument('--lat_min', type=float, default=-56, help='Minimum latitude of the bounding box')
-     parser.add_argument('--lat_max', type=float, default=35, help='Maximum latitude of the bounding box')
-     parser.add_argument('--lon_min', type=float, default=-116, help='Minimum longitude of the bounding box')
-     parser.add_argument('--lon_max', type=float, default=-25, help='Maximum longitude of the bounding box')
-     parser.add_argument('--resolution', type=float, default=0.045, help='Resolution of the output file')
+     parser.add_argument('--lat_min', type=float, default=-81.3282, help='Minimum latitude of the bounding box')
+     parser.add_argument('--lat_max', type=float, default=81.3282, help='Maximum latitude of the bounding box')
+     parser.add_argument('--lon_min', type=float, default=-156.2995, help='Minimum longitude of the bounding box')
+     parser.add_argument('--lon_max', type=float, default=6.2995, help='Maximum longitude of the bounding box')
+     parser.add_argument('--resolution', type=float, default=0.03208, help='Resolution of the output file')
      parser.add_argument('--output', type=str, default='output/', help='Path for saving output files')

      # Other settings
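
In 1.0.5, `crop_reproject` uses these bounding-box values to pre-select a region (with a 40% margin) before reprojecting. The arithmetic can be sketched on plain floats, without xarray or pyproj. The function names `expand_bbox` and `y_slice` are hypothetical and do not exist in goesgcp; the inputs are assumed to be already in the dataset's projected coordinates.

```python
def expand_bbox(min_x, min_y, max_x, max_y, margin_ratio=0.40):
    """Grow a bounding box by a fraction of its width and height on every side."""
    margin_x = abs(max_x - min_x) * margin_ratio
    margin_y = abs(max_y - min_y) * margin_ratio
    return (min_x - margin_x, min_y - margin_y,
            max_x + margin_x, max_y + margin_y)


def y_slice(y_values, min_y, max_y):
    """Order the slice bounds to match the direction of the y axis.

    Label-based slicing on a descending axis needs (high, low) bounds,
    while an ascending axis needs (low, high).
    """
    if y_values[0] > y_values[-1]:  # descending y axis
        return slice(max_y, min_y)
    return slice(min_y, max_y)      # ascending y axis


# Illustrative values only: a 10 x 20 box grown by 50% per side.
box = expand_bbox(0.0, 0.0, 10.0, 20.0, margin_ratio=0.5)
descending = y_slice([3.0, 2.0, 1.0], box[1], box[3])
ascending = y_slice([1.0, 2.0, 3.0], box[1], box[3])
```

The margin presumably compensates for the fact that a latitude/longitude rectangle does not map to a rectangle in the satellite projection, so only two transformed corners would otherwise under-cover the requested area.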
@@ -205,28 +258,22 @@ def main():
      if not recent_files:
          print(f"No files found with the pattern {pattern}. Exiting...")
          sys.exit(1)
-     print('Downloading files...')
-     # Loading bar
+
+     # Create a temporary directory
+     pathlib.Path('tmp/').mkdir(parents=True, exist_ok=True)
+
+     print(f"Downloading and processing {len(recent_files)} files...")
+
+     # Process files in parallel
      loading_bar = tqdm.tqdm(total=len(recent_files), ncols=100, position=0, leave=True,
          bar_format='{l_bar}{bar}| {n_fmt}/{total_fmt} + \
          [Elapsed:{elapsed} Remaining:<{remaining}]')

-     # Create a temporary directory
-     pathlib.Path('tmp/').mkdir(parents=True, exist_ok=True)
-
      # Download all files to a temporary directory
-     with ThreadPoolExecutor(max_workers=args.processes) as executor:
-         for file in recent_files:
-             download_file(storage_client, bucket_name, file, f'tmp/{file.split("/")[-1]}')
+     with Pool(processes=args.processes) as pool:
+         for _ in pool.imap_unordered(download_file, [(bucket_name,
+                 file, f'tmp/{file.split("/")[-1]}') for file in recent_files]):
              loading_bar.update(1)
-     loading_bar.close()
-
-     print('Cropping and reprojecting files...')
-     # Crop and reproject all files in serial mode
-     for file in recent_files:
-         crop_reproject(f'tmp/{file.split("/")[-1]}', output)
-         loading_bar.update(1)
-     loading_bar.close()

      # Remove temporary directory
      shutil.rmtree('tmp/')
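
The new loop hands each worker one tuple and consumes results as they finish, which is why `download_file` now takes a single `args` parameter. A minimal sketch of that pattern follows. It uses `multiprocessing.pool.ThreadPool` as a stand-in because it exposes the same `imap_unordered` interface as `Pool` and runs without pickling the worker; `process_one`, `run_all`, and the bucket and file names are hypothetical placeholders, not part of goesgcp.

```python
from multiprocessing.pool import ThreadPool


def process_one(task):
    """Worker taking one tuple, since imap_unordered passes a single argument."""
    bucket_name, blob_name, local_path = task
    # A real worker would download and process the file here.
    return f"{bucket_name}/{blob_name} -> {local_path}"


def run_all(tasks, workers=2):
    """Run every task and collect results in completion order."""
    done = []
    with ThreadPool(processes=workers) as pool:
        for result in pool.imap_unordered(process_one, tasks):
            done.append(result)  # a progress bar update would go here
    return done


tasks = [("bucket", f"dir/file{i}.nc", f"tmp/file{i}.nc") for i in range(3)]
results = run_all(tasks)
```

With a process-based `Pool`, the worker and its arguments must be picklable. On platforms whose start method is `spawn`, module globals assigned at runtime inside `main()` are not carried over to worker processes, so state such as a client object may need to be created in each worker or passed through a pool initializer.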
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: goesgcp
- Version: 1.0.3
+ Version: 1.0.5
  Summary: A package to download and process GOES-16/17 data
  Home-page: https://github.com/helvecioneto/goesgcp
  Author: Helvecio B. L. Neto
@@ -46,28 +46,26 @@ The script uses the `argparse` module for handling command-line arguments. Below
  goesgcp [OPTIONS]
  ```

- | Option | Description |
- |----------------------|-----------------------------------------------------------------------------|
- | `--satellite` | Name of the satellite (e.g., goes16). |
- | `--product` | Name of the satellite product (e.g., ABI-L2-CMIPF). |
- | `--var_name` | Variable name to extract (e.g., CMI). |
- | `--channel` | Channel to use (e.g., 13). | |
- | `--between_minutes` | Filter data between these minutes (default: `[0, 60]`). |
- | `--output_path` | Path for saving output files (default: `output/`). | |
- | `--lat_min` | Minimum latitude of the bounding box (default: `-56`). |
- | `--lat_max` | Maximum latitude of the bounding box (default: `35`). |
- | `--lon_min` | Minimum longitude of the bounding box (default: `-116`). |
- | `--lon_max` | Maximum longitude of the bounding box (default: `-25`). |
- | `--max_attempts` | Number of attempts to download a file before logging a failure (default: `3`).|
+ | Option | Description |
+ |----------------------|----------------------------------------------------------------------------|
+ | `--satellite` | Name of the satellite (e.g., goes16). |
+ | `--product` | Name of the satellite product (e.g., ABI-L2-CMIPF). |
+ | `--var_name` | Variable name to extract (e.g., CMI). |
+ | `--channel` | Channel to use (e.g., 13). |
+ | `--output` | Path for saving output files (default: `output/`). |
+ | `--lat_min` | Minimum latitude of the bounding box (default: `-56`). |
+ | `--lat_max` | Maximum latitude of the bounding box (default: `35`). |
+ | `--lon_min` | Minimum longitude of the bounding box (default: `-116`). |
+ | `--lon_max` | Maximum longitude of the bounding box (default: `-25`). |
+ | `--resolution` | Set the reprojet data resolution in degree (default: `-0.045`). |

  ### Examples

- To download and process recent data for the GOES-16 satellite, ABI-L2-CMIPF product, variable CMI, and channel 13, run the following command:
+ To download most 3 recent data for the GOES-16 satellite, ABI-L2-CMIPF product, variable CMI, and channel 13, run the following command:

  ```bash
- goesgcp --satellite goes16 --product ABI-L2-CMIPF --var_name CMI --channel 13 --recent 3 --output_path "output/"
+ goesgcp --satellite goes16 --product ABI-L2-CMIPF --var_name CMI --channel 13 --recent 3 --output "output/"
  ```

  ### Credits
- All the credit goes to the original author of the **goes2go** library.
  And this is a otimization by Helvecio Neto - 2025
@@ -13,7 +13,7 @@ with open('requirements.txt') as f:

  setup(
      name="goesgcp",
-     version='1.0.3',
+     version='1.0.5',
      author="Helvecio B. L. Neto",
      author_email="helvecioblneto@gmail.com",
      description="A package to download and process GOES-16/17 data",
File without changes
File without changes
File without changes