PVNet 5.0.18__tar.gz → 5.0.20__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (38)
  1. {pvnet-5.0.18 → pvnet-5.0.20}/PKG-INFO +32 -75
  2. {pvnet-5.0.18 → pvnet-5.0.20}/PVNet.egg-info/PKG-INFO +32 -75
  3. {pvnet-5.0.18 → pvnet-5.0.20}/README.md +31 -74
  4. {pvnet-5.0.18 → pvnet-5.0.20}/LICENSE +0 -0
  5. {pvnet-5.0.18 → pvnet-5.0.20}/PVNet.egg-info/SOURCES.txt +0 -0
  6. {pvnet-5.0.18 → pvnet-5.0.20}/PVNet.egg-info/dependency_links.txt +0 -0
  7. {pvnet-5.0.18 → pvnet-5.0.20}/PVNet.egg-info/requires.txt +0 -0
  8. {pvnet-5.0.18 → pvnet-5.0.20}/PVNet.egg-info/top_level.txt +0 -0
  9. {pvnet-5.0.18 → pvnet-5.0.20}/pvnet/__init__.py +0 -0
  10. {pvnet-5.0.18 → pvnet-5.0.20}/pvnet/data/__init__.py +0 -0
  11. {pvnet-5.0.18 → pvnet-5.0.20}/pvnet/data/base_datamodule.py +0 -0
  12. {pvnet-5.0.18 → pvnet-5.0.20}/pvnet/data/site_datamodule.py +0 -0
  13. {pvnet-5.0.18 → pvnet-5.0.20}/pvnet/data/uk_regional_datamodule.py +0 -0
  14. {pvnet-5.0.18 → pvnet-5.0.20}/pvnet/load_model.py +0 -0
  15. {pvnet-5.0.18 → pvnet-5.0.20}/pvnet/models/__init__.py +0 -0
  16. {pvnet-5.0.18 → pvnet-5.0.20}/pvnet/models/base_model.py +0 -0
  17. {pvnet-5.0.18 → pvnet-5.0.20}/pvnet/models/ensemble.py +0 -0
  18. {pvnet-5.0.18 → pvnet-5.0.20}/pvnet/models/late_fusion/__init__.py +0 -0
  19. {pvnet-5.0.18 → pvnet-5.0.20}/pvnet/models/late_fusion/basic_blocks.py +0 -0
  20. {pvnet-5.0.18 → pvnet-5.0.20}/pvnet/models/late_fusion/encoders/__init__.py +0 -0
  21. {pvnet-5.0.18 → pvnet-5.0.20}/pvnet/models/late_fusion/encoders/basic_blocks.py +0 -0
  22. {pvnet-5.0.18 → pvnet-5.0.20}/pvnet/models/late_fusion/encoders/encoders3d.py +0 -0
  23. {pvnet-5.0.18 → pvnet-5.0.20}/pvnet/models/late_fusion/late_fusion.py +0 -0
  24. {pvnet-5.0.18 → pvnet-5.0.20}/pvnet/models/late_fusion/linear_networks/__init__.py +0 -0
  25. {pvnet-5.0.18 → pvnet-5.0.20}/pvnet/models/late_fusion/linear_networks/basic_blocks.py +0 -0
  26. {pvnet-5.0.18 → pvnet-5.0.20}/pvnet/models/late_fusion/linear_networks/networks.py +0 -0
  27. {pvnet-5.0.18 → pvnet-5.0.20}/pvnet/models/late_fusion/site_encoders/__init__.py +0 -0
  28. {pvnet-5.0.18 → pvnet-5.0.20}/pvnet/models/late_fusion/site_encoders/basic_blocks.py +0 -0
  29. {pvnet-5.0.18 → pvnet-5.0.20}/pvnet/models/late_fusion/site_encoders/encoders.py +0 -0
  30. {pvnet-5.0.18 → pvnet-5.0.20}/pvnet/optimizers.py +0 -0
  31. {pvnet-5.0.18 → pvnet-5.0.20}/pvnet/training/__init__.py +0 -0
  32. {pvnet-5.0.18 → pvnet-5.0.20}/pvnet/training/lightning_module.py +0 -0
  33. {pvnet-5.0.18 → pvnet-5.0.20}/pvnet/training/plots.py +0 -0
  34. {pvnet-5.0.18 → pvnet-5.0.20}/pvnet/training/train.py +0 -0
  35. {pvnet-5.0.18 → pvnet-5.0.20}/pvnet/utils.py +0 -0
  36. {pvnet-5.0.18 → pvnet-5.0.20}/pyproject.toml +0 -0
  37. {pvnet-5.0.18 → pvnet-5.0.20}/setup.cfg +0 -0
  38. {pvnet-5.0.18 → pvnet-5.0.20}/tests/test_end2end.py +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: PVNet
- Version: 5.0.18
+ Version: 5.0.20
  Summary: PVNet
  Author-email: Peter Dudfield <info@openclimatefix.org>
  Requires-Python: >=3.11
@@ -142,120 +142,77 @@ pip install -e <PATH-TO-ocf-data-sampler-REPO>
  If you install the local version of `ocf-data-sampler` that is more recent than the version
  specified in `PVNet` it is not guarenteed to function properly with this library.
  
- ## Pre-saving samples of data for training/validation of PVNet
+ ## Streaming samples (no pre-save)
  
- PVNet contains a script for generating samples of data suitable for training the PVNet models. To run the script you will need to make some modifications to the datamodule configuration.
+ PVNet now trains and validates directly from **streamed_samples** (i.e. no pre-saving to disk).
  
- Make sure you have copied the example configs (as already stated above):
- ```
+ Make sure you have copied example configs (as already stated above):
  cp -r configs.example configs
- ```
-
- ### Set up and config example for sample creation
-
- We will use the following example config file for creating samples: `/PVNet/configs/datamodule/configuration/example_configuration.yaml`. Ensure that the file paths are set to the correct locations in `example_configuration.yaml`: search for `PLACEHOLDER` to find where to input the location of the files. You will need to comment out or delete the parts of `example_configuration.yaml` pertaining to the data you are not using.
  
+ ### Set up and config example for streaming
  
- When creating samples, an additional datamodule config located in `PVNet/configs/datamodule` is passed into the sample creation script: `streamed_samples.yaml`. Like before, a placeholder variable is used when specifying which configuration to use:
+ We will use the following example config file to describe your data sources: `/PVNet/configs/datamodule/configuration/example_configuration.yaml`. Ensure that the file paths are set to the correct locations in `example_configuration.yaml`: search for `PLACEHOLDER` to find where to input the location of the files. Delete or comment the parts for data you are not using.
  
- ```yaml
- configuration: "PLACEHOLDER.yaml"
- ```
-
- This should be given the whole path to the config on your local machine, for example:
+ At run time, the datamodule config `PVNet/configs/datamodule/streamed_samples.yaml` points to your chosen configuration file:
  
- ```yaml
  configuration: "/FULL-PATH-TO-REPO/PVNet/configs/datamodule/configuration/example_configuration.yaml"
- ```
-
- Where `FULL-PATH-TO-REPO` represent the whole path to the PVNet repo on your local machine.
-
- This is also where you can update the train, val & test periods to cover the data you have access to.
-
- ### Running the sample creation script
-
- Run the `save_samples.py` script to create samples with the parameters specified in the datamodule config (`streamed_samples.yaml` in this example):
-
- ```bash
- python scripts/save_samples.py
- ```
- PVNet uses
- [hydra](https://hydra.cc/) which enables us to pass variables via the command
- line that will override the configuration defined in the `./configs` directory, like this:
-
- ```bash
- python scripts/save_samples.py datamodule=streamed_samples datamodule.sample_output_dir="./output" datamodule.num_train_samples=10 datamodule.num_val_samples=5
- ```
  
- `scripts/save_samples.py` needs a config under `PVNet/configs/datamodule`. You can adapt `streamed_samples.yaml` or create your own in the same folder.
+ You can also update train/val/test time ranges here to match the period you have access to.
  
  If downloading private data from a GCP bucket make sure to authenticate gcloud (the public satellite data does not need authentication):
  
- ```
  gcloud auth login
- ```
  
- Files stored in multiple locations can be added as a list. For example, in the `example_configuration.yaml` file we can supply a path to satellite data stored on a bucket:
+ You can provide multiple storage locations as a list. For example:
  
- ```yaml
  satellite:
- zarr_path: gs://solar-pv-nowcasting-data/satellite/EUMETSAT/SEVIRI_RSS/v4/2020_nonhrv.zarr
- ```
-
- Or to satellite data hosted by Google:
+ zarr_path:
+ - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2020_nonhrv.zarr"
+ - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2021_nonhrv.zarr"
  
- ```yaml
- satellite:
- zarr_path:
- - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2020_nonhrv.zarr"
- - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2021_nonhrv.zarr"
- ```
-
- ocf-data-sampler is currently set up to use 11 channels from the satellite data, the 12th of which is HRV and is not included in these.
+ `ocf-data-sampler` is currently set up to use 11 channels from the satellite data (the 12th, HRV, is not used).
  
+ ⚠️ NB: Our publicly accessible satellite data is currently saved with a blosc2 compressor, which is not supported by the tensorstore backend PVNet relies on now. We are in the process of updating this; for now, the paths above cannot be used with this codebase.
  
  ### Training PVNet
  
- How PVNet is run is determined by the extensive configuration in the config
- files. The configs stored in `PVNet/configs.example` should work with samples created using the steps and sample creation config mentioned above.
+ How PVNet is run is determined by the configuration files. The example configs in `PVNet/configs.example` work with **streamed_samples** using `datamodule/streamed_samples.yaml`.
  
- Make sure to update the following config files before training your model:
+ Update the following before training:
  
- 1. In `configs/datamodule/presaved_samples.yaml`:
- - update `sample_dir` to point to the directory you stored your samples in during sample creation
- 2. In `configs/model/late_fusion.yaml`:
- - update the list of encoders to reflect the data sources you are using. If you are using different NWP sources, the encoders for these should follow the same structure with two important updates:
- - `in_channels`: number of variables your NWP source supplies
- - `image_size_pixels`: spatial crop of your NWP data. It depends on the spatial resolution of your NWP; should match `image_size_pixels_height` and/or `image_size_pixels_width` in `datamodule/configuration/site_example_configuration.yaml` for the NWP, unless transformations such as coarsening was applied (e. g. as for ECMWF data)
- 3. In `configs/trainer/default.yaml`:
- - set `accelerator: 0` if running on a system without a supported GPU
+ 1. In `configs/model/late_fusion.yaml`:
+ - Update the list of encoders to match the data sources you are using. For different NWP sources, keep the same structure but ensure:
+ - `in_channels`: the number of variables your NWP source supplies
+ - `image_size_pixels`: spatial crop matching your NWP resolution and the settings in your datamodule configuration (unless you coarsened, e.g. for ECMWF)
+ 2. In `configs/trainer/default.yaml`:
+ - Set `accelerator: 0` if running on a system without a supported GPU
+ 3. In `configs/datamodule/streamed_samples.yaml`:
+ - Point `configuration:` to your local `example_configuration.yaml` (or your custom one)
+ - Adjust the train/val/test time ranges to your available data
  
- If creating copies of the config files instead of modifying existing ones, update `defaults` in the main `./configs/config.yaml` file to use
- your customised config files:
+ If you create custom config files, update the main `./configs/config.yaml` defaults:
  
- ```yaml
  defaults:
  - trainer: default.yaml
  - model: late_fusion.yaml
- - datamodule: presaved_samples.yaml
+ - datamodule: streamed_samples.yaml
  - callbacks: null
  - experiment: null
  - hparams_search: null
  - hydra: default.yaml
- ```
  
- Assuming you ran the `save_samples.py` script to generate some presaved train and
- val data samples, you can now train PVNet by running:
+ Now train PVNet:
  
- ```
  python run.py
- ```
+
+ You can override any setting with Hydra, e.g.:
+
+ python run.py datamodule=streamed_samples datamodule.configuration="/FULL-PATH/PVNet/configs/datamodule/configuration/example_configuration.yaml"
  
  ## Backtest
  
  If you have successfully trained a PVNet model and have a saved model checkpoint you can create a backtest using this, e.g. forecasts on historical data to evaluate forecast accuracy/skill. This can be done by running one of the scripts in this repo such as [the UK GSP backtest script](scripts/backtest_uk_gsp.py) or the [the pv site backtest script](scripts/backtest_sites.py), further info on how to run these are in each backtest file.
  
-
  ## Testing
  
  You can use `python -m pytest tests` to run tests
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: PVNet
- Version: 5.0.18
+ Version: 5.0.20
  Summary: PVNet
  Author-email: Peter Dudfield <info@openclimatefix.org>
  Requires-Python: >=3.11
@@ -142,120 +142,77 @@ pip install -e <PATH-TO-ocf-data-sampler-REPO>
  If you install the local version of `ocf-data-sampler` that is more recent than the version
  specified in `PVNet` it is not guarenteed to function properly with this library.
  
- ## Pre-saving samples of data for training/validation of PVNet
+ ## Streaming samples (no pre-save)
  
- PVNet contains a script for generating samples of data suitable for training the PVNet models. To run the script you will need to make some modifications to the datamodule configuration.
+ PVNet now trains and validates directly from **streamed_samples** (i.e. no pre-saving to disk).
  
- Make sure you have copied the example configs (as already stated above):
- ```
+ Make sure you have copied example configs (as already stated above):
  cp -r configs.example configs
- ```
-
- ### Set up and config example for sample creation
-
- We will use the following example config file for creating samples: `/PVNet/configs/datamodule/configuration/example_configuration.yaml`. Ensure that the file paths are set to the correct locations in `example_configuration.yaml`: search for `PLACEHOLDER` to find where to input the location of the files. You will need to comment out or delete the parts of `example_configuration.yaml` pertaining to the data you are not using.
  
+ ### Set up and config example for streaming
  
- When creating samples, an additional datamodule config located in `PVNet/configs/datamodule` is passed into the sample creation script: `streamed_samples.yaml`. Like before, a placeholder variable is used when specifying which configuration to use:
+ We will use the following example config file to describe your data sources: `/PVNet/configs/datamodule/configuration/example_configuration.yaml`. Ensure that the file paths are set to the correct locations in `example_configuration.yaml`: search for `PLACEHOLDER` to find where to input the location of the files. Delete or comment the parts for data you are not using.
  
- ```yaml
- configuration: "PLACEHOLDER.yaml"
- ```
-
- This should be given the whole path to the config on your local machine, for example:
+ At run time, the datamodule config `PVNet/configs/datamodule/streamed_samples.yaml` points to your chosen configuration file:
  
- ```yaml
  configuration: "/FULL-PATH-TO-REPO/PVNet/configs/datamodule/configuration/example_configuration.yaml"
- ```
-
- Where `FULL-PATH-TO-REPO` represent the whole path to the PVNet repo on your local machine.
-
- This is also where you can update the train, val & test periods to cover the data you have access to.
-
- ### Running the sample creation script
-
- Run the `save_samples.py` script to create samples with the parameters specified in the datamodule config (`streamed_samples.yaml` in this example):
-
- ```bash
- python scripts/save_samples.py
- ```
- PVNet uses
- [hydra](https://hydra.cc/) which enables us to pass variables via the command
- line that will override the configuration defined in the `./configs` directory, like this:
-
- ```bash
- python scripts/save_samples.py datamodule=streamed_samples datamodule.sample_output_dir="./output" datamodule.num_train_samples=10 datamodule.num_val_samples=5
- ```
  
- `scripts/save_samples.py` needs a config under `PVNet/configs/datamodule`. You can adapt `streamed_samples.yaml` or create your own in the same folder.
+ You can also update train/val/test time ranges here to match the period you have access to.
  
  If downloading private data from a GCP bucket make sure to authenticate gcloud (the public satellite data does not need authentication):
  
- ```
  gcloud auth login
- ```
  
- Files stored in multiple locations can be added as a list. For example, in the `example_configuration.yaml` file we can supply a path to satellite data stored on a bucket:
+ You can provide multiple storage locations as a list. For example:
  
- ```yaml
  satellite:
- zarr_path: gs://solar-pv-nowcasting-data/satellite/EUMETSAT/SEVIRI_RSS/v4/2020_nonhrv.zarr
- ```
-
- Or to satellite data hosted by Google:
+ zarr_path:
+ - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2020_nonhrv.zarr"
+ - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2021_nonhrv.zarr"
  
- ```yaml
- satellite:
- zarr_path:
- - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2020_nonhrv.zarr"
- - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2021_nonhrv.zarr"
- ```
-
- ocf-data-sampler is currently set up to use 11 channels from the satellite data, the 12th of which is HRV and is not included in these.
+ `ocf-data-sampler` is currently set up to use 11 channels from the satellite data (the 12th, HRV, is not used).
  
+ ⚠️ NB: Our publicly accessible satellite data is currently saved with a blosc2 compressor, which is not supported by the tensorstore backend PVNet relies on now. We are in the process of updating this; for now, the paths above cannot be used with this codebase.
  
  ### Training PVNet
  
- How PVNet is run is determined by the extensive configuration in the config
- files. The configs stored in `PVNet/configs.example` should work with samples created using the steps and sample creation config mentioned above.
+ How PVNet is run is determined by the configuration files. The example configs in `PVNet/configs.example` work with **streamed_samples** using `datamodule/streamed_samples.yaml`.
  
- Make sure to update the following config files before training your model:
+ Update the following before training:
  
- 1. In `configs/datamodule/presaved_samples.yaml`:
- - update `sample_dir` to point to the directory you stored your samples in during sample creation
- 2. In `configs/model/late_fusion.yaml`:
- - update the list of encoders to reflect the data sources you are using. If you are using different NWP sources, the encoders for these should follow the same structure with two important updates:
- - `in_channels`: number of variables your NWP source supplies
- - `image_size_pixels`: spatial crop of your NWP data. It depends on the spatial resolution of your NWP; should match `image_size_pixels_height` and/or `image_size_pixels_width` in `datamodule/configuration/site_example_configuration.yaml` for the NWP, unless transformations such as coarsening was applied (e. g. as for ECMWF data)
- 3. In `configs/trainer/default.yaml`:
- - set `accelerator: 0` if running on a system without a supported GPU
+ 1. In `configs/model/late_fusion.yaml`:
+ - Update the list of encoders to match the data sources you are using. For different NWP sources, keep the same structure but ensure:
+ - `in_channels`: the number of variables your NWP source supplies
+ - `image_size_pixels`: spatial crop matching your NWP resolution and the settings in your datamodule configuration (unless you coarsened, e.g. for ECMWF)
+ 2. In `configs/trainer/default.yaml`:
+ - Set `accelerator: 0` if running on a system without a supported GPU
+ 3. In `configs/datamodule/streamed_samples.yaml`:
+ - Point `configuration:` to your local `example_configuration.yaml` (or your custom one)
+ - Adjust the train/val/test time ranges to your available data
  
- If creating copies of the config files instead of modifying existing ones, update `defaults` in the main `./configs/config.yaml` file to use
- your customised config files:
+ If you create custom config files, update the main `./configs/config.yaml` defaults:
  
- ```yaml
  defaults:
  - trainer: default.yaml
  - model: late_fusion.yaml
- - datamodule: presaved_samples.yaml
+ - datamodule: streamed_samples.yaml
  - callbacks: null
  - experiment: null
  - hparams_search: null
  - hydra: default.yaml
- ```
  
- Assuming you ran the `save_samples.py` script to generate some presaved train and
- val data samples, you can now train PVNet by running:
+ Now train PVNet:
  
- ```
  python run.py
- ```
+
+ You can override any setting with Hydra, e.g.:
+
+ python run.py datamodule=streamed_samples datamodule.configuration="/FULL-PATH/PVNet/configs/datamodule/configuration/example_configuration.yaml"
  
  ## Backtest
  
  If you have successfully trained a PVNet model and have a saved model checkpoint you can create a backtest using this, e.g. forecasts on historical data to evaluate forecast accuracy/skill. This can be done by running one of the scripts in this repo such as [the UK GSP backtest script](scripts/backtest_uk_gsp.py) or the [the pv site backtest script](scripts/backtest_sites.py), further info on how to run these are in each backtest file.
  
-
  ## Testing
  
  You can use `python -m pytest tests` to run tests
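The updated README text in the hunk above still asks the reader to search the example configuration for `PLACEHOLDER` before running. A minimal sketch of automating that search follows; the helper name `find_placeholders` and the choice to scan every `*.yaml` file under the directory are assumptions for illustration and are not part of PVNet.

```python
from pathlib import Path


def find_placeholders(config_dir, marker="PLACEHOLDER"):
    """Return (path, line_number, stripped_line) for each YAML line containing marker.

    Hypothetical helper: PVNet does not ship this function.
    """
    hits = []
    # Walk every .yaml file below config_dir in a stable order.
    for path in sorted(Path(config_dir).rglob("*.yaml")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if marker in line:
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Calling `find_placeholders("configs")` after `cp -r configs.example configs` would list any paths that still need filling in; an empty list suggests no placeholders remain.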
@@ -113,120 +113,77 @@ pip install -e <PATH-TO-ocf-data-sampler-REPO>
  If you install the local version of `ocf-data-sampler` that is more recent than the version
  specified in `PVNet` it is not guarenteed to function properly with this library.
  
- ## Pre-saving samples of data for training/validation of PVNet
+ ## Streaming samples (no pre-save)
  
- PVNet contains a script for generating samples of data suitable for training the PVNet models. To run the script you will need to make some modifications to the datamodule configuration.
+ PVNet now trains and validates directly from **streamed_samples** (i.e. no pre-saving to disk).
  
- Make sure you have copied the example configs (as already stated above):
- ```
+ Make sure you have copied example configs (as already stated above):
  cp -r configs.example configs
- ```
-
- ### Set up and config example for sample creation
-
- We will use the following example config file for creating samples: `/PVNet/configs/datamodule/configuration/example_configuration.yaml`. Ensure that the file paths are set to the correct locations in `example_configuration.yaml`: search for `PLACEHOLDER` to find where to input the location of the files. You will need to comment out or delete the parts of `example_configuration.yaml` pertaining to the data you are not using.
  
+ ### Set up and config example for streaming
  
- When creating samples, an additional datamodule config located in `PVNet/configs/datamodule` is passed into the sample creation script: `streamed_samples.yaml`. Like before, a placeholder variable is used when specifying which configuration to use:
+ We will use the following example config file to describe your data sources: `/PVNet/configs/datamodule/configuration/example_configuration.yaml`. Ensure that the file paths are set to the correct locations in `example_configuration.yaml`: search for `PLACEHOLDER` to find where to input the location of the files. Delete or comment the parts for data you are not using.
  
- ```yaml
- configuration: "PLACEHOLDER.yaml"
- ```
-
- This should be given the whole path to the config on your local machine, for example:
+ At run time, the datamodule config `PVNet/configs/datamodule/streamed_samples.yaml` points to your chosen configuration file:
  
- ```yaml
  configuration: "/FULL-PATH-TO-REPO/PVNet/configs/datamodule/configuration/example_configuration.yaml"
- ```
-
- Where `FULL-PATH-TO-REPO` represent the whole path to the PVNet repo on your local machine.
-
- This is also where you can update the train, val & test periods to cover the data you have access to.
-
- ### Running the sample creation script
-
- Run the `save_samples.py` script to create samples with the parameters specified in the datamodule config (`streamed_samples.yaml` in this example):
-
- ```bash
- python scripts/save_samples.py
- ```
- PVNet uses
- [hydra](https://hydra.cc/) which enables us to pass variables via the command
- line that will override the configuration defined in the `./configs` directory, like this:
-
- ```bash
- python scripts/save_samples.py datamodule=streamed_samples datamodule.sample_output_dir="./output" datamodule.num_train_samples=10 datamodule.num_val_samples=5
- ```
  
- `scripts/save_samples.py` needs a config under `PVNet/configs/datamodule`. You can adapt `streamed_samples.yaml` or create your own in the same folder.
+ You can also update train/val/test time ranges here to match the period you have access to.
  
  If downloading private data from a GCP bucket make sure to authenticate gcloud (the public satellite data does not need authentication):
  
- ```
  gcloud auth login
- ```
  
- Files stored in multiple locations can be added as a list. For example, in the `example_configuration.yaml` file we can supply a path to satellite data stored on a bucket:
+ You can provide multiple storage locations as a list. For example:
  
- ```yaml
  satellite:
- zarr_path: gs://solar-pv-nowcasting-data/satellite/EUMETSAT/SEVIRI_RSS/v4/2020_nonhrv.zarr
- ```
-
- Or to satellite data hosted by Google:
+ zarr_path:
+ - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2020_nonhrv.zarr"
+ - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2021_nonhrv.zarr"
  
- ```yaml
- satellite:
- zarr_path:
- - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2020_nonhrv.zarr"
- - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2021_nonhrv.zarr"
- ```
-
- ocf-data-sampler is currently set up to use 11 channels from the satellite data, the 12th of which is HRV and is not included in these.
+ `ocf-data-sampler` is currently set up to use 11 channels from the satellite data (the 12th, HRV, is not used).
  
+ ⚠️ NB: Our publicly accessible satellite data is currently saved with a blosc2 compressor, which is not supported by the tensorstore backend PVNet relies on now. We are in the process of updating this; for now, the paths above cannot be used with this codebase.
  
  ### Training PVNet
  
- How PVNet is run is determined by the extensive configuration in the config
- files. The configs stored in `PVNet/configs.example` should work with samples created using the steps and sample creation config mentioned above.
+ How PVNet is run is determined by the configuration files. The example configs in `PVNet/configs.example` work with **streamed_samples** using `datamodule/streamed_samples.yaml`.
  
- Make sure to update the following config files before training your model:
+ Update the following before training:
  
- 1. In `configs/datamodule/presaved_samples.yaml`:
- - update `sample_dir` to point to the directory you stored your samples in during sample creation
- 2. In `configs/model/late_fusion.yaml`:
- - update the list of encoders to reflect the data sources you are using. If you are using different NWP sources, the encoders for these should follow the same structure with two important updates:
- - `in_channels`: number of variables your NWP source supplies
- - `image_size_pixels`: spatial crop of your NWP data. It depends on the spatial resolution of your NWP; should match `image_size_pixels_height` and/or `image_size_pixels_width` in `datamodule/configuration/site_example_configuration.yaml` for the NWP, unless transformations such as coarsening was applied (e. g. as for ECMWF data)
- 3. In `configs/trainer/default.yaml`:
- - set `accelerator: 0` if running on a system without a supported GPU
+ 1. In `configs/model/late_fusion.yaml`:
+ - Update the list of encoders to match the data sources you are using. For different NWP sources, keep the same structure but ensure:
+ - `in_channels`: the number of variables your NWP source supplies
+ - `image_size_pixels`: spatial crop matching your NWP resolution and the settings in your datamodule configuration (unless you coarsened, e.g. for ECMWF)
+ 2. In `configs/trainer/default.yaml`:
+ - Set `accelerator: 0` if running on a system without a supported GPU
+ 3. In `configs/datamodule/streamed_samples.yaml`:
+ - Point `configuration:` to your local `example_configuration.yaml` (or your custom one)
+ - Adjust the train/val/test time ranges to your available data
  
- If creating copies of the config files instead of modifying existing ones, update `defaults` in the main `./configs/config.yaml` file to use
- your customised config files:
+ If you create custom config files, update the main `./configs/config.yaml` defaults:
  
- ```yaml
  defaults:
  - trainer: default.yaml
  - model: late_fusion.yaml
- - datamodule: presaved_samples.yaml
+ - datamodule: streamed_samples.yaml
  - callbacks: null
  - experiment: null
  - hparams_search: null
  - hydra: default.yaml
- ```
  
- Assuming you ran the `save_samples.py` script to generate some presaved train and
- val data samples, you can now train PVNet by running:
+ Now train PVNet:
  
- ```
  python run.py
- ```
+
+ You can override any setting with Hydra, e.g.:
+
+ python run.py datamodule=streamed_samples datamodule.configuration="/FULL-PATH/PVNet/configs/datamodule/configuration/example_configuration.yaml"
  
  ## Backtest
  
  If you have successfully trained a PVNet model and have a saved model checkpoint you can create a backtest using this, e.g. forecasts on historical data to evaluate forecast accuracy/skill. This can be done by running one of the scripts in this repo such as [the UK GSP backtest script](scripts/backtest_uk_gsp.py) or the [the pv site backtest script](scripts/backtest_sites.py), further info on how to run these are in each backtest file.
  
-
  ## Testing
  
  You can use `python -m pytest tests` to run tests
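The new warning in the README hunk above says the public satellite data is saved with a blosc2 compressor that the tensorstore backend does not support. One way to see which compressor a store uses is to read its array metadata. The sketch below assumes a local copy of the store in Zarr v2 layout, where each array directory holds a `.zarray` JSON file with a `compressor` entry; the helper name `compressor_ids` is hypothetical, and the sketch makes no claim about how the public bucket itself is laid out.

```python
import json
from pathlib import Path


def compressor_ids(store_dir):
    """Map each array in a local Zarr v2 store to its compressor id.

    Hypothetical helper, not part of PVNet or ocf-data-sampler.
    Arrays stored without compression map to None.
    """
    ids = {}
    # Each Zarr v2 array directory contains a .zarray metadata file.
    for meta in sorted(Path(store_dir).rglob(".zarray")):
        compressor = json.loads(meta.read_text(encoding="utf-8")).get("compressor")
        name = meta.parent.relative_to(store_dir).as_posix()
        ids[name] = compressor["id"] if compressor else None
    return ids
```

The function only reports what the metadata says; deciding whether a given id is usable with a particular backend is left to the reader.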
File without changes