PVNet 5.0.15__tar.gz → 5.2.3__tar.gz

Files changed (40)
  1. {pvnet-5.0.15 → pvnet-5.2.3}/PKG-INFO +33 -81
  2. {pvnet-5.0.15 → pvnet-5.2.3}/PVNet.egg-info/PKG-INFO +33 -81
  3. {pvnet-5.0.15 → pvnet-5.2.3}/PVNet.egg-info/SOURCES.txt +2 -4
  4. {pvnet-5.0.15 → pvnet-5.2.3}/PVNet.egg-info/requires.txt +1 -1
  5. {pvnet-5.0.15 → pvnet-5.2.3}/README.md +30 -78
  6. pvnet-5.0.15/pvnet/data/base_datamodule.py → pvnet-5.2.3/pvnet/datamodule.py +22 -90
  7. {pvnet-5.0.15 → pvnet-5.2.3}/pvnet/load_model.py +2 -0
  8. {pvnet-5.0.15 → pvnet-5.2.3}/pvnet/models/base_model.py +1 -1
  9. {pvnet-5.0.15 → pvnet-5.2.3}/pvnet/models/late_fusion/encoders/basic_blocks.py +1 -0
  10. {pvnet-5.0.15 → pvnet-5.2.3}/pvnet/models/late_fusion/encoders/encoders3d.py +24 -11
  11. {pvnet-5.0.15 → pvnet-5.2.3}/pvnet/models/late_fusion/late_fusion.py +12 -6
  12. {pvnet-5.0.15 → pvnet-5.2.3}/pvnet/training/lightning_module.py +17 -10
  13. {pvnet-5.0.15 → pvnet-5.2.3}/pvnet/training/train.py +1 -23
  14. pvnet-5.2.3/pvnet/utils.py +169 -0
  15. {pvnet-5.0.15 → pvnet-5.2.3}/pyproject.toml +2 -2
  16. pvnet-5.2.3/tests/test_datamodule.py +15 -0
  17. {pvnet-5.0.15 → pvnet-5.2.3}/tests/test_end2end.py +6 -5
  18. pvnet-5.0.15/pvnet/data/__init__.py +0 -3
  19. pvnet-5.0.15/pvnet/data/site_datamodule.py +0 -29
  20. pvnet-5.0.15/pvnet/data/uk_regional_datamodule.py +0 -29
  21. pvnet-5.0.15/pvnet/utils.py +0 -88
  22. {pvnet-5.0.15 → pvnet-5.2.3}/LICENSE +0 -0
  23. {pvnet-5.0.15 → pvnet-5.2.3}/PVNet.egg-info/dependency_links.txt +0 -0
  24. {pvnet-5.0.15 → pvnet-5.2.3}/PVNet.egg-info/top_level.txt +0 -0
  25. {pvnet-5.0.15 → pvnet-5.2.3}/pvnet/__init__.py +0 -0
  26. {pvnet-5.0.15 → pvnet-5.2.3}/pvnet/models/__init__.py +0 -0
  27. {pvnet-5.0.15 → pvnet-5.2.3}/pvnet/models/ensemble.py +0 -0
  28. {pvnet-5.0.15 → pvnet-5.2.3}/pvnet/models/late_fusion/__init__.py +0 -0
  29. {pvnet-5.0.15 → pvnet-5.2.3}/pvnet/models/late_fusion/basic_blocks.py +0 -0
  30. {pvnet-5.0.15 → pvnet-5.2.3}/pvnet/models/late_fusion/encoders/__init__.py +0 -0
  31. {pvnet-5.0.15 → pvnet-5.2.3}/pvnet/models/late_fusion/linear_networks/__init__.py +0 -0
  32. {pvnet-5.0.15 → pvnet-5.2.3}/pvnet/models/late_fusion/linear_networks/basic_blocks.py +0 -0
  33. {pvnet-5.0.15 → pvnet-5.2.3}/pvnet/models/late_fusion/linear_networks/networks.py +0 -0
  34. {pvnet-5.0.15 → pvnet-5.2.3}/pvnet/models/late_fusion/site_encoders/__init__.py +0 -0
  35. {pvnet-5.0.15 → pvnet-5.2.3}/pvnet/models/late_fusion/site_encoders/basic_blocks.py +0 -0
  36. {pvnet-5.0.15 → pvnet-5.2.3}/pvnet/models/late_fusion/site_encoders/encoders.py +0 -0
  37. {pvnet-5.0.15 → pvnet-5.2.3}/pvnet/optimizers.py +0 -0
  38. {pvnet-5.0.15 → pvnet-5.2.3}/pvnet/training/__init__.py +0 -0
  39. {pvnet-5.0.15 → pvnet-5.2.3}/pvnet/training/plots.py +0 -0
  40. {pvnet-5.0.15 → pvnet-5.2.3}/setup.cfg +0 -0
@@ -1,12 +1,12 @@
  Metadata-Version: 2.4
  Name: PVNet
- Version: 5.0.15
+ Version: 5.2.3
  Summary: PVNet
  Author-email: Peter Dudfield <info@openclimatefix.org>
- Requires-Python: >=3.11
+ Requires-Python: <3.14,>=3.11
  Description-Content-Type: text/markdown
  License-File: LICENSE
- Requires-Dist: ocf-data-sampler>=0.5.20
+ Requires-Dist: ocf-data-sampler>=0.6.0
  Requires-Dist: numpy
  Requires-Dist: pandas
  Requires-Dist: matplotlib
@@ -29,7 +29,7 @@ Dynamic: license-file

  # PVNet
  <!-- ALL-CONTRIBUTORS-BADGE:START - Do not remove or modify this section -->
- [![All Contributors](https://img.shields.io/badge/all_contributors-20-orange.svg?style=flat-square)](#contributors-)
+ [![All Contributors](https://img.shields.io/badge/all_contributors-21-orange.svg?style=flat-square)](#contributors-)
  <!-- ALL-CONTRIBUTORS-BADGE:END -->

  [![tags badge](https://img.shields.io/github/v/tag/openclimatefix/PVNet?include_prereleases&sort=semver&color=FFAC5F)](https://github.com/openclimatefix/PVNet/tags)
@@ -142,120 +142,71 @@ pip install -e <PATH-TO-ocf-data-sampler-REPO>
  If you install the local version of `ocf-data-sampler` that is more recent than the version
  specified in `PVNet` it is not guarenteed to function properly with this library.

- ## Pre-saving samples of data for training/validation of PVNet

- PVNet contains a script for generating samples of data suitable for training the PVNet models. To run the script you will need to make some modifications to the datamodule configuration.
+ ### Set up and config example for streaming

- Make sure you have copied the example configs (as already stated above):
- ```
- cp -r configs.example configs
- ```
-
- ### Set up and config example for sample creation
-
- We will use the following example config file for creating samples: `/PVNet/configs/datamodule/configuration/example_configuration.yaml`. Ensure that the file paths are set to the correct locations in `example_configuration.yaml`: search for `PLACEHOLDER` to find where to input the location of the files. You will need to comment out or delete the parts of `example_configuration.yaml` pertaining to the data you are not using.
-
-
- When creating samples, an additional datamodule config located in `PVNet/configs/datamodule` is passed into the sample creation script: `streamed_samples.yaml`. Like before, a placeholder variable is used when specifying which configuration to use:
-
- ```yaml
- configuration: "PLACEHOLDER.yaml"
- ```
+ We will use the following example config file to describe your data sources: `/PVNet/configs/datamodule/configuration/example_configuration.yaml`. Ensure that the file paths are set to the correct locations in `example_configuration.yaml`: search for `PLACEHOLDER` to find where to input the location of the files. Delete or comment the parts for data you are not using.

- This should be given the whole path to the config on your local machine, for example:
+ At run time, the datamodule config `PVNet/configs/datamodule/streamed_samples.yaml` points to your chosen configuration file:

- ```yaml
  configuration: "/FULL-PATH-TO-REPO/PVNet/configs/datamodule/configuration/example_configuration.yaml"
- ```
-
- Where `FULL-PATH-TO-REPO` represent the whole path to the PVNet repo on your local machine.
-
- This is also where you can update the train, val & test periods to cover the data you have access to.
-
- ### Running the sample creation script

- Run the `save_samples.py` script to create samples with the parameters specified in the datamodule config (`streamed_samples.yaml` in this example):
-
- ```bash
- python scripts/save_samples.py
- ```
- PVNet uses
- [hydra](https://hydra.cc/) which enables us to pass variables via the command
- line that will override the configuration defined in the `./configs` directory, like this:
-
- ```bash
- python scripts/save_samples.py datamodule=streamed_samples datamodule.sample_output_dir="./output" datamodule.num_train_samples=10 datamodule.num_val_samples=5
- ```
-
- `scripts/save_samples.py` needs a config under `PVNet/configs/datamodule`. You can adapt `streamed_samples.yaml` or create your own in the same folder.
+ You can also update train/val/test time ranges here to match the period you have access to.

  If downloading private data from a GCP bucket make sure to authenticate gcloud (the public satellite data does not need authentication):

- ```
  gcloud auth login
- ```
-
- Files stored in multiple locations can be added as a list. For example, in the `example_configuration.yaml` file we can supply a path to satellite data stored on a bucket:
-
- ```yaml
- satellite:
- zarr_path: gs://solar-pv-nowcasting-data/satellite/EUMETSAT/SEVIRI_RSS/v4/2020_nonhrv.zarr
- ```

- Or to satellite data hosted by Google:
+ You can provide multiple storage locations as a list. For example:

- ```yaml
  satellite:
- zarr_path:
- - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2020_nonhrv.zarr"
- - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2021_nonhrv.zarr"
- ```
+ zarr_path:
+ - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2020_nonhrv.zarr"
+ - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2021_nonhrv.zarr"

- ocf-data-sampler is currently set up to use 11 channels from the satellite data, the 12th of which is HRV and is not included in these.
+ `ocf-data-sampler` is currently set up to use 11 channels from the satellite data (the 12th, HRV, is not used).

+ ⚠️ NB: Our publicly accessible satellite data is currently saved with a blosc2 compressor, which is not supported by the tensorstore backend PVNet relies on now. We are in the process of updating this; for now, the paths above cannot be used with this codebase.

  ### Training PVNet

- How PVNet is run is determined by the extensive configuration in the config
- files. The configs stored in `PVNet/configs.example` should work with samples created using the steps and sample creation config mentioned above.
+ How PVNet is run is determined by the configuration files. The example configs in `PVNet/configs.example` work with **streamed_samples** using `datamodule/streamed_samples.yaml`.

- Make sure to update the following config files before training your model:
+ Update the following before training:

- 1. In `configs/datamodule/presaved_samples.yaml`:
- - update `sample_dir` to point to the directory you stored your samples in during sample creation
- 2. In `configs/model/late_fusion.yaml`:
- - update the list of encoders to reflect the data sources you are using. If you are using different NWP sources, the encoders for these should follow the same structure with two important updates:
- - `in_channels`: number of variables your NWP source supplies
- - `image_size_pixels`: spatial crop of your NWP data. It depends on the spatial resolution of your NWP; should match `image_size_pixels_height` and/or `image_size_pixels_width` in `datamodule/configuration/site_example_configuration.yaml` for the NWP, unless transformations such as coarsening was applied (e. g. as for ECMWF data)
- 3. In `configs/trainer/default.yaml`:
- - set `accelerator: 0` if running on a system without a supported GPU
+ 1. In `configs/model/late_fusion.yaml`:
+ - Update the list of encoders to match the data sources you are using. For different NWP sources, keep the same structure but ensure:
+ - `in_channels`: the number of variables your NWP source supplies
+ - `image_size_pixels`: spatial crop matching your NWP resolution and the settings in your datamodule configuration (unless you coarsened, e.g. for ECMWF)
+ 2. In `configs/trainer/default.yaml`:
+ - Set `accelerator: 0` if running on a system without a supported GPU
+ 3. In `configs/datamodule/streamed_samples.yaml`:
+ - Point `configuration:` to your local `example_configuration.yaml` (or your custom one)
+ - Adjust the train/val/test time ranges to your available data

- If creating copies of the config files instead of modifying existing ones, update `defaults` in the main `./configs/config.yaml` file to use
- your customised config files:
+ If you create custom config files, update the main `./configs/config.yaml` defaults:

- ```yaml
  defaults:
  - trainer: default.yaml
  - model: late_fusion.yaml
- - datamodule: presaved_samples.yaml
+ - datamodule: streamed_samples.yaml
  - callbacks: null
  - experiment: null
  - hparams_search: null
  - hydra: default.yaml
- ```

- Assuming you ran the `save_samples.py` script to generate some presaved train and
- val data samples, you can now train PVNet by running:
+ Now train PVNet:

- ```
  python run.py
- ```
+
+ You can override any setting with Hydra, e.g.:
+
+ python run.py datamodule=streamed_samples datamodule.configuration="/FULL-PATH/PVNet/configs/datamodule/configuration/example_configuration.yaml"

  ## Backtest

  If you have successfully trained a PVNet model and have a saved model checkpoint you can create a backtest using this, e.g. forecasts on historical data to evaluate forecast accuracy/skill. This can be done by running one of the scripts in this repo such as [the UK GSP backtest script](scripts/backtest_uk_gsp.py) or the [the pv site backtest script](scripts/backtest_sites.py), further info on how to run these are in each backtest file.

-
  ## Testing

  You can use `python -m pytest tests` to run tests
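The Hydra command added in the hunk above sets a nested config key from the command line. As a rough illustration of what a dotted `key=value` override means, here is a toy sketch that applies such overrides to a plain nested dict. It is not Hydra's implementation: `apply_overrides` is a hypothetical helper, values are kept as strings, and config-group selection such as `datamodule=streamed_samples` (which swaps in a whole YAML file) is deliberately not modelled.

```python
from copy import deepcopy


def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Return a copy of `config` with dotted `a.b.c=value` overrides applied.

    Hypothetical helper for illustration only; Hydra's real override grammar
    is richer (types, lists, group selection, additions, deletions).
    """
    result = deepcopy(config)
    for item in overrides:
        dotted_key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Create intermediate mappings on demand.
            node = node.setdefault(name, {})
        # Strip the shell-style quotes around the value, if any.
        node[leaf] = value.strip('"')
    return result


base = {"datamodule": {"configuration": "PLACEHOLDER.yaml"}}
updated = apply_overrides(
    base,
    [
        'datamodule.configuration="/FULL-PATH/PVNet/configs/datamodule/'
        'configuration/example_configuration.yaml"'
    ],
)
print(updated["datamodule"]["configuration"])
```

The copy is deliberate: the base config stays untouched, which mirrors the idea that command-line overrides apply to one run without editing the files under `./configs`.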
@@ -294,6 +245,7 @@ Thanks goes to these wonderful people ([emoji key](https://allcontributors.org/d
  <td align="center" valign="top" width="14.28%"><a href="https://github.com/markus-kreft"><img src="https://avatars.githubusercontent.com/u/129367085?v=4?s=100" width="100px;" alt="Markus Kreft"/><br /><sub><b>Markus Kreft</b></sub></a><br /><a href="https://github.com/openclimatefix/pvnet/commits?author=markus-kreft" title="Code">💻</a></td>
  <td align="center" valign="top" width="14.28%"><a href="http://jack-kelly.com"><img src="https://avatars.githubusercontent.com/u/460756?v=4?s=100" width="100px;" alt="Jack Kelly"/><br /><sub><b>Jack Kelly</b></sub></a><br /><a href="#ideas-JackKelly" title="Ideas, Planning, & Feedback">🤔</a></td>
  <td align="center" valign="top" width="14.28%"><a href="https://github.com/zaryab-ali"><img src="https://avatars.githubusercontent.com/u/85732412?v=4?s=100" width="100px;" alt="zaryab-ali"/><br /><sub><b>zaryab-ali</b></sub></a><br /><a href="https://github.com/openclimatefix/pvnet/commits?author=zaryab-ali" title="Code">💻</a></td>
+ <td align="center" valign="top" width="14.28%"><a href="https://github.com/Lex-Ashu"><img src="https://avatars.githubusercontent.com/u/181084934?v=4?s=100" width="100px;" alt="Lex-Ashu"/><br /><sub><b>Lex-Ashu</b></sub></a><br /><a href="https://github.com/openclimatefix/pvnet/commits?author=Lex-Ashu" title="Code">💻</a></td>
  </tr>
  </tbody>
  </table>
@@ -1,12 +1,12 @@
  Metadata-Version: 2.4
  Name: PVNet
- Version: 5.0.15
+ Version: 5.2.3
  Summary: PVNet
  Author-email: Peter Dudfield <info@openclimatefix.org>
- Requires-Python: >=3.11
+ Requires-Python: <3.14,>=3.11
  Description-Content-Type: text/markdown
  License-File: LICENSE
- Requires-Dist: ocf-data-sampler>=0.5.20
+ Requires-Dist: ocf-data-sampler>=0.6.0
  Requires-Dist: numpy
  Requires-Dist: pandas
  Requires-Dist: matplotlib
@@ -29,7 +29,7 @@ Dynamic: license-file

  # PVNet
  <!-- ALL-CONTRIBUTORS-BADGE:START - Do not remove or modify this section -->
- [![All Contributors](https://img.shields.io/badge/all_contributors-20-orange.svg?style=flat-square)](#contributors-)
+ [![All Contributors](https://img.shields.io/badge/all_contributors-21-orange.svg?style=flat-square)](#contributors-)
  <!-- ALL-CONTRIBUTORS-BADGE:END -->

  [![tags badge](https://img.shields.io/github/v/tag/openclimatefix/PVNet?include_prereleases&sort=semver&color=FFAC5F)](https://github.com/openclimatefix/PVNet/tags)
@@ -142,120 +142,71 @@ pip install -e <PATH-TO-ocf-data-sampler-REPO>
  If you install the local version of `ocf-data-sampler` that is more recent than the version
  specified in `PVNet` it is not guarenteed to function properly with this library.

- ## Pre-saving samples of data for training/validation of PVNet

- PVNet contains a script for generating samples of data suitable for training the PVNet models. To run the script you will need to make some modifications to the datamodule configuration.
+ ### Set up and config example for streaming

- Make sure you have copied the example configs (as already stated above):
- ```
- cp -r configs.example configs
- ```
-
- ### Set up and config example for sample creation
-
- We will use the following example config file for creating samples: `/PVNet/configs/datamodule/configuration/example_configuration.yaml`. Ensure that the file paths are set to the correct locations in `example_configuration.yaml`: search for `PLACEHOLDER` to find where to input the location of the files. You will need to comment out or delete the parts of `example_configuration.yaml` pertaining to the data you are not using.
-
-
- When creating samples, an additional datamodule config located in `PVNet/configs/datamodule` is passed into the sample creation script: `streamed_samples.yaml`. Like before, a placeholder variable is used when specifying which configuration to use:
-
- ```yaml
- configuration: "PLACEHOLDER.yaml"
- ```
+ We will use the following example config file to describe your data sources: `/PVNet/configs/datamodule/configuration/example_configuration.yaml`. Ensure that the file paths are set to the correct locations in `example_configuration.yaml`: search for `PLACEHOLDER` to find where to input the location of the files. Delete or comment the parts for data you are not using.

- This should be given the whole path to the config on your local machine, for example:
+ At run time, the datamodule config `PVNet/configs/datamodule/streamed_samples.yaml` points to your chosen configuration file:

- ```yaml
  configuration: "/FULL-PATH-TO-REPO/PVNet/configs/datamodule/configuration/example_configuration.yaml"
- ```
-
- Where `FULL-PATH-TO-REPO` represent the whole path to the PVNet repo on your local machine.
-
- This is also where you can update the train, val & test periods to cover the data you have access to.
-
- ### Running the sample creation script

- Run the `save_samples.py` script to create samples with the parameters specified in the datamodule config (`streamed_samples.yaml` in this example):
-
- ```bash
- python scripts/save_samples.py
- ```
- PVNet uses
- [hydra](https://hydra.cc/) which enables us to pass variables via the command
- line that will override the configuration defined in the `./configs` directory, like this:
-
- ```bash
- python scripts/save_samples.py datamodule=streamed_samples datamodule.sample_output_dir="./output" datamodule.num_train_samples=10 datamodule.num_val_samples=5
- ```
-
- `scripts/save_samples.py` needs a config under `PVNet/configs/datamodule`. You can adapt `streamed_samples.yaml` or create your own in the same folder.
+ You can also update train/val/test time ranges here to match the period you have access to.

  If downloading private data from a GCP bucket make sure to authenticate gcloud (the public satellite data does not need authentication):

- ```
  gcloud auth login
- ```
-
- Files stored in multiple locations can be added as a list. For example, in the `example_configuration.yaml` file we can supply a path to satellite data stored on a bucket:
-
- ```yaml
- satellite:
- zarr_path: gs://solar-pv-nowcasting-data/satellite/EUMETSAT/SEVIRI_RSS/v4/2020_nonhrv.zarr
- ```

- Or to satellite data hosted by Google:
+ You can provide multiple storage locations as a list. For example:

- ```yaml
  satellite:
- zarr_path:
- - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2020_nonhrv.zarr"
- - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2021_nonhrv.zarr"
- ```
+ zarr_path:
+ - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2020_nonhrv.zarr"
+ - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2021_nonhrv.zarr"

- ocf-data-sampler is currently set up to use 11 channels from the satellite data, the 12th of which is HRV and is not included in these.
+ `ocf-data-sampler` is currently set up to use 11 channels from the satellite data (the 12th, HRV, is not used).

+ ⚠️ NB: Our publicly accessible satellite data is currently saved with a blosc2 compressor, which is not supported by the tensorstore backend PVNet relies on now. We are in the process of updating this; for now, the paths above cannot be used with this codebase.

  ### Training PVNet

- How PVNet is run is determined by the extensive configuration in the config
- files. The configs stored in `PVNet/configs.example` should work with samples created using the steps and sample creation config mentioned above.
+ How PVNet is run is determined by the configuration files. The example configs in `PVNet/configs.example` work with **streamed_samples** using `datamodule/streamed_samples.yaml`.

- Make sure to update the following config files before training your model:
+ Update the following before training:

- 1. In `configs/datamodule/presaved_samples.yaml`:
- - update `sample_dir` to point to the directory you stored your samples in during sample creation
- 2. In `configs/model/late_fusion.yaml`:
- - update the list of encoders to reflect the data sources you are using. If you are using different NWP sources, the encoders for these should follow the same structure with two important updates:
- - `in_channels`: number of variables your NWP source supplies
- - `image_size_pixels`: spatial crop of your NWP data. It depends on the spatial resolution of your NWP; should match `image_size_pixels_height` and/or `image_size_pixels_width` in `datamodule/configuration/site_example_configuration.yaml` for the NWP, unless transformations such as coarsening was applied (e. g. as for ECMWF data)
- 3. In `configs/trainer/default.yaml`:
- - set `accelerator: 0` if running on a system without a supported GPU
+ 1. In `configs/model/late_fusion.yaml`:
+ - Update the list of encoders to match the data sources you are using. For different NWP sources, keep the same structure but ensure:
+ - `in_channels`: the number of variables your NWP source supplies
+ - `image_size_pixels`: spatial crop matching your NWP resolution and the settings in your datamodule configuration (unless you coarsened, e.g. for ECMWF)
+ 2. In `configs/trainer/default.yaml`:
+ - Set `accelerator: 0` if running on a system without a supported GPU
+ 3. In `configs/datamodule/streamed_samples.yaml`:
+ - Point `configuration:` to your local `example_configuration.yaml` (or your custom one)
+ - Adjust the train/val/test time ranges to your available data

- If creating copies of the config files instead of modifying existing ones, update `defaults` in the main `./configs/config.yaml` file to use
- your customised config files:
+ If you create custom config files, update the main `./configs/config.yaml` defaults:

- ```yaml
  defaults:
  - trainer: default.yaml
  - model: late_fusion.yaml
- - datamodule: presaved_samples.yaml
+ - datamodule: streamed_samples.yaml
  - callbacks: null
  - experiment: null
  - hparams_search: null
  - hydra: default.yaml
- ```

- Assuming you ran the `save_samples.py` script to generate some presaved train and
- val data samples, you can now train PVNet by running:
+ Now train PVNet:

- ```
  python run.py
- ```
+
+ You can override any setting with Hydra, e.g.:
+
+ python run.py datamodule=streamed_samples datamodule.configuration="/FULL-PATH/PVNet/configs/datamodule/configuration/example_configuration.yaml"

  ## Backtest

  If you have successfully trained a PVNet model and have a saved model checkpoint you can create a backtest using this, e.g. forecasts on historical data to evaluate forecast accuracy/skill. This can be done by running one of the scripts in this repo such as [the UK GSP backtest script](scripts/backtest_uk_gsp.py) or the [the pv site backtest script](scripts/backtest_sites.py), further info on how to run these are in each backtest file.

-
  ## Testing

  You can use `python -m pytest tests` to run tests
@@ -294,6 +245,7 @@ Thanks goes to these wonderful people ([emoji key](https://allcontributors.org/d
  <td align="center" valign="top" width="14.28%"><a href="https://github.com/markus-kreft"><img src="https://avatars.githubusercontent.com/u/129367085?v=4?s=100" width="100px;" alt="Markus Kreft"/><br /><sub><b>Markus Kreft</b></sub></a><br /><a href="https://github.com/openclimatefix/pvnet/commits?author=markus-kreft" title="Code">💻</a></td>
  <td align="center" valign="top" width="14.28%"><a href="http://jack-kelly.com"><img src="https://avatars.githubusercontent.com/u/460756?v=4?s=100" width="100px;" alt="Jack Kelly"/><br /><sub><b>Jack Kelly</b></sub></a><br /><a href="#ideas-JackKelly" title="Ideas, Planning, & Feedback">🤔</a></td>
  <td align="center" valign="top" width="14.28%"><a href="https://github.com/zaryab-ali"><img src="https://avatars.githubusercontent.com/u/85732412?v=4?s=100" width="100px;" alt="zaryab-ali"/><br /><sub><b>zaryab-ali</b></sub></a><br /><a href="https://github.com/openclimatefix/pvnet/commits?author=zaryab-ali" title="Code">💻</a></td>
+ <td align="center" valign="top" width="14.28%"><a href="https://github.com/Lex-Ashu"><img src="https://avatars.githubusercontent.com/u/181084934?v=4?s=100" width="100px;" alt="Lex-Ashu"/><br /><sub><b>Lex-Ashu</b></sub></a><br /><a href="https://github.com/openclimatefix/pvnet/commits?author=Lex-Ashu" title="Code">💻</a></td>
  </tr>
  </tbody>
  </table>
@@ -7,13 +7,10 @@ PVNet.egg-info/dependency_links.txt
  PVNet.egg-info/requires.txt
  PVNet.egg-info/top_level.txt
  pvnet/__init__.py
+ pvnet/datamodule.py
  pvnet/load_model.py
  pvnet/optimizers.py
  pvnet/utils.py
- pvnet/data/__init__.py
- pvnet/data/base_datamodule.py
- pvnet/data/site_datamodule.py
- pvnet/data/uk_regional_datamodule.py
  pvnet/models/__init__.py
  pvnet/models/base_model.py
  pvnet/models/ensemble.py
@@ -33,4 +30,5 @@ pvnet/training/__init__.py
  pvnet/training/lightning_module.py
  pvnet/training/plots.py
  pvnet/training/train.py
+ tests/test_datamodule.py
  tests/test_end2end.py
@@ -1,4 +1,4 @@
- ocf-data-sampler>=0.5.20
+ ocf-data-sampler>=0.6.0
  numpy
  pandas
  matplotlib
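The metadata hunks above tighten two constraints: `Requires-Python` gains an upper bound (`<3.14,>=3.11`) and the `ocf-data-sampler` floor moves from `0.5.20` to `0.6.0`. A minimal sketch for checking an environment against them, assuming plain dotted numeric versions. The helper names `parse_release`, `python_supported`, and `sampler_supported` are illustrative and not part of PVNet; a real check would normally be left to `pip` or the `packaging` library.

```python
import sys
from importlib import metadata


def parse_release(version: str) -> tuple[int, ...]:
    """Parse leading numeric components, e.g. '0.6.0' -> (0, 6, 0).

    Simplification: anything after the first non-digit character is ignored,
    so pre-release suffixes are not ordered as a full PEP 440 parser would.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if digits:
            parts.append(int(digits))
        if len(digits) != len(piece):
            break  # stop at the first component carrying a suffix
    return tuple(parts)


def python_supported(version_info=None) -> bool:
    """True if (major, minor) satisfies Requires-Python: <3.14,>=3.11."""
    if version_info is None:
        version_info = sys.version_info
    return (3, 11) <= tuple(version_info[:2]) < (3, 14)


def sampler_supported(installed: str, minimum: str = "0.6.0") -> bool:
    """True if `installed` is at least `minimum` (zero-padded comparison)."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    print("python ok:", python_supported())
    try:
        installed = metadata.version("ocf-data-sampler")
        print("ocf-data-sampler ok:", sampler_supported(installed))
    except metadata.PackageNotFoundError:
        print("ocf-data-sampler is not installed")
```

Versions are compared as integer tuples rather than strings so that `0.10.1` correctly sorts above `0.6.0`.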
@@ -1,6 +1,6 @@
  # PVNet
  <!-- ALL-CONTRIBUTORS-BADGE:START - Do not remove or modify this section -->
- [![All Contributors](https://img.shields.io/badge/all_contributors-20-orange.svg?style=flat-square)](#contributors-)
+ [![All Contributors](https://img.shields.io/badge/all_contributors-21-orange.svg?style=flat-square)](#contributors-)
  <!-- ALL-CONTRIBUTORS-BADGE:END -->

  [![tags badge](https://img.shields.io/github/v/tag/openclimatefix/PVNet?include_prereleases&sort=semver&color=FFAC5F)](https://github.com/openclimatefix/PVNet/tags)
@@ -113,120 +113,71 @@ pip install -e <PATH-TO-ocf-data-sampler-REPO>
113
113
  If you install the local version of `ocf-data-sampler` that is more recent than the version
114
114
  specified in `PVNet` it is not guarenteed to function properly with this library.
115
115
 
116
- ## Pre-saving samples of data for training/validation of PVNet
117
116
 
118
- PVNet contains a script for generating samples of data suitable for training the PVNet models. To run the script you will need to make some modifications to the datamodule configuration.
117
+ ### Set up and config example for streaming
119
118
 
120
- Make sure you have copied the example configs (as already stated above):
121
- ```
122
- cp -r configs.example configs
123
- ```
124
-
125
- ### Set up and config example for sample creation
126
-
127
- We will use the following example config file for creating samples: `/PVNet/configs/datamodule/configuration/example_configuration.yaml`. Ensure that the file paths are set to the correct locations in `example_configuration.yaml`: search for `PLACEHOLDER` to find where to input the location of the files. You will need to comment out or delete the parts of `example_configuration.yaml` pertaining to the data you are not using.
128
-
129
-
130
- When creating samples, an additional datamodule config located in `PVNet/configs/datamodule` is passed into the sample creation script: `streamed_samples.yaml`. Like before, a placeholder variable is used when specifying which configuration to use:
131
-
132
- ```yaml
133
- configuration: "PLACEHOLDER.yaml"
134
- ```
119
+ We will use the following example config file to describe your data sources: `/PVNet/configs/datamodule/configuration/example_configuration.yaml`. Ensure that the file paths are set to the correct locations in `example_configuration.yaml`: search for `PLACEHOLDER` to find where to input the location of the files. Delete or comment the parts for data you are not using.
135
120
 
136
- This should be given the whole path to the config on your local machine, for example:
121
+ At run time, the datamodule config `PVNet/configs/datamodule/streamed_samples.yaml` points to your chosen configuration file:
137
122
 
138
- ```yaml
139
123
  configuration: "/FULL-PATH-TO-REPO/PVNet/configs/datamodule/configuration/example_configuration.yaml"
140
- ```
141
-
142
- Where `FULL-PATH-TO-REPO` represent the whole path to the PVNet repo on your local machine.
143
-
144
- This is also where you can update the train, val & test periods to cover the data you have access to.
145
-
146
- ### Running the sample creation script
147
124
 
148
- Run the `save_samples.py` script to create samples with the parameters specified in the datamodule config (`streamed_samples.yaml` in this example):
149
-
150
- ```bash
151
- python scripts/save_samples.py
152
- ```
153
- PVNet uses
154
- [hydra](https://hydra.cc/) which enables us to pass variables via the command
155
- line that will override the configuration defined in the `./configs` directory, like this:
156
-
157
- ```bash
158
- python scripts/save_samples.py datamodule=streamed_samples datamodule.sample_output_dir="./output" datamodule.num_train_samples=10 datamodule.num_val_samples=5
159
- ```
160
-
161
- `scripts/save_samples.py` needs a config under `PVNet/configs/datamodule`. You can adapt `streamed_samples.yaml` or create your own in the same folder.
125
+ You can also update train/val/test time ranges here to match the period you have access to.
162
126
 
163
127
  If downloading private data from a GCP bucket make sure to authenticate gcloud (the public satellite data does not need authentication):
164
128
 
165
- ```
166
129
  gcloud auth login
167
- ```
168
-
169
- Files stored in multiple locations can be added as a list. For example, in the `example_configuration.yaml` file we can supply a path to satellite data stored on a bucket:
170
-
171
- ```yaml
172
- satellite:
173
- zarr_path: gs://solar-pv-nowcasting-data/satellite/EUMETSAT/SEVIRI_RSS/v4/2020_nonhrv.zarr
174
- ```
175
130
 
176
- Or to satellite data hosted by Google:
131
+ You can provide multiple storage locations as a list. For example:
177
132
 
178
- ```yaml
179
133
  satellite:
180
- zarr_path:
181
- - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2020_nonhrv.zarr"
182
- - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2021_nonhrv.zarr"
183
- ```
134
+ zarr_path:
135
+ - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2020_nonhrv.zarr"
136
+ - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2021_nonhrv.zarr"
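As an illustration of what "a single path or a list of paths" means for code that reads this setting, here is a minimal Python sketch. The helper name `as_path_list` is hypothetical and is not taken from `ocf-data-sampler`; it only shows one way a string and a list can be treated uniformly.

```python
def as_path_list(zarr_path):
    """Return `zarr_path` as a list, whether one string or several were given."""
    # Hypothetical helper for illustration only, not part of ocf-data-sampler.
    if isinstance(zarr_path, str):
        return [zarr_path]
    return list(zarr_path)


bucket = "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4"

# A single location given as a plain string
single = as_path_list(f"{bucket}/2020_nonhrv.zarr")

# Several locations given as a list, as in the YAML above
multiple = as_path_list([f"{bucket}/2020_nonhrv.zarr", f"{bucket}/2021_nonhrv.zarr"])

print(len(single), len(multiple))
```

Both forms end up as a list, so downstream code can iterate over the paths without checking the type again.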
184
137
 
185
- ocf-data-sampler is currently set up to use 11 channels from the satellite data, the 12th of which is HRV and is not included in these.
138
+ `ocf-data-sampler` is currently set up to use 11 channels from the satellite data (the 12th, HRV, is not used).
186
139
 
140
+ ⚠️ NB: Our publicly accessible satellite data is currently saved with a blosc2 compressor, which is not supported by the tensorstore backend PVNet relies on now. We are in the process of updating this; for now, the paths above cannot be used with this codebase.
187
141
 
188
142
  ### Training PVNet
189
143
 
190
- How PVNet is run is determined by the extensive configuration in the config
191
- files. The configs stored in `PVNet/configs.example` should work with samples created using the steps and sample creation config mentioned above.
144
+ How PVNet is run is determined by the configuration files. The example configs in `PVNet/configs.example` work with **streamed_samples** using `datamodule/streamed_samples.yaml`.
192
145
 
193
- Make sure to update the following config files before training your model:
146
+ Update the following before training:
194
147
 
195
- 1. In `configs/datamodule/presaved_samples.yaml`:
196
- - update `sample_dir` to point to the directory you stored your samples in during sample creation
197
- 2. In `configs/model/late_fusion.yaml`:
198
- - update the list of encoders to reflect the data sources you are using. If you are using different NWP sources, the encoders for these should follow the same structure with two important updates:
199
- - `in_channels`: number of variables your NWP source supplies
200
- - `image_size_pixels`: spatial crop of your NWP data. It depends on the spatial resolution of your NWP; should match `image_size_pixels_height` and/or `image_size_pixels_width` in `datamodule/configuration/site_example_configuration.yaml` for the NWP, unless transformations such as coarsening was applied (e. g. as for ECMWF data)
201
- 3. In `configs/trainer/default.yaml`:
202
- - set `accelerator: 0` if running on a system without a supported GPU
148
+ 1. In `configs/model/late_fusion.yaml`:
149
+ - Update the list of encoders to match the data sources you are using. For different NWP sources, keep the same structure but ensure:
150
+ - `in_channels`: the number of variables your NWP source supplies
151
+ - `image_size_pixels`: spatial crop matching your NWP resolution and the settings in your datamodule configuration (unless you coarsened, e.g. for ECMWF)
152
+ 2. In `configs/trainer/default.yaml`:
153
+ - Set `accelerator: 0` if running on a system without a supported GPU
154
+ 3. In `configs/datamodule/streamed_samples.yaml`:
155
+ - Point `configuration:` to your local `example_configuration.yaml` (or your custom one)
156
+ - Adjust the train/val/test time ranges to your available data
203
157
 
204
- If creating copies of the config files instead of modifying existing ones, update `defaults` in the main `./configs/config.yaml` file to use
205
- your customised config files:
158
+ If you create custom config files, update the main `./configs/config.yaml` defaults:
206
159
 
207
- ```yaml
208
160
  defaults:
209
161
  - trainer: default.yaml
210
162
  - model: late_fusion.yaml
211
- - datamodule: presaved_samples.yaml
163
+ - datamodule: streamed_samples.yaml
212
164
  - callbacks: null
213
165
  - experiment: null
214
166
  - hparams_search: null
215
167
  - hydra: default.yaml
216
- ```
217
168
 
218
- Assuming you ran the `save_samples.py` script to generate some presaved train and
219
- val data samples, you can now train PVNet by running:
169
+ Now train PVNet:
220
170
 
221
- ```
222
171
  python run.py
223
- ```
172
+
173
+ You can override any setting with Hydra, e.g.:
174
+
175
+ python run.py datamodule=streamed_samples datamodule.configuration="/FULL-PATH/PVNet/configs/datamodule/configuration/example_configuration.yaml"
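To make the dotted-key override in the command above more concrete, here is a rough Python sketch of how an argument such as `datamodule.configuration=...` maps onto a nested config. This is not Hydra's implementation: the function `apply_overrides` is hypothetical, it keeps every value as a string, and it does not handle config-group selections such as `datamodule=streamed_samples`.

```python
from copy import deepcopy


def apply_overrides(config, overrides):
    """Return a copy of `config` with `dotted.key=value` overrides applied."""
    # Hypothetical helper for illustration only; Hydra does far more than this.
    result = deepcopy(config)
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        # Walk (or create) the nested dictionaries named by the dotted key
        for name in parents:
            node = node.setdefault(name, {})
        # Strip optional surrounding double quotes; the value stays a string
        node[leaf] = raw_value.strip('"')
    return result


base = {"datamodule": {"configuration": None}}
merged = apply_overrides(
    base,
    ['datamodule.configuration="/FULL-PATH/PVNet/configs/datamodule/configuration/example_configuration.yaml"'],
)
print(merged["datamodule"]["configuration"])
```

The original `base` dictionary is left untouched, which mirrors the idea that command-line overrides apply to a single run rather than changing the files under `./configs`.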
224
176
 
225
177
  ## Backtest
226
178
 
227
179
  If you have successfully trained a PVNet model and have a saved model checkpoint, you can use it to create a backtest, i.e. forecasts on historical data to evaluate forecast accuracy/skill. Run one of the scripts in this repo, such as [the UK GSP backtest script](scripts/backtest_uk_gsp.py) or [the PV site backtest script](scripts/backtest_sites.py); further information on how to run each script is given in the backtest file itself.
228
180
 
229
-
230
181
  ## Testing
231
182
 
232
183
  You can use `python -m pytest tests` to run tests
@@ -265,6 +216,7 @@ Thanks goes to these wonderful people ([emoji key](https://allcontributors.org/d
265
216
  <td align="center" valign="top" width="14.28%"><a href="https://github.com/markus-kreft"><img src="https://avatars.githubusercontent.com/u/129367085?v=4?s=100" width="100px;" alt="Markus Kreft"/><br /><sub><b>Markus Kreft</b></sub></a><br /><a href="https://github.com/openclimatefix/pvnet/commits?author=markus-kreft" title="Code">💻</a></td>
266
217
  <td align="center" valign="top" width="14.28%"><a href="http://jack-kelly.com"><img src="https://avatars.githubusercontent.com/u/460756?v=4?s=100" width="100px;" alt="Jack Kelly"/><br /><sub><b>Jack Kelly</b></sub></a><br /><a href="#ideas-JackKelly" title="Ideas, Planning, & Feedback">🤔</a></td>
267
218
  <td align="center" valign="top" width="14.28%"><a href="https://github.com/zaryab-ali"><img src="https://avatars.githubusercontent.com/u/85732412?v=4?s=100" width="100px;" alt="zaryab-ali"/><br /><sub><b>zaryab-ali</b></sub></a><br /><a href="https://github.com/openclimatefix/pvnet/commits?author=zaryab-ali" title="Code">💻</a></td>
219
+ <td align="center" valign="top" width="14.28%"><a href="https://github.com/Lex-Ashu"><img src="https://avatars.githubusercontent.com/u/181084934?v=4?s=100" width="100px;" alt="Lex-Ashu"/><br /><sub><b>Lex-Ashu</b></sub></a><br /><a href="https://github.com/openclimatefix/pvnet/commits?author=Lex-Ashu" title="Code">💻</a></td>
268
220
  </tr>
269
221
  </tbody>
270
222
  </table>