PyPI - PVNet - Versions diffs - 5.0.18__tar.gz → 5.0.20__tar.gz - Mend

PVNet 5.0.18tar.gz → 5.0.20tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (38) hide show

{pvnet-5.0.18 → pvnet-5.0.20}/PKG-INFO RENAMED Viewed

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: PVNet
-Version: 5.0.18
+Version: 5.0.20
 Summary: PVNet
 Author-email: Peter Dudfield <info@openclimatefix.org>
 Requires-Python: >=3.11
@@ -142,120 +142,77 @@ pip install -e <PATH-TO-ocf-data-sampler-REPO>
 If you install the local version of `ocf-data-sampler` that is more recent than the version
 specified in `PVNet` it is not guarenteed to function properly with this library.
-## Pre-saving samples of data for training/validation of PVNet
+## Streaming samples (no pre-save)
-PVNet contains a script for generating samples of data suitable for training the PVNet models. To run the script you will need to make some modifications to the datamodule configuration.
+PVNet now trains and validates directly from **streamed_samples** (i.e. no pre-saving to disk).
-Make sure you have copied the example configs (as already stated above):
-```
+Make sure you have copied example configs (as already stated above):
 cp -r configs.example configs
-```
-### Set up and config example for sample creation
-We will use the following example config file for creating samples: `/PVNet/configs/datamodule/configuration/example_configuration.yaml`. Ensure that the file paths are set to the correct locations in `example_configuration.yaml`: search for `PLACEHOLDER` to find where to input the location of the files. You will need to comment out or delete the parts of `example_configuration.yaml` pertaining to the data you are not using.
+### Set up and config example for streaming
-When creating samples, an additional datamodule config located in `PVNet/configs/datamodule` is passed into the sample creation script: `streamed_samples.yaml`. Like before, a placeholder variable is used when specifying which configuration to use:
+We will use the following example config file to describe your data sources: `/PVNet/configs/datamodule/configuration/example_configuration.yaml`. Ensure that the file paths are set to the correct locations in `example_configuration.yaml`: search for `PLACEHOLDER` to find where to input the location of the files. Delete or comment the parts for data you are not using.
-```yaml
-configuration: "PLACEHOLDER.yaml"
-```
-This should be given the whole path to the config on your local machine, for example:
+At run time, the datamodule config `PVNet/configs/datamodule/streamed_samples.yaml` points to your chosen configuration file:
-```yaml
 configuration: "/FULL-PATH-TO-REPO/PVNet/configs/datamodule/configuration/example_configuration.yaml"
-```
-Where `FULL-PATH-TO-REPO` represent the whole path to the PVNet repo on your local machine.
-This is also where you can update the train, val & test periods to cover the data you have access to.
-### Running the sample creation script
-Run the `save_samples.py` script to create samples with the parameters specified in the datamodule config (`streamed_samples.yaml` in this example):
-```bash
-python scripts/save_samples.py
-```
-PVNet uses
-[hydra](https://hydra.cc/) which enables us to pass variables via the command
-line that will override the configuration defined in the `./configs` directory, like this:
-```bash
-python scripts/save_samples.py datamodule=streamed_samples datamodule.sample_output_dir="./output" datamodule.num_train_samples=10 datamodule.num_val_samples=5
-```
-`scripts/save_samples.py` needs a config under `PVNet/configs/datamodule`. You can adapt `streamed_samples.yaml` or create your own in the same folder.
+You can also update train/val/test time ranges here to match the period you have access to.
 If downloading private data from a GCP bucket make sure to authenticate gcloud (the public satellite data does not need authentication):
-```
 gcloud auth login
-```
-Files stored in multiple locations can be added as a list. For example, in the `example_configuration.yaml` file we can supply a path to satellite data stored on a bucket:
+You can provide multiple storage locations as a list. For example:
-```yaml
 satellite:
-    zarr_path: gs://solar-pv-nowcasting-data/satellite/EUMETSAT/SEVIRI_RSS/v4/2020_nonhrv.zarr
-```
-Or to satellite data hosted by Google:
+  zarr_path:
+    - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2020_nonhrv.zarr"
+    - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2021_nonhrv.zarr"
-```yaml
-satellite:
-    zarr_path:
-      - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2020_nonhrv.zarr"
-      - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2021_nonhrv.zarr"
-```
-ocf-data-sampler is currently set up to use 11 channels from the satellite data, the 12th of which is HRV and is not included in these.
+`ocf-data-sampler` is currently set up to use 11 channels from the satellite data (the 12th, HRV, is not used).
+⚠️ NB: Our publicly accessible satellite data is currently saved with a blosc2 compressor, which is not supported by the tensorstore backend PVNet relies on now. We are in the process of updating this; for now, the paths above cannot be used with this codebase.
 ### Training PVNet
-How PVNet is run is determined by the extensive configuration in the config
-files. The configs stored in `PVNet/configs.example` should work with samples created using the steps and sample creation config mentioned above.
+How PVNet is run is determined by the configuration files. The example configs in `PVNet/configs.example` work with **streamed_samples** using `datamodule/streamed_samples.yaml`.
-Make sure to update the following config files before training your model:
+Update the following before training:
-1. In `configs/datamodule/presaved_samples.yaml`:
-    - update `sample_dir` to point to the directory you stored your samples in during sample creation
-2. In `configs/model/late_fusion.yaml`:
-    - update the list of encoders to reflect the data sources you are using. If you are using different NWP sources, the encoders for these should follow the same structure with two important updates:
-        - `in_channels`: number of variables your NWP source supplies
-        - `image_size_pixels`: spatial crop of your NWP data. It depends on the spatial resolution of your NWP; should match `image_size_pixels_height` and/or `image_size_pixels_width` in `datamodule/configuration/site_example_configuration.yaml` for the NWP, unless transformations such as coarsening was applied (e. g. as for ECMWF data)
-3. In `configs/trainer/default.yaml`:
-    - set `accelerator: 0` if running on a system without a supported GPU
+1. In `configs/model/late_fusion.yaml`:
+    - Update the list of encoders to match the data sources you are using. For different NWP sources, keep the same structure but ensure:
+        - `in_channels`: the number of variables your NWP source supplies
+        - `image_size_pixels`: spatial crop matching your NWP resolution and the settings in your datamodule configuration (unless you coarsened, e.g. for ECMWF)
+2. In `configs/trainer/default.yaml`:
+    - Set `accelerator: 0` if running on a system without a supported GPU
+3. In `configs/datamodule/streamed_samples.yaml`:
+    - Point `configuration:` to your local `example_configuration.yaml` (or your custom one)
+    - Adjust the train/val/test time ranges to your available data
-If creating copies of the config files instead of modifying existing ones, update `defaults` in the main `./configs/config.yaml` file to use
-your customised config files:
+If you create custom config files, update the main `./configs/config.yaml` defaults:
-```yaml
 defaults:
   - trainer: default.yaml
   - model: late_fusion.yaml
-  - datamodule: presaved_samples.yaml
+  - datamodule: streamed_samples.yaml
   - callbacks: null
   - experiment: null
   - hparams_search: null
   - hydra: default.yaml
-```
-Assuming you ran the `save_samples.py` script to generate some presaved train and
-val data samples, you can now train PVNet by running:
+Now train PVNet:
-```
 python run.py
-```
+You can override any setting with Hydra, e.g.:
+python run.py datamodule=streamed_samples datamodule.configuration="/FULL-PATH/PVNet/configs/datamodule/configuration/example_configuration.yaml"
 ## Backtest
 If you have successfully trained a PVNet model and have a saved model checkpoint you can create a backtest using this, e.g. forecasts on historical data to evaluate forecast accuracy/skill. This can be done by running one of the scripts in this repo such as [the UK GSP backtest script](scripts/backtest_uk_gsp.py) or the [the pv site backtest script](scripts/backtest_sites.py), further info on how to run these are in each backtest file.
 ## Testing
 You can use `python -m pytest tests` to run tests

{pvnet-5.0.18 → pvnet-5.0.20}/PVNet.egg-info/PKG-INFO RENAMED Viewed

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: PVNet
-Version: 5.0.18
+Version: 5.0.20
 Summary: PVNet
 Author-email: Peter Dudfield <info@openclimatefix.org>
 Requires-Python: >=3.11
@@ -142,120 +142,77 @@ pip install -e <PATH-TO-ocf-data-sampler-REPO>
 If you install the local version of `ocf-data-sampler` that is more recent than the version
 specified in `PVNet` it is not guarenteed to function properly with this library.
-## Pre-saving samples of data for training/validation of PVNet
+## Streaming samples (no pre-save)
-PVNet contains a script for generating samples of data suitable for training the PVNet models. To run the script you will need to make some modifications to the datamodule configuration.
+PVNet now trains and validates directly from **streamed_samples** (i.e. no pre-saving to disk).
-Make sure you have copied the example configs (as already stated above):
-```
+Make sure you have copied example configs (as already stated above):
 cp -r configs.example configs
-```
-### Set up and config example for sample creation
-We will use the following example config file for creating samples: `/PVNet/configs/datamodule/configuration/example_configuration.yaml`. Ensure that the file paths are set to the correct locations in `example_configuration.yaml`: search for `PLACEHOLDER` to find where to input the location of the files. You will need to comment out or delete the parts of `example_configuration.yaml` pertaining to the data you are not using.
+### Set up and config example for streaming
-When creating samples, an additional datamodule config located in `PVNet/configs/datamodule` is passed into the sample creation script: `streamed_samples.yaml`. Like before, a placeholder variable is used when specifying which configuration to use:
+We will use the following example config file to describe your data sources: `/PVNet/configs/datamodule/configuration/example_configuration.yaml`. Ensure that the file paths are set to the correct locations in `example_configuration.yaml`: search for `PLACEHOLDER` to find where to input the location of the files. Delete or comment the parts for data you are not using.
-```yaml
-configuration: "PLACEHOLDER.yaml"
-```
-This should be given the whole path to the config on your local machine, for example:
+At run time, the datamodule config `PVNet/configs/datamodule/streamed_samples.yaml` points to your chosen configuration file:
-```yaml
 configuration: "/FULL-PATH-TO-REPO/PVNet/configs/datamodule/configuration/example_configuration.yaml"
-```
-Where `FULL-PATH-TO-REPO` represent the whole path to the PVNet repo on your local machine.
-This is also where you can update the train, val & test periods to cover the data you have access to.
-### Running the sample creation script
-Run the `save_samples.py` script to create samples with the parameters specified in the datamodule config (`streamed_samples.yaml` in this example):
-```bash
-python scripts/save_samples.py
-```
-PVNet uses
-[hydra](https://hydra.cc/) which enables us to pass variables via the command
-line that will override the configuration defined in the `./configs` directory, like this:
-```bash
-python scripts/save_samples.py datamodule=streamed_samples datamodule.sample_output_dir="./output" datamodule.num_train_samples=10 datamodule.num_val_samples=5
-```
-`scripts/save_samples.py` needs a config under `PVNet/configs/datamodule`. You can adapt `streamed_samples.yaml` or create your own in the same folder.
+You can also update train/val/test time ranges here to match the period you have access to.
 If downloading private data from a GCP bucket make sure to authenticate gcloud (the public satellite data does not need authentication):
-```
 gcloud auth login
-```
-Files stored in multiple locations can be added as a list. For example, in the `example_configuration.yaml` file we can supply a path to satellite data stored on a bucket:
+You can provide multiple storage locations as a list. For example:
-```yaml
 satellite:
-    zarr_path: gs://solar-pv-nowcasting-data/satellite/EUMETSAT/SEVIRI_RSS/v4/2020_nonhrv.zarr
-```
-Or to satellite data hosted by Google:
+  zarr_path:
+    - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2020_nonhrv.zarr"
+    - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2021_nonhrv.zarr"
-```yaml
-satellite:
-    zarr_path:
-      - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2020_nonhrv.zarr"
-      - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2021_nonhrv.zarr"
-```
-ocf-data-sampler is currently set up to use 11 channels from the satellite data, the 12th of which is HRV and is not included in these.
+`ocf-data-sampler` is currently set up to use 11 channels from the satellite data (the 12th, HRV, is not used).
+⚠️ NB: Our publicly accessible satellite data is currently saved with a blosc2 compressor, which is not supported by the tensorstore backend PVNet relies on now. We are in the process of updating this; for now, the paths above cannot be used with this codebase.
 ### Training PVNet
-How PVNet is run is determined by the extensive configuration in the config
-files. The configs stored in `PVNet/configs.example` should work with samples created using the steps and sample creation config mentioned above.
+How PVNet is run is determined by the configuration files. The example configs in `PVNet/configs.example` work with **streamed_samples** using `datamodule/streamed_samples.yaml`.
-Make sure to update the following config files before training your model:
+Update the following before training:
-1. In `configs/datamodule/presaved_samples.yaml`:
-    - update `sample_dir` to point to the directory you stored your samples in during sample creation
-2. In `configs/model/late_fusion.yaml`:
-    - update the list of encoders to reflect the data sources you are using. If you are using different NWP sources, the encoders for these should follow the same structure with two important updates:
-        - `in_channels`: number of variables your NWP source supplies
-        - `image_size_pixels`: spatial crop of your NWP data. It depends on the spatial resolution of your NWP; should match `image_size_pixels_height` and/or `image_size_pixels_width` in `datamodule/configuration/site_example_configuration.yaml` for the NWP, unless transformations such as coarsening was applied (e. g. as for ECMWF data)
-3. In `configs/trainer/default.yaml`:
-    - set `accelerator: 0` if running on a system without a supported GPU
+1. In `configs/model/late_fusion.yaml`:
+    - Update the list of encoders to match the data sources you are using. For different NWP sources, keep the same structure but ensure:
+        - `in_channels`: the number of variables your NWP source supplies
+        - `image_size_pixels`: spatial crop matching your NWP resolution and the settings in your datamodule configuration (unless you coarsened, e.g. for ECMWF)
+2. In `configs/trainer/default.yaml`:
+    - Set `accelerator: 0` if running on a system without a supported GPU
+3. In `configs/datamodule/streamed_samples.yaml`:
+    - Point `configuration:` to your local `example_configuration.yaml` (or your custom one)
+    - Adjust the train/val/test time ranges to your available data
-If creating copies of the config files instead of modifying existing ones, update `defaults` in the main `./configs/config.yaml` file to use
-your customised config files:
+If you create custom config files, update the main `./configs/config.yaml` defaults:
-```yaml
 defaults:
   - trainer: default.yaml
   - model: late_fusion.yaml
-  - datamodule: presaved_samples.yaml
+  - datamodule: streamed_samples.yaml
   - callbacks: null
   - experiment: null
   - hparams_search: null
   - hydra: default.yaml
-```
-Assuming you ran the `save_samples.py` script to generate some presaved train and
-val data samples, you can now train PVNet by running:
+Now train PVNet:
-```
 python run.py
-```
+You can override any setting with Hydra, e.g.:
+python run.py datamodule=streamed_samples datamodule.configuration="/FULL-PATH/PVNet/configs/datamodule/configuration/example_configuration.yaml"
 ## Backtest
 If you have successfully trained a PVNet model and have a saved model checkpoint you can create a backtest using this, e.g. forecasts on historical data to evaluate forecast accuracy/skill. This can be done by running one of the scripts in this repo such as [the UK GSP backtest script](scripts/backtest_uk_gsp.py) or the [the pv site backtest script](scripts/backtest_sites.py), further info on how to run these are in each backtest file.
 ## Testing
 You can use `python -m pytest tests` to run tests

{pvnet-5.0.18 → pvnet-5.0.20}/README.md RENAMED Viewed

@@ -113,120 +113,77 @@ pip install -e <PATH-TO-ocf-data-sampler-REPO>
 If you install the local version of `ocf-data-sampler` that is more recent than the version
 specified in `PVNet` it is not guarenteed to function properly with this library.
-## Pre-saving samples of data for training/validation of PVNet
+## Streaming samples (no pre-save)
-PVNet contains a script for generating samples of data suitable for training the PVNet models. To run the script you will need to make some modifications to the datamodule configuration.
+PVNet now trains and validates directly from **streamed_samples** (i.e. no pre-saving to disk).
-Make sure you have copied the example configs (as already stated above):
-```
+Make sure you have copied example configs (as already stated above):
 cp -r configs.example configs
-```
-### Set up and config example for sample creation
-We will use the following example config file for creating samples: `/PVNet/configs/datamodule/configuration/example_configuration.yaml`. Ensure that the file paths are set to the correct locations in `example_configuration.yaml`: search for `PLACEHOLDER` to find where to input the location of the files. You will need to comment out or delete the parts of `example_configuration.yaml` pertaining to the data you are not using.
+### Set up and config example for streaming
-When creating samples, an additional datamodule config located in `PVNet/configs/datamodule` is passed into the sample creation script: `streamed_samples.yaml`. Like before, a placeholder variable is used when specifying which configuration to use:
+We will use the following example config file to describe your data sources: `/PVNet/configs/datamodule/configuration/example_configuration.yaml`. Ensure that the file paths are set to the correct locations in `example_configuration.yaml`: search for `PLACEHOLDER` to find where to input the location of the files. Delete or comment the parts for data you are not using.
-```yaml
-configuration: "PLACEHOLDER.yaml"
-```
-This should be given the whole path to the config on your local machine, for example:
+At run time, the datamodule config `PVNet/configs/datamodule/streamed_samples.yaml` points to your chosen configuration file:
-```yaml
 configuration: "/FULL-PATH-TO-REPO/PVNet/configs/datamodule/configuration/example_configuration.yaml"
-```
-Where `FULL-PATH-TO-REPO` represent the whole path to the PVNet repo on your local machine.
-This is also where you can update the train, val & test periods to cover the data you have access to.
-### Running the sample creation script
-Run the `save_samples.py` script to create samples with the parameters specified in the datamodule config (`streamed_samples.yaml` in this example):
-```bash
-python scripts/save_samples.py
-```
-PVNet uses
-[hydra](https://hydra.cc/) which enables us to pass variables via the command
-line that will override the configuration defined in the `./configs` directory, like this:
-```bash
-python scripts/save_samples.py datamodule=streamed_samples datamodule.sample_output_dir="./output" datamodule.num_train_samples=10 datamodule.num_val_samples=5
-```
-`scripts/save_samples.py` needs a config under `PVNet/configs/datamodule`. You can adapt `streamed_samples.yaml` or create your own in the same folder.
+You can also update train/val/test time ranges here to match the period you have access to.
 If downloading private data from a GCP bucket make sure to authenticate gcloud (the public satellite data does not need authentication):
-```
 gcloud auth login
-```
-Files stored in multiple locations can be added as a list. For example, in the `example_configuration.yaml` file we can supply a path to satellite data stored on a bucket:
+You can provide multiple storage locations as a list. For example:
-```yaml
 satellite:
-    zarr_path: gs://solar-pv-nowcasting-data/satellite/EUMETSAT/SEVIRI_RSS/v4/2020_nonhrv.zarr
-```
-Or to satellite data hosted by Google:
+  zarr_path:
+    - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2020_nonhrv.zarr"
+    - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2021_nonhrv.zarr"
-```yaml
-satellite:
-    zarr_path:
-      - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2020_nonhrv.zarr"
-      - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2021_nonhrv.zarr"
-```
-ocf-data-sampler is currently set up to use 11 channels from the satellite data, the 12th of which is HRV and is not included in these.
+`ocf-data-sampler` is currently set up to use 11 channels from the satellite data (the 12th, HRV, is not used).
+⚠️ NB: Our publicly accessible satellite data is currently saved with a blosc2 compressor, which is not supported by the tensorstore backend PVNet relies on now. We are in the process of updating this; for now, the paths above cannot be used with this codebase.
 ### Training PVNet
-How PVNet is run is determined by the extensive configuration in the config
-files. The configs stored in `PVNet/configs.example` should work with samples created using the steps and sample creation config mentioned above.
+How PVNet is run is determined by the configuration files. The example configs in `PVNet/configs.example` work with **streamed_samples** using `datamodule/streamed_samples.yaml`.
-Make sure to update the following config files before training your model:
+Update the following before training:
-1. In `configs/datamodule/presaved_samples.yaml`:
-    - update `sample_dir` to point to the directory you stored your samples in during sample creation
-2. In `configs/model/late_fusion.yaml`:
-    - update the list of encoders to reflect the data sources you are using. If you are using different NWP sources, the encoders for these should follow the same structure with two important updates:
-        - `in_channels`: number of variables your NWP source supplies
-        - `image_size_pixels`: spatial crop of your NWP data. It depends on the spatial resolution of your NWP; should match `image_size_pixels_height` and/or `image_size_pixels_width` in `datamodule/configuration/site_example_configuration.yaml` for the NWP, unless transformations such as coarsening was applied (e. g. as for ECMWF data)
-3. In `configs/trainer/default.yaml`:
-    - set `accelerator: 0` if running on a system without a supported GPU
+1. In `configs/model/late_fusion.yaml`:
+    - Update the list of encoders to match the data sources you are using. For different NWP sources, keep the same structure but ensure:
+        - `in_channels`: the number of variables your NWP source supplies
+        - `image_size_pixels`: spatial crop matching your NWP resolution and the settings in your datamodule configuration (unless you coarsened, e.g. for ECMWF)
+2. In `configs/trainer/default.yaml`:
+    - Set `accelerator: 0` if running on a system without a supported GPU
+3. In `configs/datamodule/streamed_samples.yaml`:
+    - Point `configuration:` to your local `example_configuration.yaml` (or your custom one)
+    - Adjust the train/val/test time ranges to your available data
-If creating copies of the config files instead of modifying existing ones, update `defaults` in the main `./configs/config.yaml` file to use
-your customised config files:
+If you create custom config files, update the main `./configs/config.yaml` defaults:
-```yaml
 defaults:
   - trainer: default.yaml
   - model: late_fusion.yaml
-  - datamodule: presaved_samples.yaml
+  - datamodule: streamed_samples.yaml
   - callbacks: null
   - experiment: null
   - hparams_search: null
   - hydra: default.yaml
-```
-Assuming you ran the `save_samples.py` script to generate some presaved train and
-val data samples, you can now train PVNet by running:
+Now train PVNet:
-```
 python run.py
-```
+You can override any setting with Hydra, e.g.:
+python run.py datamodule=streamed_samples datamodule.configuration="/FULL-PATH/PVNet/configs/datamodule/configuration/example_configuration.yaml"
 ## Backtest
 If you have successfully trained a PVNet model and have a saved model checkpoint you can create a backtest using this, e.g. forecasts on historical data to evaluate forecast accuracy/skill. This can be done by running one of the scripts in this repo such as [the UK GSP backtest script](scripts/backtest_uk_gsp.py) or the [the pv site backtest script](scripts/backtest_sites.py), further info on how to run these are in each backtest file.
 ## Testing
 You can use `python -m pytest tests` to run tests