PyPI - rslearn - Versions diffs - 0.0.1__py3-none-any.whl → 0.0.21__py3-none-any.whl - Mend

rslearn 0.0.1__py3-none-any.whl → 0.0.21__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (166)

rslearn/arg_parser.py +31 -0
rslearn/config/__init__.py +6 -12
rslearn/config/dataset.py +520 -401
rslearn/const.py +9 -15
rslearn/data_sources/__init__.py +8 -23
rslearn/data_sources/aws_landsat.py +242 -98
rslearn/data_sources/aws_open_data.py +111 -151
rslearn/data_sources/aws_sentinel1.py +131 -0
rslearn/data_sources/climate_data_store.py +471 -0
rslearn/data_sources/copernicus.py +884 -12
rslearn/data_sources/data_source.py +43 -12
rslearn/data_sources/earthdaily.py +484 -0
rslearn/data_sources/earthdata_srtm.py +282 -0
rslearn/data_sources/eurocrops.py +242 -0
rslearn/data_sources/gcp_public_data.py +578 -222
rslearn/data_sources/google_earth_engine.py +461 -135
rslearn/data_sources/local_files.py +219 -150
rslearn/data_sources/openstreetmap.py +51 -89
rslearn/data_sources/planet.py +24 -60
rslearn/data_sources/planet_basemap.py +275 -0
rslearn/data_sources/planetary_computer.py +798 -0
rslearn/data_sources/usda_cdl.py +195 -0
rslearn/data_sources/usgs_landsat.py +115 -83
rslearn/data_sources/utils.py +249 -61
rslearn/data_sources/vector_source.py +1 -0
rslearn/data_sources/worldcereal.py +449 -0
rslearn/data_sources/worldcover.py +144 -0
rslearn/data_sources/worldpop.py +153 -0
rslearn/data_sources/xyz_tiles.py +150 -107
rslearn/dataset/__init__.py +8 -2
rslearn/dataset/add_windows.py +2 -2
rslearn/dataset/dataset.py +40 -51
rslearn/dataset/handler_summaries.py +131 -0
rslearn/dataset/manage.py +313 -74
rslearn/dataset/materialize.py +431 -107
rslearn/dataset/remap.py +29 -4
rslearn/dataset/storage/__init__.py +1 -0
rslearn/dataset/storage/file.py +202 -0
rslearn/dataset/storage/storage.py +140 -0
rslearn/dataset/window.py +181 -44
rslearn/lightning_cli.py +454 -0
rslearn/log_utils.py +24 -0
rslearn/main.py +384 -181
rslearn/models/anysat.py +215 -0
rslearn/models/attention_pooling.py +177 -0
rslearn/models/clay/clay.py +231 -0
rslearn/models/clay/configs/metadata.yaml +295 -0
rslearn/models/clip.py +68 -0
rslearn/models/component.py +111 -0
rslearn/models/concatenate_features.py +103 -0
rslearn/models/conv.py +63 -0
rslearn/models/croma.py +306 -0
rslearn/models/detr/__init__.py +5 -0
rslearn/models/detr/box_ops.py +103 -0
rslearn/models/detr/detr.py +504 -0
rslearn/models/detr/matcher.py +107 -0
rslearn/models/detr/position_encoding.py +114 -0
rslearn/models/detr/transformer.py +429 -0
rslearn/models/detr/util.py +24 -0
rslearn/models/dinov3.py +177 -0
rslearn/models/faster_rcnn.py +30 -28
rslearn/models/feature_center_crop.py +53 -0
rslearn/models/fpn.py +19 -8
rslearn/models/galileo/__init__.py +5 -0
rslearn/models/galileo/galileo.py +595 -0
rslearn/models/galileo/single_file_galileo.py +1678 -0
rslearn/models/module_wrapper.py +65 -0
rslearn/models/molmo.py +69 -0
rslearn/models/multitask.py +384 -28
rslearn/models/olmoearth_pretrain/__init__.py +1 -0
rslearn/models/olmoearth_pretrain/model.py +421 -0
rslearn/models/olmoearth_pretrain/norm.py +86 -0
rslearn/models/panopticon.py +170 -0
rslearn/models/panopticon_data/sensors/drone.yaml +32 -0
rslearn/models/panopticon_data/sensors/enmap.yaml +904 -0
rslearn/models/panopticon_data/sensors/goes.yaml +9 -0
rslearn/models/panopticon_data/sensors/himawari.yaml +9 -0
rslearn/models/panopticon_data/sensors/intuition.yaml +606 -0
rslearn/models/panopticon_data/sensors/landsat8.yaml +84 -0
rslearn/models/panopticon_data/sensors/modis_terra.yaml +99 -0
rslearn/models/panopticon_data/sensors/qb2_ge1.yaml +34 -0
rslearn/models/panopticon_data/sensors/sentinel1.yaml +85 -0
rslearn/models/panopticon_data/sensors/sentinel2.yaml +97 -0
rslearn/models/panopticon_data/sensors/superdove.yaml +60 -0
rslearn/models/panopticon_data/sensors/wv23.yaml +63 -0
rslearn/models/pick_features.py +17 -10
rslearn/models/pooling_decoder.py +60 -7
rslearn/models/presto/__init__.py +5 -0
rslearn/models/presto/presto.py +297 -0
rslearn/models/presto/single_file_presto.py +926 -0
rslearn/models/prithvi.py +1147 -0
rslearn/models/resize_features.py +59 -0
rslearn/models/sam2_enc.py +13 -9
rslearn/models/satlaspretrain.py +38 -18
rslearn/models/simple_time_series.py +188 -77
rslearn/models/singletask.py +24 -13
rslearn/models/ssl4eo_s12.py +40 -30
rslearn/models/swin.py +44 -32
rslearn/models/task_embedding.py +250 -0
rslearn/models/terramind.py +256 -0
rslearn/models/trunk.py +139 -0
rslearn/models/unet.py +68 -22
rslearn/models/upsample.py +48 -0
rslearn/models/use_croma.py +508 -0
rslearn/template_params.py +26 -0
rslearn/tile_stores/__init__.py +41 -18
rslearn/tile_stores/default.py +409 -0
rslearn/tile_stores/tile_store.py +236 -132
rslearn/train/all_patches_dataset.py +530 -0
rslearn/train/callbacks/adapters.py +53 -0
rslearn/train/callbacks/freeze_unfreeze.py +348 -17
rslearn/train/callbacks/gradients.py +129 -0
rslearn/train/callbacks/peft.py +116 -0
rslearn/train/data_module.py +444 -20
rslearn/train/dataset.py +588 -235
rslearn/train/lightning_module.py +192 -62
rslearn/train/model_context.py +88 -0
rslearn/train/optimizer.py +31 -0
rslearn/train/prediction_writer.py +319 -84
rslearn/train/scheduler.py +92 -0
rslearn/train/tasks/classification.py +55 -28
rslearn/train/tasks/detection.py +132 -76
rslearn/train/tasks/embedding.py +120 -0
rslearn/train/tasks/multi_task.py +28 -14
rslearn/train/tasks/per_pixel_regression.py +291 -0
rslearn/train/tasks/regression.py +161 -44
rslearn/train/tasks/segmentation.py +428 -53
rslearn/train/tasks/task.py +6 -5
rslearn/train/transforms/__init__.py +1 -1
rslearn/train/transforms/concatenate.py +54 -10
rslearn/train/transforms/crop.py +29 -11
rslearn/train/transforms/flip.py +18 -6
rslearn/train/transforms/mask.py +78 -0
rslearn/train/transforms/normalize.py +101 -17
rslearn/train/transforms/pad.py +19 -7
rslearn/train/transforms/resize.py +83 -0
rslearn/train/transforms/select_bands.py +76 -0
rslearn/train/transforms/sentinel1.py +75 -0
rslearn/train/transforms/transform.py +89 -70
rslearn/utils/__init__.py +2 -6
rslearn/utils/array.py +8 -6
rslearn/utils/feature.py +2 -2
rslearn/utils/fsspec.py +90 -1
rslearn/utils/geometry.py +347 -7
rslearn/utils/get_utm_ups_crs.py +2 -3
rslearn/utils/grid_index.py +5 -5
rslearn/utils/jsonargparse.py +178 -0
rslearn/utils/mp.py +4 -3
rslearn/utils/raster_format.py +268 -116
rslearn/utils/rtree_index.py +64 -17
rslearn/utils/sqlite_index.py +7 -1
rslearn/utils/vector_format.py +252 -97
{rslearn-0.0.1.dist-info → rslearn-0.0.21.dist-info}/METADATA +532 -283
rslearn-0.0.21.dist-info/RECORD +167 -0
{rslearn-0.0.1.dist-info → rslearn-0.0.21.dist-info}/WHEEL +1 -1
rslearn-0.0.21.dist-info/licenses/NOTICE +115 -0
rslearn/data_sources/raster_source.py +0 -309
rslearn/models/registry.py +0 -5
rslearn/tile_stores/file.py +0 -242
rslearn/utils/mgrs.py +0 -24
rslearn/utils/utils.py +0 -22
rslearn-0.0.1.dist-info/RECORD +0 -88
rslearn/{data_sources/geotiff.py → py.typed} +0 -0
{rslearn-0.0.1.dist-info → rslearn-0.0.21.dist-info}/entry_points.txt +0 -0
{rslearn-0.0.1.dist-info → rslearn-0.0.21.dist-info/licenses}/LICENSE +0 -0
{rslearn-0.0.1.dist-info → rslearn-0.0.21.dist-info}/top_level.txt +0 -0
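Each entry in the list above has the form `path +added -removed`, and renames use `{old → new}` brace notation. The following minimal Python sketch totals those counts and groups renames and likely deletions. `summarize_file_list` is a hypothetical helper; it is not part of rslearn or of Mend's tooling.

```python
import re

# One entry of the "Files changed" list, e.g. "rslearn/main.py +384 -181"
# or "{rslearn-0.0.1.dist-info → rslearn-0.0.21.dist-info}/WHEEL +1 -1".
ENTRY = re.compile(r"^(?P<path>.+?) \+(?P<added>\d+) -(?P<removed>\d+)$")


def summarize_file_list(lines):
    """Total the added/removed line counts and group entries by kind of change."""
    totals = {"added": 0, "removed": 0}
    kinds = {"renamed": [], "deleted": [], "other": []}
    for line in lines:
        match = ENTRY.match(line.strip())
        if match is None:
            continue  # skip headings, blank lines, and anything else
        path = match["path"]
        added, removed = int(match["added"]), int(match["removed"])
        totals["added"] += added
        totals["removed"] += removed
        if "→" in path:
            kinds["renamed"].append(path)
        elif added == 0 and removed > 0:
            kinds["deleted"].append(path)  # heuristic: "+0 -N" read as a removed file
        else:
            kinds["other"].append(path)  # new or modified; the counts alone can't tell
    return totals, kinds
```

Treating `+0 -N` as a deletion is a heuristic. The counts alone cannot distinguish a removed file from one that was emptied, or a new file from an edit that only adds lines.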

{rslearn-0.0.1.dist-info → rslearn-0.0.21.dist-info}/METADATA RENAMED

@@ -1,9 +1,9 @@
-Metadata-Version: 2.1
+Metadata-Version: 2.4
 Name: rslearn
-Version: 0.0.1
+Version: 0.0.21
 Summary: A library for developing remote sensing datasets and models
-Author-email: Favyen Bastani <favyenb@allenai.org>, Yawen Zhang <yawenz@allenai.org>, Patrick Beukema <patrickb@allenai.org>, Henry Herzog <henryh@allenai.org>, Piper Wolters <piperw@allenai.org>
-License: Apache License
+Author: OlmoEarth Team
+License:                                  Apache License
                                    Version 2.0, January 2004
                                 http://www.apache.org/licenses/
@@ -205,36 +205,62 @@ License: Apache License
            See the License for the specific language governing permissions and
            limitations under the License.
-Requires-Python: >=3.10
+Project-URL: homepage, https://github.com/allenai/rslearn
+Project-URL: issues, https://github.com/allenai/rslearn/issues
+Project-URL: repository, https://github.com/allenai/rslearn
+Requires-Python: >=3.11
 Description-Content-Type: text/markdown
 License-File: LICENSE
-Requires-Dist: boto3
-Requires-Dist: class-registry
-Requires-Dist: python-dateutil
-Requires-Dist: pytimeparse
-Requires-Dist: fiona
-Requires-Dist: fsspec[gcs,s3]
-Requires-Dist: Pillow
-Requires-Dist: pyproj
-Requires-Dist: rasterio
-Requires-Dist: shapely
-Requires-Dist: tqdm
-Requires-Dist: torch
-Requires-Dist: torchvision
-Requires-Dist: universal-pathlib
-Requires-Dist: lightning[pytorch-extra]
+License-File: NOTICE
+Requires-Dist: boto3>=1.39
+Requires-Dist: fiona>=1.10
+Requires-Dist: fsspec>=2025.10.0
+Requires-Dist: jsonargparse>=4.35.0
+Requires-Dist: lightning>=2.5.1.post0
+Requires-Dist: Pillow>=11.3
+Requires-Dist: pyproj>=3.7
+Requires-Dist: python-dateutil>=2.9
+Requires-Dist: pytimeparse>=1.1
+Requires-Dist: rasterio>=1.4
+Requires-Dist: shapely>=2.1
+Requires-Dist: torch>=2.7.0
+Requires-Dist: torchvision>=0.22.0
+Requires-Dist: tqdm>=4.67
+Requires-Dist: universal_pathlib>=0.2.6
 Provides-Extra: extra
-Requires-Dist: earthengine-api ; extra == 'extra'
-Requires-Dist: gcsfs ; extra == 'extra'
-Requires-Dist: google-cloud-storage ; extra == 'extra'
-Requires-Dist: mgrs ; extra == 'extra'
-Requires-Dist: osmium ; extra == 'extra'
-Requires-Dist: planet ; extra == 'extra'
-Requires-Dist: pycocotools ; extra == 'extra'
-Requires-Dist: rtree ; extra == 'extra'
-Requires-Dist: satlaspretrain-models ; extra == 'extra'
-Requires-Dist: scipy ; extra == 'extra'
-Requires-Dist: wandb ; extra == 'extra'
+Requires-Dist: accelerate>=1.10; extra == "extra"
+Requires-Dist: cdsapi>=0.7.6; extra == "extra"
+Requires-Dist: earthdaily[platform]>=1.0.7; extra == "extra"
+Requires-Dist: earthengine-api>=1.6.3; extra == "extra"
+Requires-Dist: einops>=0.8; extra == "extra"
+Requires-Dist: fsspec[gcs,s3]; extra == "extra"
+Requires-Dist: google-cloud-bigquery>=3.35; extra == "extra"
+Requires-Dist: google-cloud-storage>=2.18; extra == "extra"
+Requires-Dist: huggingface_hub>=0.34.4; extra == "extra"
+Requires-Dist: netCDF4>=1.7.2; extra == "extra"
+Requires-Dist: osmium>=4.0.2; extra == "extra"
+Requires-Dist: planet>=3.1; extra == "extra"
+Requires-Dist: planetary_computer>=1.0; extra == "extra"
+Requires-Dist: pycocotools>=2.0; extra == "extra"
+Requires-Dist: pystac_client>=0.9; extra == "extra"
+Requires-Dist: rtree>=1.4; extra == "extra"
+Requires-Dist: termcolor>=3.0; extra == "extra"
+Requires-Dist: satlaspretrain_models>=0.3; extra == "extra"
+Requires-Dist: scipy>=1.16; extra == "extra"
+Requires-Dist: terratorch>=1.0.2; extra == "extra"
+Requires-Dist: transformers>=4.55; extra == "extra"
+Requires-Dist: wandb>=0.21; extra == "extra"
+Requires-Dist: timm>=0.9.7; extra == "extra"
+Provides-Extra: dev
+Requires-Dist: interrogate>=1.7.0; extra == "dev"
+Requires-Dist: mypy<2,>=1.17.1; extra == "dev"
+Requires-Dist: pre-commit>=4.3.0; extra == "dev"
+Requires-Dist: pytest>=8.0; extra == "dev"
+Requires-Dist: pytest_httpserver; extra == "dev"
+Requires-Dist: ruff>=0.12.9; extra == "dev"
+Requires-Dist: pytest-dotenv; extra == "dev"
+Requires-Dist: pytest-xdist; extra == "dev"
+Dynamic: license-file
 Overview
 --------
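This hunk makes three changes. It raises `Requires-Python` from `>=3.10` to `>=3.11`, adds lower bounds to every core dependency, and switches extras markers from `extra == 'extra'` to `extra == "extra"`. The following Python sketch splits a hunk like this into its old and new sides. It then groups `Requires-Dist` entries by extra and checks an interpreter version against a `>=X.Y` bound. The regex is a simplification and does not parse full PEP 508. For real tooling, `packaging.requirements.Requirement` and `packaging.specifiers.SpecifierSet` handle the complete grammar. All function names here are hypothetical.

```python
import re

# Simplified pattern for lines such as "Requires-Dist: boto3>=1.39" or
# 'Requires-Dist: mypy<2,>=1.17.1; extra == "dev"' (not a full PEP 508 parser).
REQ = re.compile(
    r"^Requires-Dist:\s*(?P<name>[A-Za-z0-9_.\-]+)(?:\[[^\]]*\])?\s*"
    r"(?P<spec>[^;]*?)\s*"
    r"""(?:;\s*extra\s*==\s*["'](?P<extra>[^"']+)["'])?\s*$"""
)


def split_diff(lines):
    """Separate unified-diff lines into (old, new) METADATA lines."""
    old, new = [], []
    for line in lines:
        marker, body = line[:1], line[1:]
        if marker in ("-", " "):
            old.append(body)
        if marker in ("+", " "):
            new.append(body)
    return old, new


def requires_python(lines):
    """Return the Requires-Python value, or None if the field is absent."""
    for line in lines:
        if line.startswith("Requires-Python:"):
            return line.split(":", 1)[1].strip()
    return None


def parse_requires(lines):
    """Group Requires-Dist entries as {extra or None: {name: specifier}}."""
    groups = {}
    for line in lines:
        match = REQ.match(line.strip())
        if match:
            groups.setdefault(match["extra"], {})[match["name"]] = match["spec"]
    return groups


def satisfies_min_python(version, spec):
    """Check a version tuple against a '>=X.Y' bound (other operators unsupported)."""
    if not spec.startswith(">="):
        raise ValueError("only '>=X.Y' bounds are handled in this sketch")
    minimum = tuple(int(part) for part in spec[2:].split("."))
    return tuple(version[: len(minimum)]) >= minimum
```

With these functions, `satisfies_min_python((3, 10), ">=3.11")` returns `False`, so pip will not install 0.0.21 on Python 3.10. The unchanged Setup text in a later hunk still says "Python 3.10+".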
@@ -254,10 +280,12 @@ rslearn helps with:
 Quick links:
-- [CoreConcepts](CoreConcepts.md) summarizes key concepts in rslearn, including
+- [CoreConcepts](docs/CoreConcepts.md) summarizes key concepts in rslearn, including
   datasets, windows, layers, and data sources.
-- [Examples](Examples.md) contains more examples, including customizing different
+- [Examples](docs/Examples.md) contains more examples, including customizing different
   stages of rslearn with additional code.
+- [DatasetConfig](docs/DatasetConfig.md) documents the dataset configuration file.
+- [ModelConfig](docs/ModelConfig.md) documents the model configuration file.
 Setup
@@ -265,9 +293,33 @@ Setup
 rslearn requires Python 3.10+ (Python 3.12 is recommended).
-    git clone https://github.com/allenai/rslearn.git
-    cd rslearn
-    pip install .[extra]
+```
+git clone https://github.com/allenai/rslearn.git
+cd rslearn
+pip install .[extra]
+```
+Supported Data Sources
+----------------------
+rslearn supports ingesting raster and vector data from the following data sources. Even
+if you don't plan to train models within rslearn, you can still use it to easily
+download, crop, and re-project data based on spatiotemporal rectangles (windows) that
+you define. See [Examples](docs/Examples.md) and [DatasetConfig](docs/DatasetConfig.md)
+for how to setup these data sources.
+- Sentinel-1
+- Sentinel-2 L1C and L2A
+- Landsat 8/9 OLI-TIRS
+- National Agriculture Imagery Program
+- OpenStreetMap
+- Xyz (Slippy) Tiles (e.g., Mapbox tiles)
+- Planet Labs (PlanetScope, SkySat)
+- ESA WorldCover 2021
+rslearn can also be used to easily mosaic, crop, and re-project any sets of local
+raster and vector files you may have.
 Example Usage
@@ -281,28 +333,27 @@ Let's start by defining a region of interest and obtaining Sentinel-2 images. Cr
 directory `/path/to/dataset` and corresponding configuration file at
 `/path/to/dataset/config.json` as follows:
-    {
-        "layers": {
-            "sentinel2": {
-                "type": "raster",
-                "band_sets": [{
-                    "dtype": "uint8",
-                    "bands": ["R", "G", "B"]
-                }],
-                "data_source": {
-                    "name": "rslearn.data_sources.gcp_public_data.Sentinel2",
+```json
+{
+    "layers": {
+        "sentinel2": {
+            "type": "raster",
+            "band_sets": [{
+                "dtype": "uint8",
+                "bands": ["R", "G", "B"]
+            }],
+            "data_source": {
+                "class_path": "rslearn.data_sources.gcp_public_data.Sentinel2",
+                "init_args": {
                     "index_cache_dir": "cache/sentinel2/",
-                    "max_time_delta": "1d",
                     "sort_by": "cloud_cover",
                     "use_rtree_index": false
                 }
             }
-        },
-        "tile_store": {
-            "name": "file",
-            "root_dir": "tiles"
         }
     }
+}
+```
 Here, we have initialized an empty dataset and defined a raster layer called
 `sentinel2`. Because it specifies a data source, it will be populated automatically. In
@@ -314,8 +365,10 @@ choosing the scenes with minimal cloud cover.
 Next, let's create our spatiotemporal windows. These will correspond to training
 examples.
-    export DATASET_PATH=/path/to/dataset
-    rslearn dataset add_windows --root $DATASET_PATH --group default --utm --resolution 10 --grid_size 128 --src_crs EPSG:4326 --box=-122.6901,47.2079,-121.4955,47.9403 --start 2024-06-01T00:00:00+00:00 --end 2024-08-01T00:00:00+00:00 --name seattle
+```
+export DATASET_PATH=/path/to/dataset
+rslearn dataset add_windows --root $DATASET_PATH --group default --utm --resolution 10 --grid_size 128 --src_crs EPSG:4326 --box=-122.6901,47.2079,-121.4955,47.9403 --start 2024-06-01T00:00:00+00:00 --end 2024-08-01T00:00:00+00:00 --name seattle
+```
 This creates windows along a 128x128 grid in the specified projection (i.e.,
 appropriate UTM zone for the location with 10 m/pixel resolution) covering the
@@ -327,9 +380,11 @@ We can now obtain the Sentinel-2 images by running prepare, ingest, and material
 * Ingest: retrieve those items. This step populates the `tiles` directory within the dataset.
 * Materialize: crop/mosaic the items to align with the windows. This populates the `layers` folder in each window directory.
-    rslearn dataset prepare --root $DATASET_PATH --workers 32 --batch-size 8
-    rslearn dataset ingest --root $DATASET_PATH --workers 32 --no-use-initial-job --jobs-per-process 1
-    rslearn dataset materialize --root $DATASET_PATH --workers 32 --no-use-initial-job
+```
+rslearn dataset prepare --root $DATASET_PATH --workers 32 --batch-size 8
+rslearn dataset ingest --root $DATASET_PATH --workers 32 --no-use-initial-job --jobs-per-process 1
+rslearn dataset materialize --root $DATASET_PATH --workers 32 --no-use-initial-job
+```
 For ingestion, you may need to reduce the number of workers depending on the available
 memory on your system.
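The three stages above must run in order, and the README singles out ingestion as the step that may need fewer workers. The following minimal Python driver assumes the `rslearn` CLI is on `PATH`. Its flags are copied from the commands above. `build_pipeline`, `run_pipeline`, and the `ingest_workers` parameter are hypothetical names introduced for this sketch.

```python
import shlex
import subprocess


def build_pipeline(dataset_path, workers=32, ingest_workers=None):
    """Return the prepare, ingest, and materialize commands as argument lists."""
    if ingest_workers is None:
        ingest_workers = workers

    def common(n):
        return ["--root", dataset_path, "--workers", str(n)]

    return [
        ["rslearn", "dataset", "prepare", *common(workers), "--batch-size", "8"],
        ["rslearn", "dataset", "ingest", *common(ingest_workers),
         "--no-use-initial-job", "--jobs-per-process", "1"],
        ["rslearn", "dataset", "materialize", *common(workers), "--no-use-initial-job"],
    ]


def run_pipeline(dataset_path, workers=32, ingest_workers=None, dry_run=False):
    """Print each command and, unless dry_run is set, execute it in order."""
    for cmd in build_pipeline(dataset_path, workers, ingest_workers):
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, so later stages are skipped.
            subprocess.run(cmd, check=True)
```

For example, `run_pipeline("/path/to/dataset", ingest_workers=8, dry_run=True)` prints the three commands without running them. Passing argument lists to `subprocess.run` instead of a shell string avoids quoting problems when dataset paths contain spaces.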
@@ -337,32 +392,36 @@ memory on your system.
 You should now be able to open the GeoTIFF images. Let's find the window that
 corresponds to downtown Seattle:
-    import shapely
-    from rslearn.const import WGS84_PROJECTION
-    from rslearn.dataset import Dataset
-    from rslearn.utils import Projection, STGeometry
-    from upath import UPath
-    # Define longitude and latitude for downtown Seattle.
-    downtown_seattle = shapely.Point(-122.333, 47.606)
-    # Iterate over the windows and find the closest one.
-    dataset = Dataset(path=UPath("/path/to/dataset"))
-    best_window_name = None
-    best_distance = None
-    for window in dataset.load_windows(workers=32):
-        shp = window.get_geometry().to_projection(WGS84_PROJECTION).shp
-        distance = shp.distance(downtown_seattle)
-        if best_distance is None or distance < best_distance:
-            best_window_name = window.name
-            best_distance = distance
-    print(best_window_name)
+```python
+import shapely
+from rslearn.const import WGS84_PROJECTION
+from rslearn.dataset import Dataset
+from rslearn.utils import Projection, STGeometry
+from upath import UPath
+# Define longitude and latitude for downtown Seattle.
+downtown_seattle = shapely.Point(-122.333, 47.606)
+# Iterate over the windows and find the closest one.
+dataset = Dataset(path=UPath("/path/to/dataset"))
+best_window_name = None
+best_distance = None
+for window in dataset.load_windows(workers=32):
+    shp = window.get_geometry().to_projection(WGS84_PROJECTION).shp
+    distance = shp.distance(downtown_seattle)
+    if best_distance is None or distance < best_distance:
+        best_window_name = window.name
+        best_distance = distance
+print(best_window_name)
+```
 It should be `seattle_54912_-527360`, so let's open it in qgis (or your favorite GIS
 software):
-    qgis $DATASET_PATH/windows/default/seattle_54912_-527360/layers/sentinel2/R_G_B/geotiff.tif
+```
+qgis $DATASET_PATH/windows/default/seattle_54912_-527360/layers/sentinel2/R_G_B/geotiff.tif
+```
 ### Adding Land Cover Labels
@@ -372,152 +431,166 @@ the ESA WorldCover land cover map as labels.
 Start by downloading the WorldCover data from https://worldcover2021.esa.int
-    wget https://worldcover2021.esa.int/data/archive/ESA_WorldCover_10m_2021_v200_60deg_macrotile_N30W180.zip
-    mkdir world_cover_tifs
-    unzip ESA_WorldCover_10m_2021_v200_60deg_macrotile_N30W180.zip -d world_cover_tifs/
+```
+wget https://worldcover2021.esa.int/data/archive/ESA_WorldCover_10m_2021_v200_60deg_macrotile_N30W180.zip
+mkdir world_cover_tifs
+unzip ESA_WorldCover_10m_2021_v200_60deg_macrotile_N30W180.zip -d world_cover_tifs/
+```
 It would require some work to write a script to re-project and crop these GeoTIFFs so
 that they align with the windows we have previously defined (and the Sentinel-2 images
 we have already ingested). We can use the LocalFiles data source to have rslearn
 automate this process. Update the dataset `config.json` with a new layer:
-    "layers": {
-        "sentinel2": {
-            ...
-        },
-        "worldcover": {
-            "type": "raster",
-            "band_sets": [{
-                "dtype": "uint8",
-                "bands": ["B1"]
-            }],
-            "resampling_method": "nearest",
-            "data_source": {
-                "name": "rslearn.data_sources.local_files.LocalFiles",
+```jsonc
+"layers": {
+    "sentinel2": {
+        # ...
+    },
+    "worldcover": {
+        "type": "raster",
+        "band_sets": [{
+            "dtype": "uint8",
+            "bands": ["B1"]
+        }],
+        "resampling_method": "nearest",
+        "data_source": {
+            "class_path": "rslearn.data_sources.local_files.LocalFiles",
+            "init_args": {
                 "src_dir": "file:///path/to/world_cover_tifs/"
             }
         }
-    },
-    ...
+    }
+},
+# ...
+```
 Repeat the materialize process so we populate the data for this new layer:
-    rslearn dataset prepare --root $DATASET_PATH --workers 32 --batch-size 8
-    rslearn dataset ingest --root $DATASET_PATH --workers 32 --no-use-initial-job --jobs-per-process 1
-    rslearn dataset materialize --root $DATASET_PATH --workers 32 --no-use-initial-job
+```
+rslearn dataset prepare --root $DATASET_PATH --workers 32 --batch-size 8
+rslearn dataset ingest --root $DATASET_PATH --workers 32 --no-use-initial-job --jobs-per-process 1
+rslearn dataset materialize --root $DATASET_PATH --workers 32 --no-use-initial-job
+```
 We can visualize both the GeoTIFFs together in qgis:
-    qgis $DATASET_PATH/windows/default/seattle_54912_-527360/layers/*/*/geotiff.tif
+```
+qgis $DATASET_PATH/windows/default/seattle_54912_-527360/layers/*/*/geotiff.tif
+```
 ### Training a Model
 Create a model configuration file `land_cover_model.yaml`:
+```yaml
+model:
+  class_path: rslearn.train.lightning_module.RslearnLightningModule
+  init_args:
+    # This part defines the model architecture.
+    # Essentially we apply the SatlasPretrain Sentinel-2 backbone with a UNet decoder
+    # that terminates at a segmentation prediction head.
+    # The backbone outputs four feature maps at different scales, and the UNet uses
+    # these to compute a feature map at the input scale.
+    # Finally the segmentation head applies per-pixel softmax to compute the land
+    # cover class.
     model:
-      class_path: rslearn.train.lightning_module.RslearnLightningModule
+      class_path: rslearn.models.singletask.SingleTaskModel
       init_args:
-        # This part defines the model architecture.
-        # Essentially we apply the SatlasPretrain Sentinel-2 backbone with a UNet decoder
-        # that terminates at a segmentation prediction head.
-        # The backbone outputs four feature maps at different scales, and the UNet uses
-        # these to compute a feature map at the input scale.
-        # Finally the segmentation head applies per-pixel softmax to compute the land
-        # cover class.
-        model:
-          class_path: rslearn.models.singletask.SingleTaskModel
-          init_args:
-            encoder:
-              - class_path: rslearn.models.satlaspretrain.SatlasPretrain
-                init_args:
-                  model_identifier: "Sentinel2_SwinB_SI_RGB"
-            decoder:
-              - class_path: rslearn.models.unet.UNetDecoder
-                init_args:
-                  in_channels: [[4, 128], [8, 256], [16, 512], [32, 1024]]
-                  # We use 101 classes because the WorldCover classes are 10, 20, 30, 40
-                  # 50, 60, 70, 80, 90, 95, 100.
-                  # We could process the GeoTIFFs to collapse them to 0-10 (the 11 actual
-                  # classes) but the model will quickly learn that the intermediate
-                  # values are never used.
-                  out_channels: 101
-                  conv_layers_per_resolution: 2
-              - class_path: rslearn.train.tasks.segmentation.SegmentationHead
-        # Remaining parameters in RslearnLightningModule define different aspects of the
-        # training process like initial learning rate.
-        lr: 0.0001
-    data:
-      class_path: rslearn.train.data_module.RslearnDataModule
+        encoder:
+          - class_path: rslearn.models.satlaspretrain.SatlasPretrain
+            init_args:
+              model_identifier: "Sentinel2_SwinB_SI_RGB"
+        decoder:
+          - class_path: rslearn.models.unet.UNetDecoder
+            init_args:
+              in_channels: [[4, 128], [8, 256], [16, 512], [32, 1024]]
+              # We use 101 classes because the WorldCover classes are 10, 20, 30, 40
+              # 50, 60, 70, 80, 90, 95, 100.
+              # We could process the GeoTIFFs to collapse them to 0-10 (the 11 actual
+              # classes) but the model will quickly learn that the intermediate
+              # values are never used.
+              out_channels: 101
+              conv_layers_per_resolution: 2
+          - class_path: rslearn.train.tasks.segmentation.SegmentationHead
+    # Remaining parameters in RslearnLightningModule define different aspects of the
+    # training process like initial learning rate.
+    lr: 0.0001
+data:
+  class_path: rslearn.train.data_module.RslearnDataModule
+  init_args:
+    path: ${DATASET_PATH}
+    # This defines the layers that should be read for each window.
+    # The key ("image" / "targets") is what the data will be called in the model,
+    # while the layers option specifies which layers will be read.
+    inputs:
+      image:
+        data_type: "raster"
+        layers: ["sentinel2"]
+        bands: ["R", "G", "B"]
+        passthrough: true
+      targets:
+        data_type: "raster"
+        layers: ["worldcover"]
+        bands: ["B1"]
+        is_target: true
+    task:
+      # Train for semantic segmentation.
+      # The remap option is only used when visualizing outputs during testing.
+      class_path: rslearn.train.tasks.segmentation.SegmentationTask
       init_args:
-        # Replace this with the dataset path.
-        path: /path/to/dataset/
-        # This defines the layers that should be read for each window.
-        # The key ("image" / "targets") is what the data will be called in the model,
-        # while the layers option specifies which layers will be read.
-        inputs:
-          image:
-            data_type: "raster"
-            layers: ["sentinel2"]
-            bands: ["R", "G", "B"]
-            passthrough: true
-          targets:
-            data_type: "raster"
-            layers: ["worldcover"]
-            bands: ["B1"]
-            is_target: true
-        task:
-          # Train for semantic segmentation.
-          # The remap option is only used when visualizing outputs during testing.
-          class_path: rslearn.train.tasks.segmentation.SegmentationTask
+        num_classes: 101
+        remap_values: [[0, 1], [0, 255]]
+    batch_size: 8
+    num_workers: 32
+    # These define different options for different phases/splits, like training,
+    # validation, and testing.
+    # Here we use the same transform across splits except training where we add a
+    # flipping augmentation.
+    # For now we are using the same windows for training and validation.
+    default_config:
+      transforms:
+        - class_path: rslearn.train.transforms.normalize.Normalize
           init_args:
-            num_classes: 101
-            remap_values: [[0, 1], [0, 255]]
-        batch_size: 8
-        num_workers: 32
-        # These define different options for different phases/splits, like training,
-        # validation, and testing.
-        # Here we use the same transform across splits except training where we add a
-        # flipping augmentation.
-        # For now we are using the same windows for training and validation.
-        default_config:
-          transforms:
-            - class_path: rslearn.train.transforms.normalize.Normalize
-              init_args:
-                mean: 0
-                std: 255
-        train_config:
-          transforms:
-            - class_path: rslearn.train.transforms.normalize.Normalize
-              init_args:
-                mean: 0
-                std: 255
-            - class_path: rslearn.train.transforms.flip.Flip
-              init_args:
-                image_selectors: ["image", "target/classes", "target/valid"]
-          groups: ["default"]
-        val_config:
-          groups: ["default"]
-        test_config:
-          groups: ["default"]
-        predict_config:
-          groups: ["predict"]
-          load_all_patches: true
-          skip_targets: true
-          patch_size: 512
-    trainer:
-      max_epochs: 10
-      callbacks:
-        - class_path: lightning.pytorch.callbacks.ModelCheckpoint
+            mean: 0
+            std: 255
+    train_config:
+      transforms:
+        - class_path: rslearn.train.transforms.normalize.Normalize
+          init_args:
+            mean: 0
+            std: 255
+        - class_path: rslearn.train.transforms.flip.Flip
           init_args:
-            save_top_k: 1
-            save_last: true
-            monitor: val_accuracy
-            mode: max
+            image_selectors: ["image", "target/classes", "target/valid"]
+      groups: ["default"]
+    val_config:
+      groups: ["default"]
+    test_config:
+      groups: ["default"]
+    predict_config:
+      groups: ["predict"]
+      load_all_patches: true
+      skip_targets: true
+      patch_size: 512
+trainer:
+  max_epochs: 10
+  callbacks:
+    - class_path: lightning.pytorch.callbacks.ModelCheckpoint
+      init_args:
+        save_top_k: 1
+        save_last: true
+        monitor: val_accuracy
+        mode: max
+        dirpath: ./land_cover_model_checkpoints/
+```
 Now we can train the model:
-    rslearn model fit --config land_cover_model.yaml
+```
+rslearn model fit --config land_cover_model.yaml
+```
 ### Apply the Model
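Both data-source examples in this diff, the Sentinel-2 layer and the WorldCover `LocalFiles` layer, change shape. They move from a flat object keyed by `"name"` to a `"class_path"` plus a nested `"init_args"` object. The following Python helper performs that mechanical rewrite on a `config.json`. It is hypothetical and not an rslearn API. It copies the remaining options into `init_args` unchanged and does not check whether 0.0.21 still accepts each one. For instance, the new Sentinel-2 example no longer sets `max_time_delta`.

```python
import copy
import json


def migrate_data_source(old):
    """Turn {"name": X, **options} into {"class_path": X, "init_args": options}."""
    options = dict(old)
    class_path = options.pop("name")
    return {"class_path": class_path, "init_args": options}


def migrate_config(config):
    """Return a copy of a dataset config with old-style data sources rewritten."""
    new = copy.deepcopy(config)
    for layer in new.get("layers", {}).values():
        source = layer.get("data_source") if isinstance(layer, dict) else None
        if isinstance(source, dict) and "name" in source and "class_path" not in source:
            layer["data_source"] = migrate_data_source(source)
    return new


def migrate_file(path):
    """Rewrite a config.json in place; keep a backup before running this."""
    with open(path) as f:
        config = json.load(f)
    migrated = migrate_config(config)  # finish the rewrite before truncating the file
    with open(path, "w") as f:
        json.dump(migrated, f, indent=4)
```

Layers that already use `class_path` are left unchanged, so running the helper twice is harmless.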
@@ -528,22 +601,28 @@ windows along a grid, we just create one big window. This is because we are just
 to run the prediction over the whole window rather than use different windows as
 different training examples.
-    rslearn dataset add_windows --root $DATASET_PATH --group predict --utm --resolution 10 --src_crs EPSG:4326 --box=-122.712,45.477,-122.621,45.549 --start 2024-06-01T00:00:00+00:00 --end 2024-08-01T00:00:00+00:00 --name portland
-    rslearn dataset prepare --root $DATASET_PATH --workers 32 --batch-size 8
-    rslearn dataset ingest --root $DATASET_PATH --workers 32 --no-use-initial-job --jobs-per-process 1
-    rslearn dataset materialize --root $DATASET_PATH --workers 32 --no-use-initial-job
+```
+rslearn dataset add_windows --root $DATASET_PATH --group predict --utm --resolution 10 --src_crs EPSG:4326 --box=-122.712,45.477,-122.621,45.549 --start 2024-06-01T00:00:00+00:00 --end 2024-08-01T00:00:00+00:00 --name portland
+rslearn dataset prepare --root $DATASET_PATH --workers 32 --batch-size 8
+rslearn dataset ingest --root $DATASET_PATH --workers 32 --no-use-initial-job --jobs-per-process 1
+rslearn dataset materialize --root $DATASET_PATH --workers 32 --no-use-initial-job
+```
 We also need to add an RslearnPredictionWriter to the trainer callbacks in the model
 configuration file, as it will handle writing the outputs from the model to a GeoTIFF.
-    trainer:
-      callbacks:
-        - class_path: lightning.pytorch.callbacks.ModelCheckpoint
-          ...
-        - class_path: rslearn.train.prediction_writer.RslearnWriter
-          init_args:
-            path: /path/to/dataset/
-            output_layer: output
+```yaml
+trainer:
+  callbacks:
+    - class_path: lightning.pytorch.callbacks.ModelCheckpoint
+      ...
+    - class_path: rslearn.train.prediction_writer.RslearnWriter
+      init_args:
+        # We need to include this argument, but it will be overridden with the dataset
+        # path from data.init_args.path.
+        path: placeholder
+        output_layer: output
+```
 Because of our `predict_config`, when we run `model predict` it will apply the model on
 windows in the "predict" group, which is where we added the Portland window.
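The new model config replaces the hard-coded `path: /path/to/dataset/` with `path: ${DATASET_PATH}`. The writer's `path` becomes a placeholder that is overridden from `data.init_args.path`. This diff adds a `rslearn/template_params.py` module but does not show its contents, so how rslearn expands `${DATASET_PATH}` is not shown here. The sketch below therefore uses Python's `string.Template` as a stand-in for pre-rendering a config. It raises an error when the variable is unset instead of leaving the literal `${DATASET_PATH}` in place. `render_config` is a hypothetical name.

```python
import os
import string


def render_config(text, env=None):
    """Substitute ${VAR} references in config text; raise KeyError if any is unset."""
    # string.Template treats "$$" as a literal "$"; a lone "$" that does not start a
    # valid placeholder raises ValueError.
    return string.Template(text).substitute(os.environ if env is None else env)
```

For example, `render_config(open("land_cover_model.yaml").read())` returns the YAML with `DATASET_PATH` taken from the environment.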
@@ -551,39 +630,46 @@ windows in the "predict" group, which is where we added the Portland window.
 The predictions will be written to a new layer called "output" (the `output_layer`
 option above), so we also have to add that layer to the dataset configuration:
-    "layers": {
-        "sentinel2": {
-            ...
-        },
-        "worldcover": {
-            ...
-        },
-        "output": {
-            "type": "raster",
-            "band_sets": [{
-                "dtype": "uint8",
-                "bands": ["output"]
-            }]
-        }
+```jsonc
+"layers": {
+    "sentinel2": {
+        // ...
     },
+    "worldcover": {
+        // ...
+    },
+    "output": {
+        "type": "raster",
+        "band_sets": [{
+            "dtype": "uint8",
+            "bands": ["output"]
+        }]
+    }
+},
+```
 Now we can apply the model:
-    # Find model checkpoint in lightning_logs dir.
-    ls lightning_logs/*/checkpoints/last.ckpt
-    rslearn model predict --config land_cover_model.yaml --ckpt_path lightning_logs/version_0/checkpoints/last.ckpt
+```
+# Find the model checkpoint in the ModelCheckpoint dirpath set in the model config.
+ls land_cover_model_checkpoints/
+rslearn model predict --config land_cover_model.yaml --ckpt_path land_cover_model_checkpoints/last.ckpt
+```
 Then visualize the Sentinel-2 image and the output in QGIS:
-    qgis $DATASET_PATH/windows/predict/portland/layers/*/*/geotiff.tif
+```
+qgis $DATASET_PATH/windows/predict/portland/layers/*/*/geotiff.tif
+```
 ### Defining Train and Validation Splits
 We can visualize the logged metrics using TensorBoard:
-    tensorboard --logdir=lightning_logs/
+```
+tensorboard --logdir=lightning_logs/
+```
 However, because our training and validation data are identical, the validation metrics
 are not meaningful.
@@ -597,57 +683,61 @@ We will use the second approach. The script below sets a "split" key in the opti
 dict (which is stored in each window's `metadata.json` file) to "train" or "val"
 based on the SHA-256 hash of the window name.
-    import hashlib
-    import tqdm
-    from rslearn.dataset import Dataset, Window
-    from upath import UPath
-    ds_path = UPath("/path/to/dataset/")
-    dataset = Dataset(ds_path)
-    windows = dataset.load_windows(show_progress=True, workers=32)
-    for window in tqdm.tqdm(windows):
-        if hashlib.sha256(window.name.encode()).hexdigest()[0] in ["0", "1"]:
-            split = "val"
-        else:
-            split = "train"
-        if "split" in window.options and window.options["split"] == split:
-            continue
-        window.options["split"] = split
-        window.save()
+```python
+import hashlib
+import tqdm
+from rslearn.dataset import Dataset
+from upath import UPath
+ds_path = UPath("/path/to/dataset/")
+dataset = Dataset(ds_path)
+windows = dataset.load_windows(show_progress=True, workers=32)
+for window in tqdm.tqdm(windows):
+    # The first hex digit of the hash is roughly uniform over 0-f, so "0" or "1"
+    # sends about 2/16 = 12.5% of windows to validation, deterministically by name.
+    if hashlib.sha256(window.name.encode()).hexdigest()[0] in ["0", "1"]:
+        split = "val"
+    else:
+        split = "train"
+    # Skip windows that already have the right split to avoid rewriting metadata.json.
+    if window.options.get("split") == split:
+        continue
+    window.options["split"] = split
+    window.save()
+```
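The split rule in the script can be checked without rslearn. The sketch below reimplements only the hashing step as a hypothetical `assign_split` helper (not an rslearn function) and measures the validation fraction over synthetic window names. Because the first hex digit of a SHA-256 digest is roughly uniform, about 2/16 = 12.5% of names should land in "val".

```python
import hashlib


def assign_split(window_name: str, val_digits: tuple = ("0", "1")) -> str:
    """Hypothetical helper mirroring the hash rule in the script above."""
    first_hex_digit = hashlib.sha256(window_name.encode()).hexdigest()[0]
    return "val" if first_hex_digit in val_digits else "train"


# Synthetic window names, for illustration only.
names = [f"window_{i}" for i in range(10000)]
splits = [assign_split(n) for n in names]
val_fraction = splits.count("val") / len(splits)
print(f"val fraction: {val_fraction:.3f}")  # should be close to 0.125

# The same name always maps to the same split, so rerunning the script is stable.
assert assign_split("portland") == assign_split("portland")
```

Hashing the name instead of drawing a random number means the assignment survives reruns and does not need to be stored anywhere except the window options.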
 Now we can update the model configuration file to use these splits:
-    default_config:
-      transforms:
-        - class_path: rslearn.train.transforms.normalize.Normalize
-          init_args:
-            mean: 0
-            std: 255
-    train_config:
-      transforms:
-        - class_path: rslearn.train.transforms.normalize.Normalize
-          init_args:
-            mean: 0
-            std: 255
-        - class_path: rslearn.train.transforms.flip.Flip
-          init_args:
-            image_selectors: ["image", "target/classes", "target/valid"]
-      groups: ["default"]
-      tags:
-        split: train
-    val_config:
-      groups: ["default"]
-      tags:
-        split: val
-    test_config:
-      groups: ["default"]
-      tags:
-        split: val
-    predict_config:
-      groups: ["predict"]
-      load_all_patches: true
-      skip_targets: true
-      patch_size: 512
+```yaml
+default_config:
+  transforms:
+    - class_path: rslearn.train.transforms.normalize.Normalize
+      init_args:
+        mean: 0
+        std: 255
+train_config:
+  transforms:
+    - class_path: rslearn.train.transforms.normalize.Normalize
+      init_args:
+        mean: 0
+        std: 255
+    - class_path: rslearn.train.transforms.flip.Flip
+      init_args:
+        image_selectors: ["image", "target/classes", "target/valid"]
+  groups: ["default"]
+  tags:
+    split: train
+val_config:
+  groups: ["default"]
+  tags:
+    split: val
+test_config:
+  groups: ["default"]
+  tags:
+    split: val
+predict_config:
+  groups: ["predict"]
+  load_all_patches: true
+  skip_targets: true
+  patch_size: 512
+```
 The `tags` option that we are adding here tells rslearn to only load windows with a
 matching key and value in the window options.
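The `tags` filter can be pictured as a dictionary match against each window's options. The sketch below uses plain dictionaries, and `window_matches_tags` is a hypothetical name. It assumes that, when several tags are given, all of them must match. rslearn's actual filtering code may differ in details such as how missing keys or non-string values are handled.

```python
def window_matches_tags(options: dict, tags: dict) -> bool:
    """Hypothetical filter: True if every tag key/value pair appears in the options."""
    return all(key in options and options[key] == value for key, value in tags.items())


# Options as they might appear in each window's metadata.json (illustrative data).
windows = {
    "a": {"split": "train"},
    "b": {"split": "val"},
    "c": {},  # no split assigned yet, so it matches neither filter
}
train_windows = [name for name, opts in windows.items() if window_matches_tags(opts, {"split": "train"})]
val_windows = [name for name, opts in windows.items() if window_matches_tags(opts, {"split": "val"})]
print(train_windows, val_windows)  # ['a'] ['b']
```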
@@ -655,28 +745,187 @@ matching key and value in the window options.
 Previously, when we ran `model fit`, it showed the same number of windows for
 training and validation:
-    got 4752 examples in split train
-    got 4752 examples in split val
+```
+got 4752 examples in split train
+got 4752 examples in split val
+```
 With the updates, it should show different numbers like this:
-    got 4167 examples in split train
-    got 585 examples in split val
+```
+got 4167 examples in split train
+got 585 examples in split val
+```
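These counts can be sanity-checked with a little arithmetic. This is not rslearn output, and it treats each window's hash digit as an independent uniform draw. The two splits sum to the original 4752 windows. With about 2/16 of windows going to validation, the expected validation count is 4752 × 0.125 = 594, with a binomial standard deviation of about 22.8, so 585 is well within range.

```python
import math

total = 4167 + 585  # train + val counts reported above
p_val = 2 / 16      # hex digits "0" and "1" out of 16 possible first digits
expected_val = total * p_val
std_val = math.sqrt(total * p_val * (1 - p_val))  # binomial standard deviation
print(total, expected_val, round(std_val, 1))  # 4752 594.0 22.8
```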
 ### Visualizing with `model test`
-Coming soon
+We can visualize the ground truth labels and model predictions in the test set using
+the `model test` command:
+```
+mkdir ./vis
+rslearn model test --config land_cover_model.yaml --ckpt_path land_cover_model_checkpoints/last.ckpt --model.init_args.visualize_dir=./vis/
+```
+This writes PNG images to the `./vis` directory. The visualizations are produced by the
+`Task.visualize` method, so we can customize them by subclassing SegmentationTask and
+overriding `visualize`.
+### Checkpoint and Logging Management
+Above, we needed to configure the checkpoint directory in the model config (the
+`dirpath` option under `lightning.pytorch.callbacks.ModelCheckpoint`), and explicitly
+specify the checkpoint path when applying the model. Additionally, metrics were only logged
+to the local filesystem, which makes them hard to organize and compare across runs.
+We can instead let rslearn manage checkpoints automatically and log to Weights & Biases
+(W&B). To do so, we add `project_name`, `run_name`, and `management_dir` options to the
+model config. `project_name` sets the W&B project, and `run_name` sets the W&B run name.
+`management_dir` is a directory for storing project data: rslearn derives a per-run
+directory at `{management_dir}/{project_name}/{run_name}/` and stores checkpoints there.
+```yaml
+model:
+  # ...
+data:
+  # ...
+trainer:
+  # ...
+project_name: land_cover_model
+run_name: version_00
+# This sets the option via the MANAGEMENT_DIR environment variable.
+management_dir: ${MANAGEMENT_DIR}
+```
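The directory layout described above can be illustrated with the standard library. This is a sketch, not rslearn's code. It assumes the `${MANAGEMENT_DIR}` placeholder is expanded from the environment (here via `os.path.expandvars`) and then joined as `{management_dir}/{project_name}/{run_name}/`. `run_directory` is a hypothetical helper name.

```python
import os
from pathlib import Path


def run_directory(management_dir: str, project_name: str, run_name: str) -> Path:
    """Hypothetical helper: expand env vars, then build the per-run directory path."""
    expanded = os.path.expandvars(management_dir)
    # expandvars leaves unset variables untouched, so a leftover "$" means it was unset.
    if "$" in expanded:
        raise ValueError(f"unresolved environment variable in {management_dir!r}")
    return Path(expanded) / project_name / run_name


os.environ["MANAGEMENT_DIR"] = "./project_data"  # same value as the export used for model fit
print(run_directory("${MANAGEMENT_DIR}", "land_cover_model", "version_00"))
# project_data/land_cover_model/version_00 (on POSIX systems)
```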
+Now, set the `MANAGEMENT_DIR` environment variable and run `model fit`:
+```
+export MANAGEMENT_DIR=./project_data
+rslearn model fit --config land_cover_model.yaml
+```
+The training and validation loss and the accuracy metric should now be logged to W&B.
+Accuracy is provided by SegmentationTask; additional metrics such as mean IoU and F1
+can be enabled by passing the relevant init_args to the task:
+```yaml
+      class_path: rslearn.train.tasks.segmentation.SegmentationTask
+      init_args:
+        num_classes: 101
+        remap_values: [[0, 1], [0, 255]]
+        enable_miou_metric: true
+        enable_f1_metric: true
+```
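For reference, mean IoU and F1 are both computed from per-class true positives (TP), false positives (FP), and false negatives (FN). The pure-Python sketch below shows the standard macro-averaged definitions on a small, made-up confusion matrix. It is not rslearn's implementation, which may average over classes differently or handle absent classes in another way.

```python
def per_class_counts(confusion, cls):
    """Return (tp, fp, fn) for one class; confusion[i][j] counts true i predicted as j."""
    tp = confusion[cls][cls]
    fp = sum(confusion[i][cls] for i in range(len(confusion))) - tp
    fn = sum(confusion[cls]) - tp
    return tp, fp, fn


def mean_iou_and_f1(confusion):
    """Macro-average IoU = TP/(TP+FP+FN) and F1 = 2TP/(2TP+FP+FN) over classes."""
    ious, f1s = [], []
    for cls in range(len(confusion)):
        tp, fp, fn = per_class_counts(confusion, cls)
        if tp + fp + fn == 0:
            continue  # class absent from both labels and predictions
        ious.append(tp / (tp + fp + fn))
        f1s.append(2 * tp / (2 * tp + fp + fn))
    return sum(ious) / len(ious), sum(f1s) / len(f1s)


# Illustrative 2-class confusion matrix (rows: ground truth, columns: prediction).
confusion = [[8, 2],
             [1, 9]]
miou, f1 = mean_iou_and_f1(confusion)
print(round(miou, 3), round(f1, 3))  # 0.739 0.85
```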
+When `model test` and `model predict` are run with management_dir set, rslearn
+automatically loads the best checkpoint from the project directory, or raises an error if
+no checkpoint exists. This behavior can be overridden with the
+`--load_checkpoint_mode` and `--load_checkpoint_required` options (see `--help` for
+details). Logging is enabled during fit but not during test or predict; this can also
+be overridden with `--log_mode`.
 ### Inputting Multiple Sentinel-2 Images
-Coming soon
+Currently our model inputs a single Sentinel-2 image. However, for most tasks where
+labels are not expected to change from week to week, we find that accuracy can be
+significantly improved by inputting multiple images, regardless of the pre-trained
+model used. Multiple images make the model more resilient to clouds and image
+artifacts, and allows the model to synthesize information across different views that
+may come from different seasons or weather conditions.
+We first update our dataset configuration to obtain three images by customizing the
+`query_config` section. This new layer can replace the sentinel2 layer:
+```jsonc
+"layers": {
+    "sentinel2_multi": {
+        "type": "raster",
+        "band_sets": [{
+            "dtype": "uint8",
+            "bands": ["R", "G", "B"]
+        }],
+        "data_source": {
+            "class_path": "rslearn.data_sources.gcp_public_data.Sentinel2",
+            "init_args": {
+                "index_cache_dir": "cache/sentinel2/",
+                "sort_by": "cloud_cover",
+                "use_rtree_index": false
+            },
+            "query_config": {
+                "max_matches": 3
+            }
+        }
+    },
+    "worldcover": {
+        // ...
+    },
+    "output": {
+        // ...
+    }
+}
+```
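Intuitively, `sort_by: cloud_cover` with `max_matches: 3` prefers the least cloudy scenes among the candidates for each window. The sketch below illustrates only that ordering and truncation on made-up scene records. The `Scene` class and `select_scenes` function are hypothetical. rslearn's real matching also accounts for spatial coverage, time ranges, and other query options that this ignores.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    """Hypothetical stand-in for a Sentinel-2 scene record."""
    scene_id: str
    cloud_cover: float  # percent of the scene covered by clouds


def select_scenes(scenes, max_matches):
    """Sort candidates by cloud cover (ascending) and keep up to max_matches of them."""
    return sorted(scenes, key=lambda s: s.cloud_cover)[:max_matches]


# Made-up candidates for one window.
candidates = [
    Scene("2024-06-03", 42.0),
    Scene("2024-06-13", 3.5),
    Scene("2024-06-23", 0.8),
    Scene("2024-07-03", 77.1),
    Scene("2024-07-13", 12.0),
]
chosen = select_scenes(candidates, max_matches=3)
print([s.scene_id for s in chosen])  # ['2024-06-23', '2024-06-13', '2024-07-13']
```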
+Repeat the steps from earlier to prepare, ingest, and materialize the dataset.
+Now we update our model configuration file. First, we modify the model architecture so
+that it accepts an image time series. We use the SimpleTimeSeries model, which wraps an
+encoder that expects a single-image input and applies that encoder to each image in
+the time series. It then applies max temporal pooling to combine the per-image feature
+maps extracted by the encoder.
-### Logging to Weights & Biases
+Image time series in rslearn are currently stored as [T*C, H, W] tensors, so we pass
+`image_channels` (the number of channels C per image, 3 for RGB) to SimpleTimeSeries so
+that it knows how to slice the tensor back into the per-timestep images.
-Coming soon
+```yaml
+model:
+  class_path: rslearn.train.lightning_module.RslearnLightningModule
+  init_args:
+    model:
+      class_path: rslearn.models.singletask.SingleTaskModel
+      init_args:
+        encoder:
+          - class_path: rslearn.models.simple_time_series.SimpleTimeSeries
+            init_args:
+              encoder:
+                class_path: rslearn.models.satlaspretrain.SatlasPretrain
+                init_args:
+                  model_identifier: "Sentinel2_SwinB_SI_RGB"
+              image_channels: 3
+        decoder:
+          # ...
+```
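The sketch below makes the [T*C, H, W] layout concrete. It slices a stacked input back into T images of C channels, runs a stand-in encoder on each image, and max-pools the resulting feature maps over time. It uses nested Python lists instead of tensors. `toy_encoder` is a hypothetical placeholder for the real single-image encoder (SatlasPretrain here), and the example only traces the data flow that SimpleTimeSeries is described as performing.

```python
def split_timesteps(stacked, image_channels):
    """Split a [T*C][H][W] nested list into T images of shape [C][H][W]."""
    if len(stacked) % image_channels != 0:
        raise ValueError("channel count is not a multiple of image_channels")
    return [stacked[t:t + image_channels] for t in range(0, len(stacked), image_channels)]


def toy_encoder(image):
    """Hypothetical encoder: a single feature map equal to the per-pixel channel mean."""
    num_channels = len(image)
    height, width = len(image[0]), len(image[0][0])
    return [[sum(image[c][y][x] for c in range(num_channels)) / num_channels
             for x in range(width)] for y in range(height)]


def max_temporal_pool(feature_maps):
    """Element-wise max across the per-timestep feature maps."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    return [[max(fm[y][x] for fm in feature_maps) for x in range(width)] for y in range(height)]


# Three RGB timesteps of a 1x2 image, stacked as 9 channels (T=3, C=3).
stacked = [
    [[0, 3]], [[0, 3]], [[0, 3]],  # timestep 0: R, G, B
    [[6, 0]], [[6, 0]], [[6, 0]],  # timestep 1
    [[3, 9]], [[3, 9]], [[3, 9]],  # timestep 2
]
timesteps = split_timesteps(stacked, image_channels=3)
pooled = max_temporal_pool([toy_encoder(img) for img in timesteps])
print(len(timesteps), pooled)  # 3 [[6.0, 9.0]]
```

Each output pixel keeps the strongest response seen at any timestep. This is why a cloudy image at one date does not necessarily suppress features that are visible in the other images.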
+Next, we update the data module section so that the dataset loads the image time series
+rather than a single image. The three images are listed as separate layers
+(`sentinel2_multi`, `sentinel2_multi.1`, `sentinel2_multi.2`), and the `load_all_layers`
+option tells the dataset to stack the rasters from all of those layers and to skip
+windows where any of them is missing.
+```yaml
+data:
+  class_path: rslearn.train.data_module.RslearnDataModule
+  init_args:
+    path: # ...
+    inputs:
+      image:
+        data_type: "raster"
+        layers: ["sentinel2_multi", "sentinel2_multi.1", "sentinel2_multi.2"]
+        bands: ["R", "G", "B"]
+        passthrough: true
+        load_all_layers: true
+      targets:
+        # ...
+```
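The `load_all_layers` behavior can be pictured as two steps. First, the per-layer rasters are concatenated along the channel axis in the order listed, which yields the [T*C, H, W] layout that SimpleTimeSeries expects. Second, a window is dropped if any listed layer is missing. This is an illustrative sketch with a hypothetical `stack_layers` helper and nested lists standing in for rasters, not the rslearn data loader.

```python
def stack_layers(window_layers, layer_names):
    """Hypothetical helper: concatenate [C][H][W] rasters in layer order, or None if any is missing."""
    if any(name not in window_layers for name in layer_names):
        return None  # the window would be skipped
    stacked = []
    for name in layer_names:
        stacked.extend(window_layers[name])  # append this layer's channels
    return stacked


layer_names = ["sentinel2_multi", "sentinel2_multi.1", "sentinel2_multi.2"]
rgb = [[[1]], [[2]], [[3]]]  # a 1x1 RGB raster with shape [3][1][1]
complete = {name: rgb for name in layer_names}
incomplete = {"sentinel2_multi": rgb, "sentinel2_multi.1": rgb}
print(len(stack_layers(complete, layer_names)))  # 9 channels = 3 layers x 3 bands
print(stack_layers(incomplete, layer_names))     # None
```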
+Now we can train an updated model:
+```
+rslearn model fit --config land_cover_model.yaml
+```
 Contact
