rslearn 0.0.6__tar.gz → 0.0.8__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (169) hide show
  1. {rslearn-0.0.6/rslearn.egg-info → rslearn-0.0.8}/PKG-INFO +144 -15
  2. {rslearn-0.0.6 → rslearn-0.0.8}/README.md +139 -11
  3. {rslearn-0.0.6 → rslearn-0.0.8}/pyproject.toml +9 -6
  4. rslearn-0.0.8/rslearn/dataset/handler_summaries.py +130 -0
  5. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/dataset/manage.py +157 -22
  6. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/main.py +60 -8
  7. rslearn-0.0.8/rslearn/models/anysat.py +207 -0
  8. rslearn-0.0.8/rslearn/models/clay/clay.py +219 -0
  9. rslearn-0.0.8/rslearn/models/clay/configs/metadata.yaml +295 -0
  10. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/copernicusfm.py +37 -25
  11. rslearn-0.0.8/rslearn/models/dinov3.py +165 -0
  12. rslearn-0.0.8/rslearn/models/galileo/__init__.py +5 -0
  13. rslearn-0.0.8/rslearn/models/galileo/galileo.py +517 -0
  14. rslearn-0.0.8/rslearn/models/galileo/single_file_galileo.py +1672 -0
  15. rslearn-0.0.8/rslearn/models/panopticon_data/sensors/drone.yaml +32 -0
  16. rslearn-0.0.8/rslearn/models/panopticon_data/sensors/enmap.yaml +904 -0
  17. rslearn-0.0.8/rslearn/models/panopticon_data/sensors/goes.yaml +9 -0
  18. rslearn-0.0.8/rslearn/models/panopticon_data/sensors/himawari.yaml +9 -0
  19. rslearn-0.0.8/rslearn/models/panopticon_data/sensors/intuition.yaml +606 -0
  20. rslearn-0.0.8/rslearn/models/panopticon_data/sensors/landsat8.yaml +84 -0
  21. rslearn-0.0.8/rslearn/models/panopticon_data/sensors/modis_terra.yaml +99 -0
  22. rslearn-0.0.8/rslearn/models/panopticon_data/sensors/qb2_ge1.yaml +34 -0
  23. rslearn-0.0.8/rslearn/models/panopticon_data/sensors/sentinel1.yaml +85 -0
  24. rslearn-0.0.8/rslearn/models/panopticon_data/sensors/sentinel2.yaml +97 -0
  25. rslearn-0.0.8/rslearn/models/panopticon_data/sensors/superdove.yaml +60 -0
  26. rslearn-0.0.8/rslearn/models/panopticon_data/sensors/wv23.yaml +63 -0
  27. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/presto/presto.py +10 -7
  28. rslearn-0.0.8/rslearn/models/prithvi.py +1122 -0
  29. rslearn-0.0.8/rslearn/models/resize_features.py +45 -0
  30. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/simple_time_series.py +65 -10
  31. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/unet.py +17 -11
  32. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/upsample.py +2 -2
  33. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/tile_stores/default.py +31 -6
  34. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/train/transforms/normalize.py +34 -5
  35. rslearn-0.0.8/rslearn/train/transforms/select_bands.py +67 -0
  36. rslearn-0.0.8/rslearn/train/transforms/sentinel1.py +60 -0
  37. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/utils/geometry.py +61 -1
  38. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/utils/raster_format.py +7 -1
  39. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/utils/vector_format.py +13 -10
  40. {rslearn-0.0.6 → rslearn-0.0.8/rslearn.egg-info}/PKG-INFO +144 -15
  41. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn.egg-info/SOURCES.txt +24 -0
  42. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn.egg-info/requires.txt +4 -3
  43. {rslearn-0.0.6 → rslearn-0.0.8}/LICENSE +0 -0
  44. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/__init__.py +0 -0
  45. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/arg_parser.py +0 -0
  46. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/config/__init__.py +0 -0
  47. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/config/dataset.py +0 -0
  48. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/const.py +0 -0
  49. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/data_sources/__init__.py +0 -0
  50. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/data_sources/aws_landsat.py +0 -0
  51. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/data_sources/aws_open_data.py +0 -0
  52. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/data_sources/aws_sentinel1.py +0 -0
  53. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/data_sources/climate_data_store.py +0 -0
  54. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/data_sources/copernicus.py +0 -0
  55. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/data_sources/data_source.py +0 -0
  56. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/data_sources/earthdaily.py +0 -0
  57. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/data_sources/earthdata_srtm.py +0 -0
  58. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/data_sources/eurocrops.py +0 -0
  59. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/data_sources/gcp_public_data.py +0 -0
  60. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/data_sources/geotiff.py +0 -0
  61. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/data_sources/google_earth_engine.py +0 -0
  62. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/data_sources/local_files.py +0 -0
  63. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/data_sources/openstreetmap.py +0 -0
  64. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/data_sources/planet.py +0 -0
  65. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/data_sources/planet_basemap.py +0 -0
  66. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/data_sources/planetary_computer.py +0 -0
  67. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/data_sources/raster_source.py +0 -0
  68. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/data_sources/usda_cdl.py +0 -0
  69. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/data_sources/usgs_landsat.py +0 -0
  70. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/data_sources/utils.py +0 -0
  71. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/data_sources/vector_source.py +0 -0
  72. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/data_sources/worldcereal.py +0 -0
  73. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/data_sources/worldcover.py +0 -0
  74. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/data_sources/worldpop.py +0 -0
  75. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/data_sources/xyz_tiles.py +0 -0
  76. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/dataset/__init__.py +0 -0
  77. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/dataset/add_windows.py +0 -0
  78. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/dataset/dataset.py +0 -0
  79. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/dataset/index.py +0 -0
  80. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/dataset/materialize.py +0 -0
  81. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/dataset/remap.py +0 -0
  82. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/dataset/window.py +0 -0
  83. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/log_utils.py +0 -0
  84. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/__init__.py +0 -0
  85. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/clip.py +0 -0
  86. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/conv.py +0 -0
  87. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/copernicusfm_src/__init__.py +0 -0
  88. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/copernicusfm_src/aurora/area.py +0 -0
  89. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/copernicusfm_src/aurora/fourier.py +0 -0
  90. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/copernicusfm_src/dynamic_hypernetwork.py +0 -0
  91. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/copernicusfm_src/flexivit/patch_embed.py +0 -0
  92. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/copernicusfm_src/flexivit/utils.py +0 -0
  93. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/copernicusfm_src/model_vit.py +0 -0
  94. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/copernicusfm_src/util/pos_embed.py +0 -0
  95. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/croma.py +0 -0
  96. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/detr/__init__.py +0 -0
  97. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/detr/box_ops.py +0 -0
  98. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/detr/detr.py +0 -0
  99. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/detr/matcher.py +0 -0
  100. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/detr/position_encoding.py +0 -0
  101. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/detr/transformer.py +0 -0
  102. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/detr/util.py +0 -0
  103. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/faster_rcnn.py +0 -0
  104. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/fpn.py +0 -0
  105. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/module_wrapper.py +0 -0
  106. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/molmo.py +0 -0
  107. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/multitask.py +0 -0
  108. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/panopticon.py +0 -0
  109. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/pick_features.py +0 -0
  110. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/pooling_decoder.py +0 -0
  111. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/presto/__init__.py +0 -0
  112. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/presto/single_file_presto.py +0 -0
  113. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/registry.py +0 -0
  114. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/sam2_enc.py +0 -0
  115. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/satlaspretrain.py +0 -0
  116. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/singletask.py +0 -0
  117. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/ssl4eo_s12.py +0 -0
  118. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/swin.py +0 -0
  119. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/task_embedding.py +0 -0
  120. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/terramind.py +0 -0
  121. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/trunk.py +0 -0
  122. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/models/use_croma.py +0 -0
  123. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/py.typed +0 -0
  124. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/template_params.py +0 -0
  125. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/tile_stores/__init__.py +0 -0
  126. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/tile_stores/tile_store.py +0 -0
  127. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/train/__init__.py +0 -0
  128. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/train/callbacks/__init__.py +0 -0
  129. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/train/callbacks/adapters.py +0 -0
  130. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/train/callbacks/freeze_unfreeze.py +0 -0
  131. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/train/callbacks/gradients.py +0 -0
  132. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/train/callbacks/peft.py +0 -0
  133. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/train/data_module.py +0 -0
  134. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/train/dataset.py +0 -0
  135. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/train/lightning_module.py +0 -0
  136. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/train/optimizer.py +0 -0
  137. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/train/prediction_writer.py +0 -0
  138. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/train/scheduler.py +0 -0
  139. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/train/tasks/__init__.py +0 -0
  140. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/train/tasks/classification.py +0 -0
  141. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/train/tasks/detection.py +0 -0
  142. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/train/tasks/multi_task.py +0 -0
  143. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/train/tasks/per_pixel_regression.py +0 -0
  144. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/train/tasks/regression.py +0 -0
  145. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/train/tasks/segmentation.py +0 -0
  146. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/train/tasks/task.py +0 -0
  147. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/train/transforms/__init__.py +0 -0
  148. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/train/transforms/concatenate.py +0 -0
  149. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/train/transforms/crop.py +0 -0
  150. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/train/transforms/flip.py +0 -0
  151. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/train/transforms/mask.py +0 -0
  152. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/train/transforms/pad.py +0 -0
  153. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/train/transforms/transform.py +0 -0
  154. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/utils/__init__.py +0 -0
  155. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/utils/array.py +0 -0
  156. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/utils/feature.py +0 -0
  157. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/utils/fsspec.py +0 -0
  158. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/utils/get_utm_ups_crs.py +0 -0
  159. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/utils/grid_index.py +0 -0
  160. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/utils/jsonargparse.py +0 -0
  161. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/utils/mp.py +0 -0
  162. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/utils/rtree_index.py +0 -0
  163. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/utils/spatial_index.py +0 -0
  164. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/utils/sqlite_index.py +0 -0
  165. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn/utils/time.py +0 -0
  166. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn.egg-info/dependency_links.txt +0 -0
  167. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn.egg-info/entry_points.txt +0 -0
  168. {rslearn-0.0.6 → rslearn-0.0.8}/rslearn.egg-info/top_level.txt +0 -0
  169. {rslearn-0.0.6 → rslearn-0.0.8}/setup.cfg +0 -0
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: rslearn
3
- Version: 0.0.6
3
+ Version: 0.0.8
4
4
  Summary: A library for developing remote sensing datasets and models
5
5
  Author: OlmoEarth Team
6
6
  License: Apache License
@@ -214,7 +214,7 @@ License-File: LICENSE
214
214
  Requires-Dist: boto3>=1.39
215
215
  Requires-Dist: class_registry>=2.1
216
216
  Requires-Dist: fiona>=1.10
217
- Requires-Dist: fsspec==2025.3.0
217
+ Requires-Dist: fsspec>=2025.9.0
218
218
  Requires-Dist: jsonargparse>=4.35.0
219
219
  Requires-Dist: lightning>=2.5.1.post0
220
220
  Requires-Dist: Pillow>=11.3
@@ -233,9 +233,10 @@ Requires-Dist: cdsapi>=0.7.6; extra == "extra"
233
233
  Requires-Dist: earthdaily[platform]>=1.0.7; extra == "extra"
234
234
  Requires-Dist: earthengine-api>=1.6.3; extra == "extra"
235
235
  Requires-Dist: einops>=0.8; extra == "extra"
236
- Requires-Dist: gcsfs==2025.3.0; extra == "extra"
236
+ Requires-Dist: fsspec[gcs,s3]; extra == "extra"
237
237
  Requires-Dist: google-cloud-bigquery>=3.35; extra == "extra"
238
238
  Requires-Dist: google-cloud-storage>=2.18; extra == "extra"
239
+ Requires-Dist: huggingface_hub>=0.34.4; extra == "extra"
239
240
  Requires-Dist: netCDF4>=1.7.2; extra == "extra"
240
241
  Requires-Dist: osmium>=4.0.2; extra == "extra"
241
242
  Requires-Dist: planet>=3.1; extra == "extra"
@@ -243,12 +244,12 @@ Requires-Dist: planetary_computer>=1.0; extra == "extra"
243
244
  Requires-Dist: pycocotools>=2.0; extra == "extra"
244
245
  Requires-Dist: pystac_client>=0.9; extra == "extra"
245
246
  Requires-Dist: rtree>=1.4; extra == "extra"
246
- Requires-Dist: s3fs==2025.3.0; extra == "extra"
247
247
  Requires-Dist: satlaspretrain_models>=0.3; extra == "extra"
248
248
  Requires-Dist: scipy>=1.16; extra == "extra"
249
249
  Requires-Dist: terratorch>=1.0.2; extra == "extra"
250
250
  Requires-Dist: transformers>=4.55; extra == "extra"
251
251
  Requires-Dist: wandb>=0.21; extra == "extra"
252
+ Requires-Dist: timm>=0.9.7; extra == "extra"
252
253
  Provides-Extra: dev
253
254
  Requires-Dist: interrogate>=1.7.0; extra == "dev"
254
255
  Requires-Dist: mypy<2,>=1.17.1; extra == "dev"
@@ -437,10 +438,10 @@ that they align with the windows we have previously defined (and the Sentinel-2
437
438
  we have already ingested). We can use the LocalFiles data source to have rslearn
438
439
  automate this process. Update the dataset `config.json` with a new layer:
439
440
 
440
- ```json
441
+ ```jsonc
441
442
  "layers": {
442
443
  "sentinel2": {
443
- ...
444
+ # ...
444
445
  },
445
446
  "worldcover": {
446
447
  "type": "raster",
@@ -455,7 +456,7 @@ automate this process. Update the dataset `config.json` with a new layer:
455
456
  }
456
457
  }
457
458
  },
458
- ...
459
+ # ...
459
460
  ```
460
461
 
461
462
  Repeat the materialize process so we populate the data for this new layer:
@@ -577,6 +578,7 @@ trainer:
577
578
  save_last: true
578
579
  monitor: val_accuracy
579
580
  mode: max
581
+ dirpath: ./land_cover_model_checkpoints/
580
582
  ```
581
583
 
582
584
  Now we can train the model:
@@ -621,13 +623,13 @@ windows in the "predict" group, which is where we added the Portland window.
621
623
  And it will be written in a new output_layer called "output". But we have to update the
622
624
  dataset configuration so it specifies the layer:
623
625
 
624
- ```json
626
+ ```jsonc
625
627
  "layers": {
626
628
  "sentinel2": {
627
- ...
629
+ # ...
628
630
  },
629
631
  "worldcover": {
630
- ...
632
+ # ...
631
633
  },
632
634
  "output": {
633
635
  "type": "raster",
@@ -644,7 +646,7 @@ Now we can apply the model:
644
646
  ```
645
647
  # Find model checkpoint in lightning_logs dir.
646
648
  ls lightning_logs/*/checkpoints/last.ckpt
647
- rslearn model predict --config land_cover_model.yaml --ckpt_path lightning_logs/version_0/checkpoints/last.ckpt
649
+ rslearn model predict --config land_cover_model.yaml --ckpt_path land_cover_model_checkpoints/last.ckpt
648
650
  ```
649
651
 
650
652
  And visualize the Sentinel-2 image and output in qgis:
@@ -751,17 +753,144 @@ got 585 examples in split val
751
753
 
752
754
  ### Visualizing with `model test`
753
755
 
754
- Coming soon
756
+ We can visualize the ground truth labels and model predictions in the test set using
757
+ the `model test` command:
758
+
759
+ ```
760
+ mkdir ./vis
761
+ rslearn model test --config land_cover_model.yaml --ckpt_path land_cover_model_checkpoints/last.ckpt --model.init_args.visualize_dir=./vis/
762
+ ```
763
+
764
+ This will produce PNGs in the vis directory. The visualizations are produced by the
765
+ `Task.visualize` function, so we could customize the visualization by subclassing
766
+ SegmentationTask and overriding the visualize function.
767
+
768
+
769
+ ### Logging to Weights & Biases
770
+
771
+ We can log to W&B by setting the logger under trainer in the model configuration file:
772
+
773
+ ```yaml
774
+ trainer:
775
+ # ...
776
+ logger:
777
+ class_path: lightning.pytorch.loggers.WandbLogger
778
+ init_args:
779
+ project: land_cover_model
780
+ name: version_00
781
+ ```
782
+
783
+ Now, runs with this model configuration should show on W&B. For `model fit` runs,
784
+ the training and validation loss and accuracy metric will be logged. The accuracy
785
+ metric is provided by SegmentationTask, and additional metrics can be enabled by
786
+ passing the relevant init_args to the task, e.g. mean IoU and F1:
787
+
788
+ ```yaml
789
+ class_path: rslearn.train.tasks.segmentation.SegmentationTask
790
+ init_args:
791
+ num_classes: 101
792
+ remap_values: [[0, 1], [0, 255]]
793
+ enable_miou_metric: true
794
+ enable_f1_metric: true
795
+ ```
755
796
 
756
797
 
757
798
  ### Inputting Multiple Sentinel-2 Images
758
799
 
759
- Coming soon
800
+ Currently our model inputs a single Sentinel-2 image. However, for most tasks where
801
+ labels are not expected to change from week to week, we find that accuracy can be
802
+ significantly improved by inputting multiple images, regardless of the pre-trained
803
+ model used. Multiple images make the model more resilient to clouds and image
804
+ artifacts, and allows the model to synthesize information across different views that
805
+ may come from different seasons or weather conditions.
760
806
 
807
+ We first update our dataset configuration to obtain three images, by customizing the
808
+ query_config section. This can replace the sentinel2 layer:
761
809
 
762
- ### Logging to Weights & Biases
810
+ ```jsonc
811
+ "layers": {
812
+ "sentinel2_multi": {
813
+ "type": "raster",
814
+ "band_sets": [{
815
+ "dtype": "uint8",
816
+ "bands": ["R", "G", "B"]
817
+ }],
818
+ "data_source": {
819
+ "name": "rslearn.data_sources.gcp_public_data.Sentinel2",
820
+ "index_cache_dir": "cache/sentinel2/",
821
+ "sort_by": "cloud_cover",
822
+ "use_rtree_index": false,
823
+ "query_config": {
824
+ "max_matches": 3
825
+ }
826
+ }
827
+ },
828
+ "worldcover": {
829
+ # ...
830
+ },
831
+ "output": {
832
+ # ...
833
+ }
834
+ }
835
+ ```
836
+
837
+ Repeat the steps from earlier to prepare, ingest, and materialize the dataset.
838
+
839
+ Now we update our model configuration file. First, we modify the model architecture to
840
+ be able to input an image time series. We use the SimpleTimeSeries model, which takes
841
+ an encoder that expects a single-image input, and applies that encoder on each image in
842
+ the time series. It then applies max temporal pooling to combine the per-image feature
843
+ maps extracted by the encoder.
844
+
845
+ Image time series in rslearn are currently stored as [T*C, H, W] tensors. So we pass
846
+ the `image_channels` to SimpleTimeSeries so it knows how to slice up the tensor to
847
+ recover the per-timestep images.
848
+
849
+ ```yaml
850
+ model:
851
+ class_path: rslearn.train.lightning_module.RslearnLightningModule
852
+ init_args:
853
+ model:
854
+ class_path: rslearn.models.singletask.SingleTaskModel
855
+ init_args:
856
+ encoder:
857
+ - class_path: rslearn.models.simple_time_series.SimpleTimeSeries
858
+ init_args:
859
+ encoder:
860
+ class_path: rslearn.models.satlaspretrain.SatlasPretrain
861
+ init_args:
862
+ model_identifier: "Sentinel2_SwinB_SI_RGB"
863
+ image_channels: 3
864
+ decoder:
865
+ # ...
866
+ ```
763
867
 
764
- Coming soon
868
+ Next, we update the data module section so that the dataset loads the image time series
869
+ rather than a single image. The `load_all_layers` option tells the dataset to stack the
870
+ rasters from all of the layers specified, and also to ignore windows where any of those
871
+ layers are missing.
872
+
873
+ ```yaml
874
+ data:
875
+ class_path: rslearn.train.data_module.RslearnDataModule
876
+ init_args:
877
+ path: # ...
878
+ inputs:
879
+ image:
880
+ data_type: "raster"
881
+ layers: ["sentinel2_multi", "sentinel2_multi.1", "sentinel2_multi.2"]
882
+ bands: ["R", "G", "B"]
883
+ passthrough: true
884
+ load_all_layers: true
885
+ targets:
886
+ # ...
887
+ ```
888
+
889
+ Now we can train an updated model:
890
+
891
+ ```
892
+ rslearn model fit --config land_cover_model.yaml
893
+ ```
765
894
 
766
895
 
767
896
  Contact
@@ -175,10 +175,10 @@ that they align with the windows we have previously defined (and the Sentinel-2
175
175
  we have already ingested). We can use the LocalFiles data source to have rslearn
176
176
  automate this process. Update the dataset `config.json` with a new layer:
177
177
 
178
- ```json
178
+ ```jsonc
179
179
  "layers": {
180
180
  "sentinel2": {
181
- ...
181
+ # ...
182
182
  },
183
183
  "worldcover": {
184
184
  "type": "raster",
@@ -193,7 +193,7 @@ automate this process. Update the dataset `config.json` with a new layer:
193
193
  }
194
194
  }
195
195
  },
196
- ...
196
+ # ...
197
197
  ```
198
198
 
199
199
  Repeat the materialize process so we populate the data for this new layer:
@@ -315,6 +315,7 @@ trainer:
315
315
  save_last: true
316
316
  monitor: val_accuracy
317
317
  mode: max
318
+ dirpath: ./land_cover_model_checkpoints/
318
319
  ```
319
320
 
320
321
  Now we can train the model:
@@ -359,13 +360,13 @@ windows in the "predict" group, which is where we added the Portland window.
359
360
  And it will be written in a new output_layer called "output". But we have to update the
360
361
  dataset configuration so it specifies the layer:
361
362
 
362
- ```json
363
+ ```jsonc
363
364
  "layers": {
364
365
  "sentinel2": {
365
- ...
366
+ # ...
366
367
  },
367
368
  "worldcover": {
368
- ...
369
+ # ...
369
370
  },
370
371
  "output": {
371
372
  "type": "raster",
@@ -382,7 +383,7 @@ Now we can apply the model:
382
383
  ```
383
384
  # Find model checkpoint in lightning_logs dir.
384
385
  ls lightning_logs/*/checkpoints/last.ckpt
385
- rslearn model predict --config land_cover_model.yaml --ckpt_path lightning_logs/version_0/checkpoints/last.ckpt
386
+ rslearn model predict --config land_cover_model.yaml --ckpt_path land_cover_model_checkpoints/last.ckpt
386
387
  ```
387
388
 
388
389
  And visualize the Sentinel-2 image and output in qgis:
@@ -489,17 +490,144 @@ got 585 examples in split val
489
490
 
490
491
  ### Visualizing with `model test`
491
492
 
492
- Coming soon
493
+ We can visualize the ground truth labels and model predictions in the test set using
494
+ the `model test` command:
495
+
496
+ ```
497
+ mkdir ./vis
498
+ rslearn model test --config land_cover_model.yaml --ckpt_path land_cover_model_checkpoints/last.ckpt --model.init_args.visualize_dir=./vis/
499
+ ```
500
+
501
+ This will produce PNGs in the vis directory. The visualizations are produced by the
502
+ `Task.visualize` function, so we could customize the visualization by subclassing
503
+ SegmentationTask and overriding the visualize function.
504
+
505
+
506
+ ### Logging to Weights & Biases
507
+
508
+ We can log to W&B by setting the logger under trainer in the model configuration file:
509
+
510
+ ```yaml
511
+ trainer:
512
+ # ...
513
+ logger:
514
+ class_path: lightning.pytorch.loggers.WandbLogger
515
+ init_args:
516
+ project: land_cover_model
517
+ name: version_00
518
+ ```
519
+
520
+ Now, runs with this model configuration should show on W&B. For `model fit` runs,
521
+ the training and validation loss and accuracy metric will be logged. The accuracy
522
+ metric is provided by SegmentationTask, and additional metrics can be enabled by
523
+ passing the relevant init_args to the task, e.g. mean IoU and F1:
524
+
525
+ ```yaml
526
+ class_path: rslearn.train.tasks.segmentation.SegmentationTask
527
+ init_args:
528
+ num_classes: 101
529
+ remap_values: [[0, 1], [0, 255]]
530
+ enable_miou_metric: true
531
+ enable_f1_metric: true
532
+ ```
493
533
 
494
534
 
495
535
  ### Inputting Multiple Sentinel-2 Images
496
536
 
497
- Coming soon
537
+ Currently our model inputs a single Sentinel-2 image. However, for most tasks where
538
+ labels are not expected to change from week to week, we find that accuracy can be
539
+ significantly improved by inputting multiple images, regardless of the pre-trained
540
+ model used. Multiple images makes the model more resilient to clouds and image
541
+ artifacts, and allows the model to synthesize information across different views that
542
+ may come from different seasons or weather conditions.
498
543
 
544
+ We first update our dataset configuration to obtain three images, by customizing the
545
+ query_config section. This can replace the sentinel2 layer:
499
546
 
500
- ### Logging to Weights & Biases
547
+ ```jsonc
548
+ "layers": {
549
+ "sentinel2_multi": {
550
+ "type": "raster",
551
+ "band_sets": [{
552
+ "dtype": "uint8",
553
+ "bands": ["R", "G", "B"]
554
+ }],
555
+ "data_source": {
556
+ "name": "rslearn.data_sources.gcp_public_data.Sentinel2",
557
+ "index_cache_dir": "cache/sentinel2/",
558
+ "sort_by": "cloud_cover",
559
+ "use_rtree_index": false,
560
+ "query_config": {
561
+ "max_matches": 3
562
+ }
563
+ }
564
+ },
565
+ "worldcover": {
566
+ # ...
567
+ },
568
+ "output": {
569
+ # ...
570
+ }
571
+ }
572
+ ```
573
+
574
+ Repeat the steps from earlier to prepare, ingest, and materialize the dataset.
575
+
576
+ Now we update our model configuration file. First, we modify the model architecture to
577
+ be able to input an image time series. We use the SimpleTimeSeries model, which takes
578
+ an encoder that expects a single-image input, and applies that encoder on each image in
579
+ the time series. It then applies max temporal pooling to combine the per-image feature
580
+ maps extracted by the encoder.
581
+
582
+ Image time series in rslearn are currently stored as [T*C, H, W] tensors. So we pass
583
+ the `image_channels` to SimpleTimeSeries so it knows how to slice up the tensor to
584
+ recover the per-timestep images.
585
+
586
+ ```yaml
587
+ model:
588
+ class_path: rslearn.train.lightning_module.RslearnLightningModule
589
+ init_args:
590
+ model:
591
+ class_path: rslearn.models.singletask.SingleTaskModel
592
+ init_args:
593
+ encoder:
594
+ - class_path: rslearn.models.simple_time_series.SimpleTimeSeries
595
+ init_args:
596
+ encoder:
597
+ class_path: rslearn.models.satlaspretrain.SatlasPretrain
598
+ init_args:
599
+ model_identifier: "Sentinel2_SwinB_SI_RGB"
600
+ image_channels: 3
601
+ decoder:
602
+ # ...
603
+ ```
501
604
 
502
- Coming soon
605
+ Next, we update the data module section so that the dataset loads the image time series
606
+ rather than a single image. The `load_all_layers` option tells the dataset to stack the
607
+ rasters from all of the layers specified, and also to ignore windows where any of those
608
+ layers are missing.
609
+
610
+ ```yaml
611
+ data:
612
+ class_path: rslearn.train.data_module.RslearnDataModule
613
+ init_args:
614
+ path: # ...
615
+ inputs:
616
+ image:
617
+ data_type: "raster"
618
+ layers: ["sentinel2_multi", "sentinel2_multi.1", "sentinel2_multi.2"]
619
+ bands: ["R", "G", "B"]
620
+ passthrough: true
621
+ load_all_layers: true
622
+ targets:
623
+ # ...
624
+ ```
625
+
626
+ Now we can train an updated model:
627
+
628
+ ```
629
+ rslearn model fit --config land_cover_model.yaml
630
+ ```
503
631
 
504
632
 
505
633
  Contact
@@ -1,6 +1,6 @@
1
1
  [project]
2
2
  name = "rslearn"
3
- version = "0.0.6"
3
+ version = "0.0.8"
4
4
  description = "A library for developing remote sensing datasets and models"
5
5
  authors = [
6
6
  { name = "OlmoEarth Team" },
@@ -12,9 +12,7 @@ dependencies = [
12
12
  "boto3>=1.39",
13
13
  "class_registry>=2.1",
14
14
  "fiona>=1.10",
15
- # Need this pin since 2025.7.0 has slow performance for exists/ls operations, see
16
- # this issue: https://github.com/fsspec/gcsfs/issues/696
17
- "fsspec==2025.3.0",
15
+ "fsspec>=2025.9.0", # this is used both directly and indirectly (via universal_pathlib) in our code
18
16
  "jsonargparse>=4.35.0",
19
17
  "lightning>=2.5.1.post0",
20
18
  "Pillow>=11.3",
@@ -37,9 +35,12 @@ extra = [
37
35
  "earthdaily[platform]>=1.0.7",
38
36
  "earthengine-api>=1.6.3",
39
37
  "einops>=0.8",
40
- "gcsfs==2025.3.0",
38
+ # https://github.com/fsspec/universal_pathlib?tab=readme-ov-file#adding-universal_pathlib-to-your-project
39
+ # https://github.com/fsspec/filesystem_spec?tab=readme-ov-file#install
40
+ "fsspec[gcs, s3]", # for both direct use via fsspec and indirect use via universal_pathlib, docs suggest enabling specific backends like this
41
41
  "google-cloud-bigquery>=3.35",
42
42
  "google-cloud-storage>=2.18",
43
+ "huggingface_hub>=0.34.4",
43
44
  "netCDF4>=1.7.2",
44
45
  "osmium>=4.0.2",
45
46
  "planet>=3.1",
@@ -47,12 +48,12 @@ extra = [
47
48
  "pycocotools>=2.0",
48
49
  "pystac_client>=0.9",
49
50
  "rtree>=1.4",
50
- "s3fs==2025.3.0",
51
51
  "satlaspretrain_models>=0.3",
52
52
  "scipy>=1.16",
53
53
  "terratorch>=1.0.2",
54
54
  "transformers>=4.55",
55
55
  "wandb>=0.21",
56
+ "timm>=0.9.7",
56
57
  ]
57
58
 
58
59
  dev = [
@@ -83,6 +84,8 @@ include = ["rslearn*"]
83
84
 
84
85
  [tool.setuptools.package-data]
85
86
  rslearn = ["py.typed"]
87
+ "rslearn.models.clay.configs" = ["metadata.yaml"]
88
+ "rslearn.models.panopticon_data.sensors" = ["*.yaml"]
86
89
 
87
90
  [tool.ruff]
88
91
  fix = true
@@ -0,0 +1,130 @@
1
+ """This module contains dataclasses for summarizing the results of dataset operations.
2
+
3
+ They can be used by callers to emit telemetry / logs, or discarded.
4
+ """
5
+
6
+ from dataclasses import dataclass
7
+
8
+
9
+ @dataclass
10
+ class LayerPrepareSummary:
11
+ """Results for preparing a single layer."""
12
+
13
+ # Identity
14
+ layer_name: str
15
+ data_source_name: str
16
+
17
+ # Timing
18
+ duration_seconds: float
19
+
20
+ # Counts
21
+ windows_prepared: int
22
+ windows_skipped: int
23
+ get_items_attempts: int
24
+
25
+
26
+ @dataclass
27
+ class PrepareDatasetWindowsSummary:
28
+ """Results from prepare_dataset_windows operation for telemetry purposes."""
29
+
30
+ # Timing
31
+ duration_seconds: float
32
+
33
+ # Counts
34
+ total_windows_requested: int
35
+
36
+ # Per-layer summaries
37
+ layer_summaries: list[LayerPrepareSummary]
38
+
39
+
40
+ @dataclass
41
+ class IngestCounts:
42
+ """Known ingestion counts."""
43
+
44
+ items_ingested: int
45
+ geometries_ingested: int
46
+
47
+
48
+ @dataclass
49
+ class UnknownIngestCounts:
50
+ """Indicates ingestion counts are unknown due to partial failure."""
51
+
52
+ items_attempted: int
53
+ geometries_attempted: int
54
+
55
+
56
+ @dataclass
57
+ class LayerIngestSummary:
58
+ """Results for ingesting a single layer."""
59
+
60
+ # Identity
61
+ layer_name: str
62
+ data_source_name: str
63
+
64
+ # Timing
65
+ duration_seconds: float
66
+
67
+ # Counts - either known or unknown
68
+ ingest_counts: IngestCounts | UnknownIngestCounts
69
+ ingest_attempts: int
70
+
71
+
72
+ @dataclass
73
+ class IngestDatasetJobsSummary:
74
+ """Results from ingesting a set of jobs; for telemetry purposes."""
75
+
76
+ # Timing
77
+ duration_seconds: float
78
+
79
+ # Counts
80
+ num_jobs: int
81
+
82
+ # Per-layer summaries
83
+ layer_summaries: list[LayerIngestSummary]
84
+
85
+
86
+ @dataclass
87
+ class MaterializeWindowLayerSummary:
88
+ """Results for materializing a single window layer."""
89
+
90
+ skipped: bool
91
+ materialize_attempts: int
92
+
93
+
94
+ @dataclass
95
+ class MaterializeWindowLayersSummary:
96
+ """Results for materializing a given layer for all windows in a materialize call."""
97
+
98
+ # Identity
99
+ layer_name: str
100
+ data_source_name: str
101
+
102
+ # Timing
103
+ duration_seconds: float
104
+
105
+ # Counts
106
+ total_windows_requested: int
107
+ num_windows_materialized: int
108
+ materialize_attempts: int
109
+
110
+
111
+ @dataclass
112
+ class MaterializeDatasetWindowsSummary:
113
+ """Results from materialize_dataset_windows operation for telemetry purposes."""
114
+
115
+ # Timing
116
+ duration_seconds: float
117
+
118
+ # Counts
119
+ total_windows_requested: int
120
+
121
+ # Per-layer summaries
122
+ layer_summaries: list[MaterializeWindowLayersSummary]
123
+
124
+
125
+ @dataclass
126
+ class ErrorOutcome:
127
+ """TBD what goes in here, if anything."""
128
+
129
+ # Timing
130
+ duration_seconds: float