PyPI - stable-baselines3 - Versions diffs - 2.3.2__tar.gz → 2.4.0__tar.gz - Mend

stable-baselines3 2.3.2__tar.gz → 2.4.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (109)

{stable_baselines3-2.3.2/stable_baselines3.egg-info → stable_baselines3-2.4.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: stable_baselines3
-Version: 2.3.2
+Version: 2.4.0
 Summary: Pytorch version of Stable Baselines, implementations of reinforcement learning algorithms.
 Home-page: https://github.com/DLR-RM/stable-baselines3
 Author: Antonin Raffin
@@ -22,8 +22,8 @@ Requires-Python: >=3.8
 Description-Content-Type: text/markdown
 License-File: LICENSE
 License-File: NOTICE
-Requires-Dist: gymnasium<0.30,>=0.28.1
-Requires-Dist: numpy>=1.20
+Requires-Dist: gymnasium<1.1.0,>=0.29.1
+Requires-Dist: numpy<2.0,>=1.20
 Requires-Dist: torch>=1.13
 Requires-Dist: cloudpickle
 Requires-Dist: pandas
@@ -37,7 +37,7 @@ Requires-Dist: mypy; extra == "tests"
 Requires-Dist: ruff>=0.3.1; extra == "tests"
 Requires-Dist: black<25,>=24.2.0; extra == "tests"
 Provides-Extra: docs
-Requires-Dist: sphinx<8,>=5; extra == "docs"
+Requires-Dist: sphinx<9,>=5; extra == "docs"
 Requires-Dist: sphinx-autobuild; extra == "docs"
 Requires-Dist: sphinx-rtd-theme>=1.3.0; extra == "docs"
 Requires-Dist: sphinxcontrib.spelling; extra == "docs"
@@ -49,18 +49,8 @@ Requires-Dist: tensorboard>=2.9.1; extra == "extra"
 Requires-Dist: psutil; extra == "extra"
 Requires-Dist: tqdm; extra == "extra"
 Requires-Dist: rich; extra == "extra"
-Requires-Dist: shimmy[atari]~=1.3.0; extra == "extra"
+Requires-Dist: ale-py>=0.9.0; extra == "extra"
 Requires-Dist: pillow; extra == "extra"
-Requires-Dist: autorom[accept-rom-license]~=0.6.1; extra == "extra"
-Provides-Extra: extra-no-roms
-Requires-Dist: opencv-python; extra == "extra-no-roms"
-Requires-Dist: pygame; extra == "extra-no-roms"
-Requires-Dist: tensorboard>=2.9.1; extra == "extra-no-roms"
-Requires-Dist: psutil; extra == "extra-no-roms"
-Requires-Dist: tqdm; extra == "extra-no-roms"
-Requires-Dist: rich; extra == "extra-no-roms"
-Requires-Dist: shimmy[atari]~=1.3.0; extra == "extra-no-roms"
-Requires-Dist: pillow; extra == "extra-no-roms"
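
The tightened `gymnasium` and `numpy` bounds above can be checked against an installed version with a few lines of Python. This is only a minimal sketch under stated assumptions: the helper names (`parse_version`, `in_range`, `satisfies_sb3_2_4_0`) are hypothetical, the bounds are copied from the 2.4.0 metadata, and only plain dotted release numbers are handled (no pre-release or local version tags).

```python
from typing import Dict, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a plain dotted release string such as "1.26.4" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def in_range(version: str, lower: str, upper: str) -> bool:
    """Return True if lower <= version < upper (inclusive lower, exclusive upper)."""
    parsed = [parse_version(v) for v in (version, lower, upper)]
    # Pad with zeros to a common length so that "2.0" and "2.0.0" compare equal.
    width = max(len(p) for p in parsed)
    v, lo, hi = (p + (0,) * (width - len(p)) for p in parsed)
    return lo <= v < hi


# Bounds copied from the 2.4.0 metadata shown in the diff (hypothetical table name).
SB3_2_4_0_BOUNDS: Dict[str, Tuple[str, str]] = {
    "gymnasium": ("0.29.1", "1.1.0"),
    "numpy": ("1.20", "2.0"),
}


def satisfies_sb3_2_4_0(package: str, version: str) -> bool:
    """Check a version string against the 2.4.0 bounds for one package."""
    lower, upper = SB3_2_4_0_BOUNDS[package]
    return in_range(version, lower, upper)


print(satisfies_sb3_2_4_0("numpy", "1.26.4"))
print(satisfies_sb3_2_4_0("numpy", "2.0.0"))
print(satisfies_sb3_2_4_0("gymnasium", "1.0.0"))
```

For anything beyond a quick check, a full specifier parser such as the one in the `packaging` library is the safer choice, since it also understands pre-releases and operators like `~=`.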

{stable_baselines3-2.3.2 → stable_baselines3-2.4.0}/README.md RENAMED

@@ -1,13 +1,13 @@
-<img src="docs/\_static/img/logo.png" align="right" width="40%"/>
 <!-- [![pipeline status](https://gitlab.com/araffin/stable-baselines3/badges/master/pipeline.svg)](https://gitlab.com/araffin/stable-baselines3/-/commits/master) -->
-![CI](https://github.com/DLR-RM/stable-baselines3/workflows/CI/badge.svg)
-[![Documentation Status](https://readthedocs.org/projects/stable-baselines/badge/?version=master)](https://stable-baselines3.readthedocs.io/en/master/?badge=master) [![coverage report](https://gitlab.com/araffin/stable-baselines3/badges/master/coverage.svg)](https://gitlab.com/araffin/stable-baselines3/-/commits/master)
+[![CI](https://github.com/DLR-RM/stable-baselines3/workflows/CI/badge.svg)](https://github.com/DLR-RM/stable-baselines3/actions/workflows/ci.yml)
+[![Documentation Status](https://readthedocs.org/projects/stable-baselines/badge/?version=master)](https://stable-baselines3.readthedocs.io/en/master/?badge=master) [![coverage report](https://gitlab.com/araffin/stable-baselines3/badges/master/coverage.svg)](https://github.com/DLR-RM/stable-baselines3/actions/workflows/ci.yml)
 [![codestyle](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
 # Stable Baselines3
+<img src="docs/\_static/img/logo.png" align="right" width="40%"/>
 Stable Baselines3 (SB3) is a set of reliable implementations of reinforcement learning algorithms in PyTorch. It is the next major version of [Stable Baselines](https://github.com/hill-a/stable-baselines).
 You can read a detailed presentation of Stable Baselines3 in the [v1.0 blog post](https://araffin.github.io/post/sb3/) or our [JMLR paper](https://jmlr.org/papers/volume22/20-1364/20-1364.pdf).
@@ -22,6 +22,8 @@ These algorithms will make it easier for the research community and industry to
 **The performance of each algorithm was tested** (see *Results* section in their respective page),
 you can take a look at the issues [#48](https://github.com/DLR-RM/stable-baselines3/issues/48) and [#49](https://github.com/DLR-RM/stable-baselines3/issues/49) for more details.
+We also provide detailed logs and reports on the [OpenRL Benchmark](https://wandb.ai/openrlbenchmark/sb3) platform.
 | **Features**                | **Stable-Baselines3** |
 | --------------------------- | ----------------------|
@@ -41,7 +43,13 @@ you can take a look at the issues [#48](https://github.com/DLR-RM/stable-baselin
 ### Planned features
-Please take a look at the [Roadmap](https://github.com/DLR-RM/stable-baselines3/issues/1) and [Milestones](https://github.com/DLR-RM/stable-baselines3/milestones).
+Since most of the features from the [original roadmap](https://github.com/DLR-RM/stable-baselines3/issues/1) have been implemented, there are no major changes planned for SB3, it is now *stable*.
+If you want to contribute, you can search in the issues for the ones where [help is welcomed](https://github.com/DLR-RM/stable-baselines3/labels/help%20wanted) and the other [proposed enhancements](https://github.com/DLR-RM/stable-baselines3/labels/enhancement).
+While SB3 development is now focused on bug fixes and maintenance (doc update, user experience, ...), there is more active development going on in the associated repositories:
+- newer algorithms are regularly added to the [SB3 Contrib](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib) repository
+- faster variants are developed in the [SBX (SB3 + Jax)](https://github.com/araffin/sbx) repository
+- the training framework for SB3, the RL Zoo, has an active [roadmap](https://github.com/DLR-RM/rl-baselines3-zoo/issues/299)
 ## Migration guide: from Stable-Baselines (SB2) to Stable-Baselines3 (SB3)
@@ -79,7 +87,7 @@ Documentation: https://rl-baselines3-zoo.readthedocs.io/en/master/
 We implement experimental features in a separate contrib repository: [SB3-Contrib](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib)
-This allows SB3 to maintain a stable and compact core, while still providing the latest features, like Recurrent PPO (PPO LSTM), Truncated Quantile Critics (TQC), Quantile Regression DQN (QR-DQN) or PPO with invalid action masking (Maskable PPO).
+This allows SB3 to maintain a stable and compact core, while still providing the latest features, like Recurrent PPO (PPO LSTM), CrossQ, Truncated Quantile Critics (TQC), Quantile Regression DQN (QR-DQN) or PPO with invalid action masking (Maskable PPO).
 Documentation is available online: [https://sb3-contrib.readthedocs.io/](https://sb3-contrib.readthedocs.io/)
@@ -97,17 +105,16 @@ It provides a minimal number of features compared to SB3 but can be much faster
 ### Prerequisites
 Stable Baselines3 requires Python 3.8+.
-#### Windows 10
+#### Windows
 To install stable-baselines on Windows, please look at the [documentation](https://stable-baselines3.readthedocs.io/en/master/guide/install.html#prerequisites).
 ### Install using pip
 Install the Stable Baselines3 package:
+```sh
+pip install 'stable-baselines3[extra]'
 ```
-pip install stable-baselines3[extra]
-```
-**Note:** Some shells such as Zsh require quotation marks around brackets, i.e. `pip install 'stable-baselines3[extra]'` ([More Info](https://stackoverflow.com/a/30539963)).
 This includes an optional dependencies like Tensorboard, OpenCV or `ale-py` to train on atari games. If you do not need those, you can use:
 ```sh
@@ -177,6 +184,7 @@ All the following examples can be executed online using Google Colab notebooks:
 | ------------------- | ------------------ | ------------------ | ------------------ | ------------------- | ------------------ | --------------------------------- |
 | ARS<sup>[1](#f1)</sup>   | :x: | :heavy_check_mark: | :heavy_check_mark: | :x: | :x: | :heavy_check_mark: |
 | A2C   | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
+| CrossQ<sup>[1](#f1)</sup>   | :x: | :heavy_check_mark: | :x:                | :x:                 | :x:                | :heavy_check_mark: |
 | DDPG  | :x: | :heavy_check_mark: | :x:                | :x:                 | :x:                | :heavy_check_mark: |
 | DQN   | :x: | :x: | :heavy_check_mark: | :x:                 | :x:                | :heavy_check_mark: |
 | HER   | :x: | :heavy_check_mark: | :heavy_check_mark: | :x: | :x: | :heavy_check_mark: |
@@ -191,8 +199,8 @@ All the following examples can be executed online using Google Colab notebooks:
 <b id="f1">1</b>: Implemented in [SB3 Contrib](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib) GitHub repository.
-Actions `gym.spaces`:
- * `Box`: A N-dimensional box that containes every point in the action space.
+Actions `gymnasium.spaces`:
+ * `Box`: A N-dimensional box that contains every point in the action space.
  * `Discrete`: A list of possible actions, where each timestep only one of the actions can be used.
  * `MultiDiscrete`: A list of possible actions, where each timestep only one action of each discrete set can be used.
  * `MultiBinary`: A list of possible actions, where each timestep any of the actions can be used in any combination.
@@ -218,9 +226,9 @@ To run a single test:
 python3 -m pytest -v -k 'test_check_env_dict_action'
 ```
-You can also do a static type check using `pytype` and `mypy`:
+You can also do a static type check using `mypy`:
 ```sh
-pip install pytype mypy
+pip install mypy
 make type
 ```
@@ -252,6 +260,8 @@ To cite this repository in publications:
 }
 ```
+Note: If you need to refer to a specific version of SB3, you can also use the [Zenodo DOI](https://doi.org/10.5281/zenodo.8123988).
 ## Maintainers
 Stable-Baselines3 is currently maintained by [Ashley Hill](https://github.com/hill-a) (aka @hill-a), [Antonin Raffin](https://araffin.github.io/) (aka [@araffin](https://github.com/araffin)), [Maximilian Ernestus](https://github.com/ernestum) (aka @ernestum), [Adam Gleave](https://github.com/adamgleave) (@AdamGleave), [Anssi Kanervisto](https://github.com/Miffyli) (@Miffyli) and [Quentin Gallouédec](https://gallouedec.com/) (@qgallouedec).

{stable_baselines3-2.3.2 → stable_baselines3-2.4.0}/pyproject.toml RENAMED

@@ -13,11 +13,10 @@ ignore = ["B028", "RUF013"]
 [tool.ruff.lint.per-file-ignores]
 # Default implementation in abstract methods
-"./stable_baselines3/common/callbacks.py"= ["B027"]
-"./stable_baselines3/common/noise.py"= ["B027"]
+"./stable_baselines3/common/callbacks.py" = ["B027"]
+"./stable_baselines3/common/noise.py" = ["B027"]
 # ClassVar, implicit optional check not needed for tests
-"./tests/*.py"= ["RUF012", "RUF013"]
+"./tests/*.py" = ["RUF012", "RUF013"]
 [tool.ruff.lint.mccabe]
 # Unlike Flake8, default to a complexity level of 10.
@@ -37,31 +36,35 @@ exclude = """(?x)(
 [tool.pytest.ini_options]
 # Deterministic ordering for tests; useful for pytest-xdist.
-env = [
-	"PYTHONHASHSEED=0"
-]
+env = ["PYTHONHASHSEED=0"]
 filterwarnings = [
     # Tensorboard warnings
     "ignore::DeprecationWarning:tensorboard",
     # Gymnasium warnings
     "ignore::UserWarning:gymnasium",
+    # tqdm warning about rich being experimental
+    "ignore:rich is experimental",
 ]
 markers = [
-    "expensive: marks tests as expensive (deselect with '-m \"not expensive\"')"
+    "expensive: marks tests as expensive (deselect with '-m \"not expensive\"')",
 ]
 [tool.coverage.run]
 disable_warnings = ["couldnt-parse"]
 branch = false
 omit = [
-  "tests/*",
-  "setup.py",
-  # Require graphical interface
-  "stable_baselines3/common/results_plotter.py",
-  # Require ffmpeg
-  "stable_baselines3/common/vec_env/vec_video_recorder.py",
+    "tests/*",
+    "setup.py",
+    # Require graphical interface
+    "stable_baselines3/common/results_plotter.py",
+    # Require ffmpeg
+    "stable_baselines3/common/vec_env/vec_video_recorder.py",
 ]
 [tool.coverage.report]
-exclude_lines = [ "pragma: no cover", "raise NotImplementedError()", "if typing.TYPE_CHECKING:"]
+exclude_lines = [
+    "pragma: no cover",
+    "raise NotImplementedError()",
+    "if typing.TYPE_CHECKING:",
+]

{stable_baselines3-2.3.2 → stable_baselines3-2.4.0}/setup.py RENAMED

@@ -70,38 +70,14 @@ model = PPO("MlpPolicy", "CartPole-v1").learn(10_000)
 """  # noqa:E501
-# Atari Games download is sometimes problematic:
-# https://github.com/Farama-Foundation/AutoROM/issues/39
-# That's why we define extra packages without it.
-extra_no_roms = [
-    # For render
-    "opencv-python",
-    "pygame",
-    # Tensorboard support
-    "tensorboard>=2.9.1",
-    # Checking memory taken by replay buffer
-    "psutil",
-    # For progress bar callback
-    "tqdm",
-    "rich",
-    # For atari games,
-    "shimmy[atari]~=1.3.0",
-    "pillow",
-]
-extra_packages = extra_no_roms + [  # noqa: RUF005
-    # For atari roms,
-    "autorom[accept-rom-license]~=0.6.1",
-]
 setup(
     name="stable_baselines3",
     packages=[package for package in find_packages() if package.startswith("stable_baselines3")],
     package_data={"stable_baselines3": ["py.typed", "version.txt"]},
     install_requires=[
-        "gymnasium>=0.28.1,<0.30",
-        "numpy>=1.20",
+        "gymnasium>=0.29.1,<1.1.0",
+        "numpy>=1.20,<2.0",  # PyTorch not compatible https://github.com/pytorch/pytorch/issues/107302
         "torch>=1.13",
         # For saving models
         "cloudpickle",
@@ -125,7 +101,7 @@ setup(
             "black>=24.2.0,<25",
         ],
         "docs": [
-            "sphinx>=5,<8",
+            "sphinx>=5,<9",
             "sphinx-autobuild",
             "sphinx-rtd-theme>=1.3.0",
             # For spelling
@@ -133,8 +109,21 @@ setup(
             # Copy button for code snippets
             "sphinx_copybutton",
         ],
-        "extra": extra_packages,
-        "extra_no_roms": extra_no_roms,
+        "extra": [
+            # For render
+            "opencv-python",
+            "pygame",
+            # Tensorboard support
+            "tensorboard>=2.9.1",
+            # Checking memory taken by replay buffer
+            "psutil",
+            # For progress bar callback
+            "tqdm",
+            "rich",
+            # For atari games,
+            "ale-py>=0.9.0",
+            "pillow",
+        ],
     },
     description="Pytorch version of Stable Baselines, implementations of reinforcement learning algorithms.",
     author="Antonin Raffin",

{stable_baselines3-2.3.2 → stable_baselines3-2.4.0}/stable_baselines3/common/base_class.py RENAMED

@@ -48,7 +48,7 @@ def maybe_make_env(env: Union[GymEnv, str], verbose: int) -> GymEnv:
     """If env is a string, make the environment; otherwise, return env.
     :param env: The environment to learn from.
-    :param verbose: Verbosity level: 0 for no output, 1 for indicating if envrironment is created
+    :param verbose: Verbosity level: 0 for no output, 1 for indicating if environment is created
     :return A Gym (vector) environment.
     """
     if isinstance(env, str):
@@ -592,7 +592,7 @@ class BaseAlgorithm(ABC):
         if isinstance(load_path_or_dict, dict):
             params = load_path_or_dict
         else:
-            _, params, _ = load_from_zip_file(load_path_or_dict, device=device)
+            _, params, _ = load_from_zip_file(load_path_or_dict, device=device, load_data=False)
         # Keep track which objects were updated.
         # `_get_torch_save_params` returns [params, other_pytorch_variables].
@@ -692,10 +692,9 @@ class BaseAlgorithm(ABC):
             if "device" in data["policy_kwargs"]:
                 del data["policy_kwargs"]["device"]
             # backward compatibility, convert to new format
-            if "net_arch" in data["policy_kwargs"] and len(data["policy_kwargs"]["net_arch"]) > 0:
-                saved_net_arch = data["policy_kwargs"]["net_arch"]
-                if isinstance(saved_net_arch, list) and isinstance(saved_net_arch[0], dict):
-                    data["policy_kwargs"]["net_arch"] = saved_net_arch[0]
+            saved_net_arch = data["policy_kwargs"].get("net_arch")
+            if saved_net_arch and isinstance(saved_net_arch, list) and isinstance(saved_net_arch[0], dict):
+                data["policy_kwargs"]["net_arch"] = saved_net_arch[0]
         if "policy_kwargs" in kwargs and kwargs["policy_kwargs"] != data["policy_kwargs"]:
             raise ValueError(
@@ -743,13 +742,13 @@ class BaseAlgorithm(ABC):
             # put state_dicts back in place
             model.set_parameters(params, exact_match=True, device=device)
         except RuntimeError as e:
-            # Patch to load Policy saved using SB3 < 1.7.0
+            # Patch to load policies saved using SB3 < 1.7.0
             # the error is probably due to old policy being loaded
             # See https://github.com/DLR-RM/stable-baselines3/issues/1233
             if "pi_features_extractor" in str(e) and "Missing key(s) in state_dict" in str(e):
                 model.set_parameters(params, exact_match=False, device=device)
                 warnings.warn(
-                    "You are probably loading a model saved with SB3 < 1.7.0, "
+                    "You are probably loading a A2C/PPO model saved with SB3 < 1.7.0, "
                     "we deactivated exact_match so you can save the model "
                     "again to avoid issues in the future "
                     "(see https://github.com/DLR-RM/stable-baselines3/issues/1233 for more info). "
@@ -758,6 +757,29 @@ class BaseAlgorithm(ABC):
                 )
             else:
                 raise e
+        except ValueError as e:
+            # Patch to load DQN policies saved using SB3 < 2.4.0
+            # The target network params are no longer in the optimizer
+            # See https://github.com/DLR-RM/stable-baselines3/pull/1963
+            saved_optim_params = params["policy.optimizer"]["param_groups"][0]["params"]  # type: ignore[index]
+            n_params_saved = len(saved_optim_params)
+            n_params = len(model.policy.optimizer.param_groups[0]["params"])
+            if n_params_saved == 2 * n_params:
+                # Truncate to include only online network params
+                params["policy.optimizer"]["param_groups"][0]["params"] = saved_optim_params[:n_params]  # type: ignore[index]
+                model.set_parameters(params, exact_match=True, device=device)
+                warnings.warn(
+                    "You are probably loading a DQN model saved with SB3 < 2.4.0, "
+                    "we truncated the optimizer state so you can save the model "
+                    "again to avoid issues in the future "
+                    "(see https://github.com/DLR-RM/stable-baselines3/pull/1963 for more info). "
+                    f"Original error: {e} \n"
+                    "Note: the model should still work fine, this only a warning."
+                )
+            else:
+                raise e
         # put other pytorch variables back in place
         if pytorch_variables is not None:
             for name in pytorch_variables:
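
The new `except ValueError` branch above handles DQN checkpoints whose saved optimizer listed both the online and the target network parameters. The truncation step can be illustrated on a plain dictionary shaped like an optimizer state dict. This is a sketch, not the library code: `truncate_saved_optimizer_params` is a hypothetical helper, and unlike the diff (which edits `params` in place) it returns a patched copy.

```python
import copy
from typing import Any, Dict, List


def truncate_saved_optimizer_params(saved_optim_state: Dict[str, Any], n_params: int) -> Dict[str, Any]:
    """
    Return a copy of an optimizer state dict whose first param group keeps only
    the first ``n_params`` entries, provided the saved group holds exactly twice
    as many (online + target network parameters).
    """
    saved_params: List[int] = saved_optim_state["param_groups"][0]["params"]
    if len(saved_params) != 2 * n_params:
        # Mirrors the `else: raise e` branch: any other mismatch is not patched.
        raise ValueError(f"Expected {2 * n_params} saved params, got {len(saved_params)}")
    patched = copy.deepcopy(saved_optim_state)
    # Keep only the online network params, which are assumed to come first.
    patched["param_groups"][0]["params"] = saved_params[:n_params]
    return patched


# Made-up state: 8 saved parameter indices, while the current model expects 4.
old_state = {"state": {}, "param_groups": [{"lr": 1e-4, "params": [0, 1, 2, 3, 4, 5, 6, 7]}]}
patched_state = truncate_saved_optimizer_params(old_state, n_params=4)
print(patched_state["param_groups"][0]["params"])
```

Returning a copy keeps the original checkpoint data untouched, which makes the sketch easier to test; the in-place edit in the diff avoids the extra copy.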

{stable_baselines3-2.3.2 → stable_baselines3-2.4.0}/stable_baselines3/common/buffers.py RENAMED

@@ -419,12 +419,12 @@ class RolloutBuffer(BaseBuffer):
         :param dones: if the last step was a terminal step (one bool for each env).
         """
         # Convert to numpy
-        last_values = last_values.clone().cpu().numpy().flatten()
+        last_values = last_values.clone().cpu().numpy().flatten()  # type: ignore[assignment]
         last_gae_lam = 0
         for step in reversed(range(self.buffer_size)):
             if step == self.buffer_size - 1:
-                next_non_terminal = 1.0 - dones
+                next_non_terminal = 1.0 - dones.astype(np.float32)
                 next_values = last_values
             else:
                 next_non_terminal = 1.0 - self.episode_starts[step + 1]
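
To show where `next_non_terminal` enters the advantage computation, here is a single-environment, pure-Python sketch of a generalized advantage estimation loop. Only the branch on `step` comes from the hunk above; the `delta` and `last_gae_lam` updates follow the standard GAE recursion and are an assumption about the lines the hunk does not show, as are the function name `compute_gae` and the default `gamma` and `gae_lambda` values.

```python
from typing import List, Sequence


def compute_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    episode_starts: Sequence[bool],
    last_value: float,
    done: bool,
    gamma: float = 0.99,
    gae_lambda: float = 0.95,
) -> List[float]:
    """Compute GAE advantages for one environment, iterating backwards in time."""
    n_steps = len(rewards)
    advantages = [0.0] * n_steps
    last_gae_lam = 0.0
    for step in reversed(range(n_steps)):
        if step == n_steps - 1:
            # Explicit bool -> float conversion, as in the 2.4.0 change.
            next_non_terminal = 1.0 - float(done)
            next_value = last_value
        else:
            next_non_terminal = 1.0 - float(episode_starts[step + 1])
            next_value = values[step + 1]
        # Assumed standard GAE recursion (not shown in the hunk).
        delta = rewards[step] + gamma * next_value * next_non_terminal - values[step]
        last_gae_lam = delta + gamma * gae_lambda * next_non_terminal * last_gae_lam
        advantages[step] = last_gae_lam
    return advantages


# Three steps of reward 1 with a zero value function and no discounting.
advantages = compute_gae([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], [False, False, False], 0.0, True, gamma=1.0, gae_lambda=1.0)
print(advantages)
```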

{stable_baselines3-2.3.2 → stable_baselines3-2.4.0}/stable_baselines3/common/callbacks.py RENAMED

@@ -204,6 +204,10 @@ class CallbackList(BaseCallback):
         for callback in self.callbacks:
             callback.init_callback(self.model)
+            # Fix for https://github.com/DLR-RM/stable-baselines3/issues/1791
+            # pass through the parent callback to all children
+            callback.parent = self.parent
     def _on_training_start(self) -> None:
         for callback in self.callbacks:
             callback.on_training_start(self.locals, self.globals)
@@ -606,7 +610,7 @@ class StopTrainingOnMaxEpisodes(BaseCallback):
         self.n_episodes = 0
     def _init_callback(self) -> None:
-        # At start set total max according to number of envirnments
+        # At start set total max according to number of environments
         self._total_max_episodes = self.max_episodes * self.training_env.num_envs
     def _on_step(self) -> bool:
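
The `CallbackList` fix above forwards the list's own `parent` to each child callback. A stripped-down sketch of that pass-through is given below; `SketchCallback` and `SketchCallbackList` are hypothetical stand-ins written for illustration, not the SB3 classes, and they model only the `parent` and `model` attributes.

```python
from typing import List, Optional


class SketchCallback:
    """Hypothetical minimal callback holding a model and an optional parent."""

    def __init__(self) -> None:
        self.parent: Optional["SketchCallback"] = None
        self.model: Optional[object] = None

    def init_callback(self, model: object) -> None:
        self.model = model
        self._init_callback()

    def _init_callback(self) -> None:
        pass


class SketchCallbackList(SketchCallback):
    """Hypothetical list wrapper that forwards its parent to every child."""

    def __init__(self, callbacks: List[SketchCallback]) -> None:
        super().__init__()
        self.callbacks = callbacks

    def _init_callback(self) -> None:
        for callback in self.callbacks:
            callback.init_callback(self.model)
            # Pass the parent callback through to all children.
            callback.parent = self.parent


event_callback = SketchCallback()
child_a, child_b = SketchCallback(), SketchCallback()
callback_list = SketchCallbackList([child_a, child_b])
callback_list.parent = event_callback
callback_list.init_callback(model="dummy-model")
print(child_a.parent is event_callback, child_b.parent is event_callback)
```

Without the forwarding line, children wrapped in a list would keep `parent = None` and could not reach attributes of the enclosing callback.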

{stable_baselines3-2.3.2 → stable_baselines3-2.4.0}/stable_baselines3/common/env_checker.py RENAMED

@@ -98,6 +98,14 @@ def _check_unsupported_spaces(env: gym.Env, observation_space: spaces.Space, act
                 "is not supported but `dict(space2=Box(), spaces3=Box(), spaces4=Discrete())` is."
             )
+    if isinstance(observation_space, spaces.MultiDiscrete) and len(observation_space.nvec.shape) > 1:
+        warnings.warn(
+            f"The MultiDiscrete observation space uses a multidimensional array {observation_space.nvec} "
+            "which is currently not supported by Stable-Baselines3. "
+            "Please convert it to a 1D array using a wrapper: "
+            "https://github.com/DLR-RM/stable-baselines3/issues/1836."
+        )
     if isinstance(observation_space, spaces.Tuple):
         warnings.warn(
             "The observation space is a Tuple, "
@@ -397,7 +405,7 @@ def _check_render(env: gym.Env, warn: bool = False) -> None:  # pragma: no cover
                 "you may have trouble when calling `.render()`"
             )
-    # Only check currrent render mode
+    # Only check current render mode
     if env.render_mode:
         env.render()
     env.close()

{stable_baselines3-2.3.2 → stable_baselines3-2.4.0}/stable_baselines3/common/logger.py RENAMED

@@ -412,8 +412,9 @@ class TensorBoardOutputFormat(KVWriter):
                 else:
                     self.writer.add_scalar(key, value, step)
-            if isinstance(value, th.Tensor):
-                self.writer.add_histogram(key, value, step)
+            if isinstance(value, (th.Tensor, np.ndarray)):
+                # Convert to Torch so it works with numpy<1.24 and torch<2.0
+                self.writer.add_histogram(key, th.as_tensor(value), step)
             if isinstance(value, Video):
                 self.writer.add_video(key, value.frames, step, value.fps)

{stable_baselines3-2.3.2 → stable_baselines3-2.4.0}/stable_baselines3/common/monitor.py RENAMED

@@ -189,7 +189,7 @@ class ResultsWriter:
         filename = os.path.realpath(filename)
         # Create (if any) missing filename directories
         os.makedirs(os.path.dirname(filename), exist_ok=True)
-        # Append mode when not overridding existing file
+        # Append mode when not overriding existing file
         mode = "w" if override_existing else "a"
         # Prevent newline issue on Windows, see GH issue #692
         self.file_handler = open(filename, f"{mode}t", newline="\n")

{stable_baselines3-2.3.2 → stable_baselines3-2.4.0}/stable_baselines3/common/on_policy_algorithm.py RENAMED

@@ -1,5 +1,6 @@
 import sys
 import time
+import warnings
 from typing import Any, Dict, List, Optional, Tuple, Type, TypeVar, Union
 import numpy as np
@@ -135,6 +136,28 @@ class OnPolicyAlgorithm(BaseAlgorithm):
             self.observation_space, self.action_space, self.lr_schedule, use_sde=self.use_sde, **self.policy_kwargs
         )
         self.policy = self.policy.to(self.device)
+        # Warn when not using CPU with MlpPolicy
+        self._maybe_recommend_cpu()
+    def _maybe_recommend_cpu(self, mlp_class_name: str = "ActorCriticPolicy") -> None:
+        """
+        Recommend to use CPU only when using A2C/PPO with MlpPolicy.
+        :param: The name of the class for the default MlpPolicy.
+        """
+        policy_class_name = self.policy_class.__name__
+        if self.device != th.device("cpu") and policy_class_name == mlp_class_name:
+            warnings.warn(
+                f"You are trying to run {self.__class__.__name__} on the GPU, "
+                "but it is primarily intended to run on the CPU when not using a CNN policy "
+                f"(you are using {policy_class_name} which should be a MlpPolicy). "
+                "See https://github.com/DLR-RM/stable-baselines3/issues/1245 "
+                "for more info. "
+                "You can pass `device='cpu'` or `export CUDA_VISIBLE_DEVICES=` to force using the CPU."
+                "Note: The model will train, but the GPU utilization will be poor and "
+                "the training might take longer than on CPU.",
+                UserWarning,
+            )
     def collect_rollouts(
         self,
@@ -208,7 +231,7 @@ class OnPolicyAlgorithm(BaseAlgorithm):
                 # Reshape in case of discrete action
                 actions = actions.reshape(-1, 1)
-            # Handle timeout by bootstraping with value function
+            # Handle timeout by bootstrapping with value function
             # see GitHub issue #633
             for idx, done in enumerate(dones):
                 if (
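
The condition behind the new `_maybe_recommend_cpu` warning can be exercised without PyTorch. The sketch below is an approximation under stated assumptions: `maybe_recommend_cpu` is a hypothetical free function, the device is represented as a plain string rather than a `th.device`, and the message is shortened.

```python
import warnings


def maybe_recommend_cpu(
    algo_name: str,
    device: str,
    policy_class_name: str,
    mlp_class_name: str = "ActorCriticPolicy",
) -> bool:
    """Warn (and return True) when an MLP policy is placed on a non-CPU device."""
    if device != "cpu" and policy_class_name == mlp_class_name:
        warnings.warn(
            f"You are trying to run {algo_name} on the GPU, "
            "but it is primarily intended to run on the CPU when not using a CNN policy. "
            "You can pass `device='cpu'` to force using the CPU.",
            UserWarning,
        )
        return True
    return False


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warned_gpu_mlp = maybe_recommend_cpu("PPO", "cuda", "ActorCriticPolicy")
    warned_cpu_mlp = maybe_recommend_cpu("PPO", "cpu", "ActorCriticPolicy")
    warned_gpu_cnn = maybe_recommend_cpu("PPO", "cuda", "ActorCriticCnnPolicy")

print(warned_gpu_mlp, warned_cpu_mlp, warned_gpu_cnn, len(caught))
```

Comparing class names rather than classes matches the diff, where the default `mlp_class_name` is the string `"ActorCriticPolicy"`.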

{stable_baselines3-2.3.2 → stable_baselines3-2.4.0}/stable_baselines3/common/policies.py RENAMED

@@ -367,7 +367,7 @@ class BasePolicy(BaseModel, ABC):
         with th.no_grad():
             actions = self._predict(obs_tensor, deterministic=deterministic)
         # Convert to numpy, and reshape to the original action shape
-        actions = actions.cpu().numpy().reshape((-1, *self.action_space.shape))  # type: ignore[misc]
+        actions = actions.cpu().numpy().reshape((-1, *self.action_space.shape))  # type: ignore[misc, assignment]
         if isinstance(self.action_space, spaces.Box):
             if self.squash_output:
@@ -922,7 +922,7 @@ class ContinuousCritic(BaseModel):
     By default, it creates two critic networks used to reduce overestimation
     thanks to clipped Q-learning (cf TD3 paper).
-    :param observation_space: Obervation space
+    :param observation_space: Observation space
     :param action_space: Action space
     :param net_arch: Network architecture
     :param features_extractor: Network to extract features

{stable_baselines3-2.3.2 → stable_baselines3-2.4.0}/stable_baselines3/common/results_plotter.py RENAMED

@@ -46,7 +46,7 @@ def window_func(var_1: np.ndarray, var_2: np.ndarray, window: int, func: Callabl
 def ts2xy(data_frame: pd.DataFrame, x_axis: str) -> Tuple[np.ndarray, np.ndarray]:
     """
-    Decompose a data frame variable to x ans ys
+    Decompose a data frame variable to x and ys
     :param data_frame: the input data
     :param x_axis: the axis for the x and y output

{stable_baselines3-2.3.2 → stable_baselines3-2.4.0}/stable_baselines3/common/running_mean_std.py RENAMED

@@ -6,7 +6,7 @@ import numpy as np
 class RunningMeanStd:
     def __init__(self, epsilon: float = 1e-4, shape: Tuple[int, ...] = ()):
         """
-        Calulates the running mean and std of a data stream
+        Calculates the running mean and std of a data stream
         https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm
         :param epsilon: helps with arithmetic issues
