gymcts 1.2.1__tar.gz → 1.3.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (27)
  1. {gymcts-1.2.1/src/gymcts.egg-info → gymcts-1.3.0}/PKG-INFO +9 -5
  2. {gymcts-1.2.1 → gymcts-1.3.0}/README.md +8 -4
  3. {gymcts-1.2.1 → gymcts-1.3.0}/pyproject.toml +1 -1
  4. {gymcts-1.2.1 → gymcts-1.3.0}/src/gymcts/gymcts_agent.py +51 -8
  5. {gymcts-1.2.1 → gymcts-1.3.0}/src/gymcts/gymcts_env_abc.py +12 -2
  6. gymcts-1.3.0/src/gymcts/gymcts_neural_agent.py +479 -0
  7. {gymcts-1.2.1 → gymcts-1.3.0}/src/gymcts/gymcts_node.py +76 -9
  8. {gymcts-1.2.1 → gymcts-1.3.0/src/gymcts.egg-info}/PKG-INFO +9 -5
  9. {gymcts-1.2.1 → gymcts-1.3.0}/src/gymcts.egg-info/SOURCES.txt +1 -0
  10. {gymcts-1.2.1 → gymcts-1.3.0}/LICENSE +0 -0
  11. {gymcts-1.2.1 → gymcts-1.3.0}/MANIFEST.in +0 -0
  12. {gymcts-1.2.1 → gymcts-1.3.0}/setup.cfg +0 -0
  13. {gymcts-1.2.1 → gymcts-1.3.0}/setup.py +0 -0
  14. {gymcts-1.2.1 → gymcts-1.3.0}/src/gymcts/__init__.py +0 -0
  15. {gymcts-1.2.1 → gymcts-1.3.0}/src/gymcts/colorful_console_utils.py +0 -0
  16. {gymcts-1.2.1 → gymcts-1.3.0}/src/gymcts/gymcts_action_history_wrapper.py +0 -0
  17. {gymcts-1.2.1 → gymcts-1.3.0}/src/gymcts/gymcts_deepcopy_wrapper.py +0 -0
  18. {gymcts-1.2.1 → gymcts-1.3.0}/src/gymcts/gymcts_distributed_agent.py +0 -0
  19. {gymcts-1.2.1 → gymcts-1.3.0}/src/gymcts/gymcts_tree_plotter.py +0 -0
  20. {gymcts-1.2.1 → gymcts-1.3.0}/src/gymcts/logger.py +0 -0
  21. {gymcts-1.2.1 → gymcts-1.3.0}/src/gymcts.egg-info/dependency_links.txt +0 -0
  22. {gymcts-1.2.1 → gymcts-1.3.0}/src/gymcts.egg-info/not-zip-safe +0 -0
  23. {gymcts-1.2.1 → gymcts-1.3.0}/src/gymcts.egg-info/requires.txt +0 -0
  24. {gymcts-1.2.1 → gymcts-1.3.0}/src/gymcts.egg-info/top_level.txt +0 -0
  25. {gymcts-1.2.1 → gymcts-1.3.0}/tests/test_graph_matrix_jsp_env.py +0 -0
  26. {gymcts-1.2.1 → gymcts-1.3.0}/tests/test_gymnasium_envs.py +0 -0
  27. {gymcts-1.2.1 → gymcts-1.3.0}/tests/test_number_of_visits.py +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: gymcts
- Version: 1.2.1
+ Version: 1.3.0
  Summary: A minimalistic implementation of the Monte Carlo Tree Search algorithm for planning problems formulated as gymnasium reinforcement learning environments.
  Author: Alexander Nasuta
  Author-email: Alexander Nasuta <alexander.nasuta@wzl-iqs.rwth-aachen.de>
@@ -70,11 +70,18 @@ Requires-Dist: jupyter; extra == "dev"
  Requires-Dist: typing_extensions>=4.12.0; extra == "dev"
  Dynamic: license-file
 
- # Graph Matrix Job Shop Env
+ [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.15283390.svg)](https://doi.org/10.5281/zenodo.15283390)
+ [![Python Badge](https://img.shields.io/badge/Python-3776AB?logo=python&logoColor=fff&style=flat)](https://www.python.org/downloads/)
+ [![PyPI version](https://img.shields.io/pypi/v/gymcts)](https://pypi.org/project/gymcts/)
+ [![License](https://img.shields.io/pypi/l/gymcts)](https://github.com/Alexander-Nasuta/gymcts/blob/master/LICENSE)
+ [![Documentation Status](https://readthedocs.org/projects/gymcts/badge/?version=latest)](https://gymcts.readthedocs.io/en/latest/?badge=latest)
+
+ # GYMCTS
 
  A Monte Carlo Tree Search Implementation for Gymnasium-style Environments.
 
  - Github: [GYMCTS on Github](https://github.com/Alexander-Nasuta/gymcts)
+ - GitLab: [GYMCTS on GitLab](https://git-ce.rwth-aachen.de/alexander.nasuta/gymcts)
  - Pypi: [GYMCTS on PyPi](https://pypi.org/project/gymcts/)
  - Documentation: [GYMCTS Docs](https://gymcts.readthedocs.io/en/latest/)
 
@@ -579,9 +586,6 @@ This project uses `pytest` for testing. To run the tests, run the following comm
  ```shell
  pytest
  ```
- Here is a screenshot of what the output might look like:
-
- ![](https://github.com/Alexander-Nasuta/GraphMatrixJobShopEnv/raw/master/resources/pytest-screenshot.png)
 
  For testing with `tox` run the following command:
 
@@ -1,8 +1,15 @@
- # Graph Matrix Job Shop Env
+ [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.15283390.svg)](https://doi.org/10.5281/zenodo.15283390)
+ [![Python Badge](https://img.shields.io/badge/Python-3776AB?logo=python&logoColor=fff&style=flat)](https://www.python.org/downloads/)
+ [![PyPI version](https://img.shields.io/pypi/v/gymcts)](https://pypi.org/project/gymcts/)
+ [![License](https://img.shields.io/pypi/l/gymcts)](https://github.com/Alexander-Nasuta/gymcts/blob/master/LICENSE)
+ [![Documentation Status](https://readthedocs.org/projects/gymcts/badge/?version=latest)](https://gymcts.readthedocs.io/en/latest/?badge=latest)
+
+ # GYMCTS
 
  A Monte Carlo Tree Search Implementation for Gymnasium-style Environments.
 
  - Github: [GYMCTS on Github](https://github.com/Alexander-Nasuta/gymcts)
+ - GitLab: [GYMCTS on GitLab](https://git-ce.rwth-aachen.de/alexander.nasuta/gymcts)
  - Pypi: [GYMCTS on PyPi](https://pypi.org/project/gymcts/)
  - Documentation: [GYMCTS Docs](https://gymcts.readthedocs.io/en/latest/)
 
@@ -507,9 +514,6 @@ This project uses `pytest` for testing. To run the tests, run the following comm
  ```shell
  pytest
  ```
- Here is a screenshot of what the output might look like:
-
- ![](https://github.com/Alexander-Nasuta/GraphMatrixJobShopEnv/raw/master/resources/pytest-screenshot.png)
 
  For testing with `tox` run the following command:
 
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
  [project]
  name = "gymcts"
- version = "1.2.1"
+ version = "1.3.0"
  description = "A minimalistic implementation of the Monte Carlo Tree Search algorithm for planning problems formulated as gymnasium reinforcement learning environments."
  readme = "README.md"
  authors = [{ name = "Alexander Nasuta", email = "alexander.nasuta@wzl-iqs.rwth-aachen.de" }]
@@ -2,7 +2,7 @@ import copy
  import random
  import gymnasium as gym
 
- from typing import TypeVar, Any, SupportsFloat, Callable
+ from typing import TypeVar, Any, SupportsFloat, Callable, Literal
 
  from gymcts.gymcts_env_abc import GymctsABC
  from gymcts.gymcts_deepcopy_wrapper import DeepCopyMCTSGymEnvWrapper
@@ -11,7 +11,9 @@ from gymcts.gymcts_tree_plotter import _generate_mcts_tree
 
  from gymcts.logger import log
 
- TSoloMCTSNode = TypeVar("TSoloMCTSNode", bound="SoloMCTSNode")
+
+
+
 
 
  class GymctsAgent:
@@ -24,17 +26,50 @@ class GymctsAgent:
      search_root_node: GymctsNode  # NOTE: this is not the same as the root of the tree!
      clear_mcts_tree_after_step: bool
 
+     # signature: (num_simulations: int, step_idx: int) -> int
+     @staticmethod
+     def calc_number_of_simulations_per_step(num_simulations: int, step_idx: int) -> int:
+         """
+         Default schedule that returns a constant number of simulations per step.
+
+         :param num_simulations: The number of simulations to return.
+         :param step_idx: The current step index (not used by this default schedule).
+         :return: The constant number of simulations.
+         """
+         return num_simulations
+
      def __init__(self,
                   env: GymctsABC,
                   clear_mcts_tree_after_step: bool = True,
                   render_tree_after_step: bool = False,
                   render_tree_max_depth: int = 2,
                   number_of_simulations_per_step: int = 25,
-                  exclude_unvisited_nodes_from_render: bool = False
+                  exclude_unvisited_nodes_from_render: bool = False,
+                  calc_number_of_simulations_per_step: Callable[[int, int], int] = None,
+                  score_variate: Literal["UCT_v0", "UCT_v1", "UCT_v2"] = "UCT_v0",
+                  best_action_weight: float = None,
                   ):
          # check if action space of env is discrete
          if not isinstance(env.action_space, gym.spaces.Discrete):
              raise ValueError("Action space must be discrete.")
+         if calc_number_of_simulations_per_step is not None:
+             # check that the provided schedule is callable
+             if not callable(calc_number_of_simulations_per_step):
+                 raise ValueError("calc_number_of_simulations_per_step must be a callable accepting two arguments: num_simulations and step_idx.")
+             # functions assigned to instance attributes are not bound to the
+             # instance, so the schedule can be stored directly
+             log.debug("Using provided calc_number_of_simulations_per_step function.")
+             self.calc_number_of_simulations_per_step = calc_number_of_simulations_per_step
+         if score_variate not in ["UCT_v0", "UCT_v1", "UCT_v2"]:
+             raise ValueError("score_variate must be one of ['UCT_v0', 'UCT_v1', 'UCT_v2'].")
+         GymctsNode.score_variate = score_variate
+
+         if best_action_weight is not None:
+             if best_action_weight < 0 or best_action_weight > 1:
+                 raise ValueError("best_action_weight must be in range [0, 1].")
+             GymctsNode.best_action_weight = best_action_weight
 
          self.render_tree_after_step = render_tree_after_step
          self.exclude_unvisited_nodes_from_render = exclude_unvisited_nodes_from_render
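For illustration (not part of the released diff): constructing an agent with the new constructor options might look like the sketch below. `MyGymctsEnv` is a hypothetical `GymctsABC` implementation, not part of the package.

```python
from gymcts.gymcts_agent import GymctsAgent

# MyGymctsEnv is a hypothetical GymctsABC implementation standing in for a real environment
env = MyGymctsEnv()

agent = GymctsAgent(
    env=env,
    number_of_simulations_per_step=50,
    score_variate="UCT_v1",   # one of "UCT_v0", "UCT_v1", "UCT_v2"
    best_action_weight=0.1,   # must lie in [0, 1]
)
```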
@@ -65,8 +100,8 @@ class GymctsAgent:
          # select child with highest UCB score
          while not temp_node.is_leaf():
              children = list(temp_node.children.values())
-             max_ucb_score = max(child.ucb_score() for child in children)
-             best_children = [child for child in children if child.ucb_score() == max_ucb_score]
+             max_ucb_score = max(child.tree_policy_score() for child in children)
+             best_children = [child for child in children if child.tree_policy_score() == max_ucb_score]
              temp_node = random.choice(best_children)
          log.debug(f"Selected leaf node: {temp_node}")
          return temp_node
@@ -88,7 +123,6 @@ class GymctsAgent:
                  parent=node,
                  env_reference=self.env,
              )
-
          node.children = child_dict
 
      def solve(self, num_simulations_per_step: int = None, render_tree_after_step: bool = None) -> list[int]:
  def solve(self, num_simulations_per_step: int = None, render_tree_after_step: bool = None) -> list[int]:
@@ -104,13 +138,20 @@ class GymctsAgent:
104
138
 
105
139
  action_list = []
106
140
 
141
+ idx = 0
107
142
  while not current_node.terminal:
108
- next_action, current_node = self.perform_mcts_step(num_simulations=num_simulations_per_step,
143
+ num_sims = self.calc_number_of_simulations_per_step(num_simulations_per_step, idx)
144
+
145
+ log.info(f"Performing MCTS step {idx} with {num_sims} simulations.")
146
+
147
+ next_action, current_node = self.perform_mcts_step(num_simulations=num_sims,
109
148
  render_tree_after_step=render_tree_after_step)
110
- log.info(f"selected action {next_action} after {num_simulations_per_step} simulations.")
149
+ log.info(f"selected action {next_action} after {num_sims} simulations.")
111
150
  action_list.append(next_action)
112
151
  log.info(f"current action list: {action_list}")
113
152
 
153
+ idx += 1
154
+
114
155
  log.info(f"Final action list: {action_list}")
115
156
  # restore state of current node
116
157
  return action_list
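For illustration (not part of the released diff): a custom simulation schedule plugged into the `calc_number_of_simulations_per_step` hook used by `solve()` might look like this sketch; the decay constants are illustrative assumptions.

```python
def decaying_schedule(num_simulations: int, step_idx: int) -> int:
    # spend more simulations on early steps, fewer later on, but never fewer than 5
    return max(5, num_simulations - 2 * step_idx)

agent = GymctsAgent(
    env=env,  # env as in the previous sketch
    number_of_simulations_per_step=50,  # passed to the schedule as num_simulations
    calc_number_of_simulations_per_step=decaying_schedule,
)
actions = agent.solve()
```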
@@ -149,6 +190,8 @@ class GymctsAgent:
                  # we also need to reset the children of the current node
                  # this is done by calling the reset method
                  next_node.reset()
+             else:
+                 next_node.remove_parent()
 
          self.search_root_node = next_node
 
@@ -1,8 +1,7 @@
  from typing import TypeVar, Any, SupportsFloat, Callable
  from abc import ABC, abstractmethod
  import gymnasium as gym
-
- TSoloMCTSNode = TypeVar("TSoloMCTSNode", bound="SoloMCTSNode")
+ import numpy as np
 
 
  class GymctsABC(ABC, gym.Env):
@@ -47,6 +46,17 @@ class GymctsABC(ABC, gym.Env):
          """
          pass
 
+     @abstractmethod
+     def action_masks(self) -> np.ndarray | None:
+         """
+         Returns a numpy array of action masks for the environment. The array should have the same length as the
+         number of actions in the action space. If an action is valid, the corresponding mask value should be 1,
+         otherwise 0. If no action mask is available, it should return None.
+
+         :return: a numpy array of action masks, or None
+         """
+         pass
+
      @abstractmethod
      def rollout(self) -> float:
          """
@@ -0,0 +1,479 @@
+ import copy
+ import sys
+ from typing import Any, Literal
+
+ import random
+ import math
+ import sb3_contrib
+
+ import gymnasium as gym
+ import numpy as np
+
+ from graph_jsp_env.disjunctive_graph_jsp_env import DisjunctiveGraphJspEnv
+ from jsp_instance_utils.instances import ft06, ft06_makespan
+ from sb3_contrib.common.maskable.distributions import MaskableCategoricalDistribution
+ from sb3_contrib.common.maskable.policies import MaskableActorCriticPolicy
+ from sb3_contrib.common.wrappers import ActionMasker
+
+ from gymcts.gymcts_agent import GymctsAgent
+ from gymcts.gymcts_env_abc import GymctsABC
+ from gymcts.gymcts_node import GymctsNode
+
+ from gymcts.logger import log
+
+
+ class GraphJspNeuralGYMCTSWrapper(GymctsABC, gym.Wrapper):
+
+     def __init__(self, env: DisjunctiveGraphJspEnv):
+         gym.Wrapper.__init__(self, env)
+
+     def load_state(self, state: Any) -> None:
+         self.env.reset()
+         for action in state:
+             self.env.step(action)
+
+     def is_terminal(self) -> bool:
+         return self.env.unwrapped.is_terminal()
+
+     def get_valid_actions(self) -> list[int]:
+         return list(self.env.unwrapped.valid_actions())
+
+     def rollout(self) -> float:
+         terminal = self.is_terminal()
+
+         if terminal:
+             lower_bound = self.env.unwrapped.reward_function_parameters['scaling_divisor']
+             return -self.env.unwrapped.get_makespan() / lower_bound + 2
+
+         reward = 0
+         while not terminal:
+             action = random.choice(self.get_valid_actions())
+             obs, reward, terminal, truncated, _ = self.env.step(action)
+
+         return reward + 2
+
+     def get_state(self) -> Any:
+         return self.env.unwrapped.get_action_history()
+
+     def action_masks(self) -> np.ndarray | None:
+         """Return the action mask for the current state."""
+         return self.env.unwrapped.valid_action_mask()
+
+ class GymctsNeuralNode(GymctsNode):
+     PUCT_v3_mu = 0.95
+
+     MuZero_c1 = 1.25
+     MuZero_c2 = 19652.0
+
+     """
+     PUCT (Predictor + UCT) exploration terms:
+
+     PUCT_v0:
+         c * P(s, a) * √( N(s) ) / (1 + N(s,a))
+
+     PUCT_v1:
+         c * P(s, a) * √( 2 * ln(N(s)) / N(s,a) )
+
+     PUCT_v2:
+         c * P(s, a) * √( N(s) ) / N(s,a)
+
+     PUCT_v3:
+         c * P(s, a)^μ * √( N(s) / (1 + N(s,a)) )
+
+     PUCT_v4:
+         c * ( P(s, a) / (1 + N(s,a)) )
+
+     PUCT_v5:
+         c * P(s, a) * ( √(N(s)) + 1 ) / (N(s,a) + 1)
+
+     PUCT_v6:
+         c * P(s, a) * N(s) / (1 + N(s,a))
+
+     PUCT_v7:
+         c * P(s, a) * ( √(N(s)) + ε ) / (N(s,a) + 1)
+
+     PUCT_v8:
+         c * P(s, a) * √( (ln(N(s)) + 1) / (1 + N(s,a)) )
+
+     PUCT_v9:
+         c * P(s, a) * √( N(s) / (1 + N(s,a)) )
+
+     PUCT_v10:
+         c * P(s, a) * √( ln(N(s)) / (1 + N(s,a)) )
+
+
+     MuZero exploration terms (v0 and v1 currently share the same formula):
+
+     MuZero_v0:
+         P(s, a) * √( N(s) / (1 + N(s,a)) ) * [ c₁ + ln( (N(s) + c₂ + 1) / c₂ ) ]
+
+     MuZero_v1:
+         P(s, a) * √( N(s) / (1 + N(s,a)) ) * [ c₁ + ln( (N(s) + c₂ + 1) / c₂ ) ]
+
+
+     Where:
+         - N(s): number of times state s has been visited
+         - N(s,a): number of times action a was taken from state s
+         - P(s,a): prior probability of selecting action a from state s
+         - c, c₁, c₂: exploration constants
+         - μ: exponent applied to P(s,a) in some variants
+         - ε: small constant used in PUCT_v7
+     """
+     score_variate: Literal[
+         "PUCT_v0", "PUCT_v1", "PUCT_v2", "PUCT_v3", "PUCT_v4", "PUCT_v5",
+         "PUCT_v6", "PUCT_v7", "PUCT_v8", "PUCT_v9", "PUCT_v10",
+         "MuZero_v0", "MuZero_v1",
+     ] = "PUCT_v0"
+
+     def __init__(
+             self,
+             action: int,
+             parent: 'GymctsNeuralNode',
+             env_reference: GymctsABC,
+             prior_selection_score: float,
+             observation: np.ndarray | None = None,
+     ):
+         super().__init__(action, parent, env_reference)
+
+         self._obs = observation
+         self._selection_score_prior = prior_selection_score
+
+     def tree_policy_score(self) -> float:
+         # prior-weighted variant of GymctsNode.tree_policy_score
+         c = GymctsNode.ubc_c
+         p_sa = self._selection_score_prior
+         n_s = self.parent.visit_count
+         n_sa = self.visit_count
+
+         # variants that divide by N(s,a) would raise a ZeroDivisionError for unvisited children
+         if n_sa == 0 and GymctsNeuralNode.score_variate in ("PUCT_v1", "PUCT_v2"):
+             return float("inf")
+
+         if GymctsNeuralNode.score_variate == "PUCT_v0":
+             return self.mean_value + c * p_sa * math.sqrt(n_s) / (1 + n_sa)
+         elif GymctsNeuralNode.score_variate == "PUCT_v1":
+             return self.mean_value + c * p_sa * math.sqrt(2 * math.log(n_s) / n_sa)
+         elif GymctsNeuralNode.score_variate == "PUCT_v2":
+             return self.mean_value + c * p_sa * math.sqrt(n_s) / n_sa
+         elif GymctsNeuralNode.score_variate == "PUCT_v3":
+             return self.mean_value + c * (p_sa ** GymctsNeuralNode.PUCT_v3_mu) * math.sqrt(n_s / (1 + n_sa))
+         elif GymctsNeuralNode.score_variate == "PUCT_v4":
+             return self.mean_value + c * (p_sa / (1 + n_sa))
+         elif GymctsNeuralNode.score_variate == "PUCT_v5":
+             return self.mean_value + c * p_sa * (math.sqrt(n_s) + 1) / (n_sa + 1)
+         elif GymctsNeuralNode.score_variate == "PUCT_v6":
+             return self.mean_value + c * p_sa * n_s / (1 + n_sa)
+         elif GymctsNeuralNode.score_variate == "PUCT_v7":
+             epsilon = 1e-8
+             return self.mean_value + c * p_sa * (math.sqrt(n_s) + epsilon) / (n_sa + 1)
+         elif GymctsNeuralNode.score_variate == "PUCT_v8":
+             return self.mean_value + c * p_sa * math.sqrt((math.log(n_s) + 1) / (1 + n_sa))
+         elif GymctsNeuralNode.score_variate == "PUCT_v9":
+             return self.mean_value + c * p_sa * math.sqrt(n_s / (1 + n_sa))
+         elif GymctsNeuralNode.score_variate == "PUCT_v10":
+             return self.mean_value + c * p_sa * math.sqrt(math.log(n_s) / (1 + n_sa))
+         elif GymctsNeuralNode.score_variate in ("MuZero_v0", "MuZero_v1"):
+             c1 = GymctsNeuralNode.MuZero_c1
+             c2 = GymctsNeuralNode.MuZero_c2
+             return self.mean_value + c * p_sa * math.sqrt(n_s) / (1 + n_sa) * (c1 + math.log((n_s + c2 + 1) / c2))
+
+         # fallback: vanilla UCB1 exploration term weighted by the prior
+         exploration_term = p_sa * c * math.sqrt(math.log(n_s) / n_sa) if n_sa > 0 else float("inf")
+         return self.mean_value + exploration_term
+
+     def get_best_action(self) -> int:
+         """
+         Returns the best action of the node, i.e. the action whose child has the highest best-seen value.
+
+         :return: the best action of the node.
+         """
+         return max(self.children.values(), key=lambda child: child.max_value).action
+
+     def __str__(self, colored=False, action_space_n=None) -> str:
+         """
+         Returns a string representation of the node. The string representation is used for visualisation purposes,
+         for example in the mcts tree visualisation functionality.
+
+         :param colored: true if the string representation should be colored, false otherwise. (true is used by the mcts tree visualisation)
+         :param action_space_n: the number of actions in the action space. This is used for coloring the action in the string representation.
+         :return: a potentially colored string representation of the node.
+         """
+         if not colored:
+             if not self.is_root():
+                 return f"(a={self.action}, N={self.visit_count}, Q_v={self.mean_value:.2f}, best={self.max_value:.2f}, ubc={self.tree_policy_score():.2f})"
+             else:
+                 return f"(N={self.visit_count}, Q_v={self.mean_value:.2f}, best={self.max_value:.2f}) [root]"
+
+         import gymcts.colorful_console_utils as ccu
+
+         if self.is_root():
+             return f"({ccu.CYELLOW}N{ccu.CEND}={self.visit_count}, {ccu.CYELLOW}Q_v{ccu.CEND}={self.mean_value:.2f}, {ccu.CYELLOW}best{ccu.CEND}={self.max_value:.2f})"
+
+         if action_space_n is None:
+             raise ValueError("action_space_n must be provided if colored is True")
+
+         p = ccu.CYELLOW
+         e = ccu.CEND
+         v = ccu.CCYAN
+
+         def colorful_value(value: float | int | None) -> str:
+             if value is None:
+                 return f"{ccu.CGREY}None{e}"
+             color = ccu.CCYAN
+             if value == 0:
+                 color = ccu.CRED
+             if value == float("inf"):
+                 color = ccu.CGREY
+             if value == -float("inf"):
+                 color = ccu.CGREY
+
+             if isinstance(value, float):
+                 return f"{color}{value:.2f}{e}"
+
+             if isinstance(value, int):
+                 return f"{color}{value}{e}"
+
+         root_node = self.get_root()
+         mean_val = f"{self.mean_value:.2f}"
+
+         return ((f"("
+                  f"{p}a{e}={ccu.wrap_evenly_spaced_color(s=self.action, n_of_item=self.action, n_classes=action_space_n)}, "
+                  f"{p}N{e}={colorful_value(self.visit_count)}, "
+                  f"{p}Q_v{e}={ccu.wrap_with_color_scale(s=mean_val, value=self.mean_value, min_val=root_node.min_value, max_val=root_node.max_value)}, "
+                  f"{p}best{e}={colorful_value(self.max_value)}") +
+                 (f", {p}{GymctsNeuralNode.score_variate}{e}={colorful_value(self.tree_policy_score())})" if not self.is_root() else ")"))
+
+ class GymctsNeuralAgent(GymctsAgent):
+
+     def __init__(self,
+                  env: GymctsABC,
+                  *args,
+                  model_kwargs=None,
+                  score_variate: Literal[
+                      "PUCT_v0", "PUCT_v1", "PUCT_v2", "PUCT_v3", "PUCT_v4", "PUCT_v5",
+                      "PUCT_v6", "PUCT_v7", "PUCT_v8", "PUCT_v9", "PUCT_v10",
+                      "MuZero_v0", "MuZero_v1",
+                  ] = "PUCT_v0",
+                  **kwargs
+                  ):
+
+         # init super class
+         super().__init__(
+             env=env,
+             *args,
+             **kwargs
+         )
+         if score_variate not in [
+             "PUCT_v0", "PUCT_v1", "PUCT_v2",
+             "PUCT_v3", "PUCT_v4", "PUCT_v5",
+             "PUCT_v6", "PUCT_v7", "PUCT_v8",
+             "PUCT_v9", "PUCT_v10",
+             "MuZero_v0", "MuZero_v1"
+         ]:
+             raise ValueError(f"Invalid score_variate: {score_variate}. Must be one of: "
+                              f"PUCT_v0, PUCT_v1, PUCT_v2, PUCT_v3, PUCT_v4, PUCT_v5, "
+                              f"PUCT_v6, PUCT_v7, PUCT_v8, PUCT_v9, PUCT_v10, MuZero_v0, MuZero_v1")
+         GymctsNeuralNode.score_variate = score_variate
+
+         if model_kwargs is None:
+             model_kwargs = {}
+         obs, info = env.reset()
+
+         self.search_root_node = GymctsNeuralNode(
+             action=None,
+             parent=None,
+             env_reference=env,
+             observation=obs,
+             prior_selection_score=1.0,
+         )
+
+         def mask_fn(env: gym.Env) -> np.ndarray:
+             mask = env.action_masks()
+             if mask is None:
+                 mask = np.ones(env.action_space.n, dtype=np.float32)
+             return mask
+
+         env = ActionMasker(env, action_mask_fn=mask_fn)
+
+         model_kwargs = {
+             "policy": MaskableActorCriticPolicy,
+             "env": env,
+             "verbose": 1,
+         } | model_kwargs
+
+         self._model = sb3_contrib.MaskablePPO(**model_kwargs)
+
+     def learn(self, total_timesteps: int, **kwargs) -> None:
+         """Learn from the environment using the MaskablePPO model."""
+         self._model.learn(total_timesteps=total_timesteps, **kwargs)
+
+     def expand_node(self, node: GymctsNeuralNode) -> None:
+         log.debug(f"expanding node: {node}")
+         # EXPANSION STRATEGY: expand all children with a non-zero prior probability
+
+         child_dict = {}
+
+         # reconstruct the state of the leaf node
+         self._load_state(node)
+
+         obs_tensor, vectorized_env = self._model.policy.obs_to_tensor(np.array([node._obs]))
+         action_masks = np.array([self.env.action_masks()])
+         distribution = self._model.policy.get_distribution(obs=obs_tensor, action_masks=action_masks)
+         unwrapped_distribution = distribution.distribution.probs[0]
+
+         for action, prob in enumerate(unwrapped_distribution):
+             self._load_state(node)
+
+             log.debug(f"Probability for action {action}: {prob}")
+
+             if prob == 0.0:
+                 continue
+
+             assert action in node.valid_actions, f"Action {action} is not in valid actions: {node.valid_actions}"
+
+             obs, reward, terminal, truncated, _ = self.env.step(action)
+             child_dict[action] = GymctsNeuralNode(
+                 action=action,
+                 parent=node,
+                 observation=copy.deepcopy(obs),
+                 env_reference=self.env,
+                 prior_selection_score=float(prob)
+             )
+
+         node.children = child_dict
+
412
+ log.setLevel(20)
413
+
414
+ env_kwargs = {
415
+ "jps_instance": ft06,
416
+ "default_visualisations": ["gantt_console", "graph_console"],
417
+ "reward_function_parameters": {
418
+ "scaling_divisor": ft06_makespan
419
+ },
420
+ "reward_function": "nasuta",
421
+ }
422
+
423
+
424
+
425
+ env = DisjunctiveGraphJspEnv(**env_kwargs)
426
+ env.reset()
427
+
428
+ env = GraphJspNeuralGYMCTSWrapper(env)
429
+
430
+ import torch
431
+ model_kwargs = {
432
+ "gamma": 0.99013,
433
+ "gae_lambda": 0.9,
434
+ "normalize_advantage": True,
435
+ "n_epochs": 28,
436
+ "n_steps": 432,
437
+ "max_grad_norm": 0.5,
438
+ "learning_rate": 6e-4,
439
+ "policy_kwargs": {
440
+ "net_arch": {
441
+ "pi": [90, 90],
442
+ "vf": [90, 90],
443
+ },
444
+ "ortho_init": True,
445
+ "activation_fn": torch.nn.ELU,
446
+ "optimizer_kwargs": {
447
+ "eps": 1e-7
448
+ }
449
+ }
450
+ }
451
+
452
+ agent = GymctsNeuralAgent(
453
+ env=env,
454
+ render_tree_after_step=True,
455
+ render_tree_max_depth=3,
456
+ exclude_unvisited_nodes_from_render=False,
457
+ number_of_simulations_per_step=15,
458
+ # clear_mcts_tree_after_step = False,
459
+ model_kwargs=model_kwargs
460
+ )
461
+
462
+ agent.learn(total_timesteps=10_000)
463
+
464
+
465
+ agent.solve()
466
+
467
+ actions = agent.solve(render_tree_after_step=True)
468
+ for a in actions:
469
+ obs, rew, term, trun, info = env.step(a)
470
+
471
+ env.render()
472
+ makespan = env.unwrapped.get_makespan()
473
+ print(f"makespan: {makespan}")
474
+
475
+
476
+
477
+
478
+
479
+
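For illustration (not part of the released diff): the PUCT_v0 tree-policy score above can be checked by hand; the counts and values below are made-up illustration numbers.

```python
import math

c, p_sa = 0.707, 0.4   # exploration constant and prior P(s,a); illustrative values
n_s, n_sa = 100, 9     # parent visits N(s) and child visits N(s,a)
mean_value = 1.3       # running mean value of the child node

# PUCT_v0: Q + c * P(s,a) * sqrt(N(s)) / (1 + N(s,a))
score = mean_value + c * p_sa * math.sqrt(n_s) / (1 + n_sa)
print(f"{score:.3f}")  # 1.3 + 0.707 * 0.4 * 10 / 10 ≈ 1.583
```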
@@ -2,7 +2,7 @@ import uuid
  import random
  import math
 
- from typing import TypeVar, Any, SupportsFloat, Callable, Generator
+ from typing import TypeVar, Any, SupportsFloat, Callable, Generator, Literal
 
  from gymcts.gymcts_env_abc import GymctsABC
 
@@ -16,6 +16,25 @@ class GymctsNode:
      best_action_weight: float = 0.05  # weight for the best action
      ubc_c = 0.707  # exploration coefficient
 
+     """
+     UCT (Upper Confidence Bound applied to Trees) exploration terms:
+
+     UCT_v0:
+         c * √( 2 * ln(N(s)) / N(s,a) )
+
+     UCT_v1:
+         c * √( ln(N(s)) / (1 + N(s,a)) )
+
+     UCT_v2:
+         c * ( √(N(s)) / (1 + N(s,a)) )
+
+     Where:
+         N(s) = number of times state s has been visited
+         N(s,a) = number of times action a was taken from state s
+         c = exploration constant
+     """
+     score_variate: Literal["UCT_v0", "UCT_v1", "UCT_v2"] = "UCT_v0"
+
 
 
  # attributes
@@ -42,7 +61,7 @@ class GymctsNode:
          if not colored:
 
              if not self.is_root():
-                 return f"(a={self.action}, N={self.visit_count}, Q_v={self.mean_value:.2f}, best={self.max_value:.2f}, ubc={self.ucb_score():.2f})"
+                 return f"(a={self.action}, N={self.visit_count}, Q_v={self.mean_value:.2f}, best={self.max_value:.2f}, ubc={self.tree_policy_score():.2f})"
              else:
                  return f"(N={self.visit_count}, Q_v={self.mean_value:.2f}, best={self.max_value:.2f}) [root]"
 
@@ -83,7 +102,7 @@ class GymctsNode:
                   f"{p}N{e}={colorful_value(self.visit_count)}, "
                   f"{p}Q_v{e}={ccu.wrap_with_color_scale(s=mean_val, value=self.mean_value, min_val=root_node.min_value, max_val=root_node.max_value)}, "
                   f"{p}best{e}={colorful_value(self.max_value)}") +
-                 (f", {p}ubc{e}={colorful_value(self.ucb_score())})" if not self.is_root() else ")"))
+                 (f", {p}ubc{e}={colorful_value(self.tree_policy_score())})" if not self.is_root() else ")"))
 
      def traverse_nodes(self) -> Generator[TGymctsNode, None, None]:
          """
@@ -192,6 +211,12 @@ class GymctsNode:
          if self.parent:
              self.parent.reset()
 
+     def remove_parent(self) -> None:
+         # drop the back-reference to the parent so that the rest of the old
+         # search tree can be garbage-collected
+         self.parent = None
+
      def is_root(self) -> bool:
          """
          Returns true if the node is a root node. A root node is a node that has no parent.
@@ -252,9 +277,39 @@ class GymctsNode:
          """
          return self.max_value
 
-     def ucb_score(self):
+     def tree_policy_score(self):
          """
          The score for an action that would transition between the parent and child.
+         For vanilla MCTS, this is a UCB1-style score. Depending on GymctsNode.score_variate, one of the
+         following UCT (Upper Confidence Bound applied to Trees) exploration terms is added to the mean
+         value of the node:
+
+         UCT_v0:
+             c * √( 2 * ln(N(s)) / N(s,a) )
+
+         UCT_v1:
+             c * √( ln(N(s)) / (1 + N(s,a)) )
+
+         UCT_v2:
+             c * ( √(N(s)) / (1 + N(s,a)) )
+
+         Where:
+             N(s) = number of times state s has been visited (the parent's visit_count)
+             N(s,a) = number of times action a was taken from state s (the node's visit_count)
+             c = exploration constant controlling the exploration-exploitation trade-off (GymctsNode.ubc_c)
+
+         For UCT_v0, the score of an unvisited node is set to infinity.
+
          prior_score = child.prior * math.sqrt(parent.visit_count) / (child.visit_count + 1)
 
          if child.visit_count > 0:
@@ -269,8 +324,20 @@ class GymctsNode:
          """
          if self.is_root():
              raise ValueError("ucb_score can only be called on non-root nodes")
-         # c = 0.707 # todo: make it an attribute?
-         c = GymctsNode.ubc_c
-         if self.visit_count == 0:
-             return float("inf")
-         return self.mean_value + c * math.sqrt(math.log(self.parent.visit_count) / (self.visit_count))
+         c = GymctsNode.ubc_c  # default is 0.707
+
+         if GymctsNode.score_variate == "UCT_v0":
+             if self.visit_count == 0:
+                 return float("inf")
+             return self.mean_value + c * math.sqrt(2 * math.log(self.parent.visit_count) / self.visit_count)
+
+         if GymctsNode.score_variate == "UCT_v1":
+             return self.mean_value + c * math.sqrt(math.log(self.parent.visit_count) / (1 + self.visit_count))
+
+         if GymctsNode.score_variate == "UCT_v2":
+             return self.mean_value + c * math.sqrt(self.parent.visit_count) / (1 + self.visit_count)
+
+         raise ValueError(f"unknown score variate: {GymctsNode.score_variate}")
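For illustration (not part of the released diff): the three UCT variants implemented above can be compared numerically; the visit counts are made-up illustration numbers.

```python
import math

c = 0.707              # GymctsNode.ubc_c default
n_s, n_sa = 50, 4      # parent visits N(s), child visits N(s,a); illustrative values
mean_value = 0.8

uct_v0 = mean_value + c * math.sqrt(2 * math.log(n_s) / n_sa)    # ≈ 1.789
uct_v1 = mean_value + c * math.sqrt(math.log(n_s) / (1 + n_sa))  # ≈ 1.425
uct_v2 = mean_value + c * math.sqrt(n_s) / (1 + n_sa)            # ≈ 1.800
print(uct_v0, uct_v1, uct_v2)
```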
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: gymcts
- Version: 1.2.1
+ Version: 1.3.0
  Summary: A minimalistic implementation of the Monte Carlo Tree Search algorithm for planning problems formulated as gymnasium reinforcement learning environments.
  Author: Alexander Nasuta
  Author-email: Alexander Nasuta <alexander.nasuta@wzl-iqs.rwth-aachen.de>
@@ -70,11 +70,18 @@ Requires-Dist: jupyter; extra == "dev"
  Requires-Dist: typing_extensions>=4.12.0; extra == "dev"
  Dynamic: license-file
 
- # Graph Matrix Job Shop Env
+ [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.15283390.svg)](https://doi.org/10.5281/zenodo.15283390)
+ [![Python Badge](https://img.shields.io/badge/Python-3776AB?logo=python&logoColor=fff&style=flat)](https://www.python.org/downloads/)
+ [![PyPI version](https://img.shields.io/pypi/v/gymcts)](https://pypi.org/project/gymcts/)
+ [![License](https://img.shields.io/pypi/l/gymcts)](https://github.com/Alexander-Nasuta/gymcts/blob/master/LICENSE)
+ [![Documentation Status](https://readthedocs.org/projects/gymcts/badge/?version=latest)](https://gymcts.readthedocs.io/en/latest/?badge=latest)
+
+ # GYMCTS
 
  A Monte Carlo Tree Search Implementation for Gymnasium-style Environments.
 
  - Github: [GYMCTS on Github](https://github.com/Alexander-Nasuta/gymcts)
+ - GitLab: [GYMCTS on GitLab](https://git-ce.rwth-aachen.de/alexander.nasuta/gymcts)
  - Pypi: [GYMCTS on PyPi](https://pypi.org/project/gymcts/)
  - Documentation: [GYMCTS Docs](https://gymcts.readthedocs.io/en/latest/)
 
@@ -579,9 +586,6 @@ This project uses `pytest` for testing. To run the tests, run the following comm
  ```shell
  pytest
  ```
- Here is a screenshot of what the output might look like:
-
- ![](https://github.com/Alexander-Nasuta/GraphMatrixJobShopEnv/raw/master/resources/pytest-screenshot.png)
 
  For testing with `tox` run the following command:
 
@@ -11,6 +11,7 @@ src/gymcts/gymcts_agent.py
  src/gymcts/gymcts_deepcopy_wrapper.py
  src/gymcts/gymcts_distributed_agent.py
  src/gymcts/gymcts_env_abc.py
+ src/gymcts/gymcts_neural_agent.py
  src/gymcts/gymcts_node.py
  src/gymcts/gymcts_tree_plotter.py
  src/gymcts/logger.py