pyRDDLGym-jax 2.1.tar.gz → 2.3.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (55)
  1. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/PKG-INFO +25 -22
  2. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/README.md +24 -21
  3. pyrddlgym_jax-2.3/pyRDDLGym_jax/__init__.py +1 -0
  4. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/core/compiler.py +14 -8
  5. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/core/logic.py +118 -55
  6. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/core/planner.py +159 -76
  7. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax.egg-info/PKG-INFO +25 -22
  8. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/setup.py +1 -1
  9. pyrddlgym_jax-2.1/pyRDDLGym_jax/__init__.py +0 -1
  10. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/LICENSE +0 -0
  11. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/core/__init__.py +0 -0
  12. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/core/assets/__init__.py +0 -0
  13. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/core/assets/favicon.ico +0 -0
  14. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/core/simulator.py +0 -0
  15. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/core/tuning.py +0 -0
  16. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/core/visualization.py +0 -0
  17. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/entry_point.py +0 -0
  18. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/examples/__init__.py +0 -0
  19. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/examples/configs/Cartpole_Continuous_gym_drp.cfg +0 -0
  20. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/examples/configs/Cartpole_Continuous_gym_replan.cfg +0 -0
  21. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/examples/configs/Cartpole_Continuous_gym_slp.cfg +0 -0
  22. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/examples/configs/HVAC_ippc2023_drp.cfg +0 -0
  23. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/examples/configs/HVAC_ippc2023_slp.cfg +0 -0
  24. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/examples/configs/MountainCar_Continuous_gym_slp.cfg +0 -0
  25. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/examples/configs/MountainCar_ippc2023_slp.cfg +0 -0
  26. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/examples/configs/PowerGen_Continuous_drp.cfg +0 -0
  27. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/examples/configs/PowerGen_Continuous_replan.cfg +0 -0
  28. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/examples/configs/PowerGen_Continuous_slp.cfg +0 -0
  29. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/examples/configs/Quadcopter_drp.cfg +0 -0
  30. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/examples/configs/Quadcopter_slp.cfg +0 -0
  31. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/examples/configs/Reservoir_Continuous_drp.cfg +0 -0
  32. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/examples/configs/Reservoir_Continuous_replan.cfg +0 -0
  33. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/examples/configs/Reservoir_Continuous_slp.cfg +0 -0
  34. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/examples/configs/UAV_Continuous_slp.cfg +0 -0
  35. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/examples/configs/Wildfire_MDP_ippc2014_drp.cfg +0 -0
  36. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/examples/configs/Wildfire_MDP_ippc2014_replan.cfg +0 -0
  37. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/examples/configs/Wildfire_MDP_ippc2014_slp.cfg +0 -0
  38. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/examples/configs/__init__.py +0 -0
  39. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/examples/configs/default_drp.cfg +0 -0
  40. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/examples/configs/default_replan.cfg +0 -0
  41. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/examples/configs/default_slp.cfg +0 -0
  42. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/examples/configs/tuning_drp.cfg +0 -0
  43. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/examples/configs/tuning_replan.cfg +0 -0
  44. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/examples/configs/tuning_slp.cfg +0 -0
  45. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/examples/run_gradient.py +0 -0
  46. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/examples/run_gym.py +0 -0
  47. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/examples/run_plan.py +0 -0
  48. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/examples/run_scipy.py +0 -0
  49. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax/examples/run_tune.py +0 -0
  50. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax.egg-info/SOURCES.txt +0 -0
  51. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax.egg-info/dependency_links.txt +0 -0
  52. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax.egg-info/entry_points.txt +0 -0
  53. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax.egg-info/requires.txt +0 -0
  54. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/pyRDDLGym_jax.egg-info/top_level.txt +0 -0
  55. {pyrddlgym_jax-2.1 → pyrddlgym_jax-2.3}/setup.cfg +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.2
  Name: pyRDDLGym-jax
- Version: 2.1
+ Version: 2.3
  Summary: pyRDDLGym-jax: automatic differentiation for solving sequential planning problems in JAX.
  Home-page: https://github.com/pyrddlgym-project/pyRDDLGym-jax
  Author: Michael Gimelfarb, Ayal Taitler, Scott Sanner
@@ -58,8 +58,11 @@ Dynamic: summary

  Purpose:

- 1. automatic translation of any RDDL description file into a differentiable simulator in JAX
- 2. flexible policy class representations, automatic model relaxations for working in discrete and hybrid domains, and Bayesian hyper-parameter tuning.
+ 1. automatic translation of RDDL description files into differentiable JAX simulators
+ 2. implementation of (highly configurable) operator relaxations for working in discrete and hybrid domains
+ 3. flexible policy representations and automated Bayesian hyper-parameter tuning
+ 4. interactive dashboard for dynamic visualization and debugging
+ 5. hybridization with parameter-exploring policy gradients.

  Some demos of solved problems by JaxPlan:

@@ -235,8 +238,23 @@ More documentation about this and other new features will be coming soon.

  ## Tuning the Planner

- It is easy to tune the planner's hyper-parameters efficiently and automatically using Bayesian optimization.
- To do this, first create a config file template with patterns replacing concrete parameter values that you want to tune, e.g.:
+ A basic run script is provided to run automatic Bayesian hyper-parameter tuning for the most sensitive parameters of JaxPlan:
+
+ ```shell
+ jaxplan tune <domain> <instance> <method> <trials> <iters> <workers> <dashboard>
+ ```
+
+ where:
+ - ``domain`` is the domain identifier as specified in rddlrepository
+ - ``instance`` is the instance identifier
+ - ``method`` is the planning method to use (i.e. drp, slp, replan)
+ - ``trials`` is the (optional) number of trials/episodes to average in evaluating each hyper-parameter setting
+ - ``iters`` is the (optional) maximum number of iterations/evaluations of Bayesian optimization to perform
+ - ``workers`` is the (optional) number of parallel evaluations to be done at each iteration, i.e. the total evaluations = ``iters * workers``
+ - ``dashboard`` is whether the optimizations are tracked in the dashboard application.
+
+ It is easy to tune a custom range of the planner's hyper-parameters efficiently.
+ First create a config file template with patterns replacing concrete parameter values that you want to tune, e.g.:

  ```ini
  [Model]
@@ -260,7 +278,7 @@ train_on_reset=True

  would allow tuning the sharpness of model relaxations and the learning rate of the optimizer.

- Next, you must link the patterns in the config with concrete hyper-parameter ranges the tuner will understand:
+ Next, you must link the patterns in the config with concrete hyper-parameter ranges the tuner will understand, and run the optimizer:

  ```python
  import pyRDDLGym
@@ -292,22 +310,7 @@ tuning = JaxParameterTuning(env=env,
  gp_iters=iters)
  tuning.tune(key=42, log_file='path/to/log.csv')
  ```
-
- A basic run script is provided to run the automatic hyper-parameter tuning for the most sensitive parameters of JaxPlan:
-
- ```shell
- jaxplan tune <domain> <instance> <method> <trials> <iters> <workers> <dashboard>
- ```
-
- where:
- - ``domain`` is the domain identifier as specified in rddlrepository
- - ``instance`` is the instance identifier
- - ``method`` is the planning method to use (i.e. drp, slp, replan)
- - ``trials`` is the (optional) number of trials/episodes to average in evaluating each hyper-parameter setting
- - ``iters`` is the (optional) maximum number of iterations/evaluations of Bayesian optimization to perform
- - ``workers`` is the (optional) number of parallel evaluations to be done at each iteration, i.e. the total evaluations = ``iters * workers``
- - ``dashboard`` is whether the optimizations are tracked in the dashboard application.
-
+

  ## Simulation

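Editorial note: the config-template mechanism described in the tuning section above is simple enough to picture outside the library. The sketch below is an illustration only — the `TUNABLE_WEIGHT`/`TUNABLE_LR` pattern names, the range table, and the `instantiate` helper are hypothetical and not the pyRDDLGym-jax API — showing how a template with placeholder patterns can be filled in from one candidate point proposed by a Bayesian optimizer:

```python
import configparser
import io

# hypothetical config template: patterns stand in for concrete values to tune
TEMPLATE = """
[Model]
comparison_kwargs={'weight': TUNABLE_WEIGHT}

[Optimizer]
learning_rate=TUNABLE_LR
"""

# hypothetical search ranges; a Bayesian optimizer would propose points inside them
RANGES = {'TUNABLE_WEIGHT': (1.0, 1000.0), 'TUNABLE_LR': (1e-5, 1.0)}

def instantiate(template: str, point: dict) -> configparser.ConfigParser:
    """Substitute one sampled hyper-parameter point into the template."""
    filled = template
    for pattern, value in point.items():
        filled = filled.replace(pattern, repr(value))
    config = configparser.ConfigParser()
    config.read_file(io.StringIO(filled))
    return config

# one candidate configuration, e.g. as proposed by the tuner
config = instantiate(TEMPLATE, {'TUNABLE_WEIGHT': 10.0, 'TUNABLE_LR': 0.01})
print(dict(config['Optimizer']))   # {'learning_rate': '0.01'}
```

Each instantiated config would then be scored by running the planner, with the score fed back to the optimizer to propose the next point.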
@@ -12,8 +12,11 @@

  Purpose:

- 1. automatic translation of any RDDL description file into a differentiable simulator in JAX
- 2. flexible policy class representations, automatic model relaxations for working in discrete and hybrid domains, and Bayesian hyper-parameter tuning.
+ 1. automatic translation of RDDL description files into differentiable JAX simulators
+ 2. implementation of (highly configurable) operator relaxations for working in discrete and hybrid domains
+ 3. flexible policy representations and automated Bayesian hyper-parameter tuning
+ 4. interactive dashboard for dynamic visualization and debugging
+ 5. hybridization with parameter-exploring policy gradients.

  Some demos of solved problems by JaxPlan:

@@ -189,8 +192,23 @@ More documentation about this and other new features will be coming soon.

  ## Tuning the Planner

- It is easy to tune the planner's hyper-parameters efficiently and automatically using Bayesian optimization.
- To do this, first create a config file template with patterns replacing concrete parameter values that you want to tune, e.g.:
+ A basic run script is provided to run automatic Bayesian hyper-parameter tuning for the most sensitive parameters of JaxPlan:
+
+ ```shell
+ jaxplan tune <domain> <instance> <method> <trials> <iters> <workers> <dashboard>
+ ```
+
+ where:
+ - ``domain`` is the domain identifier as specified in rddlrepository
+ - ``instance`` is the instance identifier
+ - ``method`` is the planning method to use (i.e. drp, slp, replan)
+ - ``trials`` is the (optional) number of trials/episodes to average in evaluating each hyper-parameter setting
+ - ``iters`` is the (optional) maximum number of iterations/evaluations of Bayesian optimization to perform
+ - ``workers`` is the (optional) number of parallel evaluations to be done at each iteration, i.e. the total evaluations = ``iters * workers``
+ - ``dashboard`` is whether the optimizations are tracked in the dashboard application.
+
+ It is easy to tune a custom range of the planner's hyper-parameters efficiently.
+ First create a config file template with patterns replacing concrete parameter values that you want to tune, e.g.:

  ```ini
  [Model]
@@ -214,7 +232,7 @@ train_on_reset=True

  would allow tuning the sharpness of model relaxations and the learning rate of the optimizer.

- Next, you must link the patterns in the config with concrete hyper-parameter ranges the tuner will understand:
+ Next, you must link the patterns in the config with concrete hyper-parameter ranges the tuner will understand, and run the optimizer:

  ```python
  import pyRDDLGym
@@ -246,22 +264,7 @@ tuning = JaxParameterTuning(env=env,
  gp_iters=iters)
  tuning.tune(key=42, log_file='path/to/log.csv')
  ```
-
- A basic run script is provided to run the automatic hyper-parameter tuning for the most sensitive parameters of JaxPlan:
-
- ```shell
- jaxplan tune <domain> <instance> <method> <trials> <iters> <workers> <dashboard>
- ```
-
- where:
- - ``domain`` is the domain identifier as specified in rddlrepository
- - ``instance`` is the instance identifier
- - ``method`` is the planning method to use (i.e. drp, slp, replan)
- - ``trials`` is the (optional) number of trials/episodes to average in evaluating each hyper-parameter setting
- - ``iters`` is the (optional) maximum number of iterations/evaluations of Bayesian optimization to perform
- - ``workers`` is the (optional) number of parallel evaluations to be done at each iteration, i.e. the total evaluations = ``iters * workers``
- - ``dashboard`` is whether the optimizations are tracked in the dashboard application.
-
+

  ## Simulation

@@ -0,0 +1 @@
+ __version__ = '2.3'
@@ -1019,6 +1019,9 @@ class JaxRDDLCompiler:
  # UnnormDiscrete: complete (subclass uses Gumbel-softmax)
  # Discrete(p): complete (subclass uses Gumbel-softmax)
  # UnnormDiscrete(p): complete (subclass uses Gumbel-softmax)
+ # Poisson (subclass uses Gumbel-softmax or Poisson process trick)
+ # Binomial (subclass uses Gumbel-softmax or Normal approximation)
+ # NegativeBinomial (subclass uses Poisson-Gamma mixture)

  # distributions which seem to support backpropagation (need more testing):
  # Beta
@@ -1026,11 +1029,8 @@ class JaxRDDLCompiler:
  # Gamma
  # ChiSquare
  # Dirichlet
- # Poisson (subclass uses Gumbel-softmax or Poisson process trick)

  # distributions with incomplete reparameterization support (TODO):
- # Binomial
- # NegativeBinomial
  # Multinomial

  def _jax_random(self, expr, init_params):
@@ -1299,8 +1299,17 @@ class JaxRDDLCompiler:
  def _jax_negative_binomial(self, expr, init_params):
  ERR = JaxRDDLCompiler.ERROR_CODES['INVALID_PARAM_NEGATIVE_BINOMIAL']
  JaxRDDLCompiler._check_num_args(expr, 2)
-
  arg_trials, arg_prob = expr.args
+
+ # if trials and prob are both non-fluent, always use the exact operation
+ if self.compile_non_fluent_exact \
+ and not self.traced.cached_is_fluent(arg_trials) \
+ and not self.traced.cached_is_fluent(arg_prob):
+ negbin_op = self.EXACT_OPS['sampling']['NegativeBinomial']
+ else:
+ negbin_op = self.OPS['sampling']['NegativeBinomial']
+ jax_op = negbin_op(expr.id, init_params)
+
  jax_trials = self._jax(arg_trials, init_params)
  jax_prob = self._jax(arg_prob, init_params)

@@ -1308,11 +1317,8 @@ class JaxRDDLCompiler:
  def _jax_wrapped_distribution_negative_binomial(x, params, key):
  trials, key, err2, params = jax_trials(x, params, key)
  prob, key, err1, params = jax_prob(x, params, key)
- trials = jnp.asarray(trials, dtype=self.REAL)
- prob = jnp.asarray(prob, dtype=self.REAL)
  key, subkey = random.split(key)
- dist = tfp.distributions.NegativeBinomial(total_count=trials, probs=prob)
- sample = jnp.asarray(dist.sample(seed=subkey), dtype=self.INT)
+ sample, params = jax_op(subkey, trials, prob, params)
  out_of_bounds = jnp.logical_not(jnp.all(
  (prob >= 0) & (prob <= 1) & (trials > 0)))
  err = err1 | err2 | (out_of_bounds * ERR)
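Editorial note: the compiler hunk above routes `NegativeBinomial` through one of two operator tables depending on whether its arguments are fluent. A minimal sketch of that dispatch pattern follows (assumptions: `exact_ops`/`relaxed_ops` stand in for the compiler's `EXACT_OPS`/`OPS` tables, and the boolean flags for `self.traced.cached_is_fluent`; this is not the compiler itself):

```python
from typing import Callable, Dict

def pick_negbin_sampler(exact_ops: Dict, relaxed_ops: Dict,
                        compile_non_fluent_exact: bool,
                        trials_is_fluent: bool,
                        prob_is_fluent: bool) -> Callable:
    # arguments that never depend on fluents are constants of the planning
    # problem, so no gradient needs to flow through the sample: the exact
    # (non-differentiable) sampler is safe and adds no relaxation error
    if compile_non_fluent_exact and not trials_is_fluent and not prob_is_fluent:
        return exact_ops['sampling']['NegativeBinomial']
    # otherwise use the smooth relaxation so the planner can backpropagate
    return relaxed_ops['sampling']['NegativeBinomial']
```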
@@ -29,15 +29,27 @@
  #
  # ***********************************************************************

- from typing import Callable, Dict, Union
+ import traceback
+ from typing import Callable, Dict, Tuple, Union

  import jax
  import jax.numpy as jnp
  import jax.random as random
  import jax.scipy as scipy

+ from pyRDDLGym.core.debug.exception import raise_warning

- def enumerate_literals(shape, axis, dtype=jnp.int32):
+ # more robust approach: if the user does not have tfp, or it is broken, try to continue
+ try:
+ from tensorflow_probability.substrates import jax as tfp
+ except Exception:
+ raise_warning('Failed to import tensorflow-probability: '
+ 'compilation of some probability distributions will fail.', 'red')
+ traceback.print_exc()
+ tfp = None
+
+
+ def enumerate_literals(shape: Tuple[int, ...], axis: int, dtype: type=jnp.int32) -> jnp.ndarray:
  literals = jnp.arange(shape[axis], dtype=dtype)
  literals = literals[(...,) + (jnp.newaxis,) * (len(shape) - 1)]
  literals = jnp.moveaxis(literals, source=0, destination=axis)
@@ -74,7 +86,7 @@ class Comparison:
  class SigmoidComparison(Comparison):
  '''Comparison operations approximated using sigmoid functions.'''

- def __init__(self, weight: float=10.0):
+ def __init__(self, weight: float=10.0) -> None:
  self.weight = weight

  # https://arxiv.org/abs/2110.05651
@@ -140,7 +152,7 @@ class Rounding:
  class SoftRounding(Rounding):
  '''Rounding operations approximated using soft operations.'''

- def __init__(self, weight: float=10.0):
+ def __init__(self, weight: float=10.0) -> None:
  self.weight = weight

  # https://www.tensorflow.org/probability/api_docs/python/tfp/substrates/jax/bijectors/Softfloor
@@ -291,7 +303,7 @@ class YagerTNorm(TNorm):
  '''Yager t-norm given by the expression
  (x, y) -> max(0, 1 - ((1 - x)^p + (1 - y)^p)^(1/p)).'''

- def __init__(self, p=2.0):
+ def __init__(self, p: float=2.0) -> None:
  self.p = float(p)

  def norm(self, id, init_params):
@@ -339,6 +351,9 @@ class RandomSampling:
  def binomial(self, id, init_params, logic):
  raise NotImplementedError

+ def negative_binomial(self, id, init_params, logic):
+ raise NotImplementedError
+
  def geometric(self, id, init_params, logic):
  raise NotImplementedError

@@ -386,8 +401,7 @@ class SoftRandomSampling(RandomSampling):
  def _poisson_gumbel_softmax(self, id, init_params, logic):
  argmax_approx = logic.argmax(id, init_params)
  def _jax_wrapped_calc_poisson_gumbel_softmax(key, rate, params):
- ks = jnp.arange(0, self.poisson_bins)
- ks = ks[(jnp.newaxis,) * jnp.ndim(rate) + (...,)]
+ ks = jnp.arange(self.poisson_bins)[(jnp.newaxis,) * jnp.ndim(rate) + (...,)]
  rate = rate[..., jnp.newaxis]
  log_prob = ks * jnp.log(rate + logic.eps) - rate - scipy.special.gammaln(ks + 1)
  Gumbel01 = random.gumbel(key=key, shape=jnp.shape(log_prob), dtype=logic.REAL)
@@ -400,10 +414,7 @@ class SoftRandomSampling(RandomSampling):
  less_approx = logic.less(id, init_params)
  def _jax_wrapped_calc_poisson_exponential(key, rate, params):
  Exp1 = random.exponential(
- key=key,
- shape=(self.poisson_bins,) + jnp.shape(rate),
- dtype=logic.REAL
- )
+ key=key, shape=(self.poisson_bins,) + jnp.shape(rate), dtype=logic.REAL)
  delta_t = Exp1 / rate[jnp.newaxis, ...]
  times = jnp.cumsum(delta_t, axis=0)
  indicator, params = less_approx(times, 1.0, params)
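Editorial note: the Poisson-process trick above counts how many exponential inter-arrival times fit into the unit interval; the relaxation only replaces the hard `times < 1` indicator with a soft comparison. A minimal non-relaxed sketch (editorial illustration with an assumed truncation of 100 bins, not the library code):

```python
import jax
import jax.numpy as jnp
import jax.random as random

def poisson_via_counting_process(key, rate, bins=100):
    # inter-arrival times of a rate-`rate` Poisson process are Exponential(rate)
    exp1 = random.exponential(key, shape=(bins,) + jnp.shape(rate))
    delta_t = exp1 / rate
    times = jnp.cumsum(delta_t, axis=0)
    # the number of arrivals inside [0, 1] is Poisson(rate); the library's
    # relaxation replaces this hard indicator with a soft less-than
    return jnp.sum(times < 1.0, axis=0)

key = random.PRNGKey(0)
samples = jax.vmap(
    lambda k: poisson_via_counting_process(k, jnp.array(4.0)))(
    random.split(key, 10000))
print(samples.mean())   # ≈ 4.0 = E[Poisson(4)]
```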
@@ -411,72 +422,98 @@ class SoftRandomSampling(RandomSampling):
  return sample, params
  return _jax_wrapped_calc_poisson_exponential

+ # normal approximation to Poisson: Poisson(rate) -> Normal(rate, rate)
+ def _poisson_normal_approx(self, logic):
+ def _jax_wrapped_calc_poisson_normal_approx(key, rate, params):
+ normal = random.normal(key=key, shape=jnp.shape(rate), dtype=logic.REAL)
+ sample = rate + jnp.sqrt(rate) * normal
+ return sample, params
+ return _jax_wrapped_calc_poisson_normal_approx
+
  def poisson(self, id, init_params, logic):
- def _jax_wrapped_calc_poisson_exact(key, rate, params):
- sample = random.poisson(key=key, lam=rate, dtype=logic.INT)
- sample = jnp.asarray(sample, dtype=logic.REAL)
- return sample, params
-
  if self.poisson_exp_method:
  _jax_wrapped_calc_poisson_diff = self._poisson_exponential(
  id, init_params, logic)
  else:
  _jax_wrapped_calc_poisson_diff = self._poisson_gumbel_softmax(
  id, init_params, logic)
+ _jax_wrapped_calc_poisson_normal = self._poisson_normal_approx(logic)

+ # for small rate use the Poisson process or gumbel-softmax reparameterization
+ # for large rate use the normal approximation
  def _jax_wrapped_calc_poisson_approx(key, rate, params):
-
- # determine if error of truncation at rate is acceptable
  if self.poisson_bins > 0:
  cuml_prob = scipy.stats.poisson.cdf(self.poisson_bins, rate)
- approx_cond = jax.lax.stop_gradient(
- jnp.min(cuml_prob) > self.poisson_min_cdf)
+ small_rate = jax.lax.stop_gradient(cuml_prob >= self.poisson_min_cdf)
+ small_sample, params = _jax_wrapped_calc_poisson_diff(key, rate, params)
+ large_sample, params = _jax_wrapped_calc_poisson_normal(key, rate, params)
+ sample = jnp.where(small_rate, small_sample, large_sample)
+ return sample, params
  else:
- approx_cond = False
-
- # for acceptable truncation use the approximation, use exact otherwise
- return jax.lax.cond(
- approx_cond,
- _jax_wrapped_calc_poisson_diff,
- _jax_wrapped_calc_poisson_exact,
- key, rate, params
- )
+ return _jax_wrapped_calc_poisson_normal(key, rate, params)
  return _jax_wrapped_calc_poisson_approx

- def binomial(self, id, init_params, logic):
- def _jax_wrapped_calc_binomial_exact(key, trials, prob, params):
- trials = jnp.asarray(trials, dtype=logic.REAL)
- prob = jnp.asarray(prob, dtype=logic.REAL)
- sample = random.binomial(key=key, n=trials, p=prob, dtype=logic.REAL)
- return sample, params
+ # normal approximation to Binomial: Bin(n, p) -> Normal(np, np(1-p))
+ def _binomial_normal_approx(self, logic):
+ def _jax_wrapped_calc_binomial_normal_approx(key, trials, prob, params):
+ normal = random.normal(key=key, shape=jnp.shape(trials), dtype=logic.REAL)
+ mean = trials * prob
+ std = jnp.sqrt(trials * prob * (1.0 - prob))
+ sample = mean + std * normal
+ return sample, params
+ return _jax_wrapped_calc_binomial_normal_approx

- # Binomial(n, p) = sum_{i = 1 ... n} Bernoulli(p)
- bernoulli_approx = self.bernoulli(id, init_params, logic)
- def _jax_wrapped_calc_binomial_sum(key, trials, prob, params):
- prob_full = jnp.broadcast_to(
- prob[..., jnp.newaxis], shape=jnp.shape(prob) + (self.binomial_bins,))
- sample_bern, params = bernoulli_approx(key, prob_full, params)
- indices = jnp.arange(self.binomial_bins)[
- (jnp.newaxis,) * jnp.ndim(prob) + (...,)]
- mask = indices < trials[..., jnp.newaxis]
- sample = jnp.sum(sample_bern * mask, axis=-1)
- return sample, params
+ def _binomial_gumbel_softmax(self, id, init_params, logic):
+ argmax_approx = logic.argmax(id, init_params)
+ def _jax_wrapped_calc_binomial_gumbel_softmax(key, trials, prob, params):
+ ks = jnp.arange(self.binomial_bins)[(jnp.newaxis,) * jnp.ndim(trials) + (...,)]
+ trials = trials[..., jnp.newaxis]
+ prob = prob[..., jnp.newaxis]
+ in_support = ks <= trials
+ ks = jnp.minimum(ks, trials)
+ log_prob = ((scipy.special.gammaln(trials + 1) -
+ scipy.special.gammaln(ks + 1) -
+ scipy.special.gammaln(trials - ks + 1)) +
+ ks * jnp.log(prob + logic.eps) +
+ (trials - ks) * jnp.log1p(-prob + logic.eps))
+ log_prob = jnp.where(in_support, log_prob, jnp.log(logic.eps))
+ Gumbel01 = random.gumbel(key=key, shape=jnp.shape(log_prob), dtype=logic.REAL)
+ sample = Gumbel01 + log_prob
+ return argmax_approx(sample, axis=-1, params=params)
+ return _jax_wrapped_calc_binomial_gumbel_softmax

- # for trials not too large use the Bernoulli relaxation, use exact otherwise
+ def binomial(self, id, init_params, logic):
+ _jax_wrapped_calc_binomial_normal = self._binomial_normal_approx(logic)
+ _jax_wrapped_calc_binomial_gs = self._binomial_gumbel_softmax(id, init_params, logic)
+
+ # for small trials use the Gumbel-softmax relaxation
+ # for large trials use the normal approximation
  def _jax_wrapped_calc_binomial_approx(key, trials, prob, params):
- return jax.lax.cond(
- jax.lax.stop_gradient(jnp.max(trials) < self.binomial_bins),
- _jax_wrapped_calc_binomial_sum,
- _jax_wrapped_calc_binomial_exact,
- key, trials, prob, params
- )
+ small_trials = jax.lax.stop_gradient(trials < self.binomial_bins)
+ small_sample, params = _jax_wrapped_calc_binomial_gs(key, trials, prob, params)
+ large_sample, params = _jax_wrapped_calc_binomial_normal(key, trials, prob, params)
+ sample = jnp.where(small_trials, small_sample, large_sample)
+ return sample, params
  return _jax_wrapped_calc_binomial_approx

+ # https://en.wikipedia.org/wiki/Negative_binomial_distribution#Gamma%E2%80%93Poisson_mixture
+ def negative_binomial(self, id, init_params, logic):
+ poisson_approx = self.poisson(id, init_params, logic)
+ def _jax_wrapped_calc_negative_binomial_approx(key, trials, prob, params):
+ key, subkey = random.split(key)
+ trials = jnp.asarray(trials, dtype=logic.REAL)
+ Gamma = random.gamma(key=key, a=trials, dtype=logic.REAL)
+ scale = (1.0 - prob) / prob
+ poisson_rate = scale * Gamma
+ return poisson_approx(subkey, poisson_rate, params)
+ return _jax_wrapped_calc_negative_binomial_approx
+
  def geometric(self, id, init_params, logic):
  approx_floor = logic.floor(id, init_params)
  def _jax_wrapped_calc_geometric_approx(key, prob, params):
  U = random.uniform(key=key, shape=jnp.shape(prob), dtype=logic.REAL)
- floor, params = approx_floor(jnp.log1p(-U) / jnp.log1p(-prob), params)
+ floor, params = approx_floor(
+ jnp.log1p(-U) / jnp.log1p(-prob + logic.eps), params)
  sample = floor + 1
  return sample, params
  return _jax_wrapped_calc_geometric_approx
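Editorial note: the Gamma–Poisson mixture above uses the identity that if λ ~ Gamma(r) scaled by (1 − p)/p, then Poisson(λ) is NegativeBinomial(r, p) counting failures before r successes. A quick Monte Carlo check of the first two moments (editorial sketch, not library code):

```python
import jax.numpy as jnp
import jax.random as random

key = random.PRNGKey(42)
r, p = 5.0, 0.3                      # successes required and success probability
k1, k2 = random.split(key)

# NB(r, p) as a Gamma-Poisson mixture: rate ~ Gamma(r) * (1 - p) / p
rates = random.gamma(k1, a=r, shape=(200000,)) * (1.0 - p) / p
samples = random.poisson(k2, lam=rates)

# failures before r successes: mean r(1-p)/p, variance r(1-p)/p^2
print(samples.mean(), r * (1 - p) / p)          # both ≈ 11.67
print(samples.var(), r * (1 - p) / p ** 2)      # both ≈ 38.89
```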
@@ -532,6 +569,14 @@ class Determinization(RandomSampling):
  def binomial(self, id, init_params, logic):
  return self._jax_wrapped_calc_binomial_determinized

+ @staticmethod
+ def _jax_wrapped_calc_negative_binomial_determinized(key, trials, prob, params):
+ sample = trials * ((1.0 / prob) - 1.0)
+ return sample, params
+
+ def negative_binomial(self, id, init_params, logic):
+ return self._jax_wrapped_calc_negative_binomial_determinized
+
  @staticmethod
  def _jax_wrapped_calc_geometric_determinized(key, prob, params):
  sample = 1.0 / prob
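Editorial note: the determinized NegativeBinomial above returns the distribution's mean; under the failures-before-successes convention used in this file,

```latex
\mathbb{E}\left[\mathrm{NB}(r, p)\right] \;=\; \frac{r\,(1 - p)}{p} \;=\; r\left(\frac{1}{p} - 1\right),
```

which is exactly the `trials * ((1.0 / prob) - 1.0)` expression in the hunk. Likewise, the geometric determinization `1.0 / prob` is the mean of a Geometric(p) whose support starts at 1, consistent with the `floor + 1` sampler above.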
@@ -712,7 +757,8 @@ class Logic:
  'Discrete': self.discrete,
  'Poisson': self.poisson,
  'Geometric': self.geometric,
- 'Binomial': self.binomial
+ 'Binomial': self.binomial,
+ 'NegativeBinomial': self.negative_binomial
  }
  }

@@ -830,6 +876,9 @@ class Logic:
  def binomial(self, id, init_params):
  raise NotImplementedError

+ def negative_binomial(self, id, init_params):
+ raise NotImplementedError
+

  class ExactLogic(Logic):
  '''A class representing exact logic in JAX.'''
@@ -1005,6 +1054,17 @@ class ExactLogic(Logic):
  sample = jnp.asarray(sample, dtype=self.INT)
  return sample, params
  return _jax_wrapped_calc_binomial_exact
+
+ # note: tfp defines this as the number of successes before ``trials`` failures;
+ # here it is defined as the number of failures before ``trials`` successes
+ def negative_binomial(self, id, init_params):
+ def _jax_wrapped_calc_negative_binomial_exact(key, trials, prob, params):
+ trials = jnp.asarray(trials, dtype=self.REAL)
+ prob = jnp.asarray(prob, dtype=self.REAL)
+ dist = tfp.distributions.NegativeBinomial(total_count=trials, probs=1.0 - prob)
+ sample = jnp.asarray(dist.sample(seed=key), dtype=self.INT)
+ return sample, params
+ return _jax_wrapped_calc_negative_binomial_exact


  class FuzzyLogic(Logic):
@@ -1234,6 +1294,9 @@ class FuzzyLogic(Logic):

  def binomial(self, id, init_params):
  return self.sampling.binomial(id, init_params, self)
+
+ def negative_binomial(self, id, init_params):
+ return self.sampling.negative_binomial(id, init_params, self)


  # ===========================================================================
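Editorial note: several relaxations in this file (the Discrete, Poisson, and Binomial samplers) rest on the Gumbel-max trick: adding independent Gumbel(0, 1) noise to log-probabilities and taking an argmax yields an exact categorical sample, and swapping the hard argmax for the relaxed `logic.argmax` makes it differentiable. A minimal demonstration of the hard version (editorial sketch):

```python
import jax.numpy as jnp
import jax.random as random

key = random.PRNGKey(7)
log_prob = jnp.log(jnp.array([0.2, 0.5, 0.3]))

# argmax(log p + Gumbel noise) is distributed as Categorical(p)
gumbel = random.gumbel(key, shape=(100000,) + log_prob.shape)
samples = jnp.argmax(log_prob + gumbel, axis=-1)

# empirical frequencies recover the target probabilities
print(jnp.bincount(samples, length=3) / samples.shape[0])   # ≈ [0.2 0.5 0.3]
```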
@@ -47,7 +47,9 @@ import jax.random as random
  import numpy as np
  import optax
  import termcolor
- from tqdm import tqdm
+ from tqdm import tqdm, TqdmWarning
+ import warnings
+ warnings.filterwarnings("ignore", category=TqdmWarning)

  from pyRDDLGym.core.compiler.model import RDDLPlanningModel, RDDLLiftedModel
  from pyRDDLGym.core.debug.logger import Logger
@@ -1212,17 +1214,22 @@ class GaussianPGPE(PGPE):
  init_sigma: float=1.0,
  sigma_range: Tuple[float, float]=(1e-5, 1e5),
  scale_reward: bool=True,
+ min_reward_scale: float=1e-5,
  super_symmetric: bool=True,
  super_symmetric_accurate: bool=True,
  optimizer: Callable[..., optax.GradientTransformation]=optax.adam,
  optimizer_kwargs_mu: Optional[Kwargs]=None,
- optimizer_kwargs_sigma: Optional[Kwargs]=None) -> None:
+ optimizer_kwargs_sigma: Optional[Kwargs]=None,
+ start_entropy_coeff: float=1e-3,
+ end_entropy_coeff: float=1e-8,
+ max_kl_update: Optional[float]=None) -> None:
  '''Creates a new Gaussian PGPE planner.

  :param batch_size: how many policy parameters to sample per optimization step
  :param init_sigma: initial standard deviation of Gaussian
  :param sigma_range: bounds to constrain standard deviation
  :param scale_reward: whether to apply reward scaling as in the paper
+ :param min_reward_scale: minimum reward scaling to avoid underflow
  :param super_symmetric: whether to use super-symmetric sampling as in the paper
  :param super_symmetric_accurate: whether to use the accurate formula for super-
  symmetric sampling or the simplified but biased formula
@@ -1231,6 +1238,9 @@ class GaussianPGPE(PGPE):
  factory for the mean optimizer
  :param optimizer_kwargs_sigma: a dictionary of parameters to pass to the SGD
  factory for the standard deviation optimizer
+ :param start_entropy_coeff: starting entropy regularization coefficient for Gaussian
+ :param end_entropy_coeff: ending entropy regularization coefficient for Gaussian
+ :param max_kl_update: bound on kl-divergence between parameter updates
  '''
  super().__init__()

@@ -1238,8 +1248,13 @@ class GaussianPGPE(PGPE):
  self.init_sigma = init_sigma
  self.sigma_range = sigma_range
  self.scale_reward = scale_reward
+ self.min_reward_scale = min_reward_scale
  self.super_symmetric = super_symmetric
  self.super_symmetric_accurate = super_symmetric_accurate
+
+ # entropy regularization penalty is decayed exponentially between these values
+ self.start_entropy_coeff = start_entropy_coeff
+ self.end_entropy_coeff = end_entropy_coeff

  # set optimizers
  if optimizer_kwargs_mu is None:
@@ -1249,36 +1264,62 @@ class GaussianPGPE(PGPE):
  optimizer_kwargs_sigma = {'learning_rate': 0.1}
  self.optimizer_kwargs_sigma = optimizer_kwargs_sigma
  self.optimizer_name = optimizer
- mu_optimizer = optimizer(**optimizer_kwargs_mu)
- sigma_optimizer = optimizer(**optimizer_kwargs_sigma)
+ try:
+ mu_optimizer = optax.inject_hyperparams(optimizer)(**optimizer_kwargs_mu)
+ sigma_optimizer = optax.inject_hyperparams(optimizer)(**optimizer_kwargs_sigma)
+ except Exception as _:
+ raise_warning(
+ f'Failed to inject hyperparameters into optax optimizer for PGPE, '
+ 'rolling back to safer method: please note that kl-divergence '
+ 'constraints will be disabled.', 'red')
+ mu_optimizer = optimizer(**optimizer_kwargs_mu)
+ sigma_optimizer = optimizer(**optimizer_kwargs_sigma)
+ max_kl_update = None
  self.optimizers = (mu_optimizer, sigma_optimizer)
+ self.max_kl = max_kl_update

  def __str__(self) -> str:
  return (f'PGPE hyper-parameters:\n'
- f' method ={self.__class__.__name__}\n'
- f' batch_size ={self.batch_size}\n'
- f' init_sigma ={self.init_sigma}\n'
- f' sigma_range ={self.sigma_range}\n'
- f' scale_reward ={self.scale_reward}\n'
- f' super_symmetric={self.super_symmetric}\n'
- f' accurate ={self.super_symmetric_accurate}\n'
- f' optimizer ={self.optimizer_name}\n'
+ f' method ={self.__class__.__name__}\n'
+ f' batch_size ={self.batch_size}\n'
+ f' init_sigma ={self.init_sigma}\n'
+ f' sigma_range ={self.sigma_range}\n'
+ f' scale_reward ={self.scale_reward}\n'
+ f' min_reward_scale ={self.min_reward_scale}\n'
+ f' super_symmetric ={self.super_symmetric}\n'
+ f' accurate ={self.super_symmetric_accurate}\n'
+ f' optimizer ={self.optimizer_name}\n'
  f' optimizer_kwargs:\n'
  f' mu ={self.optimizer_kwargs_mu}\n'
  f' sigma={self.optimizer_kwargs_sigma}\n'
+ f' start_entropy_coeff={self.start_entropy_coeff}\n'
+ f' end_entropy_coeff ={self.end_entropy_coeff}\n'
+ f' max_kl_update ={self.max_kl}\n'
  )

  def compile(self, loss_fn: Callable, projection: Callable, real_dtype: Type) -> None:
- MIN_NORM = 1e-5
  sigma0 = self.init_sigma
  sigma_range = self.sigma_range
  scale_reward = self.scale_reward
+ min_reward_scale = self.min_reward_scale
  super_symmetric = self.super_symmetric
  super_symmetric_accurate = self.super_symmetric_accurate
  batch_size = self.batch_size
  optimizers = (mu_optimizer, sigma_optimizer) = self.optimizers
-
- # initializer
+ max_kl = self.max_kl
+
+ # entropy regularization penalty is decayed exponentially by elapsed budget
+ start_entropy_coeff = self.start_entropy_coeff
+ if start_entropy_coeff == 0:
+ entropy_coeff_decay = 0
+ else:
+ entropy_coeff_decay = (self.end_entropy_coeff / start_entropy_coeff) ** 0.01
+
+ # ***********************************************************************
+ # INITIALIZATION OF POLICY
+ #
+ # ***********************************************************************
+
  def _jax_wrapped_pgpe_init(key, policy_params):
  mu = policy_params
  sigma = jax.tree_map(lambda x: sigma0 * jnp.ones_like(x), mu)
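Editorial note: the `** 0.01` in the decay factor above is tied to `progress` being measured in percent (as the `progress_percent` variable later in this diff suggests). The coefficient applied at progress t is

```latex
\mathrm{coeff}(t) \;=\; c_{\text{start}} \cdot \mathrm{decay}^{\,t}
\;=\; c_{\text{start}} \left(\frac{c_{\text{end}}}{c_{\text{start}}}\right)^{t/100},
\qquad \mathrm{coeff}(0) = c_{\text{start}}, \quad \mathrm{coeff}(100) = c_{\text{end}},
```

so the entropy penalty starts at `start_entropy_coeff` and decays to exactly `end_entropy_coeff` when the training budget is exhausted.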
@@ -1289,7 +1330,11 @@ class GaussianPGPE(PGPE):

  self._initializer = jax.jit(_jax_wrapped_pgpe_init)

- # parameter sampling functions
+ # ***********************************************************************
+ # PARAMETER SAMPLING FUNCTIONS
+ #
+ # ***********************************************************************
+
  def _jax_wrapped_mu_noise(key, sigma):
  return sigma * random.normal(key, shape=jnp.shape(sigma), dtype=real_dtype)

@@ -1299,19 +1344,20 @@ class GaussianPGPE(PGPE):
  a = (sigma - jnp.abs(epsilon)) / sigma
  if super_symmetric_accurate:
  aa = jnp.abs(a)
+ aa3 = jnp.power(aa, 3)
  epsilon_star = jnp.sign(epsilon) * phi * jnp.where(
  a <= 0,
- jnp.exp(c1 * aa * (aa * aa - 1) / jnp.log(aa + 1e-10) + c2 * aa),
- jnp.exp(aa - c3 * aa * jnp.log(1.0 - jnp.power(aa, 3) + 1e-10))
+ jnp.exp(c1 * (aa3 - aa) / jnp.log(aa + 1e-10) + c2 * aa),
+ jnp.exp(aa - c3 * aa * jnp.log(1.0 - aa3 + 1e-10))
  )
  else:
  epsilon_star = jnp.sign(epsilon) * phi * jnp.exp(a)
  return epsilon_star

  def _jax_wrapped_sample_params(key, mu, sigma):
- keys = random.split(key, num=len(jax.tree_util.tree_leaves(mu)))
- keys_pytree = jax.tree_util.tree_unflatten(
- treedef=jax.tree_util.tree_structure(mu), leaves=keys)
+ treedef = jax.tree_util.tree_structure(sigma)
+ keys = random.split(key, num=treedef.num_leaves)
+ keys_pytree = jax.tree_util.tree_unflatten(treedef=treedef, leaves=keys)
  epsilon = jax.tree_map(_jax_wrapped_mu_noise, keys_pytree, sigma)
  p1 = jax.tree_map(jnp.add, mu, epsilon)
  p2 = jax.tree_map(jnp.subtract, mu, epsilon)
@@ -1321,14 +1367,18 @@ class GaussianPGPE(PGPE):
  p4 = jax.tree_map(jnp.subtract, mu, epsilon_star)
  else:
  epsilon_star, p3, p4 = epsilon, p1, p2
- return (p1, p2, p3, p4), (epsilon, epsilon_star)
+ return p1, p2, p3, p4, epsilon, epsilon_star

- # policy gradient update functions
+ # ***********************************************************************
+ # POLICY GRADIENT CALCULATION
+ #
+ # ***********************************************************************
+
  def _jax_wrapped_mu_grad(epsilon, epsilon_star, r1, r2, r3, r4, m):
  if super_symmetric:
  if scale_reward:
- scale1 = jnp.maximum(MIN_NORM, m - (r1 + r2) / 2)
- scale2 = jnp.maximum(MIN_NORM, m - (r3 + r4) / 2)
+ scale1 = jnp.maximum(min_reward_scale, m - (r1 + r2) / 2)
+ scale2 = jnp.maximum(min_reward_scale, m - (r3 + r4) / 2)
  else:
  scale1 = scale2 = 1.0
  r_mu1 = (r1 - r2) / (2 * scale1)
@@ -1336,37 +1386,37 @@
  r_mu2 = (r3 - r4) / (2 * scale2)
  grad = -(r_mu1 * epsilon + r_mu2 * epsilon_star)
  else:
  if scale_reward:
- scale = jnp.maximum(MIN_NORM, m - (r1 + r2) / 2)
+ scale = jnp.maximum(min_reward_scale, m - (r1 + r2) / 2)
  else:
  scale = 1.0
  r_mu = (r1 - r2) / (2 * scale)
  grad = -r_mu * epsilon
  return grad

- def _jax_wrapped_sigma_grad(epsilon, epsilon_star, sigma, r1, r2, r3, r4, m):
+ def _jax_wrapped_sigma_grad(epsilon, epsilon_star, sigma, r1, r2, r3, r4, m, ent):
  if super_symmetric:
  mask = r1 + r2 >= r3 + r4
  epsilon_tau = mask * epsilon + (1 - mask) * epsilon_star
- s = epsilon_tau * epsilon_tau / sigma - sigma
+ s = jnp.square(epsilon_tau) / sigma - sigma
  if scale_reward:
- scale = jnp.maximum(MIN_NORM, m - (r1 + r2 + r3 + r4) / 4)
+ scale = jnp.maximum(min_reward_scale, m - (r1 + r2 + r3 + r4) / 4)
  else:
  scale = 1.0
  r_sigma = ((r1 + r2) - (r3 + r4)) / (4 * scale)
  else:
- s = epsilon * epsilon / sigma - sigma
+ s = jnp.square(epsilon) / sigma - sigma
  if scale_reward:
- scale = jnp.maximum(MIN_NORM, jnp.abs(m))
+ scale = jnp.maximum(min_reward_scale, jnp.abs(m))
  else:
  scale = 1.0
  r_sigma = (r1 + r2) / (2 * scale)
- grad = -r_sigma * s
+ grad = -(r_sigma * s + ent / sigma)
  return grad

- def _jax_wrapped_pgpe_grad(key, mu, sigma, r_max,
+ def _jax_wrapped_pgpe_grad(key, mu, sigma, r_max, ent,
  policy_hyperparams, subs, model_params):
  key, subkey = random.split(key)
- (p1, p2, p3, p4), (epsilon, epsilon_star) = _jax_wrapped_sample_params(
+ p1, p2, p3, p4, epsilon, epsilon_star = _jax_wrapped_sample_params(
  key, mu, sigma)
  r1 = -loss_fn(subkey, p1, policy_hyperparams, subs, model_params)[0]
  r2 = -loss_fn(subkey, p2, policy_hyperparams, subs, model_params)[0]
@@ -1384,42 +1434,76 @@
  epsilon, epsilon_star
  )
  grad_sigma = jax.tree_map(
- partial(_jax_wrapped_sigma_grad, r1=r1, r2=r2, r3=r3, r4=r4, m=r_max),
+ partial(_jax_wrapped_sigma_grad,
+ r1=r1, r2=r2, r3=r3, r4=r4, m=r_max, ent=ent),
  epsilon, epsilon_star, sigma
  )
  return grad_mu, grad_sigma, r_max

- def _jax_wrapped_pgpe_grad_batched(key, pgpe_params, r_max,
+ def _jax_wrapped_pgpe_grad_batched(key, pgpe_params, r_max, ent,
  policy_hyperparams, subs, model_params):
  mu, sigma = pgpe_params
  if batch_size == 1:
  mu_grad, sigma_grad, new_r_max = _jax_wrapped_pgpe_grad(
- key, mu, sigma, r_max, policy_hyperparams, subs, model_params)
+ key, mu, sigma, r_max, ent, policy_hyperparams, subs, model_params)
  else:
  keys = random.split(key, num=batch_size)
  mu_grads, sigma_grads, r_maxs = jax.vmap(
  _jax_wrapped_pgpe_grad,
- in_axes=(0, None, None, None, None, None, None)
- )(keys, mu, sigma, r_max, policy_hyperparams, subs, model_params)
+ in_axes=(0, None, None, None, None, None, None, None)
+ )(keys, mu, sigma, r_max, ent, policy_hyperparams, subs, model_params)
  mu_grad, sigma_grad = jax.tree_map(
  partial(jnp.mean, axis=0), (mu_grads, sigma_grads))
  new_r_max = jnp.max(r_maxs)
  return mu_grad, sigma_grad, new_r_max
+
+ # ***********************************************************************
+ # PARAMETER UPDATE
+ #
+ # ***********************************************************************

- def _jax_wrapped_pgpe_update(key, pgpe_params, r_max,
+ def _jax_wrapped_pgpe_kl_term(mu, sigma, old_mu, old_sigma):
+ return 0.5 * jnp.sum(2 * jnp.log(sigma / old_sigma) +
+ jnp.square(old_sigma / sigma) +
+ jnp.square((mu - old_mu) / sigma) - 1)
+
+ def _jax_wrapped_pgpe_update(key, pgpe_params, r_max, progress,
  policy_hyperparams, subs, model_params,
  pgpe_opt_state):
+ # regular update
  mu, sigma = pgpe_params
  mu_state, sigma_state = pgpe_opt_state
+ ent = start_entropy_coeff * jnp.power(entropy_coeff_decay, progress)
  mu_grad, sigma_grad, new_r_max = _jax_wrapped_pgpe_grad_batched(
- key, pgpe_params, r_max, policy_hyperparams, subs, model_params)
+ key, pgpe_params, r_max, ent, policy_hyperparams, subs, model_params)
  mu_updates, new_mu_state = mu_optimizer.update(mu_grad, mu_state, params=mu)
  sigma_updates, new_sigma_state = sigma_optimizer.update(
  sigma_grad, sigma_state, params=sigma)
  new_mu = optax.apply_updates(mu, mu_updates)
- new_mu, converged = projection(new_mu, policy_hyperparams)
  new_sigma = optax.apply_updates(sigma, sigma_updates)
  new_sigma = jax.tree_map(lambda x: jnp.clip(x, *sigma_range), new_sigma)
+
+ # respect KL divergence constraint with old parameters
+ if max_kl is not None:
+ old_mu_lr = new_mu_state.hyperparams['learning_rate']
+ old_sigma_lr = new_sigma_state.hyperparams['learning_rate']
+ kl_terms = jax.tree_map(
+ _jax_wrapped_pgpe_kl_term, new_mu, new_sigma, mu, sigma)
+ total_kl = jax.tree_util.tree_reduce(jnp.add, kl_terms)
+ kl_reduction = jnp.minimum(1.0, jnp.sqrt(max_kl / total_kl))
+ mu_state.hyperparams['learning_rate'] = old_mu_lr * kl_reduction
+ sigma_state.hyperparams['learning_rate'] = old_sigma_lr * kl_reduction
+ mu_updates, new_mu_state = mu_optimizer.update(mu_grad, mu_state, params=mu)
+ sigma_updates, new_sigma_state = sigma_optimizer.update(
+ sigma_grad, sigma_state, params=sigma)
+ new_mu = optax.apply_updates(mu, mu_updates)
+ new_sigma = optax.apply_updates(sigma, sigma_updates)
+ new_sigma = jax.tree_map(lambda x: jnp.clip(x, *sigma_range), new_sigma)
+ new_mu_state.hyperparams['learning_rate'] = old_mu_lr
+ new_sigma_state.hyperparams['learning_rate'] = old_sigma_lr
+
+ # apply projection step and finalize results
+ new_mu, converged = projection(new_mu, policy_hyperparams)
  new_pgpe_params = (new_mu, new_sigma)
  new_pgpe_opt_state = (new_mu_state, new_sigma_state)
  policy_params = new_mu
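Editorial note: `_jax_wrapped_pgpe_kl_term` above is the closed-form KL divergence between the old and new diagonal Gaussian search distributions, summed over parameter dimensions:

```latex
\mathrm{KL}\big(\mathcal{N}(\mu_{\text{old}}, \sigma_{\text{old}}^2)\,\big\|\,\mathcal{N}(\mu, \sigma^2)\big)
\;=\; \sum_i \left[\log\frac{\sigma_i}{\sigma_{\text{old},i}}
 + \frac{\sigma_{\text{old},i}^2 + (\mu_i - \mu_{\text{old},i})^2}{2\,\sigma_i^2}
 - \frac{1}{2}\right],
```

which matches the `0.5 * (2 log(σ/σ_old) + (σ_old/σ)² + ((μ − μ_old)/σ)² − 1)` form in the code. When the total exceeds `max_kl`, both learning rates are scaled by `min(1, sqrt(max_kl / KL))` and the update is recomputed; this is why the optimizers are built with `optax.inject_hyperparams`, which exposes `learning_rate` in the optimizer state, and why the KL constraint is disabled when that injection fails.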
@@ -1462,14 +1546,14 @@ def mean_deviation_utility(returns: jnp.ndarray, beta: float) -> float:
  @jax.jit
  def mean_semideviation_utility(returns: jnp.ndarray, beta: float) -> float:
  mu = jnp.mean(returns)
- msd = jnp.sqrt(jnp.mean(jnp.minimum(0.0, returns - mu) ** 2))
+ msd = jnp.sqrt(jnp.mean(jnp.square(jnp.minimum(0.0, returns - mu))))
  return mu - 0.5 * beta * msd


  @jax.jit
  def mean_semivariance_utility(returns: jnp.ndarray, beta: float) -> float:
  mu = jnp.mean(returns)
- msv = jnp.mean(jnp.minimum(0.0, returns - mu) ** 2)
+ msv = jnp.mean(jnp.square(jnp.minimum(0.0, returns - mu)))
  return mu - 0.5 * beta * msv


@@ -1768,7 +1852,6 @@ r"""

  # optimization
  self.update = self._jax_update(train_loss)
- self.check_zero_grad = self._jax_check_zero_gradients()

  # pgpe option
  if self.use_pgpe:
@@ -1831,6 +1914,12 @@ r"""
  projection = self.plan.projection
  use_ls = self.line_search_kwargs is not None

+ # check if the gradients are all zeros
+ def _jax_wrapped_zero_gradients(grad):
+ leaves, _ = jax.tree_util.tree_flatten(
+ jax.tree_map(lambda g: jnp.allclose(g, 0), grad))
+ return jnp.all(jnp.asarray(leaves))
+
  # calculate the plan gradient w.r.t. return loss and update optimizer
  # also perform a projection step to satisfy constraints on actions
  def _jax_wrapped_loss_swapped(policy_params, key, policy_hyperparams,
@@ -1855,23 +1944,12 @@ r"""
  policy_params, converged = projection(policy_params, policy_hyperparams)
  log['grad'] = grad
  log['updates'] = updates
+ zero_grads = _jax_wrapped_zero_gradients(grad)
  return policy_params, converged, opt_state, opt_aux, \
- loss_val, log, model_params
+ loss_val, log, model_params, zero_grads

  return jax.jit(_jax_wrapped_plan_update)

- def _jax_check_zero_gradients(self):
-
- def _jax_wrapped_zero_gradient(grad):
- return jnp.allclose(grad, 0)
-
- def _jax_wrapped_zero_gradients(grad):
- leaves, _ = jax.tree_util.tree_flatten(
- jax.tree_map(_jax_wrapped_zero_gradient, grad))
- return jnp.all(jnp.asarray(leaves))
-
- return jax.jit(_jax_wrapped_zero_gradients)
-
  def _batched_init_subs(self, subs):
  rddl = self.rddl
  n_train, n_test = self.batch_size_train, self.batch_size_test
@@ -2175,11 +2253,12 @@ r"""
2175
2253
  # ======================================================================
2176
2254
 
2177
2255
  # initialize running statistics
2178
- best_params, best_loss, best_grad = policy_params, jnp.inf, jnp.inf
2256
+ best_params, best_loss, best_grad = policy_params, jnp.inf, None
2179
2257
  last_iter_improve = 0
2180
2258
  rolling_test_loss = RollingMean(test_rolling_window)
2181
2259
  log = {}
2182
2260
  status = JaxPlannerStatus.NORMAL
2261
+ progress_percent = 0
2183
2262
 
2184
2263
  # initialize stopping criterion
2185
2264
  if stopping_rule is not None:
@@ -2191,18 +2270,19 @@ r"""
2191
2270
  dashboard_id, dashboard.get_planner_info(self),
2192
2271
  key=dash_key, viz=self.dashboard_viz)
2193
2272
 
2273
+ # progress bar
2274
+ if print_progress:
2275
+ progress_bar = tqdm(None, total=100, position=tqdm_position,
2276
+ bar_format='{l_bar}{bar}| {elapsed} {postfix}')
2277
+ else:
2278
+ progress_bar = None
2279
+ position_str = '' if tqdm_position is None else f'[{tqdm_position}]'
2280
+
2194
2281
  # ======================================================================
2195
2282
  # MAIN TRAINING LOOP BEGINS
2196
2283
  # ======================================================================
2197
2284
 
2198
- iters = range(epochs)
2199
- if print_progress:
2200
- iters = tqdm(iters, total=100,
2201
- bar_format='{l_bar}{bar}| {elapsed} {postfix}',
2202
- position=tqdm_position)
2203
- position_str = '' if tqdm_position is None else f'[{tqdm_position}]'
2204
-
2205
- for it in iters:
2285
+ for it in range(epochs):
2206
2286
 
2207
2287
  # ==================================================================
2208
2288
  # NEXT GRADIENT DESCENT STEP
@@ -2213,8 +2293,9 @@ r"""
2213
2293
  # update the parameters of the plan
2214
2294
  key, subkey = random.split(key)
2215
2295
  (policy_params, converged, opt_state, opt_aux, train_loss, train_log,
2216
- model_params) = self.update(subkey, policy_params, policy_hyperparams,
2217
- train_subs, model_params, opt_state, opt_aux)
2296
+ model_params, zero_grads) = self.update(
2297
+ subkey, policy_params, policy_hyperparams, train_subs, model_params,
2298
+ opt_state, opt_aux)
2218
2299
  test_loss, (test_log, model_params_test) = self.test_loss(
2219
2300
  subkey, policy_params, policy_hyperparams, test_subs, model_params_test)
2220
2301
  test_loss_smooth = rolling_test_loss.update(test_loss)
@@ -2224,8 +2305,9 @@ r"""
2224
2305
  if self.use_pgpe:
2225
2306
  key, subkey = random.split(key)
2226
2307
  pgpe_params, r_max, pgpe_opt_state, pgpe_param, pgpe_converged = \
2227
- self.pgpe.update(subkey, pgpe_params, r_max, policy_hyperparams,
2228
- test_subs, model_params, pgpe_opt_state)
2308
+ self.pgpe.update(subkey, pgpe_params, r_max, progress_percent,
2309
+ policy_hyperparams, test_subs, model_params_test,
2310
+ pgpe_opt_state)
2229
2311
  pgpe_loss, _ = self.test_loss(
2230
2312
  subkey, pgpe_param, policy_hyperparams, test_subs, model_params_test)
2231
2313
  pgpe_loss_smooth = rolling_pgpe_loss.update(pgpe_loss)
@@ -2252,7 +2334,7 @@ r"""
2252
2334
  # ==================================================================
2253
2335
 
2254
2336
  # no progress
2255
- if (not pgpe_improve) and self.check_zero_grad(train_log['grad']):
2337
+ if (not pgpe_improve) and zero_grads:
2256
2338
  status = JaxPlannerStatus.NO_PROGRESS
2257
2339
 
2258
2340
  # constraint satisfaction problem
@@ -2311,14 +2393,15 @@ r"""
2311
2393
 
2312
2394
  # if the progress bar is used
2313
2395
  if print_progress:
2314
- iters.n = progress_percent
2315
- iters.set_description(
2396
+ progress_bar.set_description(
2316
2397
  f'{position_str} {it:6} it / {-train_loss:14.5f} train / '
2317
2398
  f'{-test_loss_smooth:14.5f} test / {-best_loss:14.5f} best / '
2318
2399
  f'{status.value} status / {total_pgpe_it:6} pgpe',
2319
2400
  refresh=False
2320
2401
  )
2321
- iters.set_postfix_str(f"{(it + 1) / elapsed:.2f}it/s", refresh=True)
2402
+ progress_bar.set_postfix_str(
2403
+ f"{(it + 1) / (elapsed + 1e-6):.2f}it/s", refresh=False)
2404
+ progress_bar.update(progress_percent - progress_bar.n)
2322
2405
 
2323
2406
  # dash-board
2324
2407
  if dashboard is not None:
@@ -2339,7 +2422,7 @@ r"""
2339
2422
 
2340
2423
  # release resources
2341
2424
  if print_progress:
2342
- iters.close()
2425
+ progress_bar.close()
2343
2426
 
2344
2427
  # validate the test return
2345
2428
  if log:
@@ -1,6 +1,6 @@
  Metadata-Version: 2.2
  Name: pyRDDLGym-jax
- Version: 2.1
+ Version: 2.3
  Summary: pyRDDLGym-jax: automatic differentiation for solving sequential planning problems in JAX.
  Home-page: https://github.com/pyrddlgym-project/pyRDDLGym-jax
  Author: Michael Gimelfarb, Ayal Taitler, Scott Sanner
@@ -58,8 +58,11 @@ Dynamic: summary

  Purpose:

- 1. automatic translation of any RDDL description file into a differentiable simulator in JAX
- 2. flexible policy class representations, automatic model relaxations for working in discrete and hybrid domains, and Bayesian hyper-parameter tuning.
+ 1. automatic translation of RDDL description files into differentiable JAX simulators
+ 2. implementation of (highly configurable) operator relaxations for working in discrete and hybrid domains
+ 3. flexible policy representations and automated Bayesian hyper-parameter tuning
+ 4. interactive dashboard for dynamic visualization and debugging
+ 5. hybridization with parameter-exploring policy gradients.

  Some demos of solved problems by JaxPlan:

@@ -235,8 +238,23 @@ More documentation about this and other new features will be coming soon.

  ## Tuning the Planner

- It is easy to tune the planner's hyper-parameters efficiently and automatically using Bayesian optimization.
- To do this, first create a config file template with patterns replacing concrete parameter values that you want to tune, e.g.:
+ A basic run script is provided to run automatic Bayesian hyper-parameter tuning for the most sensitive parameters of JaxPlan:
+
+ ```shell
+ jaxplan tune <domain> <instance> <method> <trials> <iters> <workers> <dashboard>
+ ```
+
+ where:
+ - ``domain`` is the domain identifier as specified in rddlrepository
+ - ``instance`` is the instance identifier
+ - ``method`` is the planning method to use (i.e. drp, slp, replan)
+ - ``trials`` is the (optional) number of trials/episodes to average in evaluating each hyper-parameter setting
+ - ``iters`` is the (optional) maximum number of iterations/evaluations of Bayesian optimization to perform
+ - ``workers`` is the (optional) number of parallel evaluations to be done at each iteration, i.e. the total evaluations = ``iters * workers``
+ - ``dashboard`` is whether the optimizations are tracked in the dashboard application.
+
+ It is easy to tune a custom range of the planner's hyper-parameters efficiently.
+ First create a config file template with patterns replacing concrete parameter values that you want to tune, e.g.:

  ```ini
  [Model]
@@ -260,7 +278,7 @@ train_on_reset=True

  would allow tuning the sharpness of model relaxations and the learning rate of the optimizer.

- Next, you must link the patterns in the config with concrete hyper-parameter ranges the tuner will understand:
+ Next, you must link the patterns in the config with concrete hyper-parameter ranges the tuner will understand, and run the optimizer:

  ```python
  import pyRDDLGym
@@ -292,22 +310,7 @@ tuning = JaxParameterTuning(env=env,
  gp_iters=iters)
  tuning.tune(key=42, log_file='path/to/log.csv')
  ```
-
- A basic run script is provided to run the automatic hyper-parameter tuning for the most sensitive parameters of JaxPlan:
-
- ```shell
- jaxplan tune <domain> <instance> <method> <trials> <iters> <workers> <dashboard>
- ```
-
- where:
- - ``domain`` is the domain identifier as specified in rddlrepository
- - ``instance`` is the instance identifier
- - ``method`` is the planning method to use (i.e. drp, slp, replan)
- - ``trials`` is the (optional) number of trials/episodes to average in evaluating each hyper-parameter setting
- - ``iters`` is the (optional) maximum number of iterations/evaluations of Bayesian optimization to perform
- - ``workers`` is the (optional) number of parallel evaluations to be done at each iteration, i.e. the total evaluations = ``iters * workers``
- - ``dashboard`` is whether the optimizations are tracked in the dashboard application.
-
+

  ## Simulation

@@ -19,7 +19,7 @@ long_description = (Path(__file__).parent / "README.md").read_text()

  setup(
  name='pyRDDLGym-jax',
- version='2.1',
+ version='2.3',
  author="Michael Gimelfarb, Ayal Taitler, Scott Sanner",
  author_email="mike.gimelfarb@mail.utoronto.ca, ataitler@gmail.com, ssanner@mie.utoronto.ca",
  description="pyRDDLGym-jax: automatic differentiation for solving sequential planning problems in JAX.",
@@ -1 +0,0 @@
- __version__ = '2.1'