PyPI - tensor-optix - Versions diffs - 1.2.2__tar.gz → 1.2.4__tar.gz - Mend

tensor-optix 1.2.2 tar.gz → 1.2.4 tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (56)

{tensor_optix-1.2.2 → tensor_optix-1.2.4}/PKG-INFO RENAMED

@@ -1,16 +1,17 @@
 Metadata-Version: 2.4
 Name: tensor-optix
-Version: 1.2.2
+Version: 1.2.4
 Summary: Autonomous training loop for any sequential learning model — built-in PPO, DQN, and SAC for TensorFlow and PyTorch
 Author: sup3rus3r
 License-Expression: MIT
 Requires-Python: >=3.11
 Description-Content-Type: text/markdown
 Requires-Dist: tensorflow>=2.18.0
-Requires-Dist: gymnasium>=1.0.0
+Requires-Dist: gymnasium[box2d]>=1.0.0
 Requires-Dist: numpy>=1.24.0
 Requires-Dist: matplotlib>=3.7.0
 Requires-Dist: optuna>=3.0.0
+Requires-Dist: swig>=4.4.1
 Provides-Extra: torch
 Requires-Dist: torch>=2.0.0; extra == "torch"
 Requires-Dist: torchvision; extra == "torch"
@@ -926,6 +927,86 @@ tensor_optix/
 ---
+## Common Pitfalls & Best Practices
+### Device management
+Every built-in Torch agent (`TorchPPOAgent`, `TorchGaussianPPOAgent`, `TorchDQNAgent`, `TorchSACAgent`) accepts a `device` parameter and moves its networks there on construction. The default is `"auto"`, which selects CUDA if available.
+```python
+agent = TorchPPOAgent(actor=actor, critic=critic, optimizer=opt,
+                      hyperparams=hp, device="cuda")   # or "cpu", "auto"
+```
+The base `TorchAgent` adapter now also accepts `device="auto"` and applies it consistently in `act()` and `load_weights()`. If you subclass `TorchAgent` directly, pass `device` to `super().__init__()` — otherwise obs tensors and loaded checkpoints default to CPU even on a CUDA machine.
+**Watch out:** constructing the optimizer _before_ calling `.to(device)` on the model is safe because optimizers hold references to parameter tensors, not copies. But if `agent.load_weights()` restores weights to the wrong device and the optimizer is created _afterwards_, parameters can end up split between CPU and GPU, which causes a silent slowdown rather than an error.
+### Ensemble memory on GPU
+Spawning agents with `PolicyManager.spawn_variant()` or `agent_factory` mode creates new networks on the target device. Calling `prune()` removes agents from the ensemble and automatically calls `agent.teardown()` on each removed agent, which moves its networks to CPU and calls `torch.cuda.empty_cache()`.
+If you remove agents from the ensemble by any other means (e.g., rebuilding `_ensemble` manually), call `teardown()` yourself:
+```python
+removed = pm.prune(bottom_k=2)   # teardown() is called automatically
+# If removing manually:
+agent.teardown()
+```
+For long PBT-style runs with frequent spawning, monitor GPU memory with `torch.cuda.memory_allocated()`. If memory grows despite pruning, the likely cause is optimizer state — gradient moments accumulate per parameter. Re-creating the optimizer on each spawn (as the built-in `agent_factory` pattern does) avoids this.
+### On-policy vs. off-policy rollback
+`rollback_on_degradation=True` is safe for PPO but harmful for DQN and SAC. Off-policy agents accumulate experience in a replay buffer across many policies. Rolling back weights without clearing the buffer means the restored policy immediately trains on transitions it never generated — corrupted Bellman targets drag it back down.
+The framework handles this automatically: any agent where `is_on_policy` returns `False` skips the weight rollback even when `rollback_on_degradation=True`. If you write a custom off-policy agent, override the property:
+```python
+@property
+def is_on_policy(self) -> bool:
+    return False
+```
+### Wiring PolicyManager early stopping
+`PolicyManager.as_callback()` returns a `PolicyManagerCallback` that stops training when the spawn budget is exhausted — but only if you wire the stop function:
+```python
+pm_cb = pm.as_callback(agent, agent_factory=my_factory)
+rl_opt = RLOptimizer(...)
+pm_cb.set_stop_fn(rl_opt.stop)   # required — without this, training runs the full budget
+rl_opt.add_callback(pm_cb)
+rl_opt.run()
+```
+Without `set_stop_fn`, the callback prints the training report when the budget runs out but cannot halt the loop. Training continues until `max_episodes` is reached.
+For the factory-mode PPO path (where `agent_factory` is passed to `RLOptimizer` and `pm_cb` is created inside the factory), wire the stop function inside the factory — `rl_opt` is already bound in the enclosing scope by the time the factory is called:
+```python
+def agent_factory_full(params):
+    agent = make_agent(params)
+    pm_cb = pm.as_callback(agent, agent_factory=lambda: make_agent(params))
+    pm_cb.set_stop_fn(rl_opt.stop)   # rl_opt is bound before run() calls this factory
+    rl_opt.add_callback(pm_cb)
+    return agent
+rl_opt = RLOptimizer(agent_factory=agent_factory_full, ...)
+rl_opt.run()
+```
+### Checkpoint directory hygiene
+Each run writes checkpoints to `checkpoint_dir`. If you reuse the same directory across restarts without clearing it, `CheckpointRegistry` will load stale snapshots from a previous run and roll back to them during training. Either pass a unique directory per run (include seed and timestamp) or call `shutil.rmtree(ckpt_dir, ignore_errors=True)` at the start of each run.
+### State dict key mismatches during weight averaging / spawning
+`average_weights()` and `load_weights()` use PyTorch `state_dict` keys. If the architecture passed to a spawned agent shell differs from the one that was checkpointed (different layer names, sizes, or number of layers), `load_state_dict()` will raise a `RuntimeError` with a key mismatch message. The framework does not catch this; it is the user's responsibility to pass a compatible shell. The safest pattern is to use the same `agent_factory` for both the primary agent and all spawned variants.
+---
 ## Math & Science Reference
 ### SPSA Gradient Estimate (`SPSAOptimizer`)

{tensor_optix-1.2.2 → tensor_optix-1.2.4}/README.md RENAMED

@@ -884,6 +884,86 @@ tensor_optix/
 ---
+## Common Pitfalls & Best Practices
+### Device management
+Every built-in Torch agent (`TorchPPOAgent`, `TorchGaussianPPOAgent`, `TorchDQNAgent`, `TorchSACAgent`) accepts a `device` parameter and moves its networks there on construction. The default is `"auto"`, which selects CUDA if available.
+```python
+agent = TorchPPOAgent(actor=actor, critic=critic, optimizer=opt,
+                      hyperparams=hp, device="cuda")   # or "cpu", "auto"
+```
+The base `TorchAgent` adapter now also accepts `device="auto"` and applies it consistently in `act()` and `load_weights()`. If you subclass `TorchAgent` directly, pass `device` to `super().__init__()` — otherwise obs tensors and loaded checkpoints default to CPU even on a CUDA machine.
+**Watch out:** constructing the optimizer _before_ calling `.to(device)` on the model is safe because optimizers hold references to parameter tensors, not copies. But if `agent.load_weights()` restores weights to the wrong device and the optimizer is created _afterwards_, parameters can end up split between CPU and GPU, which causes a silent slowdown rather than an error.
+### Ensemble memory on GPU
+Spawning agents with `PolicyManager.spawn_variant()` or `agent_factory` mode creates new networks on the target device. Calling `prune()` removes agents from the ensemble and automatically calls `agent.teardown()` on each removed agent, which moves its networks to CPU and calls `torch.cuda.empty_cache()`.
+If you remove agents from the ensemble by any other means (e.g., rebuilding `_ensemble` manually), call `teardown()` yourself:
+```python
+removed = pm.prune(bottom_k=2)   # teardown() is called automatically
+# If removing manually:
+agent.teardown()
+```
+For long PBT-style runs with frequent spawning, monitor GPU memory with `torch.cuda.memory_allocated()`. If memory grows despite pruning, the likely cause is optimizer state — gradient moments accumulate per parameter. Re-creating the optimizer on each spawn (as the built-in `agent_factory` pattern does) avoids this.
+### On-policy vs. off-policy rollback
+`rollback_on_degradation=True` is safe for PPO but harmful for DQN and SAC. Off-policy agents accumulate experience in a replay buffer across many policies. Rolling back weights without clearing the buffer means the restored policy immediately trains on transitions it never generated — corrupted Bellman targets drag it back down.
+The framework handles this automatically: any agent where `is_on_policy` returns `False` skips the weight rollback even when `rollback_on_degradation=True`. If you write a custom off-policy agent, override the property:
+```python
+@property
+def is_on_policy(self) -> bool:
+    return False
+```
+### Wiring PolicyManager early stopping
+`PolicyManager.as_callback()` returns a `PolicyManagerCallback` that stops training when the spawn budget is exhausted — but only if you wire the stop function:
+```python
+pm_cb = pm.as_callback(agent, agent_factory=my_factory)
+rl_opt = RLOptimizer(...)
+pm_cb.set_stop_fn(rl_opt.stop)   # required — without this, training runs the full budget
+rl_opt.add_callback(pm_cb)
+rl_opt.run()
+```
+Without `set_stop_fn`, the callback prints the training report when the budget runs out but cannot halt the loop. Training continues until `max_episodes` is reached.
+For the factory-mode PPO path (where `agent_factory` is passed to `RLOptimizer` and `pm_cb` is created inside the factory), wire the stop function inside the factory — `rl_opt` is already bound in the enclosing scope by the time the factory is called:
+```python
+def agent_factory_full(params):
+    agent = make_agent(params)
+    pm_cb = pm.as_callback(agent, agent_factory=lambda: make_agent(params))
+    pm_cb.set_stop_fn(rl_opt.stop)   # rl_opt is bound before run() calls this factory
+    rl_opt.add_callback(pm_cb)
+    return agent
+rl_opt = RLOptimizer(agent_factory=agent_factory_full, ...)
+rl_opt.run()
+```
+### Checkpoint directory hygiene
+Each run writes checkpoints to `checkpoint_dir`. If you reuse the same directory across restarts without clearing it, `CheckpointRegistry` will load stale snapshots from a previous run and roll back to them during training. Either pass a unique directory per run (include seed and timestamp) or call `shutil.rmtree(ckpt_dir, ignore_errors=True)` at the start of each run.
+### State dict key mismatches during weight averaging / spawning
+`average_weights()` and `load_weights()` use PyTorch `state_dict` keys. If the architecture passed to a spawned agent shell differs from the one that was checkpointed (different layer names, sizes, or number of layers), `load_state_dict()` will raise a `RuntimeError` with a key mismatch message. The framework does not catch this; it is the user's responsibility to pass a compatible shell. The safest pattern is to use the same `agent_factory` for both the primary agent and all spawned variants.
+---
 ## Math & Science Reference
 ### SPSA Gradient Estimate (`SPSAOptimizer`)
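
The "Checkpoint directory hygiene" advice in the README text above can be sketched with the standard library alone. This is an illustration, not part of the package: `fresh_checkpoint_dir`, the `run-seed…` naming scheme, and the `unique` flag are hypothetical. It shows both options the README mentions, a unique directory per run and clearing a reused one with `shutil.rmtree(..., ignore_errors=True)`.

```python
import shutil
import tempfile
import time
from pathlib import Path


def fresh_checkpoint_dir(root, seed, unique=True):
    """Return an empty checkpoint directory under `root` (hypothetical helper).

    unique=True  -> new directory named after the seed and a timestamp.
    unique=False -> fixed directory per seed, wiped before reuse.
    """
    root = Path(root)
    if unique:
        stamp = time.strftime("%Y%m%d-%H%M%S")
        ckpt_dir = root / f"run-seed{seed}-{stamp}"
    else:
        ckpt_dir = root / f"run-seed{seed}"
        # Remove stale snapshots left behind by a previous run.
        shutil.rmtree(ckpt_dir, ignore_errors=True)
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    return ckpt_dir


# Simulate a directory left over from an earlier run.
root = tempfile.mkdtemp()
stale = Path(root) / "run-seed0"
stale.mkdir()
(stale / "old.pt").write_text("stale")

reused = fresh_checkpoint_dir(root, seed=0, unique=False)   # wiped and recreated
unique_dir = fresh_checkpoint_dir(root, seed=0, unique=True)
```

The returned path would then be passed as `checkpoint_dir`. The unique-directory variant keeps old runs on disk for comparison, while the wipe variant keeps disk usage bounded.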

{tensor_optix-1.2.2 → tensor_optix-1.2.4}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "tensor-optix"
-version = "1.2.2"
+version = "1.2.4"
 description = "Autonomous training loop for any sequential learning model — built-in PPO, DQN, and SAC for TensorFlow and PyTorch"
 readme = "README.md"
 license = "MIT"
@@ -15,10 +15,11 @@ authors = [
 dependencies = [
     "tensorflow>=2.18.0",
-    "gymnasium>=1.0.0",
+    "gymnasium[box2d]>=1.0.0",
     "numpy>=1.24.0",
     "matplotlib>=3.7.0",
     "optuna>=3.0.0",
+    "swig>=4.4.1",
 ]
 [project.optional-dependencies]

{tensor_optix-1.2.2 → tensor_optix-1.2.4}/tensor_optix/adapters/pytorch/torch_agent.py RENAMED

@@ -31,13 +31,17 @@ class TorchAgent(BaseAgent):
             model=model,
             optimizer=torch.optim.Adam(model.parameters(), lr=3e-4),
             hyperparams=HyperparamSet(params={"learning_rate": 3e-4, "gamma": 0.99}, episode_id=0),
+            device="auto",  # "cuda" if available, else "cpu"
         )
     """
-    def __init__(self, model, optimizer, hyperparams: HyperparamSet, compute_loss_fn=None):
+    def __init__(self, model, optimizer, hyperparams: HyperparamSet, compute_loss_fn=None, device: str = "auto"):
         import torch
         self._torch = torch
-        self.model = model
+        if device == "auto":
+            device = "cuda" if torch.cuda.is_available() else "cpu"
+        self._device = torch.device(device)
+        self.model = model.to(self._device)
         self.optimizer = optimizer
         self._hyperparams = hyperparams.copy()
         self._compute_loss_fn = compute_loss_fn
@@ -48,7 +52,7 @@ class TorchAgent(BaseAgent):
         Override for continuous actions or custom sampling strategies.
         """
         import torch
-        obs = torch.as_tensor(np.atleast_2d(observation), dtype=torch.float32)
+        obs = torch.as_tensor(np.atleast_2d(observation), dtype=torch.float32).to(self._device)
         with torch.no_grad():
             logits = self.model(obs)
         action = int(torch.argmax(logits, dim=-1).item())
@@ -140,5 +144,11 @@ class TorchAgent(BaseAgent):
     def load_weights(self, path: str) -> None:
         import torch
-        state = torch.load(os.path.join(path, "model.pt"), map_location="cpu")
+        state = torch.load(os.path.join(path, "model.pt"), map_location=self._device)
         self.model.load_state_dict(state)
+    def teardown(self) -> None:
+        """Move model to CPU and free CUDA memory."""
+        import torch
+        self.model.cpu()
+        torch.cuda.empty_cache()
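
The `"auto"` branch added to `TorchAgent.__init__` can be exercised without a GPU or even a torch install. The sketch below is not the package's code: `resolve_device` is a hypothetical helper, and its `cuda_available` argument stands in for `torch.cuda.is_available()` so both outcomes can be checked on any machine.

```python
def resolve_device(device: str = "auto", cuda_available: bool = False) -> str:
    """Return a concrete device name for a requested device string.

    Mirrors the "auto" branch in the hunk above: choose "cuda" when it is
    available, otherwise "cpu". Explicit names pass through unchanged.
    `cuda_available` is an assumption standing in for
    torch.cuda.is_available().
    """
    if device == "auto":
        return "cuda" if cuda_available else "cpu"
    return device


print(resolve_device("auto", cuda_available=False))  # cpu
print(resolve_device("auto", cuda_available=True))   # cuda
print(resolve_device("cpu", cuda_available=True))    # cpu
```

In the real constructor the resolved name is wrapped in `torch.device(...)` and reused by `act()` and `load_weights()`, which is what keeps observation tensors and restored weights on the same device as the model.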

{tensor_optix-1.2.2 → tensor_optix-1.2.4}/tensor_optix/algorithms/tf_ppo.py RENAMED

@@ -291,6 +291,16 @@ class TFPPOAgent(BaseAgent):
     # Internal helpers
     # ------------------------------------------------------------------
+    def reset_cache(self) -> None:
+        """
+        Discard all entries in the rollout cache without learning from them.
+        Called by LoopController after a val-pipeline window is collected, so
+        that val-rollout entries never bleed into the next training learn() call.
+        """
+        self._cache_obs.clear()
+        self._cache_log_probs.clear()
+        self._cache_values.clear()
     def _clear_cache(self, T: int) -> None:
         del self._cache_obs[:T]
         del self._cache_log_probs[:T]
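
The difference between the new `reset_cache()` and the existing `_clear_cache(T)` is easiest to see on a toy object. `ToyRolloutCache` and its `record` method are hypothetical. The hunk is cut off after two `del` statements, so the third one on `_cache_values` is an assumption, as is the reading that the first `T` entries are the ones a learning step has just consumed.

```python
class ToyRolloutCache:
    """Hypothetical stand-in for the three parallel rollout lists."""

    def __init__(self):
        self._cache_obs = []
        self._cache_log_probs = []
        self._cache_values = []

    def record(self, obs, log_prob, value):
        # One entry per step, appended to all three lists together.
        self._cache_obs.append(obs)
        self._cache_log_probs.append(log_prob)
        self._cache_values.append(value)

    def _clear_cache(self, T):
        # Partial clear: drop only the first T entries.
        del self._cache_obs[:T]
        del self._cache_log_probs[:T]
        del self._cache_values[:T]   # assumed; not visible in the hunk

    def reset_cache(self):
        # Full clear: discard everything, e.g. validation rollouts.
        self._cache_obs.clear()
        self._cache_log_probs.clear()
        self._cache_values.clear()

    def __len__(self):
        return len(self._cache_obs)


cache = ToyRolloutCache()
for step in range(5):
    cache.record(obs=step, log_prob=-0.1 * step, value=0.0)

cache._clear_cache(3)
remaining_after_partial = len(cache)   # entries 3 and 4 survive

cache.reset_cache()
remaining_after_reset = len(cache)     # nothing survives
```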

{tensor_optix-1.2.2 → tensor_optix-1.2.4}/tensor_optix/algorithms/tf_ppo_continuous.py RENAMED

@@ -319,6 +319,12 @@ class TFGaussianPPOAgent(BaseAgent):
             for v in module.trainable_variables:
                 v.assign(v * (1.0 + noise_scale * tf.random.normal(v.shape)))
+    def reset_cache(self) -> None:
+        """Discard all rollout cache entries without learning from them."""
+        self._cache_obs.clear()
+        self._cache_log_probs.clear()
+        self._cache_values.clear()
     @staticmethod
     def _explained_variance(values: np.ndarray, returns: np.ndarray) -> float:
         var_returns = float(np.var(returns))

{tensor_optix-1.2.2 → tensor_optix-1.2.4}/tensor_optix/algorithms/torch_dqn.py RENAMED

@@ -254,3 +254,10 @@ class TorchDQNAgent(BaseAgent):
             for param in self._q.parameters():
                 param.mul_(1.0 + noise_scale * torch.randn_like(param))
         self._q_target.load_state_dict(self._q.state_dict())
+    def teardown(self) -> None:
+        """Move networks to CPU and free CUDA memory."""
+        import torch
+        self._q.cpu()
+        self._q_target.cpu()
+        torch.cuda.empty_cache()

{tensor_optix-1.2.2 → tensor_optix-1.2.4}/tensor_optix/algorithms/torch_ppo.py RENAMED

@@ -275,6 +275,19 @@ class TorchPPOAgent(BaseAgent):
                 for param in module.parameters():
                     param.mul_(1.0 + noise_scale * torch.randn_like(param))
+    def reset_cache(self) -> None:
+        """Discard all rollout cache entries without learning from them."""
+        self._cache_obs.clear()
+        self._cache_log_probs.clear()
+        self._cache_values.clear()
+    def teardown(self) -> None:
+        """Move networks to CPU and free CUDA memory."""
+        import torch
+        self._actor.cpu()
+        self._critic.cpu()
+        torch.cuda.empty_cache()
     @staticmethod
     def _explained_variance(values: np.ndarray, returns: np.ndarray) -> float:
         var_returns = float(np.var(returns))

{tensor_optix-1.2.2 → tensor_optix-1.2.4}/tensor_optix/algorithms/torch_ppo_continuous.py RENAMED

@@ -343,6 +343,19 @@ class TorchGaussianPPOAgent(BaseAgent):
                 for param in module.parameters():
                     param.mul_(1.0 + noise_scale * torch.randn_like(param))
+    def reset_cache(self) -> None:
+        """Discard all rollout cache entries without learning from them."""
+        self._cache_obs.clear()
+        self._cache_log_probs.clear()
+        self._cache_values.clear()
+    def teardown(self) -> None:
+        """Move networks to CPU and free CUDA memory."""
+        import torch
+        self._actor.cpu()
+        self._critic.cpu()
+        torch.cuda.empty_cache()
     @staticmethod
     def _explained_variance(values: np.ndarray, returns: np.ndarray) -> float:
         var_returns = float(np.var(returns))

{tensor_optix-1.2.2 → tensor_optix-1.2.4}/tensor_optix/algorithms/torch_sac.py RENAMED

@@ -265,6 +265,13 @@ class TorchSACAgent(BaseAgent):
     # Internal helpers
     # ------------------------------------------------------------------
+    def teardown(self) -> None:
+        """Move all networks to CPU and free CUDA memory."""
+        import torch
+        for module in (self._actor, self._c1, self._c2, self._c1_tgt, self._c2_tgt):
+            module.cpu()
+        torch.cuda.empty_cache()
     def _sample_action(self, obs):
         import torch
         out     = self._actor(obs)

{tensor_optix-1.2.2 → tensor_optix-1.2.4}/tensor_optix/core/base_agent.py RENAMED

@@ -114,3 +114,12 @@ class BaseAgent(ABC):
         restores the best checkpoint — so perturbation is always relative
         to the best known weights, not the current (possibly degraded) ones.
         """
+    def teardown(self) -> None:
+        """
+        Release resources held by this agent (GPU memory, file handles, etc.).
+        Called by PolicyManager.prune() when an agent is removed from the
+        ensemble.  Override in framework-specific subclasses to move networks
+        to CPU and free CUDA memory.  Default: no-op.
+        """

{tensor_optix-1.2.2 → tensor_optix-1.2.4}/tensor_optix/core/loop_controller.py RENAMED

@@ -227,6 +227,11 @@ class LoopController:
                     val_episode = next(self._val_gen)
                     val_episode.episode_id = episode_id
                     val_metrics = self._evaluator.score_validation(val_episode)
+                    # Val pipeline calls agent.act() to collect rollouts, which
+                    # populates the on-policy rollout cache. That data must never
+                    # be consumed by the next training learn() call. Clear it now.
+                    if hasattr(self._agent, "reset_cache"):
+                        self._agent.reset_cache()
                     eval_metrics = self._evaluator.combine(train_metrics, val_metrics)
                     eval_metrics.episode_id = episode_id
                     logger.debug(
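
The `hasattr(self._agent, "reset_cache")` guard makes the cache reset optional: agents that keep a rollout cache get it cleared after validation, and agents without the method are left alone. A minimal sketch of that pattern follows. `CachingAgent`, `StatelessAgent`, and `run_validation` are hypothetical names, and the real validation pipeline is more involved than a loop over observations.

```python
class CachingAgent:
    """Hypothetical on-policy agent whose act() fills a rollout cache."""

    def __init__(self):
        self.cache = []

    def act(self, observation):
        self.cache.append(observation)   # side effect of acting
        return 0

    def reset_cache(self):
        self.cache.clear()


class StatelessAgent:
    """Hypothetical agent with no rollout cache and no reset_cache()."""

    def act(self, observation):
        return 0


def run_validation(agent, observations):
    # Acting during validation may populate the agent's cache.
    for obs in observations:
        agent.act(obs)
    # Same guard as the hunk above: call reset_cache() only if it exists.
    if hasattr(agent, "reset_cache"):
        agent.reset_cache()


caching = CachingAgent()
run_validation(caching, [1, 2, 3])            # cache is emptied afterwards
run_validation(StatelessAgent(), [1, 2, 3])   # no AttributeError is raised
```

Duck typing keeps the change backward compatible with custom agents, at the cost that a misspelled `reset_cache` in a subclass would be skipped silently.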

{tensor_optix-1.2.2 → tensor_optix-1.2.4}/tensor_optix/core/policy_manager.py RENAMED

@@ -215,6 +215,9 @@ class PolicyManager:
         self._score_history = new_score_history
         self._prune_count += bottom_k
+        for agent in removed_agents:
+            agent.teardown()
         logger.info(
             "PolicyManager.prune: removed %d agent(s), ensemble size now %d",
             len(removed_agents),
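
The teardown-on-prune contract can be shown with a toy ensemble. Only the `for agent in removed_agents: agent.teardown()` loop comes from the hunk above. `ToyAgent`, the free function `prune`, and the rule of dropping the lowest-scoring agents are assumptions made for illustration, since the selection logic of `PolicyManager.prune()` is not visible in this diff.

```python
class ToyAgent:
    """Hypothetical agent that records whether teardown() was called."""

    def __init__(self, name):
        self.name = name
        self.torn_down = False

    def teardown(self):
        # A real Torch agent would move networks to CPU and call
        # torch.cuda.empty_cache() here.
        self.torn_down = True


def prune(ensemble, scores, bottom_k):
    """Drop the bottom_k lowest-scoring agents and tear each one down."""
    order = sorted(range(len(ensemble)), key=lambda i: scores[i])
    drop = set(order[:bottom_k])
    removed = [a for i, a in enumerate(ensemble) if i in drop]
    kept = [a for i, a in enumerate(ensemble) if i not in drop]
    for agent in removed:
        agent.teardown()   # release resources of agents leaving the ensemble
    return kept, removed


ensemble = [ToyAgent("a"), ToyAgent("b"), ToyAgent("c")]
kept, removed = prune(ensemble, scores=[0.9, 0.1, 0.5], bottom_k=1)
```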

{tensor_optix-1.2.2 → tensor_optix-1.2.4}/tensor_optix.egg-info/PKG-INFO RENAMED

@@ -1,16 +1,17 @@
 Metadata-Version: 2.4
 Name: tensor-optix
-Version: 1.2.2
+Version: 1.2.4
 Summary: Autonomous training loop for any sequential learning model — built-in PPO, DQN, and SAC for TensorFlow and PyTorch
 Author: sup3rus3r
 License-Expression: MIT
 Requires-Python: >=3.11
 Description-Content-Type: text/markdown
 Requires-Dist: tensorflow>=2.18.0
-Requires-Dist: gymnasium>=1.0.0
+Requires-Dist: gymnasium[box2d]>=1.0.0
 Requires-Dist: numpy>=1.24.0
 Requires-Dist: matplotlib>=3.7.0
 Requires-Dist: optuna>=3.0.0
+Requires-Dist: swig>=4.4.1
 Provides-Extra: torch
 Requires-Dist: torch>=2.0.0; extra == "torch"
 Requires-Dist: torchvision; extra == "torch"
@@ -926,6 +927,86 @@ tensor_optix/
 ---
+## Common Pitfalls & Best Practices
+### Device management
+Every built-in Torch agent (`TorchPPOAgent`, `TorchGaussianPPOAgent`, `TorchDQNAgent`, `TorchSACAgent`) accepts a `device` parameter and moves its networks there on construction. The default is `"auto"`, which selects CUDA if available.
+```python
+agent = TorchPPOAgent(actor=actor, critic=critic, optimizer=opt,
+                      hyperparams=hp, device="cuda")   # or "cpu", "auto"
+```
+The base `TorchAgent` adapter now also accepts `device="auto"` and applies it consistently in `act()` and `load_weights()`. If you subclass `TorchAgent` directly, pass `device` to `super().__init__()` — otherwise obs tensors and loaded checkpoints default to CPU even on a CUDA machine.
+**Watch out:** constructing the optimizer _before_ calling `.to(device)` on the model is safe because optimizers hold references to parameter tensors, not copies. But if `agent.load_weights()` restores weights to the wrong device and the optimizer is created _afterwards_, parameters can end up split between CPU and GPU, which causes a silent slowdown rather than an error.
+### Ensemble memory on GPU
+Spawning agents with `PolicyManager.spawn_variant()` or `agent_factory` mode creates new networks on the target device. Calling `prune()` removes agents from the ensemble and automatically calls `agent.teardown()` on each removed agent, which moves its networks to CPU and calls `torch.cuda.empty_cache()`.
+If you remove agents from the ensemble by any other means (e.g., rebuilding `_ensemble` manually), call `teardown()` yourself:
+```python
+removed = pm.prune(bottom_k=2)   # teardown() is called automatically
+# If removing manually:
+agent.teardown()
+```
+For long PBT-style runs with frequent spawning, monitor GPU memory with `torch.cuda.memory_allocated()`. If memory grows despite pruning, the likely cause is optimizer state — gradient moments accumulate per parameter. Re-creating the optimizer on each spawn (as the built-in `agent_factory` pattern does) avoids this.
+### On-policy vs. off-policy rollback
+`rollback_on_degradation=True` is safe for PPO but harmful for DQN and SAC. Off-policy agents accumulate experience in a replay buffer across many policies. Rolling back weights without clearing the buffer means the restored policy immediately trains on transitions it never generated — corrupted Bellman targets drag it back down.
+The framework handles this automatically: any agent where `is_on_policy` returns `False` skips the weight rollback even when `rollback_on_degradation=True`. If you write a custom off-policy agent, override the property:
+```python
+@property
+def is_on_policy(self) -> bool:
+    return False
+```
+### Wiring PolicyManager early stopping
+`PolicyManager.as_callback()` returns a `PolicyManagerCallback` that stops training when the spawn budget is exhausted — but only if you wire the stop function:
+```python
+pm_cb = pm.as_callback(agent, agent_factory=my_factory)
+rl_opt = RLOptimizer(...)
+pm_cb.set_stop_fn(rl_opt.stop)   # required — without this, training runs the full budget
+rl_opt.add_callback(pm_cb)
+rl_opt.run()
+```
+Without `set_stop_fn`, the callback prints the training report when the budget runs out but cannot halt the loop. Training continues until `max_episodes` is reached.
+For the factory-mode PPO path (where `agent_factory` is passed to `RLOptimizer` and `pm_cb` is created inside the factory), wire the stop function inside the factory — `rl_opt` is already bound in the enclosing scope by the time the factory is called:
+```python
+def agent_factory_full(params):
+    agent = make_agent(params)
+    pm_cb = pm.as_callback(agent, agent_factory=lambda: make_agent(params))
+    pm_cb.set_stop_fn(rl_opt.stop)   # rl_opt is bound before run() calls this factory
+    rl_opt.add_callback(pm_cb)
+    return agent
+rl_opt = RLOptimizer(agent_factory=agent_factory_full, ...)
+rl_opt.run()
+```
+### Checkpoint directory hygiene
+Each run writes checkpoints to `checkpoint_dir`. If you reuse the same directory across restarts without clearing it, `CheckpointRegistry` will load stale snapshots from a previous run and roll back to them during training. Either pass a unique directory per run (include seed and timestamp) or call `shutil.rmtree(ckpt_dir, ignore_errors=True)` at the start of each run.
+### State dict key mismatches during weight averaging / spawning
+`average_weights()` and `load_weights()` use PyTorch `state_dict` keys. If the architecture passed to a spawned agent shell differs from the one that was checkpointed (different layer names, sizes, or number of layers), `load_state_dict()` will raise a `RuntimeError` with a key mismatch message. The framework does not catch this; it is the user's responsibility to pass a compatible shell. The safest pattern is to use the same `agent_factory` for both the primary agent and all spawned variants.
+---
 ## Math & Science Reference
 ### SPSA Gradient Estimate (`SPSAOptimizer`)
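
Since the README text above says the framework does not catch `state_dict` mismatches, a pre-flight comparison can surface them before `load_state_dict()` raises. The sketch below works on plain `{name: shape}` dictionaries so it runs without torch. `state_dict_diff` and the layer names are hypothetical and not part of tensor-optix.

```python
def state_dict_diff(expected, actual):
    """Compare two {parameter_name: shape} mappings (hypothetical helper).

    expected: what the agent shell's networks define.
    actual:   what the checkpoint contains.
    Returns (missing, unexpected, mismatched) as sorted lists of names.
    """
    missing = sorted(set(expected) - set(actual))      # shell has, checkpoint lacks
    unexpected = sorted(set(actual) - set(expected))   # checkpoint has, shell lacks
    mismatched = sorted(
        name for name in set(expected) & set(actual)
        if expected[name] != actual[name]              # same name, different shape
    )
    return missing, unexpected, mismatched


checkpoint = {"fc1.weight": (64, 8), "fc1.bias": (64,), "head.weight": (4, 64)}
shell = {"fc1.weight": (128, 8), "fc1.bias": (128,), "out.weight": (4, 128)}

missing, unexpected, mismatched = state_dict_diff(shell, checkpoint)
compatible = not (missing or unexpected or mismatched)
```

With real PyTorch modules the two mappings could be built as `{k: tuple(v.shape) for k, v in module.state_dict().items()}`. Any non-empty list means a strict `load_state_dict()` on that shell would fail.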

{tensor_optix-1.2.2 → tensor_optix-1.2.4}/tensor_optix.egg-info/requires.txt RENAMED

@@ -1,8 +1,9 @@
 tensorflow>=2.18.0
-gymnasium>=1.0.0
+gymnasium[box2d]>=1.0.0
 numpy>=1.24.0
 matplotlib>=3.7.0
 optuna>=3.0.0
+swig>=4.4.1
 [all]
 torch>=2.0.0