boltcrypt 0.1.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- boltcrypt-0.1.0/PKG-INFO +125 -0
- boltcrypt-0.1.0/README.md +105 -0
- boltcrypt-0.1.0/boltcrypt/__init__.py +7 -0
- boltcrypt-0.1.0/boltcrypt/examples/tabular_q.py +96 -0
- boltcrypt-0.1.0/boltcrypt/game/boltcrypt_game.py +136 -0
- boltcrypt-0.1.0/boltcrypt/wrapper/__init__.py +3 -0
- boltcrypt-0.1.0/boltcrypt/wrapper/fogofwar.py +22 -0
- boltcrypt-0.1.0/boltcrypt/wrapper/natlang.py +200 -0
- boltcrypt-0.1.0/boltcrypt/wrapper/roomdiscoveryreward.py +24 -0
- boltcrypt-0.1.0/boltcrypt.egg-info/PKG-INFO +125 -0
- boltcrypt-0.1.0/boltcrypt.egg-info/SOURCES.txt +16 -0
- boltcrypt-0.1.0/boltcrypt.egg-info/dependency_links.txt +1 -0
- boltcrypt-0.1.0/boltcrypt.egg-info/entry_points.txt +5 -0
- boltcrypt-0.1.0/boltcrypt.egg-info/requires.txt +2 -0
- boltcrypt-0.1.0/boltcrypt.egg-info/top_level.txt +1 -0
- boltcrypt-0.1.0/license.txt +7 -0
- boltcrypt-0.1.0/setup.cfg +4 -0
- boltcrypt-0.1.0/setup.py +22 -0
boltcrypt-0.1.0/PKG-INFO
ADDED
@@ -0,0 +1,125 @@
+Metadata-Version: 2.4
+Name: boltcrypt
+Version: 0.1.0
+Summary: Boltcrypt environment
+Author: foreverska
+Project-URL: Github:, https://github.com/foreverska/boltcrypt
+Keywords: gymnasium,gym
+Description-Content-Type: text/markdown
+License-File: license.txt
+Requires-Dist: gymnasium>=1.0.0
+Requires-Dist: numpy
+Dynamic: author
+Dynamic: description
+Dynamic: description-content-type
+Dynamic: keywords
+Dynamic: license-file
+Dynamic: project-url
+Dynamic: requires-dist
+Dynamic: summary
+
+# BoltCrypt: Procedural Dungeon RL Environment #
+BoltCrypt is a lightweight environment, compatible with Gymnasium (the maintained successor to OpenAI Gym), featuring procedurally generated dungeons. It challenges Reinforcement Learning agents (and humans!) to navigate complex layouts, solve Sokoban-style boulder puzzles, and manage inventory items like keys to reach the exit.
+
+## 🏰 Features ##
+- **Procedural Generation:** Every reset generates a unique dungeon layout based on configurable parameters (density, connectivity, room size).
+- **Puzzle Mechanics:** Includes boulder-pushing puzzles and locked doors that require finding a key.
+- **Gymnasium API:** Fully compatible with standard RL workflows.
+- **Pygame Visualization:** A built-in harness to play manually or watch your agent learn in real time.
+- **Flexible Observation Space:** Provides local room grids, global coordinates, and inventory status.
+
+## 🚀 Installation ##
+This project assumes a conda (or similar virtual) environment; make sure yours is active:
+```bash
+# Example if using conda directly
+conda activate <your-env-name>
+pip install gymnasium pygame numpy matplotlib boltcrypt
+```
+
+## 🎮 Getting Started ##
+### Play Manually ###
+Test the dungeon generation and mechanics yourself using the Pygame harness:
+```bash
+python boltcrypt_game.py
+```
+
+**Arrows:** Move the agent.
+**R:** Reset/regenerate the dungeon.
+**Goal:** Find the key (if required) and reach the green Exit tile.
+
+### Train an Agent ###
+The project includes a tabular Q-learning implementation to demonstrate how an agent can "memorize" a specific dungeon layout:
+```bash
+python tabular_q.py
+```
+
+## ⚙️ Configuration ##
+
+The DungeonGenerator and BoltCrypt environment can be customized via a config dictionary:
+
+### Parameter Description ###
+- **min_dist:** Minimum Manhattan distance between Start and Exit.
+- **mean_rooms:** Average number of rooms in the dungeon.
+- **connectivity:** Probability of creating loops between rooms (0.0 = tree, 1.0 = highly connected).
+- **puzzle_density:** Chance of a room containing a boulder puzzle.
+- **key_puzzle_prob:** Chance that the exit is locked and a key is hidden in a leaf room.
+
+### 🗺 Tile Legend ###
+- ⬜ Floor: Walkable space.
+- ⬛ Wall: Impassable.
+- 🚪 Door: Transitions between rooms (may be locked by puzzles).
+- 🟩 Exit: Your goal!
+- 🔴 Switch: Target for boulders.
+- 🟤 Boulder: Can be pushed onto switches.
+- 🔑 Key: Required to open locked exit rooms.
+
+### 🤖 Observation Space ###
+The environment returns a dictionary:
+- **grid:** A 10x10 local view of the current room.
+- **agent_pos:** (x, y) coordinates within the room.
+- **global_pos:** (gx, gy) coordinates in the dungeon layout.
+- **inventory:** Binary flag (1 if holding a key).
+
+## 📦 Wrappers & Extensions ##
+BoltCrypt includes several gym.Wrapper implementations to modify observations or rewards, making it a versatile testbed for different RL paradigms.
+
+### 📖 Natural Language Wrapper (NaturalLanguage) ###
+The crown jewel for testing reasoning LLMs. This wrapper transforms the numeric observation space into a rich, descriptive narrative: instead of a grid, the agent receives a text description of its surroundings.
+- **Dynamic Narrative:** Provides room dimensions, relative positions of doors, boulder locations, and puzzle statuses (e.g., "A loud mechanical clank echoes! The doors unlock.").
+- **LLM Ready:** Accepts string inputs like "NORTH", "SOUTH", "EAST", or "WEST" in the step() function.
+- **Physics Logic:** Includes an "Adventurer's Manual" to explain the game rules to an LLM via the observation stream.
+
+### 🌫️ Fog of War (FogOfWarWrapper) ###
+Turns the full room view into a partially observable environment.
+- **Vision Range:** Limits the grid observation to a (2v+1) × (2v+1) window centered on the agent.
+- **Memory Challenge:** Forces agents to map the room internally rather than having perfect spatial information.
+
+### 🗺 Room Discovery Reward (RoomDiscoveryReward) ###
+Combats sparse rewards in large dungeons by incentivizing exploration.
+- **Exploration Bonus:** Grants a small configurable reward (e.g., +0.1) the first time the agent enters a new room in the dungeon.
+- **Global Navigation:** Helps agents learn the layout of the "macro-dungeon" before they've found the final exit.
+
+### 🚀 Usage Example ###
+You can stack wrappers to create complex experimental setups:
+```python
+import gymnasium as gym
+from boltcrypt.env import BoltCrypt
+from boltcrypt.wrapper import NaturalLanguage, RoomDiscoveryReward
+
+env = BoltCrypt()
+env = RoomDiscoveryReward(env, discovery_reward=0.5)
+env = NaturalLanguage(env)
+
+# Now the agent receives text and extra rewards for exploration!
+obs, info = env.reset()
+print(obs)
+
+action = "NORTH"
+obs, reward, done, trunc, info = env.step(action)
+```
+
+Happy Dungeon Crawling! 🗡️🏹
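The configuration parameters listed in the README above are passed to the environment as a plain dict via `generator_config` (the bundled `boltcrypt_game.py` and `tabular_q.py` do exactly this). A minimal sketch with illustrative values:

```python
# Illustrative generator_config; keys mirror the README's parameter table
config = {
    "min_dist": 3,            # minimum Manhattan distance between Start and Exit
    "mean_rooms": 10,         # average number of rooms in the dungeon
    "connectivity": 0.2,      # 0.0 = tree-like, 1.0 = highly connected
    "puzzle_density": 0.3,    # chance a room contains a boulder puzzle
    "key_puzzle_prob": 0.25,  # chance the exit is locked behind a key
}
```

The dict would then be handed to the environment as `BoltCrypt(generator_config=config)`, as the bundled harness does.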
boltcrypt-0.1.0/README.md
ADDED
@@ -0,0 +1,105 @@
+# BoltCrypt: Procedural Dungeon RL Environment #
+BoltCrypt is a lightweight environment, compatible with Gymnasium (the maintained successor to OpenAI Gym), featuring procedurally generated dungeons. It challenges Reinforcement Learning agents (and humans!) to navigate complex layouts, solve Sokoban-style boulder puzzles, and manage inventory items like keys to reach the exit.
+
+## 🏰 Features ##
+- **Procedural Generation:** Every reset generates a unique dungeon layout based on configurable parameters (density, connectivity, room size).
+- **Puzzle Mechanics:** Includes boulder-pushing puzzles and locked doors that require finding a key.
+- **Gymnasium API:** Fully compatible with standard RL workflows.
+- **Pygame Visualization:** A built-in harness to play manually or watch your agent learn in real time.
+- **Flexible Observation Space:** Provides local room grids, global coordinates, and inventory status.
+
+## 🚀 Installation ##
+This project assumes a conda (or similar virtual) environment; make sure yours is active:
+```bash
+# Example if using conda directly
+conda activate <your-env-name>
+pip install gymnasium pygame numpy matplotlib boltcrypt
+```
+
+## 🎮 Getting Started ##
+### Play Manually ###
+Test the dungeon generation and mechanics yourself using the Pygame harness:
+```bash
+python boltcrypt_game.py
+```
+
+**Arrows:** Move the agent.
+**R:** Reset/regenerate the dungeon.
+**Goal:** Find the key (if required) and reach the green Exit tile.
+
+### Train an Agent ###
+The project includes a tabular Q-learning implementation to demonstrate how an agent can "memorize" a specific dungeon layout:
+```bash
+python tabular_q.py
+```
+
+## ⚙️ Configuration ##
+
+The DungeonGenerator and BoltCrypt environment can be customized via a config dictionary:
+
+### Parameter Description ###
+- **min_dist:** Minimum Manhattan distance between Start and Exit.
+- **mean_rooms:** Average number of rooms in the dungeon.
+- **connectivity:** Probability of creating loops between rooms (0.0 = tree, 1.0 = highly connected).
+- **puzzle_density:** Chance of a room containing a boulder puzzle.
+- **key_puzzle_prob:** Chance that the exit is locked and a key is hidden in a leaf room.
+
+### 🗺 Tile Legend ###
+- ⬜ Floor: Walkable space.
+- ⬛ Wall: Impassable.
+- 🚪 Door: Transitions between rooms (may be locked by puzzles).
+- 🟩 Exit: Your goal!
+- 🔴 Switch: Target for boulders.
+- 🟤 Boulder: Can be pushed onto switches.
+- 🔑 Key: Required to open locked exit rooms.
+
+### 🤖 Observation Space ###
+The environment returns a dictionary:
+- **grid:** A 10x10 local view of the current room.
+- **agent_pos:** (x, y) coordinates within the room.
+- **global_pos:** (gx, gy) coordinates in the dungeon layout.
+- **inventory:** Binary flag (1 if holding a key).
+
+## 📦 Wrappers & Extensions ##
+BoltCrypt includes several gym.Wrapper implementations to modify observations or rewards, making it a versatile testbed for different RL paradigms.
+
+### 📖 Natural Language Wrapper (NaturalLanguage) ###
+The crown jewel for testing reasoning LLMs. This wrapper transforms the numeric observation space into a rich, descriptive narrative: instead of a grid, the agent receives a text description of its surroundings.
+- **Dynamic Narrative:** Provides room dimensions, relative positions of doors, boulder locations, and puzzle statuses (e.g., "A loud mechanical clank echoes! The doors unlock.").
+- **LLM Ready:** Accepts string inputs like "NORTH", "SOUTH", "EAST", or "WEST" in the step() function.
+- **Physics Logic:** Includes an "Adventurer's Manual" to explain the game rules to an LLM via the observation stream.
+
+### 🌫️ Fog of War (FogOfWarWrapper) ###
+Turns the full room view into a partially observable environment.
+- **Vision Range:** Limits the grid observation to a (2v+1) × (2v+1) window centered on the agent.
+- **Memory Challenge:** Forces agents to map the room internally rather than having perfect spatial information.
+
+### 🗺 Room Discovery Reward (RoomDiscoveryReward) ###
+Combats sparse rewards in large dungeons by incentivizing exploration.
+- **Exploration Bonus:** Grants a small configurable reward (e.g., +0.1) the first time the agent enters a new room in the dungeon.
+- **Global Navigation:** Helps agents learn the layout of the "macro-dungeon" before they've found the final exit.
+
+### 🚀 Usage Example ###
+You can stack wrappers to create complex experimental setups:
+```python
+import gymnasium as gym
+from boltcrypt.env import BoltCrypt
+from boltcrypt.wrapper import NaturalLanguage, RoomDiscoveryReward
+
+env = BoltCrypt()
+env = RoomDiscoveryReward(env, discovery_reward=0.5)
+env = NaturalLanguage(env)
+
+# Now the agent receives text and extra rewards for exploration!
+obs, info = env.reset()
+print(obs)
+
+action = "NORTH"
+obs, reward, done, trunc, info = env.step(action)
+```
+
+Happy Dungeon Crawling! 🗡️🏹
boltcrypt-0.1.0/boltcrypt/examples/tabular_q.py
ADDED
@@ -0,0 +1,96 @@
+import boltcrypt
+import gymnasium as gym
+
+import numpy as np
+import random
+import matplotlib.pyplot as plt
+from collections import defaultdict
+
+
+def train_tabular_agent():
+    config = {
+        'min_dist': 5,          # Small dungeon for fast learning
+        'mean_rooms': 10,       # ~10 rooms total
+        'std_rooms': 0,         # Fixed size
+        'puzzle_density': 0.0,  # No puzzles, just navigation
+        'connectivity': 0.2,    # Mostly tree-like, few loops
+        'min_room_dim': 5,
+        'max_room_dim': 8
+    }
+
+    env = gym.make('BoltCrypt-v0', generator_config=config)
+
+    # 2. Hyperparameters
+    num_episodes = 1000
+    learning_rate = 0.1
+    discount_factor = 0.99
+
+    # Exploration (Epsilon Greedy)
+    epsilon = 1.0
+    epsilon_decay = 0.99
+    min_epsilon = 0.05
+
+    # The Q-Table
+    # Key: (global_x, global_y, local_x, local_y)
+    # Value: [Q_north, Q_south, Q_east, Q_west]
+    q_table = defaultdict(lambda: np.zeros(4))
+
+    episode_rewards = []
+    episode_lengths = []
+
+    print("--- Starting Training ---")
+    print(f"Map Config: Min Dist {config['min_dist']}, Total Rooms ~{config['mean_rooms']}")
+
+    for episode in range(num_episodes):
+        # Reset with FIXED SEED for every episode.
+        # This ensures the map layout (walls/doors) never changes,
+        # allowing the agent to memorize the route.
+        obs, _ = env.reset(seed=42)
+
+        gx, gy = obs['global_pos']
+        lx, ly = obs['agent_pos']
+        state = (gx, gy, lx, ly)
+
+        total_reward = 0
+        done = False
+        steps = 0
+
+        while not done:
+            if random.random() < epsilon:
+                action = env.action_space.sample()  # Explore
+            else:
+                action = np.argmax(q_table[state])  # Exploit
+
+            next_obs, reward, done, trunc, _ = env.step(action)
+
+            next_gx, next_gy = next_obs['global_pos']
+            next_lx, next_ly = next_obs['agent_pos']
+            next_state = (next_gx, next_gy, next_lx, next_ly)
+
+            best_next_q = np.max(q_table[next_state])
+            current_q = q_table[state][action]
+
+            q_table[state][action] = current_q + learning_rate * (reward + discount_factor * best_next_q - current_q)
+
+            state = next_state
+            total_reward += reward
+            steps += 1
+
+            if trunc: break
+
+        epsilon = max(min_epsilon, epsilon * epsilon_decay)
+
+        episode_rewards.append(total_reward)
+        episode_lengths.append(steps)
+
+        if (episode + 1) % 50 == 0:
+            avg_rew = np.mean(episode_rewards[-50:])
+            avg_len = np.mean(episode_lengths[-50:])
+            print(
+                f"Episode {episode + 1:03d} | Avg Reward: {avg_rew:6.2f} | Avg Steps: {avg_len:6.1f} | Epsilon: {epsilon:.2f}")
+
+    return episode_rewards, episode_lengths
+
+
+if __name__ == "__main__":
+    rewards, lengths = train_tabular_agent()
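The heart of `tabular_q.py` is the one-line Q-learning (Bellman) update on a `defaultdict` table. It can be checked in isolation with the same table shape and hyperparameters; the state tuples below are illustrative placeholders:

```python
import numpy as np
from collections import defaultdict

learning_rate = 0.1
discount_factor = 0.99

# Same table shape as tabular_q.py: state -> 4 action values (N, S, E, W)
q_table = defaultdict(lambda: np.zeros(4))

state = (0, 0, 1, 1)        # (global_x, global_y, local_x, local_y)
next_state = (0, 0, 2, 1)
action = 2                  # EAST
reward = 1.0

best_next_q = np.max(q_table[next_state])  # 0.0 for an unseen state
current_q = q_table[state][action]         # 0.0 as well
q_table[state][action] = current_q + learning_rate * (
    reward + discount_factor * best_next_q - current_q)
# Starting from zero, one update moves the estimate to learning_rate * reward
```

With both estimates at zero, the new value is exactly `learning_rate * reward = 0.1`, which is why early training moves slowly toward rewarding states.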
boltcrypt-0.1.0/boltcrypt/game/boltcrypt_game.py
ADDED
@@ -0,0 +1,136 @@
+import pygame
+from boltcrypt.env import BoltCrypt
+
+# --- PYGAME HARNESS ---
+# Constants for visualization
+COLORS = {
+    0: (230, 215, 180),  # Floor
+    1: (40, 40, 40),     # Wall
+    2: (139, 69, 19),    # Door
+    3: (0, 255, 0),      # Exit
+    4: (200, 50, 50),    # Switch
+    5: (100, 80, 50),    # Boulder
+    6: (255, 215, 0),    # Key (gold)
+    'AGENT': (50, 100, 200),
+    'BG': (20, 20, 20),
+    'TEXT': (255, 255, 255)
+}
+
+def render_gym(screen, font, env, obs, reward, done, text_status):
+    screen.fill(COLORS['BG'])
+    grid = obs['grid']
+    agent_pos = obs['agent_pos']
+    global_pos = obs['global_pos']
+    has_key = obs['inventory']
+
+    # Draw map
+    rows, cols = grid.shape
+    TILE_SIZE = 48
+    OFFSET_X, OFFSET_Y = 50, 50
+
+    for y in range(rows):
+        for x in range(cols):
+            rect = pygame.Rect(OFFSET_X + x*TILE_SIZE, OFFSET_Y + (rows-1-y)*TILE_SIZE, TILE_SIZE, TILE_SIZE)
+            tile_id = grid[y, x]
+
+            # Void vs room
+            if x >= env.curr_room.w or y >= env.curr_room.h:
+                pygame.draw.rect(screen, (0,0,0), rect)
+            else:
+                if tile_id == 5:
+                    pygame.draw.circle(screen, COLORS.get(tile_id), rect.center, 16)
+                else:
+                    color = COLORS.get(tile_id, (255, 0, 255))
+                    pygame.draw.rect(screen, color, rect)
+                pygame.draw.rect(screen, (0,0,0), rect, 1)
+
+                if tile_id == 6:
+                    pygame.draw.circle(screen, (255, 255, 0), rect.center, 10)
+
+    # Draw agent
+    ax, ay = agent_pos
+    agent_rect = pygame.Rect(OFFSET_X + ax*TILE_SIZE, OFFSET_Y + (rows-1-ay)*TILE_SIZE, TILE_SIZE, TILE_SIZE)
+    pygame.draw.circle(screen, COLORS['AGENT'], agent_rect.center, 16)
+
+    # HUD
+    status = "SOLVED" if env.curr_room.check_solved() else "PUZZLE ACTIVE"
+    inv_text = "KEY: [YES]" if has_key else "KEY: [NO]"
+    inv_col = (0, 255, 0) if has_key else (150, 150, 150)
+
+    # Locked room warning
+    is_locked_room = env.curr_room.is_locked
+    room_type = "LOCKED EXIT (Need Key)" if is_locked_room else "Normal Room"
+
+    lines = [
+        f"Pos: {global_pos} | Local: {agent_pos}",
+        f"Room Status: {status}",
+        f"Room Type: {room_type}",
+        f"Inventory: {inv_text}",
+        f"Reward: {reward:.2f}",
+        f"{text_status}"
+    ]
+
+    for i, line in enumerate(lines):
+        col = COLORS['TEXT']
+        if "Inventory" in line: col = inv_col
+        if "LOCKED EXIT" in line: col = (255, 50, 50)
+
+        surf = font.render(line, True, col)
+        screen.blit(surf, (10, 600 + i*25))
+
+def play_dungeon():
+    # Config with key puzzle enabled
+    config = {
+        'min_dist': 3,
+        'mean_rooms': 10,
+        'std_rooms': 2,
+        'puzzle_density': 0.3,   # 30% chance of a boulder puzzle per room
+        'key_puzzle_prob': 0.25,
+        'min_room_dim': 5,
+        'max_room_dim': 8
+    }
+    env = BoltCrypt(generator_config=config)
+    obs, _ = env.reset()
+    total_reward = 0
+
+    pygame.init()
+    screen = pygame.display.set_mode((600, 750))
+    pygame.display.set_caption("Boltcrypt")
+    font = pygame.font.Font(None, 24)
+    clock = pygame.time.Clock()
+
+    running = True
+    done = False
+
+    while running:
+        action = None
+        for event in pygame.event.get():
+            if event.type == pygame.QUIT: running = False
+            if event.type == pygame.KEYDOWN:
+                if event.key == pygame.K_r:
+                    obs, _ = env.reset()
+                    total_reward = 0
+                    done = False
+                if not done:
+                    if event.key == pygame.K_UP: action = 0
+                    elif event.key == pygame.K_DOWN: action = 1
+                    elif event.key == pygame.K_RIGHT: action = 2
+                    elif event.key == pygame.K_LEFT: action = 3
+
+        if not done:
+            status = "Press R to Restart"
+        else:
+            status = "YOU ESCAPED! Press R to Restart"
+        if action is not None:
+            obs, reward, done, trunc, info = env.step(action)
+            total_reward += reward
+            if reward == 1.0 and obs['inventory'] == 1: status = "🔑 KEY FOUND!"
+
+        render_gym(screen, font, env, obs, total_reward, done, status)
+        pygame.display.flip()
+        clock.tick(30)
+
+    pygame.quit()
+
+if __name__ == "__main__":
+    play_dungeon()
boltcrypt-0.1.0/boltcrypt/wrapper/fogofwar.py
ADDED
@@ -0,0 +1,22 @@
+import gymnasium as gym
+import numpy as np
+
+from boltcrypt.env.boltgym import TILE_WALL
+
+class FogOfWarWrapper(gym.ObservationWrapper):
+    def __init__(self, env, vision_range=1):
+        super().__init__(env)
+        self.vision_range = vision_range
+
+    def observation(self, obs):
+        grid = obs['grid']
+        lx, ly = obs['agent_pos']
+
+        # Pad grid to handle edges
+        padded = np.pad(grid, self.vision_range, constant_values=TILE_WALL)
+
+        # Slice the window (adjusting for padding)
+        v = self.vision_range
+        window = padded[ly:ly + 2*v + 1, lx:lx + 2*v + 1]
+
+        obs['grid'] = window
+        return obs
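The wrapper's pad-and-slice trick can be checked standalone. Here `TILE_WALL = 1` is an assumed placeholder for the real constant in `boltcrypt.env.boltgym`, and `fog_window` is an illustrative free function mirroring the wrapper's `observation` body:

```python
import numpy as np

TILE_WALL = 1  # assumed value for illustration only

def fog_window(grid, lx, ly, v):
    # Pad with wall tiles so windows near an edge stay (2v+1) x (2v+1)
    padded = np.pad(grid, v, constant_values=TILE_WALL)
    # After padding, cell (ly, lx) of the original grid sits at (ly+v, lx+v),
    # so slicing from (ly, lx) centers the window on the agent
    return padded[ly:ly + 2 * v + 1, lx:lx + 2 * v + 1]

grid = np.zeros((5, 5), dtype=int)   # an all-floor room
w = fog_window(grid, lx=0, ly=0, v=1)  # agent in a corner: walls fill the border
```

With the agent in a corner, the out-of-room edges of the window read as wall, which is exactly how partial observability is enforced at room boundaries.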
boltcrypt-0.1.0/boltcrypt/wrapper/natlang.py
ADDED
@@ -0,0 +1,200 @@
+import gymnasium as gym
+
+class NaturalLanguage(gym.Wrapper):
+    def __init__(self, env):
+        super().__init__(env)
+        self.last_gx, self.last_gy = None, None
+        self.last_solved = True
+        self.last_boulder_positions = []
+        self.entry_direction = None
+        self.last_obs = None
+
+        # Help text to be displayed once or upon request
+        self.manual_text = (
+            "\n--- ADVENTURER'S MANUAL ---\n"
+            "COMMANDS: This interface is primitive. It only accepts 4 commands:\n"
+            "  'NORTH', 'SOUTH', 'EAST', 'WEST'\n\n"
+            "PHYSICS:\n"
+            "- Movement: Typing a direction moves you 1 tile.\n"
+            "- Pushing: To move a boulder, walk INTO it. You will push it 1 tile forward.\n"
+            "- Constraints: Boulders are heavy. You cannot push them against walls or corners.\n"
+            "- Door Locks: If a room has switches, all doors stay locked until every switch is covered.\n"
+            "---------------------------\n"
+        )
+
+    def reset(self, **kwargs):
+        obs, info = self.env.reset(**kwargs)
+
+        self.last_gx, self.last_gy = obs['global_pos']
+        self.last_solved = self.env.curr_room.check_solved()
+        self.last_boulder_positions = [list(b) for b in self.env.curr_room.boulders]
+        self.entry_direction = "Start"
+        self.last_obs = obs
+
+        # Narrative initialization
+        objective_text = (
+            "MISSION: Escape the Labyrinth. Find the EXIT room located at a great distance.\n"
+            f"{self.manual_text}"
+            "You wake up on the cold floor. The journey begins."
+        )
+
+        return self._generate_narrative(obs, 0, False, objective_text), info
+
+    def step(self, action_input: str | int):
+        # Handle string input for LLMs
+        if isinstance(action_input, str):
+            cmd = action_input.lower().strip()
+            if cmd == "north": action = 0
+            elif cmd == "south": action = 1
+            elif cmd == "east": action = 2
+            elif cmd == "west": action = 3
+            elif cmd == "help":
+                return self._generate_narrative(self.last_obs, 0, False, self.manual_text), 0, False, False, {}
+            else:
+                return "Unknown command. Use NORTH, SOUTH, EAST, WEST, or HELP.", 0, False, False, {}
+        else:
+            action = action_input
+
+        # 1. Pre-step state capture
+        prev_room = self.env.curr_room
+        prev_boulders = [list(b) for b in prev_room.boulders]
+
+        # 2. Execute step
+        obs, reward, done, trunc, info = self.env.step(action)
+        self.last_obs = obs
+
+        # 3. Analyze what happened
+        event_log = []
+
+        # A. Room transition check
+        curr_gx, curr_gy = obs['global_pos']
+        just_entered_room = (curr_gx != self.last_gx or curr_gy != self.last_gy)
+
+        if just_entered_room:
+            # Determine entry direction
+            dx = curr_gx - self.last_gx
+            dy = curr_gy - self.last_gy
+            if dy == 1: self.entry_direction = "South"    # Moved North, entered from the South
+            elif dy == -1: self.entry_direction = "North"
+            elif dx == 1: self.entry_direction = "West"
+            elif dx == -1: self.entry_direction = "East"
+
+            event_log.append(f"You pass through the door and enter a new room from the {self.entry_direction}.")
+
+            # Reset room-specific memory
+            self.last_gx, self.last_gy = curr_gx, curr_gy
+            self.last_boulder_positions = [list(b) for b in self.env.curr_room.boulders]
+            self.last_solved = self.env.curr_room.check_solved()
+
+        else:
+            # B. Push/blocked check: did the agent move, and did a boulder move?
+            curr_boulders = self.env.curr_room.boulders
+            boulder_moved = False
+
+            for i, b_new in enumerate(curr_boulders):
+                if list(b_new) != prev_boulders[i]:
+                    # Boulder moved!
+                    boulder_moved = True
+                    # Check whether it landed on a switch
+                    on_switch = tuple(b_new) in self.env.curr_room.switches
+                    click = " *CLICK!*" if on_switch else ""
+                    event_log.append(f"You push the heavy boulder. It grinds across the floor.{click}")
+                    break
+
+            if not boulder_moved and reward < 0:
+                # Heuristic: negative reward usually implies a wall bonk or wasted time,
+                # but strict 'wall bump' detection would require checking that the
+                # agent's position did not change even though an action was taken.
+                pass  # (Simplified for brevity; an LLM can usually infer this from an unchanged position)
+
+        # C. Puzzle status change
+        is_now_solved = self.env.curr_room.check_solved()
+        if is_now_solved and not self.last_solved:
+            event_log.append("A loud mechanical clank echoes! The doors unlock.")
+        elif not is_now_solved and self.last_solved and self.env.curr_room.has_puzzle:
+            event_log.append("The mechanism disengages. The doors slam shut!")
+
+        self.last_solved = is_now_solved
+        self.last_boulder_positions = [list(b) for b in self.env.curr_room.boulders]
+
+        # 4. Generate full text
+        text_obs = self._generate_narrative(obs, reward, done, " ".join(event_log))
+
+        return text_obs, reward, done, trunc, info
+
+    def _generate_narrative(self, obs, reward, done, event_text):
+        room = self.env.curr_room
+        lx, ly = obs['agent_pos']
+
+        # --- 1. The header ---
+        puzzle_status = "SOLVED" if room.is_solved else "LOCKED"
+        narrative = [f"--- {puzzle_status} ROOM ({room.w}x{room.h}) ---"]
+
+        if event_text:
+            narrative.append(f"> {event_text}")
+
+        # --- 2. Relative positioning (the "where am I" logic) ---
+        narrative.append(f"You are standing at coordinate ({lx}, {ly}).")
+
+        # List doors relative to the agent
+        doors_desc = []
+        for direction, offset in room.doors.items():
+            d_name = direction.name.title()
+
+            # Distance calculation:
+            # North (top), South (bottom), East (right), West (left);
+            # the North door sits at (offset, h-1)
+            dist_str = ""
+            if direction.name == "NORTH":
+                dist = (room.h - 1) - ly
+                dist_str = f"{dist} steps North"
+            elif direction.name == "SOUTH":
+                dist = ly
+                dist_str = f"{dist} steps South"
+            elif direction.name == "EAST":
+                dist = (room.w - 1) - lx
+                dist_str = f"{dist} steps East"
+            elif direction.name == "WEST":
+                dist = lx
+                dist_str = f"{dist} steps West"
+
+            state = "(OPEN)" if room.is_solved else "(LOCKED)"
+            doors_desc.append(f"- {d_name} Door: {dist_str} {state}")
+
+        narrative.append("Exits:")
+        narrative.extend(doors_desc)
+
+        # --- 3. Puzzle elements ---
+        if room.has_puzzle:
+            narrative.append("Puzzle Elements:")
+            # Boulders
+            for i, b in enumerate(room.boulders):
+                bx, by = b[1], b[0]  # Note: the grid is (y, x); the text uses (x, y)
+
+                # Check whether it sits on a switch
+                on_switch = (by, bx) in room.switches
+                status = "sitting on a switch" if on_switch else "on the bare floor"
+
+                # Relative direction from the agent
+                rel_dir = []
+                if by > ly: rel_dir.append("North")
+                elif by < ly: rel_dir.append("South")
+                if bx > lx: rel_dir.append("East")
+                elif bx < lx: rel_dir.append("West")
+
+                dir_str = "-".join(rel_dir) if rel_dir else "HERE"
+                narrative.append(f"- Boulder {i+1}: Located at ({bx}, {by}) [{dir_str} of you]. It is {status}.")
+
+            # Empty switches
+            for s in room.switches:
+                sy, sx = s
+                # Is there a boulder on it?
+                covered = any(b[0] == sy and b[1] == sx for b in room.boulders)
+                if not covered:
+                    narrative.append(f"- Empty Switch: Located at ({sx}, {sy}).")
+
+        if done:
+            narrative.append("\n*** YOU HAVE FOUND THE EXIT! ***")
+
+        return "\n".join(narrative)
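The string-to-action mapping performed at the top of the wrapper's `step()` can be factored out and exercised on its own; `parse_command` is a name introduced here for illustration, not part of the package:

```python
def parse_command(action_input):
    """Map the wrapper's text commands onto the base environment's discrete
    actions (0=NORTH, 1=SOUTH, 2=EAST, 3=WEST); None means unrecognized."""
    mapping = {"north": 0, "south": 1, "east": 2, "west": 3}
    return mapping.get(str(action_input).lower().strip())

action = parse_command("  NORTH ")  # tolerant of case and whitespace, like the wrapper
```

Unrecognized strings map to `None`, mirroring the wrapper's "Unknown command" branch rather than raising.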
boltcrypt-0.1.0/boltcrypt/wrapper/roomdiscoveryreward.py
ADDED
@@ -0,0 +1,24 @@
+import gymnasium as gym
+
+class RoomDiscoveryReward(gym.Wrapper):
+    def __init__(self, env, discovery_reward=0.1):
+        super().__init__(env)
+        self.discovery_reward = discovery_reward
+        self.visited_rooms = set()
+
+    def reset(self, **kwargs):
+        obs, info = self.env.reset(**kwargs)
+        start_pos = tuple(obs['global_pos'])
+        self.visited_rooms = {start_pos}
+        return obs, info
+
+    def step(self, action):
+        obs, reward, terminated, truncated, info = self.env.step(action)
+
+        curr_pos = tuple(obs['global_pos'])
+
+        if curr_pos not in self.visited_rooms:
+            reward += self.discovery_reward
+            self.visited_rooms.add(curr_pos)
+
+        return obs, reward, terminated, truncated, info
boltcrypt-0.1.0/boltcrypt.egg-info/PKG-INFO
ADDED
@@ -0,0 +1,125 @@
+Metadata-Version: 2.4
+Name: boltcrypt
+Version: 0.1.0
+Summary: Boltcrypt environment
+Author: foreverska
+Project-URL: Github:, https://github.com/foreverska/boltcrypt
+Keywords: gymnasium,gym
+Description-Content-Type: text/markdown
+License-File: license.txt
+Requires-Dist: gymnasium>=1.0.0
+Requires-Dist: numpy
+Dynamic: author
+Dynamic: description
+Dynamic: description-content-type
+Dynamic: keywords
+Dynamic: license-file
+Dynamic: project-url
+Dynamic: requires-dist
+Dynamic: summary
+
+# BoltCrypt: Procedural Dungeon RL Environment #
+BoltCrypt is a lightweight, Gymnasium-compatible environment featuring procedurally generated dungeons. It challenges reinforcement learning agents (and humans!) to navigate complex layouts, solve Sokoban-style boulder puzzles, and manage inventory items like keys to reach the exit.
+## Features ##
+Procedural Generation: Every reset generates a unique dungeon layout based on configurable parameters (density, connectivity, room size).
+Puzzle Mechanics: Includes boulder-pushing puzzles and locked doors that require finding a key.
+Gymnasium API: Fully compatible with standard RL workflows.
+Pygame Visualization: A built-in harness to play manually or watch your agent learn in real time.
+Flexible Observation Space: Provides local room grids, global coordinates, and inventory status.
+## Installation ##
+This project assumes a virtual environment (conda or venv); ensure yours is active:
+``` bash
+# Example if using conda directly
+conda activate <your-env-name>
+pip install gymnasium pygame numpy matplotlib boltcrypt
+```
+
+## Getting Started ##
+### Play Manually ###
+Test the dungeon generation and mechanics yourself using the Pygame harness:
+``` bash
+python boltcrypt_game.py
+```
+
+**Arrows:** Move the agent.
+**R:** Reset/Regenerate the dungeon.
+**Goal:** Find the key (if required) and reach the green Exit tile.
+
+### Train an Agent ###
+The project includes a tabular Q-learning implementation to demonstrate how an agent can "memorize" a specific dungeon layout:
+``` bash
+python tabular_q.py
+```
+
+## Configuration ##
+
+The DungeonGenerator and BoltCrypt environment can be customized via a config dictionary:
+
+### Parameter Description ###
+**min_dist:** Minimum Manhattan distance between Start and Exit.
+**mean_rooms:** Average number of rooms in the dungeon.
+**connectivity:** Probability of creating loops between rooms (0.0 = tree, 1.0 = highly connected).
+**puzzle_density:** Chance of a room containing a boulder puzzle.
+**key_puzzle_prob:** Chance that the exit is locked and a key is hidden in a leaf room.
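A config might look like the following sketch. The keys come from the parameter list above, but the values and the exact way the dict is passed to BoltCrypt (e.g. a `config` keyword) are assumptions, not documented API:

``` python
# Illustrative values only; the parameter names are from the list above.
config = {
    "min_dist": 6,          # Start and Exit at least 6 apart (Manhattan)
    "mean_rooms": 10,       # average dungeon size
    "connectivity": 0.25,   # mostly tree-like, with occasional loops
    "puzzle_density": 0.3,  # ~30% of rooms get a boulder puzzle
    "key_puzzle_prob": 0.5,
}

# env = BoltCrypt(config=config)  # hypothetical; check the constructor signature
```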
+
+
+### Tile Legend ###
+Floor: Walkable space.
+Wall: Impassable.
+Door: Transitions between rooms (may be locked by puzzles).
+Exit: Your goal!
+Switch: Target for boulders.
+Boulder: Can be pushed onto switches.
+Key: Required to open locked exit rooms.
+
+
+### Observation Space ###
+The environment returns a dictionary:
+**grid:** A 10x10 local view of the current room.
+**agent_pos:** (x, y) coordinates within the room.
+**global_pos:** (gx, gy) coordinates in the dungeon layout.
+**inventory:** Binary flag (1 if holding a key).
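As a sketch of consuming that dictionary — the key names are from the list above, while the concrete shapes and dummy values here are assumptions for illustration:

``` python
# A stand-in observation with the documented keys; real values come from
# env.reset() / env.step().
obs = {
    "grid": [[0] * 10 for _ in range(10)],  # 10x10 local view of the current room
    "agent_pos": (3, 4),                    # (x, y) within the room
    "global_pos": (1, 2),                   # (gx, gy) in the dungeon layout
    "inventory": 0,                         # 1 if holding a key
}

has_key = bool(obs["inventory"])
x, y = obs["agent_pos"]
tile_under_agent = obs["grid"][y][x]        # index as [row][col] = [y][x]
```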
+
+## Wrappers & Extensions ##
+BoltCrypt includes several gym.Wrapper implementations to modify observations or rewards, making it a versatile testbed for different RL paradigms.
+
+### Natural Language Wrapper (NaturalLanguage) ###
+The crown jewel for testing reasoning LLMs. This wrapper transforms the numeric observation space into a rich, descriptive narrative. Instead of a grid, the agent receives a text-based description of its surroundings.
+**Dynamic Narrative:** Provides room dimensions, relative positions of doors, boulder locations, and puzzle statuses (e.g., "A loud mechanical clank echoes! The doors unlock.").
+**LLM Ready:** Accepts string inputs like "NORTH", "SOUTH", "EAST", or "WEST" in the step() function.
+**Physics Logic:** Includes an "Adventurer's Manual" to explain game rules to an LLM via the observation stream.
+
+### Fog of War (FogOfWarWrapper) ###
+Transforms the global room view into a partially observable environment.
+**Vision Range:** Limits the grid observation to a (2v+1) x (2v+1) window centered on the agent.
+**Memory Challenge:** Forces agents to map the room internally rather than having perfect spatial information.
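The cropping idea can be sketched without the real wrapper: mask every tile outside the (2v+1) x (2v+1) window around the agent. The mask value -1 and the function name below are illustrative, not BoltCrypt's actual encoding:

``` python
def fog_of_war(grid, agent_xy, v):
    # Keep tiles within Chebyshev distance v of the agent; mask the rest.
    ax, ay = agent_xy
    return [
        [cell if abs(x - ax) <= v and abs(y - ay) <= v else -1
         for x, cell in enumerate(row)]
        for y, row in enumerate(grid)
    ]

room = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
visible = fog_of_war(room, (1, 1), 1)  # 3x3 window around (x=1, y=1)
```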
+
+### Room Discovery Reward (RoomDiscoveryReward) ###
+Combats sparse rewards in large dungeons by incentivizing exploration.
+**Exploration Bonus:** Grants a small configurable reward (e.g., +0.1) the first time the agent enters a new room in the dungeon.
+**Global Navigation:** Helps agents learn the layout of the "macro-dungeon" before they've found the final exit.
+
+## Usage Example ##
+You can stack wrappers to create complex experimental setups:
+``` python
+import gymnasium as gym
+from boltcrypt.env import BoltCrypt
+from boltcrypt.wrapper import NaturalLanguage, RoomDiscoveryReward
+
+env = BoltCrypt()
+env = RoomDiscoveryReward(env, discovery_reward=0.5)
+env = NaturalLanguage(env)
+
+# Now the agent receives text and extra rewards for exploration!
+obs, info = env.reset()
+print(obs)
+
+action = "NORTH"
+obs, reward, done, trunc, info = env.step(action)
+```
+
+
+Happy Dungeon Crawling!
boltcrypt-0.1.0/boltcrypt.egg-info/SOURCES.txt
ADDED
@@ -0,0 +1,16 @@
+README.md
+license.txt
+setup.py
+boltcrypt/__init__.py
+boltcrypt.egg-info/PKG-INFO
+boltcrypt.egg-info/SOURCES.txt
+boltcrypt.egg-info/dependency_links.txt
+boltcrypt.egg-info/entry_points.txt
+boltcrypt.egg-info/requires.txt
+boltcrypt.egg-info/top_level.txt
+boltcrypt/examples/tabular_q.py
+boltcrypt/game/boltcrypt_game.py
+boltcrypt/wrapper/__init__.py
+boltcrypt/wrapper/fogofwar.py
+boltcrypt/wrapper/natlang.py
+boltcrypt/wrapper/roomdiscoveryreward.py
boltcrypt-0.1.0/boltcrypt.egg-info/dependency_links.txt
ADDED
@@ -0,0 +1 @@
+
boltcrypt-0.1.0/boltcrypt.egg-info/top_level.txt
ADDED
@@ -0,0 +1 @@
+boltcrypt
boltcrypt-0.1.0/license.txt
ADDED
@@ -0,0 +1,7 @@
+Copyright 2025 Adam Parker
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
boltcrypt-0.1.0/setup.py
ADDED
@@ -0,0 +1,22 @@
+from importlib.metadata import entry_points
+
+from setuptools import setup
+
+long_description = open('README.md').read()
+
+setup(
+    name="boltcrypt",
+    description="Boltcrypt environment",
+    long_description=long_description,
+    long_description_content_type="text/markdown",
+    version="0.1.0",
+    author="foreverska",
+    install_requires=["gymnasium>=1.0.0", "numpy"],
+    keywords="gymnasium, gym",
+    license_files=('license.txt',),
+    project_urls={"Github:": "https://github.com/foreverska/boltcrypt"},
+    entry_points={
+        'gym.envs': ['boltcrypt=boltcrypt.env:BoltCrypt'],
+        'console_scripts': 'boltcrypt=boltcrypt.game.boltcrypt_game:play_dungeon'
+    }
+)
|