boltcrypt 0.1.0 (tar.gz)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,125 @@
Metadata-Version: 2.4
Name: boltcrypt
Version: 0.1.0
Summary: Boltcrypt environment
Author: foreverska
Project-URL: Github:, https://github.com/foreverska/boltcrypt
Keywords: gymnasium,gym
Description-Content-Type: text/markdown
License-File: license.txt
Requires-Dist: gymnasium>=1.0.0
Requires-Dist: numpy
Dynamic: author
Dynamic: description
Dynamic: description-content-type
Dynamic: keywords
Dynamic: license-file
Dynamic: project-url
Dynamic: requires-dist
Dynamic: summary

# BoltCrypt: Procedural Dungeon RL Environment #

BoltCrypt is a lightweight, Gymnasium-compatible environment (Gymnasium being the maintained successor to OpenAI Gym) featuring procedurally generated dungeons. It challenges Reinforcement Learning agents (and humans!) to navigate complex layouts, solve Sokoban-style boulder puzzles, and manage inventory items like keys to reach the exit.

## 🏰 Features ##
**Procedural Generation:** Every reset generates a unique dungeon layout based on configurable parameters (density, connectivity, room size).
**Puzzle Mechanics:** Includes boulder-pushing puzzles and locked doors that require finding a key.
**Gymnasium API:** Fully compatible with standard RL workflows.
**Pygame Visualization:** A built-in harness to play manually or watch your agent learn in real time.
**Flexible Observation Space:** Provides local room grids, global coordinates, and inventory status.

## πŸ›  Installation ##
This project is developed inside a conda/venv environment; make sure yours is active before installing:
```bash
# Example if using conda directly
conda activate <your-env-name>
pip install gymnasium pygame numpy matplotlib boltcrypt
```

## πŸš€ Getting Started ##
### Play Manually ###
Test the dungeon generation and mechanics yourself using the Pygame harness:
```bash
python boltcrypt_game.py
```

**Arrows:** Move the agent.
**R:** Reset/Regenerate the dungeon.
**Goal:** Find the key (if required) and reach the green Exit tile.

### Train an Agent ###
The project includes a tabular Q-Learning implementation to demonstrate how an agent can "memorize" a specific dungeon layout:
```bash
python tabular_q.py
```
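The update rule at the heart of that script is the standard tabular Q-learning step. A minimal, self-contained sketch (the state-tuple keys and 4-action table mirror the description in this README; the helper name `q_update` is illustrative, not part of the package):

```python
import numpy as np
from collections import defaultdict

# Q-table: maps a hashable state tuple to one value per action (4 moves).
q_table = defaultdict(lambda: np.zeros(4))

def q_update(state, action, reward, next_state, lr=0.1, gamma=0.99):
    """Move Q(s, a) toward the TD target r + gamma * max_a' Q(s', a')."""
    best_next = np.max(q_table[next_state])
    q_table[state][action] += lr * (reward + gamma * best_next - q_table[state][action])

# One update after receiving a reward of 1.0 for moving North.
q_update(state=(0, 0, 1, 1), action=0, reward=1.0, next_state=(0, 1, 1, 1))
```

With a fixed reset seed, repeating this update over the same layout is what lets the agent memorize a single route.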

## βš™οΈ Configuration ##

The `DungeonGenerator` and `BoltCrypt` environment can be customized via a config dictionary:

### Parameter Description ###
**min_dist:** Minimum Manhattan distance between Start and Exit.
**mean_rooms:** Average number of rooms in the dungeon.
**connectivity:** Probability of creating loops between rooms (0.0 = tree-like, 1.0 = highly connected).
**puzzle_density:** Chance of a room containing a boulder puzzle.
**key_puzzle_prob:** Chance that the exit is locked and a key is hidden in a leaf room.
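Put together, a config dictionary might look like the sketch below (the key names come from the list above; the specific values are illustrative, not package defaults):

```python
# Illustrative configuration; keys mirror the parameter list above.
config = {
    'min_dist': 5,           # Start and Exit at least 5 rooms apart (Manhattan)
    'mean_rooms': 10,        # Average dungeon size
    'connectivity': 0.2,     # Mostly tree-like, occasional loops
    'puzzle_density': 0.3,   # ~30% of rooms contain a boulder puzzle
    'key_puzzle_prob': 0.25, # 25% chance the exit is locked behind a key
}
```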

### πŸ—Ί Tile Legend ###
⬜ Floor: Walkable space.
⬛ Wall: Impassable.
πŸšͺ Door: Transitions between rooms (may be locked by puzzles).
🟩 Exit: Your goal!
πŸ”΄ Switch: Target for boulders.
🟀 Boulder: Can be pushed onto switches.
🟑 Key: Required to open locked exit rooms.

### πŸ€– Observation Space ###
The environment returns a dictionary:
**grid:** A 10x10 local view of the current room.
**agent_pos:** (x, y) coordinates within the room.
**global_pos:** (gx, gy) coordinates in the dungeon layout.
**inventory:** Binary flag (1 if holding a key).
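As a sketch of how a policy might unpack these fields, here is a hand-built stand-in observation (the dict literal is illustrative; real observations come from `env.reset()` and `env.step()`):

```python
import numpy as np

# Stand-in observation shaped like the fields described above.
obs = {
    'grid': np.zeros((10, 10), dtype=int),  # local room view (tile ids)
    'agent_pos': (3, 4),                    # (x, y) within the room
    'global_pos': (1, 2),                   # (gx, gy) in the dungeon layout
    'inventory': 1,                         # 1 if holding a key
}

x, y = obs['agent_pos']
tile_under_agent = obs['grid'][y, x]  # the grid is indexed (row, col) = (y, x)
has_key = bool(obs['inventory'])
```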

## πŸ“¦ Wrappers & Extensions ##
BoltCrypt includes several `gym.Wrapper` implementations to modify observations or rewards, making it a versatile testbed for different RL paradigms.

### πŸ“ Natural Language Wrapper (NaturalLanguage) ###
The crown jewel for testing reasoning LLMs. This wrapper transforms the numeric observation space into a rich, descriptive narrative: instead of a grid, the agent receives a text description of its surroundings.
**Dynamic Narrative:** Provides room dimensions, relative positions of doors, boulder locations, and puzzle statuses (e.g., "A loud mechanical clank echoes! The doors unlock.").
**LLM Ready:** Accepts string inputs like "NORTH", "SOUTH", "EAST", or "WEST" in the `step()` function.
**Physics Logic:** Includes an "Adventurer's Manual" that explains the game rules to an LLM via the observation stream.

### 🌫️ Fog of War (FogOfWarWrapper) ###
Transforms the fully observed room view into a partially observable environment.
**Vision Range:** Limits the grid observation to a (2v+1) × (2v+1) window centered on the agent.
**Memory Challenge:** Forces agents to map the room internally rather than having perfect spatial information.
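The crop behind this wrapper is ordinary array padding and slicing. A standalone sketch of the (2v+1) × (2v+1) window (the function name `fog_of_war` and the wall value `1` are assumptions for illustration):

```python
import numpy as np

def fog_of_war(grid, agent_xy, v=1, wall=1):
    """Crop a (2v+1) x (2v+1) window centered on the agent, padding edges with wall."""
    x, y = agent_xy
    padded = np.pad(grid, v, constant_values=wall)
    # Padding shifts the agent's cell from (y, x) to (y + v, x + v),
    # so the slice starting at (y, x) is exactly centered on it.
    return padded[y:y + 2 * v + 1, x:x + 2 * v + 1]

# An agent in the corner of an empty 5x5 room sees padded wall on two sides.
window = fog_of_war(np.zeros((5, 5), dtype=int), (0, 0), v=1)
```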

### πŸ† Room Discovery Reward (RoomDiscoveryReward) ###
Combats sparse rewards in large dungeons by incentivizing exploration.
**Exploration Bonus:** Grants a small configurable reward (e.g., +0.1) the first time the agent enters a new room in the dungeon.
**Global Navigation:** Helps agents learn the layout of the "macro-dungeon" before they find the final exit.
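The bookkeeping behind the bonus is a visited-set check. A minimal sketch of the idea outside any wrapper class (the helper `shaped_reward` is illustrative, not part of the package):

```python
visited_rooms = set()

def shaped_reward(base_reward, global_pos, bonus=0.1):
    """Add a one-time exploration bonus the first time a room is entered."""
    room = tuple(global_pos)
    if room not in visited_rooms:
        visited_rooms.add(room)
        return base_reward + bonus
    return base_reward

first = shaped_reward(0.0, (1, 2))   # new room, bonus applied
second = shaped_reward(0.0, (1, 2))  # revisit, no bonus
```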

### πŸ›  Usage Example ###
You can stack wrappers to create complex experimental setups:
```python
from boltcrypt.env import BoltCrypt
from boltcrypt.wrapper import NaturalLanguage, RoomDiscoveryReward

env = BoltCrypt()
env = RoomDiscoveryReward(env, discovery_reward=0.5)
env = NaturalLanguage(env)

# Now the agent receives text and extra rewards for exploration!
obs, info = env.reset()
print(obs)

action = "NORTH"
obs, reward, terminated, truncated, info = env.step(action)
```

Happy Dungeon Crawling! πŸ—οΈπŸΉ
@@ -0,0 +1,7 @@
from gymnasium.envs.registration import register

register(
    id='BoltCrypt-v0',
    entry_point='boltcrypt.env:BoltCrypt',
    max_episode_steps=1000
)
@@ -0,0 +1,96 @@
import boltcrypt
import gymnasium as gym

import numpy as np
import random
import matplotlib.pyplot as plt
from collections import defaultdict


def train_tabular_agent():
    # 1. Environment Configuration
    config = {
        'min_dist': 5,          # Small dungeon for fast learning
        'mean_rooms': 10,       # ~10 rooms total
        'std_rooms': 0,         # Fixed size
        'puzzle_density': 0.0,  # No puzzles, just navigation
        'connectivity': 0.2,    # Mostly tree-like, few loops
        'min_room_dim': 5,
        'max_room_dim': 8
    }

    env = gym.make('BoltCrypt-v0', generator_config=config)

    # 2. Hyperparameters
    num_episodes = 1000
    learning_rate = 0.1
    discount_factor = 0.99

    # Exploration (Epsilon Greedy)
    epsilon = 1.0
    epsilon_decay = 0.99
    min_epsilon = 0.05

    # The Q-Table
    # Key: (global_x, global_y, local_x, local_y)
    # Value: [Q_north, Q_south, Q_east, Q_west]
    q_table = defaultdict(lambda: np.zeros(4))

    episode_rewards = []
    episode_lengths = []

    print("--- Starting Training ---")
    print(f"Map Config: Min Dist {config['min_dist']}, Total Rooms ~{config['mean_rooms']}")

    for episode in range(num_episodes):
        # Reset with a FIXED SEED for every episode.
        # This ensures the map layout (walls/doors) never changes,
        # allowing the agent to memorize the route.
        obs, _ = env.reset(seed=42)

        gx, gy = obs['global_pos']
        lx, ly = obs['agent_pos']
        state = (gx, gy, lx, ly)

        total_reward = 0
        done = False
        steps = 0

        while not done:
            if random.random() < epsilon:
                action = env.action_space.sample()  # Explore
            else:
                action = np.argmax(q_table[state])  # Exploit

            next_obs, reward, done, trunc, _ = env.step(action)

            next_gx, next_gy = next_obs['global_pos']
            next_lx, next_ly = next_obs['agent_pos']
            next_state = (next_gx, next_gy, next_lx, next_ly)

            # Standard Q-learning update toward the TD target
            best_next_q = np.max(q_table[next_state])
            current_q = q_table[state][action]
            q_table[state][action] = current_q + learning_rate * (reward + discount_factor * best_next_q - current_q)

            state = next_state
            total_reward += reward
            steps += 1

            if trunc:
                break

        epsilon = max(min_epsilon, epsilon * epsilon_decay)

        episode_rewards.append(total_reward)
        episode_lengths.append(steps)

        if (episode + 1) % 50 == 0:
            avg_rew = np.mean(episode_rewards[-50:])
            avg_len = np.mean(episode_lengths[-50:])
            print(f"Episode {episode + 1:03d} | Avg Reward: {avg_rew:6.2f} | Avg Steps: {avg_len:6.1f} | Epsilon: {epsilon:.2f}")

    return episode_rewards, episode_lengths


if __name__ == "__main__":
    rewards, lengths = train_tabular_agent()

    # Plot the learning curve (uses the otherwise-unused matplotlib import).
    plt.plot(rewards)
    plt.xlabel("Episode")
    plt.ylabel("Total Reward")
    plt.show()
@@ -0,0 +1,136 @@
import pygame
from boltcrypt.env import BoltCrypt

# --- PYGAME HARNESS ---
# Constants for visualization
COLORS = {
    0: (230, 215, 180),  # Floor
    1: (40, 40, 40),     # Wall
    2: (139, 69, 19),    # Door
    3: (0, 255, 0),      # Exit
    4: (200, 50, 50),    # Switch
    5: (100, 80, 50),    # Boulder
    6: (255, 215, 0),    # Key (Gold)
    'AGENT': (50, 100, 200),
    'BG': (20, 20, 20),
    'TEXT': (255, 255, 255)
}

def render_gym(screen, font, env, obs, reward, done, text_status):
    screen.fill(COLORS['BG'])
    grid = obs['grid']
    agent_pos = obs['agent_pos']
    global_pos = obs['global_pos']
    has_key = obs['inventory']

    # Draw Map
    rows, cols = grid.shape
    TILE_SIZE = 48
    OFFSET_X, OFFSET_Y = 50, 50

    for y in range(rows):
        for x in range(cols):
            rect = pygame.Rect(OFFSET_X + x*TILE_SIZE, OFFSET_Y + (rows-1-y)*TILE_SIZE, TILE_SIZE, TILE_SIZE)
            tile_id = grid[y, x]

            # Void vs Room
            if x >= env.curr_room.w or y >= env.curr_room.h:
                pygame.draw.rect(screen, (0, 0, 0), rect)
            else:
                if tile_id == 5:
                    pygame.draw.circle(screen, COLORS.get(tile_id), rect.center, 16)
                else:
                    color = COLORS.get(tile_id, (255, 0, 255))
                    pygame.draw.rect(screen, color, rect)
                    pygame.draw.rect(screen, (0, 0, 0), rect, 1)

                if tile_id == 6:
                    pygame.draw.circle(screen, (255, 255, 0), rect.center, 10)

    # Draw Agent
    ax, ay = agent_pos
    agent_rect = pygame.Rect(OFFSET_X + ax*TILE_SIZE, OFFSET_Y + (rows-1-ay)*TILE_SIZE, TILE_SIZE, TILE_SIZE)
    pygame.draw.circle(screen, COLORS['AGENT'], agent_rect.center, 16)

    # HUD
    status = "SOLVED" if env.curr_room.check_solved() else "PUZZLE ACTIVE"
    inv_text = "KEY: [YES]" if has_key else "KEY: [NO]"
    inv_col = (0, 255, 0) if has_key else (150, 150, 150)

    # Locked Room Warning
    is_locked_room = env.curr_room.is_locked
    room_type = "LOCKED EXIT (Need Key)" if is_locked_room else "Normal Room"

    lines = [
        f"Pos: {global_pos} | Local: {agent_pos}",
        f"Room Status: {status}",
        f"Room Type: {room_type}",
        f"Inventory: {inv_text}",
        f"Reward: {reward:.2f}",
        f"{text_status}"
    ]

    for i, line in enumerate(lines):
        col = COLORS['TEXT']
        if "Inventory" in line: col = inv_col
        if "LOCKED EXIT" in line: col = (255, 50, 50)

        surf = font.render(line, True, col)
        screen.blit(surf, (10, 600 + i*25))

def play_dungeon():
    # Config with key puzzles enabled and no boulder-puzzle guarantee
    config = {
        'min_dist': 3,
        'mean_rooms': 10,
        'std_rooms': 2,
        'puzzle_density': 0.3,   # ~30% of rooms contain a boulder puzzle
        'key_puzzle_prob': 0.25,
        'min_room_dim': 5,
        'max_room_dim': 8
    }
    env = BoltCrypt(generator_config=config)
    obs, _ = env.reset()
    total_reward = 0

    pygame.init()
    screen = pygame.display.set_mode((600, 750))
    pygame.display.set_caption("Boltcrypt")
    font = pygame.font.Font(None, 24)
    clock = pygame.time.Clock()

    running = True
    done = False

    while running:
        action = None
        for event in pygame.event.get():
            if event.type == pygame.QUIT: running = False
            if event.type == pygame.KEYDOWN:
                if event.key == pygame.K_r:
                    obs, _ = env.reset()
                    total_reward = 0
                    done = False
                if not done:
                    if event.key == pygame.K_UP: action = 0
                    elif event.key == pygame.K_DOWN: action = 1
                    elif event.key == pygame.K_RIGHT: action = 2
                    elif event.key == pygame.K_LEFT: action = 3

        if not done:
            status = "Press R to Restart"
        else:
            status = "YOU ESCAPED! Press R to Restart"
        if action is not None:
            obs, reward, done, trunc, info = env.step(action)
            total_reward += reward
            if reward == 1.0 and obs['inventory'] == 1: status = "πŸ”‘ KEY FOUND!"

        render_gym(screen, font, env, obs, total_reward, done, status)
        pygame.display.flip()
        clock.tick(30)

    pygame.quit()

if __name__ == "__main__":
    play_dungeon()
@@ -0,0 +1,3 @@
from .roomdiscoveryreward import RoomDiscoveryReward
from .fogofwar import FogOfWarWrapper
from .natlang import NaturalLanguage
@@ -0,0 +1,22 @@
import gymnasium as gym
import numpy as np

from boltcrypt.env.boltgym import TILE_WALL

class FogOfWarWrapper(gym.ObservationWrapper):
    def __init__(self, env, vision_range=1):
        super().__init__(env)
        self.vision_range = vision_range

    def observation(self, obs):
        grid = obs['grid']
        lx, ly = obs['agent_pos']

        # Pad grid with wall tiles to handle edges
        padded = np.pad(grid, self.vision_range, constant_values=TILE_WALL)

        # Slice the (2v+1) x (2v+1) window (indices shifted by the padding)
        v = self.vision_range
        window = padded[ly:ly + 2*v + 1, lx:lx + 2*v + 1]

        obs['grid'] = window
        return obs
@@ -0,0 +1,200 @@
import gymnasium as gym

class NaturalLanguage(gym.Wrapper):
    def __init__(self, env):
        super().__init__(env)
        self.last_gx, self.last_gy = None, None
        self.last_solved = True
        self.last_boulder_positions = []
        self.entry_direction = None
        self.last_obs = None

        # Help text to be displayed once or upon request
        self.manual_text = (
            "\n--- ADVENTURER'S MANUAL ---\n"
            "COMMANDS: This interface is primitive. It only accepts 4 commands:\n"
            "  'NORTH', 'SOUTH', 'EAST', 'WEST'\n\n"
            "PHYSICS:\n"
            "- Movement: Typing a direction moves you 1 tile.\n"
            "- Pushing: To move a boulder, walk INTO it. You will push it 1 tile forward.\n"
            "- Constraints: Boulders are heavy. You cannot push them against walls or corners.\n"
            "- Door Locks: If a room has switches, all doors stay locked until every switch is covered.\n"
            "---------------------------\n"
        )

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)

        self.last_gx, self.last_gy = obs['global_pos']
        self.last_solved = self.env.curr_room.check_solved()
        self.last_boulder_positions = [list(b) for b in self.env.curr_room.boulders]
        self.entry_direction = "Start"
        self.last_obs = obs

        # Narrative initialization
        objective_text = (
            "MISSION: Escape the Labyrinth. Find the EXIT room located at a great distance.\n"
            f"{self.manual_text}"
            "You wake up on the cold floor. The journey begins."
        )

        return self._generate_narrative(obs, 0, False, objective_text), info

    def step(self, action_input: str | int):
        # Handle string input for LLMs
        if isinstance(action_input, str):
            cmd = action_input.lower().strip()
            if cmd == "north": action = 0
            elif cmd == "south": action = 1
            elif cmd == "east": action = 2
            elif cmd == "west": action = 3
            elif cmd == "help":
                return self._generate_narrative(self.last_obs, 0, False, self.manual_text), 0, False, False, {}
            else:
                return "Unknown command. Use NORTH, SOUTH, EAST, WEST, or HELP.", 0, False, False, {}
        else:
            action = action_input

        # 1. Pre-Step State Capture
        prev_room = self.env.curr_room
        prev_boulders = [list(b) for b in prev_room.boulders]

        # 2. Execute Step
        obs, reward, done, trunc, info = self.env.step(action)

        # 3. Analyze What Happened
        event_log = []

        # A. Room Transition Check
        curr_gx, curr_gy = obs['global_pos']
        just_entered_room = (curr_gx != self.last_gx or curr_gy != self.last_gy)

        if just_entered_room:
            # Determine Entry Direction
            dx = curr_gx - self.last_gx
            dy = curr_gy - self.last_gy
            if dy == 1: self.entry_direction = "South"  # Moved North, entered from the South
            elif dy == -1: self.entry_direction = "North"
            elif dx == 1: self.entry_direction = "West"
            elif dx == -1: self.entry_direction = "East"

            event_log.append(f"You pass through the door and enter a new room from the {self.entry_direction}.")

            # Reset room-specific memory
            self.last_gx, self.last_gy = curr_gx, curr_gy
            self.last_boulder_positions = [list(b) for b in self.env.curr_room.boulders]
            self.last_solved = self.env.curr_room.check_solved()

        else:
            # B. Push/Blocked Check
            # Did a boulder move this step?
            curr_boulders = self.env.curr_room.boulders
            boulder_moved = False

            for i, b_new in enumerate(curr_boulders):
                if list(b_new) != prev_boulders[i]:
                    # Boulder moved!
                    boulder_moved = True
                    # Check if it landed on a switch
                    on_switch = tuple(b_new) in self.env.curr_room.switches
                    click = " *CLICK!*" if on_switch else ""
                    event_log.append(f"You push the heavy boulder. It grinds across the floor.{click}")
                    break

            if not boulder_moved and reward < 0:
                # Heuristic: a negative reward usually implies a wall bump or wasted time,
                # but strict wall-bump detection would require comparing agent positions
                # before and after the step.
                pass  # (Simplified for brevity; an LLM can infer it from an unchanged position)

            # C. Puzzle Status Change
            is_now_solved = self.env.curr_room.check_solved()
            if is_now_solved and not self.last_solved:
                event_log.append("A loud mechanical clank echoes! The doors unlock.")
            elif not is_now_solved and self.last_solved and self.env.curr_room.has_puzzle:
                event_log.append("The mechanism disengages. The doors slam shut!")

            self.last_solved = is_now_solved
            self.last_boulder_positions = [list(b) for b in self.env.curr_room.boulders]

        # 4. Generate Full Text
        text_obs = self._generate_narrative(obs, reward, done, " ".join(event_log))

        return text_obs, reward, done, trunc, info

    def _generate_narrative(self, obs, reward, done, event_text):
        room = self.env.curr_room
        lx, ly = obs['agent_pos']

        # --- 1. The Header ---
        puzzle_status = "SOLVED" if room.is_solved else "LOCKED"
        narrative = [f"--- {puzzle_status} ROOM ({room.w}x{room.h}) ---"]

        if event_text:
            narrative.append(f"> {event_text}")

        # --- 2. Relative Positioning (the "Where am I?" logic) ---
        narrative.append(f"You are standing at coordinate ({lx}, {ly}).")

        # List doors relative to the agent
        doors_desc = []
        for direction, offset in room.doors.items():
            d_name = direction.name.title()

            # Distance calculation:
            # North (top), South (bottom), East (right), West (left);
            # e.g. the North door sits on row h-1.
            dist_str = ""
            if direction.name == "NORTH":
                dist = (room.h - 1) - ly
                dist_str = f"{dist} steps North"
            elif direction.name == "SOUTH":
                dist = ly
                dist_str = f"{dist} steps South"
            elif direction.name == "EAST":
                dist = (room.w - 1) - lx
                dist_str = f"{dist} steps East"
            elif direction.name == "WEST":
                dist = lx
                dist_str = f"{dist} steps West"

            state = "(OPEN)" if room.is_solved else "(LOCKED)"
            doors_desc.append(f"- {d_name} Door: {dist_str} {state}")

        narrative.append("Exits:")
        narrative.extend(doors_desc)

        # --- 3. Puzzle Elements ---
        if room.has_puzzle:
            narrative.append("Puzzle Elements:")
            # Boulders
            for i, b in enumerate(room.boulders):
                bx, by = b[1], b[0]  # Note: the grid is stored (y, x); the text uses (x, y)

                # Check if on a switch
                on_switch = (by, bx) in room.switches
                status = "sitting on a switch" if on_switch else "on the bare floor"

                # Relative direction from the agent
                rel_dir = []
                if by > ly: rel_dir.append("North")
                elif by < ly: rel_dir.append("South")
                if bx > lx: rel_dir.append("East")
                elif bx < lx: rel_dir.append("West")

                dir_str = "-".join(rel_dir) if rel_dir else "HERE"
                narrative.append(f"- Boulder {i+1}: Located at ({bx}, {by}) [{dir_str} of you]. It is {status}.")

            # Empty switches
            for s in room.switches:
                sy, sx = s
                # Is there a boulder on it?
                covered = any(b[0] == sy and b[1] == sx for b in room.boulders)
                if not covered:
                    narrative.append(f"- Empty Switch: Located at ({sx}, {sy}).")

        if done:
            narrative.append("\n*** YOU HAVE FOUND THE EXIT! ***")

        return "\n".join(narrative)
@@ -0,0 +1,24 @@
import gymnasium as gym

class RoomDiscoveryReward(gym.Wrapper):
    def __init__(self, env, discovery_reward=0.1):
        super().__init__(env)
        self.discovery_reward = discovery_reward
        self.visited_rooms = set()

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        start_pos = tuple(obs['global_pos'])
        self.visited_rooms = {start_pos}
        return obs, info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)

        curr_pos = tuple(obs['global_pos'])

        if curr_pos not in self.visited_rooms:
            reward += self.discovery_reward
            self.visited_rooms.add(curr_pos)

        return obs, reward, terminated, truncated, info
@@ -0,0 +1,125 @@
1
+ Metadata-Version: 2.4
2
+ Name: boltcrypt
3
+ Version: 0.1.0
4
+ Summary: Boltcrypt environment
5
+ Author: foreverska
6
+ Project-URL: Github:, https://github.com/foreverska/boltcrypt
7
+ Keywords: gymnasium,gym
8
+ Description-Content-Type: text/markdown
9
+ License-File: license.txt
10
+ Requires-Dist: gymnasium>=1.0.0
11
+ Requires-Dist: numpy
12
+ Dynamic: author
13
+ Dynamic: description
14
+ Dynamic: description-content-type
15
+ Dynamic: keywords
16
+ Dynamic: license-file
17
+ Dynamic: project-url
18
+ Dynamic: requires-dist
19
+ Dynamic: summary
20
+
21
+ # BoltCrypt: Procedural Dungeon RL Environment #
22
+ BoltCrypt is a lightweight, OpenAI Gymnasium-compatible environment featuring procedurally generated dungeons. It challenges Reinforcement Learning agents (and humans!) to navigate complex layouts, solve sokoban-style boulder puzzles, and manage inventory items like keys to reach the exit.
23
+ ## 🏰 Features ##
24
+ Procedural Generation: Every reset generates a unique dungeon layout based on configurable parameters (density, connectivity, room size).
25
+ Puzzle Mechanics: Includes boulder-pushing puzzles and locked doors that require finding a key.
26
+ Gymnasium API: Fully compatible with standard RL workflows.
27
+ Pygame Visualization: A built-in harness to play manually or watch your agent learn in real-time.
28
+ Flexible Observation Space: Provides local room grids, global coordinates, and inventory status.
29
+ πŸ›  Installation
30
+ Since this project uses condavenv, ensure you have your environment active:
31
+ ``` bash
32
+ # Example if using conda directly
33
+ conda activate <your-env-name>
34
+ pip install gymnasium pygame numpy matplotlib boltcrypt
35
+ ```
36
+
37
+ ## πŸš€ Getting Started ##
38
+ ### Play Manually ###
39
+ Test the dungeon generation and mechanics yourself using the Pygame harness:
40
+ ``` bash
41
+ python boltcrypt_game.py
42
+ ```
43
+
44
+ **Arrows:** Move the agent.
45
+ **R:** Reset/Regenerate the dungeon.
46
+ **Goal:** Find the key (if required) and reach the green Exit tile.
47
+
48
+ ### Train an Agent ###
49
+ The project includes a tabular Q-Learning implementation to demonstrate how an agent can "memorize" a specific dungeon layout:
50
+ ``` bash
51
+ python tabular_q.py
52
+ ```
53
+
54
+ ## βš™οΈ Configuration ##
55
+
56
+ The DungeonGenerator and BoltCrypt environment can be customized via a config dictionary:
57
+
58
+ ### Parameter Description ###
59
+ **min_dist:** Minimum Manhattan distance between Start and Exit.
60
+ **mean_rooms:** Average number of rooms in the dungeon.
61
+ **connectivity:** Probability of creating loops between rooms (0.0 = Tree, 1.0 = Highly connected).
62
+ **puzzle_density:** Chance of a room containing a boulder puzzle.
63
+ **key_puzzle_prob:** Chance that the exit is locked and a key is hidden in a leaf room.
64
+
65
+
66
+ ### πŸ—Ί Tile Legend ###
67
+ ⬜ Floor: Walkable space.
68
+ ⬛ Wall: Impassable.
69
+ πŸšͺ Door: Transitions between rooms (may be locked by puzzles).
70
+ 🟩 Exit: Your goal!
71
+ πŸ”΄ Switch: Target for boulders.
72
+ 🟀 Boulder: Can be pushed onto switches.
73
+ 🟑 Key: Required to open locked exit rooms.
74
+
75
+
76
+ ### πŸ€– Observation Space ###
77
+ The environment returns a dictionary:
78
+ **grid:** A 10x10 local view of the current room.
79
+ **agent_pos:** (x, y) coordinates within the room.
80
+ **global_pos:** (gx, gy) coordinates in the dungeon layout.
81
+ **inventory:** Binary flag (1 if holding a key)
82
+
83
+ ## πŸ“¦ Wrappers & Extensions ##
84
+ BoltCrypt includes several gym.Wrapper implementations to modify observations or rewards, making it a versatile
85
+ testbed for different RL paradigms.
86
+
87
+ ### πŸ“ Natural Language Wrapper (NaturalLanguage) ####
88
+ The crown jewel for testing Reasoning LLMs. This wrapper transforms the numeric observation space into a rich,
89
+ descriptive narrative. Instead of a grid, the agent receives a text-based description of its surroundings.
90
+ **Dynamic Narrative:** Provides room dimensions, relative positions of doors, boulder locations, and puzzle statuses
91
+ (e.g., "A loud mechanical clank echoes! The doors unlock.").
92
+ **LLM Ready:** Accepts string inputs like "NORTH", "SOUTH", "EAST", or "WEST" in the step() function.
93
+ **Physics Logic:** Includes an "Adventurer's Manual" to explain game rules to an LLM via the observation stream.
94
+
95
+ ### 🌫️ Fog of War (FogOfWarWrapper) ###
96
+ Transforms the global room view into a partially observable environment.
97
+ **Vision Range:** Limits the grid observation to a (2v+1) Γ— (2v+1) window centered on the agent.
98
+ **Memory Challenge:** Forces agents to map the room internally rather than having perfect spatial information.
99
+
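The windowing idea can be sketched in a few lines of NumPy: pad the room grid so the crop never leaves the array, then slice a (2v+1) Γ— (2v+1) square around the agent. The `FOG` sentinel and the exact cropping logic of the real `FogOfWarWrapper` are assumptions; this only illustrates the technique:

```python
import numpy as np

FOG = -1  # assumed sentinel for unseen cells; not from the package source

def fog_of_war(grid, agent_xy, v):
    """Crop a (2v+1) x (2v+1) window centered on the agent.
    Cells outside the room are filled with FOG."""
    x, y = agent_xy
    # Pad by v on every side so the window never runs off the array edge.
    padded = np.pad(grid, v, constant_values=FOG)
    # After padding, the agent's cell sits at (y + v, x + v), so this
    # slice is centered on the agent.
    return padded[y:y + 2 * v + 1, x:x + 2 * v + 1]

room = np.arange(100).reshape(10, 10)
view = fog_of_war(room, (0, 0), v=2)
print(view.shape)  # (5, 5)
```

With the agent in the room's corner, the window's center cell holds the agent's tile and the out-of-room cells are filled with `FOG`.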
100
+ ### πŸ† Room Discovery Reward (RoomDiscoveryReward) ###
101
+ Combats sparse rewards in large dungeons by incentivizing exploration.
102
+ **Exploration Bonus:** Grants a small configurable reward (e.g., +0.1) the first time the agent enters a new room in the dungeon.
103
+ **Global Navigation:** Helps agents learn the layout of the "macro-dungeon" before they’ve found the final exit.
104
+
105
+ ### πŸ›  Usage Example ###
106
+ You can stack wrappers to create complex experimental setups:
107
+ ``` python
108
+ import gymnasium as gym
109
+ from boltcrypt.env import BoltCrypt
110
+ from boltcrypt.wrapper import NaturalLanguage, RoomDiscoveryReward
111
+
112
+ env = BoltCrypt()
113
+ env = RoomDiscoveryReward(env, discovery_reward=0.5)
114
+ env = NaturalLanguage(env)
115
+
116
+ # Now the agent receives text and extra rewards for exploration!
117
+ obs, info = env.reset()
118
+ print(obs)
119
+
120
+ action = "NORTH"
121
+ obs, reward, terminated, truncated, info = env.step(action)
122
+ ```
123
+
124
+
125
+ Happy Dungeon Crawling! πŸ—οΈπŸΉ
@@ -0,0 +1,16 @@
1
+ README.md
2
+ license.txt
3
+ setup.py
4
+ boltcrypt/__init__.py
5
+ boltcrypt.egg-info/PKG-INFO
6
+ boltcrypt.egg-info/SOURCES.txt
7
+ boltcrypt.egg-info/dependency_links.txt
8
+ boltcrypt.egg-info/entry_points.txt
9
+ boltcrypt.egg-info/requires.txt
10
+ boltcrypt.egg-info/top_level.txt
11
+ boltcrypt/examples/tabular_q.py
12
+ boltcrypt/game/boltcrypt_game.py
13
+ boltcrypt/wrapper/__init__.py
14
+ boltcrypt/wrapper/fogofwar.py
15
+ boltcrypt/wrapper/natlang.py
16
+ boltcrypt/wrapper/roomdiscoveryreward.py
@@ -0,0 +1,5 @@
1
+ [console_scripts]
2
+ boltcrypt = boltcrypt.game.boltcrypt_game:play_dungeon
3
+
4
+ [gym.envs]
5
+ boltcrypt = boltcrypt.env:BoltCrypt
@@ -0,0 +1,2 @@
1
+ gymnasium>=1.0.0
2
+ numpy
@@ -0,0 +1 @@
1
+ boltcrypt
@@ -0,0 +1,7 @@
1
+ Copyright 2025 Adam Parker
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the β€œSoftware”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
4
+
5
+ The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
6
+
7
+ THE SOFTWARE IS PROVIDED β€œAS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,4 @@
1
+ [egg_info]
2
+ tag_build =
3
+ tag_date = 0
4
+
@@ -0,0 +1,22 @@
1
+ from importlib.metadata import entry_points
2
+
3
+ from setuptools import setup
4
+
5
+ long_description = open('README.md').read()
6
+
7
+ setup(
8
+ name="boltcrypt",
9
+ description="Boltcrypt environment",
10
+ long_description=long_description,
11
+ long_description_content_type="text/markdown",
12
+ version="0.1.0",
13
+ author="foreverska",
14
+ install_requires=["gymnasium>=1.0.0", "numpy"],
15
+ keywords="gymnasium, gym",
16
+ license_files = ('license.txt',),
17
+ project_urls={"Github:": "https://github.com/foreverska/boltcrypt"},
18
+ entry_points={
19
+ 'gym.envs': ['boltcrypt=boltcrypt.env:BoltCrypt'],
20
+ 'console_scripts': 'boltcrypt=boltcrypt.game.boltcrypt_game:play_dungeon'
21
+ }
22
+ )