cogames-agents 0.0.0.7__cp312-cp312-macosx_11_0_arm64.whl

Files changed (128)

cogames_agents/policy/scripted_agent/planky/CLAUDE.md ADDED

@@ -0,0 +1,124 @@
+# Planky — LLM Development Guide
+## Objective
+Maximize reward in CogsGuard by improving the Planky scripted agent. Reward = junction hold time. See `STRATEGY.md` for
+game mechanics and role system.
+## File Layout
+```
+planky/
+  policy.py          # Multi-agent policy, role distribution defaults, per-tick brain loop
+  goal.py            # Goal base class and evaluate_goals()
+  context.py         # PlankyContext (state snapshot, blackboard, navigator, trace)
+  navigator.py       # Pathfinding (A*, exploration, direction bias)
+  entity_map.py      # Spatial memory of observed entities
+  obs_parser.py      # Raw observation → StateSnapshot + visible entities
+  trace.py           # Debug trace formatting
+  goals/
+    gear.py          # GetGearGoal base — gear acquisition with reserve checks
+    miner.py         # ExploreHub, GetMinerGear, PickResource, DepositCargo, MineResource
+    aligner.py       # GetAlignerGear, AlignJunction
+    scrambler.py     # GetScramblerGear, ScrambleJunction
+    scout.py         # GetScoutGear, Explore
+    shared.py        # GetHearts, FallbackMine (used by aligner/scrambler)
+    survive.py       # SurviveGoal — retreat when HP low
+    stem.py          # SelectRoleGoal — dynamic role selection
+  tests/
+    conftest.py      # Fixtures: miner_episode, aligner_episode, etc.
+    helpers.py       # run_planky_episode(), EpisodeResult
+    test_miner.py    # Miner capability tests
+    test_aligner.py  # Aligner capability tests
+    test_scrambler.py
+    test_scout.py
+    test_stem.py
+  STRATEGY.md        # Game mechanics, role costs, strategic loop
+```
+## Core Debugging Loop
+This is the iteration cycle for improving Planky:
+### 1. Run an episode and measure reward
+```bash
+uv run cogames play --mission cogsguard_machina_1.basic \
+  -p planky --cogs 5 --steps 1000 --render none
+```
+### 2. Run with tracing to diagnose behavior
+```bash
+# Trace a specific agent
+uv run cogames play --mission cogsguard_machina_1.basic \
+  -p 'metta://policy/planky?miner=2&aligner=3&trace=1&trace_level=2&trace_agent=0' \
+  --cogs 5 --steps 1000 --render none
+# Trace all agents
+uv run cogames play --mission cogsguard_machina_1.basic \
+  -p 'metta://policy/planky?miner=2&aligner=3&trace=1&trace_level=2' \
+  --cogs 5 --steps 1000 --render none
+```
+Trace output shows per-tick: goal chain, skipped goals (with reason), active goal, action, idle counter. Collective
+resource logs print every 25 steps.
+### 3. Edit goals/policy code
+Each role has a goal list in `policy.py:_make_goal_list()`. Goals are evaluated in priority order. A goal's
+`is_satisfied()` returns True to skip it; `execute()` returns an Action.
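The priority loop described in step 3 can be sketched roughly as follows. This is a minimal illustration, not the code in `goal.py`: `SketchGoal`, `evaluate_goals_sketch`, the dict-based context, and the action strings are all hypothetical, and the real `evaluate_goals()` signature may differ.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class SketchGoal:
    """Hypothetical stand-in for a Planky goal (not the real Goal base class)."""

    name: str
    satisfied: Callable[[dict], bool]
    act: Callable[[dict], str]

    def is_satisfied(self, ctx: dict) -> bool:
        # True means "nothing to do here, skip this goal".
        return self.satisfied(ctx)

    def execute(self, ctx: dict) -> str:
        # Returns an action; a plain string stands in for an Action object.
        return self.act(ctx)


def evaluate_goals_sketch(goals: Sequence[SketchGoal], ctx: dict) -> Optional[str]:
    """Walk the goals in priority order and act on the first unsatisfied one."""
    for goal in goals:
        if goal.is_satisfied(ctx):
            continue
        return goal.execute(ctx)
    return None  # every goal was satisfied


# Hypothetical miner goal list, highest priority first.
miner_goals = [
    SketchGoal("GetMinerGear", lambda c: c["has_gear"], lambda c: "move_to_station"),
    SketchGoal("DepositCargo", lambda c: c["cargo"] < c["capacity"] / 2, lambda c: "move_to_hub"),
    SketchGoal("MineResource", lambda c: False, lambda c: "mine"),
]
```

With gear in hand and nearly empty cargo, the first two goals report satisfied and the agent falls through to mining; once cargo reaches half capacity, the deposit goal takes over.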
+### 4. Validate with multi-seed sweep
+```bash
+# 10-seed reward sweep (copy-paste this)
+total=0; for i in $(seq 1 10); do \
+  r=$(uv run cogames play --mission cogsguard_machina_1.basic \
+    -p planky --cogs 5 --steps 1000 --render none --seed $i 2>&1 \
+    | grep "Reward" | grep -oE '[0-9]+\.[0-9]+'); \
+  echo "Seed $i: $r"; total=$(echo "$total + $r" | bc); \
+done; echo "Average: $(echo "scale=2; $total / 10" | bc)"
+```
+### 5. Run unit tests
+```bash
+metta pytest packages/cogames-agents/src/cogames_agents/policy/scripted_agent/planky/tests/ -v
+```
+Always run tests after changes: all 15 tests must pass, plus 1 expected failure (xfail).
+## Current Configuration
+- **5 agents**: 2 miners, 3 aligners (set in `policy.py` defaults)
+- **Mining stop**: miners idle when collective has >100 of every resource (`miner.py:COLLECTIVE_SUFFICIENT_THRESHOLD`)
+- **Deposit threshold**: 50% cargo capacity (`miner.py:DepositCargoGoal`)
+- **Gear reserve**: collective must have cost + 3 of each resource before buying gear (`gear.py:RESOURCE_RESERVE`)
+- **Heart reserve**: collective must have 1 + 3 of each resource before buying hearts
+  (`shared.py:GetHeartsGoal.RESOURCE_RESERVE`)
+- **Miner gear**: no reserve requirement (miners are resource producers) but skipped when resources sufficient
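The gear and heart reserve rules above amount to a per-resource comparison. The sketch below assumes the collective inventory is a plain mapping from resource letter to count and that "cost + 3" means "at least cost plus 3"; `collective_can_afford` is a hypothetical helper, not necessarily the `_collective_can_afford()` in `gear.py`. The miner cost (C1 O1 G3 S1) is the one listed in `NOTES.md` in this same diff.

```python
RESOURCE_RESERVE = 3  # reserve value stated in the configuration list

# Costs as listed in NOTES.md; a heart costs 1 of each resource.
MINER_GEAR_COST = {"C": 1, "O": 1, "G": 3, "S": 1}
HEART_COST = {"C": 1, "O": 1, "G": 1, "S": 1}


def collective_can_afford(
    collective: dict[str, int],
    cost: dict[str, int],
    reserve: int = RESOURCE_RESERVE,
) -> bool:
    """Return True if every resource covers its cost plus the reserve."""
    return all(
        collective.get(resource, 0) >= amount + reserve
        for resource, amount in cost.items()
    )


# Example: germanium is one short of cost + reserve (3 + 3 = 6), so gear is skipped.
stock = {"C": 4, "O": 4, "G": 5, "S": 4}
print(collective_can_afford(stock, MINER_GEAR_COST))  # False
print(collective_can_afford(stock, HEART_COST))       # True (needs 4 of each)
```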
+## Reward Baseline
+10-seed average at 1000 steps, --cogs=5: **~3.3 reward**
+## Key Reward Insights
+- Reward is `(aligned.junction.held / num_junctions) * (100 / max_steps)` — purely junction hold time
+- Clips claims ~11 junctions and doesn't lose them (no scrambler in default config)
+- More aligners = more reward, but they need miners to fund gear + hearts
+- Scramblers tested poorly (1.84 avg with 2m/2a/1s) — hearts are too expensive
+- Seed variance is high; always evaluate across 10+ seeds
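The reward formula in the first bullet can be checked with a few lines of arithmetic. The sketch assumes `aligned.junction.held` is a cumulative count of junction-steps (one unit per junction held per step); that reading is an assumption, and the junction counts used below are made-up illustration values.

```python
def junction_reward(held_junction_steps: float, num_junctions: int, max_steps: int) -> float:
    """Reward = (aligned.junction.held / num_junctions) * (100 / max_steps)."""
    return (held_junction_steps / num_junctions) * (100 / max_steps)


# Under the junction-steps assumption, holding every junction for the whole
# episode is the upper bound of the formula.
full_control = junction_reward(held_junction_steps=20 * 1000, num_junctions=20, max_steps=1000)

# Holding 2 of 20 hypothetical junctions for 500 of 1000 steps.
partial = junction_reward(held_junction_steps=2 * 500, num_junctions=20, max_steps=1000)

print(round(full_control, 6), round(partial, 6))
```

Under that reading the maximum is 100, so a 10-seed average near 3.3 corresponds to holding only a small fraction of junction-steps.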
+## What to Improve
+Read `STRATEGY.md` for full context. High-impact areas:
+1. **Aligner junction targeting** (`goals/aligner.py:AlignJunctionGoal`) — prioritize junctions that maximize hold time
+   (e.g., cluster nearby, avoid clips AOE)
+2. **Dynamic role switching** — miners could become aligners once resources are sufficient instead of idling
+3. **Early game economy** — first 50 steps are critical; miners need to deposit quickly so aligners get hearts
+4. **Heart acquisition timing** — aligners sometimes waste time walking to chests when collective can't afford hearts
+5. **Navigation efficiency** (`navigator.py`) — A\* pathfinding could be improved, agents sometimes get stuck
+6. **Coordination** — multiple aligners targeting the same junction wastes effort

cogames_agents/policy/scripted_agent/planky/IMPROVEMENTS.md ADDED

@@ -0,0 +1,160 @@
+# Planky Improvement Log
+Track each improvement attempt with scrimmage scores to measure progress.
+## Benchmark Command
+```bash
+# Standard benchmark (explicit roles, 5 episodes)
+cogames scrimmage --mission cogsguard_machina_1.basic \
+  --policy "metta://policy/planky?miner=3&aligner=3&scrambler=4" \
+  --episodes 5 --seed 42
+# IMPORTANT: When using stem=10, you MUST zero out explicit roles:
+cogames scrimmage --mission cogsguard_machina_1.basic \
+  --policy "metta://policy/planky?miner=0&aligner=0&scrambler=0&stem=10" \
+  --episodes 5 --seed 42
+```
+---
+## Critical Blockers Identified (2025-01-29)
+### Blocker #1: Gear Station Interaction Fails
+Agents can reach gear stations (dist=1) but bumping doesn't give them gear:
+- Agent bumps station repeatedly (move_north/south/etc fails)
+- Position doesn't change (correct - you bump to interact)
+- But gear is never acquired
+- Likely cause: collective resources depleted, or wrong station being detected
+**Evidence**:
+```
+[t=27] GetMinerGear dist=1 → move_north (fails)
+[t=28] GetMinerGear dist=1 → move_north (fails)
+[t=29] GetMinerGear dist=1 → move_north (fails)
+[t=30] ForceExplore kicks in, agent wanders away
+```
+**To Fix**: Debug the gear station interaction in the game layer, or add wealth=100 to give starting resources.
+### Blocker #2: stem=10 Doesn't Override Defaults
+When using `?stem=10`, the default role counts (miner=4, aligner=2, etc.) still apply. Must explicitly set
+`miner=0&aligner=0&scrambler=0&stem=10` for actual stem mode.
+---
+## Improvement History
+### Baseline
+**Date**: 2025-01-29 **Config**: miner=4, aligner=2, scrambler=4 (explicit)
+```
+Episodes: 3, Seed: 42
+Mean Reward: 0.04
+Junction Aligned: 0.7
+Junction Scrambled: 0.1
+Miners got gear: 0.2 (2 total)
+```
+**Notes**: Very poor performance. Miners stuck at gear station.
+---
+### Attempt #1: Fix Stem Role Selection
+**Date**: 2025-01-29 **Change**: Distribute roles by agent_id in early game instead of all becoming scouts
+**Result**: NO CHANGE (still ~0.04 reward) **Notes**: Roles distributed correctly, but gear acquisition still broken.
+---
+### Attempt #2: Improve Gear Station Approach
+**Date**: 2025-01-29 **Change**: Track bump attempts, try different approach sides, clear cache when stuck
+**Result**: NO CHANGE **Notes**: Agent still can't get gear even when approaching from different directions.
+---
+### Attempt #3: Skip Miner Gear
+**Date**: 2025-01-29 **Change**: Miners skip gear acquisition, mine directly (reduced cargo capacity)
+```
+Episodes: 5, Seed: 42
+Mean Reward: 0.04
+Junction Aligned: 0.9
+Hearts gained: 2.5
+```
+**Result**: NO CHANGE **Notes**: Miners function but economy doesn't sustain combat roles. Aligners/scramblers still
+need gear.
+---
+### Attempt #4: Resource-Aware Gear & Heart Goals
+**Date**: 2025-01-28 **Change**: GetGearGoal and GetHeartsGoal now check collective resources before attempting. Agents
+skip gear/heart acquisition when collective can't afford it, falling through to productive goals (mining, exploring)
+instead of wasting time bumping empty stations.
+Also added:
+- AlignJunctionGoal/ScrambleJunctionGoal skip when agent lacks gear or heart (was bumping junctions uselessly)
+- FallbackMineGoal at end of aligner/scrambler goal lists (mine when idle)
+- Default role distribution changed to 6 miners / 2 aligners / 2 scramblers
+```
+Episodes: 20, Seed: 42, Config: stem=10 (defaults to miner=6, aligner=2, scrambler=2)
+Mean Reward: ~0.25 (range 0.00-0.92)
+junction.aligned_by_agent: 19.80
+junction.scrambled_by_agent: 0.90
+heart.gained: 30.60
+```
+**Result**: SIGNIFICANT IMPROVEMENT — from 0.04 baseline to ~0.25 mean reward. Junction alignments went from ~0 to 19.8
+per episode average.
+---
+### Attempt #5: Deposit fix, nav timeout, role rebalance
+**Date**: 2026-01-28
+Changes:
+- Fixed deposit threshold for ungeared miners (was 10, capacity is 4 — never deposited!)
+- Added navigation timeout (40 steps) for aligner/scrambler junction goals
+- Rebalanced default roles: 6 miners / 4 aligners / 0 scramblers
+- Hub-targeted exploration for gear station discovery
+```
+Episodes: 20, Seed: 42, Config: stem=10
+Mean Reward: ~0.93 (range 0.00-2.46)
+junction.aligned_by_agent: 47.70
+heart.gained: 60.80
+```
+**Result**: 23x improvement from baseline. Economy-first strategy works.
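The deposit bug fixed in this attempt is easy to reproduce in isolation. The sketch below contrasts a fixed threshold of 10 with a capacity-relative one, using the capacities mentioned in these notes (4 without gear, 40 with gear). Both function names are hypothetical, and the 50% rule is taken from the configuration described in `CLAUDE.md` rather than from the actual `DepositCargoGoal` code.

```python
def should_deposit_fixed_threshold(cargo: int, threshold: int = 10) -> bool:
    """Old behaviour as described: deposit only at a fixed cargo count."""
    return cargo >= threshold


def should_deposit_relative(cargo: int, capacity: int) -> bool:
    """Deposit once cargo reaches half of the current capacity (at least 1)."""
    return cargo >= max(1, capacity // 2)


UNGEARED_CAPACITY = 4
GEARED_CAPACITY = 40

# An ungeared miner can never reach the fixed threshold, so it never deposits.
never_deposits = not any(
    should_deposit_fixed_threshold(cargo) for cargo in range(UNGEARED_CAPACITY + 1)
)
print(never_deposits)                                     # True
print(should_deposit_relative(2, UNGEARED_CAPACITY))      # True
print(should_deposit_relative(19, GEARED_CAPACITY))       # False
```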
+---
+## Next Steps
+1. **Reduce 0.00 episodes** — 4/20 still score zero (unfavorable map layouts?)
+2. **Faster gear acquisition** — aligners wait ~80 steps for collective resources
+3. **Junction defense** — aligned junctions get scrambled back by clips
+## Current Best Config
+```bash
+cogames scrimmage --mission cogsguard_machina_1.basic \
+  --policy "metta://policy/planky?stem=10" \
+  --episodes 20 --seed 42
+# Mean reward: ~0.93
+```

cogames_agents/policy/scripted_agent/planky/NOTES.md ADDED

@@ -0,0 +1,153 @@
+# Planky Debugging Notes
+## Session: 2025-01-29
+### Goal
+Improve the Planky agent to achieve 100 reward in CogsGuard scrimmage.
+### Current Performance
+- **Mean Reward:** ~0.04 (target: 100)
+- **Best Single Episode:** 1.62
+---
+## Critical Findings
+### 1. stem=10 Policy URL Bug
+When using `?stem=10`, the default explicit role counts still apply:
+- Default: `miner=4, aligner=2, scrambler=4, stem=0`
+- With `?stem=10`: `miner=4, aligner=2, scrambler=4, stem=10` (20 total slots!)
+Since CogsGuard only has 10 agents, the first 10 role slots are used. So `stem=10` actually gives you 4 miners + 2
+aligners + 4 scramblers, NOT 10 stem agents.
+**Fix:** Must explicitly zero out other roles:
+```bash
+--policy "metta://policy/planky?miner=0&aligner=0&scrambler=0&stem=10"
+```
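The slot-truncation behaviour behind this bug can be reproduced with a small model. The sketch assumes role slots are laid out in the order miner, aligner, scrambler, stem and then cut to the number of agents, which matches the outcome described above; `assign_roles` is a hypothetical function, not the code in `policy.py`.

```python
from collections import Counter


def assign_roles(
    num_agents: int,
    miner: int = 4,
    aligner: int = 2,
    scrambler: int = 4,
    stem: int = 0,
) -> list[str]:
    """Build the role slot list and keep only the first `num_agents` slots."""
    slots = (
        ["miner"] * miner
        + ["aligner"] * aligner
        + ["scrambler"] * scrambler
        + ["stem"] * stem
    )
    return slots[:num_agents]


# ?stem=10 alone: 20 slots are built, and the stem slots fall off the end.
print(Counter(assign_roles(10, stem=10)))

# Zeroing the explicit roles leaves only stem slots.
print(Counter(assign_roles(10, miner=0, aligner=0, scrambler=0, stem=10)))
```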
+### 2. Gear Station Interaction — Collective Resources Required
+Agents reach gear stations but bumping fails when collective resources are insufficient. Gear stations use
+`actorCollectiveHas(cost)` filter — the bump silently fails if resources are missing.
+**Root cause:** Collective resources deplete quickly when multiple agents gear up. Agents were wasting dozens of steps
+bumping stations that couldn't dispense gear.
+**Fix applied:** `GetGearGoal.is_satisfied()` now checks collective resources via `_collective_can_afford()` before
+walking to the station. If the collective can't afford the gear, the goal is skipped and the agent falls through to its
+next goal (e.g., mining). The same fix was applied to `GetHeartsGoal` (a heart costs 1 of each resource).
+**Gear costs (from collective):**
+- Miner: C1 O1 G3 S1
+- Aligner: C3 O1 G1 S1
+- Scrambler: C1 O3 G1 S1
+- Scout: C1 O1 G1 S3
+### 3. Multiple Station Positions Detected
+Different agents find "miner_station" at different positions — this is normal, there may be multiple gear stations in
+the hub area.
+### 4. Miners Can Function Without Gear
+Miners can mine without gear (just smaller cargo capacity: 4 vs 40). Now with resource-aware gear goals, miners will
+attempt gear when affordable, and fall through to mining without gear when the collective can't afford it.
+---
+## Code Changes Made
+### goals/stem.py - Role Selection
+Fixed early-game role distribution:
+```python
+# Before: All agents became scouts when map knowledge low
+# After: Distribute by agent_id
+if explored_count < 50 and len(extractors) == 0:
+    if agent_id < 2:
+        return "miner"      # Agents 0-1
+    elif agent_id < 5:
+        return "aligner"    # Agents 2-4
+    elif agent_id < 9:
+        return "scrambler"  # Agents 5-8
+    else:
+        return "scout"      # Agent 9
+```
+### goals/gear.py - Stuck Detection
+Added stuck detection and cache clearing:
+- Track bump attempts at dist=1
+- Clear navigator cache when stuck
+- Explore randomly to find alternative path
+- Reduced MAX_TOTAL_ATTEMPTS to 80, RETRY_INTERVAL to 150
+### policy.py - Skip Miner Gear
+Removed gear requirement for miners (they can mine without it).
+---
+## Diagnostic Commands
+```bash
+# Trace specific agent
+cogames play --mission cogsguard_machina_1.basic \
+  --policy "metta://policy/planky?miner=3&aligner=3&scrambler=4&trace=1&trace_level=2&trace_agent=0" \
+  --steps 100 --render none
+# Test with wealth (bypass resource constraints)
+# Edit missions.py: add wealth=100 to CogsGuardMachina1Mission
+cogames play --mission cogsguard_machina_1.basic \
+  --policy "metta://policy/planky?aligner=10" --steps 300
+# Single episode with stats
+cogames play --mission cogsguard_machina_1.basic \
+  --policy "metta://policy/planky?miner=3&aligner=3&scrambler=4" \
+  --steps 200 --render none
+```
+---
+## Next Steps to Investigate
+1. **Debug gear station interaction**
+   - Add logging to gear station bump handler in game layer
+   - Check if `type:miner_station` tag is correct
+   - Verify collective resource levels when bumping
+2. **Test with wealth=100**
+   - Temporarily set wealth in mission config
+   - Isolate whether issue is resources vs interaction
+3. **Check entity detection**
+   - Print what obs_parser detects as miner_station
+   - Verify only one miner_station exists in hub
+4. **Compare with Nim implementation**
+   - The Nim scripted agent works - what does it do differently?
+   - Check how Nim handles gear station interaction
+---
+## Stats Reference
+Key metrics to watch in scrimmage output:
+- `miner.gained` - How many miners got gear
+- `aligner.gained` - How many aligners got gear
+- `scrambler.gained` - How many scramblers got gear
+- `junction.aligned_by_agent` - Junctions captured
+- `junction.scrambled_by_agent` - Enemy junctions neutralized
+- `heart.gained` - Hearts acquired for combat roles
+- `action.move.failed` - High = agents stuck
+- `status.max_steps_without_motion` - Stuck indicator

cogames_agents/policy/scripted_agent/planky/PLAN.md ADDED

@@ -0,0 +1,254 @@
+# Planky Improvement Plan
+Iterative improvement loop for the Planky CogsGuard agent.
+## The Loop
+```
+1. BENCHMARK  → Run scrimmage, collect baseline metrics
+2. IDENTIFY   → Find the biggest weakness from metrics/observation
+3. IMPLEMENT  → Make a targeted fix
+4. TEST       → Run unit tests + scrimmage
+5. COMMIT     → If improved, commit. If not, revert and try different approach
+6. REPEAT
+```
+## Benchmark Command
+**Default**: Use `stem=10` to let agents dynamically choose roles. Only use explicit role counts (e.g.,
+`miner=4&aligner=2&scrambler=4`) when testing a specific role behavior.
+```bash
+# Quick debug (3 episodes, 500 steps max, ~30 sec)
+cogames scrimmage --mission cogsguard_machina_1.basic \
+  --policy "metta://policy/planky?stem=10" \
+  --episodes 3 --steps 500 --seed 42
+# Standard benchmark (5 episodes, ~2 min)
+cogames scrimmage --mission cogsguard_machina_1.basic \
+  --policy "metta://policy/planky?stem=10" \
+  --episodes 5 --seed 42
+# Full benchmark (20 episodes, ~5 min)
+cogames scrimmage --mission cogsguard_machina_1.basic \
+  --policy "metta://policy/planky?stem=10" \
+  --episodes 20 --seed 42
+# Testing a specific role (only when needed):
+cogames scrimmage --mission cogsguard_machina_1.basic \
+  --policy "metta://policy/planky?miner=10" \
+  --episodes 3 --steps 500 --seed 42
+```
+### Key Metrics to Track
+**Focus on Cog score**: ignore Clip performance; we only care about maximizing Cog outcomes.
+| Metric                   | Target | Description                      |
+| ------------------------ | ------ | -------------------------------- |
+| `cogs.junctions` (final) | > 10   | Territory control at episode end |
+| `cogs.junctions` (peak)  | High   | Best territory control achieved  |
+| `Reward` (mean)          | > 15   | Average reward across episodes   |
+| Resources gathered       | High   | Total resources mined/deposited  |
+| Steps to first junction  | < 200  | Early game expansion speed       |
+## Test Commands
+```bash
+# Run all planky behavior tests
+metta pytest packages/cogames-agents/tests/test_planky_behaviors.py -v
+# Run specific test category
+metta pytest packages/cogames-agents/tests/test_planky_behaviors.py::TestPlankyMiner -v
+metta pytest packages/cogames-agents/tests/test_planky_behaviors.py::TestPlankyAligner -v
+metta pytest packages/cogames-agents/tests/test_planky_behaviors.py::TestPlankyScrambler -v
+# Quick debug play (stem=10, limited steps)
+cogames play --mission cogsguard_machina_1.basic \
+  --policy "metta://policy/planky?stem=10&trace=1&trace_level=2" \
+  --steps 300
+# Debug a specific role (only when testing that role):
+cogames play --mission cogsguard_machina_1.basic \
+  --policy "metta://policy/planky?miner=10&trace=1&trace_level=2" \
+  --steps 300
+```
+## Diagnostic Tips
+### Testing Combat Roles Without Economy
+The `CogsGuardMission` has a `wealth` field that multiplies initial collective resources. To test aligners/scramblers
+without resource constraints, temporarily edit `missions.py`:
+```python
+# In packages/cogames/src/cogames/cogs_vs_clips/missions.py
+# Set wealth=100 to start with 1000 of each resource + 500 hearts
+CogsGuardMachina1Mission = CogsGuardMission(
+    name="basic",
+    ...
+    wealth=100,  # Add this line temporarily
+)
+```
+**IMPORTANT**: Revert this change before committing. Do NOT commit changes outside cogames-agents.
+### Alternative: Use policy-level resource injection
+For unit tests, the diagnostic evals in `planky_evals.py` can set up custom initial conditions.
+## Score Tracking
+Record improvements in `IMPROVEMENTS.md` - update it after each successful fix with scrimmage scores.
+## Improvement Backlog
+### Priority 1: Economy (Miners)
+**Problem**: Combat roles (aligner/scrambler) can't act without hearts. Hearts require collective resources.
+- [x] **1.1 Resource prioritization**: Mine the resource most needed for hearts (balanced) ✓
+- [x] **1.2 Deposit efficiency**: Miners deposit when >= 50% full ✓
+- [ ] **1.3 Base extractors first**: Each base corner has one extractor per resource type
+  - Miners should prioritize these nearby, safe extractors initially
+  - Only explore for distant extractors once base extractors are depleted
+  - Benefits: short travel time, safe from enemy AOE, quick early economy
+- [ ] **1.4 Dynamic role balance**: Start with more miners, shift to combat as economy stabilizes
+- [ ] **1.5 Co-mining**: Miners should stay near each other for synergy bonuses
+  - Extractors give bonus output when multiple agents mine together (see `synergy` field in stations.py)
+  - Germanium extractors have 50% synergy bonus per additional miner
+  - Miners should coordinate to arrive at extractors together
+  - Consider: leader/follower pattern, or shared target selection
+### Priority 2: Scrambler Targeting
+**Problem**: Scramblers may chase distant junctions while closer threats expand.
+- [ ] **2.1 Distance weighting**: Heavily weight distance in target selection (closer = better)
+- [ ] **2.2 Threat assessment**: Prioritize junctions that are actively expanding clips territory
+- [ ] **2.3 Coordination**: Multiple scramblers shouldn't target the same junction
+### Priority 3: Aligner Efficiency
+**Problem**: Aligners avoid AOE but may ignore good opportunities.
+- [ ] **3.1 Risk/reward scoring**: Accept some AOE risk for high-value junctions
+- [ ] **3.2 Cluster targeting**: Prefer junctions that would create cogs clusters
+- [ ] **3.3 Follow-up coordination**: Align junctions right after scramblers neutralize them
+### Priority 4: Survival & Recovery
+**Problem**: Agents die in enemy AOE and lose gear/hearts.
+- [ ] **4.1 Proactive retreat**: Retreat before HP hits threshold, not after
+- [ ] **4.2 AOE awareness**: All roles should avoid enemy AOE, not just aligners
+- [ ] **4.3 Recovery speed**: Faster gear/heart re-acquisition after death
+### Priority 5: Map Control Strategy
+**Problem**: No global strategy for territory expansion.
+- [ ] **5.1 Hub defense**: Keep at least one junction near hub
+- [ ] **5.2 Frontline awareness**: Push toward clips territory systematically
+- [ ] **5.3 Pincer strategy**: Coordinate scramblers to attack clips from multiple angles
+## Implementation Guide
+### Adding a New Improvement
+1. **Create a test first** (if behavior-testable):
+   ```python
+   # In test_planky_behaviors.py
+   def test_new_behavior(self) -> None:
+       stats = run_planky_episode(NewBehaviorMission, ...)
+       assert stats["some_metric"] > threshold
+   ```
+2. **Create eval mission** (if needed):
+   ```python
+   # In planky_evals.py
+   class PlankyNewBehavior(_PlankyDiagnosticBase):
+       name: str = "planky_new_behavior"
+       map_name: str = "new_behavior.map"
+   ```
+3. **Implement in goal file**:
+   - `goals/miner.py` - resource gathering
+   - `goals/aligner.py` - junction alignment
+   - `goals/scrambler.py` - junction scrambling
+   - `goals/survive.py` - HP-based retreat
+   - `goals/shared.py` - cross-role behaviors (hearts)
+4. **Test locally**:
+   ```bash
+   metta pytest packages/cogames-agents/tests/test_planky_behaviors.py -v -k "new_behavior"
+   ```
+5. **Benchmark**:
+   ```bash
+   cogames scrimmage --mission cogsguard_machina_1.basic \
+     --policy "metta://policy/planky?stem=10" --episodes 5
+   ```
+### File Quick Reference
+| File            | Purpose                                            |
+| --------------- | -------------------------------------------------- |
+| `policy.py`     | Entry point, role distribution, goal list creation |
+| `context.py`    | PlankyContext, StateSnapshot                       |
+| `entity_map.py` | Sparse map with find/query                         |
+| `navigator.py`  | A\* pathfinding, exploration                       |
+| `goal.py`       | Goal base class, evaluate_goals()                  |
+| `goals/*.py`    | Role-specific goals                                |
+## Current Baseline
+Record baseline metrics here before each improvement session:
+```
+Date: [DATE]
+Config: stem=10
+Episodes: 20
+Seed: 42
+Results:
+- Mean reward: [X]
+- Mean final cogs junctions: [X]
+- Peak cogs junctions: [X]
+- Total resources gathered: [X]
+```
+## Completed Improvements
+Track completed work here:
+- [x] Initial goal-tree implementation
+- [x] Basic role goals (miner, aligner, scrambler, scout)
+- [x] Navigation with A\* pathfinding
+- [x] Attempt tracking to avoid stuck loops
+- [x] Aligner AOE avoidance
+- [x] **Resource balancing** - Miners now prioritize the resource the collective has least of, ensuring balanced
+      gathering for hearts
+- [x] **Periodic re-evaluation** - Miners re-evaluate target resource every 50 steps to adapt to changing needs
+- [x] **Useful action tracking** - Track steps since last useful action (mine/deposit/align/scramble) with `IDLE=N`
+      indicator in trace when idle > 20 steps
+- [x] **Smarter deposit threshold** - Miners only deposit when cargo >= 50% full (or >= 10 resources)
+- [x] **Faster extractor failure detection** - Reduced from 5 to 3 attempts, 500 step cooldown on failed extractors
+- [x] **Idle reset mechanism** - Clear navigation cache and targets after 100+ idle steps to break stuck loops
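The resource balancing and periodic re-evaluation items in this list can be sketched together. The sketch assumes the collective inventory is a mapping from resource letter to count; `least_stocked_resource`, `choose_target`, and the alphabetical tie-break are hypothetical choices, not taken from `goals/miner.py`.

```python
from typing import Optional

REEVALUATE_EVERY = 50  # steps between re-evaluations, as stated in the list


def least_stocked_resource(collective: dict[str, int]) -> str:
    """Pick the resource the collective has least of (ties: alphabetical)."""
    return min(sorted(collective), key=lambda name: collective[name])


def choose_target(step: int, current_target: Optional[str], collective: dict[str, int]) -> str:
    """Keep the current target between re-evaluation steps, else pick again."""
    if current_target is None or step % REEVALUATE_EVERY == 0:
        return least_stocked_resource(collective)
    return current_target


stock = {"C": 5, "O": 2, "G": 2, "S": 9}
print(choose_target(3, None, stock))   # no target yet, so pick the scarcest
print(choose_target(49, "C", stock))   # between re-evaluations, keep "C"
print(choose_target(50, "C", stock))   # re-evaluation step, switch target
```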
+## Current Observations
+After improvements, resources ARE being balanced (all 4 types mined), but:
+- Junction control is poor (capture 2-3, lose all)
+- Cog junctions peak early then decline - need to sustain territory
+- Agents keep losing gear (walking into enemy AOE)
+- [x] **Resource-aware gear/heart goals** — GetGearGoal and GetHeartsGoal check collective resources before attempting.
+      Agents skip when collective can't afford, falling through to productive goals instead of wasting time bumping
+      empty stations.
+**Next Priority**: Economy bootstrapping — ensure miners get gear first so combat roles can follow