PyPI - rlm-code - Versions diffs - 0.1.8__tar.gz → 0.1.9__tar.gz - Mend

rlm-code 0.1.8 (tar.gz) → 0.1.9 (tar.gz)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (305)

{rlm_code-0.1.8 → rlm_code-0.1.9}/.gitignore RENAMED

@@ -153,6 +153,7 @@ cython_debug/
 # Project specific
 dspy_config.yaml
+rlm_config.yaml
 *.log
 # Internal workspace data directories (all data in CWD)

{rlm_code-0.1.8 → rlm_code-0.1.9}/CHANGELOG.md RENAMED

@@ -5,6 +5,17 @@ All notable changes to this project are documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [0.1.9] - 2026-06-26
+### Added
+- Pure RLM runner context initialization from explicit workspace file references in the task, with compact repository snapshot fallback.
+- Context-load events for Pure RLM runs, including loaded file names and total context characters.
+- Runner JSONL replay coverage for action code, observations, success state, token counts, and cumulative reward.
+### Changed
+- TUI trajectory and replay views now show Pure RLM signals including REPL code, stdout/stderr previews, `llm_query` counts, executed code blocks, finalization status, and REPL variables.
+- Run visualization now includes richer Pure RLM previews for completed runs.
 ## [0.1.8] - 2026-05-01
 ### Added
@@ -76,5 +87,6 @@ Initial public release of **RLM Code**.
 [0.1.5]: https://github.com/SuperagenticAI/rlm-code/releases/tag/v0.1.5
 [0.1.6]: https://github.com/SuperagenticAI/rlm-code/releases/tag/v0.1.6
+[0.1.9]: https://github.com/SuperagenticAI/rlm-code/releases/tag/v0.1.9
 [0.1.8]: https://github.com/SuperagenticAI/rlm-code/releases/tag/v0.1.8
 [0.1.7]: https://github.com/SuperagenticAI/rlm-code/releases/tag/v0.1.7

{rlm_code-0.1.8 → rlm_code-0.1.9}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: rlm-code
-Version: 0.1.8
+Version: 0.1.9
 Summary: RLM Code: Research Playground & Evaluation OS for Recursive Language Model Agentic Systems
 Project-URL: Homepage, https://github.com/SuperagenticAI/rlm-code
 Project-URL: Documentation, https://superagenticai.github.io/rlm-code/
@@ -118,21 +118,20 @@ RLM Code implements the [Recursive Language Models](https://arxiv.org/abs/2502.0
 RLM Code wraps this algorithm in an interactive terminal UI with built-in benchmarks, trajectory replay, and observability.
-## Release v0.1.8
+## Release v0.1.9
-This release extends HALO/AHE-style trace analysis with layered evidence export.
+This release improves Pure RLM repository runs and makes completed trajectories more inspectable from the TUI and replay views.
-- New `trace_analysis` environment for diagnosing agent harness failures from OTel-shaped JSONL traces
-- Sidecar trace indexing with dataset overview, query, count, search, full-trace view, and selected-span view actions
-- AHE-style evidence corpus export with `overview.md`, per-trace detail reports, `index.json`, and optional processed raw JSONL spans
-- Bounded payload handling for large traces, including oversized summaries and higher-cap surgical span reads
-- `/rlm` help/docs updated for `env=trace_analysis`
-- Dedicated trace analysis docs under the Core Engine section
+- Pure RLM runs now initialize `context` from explicit workspace files mentioned in the task, with a compact repository snapshot fallback
+- Runner events now record context-load metadata for Pure RLM runs
+- Legacy runner JSONL step events replay with action code, observations, success, token counts, and cumulative reward
+- Run visualization now includes REPL code previews, stdout/stderr previews, `llm_query` counts, executed code blocks, finalization status, and REPL variables
+- TUI trajectory and replay views now surface Pure RLM signals directly for completed runs
 Example:
 ```text
-/rlm run "Find systemic harness failures trace=./traces.jsonl" env=trace_analysis steps=6
+/rlm run "Validate pure_rlm_environment.py and cite context, REPL, llm_query, and FINAL evidence" env=pure_rlm steps=6
 ```
 ## Documentation

{rlm_code-0.1.8 → rlm_code-0.1.9}/README.md RENAMED

@@ -25,21 +25,20 @@ RLM Code implements the [Recursive Language Models](https://arxiv.org/abs/2502.0
 RLM Code wraps this algorithm in an interactive terminal UI with built-in benchmarks, trajectory replay, and observability.
-## Release v0.1.8
+## Release v0.1.9
-This release extends HALO/AHE-style trace analysis with layered evidence export.
+This release improves Pure RLM repository runs and makes completed trajectories more inspectable from the TUI and replay views.
-- New `trace_analysis` environment for diagnosing agent harness failures from OTel-shaped JSONL traces
-- Sidecar trace indexing with dataset overview, query, count, search, full-trace view, and selected-span view actions
-- AHE-style evidence corpus export with `overview.md`, per-trace detail reports, `index.json`, and optional processed raw JSONL spans
-- Bounded payload handling for large traces, including oversized summaries and higher-cap surgical span reads
-- `/rlm` help/docs updated for `env=trace_analysis`
-- Dedicated trace analysis docs under the Core Engine section
+- Pure RLM runs now initialize `context` from explicit workspace files mentioned in the task, with a compact repository snapshot fallback
+- Runner events now record context-load metadata for Pure RLM runs
+- Legacy runner JSONL step events replay with action code, observations, success, token counts, and cumulative reward
+- Run visualization now includes REPL code previews, stdout/stderr previews, `llm_query` counts, executed code blocks, finalization status, and REPL variables
+- TUI trajectory and replay views now surface Pure RLM signals directly for completed runs
 Example:
 ```text
-/rlm run "Find systemic harness failures trace=./traces.jsonl" env=trace_analysis steps=6
+/rlm run "Validate pure_rlm_environment.py and cite context, REPL, llm_query, and FINAL evidence" env=pure_rlm steps=6
 ```
 ## Documentation

{rlm_code-0.1.8 → rlm_code-0.1.9}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "rlm-code"
-version = "0.1.8"
+version = "0.1.9"
 description = "RLM Code: Research Playground & Evaluation OS for Recursive Language Model Agentic Systems"
 readme = "README.md"
 license = "Apache-2.0"

{rlm_code-0.1.8 → rlm_code-0.1.9}/rlm_code/__init__.py RENAMED

@@ -5,5 +5,5 @@ This package provides tools for creating, managing, and optimizing DSPy componen
 through natural language interactions.
 """
-__version__ = "0.1.8"
+__version__ = "0.1.9"
 __author__ = "Super Agentic AI"

{rlm_code-0.1.8 → rlm_code-0.1.9}/rlm_code/mcp/__init__.py RENAMED

@@ -17,7 +17,7 @@ from .exceptions import (
 )
 from .session_wrapper import MCPSessionWrapper
-__version__ = "0.1.8"
+__version__ = "0.1.9"
 __all__ = [
     "MCPClientManager",

{rlm_code-0.1.8 → rlm_code-0.1.9}/rlm_code/rlm/runner.py RENAMED

@@ -9,6 +9,7 @@ from __future__ import annotations
 import hashlib
 import json
+import re
 import threading
 import time
 from dataclasses import asdict, dataclass, is_dataclass
@@ -29,7 +30,7 @@ from .benchmark_manager import (
 )
 from .benchmarks import RLMBenchmarkCase, load_benchmark_packs
 from .chat_session import ChatSessionMixin
-from .context_store import LazyFileContext
+from .context_store import ContextRef, LazyFileContext
 from .delegation import DelegationMixin
 from .environments import (
     DSPyCodingRLMEnvironment,
@@ -467,6 +468,93 @@ class RLMRunner(BenchmarkManagerMixin, ChatSessionMixin, DelegationMixin, Action
             allow_unsafe_exec=(selected_backend == "exec" and self._pure_rlm_allow_unsafe_exec),
         )
+    def _extract_task_file_refs(self, task: str, limit: int = 12) -> list[ContextRef]:
+        """Find explicit workspace file references mentioned in a task string."""
+        candidates = re.findall(
+            r"(?<![\w.-])(?:[\w.-]+/)*[\w.-]+\.(?:py|md|toml|yaml|yml|json|txt|js|jsx|ts|tsx)",
+            task,
+        )
+        seen: set[str] = set()
+        refs: list[ContextRef] = []
+        for candidate in candidates:
+            normalized = candidate.strip().strip("`'\".,:;)")
+            if not normalized or normalized in seen:
+                continue
+            seen.add(normalized)
+            refs.append(ContextRef(path=normalized))
+            if len(refs) >= limit:
+                break
+        return refs
+    def _build_pure_rlm_initial_context(self, task: str) -> dict[str, str]:
+        """
+        Build a small real-code context for Pure RLM runs.
+        The direct PureRLMEnvironment API expects context to be initialized
+        explicitly.  Runner/TUI users expect `/rlm run ... env=pure_rlm` to
+        start with useful workspace data, so we seed `context` with explicit
+        files named in the task, falling back to a compact repository snapshot.
+        """
+        refs = self._extract_task_file_refs(task)
+        if not refs:
+            refs = self.context_store.discover(limit=12)
+        context: dict[str, str] = {}
+        for ref in refs:
+            snippet = self.context_store.read(ref, max_chars=12000)
+            if snippet:
+                context[ref.path] = snippet
+        if context:
+            return context
+        discovered = self.context_store.discover(limit=80)
+        tree = "\n".join(ref.path for ref in discovered)
+        return {
+            "_workspace": (
+                f"Workspace: {self.workdir}\n"
+                "No explicit file snippets were loaded. Available files:\n"
+                f"{tree}"
+            ).strip()
+        }
+    def _initialize_pure_rlm_run_context(
+        self,
+        env: RLMEnvironment,
+        task: str,
+        *,
+        run_id: str,
+        run_path: Path,
+    ) -> int:
+        """Initialize `context` for Pure RLM runs and persist a context event."""
+        if env.name != "pure_rlm" or not hasattr(env, "initialize_context"):
+            return 0
+        context = self._build_pure_rlm_initial_context(task)
+        env.initialize_context(
+            context,
+            description="Workspace files selected for this Pure RLM run",
+            additional_vars={"query": task},
+        )
+        context_event = {
+            "type": "context",
+            "run_id": run_id,
+            "environment": env.name,
+            "timestamp": self._utc_now(),
+            "context_files": list(context.keys()),
+            "context_chars": sum(len(value) for value in context.values()),
+        }
+        self._append_event(run_path, context_event)
+        self._emit_runtime_event(
+            "context_load",
+            {
+                "run_id": run_id,
+                "files": len(context),
+                "chars": context_event["context_chars"],
+            },
+        )
+        return len(context)
     def run_task(
         self,
         task: str,
@@ -596,6 +684,12 @@ class RLMRunner(BenchmarkManagerMixin, ChatSessionMixin, DelegationMixin, Action
         final_response = ""
         cancelled = False
         trajectory: list[dict[str, Any]] = []
+        context_files = self._initialize_pure_rlm_run_context(
+            env,
+            cleaned_task,
+            run_id=run_id,
+            run_path=run_path,
+        )
         usage_start = self._usage_snapshot()
         self.observability.on_run_start(
             run_id,
@@ -616,6 +710,7 @@ class RLMRunner(BenchmarkManagerMixin, ChatSessionMixin, DelegationMixin, Action
                 "parent_run_id": _parent_run_id,
                 "pure_rlm_backend": self._pure_rlm_backend if env.name == "pure_rlm" else None,
                 "pure_rlm_strict": strict_pure_mode if env.name == "pure_rlm" else None,
+                "context_files": context_files if env.name == "pure_rlm" else None,
             },
         )
         self._emit_runtime_event(
@@ -627,6 +722,7 @@ class RLMRunner(BenchmarkManagerMixin, ChatSessionMixin, DelegationMixin, Action
                 "framework": native_framework,
                 "depth": _depth,
                 "parent_run_id": _parent_run_id,
+                "context_files": context_files if env.name == "pure_rlm" else None,
             },
         )
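
The file-reference extraction added in `_extract_task_file_refs` can be tried in isolation. The following is a minimal sketch, assuming plain strings in place of `ContextRef` (whose definition is not part of this diff); `extract_file_refs` and `FILE_REF_PATTERN` are hypothetical names used only for illustration, while the regular expression and the stripping rules are copied from the hunk above.

```python
import re

# Pattern copied from the diff: optional directory segments, then a file
# name ending in one of the listed extensions.
FILE_REF_PATTERN = re.compile(
    r"(?<![\w.-])(?:[\w.-]+/)*[\w.-]+\.(?:py|md|toml|yaml|yml|json|txt|js|jsx|ts|tsx)"
)


def extract_file_refs(task: str, limit: int = 12) -> list[str]:
    """Return unique file-like references in first-seen order (hypothetical helper)."""
    seen: set[str] = set()
    refs: list[str] = []
    for candidate in FILE_REF_PATTERN.findall(task):
        # Same cleanup as the diff: trim whitespace, then stray punctuation.
        normalized = candidate.strip().strip("`'\".,:;)")
        if not normalized or normalized in seen:
            continue
        seen.add(normalized)
        refs.append(normalized)
        if len(refs) >= limit:
            break
    return refs


task = (
    "Validate pure_rlm_environment.py and compare with docs/guide.md, "
    "then pure_rlm_environment.py again"
)
refs = extract_file_refs(task)
print(refs)  # ['pure_rlm_environment.py', 'docs/guide.md']

# The alternation is ordered and has no trailing boundary, so `ts` wins
# before `tsx` is tried.
tsx_refs = extract_file_refs("see app.tsx")
print(tsx_refs)  # ['app.ts']
```

Duplicates are dropped and order is preserved. One property of the pattern as written is worth knowing when reading the diff: because `ts` precedes `tsx` (and `js` precedes `jsx`) and nothing anchors the end of the match, a name such as `app.tsx` is captured as `app.ts`.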

{rlm_code-0.1.8 → rlm_code-0.1.9}/rlm_code/rlm/session_replay.py RENAMED

@@ -1035,14 +1035,30 @@ def _convert_legacy_step(data: dict[str, Any]) -> SessionEvent:
     step_type = data.get("type", "")
     if step_type == "step":
+        observation = data.get("observation", {})
+        observation_dict = observation if isinstance(observation, dict) else {}
+        action = data.get("action", {})
+        action_dict = action if isinstance(action, dict) else {}
+        success = observation_dict.get("success")
+        if success is None:
+            success = not bool(observation_dict.get("error") or observation_dict.get("stderr"))
+        usage = data.get("usage", {})
+        usage_dict = usage if isinstance(usage, dict) else {}
         return SessionEvent(
             event_type=SessionEventType.STEP_END,
             timestamp=data.get("timestamp", _utc_now()),
-            step=data.get("step", 0),
+            step=int(data.get("step", 0) or 0),
             data={
-                "action": data.get("action", {}),
-                "observation": data.get("observation", {}),
+                "step": int(data.get("step", 0) or 0),
+                "timestamp": data.get("timestamp", _utc_now()),
+                "action": action_dict,
+                "observation": observation_dict,
                 "reward": data.get("reward", 0.0),
+                "success": bool(success),
+                "tokens_used": int(
+                    usage_dict.get("prompt_tokens", 0) or 0
+                )
+                + int(usage_dict.get("completion_tokens", 0) or 0),
             },
             run_id=data.get("run_id", ""),
             depth=data.get("depth", 0),
@@ -1125,12 +1141,18 @@ def _build_snapshot_from_events(
         elif event.event_type == SessionEventType.STEP_END:
             # Build StepState from accumulated data
+            if "step" not in current_step_data:
+                current_step_data = {
+                    "step": int(event.data.get("step", event.step) or 0),
+                    "timestamp": str(event.data.get("timestamp", event.timestamp) or ""),
+                }
             if "step" in current_step_data:
                 # Merge any additional data from STEP_END event
                 if "action" in event.data:
                     action = event.data["action"]
                     current_step_data.setdefault("action_type", action.get("action", ""))
                     current_step_data.setdefault("action_code", action.get("code", ""))
+                    current_step_data.setdefault("action_rationale", action.get("reasoning", ""))
                     current_step_data.setdefault("raw_action", action)
                 if "observation" in event.data:
                     obs = event.data["observation"]
@@ -1138,12 +1160,16 @@ def _build_snapshot_from_events(
                     current_step_data.setdefault("error", obs.get("error", obs.get("stderr", "")))
                     current_step_data.setdefault("raw_observation", obs)
                 if "reward" in event.data:
+                    reward = float(event.data.get("reward", 0.0) or 0.0)
+                    cumulative = event.data.get("cumulative_reward")
+                    if cumulative is None:
+                        cumulative = total_reward + reward
                     current_step_data.setdefault("reward", event.data["reward"])
-                    current_step_data.setdefault(
-                        "cumulative_reward", event.data.get("cumulative_reward", 0.0)
-                    )
+                    current_step_data.setdefault("cumulative_reward", cumulative)
                 if "success" in event.data:
                     current_step_data.setdefault("success", event.data["success"])
+                if "tokens_used" in event.data:
+                    current_step_data.setdefault("tokens_used", event.data["tokens_used"])
                 step_state = StepState(
                     step=current_step_data.get("step", 0),
@@ -1163,6 +1189,8 @@ def _build_snapshot_from_events(
                     raw_observation=current_step_data.get("raw_observation", {}),
                 )
                 steps.append(step_state)
+                total_reward = float(step_state.cumulative_reward)
+                total_tokens += int(step_state.tokens_used or 0)
                 current_step_data = {}
         elif event.event_type == SessionEventType.MEMORY_UPDATE:
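
The replay changes above amount to three rules: infer `success` when the observation omits it, sum prompt and completion tokens, and derive a running `cumulative_reward` when the event does not carry one. The sketch below restates those rules on plain dictionaries, assuming no `SessionEvent` or `StepState` types (they are not shown in this diff); `normalize_legacy_step` and `accumulate_rewards` are hypothetical names.

```python
from typing import Any


def _as_dict(value: Any) -> dict[str, Any]:
    """Treat any non-dict payload as empty, as the diff does."""
    return value if isinstance(value, dict) else {}


def normalize_legacy_step(data: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical stand-in for the step branch of _convert_legacy_step."""
    observation = _as_dict(data.get("observation", {}))
    usage = _as_dict(data.get("usage", {}))
    success = observation.get("success")
    if success is None:
        # No explicit flag: a step counts as successful unless it reported
        # an error or wrote to stderr.
        success = not bool(observation.get("error") or observation.get("stderr"))
    tokens_used = int(usage.get("prompt_tokens", 0) or 0) + int(
        usage.get("completion_tokens", 0) or 0
    )
    return {
        "step": int(data.get("step", 0) or 0),
        "action": _as_dict(data.get("action", {})),
        "observation": observation,
        "reward": data.get("reward", 0.0),
        "success": bool(success),
        "tokens_used": tokens_used,
    }


def accumulate_rewards(steps: list[dict[str, Any]]) -> list[float]:
    """Running total, preferring an explicit cumulative_reward when present."""
    total = 0.0
    curve: list[float] = []
    for step in steps:
        reward = float(step.get("reward", 0.0) or 0.0)
        cumulative = step.get("cumulative_reward")
        if cumulative is None:
            cumulative = total + reward
        total = float(cumulative)
        curve.append(total)
    return curve


ok_step = normalize_legacy_step(
    {
        "step": 1,
        "observation": {"stdout": "dict_keys(['a.py'])"},
        "reward": 0.5,
        "usage": {"prompt_tokens": 10, "completion_tokens": 5},
    }
)
bad_step = normalize_legacy_step(
    {"step": "2", "observation": {"stderr": "boom"}, "reward": 0.25, "usage": None}
)
curve = accumulate_rewards([ok_step, bad_step])
print(ok_step["success"], ok_step["tokens_used"])    # True 15
print(bad_step["success"], bad_step["tokens_used"])  # False 0
print(curve)                                         # [0.5, 0.75]
```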

{rlm_code-0.1.8 → rlm_code-0.1.9}/rlm_code/rlm/visualizer.py RENAMED

@@ -62,6 +62,16 @@ def build_run_visualization(
             "success": observation_dict.get("success") if "success" in observation_dict else None,
             "path": str(observation_dict.get("path") or ""),
             "children_executed": int(observation_dict.get("children_executed") or 0),
+            "planner_preview": _clip_text(str(step.get("planner_raw") or ""), limit=260),
+            "code_preview": _clip_text(_action_code(step), limit=260),
+            "stdout_preview": _clip_text(str(observation_dict.get("stdout") or ""), limit=260),
+            "stderr_preview": _clip_text(str(observation_dict.get("stderr") or ""), limit=180),
+            "llm_calls_made": int(observation_dict.get("llm_calls_made") or 0),
+            "code_blocks_executed": int(observation_dict.get("code_blocks_executed") or 0),
+            "final_detected": bool(observation_dict.get("final_detected", False)),
+            "repl_variables": list(observation_dict.get("repl_variables") or [])[:20]
+            if isinstance(observation_dict.get("repl_variables"), list)
+            else [],
         }
         error = _extract_error(step)
         if error:
@@ -190,6 +200,19 @@ def _action_name(step: dict[str, Any]) -> str:
     return "unknown"
+def _action_code(step: dict[str, Any]) -> str:
+    action = step.get("action")
+    if not isinstance(action, dict):
+        return ""
+    code = action.get("code")
+    if isinstance(code, str) and code.strip():
+        return code
+    blocks = action.get("_code_blocks")
+    if isinstance(blocks, list):
+        return "\n\n".join(str(block) for block in blocks if str(block).strip())
+    return ""
 def _extract_error(step: dict[str, Any]) -> str:
     observation = step.get("observation")
     if not isinstance(observation, dict):
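
To see how the new `code_preview` field is fed, the `_action_code` helper from the hunk above can be run on a few step shapes. In this sketch the function body is copied from the diff; `clip_text` is a hypothetical stand-in for `_clip_text`, whose implementation is not part of this diff, so its truncation style is an assumption.

```python
from typing import Any


def action_code(step: dict[str, Any]) -> str:
    """Body copied from _action_code in the diff."""
    action = step.get("action")
    if not isinstance(action, dict):
        return ""
    code = action.get("code")
    if isinstance(code, str) and code.strip():
        return code
    # Fall back to joined code blocks when `code` is missing or blank.
    blocks = action.get("_code_blocks")
    if isinstance(blocks, list):
        return "\n\n".join(str(block) for block in blocks if str(block).strip())
    return ""


def clip_text(text: str, limit: int) -> str:
    """Assumed behavior only: trim, then cut to `limit` characters with an ellipsis."""
    text = text.strip()
    return text if len(text) <= limit else text[: limit - 3] + "..."


direct = action_code({"action": {"code": "print(context.keys())"}})
fallback = action_code({"action": {"code": "  ", "_code_blocks": ["a = 1", "", "print(a)"]}})
missing = action_code({"action": "noop"})
preview = clip_text("x" * 300, limit=260)

print(repr(direct))    # 'print(context.keys())'
print(repr(fallback))  # 'a = 1\n\nprint(a)'
print(repr(missing))   # ''
print(len(preview))    # 260
```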

{rlm_code-0.1.8 → rlm_code-0.1.9}/rlm_code/ui/tui_app.py RENAMED

@@ -2403,14 +2403,40 @@ def run_textual_tui(config_manager: ConfigManager) -> None:
             if not timeline:
                 target.update("[dim]No steps recorded in this run.[/dim]")
                 return
-            lines = ["[bold cyan]Step  Action          Reward   Success[/bold cyan]"]
+            lines = [
+                f"[bold cyan]Trajectory[/bold cyan] [dim]{viz.get('run_id', '')}[/dim]",
+                "[bold cyan]Step  Action          Reward   Success  RLM signals[/bold cyan]",
+            ]
             for entry in timeline:
                 step = entry.get("step", "?")
                 action = str(entry.get("action", "?"))[:14].ljust(14)
                 reward = entry.get("reward", 0.0)
                 cum = entry.get("cumulative_reward", 0.0)
-                ok = "[green]Y[/green]" if entry.get("success") else "[red]N[/red]"
-                lines.append(f"  {step:<4} {action}  {reward:+.3f} ({cum:.3f})  {ok}")
+                success = entry.get("success")
+                if success is None:
+                    ok = "[dim]-[/dim]"
+                else:
+                    ok = "[green]Y[/green]" if success else "[red]N[/red]"
+                signals: list[str] = []
+                if entry.get("code_blocks_executed"):
+                    signals.append(f"code={entry.get('code_blocks_executed')}")
+                if entry.get("llm_calls_made"):
+                    signals.append(f"llm={entry.get('llm_calls_made')}")
+                if entry.get("final_detected"):
+                    signals.append("[green]FINAL[/green]")
+                variables = entry.get("repl_variables") or []
+                if variables:
+                    preview_vars = ", ".join(str(item) for item in variables[:5])
+                    signals.append(f"vars={preview_vars}")
+                signal_text = "  ".join(signals) if signals else "[dim]-[/dim]"
+                lines.append(f"  {step:<4} {action}  {reward:+.3f} ({cum:.3f})  {ok}       {signal_text}")
+                code_preview = str(entry.get("code_preview") or "").strip()
+                stdout_preview = str(entry.get("stdout_preview") or "").strip()
+                if code_preview:
+                    lines.append(f"       [magenta]code[/magenta] {code_preview}")
+                if stdout_preview:
+                    lines.append(f"       [blue]out [/blue] {stdout_preview}")
             target.update("\n".join(lines))
         def _apply_view_mode(self) -> None:
@@ -2842,21 +2868,76 @@ def run_textual_tui(config_manager: ConfigManager) -> None:
             if self._session_replayer is None:
                 return
             try:
+                state = None
                 if button_id == "replay_start_btn":
                     self._session_replayer.goto_start()
                 elif button_id == "replay_back_btn":
-                    self._session_replayer.step_backward()
+                    state = self._session_replayer.step_backward()
                 elif button_id == "replay_fwd_btn":
-                    self._session_replayer.step_forward()
+                    state = self._session_replayer.step_forward()
                 elif button_id == "replay_end_btn":
                     self._session_replayer.goto_end()
+                    state = self._session_replayer.get_current_state()
                 # Update position display
                 cur = self._session_replayer.current_step
                 total = self._session_replayer.total_steps
                 self.query_one("#replay_position", Static).update(f"Step {cur}/{total}")
+                if state is None:
+                    state = self._session_replayer.get_current_state()
+                self._render_replay_step_detail(state)
             except Exception:
                 pass
+        def _render_replay_step_detail(self, state: Any | None) -> None:
+            """Render the current replay step with pure-RLM-specific details."""
+            try:
+                target = self.query_one("#replay_step_detail", Static)
+            except Exception:
+                return
+            if state is None:
+                target.update("[dim]Replay is at the start or end of the run.[/dim]")
+                return
+            raw_observation = getattr(state, "raw_observation", {}) or {}
+            raw_action = getattr(state, "raw_action", {}) or {}
+            lines = [
+                f"[bold cyan]Step {getattr(state, 'step', '?')}[/bold cyan] "
+                f"action=[bold]{getattr(state, 'action_type', '') or raw_action.get('action', '')}[/bold] "
+                f"reward={float(getattr(state, 'reward', 0.0) or 0.0):+.3f}",
+            ]
+            code = str(getattr(state, "action_code", "") or raw_action.get("code", "") or "").strip()
+            if code:
+                lines.append("")
+                lines.append("[magenta]REPL code[/magenta]")
+                lines.append(code[:1800])
+            stdout = str(getattr(state, "output", "") or raw_observation.get("stdout", "") or "").strip()
+            stderr = str(getattr(state, "error", "") or raw_observation.get("stderr", "") or "").strip()
+            if stdout:
+                lines.append("")
+                lines.append("[blue]Observation stdout[/blue]")
+                lines.append(stdout[:1800])
+            if stderr:
+                lines.append("")
+                lines.append("[red]Observation stderr[/red]")
+                lines.append(stderr[:1000])
+            signals: list[str] = []
+            if raw_observation.get("code_blocks_executed"):
+                signals.append(f"code_blocks={raw_observation.get('code_blocks_executed')}")
+            if raw_observation.get("llm_calls_made"):
+                signals.append(f"llm_calls={raw_observation.get('llm_calls_made')}")
+            if raw_observation.get("final_detected"):
+                signals.append("FINAL detected")
+            variables = raw_observation.get("repl_variables")
+            if isinstance(variables, list) and variables:
+                signals.append("vars=" + ", ".join(str(item) for item in variables[:12]))
+            if signals:
+                lines.append("")
+                lines.append("[green]RLM signals[/green] " + "  ".join(signals))
+            target.update("\n".join(lines))
         def _refresh_research_dashboard(self, run_path: Path) -> None:
             """Populate the Research dashboard from a completed run trace."""
             try:
@@ -2904,7 +2985,7 @@ def run_textual_tui(config_manager: ConfigManager) -> None:
                     chart.values = [pt.get("cumulative_reward", 0.0) for pt in reward_curve]
                 self.query_one("#replay_step_detail", Static).update(
-                    "[dim]Use < > buttons to step through the run.[/dim]"
+                    "[dim]Use < > buttons to step through the run. Each step will show REPL code, observations, and pure-RLM signals.[/dim]"
                 )
                 self._set_research_sub_view("replay")
             except Exception as exc:
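
The trajectory table now renders a tri-state success marker and an "RLM signals" column. The formatting logic can be exercised without Textual; the sketch below lifts it into two free functions, assuming the markup strings are passed through unchanged. `format_success` and `format_signals` are hypothetical names; in the diff this logic is inline in the TUI method.

```python
from typing import Any


def format_success(success: Any) -> str:
    """None means the step recorded no success flag, shown as a dim dash."""
    if success is None:
        return "[dim]-[/dim]"
    return "[green]Y[/green]" if success else "[red]N[/red]"


def format_signals(entry: dict[str, Any]) -> str:
    """Build the signals cell from a timeline entry, as in the diff."""
    signals: list[str] = []
    if entry.get("code_blocks_executed"):
        signals.append(f"code={entry.get('code_blocks_executed')}")
    if entry.get("llm_calls_made"):
        signals.append(f"llm={entry.get('llm_calls_made')}")
    if entry.get("final_detected"):
        signals.append("[green]FINAL[/green]")
    variables = entry.get("repl_variables") or []
    if variables:
        # Only the first five variable names are previewed in the table.
        signals.append("vars=" + ", ".join(str(item) for item in variables[:5]))
    return "  ".join(signals) if signals else "[dim]-[/dim]"


entry = {
    "success": True,
    "code_blocks_executed": 1,
    "llm_calls_made": 2,
    "final_detected": True,
    "repl_variables": ["context", "answer"],
}
cell = format_signals(entry)
print(format_success(entry.get("success")))  # [green]Y[/green]
print(cell)  # code=1  llm=2  [green]FINAL[/green]  vars=context, answer
print(format_signals({}))  # [dim]-[/dim]
```

Separating `None` from `False` matters for legacy runs: a step with no recorded outcome is no longer displayed as a failure.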

{rlm_code-0.1.8 → rlm_code-0.1.9}/tests/rlm/test_session_replay.py RENAMED

@@ -761,6 +761,62 @@ class TestLoadSession:
             replayer = load_session(jsonl_path)
             assert replayer.total_steps >= 1
+    def test_load_runner_jsonl_step_events(self):
+        """Runner JSONL step/final events should replay with useful state."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            jsonl_path = Path(tmpdir) / "runner.jsonl"
+            events = [
+                {
+                    "type": "step",
+                    "run_id": "run_demo",
+                    "environment": "pure_rlm",
+                    "task": "Validate pure RLM",
+                    "timestamp": "2026-06-25T10:00:01+00:00",
+                    "step": 1,
+                    "action": {
+                        "action": "run_repl",
+                        "code": "print(context.keys())",
+                        "reasoning": "Inspect context",
+                    },
+                    "observation": {
+                        "success": True,
+                        "stdout": "dict_keys(['a.py'])",
+                        "llm_calls_made": 1,
+                        "code_blocks_executed": 1,
+                        "repl_variables": ["context", "answer"],
+                    },
+                    "reward": 0.4,
+                    "usage": {"prompt_tokens": 10, "completion_tokens": 5},
+                },
+                {
+                    "type": "final",
+                    "run_id": "run_demo",
+                    "environment": "pure_rlm",
+                    "task": "Validate pure RLM",
+                    "timestamp": "2026-06-25T10:00:02+00:00",
+                    "completed": True,
+                    "steps": 1,
+                    "total_reward": 0.4,
+                    "final_response": "Yes",
+                    "usage": {"prompt_tokens": 10, "completion_tokens": 5},
+                },
+            ]
+            with jsonl_path.open("w") as f:
+                for event in events:
+                    f.write(json.dumps(event) + "\n")
+            replayer = load_session(jsonl_path)
+            assert replayer.total_steps == 1
+            assert replayer.snapshot.completed is True
+            assert replayer.snapshot.final_answer == "Yes"
+            step = replayer.step_forward()
+            assert step is not None
+            assert step.action_type == "run_repl"
+            assert step.action_code == "print(context.keys())"
+            assert step.output == "dict_keys(['a.py'])"
+            assert step.raw_observation["llm_calls_made"] == 1
 class TestCreateRecorder:
     """Tests for create_recorder convenience function."""

{rlm_code-0.1.8 → rlm_code-0.1.9}/tests/test_rlm_runner.py RENAMED

@@ -1414,6 +1414,39 @@ def test_rlm_pure_strict_blocks_delegate_actions(tmp_path):
     assert "delegate action is disabled" in str(observation)
+def test_rlm_pure_run_initializes_context_from_task_files(tmp_path):
+    source = tmp_path / "demo_module.py"
+    source.write_text("VALUE = 42\n", encoding="utf-8")
+    connector = _FakeConnector(
+        responses=[
+            '```repl\nfinal_answer = list(context.keys())\nFINAL_VAR("final_answer")\n```',
+        ]
+    )
+    runner = RLMRunner(
+        llm_connector=connector,
+        execution_engine=_ConfigurableExecutionEngine(pure_rlm_backend="exec"),
+        run_dir=tmp_path / "runs",
+        workdir=tmp_path,
+    )
+    result = runner.run_task(
+        "Inspect demo_module.py",
+        max_steps=1,
+        exec_timeout=5,
+        environment="pure_rlm",
+    )
+    assert result.completed is True
+    assert "demo_module.py" in result.final_response
+    events = runner.load_run_events(result.run_id)
+    context_event = next(event for event in events if event.get("type") == "context")
+    assert context_event["context_files"] == ["demo_module.py"]
+    step_event = next(event for event in events if event.get("type") == "step")
+    observation = step_event.get("observation", {})
+    assert observation.get("final_detected") is True
+    assert "context" in observation.get("repl_variables", [])
 def test_rlm_runner_blocks_exec_without_unsafe_opt_in(tmp_path):
     engine = _ConfigurableExecutionEngine(
         pure_rlm_backend="exec",