PyPI - rlm-code - Versions diffs - 0.1.7__tar.gz → 0.1.9__tar.gz - Mend

rlm-code 0.1.7__tar.gz → 0.1.9__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (305)

{rlm_code-0.1.7 → rlm_code-0.1.9}/.gitignore RENAMED

@@ -153,6 +153,7 @@ cython_debug/
 # Project specific
 dspy_config.yaml
+rlm_config.yaml
 *.log
 # Internal workspace data directories (all data in CWD)

{rlm_code-0.1.7 → rlm_code-0.1.9}/CHANGELOG.md RENAMED

@@ -5,6 +5,24 @@ All notable changes to this project are documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [0.1.9] - 2026-06-26
+### Added
+- Pure RLM runner context initialization from explicit workspace file references in the task, with compact repository snapshot fallback.
+- Context-load events for Pure RLM runs, including loaded file names and total context characters.
+- Runner JSONL replay coverage for action code, observations, success state, token counts, and cumulative reward.
+### Changed
+- TUI trajectory and replay views now show Pure RLM signals including REPL code, stdout/stderr previews, `llm_query` counts, executed code blocks, finalization status, and REPL variables.
+- Run visualization now includes richer Pure RLM previews for completed runs.
+## [0.1.8] - 2026-05-01
+### Added
+- AHE-style layered trace evidence corpus export from `TraceStore`.
+- New `trace_analysis` action `export_evidence_corpus` for writing `overview.md`, per-trace detail reports, `index.json`, and optional processed raw JSONL spans.
+- Evidence corpus tests covering direct store export and environment action export.
 ## [0.1.7] - 2026-04-30
 ### Added
@@ -69,4 +87,6 @@ Initial public release of **RLM Code**.
 [0.1.5]: https://github.com/SuperagenticAI/rlm-code/releases/tag/v0.1.5
 [0.1.6]: https://github.com/SuperagenticAI/rlm-code/releases/tag/v0.1.6
+[0.1.9]: https://github.com/SuperagenticAI/rlm-code/releases/tag/v0.1.9
+[0.1.8]: https://github.com/SuperagenticAI/rlm-code/releases/tag/v0.1.8
 [0.1.7]: https://github.com/SuperagenticAI/rlm-code/releases/tag/v0.1.7
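The 0.1.9 entry says Pure RLM runs now record context-load events with the loaded file names and total context characters. A minimal sketch for reading them back, assuming the run log is JSONL (one event object per line) and that context events use the `type`, `run_id`, `context_files`, and `context_chars` keys written by `_initialize_pure_rlm_run_context` in the `runner.py` diff. The helper names and the sample log lines are illustrative, not part of the package.

```python
import json
from typing import Any, Iterable


def load_context_events(lines: Iterable[str]) -> list[dict[str, Any]]:
    """Collect `type == "context"` events from JSONL text (hypothetical helper)."""
    events: list[dict[str, Any]] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between events
        event = json.loads(line)
        if isinstance(event, dict) and event.get("type") == "context":
            events.append(event)
    return events


def summarize_context_event(event: dict[str, Any]) -> str:
    # Field names follow the context_event dict built in runner.py.
    files = event.get("context_files") or []
    chars = int(event.get("context_chars") or 0)
    return f"{event.get('run_id', '?')}: {len(files)} file(s), {chars} chars"


# Illustrative log lines; a real run log would be read from the run's event file.
log_lines = [
    '{"type": "context", "run_id": "run-1", "environment": "pure_rlm", '
    '"context_files": ["README.md", "rlm_code/rlm/runner.py"], "context_chars": 5120}',
    '{"type": "step", "run_id": "run-1", "step": 1}',
]
for event in load_context_events(log_lines):
    print(summarize_context_event(event))  # run-1: 2 file(s), 5120 chars
```

With a real file, passing an open handle (`open(path, encoding="utf-8")`) works the same way, since iterating a text file yields lines.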

{rlm_code-0.1.7 → rlm_code-0.1.9}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: rlm-code
-Version: 0.1.7
+Version: 0.1.9
 Summary: RLM Code: Research Playground & Evaluation OS for Recursive Language Model Agentic Systems
 Project-URL: Homepage, https://github.com/SuperagenticAI/rlm-code
 Project-URL: Documentation, https://superagenticai.github.io/rlm-code/
@@ -118,20 +118,20 @@ RLM Code implements the [Recursive Language Models](https://arxiv.org/abs/2502.0
 RLM Code wraps this algorithm in an interactive terminal UI with built-in benchmarks, trajectory replay, and observability.
-## Release v0.1.7
+## Release v0.1.9
-This release adds HALO-style trace analysis as a new RLM environment.
+This release improves Pure RLM repository runs and makes completed trajectories more inspectable from the TUI and replay views.
-- New `trace_analysis` environment for diagnosing agent harness failures from OTel-shaped JSONL traces
-- Sidecar trace indexing with dataset overview, query, count, search, full-trace view, and selected-span view actions
-- Bounded payload handling for large traces, including oversized summaries and higher-cap surgical span reads
-- `/rlm` help/docs updated for `env=trace_analysis`
-- Dedicated trace analysis docs under the Core Engine section
+- Pure RLM runs now initialize `context` from explicit workspace files mentioned in the task, with a compact repository snapshot fallback
+- Runner events now record context-load metadata for Pure RLM runs
+- Legacy runner JSONL step events replay with action code, observations, success, token counts, and cumulative reward
+- Run visualization now includes REPL code previews, stdout/stderr previews, `llm_query` counts, executed code blocks, finalization status, and REPL variables
+- TUI trajectory and replay views now surface Pure RLM signals directly for completed runs
 Example:
 ```text
-/rlm run "Find systemic harness failures trace=./traces.jsonl" env=trace_analysis steps=6
+/rlm run "Validate pure_rlm_environment.py and cite context, REPL, llm_query, and FINAL evidence" env=pure_rlm steps=6
 ```
 ## Documentation

{rlm_code-0.1.7 → rlm_code-0.1.9}/README.md RENAMED

@@ -25,20 +25,20 @@ RLM Code implements the [Recursive Language Models](https://arxiv.org/abs/2502.0
 RLM Code wraps this algorithm in an interactive terminal UI with built-in benchmarks, trajectory replay, and observability.
-## Release v0.1.7
+## Release v0.1.9
-This release adds HALO-style trace analysis as a new RLM environment.
+This release improves Pure RLM repository runs and makes completed trajectories more inspectable from the TUI and replay views.
-- New `trace_analysis` environment for diagnosing agent harness failures from OTel-shaped JSONL traces
-- Sidecar trace indexing with dataset overview, query, count, search, full-trace view, and selected-span view actions
-- Bounded payload handling for large traces, including oversized summaries and higher-cap surgical span reads
-- `/rlm` help/docs updated for `env=trace_analysis`
-- Dedicated trace analysis docs under the Core Engine section
+- Pure RLM runs now initialize `context` from explicit workspace files mentioned in the task, with a compact repository snapshot fallback
+- Runner events now record context-load metadata for Pure RLM runs
+- Legacy runner JSONL step events replay with action code, observations, success, token counts, and cumulative reward
+- Run visualization now includes REPL code previews, stdout/stderr previews, `llm_query` counts, executed code blocks, finalization status, and REPL variables
+- TUI trajectory and replay views now surface Pure RLM signals directly for completed runs
 Example:
 ```text
-/rlm run "Find systemic harness failures trace=./traces.jsonl" env=trace_analysis steps=6
+/rlm run "Validate pure_rlm_environment.py and cite context, REPL, llm_query, and FINAL evidence" env=pure_rlm steps=6
 ```
 ## Documentation

{rlm_code-0.1.7 → rlm_code-0.1.9}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "rlm-code"
-version = "0.1.7"
+version = "0.1.9"
 description = "RLM Code: Research Playground & Evaluation OS for Recursive Language Model Agentic Systems"
 readme = "README.md"
 license = "Apache-2.0"

{rlm_code-0.1.7 → rlm_code-0.1.9}/rlm_code/__init__.py RENAMED

@@ -5,5 +5,5 @@ This package provides tools for creating, managing, and optimizing DSPy componen
 through natural language interactions.
 """
-__version__ = "0.1.7"
+__version__ = "0.1.9"
 __author__ = "Super Agentic AI"

{rlm_code-0.1.7 → rlm_code-0.1.9}/rlm_code/mcp/__init__.py RENAMED

@@ -17,7 +17,7 @@ from .exceptions import (
 )
 from .session_wrapper import MCPSessionWrapper
-__version__ = "0.1.7"
+__version__ = "0.1.9"
 __all__ = [
     "MCPClientManager",

{rlm_code-0.1.7 → rlm_code-0.1.9}/rlm_code/rlm/environments.py RENAMED

@@ -306,8 +306,10 @@ class TraceAnalysisEnvironment(GenericRLMEnvironment):
             "Return ONLY valid JSON object with keys:\n"
             "{"
             '"action": "set_trace_path" | "get_dataset_overview" | "query_traces" | '
-            '"count_traces" | "view_trace" | "search_trace" | "view_spans" | "final", '
+            '"count_traces" | "view_trace" | "search_trace" | "view_spans" | '
+            '"export_evidence_corpus" | "final", '
             '"trace_path": "<path to JSONL traces>", '
+            '"output_dir": "<directory for exported evidence corpus>", '
             '"filters": {"has_errors": true, "model_names": ["..."], "service_names": ["..."], '
             '"agent_names": ["..."], "project_id": "..."}, '
             '"trace_id": "<trace id>", '
@@ -324,6 +326,7 @@ class TraceAnalysisEnvironment(GenericRLMEnvironment):
             "- Always begin analysis with get_dataset_overview.\n"
             "- Use query_traces to choose real trace ids; never invent trace ids.\n"
             "- For large traces, prefer search_trace followed by view_spans.\n"
+            "- Use export_evidence_corpus when the caller needs files for MetaHarness or another coding agent.\n"
             "- Identify systemic harness failures, not one-off anomalies.\n"
             "- Output JSON only."
         )
@@ -448,6 +451,21 @@ class TraceAnalysisEnvironment(GenericRLMEnvironment):
                     reward=0.7,
                     memory_note=f"Viewed selected spans for trace {trace_id}.",
                 )
+            if action_name == "export_evidence_corpus":
+                output_dir = self._required_str(action, "output_dir")
+                resolved_output = Path(output_dir).expanduser()
+                if not resolved_output.is_absolute():
+                    resolved_output = self.workdir / resolved_output
+                return self._ok(
+                    observation=store.export_evidence_corpus(
+                        resolved_output,
+                        filters,
+                        limit=self._int_arg(action, "limit", 100, minimum=1, maximum=1000),
+                        include_raw=self._bool_arg(action, "include_raw", True),
+                    ),
+                    reward=0.75,
+                    memory_note="Exported layered trace evidence corpus.",
+                )
         except Exception as exc:
             return EnvironmentActionResult(
                 observation={"success": False, "error": f"{type(exc).__name__}: {exc}"},
@@ -530,6 +548,19 @@ class TraceAnalysisEnvironment(GenericRLMEnvironment):
             parsed = default
         return max(minimum, min(maximum, parsed))
+    @staticmethod
+    def _bool_arg(action: dict[str, Any], key: str, default: bool) -> bool:
+        value = action.get(key, default)
+        if isinstance(value, bool):
+            return value
+        if isinstance(value, str):
+            normalized = value.strip().lower()
+            if normalized in {"1", "true", "yes", "on"}:
+                return True
+            if normalized in {"0", "false", "no", "off"}:
+                return False
+        return default
 class DSPyCodingRLMEnvironment(GenericRLMEnvironment):
     """DSPy-focused environment with file edit + tests + DSPy-aware scoring."""
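The `export_evidence_corpus` branch above resolves a relative `output_dir` against the environment's `workdir` and reads `include_raw` through the new `_bool_arg` helper. The sketch below restates both rules as standalone functions so their behavior can be checked in isolation; `bool_arg`, `resolve_output_dir`, and the sample action values are illustrative, not part of the package API.

```python
from pathlib import Path
from typing import Any


def bool_arg(action: dict[str, Any], key: str, default: bool) -> bool:
    """Mirror of `_bool_arg`: accept real booleans or common string spellings."""
    value = action.get(key, default)
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        normalized = value.strip().lower()
        if normalized in {"1", "true", "yes", "on"}:
            return True
        if normalized in {"0", "false", "no", "off"}:
            return False
    # Numbers, None, and unrecognized strings all fall back to the default.
    return default


def resolve_output_dir(output_dir: str, workdir: Path) -> Path:
    """Expand `~`, then anchor relative paths at the environment's workdir."""
    resolved = Path(output_dir).expanduser()
    if not resolved.is_absolute():
        resolved = workdir / resolved
    return resolved


# Hypothetical planner action using keys read by the new branch.
action = {
    "action": "export_evidence_corpus",
    "output_dir": "evidence/run-42",
    "limit": 50,
    "include_raw": "no",
}
print(bool_arg(action, "include_raw", True))  # False
print(resolve_output_dir(action["output_dir"], Path("/work")))  # /work/evidence/run-42 on POSIX
```

One consequence of the `isinstance` checks: a planner that emits the integer `1` or `0` gets the default rather than the matching boolean, because only real booleans and the listed strings are recognized.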

{rlm_code-0.1.7 → rlm_code-0.1.9}/rlm_code/rlm/runner.py RENAMED

@@ -9,6 +9,7 @@ from __future__ import annotations
 import hashlib
 import json
+import re
 import threading
 import time
 from dataclasses import asdict, dataclass, is_dataclass
@@ -29,7 +30,7 @@ from .benchmark_manager import (
 )
 from .benchmarks import RLMBenchmarkCase, load_benchmark_packs
 from .chat_session import ChatSessionMixin
-from .context_store import LazyFileContext
+from .context_store import ContextRef, LazyFileContext
 from .delegation import DelegationMixin
 from .environments import (
     DSPyCodingRLMEnvironment,
@@ -467,6 +468,93 @@ class RLMRunner(BenchmarkManagerMixin, ChatSessionMixin, DelegationMixin, Action
             allow_unsafe_exec=(selected_backend == "exec" and self._pure_rlm_allow_unsafe_exec),
         )
+    def _extract_task_file_refs(self, task: str, limit: int = 12) -> list[ContextRef]:
+        """Find explicit workspace file references mentioned in a task string."""
+        candidates = re.findall(
+            r"(?<![\w.-])(?:[\w.-]+/)*[\w.-]+\.(?:py|md|toml|yaml|yml|json|txt|js|jsx|ts|tsx)",
+            task,
+        )
+        seen: set[str] = set()
+        refs: list[ContextRef] = []
+        for candidate in candidates:
+            normalized = candidate.strip().strip("`'\".,:;)")
+            if not normalized or normalized in seen:
+                continue
+            seen.add(normalized)
+            refs.append(ContextRef(path=normalized))
+            if len(refs) >= limit:
+                break
+        return refs
+    def _build_pure_rlm_initial_context(self, task: str) -> dict[str, str]:
+        """
+        Build a small real-code context for Pure RLM runs.
+        The direct PureRLMEnvironment API expects context to be initialized
+        explicitly.  Runner/TUI users expect `/rlm run ... env=pure_rlm` to
+        start with useful workspace data, so we seed `context` with explicit
+        files named in the task, falling back to a compact repository snapshot.
+        """
+        refs = self._extract_task_file_refs(task)
+        if not refs:
+            refs = self.context_store.discover(limit=12)
+        context: dict[str, str] = {}
+        for ref in refs:
+            snippet = self.context_store.read(ref, max_chars=12000)
+            if snippet:
+                context[ref.path] = snippet
+        if context:
+            return context
+        discovered = self.context_store.discover(limit=80)
+        tree = "\n".join(ref.path for ref in discovered)
+        return {
+            "_workspace": (
+                f"Workspace: {self.workdir}\n"
+                "No explicit file snippets were loaded. Available files:\n"
+                f"{tree}"
+            ).strip()
+        }
+    def _initialize_pure_rlm_run_context(
+        self,
+        env: RLMEnvironment,
+        task: str,
+        *,
+        run_id: str,
+        run_path: Path,
+    ) -> int:
+        """Initialize `context` for Pure RLM runs and persist a context event."""
+        if env.name != "pure_rlm" or not hasattr(env, "initialize_context"):
+            return 0
+        context = self._build_pure_rlm_initial_context(task)
+        env.initialize_context(
+            context,
+            description="Workspace files selected for this Pure RLM run",
+            additional_vars={"query": task},
+        )
+        context_event = {
+            "type": "context",
+            "run_id": run_id,
+            "environment": env.name,
+            "timestamp": self._utc_now(),
+            "context_files": list(context.keys()),
+            "context_chars": sum(len(value) for value in context.values()),
+        }
+        self._append_event(run_path, context_event)
+        self._emit_runtime_event(
+            "context_load",
+            {
+                "run_id": run_id,
+                "files": len(context),
+                "chars": context_event["context_chars"],
+            },
+        )
+        return len(context)
     def run_task(
         self,
         task: str,
@@ -596,6 +684,12 @@ class RLMRunner(BenchmarkManagerMixin, ChatSessionMixin, DelegationMixin, Action
         final_response = ""
         cancelled = False
         trajectory: list[dict[str, Any]] = []
+        context_files = self._initialize_pure_rlm_run_context(
+            env,
+            cleaned_task,
+            run_id=run_id,
+            run_path=run_path,
+        )
         usage_start = self._usage_snapshot()
         self.observability.on_run_start(
             run_id,
@@ -616,6 +710,7 @@ class RLMRunner(BenchmarkManagerMixin, ChatSessionMixin, DelegationMixin, Action
                 "parent_run_id": _parent_run_id,
                 "pure_rlm_backend": self._pure_rlm_backend if env.name == "pure_rlm" else None,
                 "pure_rlm_strict": strict_pure_mode if env.name == "pure_rlm" else None,
+                "context_files": context_files if env.name == "pure_rlm" else None,
             },
         )
         self._emit_runtime_event(
@@ -627,6 +722,7 @@ class RLMRunner(BenchmarkManagerMixin, ChatSessionMixin, DelegationMixin, Action
                 "framework": native_framework,
                 "depth": _depth,
                 "parent_run_id": _parent_run_id,
+                "context_files": context_files if env.name == "pure_rlm" else None,
             },
         )
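The runner changes seed `context` in a fixed order: file paths named in the task, then a small repository discovery when the task names none, and finally a plain file listing under `_workspace` if no snippet could be read. A minimal sketch of that order, assuming plain path strings in place of `ContextRef` and two hypothetical callables, `read(path, max_chars)` and `discover(limit)`, standing in for the runner's `context_store`. The regular expression, strip characters, and limits are copied from the diff; the sample files are made up.

```python
import re
from typing import Callable

# Pattern copied from `_extract_task_file_refs` in the 0.1.9 runner diff.
FILE_REF = re.compile(
    r"(?<![\w.-])(?:[\w.-]+/)*[\w.-]+\.(?:py|md|toml|yaml|yml|json|txt|js|jsx|ts|tsx)"
)


def extract_task_file_refs(task: str, limit: int = 12) -> list[str]:
    """Unique file-like tokens from the task, in first-seen order."""
    seen: set[str] = set()
    refs: list[str] = []
    for candidate in FILE_REF.findall(task):
        normalized = candidate.strip().strip("`'\".,:;)")
        if not normalized or normalized in seen:
            continue
        seen.add(normalized)
        refs.append(normalized)
        if len(refs) >= limit:
            break
    return refs


def build_initial_context(
    task: str,
    read: Callable[[str, int], str],
    discover: Callable[[int], list[str]],
    workdir: str = ".",
) -> dict[str, str]:
    """Explicit refs first, then discovery, then a `_workspace` file listing."""
    refs = extract_task_file_refs(task) or discover(12)
    context: dict[str, str] = {}
    for path in refs:
        snippet = read(path, 12000)  # snippets are capped at 12,000 chars
        if snippet:
            context[path] = snippet
    if context:
        return context
    tree = "\n".join(discover(80))
    return {
        "_workspace": (
            f"Workspace: {workdir}\n"
            "No explicit file snippets were loaded. Available files:\n"
            f"{tree}"
        ).strip()
    }


# Hypothetical in-memory workspace used only for this demo.
SAMPLE_FILES = {"README.md": "# RLM Code\n", "rlm_code/rlm/runner.py": "import re\n"}


def fake_read(path: str, max_chars: int) -> str:
    return SAMPLE_FILES.get(path, "")[:max_chars]


def fake_discover(limit: int) -> list[str]:
    return sorted(SAMPLE_FILES)[:limit]


print(extract_task_file_refs(
    "Validate pure_rlm_environment.py and cite context, REPL, llm_query, and FINAL evidence"
))  # ['pure_rlm_environment.py']
print(list(build_initial_context("Review `rlm_code/rlm/runner.py`.", fake_read, fake_discover)))
# ['rlm_code/rlm/runner.py']
```

Two behaviors follow directly from the copied logic. Because the extension alternation is tried left to right and the pattern has no trailing boundary, `ts` wins over `tsx` (and `js` over `jsx`), so a task mentioning `App.tsx` yields `App.ts`. And when explicit references are found but none can be read, the 12-file discovery is skipped and the result goes straight to the `_workspace` listing.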

{rlm_code-0.1.7 → rlm_code-0.1.9}/rlm_code/rlm/session_replay.py RENAMED

@@ -1035,14 +1035,30 @@ def _convert_legacy_step(data: dict[str, Any]) -> SessionEvent:
     step_type = data.get("type", "")
     if step_type == "step":
+        observation = data.get("observation", {})
+        observation_dict = observation if isinstance(observation, dict) else {}
+        action = data.get("action", {})
+        action_dict = action if isinstance(action, dict) else {}
+        success = observation_dict.get("success")
+        if success is None:
+            success = not bool(observation_dict.get("error") or observation_dict.get("stderr"))
+        usage = data.get("usage", {})
+        usage_dict = usage if isinstance(usage, dict) else {}
         return SessionEvent(
             event_type=SessionEventType.STEP_END,
             timestamp=data.get("timestamp", _utc_now()),
-            step=data.get("step", 0),
+            step=int(data.get("step", 0) or 0),
             data={
-                "action": data.get("action", {}),
-                "observation": data.get("observation", {}),
+                "step": int(data.get("step", 0) or 0),
+                "timestamp": data.get("timestamp", _utc_now()),
+                "action": action_dict,
+                "observation": observation_dict,
                 "reward": data.get("reward", 0.0),
+                "success": bool(success),
+                "tokens_used": int(
+                    usage_dict.get("prompt_tokens", 0) or 0
+                )
+                + int(usage_dict.get("completion_tokens", 0) or 0),
             },
             run_id=data.get("run_id", ""),
             depth=data.get("depth", 0),
@@ -1125,12 +1141,18 @@ def _build_snapshot_from_events(
         elif event.event_type == SessionEventType.STEP_END:
             # Build StepState from accumulated data
+            if "step" not in current_step_data:
+                current_step_data = {
+                    "step": int(event.data.get("step", event.step) or 0),
+                    "timestamp": str(event.data.get("timestamp", event.timestamp) or ""),
+                }
             if "step" in current_step_data:
                 # Merge any additional data from STEP_END event
                 if "action" in event.data:
                     action = event.data["action"]
                     current_step_data.setdefault("action_type", action.get("action", ""))
                     current_step_data.setdefault("action_code", action.get("code", ""))
+                    current_step_data.setdefault("action_rationale", action.get("reasoning", ""))
                     current_step_data.setdefault("raw_action", action)
                 if "observation" in event.data:
                     obs = event.data["observation"]
@@ -1138,12 +1160,16 @@ def _build_snapshot_from_events(
                     current_step_data.setdefault("error", obs.get("error", obs.get("stderr", "")))
                     current_step_data.setdefault("raw_observation", obs)
                 if "reward" in event.data:
+                    reward = float(event.data.get("reward", 0.0) or 0.0)
+                    cumulative = event.data.get("cumulative_reward")
+                    if cumulative is None:
+                        cumulative = total_reward + reward
                     current_step_data.setdefault("reward", event.data["reward"])
-                    current_step_data.setdefault(
-                        "cumulative_reward", event.data.get("cumulative_reward", 0.0)
-                    )
+                    current_step_data.setdefault("cumulative_reward", cumulative)
                 if "success" in event.data:
                     current_step_data.setdefault("success", event.data["success"])
+                if "tokens_used" in event.data:
+                    current_step_data.setdefault("tokens_used", event.data["tokens_used"])
                 step_state = StepState(
                     step=current_step_data.get("step", 0),
@@ -1163,6 +1189,8 @@ def _build_snapshot_from_events(
                     raw_observation=current_step_data.get("raw_observation", {}),
                 )
                 steps.append(step_state)
+                total_reward = float(step_state.cumulative_reward)
+                total_tokens += int(step_state.tokens_used or 0)
                 current_step_data = {}
         elif event.event_type == SessionEventType.MEMORY_UPDATE:
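The session-replay changes let legacy runner `step` records carry success, token, and reward information into `StepState`. A condensed sketch of those rules, assuming the record shape read by `_convert_legacy_step` (`step`, `observation`, `usage`, `reward`). It folds the converter and the cumulative-reward fallback from `_build_snapshot_from_events` into one hypothetical function, `convert_legacy_step`, plus a small hypothetical `replay` loop that threads the running total the way the snapshot builder does.

```python
from typing import Any


def convert_legacy_step(data: dict[str, Any], total_reward: float) -> dict[str, Any]:
    """Condensed sketch of 0.1.9 legacy-step handling (not the package API)."""
    observation = data.get("observation", {})
    obs = observation if isinstance(observation, dict) else {}
    usage = data.get("usage", {})
    usage_dict = usage if isinstance(usage, dict) else {}

    success = obs.get("success")
    if success is None:
        # No explicit flag: any error or stderr text counts as failure.
        success = not bool(obs.get("error") or obs.get("stderr"))

    reward = float(data.get("reward", 0.0) or 0.0)
    cumulative = data.get("cumulative_reward")
    if cumulative is None:
        # Legacy records lack a running total, so derive it from prior steps.
        cumulative = total_reward + reward

    tokens = int(usage_dict.get("prompt_tokens", 0) or 0) + int(
        usage_dict.get("completion_tokens", 0) or 0
    )
    return {
        "step": int(data.get("step", 0) or 0),
        "success": bool(success),
        "reward": reward,
        "cumulative_reward": cumulative,
        "tokens_used": tokens,
    }


def replay(records: list[dict[str, Any]]) -> tuple[float, int]:
    """Walk legacy `step` records, carrying reward and token totals forward."""
    total_reward, total_tokens = 0.0, 0
    for data in records:
        if data.get("type") != "step":
            continue
        state = convert_legacy_step(data, total_reward)
        total_reward = float(state["cumulative_reward"])
        total_tokens += state["tokens_used"]
    return total_reward, total_tokens


# Made-up legacy records for illustration.
records = [
    {"type": "step", "step": 1, "reward": 0.5, "observation": {"stdout": "ok"},
     "usage": {"prompt_tokens": 100, "completion_tokens": 20}},
    {"type": "step", "step": 2, "reward": 0.25, "observation": {"stderr": "Traceback"},
     "usage": {"prompt_tokens": 50}},
]
print(replay(records))  # (0.75, 170)
```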

{rlm_code-0.1.7 → rlm_code-0.1.9}/rlm_code/rlm/visualizer.py RENAMED

@@ -62,6 +62,16 @@ def build_run_visualization(
             "success": observation_dict.get("success") if "success" in observation_dict else None,
             "path": str(observation_dict.get("path") or ""),
             "children_executed": int(observation_dict.get("children_executed") or 0),
+            "planner_preview": _clip_text(str(step.get("planner_raw") or ""), limit=260),
+            "code_preview": _clip_text(_action_code(step), limit=260),
+            "stdout_preview": _clip_text(str(observation_dict.get("stdout") or ""), limit=260),
+            "stderr_preview": _clip_text(str(observation_dict.get("stderr") or ""), limit=180),
+            "llm_calls_made": int(observation_dict.get("llm_calls_made") or 0),
+            "code_blocks_executed": int(observation_dict.get("code_blocks_executed") or 0),
+            "final_detected": bool(observation_dict.get("final_detected", False)),
+            "repl_variables": list(observation_dict.get("repl_variables") or [])[:20]
+            if isinstance(observation_dict.get("repl_variables"), list)
+            else [],
         }
         error = _extract_error(step)
         if error:
@@ -190,6 +200,19 @@ def _action_name(step: dict[str, Any]) -> str:
     return "unknown"
+def _action_code(step: dict[str, Any]) -> str:
+    action = step.get("action")
+    if not isinstance(action, dict):
+        return ""
+    code = action.get("code")
+    if isinstance(code, str) and code.strip():
+        return code
+    blocks = action.get("_code_blocks")
+    if isinstance(blocks, list):
+        return "\n\n".join(str(block) for block in blocks if str(block).strip())
+    return ""
 def _extract_error(step: dict[str, Any]) -> str:
     observation = step.get("observation")
     if not isinstance(observation, dict):
