PyPI - rlm-code - Versions diffs - 0.1.6__tar.gz → 0.1.8__tar.gz - Mend

rlm-code 0.1.6__tar.gz → 0.1.8__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (305)

{rlm_code-0.1.6 → rlm_code-0.1.8}/CHANGELOG.md RENAMED

@@ -5,6 +5,26 @@ All notable changes to this project are documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [0.1.8] - 2026-05-01
+### Added
+- AHE-style layered trace evidence corpus export from `TraceStore`.
+- New `trace_analysis` action `export_evidence_corpus` for writing `overview.md`, per-trace detail reports, `index.json`, and optional processed raw JSONL spans.
+- Evidence corpus tests covering direct store export and environment action export.
+## [0.1.7] - 2026-04-30
+### Added
+- HALO-style `trace_analysis` RLM environment for diagnosing agent harness failures from one-span-per-line JSONL traces.
+- Trace sidecar indexing with dataset rollups for trace counts, span counts, error traces, services, models, agents, token totals, and sample trace ids.
+- Bounded trace inspection actions: `get_dataset_overview`, `query_traces`, `count_traces`, `view_trace`, `search_trace`, and `view_spans`.
+- Large-trace safeguards: per-attribute truncation, oversized trace summaries, and higher-cap selected-span reads.
+- Tests for trace indexing, querying, searching, selected-span viewing, and trace environment actions.
+- Trace analysis documentation under the Core Engine docs.
+### Changed
+- `/rlm` command help now advertises `env=trace_analysis` for run, chat, and doctor workflows.
 ## [0.1.6] - 2026-02-20
 ### Added
@@ -56,3 +76,5 @@ Initial public release of **RLM Code**.
 [0.1.5]: https://github.com/SuperagenticAI/rlm-code/releases/tag/v0.1.5
 [0.1.6]: https://github.com/SuperagenticAI/rlm-code/releases/tag/v0.1.6
+[0.1.8]: https://github.com/SuperagenticAI/rlm-code/releases/tag/v0.1.8
+[0.1.7]: https://github.com/SuperagenticAI/rlm-code/releases/tag/v0.1.7
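
Note on the 0.1.7 entry: the "dataset rollups" it lists can be pictured with a small sketch over one-span-per-line JSONL. This is not the package's `TraceIndexBuilder`; the span field names `trace_id` and `status` are assumptions made for illustration. Only the result keys (`total_traces`, `total_spans`, `error_trace_count`, `sample_trace_ids`) are taken from the overview fields read in `planner_prompt` further down in this diff.

```python
import json
from collections import defaultdict


def rollup(lines):
    """Aggregate one-span-per-line JSONL into dataset-level counts."""
    spans_per_trace = defaultdict(int)
    error_traces = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between spans
        span = json.loads(line)
        trace_id = span["trace_id"]  # assumed field name
        spans_per_trace[trace_id] += 1
        if span.get("status") == "error":  # assumed error marker
            error_traces.add(trace_id)
    return {
        "total_traces": len(spans_per_trace),
        "total_spans": sum(spans_per_trace.values()),
        "error_trace_count": len(error_traces),
        "sample_trace_ids": sorted(spans_per_trace)[:5],
    }


sample = [
    '{"trace_id": "t1", "span_id": "a", "status": "ok"}',
    '{"trace_id": "t1", "span_id": "b", "status": "error"}',
    '{"trace_id": "t2", "span_id": "c", "status": "ok"}',
]
print(rollup(sample))
```

Keeping the rollup keyed by trace id means a trace with many failing spans is still counted once as an error trace.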

{rlm_code-0.1.6 → rlm_code-0.1.8}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: rlm-code
-Version: 0.1.6
+Version: 0.1.8
 Summary: RLM Code: Research Playground & Evaluation OS for Recursive Language Model Agentic Systems
 Project-URL: Homepage, https://github.com/SuperagenticAI/rlm-code
 Project-URL: Documentation, https://superagenticai.github.io/rlm-code/
@@ -118,20 +118,21 @@ RLM Code implements the [Recursive Language Models](https://arxiv.org/abs/2502.0
 RLM Code wraps this algorithm in an interactive terminal UI with built-in benchmarks, trajectory replay, and observability.
-## Release v0.1.6
+## Release v0.1.8
-This release adds the new CodeMode path as an opt-in harness strategy.
+This release extends HALO/AHE-style trace analysis with layered evidence export.
-- New harness strategy: `strategy=codemode` (default remains `strategy=tool_call`)
-- MCP bridge flow for CodeMode: `search_tools` -> typed tool surface -> `call_tool_chain`
-- Guardrails before execution: blocked API classes plus timeout/size/tool-call caps
-- Benchmark telemetry for side-by-side comparison: `tool_call` vs `codemode`
-- Dedicated docs section for CodeMode: quickstart, architecture, guardrails, evaluation
+- New `trace_analysis` environment for diagnosing agent harness failures from OTel-shaped JSONL traces
+- Sidecar trace indexing with dataset overview, query, count, search, full-trace view, and selected-span view actions
+- AHE-style evidence corpus export with `overview.md`, per-trace detail reports, `index.json`, and optional processed raw JSONL spans
+- Bounded payload handling for large traces, including oversized summaries and higher-cap surgical span reads
+- `/rlm` help/docs updated for `env=trace_analysis`
+- Dedicated trace analysis docs under the Core Engine section
 Example:
 ```text
-/harness run "implement feature and add tests" steps=8 mcp=on strategy=codemode mcp_server=codemode
+/rlm run "Find systemic harness failures trace=./traces.jsonl" env=trace_analysis steps=6
 ```
 ## Documentation
@@ -287,6 +288,62 @@ Notes:
 - In Local/BYOK connection modes, likely coding prompts in chat can auto-route to harness.
 - In ACP mode, auto-routing is intentionally off; use `/harness run ...` explicitly.
+### 8. CodeMode with UTCP and Cloudflare MCP
+Use these server entries in your project `rlm_config.yaml`:
+```yaml
+mcp_servers:
+  utcp-codemode:
+    name: utcp-codemode
+    description: "Local CodeMode MCP bridge"
+    enabled: true
+    auto_connect: false
+    timeout_seconds: 30
+    retry_attempts: 3
+    transport:
+      type: stdio
+      command: npx
+      args:
+        - "@utcp/code-mode-mcp"
+  cloudflare-codemode:
+    name: cloudflare-codemode
+    description: "Cloudflare MCP via remote bridge"
+    enabled: true
+    auto_connect: false
+    timeout_seconds: 30
+    retry_attempts: 3
+    transport:
+      type: stdio
+      command: npx
+      args:
+        - "mcp-remote"
+        - "https://mcp.cloudflare.com/mcp"
+```
+UTCP path (native CodeMode in current release):
+```text
+/mcp-connect utcp-codemode
+/mcp-tools utcp-codemode
+/harness run "analyze this repo, find TODO/FIXME, and create report.json" steps=3 mcp=on strategy=codemode mcp_server=utcp-codemode
+```
+Cloudflare path (recommended strategy today):
+```text
+/mcp-connect cloudflare-codemode
+/mcp-tools cloudflare-codemode
+/harness run "list available tools and run one safe read-only action, then summarize in 3 bullets" steps=3 mcp=on strategy=tool_call mcp_server=cloudflare-codemode
+```
+Notes:
+- On first Cloudflare connect, `mcp-remote` may ask for interactive authentication.
+- In this release, `strategy=codemode` expects the `search_tools` + `call_tool_chain` bridge contract.
+- If a remote MCP server exposes a different tool contract, use `strategy=tool_call`.
 ## How the RLM Loop Works
 Traditional LLM usage: paste your document into the prompt, ask a question, hope the model doesn't lose details in the middle.

{rlm_code-0.1.6 → rlm_code-0.1.8}/README.md RENAMED

@@ -25,20 +25,21 @@ RLM Code implements the [Recursive Language Models](https://arxiv.org/abs/2502.0
 RLM Code wraps this algorithm in an interactive terminal UI with built-in benchmarks, trajectory replay, and observability.
-## Release v0.1.6
+## Release v0.1.8
-This release adds the new CodeMode path as an opt-in harness strategy.
+This release extends HALO/AHE-style trace analysis with layered evidence export.
-- New harness strategy: `strategy=codemode` (default remains `strategy=tool_call`)
-- MCP bridge flow for CodeMode: `search_tools` -> typed tool surface -> `call_tool_chain`
-- Guardrails before execution: blocked API classes plus timeout/size/tool-call caps
-- Benchmark telemetry for side-by-side comparison: `tool_call` vs `codemode`
-- Dedicated docs section for CodeMode: quickstart, architecture, guardrails, evaluation
+- New `trace_analysis` environment for diagnosing agent harness failures from OTel-shaped JSONL traces
+- Sidecar trace indexing with dataset overview, query, count, search, full-trace view, and selected-span view actions
+- AHE-style evidence corpus export with `overview.md`, per-trace detail reports, `index.json`, and optional processed raw JSONL spans
+- Bounded payload handling for large traces, including oversized summaries and higher-cap surgical span reads
+- `/rlm` help/docs updated for `env=trace_analysis`
+- Dedicated trace analysis docs under the Core Engine section
 Example:
 ```text
-/harness run "implement feature and add tests" steps=8 mcp=on strategy=codemode mcp_server=codemode
+/rlm run "Find systemic harness failures trace=./traces.jsonl" env=trace_analysis steps=6
 ```
 ## Documentation
@@ -194,6 +195,62 @@ Notes:
 - In Local/BYOK connection modes, likely coding prompts in chat can auto-route to harness.
 - In ACP mode, auto-routing is intentionally off; use `/harness run ...` explicitly.
+### 8. CodeMode with UTCP and Cloudflare MCP
+Use these server entries in your project `rlm_config.yaml`:
+```yaml
+mcp_servers:
+  utcp-codemode:
+    name: utcp-codemode
+    description: "Local CodeMode MCP bridge"
+    enabled: true
+    auto_connect: false
+    timeout_seconds: 30
+    retry_attempts: 3
+    transport:
+      type: stdio
+      command: npx
+      args:
+        - "@utcp/code-mode-mcp"
+  cloudflare-codemode:
+    name: cloudflare-codemode
+    description: "Cloudflare MCP via remote bridge"
+    enabled: true
+    auto_connect: false
+    timeout_seconds: 30
+    retry_attempts: 3
+    transport:
+      type: stdio
+      command: npx
+      args:
+        - "mcp-remote"
+        - "https://mcp.cloudflare.com/mcp"
+```
+UTCP path (native CodeMode in current release):
+```text
+/mcp-connect utcp-codemode
+/mcp-tools utcp-codemode
+/harness run "analyze this repo, find TODO/FIXME, and create report.json" steps=3 mcp=on strategy=codemode mcp_server=utcp-codemode
+```
+Cloudflare path (recommended strategy today):
+```text
+/mcp-connect cloudflare-codemode
+/mcp-tools cloudflare-codemode
+/harness run "list available tools and run one safe read-only action, then summarize in 3 bullets" steps=3 mcp=on strategy=tool_call mcp_server=cloudflare-codemode
+```
+Notes:
+- On first Cloudflare connect, `mcp-remote` may ask for interactive authentication.
+- In this release, `strategy=codemode` expects the `search_tools` + `call_tool_chain` bridge contract.
+- If a remote MCP server exposes a different tool contract, use `strategy=tool_call`.
 ## How the RLM Loop Works
 Traditional LLM usage: paste your document into the prompt, ask a question, hope the model doesn't lose details in the middle.

{rlm_code-0.1.6 → rlm_code-0.1.8}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "rlm-code"
-version = "0.1.6"
+version = "0.1.8"
 description = "RLM Code: Research Playground & Evaluation OS for Recursive Language Model Agentic Systems"
 readme = "README.md"
 license = "Apache-2.0"

{rlm_code-0.1.6 → rlm_code-0.1.8}/rlm_code/__init__.py RENAMED

@@ -5,5 +5,5 @@ This package provides tools for creating, managing, and optimizing DSPy componen
 through natural language interactions.
 """
-__version__ = "0.1.6"
+__version__ = "0.1.8"
 __author__ = "Super Agentic AI"

{rlm_code-0.1.6 → rlm_code-0.1.8}/rlm_code/commands/slash_commands.py RENAMED

@@ -1684,7 +1684,7 @@ class SlashCommandHandler:
         Manage RLM runs.
         Usage:
-            /rlm run <task> [steps=N] [timeout=N] [branch=N] [depth=N] [children=N] [parallel=N] [budget=N] [framework=<see /rlm frameworks>] [env=generic|dspy|pure_rlm] [sub=provider/model]
+            /rlm run <task> [steps=N] [timeout=N] [branch=N] [depth=N] [children=N] [parallel=N] [budget=N] [framework=<see /rlm frameworks>] [env=generic|dspy|pure_rlm|trace_analysis] [sub=provider/model]
             /rlm bench [list|preset=name] [mode=native|harness|direct-llm] [strategy=tool_call|codemode] [mcp=on|off] [mcp_server=name] [pack=path[,path2]] [limit=N] [steps=N] [timeout=N] [branch=N] [framework=<see /rlm frameworks>] [env=generic|dspy|pure_rlm] [sub=provider/model]
             /rlm bench compare [candidate=<id|path|latest>] [baseline=<id|path|previous>] [min_reward_delta=N] [min_completion_delta=N] [max_steps_increase=N]
             /rlm bench validate [candidate=<id|path|latest>] [baseline=<id|path|previous>] [min_reward_delta=N] [min_completion_delta=N] [max_steps_increase=N] [--json]
@@ -1696,8 +1696,8 @@ class SlashCommandHandler:
             /rlm status [run_id]
             /rlm abort [run_id|all]
             /rlm replay [run_id|latest]
-            /rlm doctor [env=generic|dspy|pure_rlm] [--json]
-            /rlm chat <message> [session=name] [env=generic|dspy|pure_rlm] [branch=N] [depth=N] [children=N] [parallel=N] [budget=N] [framework=<see /rlm frameworks>] [sub=provider/model]
+            /rlm doctor [env=generic|dspy|pure_rlm|trace_analysis] [--json]
+            /rlm chat <message> [session=name] [env=generic|dspy|pure_rlm|trace_analysis] [branch=N] [depth=N] [children=N] [parallel=N] [budget=N] [framework=<see /rlm frameworks>] [sub=provider/model]
             /rlm chat status [session=name]
             /rlm chat reset [session=name]
             /rlm observability
@@ -1708,14 +1708,14 @@ class SlashCommandHandler:
             console.print("[bold cyan]🧠 RLM Commands[/bold cyan]")
             console.print(
                 "  [yellow]/rlm run <task> [steps=N] [timeout=N] [branch=N] [depth=N] [children=N] "
-                f"[parallel=N] [budget=N] [framework={framework_opts}] [env=generic|dspy|pure_rlm] "
+                f"[parallel=N] [budget=N] [framework={framework_opts}] [env=generic|dspy|pure_rlm|trace_analysis] "
                 "[sub=provider/model][/yellow]"
             )
             console.print(
                 "  [yellow]/rlm bench [list|preset=name] [mode=native|harness|direct-llm] "
                 "[strategy=tool_call|codemode] [mcp=on|off] [mcp_server=name] "
                 "[pack=path[,path2]] [limit=N] [steps=N] "
-                f"[timeout=N] [branch=N] [framework={framework_opts}] [env=generic|dspy|pure_rlm] [sub=provider/model][/yellow]"
+                f"[timeout=N] [branch=N] [framework={framework_opts}] [env=generic|dspy|pure_rlm|trace_analysis] [sub=provider/model][/yellow]"
             )
             console.print(
                 "  [yellow]/rlm bench compare [candidate=<id|path|latest>] [baseline=<id|path|previous>] "
@@ -1741,9 +1741,9 @@ class SlashCommandHandler:
             console.print("  [yellow]/rlm status [run_id][/yellow]")
             console.print("  [yellow]/rlm abort [run_id|all][/yellow]")
             console.print("  [yellow]/rlm replay [run_id|latest][/yellow]")
-            console.print("  [yellow]/rlm doctor [env=generic|dspy|pure_rlm] [--json][/yellow]")
+            console.print("  [yellow]/rlm doctor [env=generic|dspy|pure_rlm|trace_analysis] [--json][/yellow]")
             console.print(
-                "  [yellow]/rlm chat <message> [session=name] [env=generic|dspy|pure_rlm] [branch=N] [depth=N] "
+                "  [yellow]/rlm chat <message> [session=name] [env=generic|dspy|pure_rlm|trace_analysis] [branch=N] [depth=N] "
                 f"[children=N] [parallel=N] [budget=N] [framework={framework_opts}] "
                 "[sub=provider/model][/yellow]"
             )
@@ -2135,7 +2135,7 @@ class SlashCommandHandler:
             task = " ".join(task_tokens).strip()
             if not task:
                 show_error_message(
-                    "Usage: /rlm run <task> [steps=N] [timeout=N] [env=generic|dspy|pure_rlm] "
+                    "Usage: /rlm run <task> [steps=N] [timeout=N] [env=generic|dspy|pure_rlm|trace_analysis] "
                     "[depth=N] [children=N] [parallel=N] [budget=N] "
                     f"[framework={framework_opts}] "
                     "[branch=N] [sub=provider/model]"

{rlm_code-0.1.6 → rlm_code-0.1.8}/rlm_code/mcp/__init__.py RENAMED

@@ -17,7 +17,7 @@ from .exceptions import (
 )
 from .session_wrapper import MCPSessionWrapper
-__version__ = "0.1.6"
+__version__ = "0.1.8"
 __all__ = [
     "MCPClientManager",

{rlm_code-0.1.6 → rlm_code-0.1.8}/rlm_code/rlm/action_planner.py RENAMED

@@ -15,6 +15,7 @@ from .environments import (
     DSPyCodingRLMEnvironment,
     GenericRLMEnvironment,
     RLMEnvironment,
+    TraceAnalysisEnvironment,
 )
 from .pure_rlm_environment import PureRLMConfig, PureRLMEnvironment
@@ -276,6 +277,8 @@ class ActionPlannerMixin:
             )
         if isinstance(env, DSPyCodingRLMEnvironment):
             return DSPyCodingRLMEnvironment(workdir=workdir, reward_profile=self.reward_profile)
+        if isinstance(env, TraceAnalysisEnvironment):
+            return TraceAnalysisEnvironment(workdir=workdir, reward_profile=self.reward_profile)
         if isinstance(env, GenericRLMEnvironment):
             return GenericRLMEnvironment(workdir=workdir, reward_profile=self.reward_profile)
         # Fallback to generic environment in preview if an unknown env type appears.
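
Note on this hunk: `TraceAnalysisEnvironment` subclasses `GenericRLMEnvironment` (see `environments.py` below), so the new `isinstance` check has to sit above the generic one, as it does here. A minimal sketch with stand-in classes shows why; `GenericEnv`, `TraceEnv`, and both `clone_*` functions are hypothetical names, not part of the package.

```python
class GenericEnv:
    """Stand-in for the base environment."""

    def __init__(self, workdir):
        self.workdir = workdir


class TraceEnv(GenericEnv):
    """Stand-in for a subclass of the base environment."""


def clone_subclass_first(env, workdir):
    # Most specific type is tested first, mirroring the order in the hunk.
    if isinstance(env, TraceEnv):
        return TraceEnv(workdir)
    if isinstance(env, GenericEnv):
        return GenericEnv(workdir)
    return GenericEnv(workdir)


def clone_base_first(env, workdir):
    # Base class tested first: every TraceEnv also matches this branch,
    # so the subclass branch below can never run.
    if isinstance(env, GenericEnv):
        return GenericEnv(workdir)
    if isinstance(env, TraceEnv):
        return TraceEnv(workdir)
    return GenericEnv(workdir)


original = TraceEnv("/tmp/a")
print(type(clone_subclass_first(original, "/tmp/b")).__name__)
print(type(clone_base_first(original, "/tmp/b")).__name__)
```

With the base-class check first, a preview clone of a trace environment would silently come back as a generic one.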

{rlm_code-0.1.6 → rlm_code-0.1.8}/rlm_code/rlm/environments.py RENAMED

@@ -286,6 +286,282 @@ class GenericRLMEnvironment:
         return "Execution failed without stderr."
+class TraceAnalysisEnvironment(GenericRLMEnvironment):
+    """HALO-style trace analysis environment over one-span-per-line JSONL traces."""
+    name = "trace_analysis"
+    def __init__(
+        self,
+        workdir: Path | None = None,
+        reward_profile: RLMRewardProfile | dict[str, Any] | None = None,
+    ):
+        super().__init__(workdir=workdir, reward_profile=reward_profile)
+        self._trace_path: Path | None = None
+        self._store: Any | None = None
+    def system_prompt(self) -> str:
+        return (
+            "You are an RLM planner specialized for analyzing agent execution traces.\n"
+            "Return ONLY valid JSON object with keys:\n"
+            "{"
+            '"action": "set_trace_path" | "get_dataset_overview" | "query_traces" | '
+            '"count_traces" | "view_trace" | "search_trace" | "view_spans" | '
+            '"export_evidence_corpus" | "final", '
+            '"trace_path": "<path to JSONL traces>", '
+            '"output_dir": "<directory for exported evidence corpus>", '
+            '"filters": {"has_errors": true, "model_names": ["..."], "service_names": ["..."], '
+            '"agent_names": ["..."], "project_id": "..."}, '
+            '"trace_id": "<trace id>", '
+            '"span_ids": ["<span id>"], '
+            '"pattern": "<literal substring>", '
+            '"limit": <integer>, '
+            '"offset": <integer>, '
+            '"rationale": "<brief reason>", '
+            '"done": true|false, '
+            '"final_response": "<required when action=final>"'
+            "}\n"
+            "Rules:\n"
+            "- Load a trace file first if one is not already active.\n"
+            "- Always begin analysis with get_dataset_overview.\n"
+            "- Use query_traces to choose real trace ids; never invent trace ids.\n"
+            "- For large traces, prefer search_trace followed by view_spans.\n"
+            "- Use export_evidence_corpus when the caller needs files for MetaHarness or another coding agent.\n"
+            "- Identify systemic harness failures, not one-off anomalies.\n"
+            "- Output JSON only."
+        )
+    def planner_prompt(
+        self, task: str, memory: list[str], trajectory: list[dict[str, Any]], step_index: int
+    ) -> str:
+        inferred = self._extract_trace_path(task)
+        if inferred is not None and inferred != self._trace_path:
+            try:
+                self._load_store(inferred)
+            except Exception:
+                # Surface the failure through the prompt; execute_action will return
+                # the structured error if the planner attempts to use the path.
+                self._trace_path = inferred
+                self._store = None
+        base = super().planner_prompt(task, memory, trajectory, step_index)
+        active = str(self._trace_path) if self._trace_path is not None else "(none)"
+        overview = ""
+        if self._store is not None:
+            try:
+                data = self._store.get_overview({})
+                overview = (
+                    f"\nActive trace overview: traces={data['total_traces']} "
+                    f"spans={data['total_spans']} errors={data['error_trace_count']} "
+                    f"sample_trace_ids={data['sample_trace_ids'][:5]}"
+                )
+            except Exception:
+                overview = ""
+        return (
+            f"{base}\n\n"
+            f"Trace analysis environment.\n"
+            f"Active trace path: {active}\n"
+            "If the task includes trace=<path> or trace_path=<path>, use that file.\n"
+            "Goal: produce a concise evidence report of repeated harness failure modes "
+            "with concrete trace ids/spans and suggested harness changes."
+            f"{overview}"
+        )
+    def execute_action(
+        self,
+        action: dict[str, Any],
+        execution_engine: Any,
+        exec_timeout: int,
+        llm_connector: Any | None = None,
+    ) -> EnvironmentActionResult:
+        action_name = str(action.get("action", "")).strip().lower()
+        if action_name == "final":
+            return super().execute_action(
+                action,
+                execution_engine,
+                exec_timeout,
+                llm_connector=llm_connector,
+            )
+        try:
+            if action_name == "set_trace_path":
+                store = self._store_from_action(action, required_path=True)
+                return EnvironmentActionResult(
+                    observation={
+                        "success": True,
+                        "trace_path": str(store.trace_path),
+                        "index_path": str(store.index_path),
+                        "overview": store.get_overview({}),
+                    },
+                    reward=0.55,
+                    memory_note=f"Loaded trace dataset: {store.trace_path}",
+                )
+            store = self._store_from_action(action, required_path=False)
+            filters = action.get("filters") if isinstance(action.get("filters"), dict) else {}
+            if action_name == "get_dataset_overview":
+                return self._ok(
+                    observation=store.get_overview(filters),
+                    reward=0.45,
+                    memory_note="Loaded trace dataset overview.",
+                )
+            if action_name == "query_traces":
+                return self._ok(
+                    observation=store.query_traces(
+                        filters,
+                        limit=self._int_arg(action, "limit", 50, minimum=1, maximum=200),
+                        offset=self._int_arg(action, "offset", 0, minimum=0, maximum=1_000_000),
+                    ),
+                    reward=0.5,
+                    memory_note="Queried trace summaries.",
+                )
+            if action_name == "count_traces":
+                return self._ok(
+                    observation=store.count_traces(filters),
+                    reward=0.35,
+                    memory_note="Counted traces matching filters.",
+                )
+            if action_name == "view_trace":
+                trace_id = self._required_str(action, "trace_id")
+                return self._ok(
+                    observation=store.view_trace(trace_id),
+                    reward=0.65,
+                    memory_note=f"Viewed trace {trace_id}.",
+                )
+            if action_name == "search_trace":
+                trace_id = self._required_str(action, "trace_id")
+                pattern = self._required_str(action, "pattern")
+                return self._ok(
+                    observation=store.search_trace(
+                        trace_id,
+                        pattern,
+                        limit=self._int_arg(action, "limit", 100, minimum=1, maximum=500),
+                    ),
+                    reward=0.65,
+                    memory_note=f"Searched trace {trace_id} for {pattern!r}.",
+                )
+            if action_name == "view_spans":
+                trace_id = self._required_str(action, "trace_id")
+                span_ids = action.get("span_ids")
+                if not isinstance(span_ids, list) or not span_ids:
+                    raise ValueError("view_spans requires non-empty span_ids list")
+                return self._ok(
+                    observation=store.view_spans(trace_id, [str(item) for item in span_ids]),
+                    reward=0.7,
+                    memory_note=f"Viewed selected spans for trace {trace_id}.",
+                )
+            if action_name == "export_evidence_corpus":
+                output_dir = self._required_str(action, "output_dir")
+                resolved_output = Path(output_dir).expanduser()
+                if not resolved_output.is_absolute():
+                    resolved_output = self.workdir / resolved_output
+                return self._ok(
+                    observation=store.export_evidence_corpus(
+                        resolved_output,
+                        filters,
+                        limit=self._int_arg(action, "limit", 100, minimum=1, maximum=1000),
+                        include_raw=self._bool_arg(action, "include_raw", True),
+                    ),
+                    reward=0.75,
+                    memory_note="Exported layered trace evidence corpus.",
+                )
+        except Exception as exc:
+            return EnvironmentActionResult(
+                observation={"success": False, "error": f"{type(exc).__name__}: {exc}"},
+                reward=-0.25,
+                memory_note=f"Trace analysis action failed: {type(exc).__name__}.",
+            )
+        return EnvironmentActionResult(
+            observation={"success": False, "error": f"Unsupported action '{action_name}'."},
+            reward=-0.2,
+            memory_note="Planner produced unsupported trace action.",
+        )
+    def doctor_checks(self) -> list[EnvironmentDoctorCheck]:
+        checks = super().doctor_checks()
+        checks.append(
+            EnvironmentDoctorCheck(
+                name="trace_analysis",
+                status="pass",
+                detail="Trace analysis environment is available.",
+            )
+        )
+        return checks
+    def _ok(self, *, observation: dict[str, Any], reward: float, memory_note: str) -> EnvironmentActionResult:
+        payload = {"success": True, **observation}
+        return EnvironmentActionResult(
+            observation=payload,
+            reward=reward,
+            memory_note=memory_note,
+        )
+    def _store_from_action(self, action: dict[str, Any], *, required_path: bool):
+        raw = action.get("trace_path") or action.get("path")
+        if isinstance(raw, str) and raw.strip():
+            return self._load_store(Path(raw.strip()).expanduser())
+        if self._store is not None:
+            return self._store
+        if required_path:
+            raise ValueError("trace_path is required")
+        raise ValueError("no trace dataset loaded; pass trace_path or use set_trace_path first")
+    def _load_store(self, trace_path: Path):
+        from ..traces import TraceStore
+        resolved = trace_path if trace_path.is_absolute() else (self.workdir / trace_path)
+        store = TraceStore.load(resolved)
+        self._trace_path = resolved.resolve()
+        self._store = store
+        return store
+    @staticmethod
+    def _extract_trace_path(task: str) -> Path | None:
+        match = re.search(r"(?:^|\s)(?:trace|trace_path)=([^\s]+)", task)
+        if not match:
+            return None
+        raw = match.group(1).strip().strip("\"'")
+        return Path(raw).expanduser() if raw else None
+    @staticmethod
+    def _required_str(action: dict[str, Any], key: str) -> str:
+        value = action.get(key)
+        if not isinstance(value, str) or not value.strip():
+            raise ValueError(f"{key} is required")
+        return value.strip()
+    @staticmethod
+    def _int_arg(
+        action: dict[str, Any],
+        key: str,
+        default: int,
+        *,
+        minimum: int,
+        maximum: int,
+    ) -> int:
+        value = action.get(key, default)
+        try:
+            parsed = int(value)
+        except Exception:
+            parsed = default
+        return max(minimum, min(maximum, parsed))
+    @staticmethod
+    def _bool_arg(action: dict[str, Any], key: str, default: bool) -> bool:
+        value = action.get(key, default)
+        if isinstance(value, bool):
+            return value
+        if isinstance(value, str):
+            normalized = value.strip().lower()
+            if normalized in {"1", "true", "yes", "on"}:
+                return True
+            if normalized in {"0", "false", "no", "off"}:
+                return False
+        return default
 class DSPyCodingRLMEnvironment(GenericRLMEnvironment):
     """DSPy-focused environment with file edit + tests + DSPy-aware scoring."""
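
Note on this hunk: two of the static helpers are easy to try in isolation. The sketch below copies the regular expression from `_extract_trace_path` and the clamping logic from `_int_arg` into free functions (the standalone names `extract_trace_path` and `int_arg` are ours) so their edge cases can be checked without installing the package.

```python
import re
from pathlib import Path


def extract_trace_path(task):
    """Pull a trace=<path> or trace_path=<path> token out of a task string."""
    match = re.search(r"(?:^|\s)(?:trace|trace_path)=([^\s]+)", task)
    if not match:
        return None
    raw = match.group(1).strip().strip("\"'")
    return Path(raw).expanduser() if raw else None


def int_arg(action, key, default, *, minimum, maximum):
    """Read an integer argument, fall back to the default, then clamp it."""
    value = action.get(key, default)
    try:
        parsed = int(value)
    except Exception:
        parsed = default
    return max(minimum, min(maximum, parsed))


# The README example task resolves to the relative path after trace=.
print(extract_trace_path("Find systemic harness failures trace=./traces.jsonl"))

# The capture group stops at the first whitespace, so a quoted path that
# contains a space is cut short before the quotes are stripped.
print(extract_trace_path('trace="my traces/a.jsonl"'))

# Out-of-range and unparseable limits are clamped or defaulted, never raised.
print(int_arg({"limit": 5000}, "limit", 50, minimum=1, maximum=200))
print(int_arg({"limit": "abc"}, "limit", 50, minimum=1, maximum=200))
```

One consequence of the clamping design is that a planner asking for an oversized `limit` gets a silently smaller page instead of an error observation.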

{rlm_code-0.1.6 → rlm_code-0.1.8}/rlm_code/rlm/runner.py RENAMED

@@ -38,6 +38,7 @@ from .environments import (
     GenericRLMEnvironment,
     RLMEnvironment,
     RLMRewardProfile,
+    TraceAnalysisEnvironment,
 )
 from .events import RLMEventBus
 from .frameworks import FrameworkAdapterRegistry, FrameworkEpisodeResult
@@ -279,6 +280,18 @@ class RLMRunner(BenchmarkManagerMixin, ChatSessionMixin, DelegationMixin, Action
                 workdir=self.workdir,
                 reward_profile=self.reward_profile,
             ),
+            "trace_analysis": TraceAnalysisEnvironment(
+                workdir=self.workdir,
+                reward_profile=self.reward_profile,
+            ),
+            "trace-analysis": TraceAnalysisEnvironment(
+                workdir=self.workdir,
+                reward_profile=self.reward_profile,
+            ),
+            "traces": TraceAnalysisEnvironment(
+                workdir=self.workdir,
+                reward_profile=self.reward_profile,
+            ),
             "framework": DSPyCodingRLMEnvironment(
                 workdir=self.workdir,
                 reward_profile=self.reward_profile,
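
Note on this hunk: each of the three keys (`trace_analysis`, `trace-analysis`, `traces`) constructs its own `TraceAnalysisEnvironment`, and the environment keeps its loaded store on the instance (`self._store`). The sketch below uses a hypothetical stand-in class to show what separate instances per alias imply. How the runner resolves or normalizes the `env=` value before lookup is not visible in this diff, so treat the lookup here as an assumption.

```python
class TraceEnv:
    """Stand-in that, like the real environment, holds per-instance state."""

    def __init__(self):
        self.store = None


# One instance per alias, mirroring the shape of the registry in the hunk.
environments = {
    "trace_analysis": TraceEnv(),
    "trace-analysis": TraceEnv(),
    "traces": TraceEnv(),
}

# Loading a dataset through one alias does not affect the other aliases.
environments["traces"].store = "loaded"
print(environments["trace_analysis"].store)

# A shared-instance variant, for comparison, would carry state across aliases.
shared = TraceEnv()
aliased = {name: shared for name in ("trace_analysis", "trace-analysis", "traces")}
aliased["traces"].store = "loaded"
print(aliased["trace_analysis"].store)
```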

rlm_code-0.1.8/rlm_code/traces/__init__.py ADDED

@@ -0,0 +1,6 @@
+"""Trace indexing and query helpers for HALO-style RLM analysis."""
+from .index import TraceIndexBuilder
+from .store import TraceStore
+__all__ = ["TraceIndexBuilder", "TraceStore"]
