PyPI - aru-code - Versions diffs - 0.60.0__tar.gz → 0.61.0__tar.gz - Mend

aru-code 0.60.0tar.gz → 0.61.0tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (218) hide show

{aru_code-0.60.0/aru_code.egg-info → aru_code-0.61.0}/PKG-INFO RENAMED Viewed

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: aru-code
-Version: 0.60.0
+Version: 0.61.0
 Summary: A Claude Code clone built with Agno agents
 Author-email: Estevao <estevaofon@gmail.com>
 License-Expression: MIT

aru_code-0.61.0/aru/__init__.py ADDED Viewed

	@@ -0,0 +1 @@
1	+ __version__ = "0.61.0"

{aru_code-0.60.0 → aru_code-0.61.0}/aru/agents/base.py RENAMED Viewed

@@ -3,6 +3,33 @@
 # Common rules shared across all agents (planner, executor, general).
 # Each agent appends its role-specific instructions to this base.
 BASE_INSTRUCTIONS = """\
+## Autonomy and Persistence
+Persist until the task is fully handled end-to-end within the current turn whenever feasible: \
+do not stop at analysis or partial fixes; carry changes through implementation, verification, \
+and a clear explanation of outcomes unless the user explicitly pauses or redirects you. \
+Assume the user wants you to make code changes or run tools to solve the problem — \
+it is bad to output your proposed solution in a message and stop; go ahead and actually \
+implement the change. If you encounter challenges or blockers, attempt to resolve them yourself.
+## Task execution
+You are a coding agent. Please keep going until the query is completely resolved, before \
+ending your turn and yielding back to the user. Only terminate your turn when you are sure \
+that the problem is solved. Autonomously resolve the query to the best of your ability, \
+using the tools available to you, before coming back to the user. Do NOT guess or make up \
+an answer.
+If a review, test run, plan step, or check surfaces concrete follow-up work that is clearly \
+in scope, resolve it in the same turn. "More work I identified" is NOT a blocker — it is the \
+next thing to do. The turn ends only when (a) the task is completely resolved and verified, \
+(b) you hit a real blocker that needs information only the user has, or (c) the plan / task \
+list is exhausted with every item terminal (completed / skipped / failed).
+End your turn by reporting what you DID, not by previewing what should happen next. Phrases \
+like "Próximo passo objetivo é…", "Next step is…", "I will now…" are forbidden as turn-end \
+content — if you write them you must execute them in the same turn.
 ## Output rules — CRITICAL for token efficiency
 Minimize output tokens. Your responses should be fewer than 4 lines unless the user \
@@ -144,7 +171,10 @@ You are a software engineer agent. Your job is to implement code changes.
 You MUST call `create_task_list` as your FIRST action before any other tool call. \
 Define 1-10 concrete subtasks for the current step. Then execute them in order, \
 calling `update_task` to mark each as "completed" or "failed" as you go. \
-When all subtasks are done, STOP. Do not add extra actions beyond the task list.
+When all subtasks finish, output a brief summary of what changed. The turn ends \
+only when the macro plan / multi-task workflow is also exhausted; if there are \
+more plan steps or skill-driven tasks pending, continue executing them in the \
+same turn — finishing a subtask list is not finishing the user's request.
 ## Subtask granularity — CRITICAL
 Each subtask should touch at most **3-4 files**. If the step involves many files, \
@@ -212,8 +242,10 @@ response. Read-only fan-out has no write-path hazards.
 When given a plan, execute it step by step. When given a direct task, figure out what needs to be done and do it.
 **ZERO narration between tool calls.** No "Now I have enough context...", \
 "Let me check...", "Now I understand...", "I need to...". Just call the next tool silently. \
-Only output text AFTER all subtasks are finished — a brief summary of what was done. \
-Text output is ONLY for the final result or when you hit a blocker that needs user input.
+Output text only when (a) the user's full request is resolved — including all macro plan \
+steps and skill-driven tasks — or (b) you hit a blocker that needs user input. Completing \
+a single subtask list or a single delegated task is NOT a turn boundary; continue with the \
+next pending item in the same turn.
 **Never retry failed shell commands with alternative syntax.** If a command fails, diagnose \
 the error — do not try `cmd /c`, absolute paths, or other wrappers hoping one works.
@@ -352,6 +384,23 @@ those tools — finish the plan and call exit_plan_mode instead.
 For simple tasks (1-2 file changes) where the user did NOT ask for a plan, \
 execute directly without entering plan mode.
+## Subtask lists vs the user's request — CRITICAL
+`create_task_list` / `update_task` track subtasks for ONE unit of work — \
+typically a single plan step, a single delegated task, or a single Task in a \
+multi-task skill workflow (e.g. /subagent-driven-development). Finishing a \
+subtask list is NOT finishing the user's request. When the `update_task` \
+tool_result says "All subtasks finished. Output a brief summary", that summary \
+is the summary of THAT unit only — not the whole turn.
+Before yielding, check: is there a pending plan step? A skill workflow that \
+declares more Tasks (Task 1..N)? A check that surfaced more work? If yes, \
+keep going in the same turn — call `create_task_list` again for the next \
+unit, or dispatch the next subagent, or call `update_plan_step` and move on. \
+Phrases like "Se quiser, continuo direto para a Task N", "Próximo passo \
+objetivo é…", "Next step is…" are forbidden as turn-end content. The turn \
+ends only when the user's full request is exhausted.
 ## Plan execution
 When you see a `<system-reminder>` listing PLAN ACTIVE steps, work through them in order:
@@ -384,8 +433,14 @@ Safe parallel-write pattern (only when ALL three hold):
 2. The tasks touch disjoint file sets.
 3. No task's output is another task's input inside the same batch.
-If any of the three fails, run tasks sequentially — one `delegate_task` per \
-response, or stay in-session and execute the step yourself. Parallel fan-out \
+If any of the three fails, run tasks sequentially — dispatch one \
+`delegate_task` per assistant response (so the next one only starts after the \
+previous returns), but keep doing this within the same turn until the multi-task \
+plan/skill workflow is exhausted. "Sequential" means "not in parallel"; it does \
+NOT mean "one task per turn" — finishing a single delegated task and then \
+yielding to the user defeats skills like /subagent-driven-development that \
+dispatch a fresh implementer per task. After each subagent returns, immediately \
+dispatch the next pending task in the same turn. Parallel fan-out \
 for read-only research (explorer) follows the Delegation strategy rules above; \
 it does not carry these write-path hazards.\
 """

{aru_code-0.60.0 → aru_code-0.61.0}/aru/agents/planner.py RENAMED Viewed

@@ -42,9 +42,10 @@ async def review_plan(request: str, plan: str) -> str:
     )
     prompt = f"## User Request\n{request}\n\n## Generated Plan\n{plan}"
     try:
-        response = await reviewer.arun(prompt)
-        if response and response.content and response.content.strip():
-            return response.content.strip()
+        from aru.runner import arun_text_only
+        content = await arun_text_only(reviewer, prompt)
+        if content and content.strip():
+            return content.strip()
     except Exception:
         pass
     return plan

{aru_code-0.60.0 → aru_code-0.61.0}/aru/cli.py RENAMED Viewed

@@ -198,16 +198,8 @@ async def run_oneshot(prompt: str, print_only: bool = False, skip_permissions: b
     ctx.model_id = session.model_id
     small_ref = config.model_aliases.get("small") if config else None
     if not small_ref:
-        from aru.providers import resolve_model_ref
-        provider_key, _ = resolve_model_ref(session.model_ref)
-        _small_defaults = {
-            "anthropic": "anthropic/claude-haiku-4-5",
-            "openai": "openai/gpt-4o-mini",
-            "groq": "groq/llama-3.1-8b-instant",
-            "deepseek": "deepseek/deepseek-chat",
-            "ollama": "ollama/llama3.1",
-        }
-        small_ref = _small_defaults.get(provider_key, session.model_ref)
+        from aru.providers import default_small_model_ref
+        small_ref = default_small_model_ref(session.model_ref)
     ctx.small_model_ref = small_ref
     extra_instructions = config.get_extra_instructions()
@@ -225,10 +217,11 @@ async def run_oneshot(prompt: str, print_only: bool = False, skip_permissions: b
             instructions=build_instructions("general", extra_instructions),
             markdown=True,
         )
-        response = await agent.arun(prompt)
-        if response and response.content:
+        from aru.runner import arun_text_only
+        content = await arun_text_only(agent, prompt)
+        if content:
             # Print raw text to stdout for piping
-            print(response.content)
+            print(content)
     else:
         # Full mode with tools
         from aru.runner import build_env_context

{aru_code-0.60.0 → aru_code-0.61.0}/aru/context.py RENAMED Viewed

@@ -974,8 +974,8 @@ async def compact_conversation(
             markdown=True,
         )
-        result = await compactor.arun(prompt, stream=False)
-        summary = result.content if result and result.content else ""
+        from aru.runner import arun_text_only
+        summary = await arun_text_only(compactor, prompt)
         if not summary:
             # Fallback: simple mechanical summary

{aru_code-0.60.0 → aru_code-0.61.0}/aru/history_blocks.py RENAMED Viewed

@@ -206,9 +206,49 @@ def to_agno_messages(history: list[HistoryItem]) -> list:
     This function is the single translation layer between Aru's storage
     format and the runtime format Agno's Claude adapter expects (see
     `.venv/Lib/site-packages/agno/utils/models/claude.py:334-358`).
+    Defensive orphan filtering: both directions of the tool_use / tool_result
+    pair are checked, because each one breaks the API in its own way.
+    * ``tool_result`` whose ``tool_use_id`` has no matching ``tool_use``
+      anywhere in the history is dropped. Anthropic rejects with
+      ``404 tool_use_id not found``; the OpenAI Responses backend (Codex)
+      rejects with ``400 No tool call found for function call output``.
+    * ``tool_use`` whose ``id`` has no matching ``tool_result`` anywhere
+      after it is dropped (from the assistant message's ``tool_calls``
+      list). Anthropic accepts trailing unmatched tool_use only if it's
+      the very last turn (because the *next* assistant turn is expected to
+      include the tool_result); but for any older assistant turn an
+      unmatched tool_use leaves the conversation in an "awaiting tool
+      output" state and the Responses API rejects with ``400 No tool
+      output found for function call``. This typically happens when a
+      tool wrapper raised before producing a result (timeout, schema
+      error, Ctrl+C mid-batch) or when a delegated subagent crashed and
+      its tool_result was never recorded.
+    Filtering here keeps the API contract intact regardless of how the
+    history got unbalanced upstream (compaction, prune, crash recovery).
     """
     from agno.models.message import Message  # local import to avoid cycles
+    declared_tool_use_ids: set[str] = set()
+    answered_tool_use_ids: set[str] = set()
+    for item in history:
+        role = item.get("role")
+        blocks = item.get("content") or []
+        if role == "assistant":
+            for block in blocks:
+                if is_tool_use(block):
+                    tid = block.get("id")
+                    if tid:
+                        declared_tool_use_ids.add(tid)
+        elif role in ("user", "tool"):
+            for block in blocks:
+                if is_tool_result(block):
+                    tid = block.get("tool_use_id")
+                    if tid:
+                        answered_tool_use_ids.add(tid)
     out: list[Message] = []
     for item in history:
         role = item.get("role", "user")
@@ -220,13 +260,18 @@ def to_agno_messages(history: list[HistoryItem]) -> list:
             text_parts = [b.get("text", "") for b in blocks if is_text(b)]
             tool_result_blocks = [b for b in blocks if is_tool_result(b)]
-            # Tool results must be emitted as separate `role="tool"` Messages
+            # Tool results must be emitted as separate `role="tool"` Messages.
+            # Skip orphans — see docstring; both Anthropic and Codex reject
+            # tool_results whose tool_use_id has no declaring tool_use.
             for tr in tool_result_blocks:
+                tid = tr.get("tool_use_id", "")
+                if tid and tid not in declared_tool_use_ids:
+                    continue
                 out.append(
                     Message(
                         role="tool",
                         content=str(tr.get("content", "")),
-                        tool_call_id=tr.get("tool_use_id", ""),
+                        tool_call_id=tid,
                         from_history=True,
                     )
                 )
@@ -245,9 +290,19 @@ def to_agno_messages(history: list[HistoryItem]) -> list:
             for b in blocks:
                 if not is_tool_use(b):
                     continue
+                tid = b.get("id", "")
+                # Drop tool_calls that never produced a tool_result. Without
+                # this, the next API call carries an unanswered function_call
+                # from a prior turn and the Responses backend errors out
+                # ("No tool output found for function call <id>"). The tool
+                # wrapper *should* always produce a result, but defensive
+                # filtering here recovers a stuck history even when the
+                # wrapper failed (timeout/crash/abort).
+                if tid and tid not in answered_tool_use_ids:
+                    continue
                 tool_calls.append(
                     {
-                        "id": b.get("id", ""),
+                        "id": tid,
                         "type": "function",
                         "function": {
                             "name": b.get("name", ""),
@@ -265,15 +320,21 @@ def to_agno_messages(history: list[HistoryItem]) -> list:
         elif role == "tool":
             # Explicit tool-role items (we don't produce these ourselves but
-            # support them for forward compat with loaded sessions).
+            # support them for forward compat with loaded sessions). Same
+            # orphan filter as the user-role branch — this is actually the
+            # branch that catches the loaded-session case where a prior
+            # compaction summarised the matching assistant turn away.
             for tr in blocks:
                 if not is_tool_result(tr):
                     continue
+                tid = tr.get("tool_use_id", "")
+                if tid and tid not in declared_tool_use_ids:
+                    continue
                 out.append(
                     Message(
                         role="tool",
                         content=str(tr.get("content", "")),
-                        tool_call_id=tr.get("tool_use_id", ""),
+                        tool_call_id=tid,
                         from_history=True,
                     )
                 )

{aru_code-0.60.0 → aru_code-0.61.0}/aru/memory/extractor.py RENAMED Viewed

@@ -135,8 +135,8 @@ async def _run_extractor_agent(prompt: str, model_ref: str) -> str:
         instructions="You curate durable memories. Output only the requested JSON.",
         markdown=False,
     )
-    result = await agent.arun(prompt, stream=False)
-    return (result.content or "") if result else ""
+    from aru.runner import arun_text_only
+    return await arun_text_only(agent, prompt)
 def _parse_json_array(content: str) -> list[dict]:

{aru_code-0.60.0 → aru_code-0.61.0}/aru/permissions.py RENAMED Viewed

@@ -444,16 +444,24 @@ def set_permission_mode(mode: str) -> str:
 def cycle_permission_mode() -> str:
-    """Advance to the next mode and return it."""
+    """Advance to the next mode and return it.
+    Delegates the actual mutation to ``set_permission_mode`` so the Ctrl+A
+    path (this function) and the ``/yolo`` slash command path (which calls
+    ``set_permission_mode`` directly) follow the exact same code path —
+    same mutation, same ``permission.mode.changed`` publish, same UI
+    refresh trigger. Historically these two were near-duplicate and the
+    Ctrl+A version skipped the bus publish; this caused subtle drift
+    where the StatusPane visually advanced but downstream subscribers
+    saw stale state.
+    """
     ctx = get_ctx()
     try:
         idx = _MODE_CYCLE.index(ctx.permission_mode)
     except ValueError:
         idx = 0
     next_mode = _MODE_CYCLE[(idx + 1) % len(_MODE_CYCLE)]
-    ctx.permission_mode = next_mode
-    ctx.skip_permissions = (next_mode == "yolo")
-    return next_mode
+    return set_permission_mode(next_mode)
 def consume_rejection_feedback() -> str:

{aru_code-0.60.0 → aru_code-0.61.0}/aru/providers.py RENAMED Viewed

@@ -465,6 +465,22 @@ def resolve_model_ref(model_ref: str) -> tuple[str, str]:
     return provider_key, model_name
+def default_small_model_ref(session_model_ref: str) -> str:
+    """Default model ref for sub-agents when no ``small`` alias is set.
+    Mirrors Codex's ``build_agent_shared_config`` (multi_agents_common.rs):
+    the spawned agent inherits the parent's effective model. Keeps the
+    sub-agent on the same provider (preserves credentials + cache lineage)
+    and avoids the cross-provider failure mode where a hard-coded "small
+    model" is rejected by the parent's backend — e.g. ``gpt-4o-mini`` on
+    a ChatGPT Plus/Pro OAuth credential whose Codex endpoint only accepts
+    ``gpt-5*`` ids. Users who want a cheaper sub-agent model can still set
+    ``model_aliases.small`` in aru.json; that override wins at every call
+    site before this helper runs.
+    """
+    return session_model_ref
 def _get_actual_model_id(provider: ProviderConfig, model_name: str) -> str:
     """Get the actual model ID to send to the API.

{aru_code-0.60.0 → aru_code-0.61.0}/aru/runner.py RENAMED Viewed

@@ -27,6 +27,33 @@ _MAX_TOKENS_RECOVERY_PROMPT = (
 )
+async def arun_text_only(agent, prompt: str) -> str:
+    """Run a tools-less helper agent and return its final text output.
+    Always streams, because the Codex Responses backend rejects non-streaming
+    calls with ``400 'Stream must be set to true'``. The official OpenAI
+    Responses API + every other provider also accept stream=True, so a single
+    code path covers both. Used by the compaction summarizer, memory
+    extractor, and plan reviewer — all of which previously called
+    ``agent.arun(prompt, stream=False)`` and broke for any user on a ChatGPT
+    Plus/Pro OAuth credential.
+    Falls back to the empty string when the model returns no content (caller
+    decides what that means — e.g. compaction has its own mechanical
+    fallback).
+    """
+    from agno.run.agent import RunOutput
+    final_output = None
+    async for event in agent.arun(prompt, stream=True, yield_run_output=True):
+        if isinstance(event, RunOutput):
+            final_output = event
+            break
+    if final_output and final_output.content:
+        return final_output.content
+    return ""
 def _prepare_recovery_input(
     *,
     agent,
@@ -185,6 +212,59 @@ def _build_plan_reminder(session) -> str | None:
     return "\n".join(lines)
+def _build_permission_mode_reminder() -> str | None:
+    """Surface the active permission mode to the model when it's non-default.
+    GPT-5 / Codex-trained models default to asking "should I commit?",
+    "want me to run X?" before mutating actions — that posture matches the
+    Codex CLI's default-approval gate, but it's the wrong posture inside
+    Aru's YOLO mode where every gate is already pre-approved. The harness
+    *can* see ``ctx.permission_mode``; the model can't unless we tell it.
+    Without this nudge the user has to re-type "go" / "do it" after every
+    proposal, which is exactly what YOLO is supposed to skip. Same shape
+    as ``_build_plan_reminder`` so the model treats it as authoritative.
+    Returns ``None`` for ``default`` mode — the model's built-in caution
+    is correct there and an extra reminder would just consume cache space
+    on every turn for no behavioural change.
+    """
+    try:
+        from aru.runtime import get_ctx
+        mode = get_ctx().permission_mode
+    except LookupError:
+        return None
+    if mode == "yolo":
+        # Persistence / autonomy posture lives in BASE_INSTRUCTIONS (always
+        # in the system prompt, so it covers default / acceptEdits / yolo
+        # alike) — mirrors Codex's design where the "Task execution" /
+        # "Autonomy and Persistence" sections live in the cached base
+        # instructions, not in a per-turn reminder. This reminder is
+        # scoped to the one thing that is mode-specific: permission
+        # gating. Do NOT re-state the persistence rules here; that would
+        # bloat the per-turn cache for no behavioural gain.
+        return (
+            "<system-reminder>\n"
+            "YOLO MODE ACTIVE — equivalent to Codex `approval-policy: "
+            "never`. Every tool call is pre-approved. Do NOT ask permission "
+            "before running tools (\"posso fazer o commit?\", \"want me to "
+            "run the tests?\", \"shall I edit X?\", \"Se quiser, faço…\"). "
+            "Just execute. The autonomy and task-execution rules from your "
+            "system prompt still apply.\n"
+            "</system-reminder>"
+        )
+    if mode == "acceptEdits":
+        return (
+            "<system-reminder>\n"
+            "AUTO-ACCEPT EDITS ACTIVE — file edits are pre-approved. Do NOT "
+            "ask before writing/editing files. Bash and other non-edit "
+            "actions still gate normally; for those you may pause if the "
+            "command is destructive or ambiguous. For routine edits, "
+            "execute without confirmation.\n"
+            "</system-reminder>"
+        )
+    return None
 def _consume_plan_rejection_feedback(session) -> str | None:
     """Read-and-clear plan rejection feedback stored on the session.
@@ -405,6 +485,10 @@ async def run_agent_capture(agent, message: str, session=None, lightweight: bool
             if reminder:
                 msg_parts.append(reminder)
+            mode_reminder = _build_permission_mode_reminder()
+            if mode_reminder:
+                msg_parts.append(mode_reminder)
             warning = session.check_budget_warning()
             if warning:
                 console.print(warning)

{aru_code-0.60.0 → aru_code-0.61.0}/aru/session.py RENAMED Viewed

@@ -13,7 +13,7 @@ from dataclasses import dataclass, field
 from datetime import datetime
 from typing import Literal
-from aru.providers import MODEL_ALIASES, get_model_display, resolve_model_ref
+from aru.providers import MODEL_ALIASES, get_model_display, get_provider, resolve_model_ref
 # Default model reference (provider/model format)
 DEFAULT_MODEL = "anthropic/claude-sonnet-4-5"
@@ -577,11 +577,25 @@ class Session:
         suffix convention plus any future provider that adopts the same
         naming. None of the major paid models contain "free" in their id,
         so false positives are negligible.
+        ChatGPT Plus/Pro via Codex OAuth is a flat-rate subscription — usage
+        is bounded by the plan's session quotas, not per-token charges — so
+        all four prices are zero whenever the active openai credential is an
+        OAuth token (``provider.codex_oauth``). The user disconnecting via
+        ``/connect logout`` clears the flag and the regular gpt-5 pricing
+        kicks back in for any subsequent turns.
         """
         ref = (self.model_ref or "").lower()
         mid = (self.model_id or "").lower()
         if "free" in ref or "free" in mid:
             return (0.0, 0.0, 0.0, 0.0)
+        try:
+            provider_key, _ = resolve_model_ref(self.model_ref or "")
+            provider = get_provider(provider_key)
+            if provider is not None and getattr(provider, "codex_oauth", False):
+                return (0.0, 0.0, 0.0, 0.0)
+        except Exception:
+            pass
         model_id = self.model_id
         # Try exact match, then prefix match, then fallback
         for prefix, pricing in MODEL_PRICING.items():

{aru_code-0.60.0 → aru_code-0.61.0}/aru/tools/delegate.py RENAMED Viewed

@@ -422,8 +422,17 @@ Do not create documentation files unless explicitly asked.
             })
             from aru.runtime import _schedule_publish as _sched_t
+            # Prepend the permission-mode reminder to the subagent's prompt so
+            # YOLO mode reaches the spawned agent too — delegate runs through
+            # ``agent_instance.arun`` directly, bypassing run_agent_capture's
+            # reminder injection. The persistence / task-execution posture
+            # is in BASE_INSTRUCTIONS (subagent's system prompt) so it
+            # propagates without needing a per-spawn reminder.
+            from aru.runner import _build_permission_mode_reminder
+            _mode_reminder = _build_permission_mode_reminder()
+            sub_task = f"{_mode_reminder}\n\n{task}" if _mode_reminder else task
             try:
-                async for event in agent_instance.arun(task, stream=True, stream_events=True, yield_run_output=True):
+                async for event in agent_instance.arun(sub_task, stream=True, stream_events=True, yield_run_output=True):
                     if is_aborted():
                         _trace.status = "cancelled"
                         _trace.ended_at = _time.monotonic()

{aru_code-0.60.0 → aru_code-0.61.0}/aru/tools/file_ops.py RENAMED Viewed

@@ -199,6 +199,39 @@ def write_files(file_list: list[dict]) -> str:
                    Example: [{"path": "src/main.py", "content": "print('hello')"}, {"path": "src/utils.py", "content": "..."}]
     """
     from aru.runtime import resolve_path as _resolve_path
+    # Defensive schema validation — return a string error instead of raising.
+    # An uncaught TypeError / AttributeError here would propagate through the
+    # async tool wrapper without producing a tool_result, leaving the next
+    # turn with a function_call lacking its function_call_output (Codex
+    # rejects with ``400 No tool output found for function call``).
+    if not isinstance(file_list, list):
+        return (
+            "Error: write_files expects ``file_list`` to be a JSON array of "
+            "objects with 'path' and 'content' keys. Got "
+            f"{type(file_list).__name__!r}."
+        )
+    cleaned: list[dict] = []
+    schema_errors: list[str] = []
+    for i, e in enumerate(file_list):
+        if not isinstance(e, dict):
+            schema_errors.append(
+                f"entry {i}: expected object with 'path' and 'content', got {type(e).__name__}"
+            )
+            continue
+        if "path" not in e or "content" not in e:
+            schema_errors.append(
+                f"entry {i}: missing required key(s) — needs both 'path' and 'content'"
+            )
+            continue
+        cleaned.append(e)
+    if not cleaned:
+        return (
+            "Error: write_files received no valid entries. "
+            + "; ".join(schema_errors)
+            if schema_errors
+            else "Error: write_files received an empty list."
+        )
+    file_list = cleaned
     parts = [Text(f"Write {len(file_list)} files:", style="bold"), Text()]
     for e in file_list:
         p = _resolve_path(e.get("path", "<missing>"))
@@ -311,6 +344,19 @@ def edit_files(edits: list[dict]) -> str:
                Example: [{"path": "src/main.py", "old_string": "foo", "new_string": "bar"}]
     """
     from aru.runtime import resolve_path as _resolve_path
+    # Defensive schema validation — same rationale as write_files: a TypeError
+    # raised here would propagate through the async wrapper without producing
+    # a tool_result, leaving the assistant message with an unanswered tool_use
+    # that the Responses backend rejects on the next turn.
+    if not isinstance(edits, list):
+        return (
+            "Error: edit_files expects ``edits`` to be a JSON array of "
+            "objects with 'path', 'old_string', 'new_string'. Got "
+            f"{type(edits).__name__!r}."
+        )
+    edits = [e for e in edits if isinstance(e, dict)]
+    if not edits:
+        return "Error: edit_files received no valid edit entries."
     original: dict[str, str] = {}
     preview: dict[str, str] = {}
     preview_errors: list[str] = []

{aru_code-0.60.0 → aru_code-0.61.0}/aru/tools/registry.py RENAMED Viewed

@@ -18,7 +18,6 @@ from aru.tools.file_ops import (
     _list_directory_tool,
     _read_file_tool,
     _write_file_tool,
-    _write_files_tool,
     read_files,
 )
 from aru.tools.plan_mode import enter_plan_mode, exit_plan_mode
@@ -69,7 +68,18 @@ _READ_ONLY_TOOLS = [
 _WRITE_TOOLS = [
     _write_file_tool,
-    _write_files_tool,
+    # ``_write_files_tool`` (batch write) intentionally NOT exposed: the
+    # nested ``[{"path", "content"}]`` schema is consistently mis-called by
+    # every model family we tested (GPT-5 included) — it passes a plain dict,
+    # a list of strings, or forgets one of the required keys. The wrapper
+    # used to raise on the malformed input, which left the assistant message
+    # with an unanswered tool_call and broke the next turn on the Codex
+    # backend (``400 No tool output found for function call``). Adding
+    # schema-validation didn't fix the underlying ergonomics — models still
+    # waste calls fighting the schema. Single ``write_file`` works
+    # reliably; batches can be expressed as N sequential calls. The
+    # function and its async wrapper are kept in ``file_ops.py`` for any
+    # custom tool or plugin that imports them directly.
     _edit_file_tool,
     _edit_files_tool,
     _apply_patch_tool,

{aru_code-0.60.0 → aru_code-0.61.0}/aru/tools/tasklist.py RENAMED Viewed

@@ -237,10 +237,25 @@ def update_task(index: int, status: str) -> str:
     failed_count = sum(1 for t in all_tasks if t["status"] == "failed")
     total = len(all_tasks)
+    # Tool result kept minimal — Codex's update_plan returns the constant
+    # ``"Plan updated"`` (plan.rs:22) and relies on the base-instructions
+    # "Task execution" section (always in the system prompt) to keep GPT-5
+    # going across subtasks. We follow the same shape so the persistence
+    # signal lives in one place (BASE_INSTRUCTIONS) instead of being
+    # duplicated in every tool result. The "all finished" branch is
+    # deliberately worded as "this subtask list is done" (not "the work is
+    # done") so the model doesn't read it as a turn-end signal when there
+    # are still pending plan steps or skill-driven macro Tasks.
     if completed_count + failed_count == total:
-        return f"All subtasks finished ({completed_count} completed, {failed_count} failed). Step done. Output a brief summary of what was created/changed."
+        return (
+            f"This subtask list is done ({completed_count} completed, "
+            f"{failed_count} failed). If more plan steps or skill-driven "
+            "Tasks remain in your request, continue with the next one in "
+            "the same turn (call create_task_list again, dispatch the next "
+            "subagent, or call update_plan_step). Only yield to the user "
+            "when the full request is exhausted."
+        )
-    # Find next pending subtask
     next_task = next((t for t in all_tasks if t["status"] == "pending"), None)
     if next_task:
         return f"Subtask {index} → {status}. Next: subtask {next_task['index']} — {next_task['description']}"

aru-code 0.60.0__tar.gz → 0.61.0__tar.gz

aru-code 0.60.0tar.gz → 0.61.0tar.gz