RubyGems - harnex - Versions diffs - 0.7.6 → 0.7.8 - Mend

harnex 0.7.6 → 0.7.8


Files changed (17)

checksums.yaml +4 -4
data/CHANGELOG.md +34 -0
data/README.md +30 -14
data/guides/01_dispatch.md +13 -7
data/guides/03_buddy.md +8 -1
data/guides/04_monitoring.md +36 -38
data/lib/harnex/adapters/codex_appserver.rb +1 -1
data/lib/harnex/cli.rb +7 -0
data/lib/harnex/codex/app_server/client.rb +1 -2
data/lib/harnex/commands/status.rb +10 -2
data/lib/harnex/commands/wait.rb +36 -13
data/lib/harnex/commands/watch.rb +204 -0
data/lib/harnex/dispatch_history.rb +1 -0
data/lib/harnex/runtime/session.rb +140 -30
data/lib/harnex/terminal_status.rb +5 -0
data/lib/harnex/version.rb +2 -2
metadata +2 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: f29114f723ccfc61ade344c784cb6a11144c25e79392bf136f5b722473b47f0b
-  data.tar.gz: 810686487788509887bb8bfb5f9a4c45a8a1b8bca645530b1bf178f1435cee40
+  metadata.gz: 64a1ddf83ef070cc2b418c1a70ae3e1fc2c8b5fd671e8fa3e5a30536184728ca
+  data.tar.gz: 6a076faed04db3eddbaf4bf2fefbee95426cd8ee8008499e3474fbfa1b5c62ed
 SHA512:
-  metadata.gz: 90c68832ab9218716fd2e6181f006a12b996a23c28b120a769411618aee35e507c38df43bcf9293a98fb3902923e70eb1be9e08a90795e9ec970cb5b51ca2b54
-  data.tar.gz: 64ebab38e3cf716bfe69fd7f8d1b28c0d114ef72e81f8028cb87f0b83f6a75cb555e1036a7831a41b3dddd156c0b9f71e51fa9550271f5be529dcea1fd23c9e1
+  metadata.gz: 3508afcbddc0e9afaf17372ca98cb3146d5edc128fb54c6217f8b80be7f81baae0ab7f50feba4b513a9ba46165323f8fb95ddd243df5bd19921b198beb89bbc6
+  data.tar.gz: 25419186825ac14d2f6cd844497d6dc356e88785f23784f30492284b80c384781ab834011f394d27c93078821fc394fed3ac97b8bff818f7b60fc280be83a560
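
The entries above are hex digests of the gem's `metadata.gz` and `data.tar.gz` members, and all four changed between versions. A minimal sketch of checking such a digest with Ruby's standard `digest` library follows. `digest_matches?` is a hypothetical helper (not part of RubyGems or harnex), and the empty-string input is only a self-check, not package data.

```ruby
require "digest"

# Compare a payload's SHA256 hex digest with an expected hex string.
# Hypothetical helper for illustration only.
def digest_matches?(bytes, expected_hex)
  Digest::SHA256.hexdigest(bytes) == expected_hex.downcase
end

# Self-check against the well-known SHA256 of the empty string.
empty_sha256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
puts digest_matches?("", empty_sha256)
```

To check a real archive member, pass the result of `File.binread` on the extracted file together with the matching `SHA256` value listed above.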

data/CHANGELOG.md CHANGED

@@ -2,6 +2,40 @@
 ## [Unreleased]
+## [0.7.8] - 2026-06-13 | 08:45 PM | IST
+### Added
+- `harnex watch --id <id> --until done` now provides a native work-terminal
+  watcher for existing visible/detached sessions. It exits `0` for successful
+  work, non-zero for `task_failed` / failed terminal telemetry, `124` for
+  wall-clock timeout, and can write optional done/fail marker files for legacy
+  queue integrations.
+### Changed
+- Monitoring docs now recommend native `harnex watch` for unattended
+  single-dispatch monitoring and reserve `harnex run --watch` for foreground
+  launch-and-stall-recovery.
+## [0.7.7] - 2026-06-12 | 10:48 AM | IST
+### Fixed
+- Codex app-server failed turns now emit `task_failed` instead of being
+  misreported as successful `task_complete` work. `harnex wait --until done`
+  returns non-zero for failed-turn events, dispatch history records
+  `terminal_event=task_failed`, and auto-stop terminates structured sessions
+  without sending a stale `turn/interrupt` after the turn is already complete.
+- Codex app-server nested error notifications now preserve the real Codex error
+  message (for example missing provider credentials) without counting them as
+  transport disconnects.
+### Changed
+- Refreshed the pinned Codex app-server JSON Schema fixtures to
+  `codex-cli 0.139.0` and taught the test schema validator `minLength`.
 ## [0.7.6] - 2026-06-09 | 12:59 AM | IST
 ### Added

data/README.md CHANGED

@@ -68,8 +68,9 @@ job, watch it work, stop it when done.
   `harnex events` streams structured JSONL lifecycle events.
 - **You don't want to babysit.** Use `--context --auto-stop` for
-  one-shot work, `--watch` for bounded stall recovery, or
-  `--wait-for-idle` as a send fence.
+  one-shot work, `harnex watch` for existing visible/detached dispatches,
+  `run --watch` for bounded foreground stall recovery, or `--wait-for-idle`
+  as a send fence.
 - **You want local-only orchestration.** Everything runs on your
   machine. No cloud services, no API keys beyond what the agents need.
@@ -150,18 +151,27 @@ harnex agents-guide monitoring
 ## Built-in dispatch monitoring
-For unattended dispatches, use `--watch` instead of writing a bash poll loop:
+For unattended visible/background dispatches, use `harnex watch` instead of
+writing a bash poll loop around `harnex wait`:
 ```bash
-harnex run pi --id pi-impl-42 --watch --preset impl \
+harnex run pi --id pi-impl-42 --tmux pi-impl-42 \
   --context "Implement koder/plans/42_plan.md. Run tests and commit when done."
+harnex watch --id pi-impl-42 --until done --max-wait 90m \
+  --done-marker /tmp/pi-impl-42-done.json \
+  --fail-marker /tmp/pi-impl-42-failed.json
 ```
-`--watch` runs a foreground babysitter that checks session activity every 60s,
-force-resumes on stall up to a cap, and exits when the target session exits or
-the resume cap is reached. It is foreground-only; use `--tmux` or `--detach`
-for visible/background sessions, and `--watch` when the current command should
-block as the monitor.
+`harnex watch --until done` is the safe work-terminal watcher for existing
+`--tmux` or detached sessions. It exits `0` for `task_complete`/done, non-zero
+for `task_failed` or failed terminal summaries, and `124` for `--max-wait`
+timeouts. It does not keep pane/status polling after a terminal failure signal.
+`harnex run --watch` is a separate foreground babysitter that checks session
+activity every 60s, force-resumes on stall up to a cap, and exits when the
+target session exits or the resume cap is reached. It is foreground-only; use
+`--tmux` or `--detach` for visible/background sessions, and `run --watch` when
+the current command should launch and monitor one worker.
 Presets map to stall policy defaults:
@@ -180,17 +190,22 @@ and stops the session after the first task completion or PTY prompt return.
 ## Completion and waiting
-Choose the wait predicate that matches how you launched the worker:
+Choose the wait/watch predicate that matches how you launched the worker:
-- `harnex wait --id ID --until done --timeout SECS` is the safest unattended
-  work fence. It returns when Harnex sees `task_complete` or a terminal exit,
-  whichever comes first.
+- `harnex watch --id ID --until done --max-wait DUR` is the safest unattended
+  monitor for an existing visible or detached dispatch. It wraps the work-level
+  fence, preserves the timeout/failure distinction, and can write done/fail
+  marker files for legacy queue integrations.
+- `harnex wait --id ID --until done --timeout SECS` is the primitive work fence.
+  It returns when Harnex sees `task_complete`, `task_failed`, or a terminal
+  exit, whichever comes first; failed work returns non-zero.
 - `harnex wait --id ID` waits for the wrapped process to exit. This is right
   for already-exited sessions and terminal-summary recovery, but interactive
   agents can stay open after finishing a turn.
 - For structured Pi RPC and Codex app-server sessions, use
   `harnex wait --id ID --until task_complete --timeout SECS` when you need the
-  exact turn-completion event instead of terminal-exit fallback.
+  exact successful-turn event instead of terminal-exit fallback. Use
+  `--until task_failed` to wait specifically for a failed structured turn.
 - `harnex send --wait-for-idle` is an atomic send fence for PTY-style
   interactions. It proves the turn returned to an idle/prompt state, not that
   your acceptance criteria passed.
@@ -312,6 +327,7 @@ See [recipes/03_buddy.md](recipes/03_buddy.md) for the full pattern.
 | `harnex send --id <id>` | Send a message (queues if busy, `--wait-for-idle` to block until the turn returns idle) |
 | `harnex stop --id <id>` | Send the agent's native exit sequence |
 | `harnex status` | List running sessions; with `--id ID --json`, terminal summaries can classify completed/failed sessions after exit |
+| `harnex watch --id <id>` | Safely monitor existing visible/detached work until `done`, `task_failed`, or timeout; optional done/fail markers |
 | `harnex pane --id <id>` | Capture a tmux-backed session's screen (`--follow` for live) |
 | `harnex logs --id <id>` | Read session transcript (`--follow` to tail) |
 | `harnex events --id <id>` | Stream structured session events (`--snapshot` for non-blocking dump) |
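
The README change documents a three-way exit contract for `harnex watch`: `0` for done, `124` for a `--max-wait` timeout, and any other non-zero code for failed work. A minimal sketch of how a calling script might branch on it follows. The method name and the action labels are hypothetical, chosen only for this illustration.

```ruby
# Map a `harnex watch` exit status to a next step for the caller.
# The returned symbols are hypothetical labels, not harnex output.
def next_step_for(exit_code)
  case exit_code
  when 0   then :verify_artifacts  # work finished; acceptance still needs checking
  when 124 then :inspect_session   # wall-clock cap hit; look at pane/logs/events
  else          :treat_as_failed   # task_failed or a failed terminal summary
  end
end

puts next_step_for(124)
```

In a real script the argument would typically be `$?.exitstatus` after `system("harnex", "watch", "--id", id, "--until", "done", "--max-wait", "90m")`.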

data/guides/01_dispatch.md CHANGED

@@ -73,7 +73,7 @@ for i in 1 2 3; do
   harnex run pi --id w-$i --tmux w-$i --detach \
     --context "Read and execute /tmp/task-$i.md" --auto-stop &
 done
-for i in 1 2 3; do harnex wait --id w-$i --until done & done
+for i in 1 2 3; do harnex watch --id w-$i --until done --max-wait 90m & done
 wait
 ```
@@ -125,18 +125,24 @@ Use the lightest primitive that gives the signal you need:
 | Continuous pane view | `harnex pane --id pi-i-NN --follow` |
 | Transcript tail | `harnex logs --id pi-i-NN --lines 80` |
 | Structured events | `harnex events --id pi-i-NN --snapshot` |
-| Work completion fence | `harnex wait --id pi-i-NN --until done` |
-| Native turn completion | `harnex wait --id pi-i-NN --until task_complete` |
+| Existing-session work monitor | `harnex watch --id pi-i-NN --until done --max-wait 90m` |
+| Primitive work completion/failure fence | `harnex wait --id pi-i-NN --until done` |
+| Native successful-turn completion | `harnex wait --id pi-i-NN --until task_complete` |
-For unattended policy-only stall recovery, use built-in watch mode:
+For visible `--tmux` or detached dispatches, prefer `harnex watch --id`: it
+returns `0` on done, non-zero on `task_failed`/failed terminal summaries, and
+`124` on `--max-wait` timeout. Use `--done-marker` / `--fail-marker` only as
+compatibility outputs for older queue scripts.
+For foreground launch-and-stall-recovery, use `harnex run --watch`:
 ```bash
 harnex run pi --id pi-i-NN --watch --preset impl --context "Read /tmp/task-impl-NN.md"
 ```
-`--watch` is foreground-blocking. Use it when a single process should launch
-and monitor the worker. Use pane/log/event polling or a buddy when you need
-interpretation, multiple sessions, or a separate watcher.
+`run --watch` is foreground-blocking. Use it when a single process should
+launch and monitor the worker. Use pane/log/event polling or a buddy when you
+need interpretation across multiple sessions.
 ## Verify And Stop
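
One caveat about the fan-out loop changed earlier in this file: a bare shell `wait` with no arguments returns zero no matter how the background jobs exited, so the per-worker watcher exit codes are not visible to the caller. A minimal sketch of collecting them per session id follows. `watch_all` and the injected `runner` are hypothetical. The runner stands in for a call such as `system("harnex", "watch", "--id", id, "--until", "done", "--max-wait", "90m")`, so the sketch runs without harnex installed.

```ruby
# Run one watcher per session id concurrently and collect exit codes by id.
# `runner` is any callable that takes an id and returns an integer exit code.
def watch_all(ids, runner:)
  threads = ids.map do |id|
    Thread.new { [id, runner.call(id)] }
  end
  threads.map(&:value).to_h
end

# Stand-in runner for demonstration only: pretend w-2 failed.
fake_runner = ->(id) { id == "w-2" ? 1 : 0 }
results = watch_all(%w[w-1 w-2 w-3], runner: fake_runner)
failed = results.reject { |_id, code| code.zero? }.keys
puts "failed: #{failed.inspect}"
```

Keeping the codes in a hash lets the orchestrator tell a `124` timeout apart from a failed worker instead of treating the whole batch as one result.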

data/guides/03_buddy.md CHANGED

@@ -4,7 +4,14 @@ A buddy is a second harnex session that watches one or more workers and nudges
 them if they stall. Use a buddy when the work is long-running, unattended, or
 needs interpretation that simple stall policy cannot provide.
-For simple inactivity recovery, prefer built-in watch mode:
+For simple work-terminal monitoring of an existing visible/detached session,
+prefer the native watcher:
+```bash
+harnex watch --id pi-i-NN --until done --max-wait 90m
+```
+For foreground launch-and-inactivity recovery, use built-in run watch mode:
 ```bash
 harnex run pi --id pi-i-NN --watch --preset impl --context "Read /tmp/task-impl-NN.md"

data/guides/04_monitoring.md CHANGED

@@ -17,11 +17,15 @@ Prefer signals in this order:
 | `harnex pane` | Live UI interpretation and prompt/error diagnosis |
 | `harnex status` | Session liveness and coarse state |
-For unattended monitors, prefer `harnex wait --until done`: it returns on the
-work-level `task_complete` signal or terminal exit, whichever comes first. For
-structured sessions (Pi RPC and Codex app-server), `harnex wait --until
-task_complete` remains the exact turn-level fence. Neither knows your acceptance
-criteria; verify the expected artifact or tests afterward.
+For unattended monitors on existing visible/detached sessions, prefer
+`harnex watch --until done`: it returns on the work-level `task_complete` or
+`task_failed` signal, or terminal exit, whichever comes first. Successful work
+exits `0`, failed work exits non-zero, and wall-clock caps exit `124`. For
+callers that need the lower-level primitive, `harnex wait --until done` exposes
+the same work fence. For structured sessions (Pi RPC and Codex app-server),
+`harnex wait --until task_complete` remains the exact successful-turn fence.
+None of these know your acceptance criteria; verify the expected artifact or
+tests afterward.
 ## Completion Test
@@ -29,14 +33,19 @@ For unattended work, first gate on harnex work completion, then verify the task
 artifact and repo health:
 ```bash
-harnex wait --id pi-i-NN --until done --timeout 5400 &&
+harnex watch --id pi-i-NN --until done --max-wait 90m \
+  --done-marker /tmp/pi-i-NN-done.json \
+  --fail-marker /tmp/pi-i-NN-failed.json &&
   test -f path/to/expected-artifact &&
   test -z "$(git status --short)"
 ```
-`harnex wait --until done` succeeds from `task_complete` or durable terminal
-telemetry (`--summary-out` / `.harnex/dispatch.jsonl` / exit status), not from
-tmp done markers.
+`harnex watch --until done` wraps the `harnex wait --until done` work fence:
+it succeeds from `task_complete` or durable successful terminal telemetry
+(`--summary-out` / `.harnex/dispatch.jsonl` / exit status), returns non-zero for
+`task_failed` / failed terminal telemetry, returns `124` for `--max-wait`, and
+only writes done/fail markers as compatibility outputs after harnex has seen a
+terminal work signal.
 Adjust the artifact path to the task. The point is to avoid declaring done while
 a worker is between edits or between commits.
@@ -74,42 +83,29 @@ harnex events --id pi-i-NN
 For task completion:
 ```bash
+harnex watch --id pi-i-NN --until done --max-wait 15m
+# Primitive equivalent when a script wants raw wait semantics:
 harnex wait --id pi-i-NN --until done --timeout 900
-# Or, when you specifically need the structured turn event:
+# Or, when you specifically need the structured successful-turn event:
 harnex wait --id pi-i-NN --until task_complete --timeout 900
 ```
 ## Background Sweeper
-Consumers often run a small shell loop that checks terminal state, then drops
-to pane diagnostics only while work is still running. Keep a hard wall-clock cap
-so an unattended pipeline cannot wait forever:
+Avoid custom shell loops that repeatedly call `harnex wait`/`harnex status` and
+then accidentally swallow a failed work result. For a single unattended
+visible/detached dispatch, use the native watcher with a hard wall-clock cap:
 ```bash
-start=$(date +%s)
-max_wait=5400
-while :; do
-  if test "$(($(date +%s) - start))" -gt "$max_wait"; then
-    echo "wall-clock cap hit for pi-i-NN" >&2
-    exit 2
-  fi
-  row=$(harnex status --id pi-i-NN --json | ruby -rjson -e 'rows=JSON.parse(STDIN.read); print JSON.generate(rows.first || {})')
-  done=$(printf '%s' "$row" | ruby -rjson -e 'print(JSON.parse(STDIN.read)["done"] ? "true" : "false")')
-  work_state=$(printf '%s' "$row" | ruby -rjson -e 'print(JSON.parse(STDIN.read)["work_state"].to_s)')
-  state=$(printf '%s' "$row" | ruby -rjson -e 'print(JSON.parse(STDIN.read)["state"].to_s)')
-  case "$done:$work_state" in
-    true:*) echo "pi-i-NN work completed"; break ;;
-    false:failed) echo "pi-i-NN work failed; process state: $state" >&2; exit 1 ;;
-    *) harnex pane --id pi-i-NN --lines 20 ;;
-  esac
-  sleep 60
-done
+harnex watch --id pi-i-NN --until done --max-wait 90m \
+  --done-marker /tmp/pi-i-NN-done.json \
+  --fail-marker /tmp/pi-i-NN-failed.json
 ```
+If that exits `124`, inspect the pane/logs/events and decide whether to nudge,
+stop, or continue. If it exits any other non-zero code, treat the work as
+failed; do not continue polling the same task as though it were still running.
 Recommended caps:
 | Work type | Cap |
@@ -118,17 +114,18 @@ Recommended caps:
 | Medium implementation | 90 minutes |
 | Large unattended phase | 3 hours |
-## Built-In Watch Mode
+## Built-In Stall Babysitter
 Use `harnex run --watch` when one foreground process should launch the worker
-and apply bounded stall recovery:
+and apply bounded stall recovery. This is different from `harnex watch --id`,
+which watches an existing session's work-terminal state:
 ```bash
 harnex run pi --id pi-i-NN --watch --preset impl \
   --context "Read /tmp/task-impl-NN.md"
 ```
-`--watch` exits with:
+`run --watch` exits with:
 | Code | Meaning |
 | --- | --- |
@@ -143,6 +140,7 @@ interpretation.
 - Polling `state=completed` alone and missing live sessions with `task_complete=true`.
 - Polling `state=prompt` alone and calling it done.
+- Wrapping `harnex wait` in loops that swallow non-zero `task_failed` results.
 - Blocking orchestrators on `/tmp/*-done.txt` as the only completion signal.
 - Letting an unattended loop run with no wall-clock cap.
 - Reading raw tmux panes instead of `harnex pane`.
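
The Completion Test in this guide chains three checks: the watcher exit status, `test -f` on the expected artifact, and an empty `git status --short`. The same gate can be sketched in Ruby as below. `work_accepted?` and its keyword parameters are hypothetical names, and the caller is assumed to supply the watcher's exit code and the captured `git status --short` output.

```ruby
# Accept the work only when all three checks from the guide pass.
def work_accepted?(watch_exit:, artifact_path:, git_status_short:)
  return false unless watch_exit == 0          # watcher reported done
  return false unless File.file?(artifact_path) # same check as `test -f`
  git_status_short.strip.empty?                 # same check as `test -z "$(git status --short)"`
end

puts work_accepted?(watch_exit: 124, artifact_path: "missing.txt", git_status_short: "")
```

As the guide notes, harnex does not know the acceptance criteria, so the artifact and repository checks stay the caller's responsibility.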

data/lib/harnex/adapters/codex_appserver.rb CHANGED

@@ -339,7 +339,7 @@ module Harnex
           @current_turn_id = nil
           @state = :prompt
         when "error"
-          @state = :disconnected
+          @state = :busy
         end
         @notification_handler&.call(message)

data/lib/harnex/cli.rb CHANGED

@@ -15,6 +15,8 @@ module Harnex
         Sender.new(@argv.drop(1)).run
       when "wait"
         Waiter.new(@argv.drop(1)).run
+      when "watch"
+        WatchCommand.new(@argv.drop(1)).run
       when "stop"
         Stopper.new(@argv.drop(1)).run
       when "status"
@@ -59,6 +61,8 @@ module Harnex
         Sender.usage
       when "wait"
         Waiter.usage
+      when "watch"
+        WatchCommand.usage
       when "stop"
         Stopper.usage
       when "status"
@@ -90,6 +94,7 @@ module Harnex
           harnex run <cli> [options] [--] [cli-args...]
           harnex send --id ID [options] [text...]
           harnex wait --id ID [options]
+          harnex watch --id ID [options]
           harnex stop --id ID [options]
           harnex status [options]
           harnex logs --id ID [options]
@@ -104,6 +109,7 @@ module Harnex
           run     Start a wrapped interactive session and local API
           send    Send text to an active session
           wait    Block until a session exits or reaches a state
+          watch   Safely watch existing work until done/task_failed/timeout
           stop    Send the adapter stop sequence to a session
           status  List live sessions
           logs    Read session output transcripts
@@ -129,6 +135,7 @@ module Harnex
           harnex run aider --id blue-cat
           harnex run codex -- --cd /path/to/repo
           harnex status
+          harnex watch --id main --until done --max-wait 15m
           harnex logs --id main --follow
           harnex events --id main --snapshot
           harnex history --limit 20

data/lib/harnex/codex/app_server/client.rb CHANGED

@@ -265,8 +265,7 @@ module Harnex
             if message["error"]
               err_msg = message.dig("error", "message") || "RPC error"
-              pending.push(StandardError.new("codex_appserver RPC error: #{err_msg}"))
-              signal_disconnect(message["error"])
+              pending.push(StandardError.new(err_msg))
             else
               pending.push(message["result"] || {})
             end
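
This hunk stops prefixing the error text and no longer signals a disconnect when a response carries an `error` member. A standalone sketch of the resulting mapping follows. `rpc_outcome` is a hypothetical name, and the surrounding `pending` queue from the real client is left out.

```ruby
# Turn a response hash into either its result or an error object that keeps
# the server's own message text, falling back to "RPC error".
def rpc_outcome(message)
  if message["error"]
    StandardError.new(message.dig("error", "message") || "RPC error")
  else
    message["result"] || {}
  end
end

outcome = rpc_outcome({ "error" => { "message" => "missing provider credentials" } })
puts outcome.message
```

Because the message is passed through unchanged, callers see the original Codex error text, such as the missing-credentials case named in the changelog.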

data/lib/harnex/commands/status.rb CHANGED

@@ -102,14 +102,17 @@ module Harnex
     end
     def normalize_live_status(session)
-      task_complete = task_complete?(session)
+      task_failed = task_failed?(session)
+      task_complete = task_complete?(session) && !task_failed
+      work_state = task_failed ? "failed" : Harnex.work_state_for("running", task_complete: task_complete)
       session.merge(
         "state" => "running",
         "process_state" => "running",
         "terminal" => false,
         "task_complete" => task_complete,
+        "task_failed" => task_failed,
         "done" => Harnex.work_done_for("running", task_complete: task_complete),
-        "work_state" => Harnex.work_state_for("running", task_complete: task_complete),
+        "work_state" => work_state,
         "exit" => nil,
         "exit_code" => nil,
         "summary_out" => nil,
@@ -123,6 +126,11 @@ module Harnex
         !session["last_completed_at"].to_s.empty?
     end
+    def task_failed?(session)
+      session["task_failed"] == true || session["task_failed"].to_s == "true" ||
+        !session["last_failed_at"].to_s.empty?
+    end
     def load_live_status(session)
       uri = URI("http://#{session.fetch('host')}:#{session.fetch('port')}/status")
       request = Net::HTTP::Get.new(uri)
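
The status change gives failure precedence over completion for live sessions: a session that reports both flags is classified as failed. A simplified standalone sketch follows. `live_work_state` is a hypothetical name, and its `"completed"`/`"running"` strings are assumptions standing in for `Harnex.work_state_for`, whose body is not part of this diff.

```ruby
# Coerce the loosely typed flag as the diff does: true, "true", or a
# non-empty last_failed_at value all count as failed.
def task_failed?(session)
  session["task_failed"] == true || session["task_failed"].to_s == "true" ||
    !session["last_failed_at"].to_s.empty?
end

# Simplified stand-in for the live-status normalization: failure wins.
def live_work_state(session, task_complete:)
  return "failed" if task_failed?(session)
  task_complete ? "completed" : "running"
end

puts live_work_state({ "task_failed" => "true" }, task_complete: true)
```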

data/lib/harnex/commands/wait.rb CHANGED

@@ -11,8 +11,8 @@ module Harnex
     EXIT_STATUS_GRACE_POLL_INTERVAL = 0.05
     FINAL_EVENT_GRACE_SECONDS = 5.0
-    EVENT_PREDICATES = %w[task_complete].freeze
-    LEGACY_EVENT_TYPES = %w[agent_state exited task_complete].freeze
+    EVENT_PREDICATES = %w[task_complete task_failed].freeze
+    LEGACY_EVENT_TYPES = %w[agent_state exited task_complete task_failed].freeze
     def self.usage(program_name = "harnex wait")
       <<~TEXT
@@ -21,10 +21,13 @@ module Harnex
         Options:
           --id ID         Session ID to wait for (required)
           --until STATE   Wait until session reaches STATE. Supported:
-                            done            (work fence — task_complete or
-                                             terminal exit, whichever comes first)
+                            done            (work fence — task_complete,
+                                             task_failed, or terminal exit,
+                                             whichever comes first)
                             task_complete   (events JSONL — fires on
-                                             turn/completed; adapter-agnostic)
+                                             successful turn completion)
+                            task_failed     (events JSONL — fires on
+                                             failed turn completion)
                             <other>         (agent_state HTTP poll, e.g.
                                              "prompt", "busy")
                           Without --until, waits for session exit (default).
@@ -40,7 +43,7 @@ module Harnex
         Gotchas:
           done is the safest work-level fence for monitors.
-          task_complete is an event predicate; prompt/busy are live state polls.
+          task_complete/task_failed are event predicates; prompt/busy are live state polls.
           Prompt state alone does not prove work acceptance. Verify artifacts/tests.
           Exit waits can resolve from terminal summary rows when live registry/
           exit-status files are already gone.
@@ -140,7 +143,7 @@ module Harnex
           event = parse_event(line)
           next unless event
-          task_complete_seen = true if event_type(event) == "task_complete"
+          task_complete_seen = true if %w[task_complete task_failed].include?(event_type(event))
           if matches?(event, predicate, task_complete_seen)
             return [emit_event_match(event, start_time, predicate), f.pos, task_complete_seen]
           end
@@ -173,8 +176,12 @@ module Harnex
     def matches?(event, predicate, task_complete_seen)
       type = event_type(event)
       case predicate
-      when "task_complete", "done"
+      when "task_complete"
         type == "task_complete"
+      when "task_failed"
+        type == "task_failed"
+      when "done"
+        %w[task_complete task_failed].include?(type)
       when "prompt"
         type == "task_complete" ||
           (task_complete_seen && type == "agent_state" && event["state"] == "prompt")
@@ -183,6 +190,13 @@ module Harnex
       end
     end
+    def done_event_failed?(event)
+      return true if event_type(event) == "task_failed"
+      status = event["status"].to_s
+      !status.empty? && !%w[completed success succeeded].include?(status)
+    end
     def emit_event_match(event, start_time, predicate)
       waited = (Time.now - start_time).round(1)
       payload = {
@@ -193,17 +207,22 @@ module Harnex
         waited_seconds: waited
       }
       if predicate == "done"
+        failed = done_event_failed?(event)
         payload.merge!(
-          status: "done",
+          ok: !failed,
+          status: failed ? "failed" : "done",
           state: "running",
           process_state: "running",
           terminal: false,
-          task_complete: true,
-          done: true,
-          work_state: "completed"
+          task_complete: !failed,
+          done: !failed,
+          work_state: failed ? "failed" : "completed"
         )
+        payload[:last_error] = event["message"] || event["error"] if failed
       end
       puts JSON.generate(payload)
+      return 1 if predicate == "done" && done_event_failed?(event)
       0
     end
@@ -431,7 +450,8 @@ module Harnex
       data = JSON.parse(File.read(exit_path))
       exit_code = data["exit_code"]
       task_complete = data["task_complete"] == true || data["task_complete"].to_s == "true"
-      exit_success = exit_code.nil? || exit_code.to_i == 0
+      task_failed = data["task_failed"] == true || data["task_failed"].to_s == "true"
+      exit_success = !task_failed && (exit_code.nil? || exit_code.to_i == 0)
       state = exit_success ? "completed" : "failed"
       done = task_complete || exit_success
       payload = data.merge(
@@ -441,6 +461,7 @@ module Harnex
         "process_state" => "exited",
         "terminal" => true,
         "task_complete" => task_complete,
+        "task_failed" => task_failed,
         "done" => done,
         "work_state" => Harnex.work_state_for(state, task_complete: task_complete)
       )
@@ -486,6 +507,7 @@ module Harnex
     def terminal_payload(status)
       task_complete = !!status["task_complete"]
+      task_failed = !!status["task_failed"]
       work_state = status["work_state"] || Harnex.work_state_for(status["state"], task_complete: task_complete)
       done = status.key?("done") ? !!status["done"] : work_state == "completed"
       {
@@ -495,6 +517,7 @@ module Harnex
         process_state: status["process_state"] || Harnex.process_state_for(status["state"], terminal: true),
         terminal: status.key?("terminal") ? !!status["terminal"] : true,
         task_complete: task_complete,
+        task_failed: task_failed,
         done: done,
         work_state: work_state,
         exit: status["exit"],
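
One consequence of the new `done_event_failed?` rule is easy to miss: a `task_complete` event that carries a `status` other than `completed`, `success`, or `succeeded` is also treated as failed under `--until done`. A standalone sketch of the rule follows. Reading the type from `event["type"]` is an assumption, since the real code goes through an `event_type` helper not shown in this diff, and `done_exit_code` is a hypothetical wrapper.

```ruby
# Standalone copy of the failure rule for a matched "done" event.
def done_event_failed?(event)
  return true if event["type"] == "task_failed"

  status = event["status"].to_s
  !status.empty? && !%w[completed success succeeded].include?(status)
end

# Exit code the waiter would return for a matched "done" event.
def done_exit_code(event)
  done_event_failed?(event) ? 1 : 0
end

puts done_exit_code({ "type" => "task_complete", "status" => "interrupted" })
```

An event with no `status` field falls through to success unless its type is `task_failed`.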

data/lib/harnex/commands/watch.rb CHANGED

@@ -1,5 +1,8 @@
+require "fileutils"
 require "json"
 require "net/http"
+require "optparse"
+require "stringio"
 require "uri"
 module Harnex
@@ -206,4 +209,205 @@ module Harnex
       @monotonic_clock.call
     end
   end
+  class TerminalWatcher
+    TIMEOUT_EXIT_CODE = 124
+    def initialize(
+      id:,
+      repo_path: Dir.pwd,
+      until_state: "done",
+      max_wait: nil,
+      done_marker: nil,
+      fail_marker: nil,
+      stop_on_terminal: false,
+      out: $stdout,
+      err: $stderr
+    )
+      @id = Harnex.normalize_id(id)
+      @repo_path = repo_path
+      @until_state = until_state.to_s.strip.empty? ? "done" : until_state.to_s
+      @max_wait = max_wait
+      @done_marker = done_marker
+      @fail_marker = fail_marker
+      @stop_on_terminal = stop_on_terminal
+      @out = out
+      @err = err
+    end
+    def run
+      raise "harnex watch: only --until done is supported" unless @until_state == "done"
+      output, warnings, exit_code = capture_wait
+      @err.write(warnings) unless warnings.empty?
+      @out.write(output) unless output.empty?
+      payload = parse_payload(output)
+      outcome = classify(exit_code, payload)
+      case outcome
+      when :success
+        write_marker(@done_marker, payload, outcome: outcome, exit_code: exit_code)
+      when :failed
+        write_marker(@fail_marker, payload, outcome: outcome, exit_code: exit_code)
+      end
+      stop_session if @stop_on_terminal && outcome != :timeout
+      exit_code
+    end
+    private
+    def capture_wait
+      argv = ["--id", @id, "--repo", @repo_path, "--until", @until_state]
+      argv += ["--timeout", @max_wait.to_s] if @max_wait
+      out_buffer = StringIO.new
+      err_buffer = StringIO.new
+      original_stdout = $stdout
+      original_stderr = $stderr
+      $stdout = out_buffer
+      $stderr = err_buffer
+      exit_code = Waiter.new(argv).run
+      [out_buffer.string, err_buffer.string, exit_code]
+    ensure
+      $stdout = original_stdout
+      $stderr = original_stderr
+    end
+    def parse_payload(output)
+      line = output.to_s.lines.reverse.find { |candidate| !candidate.strip.empty? }
+      return {} unless line
+      parsed = JSON.parse(line)
+      parsed.is_a?(Hash) ? parsed : {}
+    rescue JSON::ParserError
+      {}
+    end
+    def classify(exit_code, payload)
+      return :timeout if exit_code == TIMEOUT_EXIT_CODE || payload["status"].to_s == "timeout"
+      return :success if exit_code.to_i.zero? && (payload.empty? || payload["ok"] != false)
+      :failed
+    end
+    def write_marker(path, payload, outcome:, exit_code:)
+      marker_path = path.to_s.strip
+      return if marker_path.empty?
+      expanded_path = File.expand_path(marker_path)
+      FileUtils.mkdir_p(File.dirname(expanded_path))
+      marker_payload = {
+        ok: outcome == :success,
+        id: @id,
+        outcome: outcome.to_s,
+        exit_code: exit_code,
+        status: payload["status"],
+        work_state: payload["work_state"],
+        task_complete: payload["task_complete"] || payload["event"] == "task_complete",
+        task_failed: payload["task_failed"] || payload["event"] == "task_failed",
+        done: payload["done"],
+        terminal: payload["terminal"],
+        source: "harnex watch"
+      }.compact
+      File.write(expanded_path, JSON.generate(marker_payload) + "\n")
+    end
+    def stop_session
+      repo_root = Harnex.resolve_repo_root(@repo_path)
+      registry = Harnex.read_registry(repo_root, @id)
+      return unless registry
+      uri = URI("http://#{registry.fetch('host')}:#{registry.fetch('port')}/stop")
+      request = Net::HTTP::Post.new(uri)
+      request["Authorization"] = "Bearer #{registry['token']}" if registry["token"]
+      response = Net::HTTP.start(uri.host, uri.port, open_timeout: 1, read_timeout: 2) do |http|
+        http.request(request)
+      end
+      @err.puts("harnex watch: stop-on-terminal failed with HTTP #{response.code}") unless response.is_a?(Net::HTTPSuccess)
+    rescue StandardError => e
+      @err.puts("harnex watch: stop-on-terminal failed: #{e.message}")
+    end
+  end
+  class WatchCommand
+    def self.usage(program_name = "harnex watch")
+      <<~TEXT
+        Usage: #{program_name} --id ID [options]
+        Options:
+          --id ID              Existing session ID to watch (required)
+          --until done         Watch work-level terminal state (default: done)
+          --repo PATH          Resolve session using PATH's repo root (default: current repo)
+          --max-wait DUR       Wall-clock cap before returning timeout (examples: 900, 15m, 2h)
+          --timeout DUR        Alias for --max-wait
+          --done-marker PATH   Write a JSON marker when work completes successfully
+          --fail-marker PATH   Write a JSON marker when work fails
+          --stop-on-terminal   Stop the live session after success/failure (not on timeout)
+          -h, --help           Show this help
+        `harnex watch` is the safe watcher for existing --tmux or detached
+        dispatches. It exits 0 for task_complete/done, non-zero for task_failed
+        or failed terminal summaries, and 124 for --max-wait timeouts.
+        For launch-and-babysit stall recovery, use `harnex run --watch`.
+      TEXT
+    end
+    def initialize(argv)
+      @argv = argv.dup
+      @options = {
+        id: nil,
+        repo_path: Dir.pwd,
+        until_state: "done",
+        max_wait: nil,
+        done_marker: nil,
+        fail_marker: nil,
+        stop_on_terminal: false,
+        help: false
+      }
+    end
+    def run
+      parser.parse!(@argv)
+      if @options[:help]
+        puts self.class.usage
+        return 0
+      end
+      raise "--id is required for harnex watch" unless @options[:id]
+      TerminalWatcher.new(
+        id: @options[:id],
+        repo_path: @options[:repo_path],
+        until_state: @options[:until_state],
+        max_wait: @options[:max_wait],
+        done_marker: @options[:done_marker],
+        fail_marker: @options[:fail_marker],
+        stop_on_terminal: @options[:stop_on_terminal]
+      ).run
+    end
+    private
+    def parser
+      @parser ||= OptionParser.new do |opts|
+        opts.banner = "Usage: harnex watch --id ID [options]"
+        opts.on("--id ID", "Existing session ID to watch") { |value| @options[:id] = Harnex.normalize_id(value) }
+        opts.on("--until STATE", "Watch until terminal state") { |value| @options[:until_state] = value }
+        opts.on("--repo PATH", "Resolve session using PATH's repo root") { |value| @options[:repo_path] = value }
+        opts.on("--max-wait DUR", "Wall-clock cap") do |value|
+          @options[:max_wait] = Harnex.parse_duration_seconds(value, option_name: "--max-wait")
+        end
+        opts.on("--timeout DUR", "Alias for --max-wait") do |value|
+          @options[:max_wait] = Harnex.parse_duration_seconds(value, option_name: "--timeout")
+        end
+        opts.on("--done-marker PATH", "Write marker on successful completion") { |value| @options[:done_marker] = value }
+        opts.on("--fail-marker PATH", "Write marker on failed completion") { |value| @options[:fail_marker] = value }
+        opts.on("--stop-on-terminal", "Stop live session after success/failure") { @options[:stop_on_terminal] = true }
+        opts.on("-h", "--help", "Show help") { @options[:help] = true }
+      end
+    end
+  end
 end
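
The usage text above documents three exit-status classes for `harnex watch`: 0 for completed work, 124 for a `--max-wait` timeout, and any other non-zero value for failure. A caller could fold them into a coarse outcome roughly as follows. This is a minimal sketch: `watch_outcome` and the session ID `worker-1` are hypothetical names chosen for illustration and are not part of the gem.

```ruby
# Hypothetical helper (not part of harnex): maps a `harnex watch` exit status
# to a coarse outcome, following the usage text in this diff.
def watch_outcome(exit_status)
  case exit_status
  when 0 then :done       # task_complete / done
  when 124 then :timeout  # --max-wait (or --timeout) cap reached
  else :failed            # task_failed, failed terminal summary, or no status at all
  end
end

# Possible call site, assuming `harnex` is on PATH and a session with the
# illustrative ID "worker-1" exists:
#   system("harnex", "watch", "--id", "worker-1", "--max-wait", "15m")
#   outcome = watch_outcome($?.exitstatus)
```

Because 124 is also non-zero, it has to be checked before the generic failure branch. A `nil` status (for example, when the process was killed by a signal) falls through to `:failed` in this sketch.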

data/lib/harnex/dispatch_history.rb CHANGED

@@ -85,6 +85,7 @@ module Harnex
     end
     def classify(session)
+      return ["failed", "task_failed"] if session.respond_to?(:task_failed?) && session.task_failed?
       return ["completed", "task_complete"] if session.task_complete?
       return ["timeout", "timeout"] if session.exit_code == 124
       return ["killed", "process_kill"] if session.term_signal
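
The new guard is placed first, so a session that reports both a failure and a completion is classified as failed. The sketch below reproduces the four guards visible in this hunk against a stand-in object to show that precedence. `FakeSession`, `classify_sketch`, and the final `["unknown", "unknown"]` fallback are assumptions for illustration; the rest of the real method is not shown in the diff.

```ruby
# Stand-in for a session object; it exposes only what the guards read.
FakeSession = Struct.new(:failed, :complete, :exit_code, :term_signal, keyword_init: true) do
  def task_failed?
    !!failed
  end

  def task_complete?
    !!complete
  end
end

# Mirrors the guard order shown in the hunk above.
def classify_sketch(session)
  return ["failed", "task_failed"] if session.respond_to?(:task_failed?) && session.task_failed?
  return ["completed", "task_complete"] if session.task_complete?
  return ["timeout", "timeout"] if session.exit_code == 124
  return ["killed", "process_kill"] if session.term_signal

  ["unknown", "unknown"] # placeholder: the real method's remaining branches are not in this hunk
end

both = FakeSession.new(failed: true, complete: true, exit_code: 0)
p classify_sketch(both)
```

The `respond_to?` check presumably lets `classify` keep working with session-like objects that predate `task_failed?`.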

data/lib/harnex/runtime/session.rb CHANGED

@@ -16,6 +16,7 @@ module Harnex
       agent_session_id cost_usd
     ].freeze
     BUDGET_META_FIELDS = %w[read_budget_lines output_ceiling_lines].freeze
+    SUCCESSFUL_TURN_STATUSES = %w[completed success succeeded].freeze
     class EventCounters
       def initialize
         @counts = {
@@ -103,6 +104,8 @@ module Harnex
       @session_finalized = false
       @turn_started_seen = false
       @last_completed_at = nil
+      @last_failed_at = nil
+      @last_failed_status = nil
       @pi_streamed_text_by_message = {}
       @auto_stop = !!auto_stop
       @auto_stop_fired = false
@@ -221,14 +224,19 @@ module Harnex
       end
       payload[:input_state] = adapter.input_state(screen_snapshot) if include_input_state
-      task_complete = !!@last_completed_at
+      task_complete = task_complete?
+      task_failed = task_failed?
+      work_state = task_failed ? "failed" : Harnex.work_state_for("running", task_complete: task_complete)
       payload[:agent_state] = @state_machine.to_s
       payload[:process_state] = "running"
       payload[:inbox] = @inbox.stats
       payload[:last_completed_at] = @last_completed_at&.iso8601
+      payload[:last_failed_at] = @last_failed_at&.iso8601
       payload[:task_complete] = task_complete
+      payload[:task_failed] = task_failed
       payload[:done] = Harnex.work_done_for("running", task_complete: task_complete)
-      payload[:work_state] = Harnex.work_state_for("running", task_complete: task_complete)
+      payload[:work_state] = work_state
+      payload[:last_error] = @last_error
       payload[:model] = summary_model
       payload[:effort] = meta_hash["effort"]
       payload[:auto_disconnects] = @event_counters.snapshot[:disconnections]
@@ -236,7 +244,11 @@ module Harnex
     end
     def task_complete?
-      !!@last_completed_at
+      !!@last_completed_at && !task_failed?
+    end
+    def task_failed?
+      !!@last_failed_at
     end
     def git_start
@@ -257,7 +269,7 @@ module Harnex
       inject_sequence([{ text: text, newline: newline }])
     end
-    def inject_stop(turn_id: nil)
+    def inject_stop(turn_id: nil, interrupt: true)
       unless structured_transport?
         raise "session is not running" unless pid && Harnex.alive_pid?(pid)
       end
@@ -274,15 +286,21 @@ module Harnex
             end
           end
         end
-        @inject_mutex.synchronize do
-          begin
-            adapter.interrupt(turn_id: turn_id)
-          rescue StandardError
-            nil
+        if interrupt
+          @inject_mutex.synchronize do
+            begin
+              adapter.interrupt(turn_id: turn_id)
+            rescue StandardError
+              nil
+            end
+            @state_machine.force_busy!
           end
-          @state_machine.force_busy!
+          return { ok: true, signal: "interrupt_sent" }
         end
-        return { ok: true, signal: "interrupt_sent" }
+        @state_machine.force_busy!
+        signal_rpc_done! unless @pid
+        return { ok: true, signal: "terminate_sent" }
       end
       @inject_mutex.synchronize do
@@ -336,7 +354,12 @@ module Harnex
       turn_id = nil
       @inject_mutex.synchronize do
-        turn_id = adapter.dispatch(**dispatch)
+        begin
+          turn_id = adapter.dispatch(**dispatch)
+        rescue StandardError => e
+          mark_task_failed(status: "dispatch_error", error: e.message)
+          raise
+        end
         @state_machine.force_busy!
         @injected_count += 1
         @last_injected_at = Time.now
@@ -460,14 +483,25 @@ module Harnex
         @state_machine.force_busy!
         emit_event("turn_started", turnId: params.dig("turn", "id"))
       when "turn/completed"
-        @last_completed_at = Time.now
         @state_machine.force_prompt!
         turn = params["turn"] || {}
-        payload = { turnId: turn["id"] }
-        payload[:status] = turn["status"] if turn["status"]
-        payload[:tokenUsage] = params["tokenUsage"] if params["tokenUsage"]
-        emit_event("task_complete", **payload)
-        schedule_auto_stop("task_complete", turn_id: payload[:turnId])
+        status = turn["status"]
+        turn_id = turn["id"] || params["turnId"]
+        payload = { turnId: turn_id }
+        payload[:status] = status if status
+        payload[:tokenUsage] = params["tokenUsage"] if params["tokenUsage"].is_a?(Hash)
+        if successful_turn_status?(status)
+          @last_completed_at = Time.now
+          emit_event("task_complete", **payload)
+        else
+          mark_task_failed(
+            turn_id: turn_id,
+            status: status,
+            error: extract_turn_error_message(turn),
+            codex_error_info: extract_turn_error_info(turn)
+          )
+        end
+        schedule_auto_stop("turn_completed", interrupt: false)
       when "item/completed"
         emit_event("item_completed", item: params["item"])
         @event_counters.record_item(params["item"])
@@ -487,15 +521,70 @@ module Harnex
       when "account/rateLimits/updated"
         @rate_limits = params
       when "error"
-        @last_error = params["message"].to_s unless params["message"].to_s.empty?
+        message = extract_error_notification_message(params)
+        @last_error = message unless message.to_s.empty?
         @state_machine.force_busy!
-        emit_event("disconnected", source: "error_notification", message: params["message"])
-        signal_rpc_done!
+        emit_event(
+          "error",
+          source: "error_notification",
+          message: message,
+          codex_error_info: extract_error_notification_info(params),
+          will_retry: params["willRetry"],
+          threadId: params["threadId"],
+          turnId: params["turnId"]
+        )
+        signal_rpc_done! if params["turnId"].to_s.empty?
       end
     rescue StandardError => e
       warn("harnex: rpc notification handler error: #{e.message}")
     end
+    def successful_turn_status?(status)
+      text = status.to_s
+      return true if text.empty?
+      SUCCESSFUL_TURN_STATUSES.include?(text)
+    end
+    def mark_task_failed(turn_id: nil, status: nil, error: nil, codex_error_info: nil)
+      @last_failed_at = Time.now
+      @last_failed_status = status.to_s.empty? ? "failed" : status.to_s
+      @last_error = error.to_s unless error.to_s.empty?
+      payload = { status: @last_failed_status }
+      payload[:turnId] = turn_id if turn_id
+      payload[:message] = error unless error.to_s.empty?
+      payload[:codex_error_info] = codex_error_info if codex_error_info
+      emit_event("task_failed", **payload)
+    end
+    def extract_error_notification_message(params)
+      error = params["error"]
+      if error.is_a?(Hash)
+        error["message"] || error.dig("error", "message") || params["message"]
+      else
+        params["message"]
+      end
+    end
+    def extract_error_notification_info(params)
+      error = params["error"]
+      error.is_a?(Hash) ? error["codexErrorInfo"] : nil
+    end
+    def extract_turn_error_message(turn)
+      error = turn["error"]
+      return error["message"] if error.is_a?(Hash)
+      return error if error.is_a?(String)
+      nil
+    end
+    def extract_turn_error_info(turn)
+      error = turn["error"]
+      error.is_a?(Hash) ? error["codexErrorInfo"] : nil
+    end
     def handle_jsonl_notification(message)
       event_type = message["type"].to_s
@@ -509,7 +598,7 @@ module Harnex
         @state_machine.force_prompt!
         emit_event("task_complete")
         adapter.request_session_stats_async if adapter.respond_to?(:request_session_stats_async)
-        schedule_auto_stop("task_complete")
+        schedule_auto_stop("task_complete", interrupt: false)
       when "message_start"
         @pi_streamed_text_by_message[pi_message_key(message["message"])] = false
       when "message_update"
@@ -578,12 +667,21 @@ module Harnex
     def handle_rpc_disconnect(error)
       msg = error.is_a?(Hash) ? error["message"] : error&.message
+      if normal_auto_stop_disconnect?(msg)
+        signal_rpc_done!
+        return
+      end
       @last_error = msg.to_s unless msg.to_s.empty?
       @state_machine.force_busy!
       emit_event("disconnected", source: "transport", message: msg) rescue nil
       signal_rpc_done!
     end
+    def normal_auto_stop_disconnect?(message)
+      message.to_s.empty? && @auto_stop_fired && (task_complete? || task_failed?)
+    end
     def dispatch_initial_prompt
       return unless adapter.respond_to?(:initial_prompt)
@@ -738,10 +836,11 @@ module Harnex
       return unless defined?(@exit_code) && !@exit_code.nil?
       exit_path = Harnex.exit_status_path(repo_root, id)
-      task_complete = !!@last_completed_at
-      state = @exit_code.to_i == 0 ? "completed" : "failed"
+      task_complete = task_complete?
+      task_failed = task_failed?
+      state = task_failed || @exit_code.to_i != 0 ? "failed" : "completed"
       payload = {
-        ok: true,
+        ok: !task_failed && state == "completed",
         id: id,
         cli: adapter.key,
         session_id: session_id,
@@ -750,6 +849,7 @@ module Harnex
         state: state,
         process_state: "exited",
         task_complete: task_complete,
+        task_failed: task_failed,
         done: Harnex.work_done_for(state, task_complete: task_complete),
         work_state: Harnex.work_state_for(state, task_complete: task_complete),
         started_at: @started_at.iso8601,
@@ -955,7 +1055,7 @@ module Harnex
       schedule_auto_stop("prompt_after_busy") if seen_busy && new_state == :prompt
     end
-    def schedule_auto_stop(reason, turn_id: nil)
+    def schedule_auto_stop(reason, turn_id: nil, interrupt: true)
       return unless @auto_stop
       should_fire = @auto_stop_mutex.synchronize do
@@ -970,7 +1070,7 @@ module Harnex
       thread = Thread.new do
         begin
-          inject_stop(turn_id: turn_id)
+          inject_stop(turn_id: turn_id, interrupt: interrupt)
         rescue StandardError => e
           warn("harnex: auto-stop failed after #{reason}: #{e.message}")
         end
@@ -1016,17 +1116,26 @@ module Harnex
     def normalize_auto_stop_exit_code!
       return unless @auto_stop
-      return unless @last_completed_at
       return unless @auto_stop_fired
+      if task_failed?
+        @exit_code = 1 if @exit_code.nil? || @exit_code.zero? || @term_signal
+        @term_signal = nil if @exit_code == 1
+        return
+      end
+      return unless task_complete?
       @exit_code = 0
       @term_signal = nil
     end
     def classify_exit
       return "timeout" if @exit_code == 124
-      return "success" if @exit_code == 0 && session_summary_present?
       return "boot_failure" if boot_failure_exit?
+      return "failure" if task_failed?
+      return "success" if @exit_code == 0 && task_complete?
+      return "success" if @exit_code == 0 && session_summary_present?
       return "failure" unless @exit_code == 0
       "disconnected"
@@ -1110,7 +1219,7 @@ module Harnex
         files_changed: @git_end[:files_changed],
         commits: @git_end[:commits],
         exit: @exit_reason,
-        task_complete: !!@last_completed_at,
+        task_complete: task_complete?,
         signal: @term_signal,
         exit_code: @exit_code,
         last_error: @last_error,
@@ -1256,6 +1365,7 @@ module Harnex
       @event_counters.record(type)
       @events_mutex.synchronize do
         return unless @events_log
+        return if @events_log.closed?
         @events_log_seq += 1
         event = {
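
The core behavioral change in this file is that `turn/completed` no longer implies success: the turn's `status` decides whether `task_complete` or `task_failed` is recorded. The sketch below copies the constant and predicate from the diff and adds a hypothetical wrapper, `turn_event_for`, to show the resulting event name. The status strings `"interrupted"` and `"failed"` are illustrative inputs, not values confirmed by the source.

```ruby
# Copied from the diff: statuses that count as a successful turn.
SUCCESSFUL_TURN_STATUSES = %w[completed success succeeded].freeze

# Copied from the diff: a missing or empty status is treated as success.
def successful_turn_status?(status)
  text = status.to_s
  return true if text.empty?

  SUCCESSFUL_TURN_STATUSES.include?(text)
end

# Hypothetical wrapper (not in the gem): names the event a turn would produce.
def turn_event_for(turn)
  successful_turn_status?(turn["status"]) ? "task_complete" : "task_failed"
end

[{}, { "status" => "completed" }, { "status" => "interrupted" }].each do |turn|
  puts "#{turn.inspect} -> #{turn_event_for(turn)}"
end
```

Treating an absent status as success keeps the earlier behavior for payloads that do not carry a `status` field, while any status string outside the allow-list is routed to the failure path.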

data/lib/harnex/terminal_status.rb CHANGED

@@ -45,6 +45,7 @@ module Harnex
         "process_state" => "unknown",
         "terminal" => false,
         "task_complete" => false,
+        "task_failed" => false,
         "done" => false,
         "work_state" => "unknown",
         "exit" => nil,
@@ -131,6 +132,7 @@ module Harnex
       actual = record["actual"] || {}
       state = classify_summary_state(actual)
       task_complete = !!actual["task_complete"]
+      task_failed = state == "failed" && !task_complete
       terminal = state != "unknown"
       {
         "id" => meta["id"].to_s,
@@ -139,6 +141,7 @@ module Harnex
         "process_state" => Harnex.process_state_for(state, terminal: terminal),
         "terminal" => terminal,
         "task_complete" => task_complete,
+        "task_failed" => task_failed,
         "done" => Harnex.work_done_for(state, task_complete: task_complete),
         "work_state" => Harnex.work_state_for(state, task_complete: task_complete),
         "exit" => blank_to_nil(actual["exit"]),
@@ -173,6 +176,7 @@ module Harnex
           "unknown"
         end
       task_complete = record["terminal_event"].to_s == "task_complete"
+      task_failed = record["terminal_event"].to_s == "task_failed" || (state == "failed" && !task_complete)
       terminal = state != "unknown"
       {
         "id" => record["id"].to_s,
@@ -181,6 +185,7 @@ module Harnex
         "process_state" => Harnex.process_state_for(state, terminal: terminal),
         "terminal" => terminal,
         "task_complete" => task_complete,
+        "task_failed" => task_failed,
         "done" => Harnex.work_done_for(state, task_complete: task_complete),
         "work_state" => Harnex.work_state_for(state, task_complete: task_complete),
         "exit" => history_exit(status),

data/lib/harnex/version.rb CHANGED

@@ -1,4 +1,4 @@
 module Harnex
-  VERSION = "0.7.6"
-  RELEASE_DATE = "2026-06-09"
+  VERSION = "0.7.8"
+  RELEASE_DATE = "2026-06-13"
 end

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: harnex
 version: !ruby/object:Gem::Version
-  version: 0.7.6
+  version: 0.7.8
 platform: ruby
 authors:
 - Jikku Jose
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2026-06-08 00:00:00.000000000 Z
+date: 2026-06-13 00:00:00.000000000 Z
 dependencies: []
 description: A local PTY harness that wraps terminal AI agents (Claude, Codex, Pi)
   and adds a control plane for discovery, messaging, and coordination.