npm - @rubytech/create-maxy - Versions diffs - 1.0.633 → 1.0.635 - Mend

@rubytech/create-maxy 1.0.633 → 1.0.635

Files changed (21)

package/payload/platform/plugins/cloudflare/scripts/setup-tunnel.sh CHANGED

@@ -203,27 +203,58 @@ fi
 # --------------------------------------------------------------------------
 # Step 2+3: Create tunnel if absent; otherwise reuse. Capture UUID.
+# Emit phase_line step=tunnel-resolve with action=reused|created so the
+# stream log tailer shows which tunnel identity Steps 4+5 are writing
+# against (Task 559 — Bug B: previously a bare `echo` that only surfaced
+# in the Bash tool_result after subprocess exit).
 # --------------------------------------------------------------------------
 TUNNEL_NAME="${BRAND}-$(hostname -s)"
 TUNNEL_ID="$(cloudflared --origincert "${CFG_DIR}/cert.pem" tunnel list --output json 2>/dev/null \
   | jq -r --arg N "${TUNNEL_NAME}" '.[]? | select(.name == $N) | .id' | head -1)"
+TUNNEL_ACTION="reused"
 if [ -z "${TUNNEL_ID}" ] || [ "${TUNNEL_ID}" = "null" ]; then
   cloudflared --origincert "${CFG_DIR}/cert.pem" tunnel create "${TUNNEL_NAME}"
   TUNNEL_ID="$(cloudflared --origincert "${CFG_DIR}/cert.pem" tunnel list --output json \
     | jq -r --arg N "${TUNNEL_NAME}" '.[]? | select(.name == $N) | .id' | head -1)"
+  TUNNEL_ACTION="created"
 fi
 if [ -z "${TUNNEL_ID}" ] || [ "${TUNNEL_ID}" = "null" ]; then
+  phase_line setup-tunnel step=tunnel-resolve result=error \
+    reason=uuid-missing tunnel_name="${TUNNEL_NAME}"
   echo "ERROR: failed to create or find tunnel ${TUNNEL_NAME}" >&2
   exit 1
 fi
+phase_line setup-tunnel step=tunnel-resolve tunnel_name="${TUNNEL_NAME}" \
+  tunnel_id="${TUNNEL_ID}" action="${TUNNEL_ACTION}"
 echo "tunnel: ${TUNNEL_NAME} (${TUNNEL_ID})"
 # --------------------------------------------------------------------------
-# Step 4: Route DNS. Apex hostnames (exactly two DNS labels) cannot be
-# routed via `cloudflared tunnel route dns` — it misroutes them into
-# another zone on the account. Skip CLI routing for apex; collect for the
-# ACTION REQUIRED summary at the end.
+# Step 3b: Zone pre-flight. Before routing DNS, verify every non-apex
+# hostname's registrable parent (last two labels, e.g. rogerblack.maxy.bot
+# → maxy.bot) has NS records pointing at Cloudflare. If any hostname's
+# parent zone is not on Cloudflare, refuse the whole run before calling
+# `cloudflared tunnel route dns`.
+#
+# DESIGN NOTE — what this catches and what it does NOT catch (Task 559):
+#   CATCHES: parent zone does not exist, or its NS records do not point
+#     at Cloudflare's nameservers. Pre-529 the shell relied on a post-
+#     flight sed of cloudflared's stdout for this defence; Task 559
+#     deletes that parser because it rejects the idempotent no-op output
+#     shape (session 25674fe3) and replaces it with this inline NS probe.
+#     Same primitive the MCP path uses in
+#     cloudflared.ts::checkZoneParentOnCloudflare.
+#   DOES NOT CATCH: the zone is on Cloudflare but on a DIFFERENT account
+#     than the one cert.pem is bound to. A true account-zone-list check
+#     requires either a cloudflared CLI zone-list subcommand (does not
+#     exist as of 2026-04) or persisting the bound account's zones at
+#     tunnel-login time (deferred — separate task). The wrong-account
+#     case is detected post-hoc by tunnel-status's hostname probe, not
+#     here. This is an explicitly accepted gap per Task 559's scope.
+#
+# Probe uses 1.1.1.1 directly to bypass the device's local resolver
+# (matching Resolver.setServers in the MCP path) — avoids cache /
+# split-horizon issues on the Pi.
 # --------------------------------------------------------------------------
 is_apex() {
@@ -233,39 +264,110 @@ is_apex() {
   [ "$(echo -n "$h" | tr -cd '.' | wc -c)" = "1" ]
 }
+registrable_parent() {
+  local h="$1"
+  local labels n
+  IFS='.' read -ra labels <<< "${h}"
+  n=${#labels[@]}
+  if [ "${n}" -le 2 ]; then
+    printf '%s' "${h}"
+  else
+    printf '%s.%s' "${labels[$((n-2))]}" "${labels[$((n-1))]}"
+  fi
+}
+if ! command -v dig >/dev/null 2>&1; then
+  phase_line setup-tunnel step=zone-preflight result=error \
+    reason=dig-missing
+  echo "ERROR: dig is not in PATH — required for the zone pre-flight check." >&2
+  echo "       Install DNS tooling: sudo apt-get install -y bind9-dnsutils" >&2
+  exit 1
+fi
+ZONES_SEEN=""
+MISSING_PARENT=""
+for H in "${HOSTNAMES[@]}"; do
+  if is_apex "$H"; then continue; fi
+  ZONE="$(registrable_parent "$H")"
+  NS_OUT="$(dig +short +time=3 +tries=1 NS "${ZONE}" @1.1.1.1 2>/dev/null || true)"
+  if printf '%s' "${NS_OUT}" | grep -qiE '\.ns\.cloudflare\.com\.?$'; then
+    case ",${ZONES_SEEN}," in
+      *",${ZONE},"*) ;;
+      *) ZONES_SEEN="${ZONES_SEEN:+${ZONES_SEEN},}${ZONE}" ;;
+    esac
+  else
+    MISSING_PARENT="${H}"
+    break
+  fi
+done
+if [ -n "${MISSING_PARENT}" ]; then
+  MISSING_ZONE="$(registrable_parent "${MISSING_PARENT}")"
+  phase_line setup-tunnel step=zone-preflight result=error \
+    missing_parent_for="${MISSING_PARENT}" \
+    zones_on_account="${ZONES_SEEN}"
+  echo "" >&2
+  echo "ERROR: cannot route ${MISSING_PARENT} — its parent zone ${MISSING_ZONE}" >&2
+  echo "       is not on Cloudflare (NS records do not point at *.ns.cloudflare.com)." >&2
+  echo "       Zones confirmed on Cloudflare so far: ${ZONES_SEEN:-none}" >&2
+  echo "" >&2
+  echo "       Fix: sign into the Cloudflare account that owns ${MISSING_ZONE}" >&2
+  echo "         1. ~/reset-tunnel.sh        # clear cert.pem and tunnel state" >&2
+  echo "         2. ~/setup-tunnel.sh ...    # re-run while signed into the correct account" >&2
+  exit 1
+fi
+phase_line setup-tunnel step=zone-preflight result=ok \
+  zones_on_account="${ZONES_SEEN}"
+# --------------------------------------------------------------------------
+# Step 4: Route DNS. Apex hostnames (exactly two DNS labels) cannot be
+# routed via `cloudflared tunnel route dns` — it misroutes them into
+# another zone on the account. Skip CLI routing for apex; collect for the
+# ACTION REQUIRED summary at the end.
+#
+# Control flow (Task 559): cloudflared's exit code is the sole decision
+# signal. No stdout parsing. `cloudflared tunnel route dns --overwrite-dns`
+# exits 0 on every legitimate outcome (create, overwrite, already-correct
+# no-op) and non-zero on every legitimate failure. The pre-flight above
+# already refused if the parent zone is not on Cloudflare; the post-flight
+# parser the shell historically carried (deleted in 559) rejected the
+# idempotent no-op output shape `INF <h> is already configured to
+# route...` and caused session 25674fe3 to die after cloudflared exited 0.
+# --------------------------------------------------------------------------
 APEX_HOSTNAMES=()
 for H in "${HOSTNAMES[@]}"; do
   if is_apex "$H"; then
     APEX_HOSTNAMES+=("$H")
+    phase_line setup-tunnel step=route-dns hostname="${H}" result=apex-skip
     echo "apex ${H} — skipping CLI DNS routing (manual dashboard step required)"
     continue
   fi
-  ROUTE_OUT=$(cloudflared --origincert "${CFG_DIR}/cert.pem" \
-    tunnel route dns --overwrite-dns "${TUNNEL_ID}" "${H}" 2>&1)
-  echo "${ROUTE_OUT}"
-  # Post-flight FQDN validation: cert.pem is zone-scoped for DNS routing;
-  # if the requested hostname is not under cert's zone, cloudflared silently
-  # prepends it as a sub-label (e.g. admin.maxy.bot → admin.maxy.bot.maxy.chat
-  # when cert is for maxy.chat zone). Parse the output and fail loudly.
-  ACTUAL_FQDN=$(echo "${ROUTE_OUT}" | sed -n 's|.*Added CNAME \([^ ]*\) which will route.*|\1|p')
-  if [ -z "${ACTUAL_FQDN}" ]; then
-    echo "ERROR: could not parse CNAME FQDN from cloudflared output for ${H}" >&2
-    exit 1
-  fi
-  if [ "${ACTUAL_FQDN}" != "${H}" ]; then
-    echo "" >&2
-    echo "ERROR: cloudflared misrouted ${H} → ${ACTUAL_FQDN}" >&2
-    echo "       The cert.pem at ${CFG_DIR}/cert.pem is scoped to a zone that does not own ${H}." >&2
-    echo "       Fix:" >&2
-    echo "         1. Delete the stray CNAME ${ACTUAL_FQDN} in the CF dashboard." >&2
-    echo "         2. Re-authorize cloudflared against the zone that owns ${H}:" >&2
-    echo "              rm ${CFG_DIR}/cert.pem" >&2
-    echo "              DISPLAY=:99 cloudflared --origincert ${CFG_DIR}/cert.pem tunnel login" >&2
-    echo "              (then pick the correct zone in the dashboard consent screen)" >&2
-    echo "              mv ~/.cloudflared/cert.pem ${CFG_DIR}/cert.pem" >&2
-    echo "         3. Re-run this script." >&2
+  phase_line setup-tunnel step=route-dns hostname="${H}" tunnel_id="${TUNNEL_ID}"
+  ROUTE_LOG="$(mktemp -t maxy-route-dns.XXXXXX)"
+  # tee_subprocess_capture streams cloudflared's combined stdout+stderr
+  # into STREAM_LOG_PATH line-by-line with the [setup-tunnel:cloudflared]
+  # tag (live-tailable) AND passes the same output through this shell's
+  # stdout so the `> "${ROUTE_LOG}"` redirection can capture it for the
+  # failure-path phase_line. Exit code is cloudflared's PIPESTATUS[0].
+  if tee_subprocess_capture setup-tunnel:cloudflared -- \
+      cloudflared --origincert "${CFG_DIR}/cert.pem" \
+      tunnel route dns --overwrite-dns "${TUNNEL_ID}" "${H}" \
+      > "${ROUTE_LOG}"; then
+    phase_line setup-tunnel step=route-dns hostname="${H}" result=ok
+  else
+    ROUTE_RC=$?
+    STDERR_BOUNDED="$(tr '\n' ' ' < "${ROUTE_LOG}" | head -c 400)"
+    phase_line setup-tunnel step=route-dns hostname="${H}" result=error \
+      exit="${ROUTE_RC}" stderr="${STDERR_BOUNDED}"
+    echo "ERROR: cloudflared tunnel route dns failed for ${H} (exit=${ROUTE_RC})" >&2
+    echo "       stderr: ${STDERR_BOUNDED}" >&2
+    rm -f "${ROUTE_LOG}"
     exit 1
   fi
+  rm -f "${ROUTE_LOG}"
 done
 # --------------------------------------------------------------------------
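
The zone pre-flight in the hunk above rests on two string-level checks: reducing a hostname to its last two labels, and testing whether a block of NS answers ends in `.ns.cloudflare.com`. The sketch below restates both so they can be exercised without a network. It is an illustration, not the packaged code: `registrable_parent` is rewritten with POSIX parameter expansion instead of the array split used in the script, `ns_points_at_cloudflare` is a hypothetical helper name, and any NS strings passed to it are sample data standing in for `dig +short NS` output.

```shell
# Sketch only: last-two-labels rule, restated with POSIX parameter expansion.
registrable_parent() {
  local h="$1" last rest
  case "${h}" in
    *.*.*)
      last="${h##*.}"      # final label, e.g. "bot"
      rest="${h%.*}"       # everything before the final label
      printf '%s.%s' "${rest##*.}" "${last}"
      ;;
    *)
      # One or two labels: the hostname is already its own parent.
      printf '%s' "${h}"
      ;;
  esac
}

# Hypothetical helper: succeeds when at least one line of the supplied
# NS answer text ends in .ns.cloudflare.com (trailing dot optional).
ns_points_at_cloudflare() {
  printf '%s' "$1" | grep -qiE '\.ns\.cloudflare\.com\.?$'
}
```

With these definitions, `registrable_parent rogerblack.maxy.bot` prints `maxy.bot`. One consequence of the last-two-labels rule is that a hostname under a multi-label public suffix, such as `app.example.co.uk`, reduces to `co.uk` rather than `example.co.uk`, so the probe would query the wrong zone for such domains.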

package/payload/platform/plugins/docs/references/cloudflare.md CHANGED

@@ -24,8 +24,9 @@ Ask the agent to set up Cloudflare. The agent collects four things before acting
 The agent then invokes `setup-tunnel.sh` on the device with your inputs. The script runs end-to-end:
 - `cloudflared tunnel login` — OAuth browser sign-in. The VNC browser opens the Cloudflare authorize page; pick the account that owns your domain, click Authorize. `cert.pem` lands.
-- Tunnel creation under the naming convention `{brand}-{hostname}` (e.g. `maxy-neo`).
-- `cloudflared tunnel route dns` for each subdomain hostname. Apex hostnames cannot be routed this way — the script prints an **ACTION REQUIRED** block naming the exact dashboard record to add or edit.
+- Tunnel creation under the naming convention `{brand}-{hostname}` (e.g. `maxy-neo`). Stream log emits `step=tunnel-resolve action=reused|created` once the UUID is known so the admin agent can see which tunnel the later steps will write against.
+- **Zone pre-flight** — for every non-apex hostname the script queries `1.1.1.1` for the registrable parent's NS records and refuses the whole run if they don't point at Cloudflare. Stream log: `step=zone-preflight result=ok|error zones_on_account=… missing_parent_for=…`. Catches "domain not on Cloudflare"; does not catch "domain on a different Cloudflare account than `cert.pem` is bound to" — that case surfaces later via `tunnel-status`.
+- `cloudflared tunnel route dns` for each subdomain hostname. Apex hostnames cannot be routed this way — the script prints an **ACTION REQUIRED** block naming the exact dashboard record to add or edit. Stream log emits `step=route-dns hostname=… tunnel_id=…` before the call and `step=route-dns hostname=… result=ok|apex-skip|error` after; on error the bounded cloudflared stderr (≤400 chars) rides in the same phase line. **The script does not parse cloudflared's stdout** — exit code is the sole decision signal, so all three legitimate cloudflared output shapes (new record, overwrite, idempotent "already configured") are treated as success.
 - `config.yml` and `tunnel.state` written under `${CFG_DIR}`.
 - `systemctl --user restart ${BRAND}.service` — restarts the platform service so the new tunnel spawns via the service's `ExecStartPre=resume-tunnel.sh`.
 - Post-restart verification — `ps -ef | grep '[c]loudflared'` confirms the connector is alive, then `curl -I https://<hostname>` against each subdomain (up to 60 s per host) confirms a non-530 response.
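
The exit-code-only rule described in the routing bullet can be sketched as a small wrapper: output is captured purely for the error report, and the branch taken depends on the exit status alone. This is a minimal illustration under stated assumptions. `classify_by_exit_code` and the `fake_*` stand-ins are hypothetical names, the stand-ins print placeholder text rather than real cloudflared output, and the packaged script uses its own `tee_subprocess_capture` and `phase_line` helpers, which are not reproduced here.

```shell
# Hypothetical wrapper: run a command and decide success from its exit code.
# The captured output is never inspected to make that decision.
classify_by_exit_code() {
  local log rc bounded
  log="$(mktemp)"
  if "$@" > "${log}" 2>&1; then
    rc=0
    printf 'result=ok\n'
  else
    rc=$?   # exit status of the command tested by `if`
    # Flatten to one line and cap at 400 bytes, as the docs describe.
    bounded="$(tr '\n' ' ' < "${log}" | head -c 400)"
    printf 'result=error exit=%s stderr=%s\n' "${rc}" "${bounded}"
  fi
  rm -f "${log}"
  return "${rc}"
}

# Stand-ins for three outcomes; the text is placeholder, not cloudflared's.
fake_created() { echo "new record"; }
fake_noop()    { echo "already configured"; }
fake_failure() { echo "boom" >&2; return 7; }
```

Calling `classify_by_exit_code fake_created` and `classify_by_exit_code fake_noop` both report `result=ok` even though their output differs, which is the property the deleted stdout parser lacked. `classify_by_exit_code fake_failure` reports `result=error exit=7` and returns 7 to the caller.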

package/payload/platform/plugins/docs/references/plugins-guide.md CHANGED

@@ -113,3 +113,11 @@ After this, every `console.error("[your-tool] ...")` from any tool in the plugin
 **Tee-state markers** land in the stream log: `[platform] [mcp-tee-attach] server=<name> streamLogPath=...` when the tee wires up, `[platform] [mcp-tee-skip] server=<name> destination=... reason=...` when a destination fails (missing `LOG_DIR`, unwritable path, `STREAM_LOG_PATH` not set, etc.), `[platform] [mcp-tee-detach] server=<name>` on graceful shutdown. If a server invoked tools but no `[mcp:<name>]` lines appear in the conversation's log, look for the skip marker first.
 **Main-subprocess stderr (Task 535).** The same teeing pattern applies to the main Claude Code subprocess's stderr — every line lands in the per-conversation stream log as `[subproc-stderr] …`, with lifecycle markers `[subproc-stderr-tee-attached] pid=…` and `[subproc-stderr-tee-detached] pid=… bytes=N lines=N`. A `bytes=0 lines=0` detach means the tee was attached but the subprocess emitted nothing on stderr — which is the normal state today, because the Claude Code CLI is a bundled Bun runtime binary that does not honour Node's `NODE_DEBUG` env var. The platform records this explicitly with one line per spawn: `[subproc-debug-unavailable] reason=bundled-bun-binary-ignores-node-debug pid=… cli=claude`. A reader who finds a `[spawn]` without these markers should treat that as a regression of the tee infrastructure, not as silence.
+## Failure-path observability contract (Task 560)
+The `initStderrTee` wrapper writes to the per-conversation stream log and per-server raw file via `createWriteStream` — async, buffered. Any diagnostic `console.error(…)` followed by an immediate `process.exit(…)` is lost: the event loop never drains the WriteStream before the process terminates. Plugins that call `process.exit()` during module load (rare — `graph-mcp` is the only in-tree example today; it spawns a child at boot to proxy upstream stdio) MUST use `fs.appendFileSync` at every exit path to guarantee the cause lands in both log destinations before exit. Lines should follow the `[mcp:<name>] [<plugin-prefix>] <cause>` format so existing `grep '[mcp:<name>]'` investigator paths work. Each destination must be wrapped in its own try/catch — an unwritable log must not mask the primary failure.
+A second observability layer closes the same gap from the platform side: when `claude-agent.ts` observes an `init` event with any MCP server reporting `status:"failed"`, it reads the last 512 bytes of `${LOG_DIR}/mcp-<name>-stderr-<date>.log` and emits `[mcp-init-error] server=<name> tail=<quoted>` into the stream log. Absent file → `tail="(no stderr file)"`; empty file → `tail="(empty)"`. This works for every plugin regardless of whether it adopted the sync-write discipline — the tail of whatever landed in the raw stderr file (from whichever destination made it out of the async buffer) is always captured.
+Signal inventory after a failed session: `[init] FAILED MCP servers: <names>` (names), `[mcp-init-error] server=<name> tail=…` (cause for each, from platform), optionally `[mcp:<name>] [<plugin>] …` (cause for each, from plugin's own sync-writes when the plugin is disciplined). Their union gives the investigator two independent sources for the same failure.
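
The platform-side tail capture described above has three cases: file absent, file empty, and file with content. The shell sketch below illustrates those cases only. The real logic lives in `claude-agent.ts` and is not shown in this diff, so `mcp_init_error_line` is a hypothetical name, and both the newline flattening and the plain double-quote wrapping of the tail are assumptions about what `tail=<quoted>` looks like.

```shell
# Hypothetical illustration of the three tail cases; not the platform's code.
mcp_init_error_line() {
  local server="$1" file="$2" tail_text
  if [ ! -e "${file}" ]; then
    tail_text="(no stderr file)"
  elif [ ! -s "${file}" ]; then
    tail_text="(empty)"
  else
    # Last 512 bytes, flattened to a single line (flattening is an assumption).
    tail_text="$(tail -c 512 "${file}" | tr '\n' ' ')"
    tail_text="${tail_text% }"   # drop the space left by a final newline
  fi
  printf '[mcp-init-error] server=%s tail="%s"\n' "${server}" "${tail_text}"
}
```

An investigator could use the same shape by hand, for example `mcp_init_error_line graph /path/to/stderr.log`, to preview what the platform would have recorded for a given raw stderr file.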

package/payload/platform/plugins/memory/references/graph-primitives.md CHANGED

@@ -15,6 +15,21 @@ The connected Neo4j instance contains only this brand's data (per-brand
 instance architecture — see `.docs/neo4j.md`). You never need an account
 filter in the query.
+## When the graph tools are absent
+If neither `maxy-graph_read_neo4j_cypher` nor `maxy-graph_get_neo4j_schema`
+appears in your tool list, the graph MCP server failed to start on this
+device. Reply once with exactly:
+> The graph MCP server failed to start on this device. Run the admin
+> system-status check to diagnose — do not retry by other routes.
+Then stop. Do not search for a similarly-named tool via `ToolSearch`, do
+not fall back to `cypher-shell` via `Bash`, do not paraphrase — the
+deterministic path through the shim is the only supported way to read
+the graph, and any substitute path loses the read-only + namespace +
+token-limit discipline the upstream server enforces.
 ## Non-negotiable: never return raw nodes
 `RETURN n` dumps every property, including the 768-dim `embedding` float