npm - create-walle - Versions diffs - 0.9.21 → 0.9.23 - Mend

create-walle 0.9.21 → 0.9.23

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (500)

package/template/claude-task-manager/docs/app-update-refresh-protocol.md ADDED

@@ -0,0 +1,69 @@
+# CTM App Update Refresh Protocol
+## Problem
+CTM is a long-lived browser app. A user can keep several CTM tabs open while
+`create-walle update` replaces CTM and Wall-E code, restarts the server, and
+serves newer HTML, CSS, and JavaScript. Existing browser tabs may reconnect to
+the new server while still executing the old client bundle.
+That is unsafe in both directions:
+- silently keeping old code leaves users on stale UX after an update;
+- blindly reloading every tab can destroy active terminal input, queue drafts,
+  screenshot edits, or mobile composer text.
+## Design
+The server is the source of truth for the installed application identity. Each
+WebSocket `hello` includes an app identity:
+- `version`: the installed create-walle version;
+- `components`: CTM and Wall-E package versions;
+- `buildId`: a stable hash of key shipped files and package versions.
+`buildId` is recomputed from file stats when version info is requested. It must
+not be process-cached because update installers can replace static assets before
+or during a server restart.
+The client records the first app identity it sees for the current document. On
+later WebSocket reconnects, if the server identity changes, the page is running
+old code and must refresh before continuing normal operation.
+## UX Contract
+When a changed app identity is detected:
+1. Broadcast `reload-required` to same-origin CTM tabs using `BroadcastChannel`.
+2. If the current tab is idle, reload immediately.
+3. If the current tab has active user work, show a sticky reload-required banner
+   and do not steal focus.
+4. The banner remains until the user clicks **Reload now** or the page is
+   refreshed.
+Active user work includes focused xterm.js terminals, editable inputs,
+contenteditable composers, open modals, open queue panel, and screenshot editor
+state.
+## Mobile/PWA
+The mobile service worker already uses a network-first shell strategy and
+activates new workers. That only updates future navigations; a currently open
+mobile document still needs the same runtime identity handshake. Mobile uses the
+same `hello` comparison and shows a compact refresh banner when a composer or
+detail view is active.
+## Non-Goals
+- No hard reload while a user is typing.
+- No reload prompt for ordinary CTM restarts when the app identity is unchanged.
+- No dependency on service workers for desktop refresh behavior.
+## Verification
+Focused render tests should cover:
+- an idle desktop tab auto-refreshes on server identity change;
+- a focused terminal desktop tab shows the reload banner without losing focus;
+- another open tab receives the reload-required event through `BroadcastChannel`;
+- mobile shows the refresh banner instead of silently keeping stale code.
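The UX contract in this file can be read as a small decision function. The sketch below is illustrative only: `sameIdentity`, `decideRefresh`, and the returned action names are hypothetical and do not come from the package, and comparing only `version` and `buildId` is an assumption about what counts as a changed identity.

```javascript
// Hypothetical helpers; not CTM's actual client code.
// Assumes an identity shaped like { version, components, buildId }.
function sameIdentity(a, b) {
  return a.version === b.version && a.buildId === b.buildId;
}

// firstIdentity: identity recorded when the document first connected.
// helloIdentity: identity carried by the latest WebSocket `hello`.
function decideRefresh(firstIdentity, helloIdentity, hasActiveUserWork) {
  if (sameIdentity(firstIdentity, helloIdentity)) {
    // Ordinary restart with unchanged identity: no prompt, no reload.
    return { broadcast: false, action: "none" };
  }
  // Identity changed: tell sibling tabs, then reload or show the banner.
  return { broadcast: true, action: hasActiveUserWork ? "banner" : "reload" };
}
```

A caller would post `reload-required` on a `BroadcastChannel` when `broadcast` is true, then either call `location.reload()` or render the sticky banner depending on `action`.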

package/template/claude-task-manager/docs/approval-ai-refinement.md ADDED

@@ -0,0 +1,138 @@
+# Approval AI Refinement
+CTM's approval system has two separate jobs:
+1. Detect and parse the active terminal approval prompt.
+2. Decide whether the requested action is safe to approve.
+AI refinement only belongs to the first job. It must never become the approval
+policy itself.
+## Decision model (2026 update): allow-by-default + denylist
+CTM's shadow approver is **allow-by-default**: it auto-approves every detected
+prompt EXCEPT commands on the denylist. The denylist is the dangerous-command
+**blocklist** (`workers/approval-blocklist.js`) — ON by default and editable in
+the Permission Manager / Shadow Approver panel (the "permissions tab"). This is
+the single default gate.
+`handleApprovalCheck` order: parse → dedup → **blocklist** (escalate if matched)
+→ **Permission Manager rules** (user deny → escalate; user allow → approve, skip
+verifier) → **verifier** (medium+ risk, if not user-allowed) → **auto-approve**
+(one-time keystroke). The AI reviewer (`reviewWithAI`) is no longer in the
+default path.
+- **Blocklist is the denylist, ON by default.** Seeded with catastrophic /
+  irreversible patterns (rm -rf, mkfs, dd, sudo-destructive, reboot/shutdown,
+  `> /etc|/usr|/var|/boot`, curl|bash, credential exfil, recursive chmod/chown,
+  `git push --force`, `drop/truncate table`, `chmod 777` on system paths,
+  destructive `find -exec`, `node -e`/`python -c` with dangerous syscalls). Edit
+  it in the panel to add/remove what should require approval.
+- **Permission Manager rules are authoritative across providers.** The approver
+  honors the user's `perm_rules` (`Bash(node:*)` allow/deny) via
+  `lib/permission-match.js` — a **deny** match escalates; an **allow** match
+  (without `always_ask`) auto-approves and skips the verifier. These rules
+  otherwise only configure Claude Code's own settings.json, so honoring them here
+  makes them apply to Codex and every other provider too.
+- **Verifier is ON by default (second opinion).** For commands the user has NOT
+  explicitly allowed, the LLM verifier runs on **medium+ risk** and can veto an
+  auto-approval (read-only/low-risk ops skip it). Provider-agnostic via
+  `lib/background-llm.js` (`callBackgroundLlm`); a configured
+  `auto_approval_verifier_command` overrides the built-in. Turn off via
+  `ctm_settings.auto_approval_verifier_enabled = false`.
+- **One-time approvals only.** CTM sends the one-time "yes" keystroke and never
+  auto-selects the durable "allow all / don't ask again" option.
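The documented `handleApprovalCheck` order can be sketched as a single function. This is not the package's implementation: `decideApproval`, the `ctx` fields, and the result strings are hypothetical. It assumes the prompt has already been parsed, that a verifier veto leads to escalation, and it omits the `always_ask` modifier.

```javascript
// Illustrative sketch; all names here are assumptions.
// prompt: { id, command, risk } where risk is "low" | "medium" | "high".
function decideApproval(prompt, ctx) {
  // Dedup: never act twice on the same prompt.
  if (ctx.seen.has(prompt.id)) return "duplicate";
  ctx.seen.add(prompt.id);

  // Blocklist (denylist) is checked first and always escalates on a match.
  if (ctx.blocklist.some((re) => re.test(prompt.command))) return "escalate";

  // Permission Manager rules: deny escalates, allow approves and skips the verifier.
  const rule = ctx.matchPermRule(prompt.command); // "deny" | "allow" | null
  if (rule === "deny") return "escalate";
  if (rule === "allow") return "approve-once";

  // Verifier runs only on medium+ risk and may veto (assumed to escalate).
  const needsVerifier = ctx.verifierEnabled && prompt.risk !== "low";
  if (needsVerifier && ctx.verifierVetoes(prompt)) return "escalate";

  // Default: one-time approval, never the durable allow-all option.
  return "approve-once";
}
```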
+## Self-adaptation from user behavior
+Beyond detector/parser gate-misses, CTM learns from the user's own corrections
+(`lib/approval-self-adapt.js`):
+- **Interrupt after auto-approve** (Ctrl-C right after a learned-rule
+  auto-approval) → the offending learned rule is retired.
+- **Approve after escalation** (user manually approves a prompt CTM escalated) →
+  a positive signal is recorded for the history-scan / maintenance loop to
+  promote a narrow rule.
+- A periodic **`approval-self-adapt`** scheduler job retires active detection
+  rules whose patterns no longer compile or are no longer safe, keeping the
+  learned set healthy.
+## Problem
+The shadow approver can miss a real approval prompt before policy review runs:
+- the detector sees approval-shaped text but the structural gate rejects it;
+- a provider parser fails to extract the command/tool;
+- the terminal snapshot is partial or races with a redraw;
+- the provider changed prompt wording.
+Previously, the rescue path returned `unparsed` before AI could diagnose the
+miss. That made refinement hide behind the parser even when the parser was the
+broken component.
+## Flow
+The refinement loop is:
+1. Record the raw missed approval observation.
+2. Ask AI to classify the miss and propose a narrow parser/detector rule.
+3. Save the candidate rule in `approval_ai_refinement_rules`, separate from
+   shipped approval rules and user-learned approval policy rules.
+4. Validate the rule locally against the same raw terminal frame:
+   - regexes compile and pass safety checks;
+   - the one-time Yes/Allow option is matched;
+   - command/tool context can be extracted;
+   - no durable "always allow" shortcut is selected.
+5. If validation passes, mark the rule `active`.
+6. Rerun the normal approval rescue path. This proves the new rule participates
+   in the same blocklist, policy, live-prompt preflight, keystroke, and
+   post-send verification as every other approval.
+7. If the rerun works, the rule remains active.
+8. If proposal, validation, or rerun fails, persist an open warning in
+   `approval_ai_refinement_warnings` and stop. The warning is queryable through
+   `/api/approval-ai-refinement/warnings` so it does not disappear if the user
+   misses a toast.
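Step 4 of the flow (local validation against the same raw frame) could look roughly like the sketch below. The rule shape (`yesPattern`, `commandPattern`), the `DURABLE_OPTION` wording, and the reason strings are all assumptions made for illustration. The regex safety checks the document mentions are not modelled here.

```javascript
// Hypothetical validator; field names and wording are assumptions.
const DURABLE_OPTION = /always|don't ask again|allow all/i;

function validateRefinementRule(rule, frame) {
  let yesRe;
  let commandRe;
  try {
    // A candidate rule is rejected outright if its regexes do not compile.
    yesRe = new RegExp(rule.yesPattern);
    commandRe = new RegExp(rule.commandPattern);
  } catch (err) {
    return { ok: false, reason: "regex-does-not-compile" };
  }
  // The one-time Yes/Allow option must be matched in the raw frame.
  const yes = frame.match(yesRe);
  if (!yes) return { ok: false, reason: "yes-option-not-matched" };
  // The matched option must not be a durable "always allow" shortcut.
  if (DURABLE_OPTION.test(yes[0])) return { ok: false, reason: "durable-option-selected" };
  // Command/tool context must be extractable (first capture group).
  const command = frame.match(commandRe);
  if (!command || !command[1]) return { ok: false, reason: "command-not-extracted" };
  return { ok: true, command: command[1] };
}
```

Only a rule that returns `ok: true` would be marked `active` and then exercised through the normal rescue path.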
+## Storage Boundaries
+AI-generated parser rules are intentionally not stored in `approval_rules`.
+- `approval_rules`: user/AI learned approval policy rules for known command
+  signatures.
+- `approval_rescue_patterns`: retry/cooldown bookkeeping for rescue attempts.
+- `approval_ai_refinement_rules`: AI-generated parser/detector refinements.
+- `approval_ai_refinement_warnings`: persistent warnings when refinement fails.
+This keeps product-shipped rules, user policy, and AI-generated parsing
+patches auditable and reversible independently.
+## Telemetry
+When a refinement proposal is evaluated, CTM sends a privacy-safe telemetry
+event named `ctm_approval_ai_refinement` through the Wall-E telemetry endpoint.
+The event includes:
+- provider id;
+- source and gate reason;
+- miss type;
+- validation status;
+- confidence;
+- booleans for which rule components were proposed;
+- a sanitized generated rule payload for detector/question/yes/command-pattern
+  structure.
+The event does not include raw terminal text, command text, paths, or secrets.
+Command-shaped anchors are redacted before upload. Product maintainers can
+aggregate these validated rule shapes and use them to improve CTM's shipped
+detectors later.
+## Safety Rules
+Refinement rules are parser rules only:
+- they cannot approve anything by themselves;
+- they cannot select durable allow-all options;
+- active rules are still subject to the dangerous-command blocklist;
+- active rules are still subject to normal approval policy review;
+- keystrokes are still guarded by live terminal preflight and verification;
+- a failed rerun marks the AI rule failed and persists a warning.

package/template/claude-task-manager/docs/approval-rescue-loop.md ADDED

@@ -0,0 +1,74 @@
+# Approval Rescue Loop
+> Defaults (2026): the approver is **allow-by-default** — it auto-approves
+> everything except the dangerous-command blocklist (the denylist, ON by default,
+> editable in the Permission Manager). The LLM verifier is opt-in (off by
+> default) and provider-agnostic via `callBackgroundLlm`; auto-approvals send the
+> one-time "yes" (never durable allow-all). See `approval-ai-refinement.md`.
+CTM has two approval layers:
+1. The deterministic approval pipeline is the source of truth. The headless terminal detects provider-specific approval widgets, structural gates reject stale/non-widget text, and `approval-agent.js` decides whether to approve, escalate, or block.
+2. The rescue loop is a bounded observer for misses. It only runs when the deterministic pipeline sees approval-shaped text but rejects it before policy can act, such as a `gate-miss` from the headless terminal.
+The rescue loop must not replace provider parsers. Its job is to recover one missed prompt, verify whether the intervention worked, and create durable evidence so the product can improve the deterministic path.
+## Flow
+1. Headless terminal emits a normal `approval-alert` when a prompt passes structural validation.
+2. If detection sees approval-shaped text but structural validation rejects it, the worker emits `gate-miss` with a redacted/capped screen tail, provider id when known, and the gate reason.
+3. Server records the observation, then asks `approval-agent` to evaluate a rescue candidate.
+4. The approval agent:
+   - parses the prompt using the provider parser, with a conservative generic numbered-approval fallback for unknown providers;
+   - checks the dangerous-command blocklist and normal heuristics;
+   - asks the background LLM, when available, whether this is an active approval prompt that should be tried once;
+   - sends only the one-time approval shortcut, never durable allow-all, for rescue attempts;
+   - verifies the result by checking output movement and whether the prompt disappeared.
+5. The rescue record is updated as success, failure, skipped stale, suppressed, or promoted.
+## Suppression
+Rescue is intentionally stingy:
+- A fingerprint covers provider, normalized command/context, and the gate reason.
+- The same fingerprint is not retried during a cooldown.
+- Repeated failures suppress future attempts for that fingerprint.
+- User warnings are throttled per fingerprint, so one bad pattern does not flood the UI.
+- A skipped stale prompt is recorded but not warned, because it usually means the user or provider already moved on.
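The suppression rules above amount to a fingerprint plus a cooldown check. The following is a minimal sketch under stated assumptions: the normalization, the `|` separator, the record fields (`consecutiveFailures`, `lastAttemptAt`), and the `opts` thresholds are hypothetical, not taken from the package.

```javascript
// Hypothetical fingerprint: provider + normalized command + gate reason.
function fingerprint(provider, command, gateReason) {
  const normalized = command.trim().replace(/\s+/g, " ").toLowerCase();
  return [provider, normalized, gateReason].join("|");
}

// record: previous bookkeeping row for this fingerprint, or undefined.
// opts: { cooldownMs, maxFailures } (assumed tunables).
function shouldAttemptRescue(record, now, opts) {
  if (!record) return true; // first sighting of this fingerprint
  // Repeated failures suppress future attempts.
  if (record.consecutiveFailures >= opts.maxFailures) return false;
  // The same fingerprint is not retried during its cooldown.
  return now - record.lastAttemptAt >= opts.cooldownMs;
}
```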
+## Rule Promotion
+A successful rescue does not automatically create a broad approval rule.
+After success, the agent classifies why the deterministic path missed it:
+- `structural_gate_miss`, `parser_bug`, or `race`: record the evidence and keep the deterministic code path as the fix target.
+- `new_provider_pattern`: promote the exact rescue fingerprint into a deterministic rescue rule for future matching and telemetry.
+The promotion decision is deterministic. If a known provider parser already detected the prompt and a structural gate rejected it, CTM records the rescue as `structural_gate_miss` even when the LLM labels it `new_provider_pattern`. The LLM can decide whether a one-shot rescue is safe; it cannot promote a known-provider gate bug into a durable rule.
+Promotion is still bounded to the exact fingerprint. Shipping broader provider support requires adding or updating the provider parser, then covering it with regression tests.
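The deterministic override described above, where a known-provider gate rejection wins over the LLM's label, can be sketched as follows. `classifyRescue` and its parameters are hypothetical names chosen for illustration.

```javascript
// Illustrative sketch; function and parameter names are assumptions.
function classifyRescue(llmLabel, knownProviderParserDetected, structuralGateRejected) {
  // A known provider parser that detected the prompt, followed by a gate
  // rejection, is always a structural gate miss, whatever the LLM says.
  const diagnosis =
    knownProviderParserDetected && structuralGateRejected
      ? "structural_gate_miss"
      : llmLabel;
  // Only a genuinely new provider pattern is promoted, and only for the
  // exact rescue fingerprint.
  return { diagnosis, promote: diagnosis === "new_provider_pattern" };
}
```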
+## Telemetry Contract
+The local DB stores `approval_rescue_patterns` rows with:
+- fingerprint, provider, source, and gate reason;
+- attempts, successes, failures, and consecutive failures;
+- last decision, outcome, diagnosis, and cooldown;
+- promoted rule metadata and last warning time.
+This gives operators enough evidence to answer:
+- Did CTM miss an approval?
+- Did the rescue attempt work?
+- Was the miss a new provider pattern or a bug in an existing detector/gate?
+- Is CTM repeatedly trying and failing on the same pattern?
+## Provider Guidance
+New coding agents should add a provider parser under `providers/` first. The rescue loop can help discover a new shape, but a provider is considered supported only after:
+- parser detection and parse tests pass;
+- headless terminal structural validation accepts the live widget;
+- rescue telemetry is no longer needed for the common prompt shape.

package/template/claude-task-manager/docs/codex-operational-warning-health.md ADDED

@@ -0,0 +1,107 @@
+# Codex Operational Warning Health
+CTM treats Codex startup diagnostics as source problems, not rendering noise.
+The raw Terminal view stays raw: CTM does not hide or rewrite provider output.
+Low-value warnings are already collapsed in Review and Conversation renderers,
+but Terminal remains the escape hatch for exact debugging.
+## Warning Classes
+Codex warning floods usually come from three independent launch inputs:
+- **Invalid generated skill metadata.** Codex validates every enabled plugin
+  skill before the session starts. A generated `SKILL.md` with missing
+  frontmatter or a `description` over 1024 characters causes repeated
+  `Skipped loading ... invalid SKILL.md` warnings.
+- **Unhealthy MCP servers.** Enabled MCP servers are initialized by Codex at
+  startup. OAuth-backed remote MCPs such as Slack or enterprise gateway tools
+  can be enabled but expired or not logged in, which produces startup warnings
+  before the agent can do useful work.
+- **Wrong OpenAI transport.** CTM can run with `OPENAI_API_KEY` and
+  `OPENAI_BASE_URL` in its own environment for setup and background LLM jobs.
+  Interactive Codex sessions should use Codex's own auth and a stable HTTPS
+  OpenAI transport unless the operator explicitly overrides the provider.
+Wall-E is special: it is CTM's local memory/context server. CTM never disables
+Wall-E automatically. If Wall-E fails to initialize, the failure remains visible
+and is recorded in session diagnostics.
+## Launch-Time Repair
+Before spawning a Codex session, CTM runs `codex-launch-health`:
+1. Scan the generated Codex plugin cache under `~/.codex/plugins/cache`.
+2. Repair only safe skill metadata issues, which require all of:
+   - parseable YAML frontmatter,
+   - a present `description`,
+   - a description length above the Codex limit.
+3. Preserve the original file as `SKILL.md.ctm-bak` before rewriting.
+4. Leave unrepairable files alone and record a diagnostic.
+This repair is intentionally narrow. CTM does not rewrite user-authored skills
+or infer missing frontmatter.
+Generated plugin cache paths can change when Codex refreshes a plugin bundle.
+The repair scheduler therefore only backs off after a real clean or repaired
+scan. A launch with repair disabled does not count as a completed scan, and a
+repairable write failure uses a short retry interval so the next Codex launch
+can recover after transient filesystem or sync issues clear.
+Set `CTM_CODEX_SKILL_REPAIR=0` to disable this repair path.
+## Codex Auth Guard
+Before spawning Codex, CTM strips inherited `OPENAI_API_KEY` and
+`OPENAI_BASE_URL` from interactive Codex child processes so CTM's provider env
+cannot override the user's normal Codex login/config path.
+CTM previously prepended this provider override by default:
+```text
+-c model_provider="ctm-openai-https"
+-c model_providers.ctm-openai-https.base_url="https://api.openai.com/v1"
+-c model_providers.ctm-openai-https.requires_openai_auth=true
+-c model_providers.ctm-openai-https.supports_websockets=false
+```
+That override forces Codex onto the public OpenAI Responses API endpoint and can
+break ChatGPT/Codex subscription auth with `api.responses.write` scope errors.
+It is now opt-in only for operators who intentionally want API-provider
+transport:
+```text
+CTM_CODEX_HTTP_TRANSPORT=1
+```
+The override is launch-local and does not mutate `~/.codex/config.toml`. Set
+`CTM_KEEP_OPENAI_ENV_FOR_CODEX=1` only when you intentionally want Codex to
+inherit CTM's OpenAI environment.
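The auth guard can be pictured as two pure functions over the parent environment. This sketch reuses the environment variable names and `-c` override strings quoted in the document. The function names `buildCodexEnv` and `codexTransportArgs` are hypothetical, and how the real launcher assembles its arguments is an assumption.

```javascript
// Hypothetical helpers; only the env var names and -c strings come from the doc.
function buildCodexEnv(parentEnv) {
  const env = { ...parentEnv };
  // Strip CTM's OpenAI settings unless the operator opts in to inheritance.
  if (parentEnv.CTM_KEEP_OPENAI_ENV_FOR_CODEX !== "1") {
    delete env.OPENAI_API_KEY;
    delete env.OPENAI_BASE_URL;
  }
  return env;
}

function codexTransportArgs(parentEnv) {
  // The provider override is opt-in and applies to this launch only.
  if (parentEnv.CTM_CODEX_HTTP_TRANSPORT !== "1") return [];
  return [
    "-c", 'model_provider="ctm-openai-https"',
    "-c", 'model_providers.ctm-openai-https.base_url="https://api.openai.com/v1"',
    "-c", "model_providers.ctm-openai-https.requires_openai_auth=true",
    "-c", "model_providers.ctm-openai-https.supports_websockets=false",
  ];
}
```

Both functions leave their input untouched, which mirrors the stated goal that the override stays launch-local.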
+## MCP Failure Feedback Loop
+CTM observes raw Codex output for MCP startup/auth failures and stores a compact
+failure record in the CTM settings table. On later Codex launches, CTM can add
+session-local Codex config overrides for optional remote MCPs that recently
+failed auth:
+```text
+-c mcp_servers.slack.enabled=false
+-c mcp_servers.ask-data-ai.enabled=false
+```
+The override is per launch. It does not mutate `~/.codex/config.toml`, and it
+expires after a short cooldown so a reconnect can take effect without manual DB
+cleanup. Protected/local MCPs, including `wall-e`, are never disabled this way.
+Set `CTM_CODEX_MCP_AUTO_DISABLE=0` to disable launch-local MCP overrides.
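A possible shape for building the launch-local MCP overrides is sketched below. The failure record fields (`server`, `failedAt`), the `opts` object, and the function name `mcpDisableArgs` are assumptions. Only the `-c mcp_servers.<name>.enabled=false` form comes from the document.

```javascript
// Illustrative sketch; record and option shapes are assumptions.
// failures: [{ server, failedAt }] recorded from earlier Codex output.
// opts: { autoDisable, protectedServers, cooldownMs }.
function mcpDisableArgs(failures, now, opts) {
  // CTM_CODEX_MCP_AUTO_DISABLE=0 would map to autoDisable === false.
  if (opts.autoDisable === false) return [];
  const args = [];
  for (const failure of failures) {
    // Protected/local MCPs such as wall-e are never disabled.
    if (opts.protectedServers.includes(failure.server)) continue;
    // Expired records are ignored so a reconnect can take effect.
    if (now - failure.failedAt > opts.cooldownMs) continue;
    args.push("-c", `mcp_servers.${failure.server}.enabled=false`);
  }
  return args;
}
```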
+## UI Contract
+- **Terminal:** exact provider stream; no warning folding or suppression.
+- **Conversation/Review:** operational warnings collapse into a warning group
+  so the real answer remains easy to read.
+- **Diagnostics:** launch repairs and MCP failures are recorded per CTM session
+  for debugging.
+This preserves debuggability while fixing the recurring startup causes that make
+the terminal unusable.

package/template/claude-task-manager/docs/codex-resume-state-guard-design.md CHANGED

@@ -26,7 +26,7 @@ The observed failure mode:
 ## Contract
-Before spawning Codex for a resume or restart restore, CTM must prove this:
+Before spawning Codex for a resume or restart restore, CTM should inspect this:
 ```text
 resume id == expected rollout filename id
@@ -34,17 +34,20 @@ Codex state row for resume id points to expected rollout path
 expected rollout file exists and its session metadata matches the resume id
 ```
-If the contract cannot be proven, CTM must not silently spawn Codex.
+If the contract cannot be proven, CTM records diagnostics and may warn the UI,
+but it must still spawn Codex. `state_5.sqlite` is Codex-owned provider state;
+Codex owns repair prompts, migrations, and final resume behavior.
 ## Behavior
 ### Healthy State DB
 If `state_5.sqlite` is readable, passes an integrity check, and has a
-thread row whose rollout path differs from CTM's mapped JSONL, CTM may repair
-that single `threads.rollout_path` cell before spawning.
+thread row whose rollout path differs from CTM's mapped JSONL, the normal CTM
+resume path treats this as an inspect-only diagnostic. CTM must not mutate
+`state_5.sqlite` before spawning Codex.
-The repair is intentionally narrow:
+Manual repair tooling, if used, must be intentionally narrow:
 - backup `state_5.sqlite` first;
 - update only `threads.rollout_path` for the exact resume id;
@@ -55,15 +58,17 @@ The repair is intentionally narrow:
 ### Corrupt Or Unusable State DB
 If the DB is corrupt, unreadable, lacks the expected row, or cannot be repaired,
-CTM blocks the Codex spawn and records diagnostics. Blocking is safer than
-starting a provider process that will append to an unrelated transcript.
+CTM records diagnostics and still launches `codex resume`. Blocking is not
+allowed because it prevents Codex from showing its own repair prompt and turns
+CTM into the owner of provider state.
-The UI/API should receive a clear error telling the user that Codex resume was
-blocked because provider state does not match CTM's session mapping.
+The UI/API may receive a non-blocking warning telling the user that provider
+state could not be verified and terminal output may temporarily diverge from
+the Conversation tab.
 ### Diagnostics
-Every mismatch, repair, or blocked spawn must be recorded in:
+Every mismatch, repair, or unverifiable state result must be recorded in:
 - the in-memory session diagnostics ring for immediate UI/debug access;
 - the durable CTM DB diagnostics table for post-restart investigation;
@@ -81,5 +86,5 @@ should not blindly move messages between JSONLs.
 - Do not make `session_index` authoritative again. It is legacy residue.
 - Do not assume Codex has a path-based resume flag; current Codex CLI help only
   exposes `codex resume [SESSION_ID]`.
-- Do not auto-repair a corrupt SQLite file in-place. First stop the bleeding by
-  blocking wrong writes, then repair state with a dedicated tool.
+- Do not auto-repair a corrupt SQLite file in-place.
+- Do not block `codex resume`; let Codex own its provider database and repair UI.
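The revised contract, which inspects and records but never blocks, can be sketched as a pure check over facts gathered before spawning. Everything here is hypothetical: the `facts` fields, the problem labels, and the function name are illustrative, and reading `state_5.sqlite` or the rollout file is left out.

```javascript
// Illustrative sketch; all field names and labels are assumptions.
// facts: values collected read-only before launching `codex resume`.
function inspectResume(facts) {
  const problems = [];
  if (facts.resumeId !== facts.rolloutFilenameId) {
    problems.push("filename-id-mismatch");
  }
  if (facts.stateRowPath !== facts.expectedRolloutPath) {
    problems.push("state-row-path-mismatch");
  }
  if (!facts.rolloutExists || facts.rolloutSessionId !== facts.resumeId) {
    problems.push("rollout-metadata-mismatch");
  }
  // spawn is always true: mismatches become diagnostics and an optional
  // non-blocking warning, never a blocked launch.
  return { spawn: true, warn: problems.length > 0, problems };
}
```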