create-walle 0.9.21 → 0.9.23
This diff shows the changes between two publicly released versions of the package, as they appear in their public registry. It is provided for informational purposes only.
- package/README.md +27 -5
- package/package.json +2 -2
- package/template/CLAUDE.md +2 -2
- package/template/LICENSE +1 -1
- package/template/bin/ctm-dev-cleanup.js +24 -3
- package/template/bin/ctm-launch.sh +13 -0
- package/template/bin/dev.sh +156 -18
- package/template/bin/node-bin.sh +84 -0
- package/template/bin/pin-node.sh +51 -0
- package/template/claude-task-manager/api-prompts.js +1203 -182
- package/template/claude-task-manager/api-reviews.js +109 -15
- package/template/claude-task-manager/approval-agent.js +1360 -280
- package/template/claude-task-manager/bin/restart-ctm.sh +64 -23
- package/template/claude-task-manager/bin/storage-migration-supervisor.js +338 -0
- package/template/claude-task-manager/db.js +4417 -295
- package/template/claude-task-manager/docs/app-update-refresh-protocol.md +69 -0
- package/template/claude-task-manager/docs/approval-ai-refinement.md +138 -0
- package/template/claude-task-manager/docs/approval-rescue-loop.md +74 -0
- package/template/claude-task-manager/docs/codex-operational-warning-health.md +107 -0
- package/template/claude-task-manager/docs/codex-resume-state-guard-design.md +17 -12
- package/template/claude-task-manager/docs/codex-terminal-render-controller-handoff.md +311 -0
- package/template/claude-task-manager/docs/coding-agent-hooks-architecture.md +418 -0
- package/template/claude-task-manager/docs/conversation-import-freshness.md +20 -0
- package/template/claude-task-manager/docs/google-workspace-auth-health.md +77 -0
- package/template/claude-task-manager/docs/image-paste-ux.md +13 -0
- package/template/claude-task-manager/docs/ipad-web-preview.md +88 -0
- package/template/claude-task-manager/docs/main-loop-offload-architecture.md +66 -0
- package/template/claude-task-manager/docs/microsoft-dev-tunnel-phone-access-design.md +274 -519
- package/template/claude-task-manager/docs/mobile-live-streaming.md +27 -5
- package/template/claude-task-manager/docs/mobile-remote-submission-lifecycle.md +69 -0
- package/template/claude-task-manager/docs/phone-access-design.md +53 -15
- package/template/claude-task-manager/docs/phone-passkey-identity.md +122 -0
- package/template/claude-task-manager/docs/phone-setup.md +3 -0
- package/template/claude-task-manager/docs/prompt-editing-tree-design.md +25 -1
- package/template/claude-task-manager/docs/remote-desktop-access-design.md +268 -0
- package/template/claude-task-manager/docs/restart-lifecycle-architecture.md +95 -0
- package/template/claude-task-manager/docs/runtime-work-control-plane.md +53 -0
- package/template/claude-task-manager/docs/session-interactive-wait-surfaces.md +38 -0
- package/template/claude-task-manager/docs/session-needs-you-dismissal.md +84 -0
- package/template/claude-task-manager/docs/session-render-state-management-design.md +91 -3
- package/template/claude-task-manager/docs/session-standup-command-center-design.md +25 -1
- package/template/claude-task-manager/docs/session-title-authority.md +32 -0
- package/template/claude-task-manager/docs/session-workspace-binding.md +33 -0
- package/template/claude-task-manager/docs/skill-intent-resolution-design.md +72 -0
- package/template/claude-task-manager/docs/walle-mcp-supervisor-health.md +86 -0
- package/template/claude-task-manager/docs/walle-relay-phone-access-design.md +24 -15
- package/template/claude-task-manager/docs/walle-session-history-hydration.md +114 -0
- package/template/claude-task-manager/docs/walle-session-input-queue.md +104 -0
- package/template/claude-task-manager/docs/walle-session-model-catalog.md +90 -0
- package/template/claude-task-manager/docs/walle-session-model-preferences.md +15 -6
- package/template/claude-task-manager/git-utils.js +897 -27
- package/template/claude-task-manager/lib/agent-capabilities.js +33 -0
- package/template/claude-task-manager/lib/agent-cli-cache.js +37 -7
- package/template/claude-task-manager/lib/agent-hooks-installer.js +26 -2
- package/template/claude-task-manager/lib/agent-presets.js +17 -1
- package/template/claude-task-manager/lib/all-sessions-query.js +108 -0
- package/template/claude-task-manager/lib/approval-ai-refinement.js +488 -0
- package/template/claude-task-manager/lib/approval-self-adapt.js +168 -0
- package/template/claude-task-manager/lib/async-semaphore.js +44 -0
- package/template/claude-task-manager/lib/auth-context.js +5 -0
- package/template/claude-task-manager/lib/auth-rate-limit.js +47 -4
- package/template/claude-task-manager/lib/auth-rules.js +29 -2
- package/template/claude-task-manager/lib/auto-approval-verifier.js +129 -16
- package/template/claude-task-manager/lib/background-llm.js +144 -17
- package/template/claude-task-manager/lib/branch-inventory.js +212 -0
- package/template/claude-task-manager/lib/claude-desktop-sessions.js +15 -3
- package/template/claude-task-manager/lib/coalesce-sync-frames.js +151 -0
- package/template/claude-task-manager/lib/codex-launch-health.js +762 -0
- package/template/claude-task-manager/lib/codex-transcript-pager.js +51 -0
- package/template/claude-task-manager/lib/codex-zst.js +124 -0
- package/template/claude-task-manager/lib/coding-agent-models.js +233 -30
- package/template/claude-task-manager/lib/connection-health.js +232 -0
- package/template/claude-task-manager/lib/conversation-blob-parser.js +42 -0
- package/template/claude-task-manager/lib/conversation-tail-merge.js +89 -26
- package/template/claude-task-manager/lib/ctm-session-context-api.js +39 -10
- package/template/claude-task-manager/lib/cursor-conversation-store.js +354 -0
- package/template/claude-task-manager/lib/db-owner-worker-client.js +315 -0
- package/template/claude-task-manager/lib/document-review.js +141 -6
- package/template/claude-task-manager/lib/escalation-review.js +152 -0
- package/template/claude-task-manager/lib/graceful-shutdown.js +159 -0
- package/template/claude-task-manager/lib/headless-term-service.js +678 -0
- package/template/claude-task-manager/lib/heavy-worker-fallback.js +38 -0
- package/template/claude-task-manager/lib/jsonl-conversation-parser.js +542 -0
- package/template/claude-task-manager/lib/jsonl-range-reader.js +112 -0
- package/template/claude-task-manager/lib/main-db-census.js +216 -0
- package/template/claude-task-manager/lib/message-pagination.js +106 -4
- package/template/claude-task-manager/lib/microsoft-dev-tunnel-setup.js +750 -26
- package/template/claude-task-manager/lib/mobile-auth-api.js +274 -7
- package/template/claude-task-manager/lib/mobile-auth-store.js +592 -10
- package/template/claude-task-manager/lib/mobile-notification-dispatcher.js +15 -0
- package/template/claude-task-manager/lib/model-overview-brain-fallback.js +311 -0
- package/template/claude-task-manager/lib/model-overview-cache.js +141 -0
- package/template/claude-task-manager/lib/models-health-routing-notice.js +126 -0
- package/template/claude-task-manager/lib/node-pin-guard.js +93 -0
- package/template/claude-task-manager/lib/perf-tracker.js +242 -6
- package/template/claude-task-manager/lib/permission-match.js +76 -0
- package/template/claude-task-manager/lib/permission-sync.js +133 -20
- package/template/claude-task-manager/lib/process-title.js +35 -0
- package/template/claude-task-manager/lib/prompt-executions-query.js +25 -0
- package/template/claude-task-manager/lib/prompt-index-disk-cache.js +44 -0
- package/template/claude-task-manager/lib/prompt-intent.js +132 -0
- package/template/claude-task-manager/lib/provider-user-context.js +34 -0
- package/template/claude-task-manager/lib/read-pool-client.js +313 -0
- package/template/claude-task-manager/lib/readpool-breaker.js +31 -0
- package/template/claude-task-manager/lib/recent-sessions-breaker.js +12 -0
- package/template/claude-task-manager/lib/remote-feedback-client.js +72 -0
- package/template/claude-task-manager/lib/remote-relay-protocol.js +37 -4
- package/template/claude-task-manager/lib/remote-relay-store.js +159 -0
- package/template/claude-task-manager/lib/remote-submission-observer.js +278 -0
- package/template/claude-task-manager/lib/restart-guard.js +109 -0
- package/template/claude-task-manager/lib/restore-interruption-detector.js +439 -0
- package/template/claude-task-manager/lib/restore-policy.js +13 -0
- package/template/claude-task-manager/lib/restore-resume-batch.js +74 -0
- package/template/claude-task-manager/lib/restore-runtime.js +68 -0
- package/template/claude-task-manager/lib/restore-storm.js +34 -0
- package/template/claude-task-manager/lib/resume-cwd.js +36 -0
- package/template/claude-task-manager/lib/resume-preflight.js +313 -0
- package/template/claude-task-manager/lib/runtime-work-registry.js +444 -0
- package/template/claude-task-manager/lib/sanitize-openai-auth.js +31 -0
- package/template/claude-task-manager/lib/scheduler.js +21 -1
- package/template/claude-task-manager/lib/scrollback-snapshot-store.js +159 -0
- package/template/claude-task-manager/lib/serial-task-queue.js +64 -0
- package/template/claude-task-manager/lib/server-listeners.js +239 -0
- package/template/claude-task-manager/lib/session-capture.js +42 -7
- package/template/claude-task-manager/lib/session-content-backfill.js +131 -0
- package/template/claude-task-manager/lib/session-history.js +388 -43
- package/template/claude-task-manager/lib/session-host-manager.js +287 -0
- package/template/claude-task-manager/lib/session-image-refs.js +209 -0
- package/template/claude-task-manager/lib/session-jobs.js +399 -59
- package/template/claude-task-manager/lib/session-prompt-index.js +137 -0
- package/template/claude-task-manager/lib/session-restore.js +53 -0
- package/template/claude-task-manager/lib/session-standup.js +123 -23
- package/template/claude-task-manager/lib/session-state-bus.js +14 -0
- package/template/claude-task-manager/lib/session-stream.js +64 -16
- package/template/claude-task-manager/lib/session-timeline-summary.js +260 -0
- package/template/claude-task-manager/lib/session-token-usage.js +494 -0
- package/template/claude-task-manager/lib/session-workspace-binding.js +356 -0
- package/template/claude-task-manager/lib/setup-network-config.js +9 -0
- package/template/claude-task-manager/lib/size-cap.js +45 -0
- package/template/claude-task-manager/lib/size-cap.test.js +62 -0
- package/template/claude-task-manager/lib/skill-autocomplete.js +180 -1
- package/template/claude-task-manager/lib/skill-intent-resolver.js +304 -0
- package/template/claude-task-manager/lib/sqlite-driver.js +19 -3
- package/template/claude-task-manager/lib/standup-attention.js +7 -3
- package/template/claude-task-manager/lib/status-authority.js +39 -0
- package/template/claude-task-manager/lib/status-hooks.js +4 -0
- package/template/claude-task-manager/lib/storage-migration.js +235 -0
- package/template/claude-task-manager/lib/structured-capture.js +298 -0
- package/template/claude-task-manager/lib/sync-io-census.js +163 -0
- package/template/claude-task-manager/lib/tailscale-setup.js +6 -0
- package/template/claude-task-manager/lib/terminal-activity-evidence.js +33 -0
- package/template/claude-task-manager/lib/terminal-choice.js +364 -0
- package/template/claude-task-manager/lib/terminal-control-sanitize.js +17 -0
- package/template/claude-task-manager/lib/terminal-fingerprint.js +48 -0
- package/template/claude-task-manager/lib/terminal-output-flush.js +84 -0
- package/template/claude-task-manager/lib/timeline-order.js +122 -0
- package/template/claude-task-manager/lib/transcript-store.js +348 -43
- package/template/claude-task-manager/lib/transport-security.js +84 -1
- package/template/claude-task-manager/lib/wait-state.js +184 -0
- package/template/claude-task-manager/lib/walle-client.js +47 -5
- package/template/claude-task-manager/lib/walle-ctm-history.js +564 -4
- package/template/claude-task-manager/lib/walle-external-actions.js +135 -16
- package/template/claude-task-manager/lib/walle-history-hydration.js +46 -0
- package/template/claude-task-manager/lib/walle-native-health.js +403 -0
- package/template/claude-task-manager/lib/walle-repair.js +701 -0
- package/template/claude-task-manager/lib/walle-session-cache.js +109 -0
- package/template/claude-task-manager/lib/walle-session-context.js +57 -21
- package/template/claude-task-manager/lib/walle-session-model-catalog.js +34 -0
- package/template/claude-task-manager/lib/walle-supervisor.js +539 -63
- package/template/claude-task-manager/lib/walle-transcript.js +52 -0
- package/template/claude-task-manager/lib/worktree-active-sync.js +11 -7
- package/template/claude-task-manager/lib/worktree-cwd.js +32 -1
- package/template/claude-task-manager/package.json +1 -1
- package/template/claude-task-manager/prompt-harvest.js +89 -66
- package/template/claude-task-manager/providers/claude-code.js +51 -3
- package/template/claude-task-manager/providers/cursor.js +140 -45
- package/template/claude-task-manager/public/css/reviews.css +551 -61
- package/template/claude-task-manager/public/css/setup.css +191 -0
- package/template/claude-task-manager/public/css/walle-session.css +865 -10
- package/template/claude-task-manager/public/css/walle.css +154 -0
- package/template/claude-task-manager/public/designs/ai-providers-consolidation-v2.html +830 -0
- package/template/claude-task-manager/public/index.html +18516 -2058
- package/template/claude-task-manager/public/ipad.html +363 -0
- package/template/claude-task-manager/public/js/document-review-links.js +301 -0
- package/template/claude-task-manager/public/js/image-normalize.js +69 -36
- package/template/claude-task-manager/public/js/message-renderer.js +1265 -77
- package/template/claude-task-manager/public/js/prompts.js +66 -29
- package/template/claude-task-manager/public/js/reviews.js +901 -133
- package/template/claude-task-manager/public/js/session-activity-utils.js +11 -1
- package/template/claude-task-manager/public/js/session-search-utils.js +94 -10
- package/template/claude-task-manager/public/js/session-status-precedence.js +23 -5
- package/template/claude-task-manager/public/js/setup.js +1273 -176
- package/template/claude-task-manager/public/js/stream-view.js +691 -73
- package/template/claude-task-manager/public/js/terminal-reconciler.js +210 -0
- package/template/claude-task-manager/public/js/walle-session.js +2455 -158
- package/template/claude-task-manager/public/js/walle.js +455 -28
- package/template/claude-task-manager/public/m/app.css +2909 -262
- package/template/claude-task-manager/public/m/app.js +6601 -398
- package/template/claude-task-manager/public/m/claim.html +224 -17
- package/template/claude-task-manager/public/m/index.html +117 -21
- package/template/claude-task-manager/public/m/sw.js +3 -1
- package/template/claude-task-manager/public/manifest.json +2 -2
- package/template/claude-task-manager/public/prompts.html +30 -14
- package/template/claude-task-manager/queue-engine.js +507 -28
- package/template/claude-task-manager/scripts/repair-claude-session-images.js +27 -8
- package/template/claude-task-manager/server.js +14341 -2197
- package/template/claude-task-manager/session-integrity.js +160 -18
- package/template/claude-task-manager/session-search-ranking.js +1 -0
- package/template/claude-task-manager/session-utils.js +25 -5
- package/template/claude-task-manager/workers/approval-blocklist.js +96 -6
- package/template/claude-task-manager/workers/approval-widget-validator.js +14 -8
- package/template/claude-task-manager/workers/conversation-import-worker.js +11 -50
- package/template/claude-task-manager/workers/db-owner-worker.js +386 -0
- package/template/claude-task-manager/workers/harvest-worker.js +9 -55
- package/template/claude-task-manager/workers/headless-term-worker.js +9 -530
- package/template/claude-task-manager/workers/read-pool-worker.js +387 -0
- package/template/claude-task-manager/workers/scrollback-worker.js +11 -72
- package/template/claude-task-manager/workers/session-host-process.js +146 -0
- package/template/claude-task-manager/workers/session-integrity-worker.js +10 -54
- package/template/claude-task-manager/workers/state-detectors/base.js +18 -1
- package/template/claude-task-manager/workers/state-detectors/claude-code.js +182 -9
- package/template/claude-task-manager/workers/state-detectors/codex.js +150 -2
- package/template/claude-task-manager/workers/state-detectors/cursor.js +127 -0
- package/template/claude-task-manager/workers/state-detectors/gemini.js +21 -0
- package/template/claude-task-manager/workers/state-detectors/index.js +29 -0
- package/template/claude-task-manager/workers/state-detectors/opencode.js +103 -0
- package/template/docs/design/markdown-review-pane.md +206 -0
- package/template/docs/designs/2026-05-17-portkey-gateway-provider-ux.md +129 -38
- package/template/docs/designs/2026-05-20-mobile-worktree-finish-command.md +27 -0
- package/template/docs/designs/2026-05-22-ai-configuration-consolidation.md +248 -0
- package/template/docs/designs/ai-configuration-consolidation-mock.html +812 -0
- package/template/docs/private-memory-and-pii-policy.md +69 -0
- package/template/package.json +2 -1
- package/template/scripts/check-private-data.js +201 -0
- package/template/shared/sqlite-owner-guard.js +30 -0
- package/template/shared/sqlite-owner-write-queue.js +225 -0
- package/template/shared/sqlite-storage-policy.js +111 -0
- package/template/shared/sqlite-write-lock.js +428 -0
- package/template/wall-e/agent-runners/claude-code.js +5 -0
- package/template/wall-e/agent.js +166 -22
- package/template/wall-e/api-walle.js +524 -70
- package/template/wall-e/auth/provider-flows.js +11 -1
- package/template/wall-e/bin/walle-mcp-stdio.js +341 -17
- package/template/wall-e/brain.js +1614 -141
- package/template/wall-e/chat/attachment-blocks.js +96 -0
- package/template/wall-e/chat/attachments.js +2 -1
- package/template/wall-e/chat/capability-resolver.js +7 -7
- package/template/wall-e/chat/context-messages.js +28 -0
- package/template/wall-e/chat/conversation-frame.js +630 -0
- package/template/wall-e/chat/provider-messages.js +125 -0
- package/template/wall-e/chat.js +1002 -233
- package/template/wall-e/coding/acceptance-contract.js +170 -0
- package/template/wall-e/coding/acp-adapter.js +1 -1
- package/template/wall-e/coding/agent-catalog.js +3 -0
- package/template/wall-e/coding/artifact-store.js +93 -0
- package/template/wall-e/coding/capability-router.js +120 -0
- package/template/wall-e/coding/coding-run-controller.js +423 -0
- package/template/wall-e/coding/compaction-service.js +157 -12
- package/template/wall-e/coding/frontend-verification.js +258 -0
- package/template/wall-e/coding/lifecycle-hooks.js +75 -0
- package/template/wall-e/coding/local-preview-contract.js +157 -0
- package/template/wall-e/coding/permission-service.js +57 -13
- package/template/wall-e/coding/prompt-bundle.js +19 -1
- package/template/wall-e/coding/prompt-section-registry.js +227 -0
- package/template/wall-e/coding/provider-compat.js +15 -0
- package/template/wall-e/coding/runtime-events.js +224 -0
- package/template/wall-e/coding/runtime-mode.js +3 -0
- package/template/wall-e/coding/side-git-snapshot.js +160 -4
- package/template/wall-e/coding/snapshot-service.js +143 -1
- package/template/wall-e/coding/stream-processor.js +388 -34
- package/template/wall-e/coding/task-tool.js +141 -4
- package/template/wall-e/coding/tool-execution-controller.js +365 -0
- package/template/wall-e/coding/tool-registry.js +43 -5
- package/template/wall-e/coding/user-hooks.js +217 -0
- package/template/wall-e/coding-orchestrator.js +1330 -221
- package/template/wall-e/coding-prompts.js +20 -4
- package/template/wall-e/context/context-builder.js +15 -2
- package/template/wall-e/decision/confidence.js +1 -1
- package/template/wall-e/docs/coding-acceptance-contract.md +41 -0
- package/template/wall-e/docs/external-action-controller.md +26 -6
- package/template/wall-e/docs/telemetry-lifecycle.md +8 -2
- package/template/wall-e/embeddings.js +591 -53
- package/template/wall-e/external-action-controller.js +12 -0
- package/template/wall-e/http/auth.js +1 -0
- package/template/wall-e/http/chat-api.js +46 -11
- package/template/wall-e/http/model-admin.js +836 -34
- package/template/wall-e/lib/boot-profile.js +88 -0
- package/template/wall-e/lib/event-loop-monitor.js +93 -0
- package/template/wall-e/lib/service-health.js +194 -0
- package/template/wall-e/llm/anthropic.js +130 -5
- package/template/wall-e/llm/client.js +266 -63
- package/template/wall-e/llm/default-fallback.js +382 -0
- package/template/wall-e/llm/health.js +19 -0
- package/template/wall-e/llm/message-guard.js +78 -0
- package/template/wall-e/llm/model-catalog.js +252 -1
- package/template/wall-e/llm/openai.js +26 -4
- package/template/wall-e/llm/portkey-sync.js +654 -0
- package/template/wall-e/llm/provider-error.js +30 -2
- package/template/wall-e/llm/registry.js +5 -1
- package/template/wall-e/llm/request-compat.js +67 -0
- package/template/wall-e/loops/backfill.js +79 -23
- package/template/wall-e/loops/brain-optimize.js +67 -0
- package/template/wall-e/loops/ingest.js +25 -10
- package/template/wall-e/loops/question-digest.js +160 -0
- package/template/wall-e/loops/reflect.js +6 -4
- package/template/wall-e/loops/think.js +39 -12
- package/template/wall-e/mcp-server.js +318 -36
- package/template/wall-e/memory/ctm-context-client.js +52 -14
- package/template/wall-e/memory/ctm-operational-context.js +237 -0
- package/template/wall-e/memory/ctm-prompt-executions-client.js +128 -0
- package/template/wall-e/memory/ctm-session-context.js +111 -63
- package/template/wall-e/prompts/coding/deepseek.txt +3 -0
- package/template/wall-e/prompts/coding/gemini.txt +6 -0
- package/template/wall-e/prompts/coding/gpt.txt +6 -0
- package/template/wall-e/prompts/coding/local.txt +7 -0
- package/template/wall-e/runtime/decision-hooks.js +115 -0
- package/template/wall-e/runtime/devbox-gateway.js +82 -8
- package/template/wall-e/runtime/prompt-manifest.js +86 -0
- package/template/wall-e/runtime/tool-executor.js +269 -0
- package/template/wall-e/runtime/tool-result-envelope.js +138 -0
- package/template/wall-e/runtime/transcript-projection.js +60 -0
- package/template/wall-e/runtime/walle-runtime.js +224 -0
- package/template/wall-e/scripts/db-optimize/migrate.js +162 -0
- package/template/wall-e/scripts/db-optimize/recall-eval.js +117 -0
- package/template/wall-e/server.js +15 -0
- package/template/wall-e/session-files.js +9 -0
- package/template/wall-e/skills/_bundled/google-calendar/run.js +1 -1
- package/template/wall-e/skills/_bundled/gws-workspace/run.js +1 -1
- package/template/wall-e/skills/_bundled/slack-mentions/run.js +76 -6
- package/template/wall-e/skills/claude-code-reader.js +7 -3
- package/template/wall-e/skills/script-skill-runner.js +10 -0
- package/template/wall-e/skills/skill-planner.js +38 -0
- package/template/wall-e/tools/builtin-middleware.js +19 -9
- package/template/wall-e/tools/local-tools.js +1428 -16
- package/template/wall-e/tools/permission-checker.js +73 -5
- package/template/wall-e/tools/question-manager.js +117 -7
- package/template/wall-e/training/harvester.js +12 -28
- package/template/wall-e/training/replay.js +25 -80
- package/template/website/index.html +10 -10
- package/template/wall-e/eval/ab-test.js +0 -203
- package/template/wall-e/eval/agent-runner.js +0 -772
- package/template/wall-e/eval/agent-scorer.js +0 -461
- package/template/wall-e/eval/aggregator.js +0 -414
- package/template/wall-e/eval/allowed-test-commands.js +0 -34
- package/template/wall-e/eval/benchmark-generator.js +0 -113
- package/template/wall-e/eval/benchmarks/chat-eval.json +0 -1662
- package/template/wall-e/eval/benchmarks/chat.json +0 -82
- package/template/wall-e/eval/benchmarks/coding-agent-real.json +0 -1
- package/template/wall-e/eval/benchmarks/coding-agent.json +0 -1581
- package/template/wall-e/eval/benchmarks/coding.json +0 -122
- package/template/wall-e/eval/benchmarks/memory-retrieval.json +0 -234
- package/template/wall-e/eval/benchmarks/reasoning.json +0 -82
- package/template/wall-e/eval/benchmarks/swebench-lite-30.json +0 -212
- package/template/wall-e/eval/benchmarks.js +0 -669
- package/template/wall-e/eval/cc-replay.js +0 -719
- package/template/wall-e/eval/chat-eval.js +0 -525
- package/template/wall-e/eval/check-keys.js +0 -15
- package/template/wall-e/eval/check-providers.js +0 -42
- package/template/wall-e/eval/codex-cli-baseline.js +0 -669
- package/template/wall-e/eval/coding-agent-real.js +0 -570
- package/template/wall-e/eval/context-compactor.js +0 -251
- package/template/wall-e/eval/debug-agent003.js +0 -68
- package/template/wall-e/eval/diagnostics.js +0 -216
- package/template/wall-e/eval/eval-orchestrator.js +0 -642
- package/template/wall-e/eval/evaluate.js +0 -202
- package/template/wall-e/eval/evaluator.js +0 -373
- package/template/wall-e/eval/exporter.js +0 -212
- package/template/wall-e/eval/fixtures/express-basic/package.json +0 -9
- package/template/wall-e/eval/fixtures/express-basic/server.js +0 -115
- package/template/wall-e/eval/fixtures/express-basic/test.js +0 -83
- package/template/wall-e/eval/fixtures/express-buggy/package.json +0 -9
- package/template/wall-e/eval/fixtures/express-buggy/server.js +0 -113
- package/template/wall-e/eval/fixtures/express-buggy/test.js +0 -83
- package/template/wall-e/eval/fixtures/express-buggy-items/package.json +0 -9
- package/template/wall-e/eval/fixtures/express-buggy-items/server.js +0 -112
- package/template/wall-e/eval/fixtures/express-buggy-items/test.js +0 -83
- package/template/wall-e/eval/fixtures/express-buggy-search/package.json +0 -9
- package/template/wall-e/eval/fixtures/express-buggy-search/server.js +0 -121
- package/template/wall-e/eval/fixtures/express-buggy-search/test.js +0 -83
- package/template/wall-e/eval/fixtures/express-rename-data/data.js +0 -34
- package/template/wall-e/eval/fixtures/express-rename-data/package.json +0 -9
- package/template/wall-e/eval/fixtures/express-rename-data/server.js +0 -97
- package/template/wall-e/eval/fixtures/express-rename-data/test.js +0 -88
- package/template/wall-e/eval/fixtures/express-xss/package.json +0 -12
- package/template/wall-e/eval/fixtures/express-xss/server.js +0 -90
- package/template/wall-e/eval/fixtures/express-xss/test.js +0 -67
- package/template/wall-e/eval/fixtures/express-xss/views/profile.ejs +0 -9
- package/template/wall-e/eval/fixtures/fullstack-app/config/default.js +0 -9
- package/template/wall-e/eval/fixtures/fullstack-app/config/test.js +0 -13
- package/template/wall-e/eval/fixtures/fullstack-app/package.json +0 -11
- package/template/wall-e/eval/fixtures/fullstack-app/public/css/style.css +0 -137
- package/template/wall-e/eval/fixtures/fullstack-app/public/index.html +0 -46
- package/template/wall-e/eval/fixtures/fullstack-app/public/js/app.js +0 -121
- package/template/wall-e/eval/fixtures/fullstack-app/public/js/auth.js +0 -71
- package/template/wall-e/eval/fixtures/fullstack-app/public/js/items.js +0 -80
- package/template/wall-e/eval/fixtures/fullstack-app/public/js/users.js +0 -46
- package/template/wall-e/eval/fixtures/fullstack-app/public/login.html +0 -45
- package/template/wall-e/eval/fixtures/fullstack-app/public/register.html +0 -38
- package/template/wall-e/eval/fixtures/fullstack-app/scripts/migrate.js +0 -23
- package/template/wall-e/eval/fixtures/fullstack-app/scripts/seed.js +0 -46
- package/template/wall-e/eval/fixtures/fullstack-app/server/db.js +0 -99
- package/template/wall-e/eval/fixtures/fullstack-app/server/index.js +0 -94
- package/template/wall-e/eval/fixtures/fullstack-app/server/middleware/auth.js +0 -19
- package/template/wall-e/eval/fixtures/fullstack-app/server/middleware/logger.js +0 -19
- package/template/wall-e/eval/fixtures/fullstack-app/server/router.js +0 -50
- package/template/wall-e/eval/fixtures/fullstack-app/server/routes/auth.js +0 -69
- package/template/wall-e/eval/fixtures/fullstack-app/server/routes/health.js +0 -23
- package/template/wall-e/eval/fixtures/fullstack-app/server/routes/items.js +0 -88
- package/template/wall-e/eval/fixtures/fullstack-app/server/routes/users.js +0 -75
- package/template/wall-e/eval/fixtures/fullstack-app/server/test.js +0 -198
- package/template/wall-e/eval/fixtures/fullstack-app/server/utils/response.js +0 -34
- package/template/wall-e/eval/fixtures/fullstack-app/server/utils/validate.js +0 -26
- package/template/wall-e/eval/fixtures/fullstack-app/server.js +0 -8
- package/template/wall-e/eval/fixtures/fullstack-app/test.js +0 -12
- package/template/wall-e/eval/fixtures/monorepo-basic/package.json +0 -8
- package/template/wall-e/eval/fixtures/monorepo-basic/packages/api/data.js +0 -58
- package/template/wall-e/eval/fixtures/monorepo-basic/packages/api/middleware.js +0 -46
- package/template/wall-e/eval/fixtures/monorepo-basic/packages/api/package.json +0 -8
- package/template/wall-e/eval/fixtures/monorepo-basic/packages/api/routes.js +0 -64
- package/template/wall-e/eval/fixtures/monorepo-basic/packages/api/server.js +0 -56
- package/template/wall-e/eval/fixtures/monorepo-basic/packages/api/test.js +0 -116
- package/template/wall-e/eval/fixtures/monorepo-basic/packages/cli/commands.js +0 -61
- package/template/wall-e/eval/fixtures/monorepo-basic/packages/cli/index.js +0 -62
- package/template/wall-e/eval/fixtures/monorepo-basic/packages/cli/output.js +0 -43
- package/template/wall-e/eval/fixtures/monorepo-basic/packages/cli/package.json +0 -11
- package/template/wall-e/eval/fixtures/monorepo-basic/packages/cli/test.js +0 -44
- package/template/wall-e/eval/fixtures/monorepo-basic/packages/shared/formatters.js +0 -43
- package/template/wall-e/eval/fixtures/monorepo-basic/packages/shared/index.js +0 -12
- package/template/wall-e/eval/fixtures/monorepo-basic/packages/shared/package.json +0 -5
- package/template/wall-e/eval/fixtures/monorepo-basic/packages/shared/test.js +0 -55
- package/template/wall-e/eval/fixtures/monorepo-basic/packages/shared/validators.js +0 -29
- package/template/wall-e/eval/fixtures/monorepo-basic/test.js +0 -46
- package/template/wall-e/eval/fixtures/node-cli/index.js +0 -78
- package/template/wall-e/eval/fixtures/node-cli/package.json +0 -10
- package/template/wall-e/eval/fixtures/node-cli/test.js +0 -57
- package/template/wall-e/eval/fixtures/node-typed/package.json +0 -8
- package/template/wall-e/eval/fixtures/node-typed/src/handlers.js +0 -31
- package/template/wall-e/eval/fixtures/node-typed/src/utils.js +0 -33
- package/template/wall-e/eval/fixtures/node-typed/test.js +0 -36
- package/template/wall-e/eval/fixtures/python-flask/app.py +0 -14
- package/template/wall-e/eval/fixtures/python-flask/requirements.txt +0 -2
- package/template/wall-e/eval/fixtures/python-flask/test_app.py +0 -25
- package/template/wall-e/eval/fixtures/wall-e-subset/brain.js +0 -105
- package/template/wall-e/eval/fixtures/wall-e-subset/eval/aggregator.js +0 -101
- package/template/wall-e/eval/fixtures/wall-e-subset/eval/benchmarks/chat.json +0 -20
- package/template/wall-e/eval/fixtures/wall-e-subset/eval/benchmarks/coding.json +0 -32
- package/template/wall-e/eval/fixtures/wall-e-subset/eval/benchmarks.js +0 -64
- package/template/wall-e/eval/fixtures/wall-e-subset/eval/fixtures/simple-project/package.json +0 -6
- package/template/wall-e/eval/fixtures/wall-e-subset/eval/fixtures/simple-project/server.js +0 -31
- package/template/wall-e/eval/fixtures/wall-e-subset/eval/fixtures/simple-project/test.js +0 -18
- package/template/wall-e/eval/fixtures/wall-e-subset/eval/fixtures/simple-project/utils.js +0 -34
- package/template/wall-e/eval/fixtures/wall-e-subset/eval/runner.js +0 -104
- package/template/wall-e/eval/fixtures/wall-e-subset/eval/scorer.js +0 -73
- package/template/wall-e/eval/fixtures/wall-e-subset/eval/test.js +0 -134
- package/template/wall-e/eval/fixtures/wall-e-subset/llm/client.js +0 -99
- package/template/wall-e/eval/fixtures/wall-e-subset/llm/providers.js +0 -63
- package/template/wall-e/eval/fixtures/wall-e-subset/llm/test.js +0 -70
- package/template/wall-e/eval/fixtures/wall-e-subset/package.json +0 -10
- package/template/wall-e/eval/fixtures/wall-e-subset/test.js +0 -86
- package/template/wall-e/eval/harvester.js +0 -685
- package/template/wall-e/eval/head-to-head.js +0 -388
- package/template/wall-e/eval/humaneval-adapter.js +0 -321
- package/template/wall-e/eval/list-models.js +0 -31
- package/template/wall-e/eval/livecodebench-adapter.js +0 -291
- package/template/wall-e/eval/mail-integration.js +0 -443
- package/template/wall-e/eval/manifest.js +0 -186
- package/template/wall-e/eval/meta-harness/adapters/coding-agent.js +0 -57
- package/template/wall-e/eval/meta-harness/bootstrap-snapshot.js +0 -149
- package/template/wall-e/eval/meta-harness/candidate-store.js +0 -117
- package/template/wall-e/eval/meta-harness/cli.js +0 -86
- package/template/wall-e/eval/meta-harness/domain-spec.js +0 -154
- package/template/wall-e/eval/meta-harness/domains/coding-agent.domain.json +0 -84
- package/template/wall-e/eval/meta-harness/examples/env-bootstrap-candidate.js +0 -29
- package/template/wall-e/eval/meta-harness/experience-store.js +0 -174
- package/template/wall-e/eval/meta-harness/frontier.js +0 -96
- package/template/wall-e/eval/meta-harness/harness-interface.js +0 -90
- package/template/wall-e/eval/meta-harness/leakage-guard.js +0 -80
- package/template/wall-e/eval/meta-harness/optimizer.js +0 -207
- package/template/wall-e/eval/meta-harness/proposer-runner.js +0 -110
- package/template/wall-e/eval/meta-harness/reporting.js +0 -58
- package/template/wall-e/eval/meta-harness/telemetry.js +0 -27
- package/template/wall-e/eval/meta-harness/validation.js +0 -81
- package/template/wall-e/eval/promoter.js +0 -228
- package/template/wall-e/eval/provider-normalizer.js +0 -33
- package/template/wall-e/eval/replay.js +0 -395
- package/template/wall-e/eval/run-agent-benchmarks.js +0 -386
- package/template/wall-e/eval/run-codex-cli-baseline.js +0 -177
- package/template/wall-e/eval/run-coding-agent-real.js +0 -187
- package/template/wall-e/eval/run-eval.js +0 -435
- package/template/wall-e/eval/run-model-comparison.js +0 -142
- package/template/wall-e/eval/session-evaluator.js +0 -187
- package/template/wall-e/eval/session-miner.js +0 -207
- package/template/wall-e/eval/session-retrieval-benchmark.js +0 -150
- package/template/wall-e/eval/session-transcripts.js +0 -509
- package/template/wall-e/eval/shadow.js +0 -161
- package/template/wall-e/eval/swebench-adapter.js +0 -345
- package/template/wall-e/eval/swebench-docker.js +0 -192
- package/template/wall-e/eval/train.py +0 -320
- package/template/wall-e/eval/trainer.js +0 -232
- package/template/wall-e/eval/weekly-eval-loop.js +0 -241
|
@@ -9,6 +9,11 @@ const {
|
|
|
9
9
|
} = require('./runtime/prompt-envelope');
|
|
10
10
|
const { runtimeModeInstructions } = require('./coding/runtime-mode');
|
|
11
11
|
const { buildCapabilityContext } = require('./coding/prompt-capabilities');
|
|
12
|
+
const {
|
|
13
|
+
buildArtifactCapabilityContext,
|
|
14
|
+
hasCapability,
|
|
15
|
+
routeArtifactCapabilities,
|
|
16
|
+
} = require('./coding/capability-router');
|
|
12
17
|
const { buildResponseLanguagePolicy } = require('./context/response-language');
|
|
13
18
|
|
|
14
19
|
/**
|
|
@@ -124,7 +129,7 @@ These are calibration prompts, not absolute rules. The user's explicit choices (
|
|
|
124
129
|
- IMAGE-LED for visual portfolios — when the subject's work IS images, one generous image beats four small grid cards; let the strongest piece bleed full-width; the work is the hero, not the bio. (Inverse: for text-led genres like docs or essays, typography is the hero.)
|
|
125
130
|
- EARN every UI component — no stat cards, timelines, pull-quotes, or testimonial walls by reflex. Each should pass: would this section exist in a thoughtfully-designed analog version (printed monograph, gallery placard, zine, brochure, dashboard)?
|
|
126
131
|
- REFERENCE 2-3 real sites by name before designing; borrow with intent. "I'm drawing on [X]'s editorial scale and [Y]'s color restraint" — then build. Pick references in the right genre.
|
|
127
|
-
- SCREENSHOT what you built — call
|
|
132
|
+
- SCREENSHOT + SMOKE what you built — call \`browser_screenshot\` with desktop and mobile viewports, then call \`browser_smoke_test\` against the same file:// or verified localhost URL. Screenshots prove appearance; browser_smoke_test proves the page has no runtime JS errors or broken click handlers. If browser verification returns ok:false, fix the issue or state the exact blocker — don't pretend you verified.
|
|
128
133
|
- VOICE in copy: write specifically for this subject, not in AI-generic prose. If you lack voice material, leave a clear placeholder ("WRITE A REAL BIO HERE") instead of inventing plausible-sounding filler.
|
|
129
134
|
|
|
130
135
|
If the user-provided brief explicitly chose a style on this list (e.g., "use Cormorant + Inter for an editorial book site"), follow the brief — the discipline above does not override explicit instructions. Note your choice and the user's source in code comments so the next reader sees the intent.
|
|
@@ -195,13 +200,21 @@ function buildAgentSystemPrompt({ resolvedCwd, projectInfo, projectSkills, taskF
|
|
|
195
200
|
const runtimeCtx = runtimeModeInstructions(runtimeMode);
|
|
196
201
|
const memoryToolsAvailable = runtimeContext.memoryToolsAvailable !== false;
|
|
197
202
|
const memoryProtocolCtx = loadMemoryProtocolBlock({ available: memoryToolsAvailable });
|
|
198
|
-
const
|
|
203
|
+
const artifactCapabilities = runtimeContext.artifactCapabilities
|
|
204
|
+
|| routeArtifactCapabilities({
|
|
205
|
+
prompt: runtimeContext.userTask,
|
|
206
|
+
taskFileHints,
|
|
207
|
+
projectInfo,
|
|
208
|
+
});
|
|
209
|
+
const frontendDesignCtx = (hasCapability(artifactCapabilities, 'frontend_design') || isFrontendTask(taskFileHints, runtimeContext.userTask))
|
|
199
210
|
? loadFrontendDesignBlock({ available: true })
|
|
200
211
|
: '';
|
|
201
212
|
const capabilityContext = runtimeContext.capabilityContext
|
|
202
213
|
|| buildCapabilityContext(runtimeContext.promptCapabilities);
|
|
214
|
+
const artifactCapabilityContext = buildArtifactCapabilityContext(artifactCapabilities);
|
|
203
215
|
const channelContext = [
|
|
204
216
|
runtimeContext.channelContext || '',
|
|
217
|
+
artifactCapabilityContext || '',
|
|
205
218
|
capabilityContext || '',
|
|
206
219
|
].filter(Boolean).join('\n\n');
|
|
207
220
|
const responseLanguagePolicy = buildResponseLanguagePolicy({
|
|
@@ -240,7 +253,7 @@ ${memoryProtocolCtx ? `${memoryProtocolCtx}\n\n` : ''}${frontendDesignCtx ? `${f
|
|
|
240
253
|
- Prefer dedicated tools over run_shell when one fits: Read for known paths, edit_file/multi_edit for surgical edits, list_directory over \`ls -R\`. Reserve run_shell for things only a shell can do.
|
|
241
254
|
- For writing source files, use write_file/edit_file/multi_edit. These tools can write inside the current project/cwd, including temporary project directories. Do not use run_shell heredocs or redirects just to create source files.
|
|
242
255
|
- run_shell takes a complete shell command string in \`command\`. If you need pipes, redirects, heredocs, or \`cd ... && ...\`, put the whole shell expression in \`command\`, not in \`args\` or an interpreter \`-c\` wrapper.
|
|
243
|
-
- For static HTML/CSS/JS verification, use browser_screenshot with a \`file://\` URL for the local HTML file.
|
|
256
|
+
- For static HTML/CSS/JS verification, use browser_screenshot and browser_smoke_test with a \`file://\` URL for the local HTML file. If a static file server is genuinely needed, use start_static_server and check_url. For NON-static long-lived processes (dev servers, watchers, long builds), use run_shell with \`background: true\` and poll with bg_output / stop with bg_kill — never append \`&\` to a command. Never say a localhost/127.0.0.1 preview is live, back up, HTTP 200, or reachable unless the current turn has successful start_static_server/check_url/browser_screenshot/browser_smoke_test evidence for that URL; localhost evidence is Wall-E host loopback only, not phone/remote-browser proof.
|
|
244
257
|
- Multiple INDEPENDENT tool calls can run in parallel — use that to keep the loop fast. SEQUENTIAL calls (each depends on the previous result) must run one at a time.
|
|
245
258
|
- Destructive run_shell operations (git reset --hard, force push, rm -rf, dropping tables) need explicit user authorization unless they're trivially recoverable in this sandbox. Investigate before deleting unfamiliar files.
|
|
246
259
|
|
|
@@ -351,7 +364,7 @@ ${context.plannerNotes}`);
|
|
|
351
364
|
Output JSON with the following structure (no other text outside the JSON):
|
|
352
365
|
{
|
|
353
366
|
"subtasks": [
|
|
354
|
-
{ "title": "Short title", "prompt": "Detailed prompt for this subtask" }
|
|
367
|
+
{ "title": "Short title", "prompt": "Detailed prompt for this subtask", "acceptance": { "task_kind": "code|frontend-ui|docs|test", "write_policy": "must-write|may-write|read-only", "validators": ["project.tests"] } }
|
|
355
368
|
],
|
|
356
369
|
"branch_name": "type/short-description",
|
|
357
370
|
"estimated_scope": "small|medium|large"
|
|
@@ -360,9 +373,12 @@ Output JSON with the following structure (no other text outside the JSON):
|
|
|
360
373
|
Rules:
|
|
361
374
|
- This planning step has no tool access. Do not inspect files, run commands, or ask for tools; use only the supplied context.
|
|
362
375
|
- Maximum ${maxSubtasks} subtasks. Smaller is better — combine where you can.
|
|
376
|
+
- Vague polish prompts such as "improve UX", "make it world class", or "polish the page" are bounded improvement tasks by default, not permission for a full rewrite. Plan 1-3 high-impact changes unless the user explicitly asks for a rebuild.
|
|
377
|
+
- For frontend/UI subtasks, keep HTML, CSS, and JS contracts together. If a subtask adds an inline handler or interactive control in HTML, the same subtask must implement the matching JavaScript and verify it.
|
|
363
378
|
- Each subtask must be independently executable: complete enough context that the next agent doesn't need this plan to do it.
|
|
364
379
|
- Order so dependencies come first. Test-writing subtasks often go LAST (so we test the final shape, not an intermediate one).
|
|
365
380
|
- Preserve explicit verification/tool requirements from the user request inside the relevant subtask prompt, using the same tool names and key arguments when given.
|
|
381
|
+
- For frontend/UI work, include acceptance validators ["frontend.static_contract","frontend.screenshot_evidence","frontend.browser_runtime"] and tell the worker to run both browser_screenshot and browser_smoke_test.
|
|
366
382
|
- branch_name follows conventional naming (feat/, fix/, refactor/, test/, docs/, chore/).
|
|
367
383
|
- Each subtask prompt must be specific: which files, what behavior, how to verify. "Implement X" without context is not a subtask.`);
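The planner rules above imply a small, checkable shape for each subtask's `acceptance` object. The following is a minimal sketch of such a check, assuming the enumerations shown in the JSON template and the three frontend validators named in the rules. `checkSubtaskAcceptance` and its message strings are hypothetical names for illustration, not functions from the package.

```javascript
// Hypothetical sketch: the allowed values are copied from the planner prompt,
// but this function and its messages are illustrative only.
const TASK_KINDS = ['code', 'frontend-ui', 'docs', 'test'];
const WRITE_POLICIES = ['must-write', 'may-write', 'read-only'];
const FRONTEND_VALIDATORS = [
  'frontend.static_contract',
  'frontend.screenshot_evidence',
  'frontend.browser_runtime',
];

function checkSubtaskAcceptance(subtask) {
  const problems = [];
  const acceptance = subtask.acceptance;
  // Assumption: a missing acceptance object is allowed, since the docs say
  // the orchestrator can derive a contract when the planner omits one.
  if (acceptance === undefined) return problems;

  if (!TASK_KINDS.includes(acceptance.task_kind)) {
    problems.push(`unknown task_kind: ${acceptance.task_kind}`);
  }
  if (!WRITE_POLICIES.includes(acceptance.write_policy)) {
    problems.push(`unknown write_policy: ${acceptance.write_policy}`);
  }
  const validators = Array.isArray(acceptance.validators) ? acceptance.validators : [];
  if (acceptance.task_kind === 'frontend-ui') {
    // Frontend subtasks are expected to list all three frontend validators.
    for (const name of FRONTEND_VALIDATORS) {
      if (!validators.includes(name)) problems.push(`missing validator: ${name}`);
    }
  }
  return problems;
}
```

A caller could run this over every entry in `subtasks` and reject or repair a plan whose problem list is non-empty.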
|
|
368
384
|
|
|
@@ -343,16 +343,28 @@ Relevant memories and knowledge are provided above. If they answer the question,
|
|
|
343
343
|
|
|
344
344
|
### Step 2: SEARCH — only if the context above is insufficient
|
|
345
345
|
Call search_memories to find additional evidence. Batch multiple searches in ONE turn.
|
|
346
|
-
|
|
346
|
+
- **Search by keywords, not whole sentences.** Pull the key nouns/names out of the request; don't paste the full instruction as the query.
|
|
347
|
+
- **People:** when the request is about a person, search their name AND call **lookup_person** to get their aliases/email/handle, then search those too. People are often referenced by email or @handle, not full name.
|
|
348
|
+
- Use different query angles: English keywords, Chinese terms, source filters.
|
|
347
349
|
For private, remembered, or work-context questions, use Wall-E memory before public web_fetch. This includes prior conversations, decisions, preferences, people, teams, projects, tools, Slack/email/calendar context, questions about the user's own writing style/tone/personality/work patterns, and "last time" / "do you know" / "what did we discuss" prompts. Use public web only for public/current facts or after memory misses.
|
|
348
350
|
|
|
351
|
+
### Step 2b: ESCALATE — when stored memory comes back empty
|
|
352
|
+
Empty memory is NOT a dead end — it usually means the brain hasn't ingested it yet, not that it doesn't exist. Before concluding you lack context, query the LIVE sources you have:
|
|
353
|
+
- **slack_search** (and slack_read_channel) for live Slack history about a person/topic.
|
|
354
|
+
- **mail_search** / **mail_messages** for email threads.
|
|
355
|
+
- **calendar_events** for meetings together.
|
|
356
|
+
- For a colleague, use **lookup_person** / entity tools and Glean (org directory, role, manager) to anchor who they are.
|
|
357
|
+
Only after BOTH stored memory and the relevant live sources come back empty may you say you couldn't find specific history — and then proceed to Step 4 with what you do know.
|
|
358
|
+
|
|
349
359
|
### Step 3: THINK — reason through the evidence
|
|
350
360
|
Use the **think** tool before responding to:
|
|
351
361
|
- Analyze what the evidence ACTUALLY shows vs what it SEEMS to show
|
|
352
362
|
- Challenge your conclusions: do you have 3+ examples, or are you over-generalizing?
|
|
353
363
|
- Consider if behavior is DELIBERATE and STRATEGIC rather than a gap
|
|
354
364
|
|
|
355
|
-
### Step 4: RESPOND —
|
|
365
|
+
### Step 4: RESPOND — never dead-end a request
|
|
366
|
+
- **If asked to WRITE/DRAFT/COMPOSE something** (a note, message, email, summary): produce the actual draft. "Reads like me / in my voice" does NOT require more lookups — apply ${ownerName}'s writing style (from the profile/memories above; search "writing style"/"tone" if needed) and, if you have past examples of this kind of note, mirror them. When you're missing a specific fact, write the draft anyway and mark the gap inline with a clear \`[placeholder: ...]\` rather than refusing.
|
|
367
|
+
- **Never answer a "do X for me" request with only "I don't have enough context."** Offer the best partial result you can, plus EITHER a draft-with-placeholders OR exactly ONE targeted clarifying question (e.g. "What's one moment with them you'd want to highlight?"). A useful draft beats a polished refusal.
|
|
356
368
|
- Use **bold** for key names, dates, and decisions
|
|
357
369
|
- Use > blockquotes when quoting actual Slack messages
|
|
358
370
|
- Include dates and people for attribution
|
|
@@ -376,6 +388,7 @@ function buildToolRefBlock(ownerName, intent) {
|
|
|
376
388
|
}
|
|
377
389
|
lines.push('- **run_skill / mcp_call / list_mcp_tools**: Actions and external services.');
|
|
378
390
|
lines.push('- **Local tools**: web_fetch, run_shell, read_file, write_file, search_files, calendar_events, calendar_list, calendar_create, reminder_create, notification, applescript, open_url, open_app, screenshot, system_info, clipboard_read/write');
|
|
391
|
+
lines.push('- **Local preview URLs**: Before saying a localhost/127.0.0.1 preview is live, back up, HTTP 200, or reachable, call check_url or start_static_server in the current turn. These checks prove Wall-E host loopback only; phone or remote-browser reachability needs CTM remote/tunnel evidence.');
|
|
379
392
|
lines.push('- **Email (Google Workspace first)**: mail_messages, mail_read, mail_search, mail_reply, and mail_send search all usable configured Gmail/Google Workspace accounts by default. Read the `accounts`, `mail_account_scope`, and `unavailable_accounts` fields in tool results before concluding an email is missing. If the likely account is unavailable or the result has `needs_clarification`, tell the user exactly which accounts were searched and ask one short clarification/reconnect question. Use `mail_reply` for any reply/respond/original-thread request; it derives recipients and Gmail thread headers from the original message_id. Use `mail_send` only for brand-new emails. macOS Mail is only an explicit fallback after GWS is unavailable or source:"macos" is requested. Outbound external actions are admitted through an action controller: validate sender/account, recipient, payload, and user intent; if the controller stages a draft or requires confirmation, never claim the action was sent.');
|
|
380
393
|
lines.push('- **Slack**: search_memories with source:"slack" for stored messages. slack_search / slack_read_channel for live data. slack_send_message to post. pull_slack to ingest.');
|
|
381
394
|
lines.push("- **Glean**: When using reportsto: queries, \"entities\" = direct reports only. Check manager.email to verify.");
|
|
@@ -27,7 +27,7 @@ function checkGraduation(domain) {
|
|
|
27
27
|
|
|
28
28
|
if (currentTier === 1) {
|
|
29
29
|
// Tier 1 -> 2: enough memories in domain
|
|
30
|
-
const memCount = brain.
|
|
30
|
+
const memCount = brain.countMemories({ source: domain });
|
|
31
31
|
if (memCount > 50) newTier = 2;
|
|
32
32
|
} else if (currentTier === 2) {
|
|
33
33
|
const rate = dc.total_actions > 0 ? dc.approved_actions / dc.total_actions : 0;
|
|
@@ -0,0 +1,41 @@
|
|
|
1
|
+
# Wall-E Coding Acceptance Contract
|
|
2
|
+
|
|
3
|
+
Wall-E coding runs must not treat model prose as proof of completion. Every build run now has a typed acceptance contract that maps the task to required validators before a subtask or final completion can succeed.
|
|
4
|
+
|
|
5
|
+
## Contract Shape
|
|
6
|
+
|
|
7
|
+
Planner subtasks may include an `acceptance` object:
|
|
8
|
+
|
|
9
|
+
```json
|
|
10
|
+
{
|
|
11
|
+
"task_kind": "frontend-ui",
|
|
12
|
+
"write_policy": "must-write",
|
|
13
|
+
"validators": [
|
|
14
|
+
"frontend.static_contract",
|
|
15
|
+
"frontend.screenshot_evidence",
|
|
16
|
+
"frontend.browser_runtime"
|
|
17
|
+
]
|
|
18
|
+
}
|
|
19
|
+
```
|
|
20
|
+
|
|
21
|
+
The orchestrator also derives a contract when the planner omits one. Frontend tasks are detected from changed frontend files or UI/web task language.
|
|
22
|
+
|
|
23
|
+
## Frontend Validators
|
|
24
|
+
|
|
25
|
+
- `frontend.static_contract` checks local assets and HTML-to-JS inline event handlers. If HTML calls `toggleAudio()` or `openMusicLightbox()`, a loaded inline or local script must define that global function.
|
|
26
|
+
- `frontend.screenshot_evidence` requires successful `browser_screenshot` evidence for frontend changes.
|
|
27
|
+
- `frontend.browser_runtime` uses `browser_smoke_test` or the orchestrator's final auto-smoke path. It loads the page in headless Chrome through CDP, captures runtime exceptions, console errors, failed requests, and safe-clicks interactive elements such as `[onclick]`, buttons, role buttons, and hash links.
|
|
28
|
+
|
|
29
|
+
## Enforcement Points
|
|
30
|
+
|
|
31
|
+
- Subtask gate: after a worker changes files, Wall-E runs the fast contract checks before tests/review. Static frontend failures retry the subtask with concrete failure text.
|
|
32
|
+
- Final gate: before Wall-E can report success or commit, Wall-E re-runs frontend static checks, requires screenshot evidence, and runs browser runtime smoke on discovered HTML entrypoints.
|
|
33
|
+
- Telemetry/progress: each validator emits structured `acceptance_validator` progress and anonymous `coding_acceptance_validator` telemetry with validator name, status, and failure count.
|
|
34
|
+
|
|
35
|
+
## Scope Control
|
|
36
|
+
|
|
37
|
+
Vague UI prompts such as "improve UX" or "make it world class" are bounded improvement tasks by default. The planner should produce 1-3 high-impact changes unless the user explicitly asks for a rebuild. For frontend subtasks, HTML, CSS, and JS changes must stay in the same subtask when they form one interaction contract.
|
|
38
|
+
|
|
39
|
+
## Why This Exists
|
|
40
|
+
|
|
41
|
+
The failure mode that motivated this contract was a successful-looking frontend run that produced HTML with inline handlers whose JavaScript functions did not exist. Screenshots and file-change checks did not catch it. The new contract catches the defect statically and, at final completion, through a real browser runtime check.
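The defect described above (HTML inline handlers with no matching JavaScript function) can be approximated with a purely static check. The following is a minimal sketch, assuming regex-based extraction is good enough for illustration. `findMissingHandlers` is a hypothetical name and is not the package's `frontend.static_contract` implementation, which is not shown in this diff.

```javascript
// Hypothetical sketch of a static HTML-to-JS handler check.
// Limitations (by design, for brevity): regex parsing only, no scoping
// analysis, and keywords such as `if (` inside a handler would be
// reported as calls.
function findMissingHandlers(html, scriptSources) {
  // Collect bare function names called from inline on* attributes.
  const called = new Set();
  const attrRe = /\son[a-z]+\s*=\s*(["'])(.*?)\1/gi;
  // The lookbehind skips member calls such as console.log(...).
  const callRe = /(?<![.\w$])([A-Za-z_$][\w$]*)\s*\(/g;
  for (const attr of html.matchAll(attrRe)) {
    for (const call of attr[2].matchAll(callRe)) called.add(call[1]);
  }

  // Collect names that look like definitions in the loaded scripts.
  const defined = new Set();
  const defRe = /\bfunction\s+([A-Za-z_$][\w$]*)\s*\(|\b(?:const|let|var)\s+([A-Za-z_$][\w$]*)\s*=|\bwindow\.([A-Za-z_$][\w$]*)\s*=/g;
  for (const source of scriptSources) {
    for (const def of source.matchAll(defRe)) {
      defined.add(def[1] || def[2] || def[3]);
    }
  }

  return [...called].filter((name) => !defined.has(name)).sort();
}
```

With HTML that calls `toggleAudio()` and `openMusicLightbox(1)` and a script that defines only `toggleAudio`, the function returns `['openMusicLightbox']`. A static pass like this cannot see handlers attached at runtime, which is why the contract also includes a browser runtime validator.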
|
|
@@ -32,6 +32,19 @@ user confirmation, and the exact approved envelope is replayed back to Wall-E.
|
|
|
32
32
|
Wall-E then executes the original payload directly rather than asking the model
|
|
33
33
|
to recreate it.
|
|
34
34
|
|
|
35
|
+
There are two valid CTM approval handoffs:
|
|
36
|
+
|
|
37
|
+
- **Exact-payload replay**: the prior assistant turn already produced blocked
|
|
38
|
+
external-action envelopes. CTM reconstructs those envelopes from either
|
|
39
|
+
durable `toolCalls` or provider-native `tool_result` blocks and sends exact
|
|
40
|
+
`actionId`/`payloadHash` approvals.
|
|
41
|
+
- **Current-turn dispatch approval**: the user approves visible drafts with
|
|
42
|
+
explicit domain language such as `yes, please send both emails` before any
|
|
43
|
+
envelope exists. CTM sends a one-turn approval scoped to the matching external
|
|
44
|
+
domain. Wall-E may execute same-turn tool calls only if validation has no
|
|
45
|
+
issues and the only remaining confirmation is
|
|
46
|
+
`external_action_confirmation_required`.
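The two handoffs can be read as a small decision procedure over the user's text and any pending staged actions. The sketch below is one way to express it, under loud assumptions: `classifyApproval`, the regular expressions, and the returned field names are all hypothetical and chosen for illustration. Only `actionId`, `payloadHash`, and the example phrases come from this document.

```javascript
// Hypothetical sketch; patterns and result shapes are illustrative
// assumptions, not CTM's actual matching logic.
const AFFIRMATION = /^\s*(yes|yep|ok|okay|approved?|confirm(ed)?)\b/i;
const DOMAIN_PATTERNS = {
  mail: /\b(send|reply)\b.*\b(emails?|mail|messages?)\b/i,
  calendar: /\b(create|schedule|send)\b.*\b(events?|invites?|meetings?)\b/i,
};

function classifyApproval(userText, pendingActions = []) {
  // Coding-style prompts such as "go ahead with the fix" are not treated
  // as confirmations of external actions.
  if (!AFFIRMATION.test(userText)) return { kind: 'none' };

  // Exact-payload replay: approve the already-staged envelopes by id and hash.
  if (pendingActions.length > 0) {
    return {
      kind: 'exact_payload_replay',
      approvals: pendingActions.map(({ actionId, payloadHash }) => ({ actionId, payloadHash })),
    };
  }

  // Current-turn dispatch: needs explicit domain language, scoped to one turn.
  for (const [domain, pattern] of Object.entries(DOMAIN_PATTERNS)) {
    if (pattern.test(userText)) return { kind: 'current_turn_dispatch', domain, turns: 1 };
  }

  // A bare "yes" with nothing pending approves nothing.
  return { kind: 'none' };
}
```

In this sketch, `yes, please send both emails` with no pending actions yields a one-turn `mail` approval, while a bare `yes` yields `none` unless staged actions exist. Even after a current-turn approval, execution would still depend on validation reporting no issues.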
|
|
47
|
+
|
|
35
48
|
## Approval Tiers
|
|
36
49
|
|
|
37
50
|
Wall-E uses two approval tiers:
|
|
@@ -108,10 +121,14 @@ Anthropic, OpenAI, and other providers all use the same envelope replay path.
|
|
|
108
121
|
account.
|
|
109
122
|
- Retry tracing classifies outbound external actions as unsafe side effects.
|
|
110
123
|
- CTM learned approval rules cannot auto-allow external action tools.
|
|
111
|
-
- CTM
|
|
112
|
-
|
|
113
|
-
action confirmation.
|
|
114
|
-
|
|
124
|
+
- CTM converts the next prompt into an exact approval when the latest pending
|
|
125
|
+
action group contains staged external actions and the user text is a clear
|
|
126
|
+
external action confirmation. It can also create a one-turn domain-scoped
|
|
127
|
+
approval for explicit dispatch confirmations such as `yes, please send both
|
|
128
|
+
emails`. Bare confirmations such as `yes` do not create current-turn
|
|
129
|
+
approvals without a pending action.
|
|
130
|
+
- Coding prompts such as `go ahead with the fix` do not approve mail/calendar
|
|
131
|
+
side effects.
|
|
115
132
|
- Approved envelopes are idempotent per Wall-E session and payload hash to avoid
|
|
116
133
|
accidental duplicate sends from double-submit or retry.
|
|
117
134
|
- Calendar approval envelopes preserve `account`, `source`, `calendarId`,
|
|
@@ -148,8 +165,11 @@ Focused regressions:
|
|
|
148
165
|
- `wall-e/tests/coding-orchestrator.test.js`
|
|
149
166
|
- `wall-e/tests/coding-stream-processor.test.js`
|
|
150
167
|
- `wall-e/tests/execution-trace.test.js`
|
|
151
|
-
- `wall-e/tests/chat.test.js` with `stages a draft email`
|
|
168
|
+
- `wall-e/tests/chat.test.js` with `stages a draft email` and
|
|
169
|
+
`validated same-turn email dispatch`
|
|
152
170
|
|
|
153
171
|
For realistic prompt validation, run the Wall-E chat loop with a disposable data
|
|
154
172
|
directory and a mock provider that attempts `mail_send`. The expected tool result
|
|
155
|
-
is `decision: "stage_preview"` and `sent: false
|
|
173
|
+
is `decision: "stage_preview"` and `sent: false` for draft-only prompts. For a
|
|
174
|
+
current-turn approval prompt, the expected tool result is a verified executor
|
|
175
|
+
result, not a blocked confirmation envelope.
|
|
@@ -36,7 +36,7 @@ Defaults are intentionally conservative and can be overridden by environment var
|
|
|
36
36
|
| Install IP | 30 days | IP is operational only for abuse/debugging. |
|
|
37
37
|
| Owner display name | 7 days | Hashes and machine buckets are enough for fleet analysis. |
|
|
38
38
|
|
|
39
|
-
Diagnostic event names include `error`, `skill_fallback_attempt`, `initiative_provider_cooldown`, `compat_usage`, `upgrade`, `upgrade_prompt`, `funnel`, and the `ctm_update_` prefix.
|
|
39
|
+
Diagnostic event names include `error`, `skill_fallback_attempt`, `initiative_provider_cooldown`, `compat_usage`, `upgrade`, `upgrade_prompt`, `funnel`, `session_integrity_issue`, `session_integrity_issue_summary`, and the `ctm_update_` prefix.
|
|
40
40
|
|
|
41
41
|
Noisy event names include `skill_dispatch_decision`, `skill_exec`, `skill_run`, `skills_run`, `task_run`, `think`, `initiative`, `reflect`, and `ingest`.
|
|
42
42
|
|
|
@@ -59,7 +59,13 @@ This makes archived event counts exact for deleted rows and lets summary endpoin
|
|
|
59
59
|
|
|
60
60
|
Feedback cleanup does not delete the report row. It writes daily feedback rollups first, then replaces title, description, triage text, evidence, context previews, and attachment metadata with redacted placeholders.
|
|
61
61
|
|
|
62
|
-
The cleanup process does not run `VACUUM` automatically. SQLite file compaction can be expensive and should be a separate maintenance action after checking disk pressure and service load. WAL checkpointing
|
|
62
|
+
The cleanup process does not run `VACUUM` automatically. SQLite file compaction can be expensive and should be a separate maintenance action after checking disk pressure and service load. WAL checkpointing defaults to `PASSIVE` to avoid aggressive truncate behavior on storage backends that can return short reads; operators can set `WALLE_TELEMETRY_CLEANUP_CHECKPOINT_MODE=FULL`, `RESTART`, or `TRUNCATE` only when they explicitly need stronger WAL draining.
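The checkpoint-mode override described above could be resolved as in the following sketch. The environment variable name and the four mode names come from the text, and `PRAGMA wal_checkpoint(...)` is standard SQLite. The helper names, the case-insensitive parsing, and the fallback to `PASSIVE` on unrecognized values are assumptions for illustration.

```javascript
// Hypothetical sketch: resolve the WAL checkpoint mode from the environment.
const CHECKPOINT_MODES = new Set(['PASSIVE', 'FULL', 'RESTART', 'TRUNCATE']);

function resolveCheckpointMode(env = process.env) {
  const raw = String(env.WALLE_TELEMETRY_CLEANUP_CHECKPOINT_MODE || '')
    .trim()
    .toUpperCase();
  // Assumption: anything unrecognized falls back to the conservative default.
  return CHECKPOINT_MODES.has(raw) ? raw : 'PASSIVE';
}

function checkpointPragma(env) {
  // Safe to interpolate because the mode is restricted to the allow-list above.
  return `PRAGMA wal_checkpoint(${resolveCheckpointMode(env)});`;
}
```

Restricting the value to an allow-list keeps the environment variable from injecting arbitrary SQL into the pragma statement.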
|
|
63
|
+
|
|
64
|
+
Session integrity telemetry is intentionally aggregate-first. CTM emits `session_integrity_issue_summary` for fleet-level counts and only emits per-session `session_integrity_issue` details when `CTM_SESSION_INTEGRITY_DETAIL_TELEMETRY=1` is set for a focused debugging run.
|
|
65
|
+
|
|
66
|
+
Upgrade telemetry distinguishes in-product updates from externally observed version changes. Summary fields use `completed`/`completed_after_apply` for updates with a matching apply-start signal, and `external_completed` for version changes detected without that signal.
|
|
67
|
+
|
|
68
|
+
Compatibility telemetry reports both current usage and removal blockers. `safe_to_remove` may stay empty even when a compatibility feature has low usage if that feature is not deprecated or still has active installs.
|
|
63
69
|
|
|
64
70
|
## Operations
|
|
65
71
|
|