@bitseek/hermes-webui 0.1.0-beta.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +213 -0
- package/bin/hermes-webui.mjs +588 -0
- package/package.json +25 -0
- package/scripts/sync-vendor.mjs +74 -0
- package/templates/launchd/com.bitseek.hermes-webui.plist +21 -0
- package/templates/systemd/hermes-webui.service +13 -0
- package/templates/windows/hermes-webui-task.ps1 +3 -0
- package/vendor/agent-frontend-shell/.bitseek-source.json +6 -0
- package/vendor/agent-frontend-shell/.dockerignore +7 -0
- package/vendor/agent-frontend-shell/.env.docker.example +89 -0
- package/vendor/agent-frontend-shell/.env.example +34 -0
- package/vendor/agent-frontend-shell/.github/FUNDING.yml +3 -0
- package/vendor/agent-frontend-shell/.github/workflows/browser-smoke.yml +42 -0
- package/vendor/agent-frontend-shell/.github/workflows/docker-smoke.yml +233 -0
- package/vendor/agent-frontend-shell/.github/workflows/native-windows-startup.yml +132 -0
- package/vendor/agent-frontend-shell/.github/workflows/release.yml +57 -0
- package/vendor/agent-frontend-shell/.github/workflows/tests.yml +88 -0
- package/vendor/agent-frontend-shell/.vscode/launch.json +59 -0
- package/vendor/agent-frontend-shell/.vscode/settings.json +13 -0
- package/vendor/agent-frontend-shell/AGENTS.md +80 -0
- package/vendor/agent-frontend-shell/ARCHITECTURE.md +1658 -0
- package/vendor/agent-frontend-shell/BUGS.md +52 -0
- package/vendor/agent-frontend-shell/CHANGELOG.md +7295 -0
- package/vendor/agent-frontend-shell/CONTRIBUTING.md +205 -0
- package/vendor/agent-frontend-shell/CONTRIBUTORS.md +107 -0
- package/vendor/agent-frontend-shell/DESIGN.md +173 -0
- package/vendor/agent-frontend-shell/Dockerfile +91 -0
- package/vendor/agent-frontend-shell/LICENSE +21 -0
- package/vendor/agent-frontend-shell/README-CUSTOM.md +76 -0
- package/vendor/agent-frontend-shell/README.md +705 -0
- package/vendor/agent-frontend-shell/ROADMAP.md +351 -0
- package/vendor/agent-frontend-shell/SPRINTS.md +147 -0
- package/vendor/agent-frontend-shell/TESTING.md +1932 -0
- package/vendor/agent-frontend-shell/THEMES.md +170 -0
- package/vendor/agent-frontend-shell/api/__init__.py +1 -0
- package/vendor/agent-frontend-shell/api/agent_health.py +392 -0
- package/vendor/agent-frontend-shell/api/agent_sessions.py +782 -0
- package/vendor/agent-frontend-shell/api/auth.py +592 -0
- package/vendor/agent-frontend-shell/api/background.py +87 -0
- package/vendor/agent-frontend-shell/api/clarify.py +238 -0
- package/vendor/agent-frontend-shell/api/commands.py +124 -0
- package/vendor/agent-frontend-shell/api/compression_anchor.py +134 -0
- package/vendor/agent-frontend-shell/api/config.py +5178 -0
- package/vendor/agent-frontend-shell/api/dashboard_probe.py +255 -0
- package/vendor/agent-frontend-shell/api/extensions.py +253 -0
- package/vendor/agent-frontend-shell/api/gateway_chat.py +435 -0
- package/vendor/agent-frontend-shell/api/gateway_watcher.py +230 -0
- package/vendor/agent-frontend-shell/api/goals.py +608 -0
- package/vendor/agent-frontend-shell/api/helpers.py +474 -0
- package/vendor/agent-frontend-shell/api/kanban_bridge.py +1255 -0
- package/vendor/agent-frontend-shell/api/metering.py +194 -0
- package/vendor/agent-frontend-shell/api/models.py +4210 -0
- package/vendor/agent-frontend-shell/api/oauth.py +770 -0
- package/vendor/agent-frontend-shell/api/onboarding.py +1046 -0
- package/vendor/agent-frontend-shell/api/passkeys.py +365 -0
- package/vendor/agent-frontend-shell/api/profiles.py +1499 -0
- package/vendor/agent-frontend-shell/api/providers.py +2175 -0
- package/vendor/agent-frontend-shell/api/request_diagnostics.py +160 -0
- package/vendor/agent-frontend-shell/api/rollback.py +320 -0
- package/vendor/agent-frontend-shell/api/routes.py +13990 -0
- package/vendor/agent-frontend-shell/api/run_journal.py +284 -0
- package/vendor/agent-frontend-shell/api/runner_client.py +156 -0
- package/vendor/agent-frontend-shell/api/runtime_adapter.py +431 -0
- package/vendor/agent-frontend-shell/api/session_discoverability.py +640 -0
- package/vendor/agent-frontend-shell/api/session_events.py +45 -0
- package/vendor/agent-frontend-shell/api/session_lifecycle.py +208 -0
- package/vendor/agent-frontend-shell/api/session_ops.py +207 -0
- package/vendor/agent-frontend-shell/api/session_recovery.py +655 -0
- package/vendor/agent-frontend-shell/api/skill_usage.py +32 -0
- package/vendor/agent-frontend-shell/api/startup.py +128 -0
- package/vendor/agent-frontend-shell/api/state_sync.py +187 -0
- package/vendor/agent-frontend-shell/api/streaming.py +7048 -0
- package/vendor/agent-frontend-shell/api/system_health.py +167 -0
- package/vendor/agent-frontend-shell/api/terminal.py +410 -0
- package/vendor/agent-frontend-shell/api/turn_journal.py +214 -0
- package/vendor/agent-frontend-shell/api/updates.py +1261 -0
- package/vendor/agent-frontend-shell/api/upload.py +322 -0
- package/vendor/agent-frontend-shell/api/usage.py +26 -0
- package/vendor/agent-frontend-shell/api/workspace.py +867 -0
- package/vendor/agent-frontend-shell/api/workspace_git.py +1261 -0
- package/vendor/agent-frontend-shell/api/worktrees.py +357 -0
- package/vendor/agent-frontend-shell/bootstrap.py +492 -0
- package/vendor/agent-frontend-shell/ctl.sh +427 -0
- package/vendor/agent-frontend-shell/docker-compose.custom.yml +26 -0
- package/vendor/agent-frontend-shell/docker-compose.three-container.yml +168 -0
- package/vendor/agent-frontend-shell/docker-compose.two-container.yml +147 -0
- package/vendor/agent-frontend-shell/docker-compose.yml +57 -0
- package/vendor/agent-frontend-shell/docker_init.bash +459 -0
- package/vendor/agent-frontend-shell/docs/CONTRACTS.md +207 -0
- package/vendor/agent-frontend-shell/docs/EXTENSIONS.md +212 -0
- package/vendor/agent-frontend-shell/docs/ISSUES.md +23 -0
- package/vendor/agent-frontend-shell/docs/UIUX-GUIDE.md +196 -0
- package/vendor/agent-frontend-shell/docs/advanced-chat-setup.md +83 -0
- package/vendor/agent-frontend-shell/docs/docker.md +337 -0
- package/vendor/agent-frontend-shell/docs/onboarding-agent-checklist.md +207 -0
- package/vendor/agent-frontend-shell/docs/onboarding.md +202 -0
- package/vendor/agent-frontend-shell/docs/remote-access.md +75 -0
- package/vendor/agent-frontend-shell/docs/rfcs/README.md +53 -0
- package/vendor/agent-frontend-shell/docs/rfcs/agent-source-boundary.md +70 -0
- package/vendor/agent-frontend-shell/docs/rfcs/canonical-session-resolution.md +124 -0
- package/vendor/agent-frontend-shell/docs/rfcs/hermes-run-adapter-contract.md +1079 -0
- package/vendor/agent-frontend-shell/docs/rfcs/turn-journal.md +195 -0
- package/vendor/agent-frontend-shell/docs/rfcs/webui-run-state-consistency-contract.md +157 -0
- package/vendor/agent-frontend-shell/docs/supervisor.md +280 -0
- package/vendor/agent-frontend-shell/docs/troubleshooting.md +132 -0
- package/vendor/agent-frontend-shell/docs/ui-ux/index.html +863 -0
- package/vendor/agent-frontend-shell/docs/ui-ux/two-stage-proposal.html +768 -0
- package/vendor/agent-frontend-shell/docs/why-hermes.md +489 -0
- package/vendor/agent-frontend-shell/docs/workspace-git.md +92 -0
- package/vendor/agent-frontend-shell/docs/wsl-autostart.md +126 -0
- package/vendor/agent-frontend-shell/eslint.runtime-guard.config.mjs +35 -0
- package/vendor/agent-frontend-shell/extensions/bitseek-design-system.md +330 -0
- package/vendor/agent-frontend-shell/extensions/branding/assets/apple-touch-icon.png +0 -0
- package/vendor/agent-frontend-shell/extensions/branding/assets/empty-logo.svg +739 -0
- package/vendor/agent-frontend-shell/extensions/branding/assets/favicon-192.png +0 -0
- package/vendor/agent-frontend-shell/extensions/branding/assets/favicon-32.png +0 -0
- package/vendor/agent-frontend-shell/extensions/branding/assets/favicon-512.png +0 -0
- package/vendor/agent-frontend-shell/extensions/branding/assets/favicon-512.svg +745 -0
- package/vendor/agent-frontend-shell/extensions/branding/assets/favicon.ico +0 -0
- package/vendor/agent-frontend-shell/extensions/branding/assets/favicon.svg +745 -0
- package/vendor/agent-frontend-shell/extensions/branding/assets/titlebar-icon-v2.svg +751 -0
- package/vendor/agent-frontend-shell/extensions/branding/assets/titlebar-icon-v3.svg +739 -0
- package/vendor/agent-frontend-shell/extensions/branding/assets/titlebar-icon.svg +745 -0
- package/vendor/agent-frontend-shell/extensions/branding/branding.js +112 -0
- package/vendor/agent-frontend-shell/extensions/branding/config.json +14 -0
- package/vendor/agent-frontend-shell/extensions/branding/manifest.json +53 -0
- package/vendor/agent-frontend-shell/extensions/index.js +67 -0
- package/vendor/agent-frontend-shell/extensions/loader/hermes-loader.js +77 -0
- package/vendor/agent-frontend-shell/extensions/manifest.json +16 -0
- package/vendor/agent-frontend-shell/extensions/pages/ai-teammates/page.css +333 -0
- package/vendor/agent-frontend-shell/extensions/pages/ai-teammates/page.js +487 -0
- package/vendor/agent-frontend-shell/extensions/pages/manifest.json +6 -0
- package/vendor/agent-frontend-shell/extensions/pages/registry.css +56 -0
- package/vendor/agent-frontend-shell/extensions/pages/registry.js +302 -0
- package/vendor/agent-frontend-shell/extensions/themes/bitseek/index.css +93 -0
- package/vendor/agent-frontend-shell/extensions/themes/bitseek/index.js +98 -0
- package/vendor/agent-frontend-shell/install.sh +63 -0
- package/vendor/agent-frontend-shell/mcp_server.py +567 -0
- package/vendor/agent-frontend-shell/package.json +12 -0
- package/vendor/agent-frontend-shell/pyproject.toml +56 -0
- package/vendor/agent-frontend-shell/pytest.ini +3 -0
- package/vendor/agent-frontend-shell/requirements.txt +5 -0
- package/vendor/agent-frontend-shell/server.py +624 -0
- package/vendor/agent-frontend-shell/start.ps1 +210 -0
- package/vendor/agent-frontend-shell/start.sh +65 -0
- package/vendor/agent-frontend-shell/static/apple-touch-icon.png +0 -0
- package/vendor/agent-frontend-shell/static/boot.js +1990 -0
- package/vendor/agent-frontend-shell/static/commands.js +1402 -0
- package/vendor/agent-frontend-shell/static/favicon-192.png +0 -0
- package/vendor/agent-frontend-shell/static/favicon-32.png +0 -0
- package/vendor/agent-frontend-shell/static/favicon-512.png +0 -0
- package/vendor/agent-frontend-shell/static/favicon-512.svg +18 -0
- package/vendor/agent-frontend-shell/static/favicon.ico +0 -0
- package/vendor/agent-frontend-shell/static/favicon.svg +20 -0
- package/vendor/agent-frontend-shell/static/i18n.js +15389 -0
- package/vendor/agent-frontend-shell/static/icons.js +92 -0
- package/vendor/agent-frontend-shell/static/index.html +1506 -0
- package/vendor/agent-frontend-shell/static/login.js +177 -0
- package/vendor/agent-frontend-shell/static/manifest.json +53 -0
- package/vendor/agent-frontend-shell/static/messages.js +3521 -0
- package/vendor/agent-frontend-shell/static/onboarding.js +800 -0
- package/vendor/agent-frontend-shell/static/panels.js +7995 -0
- package/vendor/agent-frontend-shell/static/pwa-startup.js +83 -0
- package/vendor/agent-frontend-shell/static/sessions.js +5165 -0
- package/vendor/agent-frontend-shell/static/style.css +4774 -0
- package/vendor/agent-frontend-shell/static/sw.js +173 -0
- package/vendor/agent-frontend-shell/static/terminal.js +632 -0
- package/vendor/agent-frontend-shell/static/ui.js +8997 -0
- package/vendor/agent-frontend-shell/static/vendor/js-yaml/4.1.0/js-yaml.min.js +2 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_AMS-Regular.ttf +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_AMS-Regular.woff +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_AMS-Regular.woff2 +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Caligraphic-Bold.ttf +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Caligraphic-Bold.woff +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Caligraphic-Bold.woff2 +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Caligraphic-Regular.ttf +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Caligraphic-Regular.woff +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Caligraphic-Regular.woff2 +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Fraktur-Bold.ttf +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Fraktur-Bold.woff +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Fraktur-Bold.woff2 +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Fraktur-Regular.ttf +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Fraktur-Regular.woff +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Fraktur-Regular.woff2 +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Main-Bold.ttf +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Main-Bold.woff +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Main-Bold.woff2 +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Main-BoldItalic.ttf +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Main-BoldItalic.woff +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Main-BoldItalic.woff2 +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Main-Italic.ttf +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Main-Italic.woff +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Main-Italic.woff2 +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Main-Regular.ttf +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Main-Regular.woff +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Main-Regular.woff2 +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Math-BoldItalic.ttf +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Math-BoldItalic.woff +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Math-BoldItalic.woff2 +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Math-Italic.ttf +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Math-Italic.woff +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Math-Italic.woff2 +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_SansSerif-Bold.ttf +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_SansSerif-Bold.woff +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_SansSerif-Bold.woff2 +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_SansSerif-Italic.ttf +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_SansSerif-Italic.woff +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_SansSerif-Italic.woff2 +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_SansSerif-Regular.ttf +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_SansSerif-Regular.woff +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_SansSerif-Regular.woff2 +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Script-Regular.ttf +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Script-Regular.woff +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Script-Regular.woff2 +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Size1-Regular.ttf +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Size1-Regular.woff +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Size1-Regular.woff2 +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Size2-Regular.ttf +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Size2-Regular.woff +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Size2-Regular.woff2 +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Size3-Regular.ttf +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Size3-Regular.woff +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Size3-Regular.woff2 +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Size4-Regular.ttf +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Size4-Regular.woff +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Size4-Regular.woff2 +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Typewriter-Regular.ttf +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Typewriter-Regular.woff +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Typewriter-Regular.woff2 +0 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/katex.min.css +1 -0
- package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/katex.min.js +1 -0
- package/vendor/agent-frontend-shell/static/vendor/smd.min.js +29 -0
- package/vendor/agent-frontend-shell/static/workspace.js +680 -0
|
@@ -0,0 +1,195 @@
|
|
|
1
|
+
# RFC: WebUI Turn Journal for Crash-Safe Chat Submissions
|
|
2
|
+
|
|
3
|
+
- **Status:** Proposed
|
|
4
|
+
- **Author:** @ai-ag2026
|
|
5
|
+
- **Created:** 2026-05-11
|
|
6
|
+
|
|
7
|
+
## Problem
|
|
8
|
+
|
|
9
|
+
A WebUI chat turn crosses several durability boundaries:
|
|
10
|
+
|
|
11
|
+
1. browser submits a user message,
|
|
12
|
+
2. WebUI creates or updates session runtime metadata,
|
|
13
|
+
3. the agent worker starts streaming,
|
|
14
|
+
4. assistant output is appended,
|
|
15
|
+
5. the JSON sidecar and derived index are saved.
|
|
16
|
+
|
|
17
|
+
If the server crashes between submission and the final sidecar save, recovery has to infer what happened from `pending_user_message`, `active_stream_id`, `.json.bak`, `_index.json`, and `state.db`. Those safeguards are useful, but they are still reconstructing intent after the fact.
|
|
18
|
+
|
|
19
|
+
The missing primitive is a small write-ahead journal for turns: record the submitted user turn durably before the worker starts, then advance the journal as the turn progresses.
|
|
20
|
+
|
|
21
|
+
## Goals
|
|
22
|
+
|
|
23
|
+
- Preserve the exact user-submitted turn, including attachments metadata, before any provider or worker work starts.
|
|
24
|
+
- Make crash recovery deterministic: a submitted-but-unfinished turn can be reported or reconstructed without guessing.
|
|
25
|
+
- Keep the journal append/update format simple enough for startup recovery, CLI audit, and future API repair endpoints.
|
|
26
|
+
- Avoid turning recovery into a background daemon. This is storage hygiene, not a long-running service.
|
|
27
|
+
|
|
28
|
+
## Non-goals
|
|
29
|
+
|
|
30
|
+
- Replacing `state.db.sessions` or WebUI JSON sidecars.
|
|
31
|
+
- Journaling every token or every SSE event.
|
|
32
|
+
- Replaying tool calls or provider streams.
|
|
33
|
+
- Automatically inventing assistant messages after ambiguous crashes.
|
|
34
|
+
|
|
35
|
+
## Proposed storage
|
|
36
|
+
|
|
37
|
+
Use one JSONL file per session under the existing WebUI state area:
|
|
38
|
+
|
|
39
|
+
```text
|
|
40
|
+
<SESSION_DIR>/_turn_journal/<session_id>.jsonl
|
|
41
|
+
```
|
|
42
|
+
|
|
43
|
+
Each line is an immutable event. Recovery can scan by `turn_id` and choose the latest status.
|
|
44
|
+
|
|
45
|
+
### Event shape
|
|
46
|
+
|
|
47
|
+
```json
|
|
48
|
+
{
|
|
49
|
+
"version": 1,
|
|
50
|
+
"event": "submitted",
|
|
51
|
+
"turn_id": "20260511T001122Z-abcdef",
|
|
52
|
+
"session_id": "abc123",
|
|
53
|
+
"stream_id": "stream-xyz",
|
|
54
|
+
"created_at": 1778458282.123,
|
|
55
|
+
"role": "user",
|
|
56
|
+
"content": "...",
|
|
57
|
+
"attachments": [],
|
|
58
|
+
"workspace": "/workspace",
|
|
59
|
+
"model": "openai/gpt-5",
|
|
60
|
+
"model_provider": "openai"
|
|
61
|
+
}
|
|
62
|
+
```
|
|
63
|
+
|
|
64
|
+
Later events for the same `turn_id`:
|
|
65
|
+
|
|
66
|
+
```json
|
|
67
|
+
{"version":1,"event":"worker_started","turn_id":"...","created_at":1778458283.0}
|
|
68
|
+
{"version":1,"event":"assistant_started","turn_id":"...","created_at":1778458284.0}
|
|
69
|
+
{"version":1,"event":"completed","turn_id":"...","created_at":1778458299.0,"assistant_message_index":12}
|
|
70
|
+
{"version":1,"event":"interrupted","turn_id":"...","created_at":1778458301.0,"reason":"server_startup_recovery"}
|
|
71
|
+
```
|
|
72
|
+
|
|
73
|
+
## Turn state machine
|
|
74
|
+
|
|
75
|
+
```text
|
|
76
|
+
submitted -> worker_started -> assistant_started -> completed
|
|
77
|
+
submitted -> interrupted
|
|
78
|
+
worker_started -> interrupted
|
|
79
|
+
assistant_started -> interrupted
|
|
80
|
+
```
|
|
81
|
+
|
|
82
|
+
`completed` is terminal. `interrupted` is terminal unless a later explicit repair creates a new turn. Recovery should not silently resume a provider call.
|
|
83
|
+
|
|
84
|
+
## Write rules
|
|
85
|
+
|
|
86
|
+
1. On `/api/chat/start` or equivalent turn-submission path:
|
|
87
|
+
- generate `turn_id`,
|
|
88
|
+
- append `submitted`,
|
|
89
|
+
- fsync the journal file,
|
|
90
|
+
- only then start the worker.
|
|
91
|
+
2. When worker thread enters `_run_agent_streaming`, append `worker_started`.
|
|
92
|
+
3. When assistant output is first persisted or clearly begins, append `assistant_started`.
|
|
93
|
+
4. After the sidecar save that includes the assistant answer succeeds, append `completed`.
|
|
94
|
+
5. On cancellation or known worker exception, append `interrupted` with a reason.
|
|
95
|
+
|
|
96
|
+
## Synchronous durability design rationale
|
|
97
|
+
|
|
98
|
+
The `submitted` event uses synchronous `fsync` on every write today. This is a deliberate tradeoff between latency and crash-safety guarantees:
|
|
99
|
+
|
|
100
|
+
### Why synchronous for submitted events
|
|
101
|
+
|
|
102
|
+
The `submitted` event is the durability anchor for the entire recovery story. If the server crashes before the worker starts, the journal must reflect that the user message was received. Async writes risk losing that guarantee: a crash shortly after a non-fsync'd write could leave the journal silent while `pending_user_message` still exists, creating ambiguity during recovery. The current design avoids that ambiguity at the cost of one extra disk round-trip per turn submission.
|
|
103
|
+
|
|
104
|
+
### Latency expectations by storage type
|
|
105
|
+
|
|
106
|
+
Reported fsync latency varies significantly across storage backends. Approximate qualitative ranges to keep in mind:
|
|
107
|
+
|
|
108
|
+
- **SSD (NVM/NVMe)**: Single-digit milliseconds; p99 typically well under 10 ms on modern hardware. Most turn submissions will see sub-5 ms overhead.
|
|
109
|
+
- **Rotational disk (HDD)**: Seek time dominates; p50 ~5–15 ms, p99 can reach 50–100 ms under load. A busy server with many concurrent submissions may see queueing effects.
|
|
110
|
+
- **Docker/overlay filesystems**: fsync latency depends on the container storage driver and the backing host filesystem. Write-through and copy-on-write semantics can introduce additional overhead; p95 may be 10–50 ms in typical containerized deployments, though exact figures vary by configuration and host load.
|
|
111
|
+
|
|
112
|
+
These ranges are order-of-magnitude guidance, not benchmarks. Exact figures depend on hardware, kernel version, filesystem mount options, and concurrent load. Do not commit specific millisecond claims to documentation without measured evidence.
|
|
113
|
+
|
|
114
|
+
### Benchmark guidance for maintainers
|
|
115
|
+
|
|
116
|
+
If evidence suggests the synchronous write is a bottleneck, measure before changing anything:
|
|
117
|
+
|
|
118
|
+
1. Instrument the `append_turn_journal_event` helper to record wall-clock time for each event type (submitted, worker_started, etc.).
|
|
119
|
+
2. Capture p50/p95/p99 append/fsync latency over a representative workload (e.g., at least 1,000 submitted turns under realistic concurrency).
|
|
120
|
+
3. Isolate the fsync component: on Linux, use `strace -e fsync` or kernel tracing (`ftrace`, `perf`) to confirm where time is spent.
|
|
121
|
+
4. Check for patterns: if most submissions are under 5 ms but the p99 is 200 ms due to occasional disk contention, async writes help the tail but not the median. The tradeoff must be evaluated in context of your recovery guarantees.
|
|
122
|
+
|
|
123
|
+
### Future follow-up: async lifecycle-event journaling
|
|
124
|
+
|
|
125
|
+
Making journal writes asynchronous is a valid future optimization, but it requires:
|
|
126
|
+
|
|
127
|
+
- A reliable flush strategy (e.g., time-bounded flush every N seconds, flush on session close, flush after K pending events).
|
|
128
|
+
- Recovery logic that handles partial flush windows: if a crash occurs before the flush, the last few submitted events may be missing from the journal. Recovery must account for that ambiguity.
|
|
129
|
+
- Tests that verify the flush correctness under crash injection.
|
|
130
|
+
|
|
131
|
+
Async journal writes are **not** part of the initial implementation. They belong in a follow-up RFC once the synchronous baseline is proven stable and the recovery semantics are well-understood.
|
|
132
|
+
|
|
133
|
+
## Startup recovery semantics
|
|
134
|
+
|
|
135
|
+
On startup, for each journal file:
|
|
136
|
+
|
|
137
|
+
- Latest event is `completed`: no action.
|
|
138
|
+
- Latest event is `submitted` or `worker_started` and no matching user message exists in sidecar:
|
|
139
|
+
- append/recover the user message into the session sidecar with a recovery marker.
|
|
140
|
+
- Latest event is `submitted`, `worker_started`, or `assistant_started` and no completed assistant turn exists:
|
|
141
|
+
- add a visible interruption marker, not a fake assistant answer.
|
|
142
|
+
- Existing `.json.bak` and `state.db` recovery still run first so the sidecar is as complete as possible before journal reconciliation.
|
|
143
|
+
|
|
144
|
+
## Audit additions
|
|
145
|
+
|
|
146
|
+
`audit_session_recovery()` can report:
|
|
147
|
+
|
|
148
|
+
- `turn_journal_pending_turn` — repairable if the user message is absent from sidecar.
|
|
149
|
+
- `turn_journal_interrupted_turn` — ok/warn depending on whether a visible marker exists.
|
|
150
|
+
- `turn_journal_malformed_event` — manual review.
|
|
151
|
+
|
|
152
|
+
Safe repair should only materialize submitted user messages and interruption markers when the journal event content is valid JSON and the target message is absent.
|
|
153
|
+
|
|
154
|
+
## API surface
|
|
155
|
+
|
|
156
|
+
Initial read-only endpoint can be folded into the existing recovery audit:
|
|
157
|
+
|
|
158
|
+
```text
|
|
159
|
+
GET /api/session/recovery/audit
|
|
160
|
+
```
|
|
161
|
+
|
|
162
|
+
Later, if needed:
|
|
163
|
+
|
|
164
|
+
```text
|
|
165
|
+
GET /api/session/turn-journal?session_id=<id>
|
|
166
|
+
```
|
|
167
|
+
|
|
168
|
+
The latter should be diagnostic-only and redact or omit large attachment payloads.
|
|
169
|
+
|
|
170
|
+
## Rollout plan
|
|
171
|
+
|
|
172
|
+
1. Land backup/sidecar recovery and audit primitives.
|
|
173
|
+
2. Add this journal writer in the turn-submission path behind no config flag; it is local-only and append-only.
|
|
174
|
+
3. Add read-only audit reporting for pending journal turns.
|
|
175
|
+
4. Add safe repair for missing user messages and interruption markers.
|
|
176
|
+
5. Once stable, consider pruning completed journal entries older than a retention window, but only after sidecar/index recovery has no findings.
|
|
177
|
+
|
|
178
|
+
## Open questions
|
|
179
|
+
|
|
180
|
+
- Exact place to define `turn_id` so browser retry and server retry do not duplicate the same user message.
|
|
181
|
+
- Whether attachment files need their own durable manifest entry or whether metadata-only is enough for v1.
|
|
182
|
+
- How much of the assistant partial output, if any, should be recoverable after `assistant_started` but before `completed`.
|
|
183
|
+
- Whether completed journal entries should be compacted into a per-session checkpoint file.
|
|
184
|
+
|
|
185
|
+
## Minimal implementation slice
|
|
186
|
+
|
|
187
|
+
The first implementation PR should be deliberately small:
|
|
188
|
+
|
|
189
|
+
- helper: `append_turn_journal_event(session_id, event)`
|
|
190
|
+
- helper: `read_turn_journal(session_id)`
|
|
191
|
+
- unit tests for atomic append, malformed-line tolerance, and state derivation
|
|
192
|
+
- one call site: append `submitted` before worker start
|
|
193
|
+
- audit-only report of pending journal turns
|
|
194
|
+
|
|
195
|
+
Do **not** combine the first implementation with replay/repair. Replay is where most of the bugs in WAL systems live; ship the writer and audit first, prove the format, then add repair.
|
|
@@ -0,0 +1,157 @@
|
|
|
1
|
+
# WebUI Run State Consistency Contract
|
|
2
|
+
|
|
3
|
+
- **Status:** Proposed
|
|
4
|
+
- **Author:** @franksong2702
|
|
5
|
+
- **Created:** 2026-05-16
|
|
6
|
+
- **Tracking issue:** [#2361](https://github.com/nesquena/hermes-webui/issues/2361)
|
|
7
|
+
- **Related architecture:** [#1925](https://github.com/nesquena/hermes-webui/issues/1925), [`hermes-run-adapter-contract.md`](hermes-run-adapter-contract.md)
|
|
8
|
+
|
|
9
|
+
## Problem
|
|
10
|
+
|
|
11
|
+
A single WebUI agent turn is represented by several overlapping state layers:
|
|
12
|
+
|
|
13
|
+
- the visible transcript the user can read,
|
|
14
|
+
- the model context / `context_messages` the agent actually receives,
|
|
15
|
+
- `pending_user_message` and active stream metadata,
|
|
16
|
+
- live SSE events and in-memory stream state,
|
|
17
|
+
- durable run journal / replay state,
|
|
18
|
+
- automatic compression summaries and active-task handoff text,
|
|
19
|
+
- the browser's live timeline DOM/cache,
|
|
20
|
+
- sidebar ordering, unread state, and `updated_at` metadata.
|
|
21
|
+
|
|
22
|
+
Those layers are not independent. When they drift apart, the user sees failures
|
|
23
|
+
that look unrelated: a prompt is visible but missing from recovered model
|
|
24
|
+
context, a live run loses or reorders thinking/tool cards after switching
|
|
25
|
+
sessions, cleanup makes old sessions look newly active, replay duplicates content,
|
|
26
|
+
or automatic compression reference material appears inside the active turn.
|
|
27
|
+
|
|
28
|
+
This RFC defines a consistency contract for those layers. It complements the
|
|
29
|
+
larger run adapter direction in #1925 by documenting what must remain coherent
|
|
30
|
+
while WebUI still has multiple overlapping state stores.
|
|
31
|
+
|
|
32
|
+
## Goals
|
|
33
|
+
|
|
34
|
+
- Define the state layers involved in active and recovered WebUI turns.
|
|
35
|
+
- Make the source-of-truth expectations explicit for each layer.
|
|
36
|
+
- Give reviewers a checklist for streaming, replay, compression, recovery,
|
|
37
|
+
model-context, and sidebar changes.
|
|
38
|
+
- Map recent real issues to reusable invariants so future fixes do not solve the
|
|
39
|
+
same class of bug one symptom at a time.
|
|
40
|
+
|
|
41
|
+
## Non-goals
|
|
42
|
+
|
|
43
|
+
- Do not implement a runner process, sidecar, or new runtime boundary here.
|
|
44
|
+
- Do not replace #1925 or the run adapter contract.
|
|
45
|
+
- Do not rewrite the streaming protocol in this RFC.
|
|
46
|
+
- Do not reopen already-fixed narrow bugs.
|
|
47
|
+
- Do not make this a catch-all for unrelated UI polish.
|
|
48
|
+
|
|
49
|
+
## State Layers
|
|
50
|
+
|
|
51
|
+
| Layer | Purpose | Source-of-truth expectation | Must not do |
|
|
52
|
+
|---|---|---|---|
|
|
53
|
+
| Visible transcript | Shows what the user and assistant said | Session transcript plus live replay should produce one chronological user-visible story | Hide the user turn that started active work, or show internal recovery text as current user intent |
|
|
54
|
+
| Model context / `context_messages` | Supplies conversation state to the agent | Must include the current visible user turn unless deliberately excluded with a user-visible reason | Let the agent resume from context that contradicts what the user can see |
|
|
55
|
+
| Pending turn metadata | Bridges submitted-but-not-yet-finalized user input | Must identify the user turn and stream that own active work | Become a permanent duplicate transcript row after recovery |
|
|
56
|
+
| Live stream / SSE | Delivers active runtime events to the browser | Must remain an observation path, not the only durable truth for already-emitted events | Lose the visible scene on refresh, reconnect, or session switch |
|
|
57
|
+
| Run journal / replay | Rebuilds emitted runtime events after reconnect or restart | Must be cursor-safe and idempotent | Duplicate assistant text, thinking text, tool cards, or compression cards |
|
|
58
|
+
| Compression summary / handoff | Gives the agent recovery context after automatic compression | Must remain agent-facing recovery material unless explicitly rendered as history | Pollute the active turn or become implicit current user intent |
|
|
59
|
+
| Live UI scene/cache | Preserves expanded rows, in-progress cards, local scroll, and transient grouping | May optimize presentation but must be rebuildable or degradable from transcript/replay | Become the only place where chronological ordering exists |
|
|
60
|
+
| Sidebar/session metadata | Helps the user find active and recent sessions | Must reflect meaningful user or assistant activity | Treat background cleanup as a fresh user-facing update |
|
|
61
|
+
|
|
62
|
+
## Core Invariants
|
|
63
|
+
|
|
64
|
+
1. **Visible current turns enter model context.** If the user can see a current
|
|
65
|
+
prompt and WebUI asks the model to continue that work, the prompt must be in
|
|
66
|
+
the reconstructed model context unless WebUI shows an explicit reason it was
|
|
67
|
+
excluded.
|
|
68
|
+
2. **Active turn UI keeps its owner.** The user turn that started active work
|
|
69
|
+
must remain visible before assistant text, thinking cards, tool cards, or
|
|
70
|
+
activity groups that belong to that work.
|
|
71
|
+
3. **Reattach preserves order or degrades clearly.** Refresh, reconnect, and
|
|
72
|
+
session switch must preserve chronological live-scene order. If WebUI cannot
|
|
73
|
+
restore the exact live scene, it should downgrade to an explicit structured
|
|
74
|
+
replay state instead of silently reordering content.
|
|
75
|
+
4. **Maintenance is not activity.** Runtime maintenance such as stale-stream
|
|
76
|
+
cleanup, orphan repair, or background compression must not refresh sidebar
|
|
77
|
+
ordering, unread markers, or active-session affordances as if the user or
|
|
78
|
+
assistant just acted.
|
|
79
|
+
5. **Replay is idempotent.** Replaying a run from a cursor must not duplicate
|
|
80
|
+
transcript rows, thinking content, interim assistant text, tool cards, or
|
|
81
|
+
compression cards. Replayed long-task events should enter the same
|
|
82
|
+
browser-facing timeline renderer as live SSE events so recovery does not
|
|
83
|
+
downgrade a structured Thinking / progress / tool / compression turn into a
|
|
84
|
+
separate flattened presentation.
|
|
85
|
+
Visible interim assistant progress must remain visible timeline content; a
|
|
86
|
+
compact Activity disclosure may summarize adjacent tool/debug detail, but it
|
|
87
|
+
must not be the only place where the user can see emitted progress text.
|
|
88
|
+
6. **Compression is not current intent.** Automatic compression summaries and
|
|
89
|
+
reference cards are recovery/handoff material. They must not be treated as a
|
|
90
|
+
new user request, active-turn content, or the default visible explanation for
|
|
91
|
+
the current answer.
|
|
92
|
+
7. **Observation has a degraded path.** Long-running or many-session observation
|
|
93
|
+
should expose enough heartbeat/degraded status that the UI does not appear
|
|
94
|
+
silent and ordinary APIs do not stall behind active streams.
|
|
95
|
+
8. **Every mutation names its layer.** A PR touching streaming, recovery,
|
|
96
|
+
context reconstruction, compression, replay, or sidebar metadata should state
|
|
97
|
+
which layer it changes and what regression proves the invariant still holds.
|
|
98
|
+
|
|
99
|
+
## Review Checklist
|
|
100
|
+
|
|
101
|
+
Use this checklist for PRs that touch run state, streaming, replay, compression,
|
|
102
|
+
context reconstruction, or session metadata:
|
|
103
|
+
|
|
104
|
+
- Which state layers does this PR read or write?
|
|
105
|
+
- Which layer is the source of truth after this change?
|
|
106
|
+
- Can the visible transcript and model context diverge? If yes, is that
|
|
107
|
+
deliberate and user-visible?
|
|
108
|
+
- What happens after browser refresh, session switch, SSE reconnect, and WebUI
|
|
109
|
+
restart?
|
|
110
|
+
- Does replay rebuild the same scene without duplicates?
|
|
111
|
+
- Does replay use the same timeline-rendering path as live SSE for thinking,
|
|
112
|
+
interim assistant text, tool cards, compression cards, and terminal states?
|
|
113
|
+
- Can this change move a session in the sidebar without meaningful user or
|
|
114
|
+
assistant activity?
|
|
115
|
+
- Can automatic compression or recovery text become visible active-turn content?
|
|
116
|
+
- What test or manual evidence proves the invariant?
|
|
117
|
+
|
|
118
|
+
## Existing Issue Map
|
|
119
|
+
|
|
120
|
+
| Example | State boundary exposed | Relevant invariant |
|
|
121
|
+
|---|---|---|
|
|
122
|
+
| [#2341](https://github.com/nesquena/hermes-webui/issues/2341) / [#2342](https://github.com/nesquena/hermes-webui/pull/2342) | Active reattach could show agent activity without the pending user turn that started it | 2 |
|
|
123
|
+
| [#2344](https://github.com/nesquena/hermes-webui/issues/2344) / [#2347](https://github.com/nesquena/hermes-webui/pull/2347) | Session switching could lose or reorder the live thinking/tool/interim timeline | 3, 5 |
|
|
124
|
+
| [#2345](https://github.com/nesquena/hermes-webui/issues/2345) / [#2349](https://github.com/nesquena/hermes-webui/pull/2349) | Stale stream cleanup could mutate `updated_at` and resurface old sessions | 4 |
|
|
125
|
+
| [#2346](https://github.com/nesquena/hermes-webui/issues/2346) / [#2348](https://github.com/nesquena/hermes-webui/pull/2348) | Thinking cards could repeat interim assistant progress text | 5 |
|
|
126
|
+
| [#2353](https://github.com/nesquena/hermes-webui/issues/2353) / [#2354](https://github.com/nesquena/hermes-webui/pull/2354) | Recovered pending user turns could be visible but missing from model context | 1 |
|
|
127
|
+
| [#2355](https://github.com/nesquena/hermes-webui/issues/2355) / [#2357](https://github.com/nesquena/hermes-webui/pull/2357) | Auto-compression rotation could leave reference-only cards in the active conversation tail | 3, 6 |
|
|
128
|
+
| [#2308](https://github.com/nesquena/hermes-webui/issues/2308) / [#2309](https://github.com/nesquena/hermes-webui/pull/2309) | Compressed sessions could resume stale agent tasks when the user starts an ordinary fresh chat | 6 |
|
|
129
|
+
| [#2283](https://github.com/nesquena/hermes-webui/pull/2283) | Run event journal replay provides the foundation for ordered recovery | 5 |
|
|
130
|
+
|
|
131
|
+
These references are evidence for the contract. This RFC does not make the
|
|
132
|
+
linked implementation PRs dependent on this document, and it does not close the
|
|
133
|
+
tracking issue by itself.
|
|
134
|
+
|
|
135
|
+
## Relationship To The Run Adapter RFC
|
|
136
|
+
|
|
137
|
+
The run adapter RFC defines the longer-term event/control boundary for WebUI and
|
|
138
|
+
Hermes runtime ownership. This RFC defines the consistency rules that the current
|
|
139
|
+
WebUI and any future adapter-backed implementation must preserve.
|
|
140
|
+
|
|
141
|
+
The two documents should be read together:
|
|
142
|
+
|
|
143
|
+
- The adapter contract answers: "Where should execution ownership live?"
|
|
144
|
+
- This consistency contract answers: "How do transcript, context, streams,
|
|
145
|
+
replay, compression, and UI metadata stay coherent while execution is active
|
|
146
|
+
or being recovered?"
|
|
147
|
+
|
|
148
|
+
## Rollout Plan
|
|
149
|
+
|
|
150
|
+
1. Land this RFC as a reviewable draft and refine it through PR discussion.
|
|
151
|
+
2. Link future streaming/recovery/compression/sidebar PRs back to the invariant
|
|
152
|
+
they intentionally preserve or change.
|
|
153
|
+
3. Convert recurring checklist items into focused regression tests where
|
|
154
|
+
practical.
|
|
155
|
+
4. If #1925 introduces a new adapter-backed runtime layer, update this RFC or
|
|
156
|
+
replace it with the accepted implementation contract so these invariants do
|
|
157
|
+
not live only in historical discussion.
|
|
@@ -0,0 +1,280 @@
|
|
|
1
|
+
# Running Hermes Web UI under a process supervisor
|
|
2
|
+
|
|
3
|
+
Use a process supervisor (launchd, systemd, supervisord, runit, s6) when you
|
|
4
|
+
want the Web UI to start at boot, restart on crash, or be managed alongside
|
|
5
|
+
other services.
|
|
6
|
+
|
|
7
|
+
## TL;DR
|
|
8
|
+
|
|
9
|
+
Pass ``--foreground`` to ``bootstrap.py`` (or ``bash start.sh``):
|
|
10
|
+
|
|
11
|
+
```bash
|
|
12
|
+
bash start.sh --foreground
|
|
13
|
+
```
|
|
14
|
+
|
|
15
|
+
Or set ``HERMES_WEBUI_FOREGROUND=1`` in the environment. The Web UI will
|
|
16
|
+
auto-detect launchd / systemd / supervisord even without the flag, but being
|
|
17
|
+
explicit is safer.
|
|
18
|
+
|
|
19
|
+
**Important (launchd on macOS):** if the ``com.parantoux.hermes-webui`` LaunchAgent is enabled, treat launchd as the single source of truth for WebUI lifecycle. Do **not** also run ``./ctl.sh start``, ``bash start.sh``, ``python bootstrap.py``, or ``python server.py`` against the same state dir/port, or you can create a second WebUI instance and trigger port-8787 restart churn.
|
|
20
|
+
|
|
21
|
+
## Why ``--foreground`` matters
|
|
22
|
+
|
|
23
|
+
Without it, ``bootstrap.py`` does this:
|
|
24
|
+
|
|
25
|
+
1. Spawn ``server.py`` as a detached subprocess (``start_new_session=True``)
|
|
26
|
+
2. Probe ``/health`` until the server is up
|
|
27
|
+
3. Exit 0
|
|
28
|
+
|
|
29
|
+
That works for an interactive shell run (``./start.sh`` returns to your
|
|
30
|
+
prompt with the server alive in the background). It is **broken** under any
|
|
31
|
+
process supervisor: the supervisor sees its tracked PID exit, marks the job
|
|
32
|
+
as completed, and respawns ``bootstrap.py``. The respawn fails to bind port
|
|
33
|
+
8787 (the orphaned server still has it), exits non-zero, supervisor
|
|
34
|
+
respawns again — loop.
|
|
35
|
+
|
|
36
|
+
In foreground mode, ``bootstrap.py`` does its setup work and then calls
|
|
37
|
+
``os.execv`` to replace its own process with ``server.py``. The supervisor
|
|
38
|
+
sees the long-lived server as the original child. ``KeepAlive=true`` /
|
|
39
|
+
``Restart=always`` work correctly.
|
|
40
|
+
|
|
41
|
+
## launchd (macOS)
|
|
42
|
+
|
|
43
|
+
``~/Library/LaunchAgents/com.example.hermes-webui.plist``:
|
|
44
|
+
|
|
45
|
+
```xml
|
|
46
|
+
<?xml version="1.0" encoding="UTF-8"?>
|
|
47
|
+
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
|
|
48
|
+
<plist version="1.0">
|
|
49
|
+
<dict>
|
|
50
|
+
<key>Label</key>
|
|
51
|
+
<string>com.example.hermes-webui</string>
|
|
52
|
+
|
|
53
|
+
<key>ProgramArguments</key>
|
|
54
|
+
<array>
|
|
55
|
+
<string>/bin/bash</string>
|
|
56
|
+
<string>/Users/yourname/hermes-webui/start.sh</string>
|
|
57
|
+
<string>--foreground</string>
|
|
58
|
+
</array>
|
|
59
|
+
|
|
60
|
+
<key>WorkingDirectory</key>
|
|
61
|
+
<string>/Users/yourname/hermes-webui</string>
|
|
62
|
+
|
|
63
|
+
<key>RunAtLoad</key>
|
|
64
|
+
<true/>
|
|
65
|
+
|
|
66
|
+
<key>KeepAlive</key>
|
|
67
|
+
<true/>
|
|
68
|
+
|
|
69
|
+
<key>StandardOutPath</key>
|
|
70
|
+
<string>/Users/yourname/.hermes/webui/launchd-stdout.log</string>
|
|
71
|
+
|
|
72
|
+
<key>StandardErrorPath</key>
|
|
73
|
+
<string>/Users/yourname/.hermes/webui/launchd-stderr.log</string>
|
|
74
|
+
|
|
75
|
+
<key>EnvironmentVariables</key>
|
|
76
|
+
<dict>
|
|
77
|
+
<key>HOME</key>
|
|
78
|
+
<string>/Users/yourname</string>
|
|
79
|
+
<key>PATH</key>
|
|
80
|
+
<string>/usr/local/bin:/usr/bin:/bin</string>
|
|
81
|
+
</dict>
|
|
82
|
+
</dict>
|
|
83
|
+
</plist>
|
|
84
|
+
```
|
|
85
|
+
|
|
86
|
+
Load:
|
|
87
|
+
|
|
88
|
+
```bash
|
|
89
|
+
launchctl load ~/Library/LaunchAgents/com.example.hermes-webui.plist
|
|
90
|
+
launchctl print gui/$(id -u)/com.example.hermes-webui # check state
|
|
91
|
+
```
|
|
92
|
+
|
|
93
|
+
Reload after editing the plist:
|
|
94
|
+
|
|
95
|
+
```bash
|
|
96
|
+
launchctl unload ~/Library/LaunchAgents/com.example.hermes-webui.plist
|
|
97
|
+
launchctl load ~/Library/LaunchAgents/com.example.hermes-webui.plist
|
|
98
|
+
```
|
|
99
|
+
|
|
100
|
+
launchd sets ``XPC_SERVICE_NAME`` automatically, so even without the
|
|
101
|
+
``--foreground`` argument the Web UI will auto-promote to foreground mode.
|
|
102
|
+
The flag is still recommended as documentation of intent.
|
|
103
|
+
|
|
104
|
+
## systemd (Linux)
|
|
105
|
+
|
|
106
|
+
``~/.config/systemd/user/hermes-webui.service``:
|
|
107
|
+
|
|
108
|
+
```ini
|
|
109
|
+
[Unit]
|
|
110
|
+
Description=Hermes Web UI
|
|
111
|
+
After=network.target
|
|
112
|
+
|
|
113
|
+
[Service]
|
|
114
|
+
Type=simple
|
|
115
|
+
WorkingDirectory=%h/hermes-webui
|
|
116
|
+
ExecStart=/bin/bash %h/hermes-webui/start.sh --foreground
|
|
117
|
+
Restart=on-failure
|
|
118
|
+
RestartSec=5
|
|
119
|
+
|
|
120
|
+
# Optional: route stdout/stderr to journald instead of files
|
|
121
|
+
StandardOutput=journal
|
|
122
|
+
StandardError=journal
|
|
123
|
+
|
|
124
|
+
[Install]
|
|
125
|
+
WantedBy=default.target
|
|
126
|
+
```
|
|
127
|
+
|
|
128
|
+
Enable + start:
|
|
129
|
+
|
|
130
|
+
```bash
|
|
131
|
+
systemctl --user daemon-reload
|
|
132
|
+
systemctl --user enable --now hermes-webui.service
|
|
133
|
+
journalctl --user -u hermes-webui.service -f
|
|
134
|
+
```
|
|
135
|
+
|
|
136
|
+
systemd sets ``INVOCATION_ID`` and ``JOURNAL_STREAM`` (when stdio is wired to
|
|
137
|
+
the journal), both of which auto-promote to foreground mode.
|
|
138
|
+
|
|
139
|
+
## supervisord (cross-platform)
|
|
140
|
+
|
|
141
|
+
``/etc/supervisor/conf.d/hermes-webui.conf``:
|
|
142
|
+
|
|
143
|
+
```ini
|
|
144
|
+
[program:hermes-webui]
|
|
145
|
+
command=/bin/bash /home/youruser/hermes-webui/start.sh --foreground
|
|
146
|
+
directory=/home/youruser/hermes-webui
|
|
147
|
+
user=youruser
|
|
148
|
+
autostart=true
|
|
149
|
+
autorestart=true
|
|
150
|
+
stopsignal=TERM
|
|
151
|
+
stopwaitsecs=10
|
|
152
|
+
stdout_logfile=/var/log/hermes-webui.out.log
|
|
153
|
+
stderr_logfile=/var/log/hermes-webui.err.log
|
|
154
|
+
environment=HOME="/home/youruser",PATH="/usr/local/bin:/usr/bin:/bin"
|
|
155
|
+
```
|
|
156
|
+
|
|
157
|
+
Reload + start:
|
|
158
|
+
|
|
159
|
+
```bash
|
|
160
|
+
sudo supervisorctl reread
|
|
161
|
+
sudo supervisorctl update
|
|
162
|
+
sudo supervisorctl status hermes-webui
|
|
163
|
+
```
|
|
164
|
+
|
|
165
|
+
supervisord sets ``SUPERVISOR_ENABLED``, which auto-promotes to foreground
|
|
166
|
+
mode.
|
|
167
|
+
|
|
168
|
+
## Auto-detected env vars (full list)
|
|
169
|
+
|
|
170
|
+
These trigger ``--foreground`` behavior even when the flag is not passed:
|
|
171
|
+
|
|
172
|
+
| Env var | Set by | Notes |
|
|
173
|
+
|---|---|---|
|
|
174
|
+
| ``INVOCATION_ID`` | systemd | Set on every service activation |
|
|
175
|
+
| ``JOURNAL_STREAM`` | systemd | Set when stdio is wired to journald |
|
|
176
|
+
| ``NOTIFY_SOCKET`` | systemd ``Type=notify`` / s6 | sd_notify-style notification socket |
|
|
177
|
+
| ``XPC_SERVICE_NAME`` | launchd | Set to the plist Label — narrowed to ``com.<rdns>.<svc>`` form (see below) |
|
|
178
|
+
| ``SUPERVISOR_ENABLED`` | supervisord | Always set under supervisord |
|
|
179
|
+
| ``HERMES_WEBUI_FOREGROUND`` | you | Explicit opt-in; accepts ``1`` / ``true`` / ``yes`` / ``on`` |
|
|
180
|
+
|
|
181
|
+
### XPC_SERVICE_NAME noise filter
|
|
182
|
+
|
|
183
|
+
macOS launchd sets ``XPC_SERVICE_NAME`` in **every Terminal-spawned shell**,
|
|
184
|
+
not just real services. Typical noise values:
|
|
185
|
+
|
|
186
|
+
- ``0`` — set on launchd descendants generally
|
|
187
|
+
- ``application.com.apple.Terminal.<UUID>`` — Terminal.app shells
|
|
188
|
+
- ``application.com.googlecode.iterm2`` — iTerm2
|
|
189
|
+
- ``application.com.microsoft.VSCode`` — VSCode integrated terminal
|
|
190
|
+
|
|
191
|
+
A bare existence check on this var would auto-promote interactive
|
|
192
|
+
``./start.sh`` runs to foreground mode on every Mac dev machine, breaking
|
|
193
|
+
the most common installation path. We narrow detection to launchd
|
|
194
|
+
**Label-style** names (typically reverse-DNS like ``com.example.foo``).
|
|
195
|
+
Real launchd plists always use this form. If you ever see
|
|
196
|
+
``XPC_SERVICE_NAME=0`` in your service environment, the auto-detect will
|
|
197
|
+
ignore it — set ``HERMES_WEBUI_FOREGROUND=1`` or pass ``--foreground``
|
|
198
|
+
explicitly to be safe.
|
|
199
|
+
|
|
200
|
+
### Supervisors that are NOT auto-detected
|
|
201
|
+
|
|
202
|
+
The following set no env var that we can reliably detect. Pass
|
|
203
|
+
``--foreground`` (or ``HERMES_WEBUI_FOREGROUND=1``) explicitly:
|
|
204
|
+
|
|
205
|
+
- **runit** (without sd_notify) — pure runit chains
|
|
206
|
+
- **daemontools** / ``svc``
|
|
207
|
+
- **PM2** (Node.js process manager occasionally repurposed for Python)
|
|
208
|
+
- **Foreman** / **Honcho** (Procfile-style)
|
|
209
|
+
- **Docker** with a custom CMD entrypoint that doesn't already use ``exec``
|
|
210
|
+
- **Custom shell-script supervisors** that fork-and-wait
|
|
211
|
+
|
|
212
|
+
If your supervisor isn't in the auto-detect list and you see the orphan-PID
|
|
213
|
+
respawn loop, set ``HERMES_WEBUI_FOREGROUND=1`` in the service environment.
|
|
214
|
+
|
|
215
|
+
## Diagnostic recipe
|
|
216
|
+
|
|
217
|
+
If the Web UI keeps getting respawned and you suspect the double-fork loop:
|
|
218
|
+
|
|
219
|
+
```bash
|
|
220
|
+
# Check the running PID for the server
|
|
221
|
+
lsof -iTCP:8787 -sTCP:LISTEN
|
|
222
|
+
|
|
223
|
+
# Get its parent — should be the supervisor itself, NOT init (PID 1)
|
|
224
|
+
PID=$(lsof -tiTCP:8787 -sTCP:LISTEN)
|
|
225
|
+
ps -p "$PID" -o pid,ppid,cmd
|
|
226
|
+
ps -p "$(ps -o ppid= -p "$PID" | tr -d ' ')" -o pid,cmd
|
|
227
|
+
```
|
|
228
|
+
|
|
229
|
+
A healthy foreground-mode setup looks like:
|
|
230
|
+
|
|
231
|
+
```
|
|
232
|
+
PID PPID CMD
|
|
233
|
+
12345 6789 /path/to/python /path/to/server.py
|
|
234
|
+
6789 1 /sbin/launchd # or /usr/lib/systemd/systemd, etc.
|
|
235
|
+
```
|
|
236
|
+
|
|
237
|
+
If PPID is ``1`` (init) when it should be the supervisor, the orphan-server
|
|
238
|
+
loop is happening — re-check that ``--foreground`` (or one of the env vars)
|
|
239
|
+
is reaching the process.
|
|
240
|
+
|
|
241
|
+
## HTTP watchdog / deep health
|
|
242
|
+
|
|
243
|
+
``KeepAlive`` / ``Restart=always`` only recover a process that exits. If the
|
|
244
|
+
process is still listening on the port but request handling is wedged, pair your
|
|
245
|
+
supervisor with an HTTP probe and force a restart when the probe fails.
|
|
246
|
+
|
|
247
|
+
Hermes Web UI exposes two health levels:
|
|
248
|
+
|
|
249
|
+
- ``/health`` — cheap liveness probe with ``active_streams``, uptime, and an
|
|
250
|
+
``accept_loop`` heartbeat counter.
|
|
251
|
+
- ``/health?deep=1`` — readiness probe that briefly acquires the stream lock,
|
|
252
|
+
reads the sidebar/session path, reads projects state, and touches Hermes
|
|
253
|
+
``state.db`` if it exists. Use this for watchdogs.
|
|
254
|
+
|
|
255
|
+
At startup the server also tries to raise its file-descriptor soft limit to
|
|
256
|
+
4096 on platforms that support ``RLIMIT_NOFILE``. That is defense in depth for
|
|
257
|
+
persistent hosts: leaks should still be fixed, but a higher soft limit gives
|
|
258
|
+
you more diagnostic headroom before request handling falls over.
|
|
259
|
+
|
|
260
|
+
Minimal macOS launchd watchdog script:
|
|
261
|
+
|
|
262
|
+
```bash
|
|
263
|
+
#!/usr/bin/env bash
|
|
264
|
+
set -euo pipefail
|
|
265
|
+
LABEL="com.example.hermes-webui"
|
|
266
|
+
BASE="http://127.0.0.1:8787"
|
|
267
|
+
|
|
268
|
+
if ! curl -fsS --max-time 10 "$BASE/health?deep=1" >/dev/null; then
|
|
269
|
+
launchctl kickstart -k "gui/$(id -u)/$LABEL"
|
|
270
|
+
fi
|
|
271
|
+
```
|
|
272
|
+
|
|
273
|
+
Run it every few minutes from a separate ``StartInterval`` LaunchAgent. For
|
|
274
|
+
systemd, prefer a timer/service pair that runs the same curl probe and
|
|
275
|
+
``systemctl --user restart hermes-webui.service`` on failure.
|
|
276
|
+
|
|
277
|
+
The ``accept_loop.requests_total`` value should increase when probes arrive. If
|
|
278
|
+
it stays flat while the process is still alive, the server accept loop is not
|
|
279
|
+
making progress; capture logs/thread samples before restarting if you are
|
|
280
|
+
collecting diagnostics for a bug report.
|