@bitseek/hermes-webui 0.1.0-beta.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (233)
  1. package/README.md +213 -0
  2. package/bin/hermes-webui.mjs +588 -0
  3. package/package.json +25 -0
  4. package/scripts/sync-vendor.mjs +74 -0
  5. package/templates/launchd/com.bitseek.hermes-webui.plist +21 -0
  6. package/templates/systemd/hermes-webui.service +13 -0
  7. package/templates/windows/hermes-webui-task.ps1 +3 -0
  8. package/vendor/agent-frontend-shell/.bitseek-source.json +6 -0
  9. package/vendor/agent-frontend-shell/.dockerignore +7 -0
  10. package/vendor/agent-frontend-shell/.env.docker.example +89 -0
  11. package/vendor/agent-frontend-shell/.env.example +34 -0
  12. package/vendor/agent-frontend-shell/.github/FUNDING.yml +3 -0
  13. package/vendor/agent-frontend-shell/.github/workflows/browser-smoke.yml +42 -0
  14. package/vendor/agent-frontend-shell/.github/workflows/docker-smoke.yml +233 -0
  15. package/vendor/agent-frontend-shell/.github/workflows/native-windows-startup.yml +132 -0
  16. package/vendor/agent-frontend-shell/.github/workflows/release.yml +57 -0
  17. package/vendor/agent-frontend-shell/.github/workflows/tests.yml +88 -0
  18. package/vendor/agent-frontend-shell/.vscode/launch.json +59 -0
  19. package/vendor/agent-frontend-shell/.vscode/settings.json +13 -0
  20. package/vendor/agent-frontend-shell/AGENTS.md +80 -0
  21. package/vendor/agent-frontend-shell/ARCHITECTURE.md +1658 -0
  22. package/vendor/agent-frontend-shell/BUGS.md +52 -0
  23. package/vendor/agent-frontend-shell/CHANGELOG.md +7295 -0
  24. package/vendor/agent-frontend-shell/CONTRIBUTING.md +205 -0
  25. package/vendor/agent-frontend-shell/CONTRIBUTORS.md +107 -0
  26. package/vendor/agent-frontend-shell/DESIGN.md +173 -0
  27. package/vendor/agent-frontend-shell/Dockerfile +91 -0
  28. package/vendor/agent-frontend-shell/LICENSE +21 -0
  29. package/vendor/agent-frontend-shell/README-CUSTOM.md +76 -0
  30. package/vendor/agent-frontend-shell/README.md +705 -0
  31. package/vendor/agent-frontend-shell/ROADMAP.md +351 -0
  32. package/vendor/agent-frontend-shell/SPRINTS.md +147 -0
  33. package/vendor/agent-frontend-shell/TESTING.md +1932 -0
  34. package/vendor/agent-frontend-shell/THEMES.md +170 -0
  35. package/vendor/agent-frontend-shell/api/__init__.py +1 -0
  36. package/vendor/agent-frontend-shell/api/agent_health.py +392 -0
  37. package/vendor/agent-frontend-shell/api/agent_sessions.py +782 -0
  38. package/vendor/agent-frontend-shell/api/auth.py +592 -0
  39. package/vendor/agent-frontend-shell/api/background.py +87 -0
  40. package/vendor/agent-frontend-shell/api/clarify.py +238 -0
  41. package/vendor/agent-frontend-shell/api/commands.py +124 -0
  42. package/vendor/agent-frontend-shell/api/compression_anchor.py +134 -0
  43. package/vendor/agent-frontend-shell/api/config.py +5178 -0
  44. package/vendor/agent-frontend-shell/api/dashboard_probe.py +255 -0
  45. package/vendor/agent-frontend-shell/api/extensions.py +253 -0
  46. package/vendor/agent-frontend-shell/api/gateway_chat.py +435 -0
  47. package/vendor/agent-frontend-shell/api/gateway_watcher.py +230 -0
  48. package/vendor/agent-frontend-shell/api/goals.py +608 -0
  49. package/vendor/agent-frontend-shell/api/helpers.py +474 -0
  50. package/vendor/agent-frontend-shell/api/kanban_bridge.py +1255 -0
  51. package/vendor/agent-frontend-shell/api/metering.py +194 -0
  52. package/vendor/agent-frontend-shell/api/models.py +4210 -0
  53. package/vendor/agent-frontend-shell/api/oauth.py +770 -0
  54. package/vendor/agent-frontend-shell/api/onboarding.py +1046 -0
  55. package/vendor/agent-frontend-shell/api/passkeys.py +365 -0
  56. package/vendor/agent-frontend-shell/api/profiles.py +1499 -0
  57. package/vendor/agent-frontend-shell/api/providers.py +2175 -0
  58. package/vendor/agent-frontend-shell/api/request_diagnostics.py +160 -0
  59. package/vendor/agent-frontend-shell/api/rollback.py +320 -0
  60. package/vendor/agent-frontend-shell/api/routes.py +13990 -0
  61. package/vendor/agent-frontend-shell/api/run_journal.py +284 -0
  62. package/vendor/agent-frontend-shell/api/runner_client.py +156 -0
  63. package/vendor/agent-frontend-shell/api/runtime_adapter.py +431 -0
  64. package/vendor/agent-frontend-shell/api/session_discoverability.py +640 -0
  65. package/vendor/agent-frontend-shell/api/session_events.py +45 -0
  66. package/vendor/agent-frontend-shell/api/session_lifecycle.py +208 -0
  67. package/vendor/agent-frontend-shell/api/session_ops.py +207 -0
  68. package/vendor/agent-frontend-shell/api/session_recovery.py +655 -0
  69. package/vendor/agent-frontend-shell/api/skill_usage.py +32 -0
  70. package/vendor/agent-frontend-shell/api/startup.py +128 -0
  71. package/vendor/agent-frontend-shell/api/state_sync.py +187 -0
  72. package/vendor/agent-frontend-shell/api/streaming.py +7048 -0
  73. package/vendor/agent-frontend-shell/api/system_health.py +167 -0
  74. package/vendor/agent-frontend-shell/api/terminal.py +410 -0
  75. package/vendor/agent-frontend-shell/api/turn_journal.py +214 -0
  76. package/vendor/agent-frontend-shell/api/updates.py +1261 -0
  77. package/vendor/agent-frontend-shell/api/upload.py +322 -0
  78. package/vendor/agent-frontend-shell/api/usage.py +26 -0
  79. package/vendor/agent-frontend-shell/api/workspace.py +867 -0
  80. package/vendor/agent-frontend-shell/api/workspace_git.py +1261 -0
  81. package/vendor/agent-frontend-shell/api/worktrees.py +357 -0
  82. package/vendor/agent-frontend-shell/bootstrap.py +492 -0
  83. package/vendor/agent-frontend-shell/ctl.sh +427 -0
  84. package/vendor/agent-frontend-shell/docker-compose.custom.yml +26 -0
  85. package/vendor/agent-frontend-shell/docker-compose.three-container.yml +168 -0
  86. package/vendor/agent-frontend-shell/docker-compose.two-container.yml +147 -0
  87. package/vendor/agent-frontend-shell/docker-compose.yml +57 -0
  88. package/vendor/agent-frontend-shell/docker_init.bash +459 -0
  89. package/vendor/agent-frontend-shell/docs/CONTRACTS.md +207 -0
  90. package/vendor/agent-frontend-shell/docs/EXTENSIONS.md +212 -0
  91. package/vendor/agent-frontend-shell/docs/ISSUES.md +23 -0
  92. package/vendor/agent-frontend-shell/docs/UIUX-GUIDE.md +196 -0
  93. package/vendor/agent-frontend-shell/docs/advanced-chat-setup.md +83 -0
  94. package/vendor/agent-frontend-shell/docs/docker.md +337 -0
  95. package/vendor/agent-frontend-shell/docs/onboarding-agent-checklist.md +207 -0
  96. package/vendor/agent-frontend-shell/docs/onboarding.md +202 -0
  97. package/vendor/agent-frontend-shell/docs/remote-access.md +75 -0
  98. package/vendor/agent-frontend-shell/docs/rfcs/README.md +53 -0
  99. package/vendor/agent-frontend-shell/docs/rfcs/agent-source-boundary.md +70 -0
  100. package/vendor/agent-frontend-shell/docs/rfcs/canonical-session-resolution.md +124 -0
  101. package/vendor/agent-frontend-shell/docs/rfcs/hermes-run-adapter-contract.md +1079 -0
  102. package/vendor/agent-frontend-shell/docs/rfcs/turn-journal.md +195 -0
  103. package/vendor/agent-frontend-shell/docs/rfcs/webui-run-state-consistency-contract.md +157 -0
  104. package/vendor/agent-frontend-shell/docs/supervisor.md +280 -0
  105. package/vendor/agent-frontend-shell/docs/troubleshooting.md +132 -0
  106. package/vendor/agent-frontend-shell/docs/ui-ux/index.html +863 -0
  107. package/vendor/agent-frontend-shell/docs/ui-ux/two-stage-proposal.html +768 -0
  108. package/vendor/agent-frontend-shell/docs/why-hermes.md +489 -0
  109. package/vendor/agent-frontend-shell/docs/workspace-git.md +92 -0
  110. package/vendor/agent-frontend-shell/docs/wsl-autostart.md +126 -0
  111. package/vendor/agent-frontend-shell/eslint.runtime-guard.config.mjs +35 -0
  112. package/vendor/agent-frontend-shell/extensions/bitseek-design-system.md +330 -0
  113. package/vendor/agent-frontend-shell/extensions/branding/assets/apple-touch-icon.png +0 -0
  114. package/vendor/agent-frontend-shell/extensions/branding/assets/empty-logo.svg +739 -0
  115. package/vendor/agent-frontend-shell/extensions/branding/assets/favicon-192.png +0 -0
  116. package/vendor/agent-frontend-shell/extensions/branding/assets/favicon-32.png +0 -0
  117. package/vendor/agent-frontend-shell/extensions/branding/assets/favicon-512.png +0 -0
  118. package/vendor/agent-frontend-shell/extensions/branding/assets/favicon-512.svg +745 -0
  119. package/vendor/agent-frontend-shell/extensions/branding/assets/favicon.ico +0 -0
  120. package/vendor/agent-frontend-shell/extensions/branding/assets/favicon.svg +745 -0
  121. package/vendor/agent-frontend-shell/extensions/branding/assets/titlebar-icon-v2.svg +751 -0
  122. package/vendor/agent-frontend-shell/extensions/branding/assets/titlebar-icon-v3.svg +739 -0
  123. package/vendor/agent-frontend-shell/extensions/branding/assets/titlebar-icon.svg +745 -0
  124. package/vendor/agent-frontend-shell/extensions/branding/branding.js +112 -0
  125. package/vendor/agent-frontend-shell/extensions/branding/config.json +14 -0
  126. package/vendor/agent-frontend-shell/extensions/branding/manifest.json +53 -0
  127. package/vendor/agent-frontend-shell/extensions/index.js +67 -0
  128. package/vendor/agent-frontend-shell/extensions/loader/hermes-loader.js +77 -0
  129. package/vendor/agent-frontend-shell/extensions/manifest.json +16 -0
  130. package/vendor/agent-frontend-shell/extensions/pages/ai-teammates/page.css +333 -0
  131. package/vendor/agent-frontend-shell/extensions/pages/ai-teammates/page.js +487 -0
  132. package/vendor/agent-frontend-shell/extensions/pages/manifest.json +6 -0
  133. package/vendor/agent-frontend-shell/extensions/pages/registry.css +56 -0
  134. package/vendor/agent-frontend-shell/extensions/pages/registry.js +302 -0
  135. package/vendor/agent-frontend-shell/extensions/themes/bitseek/index.css +93 -0
  136. package/vendor/agent-frontend-shell/extensions/themes/bitseek/index.js +98 -0
  137. package/vendor/agent-frontend-shell/install.sh +63 -0
  138. package/vendor/agent-frontend-shell/mcp_server.py +567 -0
  139. package/vendor/agent-frontend-shell/package.json +12 -0
  140. package/vendor/agent-frontend-shell/pyproject.toml +56 -0
  141. package/vendor/agent-frontend-shell/pytest.ini +3 -0
  142. package/vendor/agent-frontend-shell/requirements.txt +5 -0
  143. package/vendor/agent-frontend-shell/server.py +624 -0
  144. package/vendor/agent-frontend-shell/start.ps1 +210 -0
  145. package/vendor/agent-frontend-shell/start.sh +65 -0
  146. package/vendor/agent-frontend-shell/static/apple-touch-icon.png +0 -0
  147. package/vendor/agent-frontend-shell/static/boot.js +1990 -0
  148. package/vendor/agent-frontend-shell/static/commands.js +1402 -0
  149. package/vendor/agent-frontend-shell/static/favicon-192.png +0 -0
  150. package/vendor/agent-frontend-shell/static/favicon-32.png +0 -0
  151. package/vendor/agent-frontend-shell/static/favicon-512.png +0 -0
  152. package/vendor/agent-frontend-shell/static/favicon-512.svg +18 -0
  153. package/vendor/agent-frontend-shell/static/favicon.ico +0 -0
  154. package/vendor/agent-frontend-shell/static/favicon.svg +20 -0
  155. package/vendor/agent-frontend-shell/static/i18n.js +15389 -0
  156. package/vendor/agent-frontend-shell/static/icons.js +92 -0
  157. package/vendor/agent-frontend-shell/static/index.html +1506 -0
  158. package/vendor/agent-frontend-shell/static/login.js +177 -0
  159. package/vendor/agent-frontend-shell/static/manifest.json +53 -0
  160. package/vendor/agent-frontend-shell/static/messages.js +3521 -0
  161. package/vendor/agent-frontend-shell/static/onboarding.js +800 -0
  162. package/vendor/agent-frontend-shell/static/panels.js +7995 -0
  163. package/vendor/agent-frontend-shell/static/pwa-startup.js +83 -0
  164. package/vendor/agent-frontend-shell/static/sessions.js +5165 -0
  165. package/vendor/agent-frontend-shell/static/style.css +4774 -0
  166. package/vendor/agent-frontend-shell/static/sw.js +173 -0
  167. package/vendor/agent-frontend-shell/static/terminal.js +632 -0
  168. package/vendor/agent-frontend-shell/static/ui.js +8997 -0
  169. package/vendor/agent-frontend-shell/static/vendor/js-yaml/4.1.0/js-yaml.min.js +2 -0
  170. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_AMS-Regular.ttf +0 -0
  171. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_AMS-Regular.woff +0 -0
  172. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_AMS-Regular.woff2 +0 -0
  173. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Caligraphic-Bold.ttf +0 -0
  174. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Caligraphic-Bold.woff +0 -0
  175. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Caligraphic-Bold.woff2 +0 -0
  176. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Caligraphic-Regular.ttf +0 -0
  177. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Caligraphic-Regular.woff +0 -0
  178. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Caligraphic-Regular.woff2 +0 -0
  179. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Fraktur-Bold.ttf +0 -0
  180. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Fraktur-Bold.woff +0 -0
  181. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Fraktur-Bold.woff2 +0 -0
  182. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Fraktur-Regular.ttf +0 -0
  183. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Fraktur-Regular.woff +0 -0
  184. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Fraktur-Regular.woff2 +0 -0
  185. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Main-Bold.ttf +0 -0
  186. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Main-Bold.woff +0 -0
  187. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Main-Bold.woff2 +0 -0
  188. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Main-BoldItalic.ttf +0 -0
  189. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Main-BoldItalic.woff +0 -0
  190. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Main-BoldItalic.woff2 +0 -0
  191. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Main-Italic.ttf +0 -0
  192. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Main-Italic.woff +0 -0
  193. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Main-Italic.woff2 +0 -0
  194. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Main-Regular.ttf +0 -0
  195. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Main-Regular.woff +0 -0
  196. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Main-Regular.woff2 +0 -0
  197. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Math-BoldItalic.ttf +0 -0
  198. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Math-BoldItalic.woff +0 -0
  199. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Math-BoldItalic.woff2 +0 -0
  200. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Math-Italic.ttf +0 -0
  201. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Math-Italic.woff +0 -0
  202. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Math-Italic.woff2 +0 -0
  203. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_SansSerif-Bold.ttf +0 -0
  204. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_SansSerif-Bold.woff +0 -0
  205. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_SansSerif-Bold.woff2 +0 -0
  206. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_SansSerif-Italic.ttf +0 -0
  207. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_SansSerif-Italic.woff +0 -0
  208. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_SansSerif-Italic.woff2 +0 -0
  209. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_SansSerif-Regular.ttf +0 -0
  210. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_SansSerif-Regular.woff +0 -0
  211. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_SansSerif-Regular.woff2 +0 -0
  212. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Script-Regular.ttf +0 -0
  213. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Script-Regular.woff +0 -0
  214. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Script-Regular.woff2 +0 -0
  215. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Size1-Regular.ttf +0 -0
  216. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Size1-Regular.woff +0 -0
  217. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Size1-Regular.woff2 +0 -0
  218. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Size2-Regular.ttf +0 -0
  219. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Size2-Regular.woff +0 -0
  220. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Size2-Regular.woff2 +0 -0
  221. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Size3-Regular.ttf +0 -0
  222. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Size3-Regular.woff +0 -0
  223. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Size3-Regular.woff2 +0 -0
  224. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Size4-Regular.ttf +0 -0
  225. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Size4-Regular.woff +0 -0
  226. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Size4-Regular.woff2 +0 -0
  227. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Typewriter-Regular.ttf +0 -0
  228. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Typewriter-Regular.woff +0 -0
  229. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/fonts/KaTeX_Typewriter-Regular.woff2 +0 -0
  230. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/katex.min.css +1 -0
  231. package/vendor/agent-frontend-shell/static/vendor/katex/0.16.22/katex.min.js +1 -0
  232. package/vendor/agent-frontend-shell/static/vendor/smd.min.js +29 -0
  233. package/vendor/agent-frontend-shell/static/workspace.js +680 -0
@@ -0,0 +1,195 @@
1
+ # RFC: WebUI Turn Journal for Crash-Safe Chat Submissions
2
+
3
+ - **Status:** Proposed
4
+ - **Author:** @ai-ag2026
5
+ - **Created:** 2026-05-11
6
+
7
+ ## Problem
8
+
9
+ A WebUI chat turn crosses several durability boundaries:
10
+
11
+ 1. browser submits a user message,
12
+ 2. WebUI creates or updates session runtime metadata,
13
+ 3. the agent worker starts streaming,
14
+ 4. assistant output is appended,
15
+ 5. the JSON sidecar and derived index are saved.
16
+
17
+ If the server crashes between submission and the final sidecar save, recovery has to infer what happened from `pending_user_message`, `active_stream_id`, `.json.bak`, `_index.json`, and `state.db`. Those safeguards are useful, but they are still reconstructing intent after the fact.
18
+
19
+ The missing primitive is a small write-ahead journal for turns: record the submitted user turn durably before the worker starts, then advance the journal as the turn progresses.
20
+
21
+ ## Goals
22
+
23
+ - Preserve the exact user-submitted turn, including attachments metadata, before any provider or worker work starts.
24
+ - Make crash recovery deterministic: a submitted-but-unfinished turn can be reported or reconstructed without guessing.
25
+ - Keep the journal append/update format simple enough for startup recovery, CLI audit, and future API repair endpoints.
26
+ - Avoid turning recovery into a background daemon. This is storage hygiene, not a long-running service.
27
+
28
+ ## Non-goals
29
+
30
+ - Replacing `state.db.sessions` or WebUI JSON sidecars.
31
+ - Journaling every token or every SSE event.
32
+ - Replaying tool calls or provider streams.
33
+ - Automatically inventing assistant messages after ambiguous crashes.
34
+
35
+ ## Proposed storage
36
+
37
+ Use one JSONL file per session under the existing WebUI state area:
38
+
39
+ ```text
40
+ <SESSION_DIR>/_turn_journal/<session_id>.jsonl
41
+ ```
42
+
43
+ Each line is an immutable event. Recovery can scan by `turn_id` and choose the latest status.
44
+
45
+ ### Event shape
46
+
47
+ ```json
48
+ {
49
+ "version": 1,
50
+ "event": "submitted",
51
+ "turn_id": "20260511T001122Z-abcdef",
52
+ "session_id": "abc123",
53
+ "stream_id": "stream-xyz",
54
+ "created_at": 1778458282.123,
55
+ "role": "user",
56
+ "content": "...",
57
+ "attachments": [],
58
+ "workspace": "/workspace",
59
+ "model": "openai/gpt-5",
60
+ "model_provider": "openai"
61
+ }
62
+ ```
63
+
64
+ Later events for the same `turn_id`:
65
+
66
+ ```json
67
+ {"version":1,"event":"worker_started","turn_id":"...","created_at":1778458283.0}
68
+ {"version":1,"event":"assistant_started","turn_id":"...","created_at":1778458284.0}
69
+ {"version":1,"event":"completed","turn_id":"...","created_at":1778458299.0,"assistant_message_index":12}
70
+ {"version":1,"event":"interrupted","turn_id":"...","created_at":1778458301.0,"reason":"server_startup_recovery"}
71
+ ```
72
+
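A minimal sketch of how an event in the shape above might be built and serialized to one JSONL line. The helper names `make_submitted_event` and `to_jsonl_line` are hypothetical and do not appear in the RFC; only a subset of the fields from the example is filled in.

```python
import json
import time

JOURNAL_VERSION = 1  # mirrors the "version": 1 field in the example event


def make_submitted_event(turn_id, session_id, stream_id, content,
                         attachments=None, now=None):
    """Build a `submitted` event dict (hypothetical helper, subset of fields)."""
    return {
        "version": JOURNAL_VERSION,
        "event": "submitted",
        "turn_id": turn_id,
        "session_id": session_id,
        "stream_id": stream_id,
        "created_at": time.time() if now is None else now,
        "role": "user",
        "content": content,
        "attachments": list(attachments or []),
    }


def to_jsonl_line(event):
    # json.dumps escapes embedded newlines, so each event stays on one line.
    return json.dumps(event, separators=(",", ":")) + "\n"
```

Keeping one compact JSON object per line is what lets a reader split the journal on newlines without a streaming parser.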
73
+ ## Turn state machine
74
+
75
+ ```text
76
+ submitted -> worker_started -> assistant_started -> completed
77
+ submitted -> interrupted
78
+ worker_started -> interrupted
79
+ assistant_started -> interrupted
80
+ ```
81
+
82
+ `completed` is terminal. `interrupted` is terminal unless a later explicit repair creates a new turn. Recovery should not silently resume a provider call.
83
+
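One way to read the state machine above is as a transition table that a reader applies while scanning events in file order. The sketch below assumes a strict reading of the diagram (for example, `worker_started -> completed` is not allowed); the names `ALLOWED` and `derive_turn_states` are illustrative and not part of the RFC.

```python
# Transition table for a strict reading of the diagram above.
ALLOWED = {
    None: {"submitted"},
    "submitted": {"worker_started", "interrupted"},
    "worker_started": {"assistant_started", "interrupted"},
    "assistant_started": {"completed", "interrupted"},
    "completed": set(),    # terminal
    "interrupted": set(),  # terminal
}


def derive_turn_states(events):
    """Return {turn_id: latest valid state}, skipping invalid transitions."""
    states = {}
    for ev in events:
        turn_id = ev.get("turn_id")
        name = ev.get("event")
        if turn_id is None:
            continue
        current = states.get(turn_id)  # None if the turn is not yet known
        if name in ALLOWED.get(current, set()):
            states[turn_id] = name
    return states
```

Skipping an out-of-order event rather than raising is a design choice for this sketch; an audit could instead report such events for manual review.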
84
+ ## Write rules
85
+
86
+ 1. On `/api/chat/start` or equivalent turn-submission path:
87
+ - generate `turn_id`,
88
+ - append `submitted`,
89
+ - fsync the journal file,
90
+ - only then start the worker.
91
+ 2. When the worker thread enters `_run_agent_streaming`, append `worker_started`.
92
+ 3. When assistant output is first persisted or clearly begins, append `assistant_started`.
93
+ 4. After the sidecar save that includes the assistant answer succeeds, append `completed`.
94
+ 5. On cancellation or known worker exception, append `interrupted` with a reason.
95
+
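The append-then-fsync step in rule 1 could look roughly like the following. This is a sketch under assumptions: the extra `session_dir` parameter is added here for self-containment (the RFC names the helper as `append_turn_journal_event(session_id, event)`), and the sketch does not fsync the parent directory after creating a new file.

```python
import json
import os
from pathlib import Path


def append_turn_journal_event(session_dir, session_id, event):
    """Append one event as a JSONL line and fsync before returning."""
    journal_dir = Path(session_dir) / "_turn_journal"
    journal_dir.mkdir(parents=True, exist_ok=True)
    path = journal_dir / f"{session_id}.jsonl"
    line = (json.dumps(event) + "\n").encode("utf-8")
    with open(path, "ab") as f:
        f.write(line)
        f.flush()             # push Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to push the data to stable storage
    return path
```

A caller following the write rules would start the worker only after this function returns for the `submitted` event.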
96
+ ## Synchronous durability design rationale
97
+
98
+ The `submitted` event uses synchronous `fsync` on every write today. This is a deliberate tradeoff between latency and crash-safety guarantees:
99
+
100
+ ### Why synchronous for submitted events
101
+
102
+ The `submitted` event is the durability anchor for the entire recovery story. If the server crashes before the worker starts, the journal must reflect that the user message was received. Async writes risk losing that guarantee: a crash shortly after a non-fsync'd write could leave the journal silent while `pending_user_message` still exists, creating ambiguity during recovery. The current design avoids that ambiguity at the cost of one extra disk round-trip per turn submission.
103
+
104
+ ### Latency expectations by storage type
105
+
106
+ Reported fsync latency varies significantly across storage backends. Approximate qualitative ranges to keep in mind:
107
+
108
+ - **SSD (NVM/NVMe)**: Single-digit milliseconds; p99 typically well under 10 ms on modern hardware. Most turn submissions will see sub-5 ms overhead.
109
+ - **Rotational disk (HDD)**: Seek time dominates; p50 ~5–15 ms, p99 can reach 50–100 ms under load. A busy server with many concurrent submissions may see queueing effects.
110
+ - **Docker/overlay filesystems**: fsync latency depends on the container storage driver and the backing host filesystem. Write-through and copy-on-write semantics can introduce additional overhead; p95 may be 10–50 ms in typical containerized deployments, though exact figures vary by configuration and host load.
111
+
112
+ These ranges are order-of-magnitude guidance, not benchmarks. Exact figures depend on hardware, kernel version, filesystem mount options, and concurrent load. Do not commit specific millisecond claims to documentation without measured evidence.
113
+
114
+ ### Benchmark guidance for maintainers
115
+
116
+ If evidence suggests the synchronous write is a bottleneck, measure before changing anything:
117
+
118
+ 1. Instrument the `append_turn_journal_event` helper to record wall-clock time for each event type (submitted, worker_started, etc.).
119
+ 2. Capture p50/p95/p99 append/fsync latency over a representative workload (e.g., at least 1,000 submitted turns under realistic concurrency).
120
+ 3. Isolate the fsync component: on Linux, use `strace -e fsync` or kernel tracing (`ftrace`, `perf`) to confirm where time is spent.
121
+ 4. Check for patterns: if most submissions are under 5 ms but the p99 is 200 ms due to occasional disk contention, async writes help the tail but not the median. The tradeoff must be evaluated in the context of your recovery guarantees.
122
+
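Steps 1 and 2 above can be approximated with a small standalone probe before instrumenting the real helper. The function names `measure_append_fsync` and `percentiles` are hypothetical, and the probe measures a single-threaded append loop, which is not a substitute for a representative concurrent workload.

```python
import os
import statistics
import time


def measure_append_fsync(path, payload, n=200):
    """Return per-append wall-clock latencies in milliseconds."""
    samples = []
    with open(path, "ab") as f:
        for _ in range(n):
            start = time.perf_counter()
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())
            samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def percentiles(samples_ms):
    # quantiles(n=100) returns 99 cut points; index 49 is the 50th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
```

Running the probe against the same directory and filesystem that holds the journal matters, since the latency in question is a property of that storage path.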
123
+ ### Future follow-up: async lifecycle-event journaling
124
+
125
+ Making journal writes asynchronous is a valid future optimization, but it requires:
126
+
127
+ - A reliable flush strategy (e.g., time-bounded flush every N seconds, flush on session close, flush after K pending events).
128
+ - Recovery logic that handles partial flush windows: if a crash occurs before the flush, the last few submitted events may be missing from the journal. Recovery must account for that ambiguity.
129
+ - Tests that verify flush correctness under crash injection.
130
+
131
+ Async journal writes are **not** part of the initial implementation. They belong in a follow-up RFC once the synchronous baseline is proven stable and the recovery semantics are well-understood.
132
+
133
+ ## Startup recovery semantics
134
+
135
+ On startup, for each journal file:
136
+
137
+ - Latest event is `completed`: no action.
138
+ - Latest event is `submitted` or `worker_started` and no matching user message exists in sidecar:
139
+ - append/recover the user message into the session sidecar with a recovery marker.
140
+ - Latest event is `submitted`, `worker_started`, or `assistant_started` and no completed assistant turn exists:
141
+ - add a visible interruption marker, not a fake assistant answer.
142
+ - Existing `.json.bak` and `state.db` recovery still run first so the sidecar is as complete as possible before journal reconciliation.
143
+
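The per-turn decision described above can be sketched as a pure function. The function name, its boolean inputs, and the action strings are assumptions made for illustration; the RFC does not define them, and it does not say what to do when the latest event is already `interrupted`, so this sketch returns no actions in that case.

```python
def plan_recovery(latest_event, user_message_in_sidecar, assistant_turn_completed):
    """Return the list of recovery actions for one turn (illustrative names)."""
    actions = []
    if latest_event == "completed":
        return actions  # nothing to do
    if latest_event in ("submitted", "worker_started") and not user_message_in_sidecar:
        actions.append("recover_user_message")
    if (latest_event in ("submitted", "worker_started", "assistant_started")
            and not assistant_turn_completed):
        actions.append("add_interruption_marker")
    return actions
```

Keeping the decision free of I/O makes it straightforward to unit-test separately from the sidecar and journal readers.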
144
+ ## Audit additions
145
+
146
+ `audit_session_recovery()` can report:
147
+
148
+ - `turn_journal_pending_turn` — repairable if the user message is absent from sidecar.
149
+ - `turn_journal_interrupted_turn` — ok/warn depending on whether a visible marker exists.
150
+ - `turn_journal_malformed_event` — manual review.
151
+
152
+ Safe repair should only materialize submitted user messages and interruption markers when the journal event content is valid JSON and the target message is absent.
153
+
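A malformed-line-tolerant reader, of the kind the `turn_journal_malformed_event` finding implies, might look like this. The function name `parse_turn_journal` and the choice to treat events without `turn_id` or `event` keys as malformed are assumptions of this sketch.

```python
import json


def parse_turn_journal(lines):
    """Split journal lines into (valid events, malformed line numbers)."""
    events, malformed = [], []
    for lineno, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue  # blank lines are ignored, not reported
        try:
            ev = json.loads(text)
        except json.JSONDecodeError:
            malformed.append(lineno)  # e.g. a line truncated by a crash
            continue
        if not isinstance(ev, dict) or "turn_id" not in ev or "event" not in ev:
            malformed.append(lineno)
            continue
        events.append(ev)
    return events, malformed
```

Returning line numbers instead of raising lets an audit report the damage while still using the valid events that surround it.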
154
+ ## API surface
155
+
156
+ The initial read-only endpoint can be folded into the existing recovery audit:
157
+
158
+ ```text
159
+ GET /api/session/recovery/audit
160
+ ```
161
+
162
+ Later, if needed:
163
+
164
+ ```text
165
+ GET /api/session/turn-journal?session_id=<id>
166
+ ```
167
+
168
+ The latter should be diagnostic-only and redact or omit large attachment payloads.
169
+
170
+ ## Rollout plan
171
+
172
+ 1. Land backup/sidecar recovery and audit primitives.
173
+ 2. Add this journal writer in the turn-submission path without a config flag, since it is local-only and append-only.
174
+ 3. Add read-only audit reporting for pending journal turns.
175
+ 4. Add safe repair for missing user messages and interruption markers.
176
+ 5. Once stable, consider pruning completed journal entries older than a retention window, but only after sidecar/index recovery has no findings.
177
+
178
+ ## Open questions
179
+
180
+ - Exact place to define `turn_id` so browser retry and server retry do not duplicate the same user message.
181
+ - Whether attachment files need their own durable manifest entry or whether metadata-only is enough for v1.
182
+ - How much of the assistant's partial output, if any, should be recoverable after `assistant_started` but before `completed`.
183
+ - Whether completed journal entries should be compacted into a per-session checkpoint file.
184
+
185
+ ## Minimal implementation slice
186
+
187
+ The first implementation PR should be deliberately small:
188
+
189
+ - helper: `append_turn_journal_event(session_id, event)`
190
+ - helper: `read_turn_journal(session_id)`
191
+ - unit tests for atomic append, malformed-line tolerance, and state derivation
192
+ - one call site: append `submitted` before worker start
193
+ - audit-only report of pending journal turns
194
+
195
+ Do **not** combine the first implementation with replay/repair. Replay is where most of the bugs in WAL systems live; ship the writer and audit first, prove the format, then add repair.
@@ -0,0 +1,157 @@
1
+ # WebUI Run State Consistency Contract
2
+
3
+ - **Status:** Proposed
4
+ - **Author:** @franksong2702
5
+ - **Created:** 2026-05-16
6
+ - **Tracking issue:** [#2361](https://github.com/nesquena/hermes-webui/issues/2361)
7
+ - **Related architecture:** [#1925](https://github.com/nesquena/hermes-webui/issues/1925), [`hermes-run-adapter-contract.md`](hermes-run-adapter-contract.md)
8
+
9
+ ## Problem
10
+
11
+ A single WebUI agent turn is represented by several overlapping state layers:
12
+
13
+ - the visible transcript the user can read,
14
+ - the model context / `context_messages` the agent actually receives,
15
+ - `pending_user_message` and active stream metadata,
16
+ - live SSE events and in-memory stream state,
17
+ - durable run journal / replay state,
18
+ - automatic compression summaries and active-task handoff text,
19
+ - the browser's live timeline DOM/cache,
20
+ - sidebar ordering, unread state, and `updated_at` metadata.
21
+
22
+ Those layers are not independent. When they drift apart, the user sees failures
23
+ that look unrelated: a prompt is visible but missing from recovered model
24
+ context, a live run loses or reorders thinking/tool cards after switching
25
+ sessions, cleanup makes old sessions look newly active, replay duplicates content,
26
+ or automatic compression reference material appears inside the active turn.
27
+
28
+ This RFC defines a consistency contract for those layers. It complements the
29
+ larger run adapter direction in #1925 by documenting what must remain coherent
30
+ while WebUI still has multiple overlapping state stores.
31
+
32
+ ## Goals
33
+
34
+ - Define the state layers involved in active and recovered WebUI turns.
35
+ - Make the source-of-truth expectations explicit for each layer.
36
+ - Give reviewers a checklist for streaming, replay, compression, recovery,
37
+ model-context, and sidebar changes.
38
+ - Map recent real issues to reusable invariants so future fixes do not solve the
39
+ same class of bug one symptom at a time.
40
+
41
+ ## Non-goals
42
+
43
+ - Do not implement a runner process, sidecar, or new runtime boundary here.
44
+ - Do not replace #1925 or the run adapter contract.
45
+ - Do not rewrite the streaming protocol in this RFC.
46
+ - Do not reopen already-fixed narrow bugs.
47
+ - Do not make this a catch-all for unrelated UI polish.
48
+
49
+ ## State Layers
50
+
51
+ | Layer | Purpose | Source-of-truth expectation | Must not do |
52
+ |---|---|---|---|
53
+ | Visible transcript | Shows what the user and assistant said | Session transcript plus live replay should produce one chronological user-visible story | Hide the user turn that started active work, or show internal recovery text as current user intent |
54
+ | Model context / `context_messages` | Supplies conversation state to the agent | Must include the current visible user turn unless deliberately excluded with a user-visible reason | Let the agent resume from context that contradicts what the user can see |
55
+ | Pending turn metadata | Bridges submitted-but-not-yet-finalized user input | Must identify the user turn and stream that own active work | Become a permanent duplicate transcript row after recovery |
56
+ | Live stream / SSE | Delivers active runtime events to the browser | Must remain an observation path, not the only durable truth for already-emitted events | Lose the visible scene on refresh, reconnect, or session switch |
57
+ | Run journal / replay | Rebuilds emitted runtime events after reconnect or restart | Must be cursor-safe and idempotent | Duplicate assistant text, thinking text, tool cards, or compression cards |
58
+ | Compression summary / handoff | Gives the agent recovery context after automatic compression | Must remain agent-facing recovery material unless explicitly rendered as history | Pollute the active turn or become implicit current user intent |
59
+ | Live UI scene/cache | Preserves expanded rows, in-progress cards, local scroll, and transient grouping | May optimize presentation but must be rebuildable or degradable from transcript/replay | Become the only place where chronological ordering exists |
60
+ | Sidebar/session metadata | Helps the user find active and recent sessions | Must reflect meaningful user or assistant activity | Treat background cleanup as a fresh user-facing update |
61
+
62
+ ## Core Invariants
63
+
64
+ 1. **Visible current turns enter model context.** If the user can see a current
65
+ prompt and WebUI asks the model to continue that work, the prompt must be in
66
+ the reconstructed model context unless WebUI shows an explicit reason it was
67
+ excluded.
68
+ 2. **Active turn UI keeps its owner.** The user turn that started active work
69
+ must remain visible before assistant text, thinking cards, tool cards, or
70
+ activity groups that belong to that work.
71
+ 3. **Reattach preserves order or degrades clearly.** Refresh, reconnect, and
72
+ session switch must preserve chronological live-scene order. If WebUI cannot
73
+ restore the exact live scene, it should downgrade to an explicit structured
74
+ replay state instead of silently reordering content.
75
+ 4. **Maintenance is not activity.** Runtime maintenance such as stale-stream
76
+ cleanup, orphan repair, or background compression must not refresh sidebar
77
+ ordering, unread markers, or active-session affordances as if the user or
78
+ assistant just acted.
79
+ 5. **Replay is idempotent.** Replaying a run from a cursor must not duplicate
80
+ transcript rows, thinking content, interim assistant text, tool cards, or
81
+ compression cards. Replayed long-task events should enter the same
82
+ browser-facing timeline renderer as live SSE events so recovery does not
83
+ downgrade a structured Thinking / progress / tool / compression turn into a
84
+ separate flattened presentation.
85
+ Visible interim assistant progress must remain visible timeline content; a
86
+ compact Activity disclosure may summarize adjacent tool/debug detail, but it
87
+ must not be the only place where the user can see emitted progress text.
88
+ 6. **Compression is not current intent.** Automatic compression summaries and
89
+ reference cards are recovery/handoff material. They must not be treated as a
90
+ new user request, active-turn content, or the default visible explanation for
91
+ the current answer.
92
+ 7. **Observation has a degraded path.** Long-running or many-session observation
93
+ should expose enough heartbeat or degraded status that the UI does not appear
94
+ silent, and ordinary APIs must not stall behind active streams.
95
+ 8. **Every mutation names its layer.** A PR touching streaming, recovery,
96
+ context reconstruction, compression, replay, or sidebar metadata should state
97
+ which layer it changes and what regression proves the invariant still holds.
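Invariant 5 can be illustrated with a small sketch. It assumes each journaled event carries a monotonically increasing `seq` number plus `kind` and `text` fields; those field names, the `Timeline` class, and the `replay` helper are hypothetical and are not taken from the WebUI code. The point is only that live and replayed events go through the same `apply()` path, and that an overlapping replay window is ignored rather than duplicated.

```python
from dataclasses import dataclass, field


@dataclass
class Timeline:
    """Browser-facing timeline that accepts both live and replayed events."""

    rows: list = field(default_factory=list)
    last_seq: int = 0  # highest journal sequence number already applied

    def apply(self, event: dict) -> bool:
        """Apply one event; return False if it was already applied."""
        seq = event["seq"]
        if seq <= self.last_seq:
            return False  # duplicate from an overlapping replay window
        self.rows.append((event["kind"], event["text"]))
        self.last_seq = seq
        return True


def replay(timeline: Timeline, journal: list, cursor: int) -> int:
    """Replay journal entries after `cursor` through the same apply() path."""
    applied = 0
    for event in sorted(journal, key=lambda e: e["seq"]):
        if event["seq"] > cursor and timeline.apply(event):
            applied += 1
    return applied
```

Replaying twice from the same cursor leaves the timeline unchanged, which is the property a regression test for this invariant would assert.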
98
+
99
+ ## Review Checklist
100
+
101
+ Use this checklist for PRs that touch run state, streaming, replay, compression,
102
+ context reconstruction, or session metadata:
103
+
104
+ - Which state layers does this PR read or write?
105
+ - Which layer is the source of truth after this change?
106
+ - Can the visible transcript and model context diverge? If yes, is that
107
+ deliberate and user-visible?
108
+ - What happens after browser refresh, session switch, SSE reconnect, and WebUI
109
+ restart?
110
+ - Does replay rebuild the same scene without duplicates?
111
+ - Does replay use the same timeline-rendering path as live SSE for thinking,
112
+ interim assistant text, tool cards, compression cards, and terminal states?
113
+ - Can this change move a session in the sidebar without meaningful user or
114
+ assistant activity?
115
+ - Can automatic compression or recovery text become visible active-turn content?
116
+ - What test or manual evidence proves the invariant?
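The sidebar question in the checklist maps onto invariant 4 and is easy to turn into a regression test. The sketch below is one possible shape, not the WebUI implementation: the `touch_session` helper, the reason names, and the `maintained_at` field are all hypothetical. Only `updated_at` comes from this RFC.

```python
# Reasons that count as meaningful activity (names are illustrative).
ACTIVITY_REASONS = frozenset({"user_message", "assistant_message"})


def touch_session(session: dict, reason: str, now: float) -> dict:
    """Return a copy of `session` with timestamps updated for `reason`.

    Only user or assistant activity moves `updated_at`, which drives sidebar
    ordering. Maintenance work is recorded in a separate field instead.
    """
    updated = dict(session)
    if reason in ACTIVITY_REASONS:
        updated["updated_at"] = now
    else:
        updated["maintained_at"] = now
    return updated
```

Keeping maintenance timestamps in their own field lets cleanup code record what it did without touching the value the sidebar sorts on.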
117
+
118
+ ## Existing Issue Map
119
+
120
+ | Example | State boundary exposed | Relevant invariant |
121
+ |---|---|---|
122
+ | [#2341](https://github.com/nesquena/hermes-webui/issues/2341) / [#2342](https://github.com/nesquena/hermes-webui/pull/2342) | Active reattach could show agent activity without the pending user turn that started it | 2 |
123
+ | [#2344](https://github.com/nesquena/hermes-webui/issues/2344) / [#2347](https://github.com/nesquena/hermes-webui/pull/2347) | Session switching could lose or reorder the live thinking/tool/interim timeline | 3, 5 |
124
+ | [#2345](https://github.com/nesquena/hermes-webui/issues/2345) / [#2349](https://github.com/nesquena/hermes-webui/pull/2349) | Stale stream cleanup could mutate `updated_at` and resurface old sessions | 4 |
125
+ | [#2346](https://github.com/nesquena/hermes-webui/issues/2346) / [#2348](https://github.com/nesquena/hermes-webui/pull/2348) | Thinking cards could repeat interim assistant progress text | 5 |
126
+ | [#2353](https://github.com/nesquena/hermes-webui/issues/2353) / [#2354](https://github.com/nesquena/hermes-webui/pull/2354) | Recovered pending user turns could be visible but missing from model context | 1 |
127
+ | [#2355](https://github.com/nesquena/hermes-webui/issues/2355) / [#2357](https://github.com/nesquena/hermes-webui/pull/2357) | Auto-compression rotation could leave reference-only cards in the active conversation tail | 3, 6 |
128
+ | [#2308](https://github.com/nesquena/hermes-webui/issues/2308) / [#2309](https://github.com/nesquena/hermes-webui/pull/2309) | Compressed sessions could resume stale agent tasks when the user starts an ordinary fresh chat | 6 |
129
+ | [#2283](https://github.com/nesquena/hermes-webui/pull/2283) | Run event journal replay provides the foundation for ordered recovery | 5 |
130
+
131
+ These references are evidence for the contract. This RFC does not make the
132
+ linked implementation PRs dependent on this document, and it does not close the
133
+ tracking issue by itself.
134
+
135
+ ## Relationship To The Run Adapter RFC
136
+
137
+ The run adapter RFC defines the longer-term event/control boundary for WebUI and
138
+ Hermes runtime ownership. This RFC defines the consistency rules that the current
139
+ WebUI and any future adapter-backed implementation must preserve.
140
+
141
+ The two documents should be read together:
142
+
143
+ - The adapter contract answers: "Where should execution ownership live?"
144
+ - This consistency contract answers: "How do transcript, context, streams,
145
+ replay, compression, and UI metadata stay coherent while execution is active
146
+ or being recovered?"
147
+
148
+ ## Rollout Plan
149
+
150
+ 1. Land this RFC as a reviewable draft and refine it through PR discussion.
151
+ 2. Link future streaming/recovery/compression/sidebar PRs back to the invariant
152
+ they intentionally preserve or change.
153
+ 3. Convert recurring checklist items into focused regression tests where
154
+ practical.
155
+ 4. If #1925 introduces a new adapter-backed runtime layer, update this RFC or
156
+ replace it with the accepted implementation contract so these invariants do
157
+ not live only in historical discussion.
@@ -0,0 +1,280 @@
1
+ # Running Hermes Web UI under a process supervisor
2
+
3
+ Use a process supervisor (launchd, systemd, supervisord, runit, s6) when you
4
+ want the Web UI to start at boot, restart on crash, or be managed alongside
5
+ other services.
6
+
7
+ ## TL;DR
8
+
9
+ Pass ``--foreground`` to ``bootstrap.py`` (or ``bash start.sh``):
10
+
11
+ ```bash
12
+ bash start.sh --foreground
13
+ ```
14
+
15
+ Or set ``HERMES_WEBUI_FOREGROUND=1`` in the environment. The Web UI will
16
+ auto-detect launchd / systemd / supervisord even without the flag, but being
17
+ explicit is safer.
18
+
19
+ **Important (launchd on macOS):** if the ``com.parantoux.hermes-webui`` LaunchAgent is enabled, treat launchd as the single source of truth for WebUI lifecycle. Do **not** also run ``./ctl.sh start``, ``bash start.sh``, ``python bootstrap.py``, or ``python server.py`` against the same state dir/port, or you can create a second WebUI instance and trigger port-8787 restart churn.
20
+
21
+ ## Why ``--foreground`` matters
22
+
23
+ Without it, ``bootstrap.py`` does this:
24
+
25
+ 1. Spawn ``server.py`` as a detached subprocess (``start_new_session=True``)
26
+ 2. Probe ``/health`` until the server is up
27
+ 3. Exit 0
28
+
29
+ That works for an interactive shell run (``./start.sh`` returns to your
30
+ prompt with the server alive in the background). It is **broken** under any
31
+ process supervisor: the supervisor sees its tracked PID exit, marks the job
32
+ as completed, and respawns ``bootstrap.py``. The respawn fails to bind port
33
+ 8787 (the orphaned server still holds it) and exits non-zero, so the supervisor
34
+ respawns it again, in an endless loop.
35
+
36
+ In foreground mode, ``bootstrap.py`` does its setup work and then calls
37
+ ``os.execv`` to replace its own process with ``server.py``. The supervisor
38
+ sees the long-lived server as the original child. ``KeepAlive=true`` /
39
+ ``Restart=always`` work correctly.
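The exec step can be sketched as follows. This is a minimal illustration of the ``os.execv`` pattern described above, not the actual ``bootstrap.py`` code; the helper names ``build_server_argv`` and ``exec_server`` are hypothetical.

```python
import os
import sys


def build_server_argv(server_path, extra_args=(), python=None):
    """Build the argv for server.py, defaulting to the current interpreter."""
    python = python or sys.executable
    return [python, server_path, *extra_args]


def exec_server(server_path, extra_args=()):
    """Replace the current process with the server (same PID, no child)."""
    argv = build_server_argv(server_path, extra_args)
    # Flush buffered output first: exec discards unflushed Python buffers.
    sys.stdout.flush()
    sys.stderr.flush()
    os.execv(argv[0], argv)  # does not return on success
```

Because ``os.execv`` keeps the PID, the process the supervisor started is the server itself, so there is no orphaned child for a respawn to collide with.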
40
+
41
+ ## launchd (macOS)
42
+
43
+ ``~/Library/LaunchAgents/com.example.hermes-webui.plist``:
44
+
45
+ ```xml
46
+ <?xml version="1.0" encoding="UTF-8"?>
47
+ <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
48
+ <plist version="1.0">
49
+ <dict>
50
+ <key>Label</key>
51
+ <string>com.example.hermes-webui</string>
52
+
53
+ <key>ProgramArguments</key>
54
+ <array>
55
+ <string>/bin/bash</string>
56
+ <string>/Users/yourname/hermes-webui/start.sh</string>
57
+ <string>--foreground</string>
58
+ </array>
59
+
60
+ <key>WorkingDirectory</key>
61
+ <string>/Users/yourname/hermes-webui</string>
62
+
63
+ <key>RunAtLoad</key>
64
+ <true/>
65
+
66
+ <key>KeepAlive</key>
67
+ <true/>
68
+
69
+ <key>StandardOutPath</key>
70
+ <string>/Users/yourname/.hermes/webui/launchd-stdout.log</string>
71
+
72
+ <key>StandardErrorPath</key>
73
+ <string>/Users/yourname/.hermes/webui/launchd-stderr.log</string>
74
+
75
+ <key>EnvironmentVariables</key>
76
+ <dict>
77
+ <key>HOME</key>
78
+ <string>/Users/yourname</string>
79
+ <key>PATH</key>
80
+ <string>/usr/local/bin:/usr/bin:/bin</string>
81
+ </dict>
82
+ </dict>
83
+ </plist>
84
+ ```
85
+
86
+ Load:
87
+
88
+ ```bash
89
+ launchctl load ~/Library/LaunchAgents/com.example.hermes-webui.plist
90
+ launchctl print gui/$(id -u)/com.example.hermes-webui # check state
91
+ ```
92
+
93
+ Reload after editing the plist:
94
+
95
+ ```bash
96
+ launchctl unload ~/Library/LaunchAgents/com.example.hermes-webui.plist
97
+ launchctl load ~/Library/LaunchAgents/com.example.hermes-webui.plist
98
+ ```
99
+
100
+ launchd sets ``XPC_SERVICE_NAME`` automatically, so even without the
101
+ ``--foreground`` argument the Web UI will auto-promote to foreground mode.
102
+ The flag is still recommended as documentation of intent.
103
+
104
+ ## systemd (Linux)
105
+
106
+ ``~/.config/systemd/user/hermes-webui.service``:
107
+
108
+ ```ini
109
+ [Unit]
110
+ Description=Hermes Web UI
111
+ After=network.target
112
+
113
+ [Service]
114
+ Type=simple
115
+ WorkingDirectory=%h/hermes-webui
116
+ ExecStart=/bin/bash %h/hermes-webui/start.sh --foreground
117
+ Restart=on-failure
118
+ RestartSec=5
119
+
120
+ # Optional: route stdout/stderr to journald instead of files
121
+ StandardOutput=journal
122
+ StandardError=journal
123
+
124
+ [Install]
125
+ WantedBy=default.target
126
+ ```
127
+
128
+ Enable + start:
129
+
130
+ ```bash
131
+ systemctl --user daemon-reload
132
+ systemctl --user enable --now hermes-webui.service
133
+ journalctl --user -u hermes-webui.service -f
134
+ ```
135
+
136
+ systemd sets ``INVOCATION_ID`` and ``JOURNAL_STREAM`` (when stdio is wired to
137
+ the journal), both of which auto-promote to foreground mode.
138
+
139
+ ## supervisord (cross-platform)
140
+
141
+ ``/etc/supervisor/conf.d/hermes-webui.conf``:
142
+
143
+ ```ini
144
+ [program:hermes-webui]
145
+ command=/bin/bash /home/youruser/hermes-webui/start.sh --foreground
146
+ directory=/home/youruser/hermes-webui
147
+ user=youruser
148
+ autostart=true
149
+ autorestart=true
150
+ stopsignal=TERM
151
+ stopwaitsecs=10
152
+ stdout_logfile=/var/log/hermes-webui.out.log
153
+ stderr_logfile=/var/log/hermes-webui.err.log
154
+ environment=HOME="/home/youruser",PATH="/usr/local/bin:/usr/bin:/bin"
155
+ ```
156
+
157
+ Reload + start:
158
+
159
+ ```bash
160
+ sudo supervisorctl reread
161
+ sudo supervisorctl update
162
+ sudo supervisorctl status hermes-webui
163
+ ```
164
+
165
+ supervisord sets ``SUPERVISOR_ENABLED``, which auto-promotes to foreground
166
+ mode.
167
+
168
+ ## Auto-detected env vars (full list)
169
+
170
+ These trigger ``--foreground`` behavior even when the flag is not passed:
171
+
172
+ | Env var | Set by | Notes |
173
+ |---|---|---|
174
+ | ``INVOCATION_ID`` | systemd | Set on every service activation |
175
+ | ``JOURNAL_STREAM`` | systemd | Set when stdio is wired to journald |
176
+ | ``NOTIFY_SOCKET`` | systemd ``Type=notify`` / s6 | sd_notify-style notification socket |
177
+ | ``XPC_SERVICE_NAME`` | launchd | Set to the plist Label — narrowed to ``com.<rdns>.<svc>`` form (see below) |
178
+ | ``SUPERVISOR_ENABLED`` | supervisord | Always set under supervisord |
179
+ | ``HERMES_WEBUI_FOREGROUND`` | you | Explicit opt-in; accepts ``1`` / ``true`` / ``yes`` / ``on`` |
180
+
181
+ ### XPC_SERVICE_NAME noise filter
182
+
183
+ macOS launchd sets ``XPC_SERVICE_NAME`` in **every Terminal-spawned shell**,
184
+ not just real services. Typical noise values:
185
+
186
+ - ``0`` — set on launchd descendants generally
187
+ - ``application.com.apple.Terminal.<UUID>`` — Terminal.app shells
188
+ - ``application.com.googlecode.iterm2`` — iTerm2
189
+ - ``application.com.microsoft.VSCode`` — VSCode integrated terminal
190
+
191
+ A bare existence check on this var would auto-promote interactive
192
+ ``./start.sh`` runs to foreground mode on every Mac dev machine, breaking
193
+ the most common installation path. We narrow detection to launchd
194
+ **Label-style** names (typically reverse-DNS like ``com.example.foo``).
195
+ Real launchd plists always use this form. If you ever see
196
+ ``XPC_SERVICE_NAME=0`` in your service environment, the auto-detect will
197
+ ignore it — set ``HERMES_WEBUI_FOREGROUND=1`` or pass ``--foreground``
198
+ explicitly to be safe.
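The detection rules in this section can be sketched as a single predicate. This is an illustration under stated assumptions, not the shipped detection code: the function name ``should_run_foreground`` and the exact regular expression are hypothetical. The env var names and accepted truthy values come from the table above.

```python
import re

TRUTHY = {"1", "true", "yes", "on"}

# Supervisors whose env var only needs to be present and non-empty.
PRESENCE_VARS = (
    "INVOCATION_ID",
    "JOURNAL_STREAM",
    "NOTIFY_SOCKET",
    "SUPERVISOR_ENABLED",
)

# Assumed Label-style pattern: at least three dot-separated parts, and not
# one of the "application.*" values that terminal apps inherit.
LABEL_RE = re.compile(
    r"^(?!application\.)[A-Za-z][A-Za-z0-9-]*(\.[A-Za-z0-9_-]+){2,}$"
)


def should_run_foreground(env, flag=False):
    """Return True if foreground mode was requested or a supervisor is detected."""
    if flag:
        return True
    if env.get("HERMES_WEBUI_FOREGROUND", "").strip().lower() in TRUTHY:
        return True
    if any(env.get(name) for name in PRESENCE_VARS):
        return True
    # XPC_SERVICE_NAME only counts when it looks like a launchd Label.
    return bool(LABEL_RE.match(env.get("XPC_SERVICE_NAME", "")))
```

Passing ``os.environ`` as ``env`` checks the real environment. Taking a plain mapping keeps the predicate easy to unit-test against the noise values listed above.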
199
+
200
+ ### Supervisors that are NOT auto-detected
201
+
202
+ The following set no env var that we can reliably detect. Pass
203
+ ``--foreground`` (or ``HERMES_WEBUI_FOREGROUND=1``) explicitly:
204
+
205
+ - **runit** (without sd_notify) — pure runit chains
206
+ - **daemontools** / ``svc``
207
+ - **PM2** (Node.js process manager occasionally repurposed for Python)
208
+ - **Foreman** / **Honcho** (Procfile-style)
209
+ - **Docker** with a custom CMD entrypoint that doesn't already use ``exec``
210
+ - **Custom shell-script supervisors** that fork-and-wait
211
+
212
+ If your supervisor isn't in the auto-detect list and you see the orphan-PID
213
+ respawn loop, set ``HERMES_WEBUI_FOREGROUND=1`` in the service environment.
214
+
215
+ ## Diagnostic recipe
216
+
217
+ If the Web UI keeps getting respawned and you suspect the orphan-server respawn loop:
218
+
219
+ ```bash
220
+ # Check the running PID for the server
221
+ lsof -iTCP:8787 -sTCP:LISTEN
222
+
223
+ # Get its parent — should be the supervisor itself, NOT init (PID 1)
224
+ PID=$(lsof -tiTCP:8787 -sTCP:LISTEN)
225
+ # "command" (not "cmd") is accepted by both procps and BSD/macOS ps
+ ps -p "$PID" -o pid,ppid,command
226
+ ps -p "$(ps -o ppid= -p "$PID" | tr -d ' ')" -o pid,ppid,command
227
+ ```
228
+
229
+ A healthy foreground-mode setup looks like:
230
+
231
+ ```
232
+ PID PPID CMD
233
+ 12345 6789 /path/to/python /path/to/server.py
234
+ 6789 1 /sbin/launchd # or /usr/lib/systemd/systemd, etc.
235
+ ```
236
+
237
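+ If PPID is ``1`` (init) when it should be the supervisor, the orphan-server
238
+ loop is happening: re-check that ``--foreground`` (or one of the env vars)
239
+ is reaching the process. Note that on current macOS releases launchd itself
+ runs as PID 1, so a PPID of ``1`` is normal for launchd-managed jobs; the check
+ is most useful under supervisors that run as another PID, such as a systemd
+ user instance or supervisord.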
240
+
241
+ ## HTTP watchdog / deep health
242
+
243
+ ``KeepAlive`` / ``Restart=always`` only recover a process that exits. If the
244
+ process is still listening on the port but request handling is wedged, pair your
245
+ supervisor with an HTTP probe and force a restart when the probe fails.
246
+
247
+ Hermes Web UI exposes two health levels:
248
+
249
+ - ``/health`` — cheap liveness probe with ``active_streams``, uptime, and an
250
+ ``accept_loop`` heartbeat counter.
251
+ - ``/health?deep=1`` — readiness probe that briefly acquires the stream lock,
252
+ reads the sidebar/session path, reads projects state, and touches Hermes
253
+ ``state.db`` if it exists. Use this for watchdogs.
254
+
255
+ At startup the server also tries to raise its file-descriptor soft limit to
256
+ 4096 on platforms that support ``RLIMIT_NOFILE``. That is defense in depth for
257
+ persistent hosts: leaks should still be fixed, but a higher soft limit gives
258
+ you more diagnostic headroom before request handling falls over.
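Raising the soft limit can be sketched with the standard ``resource`` module. This is an assumption-laden illustration rather than the server's startup code: the function name ``raise_nofile_soft_limit`` is hypothetical, and only the 4096 target and ``RLIMIT_NOFILE`` come from the text above.

```python
def raise_nofile_soft_limit(target=4096):
    """Try to raise the RLIMIT_NOFILE soft limit to `target`.

    Returns the soft limit in effect afterwards, or None on platforms
    without the `resource` module (for example Windows).
    """
    try:
        import resource
    except ImportError:
        return None

    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft  # already high enough; never lower the limit

    # The soft limit may not exceed the hard limit.
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    except (ValueError, OSError):
        return soft  # best effort: keep the existing limit
    return new_soft
```

The call is best effort by design: failing to raise the limit should not prevent the server from starting.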
259
+
260
+ Minimal macOS launchd watchdog script:
261
+
262
+ ```bash
263
+ #!/usr/bin/env bash
264
+ set -euo pipefail
265
+ LABEL="com.example.hermes-webui"
266
+ BASE="http://127.0.0.1:8787"
267
+
268
+ if ! curl -fsS --max-time 10 "$BASE/health?deep=1" >/dev/null; then
269
+ launchctl kickstart -k "gui/$(id -u)/$LABEL"
270
+ fi
271
+ ```
272
+
273
+ Run it every few minutes from a separate ``StartInterval`` LaunchAgent. For
274
+ systemd, prefer a timer/service pair that runs the same curl probe and
275
+ ``systemctl --user restart hermes-webui.service`` on failure.
276
+
277
+ The ``accept_loop.requests_total`` value should increase when probes arrive. If
278
+ it stays flat while the process is still alive, the server accept loop is not
279
+ making progress; capture logs/thread samples before restarting if you are
280
+ collecting diagnostics for a bug report.
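A watchdog can compare two ``/health`` snapshots to spot a stalled accept loop. The sketch below assumes the endpoint returns JSON with a nested ``accept_loop`` object containing ``requests_total``; that response shape is inferred from the dotted name above and is not confirmed here. The helper names ``fetch_health`` and ``accept_loop_stalled`` are hypothetical.

```python
import json
import urllib.request


def fetch_health(base="http://127.0.0.1:8787", deep=False, timeout=10.0):
    """Fetch /health (or /health?deep=1) and decode the JSON body."""
    url = f"{base}/health" + ("?deep=1" if deep else "")
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def accept_loop_stalled(previous, current):
    """Return True if requests_total did not advance between two snapshots."""
    before = previous.get("accept_loop", {}).get("requests_total")
    after = current.get("accept_loop", {}).get("requests_total")
    if before is None or after is None:
        return False  # unknown response shape: do not guess
    return after <= before
```

Since each probe is itself a request, two consecutive ``fetch_health()`` results should differ in ``requests_total`` on a healthy server, so a ``True`` result from ``accept_loop_stalled`` is a reasonable trigger for capturing diagnostics.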