@dp-pcs/ogp 0.3.3 → 0.4.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (95)
  1. package/README.md +275 -49
  2. package/dist/cli/completion.d.ts +5 -0
  3. package/dist/cli/completion.d.ts.map +1 -0
  4. package/dist/cli/completion.js +148 -0
  5. package/dist/cli/completion.js.map +1 -0
  6. package/dist/cli/config.d.ts +3 -0
  7. package/dist/cli/config.d.ts.map +1 -0
  8. package/dist/cli/config.js +207 -0
  9. package/dist/cli/config.js.map +1 -0
  10. package/dist/cli/expose.d.ts.map +1 -1
  11. package/dist/cli/expose.js +20 -13
  12. package/dist/cli/expose.js.map +1 -1
  13. package/dist/cli/federation.d.ts.map +1 -1
  14. package/dist/cli/federation.js +182 -6
  15. package/dist/cli/federation.js.map +1 -1
  16. package/dist/cli/setup.d.ts +19 -0
  17. package/dist/cli/setup.d.ts.map +1 -1
  18. package/dist/cli/setup.js +507 -32
  19. package/dist/cli/setup.js.map +1 -1
  20. package/dist/cli.js +348 -32
  21. package/dist/cli.js.map +1 -1
  22. package/dist/daemon/agent-comms.d.ts.map +1 -1
  23. package/dist/daemon/agent-comms.js +14 -9
  24. package/dist/daemon/agent-comms.js.map +1 -1
  25. package/dist/daemon/intent-registry.d.ts.map +1 -1
  26. package/dist/daemon/intent-registry.js +7 -4
  27. package/dist/daemon/intent-registry.js.map +1 -1
  28. package/dist/daemon/keypair.d.ts.map +1 -1
  29. package/dist/daemon/keypair.js +34 -13
  30. package/dist/daemon/keypair.js.map +1 -1
  31. package/dist/daemon/message-handler.d.ts.map +1 -1
  32. package/dist/daemon/message-handler.js +7 -0
  33. package/dist/daemon/message-handler.js.map +1 -1
  34. package/dist/daemon/notify.d.ts +19 -0
  35. package/dist/daemon/notify.d.ts.map +1 -1
  36. package/dist/daemon/notify.js +329 -73
  37. package/dist/daemon/notify.js.map +1 -1
  38. package/dist/daemon/openclaw-bridge.d.ts +34 -0
  39. package/dist/daemon/openclaw-bridge.d.ts.map +1 -0
  40. package/dist/daemon/openclaw-bridge.js +261 -0
  41. package/dist/daemon/openclaw-bridge.js.map +1 -0
  42. package/dist/daemon/peers.d.ts +8 -0
  43. package/dist/daemon/peers.d.ts.map +1 -1
  44. package/dist/daemon/peers.js +48 -14
  45. package/dist/daemon/peers.js.map +1 -1
  46. package/dist/daemon/projects.d.ts.map +1 -1
  47. package/dist/daemon/projects.js +7 -4
  48. package/dist/daemon/projects.js.map +1 -1
  49. package/dist/daemon/server.d.ts +16 -0
  50. package/dist/daemon/server.d.ts.map +1 -1
  51. package/dist/daemon/server.js +147 -46
  52. package/dist/daemon/server.js.map +1 -1
  53. package/dist/shared/config.d.ts +52 -1
  54. package/dist/shared/config.d.ts.map +1 -1
  55. package/dist/shared/config.js +18 -11
  56. package/dist/shared/config.js.map +1 -1
  57. package/dist/shared/framework-detection.d.ts +31 -0
  58. package/dist/shared/framework-detection.d.ts.map +1 -0
  59. package/dist/shared/framework-detection.js +91 -0
  60. package/dist/shared/framework-detection.js.map +1 -0
  61. package/dist/shared/help.d.ts +5 -0
  62. package/dist/shared/help.d.ts.map +1 -0
  63. package/dist/shared/help.js +280 -0
  64. package/dist/shared/help.js.map +1 -0
  65. package/dist/shared/meta-config.d.ts +44 -0
  66. package/dist/shared/meta-config.d.ts.map +1 -0
  67. package/dist/shared/meta-config.js +89 -0
  68. package/dist/shared/meta-config.js.map +1 -0
  69. package/dist/shared/migration.d.ts +57 -0
  70. package/dist/shared/migration.d.ts.map +1 -0
  71. package/dist/shared/migration.js +255 -0
  72. package/dist/shared/migration.js.map +1 -0
  73. package/docs/CLI-REFERENCE.md +1360 -0
  74. package/docs/GETTING-STARTED.md +942 -0
  75. package/docs/MIGRATION.md +202 -0
  76. package/docs/MULTI-FRAMEWORK-DEMO.md +352 -0
  77. package/docs/MULTI-FRAMEWORK-DESIGN.md +378 -0
  78. package/docs/MULTI-FRAMEWORK-IMPL.md +197 -0
  79. package/docs/case-studies/CRASH_RESOLUTION_20260407.md +190 -0
  80. package/docs/case-studies/OpenClaw_Hermes_Status_Report_20260407.md +142 -0
  81. package/docs/case-studies/OpenClaw_Stability_Fix_Summary.md +209 -0
  82. package/docs/case-studies/README.md +40 -0
  83. package/docs/case-studies/crash_observations.md +250 -0
  84. package/docs/federation-flow.md +21 -31
  85. package/docs/hermes-implementation-checklist.md +4 -0
  86. package/docs/rendezvous.md +13 -14
  87. package/package.json +9 -3
  88. package/scripts/completion.bash +123 -0
  89. package/scripts/completion.zsh +372 -0
  90. package/scripts/test-migration-execute.js +74 -0
  91. package/scripts/test-migration.js +42 -0
  92. package/skills/ogp/SKILL.md +197 -64
  93. package/skills/ogp-agent-comms/SKILL.md +107 -41
  94. package/skills/ogp-expose/SKILL.md +84 -21
  95. package/skills/ogp-project/SKILL.md +66 -58
package/docs/case-studies/CRASH_RESOLUTION_20260407.md +190 -0
@@ -0,0 +1,190 @@
1
+ # OpenClaw Crash Resolution
2
+ **Date:** April 7, 2026
3
+ **Status:** ✅ RESOLVED with mitigations
4
+
5
+ > **Note:** This document is a sanitized version of internal debugging notes. System-specific details have been generalized. Original created during OGP development to document OpenClaw regression debugging.
6
+
7
+ ---
8
+
9
+ ## Quick Summary
10
+
11
+ Your OpenClaw crashes were caused by **TWO KNOWN BUGS in version 2026.4.5** (the exec lifecycle crash and the browser automation OOM), compounded by failing cron jobs - NOT by your OGP work or dual-assistant setup. Multiple GitHub issues filed by other users in the last 1-2 days confirm this.
12
+
13
+ **Fixes implemented:**
14
+ 1. ✅ Wrapper script with 8GB heap limit
15
+ 2. ✅ All cron jobs disabled
16
+ 3. ✅ BrainLift plugin disabled
17
+ 4. ✅ Gateway auto-restart enabled
18
+
19
+ **Current Status:** Gateway running stable with mitigations. Exec lifecycle bug may still cause occasional crashes but will auto-restart.
20
+
21
+ ---
22
+
23
+ ## The Bugs
24
+
25
+ ### Bug #1: Exec Lifecycle Crash
26
+ **GitHub Issues:** [#62137](https://github.com/openclaw/openclaw/issues/62137), [#61592](https://github.com/openclaw/openclaw/issues/61592), [#61812](https://github.com/openclaw/openclaw/issues/61812)
27
+
28
+ **Error:** `Unhandled promise rejection: Error: Agent listener invoked outside active run`
29
+
30
+ **Cause:** Regression in 2026.4.5: stdout emitted by a background exec process after the agent run completes crashes the gateway
31
+
32
+ **Platforms Affected:** Linux, Windows, macOS (all platforms)
33
+
34
+ **Your Impact:** Crashed every 10-60 minutes during normal operations
35
+
36
+ **Mitigation:** LaunchAgent auto-restarts the gateway (via the wrapper script) after a crash; upstream fix pending
37
+
38
+ ### Bug #2: Browser Automation OOM
39
+ **Error:** `FATAL ERROR: JavaScript heap out of memory`
40
+
41
+ **Cause:** Default 4GB V8 heap limit too small for heavy browser automation
42
+
43
+ **Your Impact:** Crashed after 2+ hours of browser activity
44
+
45
+ **Fix:** Increased heap to 8GB via `--max-old-space-size=8192` flag
46
+
47
+ ### Bug #3: Cron Job API Key Failures
48
+ **Error:** `401 Incorrect API key provided` (env var expansion failing)
49
+
50
+ **Cause:** Environment variable evaluation failing when cron jobs execute
51
+
52
+ **Your Impact:** A cron job running every 5 minutes triggered cascading failures
53
+
54
+ **Fix:** Disabled all cron jobs and BrainLift plugin
55
+
56
+ ---
57
+
58
+ ## What We Did
59
+
60
+ ### 1. Created Gateway Wrapper Script
61
+
62
+ **File:** `$HOME/.openclaw/bin/gateway-wrapper.sh`
63
+
64
+ **What it does:**
65
+ ```bash
66
+ #!/bin/bash
67
+ # - Sets all environment variables
68
+ # - Launches gateway with 8GB heap limit
69
+ # - Enables auto-restart via LaunchAgent
70
+ exec <node-path>/bin/node --max-old-space-size=8192 \
71
+ <openclaw-path>/dist/index.js gateway --port <port>
72
+ ```
73
+
74
+ ### 2. Updated LaunchAgent
75
+
76
+ **File:** `$HOME/Library/LaunchAgents/ai.openclaw.gateway.plist`
77
+
78
+ **Change:** Now calls wrapper script instead of node directly
79
+
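A minimal sketch of what such a LaunchAgent could look like, written as a shell helper that generates the plist. The function name `write_gateway_plist`, the `Label` value (inferred from the plist filename), and the log path argument are assumptions for illustration and do not come from the original notes. `ProgramArguments`, `RunAtLoad`, `KeepAlive`, and `StandardErrorPath` are standard launchd keys; whether the real plist uses exactly these is not stated.

```shell
# Hypothetical helper: write a LaunchAgent plist that runs a wrapper script.
#   $1 = absolute path to the wrapper script
#   $2 = plist file to write
#   $3 = file that receives the gateway's stderr
# Paths are inserted verbatim, so they must not contain XML special characters.
write_gateway_plist() (
  wrapper=$1 plist=$2 errlog=$3
  cat > "$plist" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>ai.openclaw.gateway</string>
  <key>ProgramArguments</key>
  <array>
    <string>$wrapper</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
  <key>StandardErrorPath</key>
  <string>$errlog</string>
</dict>
</plist>
EOF
)
```

Because plist values are literal strings, everything that needs shell evaluation (environment variables, the heap flag) lives in the wrapper, and the plist only names the script to run.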
80
+ ### 3. Disabled Cron Jobs
81
+
82
+ **Files:**
83
+ - `$HOME/.openclaw/openclaw.json` - BrainLift disabled
84
+ - `$HOME/.openclaw/cron/jobs.json` - All cron jobs disabled
85
+
86
+ ---
87
+
88
+ ## OGP Cleared of Suspicion
89
+
90
+ **Verdict:** Your OGP work is NOT causing the crashes.
91
+
92
+ **Evidence:**
93
+ - Same bugs reported by users not using OGP
94
+ - GitHub issues filed 1-2 days ago across all platforms
95
+ - Crashes occur with zero OGP activity
96
+ - Known regressions in OpenClaw 2026.4.5
97
+
98
+ **Your dual-assistant setup (OpenClaw + Hermes) may have exposed the bugs faster due to higher load, but didn't create them.**
99
+
100
+ ---
101
+
102
+ ## Current Gateway Status
103
+
104
+ - **PID:** [Running]
105
+ - **Port:** ✅ listening
106
+ - **Heap Limit:** 8GB (doubled from 4GB)
107
+ - **Wrapper:** ✅ Active
108
+ - **Cron Jobs:** ✅ Disabled
109
+ - **BrainLift:** ✅ Disabled
110
+ - **LaunchAgent:** ✅ Auto-restart enabled
111
+
112
+ **Uptime:** Started and currently stable
113
+
114
+ ---
115
+
116
+ ## Expected Behavior
117
+
118
+ **Fixed:**
119
+ - ✅ No more cron-triggered crashes
120
+ - ✅ No more browser OOM crashes (unless you exceed 8GB heap)
121
+ - ✅ Auto-restart on any crash
122
+
123
+ **Still Possible:**
124
+ - ⚠️ Exec lifecycle bug may still crash gateway occasionally
125
+ - When this happens, LaunchAgent will auto-restart within seconds
126
+
127
+ ---
128
+
129
+ ## Monitoring
130
+
131
+ **Check gateway status:**
132
+ ```bash
133
+ launchctl list | grep openclaw
134
+ lsof -i :<port>
135
+ ps aux | grep openclaw-gateway
136
+ ```
137
+
138
+ **Watch for crashes:**
139
+ ```bash
140
+ tail -f ~/.openclaw/logs/gateway.err.log | grep -iE "unhandled|FATAL"
141
+ ```
142
+
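For a one-shot summary rather than a live tail, the three error strings quoted earlier in this document can be counted per signature. This is a sketch only: the function name `count_crash_signatures` is hypothetical, and it assumes the log contains those messages verbatim.

```shell
# Hypothetical helper: print "<count><TAB><signature>" for each known crash
# signature found in the log file given as $1.
count_crash_signatures() (
  log=$1
  for pattern in \
    "Agent listener invoked outside active run" \
    "JavaScript heap out of memory" \
    "Incorrect API key"
  do
    # grep -c prints 0 when nothing matches; -F treats the pattern literally.
    printf '%s\t%s\n' "$(grep -c -F -- "$pattern" "$log")" "$pattern"
  done
)
```

Usage, assuming the log path shown above: `count_crash_signatures ~/.openclaw/logs/gateway.err.log`.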
143
+ **Success metrics:**
144
+ - Uptime > 24 hours without manual intervention
145
+ - No API key errors in logs
146
+ - Auto-restart working if crashes occur
147
+
148
+ ---
149
+
150
+ ## Rollback (if needed)
151
+
152
+ ```bash
153
+ # Restore original LaunchAgent
154
+ cp $HOME/Library/LaunchAgents/ai.openclaw.gateway.plist.backup-* \
155
+ $HOME/Library/LaunchAgents/ai.openclaw.gateway.plist
156
+
157
+ # Restore cron jobs
158
+ cp $HOME/.openclaw/cron/jobs.json.backup-* \
159
+ $HOME/.openclaw/cron/jobs.json
160
+
161
+ # Reload
162
+ launchctl unload $HOME/Library/LaunchAgents/ai.openclaw.gateway.plist
163
+ launchctl load $HOME/Library/LaunchAgents/ai.openclaw.gateway.plist
164
+ ```
165
+
166
+ Or downgrade to OpenClaw 2026.4.2:
167
+ ```bash
168
+ npm install -g openclaw@2026.4.2
169
+ ```
170
+
171
+ ---
172
+
173
+ ## Documentation
174
+
175
+ **Full Details:** See `OpenClaw_Stability_Fix_Summary.md`
176
+ **Original Analysis:** See `crash_observations.md`
177
+ **Status Report:** See `OpenClaw_Hermes_Status_Report_20260407.md`
178
+
179
+ **GitHub Issues to Watch:**
180
+ - https://github.com/openclaw/openclaw/issues/62137
181
+ - https://github.com/openclaw/openclaw/issues/61592
182
+ - https://github.com/openclaw/openclaw/issues/61812
183
+ - https://github.com/openclaw/openclaw/issues/61733
184
+
185
+ ---
186
+
187
+ **Resolution Date:** April 7, 2026
188
+ **Gateway Status:** ✅ Running with mitigations
189
+ **Next Check:** Monitor for 24 hours
190
+ **Sanitized for Publication:** April 8, 2026
package/docs/case-studies/OpenClaw_Hermes_Status_Report_20260407.md +142 -0
@@ -0,0 +1,142 @@
1
+ # OpenClaw & Hermes — Status Report
2
+ **Date:** April 7, 2026
3
+
4
+ > **Note:** This document is a sanitized version of internal status reporting. System-specific paths, PIDs, and operational details have been generalized. Created during OGP development to compare gateway stability.
5
+
6
+ ---
7
+
8
+ ## Executive Summary
9
+
10
+ Both local AI gateways (OpenClaw, Hermes) were evaluated. OpenClaw crashed multiple times in 24 hours from two distinct bugs. Hermes has been stable. A fact-check of an agent-drafted comparison article revealed it was substantially wrong. Config changes were made to switch OpenClaw's primary model to GPT-5.4 and adjust provider configurations.
11
+
12
+ ---
13
+
14
+ ## OpenClaw Gateway — Crash Analysis
15
+
16
+ ### Crash #1: OOM (April 6, evening)
17
+
18
+ **Root cause:** The BrainLift plugin kicked off its nightly run for all 5 agents simultaneously. The agents hit Anthropic's rate limit (429) on Claude Sonnet 4.6. The embedded agent runner retried aggressively with no backoff ceiling and no memory cleanup between attempts. Heap grew to 4.08 GB and Node.js SIGABRT'd.
19
+
20
+ **Contributing factors:**
21
+ - All 5 agents scheduled at the same time
22
+ - No exponential backoff on 429 retries
23
+ - Default V8 heap limit (4 GB) with no `--max-old-space-size` override
24
+ - Auth error mixed in (API key issue)
25
+
26
+ **Evidence:** Logs showed repeated 429s, followed by `FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory`
27
+
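The missing backoff ceiling can be illustrated with a small shell sketch. It is not the embedded agent runner's code: `backoff_delay`, `retry_with_backoff`, `BACKOFF_BASE`, and `BACKOFF_CAP` are hypothetical names, and the point is only to show a doubling delay with a cap plus a hard limit on attempts.

```shell
# Hypothetical helper: seconds to wait before retry number $1 (0-based),
# doubling from base $2 and never exceeding cap $3.
backoff_delay() (
  attempt=$1 base=$2 cap=$3
  delay=$base
  i=0
  while [ "$i" -lt "$attempt" ] && [ "$delay" -lt "$cap" ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  if [ "$delay" -gt "$cap" ]; then delay=$cap; fi
  echo "$delay"
)

# Hypothetical helper: run a command up to $1 times, sleeping with capped
# exponential backoff between failed attempts. Fails once attempts run out.
retry_with_backoff() (
  max=$1; shift
  failures=0
  until "$@"; do
    failures=$((failures + 1))
    [ "$failures" -ge "$max" ] && exit 1
    sleep "$(backoff_delay "$((failures - 1))" "${BACKOFF_BASE:-1}" "${BACKOFF_CAP:-60}")"
  done
)
```

With a base of 2 seconds and a cap of 60, the delays run 2, 4, 8, 16, 32, 60, 60, and so on, so a sustained 429 response slows the loop down instead of driving it harder.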
28
+ ### Crash #2: Unhandled Promise Rejection (April 7, afternoon)
29
+
30
+ **Root cause:** Bug in `pi-agent-core` — the exec tool's stdout handler fired a callback after the agent run had already ended. The gateway's global unhandled rejection handler treated this as fatal.
31
+
32
+ **Stack trace origin:** `Agent.processEvents` in `pi-agent-core/src/agent.ts:533` — "Agent listener invoked outside active run"
33
+
34
+ **Trigger:** Agent was editing files via the exec tool when the run completed but the exec process continued emitting stdout.
35
+
36
+ ### Crash #3: Same as #2 (April 7, evening)
37
+
38
+ Identical stack trace. Same exec lifecycle bug. Reproducible.
39
+
40
+ ---
41
+
42
+ ## OpenClaw — Config Issues Found
43
+
44
+ ### 1. LaunchAgent Environment Variables Don't Work
45
+
46
+ The LaunchAgent plist uses `$(security find-generic-password ...)` shell expansion syntax for API keys. **This doesn't work in launchd plists** — plist values are literal strings, not shell-evaluated. Keychain-derived env vars are empty when launched via launchd.
47
+
48
+ **Impact:** Gateway starts without API keys → auth failures → retry loops → OOM.
49
+
50
+ **Fix needed:** Either use a wrapper shell script in the plist that resolves keys before exec'ing the gateway, or store keys directly in the plist (less secure).
51
+
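Whichever option is chosen, a wrapper can fail fast when a key is empty instead of letting the gateway start and fall into the retry loop described above. The sketch below is an assumption about how such a guard could look; `require_env` is a hypothetical name and is not part of OpenClaw.

```shell
# Hypothetical guard: succeed only if every named variable is set and
# non-empty. Pass trusted, literal variable names only, because the name
# is expanded through eval.
require_env() (
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  exit "$missing"
)
```

In a wrapper script this would sit just before the `exec` line, for example `require_env ANTHROPIC_API_KEY FIREWORKS_API_KEY || exit 1`, so an empty keychain lookup stops the launch with a clear message in the error log.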
52
+ ### 2. Kimi Provider Configuration Issue
53
+
54
+ The Kimi direct provider was referencing an API key that wasn't properly configured. The gateway's secret resolver treated this as a hard failure.
55
+
56
+ **Current workaround:** Disabled the kimi plugin, removed kimi auth profile. The Fireworks-routed Kimi K2.5 still works via FIREWORKS_API_KEY.
57
+
58
+ ### 3. Skills Loading Issues
59
+
60
+ On every startup, many skills log `"Skipping skill path that resolves outside its configured root."` These are likely symlinks or relative path references. A significant portion of the skill set is silently not loading.
61
+
62
+ **Impact:** Agent capabilities reduced without any user-visible error.
63
+
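One way to find out which skills are affected is to compare each entry's physical location with the skills root. This sketch assumes each skill is a directory directly under the root and that symlinks are the cause, which the notes only describe as likely. The function names are hypothetical, and the check is an approximation, not OpenClaw's own logic.

```shell
# Succeeds only if directory $2 physically resolves to a location inside
# directory $1 (symlinks followed). Exits 2 if either path is not a directory.
resolves_inside_root() (
  root=$(cd "$1" >/dev/null 2>&1 && pwd -P) || exit 2
  target=$(cd "$2" >/dev/null 2>&1 && pwd -P) || exit 2
  case "$target/" in
    "$root"/*) exit 0 ;;
    *) exit 1 ;;
  esac
)

# Print every directory entry under root $1 that resolves outside of it.
list_escaping_skills() (
  root=$1
  for entry in "$root"/*; do
    [ -d "$entry" ] || continue
    resolves_inside_root "$root" "$entry" || echo "$entry"
  done
)
```

Running `list_escaping_skills` against the configured skills directory would give a concrete list to compare with the "Skipping skill path" log lines.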
64
+ ### 4. BrainLift Double-Fires
65
+
66
+ The BrainLift plugin logged two `"starting nightly run"` entries within seconds of each other — running the full 5-agent sweep twice. This doubles API usage and compounds the rate limit problem.
67
+
68
+ ---
69
+
70
+ ## Changes Made This Session
71
+
72
+ | Change | File | Detail |
73
+ |--------|------|--------|
74
+ | Primary model → GPT-5.4 | `openclaw.json` | Was `anthropic/claude-sonnet-4-6` |
75
+ | Fallback chain updated | `openclaw.json` | Multiple fallback providers configured |
76
+ | `openai/gpt-5.4` added to models | `openclaw.json` | New model entry with Responses API |
77
+ | Kimi plugin disabled | `openclaw.json` | `plugins.entries.kimi.enabled: false` |
78
+ | Kimi auth profile removed | `openclaw.json` | Removed kimi auth profile |
79
+ | Gateway started with 8GB heap | Manual launch | `--max-old-space-size=8192` |
80
+ | Logs truncated | `logs/` | Log rotation applied |
81
+
82
+ ---
83
+
84
+ ## Hermes Gateway — Status
85
+
86
+ Hermes has been stable throughout. Running Python 3.11, `hermes gateway run --replace`. Port responding (403 on unauthenticated requests, expected). Low resource usage.
87
+
88
+ OGP bridge process also running.
89
+
90
+ No crashes, no issues.
91
+
92
+ ---
93
+
94
+ ## Article Fact-Check: "Hermes vs OpenClaw"
95
+
96
+ An agent-drafted article was **substantially wrong**. Its central thesis — "OpenClaw is desktop-first, Hermes is cloud-native" — is fabricated. Both are local daemons running on the same machine.
97
+
98
+ **Key errors corrected:**
99
+ - Hermes is NOT cloud-hosted (it's a local Python process)
100
+ - Hermes storage is NOT cloud-backed (it's local SQLite + markdown)
101
+ - Hermes skills are NOT synced cloud storage (local filesystem)
102
+ - Hermes does NOT have built-in public endpoints (needs tunnels like OpenClaw)
103
+ - "Turn off your phone and federation continues" is false (machine off = Hermes off)
104
+
105
+ **Corrected article delivered** with fact-checked claims against live configs and running processes.
106
+
107
+ ---
108
+
109
+ ## Recommended Actions
110
+
111
+ ### Immediate (Stability)
112
+
113
+ 1. **Fix the exec lifecycle crash** — File issue against `pi-agent-core`. The unhandled rejection in `Agent.processEvents` when exec stdout fires after run completion is a repeatable crasher. Until fixed, the gateway will keep dying.
114
+
115
+ 2. **Fix LaunchAgent env vars** — Replace the `$(...)` plist values with a wrapper script:
116
+ ```bash
117
+ #!/bin/bash
118
+ export ANTHROPIC_API_KEY=$(security find-generic-password ...)
119
+ # ... other keys ...
120
+ exec <node-path> --max-old-space-size=8192 \
121
+ <openclaw-path> gateway --port <port>
122
+ ```
123
+ Point the plist's ProgramArguments at this script instead of node directly.
124
+
125
+ 3. **Add `--max-old-space-size=8192`** to the LaunchAgent permanently (via the wrapper script above).
126
+
127
+ ### Short-term (Reliability)
128
+
129
+ 4. **Stagger BrainLift agent runs** — Don't fire all agents at the same cron tick. Space them apart to avoid rate limit contention.
130
+
131
+ 5. **Investigate the skipped skills issue** — Check for broken symlinks or path traversal in skills directory. These represent a significant portion of the skill set not loading.
132
+
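The staggering in item 4 comes down to simple arithmetic: divide the scheduling window by the number of agents and give each one its own offset. The sketch below only computes those offsets; `stagger_offsets` and the agent names are hypothetical, and how BrainLift would consume the offsets is not covered by these notes.

```shell
# Hypothetical helper: print "<agent> <offset_minutes>" for each agent,
# spreading the agents evenly across a window of $1 minutes.
stagger_offsets() (
  window=$1; shift
  count=$#
  [ "$count" -gt 0 ] || exit 0
  step=$((window / count))
  i=0
  for agent in "$@"; do
    echo "$agent $((i * step))"
    i=$((i + 1))
  done
)
```

For five agents and a 60-minute window, `stagger_offsets 60 agent1 agent2 agent3 agent4 agent5` yields offsets of 0, 12, 24, 36, and 48 minutes, so no two agents hit the provider on the same tick.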
133
+ ### Medium-term (Resilience)
134
+
135
+ 6. **Request backoff/retry ceiling in embedded agent runner** — The 429 retry loop with no backoff is the #1 contributor to OOM crashes. Needs exponential backoff + max retry count + memory cleanup between attempts.
136
+
137
+ 7. **Add process supervision** — Current state: launchd throttles after crash, manual nohup doesn't survive reboot. Consider a wrapper that catches SIGABRT and restarts with a cooldown.
138
+
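The supervision idea in item 7 can be sketched as a small restart loop. This is an illustration under assumptions, not the wrapper used in these notes: `supervise` is a hypothetical name, and it requires the gateway to run as a child process rather than through `exec`. A process killed by SIGABRT shows up in the shell as exit status 134 (128 + signal 6).

```shell
# Hypothetical supervisor: run a command, restarting it after a non-zero
# exit, at most $1 times, sleeping $2 seconds between restarts.
#   supervise <max_restarts> <cooldown_seconds> <command> [args...]
supervise() (
  max_restarts=$1 cooldown=$2; shift 2
  restarts=0
  while :; do
    status=0
    "$@" || status=$?
    [ "$status" -eq 0 ] && exit 0
    restarts=$((restarts + 1))
    if [ "$restarts" -gt "$max_restarts" ]; then
      echo "giving up after $max_restarts restarts (last status $status)" >&2
      exit "$status"
    fi
    echo "exited with status $status; restart $restarts/$max_restarts in ${cooldown}s" >&2
    sleep "$cooldown"
  done
)
```

The restart limit matters as much as the cooldown: without it, a gateway that crashes on startup would loop indefinitely.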
139
+ ---
140
+
141
+ **Document Created:** April 7, 2026
142
+ **Sanitized for Publication:** April 8, 2026
package/docs/case-studies/OpenClaw_Stability_Fix_Summary.md +209 -0
@@ -0,0 +1,209 @@
1
+ # OpenClaw Stability Fix Summary
2
+ **Date:** April 7, 2026
3
+ **Status:** RESOLVED - Mitigations Implemented
4
+
5
+ > **Note:** This document is a sanitized version of internal debugging notes. File paths, process IDs, and system-specific details have been generalized. The original was created during OGP development to debug an unrelated OpenClaw regression.
6
+
7
+ ---
8
+
9
+ ## Problem Summary
10
+
11
+ OpenClaw gateway (v2026.4.5) was crashing every 10-60 minutes with two distinct failure modes:
12
+
13
+ 1. **Exec Lifecycle Bug** - "Agent listener invoked outside active run" error
14
+ 2. **Browser Automation OOM** - V8 heap exhaustion from heavy browser use
15
+
16
+ ---
17
+
18
+ ## Root Cause Analysis
19
+
20
+ ### Bug #1: Exec Lifecycle Crash (CRITICAL)
21
+
22
+ **Status:** **KNOWN BUG in OpenClaw 2026.4.5** - Regression from 2026.4.2
23
+
24
+ **Error:** `Unhandled promise rejection: Error: Agent listener invoked outside active run`
25
+
26
+ **GitHub Issues:**
27
+ - [#62137](https://github.com/openclaw/openclaw/issues/62137) - Exec/PTY unhandled promise rejection
28
+ - [#61592](https://github.com/openclaw/openclaw/issues/61592) - Background exec process crashes
29
+ - [#61812](https://github.com/openclaw/openclaw/issues/61812) - Regression in 2026.4.5
30
+ - [#61733](https://github.com/openclaw/openclaw/issues/61733) - Windows crashes with same error
31
+
32
+ **Technical Details:**
33
+ When a background exec process emits stdout after the agent run has completed, the gateway crashes instead of safely ignoring or buffering the output. The `pi-agent-core` library's `Agent.processEvents` method throws when called outside an active run context.
34
+
35
+ **Trigger Scenarios:**
36
+ - File operations
37
+ - Long-running exec processes
38
+ - Bash tools calling `openclaw message send`
39
+ - Cron jobs spawning exec sessions
40
+
41
+ **Impact:** Gateway crashes every 10-60 minutes during normal operation
42
+
43
+ ### Bug #2: Browser Automation OOM
44
+
45
+ **Error:** `FATAL ERROR: v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath Allocation failed - JavaScript heap out of memory`
46
+
47
+ **Root Cause:** Heavy browser automation creates large serialized objects that overflow the default V8 heap limit (4GB)
48
+
49
+ **Impact:** Gateway crashes after extended browser automation sessions (2-4 hours)
50
+
51
+ ### Bug #3: Cron Job API Key Failures (FIXED)
52
+
53
+ **Status:** RESOLVED by disabling cron jobs
54
+
55
+ **Error:** `401 Incorrect API key provided` (environment variable expansion failing)
56
+
57
+ **Root Cause:** Environment variable evaluation failing in LaunchAgent context when cron jobs execute, triggering cascading model fallback failures and eventual OOM
58
+
59
+ **Fix:** Disabled all cron jobs + BrainLift plugin
60
+
61
+ ---
62
+
63
+ ## Solutions Implemented
64
+
65
+ ### ✅ Solution #1: Wrapper Script with 8GB Heap Limit
66
+
67
+ **File:** `$HOME/.openclaw/bin/gateway-wrapper.sh`
68
+
69
+ **What it does:**
70
+ - Sets all required environment variables explicitly
71
+ - Launches gateway with `--max-old-space-size=8192` (8GB heap limit)
72
+ - Provides logging for debugging
73
+
74
+ **LaunchAgent Integration:**
75
+ Updated LaunchAgent plist to use wrapper instead of calling node directly
76
+
77
+ **Benefits:**
78
+ - Doubles heap limit to prevent browser OOM crashes
79
+ - Ensures env vars are always set correctly
80
+ - Survives OpenClaw updates (wrapper script is outside node_modules)
81
+
82
+ ### ✅ Solution #2: Disabled All Cron Jobs
83
+
84
+ **Files Modified:**
85
+ - `$HOME/.openclaw/openclaw.json` - BrainLift plugin disabled
86
+ - `$HOME/.openclaw/cron/jobs.json` - All cron jobs disabled
87
+
88
+ **Impact:**
89
+ - Eliminates cron-triggered API key evaluation failures
90
+ - Prevents BrainLift OOM crashes from simultaneous agent runs
91
+ - Stops scheduled jobs that were triggering crashes
92
+
93
+ ### ✅ Solution #3: API Keys in Config File
94
+
95
+ **Status:** Already fixed via config modification
96
+
97
+ **What happened:** The `env` section of the OpenClaw config was modified to hold the API keys directly instead of relying on shell command expansion
98
+
99
+ **Effect:** Environment variables now always available, preventing auth cascades
100
+
101
+ ---
102
+
103
+ ## Remaining Issues
104
+
105
+ ### ⚠️ Exec Lifecycle Bug - NOT FIXED, MITIGATED
106
+
107
+ **Status:** Waiting for OpenClaw developers to fix in pi-agent-core
108
+
109
+ **Mitigation:** Gateway will still crash when exec lifecycle bug triggers, but LaunchAgent will auto-restart it
110
+
111
+ **Upstream Fix Options:**
112
+ 1. Wait for OpenClaw team to release patch
113
+ 2. Roll back to 2026.4.2 (workaround mentioned in GitHub issues)
114
+ 3. Avoid file operations that trigger long-running exec processes
115
+
116
+ **Recommended Action:** Monitor for OpenClaw 2026.4.6 or later that fixes these issues
117
+
118
+ ---
119
+
120
+ ## OGP Correlation
121
+
122
+ **Conclusion:** OGP work is **NOT** the cause of crashes
123
+
124
+ **Evidence:**
125
+ - Both bugs are known OpenClaw 2026.4.5 regressions affecting all users
126
+ - Crashes occur with zero OGP activity
127
+ - GitHub issues filed by users not using OGP
128
+ - Dual-assistant setup (OpenClaw + Hermes) may have exposed bugs faster due to higher load, but didn't create them
129
+
130
+ ---
131
+
132
+ ## Current Status
133
+
134
+ **Gateway:** ✅ Running
135
+ **Heap Limit:** ✅ 8GB (doubled from default 4GB)
136
+ **Cron Jobs:** ✅ Disabled
137
+ **BrainLift:** ✅ Disabled
138
+ **Wrapper Script:** ✅ Active via LaunchAgent
139
+
140
+ **Expected Stability:**
141
+ - ✅ No more cron-triggered crashes
142
+ - ✅ No more browser OOM crashes (unless >8GB heap usage)
143
+ - ⚠️ Exec lifecycle bug may still cause occasional crashes (auto-restart enabled)
144
+
145
+ ---
146
+
147
+ ## Testing & Monitoring
148
+
149
+ **To verify stability:**
150
+
151
+ ```bash
152
+ # Check gateway status
153
+ launchctl list | grep openclaw
154
+ lsof -i :<gateway-port>
155
+
156
+ # Monitor for crashes
157
+ tail -f ~/.openclaw/logs/gateway.err.log | grep -iE "unhandled|crash|FATAL"
158
+
159
+ # Check uptime
160
+ ps aux | grep openclaw-gateway
161
+ ```
162
+
163
+ **Success Metrics:**
164
+ - Gateway uptime > 24 hours without manual restart
165
+ - No API key evaluation errors in logs
166
+ - No OOM crashes during browser automation
167
+
168
+ ---
169
+
170
+ ## Rollback Instructions
171
+
172
+ If issues persist, roll back as follows:
173
+
174
+ ```bash
175
+ # Restore original LaunchAgent
176
+ cp $HOME/Library/LaunchAgents/ai.openclaw.gateway.plist.backup-* \
177
+ $HOME/Library/LaunchAgents/ai.openclaw.gateway.plist
178
+
179
+ # Restore cron jobs
180
+ cp $HOME/.openclaw/cron/jobs.json.backup-* \
181
+ $HOME/.openclaw/cron/jobs.json
182
+
183
+ # Re-enable BrainLift in openclaw.json
184
+ # (manually change "enabled": false to true)
185
+
186
+ # Reload LaunchAgent
187
+ launchctl unload $HOME/Library/LaunchAgents/ai.openclaw.gateway.plist
188
+ launchctl load $HOME/Library/LaunchAgents/ai.openclaw.gateway.plist
189
+ ```
190
+
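The `cp ...backup-*` commands above work only when exactly one backup matches the glob. With several backups, `cp` receives multiple sources and a non-directory destination, and fails. A hedged sketch for picking the most recently modified backup follows. `latest_backup` is a hypothetical helper, and selecting by modification time is an assumption, since the notes do not specify the suffix format of the backup files.

```shell
# Hypothetical helper: print the most recently modified "<file>.backup-*"
# sibling of the file given as $1. Fails if no backup exists.
latest_backup() (
  newest=
  for candidate in "$1".backup-*; do
    [ -e "$candidate" ] || continue
    if [ -z "$newest" ] || [ "$candidate" -nt "$newest" ]; then
      newest=$candidate
    fi
  done
  [ -n "$newest" ] && echo "$newest"
)
```

Usage: `cp "$(latest_backup "$HOME/.openclaw/cron/jobs.json")" "$HOME/.openclaw/cron/jobs.json"`.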
191
+ Or consider rolling back OpenClaw to 2026.4.2:
192
+ ```bash
193
+ npm install -g openclaw@2026.4.2
194
+ # Note: May require removing plugins.entries.memory-core.config.dreaming from config
195
+ ```
196
+
197
+ ---
198
+
199
+ ## Next Steps
200
+
201
+ 1. ✅ Monitor gateway stability for 24-48 hours
202
+ 2. ⏸️ Wait for OpenClaw 2026.4.6+ release with exec lifecycle fix
203
+ 3. 🔍 Investigate skipped skills issue (low priority)
204
+
205
+ ---
206
+
207
+ **Document Created:** April 7, 2026
208
+ **Last Updated:** April 7, 2026
209
+ **Sanitized for Publication:** April 8, 2026
package/docs/case-studies/README.md +40 -0
@@ -0,0 +1,40 @@
1
+ # OGP Development Case Studies
2
+
3
+ This directory contains sanitized debugging notes from real-world OGP development and deployment challenges. These documents capture the messy reality of building federated AI systems — including the false starts, red herrings, and lessons learned.
4
+
5
+ > **⚠️ Note:** These files are sanitized versions of internal debugging notes. System-specific details (file paths, PIDs, API key fragments, port numbers) have been removed or generalized to protect operational security while preserving the technical narrative.
6
+
7
+ ---
8
+
9
+ ## Contents
10
+
11
+ | File | Description |
12
+ |------|-------------|
13
+ | `OpenClaw_Stability_Fix_Summary.md` | Comprehensive analysis of OpenClaw 2026.4.5 regression bugs encountered during OGP development. Includes root cause analysis, mitigations, and wrapper script implementation. |
14
+ | `CRASH_RESOLUTION_20260407.md` | Quick reference guide for the same stability issues — condensed version for immediate action. |
15
+ | `crash_observations.md` | Raw timeline and observations from the debugging session. Shows the iterative process of elimination that ultimately cleared OGP of suspicion. |
16
+ | `OpenClaw_Hermes_Status_Report_20260407.md` | Comparative analysis of OpenClaw vs. Hermes gateway stability during federation testing. Includes fact-check of an AI-drafted article that was substantially wrong. |
17
+
18
+ ---
19
+
20
+ ## Context
21
+
22
+ These documents were created on April 7, 2026, during intensive OGP federation testing. The initial hypothesis was that OGP's dual-assistant setup (OpenClaw + Hermes) was causing gateway instability. **The reality:** OpenClaw 2026.4.5 had known regression bugs affecting all users.
23
+
24
+ **Key Lesson:** When debugging complex systems, correlation is not causation. The OGP work exposed OpenClaw bugs faster due to higher load, but didn't create them.
25
+
26
+ ---
27
+
28
+ ## Related Article
29
+
30
+ The debugging narrative behind these files is documented in:
31
+
32
+ **"[Case Study] When Your AI Tools Keep Crashing: A Meta-Debugging Loop with OpenClaw and Claude"**
33
+
34
+ This Substack article tells the story of using Claude (via Dispatch) to diagnose OpenClaw crashes while OpenClaw was down, then using OpenClaw/Claude Code to fix OGP bugs, then back to Claude when OpenClaw crashed again — a meta-loop that became the only way forward.
35
+
36
+ ---
37
+
38
+ **Why These Are Here:**
39
+
40
+ The article promised these files would be "available in dp-pcs/ogp." Rather than leave them as unverified claims, we're publishing the sanitized source material. Real debugging is messy. Real systems fail in unexpected ways. Federation requires resilience not just in protocol design, but in the development process itself.