@elvatis_com/openclaw-self-healing-elvatis 0.2.4

@@ -0,0 +1,13 @@
+ # DASHBOARD (AAHP)
+
+ | Task | Ready? | Owner | Notes |
+ |---|---|---|---|
+ | Create GitHub issues | ✅ | Human | Run `gh auth login` then `bash scripts/create-roadmap-issues.sh` |
+ | Unit test suite | ✅ | Akido | Highest priority - vitest + tsconfig |
+ | TS build pipeline | ✅ | Akido | Add tsconfig.json, build script |
+ | Plugin health monitoring | Blocked | Akido | Waiting for `openclaw plugins list --json` |
+ | Health status endpoint | ✅ | Akido | Medium priority |
+ | Observability events | ✅ | Akido | Medium priority |
+ | Dry-run mode | ✅ | Akido | Medium priority |
+ | Active recovery probing | ✅ | Akido | Low priority |
+ | Config hot-reload | ✅ | Akido | Low priority |
@@ -0,0 +1,4 @@
1
+ # LOG (AAHP)
2
+
3
+ - 2026-02-27: Defined v0.3 roadmap with 8 items (3 high, 3 medium, 2 low). Created ROADMAP.md and scripts/create-roadmap-issues.sh. GitHub issue creation deferred - gh CLI not authenticated.
4
+ - 2026-02-24: Initialized AAHP handoff structure.
@@ -0,0 +1,177 @@
1
+ {
2
+ "files": {
3
+ "NEXT_ACTIONS.md": {
4
+ "checksum": "sha256:pending",
5
+ "lines": 22,
6
+ "updated": "2026-02-27T23:00:00Z",
7
+ "summary": "8 roadmap items defined: 3 high, 3 medium, 2 low priority. GitHub issues created (#1-#8)."
8
+ },
9
+ "ROADMAP.md": {
10
+ "checksum": "sha256:pending",
11
+ "lines": 85,
12
+ "updated": "2026-02-27T23:00:00Z",
13
+ "summary": "Full v0.3 roadmap with 8 items, scoped and prioritized."
14
+ }
15
+ },
16
+ "tasks": {
17
+ "T-001": {
18
+ "created": "2026-02-28T00:00:00Z",
19
+ "priority": "high",
20
+ "title": "Define v0.2 roadmap items as GitHub issues and prioritize",
21
+ "status": "done",
22
+ "depends_on": [],
23
+ "completed": "2026-02-27T13:15:01.997Z"
24
+ },
25
+ "T-002": {
26
+ "created": "2026-02-27T23:00:00Z",
27
+ "priority": "high",
28
+ "title": "Run scripts/create-roadmap-issues.sh after gh auth login",
29
+ "status": "done",
30
+ "depends_on": [],
31
+ "completed": "2026-02-27T13:27:44.739Z"
32
+ },
33
+ "T-003": {
34
+ "created": "2026-02-27T23:00:00Z",
35
+ "priority": "high",
36
+ "title": "Add unit test suite for core healing logic",
37
+ "status": "done",
38
+ "depends_on": [],
39
+ "github_issue": 1,
40
+ "completed": "2026-02-27T14:12:40.386Z"
41
+ },
42
+ "T-004": {
43
+ "created": "2026-02-27T23:00:00Z",
44
+ "priority": "high",
45
+ "title": "Add TypeScript build pipeline and type-checking",
46
+ "status": "done",
47
+ "depends_on": [],
48
+ "github_issue": 2,
49
+ "completed": "2026-03-01T15:01:03.900Z"
50
+ },
51
+ "T-005": {
52
+ "created": "2026-02-27T23:00:00Z",
53
+ "priority": "high",
54
+ "title": "Implement structured plugin health monitoring and auto-disable",
55
+ "status": "blocked",
56
+ "depends_on": [],
57
+ "blocked_reason": "Waiting for openclaw plugins list --json API",
58
+ "github_repo": "elvatis/openclaw-self-healing-homeofe",
59
+ "github_issue": 3
60
+ },
61
+ "T-006": {
62
+ "title": "Support configuration hot-reload without gateway restart",
63
+ "status": "done",
64
+ "priority": "low",
65
+ "depends_on": [],
66
+ "created": "2026-02-28T13:25:43.201Z",
67
+ "notes": "## Summary\n\nPlugin config (`modelOrder`, `cooldownMinutes`, `autoFix.*`) is read once at startup via `api.pluginConfig`. Changing any config value requires a full gateway restart.\n\n## Scope\n\n- Watch for config changes (file watch or periodic re-read)\n- Re-read plugin config on change and update internal variables\n- Handle edge cases: invalid new config (keep old), partial updates\n- Alternatively, support a reload command: `openclaw self-heal reload`\n\n## Acceptance criteria\n\n- Config changes take",
68
+ "github_issue": 8,
69
+ "github_repo": "elvatis/openclaw-self-healing-homeofe",
70
+ "completed": "2026-02-28T13:42:34.823Z"
71
+ },
72
+ "T-007": {
73
+ "title": "Add active model recovery probing to shorten cooldown periods",
74
+ "status": "done",
75
+ "priority": "medium",
76
+ "depends_on": [],
77
+ "created": "2026-02-28T13:25:43.204Z",
78
+ "notes": "## Summary\n\nModels in cooldown are currently recovered passively: `pickFallback()` checks `nextAvailableAt` only when a new failure occurs. If a model recovers early, the plugin still uses the fallback until the full cooldown expires.\n\n## Scope\n\n- Add a periodic probe (e.g., every 5 minutes) that tests cooldown models\n- Use a lightweight API call (e.g., model info endpoint or small completion)\n- If the model responds successfully, remove it from cooldown early\n- Configurable probe interval and e",
79
+ "github_issue": 7,
80
+ "github_repo": "elvatis/openclaw-self-healing-homeofe",
81
+ "completed": "2026-02-28T14:42:14.373Z"
82
+ },
83
+ "T-008": {
84
+ "title": "Add dry-run mode for safe validation of healing logic",
85
+ "status": "done",
86
+ "priority": "medium",
87
+ "depends_on": [],
88
+ "created": "2026-02-28T13:25:43.204Z",
89
+ "notes": "## Summary\n\nThere is no way to test what the plugin would do without it actually taking healing actions (restarting gateways, disabling crons, patching sessions). A dry-run mode is needed for validation and debugging.\n\n## Scope\n\n- Add a `dryRun: boolean` config option (default: false)\n- When enabled: log all actions that _would_ be taken, but do not execute them\n- State tracking still updates (to test state transitions) but side-effects are skipped\n- Useful for: initial setup validation, debuggi",
90
+ "github_issue": 6,
91
+ "github_repo": "elvatis/openclaw-self-healing-homeofe",
92
+ "completed": "2026-02-28T15:14:21.885Z"
93
+ },
94
+ "T-009": {
95
+ "title": "Emit structured observability events for heal actions",
96
+ "status": "done",
97
+ "priority": "medium",
98
+ "depends_on": [],
99
+ "created": "2026-02-28T13:25:43.204Z",
100
+ "notes": "## Summary\n\nThe plugin uses `api.logger` for logging but emits no structured events. Monitoring and alerting systems cannot track heal actions, cooldown entries, or failure rates.\n\n## Scope\n\n- Emit structured events via `api.emit()` or equivalent for:\n - `self-heal:model-cooldown` - model put into cooldown (with model ID, reason, duration)\n - `self-heal:session-patched` - session model pin overridden (with session key, old/new model)\n - `self-heal:whatsapp-restart` - WhatsApp gateway restarte",
101
+ "github_issue": 5,
102
+ "github_repo": "elvatis/openclaw-self-healing-homeofe",
103
+ "completed": "2026-02-28T15:23:54.846Z"
104
+ },
105
+ "T-010": {
106
+ "title": "Expose self-heal status for external monitoring",
107
+ "status": "done",
108
+ "priority": "medium",
109
+ "depends_on": [],
110
+ "created": "2026-02-28T13:25:43.204Z",
111
+ "notes": "## Summary\n\nExternal tools (dashboards, other plugins, CLI) cannot query the self-heal plugin state. There is no way to know which models are in cooldown, how many heals have occurred, or the current WhatsApp connection status without reading the raw state file.\n\n## Scope\n\n- Register a plugin command or API endpoint (depends on openclaw plugin API):\n - `openclaw self-heal status` or similar\n - Returns JSON with: cooldown models, WhatsApp status, cron heal history, last heal actions\n- Alternati",
112
+ "github_issue": 4,
113
+ "github_repo": "elvatis/openclaw-self-healing-homeofe",
114
+ "completed": "2026-02-28T16:15:24.131Z"
115
+ },
116
+ "T-011": {
117
+ "title": "Add integration tests for monitor service tick flows",
118
+ "status": "done",
119
+ "priority": "low",
120
+ "depends_on": [],
121
+ "created": "2026-03-02T00:00:00Z",
122
+ "notes": "## Summary\n\nThe current test suite has 122 unit tests covering helpers and pure functions (parseConfig, pickFallback, buildStatusSnapshot, etc.) but the monitor service tick() function runs all healing logic and is not covered by integration tests.\n\n## Scope\n\n- Add integration tests for the full monitor tick cycle using a mocked api object\n- Cover: WhatsApp disconnect streak -> restart path\n- Cover: cron failure accumulation -> disable + issue create path\n- Cover: active model recovery probe -> cooldown removal path\n- Cover: config hot-reload during tick\n- Cover: dry-run flag suppresses all side-effects\n- Use Jest timer mocks for setInterval control\n\n## Acceptance criteria\n\n- At least 20 new integration-level tests added\n- All healing paths (WhatsApp, cron, probe) exercised in tests\n- Tests pass with npm test",
123
+ "github_issue": 9,
124
+ "github_repo": "elvatis/openclaw-self-healing-homeofe",
125
+ "completed": "2026-03-02T02:17:03.808Z"
126
+ },
127
+ "T-012": {
128
+ "title": "Add startup configuration validation with fail-fast behavior",
129
+ "status": "done",
130
+ "priority": "low",
131
+ "depends_on": [],
132
+ "created": "2026-03-02T00:00:00Z",
133
+ "notes": "## Summary\n\nCurrently parseConfig() silently falls back to defaults for any invalid value. The plugin starts even if modelOrder is empty, paths are not writable, or cooldownMinutes is set to an absurd value. This makes misconfiguration hard to diagnose.\n\n## Scope\n\n- Add a validateConfig(config: PluginConfig): { valid: boolean; errors: string[] } function\n- Validate: modelOrder has at least one entry\n- Validate: cooldownMinutes is between 1 and 10080 (1 week)\n- Validate: probeIntervalSec >= 60\n- Validate: whatsappMinRestartIntervalSec >= 60\n- Validate: state file directory is writable (best-effort)\n- Log each validation error clearly with api.logger?.error\n- Return early from register() if validation fails (fail-fast)\n\n## Acceptance criteria\n\n- validateConfig function is exported and unit-tested\n- Plugin refuses to start on invalid config and logs all reasons\n- README config section updated with valid ranges",
134
+ "github_issue": 10,
135
+ "github_repo": "elvatis/openclaw-self-healing-homeofe",
136
+ "completed": "2026-03-02T02:50:50.546Z"
137
+ },
138
+ "T-013": {
139
+ "title": "Write status snapshot file on each monitor tick",
140
+ "status": "done",
141
+ "priority": "low",
142
+ "depends_on": [],
143
+ "created": "2026-03-02T00:00:00Z",
144
+ "notes": "## Summary\n\nThe plugin already builds a StatusSnapshot object and emits it via api.emit('self-heal:status', snapshot) on each tick. However, external tools (scripts, dashboards, other plugins) cannot read this snapshot without subscribing to the event bus. A file-based status report would allow polling from any tool.\n\n## Scope\n\n- After each tick, write buildStatusSnapshot(state, config) to a JSON file\n- Default path: ~/.openclaw/workspace/memory/self-heal-status.json (configurable via statusFile config key)\n- Write atomically (write to .tmp then rename)\n- Add statusFile to PluginConfig type and parseConfig\n- Add a writeStatusFile(path, snapshot) helper function (exported for tests)\n\n## Acceptance criteria\n\n- Status file is written on every tick\n- File contains valid JSON matching StatusSnapshot type\n- Atomic write prevents partial reads\n- Tests cover writeStatusFile helper\n- README documents the statusFile config key and file format",
145
+ "github_issue": 11,
146
+ "github_repo": "elvatis/openclaw-self-healing-homeofe",
147
+ "completed": "2026-03-02T03:16:30.819Z"
148
+ },
149
+ "T-014": {
150
+ "title": "Export heal metrics to ~/.aahp/metrics.jsonl",
151
+ "status": "ready",
152
+ "priority": "low",
153
+ "depends_on": [],
154
+ "created": "2026-03-02T00:00:00Z",
155
+ "notes": "## Summary\n\nHeal events are currently only visible in logs and via api.emit(). There is no persistent record of past heal actions for analysis or alerting. The aahp-runner project writes structured metrics to ~/.aahp/metrics.jsonl; the self-heal plugin should do the same.\n\n## Scope\n\n- Add an appendMetric(line: object, metricsFile: string) helper that appends a JSON line\n- Emit one JSONL line per heal event:\n - { ts, plugin: \"self-heal\", event: \"model-cooldown\", model, reason, cooldownSec }\n - { ts, plugin: \"self-heal\", event: \"session-patched\", sessionKey, oldModel, newModel }\n - { ts, plugin: \"self-heal\", event: \"whatsapp-restart\", disconnectStreak }\n - { ts, plugin: \"self-heal\", event: \"cron-disabled\", cronId, cronName, consecutiveFailures }\n - { ts, plugin: \"self-heal\", event: \"model-recovered\", model, isPreferred }\n- Default metrics file: ~/.aahp/metrics.jsonl (configurable via metricsFile config key)\n- Skip metrics write in dry-run mode (or mark with dryRun: true)\n- Create parent directory if missing\n\n## Acceptance criteria\n\n- appendMetric helper is exported and unit-tested\n- All 5 heal event types write a metrics line\n- Dry-run events are either skipped or marked\n- README documents the metrics format and metricsFile config key",
156
+ "github_issue": 12,
157
+ "github_repo": "elvatis/openclaw-self-healing-homeofe"
158
+ }
159
+ },
160
+ "quick_context": "T-013 done: writeStatusFile() helper with atomic write (tmp+rename) added. statusFile config key with default ~/.openclaw/workspace/memory/self-heal-status.json. Written on every monitor tick. 9 new tests, total 181 passing. CI green. T-005 still blocked on openclaw plugins list --json API. Remaining v0.3 task: T-014 (metrics export).",
161
+ "aahp_version": "3.0",
162
+ "project": "openclaw-self-healing-homeofe",
163
+ "token_budget": {
164
+ "full_read": 1000,
165
+ "manifest_only": 100,
166
+ "manifest_plus_core": 300
167
+ },
168
+ "last_session": {
169
+ "timestamp": "2026-03-02T03:16:30.819Z",
170
+ "phase": "execution",
171
+ "agent": "claude-code",
172
+ "commit": "4a9d7a5",
173
+ "duration_minutes": 2,
174
+ "session_id": "t013-status-file"
175
+ },
176
+ "next_task_id": 15
177
+ }
@@ -0,0 +1,54 @@
1
+ # NEXT_ACTIONS (AAHP)
2
+
3
+ ## Status Summary
4
+
5
+ | Status | Count |
+ |---------|-------|
+ | Done | 12 |
+ | Ready | 1 |
+ | Blocked | 1 |
10
+
11
+ ---
12
+
13
+ ## Ready - Work These Next
14
+
15
+ ### T-014: [low] - Export heal metrics to ~/.aahp/metrics.jsonl (issue #12)
16
+ - **Goal:** Append one JSONL line per heal event to `~/.aahp/metrics.jsonl` for analysis and alerting.
17
+ - **Context:** Heal events are currently only visible in logs and via api.emit(). There is no persistent record for analysis.
18
+ - **What to do:**
19
+ - Export `appendMetric(line, metricsFile)` helper
20
+ - Write entries for: model-cooldown, session-patched, whatsapp-restart, cron-disabled, model-recovered
21
+ - Default metrics file: `~/.aahp/metrics.jsonl` (configurable via `metricsFile`)
22
+ - Skip or mark dry-run events
23
+ - Create parent directory if missing
24
+ - **Files:** `index.ts`, `test/index.test.ts`, `README.md`
25
+ - **Definition of done:** Helper exported and tested; all 5 event types write metrics; README documents format.
26
+ - **GitHub Issue:** #12
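
The `appendMetric(line, metricsFile)` helper described for T-014 could look roughly like the following. This is a minimal sketch under stated assumptions, not the plugin's implementation: the synchronous `node:fs` calls, the epoch-millisecond `ts` value, and the `example/model` id are illustrative choices; only the helper name, its two parameters, and the event field names come from the task description.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Append one JSON object as a single line (JSONL) to metricsFile.
// The parent directory is created if missing, as the task scope requires.
function appendMetric(line: object, metricsFile: string): void {
  fs.mkdirSync(path.dirname(metricsFile), { recursive: true });
  fs.appendFileSync(metricsFile, JSON.stringify(line) + "\n", "utf8");
}

// Demo: write into a throwaway temp directory instead of ~/.aahp/metrics.jsonl.
const demoMetricsFile = path.join(
  fs.mkdtempSync(path.join(os.tmpdir(), "aahp-demo-")),
  "nested",
  "metrics.jsonl",
);
appendMetric(
  { ts: Date.now(), plugin: "self-heal", event: "model-cooldown", model: "example/model", reason: "rate-limit", cooldownSec: 18000 },
  demoMetricsFile,
);
appendMetric(
  { ts: Date.now(), plugin: "self-heal", event: "whatsapp-restart", disconnectStreak: 3 },
  demoMetricsFile,
);
```

Appending one complete line per call keeps the file readable by line-oriented tools even if the process stops between events.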
27
+
28
+ ---
29
+
30
+ ## Blocked
31
+
32
+ ### T-005: [high] - Implement structured plugin health monitoring and auto-disable (issue #3)
33
+ - **Blocked by:** Waiting for `openclaw plugins list --json` API from openclaw core
34
+ - **Goal:** Monitor plugin health and auto-disable failing plugins using structured JSON output.
35
+ - **Context:** Current code has a stub that parses plain text output from `openclaw plugins list`. No robust parsing is possible without the `--json` flag.
36
+ - **What to do (when unblocked):**
37
+ - Parse `openclaw plugins list --json` output for plugin status
38
+ - Auto-disable plugins with `status=error` (respecting `pluginDisableCooldownSec`)
39
+ - Create GitHub issues for disabled plugins
40
+ - **Files:** `index.ts`, `test/index.test.ts`
41
+ - **Definition of done:** Failing plugins are detected via JSON API and auto-disabled; tests cover detection and disable logic.
42
+ - **GitHub Issue:** #3
43
+
44
+ ---
45
+
46
+ ## Recently Completed
47
+
48
+ | Task | Title | Date |
+ |-------|-------|------|
+ | T-013 | Write status snapshot file on each monitor tick | 2026-03-02 |
+ | T-012 | Add startup configuration validation with fail-fast behavior | 2026-03-02 |
+ | T-011 | Add integration tests for monitor service tick flows | 2026-03-02 |
+ | T-004 | Add TypeScript build pipeline and type-checking | 2026-03-01 |
+ | T-010 | Expose self-heal status for external monitoring | 2026-02-28 |
@@ -0,0 +1,20 @@
1
+ # STATUS (AAHP)
2
+
3
+ (Verified/Assumed/Unknown)
4
+
5
+ ## Current State
6
+ - Status: (Verified) Scaffold complete, v0.3 roadmap defined
7
+ - Version: 0.2.4
8
+ - CI: (Unknown) No build/test pipeline yet - top priority
9
+
10
+ ## Architecture Notes
11
+ - (Verified) Single-file plugin (`index.ts`) using openclaw plugin API
12
+ - (Verified) Three healing domains: model failover, WhatsApp reconnect, cron failure
13
+ - (Verified) Plugin disable feature is a stub (blocked on structured API)
14
+ - (Verified) State persisted to JSON file, no external dependencies
15
+
16
+ ## Risks / Open Questions
17
+ - (Verified) Zero test coverage - any change risks silent regressions
18
+ - (Verified) No TypeScript type-checking in build pipeline
19
+ - (Unknown) When will `openclaw plugins list --json` be available?
20
+ - (Verified) `gh` CLI not authenticated - GitHub issue creation deferred
package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Elvatis - Emre Kohler
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,121 @@
1
+ # openclaw-self-healing-elvatis
2
+
3
+ OpenClaw plugin that improves resilience by automatically fixing reversible failures.
4
+
5
+ ## What it can heal (v0.2)
6
+
7
+ Implemented now:
8
+
9
+ - Model outage healing
+   - Detect rate limit / quota / auth-scope failures
+   - Put the affected model into cooldown
+   - Patch pinned session model overrides to a safe fallback (prevents endless `API rate limit reached` loops)
+
+ - WhatsApp disconnect healing
+   - If WhatsApp appears disconnected repeatedly: restart the gateway
+   - Guardrails: streak threshold + minimum restart interval
+
+ - Cron failure healing (optional)
+   - If a cron job fails repeatedly: disable it
+   - Create a GitHub issue with last error context (rate limited)
21
+
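
The WhatsApp guardrails listed above (a disconnect streak threshold plus a minimum restart interval) could be combined roughly as follows. This is a sketch, not the plugin's actual code: `WhatsAppState`, `shouldRestartGateway`, and `streakThreshold` are hypothetical names, and `minRestartIntervalSec` only mirrors the documented `autoFix.whatsappMinRestartIntervalSec` key.

```typescript
// Hypothetical state tracked between monitor ticks.
interface WhatsAppState {
  disconnectStreak: number; // consecutive ticks observed as disconnected
  lastRestartAtMs: number;  // epoch ms of the last gateway restart (0 = never)
}

interface RestartGuardrails {
  streakThreshold: number;       // assumed name: disconnects required before acting
  minRestartIntervalSec: number; // minimum gap between two restarts
}

// Returns true only when both guardrails allow a restart.
function shouldRestartGateway(
  waState: WhatsAppState,
  connected: boolean,
  nowMs: number,
  guardrails: RestartGuardrails,
): boolean {
  if (connected) {
    waState.disconnectStreak = 0; // any healthy tick resets the streak
    return false;
  }
  waState.disconnectStreak += 1;
  if (waState.disconnectStreak < guardrails.streakThreshold) return false;
  const secondsSinceRestart = (nowMs - waState.lastRestartAtMs) / 1000;
  if (secondsSinceRestart < guardrails.minRestartIntervalSec) return false;
  waState.lastRestartAtMs = nowMs;
  waState.disconnectStreak = 0;
  return true;
}
```

The interval check runs after the streak check, so a flapping connection cannot trigger restarts faster than the configured minimum even if the streak keeps growing.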
22
+ Not implemented yet (next):
23
+ - Plugin install error rollback (disable plugin) based on structured plugin status
24
+ - Waiting for `openclaw plugins list --json` or an equivalent stable API
25
+
26
+ ## Install
27
+
28
+ From ClawHub:
29
+
30
+ ```bash
+ clawhub install openclaw-self-healing-elvatis
+ ```
+
+ For local development:
+
+ ```bash
+ openclaw plugins install -l ~/.openclaw/workspace/openclaw-self-healing-elvatis
+ openclaw gateway restart
+ ```
40
+
41
+ ## Config
42
+
43
+ ```json
+ {
+   "plugins": {
+     "entries": {
+       "openclaw-self-healing": {
+         "enabled": true,
+         "config": {
+           "modelOrder": [
+             "anthropic/claude-opus-4-6",
+             "openai-codex/gpt-5.2",
+             "google-gemini-cli/gemini-2.5-flash"
+           ],
+           "cooldownMinutes": 300,
+           "autoFix": {
+             "patchSessionPins": true,
+             "disableFailingPlugins": false,
+             "disableFailingCrons": false,
+             "issueRepo": "elvatis/openclaw-self-healing-homeofe"
+           }
+         }
+       }
+     }
+   }
+ }
+ ```
68
+
69
+ `autoFix.issueRepo` must use `owner/repo` format. Invalid values are ignored and the plugin falls back to `GITHUB_REPOSITORY` (if valid) or `elvatis/openclaw-self-healing-homeofe`.
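
The fallback order described here (configured value, then `GITHUB_REPOSITORY`, then the built-in default) can be sketched as below. The function name `resolveIssueRepo` and the exact `owner/repo` pattern are assumptions for illustration; the plugin's real check may accept a different character set.

```typescript
const DEFAULT_ISSUE_REPO = "elvatis/openclaw-self-healing-homeofe";

// Assumed pattern: exactly one slash, with letters, digits, ".", "_" or "-" on each side.
const OWNER_REPO_PATTERN = /^[A-Za-z0-9_.-]+\/[A-Za-z0-9_.-]+$/;

// Pick the first candidate in owner/repo format, else the default.
function resolveIssueRepo(configured: string | undefined, envRepo: string | undefined): string {
  for (const candidate of [configured, envRepo]) {
    if (typeof candidate === "string" && OWNER_REPO_PATTERN.test(candidate)) {
      return candidate;
    }
  }
  return DEFAULT_ISSUE_REPO;
}
```

In the plugin, the second argument would presumably be read from `process.env.GITHUB_REPOSITORY`; it is passed explicitly here so the logic stays easy to test.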
70
+
71
+ ### Config validation
72
+
73
+ The plugin validates configuration at startup and refuses to start if any value is invalid. All validation errors are logged via `api.logger.error` before the plugin exits.
74
+
75
+ | Key | Valid range | Default |
76
+ |-----|------------|---------|
77
+ | `modelOrder` | At least one entry (non-empty array) | 3 default models |
78
+ | `cooldownMinutes` | 1 - 10080 (1 minute to 1 week) | 300 |
79
+ | `probeIntervalSec` | >= 60 | 300 |
80
+ | `autoFix.whatsappMinRestartIntervalSec` | >= 60 | 300 |
81
+ | `stateFile` | Parent directory must be writable | `~/.openclaw/workspace/memory/self-heal-state.json` |
82
+ | `statusFile` | Path to status snapshot JSON | `~/.openclaw/workspace/memory/self-heal-status.json` |
83
+
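
The numeric ranges in the table above translate into a validator along these lines. This is a sketch under assumptions: the `PluginConfig` shape shown here contains only the validated keys, the error messages are invented, and the best-effort writability check for `stateFile` is left out to keep the example free of file-system access.

```typescript
// Hypothetical subset of the plugin config: only the range-checked keys.
interface PluginConfig {
  modelOrder: string[];
  cooldownMinutes: number;
  probeIntervalSec: number;
  autoFix: { whatsappMinRestartIntervalSec: number };
}

interface ValidationResult {
  valid: boolean;
  errors: string[];
}

// Collect every violation instead of stopping at the first one,
// so all reasons can be logged before the plugin refuses to start.
function validateConfig(config: PluginConfig): ValidationResult {
  const errors: string[] = [];
  if (!Array.isArray(config.modelOrder) || config.modelOrder.length === 0) {
    errors.push("modelOrder must contain at least one entry");
  }
  if (!Number.isFinite(config.cooldownMinutes) || config.cooldownMinutes < 1 || config.cooldownMinutes > 10080) {
    errors.push("cooldownMinutes must be between 1 and 10080");
  }
  if (!Number.isFinite(config.probeIntervalSec) || config.probeIntervalSec < 60) {
    errors.push("probeIntervalSec must be >= 60");
  }
  const minRestart = config.autoFix.whatsappMinRestartIntervalSec;
  if (!Number.isFinite(minRestart) || minRestart < 60) {
    errors.push("autoFix.whatsappMinRestartIntervalSec must be >= 60");
  }
  return { valid: errors.length === 0, errors };
}
```

A caller would log each entry of `errors` and return early from registration when `valid` is false.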
84
+ ## Status file
85
+
86
+ On every monitor tick (60s), the plugin writes a JSON status snapshot to `statusFile`. External scripts, dashboards, or other plugins can poll this file without subscribing to the event bus.
87
+
88
+ Default path: `~/.openclaw/workspace/memory/self-heal-status.json`
89
+
90
+ The file is written atomically (write to `.tmp` then rename) to prevent partial reads. The JSON structure matches the `StatusSnapshot` type:
91
+
92
+ ```json
+ {
+   "health": "healthy | degraded | healing",
+   "activeModel": "anthropic/claude-opus-4-6",
+   "models": [
+     { "id": "...", "status": "available | cooldown", "cooldownRemainingSec": 1234 }
+   ],
+   "whatsapp": { "status": "connected | disconnected | unknown", "disconnectStreak": 0 },
+   "cron": { "trackedJobs": 2, "failingJobs": [] },
+   "config": { "dryRun": false, "probeEnabled": true, "cooldownMinutes": 300, "modelOrder": ["..."] },
+   "generatedAt": 1700000000
+ }
+ ```
105
+
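
The atomic write described in this section (write to `.tmp`, then rename) can be sketched as follows. The body of `writeStatusFile` is an assumption about how such a helper might work, using synchronous `node:fs` calls; the snapshot is typed as `unknown` because the full `StatusSnapshot` type is not reproduced here.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Write the snapshot to "<file>.tmp" first, then rename it over the target.
// A rename within one file system replaces the target in a single step,
// so readers see either the old file or the new one, never a partial write.
function writeStatusFile(filePath: string, snapshot: unknown): void {
  fs.mkdirSync(path.dirname(filePath), { recursive: true });
  const tmpPath = `${filePath}.tmp`;
  fs.writeFileSync(tmpPath, JSON.stringify(snapshot, null, 2), "utf8");
  fs.renameSync(tmpPath, filePath);
}

// Demo: write into a throwaway temp directory instead of the default path.
const demoStatusDir = fs.mkdtempSync(path.join(os.tmpdir(), "self-heal-demo-"));
const demoStatusFile = path.join(demoStatusDir, "self-heal-status.json");
writeStatusFile(demoStatusFile, { health: "healthy", generatedAt: 1700000000 });
```

Keeping the temporary file in the same directory as the target matters: a rename across file systems is not a single-step replacement.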
106
+ ## Notes
107
+
108
+ Infrastructure changes remain ask-first.
109
+
110
+ ## Critical Guardrail: openclaw.json validation
111
+
112
+ This plugin treats `~/.openclaw/openclaw.json` as a boot-critical file.
113
+
114
+ Before any self-heal action that could restart the gateway or change cron/plugin state, it verifies:
115
+ - the config file exists
116
+ - it is valid JSON
117
+
118
+ If the config is invalid, the plugin will refuse to restart the gateway to avoid restart loops.
119
+
120
+ It also creates timestamped backups before restarts or disruptive changes:
121
+ `~/.openclaw/backups/openclaw.json/openclaw.json.<timestamp>.bak`
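
The guardrail above (verify the config, back it up, only then restart) might be wired together like this. It is a sketch under assumptions: `isConfigValid` is named in the roadmap but its signature is not shown there, and `backupConfig`, `guardedRestart`, and the timestamp format are hypothetical.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// True only if the file exists and parses as JSON.
function isConfigValid(configPath: string): boolean {
  try {
    JSON.parse(fs.readFileSync(configPath, "utf8"));
    return true;
  } catch {
    return false; // missing file or invalid JSON
  }
}

// Copy the config to "<backupDir>/<name>.<timestamp>.bak" and return the backup path.
function backupConfig(configPath: string, backupDir: string, now: Date = new Date()): string {
  fs.mkdirSync(backupDir, { recursive: true });
  const stamp = now.toISOString().replace(/[:.]/g, "-"); // assumed timestamp format
  const target = path.join(backupDir, `${path.basename(configPath)}.${stamp}.bak`);
  fs.copyFileSync(configPath, target);
  return target;
}

// Refuse to restart when the config is invalid; otherwise back up first.
function guardedRestart(configPath: string, backupDir: string, restart: () => void): boolean {
  if (!isConfigValid(configPath)) return false;
  backupConfig(configPath, backupDir);
  restart();
  return true;
}

// Demo in a throwaway temp directory standing in for ~/.openclaw.
const demoRoot = fs.mkdtempSync(path.join(os.tmpdir(), "openclaw-demo-"));
const demoConfigPath = path.join(demoRoot, "openclaw.json");
const demoBackupDir = path.join(demoRoot, "backups", "openclaw.json");
const restartLog: string[] = [];
fs.writeFileSync(demoConfigPath, JSON.stringify({ plugins: {} }), "utf8");
const restartedWithValidConfig = guardedRestart(demoConfigPath, demoBackupDir, () => {
  restartLog.push("restart");
});
```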
package/ROADMAP.md ADDED
@@ -0,0 +1,92 @@
1
+ # Roadmap - v0.3
2
+
3
+ Prioritized list of work items for the next release cycle. Each item maps to a GitHub issue (create with `bash scripts/create-roadmap-issues.sh` once `gh` is authenticated).
4
+
5
+ ## High priority
6
+
7
+ ### 1. Add unit test suite for core healing logic
8
+ **Labels:** `priority:high`, `type:testing`
9
+
10
+ The plugin has zero tests. All core logic - failure detection, model fallback selection, session patching, state management - is untested. This is the single highest-priority gap because every other change risks regressions without a safety net.
11
+
12
+ Scope:
13
+ - Set up a test runner (vitest or similar)
14
+ - Add `tsconfig.json` for type-checking
15
+ - Unit tests for `isRateLimitLike`, `isAuthScopeLike`, `pickFallback`, `patchSessionModel`, `loadState`/`saveState`, `isConfigValid`, backup lifecycle
16
+ - Integration test for `agent_end` and `message_sent` event handlers
17
+ - CI: add `test` script to `package.json`
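
To illustrate the kind of pure helper these unit tests would target, here is a hypothetical take on the two failure classifiers. Only the function names `isRateLimitLike` and `isAuthScopeLike` come from the scope list; the regular expressions are illustrative guesses and are not the patterns used in `index.ts`.

```typescript
// Illustrative patterns only: the real plugin may match different messages.
function isRateLimitLike(message: string): boolean {
  return /rate.?limit|quota|too many requests|\b429\b/i.test(message);
}

function isAuthScopeLike(message: string): boolean {
  return /insufficient.+scope|missing.+scope|permission denied|\b403\b/i.test(message);
}

// Table-driven cases in the style a vitest suite might use.
const classifierCases: Array<{ input: string; rateLimit: boolean; authScope: boolean }> = [
  { input: "API rate limit reached", rateLimit: true, authScope: false },
  { input: "Quota exceeded for this project", rateLimit: true, authScope: false },
  { input: "Insufficient OAuth scope", rateLimit: false, authScope: true },
  { input: "connection reset by peer", rateLimit: false, authScope: false },
];
```

Because both functions are pure string predicates, they can be tested without mocking the plugin API.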
18
+
19
+ ### 2. Add TypeScript build pipeline and type-checking
20
+ **Labels:** `priority:high`, `type:infra`
21
+
22
+ No `tsconfig.json` exists, and there is no build or type-check step, so TypeScript errors could ship undetected.
23
+
24
+ Scope:
25
+ - Add `tsconfig.json` with strict mode
26
+ - Add build/typecheck script to `package.json`
27
+ - Ensure `tsc --noEmit` passes
28
+ - Verify the plugin still loads correctly
29
+
30
+ ### 3. Implement structured plugin health monitoring and auto-disable
31
+ **Labels:** `priority:high`, `type:feature`
32
+
33
+ The `disableFailingPlugins` feature is a stub (`index.ts` lines 391-403). Needs proper implementation when `openclaw plugins list --json` becomes available.
34
+
35
+ Scope:
36
+ - Parse structured output from `openclaw plugins list --json`
37
+ - Auto-disable failing plugins (with cooldown)
38
+ - Create GitHub issue with error context
39
+ - Guardrail: never disable self
40
+ - Blocked on: `openclaw plugins list --json` API
41
+
42
+ ## Medium priority
43
+
44
+ ### 4. Expose self-heal status for external monitoring
45
+ **Labels:** `priority:medium`, `type:feature`
46
+
47
+ External tools cannot query self-heal state. There is no way to check which models are in cooldown, the heal history, or the WhatsApp status without reading the raw state file.
48
+
49
+ Scope:
50
+ - Register a plugin command or API endpoint returning JSON status
51
+ - Include: active cooldowns, WhatsApp status, recent heal actions
52
+
53
+ ### 5. Emit structured observability events for heal actions
54
+ **Labels:** `priority:medium`, `type:observability`
55
+
56
+ The plugin emits no structured events, so monitoring systems cannot track heal actions or failure rates.
57
+
58
+ Scope:
59
+ - Emit events via `api.emit()` for: model cooldown, session patching, WhatsApp restart, cron disable, plugin disable
60
+ - Include timestamp, action type, and context in each event
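
A structured event carrying a timestamp, action type, and context could be built as sketched below. The `Emit` callback stands in for `api.emit()`, whose real signature is not shown in this roadmap, so treat it as an assumption; `HealEvent` and `emitHealEvent` are hypothetical names, while the `self-heal:` event name prefix follows the issue text.

```typescript
// Hypothetical payload shape: timestamp, action type, free-form context.
interface HealEvent {
  ts: number; // epoch milliseconds (assumed unit)
  action: string;
  context: Record<string, unknown>;
}

// Stand-in for api.emit(); the real signature may differ.
type Emit = (eventName: string, payload: HealEvent) => void;

// Build the payload once and emit it under a "self-heal:<action>" name.
function emitHealEvent(
  emit: Emit,
  action: string,
  context: Record<string, unknown>,
  nowMs: number = Date.now(),
): HealEvent {
  const payload: HealEvent = { ts: nowMs, action, context };
  emit(`self-heal:${action}`, payload);
  return payload;
}
```

Routing every heal action through one function keeps the payload shape uniform, which is what downstream alerting rules usually depend on.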
61
+
62
+ ### 6. Add dry-run mode for safe validation
63
+ **Labels:** `priority:medium`, `type:dx`
64
+
65
+ No way to test healing logic without executing real side-effects.
66
+
67
+ Scope:
68
+ - `dryRun: boolean` config option
69
+ - Log all actions without executing them
70
+ - State tracking still updates for validation
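
One way to implement the scope above is to route every side effect through a single wrapper that checks the `dryRun` flag. This is a sketch, not the plugin's code: `makeActionRunner`, `ActionLogger`, and the log message format are hypothetical.

```typescript
type ActionLogger = (message: string) => void;

// Returns a runner that either executes the side effect or only logs it.
// The boolean result tells the caller whether the action really ran.
function makeActionRunner(dryRun: boolean, log: ActionLogger) {
  return function runAction(description: string, sideEffect: () => void): boolean {
    if (dryRun) {
      log(`[dry-run] would ${description}`);
      return false;
    }
    log(description);
    sideEffect();
    return true;
  };
}
```

State updates would happen outside the wrapper, so streak counters and cooldown entries still change in dry-run mode while restarts and patches are skipped.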
71
+
72
+ ## Low priority
73
+
74
+ ### 7. Active model recovery probing
75
+ **Labels:** `priority:low`, `type:feature`
76
+
77
+ Models in cooldown are recovered passively only. If a model recovers early, the plugin still uses the fallback until the full cooldown expires.
78
+
79
+ Scope:
80
+ - Periodic probe (e.g., every 5 minutes) testing cooldown models
81
+ - Early recovery detection and cooldown clearing
82
+ - Configurable and respects rate limits
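
The difference between passive recovery and active probing can be shown with a small model of the cooldown table. This is a sketch with assumed shapes: the real `pickFallback()` signature is not shown in this roadmap, the `Map` of model id to `nextAvailableAt` is hypothetical, and the probe is a synchronous callback here even though a real probe would be an asynchronous API call.

```typescript
// Hypothetical cooldown table: model id -> epoch ms when the cooldown ends.
type CooldownTable = Map<string, number>;

// Passive path: first model in preference order that is not in cooldown.
function pickFallback(modelOrder: string[], cooldowns: CooldownTable, nowMs: number): string | undefined {
  return modelOrder.find((model) => (cooldowns.get(model) ?? 0) <= nowMs);
}

// Active path: test each model still in cooldown and clear it early on success.
function probeCooldownModels(
  cooldowns: CooldownTable,
  nowMs: number,
  probe: (model: string) => boolean,
): string[] {
  const recovered: string[] = [];
  for (const [model, nextAvailableAt] of Array.from(cooldowns.entries())) {
    if (nextAvailableAt > nowMs && probe(model)) {
      cooldowns.delete(model);
      recovered.push(model);
    }
  }
  return recovered;
}
```

With probing, a preferred model that recovers early becomes selectable again on the next probe instead of waiting for the full cooldown to expire.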
83
+
84
+ ### 8. Config hot-reload without gateway restart
85
+ **Labels:** `priority:low`, `type:dx`
86
+
87
+ Plugin config is read once at startup. Changes require a full gateway restart.
88
+
89
+ Scope:
90
+ - Watch for config changes or support reload command
91
+ - Re-read and update internal variables
92
+ - Reject invalid config gracefully
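
The "reject invalid config gracefully" rule can be sketched as a merge-then-validate step that keeps the running config when the candidate fails. The `ReloadableConfig` shape and the `applyReload` name are hypothetical; the `cooldownMinutes` range reuses the 1 to 10080 bounds documented in the README.

```typescript
// Hypothetical subset of the config that can change at runtime.
interface ReloadableConfig {
  modelOrder: string[];
  cooldownMinutes: number;
}

interface ReloadResult {
  config: ReloadableConfig; // the config to keep using
  applied: boolean;
  errors: string[];
}

// Merge a partial update over the current config, validate the result,
// and fall back to the current config if anything is out of range.
function applyReload(current: ReloadableConfig, update: Partial<ReloadableConfig>): ReloadResult {
  const candidate: ReloadableConfig = { ...current, ...update };
  const errors: string[] = [];
  if (!Array.isArray(candidate.modelOrder) || candidate.modelOrder.length === 0) {
    errors.push("modelOrder must contain at least one entry");
  }
  if (!(candidate.cooldownMinutes >= 1 && candidate.cooldownMinutes <= 10080)) {
    errors.push("cooldownMinutes must be between 1 and 10080");
  }
  if (errors.length > 0) {
    return { config: current, applied: false, errors };
  }
  return { config: candidate, applied: true, errors };
}
```

Validating the merged candidate rather than the raw update means partial updates are checked in the context of the values they leave unchanged.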
package/SKILL.md ADDED
@@ -0,0 +1,24 @@
1
+ ---
2
+ name: openclaw-self-healing-elvatis
3
+ description: OpenClaw plugin that applies guardrails and auto-fixes reversible failures (rate limits, disconnects, stuck session pins).
4
+ ---
5
+
6
+ # openclaw-self-healing-elvatis
7
+
8
+ Self-healing extension for OpenClaw.
9
+
10
+ ## What it does
11
+
12
+ - Detects common reversible failures (rate limits, auth errors, stuck session model pins)
13
+ - Applies guardrails (e.g. avoid breaking config)
14
+ - Can auto-recover WhatsApp disconnects (when enabled)
15
+
16
+ ## Install
17
+
18
+ ```bash
+ clawhub install openclaw-self-healing-elvatis
+ ```
21
+
22
+ ## Notes
23
+
24
+ Keep repository content public-safe (no private identifiers).