npm - @elvatis_com/openclaw-self-healing-elvatis - Versions diffs - 0.2.4 - Mend

@elvatis_com/openclaw-self-healing-elvatis 0.2.4


Files changed (18)

package/.ai/handoff/DASHBOARD.md +13 -0
package/.ai/handoff/LOG.md +4 -0
package/.ai/handoff/MANIFEST.json +177 -0
package/.ai/handoff/NEXT_ACTIONS.md +54 -0
package/.ai/handoff/STATUS.md +20 -0
package/LICENSE +21 -0
package/README.md +121 -0
package/ROADMAP.md +92 -0
package/SKILL.md +24 -0
package/index.ts +834 -0
package/openclaw.plugin.json +110 -0
package/package.json +23 -0
package/scripts/create-roadmap-issues.sh +249 -0
package/test/index.test.ts +2682 -0
package/test/monitor-integration.test.ts +1403 -0
package/tsconfig.check.json +9 -0
package/tsconfig.json +20 -0
package/vitest.config.ts +7 -0

package/.ai/handoff/DASHBOARD.md ADDED

@@ -0,0 +1,13 @@
+# DASHBOARD (AAHP)
+| Task | Ready? | Owner | Notes |
+|---|---|---|---|
+| Create GitHub issues | ✅ | Human | Run `gh auth login` then `bash scripts/create-roadmap-issues.sh` |
+| Unit test suite | ✅ | Akido | Highest priority - vitest + tsconfig |
+| TS build pipeline | ✅ | Akido | Add tsconfig.json, build script |
+| Plugin health monitoring | Blocked | Akido | Waiting for `openclaw plugins list --json` |
+| Health status endpoint | ✅ | Akido | Medium priority |
+| Observability events | ✅ | Akido | Medium priority |
+| Dry-run mode | ✅ | Akido | Medium priority |
+| Active recovery probing | ✅ | Akido | Low priority |
+| Config hot-reload | ✅ | Akido | Low priority |

package/.ai/handoff/LOG.md ADDED

@@ -0,0 +1,4 @@
+# LOG (AAHP)
+- 2026-02-27: Defined v0.3 roadmap with 8 items (3 high, 3 medium, 2 low). Created ROADMAP.md and scripts/create-roadmap-issues.sh. GitHub issue creation deferred - gh CLI not authenticated.
+- 2026-02-24: Initialized AAHP handoff structure.

package/.ai/handoff/MANIFEST.json ADDED

@@ -0,0 +1,177 @@
+{
+  "files": {
+    "NEXT_ACTIONS.md": {
+      "checksum": "sha256:pending",
+      "lines": 22,
+      "updated": "2026-02-27T23:00:00Z",
+      "summary": "8 roadmap items defined: 3 high, 3 medium, 2 low priority. GitHub issues created (#1-#8)."
+    },
+    "ROADMAP.md": {
+      "checksum": "sha256:pending",
+      "lines": 85,
+      "updated": "2026-02-27T23:00:00Z",
+      "summary": "Full v0.3 roadmap with 8 items, scoped and prioritized."
+    }
+  },
+  "tasks": {
+    "T-001": {
+      "created": "2026-02-28T00:00:00Z",
+      "priority": "high",
+      "title": "Define v0.2 roadmap items as GitHub issues and prioritize",
+      "status": "done",
+      "depends_on": [],
+      "completed": "2026-02-27T13:15:01.997Z"
+    },
+    "T-002": {
+      "created": "2026-02-27T23:00:00Z",
+      "priority": "high",
+      "title": "Run scripts/create-roadmap-issues.sh after gh auth login",
+      "status": "done",
+      "depends_on": [],
+      "completed": "2026-02-27T13:27:44.739Z"
+    },
+    "T-003": {
+      "created": "2026-02-27T23:00:00Z",
+      "priority": "high",
+      "title": "Add unit test suite for core healing logic",
+      "status": "done",
+      "depends_on": [],
+      "github_issue": 1,
+      "completed": "2026-02-27T14:12:40.386Z"
+    },
+    "T-004": {
+      "created": "2026-02-27T23:00:00Z",
+      "priority": "high",
+      "title": "Add TypeScript build pipeline and type-checking",
+      "status": "done",
+      "depends_on": [],
+      "github_issue": 2,
+      "completed": "2026-03-01T15:01:03.900Z"
+    },
+    "T-005": {
+      "created": "2026-02-27T23:00:00Z",
+      "priority": "high",
+      "title": "Implement structured plugin health monitoring and auto-disable",
+      "status": "blocked",
+      "depends_on": [],
+      "blocked_reason": "Waiting for openclaw plugins list --json API",
+      "github_repo": "elvatis/openclaw-self-healing-homeofe",
+      "github_issue": 3
+    },
+    "T-006": {
+      "title": "Support configuration hot-reload without gateway restart",
+      "status": "done",
+      "priority": "low",
+      "depends_on": [],
+      "created": "2026-02-28T13:25:43.201Z",
+      "notes": "## Summary\n\nPlugin config (`modelOrder`, `cooldownMinutes`, `autoFix.*`) is read once at startup via `api.pluginConfig`. Changing any config value requires a full gateway restart.\n\n## Scope\n\n- Watch for config changes (file watch or periodic re-read)\n- Re-read plugin config on change and update internal variables\n- Handle edge cases: invalid new config (keep old), partial updates\n- Alternatively, support a reload command: `openclaw self-heal reload`\n\n## Acceptance criteria\n\n- Config changes take",
+      "github_issue": 8,
+      "github_repo": "elvatis/openclaw-self-healing-homeofe",
+      "completed": "2026-02-28T13:42:34.823Z"
+    },
+    "T-007": {
+      "title": "Add active model recovery probing to shorten cooldown periods",
+      "status": "done",
+      "priority": "medium",
+      "depends_on": [],
+      "created": "2026-02-28T13:25:43.204Z",
+      "notes": "## Summary\n\nModels in cooldown are currently recovered passively: `pickFallback()` checks `nextAvailableAt` only when a new failure occurs. If a model recovers early, the plugin still uses the fallback until the full cooldown expires.\n\n## Scope\n\n- Add a periodic probe (e.g., every 5 minutes) that tests cooldown models\n- Use a lightweight API call (e.g., model info endpoint or small completion)\n- If the model responds successfully, remove it from cooldown early\n- Configurable probe interval and e",
+      "github_issue": 7,
+      "github_repo": "elvatis/openclaw-self-healing-homeofe",
+      "completed": "2026-02-28T14:42:14.373Z"
+    },
+    "T-008": {
+      "title": "Add dry-run mode for safe validation of healing logic",
+      "status": "done",
+      "priority": "medium",
+      "depends_on": [],
+      "created": "2026-02-28T13:25:43.204Z",
+      "notes": "## Summary\n\nThere is no way to test what the plugin would do without it actually taking healing actions (restarting gateways, disabling crons, patching sessions). A dry-run mode is needed for validation and debugging.\n\n## Scope\n\n- Add a `dryRun: boolean` config option (default: false)\n- When enabled: log all actions that _would_ be taken, but do not execute them\n- State tracking still updates (to test state transitions) but side-effects are skipped\n- Useful for: initial setup validation, debuggi",
+      "github_issue": 6,
+      "github_repo": "elvatis/openclaw-self-healing-homeofe",
+      "completed": "2026-02-28T15:14:21.885Z"
+    },
+    "T-009": {
+      "title": "Emit structured observability events for heal actions",
+      "status": "done",
+      "priority": "medium",
+      "depends_on": [],
+      "created": "2026-02-28T13:25:43.204Z",
+      "notes": "## Summary\n\nThe plugin uses `api.logger` for logging but emits no structured events. Monitoring and alerting systems cannot track heal actions, cooldown entries, or failure rates.\n\n## Scope\n\n- Emit structured events via `api.emit()` or equivalent for:\n  - `self-heal:model-cooldown` - model put into cooldown (with model ID, reason, duration)\n  - `self-heal:session-patched` - session model pin overridden (with session key, old/new model)\n  - `self-heal:whatsapp-restart` - WhatsApp gateway restarte",
+      "github_issue": 5,
+      "github_repo": "elvatis/openclaw-self-healing-homeofe",
+      "completed": "2026-02-28T15:23:54.846Z"
+    },
+    "T-010": {
+      "title": "Expose self-heal status for external monitoring",
+      "status": "done",
+      "priority": "medium",
+      "depends_on": [],
+      "created": "2026-02-28T13:25:43.204Z",
+      "notes": "## Summary\n\nExternal tools (dashboards, other plugins, CLI) cannot query the self-heal plugin state. There is no way to know which models are in cooldown, how many heals have occurred, or the current WhatsApp connection status without reading the raw state file.\n\n## Scope\n\n- Register a plugin command or API endpoint (depends on openclaw plugin API):\n  - `openclaw self-heal status` or similar\n  - Returns JSON with: cooldown models, WhatsApp status, cron heal history, last heal actions\n- Alternati",
+      "github_issue": 4,
+      "github_repo": "elvatis/openclaw-self-healing-homeofe",
+      "completed": "2026-02-28T16:15:24.131Z"
+    },
+    "T-011": {
+      "title": "Add integration tests for monitor service tick flows",
+      "status": "done",
+      "priority": "low",
+      "depends_on": [],
+      "created": "2026-03-02T00:00:00Z",
+      "notes": "## Summary\n\nThe current test suite has 122 unit tests covering helpers and pure functions (parseConfig, pickFallback, buildStatusSnapshot, etc.) but the monitor service tick() function runs all healing logic and is not covered by integration tests.\n\n## Scope\n\n- Add integration tests for the full monitor tick cycle using a mocked api object\n- Cover: WhatsApp disconnect streak -> restart path\n- Cover: cron failure accumulation -> disable + issue create path\n- Cover: active model recovery probe -> cooldown removal path\n- Cover: config hot-reload during tick\n- Cover: dry-run flag suppresses all side-effects\n- Use Jest timer mocks for setInterval control\n\n## Acceptance criteria\n\n- At least 20 new integration-level tests added\n- All healing paths (WhatsApp, cron, probe) exercised in tests\n- Tests pass with npm test",
+      "github_issue": 9,
+      "github_repo": "elvatis/openclaw-self-healing-homeofe",
+      "completed": "2026-03-02T02:17:03.808Z"
+    },
+    "T-012": {
+      "title": "Add startup configuration validation with fail-fast behavior",
+      "status": "done",
+      "priority": "low",
+      "depends_on": [],
+      "created": "2026-03-02T00:00:00Z",
+      "notes": "## Summary\n\nCurrently parseConfig() silently falls back to defaults for any invalid value. The plugin starts even if modelOrder is empty, paths are not writable, or cooldownMinutes is set to an absurd value. This makes misconfiguration hard to diagnose.\n\n## Scope\n\n- Add a validateConfig(config: PluginConfig): { valid: boolean; errors: string[] } function\n- Validate: modelOrder has at least one entry\n- Validate: cooldownMinutes is between 1 and 10080 (1 week)\n- Validate: probeIntervalSec >= 60\n- Validate: whatsappMinRestartIntervalSec >= 60\n- Validate: state file directory is writable (best-effort)\n- Log each validation error clearly with api.logger?.error\n- Return early from register() if validation fails (fail-fast)\n\n## Acceptance criteria\n\n- validateConfig function is exported and unit-tested\n- Plugin refuses to start on invalid config and logs all reasons\n- README config section updated with valid ranges",
+      "github_issue": 10,
+      "github_repo": "elvatis/openclaw-self-healing-homeofe",
+      "completed": "2026-03-02T02:50:50.546Z"
+    },
+    "T-013": {
+      "title": "Write status snapshot file on each monitor tick",
+      "status": "done",
+      "priority": "low",
+      "depends_on": [],
+      "created": "2026-03-02T00:00:00Z",
+      "notes": "## Summary\n\nThe plugin already builds a StatusSnapshot object and emits it via api.emit('self-heal:status', snapshot) on each tick. However, external tools (scripts, dashboards, other plugins) cannot read this snapshot without subscribing to the event bus. A file-based status report would allow polling from any tool.\n\n## Scope\n\n- After each tick, write buildStatusSnapshot(state, config) to a JSON file\n- Default path: ~/.openclaw/workspace/memory/self-heal-status.json (configurable via statusFile config key)\n- Write atomically (write to .tmp then rename)\n- Add statusFile to PluginConfig type and parseConfig\n- Add a writeStatusFile(path, snapshot) helper function (exported for tests)\n\n## Acceptance criteria\n\n- Status file is written on every tick\n- File contains valid JSON matching StatusSnapshot type\n- Atomic write prevents partial reads\n- Tests cover writeStatusFile helper\n- README documents the statusFile config key and file format",
+      "github_issue": 11,
+      "github_repo": "elvatis/openclaw-self-healing-homeofe",
+      "completed": "2026-03-02T03:16:30.819Z"
+    },
+    "T-014": {
+      "title": "Export heal metrics to ~/.aahp/metrics.jsonl",
+      "status": "ready",
+      "priority": "low",
+      "depends_on": [],
+      "created": "2026-03-02T00:00:00Z",
+      "notes": "## Summary\n\nHeal events are currently only visible in logs and via api.emit(). There is no persistent record of past heal actions for analysis or alerting. The aahp-runner project writes structured metrics to ~/.aahp/metrics.jsonl; the self-heal plugin should do the same.\n\n## Scope\n\n- Add an appendMetric(line: object, metricsFile: string) helper that appends a JSON line\n- Emit one JSONL line per heal event:\n  - { ts, plugin: \"self-heal\", event: \"model-cooldown\", model, reason, cooldownSec }\n  - { ts, plugin: \"self-heal\", event: \"session-patched\", sessionKey, oldModel, newModel }\n  - { ts, plugin: \"self-heal\", event: \"whatsapp-restart\", disconnectStreak }\n  - { ts, plugin: \"self-heal\", event: \"cron-disabled\", cronId, cronName, consecutiveFailures }\n  - { ts, plugin: \"self-heal\", event: \"model-recovered\", model, isPreferred }\n- Default metrics file: ~/.aahp/metrics.jsonl (configurable via metricsFile config key)\n- Skip metrics write in dry-run mode (or mark with dryRun: true)\n- Create parent directory if missing\n\n## Acceptance criteria\n\n- appendMetric helper is exported and unit-tested\n- All 5 heal event types write a metrics line\n- Dry-run events are either skipped or marked\n- README documents the metrics format and metricsFile config key",
+      "github_issue": 12,
+      "github_repo": "elvatis/openclaw-self-healing-homeofe"
+    }
+  },
+  "quick_context": "T-013 done: writeStatusFile() helper with atomic write (tmp+rename) added. statusFile config key with default ~/.openclaw/workspace/memory/self-heal-status.json. Written on every monitor tick. 9 new tests, total 181 passing. CI green. T-005 still blocked on openclaw plugins list --json API. Remaining v0.3 task: T-014 (metrics export).",
+  "aahp_version": "3.0",
+  "project": "openclaw-self-healing-homeofe",
+  "token_budget": {
+    "full_read": 1000,
+    "manifest_only": 100,
+    "manifest_plus_core": 300
+  },
+  "last_session": {
+    "timestamp": "2026-03-02T03:16:30.819Z",
+    "phase": "execution",
+    "agent": "claude-code",
+    "commit": "4a9d7a5",
+    "duration_minutes": 2,
+    "session_id": "t013-status-file"
+  },
+  "next_task_id": 15
+}
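
Reviewer sketch, not part of the published package: the `quick_context` and T-013 notes above describe a `writeStatusFile()` helper that writes to a `.tmp` file and then renames it. The block below is a minimal illustration of that tmp-plus-rename pattern under stated assumptions. `writeStatusFileSketch` and `SnapshotSketch` are hypothetical names, the snapshot shape is trimmed to three fields, and the package's real helper in `index.ts` may differ.

```typescript
import { existsSync, mkdirSync, mkdtempSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Hypothetical, trimmed-down snapshot shape; the package's StatusSnapshot has more fields.
interface SnapshotSketch {
  health: "healthy" | "degraded" | "healing";
  activeModel: string;
  generatedAt: number;
}

// Write the snapshot to `<path>.tmp`, then rename it over the target.
// On POSIX filesystems a rename within one directory replaces the target
// atomically, so a polling reader sees the old file or the new one, never a partial write.
function writeStatusFileSketch(path: string, snapshot: SnapshotSketch): void {
  mkdirSync(dirname(path), { recursive: true });
  const tmpPath = `${path}.tmp`;
  writeFileSync(tmpPath, JSON.stringify(snapshot, null, 2) + "\n", "utf8");
  renameSync(tmpPath, path);
}

// Usage: write into a throwaway directory and read the result back.
const demoDir = mkdtempSync(join(tmpdir(), "self-heal-status-"));
const demoPath = join(demoDir, "self-heal-status.json");
writeStatusFileSketch(demoPath, { health: "healthy", activeModel: "example/model", generatedAt: 1700000000 });
const roundTrip = JSON.parse(readFileSync(demoPath, "utf8")) as SnapshotSketch;
const tmpLeftBehind = existsSync(`${demoPath}.tmp`);
console.log(roundTrip.health, tmpLeftBehind);
```

Keeping the temporary file next to the target matters: a rename across filesystems is not atomic and can fail outright, which is why the sketch derives the temporary path from the target path instead of using the system temp directory.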

package/.ai/handoff/NEXT_ACTIONS.md ADDED

@@ -0,0 +1,54 @@
+# NEXT_ACTIONS (AAHP)
+## Status Summary
+| Status  | Count |
+|---------|-------|
+| Done    | 13    |
+| Ready   | 1     |
+| Blocked | 1     |
+---
+## Ready - Work These Next
+### T-014: [medium] - Export heal metrics to ~/.aahp/metrics.jsonl (issue #12)
+- **Goal:** Append one JSONL line per heal event to `~/.aahp/metrics.jsonl` for analysis and alerting.
+- **Context:** Heal events are currently only visible in logs and via api.emit(). There is no persistent record for analysis.
+- **What to do:**
+  - Export `appendMetric(line, metricsFile)` helper
+  - Write entries for: model-cooldown, session-patched, whatsapp-restart, cron-disabled, model-recovered
+  - Default metrics file: `~/.aahp/metrics.jsonl` (configurable via `metricsFile`)
+  - Skip or mark dry-run events
+  - Create parent directory if missing
+- **Files:** `index.ts`, `test/index.test.ts`, `README.md`
+- **Definition of done:** Helper exported and tested; all 5 event types write metrics; README documents format.
+- **GitHub Issue:** #12
+---
+## Blocked
+### T-005: [high] - Implement structured plugin health monitoring and auto-disable (issue #3)
+- **Blocked by:** Waiting for `openclaw plugins list --json` API from openclaw core
+- **Goal:** Monitor plugin health and auto-disable failing plugins using structured JSON output.
+- **Context:** Current code has a stub that parses plain text output from `openclaw plugins list`. No robust parsing is possible without the `--json` flag.
+- **What to do (when unblocked):**
+  - Parse `openclaw plugins list --json` output for plugin status
+  - Auto-disable plugins with `status=error` (respecting `pluginDisableCooldownSec`)
+  - Create GitHub issues for disabled plugins
+- **Files:** `index.ts`, `test/index.test.ts`
+- **Definition of done:** Failing plugins are detected via JSON API and auto-disabled; tests cover detection and disable logic.
+- **GitHub Issue:** #3
+---
+## Recently Completed
+| Task  | Title | Date |
+|-------|-------|------|
+| T-013 | Write status snapshot file on each monitor tick | 2026-03-02 |
+| T-012 | Add startup configuration validation with fail-fast behavior | 2026-03-02 |
+| T-011 | Add integration tests for monitor service tick flows | 2026-03-02 |
+| T-004 | Add TypeScript build pipeline and type-checking | 2026-03-01 |
+| T-010 | Expose self-heal status for external monitoring | 2026-02-28 |
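
Reviewer sketch, not part of the published package: T-014 above is still marked ready, so no `appendMetric` implementation ships in this version. The block below shows one way the described helper could look, assuming the `(line, metricsFile)` parameter order from the task notes. `appendMetricSketch` and `HealMetric` are hypothetical names, and only two of the five listed event types are modeled.

```typescript
import { appendFileSync, mkdirSync, mkdtempSync, readFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Hypothetical event shapes, limited to two of the five event types listed in T-014.
type HealMetric =
  | { ts: number; plugin: "self-heal"; event: "model-cooldown"; model: string; reason: string; cooldownSec: number }
  | { ts: number; plugin: "self-heal"; event: "whatsapp-restart"; disconnectStreak: number };

// Append one JSON document per line (JSONL), creating the parent directory if missing.
function appendMetricSketch(line: HealMetric, metricsFile: string): void {
  mkdirSync(dirname(metricsFile), { recursive: true });
  appendFileSync(metricsFile, JSON.stringify(line) + "\n", "utf8");
}

// Usage: the "nested" directory does not exist yet and is created on first write.
const metricsDir = mkdtempSync(join(tmpdir(), "self-heal-metrics-"));
const metricsFile = join(metricsDir, "nested", "metrics.jsonl");
appendMetricSketch(
  { ts: 1700000000, plugin: "self-heal", event: "model-cooldown", model: "example/model", reason: "rate-limit", cooldownSec: 18000 },
  metricsFile,
);
appendMetricSketch({ ts: 1700000060, plugin: "self-heal", event: "whatsapp-restart", disconnectStreak: 3 }, metricsFile);

const parsedMetrics = readFileSync(metricsFile, "utf8")
  .trim()
  .split("\n")
  .map((text) => JSON.parse(text) as HealMetric);
console.log(parsedMetrics.length);
```

The task leaves open whether dry-run events are skipped or marked with `dryRun: true`; the sketch takes no position on that and would need an extra field or an early return once that choice is made.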

package/.ai/handoff/STATUS.md ADDED

@@ -0,0 +1,20 @@
+# STATUS (AAHP)
+(Verified/Assumed/Unknown)
+## Current State
+- Status: (Verified) Scaffold complete, v0.3 roadmap defined
+- Version: 0.2.4
+- CI: (Unknown) No build/test pipeline yet - top priority
+## Architecture Notes
+- (Verified) Single-file plugin (`index.ts`) using openclaw plugin API
+- (Verified) Three healing domains: model failover, WhatsApp reconnect, cron failure
+- (Verified) Plugin disable feature is a stub (blocked on structured API)
+- (Verified) State persisted to JSON file, no external dependencies
+## Risks / Open Questions
+- (Verified) Zero test coverage - any change risks silent regressions
+- (Verified) No TypeScript type-checking in build pipeline
+- (Unknown) When will `openclaw plugins list --json` be available?
+- (Verified) `gh` CLI not authenticated - GitHub issue creation deferred

package/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 Elvatis - Emre Kohler
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

package/README.md ADDED

@@ -0,0 +1,121 @@
+# openclaw-self-healing-elvatis
+OpenClaw plugin that improves resilience by automatically fixing reversible failures.
+## What it can heal (v0.2)
+Implemented now:
+- Model outage healing
+  - Detect rate limit / quota / auth-scope failures
+  - Put the affected model into cooldown
+  - Patch pinned session model overrides to a safe fallback (prevents endless `API rate limit reached` loops)
+- WhatsApp disconnect healing
+  - If WhatsApp appears disconnected repeatedly: restart the gateway
+  - Guardrails: streak threshold + minimum restart interval
+- Cron failure healing (optional)
+  - If a cron job fails repeatedly: disable it
+  - Create a GitHub issue with last error context (rate limited)
+Not implemented yet (next):
+- Plugin install error rollback (disable plugin) based on structured plugin status
+  - Waiting for `openclaw plugins list --json` or an equivalent stable API
+## Install
+From ClawHub:
+```bash
+clawhub install openclaw-self-healing-elvatis
+```
+For local development:
+```bash
+openclaw plugins install -l ~/.openclaw/workspace/openclaw-self-healing-elvatis
+openclaw gateway restart
+```
+## Config
+```json
+{
+  "plugins": {
+    "entries": {
+      "openclaw-self-healing": {
+        "enabled": true,
+        "config": {
+          "modelOrder": [
+            "anthropic/claude-opus-4-6",
+            "openai-codex/gpt-5.2",
+            "google-gemini-cli/gemini-2.5-flash"
+          ],
+          "cooldownMinutes": 300,
+          "autoFix": {
+            "patchSessionPins": true,
+            "disableFailingPlugins": false,
+            "disableFailingCrons": false,
+            "issueRepo": "elvatis/openclaw-self-healing-homeofe"
+          }
+        }
+      }
+    }
+  }
+}
+```
+`autoFix.issueRepo` must use `owner/repo` format. Invalid values are ignored and the plugin falls back to `GITHUB_REPOSITORY` (if valid) or `elvatis/openclaw-self-healing-homeofe`.
+### Config validation
+The plugin validates configuration at startup and refuses to start if any value is invalid. All validation errors are logged via `api.logger.error` before the plugin exits.
+| Key | Valid range | Default |
+|-----|------------|---------|
+| `modelOrder` | At least one entry (non-empty array) | 3 default models |
+| `cooldownMinutes` | 1 - 10080 (1 minute to 1 week) | 300 |
+| `probeIntervalSec` | >= 60 | 300 |
+| `autoFix.whatsappMinRestartIntervalSec` | >= 60 | 300 |
+| `stateFile` | Parent directory must be writable | `~/.openclaw/workspace/memory/self-heal-state.json` |
+| `statusFile` | Path to status snapshot JSON | `~/.openclaw/workspace/memory/self-heal-status.json` |
+## Status file
+On every monitor tick (60s), the plugin writes a JSON status snapshot to `statusFile`. External scripts, dashboards, or other plugins can poll this file without subscribing to the event bus.
+Default path: `~/.openclaw/workspace/memory/self-heal-status.json`
+The file is written atomically (write to `.tmp` then rename) to prevent partial reads. The JSON structure matches the `StatusSnapshot` type:
+```json
+{
+  "health": "healthy | degraded | healing",
+  "activeModel": "anthropic/claude-opus-4-6",
+  "models": [
+    { "id": "...", "status": "available | cooldown", "cooldownRemainingSec": 1234 }
+  ],
+  "whatsapp": { "status": "connected | disconnected | unknown", "disconnectStreak": 0 },
+  "cron": { "trackedJobs": 2, "failingJobs": [] },
+  "config": { "dryRun": false, "probeEnabled": true, "cooldownMinutes": 300, "modelOrder": ["..."] },
+  "generatedAt": 1700000000
+}
+```
+## Notes
+Infrastructure changes remain ask-first.
+## Critical Guardrail: openclaw.json validation
+This plugin treats `~/.openclaw/openclaw.json` as a boot-critical file.
+Before any self-heal action that could restart the gateway or change cron/plugin state, it verifies:
+- the config file exists
+- it is valid JSON
+If the config is invalid, the plugin will refuse to restart the gateway to avoid restart loops.
+It also creates timestamped backups before restarts or disruptive changes:
+`~/.openclaw/backups/openclaw.json/openclaw.json.<timestamp>.bak`
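
Reviewer sketch, not part of the published package: the guardrail section above states that the plugin checks that `openclaw.json` exists and is valid JSON, refuses to restart the gateway otherwise, and takes a timestamped backup first. The block below illustrates that ordering. All function names are hypothetical, the timestamp format is an assumption (the README only shows `<timestamp>`), and the restart itself is passed in as a callback rather than calling any real OpenClaw API.

```typescript
import { copyFileSync, existsSync, mkdirSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { basename, join } from "node:path";

// True only if the file exists and parses as JSON.
function isConfigFileValidSketch(configPath: string): boolean {
  if (!existsSync(configPath)) return false;
  try {
    JSON.parse(readFileSync(configPath, "utf8"));
    return true;
  } catch {
    return false;
  }
}

// Copy the config to `<backupDir>/<name>.<timestamp>.bak`.
// The ISO-derived timestamp format is an assumption made for this sketch.
function backupConfigSketch(configPath: string, backupDir: string, now: Date = new Date()): string {
  mkdirSync(backupDir, { recursive: true });
  const stamp = now.toISOString().replace(/[:.]/g, "-");
  const target = join(backupDir, `${basename(configPath)}.${stamp}.bak`);
  copyFileSync(configPath, target);
  return target;
}

// Gate a disruptive action: validate first, back up second, act last.
function guardedRestartSketch(configPath: string, backupDir: string, restart: () => void): boolean {
  if (!isConfigFileValidSketch(configPath)) return false; // refuse, to avoid a restart loop
  backupConfigSketch(configPath, backupDir);
  restart();
  return true;
}

// Usage: one valid and one broken config file in a throwaway directory.
const guardDir = mkdtempSync(join(tmpdir(), "self-heal-guard-"));
const goodConfig = join(guardDir, "openclaw.json");
const badConfig = join(guardDir, "broken.json");
writeFileSync(goodConfig, JSON.stringify({ plugins: {} }), "utf8");
writeFileSync(badConfig, "{ not json", "utf8");

let restartCount = 0;
const restartedGood = guardedRestartSketch(goodConfig, join(guardDir, "backups"), () => { restartCount += 1; });
const restartedBad = guardedRestartSketch(badConfig, join(guardDir, "backups"), () => { restartCount += 1; });
console.log(restartedGood, restartedBad, restartCount);
```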

package/ROADMAP.md ADDED

@@ -0,0 +1,92 @@
+# Roadmap - v0.3
+Prioritized list of work items for the next release cycle. Each item maps to a GitHub issue (create with `bash scripts/create-roadmap-issues.sh` once `gh` is authenticated).
+## High priority
+### 1. Add unit test suite for core healing logic
+**Labels:** `priority:high`, `type:testing`
+The plugin has zero tests. All core logic - failure detection, model fallback selection, session patching, state management - is untested. This is the single highest-priority gap because every other change risks regressions without a safety net.
+Scope:
+- Set up a test runner (vitest or similar)
+- Add `tsconfig.json` for type-checking
+- Unit tests for `isRateLimitLike`, `isAuthScopeLike`, `pickFallback`, `patchSessionModel`, `loadState`/`saveState`, `isConfigValid`, backup lifecycle
+- Integration test for `agent_end` and `message_sent` event handlers
+- CI: add `test` script to `package.json`
+### 2. Add TypeScript build pipeline and type-checking
+**Labels:** `priority:high`, `type:infra`
+No `tsconfig.json` exists. No build step or type-check. TypeScript errors could ship undetected.
+Scope:
+- Add `tsconfig.json` with strict mode
+- Add build/typecheck script to `package.json`
+- Ensure `tsc --noEmit` passes
+- Verify the plugin still loads correctly
+### 3. Implement structured plugin health monitoring and auto-disable
+**Labels:** `priority:high`, `type:feature`
+The `disableFailingPlugins` feature is a stub (`index.ts` lines 391-403). Needs proper implementation when `openclaw plugins list --json` becomes available.
+Scope:
+- Parse structured output from `openclaw plugins list --json`
+- Auto-disable failing plugins (with cooldown)
+- Create GitHub issue with error context
+- Guardrail: never disable self
+- Blocked on: `openclaw plugins list --json` API
+## Medium priority
+### 4. Expose self-heal status for external monitoring
+**Labels:** `priority:medium`, `type:feature`
+External tools cannot query self-heal state. No way to check which models are in cooldown, heal history, or WhatsApp status without reading the raw state file.
+Scope:
+- Register a plugin command or API endpoint returning JSON status
+- Include: active cooldowns, WhatsApp status, recent heal actions
+### 5. Emit structured observability events for heal actions
+**Labels:** `priority:medium`, `type:observability`
+No structured events emitted. Monitoring systems cannot track heal actions or failure rates.
+Scope:
+- Emit events via `api.emit()` for: model cooldown, session patching, WhatsApp restart, cron disable, plugin disable
+- Include timestamp, action type, and context in each event
+### 6. Add dry-run mode for safe validation
+**Labels:** `priority:medium`, `type:dx`
+No way to test healing logic without executing real side-effects.
+Scope:
+- `dryRun: boolean` config option
+- Log all actions without executing them
+- State tracking still updates for validation
+## Low priority
+### 7. Active model recovery probing
+**Labels:** `priority:low`, `type:feature`
+Models in cooldown are recovered passively only. If a model recovers early, the plugin still uses the fallback until the full cooldown expires.
+Scope:
+- Periodic probe (e.g., every 5 minutes) testing cooldown models
+- Early recovery detection and cooldown clearing
+- Configurable and respects rate limits
+### 8. Config hot-reload without gateway restart
+**Labels:** `priority:low`, `type:dx`
+Plugin config is read once at startup. Changes require a full gateway restart.
+Scope:
+- Watch for config changes or support reload command
+- Re-read and update internal variables
+- Reject invalid config gracefully
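
Reviewer sketch, not part of the published package: roadmap item 8 asks that a hot-reload reject invalid config gracefully, and the T-012 notes describe a `validateConfig` function returning `{ valid, errors }`. The block below combines the two ideas under stated assumptions. The ranges come from the README validation table, while `ConfigSketch`, `validateConfigSketch`, and `applyReloadSketch` are hypothetical names covering only three config keys.

```typescript
// Hypothetical subset of the plugin config; the real PluginConfig has more keys.
interface ConfigSketch {
  modelOrder: string[];
  cooldownMinutes: number;
  probeIntervalSec: number;
}

interface ValidationResult {
  valid: boolean;
  errors: string[];
}

// Collect every violation instead of stopping at the first, so all reasons can be logged.
function validateConfigSketch(config: ConfigSketch): ValidationResult {
  const errors: string[] = [];
  if (config.modelOrder.length === 0) {
    errors.push("modelOrder must contain at least one entry");
  }
  if (!Number.isFinite(config.cooldownMinutes) || config.cooldownMinutes < 1 || config.cooldownMinutes > 10080) {
    errors.push("cooldownMinutes must be between 1 and 10080");
  }
  if (!Number.isFinite(config.probeIntervalSec) || config.probeIntervalSec < 60) {
    errors.push("probeIntervalSec must be >= 60");
  }
  return { valid: errors.length === 0, errors };
}

// Hot-reload step: adopt the candidate only if it validates, otherwise keep the running config.
function applyReloadSketch(
  current: ConfigSketch,
  candidate: ConfigSketch,
): { config: ConfigSketch; errors: string[] } {
  const result = validateConfigSketch(candidate);
  return result.valid ? { config: candidate, errors: [] } : { config: current, errors: result.errors };
}

// Usage: one acceptable change and one candidate that violates two rules.
const runningConfig: ConfigSketch = {
  modelOrder: ["example/primary", "example/fallback"],
  cooldownMinutes: 300,
  probeIntervalSec: 300,
};
const acceptedReload = applyReloadSketch(runningConfig, { ...runningConfig, cooldownMinutes: 60 });
const rejectedReload = applyReloadSketch(runningConfig, { ...runningConfig, modelOrder: [], probeIntervalSec: 5 });
console.log(acceptedReload.config.cooldownMinutes, rejectedReload.config === runningConfig, rejectedReload.errors.length);
```

Returning the errors alongside the retained config lets the caller log why a reload was ignored without interrupting the monitor loop, which differs from startup, where the README says the plugin refuses to start.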

package/SKILL.md ADDED

@@ -0,0 +1,24 @@
+---
+name: openclaw-self-healing-elvatis
+description: OpenClaw plugin that applies guardrails and auto-fixes reversible failures (rate limits, disconnects, stuck session pins).
+---
+# openclaw-self-healing-elvatis
+Self-healing extension for OpenClaw.
+## What it does
+- Detects common reversible failures (rate limits, auth errors, stuck session model pins)
+- Applies guardrails (e.g. avoid breaking config)
+- Can auto-recover WhatsApp disconnects (when enabled)
+## Install
+```bash
+clawhub install openclaw-self-healing-elvatis
+```
+## Notes
+Keep repository content public-safe (no private identifiers).