openclaw-aegis 1.3.0 → 1.4.0
- package/README.md +81 -51
- package/dist/cli/index.js +450 -43
- package/dist/cli/index.js.map +1 -1
- package/dist/index.d.ts +28 -1
- package/dist/index.js +467 -60
- package/dist/index.js.map +1 -1
- package/package.json +1 -1
package/README.md CHANGED
@@ -1,27 +1,34 @@
-
+<p align="center">
+<img src="assets/cover.jpg" alt="OpenClaw Aegis — Self-Healing Sidecar for OpenClaw Gateway" width="820" height="450" />
+</p>
 
-
+<p align="center">
+<a href="https://www.npmjs.com/package/openclaw-aegis"><img src="https://img.shields.io/npm/v/openclaw-aegis" alt="npm" /></a>
+<a href="https://github.com/Canary-Builds/openclaw-aegis/actions/workflows/ci.yml"><img src="https://img.shields.io/github/actions/workflow/status/Canary-Builds/openclaw-aegis/ci.yml?label=CI" alt="CI" /></a>
+<a href="https://nodejs.org"><img src="https://img.shields.io/badge/node-%3E%3D18-brightgreen" alt="Node.js" /></a>
+<a href="LICENSE"><img src="https://img.shields.io/badge/License-MIT-blue.svg" alt="License: MIT" /></a>
+</p>
 
-
+---
 
-
-[](https://github.com/Canary-Builds/openclaw-aegis/actions/workflows/ci.yml)
-[](https://nodejs.org)
-[](LICENSE)
+## The Armor Your Gateway Deserves
 
-
+When your OpenClaw gateway goes down, **everything goes dark** — Telegram, WhatsApp, all channels. Silent. No alerts, no warnings, nothing. If a bad config caused the crash, restarting won't help. The `.bak` files carry the same poison. You only find out hours later when someone asks why messages stopped.
 
-
+**Aegis doesn't let that happen.**
 
-
+It stands between your gateway and disaster — a tireless sentinel that detects failures in seconds, diagnoses the root cause, repairs what it can, and alerts you through channels that bypass the gateway entirely.
 
-
+### What It Does
 
-
-
-
-
-
+| | |
+|---|---|
+| **Detects** | 10 health probes scan process, port, HTTP, config, WebSocket, TUN, memory, CPU, disk, and logs every 10 seconds |
+| **Diagnoses** | 6 failure pattern matchers identify poison configs, stale PIDs, port conflicts, permission errors, corruption, and OOM kills |
+| **Heals** | L1 restart, L2 targeted repair, L3 deep repair (network, dependencies, safe mode, disk), config rollback — all automatic |
+| **Alerts** | 8 out-of-band providers (ntfy, Telegram, WhatsApp, Slack, Discord, Email, Pushover, webhook) that work even when the gateway is dead |
+| **Responds** | Message `/health` on Telegram, WhatsApp, Slack, or Discord — Aegis replies with real-time status |
+| **Remembers** | Full incident timeline, MTTR tracking, and a 18-endpoint REST API for dashboard integration |
 
 **Total downtime: ~15 seconds instead of hours.**
 
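As a rough illustration of the detection step in the "What It Does" table, the sketch below runs a set of probes once and summarizes the result. The `Probe` and `HealthSummary` types, the `runProbes` and `summarize` functions, the `DEGRADED` and `DOWN` labels, and the rule that the score equals the number of passing probes are all assumptions made for illustration; this diff does not show how Aegis computes its health score.

```typescript
// Hypothetical shapes; none of these names come from the openclaw-aegis package.
type Probe = { name: string; run: () => boolean };
type ProbeResult = { name: string; ok: boolean; detail?: string };

type HealthSummary = {
  status: "HEALTHY" | "DEGRADED" | "DOWN";
  score: number;
  passed: number;
  failed: number;
};

// Run every probe once. A probe that throws is recorded as a failure
// instead of aborting the whole sweep.
function runProbes(probes: Probe[]): ProbeResult[] {
  return probes.map((probe) => {
    try {
      return { name: probe.name, ok: probe.run() };
    } catch (err) {
      return { name: probe.name, ok: false, detail: String(err) };
    }
  });
}

// Assumed scoring rule: one point per passing probe.
function summarize(results: ProbeResult[]): HealthSummary {
  const passed = results.filter((r) => r.ok).length;
  const failed = results.length - passed;
  const status = failed === 0 ? "HEALTHY" : passed === 0 ? "DOWN" : "DEGRADED";
  return { status, score: passed, passed, failed };
}
```

Under these assumptions, ten passing probes would yield a score of 10, which is at least consistent with the `aegis check` output quoted in the Quick Start hunk.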
@@ -29,56 +36,45 @@ Aegis prevents this:
 
 ## Quick Start
 
+Three commands. That's it.
+
 ```bash
-#
+# Deploy the shield
 npm install -g openclaw-aegis
 
-#
+# Auto-detect your gateway — zero questions asked
 aegis init --auto
 
-#
+# Confirm the shield is up
 aegis check
 ```
 
-Output:
 ```
 Health: HEALTHY (score: 10)
 Probes: 10 passed, 0 failed
 ```
 
-
-
-## Commands
-
-| Command | Description |
-|---------|-------------|
-| `aegis init` | Interactive setup wizard |
-| `aegis init --auto` | Auto-detect everything, zero prompts |
-| `aegis check` | Run all 10 health probes once |
-| `aegis check --json` | JSON output for scripting |
-| `aegis status` | Health dashboard with per-probe details |
-| `aegis test-alert` | Send a test notification to all configured channels |
-| `aegis incidents` | Browse past incident logs |
-| `aegis incidents <id>` | Show full timeline for a specific incident |
-| `aegis serve` | Start REST API server + bot listeners |
+Your gateway is now protected.
 
 ---
 
-##
+## Arsenal
 
-|
-
-|
-|
-|
-|
-|
-|
-|
+| Command | What It Does |
+|---------|-------------|
+| `aegis init` | Interactive setup — walks you through everything |
+| `aegis init --auto` | Zero-config setup — detects gateway, sets defaults |
+| `aegis check` | Run all 10 probes, get a health verdict |
+| `aegis check --json` | Machine-readable output for scripts and monitoring |
+| `aegis status` | Live dashboard — every probe, color-coded |
+| `aegis test-alert` | Fire a test alert to all configured channels |
+| `aegis incidents` | Browse past battles — what failed, what was fixed |
+| `aegis incidents <id>` | Full incident timeline with every recovery step |
+| `aegis serve` | Start REST API + bot listeners for dashboard integration |
 
 ---
 
-##
+## Defense Architecture
 
 ```
 OpenClaw Gateway Aegis Sidecar
@@ -87,11 +83,12 @@ OpenClaw Gateway Aegis Sidecar
 │ ~/.openclaw/ │◄────────►│ Config Guardian │
 │ openclaw.json │ │ Dead Man's Switch │
 │ logs/ │ │ Recovery Orchestrator │
-│ │ │ L1: Restart
+│ │ │ L1: Quick Restart │
 │ systemd/launchd │◄─────────│ L2: Targeted Repair │
+│ │ │ L3: Deep Repair │
 │ │ │ L4: Human Alert │
 └─────────────────────┘ │ Alert Dispatcher │
-│ (8
+│ (8 out-of-band providers) │
 └──────────────────────────────┘
 │
 Out-of-band
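The Alert Dispatcher box in the diagram hunk above fans one alert out to several out-of-band providers. A minimal sketch of that idea follows, assuming a hypothetical `AlertProvider` interface and `dispatchAlert` function; the package's real provider API is not visible in this diff. The point it illustrates is that a failure in one provider should not prevent delivery through the others.

```typescript
// Hypothetical provider interface; the real Aegis providers are not shown in this diff.
type AlertProvider = {
  name: string;
  send: (message: string) => Promise<void>;
};

type DeliveryReport = { delivered: string[]; failed: string[] };

// Send the same message through every provider in parallel.
// Each send is wrapped so a rejection is recorded rather than propagated,
// which keeps one dead channel from blocking the rest.
async function dispatchAlert(
  providers: AlertProvider[],
  message: string,
): Promise<DeliveryReport> {
  const outcomes = await Promise.all(
    providers.map((p) =>
      p.send(message).then(
        () => ({ name: p.name, ok: true }),
        () => ({ name: p.name, ok: false }),
      ),
    ),
  );
  return {
    delivered: outcomes.filter((o) => o.ok).map((o) => o.name),
    failed: outcomes.filter((o) => !o.ok).map((o) => o.name),
  };
}
```

Because none of the providers in this sketch route through the gateway, the report stays meaningful when the gateway itself is the thing that failed.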
@@ -102,13 +99,46 @@ OpenClaw Gateway Aegis Sidecar
 Your phone
 ```
 
+Alerts bypass the gateway entirely. If the gateway is down, Aegis talks directly to Telegram, Slack, Discord, and the rest. **No single point of failure.**
+
+---
+
+## Recovery Cascade
+
+When Aegis detects a problem, it doesn't just restart and pray:
+
+**L1 — Quick Restart** (5s) — Pre-flight config check first. If config is clean, restart with exponential backoff. If config is poisoned, skip straight to L2.
+
+**L2 — Targeted Repair** (30s-2min) — Diagnose the exact failure pattern and apply the right fix. Restore known-good config, delete stale PID files, fix permissions.
+
+**L3 — Deep Repair** (30s-2min) — Riskier fixes when L2 isn't enough. Network repair (DNS flush, TUN reset), process resurrection (reinstall binary), dependency rebuild, safe mode boot, and disk cleanup.
+
+**L4 — Human Alert** (instant) — When auto-recovery fails, Aegis sends a full incident report through every configured channel. You get the health score, what was tried, and why it failed.
+
+Anti-flap protection, circuit breakers, and exponential backoff prevent crash loops. Aegis won't make things worse.
+
+---
+
+## Documentation
+
+| Document | Description |
+|----------|-------------|
+| [Getting Started](docs/getting-started.md) | Installation, first setup, verification |
+| [Architecture](docs/architecture.md) | Probe pipeline, recovery tiers, system design |
+| [Configuration](docs/configuration.md) | Full TOML reference — every knob and dial |
+| [Alerts](docs/alerts.md) | Setup guides for all 8 providers |
+| [CLI Reference](docs/cli-reference.md) | Every command with examples |
+| [Contributing](docs/contributing.md) | Dev setup, testing, PR process |
+| [Releasing](docs/releasing.md) | Version bumps, npm publish, GitHub releases |
+| [Roadmap](docs/roadmap.md) | What's coming — L3 recovery, observability, fleet management |
+
 ---
 
 ## Requirements
 
-- Node.js >= 18
-- OpenClaw Gateway (any version with `openclaw gateway health`
-- Linux (systemd) or macOS (launchd)
+- **Node.js** >= 18
+- **OpenClaw Gateway** (any version with `openclaw gateway health`)
+- **Linux** (systemd) or **macOS** (launchd)
 
 ---
 
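The Recovery Cascade section added in the README hunk above describes an escalation from L1 to L4, with L1 gated on a pre-flight config check and retried with exponential backoff. The sketch below models that control flow only. The tier names come from the README; `backoffDelayMs`, `runCascade`, the 1 s base delay, the 60 s cap, and the default of three L1 attempts are assumptions, and the bounded attempt count is a crude stand-in for the anti-flap and circuit-breaker logic, which this diff does not show.

```typescript
// Hypothetical sketch; only the tier names L1..L4 are taken from the README.
type Tier = "L1" | "L2" | "L3" | "L4";

// Exponential backoff: baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 60000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

type CascadeInput = {
  configIsClean: boolean;
  // Each repair returns true when the gateway is healthy afterwards.
  repairs: Record<"L1" | "L2" | "L3", () => boolean>;
  maxL1Attempts?: number;
};

type CascadeOutcome = { endedAt: Tier; tried: Tier[]; waitsMs: number[] };

function runCascade(input: CascadeInput): CascadeOutcome {
  const tried: Tier[] = [];
  const waitsMs: number[] = [];
  const maxL1 = input.maxL1Attempts ?? 3;

  // L1: restart only if the pre-flight config check passed;
  // a poisoned config skips straight to L2.
  if (input.configIsClean) {
    tried.push("L1");
    for (let attempt = 0; attempt < maxL1; attempt++) {
      // Record the delay a real loop would sleep before each retry.
      if (attempt > 0) waitsMs.push(backoffDelayMs(attempt - 1));
      if (input.repairs.L1()) return { endedAt: "L1", tried, waitsMs };
    }
  }

  // L2 then L3: try each repair tier once, in order.
  for (const tier of ["L2", "L3"] as const) {
    tried.push(tier);
    if (input.repairs[tier]()) return { endedAt: tier, tried, waitsMs };
  }

  // L4: nothing worked, so hand off to a human.
  tried.push("L4");
  return { endedAt: "L4", tried, waitsMs };
}
```

Returning the list of tiers tried alongside the final tier mirrors the README's description of the L4 report, which tells the operator what was attempted before escalation.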