@glassmkr/crucible 0.13.3 → 0.13.4
- package/README.md +41 -33
- package/package.json +1 -1
package/README.md
CHANGED
@@ -3,25 +3,29 @@
 [](LICENSE)
 [](https://www.npmjs.com/package/@glassmkr/crucible)

-<!-- Canonical rule count:
-Lightweight bare
+<!-- Canonical rule count: 61 across 9 categories. -->
+Lightweight bare-metal server monitoring agent. Collects hardware and OS health every 60 seconds at the default interval and pushes snapshots to the [Glassmkr Dashboard](https://app.glassmkr.com), which evaluates 61 alert rules across 9 categories and sends notifications.

-Open source. MIT licensed. Built by [Glassmkr](https://glassmkr.com).
+Open source. MIT licensed. Built by [Glassmkr](https://glassmkr.com). Crucible is the open-source product; the optional [Glassmkr Dashboard](https://app.glassmkr.com) is a hosted SaaS that consumes Crucible's snapshots.

-**Resource usage:** ~
+**Resource usage:** median ~91 MB RSS at idle (validation-fleet measurement 2026-05-21 across 7 hosts, 3 vendors, 4 OS families; range 65 to 103 MB peak; varies primarily with disk count and IPMI sensor count). Effectively 0% CPU at the default 60-second snapshot interval. Random-read I/O throughput delta under 1.5% under fio saturation (no measurable impact on customer workloads). The full measurement campaign lives at [`docs/measurements/2026-05-19/`](docs/measurements/2026-05-19/).

-**Security:** See [glassmkr.com/
+**Security:** See [glassmkr.com/trust](https://glassmkr.com/trust) for the full list of what Crucible does and does not collect.

 ## Screenshots

-
+*A P1 alert showing the rule trigger, evidence, and the exact remediation
+commands. Each rule ships pre-written fix content; the agent does not write
+to your server.*

-
+*Per-mount capacity and per-disk SMART status. Drives are checked against
+SMART attributes, NVMe Critical Warning bits, and ZFS pool state.*

-
+*Fleet view with per-server status, distro, IP, and last-seen timestamp.
+Alerted servers surface a counter at a glance.*

 ## Install

@@ -56,7 +60,7 @@ sudo mkdir -p /etc/glassmkr
 sudo tee /etc/glassmkr/collector.yaml << 'EOF'
 server_name: "web-01"
 collection:
-interval_seconds:
+interval_seconds: 60
 ipmi: true
 smart: true
 dashboard:
@@ -73,7 +77,7 @@ docker compose up -d
 docker compose logs -f crucible
 ```

-Images are published to [ghcr.io/glassmkr/crucible](https://github.com/glassmkr/crucible/pkgs/container/crucible) on every tag release. The container needs `--privileged` and `network_mode: host` for IPMI, SMART, and accurate host network monitoring. Details in the [compose file](./docker-compose.yml).
+Images are published to both [`ghcr.io/glassmkr/crucible`](https://github.com/glassmkr/crucible/pkgs/container/crucible) and [`docker.io/glassmkr/crucible`](https://hub.docker.com/r/glassmkr/crucible) on every tag release; either works. The container needs `--privileged` and `network_mode: host` for IPMI, SMART, and accurate host network monitoring. Details in the [compose file](./docker-compose.yml).

 ## Quick Start

@@ -120,7 +124,7 @@ Options:
 ```yaml
 server_name: "web-01"
 collection:
-interval_seconds:
+interval_seconds: 60
 ipmi: true
 smart: true
 dashboard:
@@ -153,7 +157,7 @@ dashboard:
 api_key: "gmk_cru_live_..."
 ```

-The `api_key` value itself is unchanged
+The `api_key` value itself is unchanged; only the parent key
 (`forge:` → `dashboard:`) and the endpoint hostname need updating.
 After the edit, restart the service:

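Reviewer note on the hunk above: the migration text says only the parent key changes (`forge:` → `dashboard:`) while the `api_key` value stays the same. A minimal Python sketch of that rename follows. It assumes the parent key sits at column 0 with `api_key` nested beneath it. The helper name `rename_parent_key` is hypothetical and not part of Crucible. The endpoint hostname is deliberately left alone, because neither the old nor the new hostname appears in this diff.

```python
import re


def rename_parent_key(config_text: str, old: str = "forge", new: str = "dashboard") -> str:
    """Rename a top-level YAML mapping key; nested keys and values are untouched.

    Hypothetical helper for illustration only, not part of Crucible.
    """
    # Match the key only at column 0, so an indented `forge:` or a value that
    # merely contains the word is not rewritten.
    pattern = re.compile(rf"^{re.escape(old)}:(?=\s|$)", flags=re.MULTILINE)
    return pattern.sub(f"{new}:", config_text)


# Assumed legacy layout: `api_key` nested under the old parent key.
legacy = 'server_name: "web-01"\nforge:\n  api_key: "gmk_cru_live_..."\n'
migrated = rename_parent_key(legacy)
print(migrated)
```

The endpoint hostname still has to be edited by hand, and the service restarted afterwards, as the README text in the hunk describes.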
@@ -246,22 +250,26 @@ this detection automatically; the manual flow above is just the equivalent.
 | Module | Data |
 |--------|------|
 | CPU | Aggregate and per-core utilization (user, system, iowait, idle) |
-| Memory | RAM usage, swap usage |
-|
-|
-|
-|
-|
-|
-|
-|
-|
-|
+| Memory | RAM usage, swap usage, EDAC counters, vmstat pswpin/pswpout |
+| Pressure (PSI) | cpu / io / memory `some` and `full` stall avg + total (kernel >= 4.20) |
+| Disks | Space per mount point, inode counts, mount options, filesystem type, LVM thin metadata |
+| SMART | Drive health, model, temperature, power-on hours, reallocated sectors, NVMe wear, NVMe Critical Warning decode |
+| Network | Interface traffic, delta error/drop counters, link speed, ethtool advertised modes, softnet per-CPU drops |
+| RAID | mdadm array status, degraded detection; hardware RAID via storcli/perccli (fleet-tested), ssacli/arcconf (stub) |
+| IPMI | Sensor readings, ECC errors, SEL events, fan RPM, PSU redundancy state; vendor SEL parsers (Dell/Supermicro/HPE fleet-tested, Lenovo/Cisco/OpenBMC stub) |
+| Security | SSH config, firewall status, pending updates, kernel vulnerabilities, kernel-needs-reboot, CVE collection |
+| ZFS | Pool state, vdev redundancy class, SLOG/L2ARC split, scrub age, scrub errors |
+| GPU (NVIDIA) | nvidia-smi tier 1 (default), DCGM tier 2 (enrichment), Redfish OEM tier 3 (stub); per-GPU XID events, temperature, ECC, power draw, PCIe link state |
+| I/O | Per-device latency, IOPS, dmesg I/O errors, structured dmesg events |
+| Conntrack | nf_conntrack table usage, insert_failed rate |
+| Network process | Per-process FD scan, LACP partner state, TCP retrans rate |
+| Systemd | Failed unit count, Result codes (oom-kill, watchdog, signal) |
 | NTP | Sync state and source |
 | File descriptors | System-wide allocation |
+| Reboot evidence | pstore / kdump / wtmp; expected-vs-unexpected reboot classification |

-<!-- Canonical rule count:
-Dashboard evaluates
+<!-- Canonical rule count: 61 across 9 categories. -->
+Dashboard evaluates 61 alert rules server-side across 9 categories (storage, zfs, filesystem, memory & CPU, network, hardware/BMC, time & services, security & patching, GPU), with priorities P1 Urgent through P4 Low. 20 rules ship with deep FIX content (copy-pasteable remediation + verdict prior + rollback notes); 30+ are verified end-to-end on real hardware. Full list: [glassmkr.com/docs/rules](https://glassmkr.com/docs/rules).

 ## Requirements

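Reviewer note on the hunk above: the new Pressure (PSI) row lists `some` and `full` stall averages plus totals for cpu, io, and memory. On Linux these values are exposed as text files under `/proc/pressure/`, one line per stall class. The Python sketch below shows how such a file could be parsed. It is an illustration only, not Crucible's collector code: the function name `parse_psi` is hypothetical and the sample values are invented.

```python
from typing import Dict


def parse_psi(text: str) -> Dict[str, Dict[str, float]]:
    """Parse the contents of a /proc/pressure/* file into nested dicts.

    Hypothetical helper for illustration only, not Crucible's collector.
    """
    result: Dict[str, Dict[str, float]] = {}
    for line in text.strip().splitlines():
        # Each line looks like: "some avg10=0.12 avg60=0.05 avg300=0.01 total=123456"
        kind, *fields = line.split()
        result[kind] = {key: float(value) for key, value in (f.split("=", 1) for f in fields)}
    return result


# Invented sample in the kernel's PSI text format (values are not measurements).
sample = (
    "some avg10=0.12 avg60=0.05 avg300=0.01 total=123456\n"
    "full avg10=0.00 avg60=0.00 avg300=0.00 total=7890\n"
)
parsed = parse_psi(sample)
print(parsed["some"]["avg10"], parsed["full"]["total"])
```

On a host with PSI enabled, the same function could be fed the contents of `/proc/pressure/memory`. The files are absent on kernels older than the 4.20 minimum the table row cites.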
@@ -272,11 +280,11 @@ Dashboard evaluates 38 alert rules server-side across OS, Storage, Network, Hard

 ## Documentation

-- [Getting Started](https://
-- [Configuration Reference](https://
-- [Alert Rules (
-- [Troubleshooting](https://
-- [API Reference](https://
+- [Getting Started](https://glassmkr.com/docs/getting-started)
+- [Configuration Reference](https://glassmkr.com/docs/configuration)
+- [Alert Rules (61)](https://glassmkr.com/docs/rules)
+- [Troubleshooting](https://glassmkr.com/docs/troubleshooting)
+- [API Reference](https://glassmkr.com/docs/api)

 ## License
