@glassmkr/crucible 0.13.3 → 0.13.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (2)
  1. package/README.md +41 -33
  2. package/package.json +1 -1
package/README.md CHANGED
@@ -3,25 +3,29 @@
  [![MIT License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
  [![npm version](https://img.shields.io/npm/v/@glassmkr/crucible.svg)](https://www.npmjs.com/package/@glassmkr/crucible)
 
- <!-- Canonical rule count: see RULES_COUNT.md in the Glassmkr monorepo. -->
- Lightweight bare metal server monitoring agent. Collects hardware and OS health every 5 minutes and pushes snapshots to the [Glassmkr Dashboard](https://app.glassmkr.com), which evaluates 38 alert rules and sends notifications.
+ <!-- Canonical rule count: 61 across 9 categories. -->
+ Lightweight bare-metal server monitoring agent. Collects hardware and OS health every 60 seconds at the default interval and pushes snapshots to the [Glassmkr Dashboard](https://app.glassmkr.com), which evaluates 61 alert rules across 9 categories and sends notifications.
 
- Open source. MIT licensed. Built by [Glassmkr](https://glassmkr.com). See also the [Bench MCP packages](https://glassmkr.com/docs/mcp) (`@glassmkr/bench-*` on npm) for AI-tool access to your Glassmkr fleet.
+ Open source. MIT licensed. Built by [Glassmkr](https://glassmkr.com). Crucible is the open-source product; the optional [Glassmkr Dashboard](https://app.glassmkr.com) is a hosted SaaS that consumes Crucible's snapshots.
 
- **Resource usage:** ~90MB RSS memory (varies by hardware: servers with more IPMI sensors use more), <0.1% CPU at 5-minute collection interval. Collects IPMI, SMART, ZFS, network bonds, security posture, conntrack, systemd, NTP, and file descriptors.
+ **Resource usage:** median ~91 MB RSS at idle (validation-fleet measurement 2026-05-21 across 7 hosts, 3 vendors, 4 OS families; range 65 to 103 MB peak; varies primarily with disk count and IPMI sensor count). Effectively 0% CPU at the default 60-second snapshot interval. Random-read I/O throughput delta under 1.5% under fio saturation (no measurable impact on customer workloads). The full measurement campaign lives at [`docs/measurements/2026-05-19/`](docs/measurements/2026-05-19/).
 
- **Security:** See [glassmkr.com/security](https://glassmkr.com/security) for the full list of what Crucible does and does not collect.
+ **Security:** See [glassmkr.com/trust](https://glassmkr.com/trust) for the full list of what Crucible does and does not collect.
 
  ## Screenshots
 
- ![Dashboard alerts with fix commands](https://glassmkr.com/screenshots/alerts.png)
- *Alerts grouped by server, with AI-generated fix commands for each rule.*
+ ![Dashboard alert with copy-pasteable fix commands](https://glassmkr.com/screenshots/alerts.png)
+ *A P1 alert showing the rule trigger, evidence, and the exact remediation
+ commands. Each rule ships pre-written fix content; the agent does not write
+ to your server.*
 
- ![Storage, SMART health, and network bonds](https://glassmkr.com/screenshots/hardware.png)
- *Per-disk SMART status, storage capacity, and network interface bonding.*
+ ![Storage and SMART drive health](https://glassmkr.com/screenshots/hardware.png)
+ *Per-mount capacity and per-disk SMART status. Drives are checked against
+ SMART attributes, NVMe Critical Warning bits, and ZFS pool state.*
 
- ![Security posture and server overview](https://glassmkr.com/screenshots/overview.png)
- *Security posture, server overview, and active alerts.*
+ ![Server fleet overview](https://glassmkr.com/screenshots/overview.png)
+ *Fleet view with per-server status, distro, IP, and last-seen timestamp.
+ Alerted servers surface a counter at a glance.*
 
  ## Install
 
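You can spot-check the RSS figures in the resource-usage line on your own Linux host by reading `/proc/<pid>/status` for the agent's process. The sketch below is minimal and assumes you pass in that file's text, for example via `readFileSync` from `node:fs`. `vmRssMiB` is a hypothetical helper and is not part of Crucible.

```typescript
// Hypothetical helper (not part of Crucible): extract VmRSS from the text of
// /proc/<pid>/status. The kernel labels the value "kB", but the unit is
// 1024 bytes, so dividing by 1024 yields MiB.
function vmRssMiB(statusText: string): number | null {
  for (const line of statusText.split("\n")) {
    // Matching lines look like: "VmRSS:\t   93184 kB"
    const m = /^VmRSS:\s+(\d+)\s+kB$/.exec(line.trim());
    if (m) {
      return Number(m[1]) / 1024;
    }
  }
  // Field absent (e.g. kernel threads have no VmRSS line).
  return null;
}
```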
@@ -56,7 +60,7 @@ sudo mkdir -p /etc/glassmkr
  sudo tee /etc/glassmkr/collector.yaml << 'EOF'
  server_name: "web-01"
  collection:
- interval_seconds: 300
+ interval_seconds: 60
  ipmi: true
  smart: true
  dashboard:
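The example's `interval_seconds` changes from 300 to 60, matching the new 60-second default. That change raises the number of snapshots each server pushes to the Dashboard fivefold:

```latex
\frac{86\,400\ \mathrm{s/day}}{300\ \mathrm{s}} = 288
\qquad\longrightarrow\qquad
\frac{86\,400\ \mathrm{s/day}}{60\ \mathrm{s}} = 1440\ \text{snapshots per server per day}
```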
@@ -73,7 +77,7 @@ docker compose up -d
  docker compose logs -f crucible
  ```
 
- Images are published to [ghcr.io/glassmkr/crucible](https://github.com/glassmkr/crucible/pkgs/container/crucible) on every tag release. The container needs `--privileged` and `network_mode: host` for IPMI, SMART, and accurate host network monitoring. Details in the [compose file](./docker-compose.yml).
+ Images are published to both [`ghcr.io/glassmkr/crucible`](https://github.com/glassmkr/crucible/pkgs/container/crucible) and [`docker.io/glassmkr/crucible`](https://hub.docker.com/r/glassmkr/crucible) on every tag release; either works. The container needs `--privileged` and `network_mode: host` for IPMI, SMART, and accurate host network monitoring. Details in the [compose file](./docker-compose.yml).
 
  ## Quick Start
 
@@ -120,7 +124,7 @@ Options:
  ```yaml
  server_name: "web-01"
  collection:
- interval_seconds: 300
+ interval_seconds: 60
  ipmi: true
  smart: true
  dashboard:
@@ -153,7 +157,7 @@ dashboard:
  api_key: "gmk_cru_live_..."
  ```
 
- The `api_key` value itself is unchanged only the parent key
+ The `api_key` value itself is unchanged; only the parent key
  (`forge:` → `dashboard:`) and the endpoint hostname need updating.
  After the edit, restart the service:
 
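The key rename described in this hunk can be scripted. The minimal sketch below assumes that `forge:` appears as a top-level key at column 0 of `collector.yaml`. `renameForgeKey` is a hypothetical helper, not part of Crucible. It deliberately leaves the endpoint hostname alone, so you still need to update that by hand.

```typescript
// Hypothetical migration helper: rename the top-level `forge:` key to
// `dashboard:` in collector.yaml text. Nested keys (such as api_key) and
// any indented `forge:` are left untouched because the match is anchored
// at the start of a line with no indentation.
function renameForgeKey(yamlText: string): string {
  // `m` makes ^ match at every line start; the lookahead requires the
  // colon to be followed by whitespace or end of input.
  return yamlText.replace(/^forge:(?=\s|$)/m, "dashboard:");
}
```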
@@ -246,22 +250,26 @@ this detection automatically; the manual flow above is just the equivalent.
  | Module | Data |
  |--------|------|
  | CPU | Aggregate and per-core utilization (user, system, iowait, idle) |
- | Memory | RAM usage, swap usage |
- | Disks | Space per mount point, inode counts, mount options, filesystem type |
- | SMART | Drive health, model, temperature, power-on hours, reallocated sectors, NVMe wear |
- | Network | Interface traffic, delta error/drop counters, link speed |
- | RAID | mdadm array status, degraded detection |
- | IPMI | Sensor readings, ECC errors, SEL events, fan RPM |
- | Security | SSH config, firewall status, pending updates, kernel vulnerabilities, kernel-needs-reboot |
- | ZFS | Pool state, scrub age, scrub errors |
- | I/O | Per-device latency, IOPS, dmesg I/O errors |
- | Conntrack | nf_conntrack table usage |
- | Systemd | Failed unit count |
+ | Memory | RAM usage, swap usage, EDAC counters, vmstat pswpin/pswpout |
+ | Pressure (PSI) | cpu / io / memory `some` and `full` stall avg + total (kernel >= 4.20) |
+ | Disks | Space per mount point, inode counts, mount options, filesystem type, LVM thin metadata |
+ | SMART | Drive health, model, temperature, power-on hours, reallocated sectors, NVMe wear, NVMe Critical Warning decode |
+ | Network | Interface traffic, delta error/drop counters, link speed, ethtool advertised modes, softnet per-CPU drops |
+ | RAID | mdadm array status, degraded detection; hardware RAID via storcli/perccli (fleet-tested), ssacli/arcconf (stub) |
+ | IPMI | Sensor readings, ECC errors, SEL events, fan RPM, PSU redundancy state; vendor SEL parsers (Dell/Supermicro/HPE fleet-tested, Lenovo/Cisco/OpenBMC stub) |
+ | Security | SSH config, firewall status, pending updates, kernel vulnerabilities, kernel-needs-reboot, CVE collection |
+ | ZFS | Pool state, vdev redundancy class, SLOG/L2ARC split, scrub age, scrub errors |
+ | GPU (NVIDIA) | nvidia-smi tier 1 (default), DCGM tier 2 (enrichment), Redfish OEM tier 3 (stub); per-GPU XID events, temperature, ECC, power draw, PCIe link state |
+ | I/O | Per-device latency, IOPS, dmesg I/O errors, structured dmesg events |
+ | Conntrack | nf_conntrack table usage, insert_failed rate |
+ | Network process | Per-process FD scan, LACP partner state, TCP retrans rate |
+ | Systemd | Failed unit count, Result codes (oom-kill, watchdog, signal) |
  | NTP | Sync state and source |
  | File descriptors | System-wide allocation |
+ | Reboot evidence | pstore / kdump / wtmp; expected-vs-unexpected reboot classification |
 
- <!-- Canonical rule count: see RULES_COUNT.md in the Glassmkr monorepo. -->
- Dashboard evaluates 38 alert rules server-side across OS, Storage, Network, Hardware, ZFS, Security, and Service Health, with priorities P1 Urgent through P4 Low. Full list: [app.glassmkr.com/docs/alerts](https://app.glassmkr.com/docs/alerts).
+ <!-- Canonical rule count: 61 across 9 categories. -->
+ Dashboard evaluates 61 alert rules server-side across 9 categories (storage, zfs, filesystem, memory & CPU, network, hardware/BMC, time & services, security & patching, GPU), with priorities P1 Urgent through P4 Low. 20 rules ship with deep FIX content (copy-pasteable remediation + verdict prior + rollback notes); 30+ are verified end-to-end on real hardware. Full list: [glassmkr.com/docs/rules](https://glassmkr.com/docs/rules).
 
  ## Requirements
 
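Two of the newly listed sources use formats defined by the kernel or the NVMe specification. The first is the Pressure (PSI) row, which reads `/proc/pressure/{cpu,io,memory}`. In those files, `avg10`/`avg60`/`avg300` are percentages of wall time and `total` is cumulative stall time in microseconds. The second is the SMART row's NVMe Critical Warning byte, which is a bit field. The minimal TypeScript sketch below assumes you already have the raw PSI file text and the integer Critical Warning value (for example the `critical_warning` field printed by `nvme smart-log`). `parsePsi` and `decodeNvmeCriticalWarning` are hypothetical names, not Crucible's API.

```typescript
// One parsed PSI line, e.g. "some avg10=0.12 avg60=0.05 avg300=0.01 total=123456".
interface PsiLine {
  avg10: number;   // % of wall time stalled, 10 s window
  avg60: number;   // % of wall time stalled, 60 s window
  avg300: number;  // % of wall time stalled, 300 s window
  totalUs: number; // cumulative stall time in microseconds
}

type Psi = Partial<Record<"some" | "full", PsiLine>>;

// Parse the text of a PSI file such as /proc/pressure/io.
function parsePsi(text: string): Psi {
  const out: Psi = {};
  for (const raw of text.split("\n")) {
    const [kind, ...fields] = raw.trim().split(/\s+/);
    if (kind !== "some" && kind !== "full") continue; // skip blank/unknown lines
    const kv = new Map<string, number>();
    for (const f of fields) {
      const [k, v] = f.split("=");
      kv.set(k, Number(v));
    }
    out[kind] = {
      avg10: kv.get("avg10") ?? 0,
      avg60: kv.get("avg60") ?? 0,
      avg300: kv.get("avg300") ?? 0,
      totalUs: kv.get("total") ?? 0,
    };
  }
  return out;
}

// Labels for bits 0-5 of the Critical Warning byte in the NVMe SMART/Health
// Information log page (byte 0). Bits 6-7 are reserved.
const NVME_CRITICAL_WARNING_BITS: readonly string[] = [
  "available spare below threshold",    // bit 0
  "temperature outside threshold",      // bit 1
  "reliability degraded",               // bit 2
  "media read-only",                    // bit 3
  "volatile memory backup failed",      // bit 4
  "persistent memory region read-only", // bit 5
];

// Return the label of every warning bit that is set, lowest bit first.
function decodeNvmeCriticalWarning(value: number): string[] {
  const set: string[] = [];
  NVME_CRITICAL_WARNING_BITS.forEach((label, bit) => {
    if ((value >> bit) & 1) set.push(label);
  });
  return set;
}
```

For example, a Critical Warning value of `0x09` has bits 0 and 3 set. That decodes to a low spare pool and media that has gone read-only.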
@@ -272,11 +280,11 @@ Dashboard evaluates 38 alert rules server-side across OS, Storage, Network, Hard
 
  ## Documentation
 
- - [Getting Started](https://app.glassmkr.com/docs/getting-started)
- - [Configuration Reference](https://app.glassmkr.com/docs/configuration)
- - [Alert Rules (38)](https://app.glassmkr.com/docs/alerts)
- - [Troubleshooting](https://app.glassmkr.com/docs/troubleshooting)
- - [API Reference](https://app.glassmkr.com/docs/api)
+ - [Getting Started](https://glassmkr.com/docs/getting-started)
+ - [Configuration Reference](https://glassmkr.com/docs/configuration)
+ - [Alert Rules (61)](https://glassmkr.com/docs/rules)
+ - [Troubleshooting](https://glassmkr.com/docs/troubleshooting)
+ - [API Reference](https://glassmkr.com/docs/api)
 
  ## License
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@glassmkr/crucible",
- "version": "0.13.3",
+ "version": "0.13.4",
  "description": "Lightweight bare metal server monitoring. IPMI, SMART, OS, network. Opinionated alerts.",
  "type": "module",
  "main": "./dist/index.js",