wtftools 0.0.0__tar.gz

Files changed (47)
  1. wtftools-0.0.0/CHANGELOG.md +398 -0
  2. wtftools-0.0.0/LICENSE +21 -0
  3. wtftools-0.0.0/MANIFEST.in +5 -0
  4. wtftools-0.0.0/PKG-INFO +246 -0
  5. wtftools-0.0.0/README.md +184 -0
  6. wtftools-0.0.0/pyproject.toml +108 -0
  7. wtftools-0.0.0/scripts/build-deb.sh +33 -0
  8. wtftools-0.0.0/scripts/wtf.bash-completion +134 -0
  9. wtftools-0.0.0/setup.cfg +4 -0
  10. wtftools-0.0.0/tests/test_audit.py +331 -0
  11. wtftools-0.0.0/tests/test_audit_extras.py +116 -0
  12. wtftools-0.0.0/tests/test_colors.py +91 -0
  13. wtftools-0.0.0/tests/test_config.py +100 -0
  14. wtftools-0.0.0/tests/test_cron.py +390 -0
  15. wtftools-0.0.0/tests/test_explain_deep.py +254 -0
  16. wtftools-0.0.0/tests/test_info.py +100 -0
  17. wtftools-0.0.0/tests/test_iteration10.py +455 -0
  18. wtftools-0.0.0/tests/test_iteration2_extras.py +206 -0
  19. wtftools-0.0.0/tests/test_iteration3.py +359 -0
  20. wtftools-0.0.0/tests/test_iteration4.py +368 -0
  21. wtftools-0.0.0/tests/test_iteration5.py +315 -0
  22. wtftools-0.0.0/tests/test_iteration6.py +380 -0
  23. wtftools-0.0.0/tests/test_iteration7.py +405 -0
  24. wtftools-0.0.0/tests/test_iteration8.py +389 -0
  25. wtftools-0.0.0/tests/test_main.py +245 -0
  26. wtftools-0.0.0/tests/test_main_extras.py +144 -0
  27. wtftools-0.0.0/tests/test_public_api.py +99 -0
  28. wtftools-0.0.0/tests/test_sysinfo.py +660 -0
  29. wtftools-0.0.0/wtftools/__init__.py +55 -0
  30. wtftools-0.0.0/wtftools/__main__.py +10 -0
  31. wtftools-0.0.0/wtftools/audit.py +809 -0
  32. wtftools-0.0.0/wtftools/colors.py +111 -0
  33. wtftools-0.0.0/wtftools/config.py +249 -0
  34. wtftools-0.0.0/wtftools/cron.py +388 -0
  35. wtftools-0.0.0/wtftools/events.py +220 -0
  36. wtftools-0.0.0/wtftools/explain.py +290 -0
  37. wtftools-0.0.0/wtftools/info.py +90 -0
  38. wtftools-0.0.0/wtftools/llm.py +129 -0
  39. wtftools-0.0.0/wtftools/main.py +1328 -0
  40. wtftools-0.0.0/wtftools/snapshot.py +203 -0
  41. wtftools-0.0.0/wtftools/sysinfo.py +1608 -0
  42. wtftools-0.0.0/wtftools.egg-info/PKG-INFO +246 -0
  43. wtftools-0.0.0/wtftools.egg-info/SOURCES.txt +45 -0
  44. wtftools-0.0.0/wtftools.egg-info/dependency_links.txt +1 -0
  45. wtftools-0.0.0/wtftools.egg-info/entry_points.txt +3 -0
  46. wtftools-0.0.0/wtftools.egg-info/requires.txt +12 -0
  47. wtftools-0.0.0/wtftools.egg-info/top_level.txt +1 -0
wtftools-0.0.0/CHANGELOG.md ADDED
@@ -0,0 +1,398 @@
1
+ # Changelog
2
+
3
+ All notable changes to this project will be documented in this file.
4
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
5
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
6
+
7
+ ## [Unreleased]
8
+
9
+ ### Removed — plugin infrastructure
10
+ - `wtftools/plugin_sdk.py` — Python helper for plugin authors.
11
+ - `wtftools/checks/plugins.py` — discovery / executor / parser for
12
+ `/etc/wtf/checks.d/` scripts (bash + Python).
13
+ - `_plugin_to_check` + `_all_check_callables` glue in `wtftools/audit.py`.
14
+ - `tests/test_plugins.py`, `tests/test_iteration16.py`.
15
+ - README's «Plugins» section and QUICKSTART's «Custom checks (plugins)»
16
+ section.
17
+
18
+ The CLI is now a closed set of built-in checks. Custom logic should live
19
+ upstream (e.g. monitoring tools) or be added as new built-in checks via PR.
20
+
21
+ ### Changed — layout flattening
22
+ - `wtftools/checks/cron.py` → `wtftools/cron.py`
23
+ - `wtftools/checks/sysinfo.py` → `wtftools/sysinfo.py`
24
+ - `wtftools/checks/` subpackage removed (`__init__.py` deleted).
25
+ - All imports updated: `from wtftools.checks import X` → `from wtftools import X`.
26
+
27
+ ### Build & release
28
+ - `release.yml` no longer publishes a Docker image to GHCR. On a `v*` /
29
+ `*.*.*` tag it now runs tests, builds a `.deb` via `scripts/build-deb.sh`
30
+ (stdeb + debhelper toolchain), and attaches the artifact to the matching
31
+ GitHub Release. PyPI publishing happens separately in `publish.yml` via
32
+ OIDC trusted-publisher.
33
+ - `Dockerfile` stays in-repo for ad-hoc `docker build .` use, but no longer
34
+ ships as a release artifact.
35
+
36
+ ### Tooling
37
+ - `black` added to `.pre-commit-config.yaml`, running before `ruff` on
38
+ every commit. Config lives in `pyproject.toml [tool.black]`
39
+ (`line-length=180`, `target-version=["py38"]`). Existing tree is already
40
+ black-compatible — no formatting churn.
41
+
42
+ ### Changed — scope cleanup
43
+ wtftools is now strictly a **one-shot CLI**. The daemon / fleet / multi-host
44
+ story was removed in favor of the original PROJECT.md Phase 1 vision:
45
+ one server, one command, immediate answer. Removed:
46
+
47
+ - `wtfd` daemon (HTTP server, periodic audit loop, `POST /run-now`)
48
+ - `wtf serve` subcommand and `wtfd` console-script entry point
49
+ - `wtf fleet` (multi-host aggregation) and `wtf compare HOSTA HOSTB`
50
+ - `wtf plugins` listing subcommand (plugins still load — see
51
+ `wtf audit --list-checks` for the `plugin:*` entries)
52
+ - `wtf motd-install` (replace with three lines of shell, see QUICKSTART)
53
+ - `wtf init` interactive wizard (its useful step — writing the example
54
+ config — is `wtf config --example | sudo tee /etc/wtftools/config.ini`)
55
+ - `--watch` flags on `wtf audit`, `wtf info`, `wtf events`
56
+ - `wtf audit --diff` (the standalone `wtf diff` command remains)
57
+ - Bundled `scripts/wtfd.service` systemd unit
58
+ - `wtftools/daemon.py` and `wtftools/fleet.py` modules
59
+
60
+ Kept: `wtf audit --save`, `wtf diff`, `wtf history` — snapshots are pure
61
+ filesystem operations under `~/.cache/wtftools/`, no daemon required.
62
+
63
+ ### Added
64
+ - **`wtf problems`** — alias for `wtf audit --only problem`, surfaces just
65
+ the WARN+FAIL rows. Most common audit invocation during an incident,
66
+ given its own subcommand for typing comfort.
67
+
68
+ ## [0.0.0] — 2026-05-20
69
+
70
+ Initial public release. Highlights:
71
+
72
+ - **19 subcommands** covering audit / info / services / logs / events /
73
+ history / diff / crontab / doctor / plugins / config / motd-install /
74
+ init / fleet / compare / explain / top / ports / serve.
75
+ - **`wtfd` daemon** with HTTP API (`/audit`, `/audit.json`, `/audit.prom`,
76
+ `/history`, `/snapshot/N`, `POST /run-now`) — drives the fleet story.
77
+ - **38 built-in checks**, plugin system with bash + Python SDK, six output
78
+ formats (text/json/csv/plain/html/prometheus).
79
+ - **Multi-host fleet aggregation** (`wtf fleet`) + host-to-host drift
80
+ detection (`wtf compare`), both with `--watch` and `--run-now`.
81
+ - **LLM bridge** for `wtf explain --llm ollama|claude|openai|auto`.
82
+ - Distribution: PyPI, debian packaging, Docker image, systemd unit,
83
+ bundled MOTD installer, bash completion, GitHub Actions
84
+ release workflow.
85
+ - **724 tests, 92.6 % coverage.**
86
+
87
+ ### Added — Plugin SDK & docs (final iteration)
88
+ - **`wtftools.plugin_sdk`** — tiny helper module so Python plugins don't have
89
+ to remember exit codes or hand-roll JSON:
90
+
91
+ ```python
92
+ #!/usr/bin/env python3
93
+ from wtftools.plugin_sdk import ok, warn, fail, skip
94
+ # ... your check ...
95
+ fail("internal-api unreachable", detail=["…"]) # exits 2 with JSON
96
+ ```
97
+
98
+ Exposes `ok / warn / fail / skip` (terminating) and `result(status, message,
99
+ detail=None)` (non-terminating, for scripts that emit multiple results).
100
+ Detail items are coerced to strings.
101
+ - **`examples/plugins/check-http-health.py`** — example Python plugin using
102
+ the SDK; probes an HTTP endpoint with latency thresholds.
103
+ - **`docs/PLUGIN_GUIDE.md`** — comprehensive plugin author's guide. Documents
104
+ both exit-code and JSON contracts, shows bash + Python quickstarts, lists
105
+ best practices, and points at the 5 example plugins.
106
+
107
+ - **`wtf fleet --watch SECONDS`** — auto-refresh fleet aggregation
108
+ (mirrors the existing `audit --watch` and `info --watch`). Default
109
+ off; pick an interval that respects N-hosts × per-fetch cost.
110
+ - **`wtf fleet --run-now`** — POST `/run-now` to every peer before
111
+ fetching, so the aggregator gets fresh data instead of cached snapshots.
112
+ Best-effort: partial failures print a `run-now reached M/N peer(s)`
113
+ status line and the fetch proceeds anyway.
114
+ - **`wtf events --watch SECONDS`** — auto-refresh the event timeline.
115
+ Useful in an incident war room.
116
+ - **`docs/QUICKSTART.md`** — 5-minute onboarding guide (README grew past 250
117
+ lines — newcomers needed something smaller). Covers install, the
118
+ incident-triage flow, fleet/Prometheus setup, custom checks, and a
119
+ cheat-sheet table mapping common questions to commands.
120
+
121
+ - **`wtf events`** — chronological host timeline. Merges six event sources
122
+ into one newest-first view: reboots (via `last -x reboot`), OOM kills,
123
+ kernel errors, failed-unit transitions, SSH auth failures, recent logins.
124
+ Flags: `--since HOURS` (default 24), `--kind KIND` (repeatable filter),
125
+ `--limit N`, `--format json`. Useful during incident post-mortems: one
126
+ command replaces `last`, `journalctl -k`, `journalctl SYSTEMD_UNIT=…`, and
127
+ several SSH-log greps.
128
+ - **`POST /run-now`** in wtfd — trigger an immediate audit from outside the
129
+ scheduler interval. Used by central dashboards that want fresh data on
130
+ demand. Returns `202 Accepted` instantly; the actual run completes in the
131
+ background and appears on the next `/audit.json` fetch. Auth-token-respecting.
132
+ The scheduler now wakes within ~1s of receiving a `/run-now`.
133
+
134
+ - **`wtf compare HOSTA HOSTB`** — side-by-side diff of two wtfd hosts.
135
+ Real-world SRE use case: «two boxes from the same template, why does
136
+ one behave differently?» Fetches `/audit.json` from both, walks the
137
+ merged set of check names, marks each row as `=` (identical), `DIFF`
138
+ (status or message differs), `A→` (only on A), `→B` (only on B).
139
+ `--only-drift` hides identical rows. `--format json` for pipelines.
140
+ `--token-file` if peers require Bearer auth. Exit code: 0 identical,
141
+ 1 drift present, 2 if at least one host is unreachable.
142
+ - **`wtf doctor --check-updates`** — opt-in PyPI version check. Queries
143
+ `https://pypi.org/pypi/wtftools/json` (3s timeout) and surfaces a
144
+ `[WARN]` row if a newer release is published. Off by default — `doctor`
145
+ stays an offline operation unless the operator explicitly opts in.
146
+
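The row markers and exit codes described for `wtf compare` can be sketched as below. This is a minimal illustration and not the package's code: `compare_audits`, `compare_exit_code`, and the `{name: (status, message)}` input shape are assumptions made for the example.

```python
from typing import Dict, List, Tuple

Check = Tuple[str, str]  # (status, message); hypothetical shape for this sketch


def compare_audits(a: Dict[str, Check], b: Dict[str, Check]) -> List[Tuple[str, str]]:
    """Walk the merged set of check names and tag each one with a drift marker."""
    rows = []
    for name in sorted(set(a) | set(b)):
        if name not in b:
            marker = "A→"      # only on host A
        elif name not in a:
            marker = "→B"      # only on host B
        elif a[name] == b[name]:
            marker = "="       # identical status and message
        else:
            marker = "DIFF"    # status or message differs
        rows.append((marker, name))
    return rows


def compare_exit_code(rows: List[Tuple[str, str]], unreachable: bool = False) -> int:
    """0 identical, 1 drift present, 2 if at least one host is unreachable."""
    if unreachable:
        return 2
    return 1 if any(marker != "=" for marker, _ in rows) else 0
```

Filtering the rows to those whose marker is not `=` would correspond to what the changelog calls `--only-drift`.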
147
+ - **`wtf init`** — interactive setup wizard for fresh hosts. Walks through
148
+ four optional steps:
149
+ 1. write `/etc/wtftools/config.ini` (sample with defaults)
150
+ 2. install `/etc/update-motd.d/99-wtf-brief` for the ssh-login banner
151
+ 3. install + enable the bundled `wtfd.service` (off by default)
152
+ 4. add `/etc/cron.d/wtftools-hourly` for an hourly audit snapshot
153
+
154
+ Use `--non-interactive` for scripted deploys; `--dry-run` to preview;
155
+ per-step `--enable-X` / `--no-X` flags override defaults.
156
+ - **`examples/plugins/`** — four ready-to-use plugin scripts:
157
+ - `check-cert-domain.sh` — remote TLS cert expiry probe
158
+ - `check-postgres-connections.sh` — Postgres `pg_stat_activity` vs `max_connections`
159
+ - `check-redis-memory.sh` — Redis `used_memory` vs `maxmemory`
160
+ - `check-disk-write.sh` — quick fsync-write latency test
161
+
162
+ Drop any of these into `/etc/wtf/checks.d/` and `wtf audit` picks them up.
163
+ - **`docs/schema/`** — JSON Schema (draft-07) for `--format json` outputs:
164
+ `audit-v1.json` and `fleet-v1.json`. Use with `check-jsonschema` or any
165
+ validator to build typed parsers in your integration.
166
+ - **`CONTRIBUTING.md`** — dev setup, test/lint commands, how to add a new
167
+ check or subcommand, release flow.
168
+ - **GitHub Actions release workflow** (`.github/workflows/release.yml`) —
169
+ on `v*` tag: runs the suite, builds sdist+wheel, publishes to PyPI
170
+ (via `PYPI_API_TOKEN` secret), builds + pushes a Docker image to GHCR.
171
+
172
+ - **`wtf fleet`** — multi-host aggregation. Pulls `/audit.json` from each
173
+ configured wtfd peer in parallel (`urllib` + ThreadPoolExecutor, no extra
174
+ deps). Renders an at-a-glance fleet view sorted by severity:
175
+ unreachable → fail → warn → ok. Per-host row inlines the top two problems
176
+ so an SRE doesn't need to drill in to know what's broken.
177
+ - Targets from `--hosts a:8765,b:8765` (repeatable), `--hosts-file FILE`
178
+ (one per line, `#` comments), or `[thresholds] fleet_hosts = …` in the
179
+ config file. All sources merge and dedupe.
180
+ - `--token-file FILE` sends `Authorization: Bearer …` to every peer.
181
+ - `--problem-only` hides healthy hosts during incidents.
182
+ - `--format prometheus` emits one set of metrics per host
183
+ (`wtf_fleet_host_up{host="…"}`, `wtf_fleet_summary_count{host,status}`)
184
+ suitable for a single scrape job targeting the aggregator.
185
+ - Exit codes: 0 if all hosts OK; 1 if some unreachable but no FAIL;
186
+ 2 if any FAIL or everything is unreachable. CI-friendly.
187
+ - **Dockerfile** — `python:3.12-slim` base with `[full]` extras (psutil)
188
+ plus tools wtftools probes (procps, iproute2, smartmontools, openssl,
189
+ systemd-sysv, cron). `HEALTHCHECK` against `/healthz`, default entrypoint
190
+ is `wtf`. `.dockerignore` keeps the image lean.
191
+
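The fleet ordering and exit-code rules above can be read as the following sketch. The function names and the plain-string host states are hypothetical. The changelog does not say how WARN-only hosts affect the exit code, so the sketch lets them fall through to 0 and marks that as an assumption.

```python
from typing import Iterable, List, Tuple

# Display order from the changelog: unreachable, then fail, warn, ok.
SEVERITY = {"unreachable": 0, "fail": 1, "warn": 2, "ok": 3}


def sort_hosts(hosts: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Sort (host, state) pairs so the worst hosts come first."""
    return sorted(hosts, key=lambda pair: (SEVERITY[pair[1]], pair[0]))


def fleet_exit_code(states: Iterable[str]) -> int:
    """0 all OK; 1 some unreachable but no FAIL; 2 any FAIL or all unreachable."""
    states = list(states)
    if "fail" in states:
        return 2
    if states and all(s == "unreachable" for s in states):
        return 2
    if "unreachable" in states:
        return 1
    # Assumption: WARN-only hosts are not mentioned in the changelog,
    # so they are treated like OK here.
    return 0
```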
192
+ - **LLM bridge for `wtf explain`** — closes the loop: instead of piping the
193
+ structured prompt to an LLM by hand, point at a backend directly.
194
+ - `wtf explain --llm ollama` — subprocess call to local ollama (no API key).
195
+ - `wtf explain --llm claude` — uses `anthropic` SDK if installed +
196
+ `ANTHROPIC_API_KEY` env. Default model: `claude-haiku-4-5-20251001`.
197
+ - `wtf explain --llm openai` — uses `openai` SDK + `OPENAI_API_KEY`.
198
+ - `wtf explain --llm auto` — tries ollama → claude → openai, returns the
199
+ first one that responds.
200
+ - `--llm-model` overrides the default model; `--llm-timeout` overrides 60s.
201
+ - No mandatory new dependencies — the SDKs are imported lazily, missing
202
+ backends become a graceful skip with an explanatory message.
203
+ - **`wtf audit --format html`** — self-contained HTML with inline CSS.
204
+ Color-coded rows, collapsible detail. Survives email/ticket paste.
205
+ - **`wtf audit --output FILE` / `-o FILE`** — write the audit to a file
206
+ instead of stdout. Drops ANSI escapes automatically so logs stay clean.
207
+ - **`fail2ban` check** — surfaces currently-banned IP counts per jail
208
+ (informational, not a problem signal). Skipped when fail2ban-client is missing
209
+ or the daemon is down.
210
+
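The `--llm auto` behaviour (try each backend in order, return the first that responds, skip missing ones gracefully) might look roughly like this. The backends here are plain callables standing in for the real ollama/claude/openai integrations, whose actual code is not shown in this diff; `first_responding` is a name invented for the sketch.

```python
from typing import Callable, Iterable, Optional, Tuple

# A backend is (name, ask) where ask(prompt) returns text, returns None,
# or raises. These callables are hypothetical stand-ins.
Backend = Tuple[str, Callable[[str], Optional[str]]]


def first_responding(prompt: str, backends: Iterable[Backend]) -> Optional[Tuple[str, str]]:
    """Return (backend_name, answer) from the first backend that answers."""
    for name, ask in backends:
        try:
            answer = ask(prompt)
        except Exception:
            # Missing SDK, missing API key, timeout: skip to the next backend.
            continue
        if answer:
            return name, answer
    return None


def _missing(prompt: str) -> str:
    raise ImportError("backend not installed")


if __name__ == "__main__":
    chain = [("ollama", _missing), ("claude", lambda p: "disk /var is nearly full")]
    print(first_responding("summarize the audit", chain))
```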
211
+ - **`wtfd` daemon** — PROJECT.md Phase 2 landed. Stdlib-only single-process
212
+ daemon (`pip install wtftools` ships an extra `wtfd` console script).
213
+ Runs `audit` on a configurable cadence and serves the result over HTTP:
214
+ - `GET /` — brief one-liner (host, fail/warn counts, top problems)
215
+ - `GET /healthz` — liveness probe
216
+ - `GET /audit` / `/audit.txt` — current audit in plaintext
217
+ - `GET /audit.json` — full audit + summary + timestamp + error state
218
+ - `GET /audit.prom` — Prometheus textfile-collector
219
+ - `GET /history` — snapshot dir + list of recent basenames
220
+ - `GET /snapshot/N` — Nth-most-recent snapshot (by index or basename prefix)
221
+
222
+ Flags: `--listen HOST:PORT` (default `127.0.0.1:8765`), `--interval SEC`
223
+ (default 300 = 5 min), `--save` to persist each run as a snapshot,
224
+ `--auth-token-file PATH` for `Authorization: Bearer …` protection.
225
+ Every response carries `X-WTF-Host`, `X-WTF-Last-Audit`, `X-WTF-Version`
226
+ headers for trivial observability. Run via `wtf serve …` or the bare
227
+ `wtfd` console script.
228
+
229
+ - **systemd unit** in `scripts/wtfd.service` — `DynamicUser=yes`,
230
+ `StateDirectory=wtftools`, hardened (`ProtectSystem=strict`,
231
+ `NoNewPrivileges`, `ProtectKernel*`). Drop into `/etc/systemd/system/`,
232
+ `systemctl enable --now wtfd`.
233
+
234
+ - **`http-probes` and `tcp-probes` checks** — declare endpoints in
235
+ `[thresholds]` (`http_probes = http://localhost:80, http://localhost:9090`
236
+ and `tcp_probes = 127.0.0.1:5432, db.internal:6379`). Each becomes its own
237
+ audit row. HTTP non-2xx/3xx → FAIL; connect refused/timeout → FAIL; latency
238
+ ≥ `probe_slow_ms` → WARN. Uses stdlib `http.client` + `socket` — no extra
239
+ dependencies. Catches the "service is running but not actually serving"
240
+ failure mode that `failed-units` misses.
241
+ - **`smart` check** — per-disk SMART health via `smartctl -H -j` (requires
242
+ `smartmontools` package, typically also root). One FAILED disk → FAIL with
243
+ device name in detail. Discovers disks via `lsblk`, filters out loop devices
244
+ and partitions.
245
+ - **`wtf diff`** — standalone snapshot diff command. `wtf diff` compares the
246
+ latest snapshot to a fresh audit (same as `wtf audit --diff`).
247
+ `wtf diff --snapshot N` reaches back N snapshots. `wtf diff --against A B`
248
+ diffs two snapshot files directly without running a live audit (useful for
249
+ comparing snapshots shipped from other hosts).
250
+ - **`wtf audit --format plain`** — tab-separated `status<TAB>name<TAB>message`
251
+ rows. No headers, no summary, no colors. Designed for shell pipelines:
252
+ `wtf audit --format plain | awk '$1=="fail"'`.
253
+
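The HTTP probe rules listed above (non-2xx/3xx is FAIL, refused or timed-out connections are FAIL, slow responses are WARN) reduce to a small classifier. The sketch below is an illustration with invented names; it takes `probe_slow_ms` as a required argument because the changelog does not state its default.

```python
from typing import Optional


def classify_http_probe(status_code: Optional[int], latency_ms: float, probe_slow_ms: float) -> str:
    """Map one probe outcome to ok/warn/fail.

    status_code is None when the connection was refused or timed out
    (a convention chosen for this sketch).
    """
    if status_code is None:
        return "fail"                      # connect refused / timeout
    if not 200 <= status_code < 400:
        return "fail"                      # HTTP non-2xx/3xx
    if latency_ms >= probe_slow_ms:
        return "warn"                      # answered, but too slowly
    return "ok"
```

A TCP probe would follow the same shape without the status-code branch.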
254
+ - **`wtf top`** — focused process top with sort and filters.
255
+ `--sort cpu|rss`, `--user PREFIX`, `--name SUBSTRING`, `--limit N`.
256
+ Cuts through the noise of `wtf info`'s 5-row top section when you need
257
+ the bigger picture.
258
+ - **`wtf ports`** — listening sockets with owning PID, user, command.
259
+ Replaces `ss -tlnp` for the common "who's on :443?" question.
260
+ `--proto tcp|udp|all`, `--public-only` (drops 127.x).
261
+ - **`wtf motd-install`** — installs `/etc/update-motd.d/99-wtf-brief` so
262
+ every SSH login shows a one-line wtftools summary. `--path` to override
263
+ destination, requires root.
264
+ - **`hw-temp` check** — reads `/sys/class/hwmon/*/temp*_input`. ≥75°C WARN,
265
+ ≥90°C FAIL (configurable). Reports max + count, lists all sensors in
266
+ `-v` detail. Filters absurd readings (<-50°C or >200°C broken sensors).
267
+ - **`dns` check** — probes well-known hosts via the system resolver.
268
+ Configurable list (`dns_probe_hosts`, default `google.com,cloudflare.com`)
269
+ + 2s per-probe timeout. All resolve → OK. Some fail → WARN. None
270
+ resolve → FAIL (broken DNS / resolved.service down). Catches silently-
271
+ broken `systemd-resolved`.
272
+ - **`wtf audit --format csv`** — CSV output with name,status,message,detail
273
+ columns. For spreadsheet flows / lightweight reporting.
274
+
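For the `hw-temp` thresholds, a minimal sketch follows. It assumes readings arrive as the millidegree-Celsius integers that hwmon `temp*_input` files contain; the function name and the choice to return `skip` when no valid sensor remains are assumptions, not behaviour confirmed by the changelog.

```python
from typing import Iterable, Optional, Tuple


def classify_temps(millidegrees: Iterable[int], warn_c: float = 75.0, fail_c: float = 90.0) -> Tuple[str, Optional[float]]:
    """Return (status, hottest_celsius) for a set of hwmon readings."""
    temps = [m / 1000.0 for m in millidegrees]
    # Drop absurd values that indicate a broken sensor.
    valid = [t for t in temps if -50.0 <= t <= 200.0]
    if not valid:
        return "skip", None  # assumption: nothing usable to report
    hottest = max(valid)
    if hottest >= fail_c:
        return "fail", hottest
    if hottest >= warn_c:
        return "warn", hottest
    return "ok", hottest
```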
275
+ - **Snapshots, history, and diff** — `wtfd-lite` finally exists.
276
+ - `wtf audit --save` persists the current run to `~/.cache/wtftools/snapshots/`
277
+ (or `/var/lib/wtftools/snapshots/` when running as root, or
278
+ `$WTFTOOLS_SNAPSHOT_DIR` if set). Auto-rotates to keep the newest 48.
279
+ - `wtf audit --diff` compares the current audit to the most recent snapshot,
280
+ flagging regressions / recoveries / new / removed checks. Sorted with
281
+ regressions first.
282
+ - `wtf history` lists stored snapshots with status counts.
283
+ - Snapshot file format is plain JSON — easy to ship to a central host.
284
+ - **`docker` check** — surfaces containers in `unhealthy` or `Restarting` state.
285
+ `unhealthy` → FAIL, `restarting`-only → WARN. Skips cleanly when docker is
286
+ not installed or the daemon is unreachable.
287
+ - **NTP drift magnitude** in the `time-sync` check — when `chronyc tracking`
288
+ is available, the reported offset (ms) augments the binary sync/no-sync
289
+ signal. Drift ≥100ms → WARN, ≥1s → FAIL.
290
+ - **`wtf audit --format prometheus`** — Prometheus textfile-collector output.
291
+ Two metrics: `wtf_check_status{name="..."}` (0/1/2/3 for ok/warn/fail/skip)
292
+ and `wtf_summary_total{status="..."}`. Drop into node_exporter's
293
+ `--collector.textfile.directory` for scraping.
294
+ - **`wtf info --watch SECONDS`** — live-refresh the host snapshot (mirror of
295
+ the existing `wtf audit --watch`).
296
+
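The two Prometheus metrics named above can be produced by a renderer along these lines. This is a sketch under the stated mapping (0/1/2/3 for ok/warn/fail/skip); `render_prometheus` and its input shape are hypothetical, and the real output may include additional comment lines that are not reproduced here.

```python
from typing import Iterable, Tuple

STATUS_VALUE = {"ok": 0, "warn": 1, "fail": 2, "skip": 3}


def _escape_label(value: str) -> str:
    """Escape backslash, double quote and newline for a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_prometheus(results: Iterable[Tuple[str, str]]) -> str:
    """Render (check_name, status) pairs as Prometheus text exposition lines."""
    lines = []
    counts = {status: 0 for status in STATUS_VALUE}
    for name, status in results:
        lines.append('wtf_check_status{name="%s"} %d' % (_escape_label(name), STATUS_VALUE[status]))
        counts[status] += 1
    for status, count in counts.items():
        lines.append('wtf_summary_total{status="%s"} %d' % (status, count))
    return "\n".join(lines) + "\n"
```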
297
+ ### Added (earlier in this Unreleased cycle)
298
+ - **`wtf explain`** — turns audit findings into actionable per-check advice.
299
+ A rule-based table maps each `(name, status)` to a 1-2 sentence diagnosis
300
+ and concrete next steps (which command to run, which file to vacuum, etc.).
301
+ Covers every built-in check; unknown checks get a fallback hint.
302
+ - **`wtf explain --prompt`** — emit an LLM-ready prompt summarizing the audit.
303
+ Pipe to `claude`, `ollama run llama3`, or any other LLM for a synthesized
304
+ diagnosis without bundling an LLM dependency. The PROJECT.md headline finally
305
+ has a delivery vehicle.
306
+ - **`wtf audit --alert <cmd>`** — fire a shell command when audit produces
307
+ FAIL (or WARN, with `--alert-on warn`). Audit summary is piped to the
308
+ command's stdin; env vars `WTF_FAIL_COUNT`, `WTF_WARN_COUNT`, `WTF_HOST`
309
+ are set. Cron-driven monitoring without a notification client:
310
+ `wtf audit --alert 'mail -s "wtf $WTF_HOST" sre@example.com'`.
311
+ - **`conntrack` check** — reads `/proc/sys/net/netfilter/nf_conntrack_count`
312
+ vs `nf_conntrack_max`. NAT/firewall/proxy hosts silently drop new
313
+ connections when the table fills; ≥70% WARN, ≥90% FAIL (configurable).
314
+ - **`journal-disk` check** — parses `journalctl --disk-usage`. ≥4GB WARN,
315
+ ≥16GB FAIL (configurable). Includes a vacuum-size hint in the message.
316
+ - pyproject installs the bash-completion file system-wide.
317
+
318
+ ### Added (earlier in this Unreleased cycle)
319
+ - **Parallel check execution** — checks now run on a `ThreadPoolExecutor`
320
+ (default 8 workers, configurable via `config.ini` `parallel_workers` or env).
321
+ Typical full audit dropped from ~2.3s to ~1.2s on a 24-core dev machine; one
322
+ hung check no longer blocks the rest. Use `wtf audit --serial` to force the
323
+ old sequential path for debugging.
324
+ - **Per-check timeout** — every check gets a default 10s budget. A check that
325
+ exceeds it surfaces a `[SKIP]` result with a clear "timeout" message instead
326
+ of hanging the whole audit. Tune via `config.ini` `check_timeout` or
327
+ `wtf audit --check-timeout SECONDS`.
328
+ - **`psi` check** — reads `/proc/pressure/{cpu,memory,io}` (Linux ≥4.20). The
329
+ modern kernel signal for real resource contention. Thresholds on PSI `some
330
+ avg10`: ≥10% WARN, ≥30% FAIL (configurable). Three result rows: one per
331
+ resource. Auto-skipped when `psi=0` boot cmdline is set.
332
+ - **`kernel-taint` check** — reads `/proc/sys/kernel/tainted`. Non-zero means
333
+ the kernel saw a proprietary/forced/unsigned module, a machine check, a
334
+ soft-lockup, etc. Decodes the bitmask into readable flag names; severe bits
335
+ (`MACHINE_CHECK`, `SOFTLOCKUP`, `DIE`, `BAD_PAGE`) escalate to FAIL.
336
+ - **`cert-expiry` check** — walks server-cert dirs (`/etc/letsencrypt/live`,
337
+ `/etc/nginx/ssl`, `/etc/haproxy/certs`, …), parses `notAfter` via openssl.
338
+ ≥30d OK, <30d WARN, <7d FAIL. Bounded to 50 files. Avoids the system CA
339
+ store (`/etc/ssl/certs`) which legitimately ships long-expired root CAs.
340
+ - **`wtf logs`** — recent ERROR+ journal entries grouped by service. Flags:
341
+ `--since '1 hour ago'`, `--priority err`, `--units N`, `--lines N`,
342
+ `--format json`. Natural complement to `wtf services <name>`.
343
+
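The `psi` check's input is a small text file per resource. The sketch below parses the `some avg10` value from text in the `/proc/pressure/*` format and applies the 10% / 30% thresholds quoted above. Function names are invented for the example, and the text is passed in as a string so the sketch does not depend on a Linux host.

```python
from typing import Optional


def parse_some_avg10(pressure_text: str) -> Optional[float]:
    """Extract avg10 from the 'some' line of a /proc/pressure/* file."""
    for line in pressure_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "some":
            for field in fields[1:]:
                key, _, value = field.partition("=")
                if key == "avg10":
                    return float(value)
    return None  # no 'some' line found


def classify_psi(avg10: Optional[float], warn: float = 10.0, fail: float = 30.0) -> str:
    if avg10 is None:
        return "skip"
    if avg10 >= fail:
        return "fail"
    if avg10 >= warn:
        return "warn"
    return "ok"
```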
344
+ - **`wtf services <name>`** — focused drilldown for one systemd unit: shows
345
+ ActiveState, SubState, Result, UnitFileState, MainPID, NRestarts,
346
+ MemoryCurrent, listening ports owned by the main pid, plus the last N journal
347
+ lines. Replaces the SSH dance of `systemctl status … && journalctl -u … && ss -tlnp`.
348
+ - **Config file** — INI at `/etc/wtftools/config.ini`, `/etc/wtf/config.ini`,
349
+ or `~/.config/wtftools/config.ini`. Customizable thresholds for disk, memory,
350
+ swap, load, iowait, fds, pids, tcp-retrans, auth, service restarts, plus
351
+ `[ignore]` lists. Global `--config PATH` stacks a further file on top.
352
+ - **`wtf config`** — print effective values + search paths. `wtf config --example`
353
+ prints a fully-commented template ready for `> /etc/wtftools/config.ini`.
354
+ - **`wtf audit --ignore NAME`** — skip a check by short-name OR by result-name
355
+ (e.g. `--ignore "disk /mnt/Backup"` to hush a single noisy mount). Repeatable.
356
+ - **`tcp-retrans`** check — samples `/proc/net/snmp` TCP RetransSegs/OutSegs
357
+ over a 1-second window; ≥1% WARN, ≥5% FAIL (configurable).
358
+
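The `tcp-retrans` ratio is a delta between two samples of the `Tcp:` counters. A rough sketch, assuming the usual `/proc/net/snmp` layout of a header line followed by a value line; the function names are hypothetical and the sampling sleep is left to the caller.

```python
from typing import Tuple


def parse_tcp_counters(snmp_text: str) -> Tuple[int, int]:
    """Return (RetransSegs, OutSegs) from /proc/net/snmp-style text."""
    tcp_lines = [line.split() for line in snmp_text.splitlines() if line.startswith("Tcp:")]
    header, values = tcp_lines[0], tcp_lines[1]
    table = dict(zip(header[1:], values[1:]))  # look fields up by name, not position
    return int(table["RetransSegs"]), int(table["OutSegs"])


def retrans_percent(before: Tuple[int, int], after: Tuple[int, int]) -> float:
    """Retransmitted segments as a percentage of segments sent in the window."""
    d_retrans = after[0] - before[0]
    d_out = after[1] - before[1]
    if d_out <= 0:
        return 0.0  # idle window: nothing sent, nothing to report
    return 100.0 * d_retrans / d_out


def classify_retrans(percent: float, warn: float = 1.0, fail: float = 5.0) -> str:
    if percent >= fail:
        return "fail"
    return "warn" if percent >= warn else "ok"
```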
359
+ ### Changed
360
+ - All audit thresholds now read from the active config (no more hardcoded
361
+ 85/95/30/70). Defaults match prior behavior exactly.
362
+ - `run_audit()` accepts `ignore=` and merges it with the config's
363
+ `[ignore]` lists.
364
+
365
+ - **Plugin system**: drop executable scripts into `/etc/wtf/checks.d/`,
366
+ `/etc/wtftools/checks.d/`, or `~/.config/wtftools/checks.d/`. Exit codes
367
+ `0=ok / 1=warn / 2=fail / 77=skip`; stdout becomes the message. A plugin
368
+ may also emit a one-line JSON object `{"status":..., "message":...,
369
+ "detail":[...]}` for full control. Plugins show up in `wtf audit` and
370
+ `wtf audit --list-checks` under the `plugin:<name>` namespace.
371
+ - `wtf plugins` — list discovery dirs and registered plugins.
372
+ - `restart-loops` audit check — flags active services where systemd has had
373
+ to bring them back ≥3 times (`NRestarts`). ≥10 → FAIL (the "flaky daemon"
374
+ case where the service technically "runs" but isn't healthy).
375
+ - `network-errors` audit check — reads `/sys/class/net/*/statistics/` and
376
+ surfaces interfaces with non-zero rx/tx errors or drops (≥1000 → WARN).
377
+ - `wtf audit --brief` / `-b` — one-line summary suitable for MOTD / SSH
378
+ banners: `wtf: 1 fail, 3 warn — swap: 99% · …`. Exit code mirrors severity.
379
+ - Example plugin in `scripts/example-plugin-check-tmp.sh` (warns when /tmp
380
+ usage crosses 80% / 95%).
381
+ - `wtf doctor` — self-diagnostic that probes which CLI tools (`systemctl`,
382
+ `journalctl`, `apt`, `timedatectl`, …) and `/proc` files are available.
383
+ Explains why checks may be skipped on this host.
384
+ - `wtf audit --check NAME` — run a single named check (repeatable). For CI
385
+ and scripted use (e.g. `wtf audit --check disks --check memory --format json`).
386
+ - `wtf audit --list-checks` — print the short names of every registered check.
387
+ - `wtf audit --only fail|warn|problem|skip|ok|all` — filter output by status.
388
+ Useful on a terminal: `wtf audit --only problem` shows just what's broken.
389
+ - `wtf audit --since HOURS` — configurable look-back window for OOM, kernel
390
+ errors and failed-auth checks (was hardcoded to 24h).
391
+ - `wtf audit --watch SECONDS` — live mode that re-runs the audit and re-prints
392
+ every N seconds (Ctrl-C to exit).
393
+ - Bash completion in `scripts/wtf.bash-completion`.
394
+ - GitHub Actions CI workflow running tests + coverage on Python 3.10–3.12.
395
+
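The plugin contract described in this section (exit codes `0/1/2/77`, stdout as the message, or a one-line JSON object) can be interpreted as sketched below. The changelog does not say which wins when both are present, nor what an unknown exit code means; this sketch assumes JSON takes precedence and that unknown codes count as `fail`. `interpret_plugin` is a name made up for the illustration.

```python
import json
from typing import Any, Dict

EXIT_STATUS = {0: "ok", 1: "warn", 2: "fail", 77: "skip"}


def interpret_plugin(returncode: int, stdout: str) -> Dict[str, Any]:
    """Turn a plugin's exit code and stdout into a result dict."""
    text = stdout.strip()
    if text.startswith("{"):
        try:
            obj = json.loads(text)
        except ValueError:
            obj = None
        if isinstance(obj, dict) and "status" in obj:
            # Assumption: a valid JSON object overrides the exit code.
            return {
                "status": obj["status"],
                "message": obj.get("message", ""),
                "detail": [str(item) for item in obj.get("detail") or []],
            }
    # Assumption: exit codes outside the documented set are treated as fail.
    return {"status": EXIT_STATUS.get(returncode, "fail"), "message": text, "detail": []}
```

Coercing detail items to strings mirrors the note about the SDK elsewhere in this changelog.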
396
+ ### Changed
397
+ - Audit registry now keys checks by stable short names so `--check` / `--list-checks`
398
+ expose a documented, scriptable surface.
wtftools-0.0.0/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Aleksandr Pimenov
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
wtftools-0.0.0/MANIFEST.in ADDED
@@ -0,0 +1,5 @@
1
+ include README.md
2
+ include LICENSE
3
+ include CHANGELOG.md
4
+ recursive-include wtftools *.py
5
+ recursive-include scripts *.sh *.bash-completion
wtftools-0.0.0/PKG-INFO ADDED
@@ -0,0 +1,246 @@
1
+ Metadata-Version: 2.4
2
+ Name: wtftools
3
+ Version: 0.0.0
4
+ Summary: One command to see what is going on with your Linux server right now.
5
+ Author-email: Aleksandr Pimenov <wachawo@gmail.com>
6
+ Maintainer-email: Aleksandr Pimenov <wachawo@gmail.com>
7
+ License: MIT License
8
+
9
+ Copyright (c) 2026 Aleksandr Pimenov
10
+
11
+ Permission is hereby granted, free of charge, to any person obtaining a copy
12
+ of this software and associated documentation files (the "Software"), to deal
13
+ in the Software without restriction, including without limitation the rights
14
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
15
+ copies of the Software, and to permit persons to whom the Software is
16
+ furnished to do so, subject to the following conditions:
17
+
18
+ The above copyright notice and this permission notice shall be included in all
19
+ copies or substantial portions of the Software.
20
+
21
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
22
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
23
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
24
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
25
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
26
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
27
+ SOFTWARE.
28
+
29
+ Project-URL: Homepage, https://github.com/wachawo/wtftools
30
+ Project-URL: Repository, https://github.com/wachawo/wtftools.git
31
+ Project-URL: Documentation, https://github.com/wachawo/wtftools#readme
32
+ Project-URL: Bug Reports, https://github.com/wachawo/wtftools/issues
33
+ Keywords: devops,sre,linux,diagnostics,monitoring,cron,system,audit,cli
34
+ Classifier: Development Status :: 4 - Beta
35
+ Classifier: Intended Audience :: System Administrators
36
+ Classifier: Intended Audience :: Developers
37
+ Classifier: License :: OSI Approved :: MIT License
38
+ Classifier: Programming Language :: Python :: 3
39
+ Classifier: Programming Language :: Python :: 3.8
40
+ Classifier: Programming Language :: Python :: 3.9
41
+ Classifier: Programming Language :: Python :: 3.10
42
+ Classifier: Programming Language :: Python :: 3.11
43
+ Classifier: Programming Language :: Python :: 3.12
44
+ Classifier: Operating System :: POSIX :: Linux
45
+ Classifier: Topic :: System :: Systems Administration
46
+ Classifier: Topic :: System :: Monitoring
47
+ Classifier: Topic :: Utilities
48
+ Requires-Python: >=3.8
49
+ Description-Content-Type: text/markdown
50
+ License-File: LICENSE
51
+ Provides-Extra: full
52
+ Requires-Dist: psutil>=5.9.0; extra == "full"
53
+ Provides-Extra: dev
54
+ Requires-Dist: pytest>=7.0.0; extra == "dev"
55
+ Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
56
+ Requires-Dist: coverage>=7.0.0; extra == "dev"
57
+ Requires-Dist: ruff>=0.4.0; extra == "dev"
58
+ Requires-Dist: build>=1.0.0; extra == "dev"
59
+ Requires-Dist: stdeb>=0.10.0; extra == "dev"
60
+ Requires-Dist: pre-commit>=3.0.0; extra == "dev"
61
+ Dynamic: license-file
62
+
63
+ # wtftools
64
+
65
+ > One command to see what is going on with your Linux server right now.
66
+
67
+ **Status:** v0.0.0 — initial public release. 14 subcommands, 38 built-in
68
+ checks, snapshot/diff/history, LLM-driven explain. One-shot CLI; no daemon,
69
+ no fleet aggregator, no plugin extension API.
70
+
71
+ > **In a hurry?** See [docs/QUICKSTART.md](docs/QUICKSTART.md) for the 5-minute version.
72
+
73
+ ```
74
+ $ wtf
75
+ ─────────── AUDIT ────────────
76
+ [ OK ] uptime 3d 4h 12m
77
+ [ OK ] load average 0.42 0.51 0.55 / 8 CPU
78
+ [ OK ] memory 4.1GB / 16.0GB used (25%)
79
+ [WARN] disk /var 17.0GB / 20.0GB used (85%)
80
+ [ OK ] zombie processes 0 zombies
81
+ [FAIL] failed systemd units 1 failed unit(s)
82
+ [ OK ] crontab syntax 14 cron line(s), no errors
83
+
84
+ Summary: 12 ok · 1 warn · 1 fail · 2 skip
85
+ ```
86
+
87
+ ## Subcommands
88
+
89
+ | command | what it does |
90
+ |---------------------|-------------------------------------------------------------|
91
+ | `wtf` / `wtf audit` | green/yellow/red checklist: what is OK and what is not |
92
+ | `wtf problems` | alias for `audit --only problems` — show WARN+FAIL only |
93
+ | `wtf explain` | per-check actionable advice; `--llm` to pipe to LLM |
94
+ | `wtf info` | one-page snapshot: host, uptime, load, mem, disks, top, net |
95
+ | `wtf top` | focused process top: sort by cpu/rss, filter user/name |
96
+ | `wtf ports` | listening sockets with owning PID/user/command |
97
+ | `wtf services NAME` | drilldown one service: state, restarts, mem, ports, journal |
98
+ | `wtf logs` | recent ERROR+ journal entries grouped by service |
99
+ | `wtf events` | chronological timeline: reboots, OOM, failed units, … |
100
+ | `wtf history` | list saved audit snapshots (`wtf audit --save` to create) |
101
+ | `wtf diff` | compare current state to a saved snapshot |
102
+ | `wtf crontab` | validate all standard crontab locations + per-user crontabs |
103
+ | `wtf doctor` | self-diagnostic: which tools wtftools can actually use |
104
+ | `wtf config` | show effective config / print example |
105
+
106
+ `wtftools` absorbs and supersedes [`checkcrontab`](https://github.com/wachawo/checkcrontab) — the same cron validator now lives at `wtf crontab`.
107
+
108
+ ## Install
109
+
110
+ ### From PyPI
111
+
112
+ ```bash
113
+ pip install wtftools # core, stdlib-only
114
+ pip install 'wtftools[full]' # + psutil for richer metrics (quoted so shells like zsh do not glob the brackets)
115
+ ```
116
+
117
+ After install the short command `wtf` (and the long alias `wtftools`) is on `$PATH`.
118
+
119
+ ### From apt (Debian/Ubuntu)
120
+
121
+ ```bash
122
+ sudo apt install python3-psutil
123
+ sudo dpkg -i wtftools_0.0.0-1_all.deb
124
+ ```
125
+
126
+ A `.deb` is built from the same source via `scripts/build-deb.sh` (uses `stdeb`).
127
+
128
+ ### From source
129
+
130
+ ```bash
131
+ git clone https://github.com/wachawo/wtftools
132
+ cd wtftools
133
+ pip install -e .
134
+ # or test without installing:
135
+ python3 wtf.py audit
136
+ ```
137
+
138
+ ## Usage
139
+
140
+ ```bash
141
+ wtf # short audit summary (default)
142
+ wtf problems # only WARN+FAIL rows
143
+ wtf info # detailed system snapshot
144
+ wtf info --format json # machine-readable
145
+
146
+ wtf audit # full audit with [OK]/[WARN]/[FAIL] markers
147
+ wtf audit -v # show extra detail (failed units, OOM events)
148
+ wtf audit --strict # exit 1 on warnings (CI-friendly)
149
+ wtf audit --format json # JSON output for pipelines
150
+ wtf audit --check memory --check disks # run named checks only
151
+ wtf audit --list-checks # show all available check short-names
152
+ wtf audit --since 1 # look-back window in hours for OOM/auth/kernel (default 24h)
153
+ wtf audit --brief # one-line summary for MOTD / SSH banners
154
+ wtf audit --ignore swap --ignore "disk /mnt/Backup" # silence specific checks
155
+ wtf audit --format csv > audit.csv # spreadsheet-friendly
156
+ wtf audit --format plain | awk '$1=="fail"' # shell-pipeline-friendly
157
+ wtf audit --format html -o report.html # self-contained HTML for tickets
158
+
159
+ wtf audit --save # save snapshot to ~/.cache/wtftools/
160
+ wtf diff # what changed vs last snapshot
161
+ wtf diff --snapshot 5 # vs 5 snapshots ago
162
+ wtf history # list saved snapshots
163
+
164
+ wtf explain # per-check actionable advice
165
+ wtf explain --prompt | ollama run llama3 # pipe to local LLM
166
+ wtf explain --llm ollama # built-in: call ollama directly
167
+ wtf explain --llm claude # anthropic SDK + ANTHROPIC_API_KEY
168
+ wtf explain --llm auto # try ollama → claude → openai
169
+
170
+ wtf audit --alert 'mail -s "wtf $WTF_HOST" sre@example.com'
171
+ wtf audit --alert-on warn --alert 'curl -X POST "$SLACK_WEBHOOK" -d @-'
172
+
173
+ wtf top # top processes
174
+ wtf top --sort rss --user www-data --limit 5 # top RAM consumers for one user
175
+ wtf ports # listening TCP + owning process
176
+
177
+ wtf services nginx # state + restarts + ports + last 20 journal lines
178
+ wtf logs # last hour, ERROR+
179
+ wtf events --since 48 # 48-hour incident timeline
180
+ wtf events --kind oom --kind failed-unit # filter to specific kinds
181
+
182
+ wtf doctor # show which CLI tools wtf can use on this host
183
+ wtf doctor --check-updates # also query PyPI for a newer version
184
+ ```
185
+
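The `awk '$1=="fail"'` one-liner in the usage block suggests that `--format plain` puts a lower-case status in the first whitespace-separated field. Under that assumption, a minimal Python sketch of the same filter could look like this. The function name `rows_with_status` and the sample rows are hypothetical, and the real plain format may differ:

```python
def rows_with_status(lines, wanted=("warn", "fail")):
    """Yield lines whose first whitespace-separated field is in `wanted`.

    Assumes the status is the first field, as the awk example implies.
    """
    for line in lines:
        fields = line.split()
        if fields and fields[0].lower() in wanted:
            yield line.rstrip("\n")


# Hypothetical rows, made up for illustration only.
sample = [
    "ok uptime 3d 4h 12m\n",
    "warn disk /var 85%\n",
    "fail failed systemd units 1\n",
    "\n",
]
problems = list(rows_with_status(sample))
```

To use it in a pipeline, feed it `sys.stdin` instead of the sample list.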
186
+ ## Exit codes
187
+
188
+ | code | meaning |
189
+ |------|--------------------------------------------------|
190
+ | 0 | everything OK (`audit`) / no errors (`crontab`) |
191
+ | 1 | warnings with `--strict`, or crontab errors |
192
+ | 2 | audit found a `[FAIL]` |
193
+ | 130 | interrupted (Ctrl-C) |
194
+
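A wrapper script or CI job can branch on these codes. The sketch below copies the meanings from the table above; the helper names `describe_exit` and `run_audit` are hypothetical, and `run_audit` assumes `wtf` is on `$PATH`:

```python
import subprocess

# Labels taken from the exit-code table; anything else is reported as unexpected.
EXIT_MEANINGS = {
    0: "ok",
    1: "warnings (--strict) or crontab errors",
    2: "audit found a FAIL",
    130: "interrupted",
}


def describe_exit(code):
    """Map a wtf exit code to a short label."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_audit(argv=("wtf", "audit", "--strict")):
    """Run the audit (assumes `wtf` is installed) and label its exit code."""
    completed = subprocess.run(list(argv), check=False)
    return describe_exit(completed.returncode)
```

`check=False` keeps `subprocess.run` from raising on a non-zero exit, so the caller decides what a warning or failure means.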
195
+ ## Built-in checks
196
+
197
+ uptime · system state · load average · CPU iowait · PSI cpu/memory/io ·
198
+ TCP retransmits · memory · swap · disk (per mount) · inodes ·
199
+ read-only mounts · failed systemd units · enabled-but-down services ·
200
+ restart loops · network errors · conntrack · journal disk usage · zombies ·
201
+ D-state processes · OOM kills · kernel errors · kernel taint · cert expiry ·
202
+ open file descriptors · process count · failed auth · time sync ·
203
+ pending updates · reboot required · cron daemon · crontab syntax · docker ·
204
+ hw temperatures · disk SMART · DNS · HTTP/TCP probes · fail2ban.
205
+
206
+ Run `wtf audit --list-checks` for the full list of short names usable with
207
+ `--check` and `--ignore`.
208
+
209
+ ## Config
210
+
211
+ Drop an INI file at any of these paths:
212
+
213
+ - `/etc/wtftools/config.ini`
214
+ - `/etc/wtf/config.ini`
215
+ - `~/.config/wtftools/config.ini`
216
+
217
+ Or stack one ad-hoc via `wtf --config /path/to.ini …`. Run `wtf config --example`
218
+ for a fully commented template. The main settings:
219
+
220
+ ```ini
221
+ [thresholds]
222
+ disk_warn = 85
223
+ disk_fail = 95
224
+ swap_warn = 50
225
+ swap_fail = 90
226
+ tcp_retrans_warn = 1.0
227
+ tcp_retrans_fail = 5.0
228
+
229
+ [ignore]
230
+ checks = swap, updates
231
+ result_names =
232
+     disk /mnt/Backup
233
+     disk /mnt/Video
234
+ ```
235
+
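To show how a file of this shape reads, here is a minimal sketch using Python's standard `configparser`. It is not the parser wtftools itself uses. The helper names, the `>=` comparison, and the indentation of the multi-line `result_names` value (which `configparser` requires for continuation lines) are assumptions:

```python
import configparser

SAMPLE = """
[thresholds]
disk_warn = 85
disk_fail = 95

[ignore]
checks = swap, updates
result_names =
    disk /mnt/Backup
    disk /mnt/Video
"""


def load_settings(text):
    """Return (thresholds, ignored checks, ignored result names)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    thresholds = {key: float(value) for key, value in parser["thresholds"].items()}
    ignore = parser["ignore"] if parser.has_section("ignore") else {}
    # `checks` is comma-separated; `result_names` holds one entry per line.
    checks = [c.strip() for c in ignore.get("checks", "").split(",") if c.strip()]
    names = [n.strip() for n in ignore.get("result_names", "").splitlines() if n.strip()]
    return thresholds, checks, names


def classify_disk(used_percent, thresholds):
    """Assumed rule: a value at or above a threshold triggers that level."""
    if used_percent >= thresholds["disk_fail"]:
        return "fail"
    if used_percent >= thresholds["disk_warn"]:
        return "warn"
    return "ok"


thresholds, checks, names = load_settings(SAMPLE)
```

The `>=` rule is consistent with the sample audit output, where a disk at 85% is reported as `[WARN]`, but the exact comparison wtftools applies is not stated here.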
236
+ ## Compatibility
237
+
238
+ - Python 3.8+
239
+ - Linux (any systemd-based distribution is the happy path; the tool degrades
240
+ gracefully when `systemctl` / `journalctl` are missing)
241
+ - No network access required for the core CLI
242
+ - Optional network: `wtf explain --llm claude/openai`, `wtf doctor --check-updates`
243
+
244
+ ## License
245
+
246
+ MIT