@kinetica/admin-agent 0.1.2 → 0.2.0
- package/README.md +83 -24
- package/dist/admin-agent.js +1838 -284
- package/knowledge/references/bundle/support-bundle.md +40 -0
- package/knowledge/references/version-quirks-7.2.md +7 -2
- package/package.json +1 -1
@@ -0,0 +1,40 @@
+---
+title: Support Bundle Layout & Parsing
+category: bundle
+keywords: [bundle, sysinfo, gpudb_sysinfo, logs, rank, gpudb.conf, host-diagnostics, offline, loki]
+---
+
+### Log line format
+
+Core rank logs (`core-gpudb-rolling-r0.log`) look like:
+
+```
+2026-06-11 15:18:52.786 FATAL (55114,55114,r0/gpudb_cluster_i) node2 Job.cpp:9 - Segmentation fault, signal: 11
+```
+
+That is: `timestamp severity (pid,tid,rank/ctx) host source:line - message`. Severities seen: INFO, WARN, UERR (user error), ERROR, FATAL. Component logs (sql-engine, reveal, tomcat) use a similar prefix without the `source:line` field.
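The field layout described above can be sketched as a small parser. This is a minimal illustration, not the package's own parsing code: the regular expression is inferred from the single sample line, and the names `LINE_RE`, `LogLine`, and `parse_log_line` are hypothetical. It also assumes that component logs keep the ` - ` separator when the `source:line` field is absent.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the one sample line and the field list in the text.
# Real logs may contain variations that this sketch does not cover.
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<severity>[A-Z]+) "
    r"\((?P<pid>\d+),(?P<tid>\d+),(?P<rank>[^/)]+)/(?P<ctx>[^)]*)\) "
    r"(?P<host>\S+) "
    r"(?:(?P<source>\S+):(?P<line>\d+) )?"  # optional: assumed absent in component logs
    r"- (?P<message>.*)$"
)


class LogLine(NamedTuple):
    timestamp: str
    severity: str
    pid: int
    tid: int
    rank: str
    ctx: str
    host: str
    source: Optional[str]
    line: Optional[int]
    message: str


def parse_log_line(text: str) -> Optional[LogLine]:
    """Return the parsed fields, or None if the line does not fit the layout."""
    m = LINE_RE.match(text.rstrip("\n"))
    if m is None:
        return None
    g = m.groupdict()
    return LogLine(
        timestamp=g["timestamp"],
        severity=g["severity"],
        pid=int(g["pid"]),
        tid=int(g["tid"]),
        rank=g["rank"],
        ctx=g["ctx"],
        host=g["host"],
        source=g["source"],
        line=int(g["line"]) if g["line"] is not None else None,
        message=g["message"],
    )
```

Lines that do not match (for example, continuation lines of a stack trace) return `None`, so a caller can decide whether to attach them to the previous record or skip them.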
+
+Severity order for filtering is `WARN < UERR < ERROR < FATAL`, so `min_severity=ERROR` EXCLUDES UERR (user-error) lines — use `WARN` or `UERR` to include them.
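The ordering rule above can be illustrated with a small threshold check. This is a sketch under stated assumptions: `SEVERITY_RANK` and `passes_min_severity` are hypothetical names, and placing INFO below WARN is an assumption, since the stated order starts at WARN.

```python
# Order taken from the text: WARN < UERR < ERROR < FATAL.
# INFO is placed at the bottom as an assumption; the text does not rank it.
SEVERITY_RANK = {"INFO": 0, "WARN": 1, "UERR": 2, "ERROR": 3, "FATAL": 4}


def passes_min_severity(severity: str, min_severity: str) -> bool:
    """True if a line with `severity` survives a `min_severity` filter."""
    return SEVERITY_RANK[severity] >= SEVERITY_RANK[min_severity]
```

With this ordering, `passes_min_severity("UERR", "ERROR")` is false while `passes_min_severity("UERR", "WARN")` is true, which is the behavior the note warns about.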
+
+### How to read logs efficiently
+
+- The logs are large (a rank log can exceed 100k lines). NEVER ask for a whole file. Use `kinetica_bundle_log_timeline` to localize, then `kinetica_bundle_search_logs` with a tight time window + severity to extract only relevant lines. The match cap is shared across files — if you see "capped", narrow the query rather than asking for more.
+- You can pass a timeline bucket label straight into `from_ts`/`to_ts` (e.g. `2026-06-11 15` searches that whole hour) — partial timestamps are widened to cover the full period.
+- Timestamps are plain local strings without a timezone; compare them lexically and treat cross-rank timing cautiously.
+- **Ranks vs. the host manager:** `rank` selects a numeric rank (`r0`, `r1`, …) only. The host manager (`core-gpudb-rolling-hm.log`) is a singleton service, NOT a rank — search or timeline it with `host_manager: true`, never `rank: "hm"`. By default both `log_timeline` and `search_logs` already cover the host manager along with the numeric ranks; `kinetica_bundle_list_files` lists it under `services_present`.
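One way to picture the partial-timestamp widening and the lexical comparison mentioned in the list above is the sketch below. It is not the tool's actual implementation: `widen` and `in_window` are hypothetical helpers, and the sketch assumes a partial timestamp always ends on a field boundary (year, month, day, hour, minute, second, or millisecond).

```python
from typing import Tuple

# Templates used to pad a partial timestamp to full length.
_LOW = "0000-01-01 00:00:00.000"
_HIGH = "9999-12-31 23:59:59.999"

# Lengths at which a timestamp prefix ends on a whole field.
_FIELD_LENGTHS = {4, 7, 10, 13, 16, 19, 23}


def widen(partial: str) -> Tuple[str, str]:
    """Expand a partial timestamp into (earliest, latest) full-length strings."""
    p = partial.strip()
    if len(p) not in _FIELD_LENGTHS:
        raise ValueError(f"timestamp does not end on a field boundary: {p!r}")
    return p + _LOW[len(p):], p + _HIGH[len(p):]


def in_window(ts: str, from_ts: str, to_ts: str) -> bool:
    """Lexical containment test; valid because the format sorts chronologically."""
    low, _ = widen(from_ts)
    _, high = widen(to_ts)
    return low <= ts <= high
```

The upper bound for a month-level prefix uses day 31 regardless of the month. That is not a real calendar date for every month, but it still works as a lexical upper bound. Because the strings carry no timezone, the comparison is only meaningful between lines whose hosts share a clock setting.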
+
+### Files of interest
+
+`kinetica_bundle_list_files` annotates every file with a `description` of what it contains, so consult that first. The canonical OS-diagnostic / host files (each is `EXEC_CMD`-wrapped — read with `kinetica_bundle_read_sysinfo`):
+
+- **Kinetica:** `gpudb.txt` (version/build, binary md5+ldd, captured config), `gpudb-exe-r{N}-*.txt` (per-rank process: command line, PID, environment — memory limits, LD_PRELOAD/jemalloc), `gpudb-exe.txt` (all gpudb processes), `loki-info.txt` (Loki log-index stats), `tables.txt` (schemas, when collected).
+- **Memory / CPU / GPU:** `mem.txt` (free + /proc/meminfo + transparent hugepage), `cpu.txt` (lscpu, NUMA, interrupts), `gpu.txt` (nvidia-smi -L/-q, modinfo nvidia).
+- **Disk / storage:** `disk.txt` (df, mount, lsblk, fdisk, /proc/diskstats), `lsof.txt` (open files + sockets), `lslocks.txt` (file locks).
+- **Network:** `net.txt` (ifconfig, netstat, resolv.conf).
+- **Kernel / OS:** `dmesg.txt` (kernel ring buffer — OOM killer, segfaults, hardware errors), `sys.txt` (uname, uptime, ulimits, kernel cmdline, clocksource, lsmod), `sysctl.txt` (kernel tunables).
+- **Hardware / firmware:** `dmidecode.txt` (BIOS/DMI), `lshw.txt` (hardware listing), `pci.txt` (lspci, I/O resources).
+- **Processes:** `ps.txt` (full process list).
+- **Packages / accounts:** `deb.txt` / `rpm.txt` (installed packages), `user.txt` (users/groups, gpudb account), `ld.so.conf.txt`, `etc_*.txt` (system shell/host config).
+- **Evidence Gaps:** `errors.txt` / `proc-logs-erros.txt` — collection commands that FAILED. `logfiles.txt` — manifest of log dirs the collector enumerated.
+
+Rolling core logs under `logs-local/` are the primary source. The small last-2h Loki tails under `logs/` are searched only when no rolling core logs were collected. Each `*.txt` artifact records the exact shell command that produced it in its `EXEC_CMD:` header, so `kinetica_bundle_read_sysinfo` always shows you precisely what ran.
@@ -81,8 +81,13 @@ The schema uses these names (not the "obvious" SQL-standard names):
 sizes. For a real table listing with sizes, query
 `ki_catalog.ki_objects` via SQL instead.
 - **`/admin/show/logs`** is not implemented on 7.2.x — returns 404
-"Unknown URI".
-`ki_catalog.
+"Unknown URI". There is no system log table to query either (7.2.x has
+no `ki_catalog` logs table). For query-level diagnostics use
+`ki_catalog.ki_query_history` (slow/failed queries with error_message)
+or `ki_catalog.ki_query_span_metrics_all` (operation-level events) via
+`kinetica_execute_sql`. For raw application/rank logs, analyze an
+offline support bundle (`--bundle`) — the live system exposes no log
+endpoint.
 
 ## Default Resource Groups
 