@kinetica/admin-agent 0.2.1 → 0.2.3
- package/README.md +12 -10
- package/dist/admin-agent.js +584 -301
- package/knowledge/references/bundle/support-bundle.md +18 -4
- package/package.json +1 -1
package/README.md
CHANGED
```diff
@@ -48,7 +48,7 @@ Built with the [Claude Agent SDK](https://docs.anthropic.com/en/docs/agents-and-
 
 - Autonomous multi-round investigation with parallel tool calls
 - 16 read-only diagnostic tools + 4 mutation tools with interactive approval + 2 self-managing tools (reporting, batch-column alter) = **22 live tools**, plus 6 offline bundle-analysis tools = **28 total**
-- **Offline support-bundle analysis** — diagnose from an extracted `gpudb_sysinfo` bundle (per-rank logs, `gpudb.conf`, host diagnostics) with no live connection, or attach a bundle alongside a live session to cross-check captured history against current state
+- **Offline support-bundle analysis** — diagnose from an extracted `gpudb_sysinfo` bundle (per-rank logs, `gpudb.conf`, host diagnostics) with no live connection, or attach a bundle alongside a live session to cross-check captured history against current state — even bundles that don't match the standard layout, via file-name and content inference
 - Expert knowledge via pluggable playbooks (no code required to add new ones)
 - Schema-aware SQL — discovers actual column names at startup, never guesses
 - HTTPS-first URL resolution with explicit consent required before any HTTP fallback
```
```diff
@@ -243,6 +243,8 @@ A bundle and a live connection are **composable capabilities, not exclusive mode
 
 **Every rank, however its logs were captured.** A bundle can carry per-rank logs in two forms: full rolling logs for the ranks on the collector's own host (`logs-local/`, including rotated history like `….log.1`), and centralized Loki/promtail exports for the entire cluster (`logs/rank0.log` … `rankN.log`, plus `hostmanager.log` and per-component tails). The agent reads both transparently — it identifies each rank from either source, prefers the richer rolling log when a rank has both, and falls back to the centralized export for ranks that live on other hosts. So on a multi-node cluster you can investigate **all** ranks (and the host manager), not just the ones local to where the bundle was collected. The centralized exports are JSON-wrapped on disk; the tools unwrap them automatically, so severity filters and timelines behave identically across both formats. `kinetica_bundle_list_files` reports the true rank count under `ranks_present` — trust it rather than guessing from `logs-local/`.
 
+**Bundles that don't match the expected shape.** Not every bundle is a clean `gpudb_sysinfo` capture — a customer may hand over a flat logs-only dump, a differently-named collector's output, or a partial directory. The agent infers each file's type from its name, and for files whose names give nothing away it sniffs a bounded slice of their content against the same log/config/sysinfo parsers. So a rolling log shipped without the canonical `core-` prefix, or a host-manager `.out` capture, is still recognized, searchable, and rank-attributed rather than silently dropped. `kinetica_bundle_list_files` reports a `layout_match` verdict (`canonical` / `partial` / `unfamiliar`), a per-file confidence (`exact` / `inferred` / `weak`), and any files it couldn't place — and the operator gets a startup warning when a bundle is off-shape — so an inference is never passed off as certainty. Classification depends only on file names and contents, never on what the bundle directory itself is named.
+
 Anthropic authentication still runs in bundle mode; only the interactive Kinetica credential collection is skipped (there may be no live DB to connect to). See [Offline Bundle Analysis](#offline-bundle-analysis-read-only) for the tools, and [CLAUDE.md](CLAUDE.md) for the parser/architecture details.
 
 ## CLI Flags
```
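
The "every rank" paragraph above describes a source-preference rule: when a rank has both a rolling log and a centralized Loki export, the rolling log wins, and a rank that only appears in the export falls back to it. The snippet below is a minimal sketch of that rule, assuming file entries shaped as `{ relPath, kind, rank }`. The `kind` values `"core-log"` and `"loki-tail"` mirror the classifier in the `dist` diff further down, but `pickLogSourcePerRank` is a hypothetical helper, not the package's code.

```javascript
// Hypothetical sketch of "prefer the rolling log, fall back to the Loki export".
// ASSUMPTION: entries look like { relPath, kind, rank }, modelled on the classifier output.
function pickLogSourcePerRank(entries) {
  const byRank = new Map(); // rank id -> { kind, files }
  for (const e of entries) {
    if (e.rank === undefined) continue; // host-manager and component logs carry no rank
    if (e.kind !== "core-log" && e.kind !== "loki-tail") continue;
    const current = byRank.get(e.rank);
    if (!current || (current.kind === "loki-tail" && e.kind === "core-log")) {
      // First source seen for this rank, or a richer rolling log replacing an export.
      byRank.set(e.rank, { kind: e.kind, files: [e.relPath] });
    } else if (current.kind === e.kind) {
      // Same log family: keep every file, e.g. rotated rolling history.
      current.files.push(e.relPath);
    }
    // Otherwise a Loki export for a rank that already has a rolling log is ignored.
  }
  return byRank;
}

const entries = [
  { relPath: "logs/rank0.log", kind: "loki-tail", rank: "r0" },
  { relPath: "logs-local/core-gpudb-rolling-r0.log", kind: "core-log", rank: "r0" },
  { relPath: "logs-local/core-gpudb-rolling-r0.log.1", kind: "core-log", rank: "r0" },
  { relPath: "logs/rank1.log", kind: "loki-tail", rank: "r1" },
];

for (const [rank, { kind, files }] of pickLogSourcePerRank(entries)) {
  console.log(rank, kind, files.join(", "));
}
```

With this sample, `r0` resolves to its two rolling files and `r1` to its Loki export. Ranks print in first-seen order because `Map.prototype.set` on an existing key does not move that key.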
```diff
@@ -330,14 +332,14 @@ The `--bundle` flag points the agent at an **extracted** support-bundle director
 
 Available against an extracted `gpudb_sysinfo` support bundle (see [Offline Bundle Mode](#offline-bundle-mode)). All read-only; the search/timeline tools stream and bound their output so a large rank log (tens of MB, hundreds of thousands of lines) never blows up the context.
 
-| Tool | Description
-| ------------------------------ |
-| `kinetica_load_bundle` | Attach an extracted bundle directory; without a path it opens a directory picker (a model-supplied path needs operator confirmation)
-| `kinetica_bundle_list_files` | Inventory: detected version, ranks + services present, file counts/sizes by kind — call this first
-| `kinetica_bundle_log_timeline` | Per-time-bucket severity counts across ranks (the incident shape) — call before searching
-| `kinetica_bundle_search_logs` | Bounded log search by regex, min-severity, time window, and rank / host-manager / component (reads both rolling and Loki-export logs) |
-| `kinetica_bundle_read_config` | Read the bundle's real on-disk `gpudb.conf`, with optional section/key filter
-| `kinetica_bundle_read_sysinfo` | OS/process/version diagnostic files (memory, CPU, disk, GPU, network, process args)
+| Tool | Description |
+| ------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `kinetica_load_bundle` | Attach an extracted bundle directory; without a path it opens a directory picker (a model-supplied path needs operator confirmation) |
+| `kinetica_bundle_list_files` | Inventory: detected version, ranks + services present, file counts/sizes by kind, plus a layout-match verdict + per-file confidence for off-shape bundles — call this first |
+| `kinetica_bundle_log_timeline` | Per-time-bucket severity counts across ranks (the incident shape) — call before searching |
+| `kinetica_bundle_search_logs` | Bounded log search by regex, min-severity, time window, and rank / host-manager / component (reads both rolling and Loki-export logs); `include_multiline` stitches a multi-line record — e.g. a full `Executing SQL:` query whose embedded newlines span many lines — back onto each match |
+| `kinetica_bundle_read_config` | Read the bundle's real on-disk `gpudb.conf`, with optional section/key filter |
+| `kinetica_bundle_read_sysinfo` | OS/process/version diagnostic files (memory, CPU, disk, GPU, network, process args) |
 
 ### Reporting
 
```
```diff
@@ -427,7 +429,7 @@ References provide domain knowledge (not diagnostic runbooks). Create a `.md` fi
 - `sql-create-index` — column index syntax, chunk skip index, when to use which
 - `version-quirks-7.2` — endpoint/property differences between 7.2.x and earlier releases
 
-Plus a **bundle-scoped reference** (`support-bundle` — bundle layout, the two per-rank log families, raw + Loki-JSONL log-line formats, severity ordering, file parsing, crash-SQL forensics) that lives in `knowledge/references/bundle/`. It loads in **every** session — even a pure live one — so that a bundle attached mid-session via `kinetica_load_bundle` has its parsing knowledge ready in the (build-once) prompt; the corpus is cached, so the cost to a session that never attaches a bundle is negligible.
+Plus a **bundle-scoped reference** (`support-bundle` — bundle layout, the two per-rank log families, raw + Loki-JSONL log-line formats, severity ordering, file parsing, crash-SQL forensics, and how to work an off-shape bundle via the `layout_match`/confidence signals) that lives in `knowledge/references/bundle/`. It loads in **every** session — even a pure live one — so that a bundle attached mid-session via `kinetica_load_bundle` has its parsing knowledge ready in the (build-once) prompt; the corpus is cached, so the cost to a session that never attaches a bundle is negligible.
 
 > **Heads up — prompt budget:** all playbooks and references are front-loaded into a single system prompt at startup, so its token cost grows with the knowledge corpus. A startup tripwire (`agent/prompt-budget.ts`) prints the assembled prompt size under `DEBUG` and warns on stderr once it exceeds ~20,000 estimated tokens. Current baseline is ~13.4k tokens (6 playbooks + 9 references). If you add substantial knowledge and trip that warning, treat it as the cue to switch from "load everything" to keyword-based playbook selection.
 
```
package/dist/admin-agent.js
CHANGED
```diff
@@ -3875,248 +3875,8 @@ var import_claude_agent_sdk4 = require("@anthropic-ai/claude-agent-sdk");
 // src/tools/bundle/list-files.ts
 var import_zod18 = require("zod");
 
-// src/bundle/known-files.ts
-var KNOWN_BUNDLE_FILES = {
-  // Host resources
-  "cpu.txt": "CPU topology, NUMA, and interrupts (lscpu, numactl, /proc/cpuinfo, /proc/interrupts)",
-  "mem.txt": "Memory usage, /proc/meminfo, and transparent-hugepage setting (free -m -t)",
-  "disk.txt": "Filesystems, mounts, block devices, and disk stats (df, mount, lsblk, fdisk, /etc/fstab, /proc/diskstats)",
-  "gpu.txt": "NVIDIA GPU inventory and state (nvidia-smi -L/-q, modinfo nvidia)",
-  "net.txt": "Network interfaces, sockets, and DNS (hostname, ifconfig, netstat, /etc/resolv.conf)",
-  // Processes
-  "ps.txt": "Full process list (ps -auxww, ps -ejHlfww)",
-  "gpudb-exe.txt": "Running gpudb processes (ps auxfwww | grep gpudb)",
-  // Hardware / firmware
-  "dmidecode.txt": "BIOS / DMI hardware inventory (dmidecode)",
-  "lshw.txt": "Hardware listing (lshw -short -numeric)",
-  "pci.txt": "PCI devices and I/O resources (lspci, /proc/ioports, /proc/iomem)",
-  // Kernel / OS
-  "dmesg.txt": "Kernel ring buffer \u2014 boot and runtime kernel messages (dmesg -T)",
-  "dmesg-timestamp.txt": "Kernel ring buffer with human-readable timestamps",
-  "sysctl.txt": "Kernel tunables (sysctl -a)",
-  "sys.txt": "OS identity, uptime, ulimits, kernel cmdline, clocksource, and loaded modules (uname, ulimit, /proc/cmdline, lsmod)",
-  "lsof.txt": "Open files and network sockets (lsof -n -P)",
-  "lslocks.txt": "Held file locks (lslocks)",
-  // Packages / linker / accounts
-  "deb.txt": "Installed Debian packages and verification (dpkg -l, dpkg -V)",
-  "rpm.txt": "Installed RPM packages (rpm -qa)",
-  "ld.so.conf.txt": "Dynamic-linker library search paths (/etc/ld.so.conf)",
-  "user.txt": "Users, groups, and the gpudb service account (whoami, id, /etc/passwd, /etc/group)",
-  "sudoers.txt": "Sudo configuration (/etc/sudoers)",
-  "etc_profile.txt": "Login shell profile (/etc/profile)",
-  "etc_bashrc.txt": "System bashrc (/etc/bashrc)",
-  "etc_host.txt": "Static hostname resolution (/etc/hosts)",
-  // Kinetica-specific
-  "gpudb.txt": "GPUdb version/build, binary md5 + ldd, and the captured gpudb.conf / gpudb_logger.conf ($GPUDB_EXE -v)",
-  "gpudb_core_etc_gpudb.conf": "The live gpudb.conf at capture time (the database's main config)",
-  "gpudb_core_etc_gpudb_logger.conf": "The logging configuration (gpudb_logger.conf)",
-  "loki-info.txt": "Loki log-index stats: labels, series, and per-class volume (logcli)",
-  "sql-queries.txt": "SQL query log extracted from Loki (logcli)",
-  "tables.txt": "Table schemas and column types (gadmin --schema), when collected",
-  "logfiles.txt": "Manifest: the log directories/files the collector enumerated",
-  "errors.txt": "Collection commands that FAILED during capture (Evidence Gaps)",
-  "proc-logs-erros.txt": "Per-process log-collection failures during capture (Evidence Gaps)"
-};
-var KIND_DESCRIPTIONS = {
-  "core-log": "Per-rank rolling Kinetica core log (the primary incident narrative)",
-  "component-log": "Component service log (sql-engine, httpd, reveal, tomcat, stats, \u2026)",
-  "loki-tail": "Last-2h Loki tail for a service (small; searched only when no core logs exist)",
-  "process-info": "Per-rank process snapshot: command line, PID, and environment (/proc/<pid>/environ)",
-  config: "Kinetica configuration file",
-  "version-info": "GPUdb version/build information",
-  "collection-errors": "Collection commands that FAILED during capture (Evidence Gaps)",
-  manifest: "Manifest of log directories/files the collector enumerated"
-};
-function basename(relPath) {
-  const parts = relPath.split("/");
-  return parts[parts.length - 1] ?? relPath;
-}
-function describeBundleFile(entry) {
-  return KNOWN_BUNDLE_FILES[basename(entry.relPath)] ?? KIND_DESCRIPTIONS[entry.kind] ?? "";
-}
-
-// src/tools/bundle/list-files.ts
-var BundleListFilesSchema = import_zod18.z.object({
-  kind: import_zod18.z.string().optional()
-});
-async function bundleListFiles(source, args = {}) {
-  const all = source.listFiles();
-  const filtered = args.kind ? all.filter((e) => e.kind === args.kind) : all;
-  const { totalFiles, totalBytes, byKind, ranks, services } = source.inventory();
-  const version = await source.detectVersion();
-  const errors = await source.collectionErrors();
-  const files = filtered.map((e) => ({
-    file: e.relPath,
-    kind: e.kind,
-    rank: e.rank ?? "",
-    size_kb: Math.round(e.sizeBytes / 1024),
-    // What the file contains — so the agent can pick the right one without reading it.
-    description: describeBundleFile(e)
-  }));
-  return {
-    ok: true,
-    data: {
-      detected_version: version ?? "unknown",
-      ranks_present: ranks.join(", ") || "none",
-      services_present: services.join(", ") || "none",
-      total_files: totalFiles,
-      total_size_mb: Number((totalBytes / 1e6).toFixed(1)),
-      counts_by_kind: byKind,
-      failed_collections: errors.length,
-      files
-    }
-  };
-}
-
-// src/tools/bundle/log-timeline.ts
-var import_zod19 = require("zod");
-var BundleLogTimelineSchema = import_zod19.z.object({
-  min_severity: import_zod19.z.enum(["INFO", "WARN", "UERR", "ERROR", "FATAL"]).optional(),
-  granularity: import_zod19.z.enum(["day", "hour", "minute"]).optional(),
-  rank: import_zod19.z.string().describe('Numeric rank only, e.g. "r0"/"r1". For the host manager use host_manager.').optional(),
-  host_manager: import_zod19.z.boolean().describe("Bucket the host-manager (hm) log \u2014 a singleton service, not a rank.").optional(),
-  component: import_zod19.z.string().optional(),
-  include_components: import_zod19.z.boolean().optional()
-});
-async function bundleLogTimeline(source, args = {}) {
-  const query3 = {
-    ...args.min_severity !== void 0 ? { minSeverity: args.min_severity } : {},
-    ...args.granularity !== void 0 ? { granularity: args.granularity } : {},
-    ...args.rank !== void 0 ? { rank: args.rank } : {},
-    ...args.host_manager !== void 0 ? { hostManager: args.host_manager } : {},
-    ...args.component !== void 0 ? { component: args.component } : {},
-    ...args.include_components !== void 0 ? { includeComponents: args.include_components } : {}
-  };
-  const result = await source.logTimeline(query3);
-  const severities = [...new Set(result.buckets.flatMap((b) => Object.keys(b.counts)))];
-  const order = ["FATAL", "ERROR", "UERR", "WARN", "INFO"];
-  severities.sort((a, b) => order.indexOf(a) - order.indexOf(b));
-  const rows = result.buckets.map((b) => {
-    const row = { time_bucket: b.bucket };
-    for (const sev of severities) row[sev] = b.counts[sev] ?? 0;
-    row.total = b.total;
-    return row;
-  });
-  return {
-    ok: true,
-    note: result.totalCounted === 0 ? "No lines at or above the severity threshold \u2014 try a lower min_severity." : `${result.totalCounted} event(s) across ${result.buckets.length} bucket(s), ${result.filesScanned.length} file(s).`,
-    data: {
-      lines_scanned: result.linesScanned,
-      files_scanned: result.filesScanned.join(", ") || "none",
-      buckets: rows
-    }
-  };
-}
-
-// src/tools/bundle/search-logs.ts
-var import_zod20 = require("zod");
-var BundleSearchLogsSchema = import_zod20.z.object({
-  regex: import_zod20.z.string().optional(),
-  min_severity: import_zod20.z.enum(["INFO", "WARN", "UERR", "ERROR", "FATAL"]).optional(),
-  from_ts: import_zod20.z.string().optional(),
-  to_ts: import_zod20.z.string().optional(),
-  rank: import_zod20.z.string().describe('Numeric rank only, e.g. "r0"/"r1". For the host manager use host_manager.').optional(),
-  host_manager: import_zod20.z.boolean().describe("Search the host-manager (hm) log \u2014 a singleton service, not a rank.").optional(),
-  component: import_zod20.z.string().optional(),
-  include_components: import_zod20.z.boolean().optional(),
-  max_matches: import_zod20.z.number().int().min(1).max(1e3).optional()
-});
-async function bundleSearchLogs(source, args = {}) {
-  const query3 = {
-    ...args.regex !== void 0 ? { regex: args.regex } : {},
-    ...args.min_severity !== void 0 ? { minSeverity: args.min_severity } : {},
-    ...args.from_ts !== void 0 ? { fromTs: args.from_ts } : {},
-    ...args.to_ts !== void 0 ? { toTs: args.to_ts } : {},
-    ...args.rank !== void 0 ? { rank: args.rank } : {},
-    ...args.host_manager !== void 0 ? { hostManager: args.host_manager } : {},
-    ...args.component !== void 0 ? { component: args.component } : {},
-    ...args.include_components !== void 0 ? { includeComponents: args.include_components } : {},
-    ...args.max_matches !== void 0 ? { maxMatches: args.max_matches } : {}
-  };
-  const result = await source.searchLogs(query3);
-  const note = result.capped ? `Showing ${result.matches.length} of ${result.totalMatched} matches across ${result.filesScanned.length} file(s) (display capped). Narrow with a tighter regex, severity, or time window to surface the specific lines.` : `${result.totalMatched} match(es) across ${result.filesScanned.length} file(s).`;
-  return {
-    ok: true,
-    note,
-    data: {
-      total_matched: result.totalMatched,
-      lines_scanned: result.linesScanned,
-      files_scanned: result.filesScanned.join(", ") || "none",
-      capped: result.capped,
-      matches: result.matches.map((m) => ({
-        file: m.file,
-        line: m.lineNumber,
-        timestamp: m.timestamp ?? "",
-        severity: m.severity ?? "",
-        rank: m.rank ?? "",
-        message: m.message
-      }))
-    }
-  };
-}
-
-// src/tools/bundle/read-config.ts
-var import_zod21 = require("zod");
-var BundleReadConfigSchema = import_zod21.z.object({
-  section: import_zod21.z.string().optional(),
-  key: import_zod21.z.string().optional()
-});
-async function bundleReadConfig(source, args = {}) {
-  const result = await source.readConfig({
-    ...args.section !== void 0 ? { section: args.section } : {},
-    ...args.key !== void 0 ? { key: args.key } : {}
-  });
-  if ("error" in result) {
-    return { ok: false, status: 0, error: result.error, raw: "" };
-  }
-  if (result.entries.length === 0 && args.section !== void 0) {
-    const all = await source.readConfig(args.key !== void 0 ? { key: args.key } : {});
-    const sections = "error" in all ? [] : [...new Set(all.entries.map((e) => e.section))].sort();
-    const sectionList = sections.map((s) => s === "" ? "(flat/top-level)" : s).join(", ");
-    return {
-      ok: true,
-      note: `No entries in section "${args.section}" of ${result.file}. gpudb.conf is largely flat \u2014 retry filtering by key only. Sections present: ${sectionList || "(none)"}.`,
-      data: { section_not_found: args.section, available_sections: sections }
-    };
-  }
-  return {
-    ok: true,
-    note: `${result.entries.length} entr(y/ies) from ${result.file}.`,
-    data: result.entries.map((e) => ({ section: e.section, key: e.key, value: e.value }))
-  };
-}
-
-// src/tools/bundle/read-sysinfo.ts
-var import_zod22 = require("zod");
-var BundleReadSysinfoSchema = import_zod22.z.object({
-  name: import_zod22.z.string().min(1)
-});
-async function bundleReadSysinfo(source, args) {
-  const result = await source.readSysinfo(args.name);
-  if ("error" in result) {
-    return { ok: false, status: 0, error: result.error, raw: "" };
-  }
-  return {
-    ok: true,
-    data: {
-      ...result.header !== void 0 ? { source_file: result.header } : {},
-      blocks: result.blocks.map((b) => ({
-        command: b.command,
-        ...b.exitCode !== void 0 ? { exit_code: b.exitCode } : {},
-        output: b.output
-      }))
-    }
-  };
-}
-
-// src/tools/bundle/load-bundle.ts
-var import_zod23 = require("zod");
-
-// src/bundle/verify-bundle.ts
-var import_promises6 = require("fs/promises");
-
 // src/bundle/BundleSource.ts
-var
+var import_promises6 = require("fs/promises");
 var import_node_path6 = require("path");
 
 // src/bundle/sysinfo-block.ts
```
```diff
@@ -4274,6 +4034,8 @@ function parseLogLine(line) {
 
 // src/bundle/log-search.ts
 var DEFAULT_MAX_MATCHES = 200;
+var MULTILINE_MAX_LINES = 300;
+var MULTILINE_MAX_CHARS = 2e4;
 var REGEX_SCAN_MAX = 8192;
 var GRANULARITY_LEN = {
   day: 10,
```
```diff
@@ -4315,6 +4077,23 @@ function matchesFilters(parsed, query3, regex, minRank) {
     return false;
   return true;
 }
+function buildMatch(lineNumber, parsed) {
+  return {
+    lineNumber,
+    ...parsed.timestamp !== void 0 ? { timestamp: parsed.timestamp } : {},
+    ...parsed.severity !== void 0 ? { severity: parsed.severity } : {},
+    ...parsed.rank !== void 0 ? { rank: parsed.rank } : {},
+    message: parsed.message,
+    raw: parsed.raw
+  };
+}
+function finalizeMultiline(pending) {
+  if (pending.extra.length === 0) return pending.base;
+  const joined = pending.extra.join("\n");
+  const suffix = pending.truncated ? "\n\u2026 [continuation truncated]" : "";
+  return { ...pending.base, message: `${pending.base.message}
+${joined}${suffix}` };
+}
 async function searchLogFile(filePath, query3) {
   const maxMatches = query3.maxMatches ?? DEFAULT_MAX_MATCHES;
   const minRank = query3.minSeverity !== void 0 ? severityRank(query3.minSeverity) : -Infinity;
```
```diff
@@ -4336,9 +4115,17 @@ async function searchLogFile(filePath, query3) {
     ...query3.fromTs !== void 0 ? { fromTs: floorTimestamp(query3.fromTs) } : {},
     ...query3.toTs !== void 0 ? { toTs: ceilTimestamp(query3.toTs) } : {}
   };
+  const coalesce = query3.coalesceMultiline === true;
   const matches = [];
   let totalMatched = 0;
   let linesScanned = 0;
+  let pending;
+  const flushPending = () => {
+    if (pending) {
+      matches.push(finalizeMultiline(pending));
+      pending = void 0;
+    }
+  };
   try {
     const rl = (0, import_node_readline.createInterface)({
       input: (0, import_node_fs4.createReadStream)(filePath, { encoding: "utf-8" }),
```
```diff
@@ -4347,20 +4134,29 @@ async function searchLogFile(filePath, query3) {
     for await (const line of rl) {
       linesScanned++;
       const parsed = parseLogLine(line);
+      if (pending) {
+        if (parsed.timestamp === void 0) {
+          if (!pending.truncated && pending.extra.length < MULTILINE_MAX_LINES && pending.chars + line.length + 1 <= MULTILINE_MAX_CHARS) {
+            pending.extra.push(line);
+            pending.chars += line.length + 1;
+          } else {
+            pending.truncated = true;
+          }
+          continue;
+        }
+        flushPending();
+      }
       if (!matchesFilters(parsed, boundedQuery, regex, minRank)) continue;
       totalMatched++;
       if (matches.length < maxMatches) {
-
-
-
-          ...parsed.severity !== void 0 ? { severity: parsed.severity } : {},
-          ...parsed.rank !== void 0 ? { rank: parsed.rank } : {},
-          message: parsed.message,
-          raw: parsed.raw
-        });
+        const base = buildMatch(linesScanned, parsed);
+        if (coalesce) pending = { base, extra: [], chars: 0, truncated: false };
+        else matches.push(base);
       }
     }
+    flushPending();
   } catch (err) {
+    flushPending();
     const message = err instanceof Error ? err.message : String(err);
     return {
       matches,
```
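
The `pending`/`flushPending` logic in the `searchLogFile` hunk above, exposed to the agent as `include_multiline`, works as follows: a matching line opens a pending record; each following line without a timestamp is appended while both a line budget and a character budget hold; and the next timestamped line closes the record. The snippet below is a standalone sketch of that technique over an in-memory array, not the package's code. It assumes a line starts a new record exactly when it begins with an ISO-style timestamp. The package makes that decision with its own `parseLogLine`, which this diff does not show. `searchCoalesced` is a hypothetical name; its default budgets are taken from `DEFAULT_MAX_MATCHES`, `MULTILINE_MAX_LINES` and `MULTILINE_MAX_CHARS`.

```javascript
// Standalone sketch of multi-line record stitching (not the package's implementation).
// ASSUMPTION: a line starts a new record iff it begins with an ISO-style timestamp.
const TS_RE = /^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}/;

function searchCoalesced(lines, pattern, opts = {}) {
  const maxMatches = opts.maxMatches ?? 200;       // DEFAULT_MAX_MATCHES in the diff
  const maxExtraLines = opts.maxExtraLines ?? 300; // MULTILINE_MAX_LINES
  const maxExtraChars = opts.maxExtraChars ?? 2e4; // MULTILINE_MAX_CHARS
  const matches = [];
  let totalMatched = 0;
  let pending; // last kept match, still collecting continuation lines

  const flush = () => {
    if (!pending) return;
    const { base, extra, truncated } = pending;
    // Like finalizeMultiline: an empty continuation leaves the match unchanged.
    matches.push(extra.length === 0 ? base : {
      ...base,
      message: `${base.message}\n${extra.join("\n")}${truncated ? "\n… [continuation truncated]" : ""}`,
    });
    pending = undefined;
  };

  lines.forEach((line, i) => {
    if (pending && !TS_RE.test(line)) {
      // Continuation line: attach it while both budgets allow, otherwise mark truncated.
      if (!pending.truncated && pending.extra.length < maxExtraLines &&
          pending.chars + line.length + 1 <= maxExtraChars) {
        pending.extra.push(line);
        pending.chars += line.length + 1;
      } else {
        pending.truncated = true;
      }
      return;
    }
    flush(); // a timestamped line closes the previous record
    if (!pattern.test(line)) return; // pattern must not use the g/y flags (stateful test)
    totalMatched++;
    if (matches.length < maxMatches) {
      pending = { base: { lineNumber: i + 1, message: line }, extra: [], chars: 0, truncated: false };
    }
  });
  flush(); // end of input closes any open record
  return { matches, totalMatched, capped: totalMatched > matches.length };
}

const log = [
  "2024-05-01 10:00:00 INFO starting",
  "2024-05-01 10:00:01 INFO Executing SQL: SELECT a,",
  "  b FROM t",
  "  WHERE c = 1",
  "2024-05-01 10:00:02 ERROR boom",
];
console.log(searchCoalesced(log, /Executing SQL/).matches[0].message);
```

With the sample log, the single match on line 2 comes back with both indented SQL continuation lines appended. Passing `{ maxExtraLines: 1 }` keeps only the first continuation line and adds the truncation marker. Bounding both the line count and the character total (one newline counted per appended line) keeps a very long statement from flooding the result, while `totalMatched` still counts every match beyond the display cap.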
```diff
@@ -4409,20 +4205,26 @@ async function aggregateTimeline(filePath, query3 = {}) {
 }
 
 // src/bundle/bundle-index.ts
-var
+var import_promises5 = require("fs/promises");
 var import_node_path5 = require("path");
 
 // src/bundle/classify-file.ts
-var ROLLING_ID_RE = /core-gpudb-rolling-(r\d+|hm)\.log(?:\.\d+)?$/;
+var ROLLING_ID_RE = /(?:core-)?gpudb-rolling-(r\d+|hm)\.log(?:\.\d+)?$/;
 var EXE_ID_RE = /gpudb-exe-(r\d+|hm)-/;
 var HOST_RE = /\b(node\w+)\b/;
+var CONF_RE = /\.conf$/i;
+var CONF_ALT_RE = /\.(cfg|ini)$/i;
 var LOG_RE = /\.log(?:\.\d+)?$/;
+var LOGISH_RE = /\.(?:log|out|err)(?:\.\d+)?$/i;
 var LOKI_RANK_RE = /^rank(\d+)\.log$/;
 var LOKI_HM_BASE = "hostmanager.log";
+var HM_TOKEN_RE = /host-?manager/i;
+var RANK_TOKEN_RE = /(?:\brank[-_]?|\br)(\d{1,2})\b/i;
+var LOG_DIR_RE = /(?:^|\/)(?:logs|logs-local|log)(?:\/|$)/;
 function rankOrService(id) {
   return id === "hm" ? { service: "host-manager" } : { rank: id };
 }
-function
+function basename(relPath) {
   const parts = relPath.split("/");
   return parts[parts.length - 1] ?? relPath;
 }
```
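
To see what the loosened `ROLLING_ID_RE` and the new name-token patterns accept, the snippet below copies the regular expressions verbatim from the `classify-file.ts` hunk above and runs them against a few file names. The sample names and the `rollingId` helper are illustrative only.

```javascript
// Patterns copied verbatim from the src/bundle/classify-file.ts hunk (0.2.1 vs 0.2.3).
const OLD_ROLLING_ID_RE = /core-gpudb-rolling-(r\d+|hm)\.log(?:\.\d+)?$/;
const ROLLING_ID_RE = /(?:core-)?gpudb-rolling-(r\d+|hm)\.log(?:\.\d+)?$/;
const LOGISH_RE = /\.(?:log|out|err)(?:\.\d+)?$/i;
const HM_TOKEN_RE = /host-?manager/i;
const RANK_TOKEN_RE = /(?:\brank[-_]?|\br)(\d{1,2})\b/i;

// Captured id ("r0", "r1", ..., or "hm"), or null when the name does not match.
const rollingId = (re, name) => re.exec(name)?.[1] ?? null;

for (const name of ["core-gpudb-rolling-r1.log", "gpudb-rolling-r2.log.3", "gpudb-rolling-hm.log"]) {
  console.log(name, "0.2.1:", rollingId(OLD_ROLLING_ID_RE, name), "0.2.3:", rollingId(ROLLING_ID_RE, name));
}

const hmOut = "hostmanager-node2.out";
console.log(hmOut, HM_TOKEN_RE.test(hmOut) && LOGISH_RE.test(hmOut)); // hostmanager-node2.out true
console.log(RANK_TOKEN_RE.exec("dump-rank3.log")?.[1]);  // 3
console.log(RANK_TOKEN_RE.exec("node1-r12.err")?.[1]);   // 12
console.log(RANK_TOKEN_RE.exec("worker_rank3.log"));     // null: "_" is a word character, so \b fails
```

Two observations follow from the patterns as written. First, because `ROLLING_ID_RE` has no `^` anchor, its `(?:core-)?` group does not change which names match or what is captured: the `gpudb-rolling-` substring is found inside a `core-`-prefixed name either way. The group documents intent, while the classifier's reason string is chosen separately by `c.base.startsWith("core-")`. Second, `\brank` does not fire after an underscore, so a name such as `worker_rank3.log` yields no rank token.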
@@ -4434,70 +4236,247 @@ function inferHost(relPath) {
|
|
|
4434
4236
|
return HOST_RE.exec(relPath)?.[1] ?? void 0;
|
|
4435
4237
|
}
|
|
4436
4238
|
function componentName(base) {
|
|
4437
|
-
return base.replace(/\.\d+$/, "").replace(/(
|
|
4239
|
+
return base.replace(/\.\d+$/, "").replace(/(?:\.(?:log|out|err))+$/i, "").replace(/^core-gpudb-/, "").replace(/^gpudb-/, "").replace(/-node\w+$/, "");
|
|
4438
4240
|
}
|
|
4241
|
+
function cls(kind, confidence, reason, parts = {}) {
|
|
4242
|
+
return {
|
|
4243
|
+
kind,
|
|
4244
|
+
confidence,
|
|
4245
|
+
reason,
|
|
4246
|
+
...parts.rank !== void 0 ? { rank: parts.rank } : {},
|
|
4247
|
+
...parts.inferredRank !== void 0 ? { inferredRank: parts.inferredRank } : {},
+    ...parts.service !== void 0 ? { service: parts.service } : {},
+    ...parts.component !== void 0 ? { component: parts.component } : {},
+    ...parts.host !== void 0 ? { host: parts.host } : {}
+  };
+}
+var MATCHERS = [
+  // ── Tier A: canonical filenames / locations (exact) ──────────────────────────
+  (c) => CONF_RE.test(c.base) ? cls("config", "exact", "config (.conf)", { host: c.host }) : null,
+  (c) => CONF_ALT_RE.test(c.base) ? cls("config", "inferred", "config-like extension (.cfg/.ini)", { host: c.host }) : null,
+  (c) => c.base === "logfiles.txt" ? cls("manifest", "exact", "collector manifest", { host: c.host }) : null,
+  (c) => c.base === "errors.txt" || c.base.endsWith("erros.txt") ? cls("collection-errors", "exact", "collection-errors summary", { host: c.host }) : null,
+  (c) => c.base === "gpudb.txt" ? cls("version-info", "exact", "gpudb.txt", { host: c.host }) : null,
+  (c) => {
+    const m = EXE_ID_RE.exec(c.base);
+    return m ? cls("process-info", "exact", "gpudb-exe process capture", {
+      ...rankOrService(m[1]),
+      host: c.host
+    }) : null;
+  },
+  (c) => {
+    const m = ROLLING_ID_RE.exec(c.base);
+    if (!m) return null;
+    const reason = c.base.startsWith("core-") ? "core rolling-log pattern" : "rolling-log pattern (no core- prefix)";
+    return cls("core-log", "exact", reason, { ...rankOrService(m[1]), host: c.host });
+  },
+  (c) => {
+    if (c.dir !== "logs" || !LOG_RE.test(c.base)) return null;
+    const lr = LOKI_RANK_RE.exec(c.base);
+    const lokiId = lr ? `r${lr[1]}` : c.base === LOKI_HM_BASE ? "hm" : void 0;
+    return lokiId !== void 0 ? cls("loki-tail", "exact", "Loki per-rank/host-manager export under logs/", {
+      ...rankOrService(lokiId),
+      host: c.host
+    }) : cls("loki-tail", "exact", "Loki component tail under logs/", {
+      component: componentName(c.base),
+      host: c.host
+    });
+  },
+  (c) => c.dir === "logs-local" && LOG_RE.test(c.base) ? cls("component-log", "exact", "component log under logs-local/", {
+    component: componentName(c.base),
+    host: c.host
+  }) : null,
+  // ── Tier B: off-shape name/extension heuristics (inferred) ───────────────────
+  // Host-manager service logs in a flat layout: the rolling-hm log is already caught
+  // above; this catches the service log and the process stdout (.out). This MUST come
+  // before the generic gpudb-prefixed matcher below — both would classify a
+  // `gpudb-host-manager-*.log` as a component-log, but only this one adds the
+  // `service: "host-manager"` tag. Kept separate (not folded into the gpudb matcher) so
+  // a host-manager log WITHOUT a gpudb prefix (e.g. a renamed `hostmanager-*.out`) still
+  // gets the service tag rather than falling through to a plain component-log.
+  (c) => HM_TOKEN_RE.test(c.base) && LOGISH_RE.test(c.base) ? cls("component-log", "inferred", "host-manager service log (name match)", {
+    service: "host-manager",
+    component: componentName(c.base),
+    host: c.host
+  }) : null,
+  // Any other gpudb-prefixed log-ish file in a non-canonical location.
+  (c) => (c.base.startsWith("gpudb") || c.base.startsWith("core-gpudb")) && LOGISH_RE.test(c.base) ? cls("component-log", "inferred", "gpudb log (name match, non-canonical location)", {
+    component: componentName(c.base),
+    host: c.host
+  }) : null,
+  // A log-ish file sitting in a log-named directory, or carrying a rank token.
+  (c) => {
+    if (!LOGISH_RE.test(c.base)) return null;
+    const inLogDir = LOG_DIR_RE.test(c.relPath);
+    const rm = RANK_TOKEN_RE.exec(c.base);
+    if (!inLogDir && !rm) return null;
+    const rank = rm ? `r${rm[1]}` : void 0;
+    const reason = rank ? "log-like file with a rank token" : "log-like file in a log directory";
+    return cls(c.dir === "logs" ? "loki-tail" : "component-log", "inferred", reason, {
+      ...rank !== void 0 ? { rank, inferredRank: true } : { component: componentName(c.base) },
+      host: c.host
+    });
+  },
+  // ── Tier C: extension-only fallbacks (weak) ──────────────────────────────────
+  (c) => c.base.endsWith(".txt") ? cls("os-diag", "weak", "fallback: .txt extension", { host: c.host }) : null,
+  (c) => LOGISH_RE.test(c.base) ? cls("component-log", "weak", "fallback: log-like extension", {
+    component: componentName(c.base),
+    host: c.host
+  }) : null
+];
 function classifyFile(relPath) {
-  const base =
+  const base = basename(relPath);
   const dir = dirOf(relPath);
   const host = inferHost(relPath);
-
-
+  const ctx = { relPath, base, dir, ...host !== void 0 ? { host } : {} };
+  for (const matcher of MATCHERS) {
+    const result = matcher(ctx);
+    if (result) return result;
   }
-
-
+  return cls("unknown", "weak", "unrecognized file", { host });
+}
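The new `classifyFile` scans `MATCHERS` in order, and the first match wins. Canonical names (`exact`) are tried before name heuristics (`inferred`), those are tried before extension-only fallbacks (`weak`), and anything left over becomes `unknown`. The sketch below isolates that pattern. Its regex and its four matchers are simplified, hypothetical stand-ins for `CONF_RE`, `LOGISH_RE` and the related constants, which this diff does not show.

```javascript
const path = require("node:path");

// Hypothetical, simplified stand-in for the package's log-ish regex.
const LOGISH = /\.(log|out)(\.\d+)?$/;

function cls(kind, confidence, reason) {
  return { kind, confidence, reason };
}

// Ordered from most to least specific; the first non-null result wins.
const MATCHERS = [
  // Tier A: canonical names (exact)
  (c) => (c.base.endsWith(".conf") ? cls("config", "exact", "config (.conf)") : null),
  (c) => (c.base === "errors.txt" ? cls("collection-errors", "exact", "collection-errors summary") : null),
  // Tier B: name heuristics (inferred)
  (c) =>
    c.base.startsWith("gpudb") && LOGISH.test(c.base)
      ? cls("component-log", "inferred", "gpudb log (name match)")
      : null,
  // Tier C: extension-only fallbacks (weak)
  (c) => (c.base.endsWith(".txt") ? cls("os-diag", "weak", "fallback: .txt extension") : null),
];

function classifyFile(relPath) {
  const ctx = { relPath, base: path.posix.basename(relPath) };
  for (const matcher of MATCHERS) {
    const result = matcher(ctx);
    if (result) return result;
  }
  // Nothing matched: keep the file, but flag it as unclassified.
  return cls("unknown", "weak", "unrecognized file");
}
```

`errors.txt` would also satisfy the `.txt` fallback. Its position ahead of that fallback is the only thing that makes it a `collection-errors` file. The source comment about the host-manager matcher relies on the same ordering rule.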
+
+// src/bundle/sniff-file.ts
+var import_promises4 = require("fs/promises");
+var SNIFF_HEAD_BYTES = 8192;
+var SNIFF_MAX_LINES = 20;
+async function readHead(absPath, headBytes) {
+  let fh;
+  try {
+    fh = await (0, import_promises4.open)(absPath, "r");
+    const buf = Buffer.alloc(headBytes);
+    const { bytesRead } = await fh.read(buf, 0, headBytes, 0);
+    return buf.subarray(0, bytesRead).toString("utf-8");
+  } catch {
+    return "";
+  } finally {
+    await fh?.close().catch(() => void 0);
   }
-
-
+}
+function refineSysinfoKind(command) {
+  const cmd = command.toLowerCase();
+  if (/-v\b|--version|\bgpudb_logger\b/.test(cmd) && cmd.includes("gpudb")) {
+    return { kind: "version-info", detail: "version command" };
   }
-  if (
-    return { kind: "
+  if (/\bps\b|\/proc\/|environ|grep .*gpudb/.test(cmd)) {
+    return { kind: "process-info", detail: "process snapshot command" };
+  }
+  return { kind: "os-diag", detail: "host-diagnostic command" };
+}
+function logLineResult(rank, severity, isHm) {
+  if (rank !== void 0) {
+    return { kind: "core-log", reason: `log line parsed (${severity}, rank ${rank})`, rank };
+  }
+  if (isHm) {
+    return {
+      kind: "component-log",
+      reason: `log line parsed (${severity}, host-manager)`,
+      service: "host-manager"
+    };
+  }
+  return { kind: "component-log", reason: `log line parsed (${severity})` };
+}
+async function sniffFile(absPath, opts = {}) {
+  const headBytes = opts.headBytes ?? SNIFF_HEAD_BYTES;
+  const maxLines = opts.maxLines ?? SNIFF_MAX_LINES;
+  const text2 = await readHead(absPath, headBytes);
+  if (text2 === "") return void 0;
+  const lines = [];
+  for (const raw of text2.split("\n")) {
+    const trimmed = raw.trim();
+    if (trimmed === "") continue;
+    lines.push(raw);
+    if (lines.length >= maxLines) break;
+  }
+  if (lines.length === 0) return void 0;
+  for (const line of lines) {
+    const m = EXEC_CMD_RE.exec(line.trim());
+    if (m) {
+      const { kind, detail } = refineSysinfoKind(m[1]);
+      return { kind, reason: `EXEC_CMD header (${detail})` };
+    }
   }
-  const
-  if (
-
+  const unwrapped = unwrapLokiJsonl(lines[0]);
+  if (unwrapped !== void 0) {
+    const p = parseLogLine(unwrapped);
+    const rank = p.rank;
+    return {
+      kind: "loki-tail",
+      reason: `Loki JSONL record${rank ? ` (rank ${rank})` : ""}`,
+      ...rank !== void 0 ? { rank } : {}
+    };
   }
-
-  const
-  if (
-
-
-  if (dir === "logs") {
-    const lokiRank = LOKI_RANK_RE.exec(base);
-    const lokiId = lokiRank ? `r${lokiRank[1]}` : base === LOKI_HM_BASE ? "hm" : void 0;
-    if (lokiId !== void 0) {
-      return { kind: "loki-tail", ...rankOrService(lokiId), ...host ? { host } : {} };
-    }
-    return { kind: "loki-tail", component: componentName(base), ...host ? { host } : {} };
+  for (const line of lines) {
+    const p = parseLogLine(line);
+    if (p.severity !== void 0 && severityRank(p.severity) >= 0) {
+      const isHm = p.context?.startsWith("hm/") ?? false;
+      return logLineResult(p.rank, p.severity, isHm);
     }
-    return { kind: "component-log", component: componentName(base), ...host ? { host } : {} };
   }
-
-
+  const hasSection = lines.some((l) => SECTION_RE.test(l.trim()));
+  if (hasSection && parseIni(text2).length >= 2) {
+    return { kind: "config", reason: "INI section + key/value entries" };
   }
-  return
+  return void 0;
 }
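`sniffFile` reads only a bounded head of each file: `SNIFF_HEAD_BYTES` (8192 bytes), reduced to at most `SNIFF_MAX_LINES` (20) non-blank lines. It then tries four checks in order: an `EXEC_CMD` header, a Loki JSONL record, a parseable log line, and finally an INI section with at least two entries. The sketch below covers only the bounded read, as a synchronous variant, plus that last INI check. Its section and key/value regexes are hypothetical stand-ins for `SECTION_RE` and `parseIni`, which the diff does not show.

```javascript
const fs = require("node:fs");

// Synchronous variant of readHead: at most `headBytes` from offset 0; "" on any error.
function readHeadSync(absPath, headBytes = 8192) {
  let fd;
  try {
    fd = fs.openSync(absPath, "r");
    const buf = Buffer.alloc(headBytes);
    const bytesRead = fs.readSync(fd, buf, 0, headBytes, 0);
    return buf.subarray(0, bytesRead).toString("utf-8");
  } catch {
    return "";
  } finally {
    if (fd !== undefined) fs.closeSync(fd);
  }
}

// Hypothetical stand-ins for SECTION_RE and parseIni's entry detection.
const SECTION = /^\[[^\]]+\]$/;
const KEY_VALUE = /^[^=#;\s][^=]*=/;

// Only the final INI check of sniffFile; the EXEC_CMD, Loki and log-line checks are omitted.
function sniffIni(absPath, maxLines = 20) {
  const text = readHeadSync(absPath);
  const lines = text
    .split("\n")
    .filter((l) => l.trim() !== "")
    .slice(0, maxLines);
  if (lines.length === 0) return undefined;
  const hasSection = lines.some((l) => SECTION.test(l.trim()));
  // Like parseIni(text2), count entries over the whole head, not just the first lines.
  const entries = text.split("\n").filter((l) => KEY_VALUE.test(l.trim())).length;
  return hasSection && entries >= 2 ? { kind: "config", reason: "INI section + key/value entries" } : undefined;
}
```

Returning `""` for an unreadable file, instead of throwing, means a file that cannot be read simply contributes no evidence. `refineWithContent` then keeps its name-based classification.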
 
 // src/bundle/bundle-index.ts
+async function refineWithContent(c, absPath) {
+  if (c.confidence !== "weak" || c.kind === "os-diag") return c;
+  const sniff = await sniffFile(absPath);
+  if (!sniff) return c;
+  const addsKind = sniff.kind !== c.kind;
+  const addsRank = sniff.rank !== void 0 && c.rank === void 0;
+  const addsService = sniff.service !== void 0 && c.service === void 0;
+  if (!addsKind && !addsRank && !addsService) return c;
+  return {
+    ...c,
+    kind: sniff.kind,
+    confidence: "inferred",
+    reason: `content: ${sniff.reason}`,
+    ...sniff.rank !== void 0 ? { rank: sniff.rank } : {},
+    ...sniff.service !== void 0 ? { service: sniff.service } : {}
+  };
+}
 async function buildIndex(rootDir) {
   let relPaths;
+  let realRoot;
   try {
-    relPaths = await (0,
+    relPaths = await (0, import_promises5.readdir)(rootDir, { recursive: true });
+    realRoot = await (0, import_promises5.realpath)(rootDir);
   } catch {
     return [];
   }
+  const dirConfined = /* @__PURE__ */ new Map();
+  const isDirConfined = (dir) => {
+    let verdict = dirConfined.get(dir);
+    if (verdict === void 0) {
+      verdict = (0, import_promises5.realpath)(dir).then(
+        (realDir) => realDir === realRoot || realDir.startsWith(realRoot + import_node_path5.sep),
+        () => false
+        // an unresolvable directory (broken/cyclic symlink) → drop its entries
+      );
+      dirConfined.set(dir, verdict);
+    }
+    return verdict;
+  };
   const settled = await Promise.all(
     relPaths.map(async (rel) => {
       const relPath = rel.split("\\").join("/");
       const absPath = (0, import_node_path5.join)(rootDir, rel);
       try {
-        const s = await (0,
+        const s = await (0, import_promises5.lstat)(absPath);
         if (s.isSymbolicLink() || !s.isFile()) return null;
-
+        if (!await isDirConfined((0, import_node_path5.dirname)(absPath))) return null;
+        const c = await refineWithContent(classifyFile(relPath), absPath);
         return {
           relPath,
           absPath,
           kind: c.kind,
+          confidence: c.confidence,
+          ...c.reason !== void 0 ? { reason: c.reason } : {},
           ...c.rank !== void 0 ? { rank: c.rank } : {},
+          ...c.inferredRank !== void 0 ? { inferredRank: c.inferredRank } : {},
           ...c.service !== void 0 ? { service: c.service } : {},
           ...c.host !== void 0 ? { host: c.host } : {},
           ...c.component !== void 0 ? { component: c.component } : {},
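`buildIndex` now resolves each file's parent directory with `realpath` and drops the file unless that directory is the bundle root or lies beneath it. As a result, a symlinked directory cannot pull outside files into the index. Verdicts are cached as one promise per directory, and a directory that cannot be resolved counts as outside. The sketch below shows the same check standalone, using Node's `fs/promises` `realpath`; the factory name `makeConfinementCheck` is hypothetical.

```javascript
const fsp = require("node:fs/promises");
const path = require("node:path");

// Returns an async predicate: does `dir` really live under `realRoot`?
function makeConfinementCheck(realRoot) {
  // dir -> Promise<boolean>; concurrent callers share a single realpath call per directory.
  const cache = new Map();
  return (dir) => {
    let verdict = cache.get(dir);
    if (verdict === undefined) {
      verdict = fsp.realpath(dir).then(
        // Compare against root + separator so a sibling with the same prefix does not pass.
        (realDir) => realDir === realRoot || realDir.startsWith(realRoot + path.sep),
        () => false // broken or cyclic symlink: treat as outside the bundle
      );
      cache.set(dir, verdict);
    }
    return verdict;
  };
}
```

The check compares against `realRoot + path.sep` rather than a bare prefix. This stops a sibling such as `/data/bundle-old` from passing for the root `/data/bundle`; both paths are hypothetical.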
@@ -4513,6 +4492,25 @@ async function buildIndex(rootDir) {
 
 // src/bundle/BundleSource.ts
 var GPUDB_VERSION_RE = /GPUdb version\s*:\s*(\S+)/;
+var ANCHOR_KINDS = ["config", "version-info"];
+var MIN_ANCHORS_FOR_CANONICAL = 2;
+var PARTIAL_INFERRED_FRACTION = 0.25;
+function assessLayout(inventory) {
+  const anchorsPresent = ANCHOR_KINDS.filter((k) => (inventory.byKind[k] ?? 0) > 0).length;
+  const inferredFraction = inventory.totalFiles > 0 ? inventory.inferredFiles / inventory.totalFiles : 0;
+  let layout;
+  if (anchorsPresent === 0) layout = "unfamiliar";
+  else if (anchorsPresent >= MIN_ANCHORS_FOR_CANONICAL && inferredFraction < PARTIAL_INFERRED_FRACTION)
+    layout = "canonical";
+  else layout = "partial";
+  if (layout === "canonical") return { layout };
+  const bits = [`${inventory.inferredFiles}/${inventory.totalFiles} files classified by inference`];
+  if (inventory.unknownFiles > 0) bits.push(`${inventory.unknownFiles} unclassified`);
+  if (inventory.inferredRanks.length > 0)
+    bits.push(`inferred ranks ${inventory.inferredRanks.join(", ")} (unconfirmed)`);
+  const layoutWarning = layout === "unfamiliar" ? `This bundle does not match the canonical gpudb_sysinfo layout \u2014 no config/version/host-diagnostic files were found. Working from inference: ${bits.join("; ")}.` : `This bundle only partially matches the canonical layout: ${bits.join("; ")}.`;
+  return { layout, layoutWarning };
+}
 function selectLogFiles(index, opts) {
   if (opts.component !== void 0) {
     return index.filter(
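`assessLayout` grades a bundle on two things: how many anchor kinds (`config`, `version-info`) it contains, and what fraction of its files were classified by inference. The sketch below copies only the grading rule and its thresholds from the diff and omits the warning text. The inventories in the checks are hypothetical.

```javascript
// Thresholds copied from the diff: both anchor kinds, and under 25% inferred files.
const ANCHOR_KINDS = ["config", "version-info"];
const MIN_ANCHORS_FOR_CANONICAL = 2;
const PARTIAL_INFERRED_FRACTION = 0.25;

function layoutOf({ byKind, totalFiles, inferredFiles }) {
  const anchorsPresent = ANCHOR_KINDS.filter((k) => (byKind[k] ?? 0) > 0).length;
  const inferredFraction = totalFiles > 0 ? inferredFiles / totalFiles : 0;
  if (anchorsPresent === 0) return "unfamiliar";
  if (anchorsPresent >= MIN_ANCHORS_FOR_CANONICAL && inferredFraction < PARTIAL_INFERRED_FRACTION) {
    return "canonical";
  }
  return "partial";
}
```

The comparison is strict, so a bundle at exactly 25% inferred files is already `partial`. A single anchor also keeps a bundle `partial`, however few of its files were inferred.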
@@ -4546,7 +4544,8 @@ function toLineQuery(q) {
     ...q.minSeverity !== void 0 ? { minSeverity: q.minSeverity } : {},
     ...q.fromTs !== void 0 ? { fromTs: q.fromTs } : {},
     ...q.toTs !== void 0 ? { toTs: q.toTs } : {},
-    ...q.maxMatches !== void 0 ? { maxMatches: q.maxMatches } : {}
+    ...q.maxMatches !== void 0 ? { maxMatches: q.maxMatches } : {},
+    ...q.coalesceMultiline !== void 0 ? { coalesceMultiline: q.coalesceMultiline } : {}
   };
 }
 function toTimelineLineQuery(q) {
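`toLineQuery` now forwards `coalesceMultiline` with the same conditional spread it uses for its other fields. An option the caller omitted therefore stays absent, rather than becoming an own property set to `undefined`. The checks below, which use hypothetical defaults, show one practical consequence: spreading an object that carries explicit `undefined` values over a defaults object erases those defaults.

```javascript
// Copy optional fields only when defined (the pattern toLineQuery uses).
function toLineQuerySketch(q) {
  return {
    ...(q.maxMatches !== undefined ? { maxMatches: q.maxMatches } : {}),
    ...(q.coalesceMultiline !== undefined ? { coalesceMultiline: q.coalesceMultiline } : {}),
  };
}

// Naive alternative: the keys always exist, possibly holding undefined.
function naiveLineQuery(q) {
  return { maxMatches: q.maxMatches, coalesceMultiline: q.coalesceMultiline };
}

// Hypothetical defaults, used only to show the effect of spreading explicit undefined values.
const DEFAULTS = { maxMatches: 200, coalesceMultiline: false };
```

The conditional form also preserves an explicit `false`, because it tests for `undefined` rather than truthiness.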
@@ -4567,12 +4566,17 @@ async function createBundleSource(rootDir) {
   const inventoryValue = (() => {
     const byKind = {};
     const rankSet = /* @__PURE__ */ new Set();
+    const inferredRankSet = /* @__PURE__ */ new Set();
     const serviceSet = /* @__PURE__ */ new Set();
     let totalBytes = 0;
+    let inferredFiles = 0;
+    let unknownFiles = 0;
     for (const e of index) {
       byKind[e.kind] = (byKind[e.kind] ?? 0) + 1;
       totalBytes += e.sizeBytes;
-      if (e.
+      if (e.confidence === "inferred") inferredFiles++;
+      if (e.kind === "unknown") unknownFiles++;
+      if (e.rank) (e.inferredRank ? inferredRankSet : rankSet).add(e.rank);
       if (e.service) serviceSet.add(e.service);
     }
     return {
@@ -4580,14 +4584,17 @@ async function createBundleSource(rootDir) {
       totalBytes,
       byKind,
       ranks: [...rankSet].sort(),
-
+      inferredRanks: [...inferredRankSet].filter((r) => !rankSet.has(r)).sort(),
+      services: [...serviceSet].sort(),
+      inferredFiles,
+      unknownFiles
     };
   })();
   const detectVersion = async () => {
     const versionFile = findByKind("version-info");
     if (versionFile) {
       try {
-        const parsed = parseSysinfo(await (0,
+        const parsed = parseSysinfo(await (0, import_promises6.readFile)(versionFile.absPath, "utf-8"));
         for (const block of parsed.blocks) {
           const m = GPUDB_VERSION_RE.exec(block.output);
           if (m) return m[1];
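The inventory now puts each rank into one of two sets, according to its `inferredRank` flag. `inferredRanks` keeps only the ranks that no confidently classified file already establishes, so a rank seen both ways is reported once, as confirmed. A sketch of that bookkeeping over a hypothetical index:

```javascript
// Split ranks into confirmed vs. inferred-only, mirroring the inventory logic.
function splitRanks(index) {
  const confirmed = new Set();
  const inferred = new Set();
  for (const e of index) {
    if (e.rank) (e.inferredRank ? inferred : confirmed).add(e.rank);
  }
  return {
    ranks: [...confirmed].sort(),
    // Drop anything a confidently classified file already proves.
    inferredRanks: [...inferred].filter((r) => !confirmed.has(r)).sort(),
  };
}
```

Both lists use the default `Array.prototype.sort`, which compares strings. On a cluster with ten or more ranks, `r10` is therefore listed before `r2`.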
@@ -4598,7 +4605,7 @@ async function createBundleSource(rootDir) {
     const configFile = findByKind("config");
     if (configFile) {
       try {
-        const entries = parseIni(await (0,
+        const entries = parseIni(await (0, import_promises6.readFile)(configFile.absPath, "utf-8"));
         return entries.find((e) => e.key === "file_version")?.value;
       } catch {
         return void 0;
@@ -4610,7 +4617,7 @@ async function createBundleSource(rootDir) {
     const configFile = index.find((e) => e.kind === "config" && e.relPath.endsWith("gpudb.conf")) ?? findByKind("config");
     if (!configFile) return { error: "no gpudb.conf found in bundle" };
     try {
-      const entries = parseIni(await (0,
+      const entries = parseIni(await (0, import_promises6.readFile)(configFile.absPath, "utf-8"));
       return { entries: filterIni(entries, opts), file: configFile.relPath };
     } catch (err) {
       return { error: err instanceof Error ? err.message : String(err) };
@@ -4624,7 +4631,7 @@ async function createBundleSource(rootDir) {
     const abs = resolve3(entry.relPath);
     if (!abs) return { error: `path "${name}" escapes the bundle root` };
     try {
-      return parseSysinfo(await (0,
+      return parseSysinfo(await (0, import_promises6.readFile)(abs, "utf-8"));
     } catch (err) {
       return { error: err instanceof Error ? err.message : String(err) };
     }
@@ -4683,7 +4690,7 @@ async function createBundleSource(rootDir) {
     const lines = [];
     for (const file of files) {
       try {
-        const content = await (0,
+        const content = await (0, import_promises6.readFile)(file.absPath, "utf-8");
         for (const line of content.split("\n")) {
           const trimmed = line.trim();
           if (trimmed !== "" && !/^-{3,}$/.test(trimmed)) lines.push(trimmed);
@@ -4707,13 +4714,281 @@ async function createBundleSource(rootDir) {
   };
 }
 
+// src/bundle/known-files.ts
+var KNOWN_BUNDLE_FILES = {
+  // Host resources
+  "cpu.txt": "CPU topology, NUMA, and interrupts (lscpu, numactl, /proc/cpuinfo, /proc/interrupts)",
+  "mem.txt": "Memory usage, /proc/meminfo, and transparent-hugepage setting (free -m -t)",
+  "disk.txt": "Filesystems, mounts, block devices, and disk stats (df, mount, lsblk, fdisk, /etc/fstab, /proc/diskstats)",
+  "gpu.txt": "NVIDIA GPU inventory and state (nvidia-smi -L/-q, modinfo nvidia)",
+  "net.txt": "Network interfaces, sockets, and DNS (hostname, ifconfig, netstat, /etc/resolv.conf)",
+  // Processes
+  "ps.txt": "Full process list (ps -auxww, ps -ejHlfww)",
+  "gpudb-exe.txt": "Running gpudb processes (ps auxfwww | grep gpudb)",
+  // Hardware / firmware
+  "dmidecode.txt": "BIOS / DMI hardware inventory (dmidecode)",
+  "lshw.txt": "Hardware listing (lshw -short -numeric)",
+  "pci.txt": "PCI devices and I/O resources (lspci, /proc/ioports, /proc/iomem)",
+  // Kernel / OS
+  "dmesg.txt": "Kernel ring buffer \u2014 boot and runtime kernel messages (dmesg -T)",
+  "dmesg-timestamp.txt": "Kernel ring buffer with human-readable timestamps",
+  "sysctl.txt": "Kernel tunables (sysctl -a)",
+  "sys.txt": "OS identity, uptime, ulimits, kernel cmdline, clocksource, and loaded modules (uname, ulimit, /proc/cmdline, lsmod)",
+  "lsof.txt": "Open files and network sockets (lsof -n -P)",
+  "lslocks.txt": "Held file locks (lslocks)",
+  // Packages / linker / accounts
+  "deb.txt": "Installed Debian packages and verification (dpkg -l, dpkg -V)",
+  "rpm.txt": "Installed RPM packages (rpm -qa)",
+  "ld.so.conf.txt": "Dynamic-linker library search paths (/etc/ld.so.conf)",
+  "user.txt": "Users, groups, and the gpudb service account (whoami, id, /etc/passwd, /etc/group)",
+  "sudoers.txt": "Sudo configuration (/etc/sudoers)",
+  "etc_profile.txt": "Login shell profile (/etc/profile)",
+  "etc_bashrc.txt": "System bashrc (/etc/bashrc)",
+  "etc_host.txt": "Static hostname resolution (/etc/hosts)",
+  // Kinetica-specific
+  "gpudb.txt": "GPUdb version/build, binary md5 + ldd, and the captured gpudb.conf / gpudb_logger.conf ($GPUDB_EXE -v)",
+  "gpudb_core_etc_gpudb.conf": "The live gpudb.conf at capture time (the database's main config)",
+  "gpudb_core_etc_gpudb_logger.conf": "The logging configuration (gpudb_logger.conf)",
+  "loki-info.txt": "Loki log-index stats: labels, series, and per-class volume (logcli)",
+  "sql-queries.txt": "SQL query log extracted from Loki (logcli)",
+  "tables.txt": "Table schemas and column types (gadmin --schema), when collected",
+  "logfiles.txt": "Manifest: the log directories/files the collector enumerated",
+  "errors.txt": "Collection commands that FAILED during capture (Evidence Gaps)",
+  "proc-logs-erros.txt": "Per-process log-collection failures during capture (Evidence Gaps)"
+};
+var KIND_DESCRIPTIONS = {
+  "core-log": "Per-rank rolling Kinetica core log (the primary incident narrative)",
+  "component-log": "Component service log (sql-engine, httpd, reveal, tomcat, stats, \u2026)",
+  "loki-tail": "Last-2h Loki tail for a service (small; searched only when no core logs exist)",
+  "process-info": "Per-rank process snapshot: command line, PID, and environment (/proc/<pid>/environ)",
+  config: "Kinetica configuration file",
+  "version-info": "GPUdb version/build information",
+  "collection-errors": "Collection commands that FAILED during capture (Evidence Gaps)",
+  manifest: "Manifest of log directories/files the collector enumerated"
+};
+function basename2(relPath) {
+  const parts = relPath.split("/");
+  return parts[parts.length - 1] ?? relPath;
+}
+function describeBundleFile(entry) {
+  return KNOWN_BUNDLE_FILES[basename2(entry.relPath)] ?? KIND_DESCRIPTIONS[entry.kind] ?? "";
+}
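`describeBundleFile` looks up a description in three steps: the exact basename first, then the classified kind, and finally an empty string. A plain object lookup like the one in the diff also resolves inherited keys. A file literally named `constructor` would therefore get the `Object` function back instead of a string. The trimmed sketch below adds an `Object.hasOwn` guard for that case. The guard is a suggested deviation from the source, not its current behavior.

```javascript
// Trimmed copies of two entries from each table above.
const KNOWN_BUNDLE_FILES = {
  "mem.txt": "Memory usage, /proc/meminfo, and transparent-hugepage setting (free -m -t)",
  "errors.txt": "Collection commands that FAILED during capture (Evidence Gaps)",
};
const KIND_DESCRIPTIONS = {
  "core-log": "Per-rank rolling Kinetica core log (the primary incident narrative)",
  config: "Kinetica configuration file",
};

// Own-property lookup, so inherited names such as "constructor" are not treated as descriptions.
const lookup = (table, key) => (Object.hasOwn(table, key) ? table[key] : undefined);

function describeBundleFile(entry) {
  const base = entry.relPath.split("/").pop();
  // Precedence: exact basename, then classified kind, then "".
  return lookup(KNOWN_BUNDLE_FILES, base) ?? lookup(KIND_DESCRIPTIONS, entry.kind) ?? "";
}
```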
+
+// src/tools/bundle/list-files.ts
+var BundleListFilesSchema = import_zod18.z.object({
+  kind: import_zod18.z.string().optional()
+});
+var MAX_UNKNOWN_LISTED = 40;
+async function bundleListFiles(source, args = {}) {
+  const all = source.listFiles();
+  const filtered = args.kind ? all.filter((e) => e.kind === args.kind) : all;
+  const inventory = source.inventory();
+  const {
+    totalFiles,
+    totalBytes,
+    byKind,
+    ranks,
+    inferredRanks,
+    services,
+    inferredFiles,
+    unknownFiles
+  } = inventory;
+  const { layout, layoutWarning } = assessLayout(inventory);
+  const version = await source.detectVersion();
+  const errors = await source.collectionErrors();
+  const files = filtered.map((e) => ({
+    file: e.relPath,
+    kind: e.kind,
+    // How sure the classification is: exact (canonical name) | inferred (heuristic) | weak.
+    confidence: e.confidence,
+    ...e.reason !== void 0 ? { why: e.reason } : {},
+    rank: e.rank ?? "",
+    size_kb: Math.round(e.sizeBytes / 1024),
+    // What the file contains — so the agent can pick the right one without reading it.
+    description: describeBundleFile(e)
+  }));
+  const unknownPaths = all.filter((e) => e.kind === "unknown").map((e) => e.relPath);
+  return {
+    ok: true,
+    data: {
+      detected_version: version ?? "unknown",
+      // How well the bundle matches the canonical gpudb_sysinfo layout.
+      layout_match: layout,
+      ...layoutWarning !== void 0 ? { layout_note: layoutWarning } : {},
+      ranks_present: ranks.join(", ") || "none",
+      ...inferredRanks.length > 0 ? { inferred_ranks_unconfirmed: inferredRanks.join(", ") } : {},
+      services_present: services.join(", ") || "none",
+      total_files: totalFiles,
+      total_size_mb: Number((totalBytes / 1e6).toFixed(1)),
+      counts_by_kind: byKind,
+      inferred_files: inferredFiles,
+      unknown_files: unknownFiles,
+      ...unknownPaths.length > 0 ? {
+        unknown_file_paths: unknownPaths.slice(0, MAX_UNKNOWN_LISTED),
+        ...unknownPaths.length > MAX_UNKNOWN_LISTED ? { unknown_file_paths_truncated: unknownPaths.length - MAX_UNKNOWN_LISTED } : {}
+      } : {},
+      failed_collections: errors.length,
+      files
+    }
+  };
+}
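`bundleListFiles` caps the list of unclassified paths at `MAX_UNKNOWN_LISTED` (40). When more exist, it adds `unknown_file_paths_truncated`, which holds the number of paths left out rather than a boolean. The same capping logic is pulled into a standalone helper below for illustration; the name `summarizeUnknown` is hypothetical.

```javascript
const MAX_UNKNOWN_LISTED = 40;

// Bounded sample plus an overflow count; keys that would be empty are omitted entirely.
function summarizeUnknown(paths, limit = MAX_UNKNOWN_LISTED) {
  if (paths.length === 0) return {};
  return {
    unknown_file_paths: paths.slice(0, limit),
    ...(paths.length > limit ? { unknown_file_paths_truncated: paths.length - limit } : {}),
  };
}
```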
+
+// src/tools/bundle/log-timeline.ts
+var import_zod19 = require("zod");
+var BundleLogTimelineSchema = import_zod19.z.object({
+  min_severity: import_zod19.z.enum(["INFO", "WARN", "UERR", "ERROR", "FATAL"]).optional(),
+  granularity: import_zod19.z.enum(["day", "hour", "minute"]).optional(),
+  rank: import_zod19.z.string().describe('Numeric rank only, e.g. "r0"/"r1". For the host manager use host_manager.').optional(),
+  host_manager: import_zod19.z.boolean().describe("Bucket the host-manager (hm) log \u2014 a singleton service, not a rank.").optional(),
+  component: import_zod19.z.string().optional(),
+  include_components: import_zod19.z.boolean().optional()
+});
+async function bundleLogTimeline(source, args = {}) {
+  const query3 = {
+    ...args.min_severity !== void 0 ? { minSeverity: args.min_severity } : {},
+    ...args.granularity !== void 0 ? { granularity: args.granularity } : {},
+    ...args.rank !== void 0 ? { rank: args.rank } : {},
+    ...args.host_manager !== void 0 ? { hostManager: args.host_manager } : {},
+    ...args.component !== void 0 ? { component: args.component } : {},
+    ...args.include_components !== void 0 ? { includeComponents: args.include_components } : {}
+  };
+  const result = await source.logTimeline(query3);
+  const severities = [...new Set(result.buckets.flatMap((b) => Object.keys(b.counts)))];
+  const order = ["FATAL", "ERROR", "UERR", "WARN", "INFO"];
+  severities.sort((a, b) => order.indexOf(a) - order.indexOf(b));
+  const rows = result.buckets.map((b) => {
+    const row = { time_bucket: b.bucket };
+    for (const sev of severities) row[sev] = b.counts[sev] ?? 0;
+    row.total = b.total;
+    return row;
+  });
+  return {
+    ok: true,
+    note: result.totalCounted === 0 ? "No lines at or above the severity threshold \u2014 try a lower min_severity." : `${result.totalCounted} event(s) across ${result.buckets.length} bucket(s), ${result.filesScanned.length} file(s).`,
+    data: {
+      lines_scanned: result.linesScanned,
+      files_scanned: result.filesScanned.join(", ") || "none",
+      buckets: rows
+    }
+  };
+}
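`bundleLogTimeline` pivots each bucket's sparse `counts` map into a flat row. Only severities that occur somewhere become columns, ordered from `FATAL` down to `INFO`. Missing cells are filled with 0, and `total` comes last. A standalone sketch of that pivot follows; the bucket labels in the checks are hypothetical.

```javascript
const ORDER = ["FATAL", "ERROR", "UERR", "WARN", "INFO"];

function pivotBuckets(buckets) {
  // Only severities seen in at least one bucket become columns.
  const severities = [...new Set(buckets.flatMap((b) => Object.keys(b.counts)))];
  severities.sort((a, b) => ORDER.indexOf(a) - ORDER.indexOf(b));
  return buckets.map((b) => {
    const row = { time_bucket: b.bucket };
    for (const sev of severities) row[sev] = b.counts[sev] ?? 0; // absent severity -> 0
    row.total = b.total;
    return row;
  });
}
```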
+
+// src/tools/bundle/search-logs.ts
+var import_zod20 = require("zod");
+var BundleSearchLogsSchema = import_zod20.z.object({
+  regex: import_zod20.z.string().optional(),
+  min_severity: import_zod20.z.enum(["INFO", "WARN", "UERR", "ERROR", "FATAL"]).optional(),
+  from_ts: import_zod20.z.string().optional(),
+  to_ts: import_zod20.z.string().optional(),
+  rank: import_zod20.z.string().describe('Numeric rank only, e.g. "r0"/"r1". For the host manager use host_manager.').optional(),
+  host_manager: import_zod20.z.boolean().describe("Search the host-manager (hm) log \u2014 a singleton service, not a rank.").optional(),
+  component: import_zod20.z.string().optional(),
+  include_components: import_zod20.z.boolean().optional(),
+  include_multiline: import_zod20.z.boolean().describe(
+    "Reconstruct multi-line log records: append continuation lines (those with no timestamp) to each match. Use this to capture a full SQL statement on an 'Executing SQL:' line \u2014 the query often spans many lines because the SQL has embedded newlines, and a plain match shows only its first line. Works on the rolling core logs (logs-local/); Loki per-rank tails (logs/rankN.log) keep only the statement's first line, so there are no continuation lines to stitch there."
+  ).optional(),
+  max_matches: import_zod20.z.number().int().min(1).max(1e3).optional()
+});
+async function bundleSearchLogs(source, args = {}) {
+  const query3 = {
+    ...args.regex !== void 0 ? { regex: args.regex } : {},
+    ...args.min_severity !== void 0 ? { minSeverity: args.min_severity } : {},
+    ...args.from_ts !== void 0 ? { fromTs: args.from_ts } : {},
+    ...args.to_ts !== void 0 ? { toTs: args.to_ts } : {},
+    ...args.rank !== void 0 ? { rank: args.rank } : {},
+    ...args.host_manager !== void 0 ? { hostManager: args.host_manager } : {},
+    ...args.component !== void 0 ? { component: args.component } : {},
+    ...args.include_components !== void 0 ? { includeComponents: args.include_components } : {},
+    ...args.include_multiline !== void 0 ? { coalesceMultiline: args.include_multiline } : {},
+    ...args.max_matches !== void 0 ? { maxMatches: args.max_matches } : {}
+  };
+  const result = await source.searchLogs(query3);
+  const note = result.capped ? `Showing ${result.matches.length} of ${result.totalMatched} matches across ${result.filesScanned.length} file(s) (display capped). Narrow with a tighter regex, severity, or time window to surface the specific lines.` : `${result.totalMatched} match(es) across ${result.filesScanned.length} file(s).`;
+  return {
+    ok: true,
+    note,
+    data: {
+      total_matched: result.totalMatched,
+      lines_scanned: result.linesScanned,
+      files_scanned: result.filesScanned.join(", ") || "none",
+      capped: result.capped,
+      matches: result.matches.map((m) => ({
+        file: m.file,
+        line: m.lineNumber,
+        timestamp: m.timestamp ?? "",
+        severity: m.severity ?? "",
+        rank: m.rank ?? "",
+        message: m.message
+      }))
+    }
+  };
+}
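The `include_multiline` flag is forwarded as `coalesceMultiline`. According to its description, continuation lines (those without a timestamp) are appended to each match, so a multi-line `Executing SQL:` statement comes back whole. The coalescing code itself is not part of this diff. What follows is only a hypothetical sketch of the described behavior: the timestamp regex and the sample log format in the checks are assumptions, not the package's `parseLogLine`.

```javascript
// Hypothetical timestamp test; the real code uses the package's own log-line parser.
const TS = /^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}/;

// A timestamped line opens a record; untimestamped lines extend the previous one.
function coalesce(lines) {
  const records = [];
  for (const line of lines) {
    if (TS.test(line) || records.length === 0) records.push(line);
    else records[records.length - 1] += "\n" + line;
  }
  return records;
}

// Match on the record's first (timestamped) line, but return the whole record.
function searchRecords(lines, regex) {
  return coalesce(lines).filter((record) => regex.test(record.split("\n")[0]));
}
```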
+
+// src/tools/bundle/read-config.ts
+var import_zod21 = require("zod");
+var BundleReadConfigSchema = import_zod21.z.object({
+  section: import_zod21.z.string().optional(),
+  key: import_zod21.z.string().optional()
+});
+async function bundleReadConfig(source, args = {}) {
+  const result = await source.readConfig({
+    ...args.section !== void 0 ? { section: args.section } : {},
+    ...args.key !== void 0 ? { key: args.key } : {}
+  });
+  if ("error" in result) {
+    return { ok: false, status: 0, error: result.error, raw: "" };
+  }
+  if (result.entries.length === 0 && args.section !== void 0) {
+    const all = await source.readConfig(args.key !== void 0 ? { key: args.key } : {});
+    const sections = "error" in all ? [] : [...new Set(all.entries.map((e) => e.section))].sort();
+    const sectionList = sections.map((s) => s === "" ? "(flat/top-level)" : s).join(", ");
+    return {
+      ok: true,
+      note: `No entries in section "${args.section}" of ${result.file}. gpudb.conf is largely flat \u2014 retry filtering by key only. Sections present: ${sectionList || "(none)"}.`,
+      data: { section_not_found: args.section, available_sections: sections }
+    };
+  }
+  return {
+    ok: true,
+    note: `${result.entries.length} entr(y/ies) from ${result.file}.`,
+    data: result.entries.map((e) => ({ section: e.section, key: e.key, value: e.value }))
+  };
+}
|
|
4957
|
+
|
|
4958
|
+
// src/tools/bundle/read-sysinfo.ts
|
|
4959
|
+
var import_zod22 = require("zod");
|
|
4960
|
+
var BundleReadSysinfoSchema = import_zod22.z.object({
|
|
4961
|
+
name: import_zod22.z.string().min(1)
|
|
4962
|
+
});
|
|
4963
|
+
async function bundleReadSysinfo(source, args) {
|
|
4964
|
+
const result = await source.readSysinfo(args.name);
|
|
4965
|
+
if ("error" in result) {
|
|
4966
|
+
return { ok: false, status: 0, error: result.error, raw: "" };
|
|
4967
|
+
}
|
|
4968
|
+
return {
|
|
4969
|
+
ok: true,
|
|
4970
|
+
data: {
|
|
4971
|
+
...result.header !== void 0 ? { source_file: result.header } : {},
|
|
4972
|
+
blocks: result.blocks.map((b) => ({
|
|
4973
|
+
command: b.command,
|
|
4974
|
+
...b.exitCode !== void 0 ? { exit_code: b.exitCode } : {},
|
|
4975
|
+
output: b.output
|
|
4976
|
+
}))
|
|
4977
|
+
}
|
|
4978
|
+
};
|
|
4979
|
+
}
|
|
4980
|
+
|
|
4981
|
+
// src/tools/bundle/load-bundle.ts
|
|
4982
|
+
var import_zod23 = require("zod");
|
|
4983
|
+
|
|
4710
4984
|
// src/bundle/verify-bundle.ts
|
|
4985
|
+
var import_promises7 = require("fs/promises");
|
|
4711
4986
|
var ARCHIVE_RE = /\.(tgz|tar\.gz|tar|gz|zip)$/i;
|
|
4712
4987
|
var EXPECTED_KINDS = ["config", "core-log"];
|
|
4713
4988
|
async function verifyBundle(bundlePath) {
|
|
4714
4989
|
let info;
|
|
4715
4990
|
try {
|
|
4716
|
-
info = await (0,
|
|
4991
|
+
info = await (0, import_promises7.stat)(bundlePath);
|
|
4717
4992
|
} catch {
|
|
4718
4993
|
return { ok: false, error: `bundle path does not exist: ${bundlePath}` };
|
|
4719
4994
|
}
|
|
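The `bundleReadConfig` tool added above handles a `section` filter that matches nothing by listing the sections that do exist. Entries that appear before any header are reported under the empty section `""`, shown as `(flat/top-level)`. The parser behind `source.readConfig` is not part of this diff, so what follows is only a minimal sketch under stated assumptions:

- `parseConf` and `lookupConf` are hypothetical names.
- The INI-style syntax (`# comment`, `[section]`, `key = value`) is assumed, not taken from the package.

```typescript
// Hypothetical sketch: NOT the package's parser, which this diff does not show.
// The entry shape mirrors the { section, key, value } rows bundleReadConfig returns.
interface ConfEntry {
  section: string; // "" = key appeared before any [section] header (flat/top-level)
  key: string;
  value: string;
}

// Assumed syntax: "# comment" lines, "[section]" headers, and "key = value" lines.
function parseConf(text: string): ConfEntry[] {
  const entries: ConfEntry[] = [];
  let section = "";
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || line.charAt(0) === "#") continue;
    const header = /^\[(.+)\]$/.exec(line);
    if (header !== null) {
      section = header[1].trim();
      continue;
    }
    const eq = line.indexOf("=");
    if (eq === -1) continue; // not a key/value line; skipped in this sketch
    entries.push({ section, key: line.slice(0, eq).trim(), value: line.slice(eq + 1).trim() });
  }
  return entries;
}

// Same shape of fallback as bundleReadConfig: when a section filter matches
// nothing, report which sections exist for the (optional) key filter instead.
function lookupConf(entries: ConfEntry[], section?: string, key?: string) {
  const byKey = key === undefined ? entries : entries.filter((e) => e.key === key);
  const hits = section === undefined ? byKey : byKey.filter((e) => e.section === section);
  const sections: string[] = [];
  if (hits.length === 0 && section !== undefined) {
    for (const e of byKey) if (sections.indexOf(e.section) === -1) sections.push(e.section);
    sections.sort();
  }
  return { entries: hits, availableSections: sections };
}
```

Take a file with no headers at all. Any section filter comes back empty with `availableSections: [""]`, which is the case where the tool's note suggests a key-only retry.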
@@ -4733,12 +5008,15 @@ async function verifyBundle(bundlePath) {
    }
    const missingExpected = EXPECTED_KINDS.filter((k) => (inventory.byKind[k] ?? 0) === 0);
    const kineticaVersion = await bundleSource.detectVersion();
+   const { layout, layoutWarning } = assessLayout(inventory);
    return {
      ok: true,
      bundleSource,
      ...kineticaVersion !== void 0 ? { kineticaVersion } : {},
      inventory,
-     missingExpected
+     missingExpected,
+     layout,
+     ...layoutWarning !== void 0 ? { layoutWarning } : {}
    };
  }
@@ -4957,7 +5235,7 @@ Before gathering evidence, announce a brief 2-3 line plan: restate the issue, li

  ### Round 1 \u2014 Orient

- - ${t}kinetica_bundle_list_files${t} \u2014 **ALWAYS FIRST.** Learn the detected version, which ranks are present, what file kinds exist, and how many collections failed.
+ - ${t}kinetica_bundle_list_files${t} \u2014 **ALWAYS FIRST.** Learn the detected version, which ranks are present, what file kinds exist, and how many collections failed. Check ${t}layout_match${t}: if it is not ${t}canonical${t}, this bundle is off-shape (e.g. a logs-only dump) \u2014 read the ${t}layout_note${t}, treat any ${t}unknown_file_paths${t} as evidence to inspect by hand (open one with ${t}kinetica_bundle_read_sysinfo${t}), and trust ${t}ranks_present${t} over ${t}inferred_ranks_unconfirmed${t}. See the support-bundle reference ("When the bundle doesn't match the expected layout").
  - ${t}kinetica_bundle_log_timeline${t} (min_severity: WARN) \u2014 get the incident shape: when did WARN/ERROR/FATAL spike, and on which rank?

  ### Round 2 \u2014 Drill Down
@@ -5077,7 +5355,7 @@ function createBundleHolder(initial) {
  }

  // src/cli/pick-bundle-path.ts
- var
+ var import_promises8 = require("fs/promises");
  var import_node_path7 = require("path");
  function isPermissionError(err) {
    if (typeof err !== "object" || err === null || !("code" in err)) return false;
@@ -5092,7 +5370,7 @@ async function listDirectoryCandidates(term) {
    const resolved = (0, import_node_path7.resolve)(baseDir);
    let entries;
    try {
-     entries = await (0,
+     entries = await (0, import_promises8.readdir)(resolved, { withFileTypes: true });
    } catch (err) {
      if (isPermissionError(err)) return { kind: "denied", dir: resolved };
      return { kind: "ok", candidates: [] };
@@ -6087,7 +6365,7 @@ async function logout() {

  // src/session/env-file.ts
  var import_fs2 = require("fs");
- var
+ var import_promises9 = require("fs/promises");
  var import_path2 = require("path");
  var import_picocolors11 = __toESM(require("picocolors"));
  function parseEnvContent(content) {
@@ -6179,11 +6457,11 @@ async function offerSaveCredentials(url, user, dir) {
      const filePath = (0, import_path2.join)(dir ?? process.cwd(), ".env");
      let existing;
      try {
-       existing = await (0,
+       existing = await (0, import_promises9.readFile)(filePath, "utf8");
      } catch {
      }
      const content = buildEnvContent(url, user, existing);
-     await (0,
+     await (0, import_promises9.writeFile)(filePath, content, "utf8");
      console.error(import_picocolors11.default.dim("Saved to .env"));
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
@@ -6624,7 +6902,12 @@ async function main() {
      process.exitCode = 1;
      return;
    }
-   if (result.
+   if (result.layoutWarning !== void 0) {
+     process.stderr.write(
+       import_picocolors15.default.yellow(`Warning: ${result.layoutWarning} Diagnosing with what is present.
+ `)
+     );
+   } else if (result.missingExpected.length > 0) {
      process.stderr.write(
        import_picocolors15.default.yellow(
          `Warning: bundle is missing expected artifact(s): ${result.missingExpected.join(", ")}. Diagnosing with what is present.
package/knowledge/references/bundle/support-bundle.md
CHANGED
@@ -20,6 +20,7 @@ Severity order for filtering is `WARN < UERR < ERROR < FATAL`, so `min_severity=

  - The logs are large (a rank log can exceed 100k lines). NEVER ask for a whole file. Use `kinetica_bundle_log_timeline` to localize, then `kinetica_bundle_search_logs` with a tight time window + severity to extract only relevant lines. The match cap is shared across files — if you see "capped", narrow the query rather than asking for more.
  - You can pass a timeline bucket label straight into `from_ts`/`to_ts` (e.g. `2026-06-11 15` searches that whole hour) — partial timestamps are widened to cover the full period.
+ - A single log record can span multiple physical lines when a logged value (notably a SQL statement) contains embedded newlines — the continuation lines have no timestamp. A plain search returns only the first line. Pass `include_multiline: true` to stitch the continuation lines back onto each match and recover the whole record. See "Finding a crash's triggering SQL".
  - Timestamps are plain local strings without a timezone; compare them lexically and treat cross-rank timing cautiously.
  - **Ranks vs. the host manager:** `rank` selects a numeric rank (`r0`, `r1`, …) only. The host manager (`core-gpudb-rolling-hm.log`) is a singleton service, NOT a rank — search or timeline it with `host_manager: true`, never `rank: "hm"`. By default both `log_timeline` and `search_logs` already cover the host manager along with the numeric ranks; `kinetica_bundle_list_files` lists it under `services_present`.
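This hunk's reference text relies on two mechanics. First, partial timestamps are widened to cover a whole period. Second, timestamps are compared lexically. The hunk header also gives the severity order `WARN < UERR < ERROR < FATAL`. The sketch below shows how those pieces could fit together. It is a minimal sketch under these assumptions:

- Timestamps use a zero-padded `YYYY-MM-DD HH:MM:SS.mmm` layout. The exact Kinetica log format and the tools' internals are not shown here.
- `widenFrom`, `widenTo` and `matchesQuery` are hypothetical names.

```typescript
// Assumption: log timestamps are fixed-width local strings "YYYY-MM-DD HH:MM:SS.mmm".
// Every field is zero-padded, so lexical order is also chronological order.
const LOWEST_TS = "0000-01-01 00:00:00.000";
const HIGHEST_TS = "9999-12-31 23:59:59.999";

// Severity order quoted in the reference: WARN < UERR < ERROR < FATAL.
const SEVERITY_ORDER = ["WARN", "UERR", "ERROR", "FATAL"];

interface LogLine {
  timestamp: string;
  severity: string;
  message: string;
}

// Widen a partial label such as "2026-06-11 15" to the first/last instant it covers.
// The upper bound need not be a real date ("2026-06-31 ..."); it only has to sort last.
function widenFrom(partial: string): string {
  return partial + LOWEST_TS.slice(partial.length);
}

function widenTo(partial: string): string {
  return partial + HIGHEST_TS.slice(partial.length);
}

// Hypothetical filter: severity at or above minSeverity, timestamp inside [fromTs, toTs].
// Severities outside the list (e.g. INFO) never pass; minSeverity must be one of the four.
function matchesQuery(rec: LogLine, minSeverity: string, fromTs?: string, toTs?: string): boolean {
  const have = SEVERITY_ORDER.indexOf(rec.severity);
  if (have === -1 || have < SEVERITY_ORDER.indexOf(minSeverity)) return false;
  if (fromTs !== undefined && rec.timestamp < widenFrom(fromTs)) return false;
  if (toTs !== undefined && rec.timestamp > widenTo(toTs)) return false;
  return true;
}
```

Pass the same bucket label as both bounds (for example `2026-06-11 15`) and the filter selects exactly that hour. This matches the reference's advice to feed a timeline bucket straight into `from_ts`/`to_ts`.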
@@ -29,11 +30,13 @@ When a worker rank segfaults mid-query, that rank's log holds the **backtrace**

  Workflow, given a `JobId` from a worker's crash stack:

- 1. `kinetica_bundle_search_logs` with `rank: "r0"
- 2.
- 3. **
+ 1. `kinetica_bundle_search_logs` with `rank: "r0"`, `regex` = the JobId, **and `include_multiline: true`**. r0 logs the `/execute/sql` receipt (submitting user), the `Sql/SqlDriver.cpp … Executing SQL:` line, and per-operation endpoint lines.
+ 2. **Read the full statement straight from the `Executing SQL:` line — do not reconstruct it.** Kinetica logs the SQL verbatim, so a real query spans MANY physical lines: `FROM …`, `JOIN …`, `WHERE …` each land on their own line with no timestamp prefix. Those continuation lines belong to the same log record; `include_multiline: true` stitches them back onto the match so you see the WHOLE query. WITHOUT it, a match is only the first physical line (e.g. `… Executing SQL: SELECT c."circuitId", c."circuitRef"`) and you would wrongly report the query as "truncated." Quote the statement verbatim from this single (now multi-line) match.
+ 3. **Cache-hit fallback only:** if `Found plan for the SQL in cache` precedes the job, Kinetica logs `Executing SQL:` as just the statement keyword (e.g. a bare `SELECT` or `EXECUTE PROCEDURE …`) with no continuation lines to stitch. ONLY then fall back to the per-operation endpoint lines (`Endpoint_aggregate_group_by.cpp`, filter/join endpoints): they carry `table:`, `column_names:`/`aliases:` (the SELECT list), and `expr:` (the full WHERE predicate), whose values survive a cache hit. A `datetime()`/timestamp filter showing up here often _is_ the input that triggered a parser segfault.

-
+ **Where the full multi-line query actually lives:** `include_multiline` recovers the whole statement only from the **rolling core logs** (`logs-local/core-gpudb-rolling-r0.log`), where the SQL's embedded newlines are preserved as continuation lines. The **Loki per-rank tail** (`logs/rank0.log`) keeps only the statement's first physical line — promtail captures each line as its own record, so the `FROM`/`JOIN`/`WHERE` lines are simply not in that export, and nothing can stitch them. So this workflow depends on r0 being present under `logs-local/`. If r0 exists only as a Loki tail (rare for the coordinator, but possible for a Loki-only bundle), the complete query may not be in the bundle at all — say so rather than reporting the first line as the whole query, and fall back to step 3's endpoint lines.
+
+ See `rank-architecture.md` (Where queries are logged) for why this locality holds, and "Two log families" below for rolling-vs-Loki precedence.

  ### Files of interest
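The crash-SQL workflow hinges on one fact: the continuation lines of a log record carry no timestamp. A plain search therefore sees only the first physical line, while `include_multiline: true` stitches the rest back on. The sketch below shows that stitching. It is a minimal sketch under these assumptions:

- A record begins with a `YYYY-MM-DD HH:MM:SS` prefix, and every other line continues the previous record.
- `stitchRecords` and `searchRecords` are hypothetical names, and the sample log lines used with them are invented.

```typescript
// Assumption: a new record starts with "YYYY-MM-DD HH:MM:SS"; any other line is a
// continuation of the previous record (e.g. the FROM/JOIN/WHERE lines of a logged query).
const RECORD_START = /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/;

interface StitchedRecord {
  firstLine: number; // 1-based physical line where the record begins
  text: string; // every physical line of the record, joined with "\n"
}

function stitchRecords(lines: string[]): StitchedRecord[] {
  const records: StitchedRecord[] = [];
  for (let i = 0; i < lines.length; i++) {
    const current = records.length > 0 ? records[records.length - 1] : undefined;
    if (current === undefined || RECORD_START.test(lines[i])) {
      // Timestamped line (or leading text before any timestamp): open a new record.
      records.push({ firstLine: i + 1, text: lines[i] });
    } else {
      current.text += "\n" + lines[i]; // untimestamped continuation line
    }
  }
  return records;
}

// First physical line of a stitched record.
function headOf(text: string): string {
  const nl = text.indexOf("\n");
  return nl === -1 ? text : text.slice(0, nl);
}

// Hypothetical search: the (non-global) pattern is tested against each record's
// first physical line. Without includeMultiline only that line is returned, which
// is why an unstitched match shows a SQL statement cut off after its first line.
function searchRecords(lines: string[], pattern: RegExp, includeMultiline: boolean): string[] {
  return stitchRecords(lines)
    .filter((r) => pattern.test(headOf(r.text)))
    .map((r) => (includeMultiline ? r.text : headOf(r.text)));
}
```

The same sketch also illustrates the Loki caveat. If the continuation lines were never captured (as in `logs/rank0.log`), there is nothing to stitch, and the multiline result is just the first line.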
@@ -49,6 +52,17 @@ See `rank-architecture.md` (Where queries are logged) for why this locality hold
  - **Packages / accounts:** `deb.txt` / `rpm.txt` (installed packages), `user.txt` (users/groups, gpudb account), `ld.so.conf.txt`, `etc_*.txt` (system shell/host config).
  - **Evidence Gaps:** `errors.txt` / `proc-logs-erros.txt` — collection commands that FAILED. `logfiles.txt` — manifest of log dirs the collector enumerated.

+ ### When the bundle doesn't match the expected layout
+
+ Not every bundle is a full `gpudb_sysinfo` capture. A customer may hand over a bare logs-only dump, a differently-named collector's output, or a flat directory. `kinetica_bundle_list_files` tells you how well it matched, so you never reason blindly over an unfamiliar shape:
+
+ - **`layout_match`** — `canonical` (a normal gpudb_sysinfo bundle), `partial`, or `unfamiliar` (none of the expected config/version/host-diagnostic anchors were found, e.g. a logs-only dump). When it is not `canonical`, a `layout_note` summarizes what was inferred.
+ - **Per-file `confidence`** — `exact` (matched a canonical name/location), `inferred` (recognized by a name or content heuristic — e.g. a rolling log shipped WITHOUT the `core-` prefix, or a `.out` whose first lines parsed as log lines), or `weak`. The `why` field states how each file was classified.
+ - **`inferred_ranks_unconfirmed`** — ranks seen only via a loose name guess, never confirmed by a canonical pattern or by log content. Treat these as "possible — verify," distinct from `ranks_present`, which stays trustworthy.
+ - **`unknown_file_paths`** — files that could not be classified at all. Do NOT ignore them: they may be evidence under an unfamiliar name. Read one with `kinetica_bundle_read_sysinfo` (it returns the raw content / EXEC_CMD blocks) to decide what it holds.
+
+ Inference does not make a file second-class: a rolling log recognized without its `core-` prefix is treated exactly like a canonical core log — it appears in `ranks_present` and `kinetica_bundle_search_logs`/`log_timeline` search it normally. The parsers have already been applied for you. Your job: trust `ranks_present` / `services_present`, sanity-check anything marked `inferred` or `unknown`, and state plainly in the report when the evidence came from an off-shape bundle (note the `layout_match`).
+

  ### Two log families — and why every rank is reachable

  A bundle carries per-rank logs in up to two places, and the collector host usually holds only a couple of the cluster's ranks:
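The new reference section describes `layout_match` as `canonical`, `partial` or `unfamiliar`, depending on which config/version/host-diagnostic anchors were found. It also separates confirmed ranks from loose guesses. `verifyBundle` calls `assessLayout(inventory)`, but that function's body is not in this diff. The following is therefore only a hypothetical sketch of that kind of verdict. These parts are all assumptions:

- the anchor kind names;
- the `ClassifiedFile` shape;
- the rule that only `weak` files leave a rank unconfirmed.

```typescript
// Hypothetical sketch of a three-way layout verdict; the package's assessLayout
// body is not shown in this diff, and the anchor kind names below are invented.
type Confidence = "exact" | "inferred" | "weak";
type LayoutMatch = "canonical" | "partial" | "unfamiliar";

interface ClassifiedFile {
  path: string;
  kind: string | null; // null = could not be classified at all
  confidence: Confidence;
  rank?: number; // present when the file looks like a per-rank log
}

// Stand-ins for the "config/version/host-diagnostic anchors".
const ANCHOR_KINDS = ["config", "version", "host-diagnostic"];

function addUnique(list: number[], n: number): void {
  if (list.indexOf(n) === -1) list.push(n);
}

function assessLayoutSketch(files: ClassifiedFile[]) {
  const kinds = files.map((f) => f.kind);
  const found = ANCHOR_KINDS.filter((k) => kinds.indexOf(k) !== -1).length;
  const layout: LayoutMatch =
    found === ANCHOR_KINDS.length ? "canonical" : found === 0 ? "unfamiliar" : "partial";

  // Assumption: "exact" and "inferred" files confirm a rank; a rank seen only
  // through "weak" name guesses stays unconfirmed.
  const confirmed: number[] = [];
  for (const f of files) {
    if (f.rank !== undefined && f.confidence !== "weak") addUnique(confirmed, f.rank);
  }
  const unconfirmed: number[] = [];
  for (const f of files) {
    if (f.rank !== undefined && confirmed.indexOf(f.rank) === -1) addUnique(unconfirmed, f.rank);
  }

  return {
    layout_match: layout,
    ranks_present: confirmed.sort((a, b) => a - b),
    inferred_ranks_unconfirmed: unconfirmed.sort((a, b) => a - b),
    unknown_file_paths: files.filter((f) => f.kind === null).map((f) => f.path),
  };
}
```

In this sketch, a rank found through an `inferred` file (such as a rolling log without its `core-` prefix) still lands in `ranks_present`. A rank seen only through `weak` guesses is reported separately for verification.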