@kinetica/admin-agent 0.1.2 → 0.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +83 -24
- package/dist/admin-agent.js +1838 -284
- package/knowledge/references/bundle/support-bundle.md +40 -0
- package/knowledge/references/version-quirks-7.2.md +7 -2
- package/package.json +1 -1
package/README.md (CHANGED)
@@ -3,7 +3,7 @@
 [](LICENSE)
 [](https://nodejs.org/)
 
-AI-powered diagnostic agent for [Kinetica](https://www.kinetica.com/) GPU databases. Connects to a live instance, autonomously investigates issues
+AI-powered diagnostic agent for [Kinetica](https://www.kinetica.com/) GPU databases. Connects to a live instance — or analyzes an extracted offline support bundle, or both at once — autonomously investigates issues across 28 tools, and produces structured markdown reports with evidence-backed findings and actionable remediation.
 
 Built with the [Claude Agent SDK](https://docs.anthropic.com/en/docs/agents-and-tools/claude-agent-sdk).
 
@@ -17,6 +17,7 @@ Built with the [Claude Agent SDK](https://docs.anthropic.com/en/docs/agents-and-
 - [🔑 Authentication](#authentication)
 - [💰 Session Budget](#session-budget)
 - [⚠️ Degraded Mode](#degraded-mode)
+- [📦 Offline Bundle Mode](#offline-bundle-mode)
 - [🖥️ CLI Flags](#cli-flags)
 - [🧰 Tools](#tools)
 - [💓 System Health & Monitoring](#system-health--monitoring)
@@ -26,6 +27,7 @@ Built with the [Claude Agent SDK](https://docs.anthropic.com/en/docs/agents-and-
 - [🗃️ SQL Execution (read-only)](#sql-execution-read-only)
 - [✏️ Administrative Mutations (require approval)](#administrative-mutations-require-approval)
 - [🔀 Batch Column Alter (self-approving)](#batch-column-alter-self-approving)
+- [📦 Offline Bundle Analysis (read-only)](#offline-bundle-analysis-read-only)
 - [📑 Reporting](#reporting)
 - [🔒 Security](#security)
 - [📚 Contributing Diagnostic Knowledge](#contributing-diagnostic-knowledge)
@@ -45,7 +47,8 @@ Built with the [Claude Agent SDK](https://docs.anthropic.com/en/docs/agents-and-
 **Key capabilities:**
 
 - Autonomous multi-round investigation with parallel tool calls
-- 16 read-only diagnostic tools + 4 mutation tools with interactive approval + 2 self-managing tools (reporting, batch-column alter) = **22 tools total**
+- 16 read-only diagnostic tools + 4 mutation tools with interactive approval + 2 self-managing tools (reporting, batch-column alter) = **22 live tools**, plus 6 offline bundle-analysis tools = **28 total**
+- **Offline support-bundle analysis** — diagnose from an extracted `gpudb_sysinfo` bundle (per-rank logs, `gpudb.conf`, host diagnostics) with no live connection, or attach a bundle alongside a live session to cross-check captured history against current state
 - Expert knowledge via pluggable playbooks (no code required to add new ones)
 - Schema-aware SQL — discovers actual column names at startup, never guesses
 - HTTPS-first URL resolution with explicit consent required before any HTTP fallback
@@ -100,6 +103,8 @@ Round 5 — Verification
 
 Each round uses multiple tools in parallel where possible. The agent names specific hypotheses, ties every conclusion to evidence, and never gives vague or generic advice.
 
+In [offline bundle mode](#offline-bundle-mode) (no live connection) the protocol shortens to read-only **diagnose → report**: the agent has no DB engine to mutate against or re-query, so Rounds 4–5 (remediation and verification) are dropped. When a bundle is attached _alongside_ a live connection, the full 5-round protocol applies and the agent can correlate the bundle's frozen evidence (what happened) against current live state (what's true now).
+
 ### Example Report Output
 
 After investigation, the agent produces a structured markdown report saved to `reports/`:
@@ -154,21 +159,22 @@ wasting ~28.7 MB as raw storage. Both issues have been remediated.
 ## Prerequisites
 
 - **Node.js 20+**
-- **Kinetica 7.2.x or later** — network-accessible URL (default port 9191)
+- **Kinetica 7.2.x or later** — network-accessible URL (default port 9191); _not required for offline [`--bundle`](#offline-bundle-mode) analysis_
 - **Anthropic API key** or **OAuth login** (Claude Pro/Max or Console account)
 
 ## Configuration
 
 Set environment variables or use a `.env` file. The agent loads `.env` automatically at startup (shell-set variables always take precedence). Any missing values are prompted interactively.
 
-| Variable
-|
-| `ANTHROPIC_API_KEY`
-| `
-| `
-| `
-| `
-| `
+| Variable | Description | Required |
+| ------------------------ | ------------------------------------------------------------------------------------------------ | ----------------------------------------------- |
+| `ANTHROPIC_API_KEY` | Anthropic API key for Claude | No — OAuth login used if unset |
+| `ADMIN_AGENT_MAX_BUDGET` | Per-session budget cap in USD for API-key billing (overridden by `--max-budget`; default `5.00`) | No |
+| `KINETICA_URL` | Kinetica instance URL (e.g. `http://host:9191` or bare `host:9191`) | Prompted if unset |
+| `KINETICA_USER` | Kinetica username | Prompted if unset |
+| `KINETICA_PASS` | Kinetica password | Prompted if unset (masked, never saved to .env) |
+| `KINETICA_HTTPS_ONLY` | Set to `1` to refuse plaintext HTTP fallback entirely — strict mode for production clusters | No |
+| `DEBUG` | Set to `1` to log HTTP requests and the assembled system-prompt token size to stderr | No |
 
 ```bash
 cp .env.example .env # fill in values — or let the agent create it for you
@@ -208,12 +214,35 @@ npm run dev -- --logout
 
 ### Session Budget
 
-Each session
+Each session has a **budget guard** to prevent runaway spend. Its form depends on how you authenticate with Anthropic:
+
+- **API-key billing** — the session enforces a dollar cap (default **$5.00**). Raise it with the `--max-budget=<USD>` flag or the `ADMIN_AGENT_MAX_BUDGET` environment variable (the flag wins when both are set). When estimated spend crosses ~80% of the cap, the agent warns on stderr and is instructed to save a partial report and wind down. If the cap is reached, the session ends with a message showing how to re-run with more headroom — and any report saved up to that point remains in `reports/`.
+- **OAuth (Claude Pro/Max subscription)** — no dollar cap is imposed (you are not billed per token). The session is bounded by the **turn limit** (100 turns) instead.
+
+The active guard is printed at startup, and the session summary reports per-investigation and total spend (API-key billing only). The dollar cap is enforced precisely by the Claude Agent SDK; the ~80% warning is an estimate from per-turn token usage, so it is approximate by design.
 
 ### Degraded Mode
 
 If the DB engine on port 9191 is unreachable after 3 retries, the agent probes the host manager on port 9300. If it responds, the agent starts in **degraded mode** — only `kinetica_host_manager_status` provides useful data (version, license, per-rank process status). If both ports are unreachable, the agent exits with code 1.
 
+### Offline Bundle Mode
+
+When a cluster is down (or you're diagnosing after the fact), the live endpoints can't tell you what happened — but a `gpudb_sysinfo` **support bundle** can. It captures the evidence the live API never exposes: per-rank logs, the real on-disk `gpudb.conf`, and host-level diagnostics (memory, GPU, disk, process args). Point the agent at an **extracted** bundle directory to diagnose entirely offline:
+
+```bash
+admin-agent --bundle=/path/to/extracted-gpudb_sysinfo
+```
+
+The bundle must be **extracted first** — passing a `.tgz`/`.tar.gz` fails fast with an extract-first message. At startup the agent validates the directory, detects the Kinetica version, and prints an inventory (files by kind, ranks present); missing expected artifacts (e.g. no config, no core logs) are a non-fatal warning, mirroring degraded mode's "diagnose with what's present" philosophy.
+
+A bundle and a live connection are **composable capabilities, not exclusive modes**:
+
+- **Bundle only** (cluster unreachable) — the agent runs read-only and is bounded to 40 turns. Mutation tools are never even constructed, so offline analysis is read-only _by construction_.
+- **Bundle + live** — when `--bundle` is given, the agent still attempts a best-effort, env-only live connection (no prompts, no exit). If the cluster answers, you get both tool sets and the agent correlates the bundle's frozen history against current live state. If not, it continues bundle-only.
+- **Attach mid-session** — in any live session you can ask the agent to analyze a support bundle. It calls `kinetica_load_bundle` _without a path_, which opens an interactive directory picker for you to select the extracted bundle; the offline tools light up immediately. (If the agent instead proposes a specific path, you're asked to confirm it first — loading a directory lets the agent read files under it.)
+
+Anthropic authentication still runs in bundle mode; only the interactive Kinetica credential collection is skipped (there may be no live DB to connect to). See [Offline Bundle Analysis](#offline-bundle-analysis-read-only) for the tools, and [CLAUDE.md](CLAUDE.md) for the parser/architecture details.
+
 ## CLI Flags
 
 ```bash
@@ -225,13 +254,19 @@ admin-agent --login-method=TYPE # Login method: claudeai (Pro/Max) or console
 admin-agent --login-org=UUID # Target organization UUID for OAuth
 admin-agent --logout # Log out from Anthropic account and exit
 admin-agent --model=NAME # Override agent model (sonnet | haiku | opus); default: sonnet
+admin-agent --max-budget=USD # Per-session budget cap in USD (API-key billing only); default: 5.00
+admin-agent --bundle=PATH # Offline mode: diagnose from an extracted support-bundle directory
 ```
 
-The `--model` flag swaps the primary model for a single session. `haiku` is cheaper and faster for simple triage; `opus` is slower and more expensive but produces deeper reasoning on complex investigations. The fallback model remains `haiku` regardless of the primary choice, so availability is unchanged.
+The `--model` flag swaps the primary model for a single session. `haiku` is cheaper and faster for simple triage; `opus` is slower and more expensive but produces deeper reasoning on complex investigations. The fallback model remains `haiku` regardless of the primary choice, so availability is unchanged. When you omit `--model` in an interactive terminal, the agent shows a startup picker (defaulting to `sonnet`); non-interactive runs use the default without prompting.
+
+The `--max-budget` flag sets the per-session dollar cap for API-key billing (see [Session Budget](#session-budget)). It overrides `ADMIN_AGENT_MAX_BUDGET` and has no effect under OAuth subscription billing, which is turn-limited instead.
+
+The `--bundle` flag points the agent at an **extracted** support-bundle directory for [offline analysis](#offline-bundle-mode) (pass the directory, not a `.tgz`). It composes with a live connection — the agent attempts a best-effort live connection at startup so it can cross-check bundle evidence against current state — and skips interactive Kinetica credential collection (Anthropic auth still runs).
 
 ## Tools
 
-
+28 tools organized into categories: **22 live tools** (used when connected to a running instance) plus **6 offline bundle-analysis tools** (used against an extracted support bundle). Diagnostic, SQL, and all bundle tools execute without approval — they are read-only. Mutation tools require explicit user confirmation via an interactive y/n/explain prompt. The batch column alter tool is self-approving via its own checklist + SQL preview flow. Before saving a report, the agent asks the operator (in conversation) whether to save and waits for a yes — so `save_report` only writes once you've agreed.
 
 ### System Health & Monitoring
 
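The budget rules in these hunks can be illustrated with a short sketch. `--max-budget` overrides `ADMIN_AGENT_MAX_BUDGET`, the default is `5.00`, and a warning fires at roughly 80% of the cap. This is not the package's implementation. The names `resolveBudgetCap`, `parseUsd` and `shouldWarn` are hypothetical, and so are two behaviours: rejecting non-positive or non-numeric values, and using an exact 0.8 threshold.

```typescript
// Hypothetical sketch: resolve the per-session dollar cap for API-key billing.
// Precedence as stated in the README: --max-budget flag, then
// ADMIN_AGENT_MAX_BUDGET, then the 5.00 default.
const DEFAULT_BUDGET_USD = 5.0;
const WARN_FRACTION = 0.8; // README says "~80%"; the exact fraction is assumed

function parseUsd(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined; // not supplied
  const value = Number(raw);
  // Assumption: reject non-numeric, zero, or negative caps instead of guessing.
  if (!Number.isFinite(value) || value <= 0) {
    throw new Error(`invalid budget value: ${JSON.stringify(raw)}`);
  }
  return value;
}

function resolveBudgetCap(
  flagValue: string | undefined,
  envValue: string | undefined,
): number {
  // The flag wins when both are set; `??` only falls through on undefined.
  return parseUsd(flagValue) ?? parseUsd(envValue) ?? DEFAULT_BUDGET_USD;
}

// Spend estimates come from per-turn token usage, so this only drives the soft
// warning; per the README, the hard cap itself is enforced by the Claude Agent SDK.
function shouldWarn(estimatedSpendUsd: number, capUsd: number): boolean {
  return estimatedSpendUsd >= WARN_FRACTION * capUsd;
}
```

For example, `resolveBudgetCap(undefined, process.env.ADMIN_AGENT_MAX_BUDGET)` returns `5` when the variable is unset. The sketch rejects a malformed value outright rather than silently falling back to the default. The README does not say how the agent itself handles one.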
@@ -289,11 +324,24 @@ The `--model` flag swaps the primary model for a single session. `haiku` is chea
 | ------------------------------ | ------------------------------------------------------------------------------------------------------- |
 | `kinetica_alter_table_columns` | Batch 2+ column changes into one ALTER TABLE. Two-step approval: interactive checklist then SQL preview |
 
+### Offline Bundle Analysis (read-only)
+
+Available against an extracted `gpudb_sysinfo` support bundle (see [Offline Bundle Mode](#offline-bundle-mode)). All read-only; the search/timeline tools stream and bound their output so a 20 MB rank log never blows up the context.
+
+| Tool | Description |
+| ------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------ |
+| `kinetica_load_bundle` | Attach an extracted bundle directory; without a path it opens a directory picker (a model-supplied path needs operator confirmation) |
+| `kinetica_bundle_list_files` | Inventory: detected version, ranks + services present, file counts/sizes by kind — call this first |
+| `kinetica_bundle_log_timeline` | Per-time-bucket severity counts across ranks (the incident shape) — call before searching |
+| `kinetica_bundle_search_logs` | Bounded log search by regex, min-severity, time window, and rank / host-manager / component |
+| `kinetica_bundle_read_config` | Read the bundle's real on-disk `gpudb.conf`, with optional section/key filter |
+| `kinetica_bundle_read_sysinfo` | OS/process/version diagnostic files (memory, CPU, disk, GPU, network, process args) |
+
 ### Reporting
 
-| Tool | Description
-| ------------- |
-| `save_report` | Timestamped markdown report to `reports/` with credential scrubbing |
+| Tool | Description |
+| ------------- | ---------------------------------------------------------------------------------------------- |
+| `save_report` | Timestamped markdown report to `reports/` with credential scrubbing — agent asks before saving |
 
 ## Security
 
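To make the timeline tool's purpose concrete, here is a sketch of per-time-bucket severity counting across ranks: the "incident shape" the table describes. Everything in it is hypothetical, including the `LogEntry` shape, the five-level severity ladder and `logTimeline`. The real log-line format and severity ordering are documented in the package's `support-bundle` reference, which this diff does not show.

```typescript
// Hypothetical sketch of a per-time-bucket severity histogram across ranks.
// The LogEntry shape and severity ladder are assumptions, not the bundle format.
type Severity = "DEBUG" | "INFO" | "WARN" | "ERROR" | "FATAL";
const SEVERITY_ORDER: Severity[] = ["DEBUG", "INFO", "WARN", "ERROR", "FATAL"];

interface LogEntry {
  timestampMs: number; // epoch milliseconds, already parsed from the log line
  severity: Severity;
  rank: string; // illustrative label such as "rank0"
}

interface Bucket {
  startMs: number;
  counts: Record<Severity, number>;
}

function emptyCounts(): Record<Severity, number> {
  return { DEBUG: 0, INFO: 0, WARN: 0, ERROR: 0, FATAL: 0 };
}

function logTimeline(
  entries: Iterable<LogEntry>,
  bucketMs: number,
  minSeverity: Severity = "WARN",
): Bucket[] {
  const minIndex = SEVERITY_ORDER.indexOf(minSeverity);
  const buckets = new Map<number, Bucket>();
  for (const e of entries) {
    if (SEVERITY_ORDER.indexOf(e.severity) < minIndex) continue; // below threshold
    const startMs = Math.floor(e.timestampMs / bucketMs) * bucketMs;
    let bucket = buckets.get(startMs);
    if (!bucket) {
      bucket = { startMs, counts: emptyCounts() };
      buckets.set(startMs, bucket);
    }
    bucket.counts[e.severity] += 1; // all ranks merged into one cluster-wide count
  }
  // Chronological order so the incident shape reads from earliest to latest.
  return [...buckets.values()].sort((a, b) => a.startMs - b.startMs);
}
```

`entries` can be any `Iterable`, so a caller could pass a generator that yields parsed lines one at a time. Memory then grows with the number of buckets rather than with the size of a rank log. That is one way to get the bounded behaviour the README describes.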
@@ -302,11 +350,13 @@ The agent is designed with defense-in-depth for database administration:
 - **Credential isolation** — Kinetica credentials are captured in a closure and never exposed to the agent or logged
 - **HTTPS enforcement** — URL resolution probes HTTPS first; any fallback to plaintext HTTP requires explicit interactive confirmation, is refused in non-interactive environments, and can be disabled entirely via `KINETICA_HTTPS_ONLY=1`
 - **Read-only by default** — 16 read-only diagnostic tools (including SQL execute/explain) run without approval; the agent has no access to `Bash`, `Edit`, `Write`, or `MultiEdit` and cannot run arbitrary shell commands
+- **Offline analysis is read-only by construction** — in bundle-only mode the mutation tools are never instantiated; the 6 bundle tools only read files, and every read is confined to the bundle root (path-escape attempts via `..` are rejected). Attaching a bundle the _model_ chose (an explicit `path` passed to `kinetica_load_bundle`) requires operator confirmation, since loading widens the readable directory — a path you pick from the interactive picker is already your consent
 - **Mutation approval gate** — the 4 administrative mutation tools each trigger an interactive y/n/explain prompt before execution; DROP/TRUNCATE/DELETE/UPDATE SQL is always blocked (with CTE-bypass protection)
 - **Two-step approval for batch column alter** — `kinetica_alter_table_columns` requires the operator to select columns via a checklist, then confirm the exact SQL preview
 - **Audit trail** — every mutation logs a redacted audit line to stderr (EXECUTED/FAILED + fingerprinted input summary) and appears in the report's "Mutations Applied" table with before/after state
 - **Report scrubbing** — saved reports are scrubbed of URLs, auth headers, Basic/Bearer credentials, cookies, and passwords before writing to disk
-- **
+- **Confirmed report writes** — the agent asks the operator (in conversation) whether to save before composing the report, and writes only after a yes; the one exception is an automatic partial-report checkpoint when the budget guard is about to cut the session off, so findings are never lost
+- **Budget guard** — a per-session dollar cap (default $5.00, configurable via `--max-budget` or `ADMIN_AGENT_MAX_BUDGET`) prevents runaway spend on API-key billing; OAuth subscription sessions are bounded by the turn limit instead
 
 To report a security vulnerability, please see [SECURITY.md](SECURITY.md). Do not open a public GitHub issue for security issues.
 
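The Security hunk states that every bundle read is confined to the bundle root and that `..` escapes are rejected. A common Node.js way to enforce that is to resolve the requested path against the root and inspect the `path.relative` result. The sketch below assumes that approach and uses a hypothetical `resolveInsideBundle` helper. It is not taken from the package source.

```typescript
import * as path from "node:path";

// Hypothetical helper: map a bundle-relative path to an absolute path and
// refuse anything that would land outside the bundle root.
function resolveInsideBundle(bundleRoot: string, requested: string): string {
  const root = path.resolve(bundleRoot);
  const target = path.resolve(root, requested); // also normalizes ".." segments
  const rel = path.relative(root, target);
  // Outside the root, `rel` is ".." or starts with ".." plus a separator; on
  // Windows a target on another drive yields an absolute `rel`. A file that is
  // literally named "..foo" stays inside the root and is allowed.
  if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`path escapes bundle root: ${requested}`);
  }
  return target;
}
```

This check is lexical only. It does not follow symlinks, so a link inside the bundle that points outside it would pass. Resolving both paths with `fs.realpathSync` before comparing is the usual extra step if that matters.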
@@ -375,7 +425,9 @@ References provide domain knowledge (not diagnostic runbooks). Create a `.md` fi
 - `sql-create-index` — column index syntax, chunk skip index, when to use which
 - `version-quirks-7.2` — endpoint/property differences between 7.2.x and earlier releases
 
-
+Plus a **bundle-scoped reference** (`support-bundle` — bundle layout, log-line format, severity ordering, file parsing) that lives in `knowledge/references/bundle/`. It loads in **every** session — even a pure live one — so that a bundle attached mid-session via `kinetica_load_bundle` has its parsing knowledge ready in the (build-once) prompt; the corpus is cached, so the cost to a session that never attaches a bundle is negligible.
+
+> **Heads up — prompt budget:** all playbooks and references are front-loaded into a single system prompt at startup, so its token cost grows with the knowledge corpus. A startup tripwire (`agent/prompt-budget.ts`) prints the assembled prompt size under `DEBUG` and warns on stderr once it exceeds ~20,000 estimated tokens. Current baseline is ~13.4k tokens (6 playbooks + 9 references). If you add substantial knowledge and trip that warning, treat it as the cue to switch from "load everything" to keyword-based playbook selection.
 
 ## Development
 
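The prompt-budget note describes a startup tripwire with three parts. It estimates the assembled system prompt's token count, prints it under `DEBUG`, and warns past ~20,000 tokens. A minimal sketch follows. It assumes a characters-divided-by-four estimate, a common rough heuristic that `agent/prompt-budget.ts` may not actually use. The `checkPromptBudget` name is hypothetical.

```typescript
// Hypothetical sketch of a startup prompt-size tripwire. Only the ~20,000-token
// threshold and the DEBUG/stderr behaviour come from the README.
const WARN_TOKENS = 20_000;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4); // assumed heuristic: ~4 characters per token
}

function checkPromptBudget(
  sections: string[], // playbooks and references, in load order
  debug: boolean,
  log: (msg: string) => void = (m) => console.error(m), // stderr by default
): number {
  const prompt = sections.join("\n\n");
  const tokens = estimateTokens(prompt);
  if (debug) log(`system prompt ~${tokens} tokens`);
  if (tokens > WARN_TOKENS) {
    log(`warning: system prompt ~${tokens} tokens exceeds ${WARN_TOKENS}; consider keyword-based playbook selection`);
  }
  return tokens;
}
```

Under this heuristic, the stated ~13.4k-token baseline corresponds to roughly 54,000 characters of playbooks and references.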
@@ -424,10 +476,11 @@ Exit codes: `0` pass, `1` assertion failed, `2` harness failure (e.g., missing A
 
 ```
 src/
-cli/ # Entry point, banner, arg parsing
-agent/ # Agent loop, system
+cli/ # Entry point, banner, arg parsing, bundle directory picker
+agent/ # Agent loop, system prompts (live + bundle), schema discovery
 session/ # Kinetica connection, credentials, .env management, URL resolution
-
+bundle/ # Offline support-bundle parsers + BundleSource facade
+tools/ # 28 MCP tools (rest/, sql/, mutation/, bundle/)
 output/ # Formatting, truncation, table alignment
 approval/ # Mutation approval gate and checklist UI
 report/ # Report generation and credential scrubbing
@@ -435,7 +488,7 @@ src/
 types/ # Shared type contracts
 knowledge/
 playbooks/ # Diagnostic runbooks (Markdown + YAML frontmatter)
-references/ # Domain knowledge documents
+references/ # Domain knowledge documents (bundle/ subdir = offline-only refs)
 reports/ # Generated diagnostic reports (git-ignored)
 ```
 
@@ -470,9 +523,15 @@ admin-agent
 
 - Some endpoints (e.g., `/admin/show/logs`) don't exist in Kinetica 7.2.x — the agent falls back to SQL queries automatically
 
+**`--bundle` won't start / "expects an extracted directory"**
+
+- The bundle must be **extracted first**: `tar xzf gpudb_sysinfo*.tgz`, then pass the resulting directory to `--bundle`. Passing the archive itself fails fast by design.
+- `--bundle=` with an empty value (e.g. an unset shell variable) is rejected — supply a real path.
+- A missing-artifact warning (no config / no core logs) is non-fatal; the agent diagnoses with whatever is present, just as in degraded mode.
+
 **Agent hits budget cap**
 
--
+- Applies to API-key billing only (default $5.00 per session). Raise it for the next run with `--max-budget=10` or `export ADMIN_AGENT_MAX_BUDGET=10`. The agent warns at ~80% so it can save a partial report before the cap is reached. For complex multi-table investigations, consider running focused sessions per table. OAuth (Pro/Max) sessions are turn-limited rather than dollar-capped.
 
 **Empty or missing report**
 
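The troubleshooting entries above describe three startup checks for `--bundle`:

- an empty value is rejected;
- an archive fails fast with an extract-first hint;
- the path must be an extracted directory.

Below is a sketch of that validation. The `validateBundleArg` function and its error messages are hypothetical.

```typescript
import { statSync } from "node:fs";

// Hypothetical validation for a --bundle=PATH value; the name and messages are
// assumptions, only the three checks themselves come from the README.
function validateBundleArg(raw: string | undefined): string {
  const value = raw?.trim() ?? "";
  if (value === "") {
    // e.g. `--bundle=$BUNDLE` with BUNDLE unset
    throw new Error("--bundle requires a path (got an empty value)");
  }
  if (/\.(tgz|tar\.gz)$/i.test(value)) {
    throw new Error(`--bundle expects an extracted directory; extract ${value} first (tar xzf)`);
  }
  let isDir = false;
  try {
    isDir = statSync(value).isDirectory();
  } catch {
    // stat failed: the path does not exist or is not readable
  }
  if (!isDir) {
    throw new Error(`--bundle path is not a directory: ${value}`);
  }
  return value;
}
```

The `.tgz`/`.tar.gz` suffix is checked before the filesystem is touched. The extract-first hint therefore appears even when the archive path itself is mistyped.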