@kinetica/admin-agent 0.1.0
- package/LICENSE +191 -0
- package/NOTICE +2 -0
- package/README.md +484 -0
- package/dist/admin-agent.js +4961 -0
- package/knowledge/playbooks/config-drift.md +26 -0
- package/knowledge/playbooks/gpu-out-of-memory.md +27 -0
- package/knowledge/playbooks/memory-pressure.md +29 -0
- package/knowledge/playbooks/query-contention.md +28 -0
- package/knowledge/playbooks/resource-group-exhaustion.md +27 -0
- package/knowledge/playbooks/stale-rank.md +26 -0
- package/knowledge/references/catalog-enums.md +82 -0
- package/knowledge/references/catalog-joins.md +105 -0
- package/knowledge/references/gpudb-conf.md +93 -0
- package/knowledge/references/mutation-safety.md +89 -0
- package/knowledge/references/rank-architecture.md +54 -0
- package/knowledge/references/sql-alter-table.md +78 -0
- package/knowledge/references/sql-create-index.md +49 -0
- package/knowledge/references/tiered-objects.md +106 -0
- package/knowledge/references/version-quirks-7.2.md +96 -0
- package/knowledge/templates/report.md +57 -0
- package/package.json +76 -0

package/knowledge/playbooks/config-drift.md
@@ -0,0 +1,26 @@
---
title: Config Drift / Configuration Pitfalls
category: configuration
severity: warning
keywords: [config, drift, configuration, regression, upgrade]
---

## Symptoms

- Unexpected behavior after upgrade or config change
- Performance regression with no workload change

## Detection

- `kinetica_get_system_properties` → compare against known-good values
- `kinetica_get_config` → snapshot shows non-default values (7.2.x: use system properties instead)

## Root Cause

Configuration issue from manual edits, upgrade migration, or environment-specific settings.

## Remediation

1. Use `kinetica_alter_system_properties` to restore known-good config values
2. Review Kinetica changelog for breaking config changes between versions
3. Document baseline configuration for future comparison
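
The "compare against known-good values" step can be sketched as a plain dictionary diff. This is a minimal illustration, not the agent's implementation: `diff_properties` is a hypothetical helper, and the property names and values in the two snapshots are made up for the example.

```python
def diff_properties(baseline, current):
    """Return {name: (baseline_value, current_value)} for every property that differs."""
    drift = {}
    for name in sorted(set(baseline) | set(current)):
        before = baseline.get(name)  # None means "absent from the baseline"
        after = current.get(name)    # None means "absent from the live cluster"
        if before != after:
            drift[name] = (before, after)
    return drift


# Hypothetical snapshots; property names and values are illustrative only.
baseline = {"conf.request_timeout": "20", "conf.chunk_size": "8000000"}
current = {
    "conf.request_timeout": "120",
    "conf.chunk_size": "8000000",
    "conf.default_ttl": "20",
}

drift = diff_properties(baseline, current)
for name, (before, after) in drift.items():
    print(f"{name}: {before!r} -> {after!r}")
```

Storing the baseline as a flat name-to-value mapping keeps the later comparison trivial, which is the point of remediation step 3.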

package/knowledge/playbooks/gpu-out-of-memory.md
@@ -0,0 +1,27 @@
---
title: GPU Out-of-Memory
category: performance
severity: critical
keywords: [VRAM, GPU, OOM, memory, timeout]
---

## Symptoms

- ERROR logs with "out_of_memory" or GPU OOM, query failures
- `kinetica_get_metrics` shows GPU memory near 100%

## Detection

- `kinetica_get_metrics` → check `vram_used` on worker ranks
- `ki_catalog.ki_tiered_objects` → find large VRAM-tier objects

## Root Cause

Queries materializing too much data in VRAM; oversized objects loaded into GPU memory.

## Remediation

1. Identify largest GPU objects via `kinetica_resource_objects`
2. Add query limits to constrain result set sizes
3. Review GPU memory allocation config via `kinetica_get_system_properties` (conf.tier.\*)
4. Consider tier eviction policy changes

package/knowledge/playbooks/memory-pressure.md
@@ -0,0 +1,29 @@
---
title: Memory Pressure
category: performance
severity: warning
keywords: [memory, pressure, eviction, RAM, slow, disk]
---

## Symptoms

- Slow queries with no obvious cause
- Eviction warnings in logs
- `ki_tiered_objects` showing data moved to PERSIST or DISK tier

## Detection

- `kinetica_get_metrics` → high RAM usage percentage (above 80%)
- `ki_catalog.ki_tiered_objects` → objects in PERSIST/DISK tier that should be in RAM
- `kinetica_resource_objects` → non-zero eviction counts

## Root Cause

Total working set exceeds available RAM; large objects not fitting in configured tier limits.

## Remediation

1. Increase tier memory allocation via `kinetica_alter_system_properties` (conf.tier.\*)
2. Identify and evict cold objects via `kinetica_resource_objects`
3. Archive unused tables to free tier capacity
4. Review resource group memory limits in `kinetica_resource_groups`
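
The 80% RAM rule of thumb from the Detection list could be applied per rank roughly as follows. This is only a sketch: the input shape (a mapping of rank id to used and limit values) is an assumption, since the real `kinetica_get_metrics` output format is not described here, and `ranks_under_pressure` is a hypothetical helper.

```python
RAM_PRESSURE_THRESHOLD = 0.80  # the "above 80%" rule of thumb from the Detection list


def ranks_under_pressure(ram_by_rank, threshold=RAM_PRESSURE_THRESHOLD):
    """Return rank ids whose used/limit ratio exceeds the threshold, sorted ascending."""
    flagged = []
    for rank, (used_mib, limit_mib) in ram_by_rank.items():
        if limit_mib > 0 and used_mib / limit_mib > threshold:
            flagged.append(rank)
    return sorted(flagged)


# Hypothetical per-rank (used, limit) pairs in MiB; the real tool output may differ.
sample = {1: (4700, 5120), 2: (2048, 5120), 3: (4096, 5120)}
print(ranks_under_pressure(sample))
```

Rank 3 sits exactly at 80% and is not flagged, because the check is strictly "above" the threshold.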

package/knowledge/playbooks/query-contention.md
@@ -0,0 +1,28 @@
---
title: Query Contention
category: performance
severity: warning
keywords: [query, contention, slow, blocking, lock, concurrent]
---

## Symptoms

- Long-running queries in `ki_query_history` (large elapsed time between start and completion)
- Active queries blocking each other

## Detection

- `ki_catalog.ki_query_active_all` → multiple long-running queries
- `ki_catalog.ki_query_history` → queries with large elapsed time
- `ki_catalog.ki_query_workers` → blocked worker threads

## Root Cause

Concurrent large queries competing for GPU resources; lock contention on shared tables.

## Remediation

1. Stagger large queries to reduce concurrent GPU pressure
2. Review query priority settings in resource groups
3. Consider query queue configuration via `kinetica_alter_system_properties`
4. Check `kinetica_resource_groups` for CPU concurrency limits

package/knowledge/playbooks/resource-group-exhaustion.md
@@ -0,0 +1,27 @@
---
title: Resource Group Exhaustion
category: resources
severity: warning
keywords: [resource, group, limit, exhaustion, tier, capacity]
---

## Symptoms

- Queries failing with resource limit errors
- Tier capacity warnings

## Detection

- `kinetica_resource_groups` → tier usage near limits
- `kinetica_resource_objects` → uneven object distribution across ranks

## Root Cause

Resource group limits too low for workload; uneven data placement across tiers.

## Remediation

1. Increase resource group limits via `kinetica_alter_system_properties`
2. Use `kinetica_admin_rebalance` to redistribute data evenly across ranks
3. Review resource group assignments in `kinetica_show_security`
4. Consider adding new resource groups for workload isolation

package/knowledge/playbooks/stale-rank.md
@@ -0,0 +1,26 @@
---
title: Stale Rank (Rank Not Responding)
category: cluster
severity: critical
keywords: [rank, stale, offline, crash, partition]
---

## Symptoms

- Health check shows unhealthy rank
- Cluster status shows rank offline

## Detection

- `kinetica_health_check` → non-OK rank status
- `kinetica_cluster_status` → rank alerts, shard mapping gaps

## Root Cause

Stale rank process after crash or network partition; rank failed to rejoin cluster.

## Remediation

1. Tell user to run `gadmin restart rank <N>` manually (no REST API for worker restart in 7.2)
2. After rank recovers, use `kinetica_admin_rebalance` to redistribute shards
3. Verify recovery with `kinetica_health_check` and `kinetica_cluster_status`

package/knowledge/references/catalog-enums.md
@@ -0,0 +1,82 @@
---
title: ki_catalog Enum Values
category: catalog-schema
keywords:
  [
    ki_catalog,
    enums,
    obj_kind,
    shard_kind,
    persistence,
    partition_type,
    tier,
    priority,
    tiered_objects,
  ]
---

## Overview

Many `ki_catalog` columns encode state as single-character codes or
small string constants. These are the canonical values — decode them
explicitly when interpreting query results or building WHERE clauses.

## ki_objects

| Column | Value | Meaning |
| ------------- | ----- | ----------------------------- |
| `obj_kind` | `R` | table / relation |
| `obj_kind` | `V` | view |
| `shard_kind` | `S` | sharded |
| `shard_kind` | `N` | not sharded |
| `persistence` | `P` | persistent (survives restart) |
| `persistence` | `T` | temporary |

## ki_partitions

| Column | Value | Meaning |
| ---------------- | ---------- | -------------------------------- |
| `partition_type` | `NONE` | unpartitioned |
| `partition_type` | `INTERVAL` | time-based interval partitioning |
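
Decoding these codes explicitly might look like the sketch below. The mapping is copied from the `ki_objects` and `ki_partitions` tables above; `decode_row` is a hypothetical helper and `object_name` is an illustrative column name rather than a documented one.

```python
# Code-to-meaning maps copied from the ki_objects and ki_partitions tables.
CATALOG_ENUMS = {
    "obj_kind": {"R": "table / relation", "V": "view"},
    "shard_kind": {"S": "sharded", "N": "not sharded"},
    "persistence": {"P": "persistent (survives restart)", "T": "temporary"},
    "partition_type": {
        "NONE": "unpartitioned",
        "INTERVAL": "time-based interval partitioning",
    },
}


def decode_row(row):
    """Return a copy of row with known enum codes replaced by their meanings.

    Unknown columns pass through untouched; an unknown code in a known column
    is kept and marked so that it is not silently misread.
    """
    decoded = {}
    for column, value in row.items():
        meanings = CATALOG_ENUMS.get(column)
        if meanings is None:
            decoded[column] = value
        else:
            decoded[column] = meanings.get(value, f"unknown code {value!r}")
    return decoded


# Hypothetical result row; "X" is deliberately not a listed persistence code.
row = {"object_name": "trips", "obj_kind": "R", "shard_kind": "S", "persistence": "X"}
print(decode_row(row))
```

Marking unlisted codes instead of guessing keeps a report honest if a catalog column carries a value this reference does not cover.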

## ki_tiered_objects

### `id` format

String identifier (`char256`), format like `@schema@oid[col][chunk]`
(e.g., `@nyctaxi@365[col][0]`). NOT a numeric OID — **cannot** be
joined to `ki_objects.oid`. See `catalog-joins.md` for the correct
lookup path.

### `tier`

Storage tier placement. One of:

- `RAM` — host memory
- `PERSIST` — persistent SSD/disk tier
- `DISK0` — primary disk tier
- `VRAM` — GPU memory

Same values appear in `ki_partitions.tier`.

### `priority`

Tier manager priority — determines eviction order when a tier fills:

| Value | Meaning |
| ----- | ----------------------------------- |
| 1 | system / `ki_catalog` (never evict) |
| 5 | regular user tables |
| 9 | temporary / ephemeral |

Higher `priority` = more expendable = evicted first.

### `locked` and `evictable`

- `locked = 1` — pinned in its current tier; tier manager cannot move it.
- `evictable = 1` — tier manager may move this object to a lower tier
  when space is needed.

An object can be both unlocked and non-evictable (rare; means the tier
manager will not proactively move it but nothing prevents an
administrator from doing so).

package/knowledge/references/catalog-joins.md
@@ -0,0 +1,105 @@
---
title: ki_catalog Cross-Table Correlation Paths
category: catalog-schema
keywords:
  [
    ki_catalog,
    joins,
    correlation,
    ki_objects,
    ki_columns,
    ki_partitions,
    ki_query_history,
    ki_tiered_objects,
    oid,
  ]
---

## Overview

When investigating issues, evidence usually has to be joined across
multiple `ki_catalog` tables. These are the standard correlation paths
— prefer them over ad-hoc joins.

## Table Metadata Chain

Walk this chain to go from an object name to its on-disk footprint and
schema:

```
ki_objects.oid
  → ki_obj_stat.oid (row counts, total sizes)
  → ki_partitions.oid (tier placement, compression)
  → ki_columns.table_oid (column schema)
```

## Column Type Resolution

`ki_columns.column_type_oid` is a numeric OID, not a type name. Join
it to `ki_datatypes.oid` to get the human-readable type:

| OID | Type |
| ---- | ---------- |
| 20 | `long` |
| 1043 | `char256` |
| 1114 | `datetime` |
| 2950 | `uuid` |
| 25 | `string` |

## Query Drill-Down

To reconstruct a slow query's execution tree:

```
ki_query_history.query_id
  → ki_query_span_metrics_all.query_id
  → span tree via span_id / parent_span_id
```
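
Once the span rows for one `query_id` have been fetched, the tree can be rebuilt from `span_id` / `parent_span_id` along these lines. This is a sketch under assumptions: rows arrive as dictionaries, a root span has a null parent, and the `name` field is a hypothetical label column used only for display.

```python
from collections import defaultdict


def build_span_tree(spans):
    """Group spans by parent_span_id and return (roots, children_by_parent)."""
    known_ids = {span["span_id"] for span in spans}
    children = defaultdict(list)
    roots = []
    for span in spans:
        parent = span.get("parent_span_id")
        if parent is None or parent not in known_ids:
            roots.append(span)  # top-level span, or its parent row is missing
        else:
            children[parent].append(span)
    return roots, children


def render(roots, children, depth=0):
    """Return one indented line per span, depth-first."""
    lines = []
    for span in roots:
        lines.append("  " * depth + span["name"])
        lines.extend(render(children.get(span["span_id"], []), children, depth + 1))
    return lines


# Hypothetical rows for a single query_id.
spans = [
    {"span_id": 1, "parent_span_id": None, "name": "query"},
    {"span_id": 2, "parent_span_id": 1, "name": "plan"},
    {"span_id": 3, "parent_span_id": 1, "name": "execute"},
    {"span_id": 4, "parent_span_id": 3, "name": "aggregate"},
]
roots, children = build_span_tree(spans)
print("\n".join(render(roots, children)))
```

Spans whose parent row is missing are treated as roots so that a partial result set still renders instead of dropping branches.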

## Active Query Workers

For queries currently running:

```
ki_query_active_all.job_id
  → ki_query_workers.job_id (worker threads, elapsed time, blockers)
```

Use `ki_query_active_all.is_cancellable` to check whether a running
query can be cancelled before suggesting that remediation.

## Permission Audit

```
ki_object_permissions.role_oid → ki_users_and_roles.oid
ki_object_permissions.object_oid → ki_objects.oid
```

## Dependency Graph

For impact analysis before proposing a DROP:

```
ki_depend.src_obj_oid → ki_objects.oid
ki_depend.dep_obj_oid → ki_objects.oid
```

## Tier Object Lookup (WARNING — no OID join)

`ki_tiered_objects.id` is a **string identifier** (e.g.,
`@nyctaxi@365[col][0]`), NOT a numeric OID. Do NOT try to join it
with `ki_objects.oid` — the types don't match and the values don't
correspond.

For per-table tier placement, prefer the dedicated tool:

```
kinetica_resource_objects (with table_names filter)
```

For SQL-based analysis, filter with a string match:

```sql
SELECT * FROM ki_catalog.ki_tiered_objects
WHERE id LIKE '%table_name%'
```

package/knowledge/references/gpudb-conf.md
@@ -0,0 +1,93 @@
---
title: gpudb.conf Configuration Reference
category: configuration
keywords: [gpudb.conf, config, configuration, parameters, tuning, tiers, alerts]
---

## Overview

`gpudb.conf` is the master Kinetica configuration file (INI format, all under `[gaia]` section).
Default on-disk location: `/opt/gpudb/core/etc/gpudb.conf`.
Retrieved via `kinetica_show_configuration` (host manager port 9300), modified via `kinetica_alter_configuration`.
Runtime properties are a subset available via `kinetica_get_system_properties` / `kinetica_alter_system_properties`.
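
Because the file is INI with a single `[gaia]` section, a retrieved config string can be inspected with Python's standard `configparser`. The fragment below is a made-up sample, not a real `gpudb.conf`; the keys come from this reference but the values are illustrative, and a production file may contain constructs (for example repeated keys) that the default parser settings reject.

```python
import configparser

# Illustrative fragment only; values are made up, keys are taken from this reference.
SAMPLE_CONF = """
[gaia]
head_port = 9191
default_ttl = 20
chunk_size = 8000000
execution_mode = default
"""


def load_gaia_settings(text):
    """Parse gpudb.conf-style INI text and return the [gaia] section as a dict."""
    parser = configparser.ConfigParser(interpolation=None)  # keep '%' in values literal
    parser.read_string(text)
    return dict(parser["gaia"])


settings = load_gaia_settings(SAMPLE_CONF)
print(settings["default_ttl"], settings["execution_mode"])
```

All values come back as strings, so numeric comparisons need an explicit `int(...)` conversion.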

## Section Index

| Section | Key Parameters | Diagnostic Relevance |
| ------------------- | ------------------------------------------------------------------------------------ | --------------------------- |
| Identification | `ring_name`, `cluster_name` | Cluster identity |
| Hosts | `host<#>.address`, `host<#>.ram_limit`, `host<#>.gpus` | Host topology, RAM caps |
| Ranks | `rank<#>.host` | Rank-to-host mapping |
| Network | `head_port` (9191), `host_manager_http_port` (9300), `enable_worker_http_servers` | Connectivity issues |
| Security | `require_authentication`, `enable_authorization` | Auth troubleshooting |
| Auditing | `enable_audit`, `audit_body`, `lock_audit` | Audit trail |
| Licensing | `license_key` | License issues |
| Processes & Threads | `worker_endpoint_threads`, `tcs_per_tom`, `tps_per_tom`, `subtask_concurrency_limit` | Performance tuning |
| Hardware | `rank<#>.taskcalc_gpu`, `rank<#>.numa_node` | GPU/NUMA assignment |
| General | `default_ttl`, `chunk_size`, `execution_mode`, `request_timeout` | Performance, data lifecycle |
| Visualization | `max_heatmap_size`, `enable_opengl_renderer`, `enable_vectortile_service` | WMS/VTS issues |
| Text Search | `enable_text_search`, `text_indices_per_tom` | Text search issues |
| Persistence | `persist_directory`, `wal.*`, `compression_codec`, `load_vectors_on_start` | Data durability, startup |
| Monitoring | `enable_stats_server`, `telm.persist_query_metrics` | Observability |
| Graph Servers | `enable_graph_server`, `graph.server<#>.host` | Graph analytics |
| HA | `enable_ha`, `enable_ha_replay` | High availability |
| Alerts | `alert_memory_percentage`, `alert_disk_percentage`, `heartbeat_*` | Alert config |
| Failover | `np1.enable_worker_failover`, `np1.rank_restart_attempts` | Failover behavior |
| Postgres Proxy | `enable_postgres_proxy`, `postgres_proxy.port` (5432) | Client connectivity |
| SQL Engine | `sql.enable_planner`, `sql.planner.timeout`, `sql.plan_cache_size` | Query planning |
| Tiered Storage | `tier.{vram,ram,disk,persist,cold}.*` | Memory/storage management |
| Tier Strategy | `tier_strategy.default` | Data placement policy |
| Resource Groups | `resource_group.default.*` | Resource allocation |

## Performance-Critical Parameters

**Thread Pools** (all accept `-1` for auto):

- `worker_endpoint_threads` — HTTP request handling threads per worker rank
- `tps_per_tom` — data processing threads (inserts, updates, deletes); multi-head ingest not affected
- `tcs_per_tom` — calculation threads (aggregates, record retrieval)
- `subtask_concurrency_limit` — query-level scheduler concurrency; lower = depth-first (fewer queries, faster completion), higher = breadth-first (more concurrency)

**Chunk Settings:**

- `chunk_size` — records per chunk (default 8M; 0 disables chunking)
- `chunk_max_memory` — max total chunk data per table in bytes
- `chunk_column_max_memory` — max per-column chunk data in memory (512MB)

**Execution Mode:** `execution_mode` = `default` | `host` | `device` | `<rows>` — controls CPU vs GPU kernel execution. When set to `device` but no GPUs are available, falls back to CPU.

## Tiered Storage Quick Reference

Five tier types (data flows down when evicted):

1. **VRAM** — GPU memory; limit/watermarks per rank per GPU
2. **RAM** — main memory; rank0 gets ~10% of system RAM, workers split the rest
3. **Disk** — temporary swap cache (fast SSD recommended); multiple disk tiers supported
4. **Persist** — permanent storage; data survives restarts
5. **Cold** — extended storage (disk, HDFS, S3, Azure, GCS); for infrequently accessed data

**Watermark semantics:** `high_watermark` triggers background eviction; eviction continues until usage drops below `low_watermark`. Both are percentages (1-100). Set both to 100 to disable eviction. Watermarks are ignored when limit is -1.
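
The watermark behaviour amounts to hysteresis, which can be modelled as a small state function. This is a sketch of the rule as stated above, not of Kinetica's tier manager: `eviction_active` is a hypothetical helper, and treating the high watermark as a strict "greater than" trigger is an assumption chosen so that setting both watermarks to 100 never starts eviction.

```python
def eviction_active(usage_pct, high_watermark, low_watermark, currently_evicting, limit):
    """Return True if background eviction should be running after this reading.

    Assumed model: eviction starts once usage rises above high_watermark and
    keeps running until usage falls below low_watermark. A limit of -1 means
    "no limit", in which case the watermarks are ignored.
    """
    if limit == -1:
        return False
    if usage_pct > high_watermark:
        return True
    if currently_evicting and usage_pct >= low_watermark:
        return True  # still draining toward the low watermark
    return False


# Hypothetical usage readings for one tier, in percent of its limit.
readings = [70, 92, 85, 79, 85]
state = False
trace = []
for pct in readings:
    state = eviction_active(
        pct, high_watermark=90, low_watermark=80, currently_evicting=state, limit=1024
    )
    trace.append(state)
print(trace)
```

The same 85% reading yields different answers depending on whether eviction was already running, which is why a single usage sample between the two watermarks is not enough to tell whether a tier is evicting.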

**Default tier strategy format:** `VRAM <priority>, RAM <priority>, DISK0 <priority>, PERSIST <priority>` — priority 1 (lowest, first evicted) to 9 (highest, last evicted), 10 = unevictable.

## WAL (Write-Ahead Log)

- `wal.sync_policy`: `none` (disabled) | `background` (periodic) | `flush` (per-operation, survives DB crash) | `fsync` (per-operation, survives OS crash)
- `wal.checksum`: integrity protection on WAL entries
- `wal.truncate_corrupt_tables_on_start`: auto-truncate corrupt tables on replay (vs. manual REPAIR TABLE)

## Alert Thresholds

- `alert_memory_percentage` — comma-separated thresholds (e.g., `1, 5, 10, 20`) for low-memory alerts
- `alert_disk_percentage` — same for low-disk alerts
- `heartbeat_interval` / `heartbeat_timeout` / `heartbeat_missed_limit` — host failure detection timing

## Key Gotchas

- **`-1` means different things:** For thread counts = auto-detect; for tier limits = no limit (ignore watermarks); for `default_ttl` = disabled
- **`default_ttl`** is in MINUTES — non-protected tables are auto-deleted after this time. A value of 20 means tables without explicit TTL override vanish after 20 minutes.
- **`load_vectors_on_start = on_demand`** means data loads lazily — first queries on cold data will be slower
- **Rank 0** is the head/coordinator node with minimal RAM allocation (~10%); it does NOT hold data. Worker ranks (1+) hold all data.
- **`execution_mode = device`** silently falls back to CPU when no GPUs are present — no error is raised
- **7.2.x missing parameters:** `sm_omp_threads`, `kernel_omp_threads` do NOT exist — use `worker_endpoint_threads`, `subtask_concurrency_limit`, `tcs_per_tom` instead
- **Config changes require restart** unless the parameter is also a runtime system property (check via `kinetica_get_system_properties`)

package/knowledge/references/mutation-safety.md
@@ -0,0 +1,89 @@
---
title: Mutation Safety Rules
category: mutation-policy
keywords:
  [
    mutation,
    safety,
    admin-rebalance,
    alter-system-properties,
    alter-configuration,
    never-propose,
    ai_api_key,
    cache-clearing,
    worker-restart,
    aggressiveness,
  ]
---

## Overview

Safety contract the agent must follow before and during Round 4
(Mutation Proposal) of the investigation protocol. These rules combine
version-specific Kinetica 7.2.x facts with operational policy — every
mutation tool call is subject to them.

## Pre-Mutation Checklist

BEFORE proposing any mutation:

1. Always run `kinetica_health_check` first — do not mutate an unhealthy
   cluster.
2. For `kinetica_admin_rebalance`: check `kinetica_cluster_status` for
   active rebalance/add/remove operations — never propose rebalance
   when one is already running.
3. For config changes: use `kinetica_get_system_properties` to read the
   current value BEFORE proposing a change (so the report can show a
   meaningful before/after diff).

## NEVER Propose

- `/clear/table` or `/clear/tablemonitor` as cache-clearing operations —
  these DELETE DATA permanently in Kinetica. They are not caches.
- Setting `ai_api_key` via `kinetica_alter_system_properties` — this is
  a credential that would appear in audit logs.
- Setting `external_files_directory` — filesystem path; potential path
  traversal concern.
- Setting `flush_to_disk` — can trigger an expensive I/O storm.
- Worker restart — no REST API exists in Kinetica 7.2. Tell the
  operator to run `gadmin restart rank <N>` manually instead.
- Cache clearing — no safe API exists in Kinetica 7.2. Recommend
  query-side solutions (rewriting the query, adding an index, bumping
  resource group limits) instead of trying to clear caches.

## For `kinetica_admin_rebalance`

- Recommend aggressiveness 1–3 during production hours (reduces query
  latency impact).
- Recommend aggressiveness 4–5 during maintenance windows only.
- Warn the operator: rebalance causes "delayed query responses" while
  running.
- Check `kinetica_cluster_status` for active jobs before proposing.
- On single-worker-rank clusters (rank 0 + 1 worker), rebalance
  returns "Database must be offline" — rebalance is only meaningful
  with 2+ worker ranks.
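
The rebalance rules above can be condensed into a single guard, sketched here. `recommend_aggressiveness` is a hypothetical helper, and its three inputs are assumed to have been gathered beforehand (for example from `kinetica_cluster_status`); it does not call any Kinetica API.

```python
REBALANCE_WARNING = 'Rebalance causes "delayed query responses" while running.'


def recommend_aggressiveness(in_maintenance_window, worker_rank_count, rebalance_running):
    """Return ((low, high), warning) or raise ValueError if rebalance must not be proposed."""
    if worker_rank_count < 2:
        raise ValueError("rebalance is only meaningful with 2+ worker ranks")
    if rebalance_running:
        raise ValueError("a rebalance/add/remove operation is already running")
    levels = (4, 5) if in_maintenance_window else (1, 3)
    return levels, REBALANCE_WARNING


levels, warning = recommend_aggressiveness(
    in_maintenance_window=False, worker_rank_count=3, rebalance_running=False
)
print(levels, warning)
```

Raising instead of returning a default keeps the "never propose" cases from being silently downgraded to a low-aggressiveness proposal.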

## For `kinetica_alter_system_properties`

- The tool enforces an allow-list of 43 documented properties —
  unsupported names are rejected before the API call.
- Prefer changing `subtask_concurrency_limit`, `tcs_per_tom`, or
  `tps_per_tom` for concurrency tuning.
- NOTE: `sm_omp_threads` and `kernel_omp_threads` do NOT exist in
  Kinetica 7.2.x (not in the allow-list).
- Avoid `chunk_size` changes without DBA review — affects all query
  performance.
- `request_timeout` changes affect ALL endpoints system-wide.

## For `kinetica_alter_configuration`

- ALWAYS read the current config via `kinetica_show_configuration`
  first.
- Make targeted edits to specific lines — never compose a config from
  scratch.
- Submit the full modified `config_string` (the entire file is
  replaced).
- Changes require a service restart to take effect — inform the
  operator.
- This tool contacts the host manager (port 9300), not the DB engine
  (port 9191).
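
A targeted edit that still returns the full `config_string` might be sketched like this. `set_config_value` is a hypothetical helper that works on text already read from the cluster; the sample content and the choice of `default_ttl` are illustrative only.

```python
def set_config_value(config_string, key, value):
    """Return the full config text with only the `key = ...` line changed.

    Raises KeyError if the key is absent, so a setting is never invented
    and the file is never composed from scratch.
    """
    lines = config_string.splitlines()
    for index, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("#") or "=" not in stripped:
            continue  # skip comments, section headers, and blank lines
        name = stripped.split("=", 1)[0].strip()
        if name == key:
            lines[index] = f"{key} = {value}"
            return "\n".join(lines) + "\n"
    raise KeyError(key)


# Hypothetical text as it might be returned by a prior read of the configuration.
current = "[gaia]\n# data lifecycle\ndefault_ttl = 20\nchunk_size = 8000000\n"
updated = set_config_value(current, "default_ttl", "60")
print(updated)
```

Every untouched line is carried over verbatim, which is what makes it safe to submit the result as a whole-file replacement.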

package/knowledge/references/rank-architecture.md
@@ -0,0 +1,54 @@
---
title: Kinetica Rank Architecture
category: cluster-topology
keywords: [ranks, rank-0, head, coordinator, worker, shards, metrics-interpretation, asymmetry]
---

## Overview

Kinetica uses a rank-based distributed architecture. A single host
typically runs one head rank (rank 0) plus one or more worker ranks
(rank 1, rank 2, …). Understanding the asymmetry between rank 0 and
worker ranks is essential to correctly interpreting metrics and
resource-group reports.

## Rank 0 — Head / Coordinator

- Stores only **metadata** (~4 MB RAM steady-state).
- Has **no** `PERSIST` / `DISK` / `VRAM` tiers configured — data
  tiers live on worker ranks.
- Has **no** resource objects (nothing to place in tiers).
- Has **no** `rank_usage` entry in resource groups.
- Much lower RAM limit, typically ~750 MB.
- Responsible for coordinating queries, query planning, and routing
  requests to worker ranks.

## Rank 1+ — Workers / Data Nodes

- Hold the actual user data.
- Have full tier configuration (RAM / PERSIST / DISK / VRAM as
  configured in `gpudb.conf`).
- All 16,384 shards map to worker ranks (rank 0 holds no shards).
- RAM limits typically 5+ GB per rank.

## Interpreting Metrics — Key Rule

**Rank 0's low resource usage is normal — it is NOT a sign of
imbalance or a failing node.**

When reviewing `kinetica_get_metrics` or `kinetica_node_details`:

- Compare worker ranks against each other — rank 1 vs rank 2 vs …
- Do NOT compare rank 0 against worker ranks; the asymmetry will
  always make rank 0 look "idle".
- Do NOT propose rebalance because rank 0 has less data than workers
  — that is the expected topology.

Similarly, when a resource group report shows no `rank_usage` for
rank 0, that is correct — nothing runs in resource groups on the head
rank.
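
The "compare workers only" rule can be made concrete with a small helper. This is a sketch: `worker_imbalance` is hypothetical, the input is assumed to be a mapping of rank id to some usage figure (for example RAM in MB), and no particular ratio is implied to be a threshold for proposing rebalance.

```python
def worker_imbalance(usage_by_rank):
    """Return the max/min usage ratio across worker ranks (rank >= 1), ignoring rank 0."""
    workers = {rank: used for rank, used in usage_by_rank.items() if rank >= 1}
    if len(workers) < 2:
        return None  # a single worker has nothing to be compared against
    smallest = min(workers.values())
    if smallest == 0:
        return float("inf")
    return max(workers.values()) / smallest


# Hypothetical usage figures; rank 0 is tiny by design and must not skew the result.
usage = {0: 4, 1: 3000, 2: 3300, 3: 6000}
print(worker_imbalance(usage))
```

Including rank 0 in the same calculation would report a ratio of 1500 here, which is exactly the false imbalance signal the key rule warns about.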

## Single-Worker Clusters

Rebalance requires 2+ worker ranks — see `version-quirks-7.2.md` for
the exact `/admin/rebalance` precondition and error message.

package/knowledge/references/sql-alter-table.md
@@ -0,0 +1,78 @@
---
title: Kinetica ALTER TABLE — Column Property Syntax
category: sql-syntax
keywords:
  [
    alter-table,
    alter-column,
    modify-column,
    dict,
    text-search,
    compress,
    column-properties,
    shard-key,
    kinetica-alter-table-columns,
  ]
---

## Overview

Kinetica's `ALTER TABLE` syntax for column properties differs from
standard SQL in two non-obvious ways:

1. Column properties live INSIDE the type parentheses, not as trailing
   clauses.
2. There is no `SET`/`ADD`/`DROP` for individual properties — every
   change requires repeating the FULL column definition.

## Single-Column Changes

```sql
-- Add DICT encoding to an existing column (repeat full definition):
ALTER TABLE [schema.]table_name
  ALTER COLUMN column_name VARCHAR(size, DICT) [NOT NULL]

-- Equivalent MODIFY syntax:
ALTER TABLE [schema.]table_name
  MODIFY COLUMN column_name VARCHAR(size, DICT) [NOT NULL]

-- Remove DICT encoding (omit DICT from definition):
ALTER TABLE [schema.]table_name
  ALTER COLUMN column_name VARCHAR(size) [NOT NULL]
```

## Multiple Column Changes

Multiple alterations on the same table can be bundled in a single
statement:

```sql
ALTER TABLE [schema.]table_name
  ALTER COLUMN col1 VARCHAR(50, DICT),
  ALTER COLUMN col2 VARCHAR(100, TEXT_SEARCH) NOT NULL,
  ALTER COLUMN col3 INT(DICT)
```

**For agent use:** when recommending 2+ column changes on one table,
prefer the `kinetica_alter_table_columns` tool — it composes this
bundled statement automatically and surfaces an interactive checklist
for operator approval.
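
How such a bundled statement might be composed is sketched below. This is not the `kinetica_alter_table_columns` implementation; `column_definition` and `bundled_alter` are hypothetical helpers that only format strings, and they perform no identifier quoting or validation.

```python
def column_definition(name, base_type, size=None, properties=(), not_null=False):
    """Render one ALTER COLUMN clause with properties inside the type parentheses."""
    args = ([str(size)] if size is not None else []) + list(properties)
    type_sql = f"{base_type}({', '.join(args)})" if args else base_type
    clause = f"ALTER COLUMN {name} {type_sql}"
    return clause + " NOT NULL" if not_null else clause


def bundled_alter(table, clauses):
    """Join several column clauses into one ALTER TABLE statement."""
    return f"ALTER TABLE {table}\n  " + ",\n  ".join(clauses)


# Hypothetical table name; the columns mirror the bundled example above.
statement = bundled_alter(
    "demo.events",
    [
        column_definition("col1", "VARCHAR", 50, ["DICT"]),
        column_definition("col2", "VARCHAR", 100, ["TEXT_SEARCH"], not_null=True),
        column_definition("col3", "INT", properties=["DICT"]),
    ],
)
print(statement)
```

Building the type arguments as one list is what keeps the size and the properties inside the same pair of parentheses.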

## Key Rules

- **Properties inside parentheses:** `VARCHAR(50, DICT)` —
  NOT `VARCHAR(50) DICT`. Placing the property outside the parens is a
  syntax error.
- **Full definition required:** type, size, properties, nullability
  must all be repeated. There is no `ALTER COLUMN col SET DICT` syntax.
- **Available column properties:** `DICT`, `TEXT_SEARCH`,
  `COMPRESS(type)`, `IPV4`, `NORMALIZE`, `INIT_WITH_NOW`,
  `INIT_WITH_UUID`, `UPDATE_WITH_NOW`.
- **Cascade behavior:** Dependent views, materialized views, and SQL
  procedures are DROPPED when a referenced column is altered. Warn
  operators before proposing ALTER COLUMN on a column with known
  dependencies — check `ki_catalog.ki_depend` first.
- **Shard keys are immutable** — check `is_shard_key` in
  `ki_catalog.ki_columns` (or `properties` in `kinetica_show_table`)
  before proposing ALTER COLUMN. See `version-quirks-7.2.md` for the
  full rule.