npm - @dx-do/cli - Versions diffs - 5.2.49 → 6.0.1 - Mend

@dx-do/cli 5.2.49 → 6.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (185)

package/dist-node/inferred-w998vfq1.md ADDED

@@ -0,0 +1,41 @@
+---
+id: inferred
+title: inferred — the INFERRED_* prefix and what it means
+aliases: [INFERRED]
+category: topology
+related: [inventorize]
+tags: [core]
+---
+# inferred (the `INFERRED_*` prefix)
+A vertex with a type starting `INFERRED_` was **created by a platform-side inference rule** at ingest time, not by an agent emitting it directly.
+Examples seen on real tenants:
+- `INFERRED_DATABASE` — DB identity unified across multiple agents.
+- `INFERRED_WEBSERVICE` — web service unified across multiple callers.
+- `INFERRED_GENERICBACKEND` — generic backend pattern.
+## Why it exists
+Without inference, the same backing thing can show up as multiple vertices — one per agent observing it. Inference deduplicates. From the user's perspective, an `INFERRED_DATABASE` is a more reliable identity than a per-agent `DATABASE`.
+## What it means for queries
+- The shape (attributes, metrics) of `INFERRED_<TYPE>` is broadly the same as the non-inferred type. Backend-call metrics still live under the producing agents' metric sources, not on the inferred vertex.
+- Don't filter ONLY on `INFERRED_*` — many tenants have agent-side `DATABASE` vertices without the corresponding inference rule firing. To capture all matches, filter on both: `type IN [DATABASE, INFERRED_DATABASE]`.
+- Inferred vertices have an `inferredBackendNode` attribute (the rule's id) where non-inferred vertices have `backendNode` (the agent's id).
+## Compared to "inventorize"
+| Concept | Who creates the vertex | Where rules live |
+|---|---|---|
+| `INFERRED_*` | Platform-side inference rule | Built into DXO2 |
+| Custom inventorize entity | Tenant-side inventorize rule | `inventory create-inventorize-rule` |
+Both fall under the broader idea of "synthesize a vertex from a pattern of telemetry". The prefix is the marker for which side built it.
+## See also
+- `lexicon/inventorize`.
+- `entities/database_or_inferred` — the canonical example.
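The query advice in this entry (filter on both the agent-side type and its `INFERRED_` twin) can be sketched in TypeScript. This is a minimal illustration under assumptions, not code from the package: `isInferred` and `typesToMatch` are hypothetical helper names, and the sketch only builds the type list, it does not run a query.

```typescript
// Hypothetical helpers for illustration; not part of @dx-do/cli.
const INFERRED_PREFIX = "INFERRED_";

// True when the vertex type carries the platform-side inference prefix.
function isInferred(vertexType: string): boolean {
  return vertexType.slice(0, INFERRED_PREFIX.length) === INFERRED_PREFIX;
}

// Returns the agent-side type plus its inferred twin, so a filter built
// from the result catches vertices whether or not an inference rule fired.
function typesToMatch(vertexType: string): string[] {
  const base = isInferred(vertexType)
    ? vertexType.slice(INFERRED_PREFIX.length)
    : vertexType;
  return [base, INFERRED_PREFIX + base];
}

console.log(typesToMatch("DATABASE").join(", "));
// prints: DATABASE, INFERRED_DATABASE
```

Passing either form of the type returns the same pair, which keeps callers from filtering only on `INFERRED_*` by accident.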

package/dist-node/installInstructions.md-k9ghf3dr.template ADDED

@@ -0,0 +1,21 @@
+# ${bundleName} (${bundleVersion})
+## Description
+${bundleDisplayName}
+# Installation Instructions
+Deploy to `extensions/deploy` in the agent directory, or add to a package via ACC.
+## Prerequisites
+## Dependencies
+${bundleDependencies}
+## Configuration
+# Usage Instructions
+## Debugging and Troubleshooting

package/dist-node/inventorize-xc9h9bjr.md ADDED

@@ -0,0 +1,34 @@
+---
+id: inventorize
+title: inventorize — the act of creating an entity from a pattern of telemetry
+aliases: [inventorization, inventorizing]
+category: topology
+related: [inferred]
+tags: [core]
+---
+# inventorize
+In DXO2, "inventorize" is the **platform's act of creating a topology entity from a pattern of telemetry** — telling the platform "when you see X happening, materialize a vertex of type Y with these attributes". The configuration object is an **inventorize rule** managed via the `inventory create-inventorize-rule` CLI command (and the corresponding REST API).
+## Why it exists
+Some metrics imply the existence of a resource that doesn't have a first-class vertex. The classic example: a JMX agent reports `JDBC_Pool|<pool>:Active Connections` — there's clearly a pool, but no `JDBC_CONNECTION_POOL` vertex by default. An inventorize rule can pattern-match on the metric path and create a vertex per pool.
+## How it relates to `INFERRED_*`
+`INFERRED_*` types (e.g. `INFERRED_DATABASE`) are typically **the output of platform-side inference rules**, which are a more constrained cousin of inventorize rules. The platform ships several built-in inference rules; tenants add their own inventorize rules on top.
+- Inferred = platform-shipped rule output.
+- Inventorize = tenant-added rule output (usually under `CUSTOM` layer, often `CUSTOM_INVENTORIZE_<N>` type).
+## How a user might phrase it
+- "Can we see this as an entity?" → yes, with an inventorize rule.
+- "Why is this metric not on a vertex?" → because no inventorize rule matches its pattern yet.
+- "Promote these metrics into an entity" → inventorize.
+## See also
+- `lexicon/inferred`.
+- `cookbooks/metrics-grounding` — when to consider an inventorize rule.
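The JDBC pool example in this entry can be made concrete with a small TypeScript sketch of the idea: pattern-match a metric path and emit one vertex per pool. This is not the inventorize rule format (the diff does not show it); `SyntheticVertex`, `POOL_PATH`, and `poolsFromMetricPaths` are hypothetical names, and the vertex type is passed in by the caller.

```typescript
// Illustrative only: this is NOT the inventorize rule format, just the idea behind it.
interface SyntheticVertex {
  type: string;
  name: string;
}

// Matches paths shaped like "JDBC_Pool|<pool>:Active Connections" and captures <pool>.
const POOL_PATH = /^JDBC_Pool\|([^:]+):Active Connections$/;

// Emits one vertex per distinct pool name found in the metric paths.
function poolsFromMetricPaths(paths: string[], vertexType: string): SyntheticVertex[] {
  const seen: Record<string, boolean> = Object.create(null);
  const vertices: SyntheticVertex[] = [];
  for (const path of paths) {
    const match = POOL_PATH.exec(path);
    if (match === null) continue; // not a pool metric
    const pool = match[1];
    if (seen[pool]) continue; // this pool already has a vertex
    seen[pool] = true;
    vertices.push({ type: vertexType, name: pool });
  }
  return vertices;
}
```

Calling it with two paths for the same pool yields a single vertex, which mirrors the "one vertex per pool" behavior the entry describes.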

package/dist-node/investigation-planning-6kcm01h9.md ADDED

@@ -0,0 +1,149 @@
+---
+id: investigation-planning
+title: Investigation planning — attack each question with a plan
+applies_to: all
+tags: [getting-started, discovery, authoring]
+related: [discovery-flow, investigator-flow, entity-relationships, metrics-grounding]
+---
+# Investigation planning
+DXO2 has a lot of data structures, a lot of integrations, and a lot of overloaded terminology. Many user questions don't map directly to a single query — there's a missing step *before* authoring: **figure out what the user actually means and where the answer is most likely to live.**
+This cookbook is the framework for that step. It complements `investigator-flow` (which assumes you already know what to query) and `discovery-flow` (which is about probing tenant capabilities).
+## Before authoring a query, run the planning loop
+```
+   ┌──────────────────────────┐
+   │ 1. Disambiguate the      │
+   │    user's terms          │     ← lexicon + name-search
+   └────────────┬─────────────┘
+                ▼
+   ┌──────────────────────────┐
+   │ 2. Identify candidate    │
+   │    anchors that match    │     ← discovery_search_by_name
+   │    the user's phrasing   │
+   └────────────┬─────────────┘
+                ▼
+   ┌──────────────────────────┐
+   │ 3. Pick a strategy       │     ← entity-first / service-first /
+   │    based on what matched │       universe-first / agent-first /
+   │                          │       metric-first
+   └────────────┬─────────────┘
+                ▼
+   ┌──────────────────────────┐
+   │ 4. Author the query.     │
+   │    If empty / wrong,     │
+   │    consider inventorize  │
+   │    or fall back a level. │
+   └──────────────────────────┘
+```
+Time spent in steps 1-3 saves multiple round-trips of "I asked X, got nothing back, let me reformulate."
+## Step 1 — disambiguate the user's terms
+DXO2 has a small set of words that get used for very different things. Before treating a user's noun as a literal query target, check the lexicon:
+- **service** — DXO2 service vs DXI_SERVICE vs k8s_SERVICE vs OS service vs business service. (`lexicon/service`)
+- **application** — DXO2 application (service with `type=application`) vs APM `applicationName` vs k8s_DEPLOYMENT vs business application. (`lexicon/application`)
+- **universe** — APM Universe vs O2 Universe. (`lexicon/universe`)
+- **host** — uppercase `HOST` (APM) vs mixed-case `Host` (NetOps). (`lexicon/host`)
+- **agent / source / collector / monitor** — colloquial vs literal. (`lexicon/agent_source_collector`)
+A 30-second lexicon read prevents a 10-minute query-flailing session.
+## Step 2 — identify candidate anchors
+Once the term is disambiguated, search for entities whose name or label matches the user's phrasing. **Don't assume one shape**. The MCP tool `discovery_search_by_name` runs a name-search across multiple kinds in one call — services, universes, vertex names, and (when present) DXO2 applications.
+For a phrase like "the X application":
+| Search | What you get |
+|---|---|
+| Service-with-type=application named X | Modern DXO2 apps |
+| DXO2 service named X | Pre-application-era org structure |
+| k8s_DEPLOYMENT named X | Kubernetes-native shops |
+| Universe named X | Some customers use universes as portfolio markers |
+| Vertex with `applicationName=X` (BTs, SERVLETs) | APM-flavored anchor |
+If multiple kinds match, surface all candidates to the user before committing.
+## Step 3 — pick a strategy based on what matched
+Different anchors call for different first moves:
+### Entity-first (a vertex matches the user's phrasing)
+The match is a specific topology vertex. Walk from there:
+- TAS query starting from the vertex (`vertex search` → `detail` → TAS TRAVERSE for neighbors).
+- For metrics: `discovery_metrics_for_entity` to get the canonical metric query body.
+### Service-first (a DXO2 service matches)
+- `discovery_service_hierarchy` to see the service's place in the DAG (parents/children).
+- `query-service-inventory` (CLI: `service query-service-inventory`) to enumerate constituent entities.
+- `service-dependency-graph` for the runtime dependency view.
+- For health metrics: `service-overview` and `service-detail-metrics`.
+- **For "what services does this entity belong to"** — read the entity's `serviceNames` attribute (with TAS `includeServices: true`). Don't traverse the hierarchy; the platform pre-populates this index. See `entities/service` for the pattern.
+### Universe-first (a universe matches)
+- Determine APM vs O2 (the two universe kinds have different APIs).
+- Use the universe as a scope filter for a downstream TAS query (member services → member entities).
+### Agent-first (the user named an agent / collector / host)
+- Find the AGENT vertex by name or by host.
+- TRAVERSE `agent-monitors` outgoing edges to enumerate what the agent produces metrics for.
+- Query MM under the agent's metric source for raw data.
+### Metric-first (no entity matches but the user described a metric or pattern)
+This is the **JMX-flavored** case. The thing the user is asking about (a JDBC pool, a queue, a thread-pool, a cache region) very likely **has no vertex** because no integration currently materializes it.
+- Query MM by `metric.path` regex matching the user's intent.
+- Inspect the producing agent's source to ground identity.
+- Cross-reference `cookbooks/metrics-grounding` for the orphaned-by-design pattern.
+If the metric pattern is high-value and recurring, **consider an inventorize rule** to materialize the resource as a vertex (`inventory create-inventorize-rule`). See `lexicon/inventorize`.
+## Step 4 — author the query, but plan the fallback
+Empty results don't always mean "no data". They can mean:
+- The wrong vertex type filter excluded the actual matches (`HOST` vs `Host` is the textbook example).
+- The wrong layer was filtered to.
+- The agent reports the metric but no inventorize rule has materialized the entity yet.
+- The customer's universe scope hides the data behind the active scope.
+When you get empty results, **don't just retry with broader filters**. Drop one level in the strategy ladder:
+```
+ entity-first   → service-first  → universe-first  → metric-first  → inventorize
+ (anchored)                                                          (synthesize)
+```
+Each step is broader, and each step is a different *kind* of question.
+## Things that are universally fine to do early
+These tools are read-only, cached, and cheap — call them freely as part of step 1 / step 2:
+- `corpus_list("lexicon")` — overload disambiguation.
+- `corpus_list("entities")` — what kinds of vertices a tenant *can* have.
+- `discovery_layers`, `discovery_universes`, `discovery_services` — what the tenant *does* have.
+- `discovery_search_by_name` — bridge step 1 → step 2.
+- `discovery_vertex_types` — empirical type distribution on this tenant.
+- `discovery_edge_semantics` — what kinds of relationships exist on this tenant.
+Save expensive moves (`run_query` with a wide projection, full TAS dumps) for after the planning round-trips.
+## See also
+- `cookbooks/investigator-flow` — assumes you've planned, walks through the actual query authoring.
+- `cookbooks/discovery-flow` — what the discovery tools do.
+- `cookbooks/entity-relationships` — the kinds of relationships that exist.
+- `cookbooks/metrics-grounding` — the metric-first strategy in depth.
+- `lexicon/` — every overloaded term.
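The fallback ladder from Step 4 of this cookbook can be sketched as a small TypeScript helper. The ladder order is copied from the cookbook (which leaves agent-first off the ladder, so the sketch does too); `nextStrategy` and `fallbackPath` are hypothetical names, and the sketch only decides which strategy comes next, it does not run any query.

```typescript
// The ladder as written in the cookbook, most anchored strategy first.
const LADDER = [
  "entity-first",
  "service-first",
  "universe-first",
  "metric-first",
  "inventorize",
] as const;
type Strategy = (typeof LADDER)[number];

// Returns the next, broader strategy, or null at the end of the ladder.
function nextStrategy(current: Strategy): Strategy | null {
  const next = LADDER[LADDER.indexOf(current) + 1];
  return next === undefined ? null : next;
}

// Lists every strategy that remains to be tried, starting from `start`.
function fallbackPath(start: Strategy): Strategy[] {
  const path: Strategy[] = [];
  let current: Strategy | null = start;
  while (current !== null) {
    path.push(current);
    current = nextStrategy(current);
  }
  return path;
}

console.log(fallbackPath("universe-first").join(" -> "));
// prints: universe-first -> metric-first -> inventorize
```

Keeping the ladder as data makes "drop one level" an explicit step, in contrast to retrying the same strategy with broader filters.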

package/dist-node/investigator-flow-jc2s0n46.md ADDED

@@ -0,0 +1,186 @@
+---
+id: investigator-flow
+title: Investigator flow — handling natural-language DXO2 questions end-to-end
+applies_to: all
+tags: [investigator, getting-started, decision-tree]
+related: [universes-and-scopes, service-hierarchies, query-vs-analysis-separation, discovery-flow, gotchas]
+---
+# Investigator flow
+A typical DXO2 user asks **investigative** questions, not query-shaped ones:
+> *"Are there any hosts with high CPU right now?"*
+> *"Which services are degraded?"*
+> *"Why did APM-Eastern go red overnight?"*
+> *"Show me the top noisy metric sources."*
+The user doesn't care about TAS vs NASSQL, layers, or filter ops — they care about *the answer*. The investigator's job is to translate that intent into a query that gets the user close to the answer they're really looking for, then hand off to the ui (for human refinement / dashboarding) or to an analyzer (for thresholding / anomaly detection).
+This cookbook is the canonical decision tree.
+## The investigator loop (in order)
+```
+  user asks
+     ↓
+  scope-clarification checklist
+     ↓
+  ask 1-2 targeted questions (only the load-bearing ambiguities)
+     ↓
+  pick query type (TAS vs NASSQL, source op)
+     ↓
+  author query + verify with run_partial / run_query
+     ↓
+  save query (create_query)
+     ↓
+  hand off — UI link, or analyzer subagent
+```
+Each step is covered below.
+## Step 1 — Scope-clarification checklist
+Before authoring anything, mentally walk this checklist:
+| Ambiguity | Source of truth | When to ask |
+|---|---|---|
+| **Service** — named? if so, which subservice scope? Direct children only or full descent? | `entities/service` + `discovery_service_hierarchy` (DAG view) | When the user names a service or anything service-shaped. **Check first** — services are the preferred scope axis when populated. |
+| **Universe** — which (or any)? APM vs O2? | `entities/universe` + `discovery_universes` | Always on a multi-universe tenant. Skip on single-universe tenants. |
+| **Time window** — now, last hour, last day, custom? | (whatever the user says) | When the question implies time (most do). Default to "right now" → ~1 hour. |
+| **Threshold** — what does "high" / "low" / "noisy" / "slow" mean? | (whatever the user has in mind) | When the question implies a threshold. Often best to *not* threshold in the query — see step 5. |
+| **Output shape** — list, count, ranking, time series? | (the question's grammar) | "Are there any" → list; "how many" → count; "top" → ranking; "show me the trend" → time series. |
+The first two are the **load-bearing** ones because they change the query's filter shape entirely. Time/threshold/output usually have safe defaults.
+### Scope-axis precedence — services first, then user-created universes
+DXO2 has two scope axes. They overlap; in practice they answer different questions about the same tenant. The default order:
+1. **Services first.** Use `discovery_service_hierarchy` to inventory the service catalog. Look at `totalServices`, `rootCount`, `roots[]`, and `tags`. If services look fully populated (a healthy tree of named, tagged services that obviously map to ownership/technology/functionality boundaries), **prefer service-scoped queries** for any investigative work.
+2. **Universes second, system-defaults filtered.** Use `discovery_universes`. **Skip `id == 'UNsaasProd'` (APM default) and `id == 'VIEWALL'` (O2 default)** — those are tenant-creation defaults rather than user-modeled scopes; they don't reflect organizational intent. The remaining universes are user-created and meaningful.
+3. **If services are sparse, lean on user-created universes.** Some tenants haven't fully populated services yet; in that case the universe axis is the better proxy for org shape.
+4. **Both populated (as on demo-prod)?** Prefer service-scoped queries. Universes still answer "which slice of data am I allowed to see," but services answer "what is this for."
+State the org-shape inference back to the user as part of the clarifying question — *"this tenant has ~100 services across 24 root branches plus 6 user-created universes; I'd lean on the service axis…"* — so the user can redirect if their mental model differs.
+## Step 2 — Ground the scope empirically
+Do *not* speak about universes or services abstractly. Run the discovery tools (front-door, all `alwaysLoad`):
+```
+corpus_get('entities', 'universe')          ← refresh on the conceptual model if needed
+corpus_get('entities', 'service')           ← same for services
+discovery_capabilities                        ← what the bound tenant supports
+discovery_universes                           ← APM + O2 universes, name-first shape
+discovery_service_hierarchy                   ← DAG view: roots, children, parents, tags
+```
+(`discovery_services` returns just a flat name list — cheaper, but use the hierarchy variant when you need parent/child or want to surface "broaden to siblings?" cleanly.)
+Cache the counts + roots in your own context — *"on this tenant: 11 user-created APM universes, 6 user-created O2 universes (`VIEWALL` excluded as system default), 106 services across 24 roots"* — so you can surface them in the same sentence as your clarifying question.
+## Step 3 — Ask the targeted question
+Surface what you found and ask only what's load-bearing. **One or two questions, not five.** Example for *"are there any hosts with high CPU right now?"*:
+> *"Quick scope check — this tenant has 3 APM universes (Prod, Staging, NA-EMEA-Combined), 7 O2 universes, and ~140 services. Did you want all hosts cross-universe, or hosts in a specific universe / service? And for 'high CPU' — do you have a threshold in mind (e.g. >70% sustained, >90% peak), or should I produce a query that returns the raw CPU data and we look at it together?"*
+Notice:
+- One sentence states what's there. (Surfacing scale is an answer to "where do I even start.")
+- Two questions: scope, then thresholding.
+- The threshold question has a built-in escape hatch (*"or produce raw data"*) — see step 5.
+## Step 4 — Pick query type (TAS vs NASSQL, then the source op)
+Based on the user's confirmed intent:
+| Question shape | TAS or NASSQL? | If NASSQL, source op |
+|---|---|---|
+| "are there any X" / "list the X" / "which X" | **TAS** (entities) | n/a |
+| "how many X" / "count X" | NASSQL (`FROM_TOPOLOGY`) | `FROM_TOPOLOGY` |
+| "top N X by metric" | NASSQL (mostly) | `FROM_METADATA` if ranking by metric *count*; `FROM` if ranking by metric *values* |
+| "show CPU over time for X" | NASSQL | `FROM` (datapoints) |
+| "what metrics exist for X" | NASSQL | `FROM_METADATA` (definitions, not values) |
+| "trace X to Y" / "what's connected to X" | **TAS** (`TRAVERSE`) | n/a |
+The metadata-vs-data distinction is the most-confused one — see `cookbooks/discovery-flow` for the full hierarchy and `cookbooks/nassql-quickstart` for source-op details.
+## Step 5 — Default: produce the query, not the analysis
+This is the load-bearing default for investigative work:
+> **Produce a query that returns the raw data the user cares about. Let the threshold / anomaly / trend analysis happen elsewhere — in the ui where the user can see it, or in an analyzer subagent.**
+Why: the same data shape can support multiple analyses. *"Hosts with high CPU"* could mean:
+- 1-of-30 datapoints over 90% (a momentary spike — usually noise)
+- All-of-30 datapoints over 70% (sustained pressure — usually the real story)
+- Peak over 95% in the last 6 hours (an SLO-shaped question)
+A query that returns just the CPU datapoints for the chosen scope supports all three. A query with `FILTER cpu > 90` baked in supports only the first, and gives no sign that the other two readings were discarded. **The pre-filtered query is rarely what the user actually wanted, and it's silently lossy.**
+When to threshold in the query anyway:
+- The user explicitly stated a threshold (*"hosts with CPU over 90"*).
+- The result set would otherwise be unmanageable (e.g. millions of rows). Then add a sane `limit` first; only add a value-threshold if `limit` isn't enough.
+See `cookbooks/query-vs-analysis-separation` for the pattern catalog.
+## Step 6 — Author + verify
+Use corpus content as starting points:
+- `corpus_list('queries', {type: 'tas'})` — TAS examples.
+- `corpus_list('queries', {type: 'nassql'})` — NASSQL examples.
+- `corpus_get('queries', '<id>')` — full payload + per-op descriptions.
+Read the descriptions — they capture *why* the query is shaped a certain way and what mistake the alternative would be (these are agent-authored notes, exactly the kind of grounding you'd want).
+**Before authoring an op you don't write daily, call `query_schema` with that op.** The default response is a 4 KB summary listing the op's properties (name, description, required, typeHint) — cheaper than guess-and-retry, and prevents the textbook field-name fumbles (`_type` vs `type` on `ATTRIBUTE`, `count` vs `n` on `TOP`, etc.). Examples: `{type: "tas", op: "ATTRIBUTE"}`, `{type: "nassql", op: "FROM"}`, `{type: "metadata", op: "SPEC"}`. Reach for `full: true` only when nested validation detail (enum values past 6 entries, recursive sub-shapes) actually matters.
+Verify before saving:
+- `run_query` for full execution. Watch for empty-payload trap (see `gotchas`).
+- `run_partial_query` for sub-tree verification when you have nested filters or multi-step NASSQL pipelines.
+## Step 7 — Save + hand off
+Save with `create_query`:
+```
+create_query({
+  type: 'tas' | 'nassql' | 'metadata',
+  shortName: '<descriptive-kebab-case>',
+  payload: { ... },
+  descriptions: {
+    '$': '<top-level intent — what this query answers>',
+    '$.filter.input[0]': '<why this op shape>',
+    // … per-op notes capture the load-bearing reasoning
+  },
+  tmd: { description: '<one-line summary>', tags: ['investigator', 'foray-N'] },
+  md: '<companion doc explaining intent / expected output / gotchas>'
+})
+```
+The `descriptions` are the most valuable thing you leave behind — future agents reading this query via `get_query` get the reasoning for free.
+Then hand off:
+> *"Saved as `15-cpu-by-host`. Open in the ui to refine the time window, run, or attach to a dashboard: `http://localhost:<port>/queries/15-cpu-by-host`. (Or if you'd like a thresholded analysis right now, I can hand it to the analyzer.)"*
+## Anti-patterns to avoid
+- **Skipping the scope-clarification step**. *"Hosts with high CPU"* on a multi-universe tenant authored against the wrong universe is a wrong answer; the user has to redo the work.
+- **Asking five clarifying questions.** Pick the one or two that are load-bearing for the query shape. The rest can default.
+- **Hard-coding tenant-specific service / universe names.** Always look up via discovery; tenant data changes.
+- **Defaulting to thresholded queries.** Default to data-fetching. Threshold only when the user explicitly stated one.
+- **Treating "service" as DXI_SERVICE.** Almost always, "service" means `saService` (the organizational axis). See `entities/dxi_service` for the disambiguation.
+- **Long-form summaries instead of saving the query.** The deliverable is a saved query the user can refine, not a paragraph of explanation. Save first, explain in the handoff.
+## See also
+- `cookbooks/universes-and-scopes` — the universe-axis details.
+- `cookbooks/service-hierarchies` — the service-axis details + the `excludeSubServices` knob.
+- `cookbooks/query-vs-analysis-separation` — the data-vs-analysis default and the pattern catalog.
+- `cookbooks/discovery-flow` — discovery tool ordering.
+- `cookbooks/gotchas` — empirical surprises across all of the above.
+- `entities/universe`, `entities/service`, `entities/dxi_service` — the foundational entities.
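Step 5 of this cookbook argues that one window of raw CPU datapoints supports several readings of "high CPU". A minimal TypeScript sketch of that point follows. The thresholds are the cookbook's own examples, while the function names and the sample window are hypothetical; nothing here is an analyzer API.

```typescript
// Three readings of "high CPU" over one window of datapoints (percent values).
// Function names and sample data are hypothetical.

// Momentary spike: at least one datapoint over the threshold.
function hasSpike(points: number[], threshold: number): boolean {
  return points.some((p) => p > threshold);
}

// Sustained pressure: every datapoint over the threshold (false for an empty window).
function isSustained(points: number[], threshold: number): boolean {
  return points.length > 0 && points.every((p) => p > threshold);
}

// Peak value in the window, or null if the window is empty.
function peak(points: number[]): number | null {
  return points.length === 0 ? null : points.reduce((a, b) => Math.max(a, b));
}

const cpuWindow = [72, 75, 91, 74, 78];
console.log(hasSpike(cpuWindow, 90), isSustained(cpuWindow, 70), peak(cpuWindow));
// prints: true true 91
```

All three answers come from the same unfiltered array. Had the query kept only values above 90, the window would contain just `91` and the sustained-pressure reading could no longer be computed.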

package/dist-node/k8s_deployment_and_namespace-69c29152.md ADDED

@@ -0,0 +1,88 @@
+---
+id: k8s_deployment_and_namespace
+title: k8s_DEPLOYMENT / k8s_NAMESPACE / k8s_CLUSTER — the Kubernetes structural layer
+layer: INFRASTRUCTURE
+related_entities: [k8s_pod, k8s_replicaset, k8s_service, host]
+related_cookbooks: []
+tags: [k8s, infrastructure, integration-specific]
+---
+# k8s_DEPLOYMENT, k8s_NAMESPACE, k8s_CLUSTER
+## What they are
+These three vertex types describe the **structural** Kubernetes layer above the running pods/containers:
+- `k8s_CLUSTER` — a monitored Kubernetes cluster (top of the tree).
+- `k8s_NAMESPACE` — a namespace within a cluster (logical grouping).
+- `k8s_DEPLOYMENT` — a Deployment workload (declared spec for a set of pods).
+`k8s_REPLICASET` (an intermediate the deployment owns), `k8s_DAEMONSET`, and `k8s_STATEFULSET` are siblings of `k8s_DEPLOYMENT` at the workload-owner level.
+## Why this matters
+When a user says "the X application" in a Kubernetes-native shop, the right anchor entity is usually `k8s_DEPLOYMENT`, not a DXO2 service or APM application. The deployment name is what developers and SREs talk about every day; pods come and go.
+The hierarchy:
+```
+k8s_CLUSTER
+  └─ contains → k8s_NAMESPACE
+                   └─ contains → k8s_DEPLOYMENT  (or DAEMONSET / STATEFULSET)
+                                    └─ orchestrates → k8s_REPLICASET
+                                                         └─ orchestrates → k8s_POD
+                                                                              └─ contains → k8s_CONTAINER
+```
+The exact edge `semantic` values vary (`contains`, `orchestrates`, `composed_of`); `discovery_edge_semantics` on a tenant returns the empirical set.
+## Useful attributes
+k8s_CLUSTER:
+- `name`, `k8s_cluster_name`, `k8s_cluster_namespaces`, `k8s_cluster_nodes`.
+- `k8s_agent_data_source`, `k8s_project`, `k8s_type`.
+k8s_NAMESPACE:
+- `name`, `k8s_namespace_phase`, `k8s_namespace_uuid`.
+- `k8s_namespace_labels_kubernetes.io/metadata.name`.
+k8s_DEPLOYMENT:
+- `name`, `k8s_deployment.name`, `k8s_deployment_vertexid`.
+- `k8s_deployment_annotations_deployment.kubernetes.io/revision`.
+- `k8s_cluster_name`, `k8s_project`, `k8s_type`, `keyPrefix`.
+## Common starting moves
+- "How is application X" → search by name across `k8s_DEPLOYMENT` and DXO2 services. If `k8s_DEPLOYMENT` matches, walk down to pods and roll up CPU / memory / restart-count.
+- "All pods in namespace prod" → filter `k8s_POD` by traversal from the matching `k8s_NAMESPACE`.
+- "Cluster overview" → `k8s_CLUSTER` for top-level rollup, then `k8s_NAMESPACE` counts per cluster.
+## Mapping to "application"
+DXO2 has multiple notions of application; k8s adds another:
+- DXO2 application — a service with `type=application` (recently introduced; see `lexicon/application`).
+- APM-side application — `applicationName` attribute on BT / SERVLET / etc.
+- Kubernetes application — the convention is "one Deployment per app" (occasionally many Deployments grouped by labels).
+When grounding for a user query mentioning "application X", the agent should:
+1. Search by name across `k8s_DEPLOYMENT`, DXO2 service-with-type=application, and APM-flavored entities.
+2. Pick the strongest match. If `k8s_DEPLOYMENT` is the only hit, that's the anchor.
+3. If multiple match, ask the user to disambiguate or explore all of them.
+## Common synonyms / mistakes
+- "Cluster" is unambiguous in k8s shops; in NetOps it could mean a different concept entirely.
+- "Namespace" is k8s-specific terminology. Users say "the prod environment" or "staging"; these usually map to `k8s_NAMESPACE` but sometimes to a DXO2 service or a label.
+- "Deployment" is unambiguous in k8s shops. In non-k8s contexts it can mean "a release", which is a different concept (no vertex type).
+## Related entities
+- **k8s_POD / k8s_CONTAINER** — the runtime layer below.
+- **k8s_SERVICE** — the network-routing layer (NOT the same as a DXO2 service).
+- **HOST** — when the cluster's worker nodes are also monitored as APM/Infrastructure HOSTs, both vertex types coexist.
+## See also
+- `entities/k8s_pod_and_container` — for the runtime layer.
+- `lexicon/service` — k8s_SERVICE vs DXO2 service vs others.
+- `lexicon/application` — when "application" means a deployment.
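The hierarchy diagram in this entry can be expressed as a parent lookup, which is handy when rolling a pod or container up to its deployment, namespace, or cluster. The TypeScript sketch below covers only the Deployment branch drawn in the diagram. `PARENT_TYPE` and `pathToCluster` are hypothetical names, and the real edge semantics on a tenant should still come from `discovery_edge_semantics`.

```typescript
// Parent of each vertex type along the Deployment branch of the diagram.
// DAEMONSET / STATEFULSET branches are omitted; edge semantics vary by tenant.
const PARENT_TYPE: Record<string, string> = {
  k8s_CONTAINER: "k8s_POD",
  k8s_POD: "k8s_REPLICASET",
  k8s_REPLICASET: "k8s_DEPLOYMENT",
  k8s_DEPLOYMENT: "k8s_NAMESPACE",
  k8s_NAMESPACE: "k8s_CLUSTER",
};

// Walks upward from a vertex type until no parent is known.
function pathToCluster(vertexType: string): string[] {
  const path = [vertexType];
  let current = vertexType;
  while (Object.prototype.hasOwnProperty.call(PARENT_TYPE, current)) {
    current = PARENT_TYPE[current];
    path.push(current);
  }
  return path;
}

console.log(pathToCluster("k8s_POD").join(" > "));
// prints: k8s_POD > k8s_REPLICASET > k8s_DEPLOYMENT > k8s_NAMESPACE > k8s_CLUSTER
```

A type outside the table, such as `HOST`, returns a one-element path, which signals that the walk does not apply.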

package/dist-node/k8s_pod_and_container-9h4v6cmj.md ADDED

@@ -0,0 +1,64 @@
+---
+id: k8s_pod_and_container
+title: k8s_POD / k8s_CONTAINER — the Kubernetes execution layer
+layer: INFRASTRUCTURE
+related_entities: [k8s_deployment, k8s_namespace, k8s_node, k8s_service, host]
+related_cookbooks: []
+tags: [k8s, infrastructure, integration-specific]
+---
+# k8s_POD and k8s_CONTAINER
+## What they are
+A `k8s_POD` is a Kubernetes pod — one or more containers scheduled together onto a node. A `k8s_CONTAINER` is a single container within that pod. Sidecars and init containers each get their own `k8s_CONTAINER` vertex; pod-level metrics aggregate across them.
+Most tenants running k8s have many more containers than pods (sidecar-heavy environments push the ratio above 2:1). The relationship is `k8s_POD --contains--> k8s_CONTAINER`.
+## Where they live
+Both live in the **INFRASTRUCTURE** layer (despite the runtime being heavily APM-flavored — Kubernetes-as-infrastructure is the platform's framing). The k8s agent is the producer.
+## Useful attributes
+k8s_POD:
+- `name`, `k8s_pod_nodename`, `k8s_pod_vertexid`, `k8s_container.name`.
+- `k8s_cluster_name`, `k8s_agent_data_source`, `k8s_project`.
+- `agent`, `product`, `ACNId`, `SourceProduct`, `resourceType`.
+k8s_CONTAINER:
+- `name`, `k8s_container_image`, `k8s_container_imagePullPolicy`.
+- `k8s_pod_nodename`, `k8s_cluster_name`, `k8s_agent_data_source`.
+## How metrics are reported
+The k8s agent reports per-pod and per-container CPU / memory / disk-IO / network. Pod-level metrics aggregate across containers. To find them:
+- Use MM `discovery_metrics` filtered by `metric.source` containing the cluster name and pod or container name.
+- The container's image often disambiguates which container's metrics you want — e.g. `metric.path` containing `nginx-sidecar` vs `app`.
+Restart-count and crash-loop signals show up at the pod level via `restart_count_total` (or similar — name varies by k8s agent version).
+## Common starting moves
+- "Pods crashlooping" → filter `k8s_POD` by `agent-monitors` edges where the corresponding metric crosses a threshold; use NASSQL `FROM` (datapoints) joined to `FROM_TOPOLOGY`.
+- "Memory pressure on pod X" → pod's `k8s_pod_nodename` is the host; combine pod metrics with k8s_NODE-level pressure.
+- "All containers running image Y" → filter `k8s_CONTAINER` by `k8s_container_image LIKE %Y%`.
+## Common synonyms / mistakes
+- "Pod" alone is unambiguous. "Container" is ambiguous in non-k8s shops, where it can mean a Docker container, an LXC container, or an OCI runtime. In DXO2 with k8s monitoring it almost always means `k8s_CONTAINER`; with ECS it means `AWS_ECS_FARGATE_CONTAINER`.
+- Don't confuse `k8s_POD` with the application or deployment — a pod is **one running instance** of the deployment's spec. To answer "the X application" the right anchor is usually `k8s_DEPLOYMENT`, not the pod.
+## Related entities
+- **k8s_DEPLOYMENT** / **k8s_REPLICASET** / **k8s_DAEMONSET** / **k8s_STATEFULSET** — the workload owners (pod creators).
+- **k8s_NAMESPACE** — the namespace the pod belongs to.
+- **k8s_NODE** — the worker node the pod runs on.
+- **k8s_SERVICE** — the network-level service that routes traffic to a set of pods.
+- **HOST** — when the same node is also monitored by an Infrastructure agent, both `k8s_NODE` and `HOST` may exist for the same physical machine. Match by hostname / IP.
+## See also
+- `entities/k8s_deployment_and_namespace` — for the workload-owner side.
+- `lexicon/service` — k8s_SERVICE vs DXO2 service disambiguation.
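The "all containers running image Y" move in this entry amounts to a substring match on `k8s_container_image`. The TypeScript sketch below shows the in-memory equivalent over already-fetched vertices. The attribute names come from the entry; `ContainerVertex` and `containersRunningImage` are hypothetical, and the match is case-sensitive here as an assumption, since the diff does not say how the platform's `LIKE` treats case.

```typescript
// Minimal vertex shape using attribute names listed in the entity page.
interface ContainerVertex {
  name: string;
  k8s_container_image: string;
}

// In-memory equivalent of `k8s_container_image LIKE %fragment%`
// (case-sensitive substring match; case handling is an assumption).
function containersRunningImage(
  vertices: ContainerVertex[],
  fragment: string
): ContainerVertex[] {
  return vertices.filter((v) => v.k8s_container_image.indexOf(fragment) !== -1);
}
```

Because sidecars get their own `k8s_CONTAINER` vertex, a match on a sidecar image returns the sidecar containers only, not the pods that host them.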