siclaw 0.1.0 → 0.1.2
- package/README.md +75 -114
- package/dist/agentbox/gateway-client.d.ts +2 -1
- package/dist/agentbox/gateway-client.js +6 -2
- package/dist/agentbox/gateway-client.js.map +1 -1
- package/dist/agentbox/http-server.js +184 -19
- package/dist/agentbox/http-server.js.map +1 -1
- package/dist/agentbox/resource-handlers.d.ts +1 -0
- package/dist/agentbox/resource-handlers.js +23 -23
- package/dist/agentbox/resource-handlers.js.map +1 -1
- package/dist/agentbox/session.js +85 -5
- package/dist/agentbox/session.js.map +1 -1
- package/dist/agentbox-main.d.ts +2 -1
- package/dist/agentbox-main.js +65 -18
- package/dist/agentbox-main.js.map +1 -1
- package/dist/cli-credentials.d.ts +1 -0
- package/dist/cli-credentials.js +109 -0
- package/dist/cli-credentials.js.map +1 -0
- package/dist/cli-first-run.d.ts +11 -0
- package/dist/cli-first-run.js +99 -0
- package/dist/cli-first-run.js.map +1 -0
- package/dist/cli-main.js +33 -11
- package/dist/cli-main.js.map +1 -1
- package/dist/cli-setup.d.ts +5 -11
- package/dist/cli-setup.js +12 -225
- package/dist/cli-setup.js.map +1 -1
- package/dist/core/agent-factory.d.ts +4 -0
- package/dist/core/agent-factory.js +102 -151
- package/dist/core/agent-factory.js.map +1 -1
- package/dist/core/config.d.ts +10 -3
- package/dist/core/config.js +11 -95
- package/dist/core/config.js.map +1 -1
- package/dist/core/extensions/deep-investigation.d.ts +2 -1
- package/dist/core/extensions/deep-investigation.js +144 -24
- package/dist/core/extensions/deep-investigation.js.map +1 -1
- package/dist/core/extensions/setup.d.ts +8 -0
- package/dist/core/extensions/setup.js +669 -0
- package/dist/core/extensions/setup.js.map +1 -0
- package/dist/core/llm-proxy.js +7 -3
- package/dist/core/llm-proxy.js.map +1 -1
- package/dist/core/mcp-client.d.ts +0 -10
- package/dist/core/mcp-client.js +0 -65
- package/dist/core/mcp-client.js.map +1 -1
- package/dist/core/prompt.d.ts +1 -1
- package/dist/core/prompt.js +42 -5
- package/dist/core/prompt.js.map +1 -1
- package/dist/core/provider-presets.d.ts +14 -0
- package/dist/core/provider-presets.js +81 -0
- package/dist/core/provider-presets.js.map +1 -0
- package/dist/cron/cron-coordinator.d.ts +2 -0
- package/dist/cron/cron-coordinator.js +46 -14
- package/dist/cron/cron-coordinator.js.map +1 -1
- package/dist/cron/cron-executor.js +33 -8
- package/dist/cron/cron-executor.js.map +1 -1
- package/dist/cron/cron-scheduler.d.ts +1 -1
- package/dist/cron/gateway-client.d.ts +5 -0
- package/dist/cron/gateway-client.js +43 -8
- package/dist/cron/gateway-client.js.map +1 -1
- package/dist/cron-main.js +39 -9
- package/dist/cron-main.js.map +1 -1
- package/dist/gateway/agentbox/client.d.ts +11 -0
- package/dist/gateway/agentbox/client.js +18 -0
- package/dist/gateway/agentbox/client.js.map +1 -1
- package/dist/gateway/agentbox/k8s-spawner.d.ts +11 -2
- package/dist/gateway/agentbox/k8s-spawner.js +95 -52
- package/dist/gateway/agentbox/k8s-spawner.js.map +1 -1
- package/dist/gateway/agentbox/local-spawner.d.ts +1 -1
- package/dist/gateway/agentbox/local-spawner.js +4 -2
- package/dist/gateway/agentbox/local-spawner.js.map +1 -1
- package/dist/gateway/agentbox/manager.d.ts +0 -10
- package/dist/gateway/agentbox/manager.js +11 -30
- package/dist/gateway/agentbox/manager.js.map +1 -1
- package/dist/gateway/agentbox/types.d.ts +6 -4
- package/dist/gateway/cron/cron-service.d.ts +49 -0
- package/dist/gateway/cron/cron-service.js +259 -0
- package/dist/gateway/cron/cron-service.js.map +1 -0
- package/dist/gateway/db/init-schema.js +44 -0
- package/dist/gateway/db/init-schema.js.map +1 -1
- package/dist/gateway/db/migrate-sqlite.js +73 -4
- package/dist/gateway/db/migrate-sqlite.js.map +1 -1
- package/dist/gateway/db/repositories/chat-repo.d.ts +56 -2
- package/dist/gateway/db/repositories/chat-repo.js +132 -2
- package/dist/gateway/db/repositories/chat-repo.js.map +1 -1
- package/dist/gateway/db/repositories/config-repo.d.ts +31 -2
- package/dist/gateway/db/repositories/config-repo.js +57 -7
- package/dist/gateway/db/repositories/config-repo.js.map +1 -1
- package/dist/gateway/db/repositories/env-repo.d.ts +14 -0
- package/dist/gateway/db/repositories/env-repo.js +15 -2
- package/dist/gateway/db/repositories/env-repo.js.map +1 -1
- package/dist/gateway/db/repositories/model-config-repo.d.ts +1 -1
- package/dist/gateway/db/repositories/model-config-repo.js +26 -12
- package/dist/gateway/db/repositories/model-config-repo.js.map +1 -1
- package/dist/gateway/db/repositories/skill-repo.d.ts +0 -5
- package/dist/gateway/db/repositories/skill-review-repo.d.ts +1 -0
- package/dist/gateway/db/repositories/skill-review-repo.js +4 -1
- package/dist/gateway/db/repositories/skill-review-repo.js.map +1 -1
- package/dist/gateway/db/repositories/skill-version-repo.js +0 -1
- package/dist/gateway/db/repositories/skill-version-repo.js.map +1 -1
- package/dist/gateway/db/repositories/system-config-repo.d.ts +1 -1
- package/dist/gateway/db/repositories/system-config-repo.js +2 -1
- package/dist/gateway/db/repositories/system-config-repo.js.map +1 -1
- package/dist/gateway/db/repositories/user-env-config-repo.d.ts +13 -0
- package/dist/gateway/db/repositories/user-env-config-repo.js +11 -0
- package/dist/gateway/db/repositories/user-env-config-repo.js.map +1 -1
- package/dist/gateway/db/repositories/workspace-repo.d.ts +3 -2
- package/dist/gateway/db/repositories/workspace-repo.js +6 -2
- package/dist/gateway/db/repositories/workspace-repo.js.map +1 -1
- package/dist/gateway/db/schema-mysql.d.ts +473 -51
- package/dist/gateway/db/schema-mysql.js +35 -4
- package/dist/gateway/db/schema-mysql.js.map +1 -1
- package/dist/gateway/db/schema-sqlite.d.ts +522 -57
- package/dist/gateway/db/schema-sqlite.js +38 -6
- package/dist/gateway/db/schema-sqlite.js.map +1 -1
- package/dist/gateway/db/schema.d.ts +471 -51
- package/dist/gateway/db/schema.js +1 -1
- package/dist/gateway/db/schema.js.map +1 -1
- package/dist/gateway/metrics-aggregator.d.ts +65 -0
- package/dist/gateway/metrics-aggregator.js +244 -0
- package/dist/gateway/metrics-aggregator.js.map +1 -0
- package/dist/gateway/plugins/channel-bridge.d.ts +4 -1
- package/dist/gateway/plugins/channel-bridge.js +78 -86
- package/dist/gateway/plugins/channel-bridge.js.map +1 -1
- package/dist/gateway/rpc-methods.d.ts +4 -2
- package/dist/gateway/rpc-methods.js +962 -163
- package/dist/gateway/rpc-methods.js.map +1 -1
- package/dist/gateway/security/cert-manager.d.ts +2 -2
- package/dist/gateway/security/cert-manager.js +4 -2
- package/dist/gateway/security/cert-manager.js.map +1 -1
- package/dist/gateway/server.d.ts +4 -8
- package/dist/gateway/server.js +297 -261
- package/dist/gateway/server.js.map +1 -1
- package/dist/gateway/skills/file-writer.js +17 -11
- package/dist/gateway/skills/file-writer.js.map +1 -1
- package/dist/gateway/skills/script-evaluator.js +12 -9
- package/dist/gateway/skills/script-evaluator.js.map +1 -1
- package/dist/gateway/web/dist/assets/index-0p17ZeTP.js +740 -0
- package/dist/gateway/web/dist/assets/index-9eP6nPUq.js +741 -0
- package/dist/gateway/web/dist/assets/index-9eP6nPUq.js.map +1 -0
- package/dist/gateway/web/dist/assets/index-CAmSY91d.js +675 -0
- package/dist/gateway/web/dist/assets/index-DMFEh8Pp.css +1 -0
- package/dist/gateway/web/dist/assets/index-DyowBCEj.css +1 -0
- package/dist/gateway/web/dist/assets/index-PDK5JJDO.css +1 -0
- package/dist/gateway/web/dist/index.html +2 -2
- package/dist/gateway-main.js +27 -10
- package/dist/gateway-main.js.map +1 -1
- package/dist/memory/embeddings.js +5 -4
- package/dist/memory/embeddings.js.map +1 -1
- package/dist/memory/indexer.d.ts +23 -3
- package/dist/memory/indexer.js +235 -23
- package/dist/memory/indexer.js.map +1 -1
- package/dist/memory/schema.js +15 -1
- package/dist/memory/schema.js.map +1 -1
- package/dist/memory/types.d.ts +18 -0
- package/dist/memory/types.js +6 -1
- package/dist/memory/types.js.map +1 -1
- package/dist/shared/detect-language.d.ts +12 -0
- package/dist/shared/detect-language.js +78 -0
- package/dist/shared/detect-language.js.map +1 -0
- package/dist/shared/diagnostic-events.d.ts +70 -0
- package/dist/shared/diagnostic-events.js +38 -0
- package/dist/shared/diagnostic-events.js.map +1 -0
- package/dist/shared/local-collector.d.ts +56 -0
- package/dist/shared/local-collector.js +284 -0
- package/dist/shared/local-collector.js.map +1 -0
- package/dist/shared/metrics-types.d.ts +64 -0
- package/dist/shared/metrics-types.js +25 -0
- package/dist/shared/metrics-types.js.map +1 -0
- package/dist/shared/metrics.d.ts +19 -0
- package/dist/shared/metrics.js +185 -0
- package/dist/shared/metrics.js.map +1 -0
- package/dist/shared/path-utils.d.ts +15 -0
- package/dist/shared/path-utils.js +23 -0
- package/dist/shared/path-utils.js.map +1 -0
- package/dist/shared/retry.d.ts +35 -0
- package/dist/shared/retry.js +61 -0
- package/dist/shared/retry.js.map +1 -0
- package/dist/tools/command-sets.d.ts +18 -2
- package/dist/tools/command-sets.js +207 -32
- package/dist/tools/command-sets.js.map +1 -1
- package/dist/tools/command-validator.d.ts +56 -0
- package/dist/tools/command-validator.js +357 -0
- package/dist/tools/command-validator.js.map +1 -0
- package/dist/tools/create-skill.js +26 -1
- package/dist/tools/create-skill.js.map +1 -1
- package/dist/tools/credential-list.js +1 -23
- package/dist/tools/credential-list.js.map +1 -1
- package/dist/tools/credential-manager.d.ts +98 -0
- package/dist/tools/credential-manager.js +313 -0
- package/dist/tools/credential-manager.js.map +1 -0
- package/dist/tools/deep-search/engine.js +184 -127
- package/dist/tools/deep-search/engine.js.map +1 -1
- package/dist/tools/deep-search/prompts.d.ts +10 -2
- package/dist/tools/deep-search/prompts.js +37 -36
- package/dist/tools/deep-search/prompts.js.map +1 -1
- package/dist/tools/deep-search/schemas.d.ts +87 -0
- package/dist/tools/deep-search/schemas.js +85 -0
- package/dist/tools/deep-search/schemas.js.map +1 -0
- package/dist/tools/deep-search/sub-agent.d.ts +21 -0
- package/dist/tools/deep-search/sub-agent.js +153 -4
- package/dist/tools/deep-search/sub-agent.js.map +1 -1
- package/dist/tools/deep-search/tool.js +1 -0
- package/dist/tools/deep-search/tool.js.map +1 -1
- package/dist/tools/deep-search/types.d.ts +2 -0
- package/dist/tools/deep-search/types.js.map +1 -1
- package/dist/tools/dp-tools.js +29 -5
- package/dist/tools/dp-tools.js.map +1 -1
- package/dist/tools/exec-utils.d.ts +85 -0
- package/dist/tools/exec-utils.js +294 -0
- package/dist/tools/exec-utils.js.map +1 -0
- package/dist/tools/fork-skill.js +14 -2
- package/dist/tools/fork-skill.js.map +1 -1
- package/dist/tools/investigation-feedback.d.ts +3 -0
- package/dist/tools/investigation-feedback.js +71 -0
- package/dist/tools/investigation-feedback.js.map +1 -0
- package/dist/tools/manage-schedule.js +16 -6
- package/dist/tools/manage-schedule.js.map +1 -1
- package/dist/tools/netns-script.js +27 -281
- package/dist/tools/netns-script.js.map +1 -1
- package/dist/tools/node-exec.d.ts +2 -14
- package/dist/tools/node-exec.js +18 -225
- package/dist/tools/node-exec.js.map +1 -1
- package/dist/tools/node-script.js +14 -168
- package/dist/tools/node-script.js.map +1 -1
- package/dist/tools/pod-exec.d.ts +1 -1
- package/dist/tools/pod-exec.js +10 -26
- package/dist/tools/pod-exec.js.map +1 -1
- package/dist/tools/pod-nsenter-exec.js +21 -225
- package/dist/tools/pod-nsenter-exec.js.map +1 -1
- package/dist/tools/pod-script.js +10 -19
- package/dist/tools/pod-script.js.map +1 -1
- package/dist/tools/restricted-bash.d.ts +1 -17
- package/dist/tools/restricted-bash.js +38 -252
- package/dist/tools/restricted-bash.js.map +1 -1
- package/dist/tools/run-skill.d.ts +3 -1
- package/dist/tools/run-skill.js +21 -1
- package/dist/tools/run-skill.js.map +1 -1
- package/dist/tools/script-resolver.d.ts +3 -1
- package/dist/tools/script-resolver.js +74 -30
- package/dist/tools/script-resolver.js.map +1 -1
- package/dist/tools/update-skill.js +17 -6
- package/dist/tools/update-skill.js.map +1 -1
- package/package.json +8 -6
- package/siclaw.mjs +10 -1
- package/skills/core/cluster-events/SKILL.md +1 -1
- package/skills/core/deep-investigation/SKILL.md +11 -0
- package/skills/core/deployment-rollout-debug/SKILL.md +1 -1
- package/skills/core/dns-debug/SKILL.md +1 -0
- package/skills/core/meta.json +12 -1
- package/skills/core/networkpolicy-debug/SKILL.md +332 -0
- package/skills/core/node-logs/scripts/get-node-logs.sh +19 -9
- package/skills/core/pod-pending-debug/SKILL.md +1 -0
- package/skills/core/quota-debug/SKILL.md +203 -0
- package/skills/core/service-debug/SKILL.md +1 -0
- package/skills/core/statefulset-debug/SKILL.md +280 -0
- package/skills/core/volcano-diagnose-pod/SKILL.md +196 -0
- package/skills/core/volcano-diagnose-pod/scripts/diagnose-pod.sh +175 -0
- package/skills/core/volcano-gang-scheduling/SKILL.md +299 -0
- package/skills/core/volcano-job-diagnose/SKILL.md +319 -0
- package/skills/core/volcano-job-diagnose/scripts/diagnose-job.sh +253 -0
- package/skills/core/volcano-node-resources/SKILL.md +334 -0
- package/skills/core/volcano-node-resources/scripts/get-node-resources.sh +281 -0
- package/skills/core/volcano-queue-diagnose/SKILL.md +294 -0
- package/skills/core/volcano-queue-diagnose/scripts/diagnose-queue.sh +283 -0
- package/skills/core/volcano-resource-insufficient/SKILL.md +315 -0
- package/skills/core/volcano-scheduler-config/SKILL.md +371 -0
- package/skills/core/volcano-scheduler-config/scripts/get-scheduler-config.sh +297 -0
- package/skills/core/volcano-scheduler-logs/SKILL.md +241 -0
- package/skills/core/volcano-scheduler-logs/scripts/get-scheduler-logs.sh +159 -0
- package/skills/platform/create-skill/SKILL.md +35 -3
- package/skills/platform/manage-skill/SKILL.md +9 -2
- package/skills/platform/update-skill/SKILL.md +17 -6
@@ -0,0 +1,332 @@
---
name: networkpolicy-debug
description: >-
  Diagnose NetworkPolicy-related connectivity issues (traffic unexpectedly blocked, default-deny effects, egress blocking DNS).
  Identifies which NetworkPolicies affect a pod, checks ingress/egress rules, and verifies CNI support.
---

# NetworkPolicy Connectivity Diagnosis

When pod-to-pod or pod-to-external communication is unexpectedly blocked, and Service/DNS/Ingress diagnostics show no issues, NetworkPolicy is a common root cause. Follow this flow to identify whether a NetworkPolicy is blocking traffic.

**Scope:** This skill is for **diagnosis only**. Once you identify the root cause, report it to the user and stop. Do NOT attempt to modify or delete NetworkPolicies; that should be left to the user or cluster administrator.

**When to use:** Pod connectivity "suddenly broke" or a newly deployed pod cannot reach other services. Typical clues:
- `service-debug` shows endpoints exist and ports match, but connections time out
- `dns-debug` shows DNS timeouts (may be egress NetworkPolicy blocking UDP 53)
- Traffic works from some pods but not others in the same namespace
- A new NetworkPolicy was recently applied

**Not for other network issues:** If the problem is DNS resolution → use `dns-debug`. If the problem is Service having no endpoints → use `service-debug`. If the problem is Ingress routing → use `ingress-debug`. This skill specifically diagnoses NetworkPolicy-level blocking.

## Key Concepts

- A NetworkPolicy selects pods via `podSelector` and defines allowed `ingress` (incoming) and/or `egress` (outgoing) traffic rules.
- **NetworkPolicy is deny-by-default once applied.** If any NetworkPolicy selects a pod for a given direction (ingress or egress), all traffic in that direction is denied EXCEPT what is explicitly allowed by the rules. Pods with NO NetworkPolicy selecting them allow all traffic.
- Multiple NetworkPolicies selecting the same pod are **additive (union)**: a connection is allowed if ANY matching policy permits it.
- NetworkPolicy requires **CNI support**. If the CNI plugin does not support NetworkPolicy (e.g., Flannel without additional plugins), policies are silently ignored: they can be created but have no effect.
- **`hostNetwork: true` pods are usually exempt.** Kubernetes leaves NetworkPolicy behavior for hostNetwork pods undefined. Most CNI plugins cannot distinguish their traffic from other node traffic, so they ignore these pods when matching `podSelector` and `namespaceSelector`, both as targets and as sources. In that common case a default-deny policy neither protects nor restricts hostNetwork pods.

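The evaluation rule in the bullets above can be sketched as a small shell function. This is an illustrative model only: `netpol_verdict` and its two count arguments are hypothetical, and the counts have to be worked out by hand from the diagnostic steps that follow.

```bash
# Hypothetical helper: model the "isolated unless allowed" rule for ONE direction.
#   $1: number of policies that select the pod for this direction
#   $2: how many of those policies contain a rule permitting the connection
netpol_verdict() {
  if [ "$1" -eq 0 ]; then
    echo allow    # pod is not isolated in this direction
  elif [ "$2" -gt 0 ]; then
    echo allow    # policies are additive: one permitting policy is enough
  else
    echo deny     # selected, but no policy permits this connection
  fi
}
```

For example, `netpol_verdict 2 0` prints `deny`, while `netpol_verdict 0 0` prints `allow` because an unselected pod accepts all traffic.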
## Diagnostic Flow

### 1. Verify CNI supports NetworkPolicy

Not all CNI plugins enforce NetworkPolicy. If the CNI does not support it, policies are silently ignored: they exist as API objects but have no effect.

Check which CNI is running:

```bash
kubectl get pods -n kube-system -o custom-columns='NAME:.metadata.name' | grep -E 'calico|cilium|weave|antrea|flannel|canal|kube-router'
```

If no results, the CNI may run in a different namespace (e.g., `cilium` in the `cilium` namespace, `calico` in `calico-system`). Check other namespaces or inspect the node's CNI config:

```bash
kubectl get pods -A -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name' | grep -E 'calico|cilium|weave|antrea|flannel|canal|kube-router'
```

| CNI | NetworkPolicy support |
|-----|----------------------|
| Calico | Yes |
| Cilium | Yes (also supports extended CiliumNetworkPolicy) |
| Weave Net | Yes |
| Antrea | Yes |
| Canal (Flannel + Calico) | Yes |
| kube-router | Yes |
| Flannel (standalone) | **No**: policies are silently ignored |
| kubenet | **No** |

If the CNI does not support NetworkPolicy:
- Policies exist but do nothing, so they are not the cause of blocked traffic; look elsewhere
- If the user expects policies to work, they need to switch to a CNI that supports them

If the CNI does support NetworkPolicy, continue to step 2.

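The support table above can be turned into a quick lookup. The sketch below is a convenience under stated assumptions: `cni_netpol_support` is a hypothetical name, and matching on pod-name substrings is only a heuristic (a Flannel deployment paired with a separate policy engine would still print `no`).

```bash
# Hypothetical helper: map a CNI pod name to the table's NetworkPolicy support column.
cni_netpol_support() {
  case "$1" in
    *calico*|*cilium*|*weave*|*antrea*|*canal*|*kube-router*) echo yes ;;
    *flannel*) echo no ;;        # standalone Flannel does not enforce policies
    *)         echo unknown ;;   # not a recognized CNI pod name
  esac
}

# Example (needs cluster access):
#   kubectl get pods -A --no-headers -o custom-columns='NAME:.metadata.name' |
#     while read -r pod; do echo "$pod $(cni_netpol_support "$pod")"; done | grep -v unknown
```

kubenet does not run as a pod, so it never appears in this listing; an all-`unknown` result means the node's CNI configuration has to be inspected directly.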
### 2. List NetworkPolicies in the namespace

```bash
kubectl get networkpolicy -n <ns>
```

If no NetworkPolicies exist in the namespace, standard Kubernetes NetworkPolicy is not the cause, because all traffic is allowed by default. However, if the CNI is Calico or Cilium, also check for CNI-specific extended policies that operate independently of standard NetworkPolicy:

```bash
# Cilium extended policies
kubectl get ciliumnetworkpolicy -n <ns> 2>/dev/null
kubectl get ciliumclusterwidenetworkpolicy 2>/dev/null

# Calico extended policies
kubectl get networkpolicy.crd.projectcalico.org -n <ns> 2>/dev/null
kubectl get globalnetworkpolicy.crd.projectcalico.org 2>/dev/null
```

These CNI-specific policies can block traffic even when no standard NetworkPolicy exists. Unlike standard NetworkPolicy, they can carry explicit deny rules or an evaluation order (for example, Calico's `order` field or Cilium's deny rules), so an extended policy may block traffic that a standard policy allows. If extended policies exist, examine their rules using `-o yaml`.

If neither standard nor extended policies exist, look elsewhere (firewall rules, service mesh, node-level iptables).

If policies exist (standard or extended), continue to step 3.

### 3. Identify which policies affect the target pod

Kubernetes does not provide a direct API to query "which policies affect this pod." You must manually match each policy's `podSelector` against the pod's labels.

Get the pod's labels:

```bash
kubectl get pod <pod> -n <ns> -o jsonpath='{.metadata.labels}'
```

Get all NetworkPolicies with their full pod selectors:

```bash
kubectl get networkpolicy -n <ns> -o custom-columns='NAME:.metadata.name,SELECTOR:.spec.podSelector'
```

Note: `podSelector` can use both `matchLabels` (exact key-value pairs) and `matchExpressions` (operators like `In`, `NotIn`, `Exists`, `DoesNotExist`). The command above shows both forms. If the output is truncated, use `-o yaml` to see the full selector.

A NetworkPolicy affects the pod if its `podSelector` matches the pod. When a selector contains both `matchLabels` and `matchExpressions`, all conditions must hold:
- Every key-value pair in the policy's `podSelector.matchLabels` appears in the pod's labels (the selector is a **subset** of the pod's labels)
- The policy's `podSelector.matchExpressions` conditions are satisfied by the pod's labels (e.g., `{key: tier, operator: Exists}` matches any pod with a `tier` label)
- An **empty podSelector** (`{}`) matches ALL pods in the namespace

List the matching policies. These are the ones controlling the pod's traffic.

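The `matchLabels` subset rule above can be checked mechanically. The following is a minimal sketch, assuming labels are supplied as comma-separated `key=value` lists (the form printed by `kubectl get pod --show-labels`); `match_labels` is a hypothetical helper and it does not evaluate `matchExpressions`.

```bash
# Hypothetical helper: return 0 if every key=value pair in SELECTOR is present in POD_LABELS.
#   $1: pod labels, e.g. "app=web,tier=frontend"
#   $2: selector matchLabels in the same form; "" behaves like podSelector: {}
match_labels() {
  pod_labels=",$1,"
  selector=$2
  old_ifs=$IFS; IFS=,
  for pair in $selector; do
    case "$pod_labels" in
      *",$pair,"*) ;;                 # pair found on the pod, keep checking
      *) IFS=$old_ifs; return 1 ;;    # pair missing: the policy does not select the pod
    esac
  done
  IFS=$old_ifs
  return 0
}
```

For example, `match_labels "app=web,tier=frontend" "app=web"` succeeds, while `match_labels "app=web" "app=web,tier=frontend"` fails because the pod lacks the `tier` label.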
### 4. Check for default-deny policies

A common pattern is a namespace-wide "deny all" policy:

```bash
kubectl get networkpolicy -n <ns> -o yaml
```

Look for policies with empty ingress or egress rules:

**Default deny all ingress:**
```yaml
spec:
  podSelector: {}   # matches all pods
  policyTypes:
  - Ingress
  # no ingress rules = deny all incoming
```

**Default deny all egress:**
```yaml
spec:
  podSelector: {}
  policyTypes:
  - Egress
  # no egress rules = deny all outgoing
```

**Default deny both:**
```yaml
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
```

If a default-deny policy exists, ALL traffic in the direction(s) it covers, to or from pods in the namespace, is blocked unless another NetworkPolicy explicitly allows it.

### 5. Determine which directions a policy controls

Before diagnosing ingress or egress, first confirm which direction(s) each matching policy actually controls. This depends on the `policyTypes` field:

```bash
kubectl get networkpolicy <policy-name> -n <ns> -o jsonpath='{.spec.policyTypes}'
```

| `policyTypes` value | Ingress controlled? | Egress controlled? |
|---------------------|--------------------|--------------------|
| `[Ingress]` | Yes | No (egress is unrestricted) |
| `[Egress]` | No (ingress is unrestricted) | Yes |
| `[Ingress, Egress]` | Yes | Yes |
| *Omitted entirely* | Yes (always implied) | Only if `egress` rules exist |

**The omitted case is a common trap:** If `policyTypes` is not specified and the policy has `ingress` rules but no `egress` rules, only ingress is controlled and egress remains fully open. If at least one `egress` rule is present (even an empty rule, `- {}`), both directions are controlled. An absent or empty `egress` list does not count as having egress rules.

If the connectivity issue is incoming traffic, focus on policies that control ingress (step 6). If outgoing, focus on egress (step 7). Do not waste time analyzing a direction the policy does not control.

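The behavior matrix above, including the omitted case, can be written down as a short function. This is a sketch for reading manifests by hand, not a Kubernetes API: `effective_policy_types` is a hypothetical name, and both arguments are values you read off the policy YAML yourself.

```bash
# Hypothetical helper: print the directions a policy controls.
#   $1: policyTypes as written in the manifest, e.g. "Ingress,Egress"; "" if omitted
#   $2: number of entries under spec.egress (0 if the field is absent or empty)
effective_policy_types() {
  if [ -n "$1" ]; then
    echo "$1"               # explicit policyTypes always wins
  elif [ "$2" -gt 0 ]; then
    echo "Ingress,Egress"   # omitted, but egress rules exist
  else
    echo "Ingress"          # omitted and no egress rules: ingress only
  fi
}
```

For example, `effective_policy_types "" 0` prints `Ingress`, which is why a policy written without `policyTypes` and without `egress` rules never explains blocked outgoing traffic.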
### 6. Diagnose blocked ingress (incoming traffic to the pod)

If other pods or services cannot reach the target pod, check the ingress rules of all matching policies.

For each matching policy:

```bash
kubectl get networkpolicy <policy-name> -n <ns> -o yaml
```

Check the `ingress` section. Traffic is allowed if the source matches ANY `from` rule:

- **from[].podSelector**: allows traffic from pods with matching labels in the SAME namespace as the policy
- **from[].namespaceSelector**: allows traffic from pods in namespaces with matching labels
- **from[].podSelector + namespaceSelector** (in the same `from` entry): AND logic, pods must match both selectors
- **from[].ipBlock**: allows traffic from specific CIDR ranges

**Common issue: separate vs combined selectors**

```yaml
# AND logic: pod must be in matching namespace AND have matching labels
ingress:
- from:
  - namespaceSelector:
      matchLabels: {env: prod}
    podSelector:
      matchLabels: {role: frontend}

# OR logic: ANY pod in matching namespace OR ANY pod with matching labels
ingress:
- from:
  - namespaceSelector:
      matchLabels: {env: prod}
  - podSelector:
      matchLabels: {role: frontend}
```

The difference is whether `namespaceSelector` and `podSelector` are in the **same list item** (AND) or **separate list items** (OR). This is a frequent source of misconfiguration.

Check if the source pod's labels and namespace match any `from` rule. If not, the ingress is blocked.

Also check the `ports` section. If specified, only listed ports/protocols are allowed:

```yaml
ingress:
- from: [...]
  ports:
  - protocol: TCP
    port: 8080
```

The `port` field can be a number or a **named port** (e.g., `port: http`). If a named port is used, it must match a `containerPort` name defined in the target pod's spec. If the pod does not define that port name, the rule will not match.

If the source is connecting on a different port, it will be blocked even if the `from` selector matches.

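The named-port check above can be done by listing the target pod's declared ports and looking the name up. The sketch below assumes tab-separated `name<TAB>containerPort` lines on standard input; `resolve_named_port` is a hypothetical helper, and the `kubectl` JSONPath expression in the comment is one plausible way to produce that input.

```bash
# Hypothetical helper: print the containerPort for a named port, or fail if undefined.
#   $1: port name used in the policy, e.g. "http"
#   stdin: lines of "NAME<TAB>CONTAINER_PORT"
resolve_named_port() {
  awk -F '\t' -v name="$1" '
    $1 == name { print $2; found = 1; exit }   # first matching name wins
    END { exit !found }                        # non-zero status if the name is absent
  '
}

# Example (needs cluster access):
#   kubectl get pod <pod> -n <ns> \
#     -o jsonpath='{range .spec.containers[*].ports[*]}{.name}{"\t"}{.containerPort}{"\n"}{end}' |
#     resolve_named_port http || echo "pod defines no port named http: the rule cannot match"
```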
**NodePort / LoadBalancer SNAT issue:**

When external traffic enters through a NodePort or LoadBalancer Service, kube-proxy may SNAT the source IP to the node's IP. This means `podSelector` and `namespaceSelector` rules in ingress will NOT match the original client or source pod; the policy sees the node IP instead.

Check the Service's `externalTrafficPolicy`:

```bash
kubectl get svc <service-name> -n <ns> -o jsonpath='{.spec.externalTrafficPolicy}'
```

- **Cluster** (default): source IP is SNATed to node IP. Ingress `podSelector`/`namespaceSelector` rules cannot match the original source. Use `ipBlock` with the node CIDR range instead.
- **Local**: original source IP is preserved, but traffic is only routed to pods on the node that received the request.

If the target pod has ingress NetworkPolicy and receives traffic via NodePort/LoadBalancer with `externalTrafficPolicy: Cluster`, `from: podSelector` rules will fail silently: the traffic appears to come from a node IP, not a pod IP.

### 7. Diagnose blocked egress (outgoing traffic from the pod)

If the pod cannot reach other services or external endpoints, check the egress rules.

For each matching policy that includes `Egress` in `policyTypes`:

```bash
kubectl get networkpolicy <policy-name> -n <ns> -o yaml
```

Check the `egress` section. The same selector logic applies as ingress (podSelector, namespaceSelector, ipBlock).

**Critical: DNS egress**

If any egress NetworkPolicy is applied to a pod, DNS traffic (UDP/TCP port 53) must be explicitly allowed, otherwise all DNS resolution will fail:

```yaml
egress:
- to:
  - namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: kube-system
  ports:
  - protocol: UDP
    port: 53
  - protocol: TCP
    port: 53
```

Note: The example above targets only `kube-system` where CoreDNS runs. A broader alternative is `namespaceSelector: {}` (matches all namespaces), which is simpler but allows port 53 traffic to any namespace. When diagnosing, check whether ANY rule allows UDP/TCP 53; the specificity of the namespace selector is a security concern but not a functionality blocker.

**Symptoms of blocked DNS egress:**
- `nslookup` times out from the pod
- Service names cannot be resolved but IP-based connections work
- Looks identical to a CoreDNS failure but only affects pods with egress policies

If the user reports DNS timeouts and the pod has an egress NetworkPolicy, check DNS port allowance FIRST before investigating CoreDNS with `dns-debug`.

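Checking whether any rule mentions port 53 can be partly automated. The sketch below assumes one `policy<TAB>ports` line per policy on standard input, with ports separated by spaces; `policies_allowing_dns` is a hypothetical helper. It is deliberately narrow: it ignores the destination selector, it does not recognize named ports, and an egress rule with no `ports` field (which allows every port) is not reported, so an empty result still needs a manual look at the YAML.

```bash
# Hypothetical helper: print the names of policies whose egress rules list port 53.
#   stdin: lines of "POLICY_NAME<TAB>PORT PORT ..."
policies_allowing_dns() {
  awk -F '\t' '{
    n = split($2, ports, " ")
    for (i = 1; i <= n; i++) {
      if (ports[i] == "53") { print $1; break }   # report each policy once
    }
  }'
}

# Example (needs cluster access):
#   kubectl get networkpolicy -n <ns> \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.egress[*].ports[*].port}{"\n"}{end}' |
#     policies_allowing_dns
```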
**API Server egress**

This is the second most common egress issue after DNS. Pods that need to call the Kubernetes API (operators, controllers, pods using service account tokens) must be able to reach the API server. The API server endpoint is typically outside the pod network, so `podSelector`/`namespaceSelector` rules will not match it; use `ipBlock` instead.

Find the API server endpoint:

```bash
kubectl get endpoints kubernetes -n default
```

Symptoms of blocked API server egress:
- `kubectl` commands from within the pod time out (but DNS works, so service names resolve)
- Operators or controllers cannot watch or list resources
- Service account token authentication fails
- Pod logs show "connection refused" or "i/o timeout" when calling the API

The key difference from DNS blocking: with DNS blocked, name resolution itself fails. With API server blocked, names resolve but the TCP connection to the API server times out.

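The distinction between a DNS failure and a blocked API server connection usually shows up in the client's error text. The sketch below is a rough heuristic, assuming error strings in the style emitted by Go-based clients; `classify_egress_error` is a hypothetical helper, and the sample messages and addresses are illustrative rather than taken from a real cluster.

```bash
# Hypothetical helper: guess which stage failed from a client error message.
classify_egress_error() {
  case "$1" in
    *"lookup "*|*"no such host"*)              echo dns ;;      # failed while resolving the name
    *"i/o timeout"*|*"connection timed out"*)  echo connect ;;  # name resolved, connection blocked
    *)                                         echo unknown ;;
  esac
}
```

For example, `classify_egress_error 'dial tcp 10.96.0.1:443: i/o timeout'` prints `connect`, which points at an egress rule missing for the API server rather than at DNS.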
### 8. Cross-namespace communication

When pods in different namespaces need to communicate, NetworkPolicies on BOTH sides may need to allow the traffic:

- The **destination pod's** NetworkPolicy must allow ingress from the source namespace/pod
- The **source pod's** NetworkPolicy (if it has egress rules) must allow egress to the destination namespace/pod

Check both sides:

```bash
# Destination namespace policies
kubectl get networkpolicy -n <destination-ns>

# Source namespace policies
kubectl get networkpolicy -n <source-ns>
```

For `namespaceSelector` to work, the target namespace must have the referenced labels:

```bash
kubectl get namespace <ns> --show-labels
```

If the namespace lacks the expected labels, the `namespaceSelector` will not match and traffic will be blocked.

|
|
323
|
+
## Notes
|
|
324
|
+
|
|
325
|
+
- **No policy = allow all.** NetworkPolicy is not deny-by-default at the cluster level. Only pods explicitly selected by at least one NetworkPolicy have restrictions. This means adding the FIRST NetworkPolicy to a namespace can suddenly break existing communication.
|
|
326
|
+
- **Policies are additive.** If policy A allows port 80 and policy B allows port 443 for the same pod, both ports are allowed. Policies never subtract permissions from each other.
|
|
327
|
+
- **`policyTypes` matters.** See step 5 for the full behavior matrix. Misunderstanding which direction a policy controls is a common cause of wasted debugging effort.
|
|
328
|
+
- **CIDR ranges and pod IPs.** Using `ipBlock` with pod CIDR ranges is fragile — pod IPs change. Prefer `podSelector` / `namespaceSelector` for in-cluster traffic. `ipBlock` is best for external IPs. Also check for `except` subnets within `ipBlock` — a rule may allow a broad CIDR (e.g., `10.0.0.0/8`) but exclude a specific subnet (e.g., `except: [10.244.0.0/16]`), causing unexpected blocks for IPs in the excluded range.
|
|
329
|
+
- **Service mesh interaction.** If the cluster runs Istio, Linkerd, or similar service meshes, traffic may be additionally controlled by the mesh's own policies (AuthorizationPolicy, etc.). NetworkPolicy operates at L3/L4, while service mesh policies typically operate at L7.
|
|
330
|
+
- **GPU clusters: multi-NIC / RDMA traffic is NOT affected by NetworkPolicy.** In GPU training clusters, pods typically have multiple network interfaces: a primary NIC (eth0) managed by the CNI, and secondary NICs (net1, etc.) for RDMA/InfiniBand/RoCE provisioned via Multus + SR-IOV or the host-device plugin. Standard NetworkPolicy only applies to the **primary CNI-managed interface**. RDMA/NCCL data traffic on secondary interfaces bypasses the primary CNI and is invisible to NetworkPolicy. If a training job's GPU-to-GPU communication (NCCL) fails on the RDMA path, NetworkPolicy is NOT the cause; investigate the RDMA network instead. Any part of the job that uses TCP over eth0 (for example rendezvous or bootstrap sockets bound to the primary NIC) can still be blocked. If the same pod cannot reach the API server, download data, or resolve DNS, those go through the primary NIC and CAN be blocked by NetworkPolicy.
|
|
331
|
+
- **Quick verification:** To confirm a NetworkPolicy is the cause, test connectivity from a pod in the same namespace that is NOT selected by any NetworkPolicy (or from a different namespace without policies). If the same connection works from the unaffected pod, the NetworkPolicy is confirmed as the blocker.
|
|
332
|
+
- For cross-reference: if DNS is timing out, check egress rules here first, then use `dns-debug`. If Service endpoints exist but connections fail, check ingress rules here, then use `service-debug`.
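
The "policies are additive" note above can be illustrated with a tiny union computation. This is only a sketch: the function name `effective_ports` is hypothetical, and it models nothing but port lists (it ignores peers, protocols, and direction).

```bash
# Hypothetical helper: each argument is the comma-separated port list that one
# policy allows for the same pod; the result is the union of all of them.
effective_ports() {
  printf '%s\n' "$@" | tr ',' '\n' | sort -un | paste -sd, -
}
```

Calling `effective_ports "80" "443,80"` yields `80,443`: the second policy adds port 443 and nothing is taken away from the first.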
|
|
@@ -54,18 +54,28 @@ if [[ -n "$UNIT" && -n "$FILE" ]]; then
|
|
|
54
54
|
exit 1
|
|
55
55
|
fi
|
|
56
56
|
|
|
57
|
-
#
|
|
58
|
-
|
|
59
|
-
|
|
60
|
-
|
|
61
|
-
else
|
|
62
|
-
CMD="cat '$FILE'"
|
|
57
|
+
# Validate --tail is a positive integer
|
|
58
|
+
if ! [[ "$TAIL" =~ ^[1-9][0-9]*$ ]]; then
|
|
59
|
+
echo "Error: --tail must be a positive integer (>0), got: $TAIL" >&2
|
|
60
|
+
exit 1
|
|
63
61
|
fi
|
|
64
62
|
|
|
65
|
-
|
|
66
|
-
|
|
63
|
+
# Execute using native bash pipelines — no sh -c string interpolation.
|
|
64
|
+
fetch_logs() {
|
|
65
|
+
if [[ -n "$UNIT" ]]; then
|
|
66
|
+
journalctl -u "$UNIT" --since "$SINCE" --no-pager 2>&1
|
|
67
|
+
else
|
|
68
|
+
cat -- "$FILE" 2>&1
|
|
69
|
+
fi
|
|
70
|
+
}
|
|
67
71
|
|
|
68
|
-
OUTPUT=$(
|
|
72
|
+
OUTPUT=$(
|
|
73
|
+
if [[ -n "$GREP" ]]; then
|
|
74
|
+
fetch_logs | grep -i -- "$GREP" | tail -n "$TAIL"
|
|
75
|
+
else
|
|
76
|
+
fetch_logs | tail -n "$TAIL"
|
|
77
|
+
fi
|
|
78
|
+
) || true
|
|
69
79
|
|
|
70
80
|
if [[ -z "$OUTPUT" ]]; then
|
|
71
81
|
if [[ -n "$UNIT" ]]; then
|
|
@@ -144,3 +144,4 @@ The scheduler is attempting to evict lower-priority pods to make room. This is n
|
|
|
144
144
|
|
|
145
145
|
- If no `FailedScheduling` event exists, the pod may not have been processed by the scheduler yet — check if the scheduler pod itself is healthy: `kubectl get pods -n kube-system -l component=kube-scheduler`.
|
|
146
146
|
- For pods created by controllers (Deployment, StatefulSet), the pending pod name may change as the controller recreates it — use label selectors to find the current pending pod.
|
|
147
|
+
- If the pod has a `scheduling.volcano.sh/pod-group` annotation, it is managed by the Volcano scheduler. Use the `volcano-diagnose-pod` skill instead for Volcano-specific issues (PodGroup, Queue, Gang scheduling).
|
|
@@ -0,0 +1,203 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: quota-debug
|
|
3
|
+
description: >-
|
|
4
|
+
Diagnose Kubernetes native ResourceQuota and LimitRange admission rejections (exceeded quota, forbidden by LimitRange, FailedCreate).
|
|
5
|
+
Checks namespace quotas, current usage, LimitRange constraints, and ReplicaSet events to identify why pods cannot be created.
|
|
6
|
+
Not applicable to Volcano Queue — use volcano-queue-diagnose for gang scheduling clusters.
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
# ResourceQuota & LimitRange Admission Diagnosis
|
|
10
|
+
|
|
11
|
+
When pods fail to be created due to Kubernetes native namespace-level resource constraints — ResourceQuota exceeded or LimitRange violations — follow this flow to identify the root cause.
|
|
12
|
+
|
|
13
|
+
**Scope:** This skill is for **diagnosis only**. Once you identify the root cause, report it to the user and stop. Do NOT attempt to modify ResourceQuota, LimitRange, or pod specs — that should be left to the user or cluster administrator.
|
|
14
|
+
|
|
15
|
+
**When to use:** Pods are not being created at all (not Pending, not CrashLoopBackOff — simply missing). Typical trigger: a ReplicaSet or Job shows `FailedCreate` events mentioning `exceeded quota` or `forbidden: ... LimitRange`.
|
|
16
|
+
|
|
17
|
+
**Not applicable to Volcano Queue:** If the cluster uses Volcano for gang scheduling, resource quotas are managed by Volcano Queue, not Kubernetes native ResourceQuota. Use the `volcano-queue-diagnose` skill instead. To check:
|
|
18
|
+
|
|
19
|
+
```bash
|
|
20
|
+
kubectl get queue 2>/dev/null
|
|
21
|
+
```
|
|
22
|
+
|
|
23
|
+
If this command returns results (Queue resources listed), the cluster uses Volcano — use `volcano-queue-diagnose`. If it returns nothing or an error, Volcano is not installed — continue with this skill.
|
|
24
|
+
|
|
25
|
+
## Diagnostic Flow
|
|
26
|
+
|
|
27
|
+
### 1. Identify the creation failure
|
|
28
|
+
|
|
29
|
+
If the user reports a Deployment not progressing or pods not appearing, first find the controller that owns the pods:
|
|
30
|
+
|
|
31
|
+
```bash
|
|
32
|
+
kubectl get rs -n <ns> --sort-by='.metadata.creationTimestamp' | grep <deployment-name>
|
|
33
|
+
```
|
|
34
|
+
|
|
35
|
+
Then check the ReplicaSet events for creation failures:
|
|
36
|
+
|
|
37
|
+
```bash
|
|
38
|
+
kubectl describe rs <new-rs> -n <ns>
|
|
39
|
+
```
|
|
40
|
+
|
|
41
|
+
Look for events with reason `FailedCreate`. The event message reveals whether it is a ResourceQuota or LimitRange rejection.
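
A minimal sketch of that triage step, assuming the event message has already been copied out of `kubectl describe`. The function name `classify_rejection` and the matched substrings are assumptions derived from the headings in step 2; real messages vary between Kubernetes versions, so treat an `unknown` result as a prompt to read the message by hand.

```bash
# Hypothetical helper: map a FailedCreate event message to a rejection type.
classify_rejection() {
  local msg
  # Lowercase the message so matching is case-insensitive.
  msg="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$msg" in
    *"exceeded quota"*)                   echo "resourcequota" ;;
    *minimum*)                            echo "limitrange-min" ;;
    *maximum*)                            echo "limitrange-max" ;;
    *"request ratio"*|*requestratio*)     echo "limitrange-ratio" ;;
    *)                                    echo "unknown" ;;
  esac
}
```

The returned label tells you which subsection of step 2 to read next.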
|
|
42
|
+
|
|
43
|
+
If the user already has a specific error message, skip to step 2.
|
|
44
|
+
|
|
45
|
+
### 2. Match the rejection type
|
|
46
|
+
|
|
47
|
+
---
|
|
48
|
+
|
|
49
|
+
#### `exceeded quota` — ResourceQuota exhausted
|
|
50
|
+
|
|
51
|
+
The namespace has a ResourceQuota and creating the pod would exceed the allowed limits.
|
|
52
|
+
|
|
53
|
+
Check current quota usage:
|
|
54
|
+
|
|
55
|
+
```bash
|
|
56
|
+
kubectl get resourcequota -n <ns>
|
|
57
|
+
```
|
|
58
|
+
|
|
59
|
+
For detailed usage breakdown:
|
|
60
|
+
|
|
61
|
+
```bash
|
|
62
|
+
kubectl describe resourcequota -n <ns>
|
|
63
|
+
```
|
|
64
|
+
|
|
65
|
+
Compare the `Used` vs `Hard` columns. Common quota dimensions:
|
|
66
|
+
- **requests.cpu / requests.memory** — total CPU/memory requests across all pods in the namespace
|
|
67
|
+
- **limits.cpu / limits.memory** — total CPU/memory limits
|
|
68
|
+
- **pods** — maximum number of pods allowed in the namespace
|
|
69
|
+
- **count/deployments.apps, count/services** — object count limits
|
|
70
|
+
- **requests.nvidia.com/gpu** — total GPU requests (common in GPU-scheduled clusters)
|
|
71
|
+
- **requests.storage** — total PVC storage requested
|
|
72
|
+
- **persistentvolumeclaims** — number of PVCs
|
|
73
|
+
|
|
74
|
+
Then check what the pod is requesting:
|
|
75
|
+
|
|
76
|
+
```bash
|
|
77
|
+
kubectl get pod -n <ns> -l app=<name> -o jsonpath='{range .items[0].spec.containers[*]}{.name}{"\t"}{.resources}{"\n"}{end}' 2>/dev/null
|
|
78
|
+
```
|
|
79
|
+
|
|
80
|
+
If no pods exist yet (all creation failed), check the Deployment or template spec:
|
|
81
|
+
|
|
82
|
+
```bash
|
|
83
|
+
kubectl get deployment <name> -n <ns> -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{"\t"}{.resources}{"\n"}{end}'
|
|
84
|
+
```
|
|
85
|
+
|
|
86
|
+
**Root cause analysis:**
|
|
87
|
+
- If `Used` is near `Hard` for cpu/memory: existing pods are consuming most of the quota — need to scale down other workloads or increase quota
|
|
88
|
+
- If `pods` count is at the limit: too many pods in namespace — clean up or increase quota
|
|
89
|
+
- If the pod's resource requests are very large: consider reducing requests to fit within remaining quota
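
The `Used` vs `Hard` comparison above comes down to simple arithmetic once the quantities share a unit. The sketch below handles CPU only and assumes quantities are written either as millicores (`500m`) or as plain or decimal cores (`2`, `1.5`). Both function names are hypothetical.

```bash
# Convert a Kubernetes CPU quantity ("500m", "2", "1.5") to integer millicores.
cpu_to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  awk -v v="$1" 'BEGIN { printf "%d\n", v * 1000 + 0.5 }' ;;
  esac
}

# Hypothetical helper: does a new CPU request fit under the quota?
#   $1 = Used, $2 = Hard, $3 = CPU requested by the pod being created
fits_cpu_quota() {
  local used hard req
  used="$(cpu_to_millicores "$1")"
  hard="$(cpu_to_millicores "$2")"
  req="$(cpu_to_millicores "$3")"
  echo "headroom=$(( hard - used ))m request=${req}m"
  (( used + req <= hard ))   # exit status 0 when the request fits
}
```

With `Used` at `3500m` and `Hard` at `4`, a `250m` request fits, while a request of `1` full core does not. That is the situation the first bullet describes.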
|
|
90
|
+
|
|
91
|
+
---
|
|
92
|
+
|
|
93
|
+
#### `forbidden: ... minimum cpu/memory` — LimitRange minimum violation
|
|
94
|
+
|
|
95
|
+
The pod's container does not meet the minimum resource request required by the namespace's LimitRange.
|
|
96
|
+
|
|
97
|
+
```bash
|
|
98
|
+
kubectl get limitrange -n <ns>
|
|
99
|
+
```
|
|
100
|
+
|
|
101
|
+
```bash
|
|
102
|
+
kubectl describe limitrange -n <ns>
|
|
103
|
+
```
|
|
104
|
+
|
|
105
|
+
Check the `Min` column for Container type. If a container does not specify resource requests, or its requests are below the minimum, admission will be rejected.
|
|
106
|
+
|
|
107
|
+
Compare with the pod's resource spec. If the container has no resource requests at all, the LimitRange default will be applied — but if the LimitRange has `min` without `default`, admission fails.
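
The rule above can be written down as a simplified model for a single container and a single resource. The function name `check_min_request` is hypothetical and all values are assumed to be integer millicores. Pass the default exactly as `kubectl describe limitrange` shows it, since the API server may fill in some defaults on the LimitRange object itself.

```bash
# Hypothetical helper modelling the LimitRange minimum check for one container.
#   $1 = container request in millicores ("" if unset)
#   $2 = LimitRange default request in millicores ("" if none is shown)
#   $3 = LimitRange min in millicores
check_min_request() {
  local request="$1" default_request="$2" min="$3"
  local effective="${request:-$default_request}"   # the default is used only when unset
  if [[ -z "$effective" ]]; then
    echo "rejected: no request and no default"
    return 1
  fi
  if (( effective < min )); then
    echo "rejected: ${effective}m is below min ${min}m"
    return 1
  fi
  echo "admitted: ${effective}m"
}
```

For instance, `check_min_request "" 200 100` is admitted at the injected default, whereas `check_min_request 50 200 100` is rejected because an explicit request below the minimum is not replaced by the default.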
|
|
108
|
+
|
|
109
|
+
---
|
|
110
|
+
|
|
111
|
+
#### `forbidden: ... maximum cpu/memory` — LimitRange maximum violation
|
|
112
|
+
|
|
113
|
+
The pod's container exceeds the maximum resource limit allowed by the LimitRange.
|
|
114
|
+
|
|
115
|
+
Check LimitRange as above and compare the `Max` column with the container's resource requests/limits.
|
|
116
|
+
|
|
117
|
+
---
|
|
118
|
+
|
|
119
|
+
#### `forbidden: ... maxLimitRequestRatio` — LimitRange ratio violation
|
|
120
|
+
|
|
121
|
+
The ratio between the container's resource limit and request exceeds the allowed ratio (e.g., limit is 10x the request, but ratio cap is 3x).
|
|
122
|
+
|
|
123
|
+
```bash
|
|
124
|
+
kubectl describe limitrange -n <ns>
|
|
125
|
+
```
|
|
126
|
+
|
|
127
|
+
Check the `Max Limit/Request Ratio` column. Then compare the pod's limits vs requests:
|
|
128
|
+
|
|
129
|
+
```bash
|
|
130
|
+
kubectl get deployment <name> -n <ns> -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{"\t requests:"}{.resources.requests}{"\t limits:"}{.resources.limits}{"\n"}{end}'
|
|
131
|
+
```
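
Once the request and limit are known, the ratio check itself is a one-line comparison. The sketch below assumes integer millicores and an integer ratio cap. The function name `ratio_ok` is hypothetical.

```bash
# Hypothetical helper: check a container against a Max Limit/Request Ratio.
#   $1 = request in millicores, $2 = limit in millicores, $3 = integer ratio cap
ratio_ok() {
  local request="$1" limit="$2" cap="$3"
  # limit / request <= cap, rewritten without division to stay in integers
  if (( limit <= cap * request )); then
    echo "ok"
  else
    echo "ratio exceeded: limit is more than ${cap}x the request"
    return 1
  fi
}
```

Using the numbers from the description, `ratio_ok 100 1000 3` fails (the limit is 10x the request against a cap of 3x), while `ratio_ok 500 1000 3` passes.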
|
|
132
|
+
|
|
133
|
+
---
|
|
134
|
+
|
|
135
|
+
#### `forbidden: ... no resources specified` — Missing required resources
|
|
136
|
+
|
|
137
|
+
The LimitRange requires resource specifications, but the container has none and no defaults are configured.
|
|
138
|
+
|
|
139
|
+
Check if the LimitRange has `Default` and `DefaultRequest` values:
|
|
140
|
+
|
|
141
|
+
```bash
|
|
142
|
+
kubectl describe limitrange -n <ns>
|
|
143
|
+
```
|
|
144
|
+
|
|
145
|
+
If `Default` and `DefaultRequest` are empty but `Min` or `Max` are set, containers MUST explicitly specify resources.
|
|
146
|
+
|
|
147
|
+
---
|
|
148
|
+
|
|
149
|
+
#### Pod type LimitRange — min/max per pod
|
|
150
|
+
|
|
151
|
+
LimitRange can also enforce constraints at the Pod level (sum of all containers). Check if the LimitRange has a `Pod` type entry:
|
|
152
|
+
|
|
153
|
+
```bash
|
|
154
|
+
kubectl get limitrange -n <ns> -o jsonpath='{range .items[*].spec.limits[*]}{.type}{"\t min:"}{.min}{"\t max:"}{.max}{"\n"}{end}'
|
|
155
|
+
```
|
|
156
|
+
|
|
157
|
+
If the sum of all container resources exceeds the Pod-level max, admission is rejected.
|
|
158
|
+
|
|
159
|
+
---
|
|
160
|
+
|
|
161
|
+
#### `exceeded quota` for storage — PVC quota exhausted
|
|
162
|
+
|
|
163
|
+
If the error mentions `requests.storage` or `persistentvolumeclaims`:
|
|
164
|
+
|
|
165
|
+
```bash
|
|
166
|
+
kubectl describe resourcequota -n <ns>
|
|
167
|
+
```
|
|
168
|
+
|
|
169
|
+
Check the storage-related rows. Also check if there are per-StorageClass quotas:
|
|
170
|
+
|
|
171
|
+
```bash
|
|
172
|
+
kubectl get resourcequota -n <ns> -o yaml
|
|
173
|
+
```
|
|
174
|
+
|
|
175
|
+
Look for keys like `<storageclass>.storageclass.storage.k8s.io/requests.storage`.
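
To see which StorageClass such a key refers to, the prefix can be split off with plain parameter expansion. A small sketch, with the hypothetical function name `storageclass_of`:

```bash
# Hypothetical helper: extract the StorageClass name from a quota key of the
# form <storageclass>.storageclass.storage.k8s.io/<resource>.
storageclass_of() {
  local key="$1" suffix=".storageclass.storage.k8s.io/"
  if [[ "$key" == *"$suffix"* ]]; then
    echo "${key%%"$suffix"*}"   # keep everything before the fixed suffix
  else
    return 1                    # not a per-StorageClass key
  fi
}
```

A key such as `gold.storageclass.storage.k8s.io/requests.storage` maps to `gold`; a namespace-wide key such as `requests.storage` makes the function return non-zero.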
|
|
176
|
+
|
|
177
|
+
### 3. Check for multiple constraints
|
|
178
|
+
|
|
179
|
+
A namespace can have multiple ResourceQuotas and LimitRanges. Always check for all of them:
|
|
180
|
+
|
|
181
|
+
```bash
|
|
182
|
+
kubectl get resourcequota,limitrange -n <ns>
|
|
183
|
+
```
|
|
184
|
+
|
|
185
|
+
All ResourceQuotas must be satisfied (intersection). Every LimitRange's constraints are enforced as well, so the most restrictive one effectively applies.
|
|
186
|
+
|
|
187
|
+
### 4. Verify scoped quotas
|
|
188
|
+
|
|
189
|
+
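ResourceQuotas can be scoped to a subset of pods, for example by priority class, by QoS (`BestEffort` / `NotBestEffort`), or by whether the pod sets `activeDeadlineSeconds` (`Terminating` / `NotTerminating`):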
|
|
190
|
+
|
|
191
|
+
```bash
|
|
192
|
+
kubectl get resourcequota -n <ns> -o yaml
|
|
193
|
+
```
|
|
194
|
+
|
|
195
|
+
Look for `spec.scopes` or `spec.scopeSelector`. A scoped quota only applies to pods matching the scope (e.g., `PriorityClass=high`). If the user's pod has a specific priority class, it may hit a scoped quota while the general quota still has capacity.
|
|
196
|
+
|
|
197
|
+
## Notes
|
|
198
|
+
|
|
199
|
+
- ResourceQuota admission happens **before scheduling**. A pod rejected by quota will never appear in `kubectl get pods` — look at the controller (ReplicaSet, Job) events instead.
|
|
200
|
+
- When a namespace has a ResourceQuota for compute resources (cpu/memory), **every container must specify requests/limits** for the resources that quota tracks, otherwise admission is rejected (unless a LimitRange injects defaults, see the next note). This catches users who are used to running without resource specs.
|
|
201
|
+
- LimitRange can automatically inject default requests/limits into containers that don't specify them. Check if defaults are configured before telling users to add explicit resource specs.
|
|
202
|
+
- For cross-reference: if the pod IS created but stuck in Pending, use the `pod-pending-debug` skill instead — that covers scheduling failures (node resources, taints, affinity).
|
|
203
|
+
- `kubectl top pods -n <ns>` shows actual resource usage, while quota tracks **requested** resources. A namespace can hit quota limits even if actual usage is low. Note: `kubectl top` requires metrics-server to be installed — if it returns an error, skip it and rely on quota `Used` values instead.
|
|
@@ -162,3 +162,4 @@ If clients are getting `connection refused`, verify the individual pod IPs are c
|
|
|
162
162
|
- For services using `sessionAffinity: ClientIP`, connections from the same source IP are routed to the same pod — if that pod becomes unhealthy, the session sticks to it until the timeout.
|
|
163
163
|
- `EndpointSlices` (default in K8s 1.21+) replace Endpoints for large-scale services. You can check them with: `kubectl get endpointslices -n <ns> -l kubernetes.io/service-name=<service>`.
|
|
164
164
|
- If the cluster uses a service mesh (Istio, Linkerd), traffic routing may be controlled by the mesh — check the mesh's VirtualService or ServiceProfile resources.
|
|
165
|
+
- If endpoints exist and ports match but connections still time out, a NetworkPolicy may be blocking traffic — use `networkpolicy-debug` to check.
|