@zerry_jin/k8s-doctor-mcp 1.0.0

package/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 zerry
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/README.ko.md ADDED
@@ -0,0 +1,330 @@
+ # 🏥 K8s Doctor MCP
+
+ > AI-powered Kubernetes cluster diagnostics and intelligent debugging recommendations
+
+ [![npm version](https://img.shields.io/npm/v/@zerry_jin/k8s-doctor-mcp)](https://www.npmjs.com/package/@zerry_jin/k8s-doctor-mcp)
+ [![License](https://img.shields.io/badge/license-MIT-blue)](LICENSE)
+ [![Node](https://img.shields.io/badge/node-%3E%3D18-green)](https://nodejs.org)
+ [![Kubernetes](https://img.shields.io/badge/kubernetes-1.20%2B-blue)](https://kubernetes.io)
+
+ **[English](README.md)** | **[한국어](#한국어)**
+
+ ## Demo
+
+ <!-- Add a demo GIF here -->
+ ![K8s Doctor demo](./docs/demo.gif)
+
+ ## Why do you need K8s Doctor?
+
+ When a Kubernetes issue occurs, developers usually fall into this endless loop:
+ - `kubectl get pods`
+ - `kubectl logs`
+ - `kubectl describe`
+ - Googling, searching Stack Overflow...
+
+ **K8s Doctor is a game changer.** It is not a simple kubectl wrapper but an AI-powered diagnostic tool that offers:
+
+ - 🔍 **Root cause analysis** - Analysis that goes beyond simple status checks
+ - 🧠 **Error pattern detection** - Automatically recognizes common issues (Connection Refused, OOM, DNS failures, etc.)
+ - 💡 **Actionable solutions** - Tells you the exact kubectl commands to run
+ - 📊 **Exit code analysis** - Explains what exit 137, 143, and 1 mean
+ - 🎯 **Log pattern matching** - Extracts only the essentials from thousands of log lines
+ - 🏥 **Health score** - Rates pod/cluster health from 0 to 100
+
+ ## Key Features
+
+ | Tool | Description |
+ |------|------|
+ | `diagnose-pod` | **Comprehensive pod diagnosis** - analyzes status, events, and resources, and provides a health score |
+ | `debug-crashloop` | **CrashLoopBackOff specialist** - interprets exit codes, analyzes logs, identifies the root cause |
+ | `analyze-logs` | **Smart log analysis** - detects error patterns, suggests fixes for common problems |
+ | `check-resources` | **Resource usage** - checks CPU/Memory limits, warns about OOM risk |
+ | `full-diagnosis` | **Cluster health check** - scans all nodes and pods |
+ | `check-events` | **Event analysis** - filters and analyzes Warning events |
+ | `list-namespaces` | **Namespace list** - quick lookup of all namespaces |
+ | `list-pods` | **Pod list** - shows the status of problematic pods |
+
+ ## Installation
+
+ ### Install with npm (recommended)
+
+ ```bash
+ npm install -g @zerry_jin/k8s-doctor-mcp
+ ```
+
+ ### Build from source
+
+ ```bash
+ git clone https://github.com/ongjin/k8s-doctor-mcp.git
+ cd k8s-doctor-mcp
+ npm install && npm run build
+ ```
+
+ ## Register with Claude Code
+
+ ```bash
+ # After a global npm install
+ claude mcp add k8s-doctor -- k8s-doctor-mcp
+
+ # Or, if built from source
+ claude mcp add k8s-doctor -- node /path/to/k8s-doctor-mcp/dist/index.js
+ ```
+
+ ## Quick Setup (recommended)
+
+ If approving tool use every time is tedious, set up auto-approval as follows.
+
+ ### 🖥️ For Claude Desktop App Users
+ 1. Restart the Claude app.
+ 2. Ask your first question that uses `k8s-doctor`.
+ 3. When the prompt appears, check the **"Always allow requests from this server"** box and click **Allow**.
+ (After that, tools run without asking.)
+
+ ### ⌨️ For Claude Code (CLI) Users
+ If you are working in the terminal (the `claude` command), use the permission management command.
+
+ 1. Run `claude` in your terminal.
+ 2. Type `/permissions` at the prompt and press Enter.
+ 3. When the **Global Permissions** or **Project Permissions** menu appears, select `Allowed Tools`.
+ 4. Enter `mcp__k8s-doctor__*` to allow all tools, or register only the tools you need individually.
+
+ > 💡 **Tip**: In most cases, allowing just `diagnose-pod`, `debug-crashloop`, and `analyze-logs` is enough. These three tools cover 90% of debugging scenarios.
+
+ **Recommended configuration:**
+ ```bash
+ # Balanced approach - allow the main diagnostic tools
+ claude config add allowedTools \
+ "mcp__k8s-doctor__diagnose-pod" \
+ "mcp__k8s-doctor__debug-crashloop" \
+ "mcp__k8s-doctor__analyze-logs" \
+ "mcp__k8s-doctor__full-diagnosis"
+ ```
+
+ ## Prerequisites
+
+ - **kubectl** configured and working (`kubectl cluster-info` must succeed)
+ - **kubeconfig** file in the default location (`~/.kube/config`), or the `KUBECONFIG` environment variable set
+ - **Node.js** 18 or higher
+ - Access to a Kubernetes cluster (local minikube/kind, or remote)
+
+ ## Usage Examples
+
+ ### Example 1: Diagnosing a CrashLooping Pod
+
+ ```
+ User: "The 'api-server' pod in the production namespace is in CrashLoop. Why?"
+
+ Claude (using k8s-doctor):
+ 🔍 CrashLoopBackOff Diagnosis
+
+ Exit Code: 137 (OOM Killed)
+ Root cause: The container was forcibly terminated because it ran out of memory
+
+ Solution:
+ Increase the memory limit:
+ ```yaml
+ resources:
+   limits:
+     memory: "512Mi" # Set higher than the current value
+ ```
+
+ Relevant logs:
+ - Line 1234: Error: JavaScript heap out of memory
+ - Line 1256: FATAL ERROR: Reached heap limit
+ ```
+
+ ### Example 2: Analyzing Application Logs
+
+ ```
+ User: "Analyze the 'backend-worker' pod logs and tell me what is failing"
+
+ Claude (using analyze-logs):
+ 📝 Log Analysis Results
+
+ Detected error patterns:
+
+ 🔴 Database Connection Error (15 occurrences)
+ Possible causes:
+ - DB service is not ready
+ - Wrong connection string
+ - Authentication failure
+
+ Solutions:
+ - Check the DB pod status
+ - Check environment variables (ConfigMap/Secret)
+ - Check service endpoints: kubectl get endpoints
+
+ 🟡 Timeout (8 occurrences)
+ Possible cause: Response time is too long, or network latency
+ Solution: Increase timeout values or optimize service performance
+ ```
+
+ ### Example 3: Cluster-wide Health Check
+
+ ```
+ User: "Check the overall health of the cluster"
+
+ Claude (using full-diagnosis):
+ 🏥 Cluster Health Diagnosis
+
+ Overall score: 72/100 💛
+
+ Nodes: 3/3 Ready ✅
+ Pods: 45/52 Running
+ - CrashLoop: 2 🔥
+ - Pending: 5 ⏳
+
+ Critical issues:
+ 🔴 Pod "payment-service" CrashLooping (exit 1)
+ 🔴 Pod "worker-3" OOM Killed
+
+ Recommendations:
+ - Fix the 2 CrashLoop pods immediately
+ - Check whether the Pending pods lack resources
+ ```
+
+ ## How It Works
+
+ 1. **Cluster connection** - Connects via kubeconfig (same as kubectl)
+ 2. **Comprehensive data collection** - Pod status, events, logs, resource usage
+ 3. **Pattern matching** - Recognizes common error patterns based on real-world experience
+ 4. **Root cause analysis** - Does not just show status; explains WHY it failed
+ 5. **Solutions** - Shows how to fix it with exact commands and YAML
+
+ ## Error Patterns Detected
+
+ Common patterns that K8s Doctor recognizes:
+
+ - 🔴 **Connection Refused** - Service not ready, wrong port, network policy
+ - 🔴 **Database Connection Errors** - DB authentication, wrong connection string
+ - 🔴 **Out of Memory** - OOM kill, memory leak, insufficient limit
+ - 🟠 **File Not Found** - ConfigMap not mounted, wrong path
+ - 🟠 **Permission Denied** - SecurityContext problems, fsGroup issues
+ - 🟠 **DNS Resolution Failed** - CoreDNS problems, wrong service name
+ - 🟡 **Port Already in Use** - Multiple processes on the same port
+ - 🟡 **Timeout** - Slow responses, network latency
+ - 🟡 **SSL/TLS Errors** - Expired certificate, missing CA bundle
+
+ ## Architecture
+
+ ```
+ k8s-doctor-mcp/
+ ├── src/
+ │   ├── index.ts                  # MCP server (all tools)
+ │   ├── types.ts                  # TypeScript type definitions
+ │   ├── diagnostics/
+ │   │   ├── pod-diagnostics.ts    # Pod health analysis
+ │   │   └── cluster-health.ts     # Cluster-wide diagnostics
+ │   ├── analyzers/
+ │   │   └── log-analyzer.ts       # Smart log pattern matching
+ │   └── utils/
+ │       ├── k8s-client.ts         # Kubernetes API client
+ │       └── formatters.ts         # Output formatting utilities
+ └── package.json
+ ```
+
+ ## Security Considerations
+
+ - K8s Doctor uses only **read-only** Kubernetes APIs (list, get, describe)
+ - Requires the same permissions as `kubectl get/describe/logs`
+ - Never changes cluster state
+ - kubeconfig credentials stay local
+ - No data is sent to external servers
+
+ ## Troubleshooting
+
+ ### "kubeconfig not found"
+ ```bash
+ # Verify that kubectl works
+ kubectl cluster-info
+
+ # Check the kubeconfig location
+ echo $KUBECONFIG
+
+ # Test with an explicit path
+ export KUBECONFIG=~/.kube/config
+ ```
+
+ ### "Permission denied"
+ ```bash
+ # Check cluster permissions
+ kubectl auth can-i get pods --all-namespaces
+
+ # At minimum, read access is required for:
+ # - pods, events, namespaces, nodes
+ ```
+
+ ### "Connection refused to cluster"
+ ```bash
+ # Verify cluster connectivity
+ kubectl get nodes
+
+ # For local clusters (minikube/kind)
+ minikube status
+ kind get clusters
+ ```
+
+ ## Development
+
+ ```bash
+ # Clone and install
+ git clone https://github.com/ongjin/k8s-doctor-mcp.git
+ cd k8s-doctor-mcp
+ npm install
+
+ # Development mode
+ npm run dev
+
+ # Build
+ npm run build
+
+ # Test with Claude Code
+ npm run build
+ claude mcp add k8s-doctor-dev -- node $(pwd)/dist/index.js
+ ```
+
+ ## Contributing
+
+ Contributions are welcome! Especially:
+
+ - 🆕 New error pattern detection
+ - 🌍 Internationalization (more languages)
+ - 📊 Metrics integration (Prometheus, etc.)
+ - 🧪 Test coverage
+ - 📖 Documentation improvements
+
+ ## Roadmap
+
+ - [ ] Metrics Server integration (real-time CPU/Memory usage)
+ - [ ] Network policy diagnostics
+ - [ ] Storage/PVC troubleshooting
+ - [ ] Helm chart analysis
+ - [ ] Multi-cluster support
+ - [ ] Interactive debugging mode
+ - [ ] Report export (PDF, HTML)
+
+ ## License
+
+ MIT © [zerry](https://github.com/ongjin)
+
+ ## Acknowledgments
+
+ Built with:
+ - [@modelcontextprotocol/sdk](https://github.com/anthropics/mcp) - Model Context Protocol
+ - [@kubernetes/client-node](https://github.com/kubernetes-client/javascript) - Kubernetes JavaScript Client
+ - [Claude Code](https://claude.com/claude-code) - AI-powered development tool
+
+ ## Star History
+
+ If this tool saved you debugging time, please give it a ⭐ star!
+
+ ## Author
+
+ **zerry**
+
+ - GitHub: [@zerry](https://github.com/ongjin)
+ - Made for the DevOps community, tired of kubectl hell 😅
+
+ ---
+
+ **Made with ❤️ for Kubernetes users drowning in logs**
package/README.md ADDED
@@ -0,0 +1,330 @@
+ # 🏥 K8s Doctor MCP
+
+ > AI-powered Kubernetes cluster diagnostics and intelligent debugging recommendations
+
+ [![npm version](https://img.shields.io/npm/v/@zerry_jin/k8s-doctor-mcp)](https://www.npmjs.com/package/@zerry_jin/k8s-doctor-mcp)
+ [![License](https://img.shields.io/badge/license-MIT-blue)](LICENSE)
+ [![Node](https://img.shields.io/badge/node-%3E%3D18-green)](https://nodejs.org)
+ [![Kubernetes](https://img.shields.io/badge/kubernetes-1.20%2B-blue)](https://kubernetes.io)
+
+ **[English](#english)** | **[한국어](README.ko.md)**
+
+ ## Demo
+
+ <!-- Add your demo GIF here -->
+ ![K8s Doctor Demo](./docs/demo.gif)
+
+ ## Why K8s Doctor?
+
+ When a Kubernetes issue strikes, developers typically run through an endless loop of:
+ - `kubectl get pods`
+ - `kubectl logs`
+ - `kubectl describe`
+ - Frantically searching Stack Overflow...
+
+ **K8s Doctor changes the game.** It's not just a kubectl wrapper - it's an AI-powered diagnostic tool that:
+
+ - 🔍 **Analyzes root causes** - Goes beyond simple status checks
+ - 🧠 **Detects error patterns** - Recognizes common issues (Connection Refused, OOM, DNS failures)
+ - 💡 **Provides actionable solutions** - Gives you exact kubectl commands to fix problems
+ - 📊 **Exit code analysis** - Explains what exit 137, 143, and 1 actually mean
+ - 🎯 **Log pattern matching** - Finds the signal in thousands of log lines
+ - 🏥 **Health scoring** - Rates your pod/cluster health 0-100
+
+ ## Features
+
+ | Tool | Description |
+ |------|-------------|
+ | `diagnose-pod` | **Comprehensive pod diagnostics** - analyzes status, events, and resources, and provides a health score |
+ | `debug-crashloop` | **CrashLoopBackOff specialist** - decodes exit codes, analyzes logs, finds the root cause |
+ | `analyze-logs` | **Smart log analysis** - detects error patterns, suggests fixes for common issues |
+ | `check-resources` | **Resource usage** - validates CPU/Memory limits, warns about OOM risks |
+ | `full-diagnosis` | **Cluster health check** - scans all nodes and pods for issues |
+ | `check-events` | **Event analysis** - filters and analyzes Warning events |
+ | `list-namespaces` | **Namespace listing** - quick overview of all namespaces |
+ | `list-pods` | **Pod listing** - shows problematic pods with status indicators |
+
+ ## Installation
+
+ ### Via npm (recommended)
+
+ ```bash
+ npm install -g @zerry_jin/k8s-doctor-mcp
+ ```
+
+ ### From source
+
+ ```bash
+ git clone https://github.com/ongjin/k8s-doctor-mcp.git
+ cd k8s-doctor-mcp
+ npm install && npm run build
+ ```
+
+ ## Setup with Claude Code
+
+ ```bash
+ # After npm global install
+ claude mcp add k8s-doctor -- k8s-doctor-mcp
+
+ # Or from source build
+ claude mcp add k8s-doctor -- node /path/to/k8s-doctor-mcp/dist/index.js
+ ```
+
+ ## Quick Setup (Auto-approve Tools)
+
+ Tired of manually approving tool execution every time? Follow these steps to enable auto-approval.
+
+ ### 🖥️ For Claude Desktop App Users
+ 1. Restart the Claude Desktop App.
+ 2. Ask your first question using `k8s-doctor`.
+ 3. When the permission dialog appears, check the box **"Always allow requests from this server"** and click **Allow**.
+ (Future requests will execute automatically without prompts.)
+
+ ### ⌨️ For Claude Code (CLI) Users
+ If you are using the `claude` terminal command, manage permissions via the interactive menu:
+
+ 1. Run `claude` in your terminal.
+ 2. Type `/permissions` in the prompt and press Enter.
+ 3. Select **Global Permissions** (or Project Permissions) > **Allowed Tools**.
+ 4. Enter `mcp__k8s-doctor__*` to allow all tools, or add specific tools individually.
+
+ > 💡 **Tip**: For most use cases, allowing `diagnose-pod`, `debug-crashloop`, and `analyze-logs` is sufficient. These three cover 90% of debugging scenarios.
+
+ **Recommended configuration:**
+ ```bash
+ # Balanced approach - allow main diagnostic tools
+ claude config add allowedTools \
+ "mcp__k8s-doctor__diagnose-pod" \
+ "mcp__k8s-doctor__debug-crashloop" \
+ "mcp__k8s-doctor__analyze-logs" \
+ "mcp__k8s-doctor__full-diagnosis"
+ ```
+
+ ## Prerequisites
+
+ - **kubectl** configured and working (`kubectl cluster-info` should succeed)
+ - **kubeconfig** file in default location (`~/.kube/config`) or `KUBECONFIG` env var set
+ - **Node.js** 18 or higher
+ - Access to a Kubernetes cluster (local, such as minikube/kind, or remote)
+
+ ## Usage Examples
+
+ ### Example 1: Diagnose a CrashLooping Pod
+
+ ```
+ You: "My pod 'api-server' in namespace 'production' is CrashLooping. What's wrong?"
+
+ Claude (using k8s-doctor):
+ 🔍 CrashLoopBackOff Diagnosis
+
+ Exit Code: 137 (OOM Killed)
+ Root Cause: Container was killed due to Out Of Memory
+
+ Solution:
+ Increase memory limit:
+ ```yaml
+ resources:
+   limits:
+     memory: "512Mi" # Increase from current value
+ ```
+
+ Relevant logs:
+ - Line 1234: Error: JavaScript heap out of memory
+ - Line 1256: FATAL ERROR: Reached heap limit
+ ```
+
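The exit codes in Example 1 follow the usual shell convention: a code above 128 means the process was ended by signal number `code - 128`, so 137 is SIGKILL (9) and 143 is SIGTERM (15). The following is a minimal TypeScript sketch of that decoding. `explainExitCode` is a hypothetical helper written for illustration, not a function exported by k8s-doctor-mcp, and a 137 only indicates an OOM kill when the container's termination reason confirms it.

```typescript
// Hypothetical helper: names and wording are illustrative, not the package's own code.
const SIGNAL_NAMES: Record<number, string> = {
  9: "SIGKILL",
  15: "SIGTERM",
};

function explainExitCode(code: number): string {
  if (code === 0) {
    return "exit 0: the process finished successfully";
  }
  // Codes in 129..255 encode "killed by signal (code - 128)".
  if (code > 128 && code < 256) {
    const signal = code - 128;
    const name = SIGNAL_NAMES[signal] ?? `signal ${signal}`;
    return `exit ${code}: terminated by ${name}`;
  }
  return `exit ${code}: the application reported an error`;
}

console.log(explainExitCode(137)); // exit 137: terminated by SIGKILL
console.log(explainExitCode(143)); // exit 143: terminated by SIGTERM
console.log(explainExitCode(1));   // exit 1: the application reported an error
```

SIGKILL is also what the kubelet sends after a grace period expires, which is why the sketch reports the signal rather than asserting "OOM Killed" outright.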
+ ### Example 2: Analyze Application Logs
+
+ ```
+ You: "Analyze logs for pod 'backend-worker' and tell me what's failing"
+
+ Claude (using analyze-logs):
+ 📝 Log Analysis
+
+ Detected Error Patterns:
+
+ 🔴 Database Connection Error (15 occurrences)
+ Possible Causes:
+ - DB service not ready
+ - Wrong connection string
+ - Authentication failed
+
+ Solutions:
+ - Check DB pod status
+ - Verify environment variables (ConfigMap/Secret)
+ - Check service endpoints: kubectl get endpoints
+
+ 🟡 Timeout (8 occurrences)
+ Likely cause: Response time too slow or network delay
+ Solution: Increase timeout values or optimize service performance
+ ```
+
+ ### Example 3: Cluster Health Check
+
+ ```
+ You: "Check overall cluster health"
+
+ Claude (using full-diagnosis):
+ 🏥 Cluster Health Diagnosis
+
+ Overall Score: 72/100 💛
+
+ Nodes: 3/3 Ready ✅
+ Pods: 45/52 Running
+ - CrashLoop: 2 🔥
+ - Pending: 5 ⏳
+
+ Critical Issues:
+ 🔴 Pod "payment-service" CrashLooping (exit 1)
+ 🔴 Pod "worker-3" OOM Killed
+
+ Recommendations:
+ - Fix 2 CrashLoop pods immediately
+ - Check if pending pods lack resources
+ ```
+
+ ## How It Works
+
+ 1. **Connects to your cluster** via kubeconfig (same as kubectl)
+ 2. **Gathers comprehensive data** - pod status, events, logs, resource usage
+ 3. **Applies pattern matching** - recognizes common error patterns from production experience
+ 4. **Analyzes root causes** - doesn't just show status, explains WHY it's failing
+ 5. **Provides solutions** - gives exact commands and YAML to fix issues
+
+ ## Error Patterns Detected
+
+ K8s Doctor recognizes these common patterns:
+
+ - 🔴 **Connection Refused** - Service not ready, wrong port, network policy
+ - 🔴 **Database Connection Errors** - DB auth, wrong connection strings
+ - 🔴 **Out of Memory** - OOM kills, memory leaks, undersized limits
+ - 🟠 **File Not Found** - ConfigMap not mounted, wrong paths
+ - 🟠 **Permission Denied** - SecurityContext issues, fsGroup problems
+ - 🟠 **DNS Resolution Failed** - CoreDNS issues, wrong service names
+ - 🟡 **Port Already in Use** - Multiple processes on same port
+ - 🟡 **Timeout** - Slow responses, network delays
+ - 🟡 **SSL/TLS Errors** - Expired certs, missing CA bundles
+
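One common way to detect patterns like those listed above is to run a table of regular expressions over each log line and count the hits. The sketch below shows that idea in TypeScript under stated assumptions: the `ErrorPattern` shape, the `PATTERNS` table, the regexes, and `countPatterns` are all hypothetical and are not taken from the package's `log-analyzer.ts`, whose actual rules are not shown in this diff.

```typescript
// Hypothetical pattern table: the regexes and labels are assumptions for illustration.
interface ErrorPattern {
  name: string;
  regex: RegExp;
}

const PATTERNS: ErrorPattern[] = [
  { name: "Connection Refused", regex: /connection refused|ECONNREFUSED/i },
  { name: "Out of Memory", regex: /out of memory|OOMKilled/i },
  { name: "DNS Resolution Failed", regex: /ENOTFOUND|could not resolve host/i },
  { name: "Timeout", regex: /timed out|ETIMEDOUT/i },
];

// Count how many log lines match each pattern; a line may match several patterns.
function countPatterns(log: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of log.split("\n")) {
    for (const { name, regex } of PATTERNS) {
      if (regex.test(line)) {
        counts.set(name, (counts.get(name) ?? 0) + 1);
      }
    }
  }
  return counts;
}

// Made-up sample log lines, used only to exercise the function.
const sample = [
  "Error: connect ECONNREFUSED 10.0.0.5:5432",
  "request timed out after 30s",
  "FATAL ERROR: JavaScript heap out of memory",
  "Error: connect ECONNREFUSED 10.0.0.5:5432",
].join("\n");

console.log(countPatterns(sample));
```

The regexes deliberately omit the `g` flag, because a global regex keeps state in `lastIndex` between `test` calls and would skip matches when reused across lines.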
+ ## Architecture
+
+ ```
+ k8s-doctor-mcp/
+ ├── src/
+ │   ├── index.ts                  # MCP server with all tools
+ │   ├── types.ts                  # TypeScript type definitions
+ │   ├── diagnostics/
+ │   │   ├── pod-diagnostics.ts    # Pod health analysis
+ │   │   └── cluster-health.ts     # Cluster-wide diagnostics
+ │   ├── analyzers/
+ │   │   └── log-analyzer.ts       # Smart log pattern matching
+ │   └── utils/
+ │       ├── k8s-client.ts         # Kubernetes API client
+ │       └── formatters.ts         # Output formatting utilities
+ └── package.json
+ ```
+
+ ## Security Considerations
+
+ - K8s Doctor uses **read-only** Kubernetes API calls (list, get, describe)
+ - Requires same permissions as `kubectl get/describe/logs`
+ - Never modifies cluster state
+ - kubeconfig credentials stay local
+ - No data sent to external servers
+
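For readers who want to give the tool a dedicated least-privilege identity, the read-only claim above could be expressed as an RBAC ClusterRole. The sketch below builds such a manifest as a plain TypeScript object and checks that it grants no mutating verbs. Everything here is an assumption for illustration: the role name `k8s-doctor-readonly` is made up, the resource list is inferred from this README (pods, events, namespaces, nodes, plus the `pods/log` subresource that log reads require), and the package may need other resources or the `watch` verb.

```typescript
// Hypothetical manifest: the role name is made up and the resource list is an
// assumption based on the README, not a list published by the package.
const readOnlyRole = {
  apiVersion: "rbac.authorization.k8s.io/v1",
  kind: "ClusterRole",
  metadata: { name: "k8s-doctor-readonly" },
  rules: [
    {
      apiGroups: [""], // "" is the core API group
      resources: ["pods", "pods/log", "events", "namespaces", "nodes"],
      verbs: ["get", "list"],
    },
  ],
};

const WRITE_VERBS = new Set(["create", "update", "patch", "delete", "deletecollection"]);

// True when no rule grants a wildcard or a verb that could change cluster state.
function isReadOnly(role: typeof readOnlyRole): boolean {
  return role.rules.every((rule) =>
    rule.verbs.every((verb) => verb !== "*" && !WRITE_VERBS.has(verb)),
  );
}

console.log(JSON.stringify(readOnlyRole, null, 2));
console.log("read-only:", isReadOnly(readOnlyRole));
```

The JSON printed by the first `console.log` is a valid manifest body, since Kubernetes accepts JSON as well as YAML.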
+ ## Troubleshooting
+
+ ### "kubeconfig not found"
+ ```bash
+ # Verify kubectl works
+ kubectl cluster-info
+
+ # Check kubeconfig location
+ echo $KUBECONFIG
+
+ # Test with explicit path
+ export KUBECONFIG=~/.kube/config
+ ```
+
+ ### "Permission denied"
+ ```bash
+ # Check your cluster permissions
+ kubectl auth can-i get pods --all-namespaces
+
+ # You need at least read access to:
+ # - pods, events, namespaces, nodes
+ ```
+
+ ### "Connection refused to cluster"
+ ```bash
+ # Verify cluster connectivity
+ kubectl get nodes
+
+ # For local clusters (minikube/kind)
+ minikube status
+ kind get clusters
+ ```
+
+ ## Development
+
+ ```bash
+ # Clone and install
+ git clone https://github.com/ongjin/k8s-doctor-mcp.git
+ cd k8s-doctor-mcp
+ npm install
+
+ # Development mode
+ npm run dev
+
+ # Build
+ npm run build
+
+ # Test with Claude Code
+ npm run build
+ claude mcp add k8s-doctor-dev -- node $(pwd)/dist/index.js
+ ```
+
+ ## Contributing
+
+ Contributions welcome! Especially:
+
+ - 🆕 New error pattern detections
+ - 🌍 Internationalization (more languages)
+ - 📊 Metrics integration (Prometheus, etc.)
+ - 🧪 Test coverage
+ - 📖 Documentation improvements
+
+ ## Roadmap
+
+ - [ ] Metrics Server integration (real-time CPU/Memory usage)
+ - [ ] Network policy diagnostics
+ - [ ] Storage/PVC troubleshooting
+ - [ ] Helm chart analysis
+ - [ ] Multi-cluster support
+ - [ ] Interactive debugging mode
+ - [ ] Export reports (PDF, HTML)
+
+ ## License
+
+ MIT © [zerry](https://github.com/ongjin)
+
+ ## Acknowledgments
+
+ Built with:
+ - [@modelcontextprotocol/sdk](https://github.com/anthropics/mcp) - Model Context Protocol
+ - [@kubernetes/client-node](https://github.com/kubernetes-client/javascript) - Kubernetes JavaScript Client
+ - [Claude Code](https://claude.com/claude-code) - AI-powered development
+
+ ## Star History
+
+ If this tool saves you debugging time, please ⭐ star the repo!
+
+ ## Author
+
+ **zerry**
+
+ - GitHub: [@zerry](https://github.com/ongjin)
+ - Created for the DevOps community who are tired of kubectl hell 😅
+
+ ---
+
+ **Made with ❤️ for Kubernetes users drowning in logs**
@@ -0,0 +1,17 @@
+ /**
+  * Log analysis module
+  *
+  * Rather than simply showing logs,
+  * finds error patterns and analyzes root causes
+  *
+  * @author zerry
+  */
+ import * as k8s from '@kubernetes/client-node';
+ import type { LogAnalysis } from '../types.js';
+ /**
+  * Analyze pod logs
+  *
+  * Finds error patterns in logs and suggests solutions.
+  * Extracts key information from thousands of log lines.
+  */
+ export declare function analyzeLogs(logApi: k8s.Log, namespace: string, podName: string, containerName?: string, tailLines?: number): Promise<LogAnalysis>;