elsabro 2.2.0 → 3.7.0
- package/README.md +668 -20
- package/agents/elsabro-orchestrator.md +113 -0
- package/bin/install.js +0 -0
- package/commands/elsabro/execute.md +223 -46
- package/commands/elsabro/start.md +34 -0
- package/commands/elsabro/verify-work.md +29 -0
- package/flows/development-flow.json +452 -0
- package/flows/quick-flow.json +118 -0
- package/hooks/confirm-destructive.sh +145 -0
- package/hooks/hooks-config.json +81 -0
- package/hooks/lint-check.sh +238 -0
- package/hooks/post-edit-test.sh +189 -0
- package/package.json +5 -3
- package/references/SYSTEM_INDEX.md +379 -5
- package/references/agent-marketplace.md +2274 -0
- package/references/agent-protocol.md +1126 -0
- package/references/ai-code-suggestions.md +2413 -0
- package/references/checkpointing.md +595 -0
- package/references/collaboration-patterns.md +851 -0
- package/references/collaborative-sessions.md +1081 -0
- package/references/configuration-management.md +1810 -0
- package/references/cost-tracking.md +1095 -0
- package/references/enterprise-sso.md +2001 -0
- package/references/error-contracts-tests.md +1171 -0
- package/references/error-contracts-v2.md +968 -0
- package/references/error-contracts.md +3102 -0
- package/references/event-driven.md +1031 -0
- package/references/flow-orchestration.md +940 -0
- package/references/flow-visualization.md +1557 -0
- package/references/ide-integrations.md +3513 -0
- package/references/interrupt-system.md +681 -0
- package/references/kubernetes-deployment.md +3099 -0
- package/references/memory-system.md +683 -0
- package/references/mobile-companion.md +3236 -0
- package/references/multi-llm-providers.md +2494 -0
- package/references/multi-project-memory.md +1182 -0
- package/references/observability.md +793 -0
- package/references/output-schemas.md +858 -0
- package/references/parallel-worktrees.md +293 -0
- package/references/performance-profiler.md +955 -0
- package/references/plugin-system.md +1526 -0
- package/references/prompt-management.md +292 -0
- package/references/sandbox-execution.md +303 -0
- package/references/security-system.md +1253 -0
- package/references/streaming.md +696 -0
- package/references/testing-framework.md +1151 -0
- package/references/time-travel.md +802 -0
- package/references/tool-registry.md +886 -0
- package/references/voice-commands.md +3296 -0
- package/scripts/setup-parallel-worktrees.sh +319 -0
- package/skills/memory-update.md +207 -0
- package/skills/review.md +331 -0
- package/skills/techdebt.md +289 -0
- package/skills/tutor.md +219 -0
- package/templates/.planning/notes/.gitkeep +0 -0
- package/templates/CLAUDE.md.template +48 -0
- package/templates/agent-marketplace-config.json +220 -0
- package/templates/agent-protocol-config.json +136 -0
- package/templates/ai-suggestions-config.json +100 -0
- package/templates/checkpoint-state.json +61 -0
- package/templates/collaboration-config.json +157 -0
- package/templates/collaborative-sessions-config.json +153 -0
- package/templates/configuration-config.json +245 -0
- package/templates/cost-tracking-config.json +148 -0
- package/templates/enterprise-sso-config.json +438 -0
- package/templates/error-handling-config.json +79 -2
- package/templates/events-config.json +148 -0
- package/templates/flow-visualization-config.json +196 -0
- package/templates/ide-integrations-config.json +442 -0
- package/templates/kubernetes-config.json +764 -0
- package/templates/memory-state.json +84 -0
- package/templates/mistakes.md.template +52 -0
- package/templates/mobile-companion-config.json +600 -0
- package/templates/multi-llm-config.json +544 -0
- package/templates/multi-project-memory-config.json +145 -0
- package/templates/observability-config.json +109 -0
- package/templates/patterns.md.template +114 -0
- package/templates/performance-profiler-config.json +125 -0
- package/templates/plugin-config.json +170 -0
- package/templates/prompt-management-config.json +86 -0
- package/templates/sandbox-config.json +185 -0
- package/templates/schemas-config.json +65 -0
- package/templates/security-config.json +120 -0
- package/templates/streaming-config.json +72 -0
- package/templates/testing-config.json +81 -0
- package/templates/timetravel-config.json +62 -0
- package/templates/tool-registry-config.json +109 -0
- package/templates/voice-commands-config.json +658 -0
@@ -0,0 +1,3099 @@

---
name: kubernetes-deployment
description: ELSABRO deployment system for Kubernetes with Helm, HPA, VPA, and observability
version: 3.6.0
---

# ELSABRO Kubernetes Deployment System (v3.6)

A complete system for deploying ELSABRO as a service on Kubernetes, with support for multi-tenancy, auto-scaling, and enterprise observability.

## Overview

```
+-----------------------------------------------------------------------------+
|                       ELSABRO KUBERNETES ARCHITECTURE                       |
+-----------------------------------------------------------------------------+
|                                                                             |
|  +-----------------------------------------------------------------------+  |
|  |                             INGRESS LAYER                             |  |
|  |    +----------------+    +----------------+    +----------------+     |  |
|  |    | Nginx          |    | Traefik        |    | Istio GW       |     |  |
|  |    | Ingress        |    | Ingress        |    | (Service       |     |  |
|  |    | (default)      |    | (alternative)  |    | Mesh)          |     |  |
|  |    +-------+--------+    +-------+--------+    +-------+--------+     |  |
|  +------------|---------------------|---------------------|--------------+  |
|               |                     |                     |                 |
|               +---------------------+---------------------+                 |
|                                     |                                       |
|  +----------------------------------v------------------------------------+  |
|  |                             SERVICE LAYER                             |  |
|  |  +-----------------------------------------------------------------+  |  |
|  |  |                         elsabro-service                         |  |  |
|  |  |               ClusterIP / LoadBalancer / NodePort               |  |  |
|  |  |        Port: 8080 (HTTP) | 8443 (HTTPS) | 9090 (Metrics)        |  |  |
|  |  +-----------------------------------------------------------------+  |  |
|  +-----------------------------------------------------------------------+  |
|                                     |                                       |
|  +----------------------------------v------------------------------------+  |
|  |                           DEPLOYMENT LAYER                            |  |
|  |  +-----------------------------------------------------------------+  |  |
|  |  |                       elsabro-deployment                        |  |  |
|  |  |  +------------+  +------------+  +------------+  +------------+ |  |  |
|  |  |  | Pod 1      |  | Pod 2      |  | Pod 3      |  | Pod N      | |  |  |
|  |  |  | Replica    |  | Replica    |  | Replica    |  | Replica    | |  |  |
|  |  |  +------------+  +------------+  +------------+  +------------+ |  |  |
|  |  |        ^               ^               ^               ^        |  |  |
|  |  |        +---------------+---------------+---------------+        |  |  |
|  |  |                                |                                |  |  |
|  |  |                        HPA / VPA / KEDA                         |  |  |
|  |  +-----------------------------------------------------------------+  |  |
|  +-----------------------------------------------------------------------+  |
|                                     |                                       |
|  +----------------------------------v------------------------------------+  |
|  |                         INFRASTRUCTURE LAYER                          |  |
|  |    +----------------+    +----------------+    +----------------+     |  |
|  |    | Redis          |    | PostgreSQL     |    | RabbitMQ       |     |  |
|  |    | Cluster        |    | Primary        |    | Cluster        |     |  |
|  |    | (Session/      |    | + Replicas     |    | (Event Bus)    |     |  |
|  |    | Cache)         |    |                |    |                |     |  |
|  |    +----------------+    +----------------+    +----------------+     |  |
|  +-----------------------------------------------------------------------+  |
|                                     |                                       |
|  +----------------------------------v------------------------------------+  |
|  |                          OBSERVABILITY LAYER                          |  |
|  |    +----------------+    +----------------+    +----------------+     |  |
|  |    | Prometheus     |    | Grafana        |    | Jaeger         |     |  |
|  |    | (Metrics)      |    | (Dashboards)   |    | (Tracing)      |     |  |
|  |    +----------------+    +----------------+    +----------------+     |  |
|  +-----------------------------------------------------------------------+  |
|                                                                             |
+-----------------------------------------------------------------------------+
```

---

## 1. K8sDeployer

### TypeScript Interfaces

```typescript
/**
 * K8sDeployer - Core deployment manager for ELSABRO on Kubernetes
 */

// API clients and object types from the official Kubernetes JavaScript client.
// Supporting types that are not declared in this excerpt (ResourceQuotaSpec,
// ContainerPort, DeploymentResult, RolloutStrategy, ...) are defined elsewhere.
import {
  KubeConfig,
  CoreV1Api,
  AppsV1Api,
  RbacAuthorizationV1Api,
  NetworkingV1Api,
  AutoscalingV2Api,
  V1Namespace,
  V1ConfigMap,
  V1Secret,
  V1Deployment
} from '@kubernetes/client-node';

// Namespace configuration
interface NamespaceConfig {
  name: string;
  labels: Record<string, string>;
  annotations: Record<string, string>;
  resourceQuota?: ResourceQuotaSpec;
  limitRange?: LimitRangeSpec;
  networkPolicy?: NetworkPolicySpec;
}

// Deployment specification
interface DeploymentSpec {
  name: string;
  namespace: string;
  replicas: number;
  image: string;
  imageTag: string;
  imagePullPolicy: 'Always' | 'IfNotPresent' | 'Never';
  resources: ResourceRequirements;
  env: EnvVar[];
  envFrom: EnvFromSource[];
  ports: ContainerPort[];
  volumeMounts: VolumeMount[];
  volumes: Volume[];
  nodeSelector?: Record<string, string>;
  tolerations?: Toleration[];
  affinity?: Affinity;
  securityContext?: PodSecurityContext;
  serviceAccountName?: string;
}

// Resource requirements
interface ResourceRequirements {
  requests: {
    cpu: string;
    memory: string;
    'ephemeral-storage'?: string;
  };
  limits: {
    cpu: string;
    memory: string;
    'ephemeral-storage'?: string;
    'nvidia.com/gpu'?: number;
  };
}

// Environment variable
interface EnvVar {
  name: string;
  value?: string;
  valueFrom?: {
    configMapKeyRef?: { name: string; key: string };
    secretKeyRef?: { name: string; key: string };
    fieldRef?: { fieldPath: string };
    resourceFieldRef?: { containerName: string; resource: string };
  };
}

// ConfigMap specification
interface ConfigMapSpec {
  name: string;
  namespace: string;
  data: Record<string, string>;
  binaryData?: Record<string, string>;
  immutable?: boolean;
}

// Secret specification
interface SecretSpec {
  name: string;
  namespace: string;
  type: 'Opaque' | 'kubernetes.io/tls' | 'kubernetes.io/dockerconfigjson';
  data?: Record<string, string>;
  stringData?: Record<string, string>;
}

// Service Account specification
interface ServiceAccountSpec {
  name: string;
  namespace: string;
  automountServiceAccountToken?: boolean;
  imagePullSecrets?: { name: string }[];
  secrets?: { name: string }[];
}

// RBAC Role specification
interface RoleSpec {
  name: string;
  namespace: string;
  rules: PolicyRule[];
}

interface PolicyRule {
  apiGroups: string[];
  resources: string[];
  verbs: ('get' | 'list' | 'watch' | 'create' | 'update' | 'patch' | 'delete')[];
  resourceNames?: string[];
}

// K8sDeployer main class
class K8sDeployer {
  private kubeConfig: KubeConfig;
  private coreApi: CoreV1Api;
  private appsApi: AppsV1Api;
  private rbacApi: RbacAuthorizationV1Api;
  private networkingApi: NetworkingV1Api;
  private autoscalingApi: AutoscalingV2Api;

  constructor(config: K8sDeployerConfig) {
    this.kubeConfig = new KubeConfig();
    this.kubeConfig.loadFromDefault();
    this.initializeApis();
  }

  // Namespace operations
  async createNamespace(config: NamespaceConfig): Promise<V1Namespace> {
    const namespace: V1Namespace = {
      apiVersion: 'v1',
      kind: 'Namespace',
      metadata: {
        name: config.name,
        labels: {
          'app.kubernetes.io/name': 'elsabro',
          'app.kubernetes.io/managed-by': 'elsabro-deployer',
          ...config.labels
        },
        annotations: config.annotations
      }
    };

    const result = await this.coreApi.createNamespace(namespace);

    // Apply resource quota if specified
    if (config.resourceQuota) {
      await this.createResourceQuota(config.name, config.resourceQuota);
    }

    // Apply limit range if specified
    if (config.limitRange) {
      await this.createLimitRange(config.name, config.limitRange);
    }

    // Apply network policy if specified
    if (config.networkPolicy) {
      await this.createNetworkPolicy(config.name, config.networkPolicy);
    }

    return result.body;
  }

  // Deployment operations
  async deploy(spec: DeploymentSpec): Promise<DeploymentResult> {
    const deployment = this.buildDeploymentManifest(spec);

    try {
      // Check if deployment exists (the read rejects with a 404 if it does not)
      await this.appsApi.readNamespacedDeployment(
        spec.name,
        spec.namespace
      );

      // Update existing deployment
      const result = await this.appsApi.replaceNamespacedDeployment(
        spec.name,
        spec.namespace,
        deployment
      );

      return {
        action: 'updated',
        deployment: result.body,
        timestamp: new Date().toISOString()
      };
    } catch (e) {
      // The caught value is untyped, so narrow it before reading statusCode
      const statusCode = (e as { statusCode?: number }).statusCode;
      if (statusCode === 404) {
        // Create new deployment
        const result = await this.appsApi.createNamespacedDeployment(
          spec.namespace,
          deployment
        );

        return {
          action: 'created',
          deployment: result.body,
          timestamp: new Date().toISOString()
        };
      }
      throw e;
    }
  }

  // ConfigMap operations
  async createConfigMap(spec: ConfigMapSpec): Promise<V1ConfigMap> {
    const configMap: V1ConfigMap = {
      apiVersion: 'v1',
      kind: 'ConfigMap',
      metadata: {
        name: spec.name,
        namespace: spec.namespace,
        labels: {
          'app.kubernetes.io/name': 'elsabro',
          'app.kubernetes.io/component': 'config'
        }
      },
      data: spec.data,
      binaryData: spec.binaryData,
      immutable: spec.immutable
    };

    const result = await this.coreApi.createNamespacedConfigMap(
      spec.namespace,
      configMap
    );
    return result.body;
  }

  // Secret operations
  async createSecret(spec: SecretSpec): Promise<V1Secret> {
    const secret: V1Secret = {
      apiVersion: 'v1',
      kind: 'Secret',
      metadata: {
        name: spec.name,
        namespace: spec.namespace,
        labels: {
          'app.kubernetes.io/name': 'elsabro',
          'app.kubernetes.io/component': 'secret'
        }
      },
      type: spec.type,
      data: spec.data ? this.encodeSecretData(spec.data) : undefined,
      stringData: spec.stringData
    };

    const result = await this.coreApi.createNamespacedSecret(
      spec.namespace,
      secret
    );
    return result.body;
  }

  // ServiceAccount and RBAC operations
  async setupRBAC(
    namespace: string,
    serviceAccountSpec: ServiceAccountSpec,
    roleSpec: RoleSpec,
    roleBindingName: string
  ): Promise<RBACSetupResult> {
    // Create ServiceAccount
    const sa = await this.createServiceAccount(serviceAccountSpec);

    // Create Role
    const role = await this.createRole(roleSpec);

    // Create RoleBinding
    const binding = await this.createRoleBinding({
      name: roleBindingName,
      namespace,
      roleRef: {
        apiGroup: 'rbac.authorization.k8s.io',
        kind: 'Role',
        name: roleSpec.name
      },
      subjects: [{
        kind: 'ServiceAccount',
        name: serviceAccountSpec.name,
        namespace
      }]
    });

    return { serviceAccount: sa, role, roleBinding: binding };
  }

  // Rollout operations
  async rollout(
    namespace: string,
    deploymentName: string,
    strategy: RolloutStrategy
  ): Promise<RolloutResult> {
    const deployment = await this.appsApi.readNamespacedDeployment(
      deploymentName,
      namespace
    );

    switch (strategy.type) {
      case 'restart':
        return this.restartDeployment(namespace, deploymentName);
      case 'scale':
        return this.scaleDeployment(namespace, deploymentName, strategy.replicas);
      case 'canary':
        return this.canaryDeployment(namespace, deployment.body, strategy);
      case 'bluegreen':
        return this.blueGreenDeployment(namespace, deployment.body, strategy);
      default:
        throw new Error(`Unknown rollout strategy: ${strategy.type}`);
    }
  }

  // Status and monitoring
  async getDeploymentStatus(
    namespace: string,
    deploymentName: string
  ): Promise<DeploymentStatus> {
    const deployment = await this.appsApi.readNamespacedDeployment(
      deploymentName,
      namespace
    );

    const pods = await this.coreApi.listNamespacedPod(
      namespace,
      undefined,
      undefined,
      undefined,
      undefined,
      `app.kubernetes.io/name=elsabro`
    );

    return {
      name: deploymentName,
      namespace,
      replicas: {
        desired: deployment.body.spec?.replicas || 0,
        ready: deployment.body.status?.readyReplicas || 0,
        available: deployment.body.status?.availableReplicas || 0,
        unavailable: deployment.body.status?.unavailableReplicas || 0
      },
      conditions: deployment.body.status?.conditions || [],
      pods: pods.body.items.map(pod => ({
        name: pod.metadata?.name,
        phase: pod.status?.phase,
        ready: pod.status?.conditions?.find(c => c.type === 'Ready')?.status === 'True',
        restarts: pod.status?.containerStatuses?.[0]?.restartCount || 0
      }))
    };
  }

  // Helper methods
  private buildDeploymentManifest(spec: DeploymentSpec): V1Deployment {
    return {
      apiVersion: 'apps/v1',
      kind: 'Deployment',
      metadata: {
        name: spec.name,
        namespace: spec.namespace,
        labels: {
          'app.kubernetes.io/name': 'elsabro',
          'app.kubernetes.io/instance': spec.name,
          'app.kubernetes.io/version': spec.imageTag,
          'app.kubernetes.io/component': 'api',
          'app.kubernetes.io/managed-by': 'elsabro-deployer'
        }
      },
      spec: {
        replicas: spec.replicas,
        selector: {
          matchLabels: {
            'app.kubernetes.io/name': 'elsabro',
            'app.kubernetes.io/instance': spec.name
          }
        },
        strategy: {
          type: 'RollingUpdate',
          rollingUpdate: {
            maxSurge: '25%',
            maxUnavailable: '25%'
          }
        },
        template: {
          metadata: {
            labels: {
              'app.kubernetes.io/name': 'elsabro',
              'app.kubernetes.io/instance': spec.name,
              'app.kubernetes.io/version': spec.imageTag
            },
            annotations: {
              'prometheus.io/scrape': 'true',
              'prometheus.io/port': '9090',
              'prometheus.io/path': '/metrics'
            }
          },
          spec: {
            serviceAccountName: spec.serviceAccountName,
            securityContext: spec.securityContext,
            containers: [{
              name: 'elsabro',
              image: `${spec.image}:${spec.imageTag}`,
              imagePullPolicy: spec.imagePullPolicy,
              ports: spec.ports,
              env: spec.env,
              envFrom: spec.envFrom,
              resources: spec.resources,
              volumeMounts: spec.volumeMounts,
              livenessProbe: {
                httpGet: { path: '/health/live', port: 8080 },
                initialDelaySeconds: 15,
                periodSeconds: 20,
                timeoutSeconds: 5,
                failureThreshold: 3
              },
              readinessProbe: {
                httpGet: { path: '/health/ready', port: 8080 },
                initialDelaySeconds: 5,
                periodSeconds: 10,
                timeoutSeconds: 3,
                failureThreshold: 3
              },
              startupProbe: {
                httpGet: { path: '/health/startup', port: 8080 },
                initialDelaySeconds: 10,
                periodSeconds: 5,
                timeoutSeconds: 3,
                failureThreshold: 30
              }
            }],
            volumes: spec.volumes,
            nodeSelector: spec.nodeSelector,
            tolerations: spec.tolerations,
            affinity: spec.affinity
          }
        }
      }
    };
  }
}
```
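
`createSecret` calls `this.encodeSecretData`, which this excerpt never defines. The Kubernetes API expects every value under a Secret's `data` field to be base64-encoded, while `stringData` takes plain text. A minimal sketch of what such a helper might look like, assuming `spec.data` carries plain-text values and the code runs on Node.js (the `decodeSecretData` inverse is a hypothetical addition for checking a round trip):

```typescript
// Hypothetical sketch of the encodeSecretData helper referenced by createSecret.
// Assumption: callers pass plain-text values that still need base64 encoding.
function encodeSecretData(data: Record<string, string>): Record<string, string> {
  const encoded: Record<string, string> = {};
  for (const [key, value] of Object.entries(data)) {
    // Values under a Secret's `data` field must be base64 strings.
    encoded[key] = Buffer.from(value, 'utf8').toString('base64');
  }
  return encoded;
}

// Hypothetical inverse, useful for verifying what was written.
function decodeSecretData(data: Record<string, string>): Record<string, string> {
  const decoded: Record<string, string> = {};
  for (const [key, value] of Object.entries(data)) {
    decoded[key] = Buffer.from(value, 'base64').toString('utf8');
  }
  return decoded;
}

const example = encodeSecretData({ JWT_SECRET: 'hello' });
console.log(example.JWT_SECRET); // aGVsbG8=
```

If callers already hold base64 strings, the helper would double-encode them; in that case passing values through `stringData` instead avoids the ambiguity.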

---

## 2. HelmChartGenerator

### Helm Chart Structure

```
elsabro-chart/
+-- Chart.yaml
+-- values.yaml
+-- values-dev.yaml
+-- values-staging.yaml
+-- values-prod.yaml
+-- templates/
|   +-- _helpers.tpl
|   +-- deployment.yaml
|   +-- service.yaml
|   +-- ingress.yaml
|   +-- hpa.yaml
|   +-- vpa.yaml
|   +-- configmap.yaml
|   +-- secret.yaml
|   +-- serviceaccount.yaml
|   +-- role.yaml
|   +-- rolebinding.yaml
|   +-- networkpolicy.yaml
|   +-- pdb.yaml
|   +-- servicemonitor.yaml
+-- charts/
|   +-- redis/
|   +-- postgresql/
|   +-- rabbitmq/
+-- .helmignore
```
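
The chart ships one default `values.yaml` plus per-environment files. When several values files are passed to Helm, later files override earlier ones: nested maps are merged key by key, while lists and scalars are replaced outright. The sketch below illustrates that layering in TypeScript. It is a simplification that does not model every Helm rule, and the `prodOverrides` contents are hypothetical since `values-prod.yaml` is not shown here.

```typescript
type Values = { [key: string]: unknown };

// True for plain maps, false for arrays, null, and scalars.
function isPlainObject(value: unknown): value is Values {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Merge `override` on top of `base`: maps merge recursively,
// everything else (scalars, lists) is replaced by the override.
function mergeValues(base: Values, override: Values): Values {
  const result: Values = { ...base };
  for (const [key, value] of Object.entries(override)) {
    const current = result[key];
    result[key] =
      isPlainObject(current) && isPlainObject(value)
        ? mergeValues(current, value)
        : value;
  }
  return result;
}

// Defaults taken from values.yaml; the overrides are illustrative only.
const defaults: Values = {
  replicaCount: 2,
  resources: { requests: { cpu: '500m', memory: '512Mi' } },
  tolerations: []
};
const prodOverrides: Values = {
  replicaCount: 5,
  resources: { requests: { cpu: '1000m' } }
};

const merged = mergeValues(defaults, prodOverrides);
console.log(JSON.stringify(merged, null, 2));
```

Here `replicaCount` and `resources.requests.cpu` come from the override, while `resources.requests.memory` survives from the defaults.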

### Chart.yaml

```yaml
apiVersion: v2
name: elsabro
description: ELSABRO - AI-Powered Development Workflow System
type: application
version: 3.6.0
appVersion: "3.6.0"
kubeVersion: ">=1.25.0-0"

keywords:
  - ai
  - agents
  - development
  - workflow
  - automation

home: https://github.com/cubait/elsabro
sources:
  - https://github.com/cubait/elsabro

maintainers:
  - name: cubait
    email: support@cubait.com

dependencies:
  - name: redis
    version: "18.x.x"
    repository: "https://charts.bitnami.com/bitnami"
    condition: redis.enabled
  - name: postgresql
    version: "14.x.x"
    repository: "https://charts.bitnami.com/bitnami"
    condition: postgresql.enabled
  - name: rabbitmq
    version: "12.x.x"
    repository: "https://charts.bitnami.com/bitnami"
    condition: rabbitmq.enabled

annotations:
  artifacthub.io/license: MIT
  artifacthub.io/links: |
    - name: Documentation
      url: https://docs.elsabro.dev
    - name: Support
      url: https://github.com/cubait/elsabro/issues
```
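
Each dependency above is pinned with an x-range such as `"18.x.x"`, which accepts any minor and patch release within that major version. Helm resolves these ranges itself; the sketch below only illustrates what the constraint means. `matchesXRange` is a hypothetical helper that handles plain `MAJOR.MINOR.PATCH` versions and ignores pre-release tags.

```typescript
// Hypothetical helper: does a MAJOR.MINOR.PATCH version satisfy an x-range?
// Assumption: both inputs have exactly three dot-separated parts.
function matchesXRange(version: string, range: string): boolean {
  const versionParts = version.split('.');
  const rangeParts = range.split('.');
  if (versionParts.length !== 3 || rangeParts.length !== 3) {
    return false;
  }
  return rangeParts.every((part, i) => {
    // A wildcard position accepts any value.
    if (part === 'x' || part === 'X' || part === '*') {
      return true;
    }
    // Otherwise the numeric parts must be equal.
    return Number(part) === Number(versionParts[i]);
  });
}

console.log(matchesXRange('18.6.1', '18.x.x')); // true
console.log(matchesXRange('19.0.0', '18.x.x')); // false
```

Pinning only the major version lets patch and minor updates of the subcharts flow in on `helm dependency update`, at the cost of less reproducible builds unless a lock file is committed.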

### values.yaml (Default)

```yaml
|
|
598
|
+
# ELSABRO Helm Chart Values
|
|
599
|
+
# Default configuration for all environments
|
|
600
|
+
|
|
601
|
+
global:
|
|
602
|
+
imageRegistry: ""
|
|
603
|
+
imagePullSecrets: []
|
|
604
|
+
storageClass: ""
|
|
605
|
+
|
|
606
|
+
# Number of replicas
|
|
607
|
+
replicaCount: 2
|
|
608
|
+
|
|
609
|
+
# Image configuration
|
|
610
|
+
image:
|
|
611
|
+
registry: ghcr.io
|
|
612
|
+
repository: cubait/elsabro
|
|
613
|
+
tag: "3.6.0"
|
|
614
|
+
pullPolicy: IfNotPresent
|
|
615
|
+
|
|
616
|
+
# Service Account
|
|
617
|
+
serviceAccount:
|
|
618
|
+
create: true
|
|
619
|
+
annotations: {}
|
|
620
|
+
name: "elsabro-sa"
|
|
621
|
+
automountServiceAccountToken: true
|
|
622
|
+
|
|
623
|
+
# Pod annotations
|
|
624
|
+
podAnnotations:
|
|
625
|
+
prometheus.io/scrape: "true"
|
|
626
|
+
prometheus.io/port: "9090"
|
|
627
|
+
prometheus.io/path: "/metrics"
|
|
628
|
+
|
|
629
|
+
# Pod security context
|
|
630
|
+
podSecurityContext:
|
|
631
|
+
runAsNonRoot: true
|
|
632
|
+
runAsUser: 1000
|
|
633
|
+
runAsGroup: 1000
|
|
634
|
+
fsGroup: 1000
|
|
635
|
+
|
|
636
|
+
# Container security context
|
|
637
|
+
securityContext:
|
|
638
|
+
allowPrivilegeEscalation: false
|
|
639
|
+
readOnlyRootFilesystem: true
|
|
640
|
+
runAsNonRoot: true
|
|
641
|
+
runAsUser: 1000
|
|
642
|
+
capabilities:
|
|
643
|
+
drop:
|
|
644
|
+
- ALL
|
|
645
|
+
|
|
646
|
+
# Service configuration
|
|
647
|
+
service:
|
|
648
|
+
type: ClusterIP
|
|
649
|
+
port: 8080
|
|
650
|
+
metricsPort: 9090
|
|
651
|
+
annotations: {}
|
|
652
|
+
|
|
653
|
+
# Ingress configuration
|
|
654
|
+
ingress:
|
|
655
|
+
enabled: false
|
|
656
|
+
className: "nginx"
|
|
657
|
+
annotations:
|
|
658
|
+
nginx.ingress.kubernetes.io/ssl-redirect: "true"
|
|
659
|
+
nginx.ingress.kubernetes.io/proxy-body-size: "50m"
|
|
660
|
+
nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
|
|
661
|
+
nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
|
|
662
|
+
cert-manager.io/cluster-issuer: "letsencrypt-prod"
|
|
663
|
+
hosts:
|
|
664
|
+
- host: elsabro.local
|
|
665
|
+
paths:
|
|
666
|
+
- path: /
|
|
667
|
+
pathType: Prefix
|
|
668
|
+
tls:
|
|
669
|
+
- secretName: elsabro-tls
|
|
670
|
+
hosts:
|
|
671
|
+
- elsabro.local
|
|
672
|
+
|
|
673
|
+
# Resource limits
|
|
674
|
+
resources:
|
|
675
|
+
requests:
|
|
676
|
+
cpu: 500m
|
|
677
|
+
memory: 512Mi
|
|
678
|
+
limits:
|
|
679
|
+
cpu: 2000m
|
|
680
|
+
memory: 2Gi
|
|
681
|
+
|
|
682
|
+
# Horizontal Pod Autoscaler
|
|
683
|
+
autoscaling:
|
|
684
|
+
enabled: true
|
|
685
|
+
minReplicas: 2
|
|
686
|
+
maxReplicas: 10
|
|
687
|
+
targetCPUUtilizationPercentage: 70
|
|
688
|
+
targetMemoryUtilizationPercentage: 80
|
|
689
|
+
behavior:
|
|
690
|
+
scaleDown:
|
|
691
|
+
stabilizationWindowSeconds: 300
|
|
692
|
+
policies:
|
|
693
|
+
- type: Percent
|
|
694
|
+
value: 10
|
|
695
|
+
periodSeconds: 60
|
|
696
|
+
scaleUp:
|
|
697
|
+
stabilizationWindowSeconds: 0
|
|
698
|
+
policies:
|
|
699
|
+
- type: Percent
|
|
700
|
+
value: 100
|
|
701
|
+
periodSeconds: 15
|
|
702
|
+
- type: Pods
|
|
703
|
+
value: 4
|
|
704
|
+
periodSeconds: 15
|
|
705
|
+
selectPolicy: Max
|
|
706
|
+
customMetrics:
|
|
707
|
+
- type: Pods
|
|
708
|
+
pods:
|
|
709
|
+
metric:
|
|
710
|
+
name: elsabro_queue_length
|
|
711
|
+
target:
|
|
712
|
+
type: AverageValue
|
|
713
|
+
averageValue: 100
|
|
714
|
+
- type: Pods
|
|
715
|
+
pods:
|
|
716
|
+
metric:
|
|
717
|
+
name: elsabro_active_agents
|
|
718
|
+
target:
|
|
719
|
+
type: AverageValue
|
|
720
|
+
averageValue: 5
|
|
721
|
+
|
|
722
|
+
# Vertical Pod Autoscaler
|
|
723
|
+
vpa:
|
|
724
|
+
enabled: false
|
|
725
|
+
updateMode: "Auto"
|
|
726
|
+
resourcePolicy:
|
|
727
|
+
containerPolicies:
|
|
728
|
+
- containerName: elsabro
|
|
729
|
+
minAllowed:
|
|
730
|
+
cpu: 250m
|
|
731
|
+
memory: 256Mi
|
|
732
|
+
maxAllowed:
|
|
733
|
+
cpu: 4000m
|
|
734
|
+
memory: 8Gi
|
|
735
|
+
controlledResources:
|
|
736
|
+
- cpu
|
|
737
|
+
- memory
|
|
738
|
+
|
|
739
|
+
# Pod Disruption Budget
|
|
740
|
+
podDisruptionBudget:
|
|
741
|
+
enabled: true
|
|
742
|
+
minAvailable: 1
|
|
743
|
+
  # maxUnavailable: 1

# Node selector
nodeSelector: {}

# Tolerations
tolerations: []

# Affinity rules
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                  - elsabro
          topologyKey: kubernetes.io/hostname

# Topology spread constraints
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app.kubernetes.io/name: elsabro

# Probes configuration
probes:
  liveness:
    enabled: true
    path: /health/live
    initialDelaySeconds: 15
    periodSeconds: 20
    timeoutSeconds: 5
    failureThreshold: 3
  readiness:
    enabled: true
    path: /health/ready
    initialDelaySeconds: 5
    periodSeconds: 10
    timeoutSeconds: 3
    failureThreshold: 3
  startup:
    enabled: true
    path: /health/startup
    initialDelaySeconds: 10
    periodSeconds: 5
    timeoutSeconds: 3
    failureThreshold: 30

# ConfigMap data
config:
  LOG_LEVEL: "info"
  LOG_FORMAT: "json"
  TELEMETRY_ENABLED: "true"
  METRICS_ENABLED: "true"
  TRACING_ENABLED: "true"
  CACHE_TTL: "3600"
  SESSION_TTL: "86400"
  MAX_CONCURRENT_AGENTS: "10"
  MAX_TOKENS_PER_REQUEST: "100000"
  DEFAULT_MODEL: "claude-opus-4-5-20251101"

# Secrets (use external secrets in production)
secrets:
  create: true
  annotations: {}
  # Anthropic API key
  ANTHROPIC_API_KEY: ""
  # Database connection string
  DATABASE_URL: ""
  # Redis connection string
  REDIS_URL: ""
  # RabbitMQ connection string
  RABBITMQ_URL: ""
  # JWT secret
  JWT_SECRET: ""

# External Secrets Operator integration
externalSecrets:
  enabled: false
  secretStoreRef:
    name: vault-backend
    kind: SecretStore
  target:
    name: elsabro-secrets
  data: []

# Network Policy
networkPolicy:
  enabled: true
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: ingress-nginx
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: prometheus
      ports:
        - port: 8080
          protocol: TCP
        - port: 9090
          protocol: TCP
  egress:
    - to:
        - namespaceSelector: {}
      ports:
        - port: 443
          protocol: TCP
        - port: 5432
          protocol: TCP
        - port: 6379
          protocol: TCP
        - port: 5672
          protocol: TCP

# Service Monitor for Prometheus
serviceMonitor:
  enabled: true
  namespace: ""
  interval: 30s
  scrapeTimeout: 10s
  labels: {}
  relabelings: []
  metricRelabelings: []

# Prometheus Rules
prometheusRule:
  enabled: true
  namespace: ""
  labels: {}
  rules:
    - alert: ElsabroHighErrorRate
      expr: |
        sum(rate(elsabro_requests_total{status=~"5.."}[5m]))
        / sum(rate(elsabro_requests_total[5m])) > 0.05
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "High error rate detected"
        description: "Error rate is {{ $value | humanizePercentage }} (threshold: 5%)"
    - alert: ElsabroHighLatency
      expr: |
        histogram_quantile(0.95, sum(rate(elsabro_request_duration_seconds_bucket[5m]))
        by (le)) > 2
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "High latency detected"
        description: "P95 latency is {{ $value | humanizeDuration }} (threshold: 2s)"
    - alert: ElsabroPodNotReady
      expr: |
        kube_pod_status_ready{namespace="elsabro", condition="true"} == 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Pod not ready"
        description: "Pod {{ $labels.pod }} has been not ready for 5 minutes"

# Redis subchart configuration
redis:
  enabled: true
  architecture: standalone
  auth:
    enabled: true
    password: ""
  master:
    persistence:
      enabled: true
      size: 8Gi
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 500m
        memory: 512Mi

# PostgreSQL subchart configuration
postgresql:
  enabled: true
  auth:
    username: elsabro
    password: ""
    database: elsabro
  primary:
    persistence:
      enabled: true
      size: 20Gi
    resources:
      requests:
        cpu: 250m
        memory: 256Mi
      limits:
        cpu: 1000m
        memory: 1Gi

# RabbitMQ subchart configuration
rabbitmq:
  enabled: true
  auth:
    username: elsabro
    password: ""
  persistence:
    enabled: true
    size: 8Gi
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 512Mi
```
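
The probe settings in the values file translate into concrete time budgets. The sketch below works through that arithmetic for the `startup` and `liveness` values shown above. The `ProbeTiming` type and `probeBudgetSeconds` helper are illustrative names, not part of ELSABRO, and the calculation ignores `timeoutSeconds`, so treat the results as approximations.

```typescript
// Probe timing fields as they appear under `probes.*` in values.yaml.
interface ProbeTiming {
  initialDelaySeconds: number;
  periodSeconds: number;
  failureThreshold: number;
}

// Approximate worst-case time before the kubelet gives up on a probe:
// the initial delay plus one period per allowed failure.
function probeBudgetSeconds(p: ProbeTiming): number {
  return p.initialDelaySeconds + p.periodSeconds * p.failureThreshold;
}

const startup: ProbeTiming = { initialDelaySeconds: 10, periodSeconds: 5, failureThreshold: 30 };
const liveness: ProbeTiming = { initialDelaySeconds: 15, periodSeconds: 20, failureThreshold: 3 };

// Time the container has to finish starting: 10 + 5 * 30 = 160 seconds.
const startupBudget = probeBudgetSeconds(startup);

// Once running, a hung container is detected after about 20 * 3 = 60 seconds.
const livenessDetection = liveness.periodSeconds * liveness.failureThreshold;

console.log(`startup budget: ${startupBudget}s, liveness detection: ${livenessDetection}s`);
```

With these defaults, a slow-starting pod gets roughly 160 seconds before it is restarted, which is why the startup probe carries the high `failureThreshold` rather than the liveness probe.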

### values-prod.yaml (Production Overrides)

```yaml
# Production-specific values

replicaCount: 5

image:
  pullPolicy: Always

resources:
  requests:
    cpu: 1000m
    memory: 2Gi
  limits:
    cpu: 4000m
    memory: 8Gi

autoscaling:
  enabled: true
  minReplicas: 5
  maxReplicas: 50
  targetCPUUtilizationPercentage: 60
  targetMemoryUtilizationPercentage: 70

# Note: VPA in "Auto" mode should not manage the same CPU/memory
# resources that the HPA above scales on; the two controllers conflict.
vpa:
  enabled: true
  updateMode: "Auto"

podDisruptionBudget:
  enabled: true
  minAvailable: 3

ingress:
  enabled: true
  className: "nginx"
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"
    # ingress-nginx rate limiting: 100 requests per minute per client IP
    nginx.ingress.kubernetes.io/limit-rpm: "100"
  hosts:
    - host: api.elsabro.io
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: elsabro-prod-tls
      hosts:
        - api.elsabro.io

config:
  LOG_LEVEL: "warn"
  MAX_CONCURRENT_AGENTS: "50"
  MAX_TOKENS_PER_REQUEST: "200000"

externalSecrets:
  enabled: true
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore

redis:
  architecture: replication
  replica:
    replicaCount: 3
  master:
    persistence:
      size: 32Gi
    resources:
      requests:
        cpu: 500m
        memory: 1Gi
      limits:
        cpu: 2000m
        memory: 4Gi

postgresql:
  architecture: replication
  readReplicas:
    replicaCount: 2
  primary:
    persistence:
      size: 100Gi
    resources:
      requests:
        cpu: 1000m
        memory: 2Gi
      limits:
        cpu: 4000m
        memory: 8Gi

rabbitmq:
  replicaCount: 3
  clustering:
    enabled: true
  persistence:
    size: 32Gi
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: 2000m
      memory: 4Gi
```
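
The production overrides pair `autoscaling.minReplicas: 5` with `podDisruptionBudget.minAvailable: 3`. A PodDisruptionBudget with `minAvailable` permits voluntary evictions only while the healthy pod count stays at or above that floor. The sketch below works out the resulting headroom. The helper name `allowedDisruptions` is an assumption for illustration only.

```typescript
// Number of pods that may be voluntarily evicted (for example during a node
// drain) while still satisfying a PodDisruptionBudget with `minAvailable`.
function allowedDisruptions(healthyPods: number, minAvailable: number): number {
  return Math.max(0, healthyPods - minAvailable);
}

// Values taken from values-prod.yaml above.
const minAvailable = 3;
const atMinScale = allowedDisruptions(5, minAvailable);  // HPA floor: 5 replicas
const atMaxScale = allowedDisruptions(50, minAvailable); // HPA ceiling: 50 replicas
const degraded = allowedDisruptions(2, minAvailable);    // fewer healthy pods than the floor

console.log({ atMinScale, atMaxScale, degraded });
```

At the autoscaler's floor, two pods can be drained at once. If fewer than three pods are healthy, evictions are blocked until the deployment recovers.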

### HelmChartGenerator TypeScript Interface

```typescript
import * as path from 'path';
import { execFile } from 'child_process';
import { promisify } from 'util';

// Promise-based execFile: resolves with { stdout, stderr }, rejects on non-zero exit
const execFileAsync = promisify(execFile);

/**
 * HelmChartGenerator - Generates Helm charts for ELSABRO deployments
 */

interface HelmChartConfig {
  name: string;
  version: string;
  appVersion: string;
  description: string;
  dependencies: ChartDependency[];
}

interface ChartDependency {
  name: string;
  version: string;
  repository: string;
  condition: string;
}

interface ValuesConfig {
  environment: 'dev' | 'staging' | 'prod';
  replicas: number;
  image: ImageConfig;
  resources: ResourceConfig;
  autoscaling: AutoscalingConfig;
  ingress: IngressConfig;
  config: Record<string, string>;
  secrets: Record<string, string>;
  redis: RedisConfig;
  postgresql: PostgreSQLConfig;
  rabbitmq: RabbitMQConfig;
}

class HelmChartGenerator {
  private chartDir: string;

  constructor(outputDir: string) {
    this.chartDir = path.join(outputDir, 'elsabro-chart');
  }

  // Generate complete Helm chart
  async generate(config: HelmChartConfig): Promise<void> {
    await this.createDirectoryStructure();
    await this.generateChartYaml(config);
    await this.generateValuesFiles();
    await this.generateTemplates();
    await this.generateHelpers();
  }

  // Generate Chart.yaml
  private async generateChartYaml(config: HelmChartConfig): Promise<void> {
    const chartYaml = {
      apiVersion: 'v2',
      name: config.name,
      description: config.description,
      type: 'application',
      version: config.version,
      appVersion: config.appVersion,
      kubeVersion: '>=1.25.0-0',
      dependencies: config.dependencies
    };

    await this.writeYaml(
      path.join(this.chartDir, 'Chart.yaml'),
      chartYaml
    );
  }

  // Generate environment-specific values
  async generateValuesForEnvironment(
    environment: 'dev' | 'staging' | 'prod',
    customValues?: Partial<ValuesConfig>
  ): Promise<string> {
    const baseValues = this.getBaseValues();
    const envOverrides = this.getEnvironmentOverrides(environment);

    // Later arguments take precedence: base < environment < custom
    const values = deepMerge(baseValues, envOverrides, customValues || {});

    const outputPath = path.join(
      this.chartDir,
      `values-${environment}.yaml`
    );

    await this.writeYaml(outputPath, values);
    return outputPath;
  }

  // Generate Kubernetes manifests from templates
  async template(
    releaseName: string,
    namespace: string,
    valuesFile: string
  ): Promise<string> {
    // Use execFile for safe command execution (no shell interpolation)
    const { stdout } = await execFileAsync('helm', [
      'template',
      releaseName,
      this.chartDir,
      '--namespace', namespace,
      '--values', valuesFile
    ]);
    return stdout;
  }

  // Validate chart
  async lint(): Promise<LintResult> {
    try {
      // Resolves only when `helm lint` exits with code 0
      const { stdout } = await execFileAsync('helm', ['lint', this.chartDir]);
      return { success: true, output: stdout, errors: [] };
    } catch (error) {
      // On a non-zero exit the promise rejects; stdout/stderr are attached to the error
      const { stdout = '', stderr = '' } = error as { stdout?: string; stderr?: string };
      return {
        success: false,
        output: stdout,
        errors: this.parseLintErrors(stderr)
      };
    }
  }

  // Package chart
  async package(destination: string): Promise<string> {
    const { stdout } = await execFileAsync('helm', [
      'package',
      this.chartDir,
      '--destination', destination
    ]);
    const match = stdout.match(/Successfully packaged chart and saved it to: (.+)/);
    return match ? match[1].trim() : '';
  }

  // Push to registry
  async push(packagePath: string, registry: string): Promise<void> {
    await execFileAsync('helm', ['push', packagePath, registry]);
  }

  private getEnvironmentOverrides(env: 'dev' | 'staging' | 'prod'): Partial<ValuesConfig> {
    const overrides: Record<string, Partial<ValuesConfig>> = {
      dev: {
        replicas: 1,
        resources: {
          requests: { cpu: '250m', memory: '256Mi' },
          limits: { cpu: '1000m', memory: '1Gi' }
        },
        autoscaling: { enabled: false },
        ingress: { enabled: false }
      },
      staging: {
        replicas: 2,
        resources: {
          requests: { cpu: '500m', memory: '512Mi' },
          limits: { cpu: '2000m', memory: '2Gi' }
        },
        autoscaling: { enabled: true, minReplicas: 2, maxReplicas: 5 }
      },
      prod: {
        replicas: 5,
        resources: {
          requests: { cpu: '1000m', memory: '2Gi' },
          limits: { cpu: '4000m', memory: '8Gi' }
        },
        autoscaling: { enabled: true, minReplicas: 5, maxReplicas: 50 }
      }
    };

    return overrides[env];
  }
}
```
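
`generateValuesForEnvironment` relies on a `deepMerge` helper that the listing does not define. One possible shape for it is sketched below, under the assumption that it should behave like Helm's own layering of values files: nested maps merge key by key, while scalars and arrays from later sources replace earlier ones. The `Values` type and the sample data are illustrative, not ELSABRO's actual implementation.

```typescript
type Values = { [key: string]: unknown };

function isPlainObject(v: unknown): v is Values {
  return typeof v === 'object' && v !== null && !Array.isArray(v);
}

// Later sources win. Nested maps are merged recursively (and copied, so the
// inputs are never mutated); arrays and scalars are replaced whole.
function deepMerge(...sources: Values[]): Values {
  const result: Values = {};
  for (const source of sources) {
    for (const key of Object.keys(source)) {
      const existing = result[key];
      const value = source[key];
      result[key] = isPlainObject(value)
        ? deepMerge(isPlainObject(existing) ? existing : {}, value)
        : value;
    }
  }
  return result;
}

// Sample data modeled on the redis settings in values.yaml and values-prod.yaml.
const baseValues: Values = {
  redis: { architecture: 'standalone', master: { persistence: { enabled: true, size: '8Gi' } } }
};
const prodOverrides: Values = {
  redis: { architecture: 'replication', master: { persistence: { size: '32Gi' } } }
};

const merged = deepMerge(baseValues, prodOverrides);
// persistence.enabled survives from the base; size and architecture come from prod.
console.log(JSON.stringify(merged, null, 2));
```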

---

## 3. ResourceScaler

### TypeScript Interfaces

```typescript
/**
 * ResourceScaler - Manages HPA, VPA, and custom metrics scaling
 */

// HPA Configuration
interface HPAConfig {
  name: string;
  namespace: string;
  targetRef: {
    apiVersion: string;
    kind: string;
    name: string;
  };
  minReplicas: number;
  maxReplicas: number;
  metrics: HPAMetric[];
  behavior?: HPABehavior;
}

interface HPAMetric {
  type: 'Resource' | 'Pods' | 'Object' | 'External';
  resource?: {
    name: 'cpu' | 'memory';
    target: {
      type: 'Utilization' | 'AverageValue';
      averageUtilization?: number;
      averageValue?: string;
    };
  };
  pods?: {
    metric: { name: string; selector?: LabelSelector };
    target: { type: 'AverageValue'; averageValue: string };
  };
  object?: {
    describedObject: { apiVersion: string; kind: string; name: string };
    metric: { name: string };
    target: { type: 'Value' | 'AverageValue'; value?: string; averageValue?: string };
  };
  external?: {
    metric: { name: string; selector?: LabelSelector };
    target: { type: 'Value' | 'AverageValue'; value?: string; averageValue?: string };
  };
}

interface HPABehavior {
  scaleDown?: ScalingPolicy;
  scaleUp?: ScalingPolicy;
}

interface ScalingPolicy {
  stabilizationWindowSeconds?: number;
  selectPolicy?: 'Max' | 'Min' | 'Disabled';
  policies?: {
    type: 'Pods' | 'Percent';
    value: number;
    periodSeconds: number;
  }[];
}

// VPA Configuration
interface VPAConfig {
  name: string;
  namespace: string;
  targetRef: {
    apiVersion: string;
    kind: string;
    name: string;
  };
  updatePolicy: {
    updateMode: 'Off' | 'Initial' | 'Recreate' | 'Auto';
    minReplicas?: number;
  };
  resourcePolicy?: {
    containerPolicies: ContainerResourcePolicy[];
  };
}

interface ContainerResourcePolicy {
  containerName: string;
  mode?: 'Auto' | 'Off';
  minAllowed?: { cpu?: string; memory?: string };
  maxAllowed?: { cpu?: string; memory?: string };
  controlledResources?: ('cpu' | 'memory')[];
  controlledValues?: 'RequestsAndLimits' | 'RequestsOnly';
}

// Resource Quota Configuration
interface ResourceQuotaConfig {
  name: string;
  namespace: string;
  hard: {
    'requests.cpu'?: string;
    'requests.memory'?: string;
    'limits.cpu'?: string;
    'limits.memory'?: string;
    'pods'?: string;
    'services'?: string;
    'secrets'?: string;
    'configmaps'?: string;
    'persistentvolumeclaims'?: string;
    'requests.storage'?: string;
  };
  scopeSelector?: {
    matchExpressions: {
      operator: 'In' | 'NotIn' | 'Exists' | 'DoesNotExist';
      scopeName: 'Terminating' | 'NotTerminating' | 'BestEffort' | 'NotBestEffort' | 'PriorityClass';
      values?: string[];
    }[];
  };
}

// Custom Metrics for ELSABRO
interface ElsabroCustomMetrics {
  // Queue length - number of pending tasks
  queueLength: {
    name: 'elsabro_queue_length';
    targetAverageValue: number;
    scaleThreshold: number;
  };
  // Active agents - number of currently running agents
  activeAgents: {
    name: 'elsabro_active_agents';
    targetAverageValue: number;
    maxPerPod: number;
  };
  // Token usage rate - tokens per second
  tokenUsageRate: {
    name: 'elsabro_tokens_per_second';
    targetAverageValue: number;
  };
  // Memory pressure - percentage of allocated memory
  memoryPressure: {
    name: 'elsabro_memory_pressure';
    targetPercentage: number;
  };
}

class ResourceScaler {
  private k8sClient: KubernetesClient;
  private prometheusClient: PrometheusClient;

  constructor(config: ScalerConfig) {
    this.k8sClient = new KubernetesClient(config.kubeConfig);
    this.prometheusClient = new PrometheusClient(config.prometheusUrl);
  }

  // Create or update HPA
  async configureHPA(config: HPAConfig): Promise<V2HorizontalPodAutoscaler> {
    const hpa: V2HorizontalPodAutoscaler = {
      apiVersion: 'autoscaling/v2',
      kind: 'HorizontalPodAutoscaler',
      metadata: {
        name: config.name,
        namespace: config.namespace,
        labels: {
          'app.kubernetes.io/name': 'elsabro',
          'app.kubernetes.io/component': 'autoscaler'
        }
      },
      spec: {
        scaleTargetRef: config.targetRef,
        minReplicas: config.minReplicas,
        maxReplicas: config.maxReplicas,
        metrics: config.metrics,
        behavior: config.behavior
      }
    };

    return this.k8sClient.applyResource(hpa);
  }

  // Create or update VPA
  async configureVPA(config: VPAConfig): Promise<VerticalPodAutoscaler> {
    const vpa: VerticalPodAutoscaler = {
      apiVersion: 'autoscaling.k8s.io/v1',
      kind: 'VerticalPodAutoscaler',
      metadata: {
        name: config.name,
        namespace: config.namespace
      },
      spec: {
        targetRef: config.targetRef,
        updatePolicy: config.updatePolicy,
        resourcePolicy: config.resourcePolicy
      }
    };

    return this.k8sClient.applyResource(vpa);
  }

  // Create resource quota
  async createResourceQuota(config: ResourceQuotaConfig): Promise<V1ResourceQuota> {
    const quota: V1ResourceQuota = {
      apiVersion: 'v1',
      kind: 'ResourceQuota',
      metadata: {
        name: config.name,
        namespace: config.namespace
      },
      spec: {
        hard: config.hard,
        scopeSelector: config.scopeSelector
      }
    };

    return this.k8sClient.applyResource(quota);
  }

  // Configure ELSABRO-specific scaling
  async configureElsabroScaling(
    namespace: string,
    deploymentName: string,
    customMetrics: Partial<ElsabroCustomMetrics>
  ): Promise<ScalingConfiguration> {
    // Build HPA metrics array
    const metrics: HPAMetric[] = [
      // CPU utilization (standard)
      {
        type: 'Resource',
        resource: {
          name: 'cpu',
          target: { type: 'Utilization', averageUtilization: 70 }
        }
      },
      // Memory utilization (standard)
      {
        type: 'Resource',
        resource: {
          name: 'memory',
          target: { type: 'Utilization', averageUtilization: 80 }
        }
      }
    ];

    // Add custom metrics
    if (customMetrics.queueLength) {
      metrics.push({
        type: 'Pods',
        pods: {
          metric: { name: customMetrics.queueLength.name },
          target: {
            type: 'AverageValue',
            averageValue: String(customMetrics.queueLength.targetAverageValue)
          }
        }
      });
    }

    if (customMetrics.activeAgents) {
      metrics.push({
        type: 'Pods',
        pods: {
          metric: { name: customMetrics.activeAgents.name },
          target: {
            type: 'AverageValue',
            averageValue: String(customMetrics.activeAgents.targetAverageValue)
          }
        }
      });
    }

    // Configure HPA
    const hpa = await this.configureHPA({
      name: `${deploymentName}-hpa`,
      namespace,
      targetRef: {
        apiVersion: 'apps/v1',
        kind: 'Deployment',
        name: deploymentName
      },
      minReplicas: 2,
      maxReplicas: 20,
      metrics,
      behavior: {
        scaleDown: {
          stabilizationWindowSeconds: 300,
          policies: [
            { type: 'Percent', value: 10, periodSeconds: 60 }
          ]
        },
        scaleUp: {
          stabilizationWindowSeconds: 0,
          policies: [
            { type: 'Percent', value: 100, periodSeconds: 15 },
            { type: 'Pods', value: 4, periodSeconds: 15 }
          ],
          selectPolicy: 'Max'
        }
      }
    });

    // Configure VPA in recommendation mode: 'Off' only computes recommendations
    // and never evicts pods, so it does not compete with the HPA over CPU/memory
    const vpa = await this.configureVPA({
      name: `${deploymentName}-vpa`,
      namespace,
      targetRef: {
        apiVersion: 'apps/v1',
        kind: 'Deployment',
        name: deploymentName
      },
      updatePolicy: {
        updateMode: 'Off',
        minReplicas: 2
      },
      resourcePolicy: {
        containerPolicies: [{
          containerName: 'elsabro',
          minAllowed: { cpu: '250m', memory: '256Mi' },
          maxAllowed: { cpu: '4000m', memory: '8Gi' },
          controlledResources: ['cpu', 'memory'],
          controlledValues: 'RequestsAndLimits'
        }]
      }
    });

    return { hpa, vpa };
  }

  // Get current scaling metrics
  async getScalingMetrics(
    namespace: string,
    deploymentName: string
  ): Promise<ScalingMetrics> {
    const hpaStatus = await this.k8sClient.getHPAStatus(
      `${deploymentName}-hpa`,
      namespace
    );

    // PromQL has no multi-query object syntax, so each expression is sent separately.
    // Assumes prometheusClient.query() resolves to a single numeric value.
    const ns = `namespace="${namespace}"`;
    const queries: Record<string, string> = {
      queue_length: `avg(elsabro_queue_length{${ns}})`,
      active_agents: `sum(elsabro_active_agents{${ns}})`,
      tokens_per_second: `sum(rate(elsabro_tokens_total{${ns}}[5m]))`,
      // CPU is a cumulative counter, so take its rate (cores used) before averaging
      cpu_usage: `avg(rate(container_cpu_usage_seconds_total{${ns}, container="elsabro"}[5m]))`,
      memory_usage: `avg(container_memory_usage_bytes{${ns}, container="elsabro"})`
    };
    const results = await Promise.all(
      Object.entries(queries).map(async ([key, expr]) =>
        [key, await this.prometheusClient.query(expr)] as const
      )
    );
    const prometheusMetrics = Object.fromEntries(results);

    return {
      currentReplicas: hpaStatus.currentReplicas,
      desiredReplicas: hpaStatus.desiredReplicas,
      metrics: {
        cpu: prometheusMetrics.cpu_usage,
        memory: prometheusMetrics.memory_usage,
        queueLength: prometheusMetrics.queue_length,
        activeAgents: prometheusMetrics.active_agents,
        tokensPerSecond: prometheusMetrics.tokens_per_second
      },
      recommendations: this.generateRecommendations(prometheusMetrics)
    };
  }

  // Manual scaling
  async scale(
    namespace: string,
    deploymentName: string,
    replicas: number
  ): Promise<void> {
    await this.k8sClient.scaleDeployment(namespace, deploymentName, replicas);
  }

  private generateRecommendations(metrics: Record<string, number>): ScalingRecommendation[] {
    const recommendations: ScalingRecommendation[] = [];

    // One replica per 100 queued tasks once the queue exceeds 500
    if (metrics.queue_length > 500) {
      recommendations.push({
        type: 'scale_up',
        reason: 'High queue length',
        suggestedReplicas: Math.ceil(metrics.queue_length / 100)
      });
    }

    // One replica per 5 active agents once more than 40 agents are running
    if (metrics.active_agents > 40) {
      recommendations.push({
        type: 'scale_up',
        reason: 'High agent concurrency',
        suggestedReplicas: Math.ceil(metrics.active_agents / 5)
      });
    }

    return recommendations;
  }
}
```
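
The `scaleUp` behavior configured above combines two policies with `selectPolicy: 'Max'`. In each 15-second period the autoscaler may add either 100% of the current replicas or 4 pods, whichever is larger. The sketch below models a single period under the simplifying assumption that all policies share the same `periodSeconds`. The `Policy` type and `replicasAfterOnePeriod` function are illustrative names, not Kubernetes or ELSABRO APIs.

```typescript
interface Policy {
  type: 'Pods' | 'Percent';
  value: number;
  periodSeconds: number;
}

// Upper bound on replicas after one scale-up period, given the configured
// policies, the select policy, and the HPA's maxReplicas cap.
function replicasAfterOnePeriod(
  current: number,
  policies: Policy[],
  selectPolicy: 'Max' | 'Min',
  maxReplicas: number
): number {
  // Each policy yields an allowed increase in pods for this period.
  const increases = policies.map((p) =>
    p.type === 'Pods' ? p.value : Math.ceil((current * p.value) / 100)
  );
  const step = selectPolicy === 'Max' ? Math.max(...increases) : Math.min(...increases);
  return Math.min(current + step, maxReplicas);
}

// Policies copied from configureElsabroScaling above.
const scaleUpPolicies: Policy[] = [
  { type: 'Percent', value: 100, periodSeconds: 15 },
  { type: 'Pods', value: 4, periodSeconds: 15 }
];

const fromTwo = replicasAfterOnePeriod(2, scaleUpPolicies, 'Max', 20);     // Pods policy wins: 2 + 4
const fromTen = replicasAfterOnePeriod(10, scaleUpPolicies, 'Max', 20);    // Percent policy wins: 10 + 10
const fromFifteen = replicasAfterOnePeriod(15, scaleUpPolicies, 'Max', 20); // capped at maxReplicas
console.log(fromTwo, fromTen, fromFifteen);
```

The `Pods` policy matters most at small replica counts, where doubling would add only one or two pods. The `Percent` policy takes over once the deployment has more than four replicas.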

### HPA YAML Example

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: elsabro-hpa
  namespace: elsabro
  labels:
    app.kubernetes.io/name: elsabro
    app.kubernetes.io/component: autoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: elsabro
  minReplicas: 2
  maxReplicas: 20
  metrics:
    # CPU utilization
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    # Memory utilization
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    # Custom metric: Queue length
    - type: Pods
      pods:
        metric:
          name: elsabro_queue_length
        target:
          type: AverageValue
          averageValue: "100"
    # Custom metric: Active agents
    - type: Pods
      pods:
        metric:
          name: elsabro_active_agents
        target:
          type: AverageValue
          averageValue: "5"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
      selectPolicy: Max
```
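
When an HPA lists several metrics, as this manifest does, Kubernetes computes a desired replica count for each metric as `ceil(currentReplicas * currentValue / targetValue)` and acts on the largest result, bounded by `minReplicas` and `maxReplicas`. The sketch below applies that rule to the four targets from the manifest. The observed values are invented for illustration, the type and function names are assumptions, and the controller's tolerance band and stabilization windows are ignored.

```typescript
interface MetricReading {
  name: string;
  current: number; // observed average per pod (or utilization percent)
  target: number;  // target from the HPA spec
}

// Desired replicas for one metric, per the HPA scaling formula.
function desiredForMetric(currentReplicas: number, m: MetricReading): number {
  return Math.ceil((currentReplicas * m.current) / m.target);
}

// The HPA takes the largest per-metric proposal, then clamps it to its bounds.
function desiredReplicas(
  currentReplicas: number,
  readings: MetricReading[],
  minReplicas: number,
  maxReplicas: number
): number {
  const proposals = readings.map((m) => desiredForMetric(currentReplicas, m));
  const largest = Math.max(...proposals);
  return Math.min(Math.max(largest, minReplicas), maxReplicas);
}

// Targets come from the manifest above; the `current` values are hypothetical.
const readings: MetricReading[] = [
  { name: 'cpu', current: 90, target: 70 },                    // ceil(4 * 90 / 70) = 6
  { name: 'memory', current: 60, target: 80 },                 // ceil(4 * 60 / 80) = 3
  { name: 'elsabro_queue_length', current: 250, target: 100 }, // ceil(4 * 250 / 100) = 10
  { name: 'elsabro_active_agents', current: 4, target: 5 }     // ceil(4 * 4 / 5) = 4
];

const desired = desiredReplicas(4, readings, 2, 20);
console.log(`desired replicas: ${desired}`); // queue length dominates
```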

---

## 4. HealthMonitor

### TypeScript Interfaces

```typescript
/**
 * HealthMonitor - Health checks, metrics, and alerting for ELSABRO
 */

// Health check types
interface HealthCheckResult {
  status: 'healthy' | 'degraded' | 'unhealthy';
  timestamp: string;
  checks: {
    [key: string]: {
      status: 'pass' | 'warn' | 'fail';
      message?: string;
      duration_ms?: number;
      data?: Record<string, any>;
    };
  };
  version: string;
  uptime_seconds: number;
}

// Probe configuration
interface ProbeConfig {
  liveness: {
    path: string;
    port: number;
    initialDelaySeconds: number;
    periodSeconds: number;
    timeoutSeconds: number;
    failureThreshold: number;
    successThreshold: number;
  };
  readiness: {
    path: string;
    port: number;
    initialDelaySeconds: number;
    periodSeconds: number;
    timeoutSeconds: number;
    failureThreshold: number;
    successThreshold: number;
  };
  startup: {
    path: string;
    port: number;
    initialDelaySeconds: number;
    periodSeconds: number;
    timeoutSeconds: number;
    failureThreshold: number;
  };
}

// Metrics endpoint
interface MetricsConfig {
  enabled: boolean;
  port: number;
  path: string;
  namespace: string;
  labels: Record<string, string>;
  histogramBuckets: {
    requestDuration: number[];
    tokenUsage: number[];
    agentDuration: number[];
  };
}

// Alert rule
interface AlertRule {
  name: string;
  expression: string;
  duration: string;
  severity: 'critical' | 'warning' | 'info';
  summary: string;
  description: string;
  runbook_url?: string;
  labels?: Record<string, string>;
  annotations?: Record<string, string>;
}
|
|
1784
|
+
class HealthMonitor {
  private startTime: Date;
  private checks: Map<string, HealthCheck>;
  private metricsRegistry: MetricsRegistry;
  // NOTE: the checks below also use this.db, this.redis, this.amqp, and
  // this.agentManager, whose declarations and wiring are not shown in this excerpt.

  constructor(config: HealthMonitorConfig) {
    this.startTime = new Date();
    this.checks = new Map();
    this.metricsRegistry = new MetricsRegistry(config.metrics);
    this.registerDefaultChecks();
    this.registerDefaultMetrics();
  }

  // Register default health checks
  private registerDefaultChecks(): void {
    // Database connection
    this.registerCheck('database', async () => {
      const start = Date.now();
      try {
        await this.db.query('SELECT 1');
        return {
          status: 'pass',
          duration_ms: Date.now() - start
        };
      } catch (error) {
        return {
          status: 'fail',
          // Caught values are `unknown` under strict TypeScript settings
          message: error instanceof Error ? error.message : String(error),
          duration_ms: Date.now() - start
        };
      }
    });

    // Redis connection
    this.registerCheck('redis', async () => {
      const start = Date.now();
      try {
        await this.redis.ping();
        return {
          status: 'pass',
          duration_ms: Date.now() - start
        };
      } catch (error) {
        return {
          status: 'fail',
          message: error instanceof Error ? error.message : String(error),
          duration_ms: Date.now() - start
        };
      }
    });

    // RabbitMQ connection
    this.registerCheck('rabbitmq', async () => {
      const start = Date.now();
      try {
        // checkQueue resolves with queue details (including messageCount)
        const queue = await this.amqp.checkQueue('elsabro_tasks');
        return {
          status: 'pass',
          duration_ms: Date.now() - start,
          data: { messageCount: queue.messageCount }
        };
      } catch (error) {
        return {
          status: 'fail',
          message: error instanceof Error ? error.message : String(error),
          duration_ms: Date.now() - start
        };
      }
    });

    // Memory usage
    this.registerCheck('memory', async () => {
      const usage = process.memoryUsage();
      const heapUsedPercent = (usage.heapUsed / usage.heapTotal) * 100;
      return {
        status: heapUsedPercent > 90 ? 'warn' : 'pass',
        data: {
          heapUsed: usage.heapUsed,
          heapTotal: usage.heapTotal,
          external: usage.external,
          rss: usage.rss
        }
      };
    });

    // Agent pool
    this.registerCheck('agent_pool', async () => {
      const pool = await this.agentManager.getPoolStatus();
      return {
        status: pool.available > 0 ? 'pass' : 'warn',
        data: {
          available: pool.available,
          active: pool.active,
          queued: pool.queued
        }
      };
    });
  }

  // Register default metrics
  private registerDefaultMetrics(): void {
    // Request counter
    this.metricsRegistry.registerCounter({
      name: 'elsabro_requests_total',
      help: 'Total number of requests',
      labelNames: ['method', 'path', 'status']
    });

    // Request duration histogram
    this.metricsRegistry.registerHistogram({
      name: 'elsabro_request_duration_seconds',
      help: 'Request duration in seconds',
      labelNames: ['method', 'path'],
      buckets: [0.01, 0.05, 0.1, 0.5, 1, 2, 5, 10, 30, 60]
    });

    // Agent invocations
    this.metricsRegistry.registerCounter({
      name: 'elsabro_agent_invocations_total',
      help: 'Total agent invocations',
      labelNames: ['agent', 'model', 'status']
    });

    // Agent duration
    this.metricsRegistry.registerHistogram({
      name: 'elsabro_agent_duration_seconds',
      help: 'Agent execution duration',
      labelNames: ['agent', 'model'],
      buckets: [1, 5, 10, 30, 60, 120, 300, 600]
    });

    // Token usage
    this.metricsRegistry.registerCounter({
      name: 'elsabro_tokens_total',
      help: 'Total tokens used',
      labelNames: ['model', 'type']
    });

    // Queue length gauge
    this.metricsRegistry.registerGauge({
      name: 'elsabro_queue_length',
      help: 'Current queue length'
    });

    // Active agents gauge
    this.metricsRegistry.registerGauge({
      name: 'elsabro_active_agents',
      help: 'Currently active agents'
    });

    // Flow executions
    this.metricsRegistry.registerCounter({
      name: 'elsabro_flow_executions_total',
      help: 'Total flow executions',
      labelNames: ['flow', 'status']
    });

    // Checkpoint counter
    this.metricsRegistry.registerCounter({
      name: 'elsabro_checkpoints_total',
      help: 'Total checkpoints saved',
      labelNames: ['type']
    });

    // Error counter
    this.metricsRegistry.registerCounter({
      name: 'elsabro_errors_total',
      help: 'Total errors',
      labelNames: ['type', 'severity']
    });
  }

  // Liveness endpoint handler
  async handleLiveness(req: Request, res: Response): Promise<void> {
    // Liveness just checks if the process is running
    res.status(200).json({
      status: 'ok',
      timestamp: new Date().toISOString()
    });
  }

  // Readiness endpoint handler
  async handleReadiness(req: Request, res: Response): Promise<void> {
    const result = await this.runChecks(['database', 'redis']);
    const status = result.status === 'healthy' ? 200 : 503;
    res.status(status).json(result);
  }

  // Startup endpoint handler
  async handleStartup(req: Request, res: Response): Promise<void> {
    const result = await this.runChecks(['database', 'redis', 'rabbitmq']);
    const status = result.status === 'healthy' ? 200 : 503;
    res.status(status).json(result);
  }

  // Full health endpoint
  async handleHealth(req: Request, res: Response): Promise<void> {
    // runAllChecks() is not defined in this excerpt
    const result = await this.runAllChecks();
    // 'degraded' still returns 200; only 'unhealthy' returns 503
    const status = result.status === 'unhealthy' ? 503 : 200;
    res.status(status).json(result);
  }

  // Metrics endpoint
  async handleMetrics(req: Request, res: Response): Promise<void> {
    res.set('Content-Type', this.metricsRegistry.contentType);
    res.end(await this.metricsRegistry.metrics());
  }

  // Run specific checks
  async runChecks(checkNames: string[]): Promise<HealthCheckResult> {
    const checks: HealthCheckResult['checks'] = {};
    let hasFailure = false;
    let hasWarning = false;

    for (const name of checkNames) {
      const check = this.checks.get(name);
      if (check) {
        checks[name] = await check();
        if (checks[name].status === 'fail') hasFailure = true;
        if (checks[name].status === 'warn') hasWarning = true;
      }
    }

    return {
      status: hasFailure ? 'unhealthy' : hasWarning ? 'degraded' : 'healthy',
      timestamp: new Date().toISOString(),
      checks,
      version: process.env.APP_VERSION || '3.6.0',
      uptime_seconds: (Date.now() - this.startTime.getTime()) / 1000
    };
  }

  // Register custom check
  registerCheck(name: string, check: () => Promise<HealthCheckResult['checks'][string]>): void {
    this.checks.set(name, check);
  }

  // Get alerting rules
  getAlertingRules(): AlertRule[] {
    return [
      {
        name: 'ElsabroHighErrorRate',
        expression: `
          sum(rate(elsabro_errors_total{severity="critical"}[5m]))
          / sum(rate(elsabro_requests_total[5m])) > 0.05
        `,
        duration: '5m',
        severity: 'critical',
        summary: 'ELSABRO high error rate',
        description: 'Error rate is {{ $value | humanizePercentage }} (threshold: 5%)',
        runbook_url: 'https://docs.elsabro.dev/runbooks/high-error-rate'
      },
      {
        name: 'ElsabroHighLatency',
        expression: `
          histogram_quantile(0.95,
            sum(rate(elsabro_request_duration_seconds_bucket[5m])) by (le)
          ) > 5
        `,
        duration: '5m',
        severity: 'warning',
        summary: 'ELSABRO high latency',
        description: 'P95 latency is {{ $value | humanizeDuration }}'
      },
      {
        name: 'ElsabroQueueBacklog',
        expression: 'elsabro_queue_length > 1000',
        duration: '10m',
        severity: 'warning',
        summary: 'ELSABRO queue backlog',
        description: 'Queue has {{ $value }} pending tasks'
      },
      {
        name: 'ElsabroAgentFailures',
        expression: `
          sum(rate(elsabro_agent_invocations_total{status="failed"}[5m]))
          / sum(rate(elsabro_agent_invocations_total[5m])) > 0.1
        `,
        duration: '5m',
        severity: 'critical',
        summary: 'ELSABRO high agent failure rate',
        description: 'Agent failure rate is {{ $value | humanizePercentage }}'
      },
      {
        name: 'ElsabroDatabaseConnectionFailure',
        expression: 'elsabro_database_connections_active == 0',
        duration: '1m',
        severity: 'critical',
        summary: 'ELSABRO database connection failure',
        description: 'No active database connections'
      },
      {
        name: 'ElsabroRedisConnectionFailure',
        expression: 'elsabro_redis_connections_active == 0',
        duration: '1m',
        severity: 'critical',
        summary: 'ELSABRO Redis connection failure',
        description: 'No active Redis connections'
      },
      {
        name: 'ElsabroHighMemoryUsage',
        expression: `
          container_memory_usage_bytes{container="elsabro"}
          / container_spec_memory_limit_bytes{container="elsabro"} > 0.9
        `,
        duration: '5m',
        severity: 'warning',
        summary: 'ELSABRO high memory usage',
        description: 'Memory usage is {{ $value | humanizePercentage }} of limit'
      },
      {
        name: 'ElsabroPodCrashLooping',
        expression: 'rate(kube_pod_container_status_restarts_total{container="elsabro"}[15m]) > 0',
        duration: '15m',
        severity: 'critical',
        summary: 'ELSABRO pod crash looping',
        description: 'Pod {{ $labels.pod }} is crash looping'
      }
    ];
  }
}
```
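
The `handleHealth` handler above calls `runAllChecks()`, which this excerpt does not define. The following is a minimal sketch of what such a method might look like, written as free functions so it stands alone. The names `CheckOutcome`, `overallStatus`, and `runWithTimeout`, and the 2000 ms default timeout, are assumptions made for illustration and are not taken from the package.

```typescript
// Sketch only: these types mirror the shapes used above but are assumptions.
type CheckStatus = 'pass' | 'warn' | 'fail';

interface CheckOutcome {
  status: CheckStatus;
  message?: string;
  duration_ms?: number;
}

type Check = () => Promise<CheckOutcome>;
type OverallStatus = 'healthy' | 'degraded' | 'unhealthy';

// Any failure makes the service unhealthy; warnings alone only degrade it.
function overallStatus(outcomes: CheckOutcome[]): OverallStatus {
  if (outcomes.some((o) => o.status === 'fail')) return 'unhealthy';
  if (outcomes.some((o) => o.status === 'warn')) return 'degraded';
  return 'healthy';
}

// Hypothetical helper: turn a hung or throwing check into a 'fail' outcome.
async function runWithTimeout(check: Check, timeoutMs: number): Promise<CheckOutcome> {
  const start = Date.now();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timedOut = new Promise<CheckOutcome>((resolve) => {
    timer = setTimeout(
      () => resolve({ status: 'fail', message: `timed out after ${timeoutMs} ms`, duration_ms: timeoutMs }),
      timeoutMs
    );
  });
  try {
    return await Promise.race([check(), timedOut]);
  } catch (error) {
    return {
      status: 'fail',
      message: error instanceof Error ? error.message : String(error),
      duration_ms: Date.now() - start
    };
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}

// Run every registered check concurrently and summarize the results.
async function runAllChecks(checks: Map<string, Check>, timeoutMs = 2000) {
  const names = Array.from(checks.keys());
  const outcomes = await Promise.all(
    names.map((name) => runWithTimeout(checks.get(name)!, timeoutMs))
  );
  const byName: Record<string, CheckOutcome> = {};
  names.forEach((name, i) => { byName[name] = outcomes[i]; });
  return { status: overallStatus(outcomes), checks: byName };
}
```

Running the checks concurrently with a per-check timeout keeps a single slow dependency from stalling the whole health response, which matters when the endpoint backs a Kubernetes probe with its own `timeoutSeconds`.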

### Grafana Dashboard JSON

```json
{
  "dashboard": {
    "id": null,
    "uid": "elsabro-overview",
    "title": "ELSABRO Overview",
    "tags": ["elsabro", "ai", "agents"],
    "timezone": "browser",
    "schemaVersion": 38,
    "version": 1,
    "refresh": "30s",
    "panels": [
      {
        "id": 1,
        "title": "Request Rate",
        "type": "stat",
        "gridPos": { "h": 4, "w": 6, "x": 0, "y": 0 },
        "targets": [{
          "expr": "sum(rate(elsabro_requests_total[5m]))",
          "legendFormat": "requests/sec"
        }]
      },
      {
        "id": 2,
        "title": "Error Rate",
        "type": "stat",
        "gridPos": { "h": 4, "w": 6, "x": 6, "y": 0 },
        "targets": [{
          "expr": "sum(rate(elsabro_errors_total[5m])) / sum(rate(elsabro_requests_total[5m])) * 100",
          "legendFormat": "error %"
        }],
        "fieldConfig": {
          "defaults": {
            "unit": "percent",
            "thresholds": {
              "steps": [
                { "value": 0, "color": "green" },
                { "value": 1, "color": "yellow" },
                { "value": 5, "color": "red" }
              ]
            }
          }
        }
      },
      {
        "id": 3,
        "title": "P95 Latency",
        "type": "stat",
        "gridPos": { "h": 4, "w": 6, "x": 12, "y": 0 },
        "targets": [{
          "expr": "histogram_quantile(0.95, sum(rate(elsabro_request_duration_seconds_bucket[5m])) by (le))",
          "legendFormat": "p95"
        }],
        "fieldConfig": {
          "defaults": {
            "unit": "s",
            "thresholds": {
              "steps": [
                { "value": 0, "color": "green" },
                { "value": 2, "color": "yellow" },
                { "value": 5, "color": "red" }
              ]
            }
          }
        }
      },
      {
        "id": 4,
        "title": "Active Agents",
        "type": "stat",
        "gridPos": { "h": 4, "w": 6, "x": 18, "y": 0 },
        "targets": [{
          "expr": "sum(elsabro_active_agents)",
          "legendFormat": "agents"
        }]
      },
      {
        "id": 5,
        "title": "Request Rate by Endpoint",
        "type": "timeseries",
        "gridPos": { "h": 8, "w": 12, "x": 0, "y": 4 },
        "targets": [{
          "expr": "sum(rate(elsabro_requests_total[5m])) by (path)",
          "legendFormat": "{{ path }}"
        }]
      },
      {
        "id": 6,
        "title": "Latency Distribution",
        "type": "heatmap",
        "gridPos": { "h": 8, "w": 12, "x": 12, "y": 4 },
        "targets": [{
          "expr": "sum(rate(elsabro_request_duration_seconds_bucket[5m])) by (le)",
          "format": "heatmap"
        }]
      },
      {
        "id": 7,
        "title": "Agent Invocations by Model",
        "type": "timeseries",
        "gridPos": { "h": 8, "w": 12, "x": 0, "y": 12 },
        "targets": [{
          "expr": "sum(rate(elsabro_agent_invocations_total[5m])) by (model)",
          "legendFormat": "{{ model }}"
        }]
      },
      {
        "id": 8,
        "title": "Token Usage",
        "type": "timeseries",
        "gridPos": { "h": 8, "w": 12, "x": 12, "y": 12 },
        "targets": [
          {
            "expr": "sum(rate(elsabro_tokens_total{type=\"input\"}[5m]))",
            "legendFormat": "Input Tokens"
          },
          {
            "expr": "sum(rate(elsabro_tokens_total{type=\"output\"}[5m]))",
            "legendFormat": "Output Tokens"
          }
        ]
      },
      {
        "id": 9,
        "title": "Queue Length",
        "type": "timeseries",
        "gridPos": { "h": 6, "w": 8, "x": 0, "y": 20 },
        "targets": [{
          "expr": "elsabro_queue_length",
          "legendFormat": "queue"
        }],
        "fieldConfig": {
          "defaults": {
            "thresholds": {
              "steps": [
                { "value": 0, "color": "green" },
                { "value": 500, "color": "yellow" },
                { "value": 1000, "color": "red" }
              ]
            }
          }
        }
      },
      {
        "id": 10,
        "title": "Pod CPU Usage",
        "type": "timeseries",
        "gridPos": { "h": 6, "w": 8, "x": 8, "y": 20 },
        "targets": [{
          "expr": "sum(rate(container_cpu_usage_seconds_total{container=\"elsabro\"}[5m])) by (pod)",
          "legendFormat": "{{ pod }}"
        }]
      },
      {
        "id": 11,
        "title": "Pod Memory Usage",
        "type": "timeseries",
        "gridPos": { "h": 6, "w": 8, "x": 16, "y": 20 },
        "targets": [{
          "expr": "container_memory_usage_bytes{container=\"elsabro\"}",
          "legendFormat": "{{ pod }}"
        }],
        "fieldConfig": {
          "defaults": { "unit": "bytes" }
        }
      }
    ]
  }
}
```
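
Each panel's `gridPos` places it on Grafana's 24-column dashboard grid, so hand-edited layouts are easy to get wrong. As a rough sketch, a small check like the one below could catch panels that overflow the grid or overlap each other before the JSON is imported. The `Panel` type and the `layoutProblems` helper are hypothetical names introduced here, not part of Grafana or ELSABRO.

```typescript
interface GridPos { h: number; w: number; x: number; y: number; }
interface Panel { id: number; title: string; gridPos: GridPos; }

// Grafana dashboards are laid out on a 24-column grid.
const GRID_COLUMNS = 24;

// Two rectangles overlap when they intersect on both axes.
function overlaps(a: GridPos, b: GridPos): boolean {
  return a.x < b.x + b.w && b.x < a.x + a.w && a.y < b.y + b.h && b.y < a.y + a.h;
}

// Hypothetical helper: list layout problems for a set of panels.
function layoutProblems(panels: Panel[]): string[] {
  const problems: string[] = [];
  panels.forEach((p, i) => {
    if (p.gridPos.x < 0 || p.gridPos.x + p.gridPos.w > GRID_COLUMNS) {
      problems.push(`panel ${p.id} ("${p.title}") exceeds the ${GRID_COLUMNS}-column grid`);
    }
    for (const q of panels.slice(i + 1)) {
      if (overlaps(p.gridPos, q.gridPos)) {
        problems.push(`panels ${p.id} and ${q.id} overlap`);
      }
    }
  });
  return problems;
}

// The four stat panels from the top row of the dashboard above.
const topRow: Panel[] = [
  { id: 1, title: 'Request Rate', gridPos: { h: 4, w: 6, x: 0, y: 0 } },
  { id: 2, title: 'Error Rate', gridPos: { h: 4, w: 6, x: 6, y: 0 } },
  { id: 3, title: 'P95 Latency', gridPos: { h: 4, w: 6, x: 12, y: 0 } },
  { id: 4, title: 'Active Agents', gridPos: { h: 4, w: 6, x: 18, y: 0 } }
];

console.log(layoutProblems(topRow));
```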

---

## 5. Infrastructure Components

### Redis Configuration

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-config
  namespace: elsabro
data:
  redis.conf: |
    # Memory management
    # Kept below the 2Gi container memory limit set in the StatefulSet to leave
    # headroom for AOF rewrites and fragmentation (originally 2gb)
    maxmemory 1536mb
    maxmemory-policy allkeys-lru

    # Persistence
    appendonly yes
    appendfsync everysec

    # Performance
    tcp-backlog 511
    tcp-keepalive 300

    # Security
    # With protected mode on and no password configured, Redis 7 accepts only
    # loopback connections; set requirepass (for example from a Secret) so that
    # clients in other pods can connect.
    protected-mode yes
    bind 0.0.0.0

    # Cluster (if enabled)
    # cluster-enabled yes
    # cluster-config-file nodes.conf
    # cluster-node-timeout 5000
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
  namespace: elsabro
spec:
  serviceName: redis
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7-alpine
          ports:
            - containerPort: 6379
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 2Gi
          volumeMounts:
            - name: redis-data
              mountPath: /data
            - name: redis-config
              mountPath: /usr/local/etc/redis
          command:
            - redis-server
            - /usr/local/etc/redis/redis.conf
          readinessProbe:
            exec:
              command: ["redis-cli", "ping"]
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            exec:
              command: ["redis-cli", "ping"]
            initialDelaySeconds: 15
            periodSeconds: 20
      volumes:
        - name: redis-config
          configMap:
            name: redis-config
  volumeClaimTemplates:
    - metadata:
        name: redis-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
---
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: elsabro
spec:
  ports:
    - port: 6379
      targetPort: 6379
  selector:
    app: redis
  clusterIP: None
```
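
Because the Service is headless (`clusterIP: None`) and is named in the StatefulSet's `serviceName`, each pod gets a stable DNS name of the form `<pod>.<service>.<namespace>.svc.<cluster-domain>`. The sketch below derives those names for a client that wants to address pods directly. The `statefulSetPodHosts` helper is hypothetical, and `cluster.local` is assumed as the cluster domain, which is the usual default but can differ per cluster.

```typescript
// Hypothetical helper: stable per-pod DNS names for a StatefulSet behind a headless Service.
function statefulSetPodHosts(
  statefulSetName: string,
  serviceName: string,
  namespace: string,
  replicas: number,
  clusterDomain: string = 'cluster.local' // assumption: default cluster domain
): string[] {
  const hosts: string[] = [];
  for (let ordinal = 0; ordinal < replicas; ordinal++) {
    // Pods are named <statefulset>-<ordinal>, starting at 0
    hosts.push(`${statefulSetName}-${ordinal}.${serviceName}.${namespace}.svc.${clusterDomain}`);
  }
  return hosts;
}

// The single Redis replica defined above
console.log(statefulSetPodHosts('redis', 'redis', 'elsabro', 1));
```

The same naming scheme applies to the PostgreSQL and RabbitMQ StatefulSets in this section, since both also sit behind headless Services.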

### PostgreSQL Configuration

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-config
  namespace: elsabro
data:
  postgresql.conf: |
    # Connection settings
    listen_addresses = '*'
    max_connections = 200

    # Memory settings
    shared_buffers = 256MB
    effective_cache_size = 768MB
    work_mem = 4MB
    maintenance_work_mem = 64MB

    # WAL settings
    wal_level = replica
    max_wal_senders = 3
    max_replication_slots = 3

    # Logging
    log_destination = 'stderr'
    logging_collector = on
    log_directory = 'log'
    log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
    log_statement = 'ddl'
    log_min_duration_statement = 1000

    # Performance
    random_page_cost = 1.1
    effective_io_concurrency = 200

  pg_hba.conf: |
    # TYPE  DATABASE     USER  ADDRESS       METHOD
    local   all          all                 trust
    host    all          all   127.0.0.1/32  scram-sha-256
    host    all          all   ::1/128       scram-sha-256
    host    all          all   0.0.0.0/0     scram-sha-256
    host    replication  all   0.0.0.0/0     scram-sha-256
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: elsabro
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16-alpine
          # Point the server at the mounted ConfigMap; without these flags the
          # files under /etc/postgresql are never read
          args:
            - "-c"
            - "config_file=/etc/postgresql/postgresql.conf"
            - "-c"
            - "hba_file=/etc/postgresql/pg_hba.conf"
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_DB
              value: elsabro
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: username
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
            limits:
              cpu: 1000m
              memory: 2Gi
          volumeMounts:
            - name: postgres-data
              mountPath: /var/lib/postgresql/data
            - name: postgres-config
              mountPath: /etc/postgresql
          readinessProbe:
            exec:
              # Probe commands are not run in a shell and $(VAR) is not expanded
              # here, so invoke a shell to read the environment variable
              command: ["sh", "-c", "pg_isready -U \"$POSTGRES_USER\""]
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            exec:
              command: ["sh", "-c", "pg_isready -U \"$POSTGRES_USER\""]
            initialDelaySeconds: 15
            periodSeconds: 20
      volumes:
        - name: postgres-config
          configMap:
            name: postgres-config
  volumeClaimTemplates:
    - metadata:
        name: postgres-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 50Gi
---
apiVersion: v1
kind: Service
metadata:
  name: postgres
  namespace: elsabro
spec:
  ports:
    - port: 5432
      targetPort: 5432
  selector:
    app: postgres
  clusterIP: None
```
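
The memory settings in `postgresql.conf` interact with the container's 2Gi limit, so it can help to run the arithmetic. The sketch below is a deliberately rough estimate under stated assumptions: `work_mem` is applied per sort or hash operation rather than per connection, so `opsPerConnection` is a guess, and the `roughPeakMemoryMb` helper is hypothetical rather than any PostgreSQL formula.

```typescript
interface PgMemorySettings {
  sharedBuffersMb: number;
  workMemMb: number;
  maintenanceWorkMemMb: number;
  maxConnections: number;
}

// Hypothetical estimate: shared memory plus one maintenance operation plus
// work_mem for every connection running `opsPerConnection` sorts or hashes at once.
function roughPeakMemoryMb(s: PgMemorySettings, opsPerConnection: number = 1): number {
  return s.sharedBuffersMb + s.maintenanceWorkMemMb
    + s.maxConnections * s.workMemMb * opsPerConnection;
}

// Values taken from the postgresql.conf above
const settings: PgMemorySettings = {
  sharedBuffersMb: 256,
  workMemMb: 4,
  maintenanceWorkMemMb: 64,
  maxConnections: 200
};

const containerLimitMb = 2048; // the 2Gi limit in the StatefulSet

console.log(roughPeakMemoryMb(settings), 'of', containerLimitMb, 'MB');
console.log(roughPeakMemoryMb(settings, 2), 'of', containerLimitMb, 'MB');
```

With one operation per connection the estimate is 1120 MB; with two it rises to 1920 MB, close to the limit. Under these assumptions, raising `work_mem` or `max_connections` would call for a larger memory limit as well.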

### RabbitMQ Configuration

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: rabbitmq-config
  namespace: elsabro
data:
  rabbitmq.conf: |
    ## Cluster formation
    cluster_formation.peer_discovery_backend = k8s
    cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
    cluster_formation.k8s.address_type = hostname
    cluster_formation.node_cleanup.interval = 30
    cluster_formation.node_cleanup.only_log_warning = true
    cluster_partition_handling = autoheal

    ## Queue settings
    queue_master_locator = min-masters

    ## Networking
    listeners.tcp.default = 5672
    management.listener.port = 15672

    ## Memory management
    vm_memory_high_watermark.relative = 0.7
    vm_memory_high_watermark_paging_ratio = 0.75

    ## Disk free limit
    disk_free_limit.absolute = 1GB

  enabled_plugins: |
    [rabbitmq_management,rabbitmq_peer_discovery_k8s,rabbitmq_prometheus].
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: rabbitmq
  namespace: elsabro
spec:
  serviceName: rabbitmq
  replicas: 3
  selector:
    matchLabels:
      app: rabbitmq
  template:
    metadata:
      labels:
        app: rabbitmq
    spec:
      serviceAccountName: rabbitmq
      containers:
        - name: rabbitmq
          image: rabbitmq:3.12-management-alpine
          ports:
            - containerPort: 5672
              name: amqp
            - containerPort: 15672
              name: management
            - containerPort: 15692
              name: prometheus
          env:
            - name: RABBITMQ_DEFAULT_USER
              valueFrom:
                secretKeyRef:
                  name: rabbitmq-secret
                  key: username
            - name: RABBITMQ_DEFAULT_PASS
              valueFrom:
                secretKeyRef:
                  name: rabbitmq-secret
                  key: password
            - name: RABBITMQ_ERLANG_COOKIE
              valueFrom:
                secretKeyRef:
                  name: rabbitmq-secret
                  key: erlang-cookie
            - name: K8S_SERVICE_NAME
              value: rabbitmq
            # POD_NAME must be defined before RABBITMQ_NODENAME: $(VAR) references
            # are only expanded for variables declared earlier in the list
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            # Fully qualified node names require long-name mode
            - name: RABBITMQ_USE_LONGNAME
              value: "true"
            - name: RABBITMQ_NODENAME
              value: rabbit@$(POD_NAME).rabbitmq.elsabro.svc.cluster.local
          resources:
            requests:
              cpu: 200m
              memory: 512Mi
            limits:
              cpu: 1000m
              memory: 2Gi
          volumeMounts:
            - name: rabbitmq-data
              mountPath: /var/lib/rabbitmq
            - name: rabbitmq-config
              mountPath: /etc/rabbitmq
          readinessProbe:
            exec:
              command: ["rabbitmq-diagnostics", "check_running"]
            initialDelaySeconds: 20
            periodSeconds: 10
          livenessProbe:
            exec:
              command: ["rabbitmq-diagnostics", "ping"]
            initialDelaySeconds: 60
            periodSeconds: 30
      volumes:
        - name: rabbitmq-config
          configMap:
            name: rabbitmq-config
  volumeClaimTemplates:
    - metadata:
        name: rabbitmq-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
---
apiVersion: v1
kind: Service
metadata:
  name: rabbitmq
  namespace: elsabro
spec:
  ports:
    - port: 5672
      targetPort: 5672
      name: amqp
    - port: 15672
      targetPort: 15672
      name: management
    - port: 15692
      targetPort: 15692
      name: prometheus
  selector:
    app: rabbitmq
  clusterIP: None
```
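
`RABBITMQ_NODENAME` depends on `POD_NAME` through a `$(POD_NAME)` reference, and Kubernetes only expands such references for variables defined earlier in the same `env` list; an unresolved reference is left in the value as literal text. The sketch below is a simplified model of that rule, meant only to show why declaration order matters. The `expandEnv` function is hypothetical, it does not model the `$$(VAR)` escape, and `rabbitmq-0` stands in for the value the downward API would supply.

```typescript
interface EnvVar { name: string; value: string; }

// Simplified model of dependent environment variables: $(NAME) is replaced only
// if NAME was defined earlier in the list; otherwise the text is left unchanged.
function expandEnv(vars: EnvVar[]): Record<string, string> {
  const resolved: Record<string, string> = {};
  for (const v of vars) {
    resolved[v.name] = v.value.replace(
      /\$\(([A-Za-z_][A-Za-z0-9_]*)\)/g,
      (match: string, ref: string) =>
        Object.prototype.hasOwnProperty.call(resolved, ref) ? resolved[ref] : match
    );
  }
  return resolved;
}

const nodeName: EnvVar = {
  name: 'RABBITMQ_NODENAME',
  value: 'rabbit@$(POD_NAME).rabbitmq.elsabro.svc.cluster.local'
};
// Stand-in for the fieldRef value of the first pod
const podName: EnvVar = { name: 'POD_NAME', value: 'rabbitmq-0' };

// Reference declared before its dependency: left unexpanded
console.log(expandEnv([nodeName, podName]).RABBITMQ_NODENAME);
// Dependency declared first: expanded as intended
console.log(expandEnv([podName, nodeName]).RABBITMQ_NODENAME);
```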

### Ingress Configuration (Nginx)

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: elsabro-ingress
  namespace: elsabro
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "60"
    nginx.ingress.kubernetes.io/proxy-buffering: "on"
    nginx.ingress.kubernetes.io/proxy-buffer-size: "128k"
    # ingress-nginx rate limiting: 100 requests per minute per client IP
    # (replaces the rate-limit / rate-limit-window pair, which ingress-nginx does not define)
    nginx.ingress.kubernetes.io/limit-rpm: "100"
    nginx.ingress.kubernetes.io/enable-cors: "true"
    nginx.ingress.kubernetes.io/cors-allow-origin: "*"
    nginx.ingress.kubernetes.io/cors-allow-methods: "GET, POST, PUT, DELETE, OPTIONS"
    nginx.ingress.kubernetes.io/cors-allow-headers: "DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Authorization"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - api.elsabro.io
        - ws.elsabro.io
      secretName: elsabro-tls
  rules:
    - host: api.elsabro.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: elsabro
                port:
                  number: 8080
    - host: ws.elsabro.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: elsabro
                port:
                  number: 8080
```

---

## 6. Docker Multi-Stage Build

### Dockerfile

```dockerfile
|
|
2722
|
+
# ==============================================================================
|
|
2723
|
+
# ELSABRO Docker Multi-Stage Build
|
|
2724
|
+
# Version: 3.6.0
|
|
2725
|
+
# ==============================================================================
|
|
2726
|
+
|
|
2727
|
+
# ------------------------------------------------------------------------------
|
|
2728
|
+
# Stage 1: Dependencies
|
|
2729
|
+
# ------------------------------------------------------------------------------
|
|
2730
|
+
FROM node:20-alpine AS deps
|
|
2731
|
+
|
|
2732
|
+
WORKDIR /app
|
|
2733
|
+
|
|
2734
|
+
# Install build dependencies
|
|
2735
|
+
RUN apk add --no-cache python3 make g++ git
|
|
2736
|
+
|
|
2737
|
+
# Copy package files
|
|
2738
|
+
COPY package.json package-lock.json ./
|
|
2739
|
+
|
|
2740
|
+
# Install production dependencies
|
|
2741
|
+
RUN npm ci --only=production && \
|
|
2742
|
+
cp -R node_modules /prod_modules
|
|
2743
|
+
|
|
2744
|
+
# Install all dependencies (including devDependencies)
|
|
2745
|
+
RUN npm ci
|
|
2746
|
+
|
|
2747
|
+
# ------------------------------------------------------------------------------
|
|
2748
|
+
# Stage 2: Builder
|
|
2749
|
+
# ------------------------------------------------------------------------------
|
|
2750
|
+
FROM node:20-alpine AS builder
|
|
2751
|
+
|
|
2752
|
+
WORKDIR /app
|
|
2753
|
+
|
|
2754
|
+
# Copy dependencies from deps stage
|
|
2755
|
+
COPY --from=deps /app/node_modules ./node_modules
|
|
2756
|
+
|
|
2757
|
+
# Copy source code
|
|
2758
|
+
COPY . .
|
|
2759
|
+
|
|
2760
|
+
# Build TypeScript
|
|
2761
|
+
RUN npm run build
|
|
2762
|
+
|
|
2763
|
+
# Run tests (optional, can be skipped with --build-arg SKIP_TESTS=true)
|
|
2764
|
+
ARG SKIP_TESTS=false
|
|
2765
|
+
RUN if [ "$SKIP_TESTS" = "false" ]; then npm run test; fi
|
|
2766
|
+
|
|
2767
|
+
# Prune dev dependencies
|
|
2768
|
+
RUN npm prune --omit=dev
|
|
2769
|
+
|
|
2770
|
+
# ------------------------------------------------------------------------------
|
|
2771
|
+
# Stage 3: Production
|
|
2772
|
+
# ------------------------------------------------------------------------------
|
|
2773
|
+
FROM node:20-alpine AS production
|
|
2774
|
+
|
|
2775
|
+
# Labels
|
|
2776
|
+
LABEL org.opencontainers.image.title="ELSABRO"
|
|
2777
|
+
LABEL org.opencontainers.image.description="AI-Powered Development Workflow System"
|
|
2778
|
+
LABEL org.opencontainers.image.version="3.6.0"
|
|
2779
|
+
LABEL org.opencontainers.image.vendor="CubaIT"
|
|
2780
|
+
LABEL org.opencontainers.image.source="https://github.com/cubait/elsabro"
|
|
2781
|
+
|
|
2782
|
+
# Create non-root user
|
|
2783
|
+
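# node:20-alpine already ships a `node` user with UID/GID 1000, so remove it first
RUN deluser --remove-home node && addgroup -g 1000 elsabro && \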
|
|
2784
|
+
adduser -u 1000 -G elsabro -D -h /app elsabro
|
|
2785
|
+
|
|
2786
|
+
WORKDIR /app
|
|
2787
|
+
|
|
2788
|
+
# Install runtime dependencies
|
|
2789
|
+
RUN apk add --no-cache \
|
|
2790
|
+
dumb-init \
|
|
2791
|
+
tini \
|
|
2792
|
+
curl \
|
|
2793
|
+
ca-certificates
|
|
2794
|
+
|
|
2795
|
+
# Copy built artifacts
|
|
2796
|
+
COPY --from=builder --chown=elsabro:elsabro /app/dist ./dist
|
|
2797
|
+
COPY --from=builder --chown=elsabro:elsabro /app/node_modules ./node_modules
|
|
2798
|
+
COPY --from=builder --chown=elsabro:elsabro /app/package.json ./
|
|
2799
|
+
|
|
2800
|
+
# Create directories for runtime data
|
|
2801
|
+
RUN mkdir -p /app/.elsabro /app/logs /app/tmp && \
|
|
2802
|
+
chown -R elsabro:elsabro /app
|
|
2803
|
+
|
|
2804
|
+
# Switch to non-root user
|
|
2805
|
+
USER elsabro
|
|
2806
|
+
|
|
2807
|
+
# Environment variables
|
|
2808
|
+
ENV NODE_ENV=production
|
|
2809
|
+
ENV PORT=8080
|
|
2810
|
+
ENV METRICS_PORT=9090
|
|
2811
|
+
ENV LOG_LEVEL=info
|
|
2812
|
+
ENV LOG_FORMAT=json
|
|
2813
|
+
|
|
2814
|
+
# Expose ports
|
|
2815
|
+
EXPOSE 8080 9090
|
|
2816
|
+
|
|
2817
|
+
# Health check
|
|
2818
|
+
HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3 \
|
|
2819
|
+
CMD curl -f http://localhost:8080/health/live || exit 1
|
|
2820
|
+
|
|
2821
|
+
# Use tini as init system
|
|
2822
|
+
ENTRYPOINT ["/sbin/tini", "--"]
|
|
2823
|
+
|
|
2824
|
+
# Start application
|
|
2825
|
+
CMD ["node", "dist/server.js"]
|
|
2826
|
+
|
|
2827
|
+
# ------------------------------------------------------------------------------
|
|
2828
|
+
# Stage 4: Development (optional)
|
|
2829
|
+
# ------------------------------------------------------------------------------
|
|
2830
|
+
FROM node:20-alpine AS development
|
|
2831
|
+
|
|
2832
|
+
WORKDIR /app
|
|
2833
|
+
|
|
2834
|
+
# Install development dependencies
|
|
2835
|
+
RUN apk add --no-cache python3 make g++ git
|
|
2836
|
+
|
|
2837
|
+
# Copy all dependencies
|
|
2838
|
+
COPY --from=deps /app/node_modules ./node_modules
|
|
2839
|
+
|
|
2840
|
+
# Copy source code
|
|
2841
|
+
COPY . .
|
|
2842
|
+
|
|
2843
|
+
# Environment variables
|
|
2844
|
+
ENV NODE_ENV=development
|
|
2845
|
+
ENV PORT=8080
|
|
2846
|
+
|
|
2847
|
+
# Expose ports
|
|
2848
|
+
EXPOSE 8080 9090 9229
|
|
2849
|
+
|
|
2850
|
+
# Start with hot reload
|
|
2851
|
+
CMD ["npm", "run", "dev"]
|
|
2852
|
+
```
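
The Dockerfile above exposes two build targets (`production` and `development`) and one build argument (`SKIP_TESTS`). As a minimal sketch, assuming the image is built with the standard `docker build` CLI, the helper below assembles the argument list for either target. The names `dockerBuildArgs` and `BuildOptions` are hypothetical and not part of ELSABRO.

```typescript
// Hypothetical helper: assembles `docker build` arguments for the Dockerfile above.
type BuildTarget = "production" | "development";

interface BuildOptions {
  target: BuildTarget; // stage name declared with `AS` in the Dockerfile
  tag: string;         // image tag to apply with -t
  skipTests?: boolean; // maps to the SKIP_TESTS build argument
  context?: string;    // build context directory, defaults to "."
}

function dockerBuildArgs(opts: BuildOptions): string[] {
  const args = ["build", "--target", opts.target];
  if (opts.skipTests) {
    // The builder stage only runs `npm run test` when SKIP_TESTS is "false".
    args.push("--build-arg", "SKIP_TESTS=true");
  }
  args.push("-t", opts.tag, opts.context ?? ".");
  return args;
}

// Production image with the test step skipped.
const prodArgs = dockerBuildArgs({
  target: "production",
  tag: "ghcr.io/cubait/elsabro:3.6.0",
  skipTests: true,
});
```

Here `prodArgs` corresponds to `docker build --target production --build-arg SKIP_TESTS=true -t ghcr.io/cubait/elsabro:3.6.0 .`. Returning an array rather than a single string keeps the arguments safe to pass to a process spawner without shell quoting.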
|
|
2853
|
+
|
|
2854
|
+
### docker-compose.yml
|
|
2855
|
+
|
|
2856
|
+
```yaml
|
|
2857
|
+
version: '3.8'
|
|
2858
|
+
|
|
2859
|
+
services:
|
|
2860
|
+
elsabro:
|
|
2861
|
+
build:
|
|
2862
|
+
context: .
|
|
2863
|
+
target: production
|
|
2864
|
+
image: ghcr.io/cubait/elsabro:3.6.0
|
|
2865
|
+
container_name: elsabro
|
|
2866
|
+
restart: unless-stopped
|
|
2867
|
+
ports:
|
|
2868
|
+
- "8080:8080"
|
|
2869
|
+
- "9090:9090"
|
|
2870
|
+
environment:
|
|
2871
|
+
- NODE_ENV=production
|
|
2872
|
+
- DATABASE_URL=postgresql://elsabro:${POSTGRES_PASSWORD}@postgres:5432/elsabro
|
|
2873
|
+
- REDIS_URL=redis://:${REDIS_PASSWORD}@redis:6379/0
|
|
2874
|
+
- RABBITMQ_URL=amqp://elsabro:${RABBITMQ_PASSWORD}@rabbitmq:5672
|
|
2875
|
+
- ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
|
|
2876
|
+
- JWT_SECRET=${JWT_SECRET}
|
|
2877
|
+
- LOG_LEVEL=info
|
|
2878
|
+
depends_on:
|
|
2879
|
+
postgres:
|
|
2880
|
+
condition: service_healthy
|
|
2881
|
+
redis:
|
|
2882
|
+
condition: service_healthy
|
|
2883
|
+
rabbitmq:
|
|
2884
|
+
condition: service_healthy
|
|
2885
|
+
healthcheck:
|
|
2886
|
+
test: ["CMD", "curl", "-f", "http://localhost:8080/health/live"]
|
|
2887
|
+
interval: 30s
|
|
2888
|
+
timeout: 10s
|
|
2889
|
+
retries: 3
|
|
2890
|
+
start_period: 30s
|
|
2891
|
+
networks:
|
|
2892
|
+
- elsabro-network
|
|
2893
|
+
|
|
2894
|
+
postgres:
|
|
2895
|
+
image: postgres:16-alpine
|
|
2896
|
+
container_name: elsabro-postgres
|
|
2897
|
+
restart: unless-stopped
|
|
2898
|
+
environment:
|
|
2899
|
+
- POSTGRES_DB=elsabro
|
|
2900
|
+
- POSTGRES_USER=elsabro
|
|
2901
|
+
- POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
|
|
2902
|
+
volumes:
|
|
2903
|
+
- postgres-data:/var/lib/postgresql/data
|
|
2904
|
+
healthcheck:
|
|
2905
|
+
test: ["CMD-SHELL", "pg_isready -U elsabro"]
|
|
2906
|
+
interval: 10s
|
|
2907
|
+
timeout: 5s
|
|
2908
|
+
retries: 5
|
|
2909
|
+
networks:
|
|
2910
|
+
- elsabro-network
|
|
2911
|
+
|
|
2912
|
+
redis:
|
|
2913
|
+
image: redis:7-alpine
|
|
2914
|
+
container_name: elsabro-redis
|
|
2915
|
+
restart: unless-stopped
|
|
2916
|
+
command: redis-server --requirepass ${REDIS_PASSWORD}
|
|
2917
|
+
volumes:
|
|
2918
|
+
- redis-data:/data
|
|
2919
|
+
healthcheck:
|
|
2920
|
+
test: ["CMD", "redis-cli", "-a", "${REDIS_PASSWORD}", "ping"]
|
|
2921
|
+
interval: 10s
|
|
2922
|
+
timeout: 5s
|
|
2923
|
+
retries: 5
|
|
2924
|
+
networks:
|
|
2925
|
+
- elsabro-network
|
|
2926
|
+
|
|
2927
|
+
rabbitmq:
|
|
2928
|
+
image: rabbitmq:3.12-management-alpine
|
|
2929
|
+
container_name: elsabro-rabbitmq
|
|
2930
|
+
restart: unless-stopped
|
|
2931
|
+
environment:
|
|
2932
|
+
- RABBITMQ_DEFAULT_USER=elsabro
|
|
2933
|
+
- RABBITMQ_DEFAULT_PASS=${RABBITMQ_PASSWORD}
|
|
2934
|
+
volumes:
|
|
2935
|
+
- rabbitmq-data:/var/lib/rabbitmq
|
|
2936
|
+
healthcheck:
|
|
2937
|
+
test: ["CMD", "rabbitmq-diagnostics", "ping"]
|
|
2938
|
+
interval: 30s
|
|
2939
|
+
timeout: 10s
|
|
2940
|
+
retries: 5
|
|
2941
|
+
networks:
|
|
2942
|
+
- elsabro-network
|
|
2943
|
+
|
|
2944
|
+
volumes:
|
|
2945
|
+
postgres-data:
|
|
2946
|
+
redis-data:
|
|
2947
|
+
rabbitmq-data:
|
|
2948
|
+
|
|
2949
|
+
networks:
|
|
2950
|
+
elsabro-network:
|
|
2951
|
+
driver: bridge
|
|
2952
|
+
```
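
The compose file above interpolates five variables (`POSTGRES_PASSWORD`, `REDIS_PASSWORD`, `RABBITMQ_PASSWORD`, `ANTHROPIC_API_KEY`, `JWT_SECRET`) and inserts the passwords directly into connection URLs. A pre-flight check along the following lines could catch unset variables before `docker compose up`. It also shows one way to percent-encode a password that contains URL-reserved characters. This is a sketch only: `missingVars` and `databaseUrl` are hypothetical names, not ELSABRO APIs.

```typescript
// Variables interpolated by the compose file above.
const REQUIRED_VARS = [
  "POSTGRES_PASSWORD",
  "REDIS_PASSWORD",
  "RABBITMQ_PASSWORD",
  "ANTHROPIC_API_KEY",
  "JWT_SECRET",
] as const;

type Env = Record<string, string | undefined>;

// Returns the names of required variables that are unset or blank.
function missingVars(env: Env): string[] {
  return REQUIRED_VARS.filter((name) => (env[name] ?? "").trim() === "");
}

// Builds the PostgreSQL URL used by the `elsabro` service, percent-encoding
// the password so characters such as "@" or "/" do not break URL parsing.
function databaseUrl(password: string): string {
  return `postgresql://elsabro:${encodeURIComponent(password)}@postgres:5432/elsabro`;
}

// Example with an incomplete environment.
const missing = missingVars({ POSTGRES_PASSWORD: "secret", JWT_SECRET: "s3" });
```

In a Node.js script, `missingVars(process.env)` would run the same check against the real environment. In the example, `missing` contains `REDIS_PASSWORD`, `RABBITMQ_PASSWORD`, and `ANTHROPIC_API_KEY`.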
|
|
2953
|
+
|
|
2954
|
+
---
|
|
2955
|
+
|
|
2956
|
+
## 7. CLI Commands
|
|
2957
|
+
|
|
2958
|
+
### /elsabro:k8s
|
|
2959
|
+
|
|
2960
|
+
```bash
|
|
2961
|
+
# Deploy ELSABRO to Kubernetes
|
|
2962
|
+
/elsabro:k8s deploy [environment] [options]
|
|
2963
|
+
|
|
2964
|
+
# Options:
|
|
2965
|
+
# --namespace <ns> Target namespace (default: elsabro)
|
|
2966
|
+
# --replicas <n> Number of replicas (default: from values)
|
|
2967
|
+
# --image-tag <tag> Docker image tag (default: latest)
|
|
2968
|
+
# --values <file> Custom values file
|
|
2969
|
+
# --dry-run Preview without applying
|
|
2970
|
+
# --wait Wait for deployment to complete
|
|
2971
|
+
|
|
2972
|
+
# Examples:
|
|
2973
|
+
/elsabro:k8s deploy dev
|
|
2974
|
+
/elsabro:k8s deploy staging --replicas 3
|
|
2975
|
+
/elsabro:k8s deploy prod --image-tag 3.6.0 --wait
|
|
2976
|
+
|
|
2977
|
+
# Scale deployment
|
|
2978
|
+
/elsabro:k8s scale <replicas> [options]
|
|
2979
|
+
|
|
2980
|
+
# Options:
|
|
2981
|
+
# --namespace <ns> Target namespace
|
|
2982
|
+
# --deployment <name> Deployment name (default: elsabro)
|
|
2983
|
+
|
|
2984
|
+
# Examples:
|
|
2985
|
+
/elsabro:k8s scale 5
|
|
2986
|
+
/elsabro:k8s scale 10 --namespace elsabro-prod
|
|
2987
|
+
|
|
2988
|
+
# Get deployment status
|
|
2989
|
+
/elsabro:k8s status [options]
|
|
2990
|
+
|
|
2991
|
+
# Options:
|
|
2992
|
+
# --namespace <ns> Target namespace
|
|
2993
|
+
# --watch Watch for changes
|
|
2994
|
+
# --output <format> Output format (table|json|yaml)
|
|
2995
|
+
|
|
2996
|
+
# Examples:
|
|
2997
|
+
/elsabro:k8s status
|
|
2998
|
+
/elsabro:k8s status --namespace elsabro-prod --watch
|
|
2999
|
+
|
|
3000
|
+
# View logs
|
|
3001
|
+
/elsabro:k8s logs [options]
|
|
3002
|
+
|
|
3003
|
+
# Options:
|
|
3004
|
+
# --namespace <ns> Target namespace
|
|
3005
|
+
# --pod <name> Specific pod name
|
|
3006
|
+
# --container <name> Container name
|
|
3007
|
+
# --follow Follow log output
|
|
3008
|
+
# --tail <n> Number of lines (default: 100)
|
|
3009
|
+
# --since <duration> Show logs since duration (e.g., 1h)
|
|
3010
|
+
|
|
3011
|
+
# Examples:
|
|
3012
|
+
/elsabro:k8s logs
|
|
3013
|
+
/elsabro:k8s logs --follow --tail 500
|
|
3014
|
+
/elsabro:k8s logs --since 30m --namespace elsabro-prod
|
|
3015
|
+
|
|
3016
|
+
# Generate Helm chart
|
|
3017
|
+
/elsabro:k8s helm generate [options]
|
|
3018
|
+
|
|
3019
|
+
# Options:
|
|
3020
|
+
# --output <dir> Output directory
|
|
3021
|
+
# --environment <env> Target environment (dev|staging|prod)
|
|
3022
|
+
|
|
3023
|
+
# Rollout management
|
|
3024
|
+
/elsabro:k8s rollout <action> [options]
|
|
3025
|
+
|
|
3026
|
+
# Actions:
|
|
3027
|
+
# status - Show rollout status
|
|
3028
|
+
# restart - Restart deployment
|
|
3029
|
+
# undo - Rollback to previous version
|
|
3030
|
+
# pause - Pause rollout
|
|
3031
|
+
# resume - Resume rollout
|
|
3032
|
+
|
|
3033
|
+
# Examples:
|
|
3034
|
+
/elsabro:k8s rollout status
|
|
3035
|
+
/elsabro:k8s rollout restart
|
|
3036
|
+
/elsabro:k8s rollout undo --to-revision 3
|
|
3037
|
+
|
|
3038
|
+
# Port forwarding
|
|
3039
|
+
/elsabro:k8s port-forward [options]
|
|
3040
|
+
|
|
3041
|
+
# Options:
|
|
3042
|
+
# --namespace <ns> Target namespace
|
|
3043
|
+
# --port <local:remote> Port mapping (default: 8080:8080)
|
|
3044
|
+
# --pod <name> Specific pod
|
|
3045
|
+
|
|
3046
|
+
# Examples:
|
|
3047
|
+
/elsabro:k8s port-forward
|
|
3048
|
+
/elsabro:k8s port-forward --port 3000:8080
|
|
3049
|
+
|
|
3050
|
+
# Execute command in pod
|
|
3051
|
+
/elsabro:k8s exec <command> [options]
|
|
3052
|
+
|
|
3053
|
+
# Options:
|
|
3054
|
+
# --namespace <ns> Target namespace
|
|
3055
|
+
# --pod <name> Specific pod
|
|
3056
|
+
# --interactive Interactive mode (-it)
|
|
3057
|
+
|
|
3058
|
+
# Examples:
|
|
3059
|
+
/elsabro:k8s exec "npm run migrate"
|
|
3060
|
+
/elsabro:k8s exec "sh" --interactive
|
|
3061
|
+
```
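
The source does not say how these subcommands are implemented. As an illustration only, assuming they wrap `kubectl`, the sketch below translates `scale`, `rollout undo`, and `logs` into `kubectl` argument lists. All function names are hypothetical. Applying the `elsabro` namespace default to every subcommand is also an assumption, since the options above state that default only for `deploy`.

```typescript
// Hypothetical translation of three /elsabro:k8s subcommands into kubectl arguments.
interface CommonOptions {
  namespace?: string;  // --namespace (assumed default: elsabro)
  deployment?: string; // --deployment (documented default: elsabro)
}

interface LogOptions extends CommonOptions {
  follow?: boolean; // --follow
  tail?: number;    // --tail (documented default: 100)
  since?: string;   // --since, e.g. "30m"
}

// Resource reference and namespace flag shared by all subcommands.
function target(opts: CommonOptions): string[] {
  return [`deployment/${opts.deployment ?? "elsabro"}`, "-n", opts.namespace ?? "elsabro"];
}

function scaleArgs(replicas: number, opts: CommonOptions = {}): string[] {
  if (!Number.isInteger(replicas) || replicas < 0) {
    throw new RangeError(`replicas must be a non-negative integer, got ${replicas}`);
  }
  return ["scale", ...target(opts), `--replicas=${replicas}`];
}

function rolloutUndoArgs(toRevision?: number, opts: CommonOptions = {}): string[] {
  const args = ["rollout", "undo", ...target(opts)];
  if (toRevision !== undefined) {
    args.push(`--to-revision=${toRevision}`);
  }
  return args;
}

function logsArgs(opts: LogOptions = {}): string[] {
  const args = ["logs", ...target(opts), `--tail=${opts.tail ?? 100}`];
  if (opts.follow) args.push("--follow");
  if (opts.since) args.push(`--since=${opts.since}`);
  return args;
}
```

Under these assumptions, `/elsabro:k8s scale 10 --namespace elsabro-prod` would map to `kubectl scale deployment/elsabro -n elsabro-prod --replicas=10`, and `/elsabro:k8s rollout undo --to-revision 3` to `kubectl rollout undo deployment/elsabro -n elsabro --to-revision=3`.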
|
|
3062
|
+
|
|
3063
|
+
---
|
|
3064
|
+
|
|
3065
|
+
## Quick User Commands
|
|
3066
|
+
|
|
3067
|
+
```bash
|
|
3068
|
+
# Deployment commands
|
|
3069
|
+
/elsabro:k8s deploy dev # Deploy to development
|
|
3070
|
+
/elsabro:k8s deploy staging --replicas 3 # Deploy to staging with 3 replicas
|
|
3071
|
+
/elsabro:k8s deploy prod --wait # Deploy to production and wait
|
|
3072
|
+
|
|
3073
|
+
# Scaling commands
|
|
3074
|
+
/elsabro:k8s scale 5 # Scale to 5 replicas
|
|
3075
|
+
/elsabro:k8s scale 10 --namespace elsabro-prod # Scale production to 10 replicas
|
|
3076
|
+
|
|
3077
|
+
# Status and monitoring
|
|
3078
|
+
/elsabro:k8s status # Get current status
|
|
3079
|
+
/elsabro:k8s status --watch # Watch status in real-time
|
|
3080
|
+
/elsabro:k8s logs --follow # Stream logs
|
|
3081
|
+
|
|
3082
|
+
# Rollout management
|
|
3083
|
+
/elsabro:k8s rollout status # Check rollout status
|
|
3084
|
+
/elsabro:k8s rollout restart # Restart all pods
|
|
3085
|
+
/elsabro:k8s rollout undo # Rollback to previous version
|
|
3086
|
+
```
|
|
3087
|
+
|
|
3088
|
+
---
|
|
3089
|
+
|
|
3090
|
+
## Changelog
|
|
3091
|
+
|
|
3092
|
+
- **v3.6.0**: Initial Kubernetes Deployment System implementation
|
|
3093
|
+
  - K8sDeployer for deployment management
|
|
3094
|
+
  - HelmChartGenerator for Helm chart creation
|
|
3095
|
+
  - ResourceScaler with HPA/VPA support
|
|
3096
|
+
  - HealthMonitor with probes and metrics
|
|
3097
|
+
  - Infrastructure components (Redis, PostgreSQL, RabbitMQ)
|
|
3098
|
+
  - Docker multi-stage build
|
|
3099
|
+
  - CLI commands for operations
|