elsabro 2.2.0 → 3.7.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (88)
  1. package/README.md +668 -20
  2. package/agents/elsabro-orchestrator.md +113 -0
  3. package/bin/install.js +0 -0
  4. package/commands/elsabro/execute.md +223 -46
  5. package/commands/elsabro/start.md +34 -0
  6. package/commands/elsabro/verify-work.md +29 -0
  7. package/flows/development-flow.json +452 -0
  8. package/flows/quick-flow.json +118 -0
  9. package/hooks/confirm-destructive.sh +145 -0
  10. package/hooks/hooks-config.json +81 -0
  11. package/hooks/lint-check.sh +238 -0
  12. package/hooks/post-edit-test.sh +189 -0
  13. package/package.json +5 -3
  14. package/references/SYSTEM_INDEX.md +379 -5
  15. package/references/agent-marketplace.md +2274 -0
  16. package/references/agent-protocol.md +1126 -0
  17. package/references/ai-code-suggestions.md +2413 -0
  18. package/references/checkpointing.md +595 -0
  19. package/references/collaboration-patterns.md +851 -0
  20. package/references/collaborative-sessions.md +1081 -0
  21. package/references/configuration-management.md +1810 -0
  22. package/references/cost-tracking.md +1095 -0
  23. package/references/enterprise-sso.md +2001 -0
  24. package/references/error-contracts-tests.md +1171 -0
  25. package/references/error-contracts-v2.md +968 -0
  26. package/references/error-contracts.md +3102 -0
  27. package/references/event-driven.md +1031 -0
  28. package/references/flow-orchestration.md +940 -0
  29. package/references/flow-visualization.md +1557 -0
  30. package/references/ide-integrations.md +3513 -0
  31. package/references/interrupt-system.md +681 -0
  32. package/references/kubernetes-deployment.md +3099 -0
  33. package/references/memory-system.md +683 -0
  34. package/references/mobile-companion.md +3236 -0
  35. package/references/multi-llm-providers.md +2494 -0
  36. package/references/multi-project-memory.md +1182 -0
  37. package/references/observability.md +793 -0
  38. package/references/output-schemas.md +858 -0
  39. package/references/parallel-worktrees.md +293 -0
  40. package/references/performance-profiler.md +955 -0
  41. package/references/plugin-system.md +1526 -0
  42. package/references/prompt-management.md +292 -0
  43. package/references/sandbox-execution.md +303 -0
  44. package/references/security-system.md +1253 -0
  45. package/references/streaming.md +696 -0
  46. package/references/testing-framework.md +1151 -0
  47. package/references/time-travel.md +802 -0
  48. package/references/tool-registry.md +886 -0
  49. package/references/voice-commands.md +3296 -0
  50. package/scripts/setup-parallel-worktrees.sh +319 -0
  51. package/skills/memory-update.md +207 -0
  52. package/skills/review.md +331 -0
  53. package/skills/techdebt.md +289 -0
  54. package/skills/tutor.md +219 -0
  55. package/templates/.planning/notes/.gitkeep +0 -0
  56. package/templates/CLAUDE.md.template +48 -0
  57. package/templates/agent-marketplace-config.json +220 -0
  58. package/templates/agent-protocol-config.json +136 -0
  59. package/templates/ai-suggestions-config.json +100 -0
  60. package/templates/checkpoint-state.json +61 -0
  61. package/templates/collaboration-config.json +157 -0
  62. package/templates/collaborative-sessions-config.json +153 -0
  63. package/templates/configuration-config.json +245 -0
  64. package/templates/cost-tracking-config.json +148 -0
  65. package/templates/enterprise-sso-config.json +438 -0
  66. package/templates/error-handling-config.json +79 -2
  67. package/templates/events-config.json +148 -0
  68. package/templates/flow-visualization-config.json +196 -0
  69. package/templates/ide-integrations-config.json +442 -0
  70. package/templates/kubernetes-config.json +764 -0
  71. package/templates/memory-state.json +84 -0
  72. package/templates/mistakes.md.template +52 -0
  73. package/templates/mobile-companion-config.json +600 -0
  74. package/templates/multi-llm-config.json +544 -0
  75. package/templates/multi-project-memory-config.json +145 -0
  76. package/templates/observability-config.json +109 -0
  77. package/templates/patterns.md.template +114 -0
  78. package/templates/performance-profiler-config.json +125 -0
  79. package/templates/plugin-config.json +170 -0
  80. package/templates/prompt-management-config.json +86 -0
  81. package/templates/sandbox-config.json +185 -0
  82. package/templates/schemas-config.json +65 -0
  83. package/templates/security-config.json +120 -0
  84. package/templates/streaming-config.json +72 -0
  85. package/templates/testing-config.json +81 -0
  86. package/templates/timetravel-config.json +62 -0
  87. package/templates/tool-registry-config.json +109 -0
  88. package/templates/voice-commands-config.json +658 -0
@@ -0,0 +1,3099 @@
+ ---
+ name: kubernetes-deployment
+ description: ELSABRO deployment system on Kubernetes with Helm, HPA, VPA, and observability
+ version: 3.6.0
+ ---
+
+ # ELSABRO Kubernetes Deployment System (v3.6)
+
+ A complete system for deploying ELSABRO as a service on Kubernetes, with support for multi-tenancy, auto-scaling, and enterprise observability.
+
+ ## Overview
+
+ ```
+ +------------------------------------------------------------+
+ |              ELSABRO KUBERNETES ARCHITECTURE               |
+ +------------------------------------------------------------+
+ | INGRESS LAYER                                              |
+ |   Nginx Ingress (default) | Traefik | Istio Gateway (mesh) |
+ +-----------------------------v------------------------------+
+ | SERVICE LAYER                                              |
+ |   elsabro-service: ClusterIP / LoadBalancer / NodePort     |
+ |   Ports: 8080 (HTTP) | 8443 (HTTPS) | 9090 (Metrics)       |
+ +-----------------------------v------------------------------+
+ | DEPLOYMENT LAYER                                           |
+ |   elsabro-deployment: Pod replicas 1..N                    |
+ |   scaled by HPA / VPA / KEDA                               |
+ +-----------------------------v------------------------------+
+ | INFRASTRUCTURE LAYER                                       |
+ |   Redis cluster (session/cache) | PostgreSQL primary +     |
+ |   replicas | RabbitMQ cluster (event bus)                  |
+ +-----------------------------v------------------------------+
+ | OBSERVABILITY LAYER                                        |
+ |   Prometheus (metrics) | Grafana (dashboards)              |
+ |   Jaeger (tracing)                                         |
+ +------------------------------------------------------------+
+ ```
+
+ ---
+
+ ## 1. K8sDeployer
+
+ ### TypeScript Interfaces
+
+ ```typescript
+ /**
+  * K8sDeployer - Core deployment manager for ELSABRO on Kubernetes
+  */
+ import {
+   KubeConfig, CoreV1Api, AppsV1Api, RbacAuthorizationV1Api,
+   NetworkingV1Api, AutoscalingV2Api,
+   V1Namespace, V1ConfigMap, V1Secret, V1Deployment
+ } from '@kubernetes/client-node';
+
+ // Namespace configuration
+ interface NamespaceConfig {
+   name: string;
+   labels: Record<string, string>;
+   annotations: Record<string, string>;
+   resourceQuota?: ResourceQuotaSpec;
+   limitRange?: LimitRangeSpec;
+   networkPolicy?: NetworkPolicySpec;
+ }
+
+ // Deployment specification
+ interface DeploymentSpec {
+   name: string;
+   namespace: string;
+   replicas: number;
+   image: string;
+   imageTag: string;
+   imagePullPolicy: 'Always' | 'IfNotPresent' | 'Never';
+   resources: ResourceRequirements;
+   env: EnvVar[];
+   envFrom: EnvFromSource[];
+   ports: ContainerPort[];
+   volumeMounts: VolumeMount[];
+   volumes: Volume[];
+   nodeSelector?: Record<string, string>;
+   tolerations?: Toleration[];
+   affinity?: Affinity;
+   securityContext?: PodSecurityContext;
+   serviceAccountName?: string;
+ }
+
+ // Resource requirements
+ interface ResourceRequirements {
+   requests: {
+     cpu: string;
+     memory: string;
+     'ephemeral-storage'?: string;
+   };
+   limits: {
+     cpu: string;
+     memory: string;
+     'ephemeral-storage'?: string;
+     'nvidia.com/gpu'?: number;
+   };
+ }
+
+ // Environment variable
+ interface EnvVar {
+   name: string;
+   value?: string;
+   valueFrom?: {
+     configMapKeyRef?: { name: string; key: string };
+     secretKeyRef?: { name: string; key: string };
+     fieldRef?: { fieldPath: string };
+     resourceFieldRef?: { containerName: string; resource: string };
+   };
+ }
+
+ // ConfigMap specification
+ interface ConfigMapSpec {
+   name: string;
+   namespace: string;
+   data: Record<string, string>;
+   binaryData?: Record<string, string>;
+   immutable?: boolean;
+ }
+
+ // Secret specification
+ interface SecretSpec {
+   name: string;
+   namespace: string;
+   type: 'Opaque' | 'kubernetes.io/tls' | 'kubernetes.io/dockerconfigjson';
+   data?: Record<string, string>;
+   stringData?: Record<string, string>;
+ }
+
+ // Service Account specification
+ interface ServiceAccountSpec {
+   name: string;
+   namespace: string;
+   automountServiceAccountToken?: boolean;
+   imagePullSecrets?: { name: string }[];
+   secrets?: { name: string }[];
+ }
+
+ // RBAC Role specification
+ interface RoleSpec {
+   name: string;
+   namespace: string;
+   rules: PolicyRule[];
+ }
+
+ interface PolicyRule {
+   apiGroups: string[];
+   resources: string[];
+   verbs: ('get' | 'list' | 'watch' | 'create' | 'update' | 'patch' | 'delete')[];
+   resourceNames?: string[];
+ }
+
+ // K8sDeployer main class
+ class K8sDeployer {
+   private kubeConfig: KubeConfig;
+   private coreApi: CoreV1Api;
+   private appsApi: AppsV1Api;
+   private rbacApi: RbacAuthorizationV1Api;
+   private networkingApi: NetworkingV1Api;
+   private autoscalingApi: AutoscalingV2Api;
+
+   constructor(config: K8sDeployerConfig) {
+     this.kubeConfig = new KubeConfig();
+     this.kubeConfig.loadFromDefault();
+     this.initializeApis();
+   }
+
+   // Namespace operations
+   async createNamespace(config: NamespaceConfig): Promise<V1Namespace> {
+     const namespace: V1Namespace = {
+       apiVersion: 'v1',
+       kind: 'Namespace',
+       metadata: {
+         name: config.name,
+         labels: {
+           'app.kubernetes.io/name': 'elsabro',
+           'app.kubernetes.io/managed-by': 'elsabro-deployer',
+           ...config.labels
+         },
+         annotations: config.annotations
+       }
+     };
+
+     const result = await this.coreApi.createNamespace(namespace);
+
+     // Apply resource quota if specified
+     if (config.resourceQuota) {
+       await this.createResourceQuota(config.name, config.resourceQuota);
+     }
+
+     // Apply limit range if specified
+     if (config.limitRange) {
+       await this.createLimitRange(config.name, config.limitRange);
+     }
+
+     // Apply network policy if specified
+     if (config.networkPolicy) {
+       await this.createNetworkPolicy(config.name, config.networkPolicy);
+     }
+
+     return result.body;
+   }
+
+   // Deployment operations
+   async deploy(spec: DeploymentSpec): Promise<DeploymentResult> {
+     const deployment = this.buildDeploymentManifest(spec);
+
+     try {
+       // Check if deployment exists
+       const existing = await this.appsApi.readNamespacedDeployment(
+         spec.name,
+         spec.namespace
+       );
+
+       // Update existing deployment
+       const result = await this.appsApi.replaceNamespacedDeployment(
+         spec.name,
+         spec.namespace,
+         deployment
+       );
+
+       return {
+         action: 'updated',
+         deployment: result.body,
+         timestamp: new Date().toISOString()
+       };
+     } catch (e) {
+       if (e.statusCode === 404) {
+         // Create new deployment
+         const result = await this.appsApi.createNamespacedDeployment(
+           spec.namespace,
+           deployment
+         );
+
+         return {
+           action: 'created',
+           deployment: result.body,
+           timestamp: new Date().toISOString()
+         };
+       }
+       throw e;
+     }
+   }
+
+   // ConfigMap operations
+   async createConfigMap(spec: ConfigMapSpec): Promise<V1ConfigMap> {
+     const configMap: V1ConfigMap = {
+       apiVersion: 'v1',
+       kind: 'ConfigMap',
+       metadata: {
+         name: spec.name,
+         namespace: spec.namespace,
+         labels: {
+           'app.kubernetes.io/name': 'elsabro',
+           'app.kubernetes.io/component': 'config'
+         }
+       },
+       data: spec.data,
+       binaryData: spec.binaryData,
+       immutable: spec.immutable
+     };
+
+     const result = await this.coreApi.createNamespacedConfigMap(
+       spec.namespace,
+       configMap
+     );
+     return result.body;
+   }
+
+   // Secret operations
+   async createSecret(spec: SecretSpec): Promise<V1Secret> {
+     const secret: V1Secret = {
+       apiVersion: 'v1',
+       kind: 'Secret',
+       metadata: {
+         name: spec.name,
+         namespace: spec.namespace,
+         labels: {
+           'app.kubernetes.io/name': 'elsabro',
+           'app.kubernetes.io/component': 'secret'
+         }
+       },
+       type: spec.type,
+       data: spec.data ? this.encodeSecretData(spec.data) : undefined,
+       stringData: spec.stringData
+     };
+
+     const result = await this.coreApi.createNamespacedSecret(
+       spec.namespace,
+       secret
+     );
+     return result.body;
+   }
+
+   // ServiceAccount and RBAC operations
+   async setupRBAC(
+     namespace: string,
+     serviceAccountSpec: ServiceAccountSpec,
+     roleSpec: RoleSpec,
+     roleBindingName: string
+   ): Promise<RBACSetupResult> {
+     // Create ServiceAccount
+     const sa = await this.createServiceAccount(serviceAccountSpec);
+
+     // Create Role
+     const role = await this.createRole(roleSpec);
+
+     // Create RoleBinding
+     const binding = await this.createRoleBinding({
+       name: roleBindingName,
+       namespace,
+       roleRef: {
+         apiGroup: 'rbac.authorization.k8s.io',
+         kind: 'Role',
+         name: roleSpec.name
+       },
+       subjects: [{
+         kind: 'ServiceAccount',
+         name: serviceAccountSpec.name,
+         namespace
+       }]
+     });
+
+     return { serviceAccount: sa, role, roleBinding: binding };
+   }
+
+   // Rollout operations
+   async rollout(
+     namespace: string,
+     deploymentName: string,
+     strategy: RolloutStrategy
+   ): Promise<RolloutResult> {
+     const deployment = await this.appsApi.readNamespacedDeployment(
+       deploymentName,
+       namespace
+     );
+
+     switch (strategy.type) {
+       case 'restart':
+         return this.restartDeployment(namespace, deploymentName);
+       case 'scale':
+         return this.scaleDeployment(namespace, deploymentName, strategy.replicas);
+       case 'canary':
+         return this.canaryDeployment(namespace, deployment.body, strategy);
+       case 'bluegreen':
+         return this.blueGreenDeployment(namespace, deployment.body, strategy);
+       default:
+         throw new Error(`Unknown rollout strategy: ${strategy.type}`);
+     }
+   }
+
+   // Status and monitoring
+   async getDeploymentStatus(
+     namespace: string,
+     deploymentName: string
+   ): Promise<DeploymentStatus> {
+     const deployment = await this.appsApi.readNamespacedDeployment(
+       deploymentName,
+       namespace
+     );
+
+     // Positional args: (namespace, pretty, allowWatchBookmarks, _continue, fieldSelector, labelSelector)
+     const pods = await this.coreApi.listNamespacedPod(
+       namespace,
+       undefined,
+       undefined,
+       undefined,
+       undefined,
+       `app.kubernetes.io/name=elsabro`
+     );
+
+     return {
+       name: deploymentName,
+       namespace,
+       replicas: {
+         desired: deployment.body.spec?.replicas || 0,
+         ready: deployment.body.status?.readyReplicas || 0,
+         available: deployment.body.status?.availableReplicas || 0,
+         unavailable: deployment.body.status?.unavailableReplicas || 0
+       },
+       conditions: deployment.body.status?.conditions || [],
+       pods: pods.body.items.map(pod => ({
+         name: pod.metadata?.name,
+         phase: pod.status?.phase,
+         ready: pod.status?.conditions?.find(c => c.type === 'Ready')?.status === 'True',
+         restarts: pod.status?.containerStatuses?.[0]?.restartCount || 0
+       }))
+     };
+   }
+
+   // Helper methods
+   private buildDeploymentManifest(spec: DeploymentSpec): V1Deployment {
+     return {
+       apiVersion: 'apps/v1',
+       kind: 'Deployment',
+       metadata: {
+         name: spec.name,
+         namespace: spec.namespace,
+         labels: {
+           'app.kubernetes.io/name': 'elsabro',
+           'app.kubernetes.io/instance': spec.name,
+           'app.kubernetes.io/version': spec.imageTag,
+           'app.kubernetes.io/component': 'api',
+           'app.kubernetes.io/managed-by': 'elsabro-deployer'
+         }
+       },
+       spec: {
+         replicas: spec.replicas,
+         selector: {
+           matchLabels: {
+             'app.kubernetes.io/name': 'elsabro',
+             'app.kubernetes.io/instance': spec.name
+           }
+         },
+         strategy: {
+           type: 'RollingUpdate',
+           rollingUpdate: {
+             maxSurge: '25%',
+             maxUnavailable: '25%'
+           }
+         },
+         template: {
+           metadata: {
+             labels: {
+               'app.kubernetes.io/name': 'elsabro',
+               'app.kubernetes.io/instance': spec.name,
+               'app.kubernetes.io/version': spec.imageTag
+             },
+             annotations: {
+               'prometheus.io/scrape': 'true',
+               'prometheus.io/port': '9090',
+               'prometheus.io/path': '/metrics'
+             }
+           },
+           spec: {
+             serviceAccountName: spec.serviceAccountName,
+             securityContext: spec.securityContext,
+             containers: [{
+               name: 'elsabro',
+               image: `${spec.image}:${spec.imageTag}`,
+               imagePullPolicy: spec.imagePullPolicy,
+               ports: spec.ports,
+               env: spec.env,
+               envFrom: spec.envFrom,
+               resources: spec.resources,
+               volumeMounts: spec.volumeMounts,
+               livenessProbe: {
+                 httpGet: { path: '/health/live', port: 8080 },
+                 initialDelaySeconds: 15,
+                 periodSeconds: 20,
+                 timeoutSeconds: 5,
+                 failureThreshold: 3
+               },
+               readinessProbe: {
+                 httpGet: { path: '/health/ready', port: 8080 },
+                 initialDelaySeconds: 5,
+                 periodSeconds: 10,
+                 timeoutSeconds: 3,
+                 failureThreshold: 3
+               },
+               startupProbe: {
+                 httpGet: { path: '/health/startup', port: 8080 },
+                 initialDelaySeconds: 10,
+                 periodSeconds: 5,
+                 timeoutSeconds: 3,
+                 failureThreshold: 30
+               }
+             }],
+             volumes: spec.volumes,
+             nodeSelector: spec.nodeSelector,
+             tolerations: spec.tolerations,
+             affinity: spec.affinity
+           }
+         }
+       }
+     };
+   }
+ }
+ ```
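The `encodeSecretData` helper called in `createSecret` is referenced but not shown in the diff. A minimal sketch of what it presumably does (Kubernetes requires the `data` field of a Secret to hold base64-encoded values, while `stringData` accepts plain text); the function name follows the class above, but the body is an editor's assumption, not the published implementation:

```typescript
// Sketch: base64-encode each value for a Secret's `data` field.
// Assumes a Node.js runtime (Buffer is available globally).
function encodeSecretData(data: Record<string, string>): Record<string, string> {
  const encoded: Record<string, string> = {};
  for (const [key, value] of Object.entries(data)) {
    // Buffer treats the input as UTF-8; the API server decodes it back to bytes
    encoded[key] = Buffer.from(value, 'utf-8').toString('base64');
  }
  return encoded;
}

// encodeSecretData({ JWT_SECRET: 'hunter2' }) → { JWT_SECRET: 'aHVudGVyMg==' }
```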
+
+ ---
+
+ ## 2. HelmChartGenerator
+
+ ### Helm Chart Structure
+
+ ```
+ elsabro-chart/
+ +-- Chart.yaml
+ +-- values.yaml
+ +-- values-dev.yaml
+ +-- values-staging.yaml
+ +-- values-prod.yaml
+ +-- templates/
+ |   +-- _helpers.tpl
+ |   +-- deployment.yaml
+ |   +-- service.yaml
+ |   +-- ingress.yaml
+ |   +-- hpa.yaml
+ |   +-- vpa.yaml
+ |   +-- configmap.yaml
+ |   +-- secret.yaml
+ |   +-- serviceaccount.yaml
+ |   +-- role.yaml
+ |   +-- rolebinding.yaml
+ |   +-- networkpolicy.yaml
+ |   +-- pdb.yaml
+ |   +-- servicemonitor.yaml
+ +-- charts/
+ |   +-- redis/
+ |   +-- postgresql/
+ |   +-- rabbitmq/
+ +-- .helmignore
+ ```
+
+ ### Chart.yaml
+
+ ```yaml
+ apiVersion: v2
+ name: elsabro
+ description: ELSABRO - AI-Powered Development Workflow System
+ type: application
+ version: 3.6.0
+ appVersion: "3.6.0"
+ kubeVersion: ">=1.25.0-0"
+
+ keywords:
+   - ai
+   - agents
+   - development
+   - workflow
+   - automation
+
+ home: https://github.com/cubait/elsabro
+ sources:
+   - https://github.com/cubait/elsabro
+
+ maintainers:
+   - name: cubait
+     email: support@cubait.com
+
+ dependencies:
+   - name: redis
+     version: "18.x.x"
+     repository: "https://charts.bitnami.com/bitnami"
+     condition: redis.enabled
+   - name: postgresql
+     version: "14.x.x"
+     repository: "https://charts.bitnami.com/bitnami"
+     condition: postgresql.enabled
+   - name: rabbitmq
+     version: "12.x.x"
+     repository: "https://charts.bitnami.com/bitnami"
+     condition: rabbitmq.enabled
+
+ annotations:
+   artifacthub.io/license: MIT
+   artifacthub.io/links: |
+     - name: Documentation
+       url: https://docs.elsabro.dev
+     - name: Support
+       url: https://github.com/cubait/elsabro/issues
+ ```
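The `kubeVersion` field above gates installation on the cluster's version. As a rough illustration of the comparison Helm performs (real Helm uses full semver range parsing; this sketch handles only the `>=` form shown in the chart, and the function name is the editor's, not part of the package):

```typescript
// Sketch: does a cluster version satisfy a ">=X.Y.Z-0"-style constraint?
function satisfiesMinVersion(cluster: string, constraint: string): boolean {
  // Strip the ">=" operator and any pre-release suffix ("-0")
  const min = constraint.replace(/^>=/, '').split('-')[0];
  const parse = (v: string) => v.split('.').map(Number);
  const [a, b] = [parse(cluster), parse(min)];
  for (let i = 0; i < 3; i++) {
    // First differing component decides the comparison
    if ((a[i] ?? 0) !== (b[i] ?? 0)) return (a[i] ?? 0) > (b[i] ?? 0);
  }
  return true; // equal versions satisfy ">="
}

// satisfiesMinVersion('1.28.3', '>=1.25.0-0') → true
// satisfiesMinVersion('1.24.9', '>=1.25.0-0') → false
```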
+
+ ### values.yaml (Default)
+
+ ```yaml
+ # ELSABRO Helm Chart Values
+ # Default configuration for all environments
+
+ global:
+   imageRegistry: ""
+   imagePullSecrets: []
+   storageClass: ""
+
+ # Number of replicas
+ replicaCount: 2
+
+ # Image configuration
+ image:
+   registry: ghcr.io
+   repository: cubait/elsabro
+   tag: "3.6.0"
+   pullPolicy: IfNotPresent
+
+ # Service Account
+ serviceAccount:
+   create: true
+   annotations: {}
+   name: "elsabro-sa"
+   automountServiceAccountToken: true
+
+ # Pod annotations
+ podAnnotations:
+   prometheus.io/scrape: "true"
+   prometheus.io/port: "9090"
+   prometheus.io/path: "/metrics"
+
+ # Pod security context
+ podSecurityContext:
+   runAsNonRoot: true
+   runAsUser: 1000
+   runAsGroup: 1000
+   fsGroup: 1000
+
+ # Container security context
+ securityContext:
+   allowPrivilegeEscalation: false
+   readOnlyRootFilesystem: true
+   runAsNonRoot: true
+   runAsUser: 1000
+   capabilities:
+     drop:
+       - ALL
+
+ # Service configuration
+ service:
+   type: ClusterIP
+   port: 8080
+   metricsPort: 9090
+   annotations: {}
+
+ # Ingress configuration
+ ingress:
+   enabled: false
+   className: "nginx"
+   annotations:
+     nginx.ingress.kubernetes.io/ssl-redirect: "true"
+     nginx.ingress.kubernetes.io/proxy-body-size: "50m"
+     nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
+     nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
+     cert-manager.io/cluster-issuer: "letsencrypt-prod"
+   hosts:
+     - host: elsabro.local
+       paths:
+         - path: /
+           pathType: Prefix
+   tls:
+     - secretName: elsabro-tls
+       hosts:
+         - elsabro.local
+
+ # Resource limits
+ resources:
+   requests:
+     cpu: 500m
+     memory: 512Mi
+   limits:
+     cpu: 2000m
+     memory: 2Gi
+
+ # Horizontal Pod Autoscaler
+ autoscaling:
+   enabled: true
+   minReplicas: 2
+   maxReplicas: 10
+   targetCPUUtilizationPercentage: 70
+   targetMemoryUtilizationPercentage: 80
+   behavior:
+     scaleDown:
+       stabilizationWindowSeconds: 300
+       policies:
+         - type: Percent
+           value: 10
+           periodSeconds: 60
+     scaleUp:
+       stabilizationWindowSeconds: 0
+       policies:
+         - type: Percent
+           value: 100
+           periodSeconds: 15
+         - type: Pods
+           value: 4
+           periodSeconds: 15
+       selectPolicy: Max
+   customMetrics:
+     - type: Pods
+       pods:
+         metric:
+           name: elsabro_queue_length
+         target:
+           type: AverageValue
+           averageValue: 100
+     - type: Pods
+       pods:
+         metric:
+           name: elsabro_active_agents
+         target:
+           type: AverageValue
+           averageValue: 5
+
+ # Vertical Pod Autoscaler
+ vpa:
+   enabled: false
+   updateMode: "Auto"
+   resourcePolicy:
+     containerPolicies:
+       - containerName: elsabro
+         minAllowed:
+           cpu: 250m
+           memory: 256Mi
+         maxAllowed:
+           cpu: 4000m
+           memory: 8Gi
+         controlledResources:
+           - cpu
+           - memory
+
+ # Pod Disruption Budget
+ podDisruptionBudget:
+   enabled: true
+   minAvailable: 1
+   # maxUnavailable: 1
+
+ # Node selector
+ nodeSelector: {}
+
+ # Tolerations
+ tolerations: []
+
+ # Affinity rules
+ affinity:
+   podAntiAffinity:
+     preferredDuringSchedulingIgnoredDuringExecution:
+       - weight: 100
+         podAffinityTerm:
+           labelSelector:
+             matchExpressions:
+               - key: app.kubernetes.io/name
+                 operator: In
+                 values:
+                   - elsabro
+           topologyKey: kubernetes.io/hostname
+
+ # Topology spread constraints
+ topologySpreadConstraints:
+   - maxSkew: 1
+     topologyKey: topology.kubernetes.io/zone
+     whenUnsatisfiable: ScheduleAnyway
+     labelSelector:
+       matchLabels:
+         app.kubernetes.io/name: elsabro
+
+ # Probes configuration
+ probes:
+   liveness:
+     enabled: true
+     path: /health/live
+     initialDelaySeconds: 15
+     periodSeconds: 20
+     timeoutSeconds: 5
+     failureThreshold: 3
+   readiness:
+     enabled: true
+     path: /health/ready
+     initialDelaySeconds: 5
+     periodSeconds: 10
+     timeoutSeconds: 3
+     failureThreshold: 3
+   startup:
+     enabled: true
+     path: /health/startup
+     initialDelaySeconds: 10
+     periodSeconds: 5
+     timeoutSeconds: 3
+     failureThreshold: 30
+
+ # ConfigMap data
+ config:
+   LOG_LEVEL: "info"
+   LOG_FORMAT: "json"
+   TELEMETRY_ENABLED: "true"
+   METRICS_ENABLED: "true"
+   TRACING_ENABLED: "true"
+   CACHE_TTL: "3600"
+   SESSION_TTL: "86400"
+   MAX_CONCURRENT_AGENTS: "10"
+   MAX_TOKENS_PER_REQUEST: "100000"
+   DEFAULT_MODEL: "claude-opus-4-5-20251101"
+
+ # Secrets (use external secrets in production)
+ secrets:
+   create: true
+   annotations: {}
+   # Anthropic API key
+   ANTHROPIC_API_KEY: ""
+   # Database connection string
+   DATABASE_URL: ""
+   # Redis connection string
+   REDIS_URL: ""
+   # RabbitMQ connection string
+   RABBITMQ_URL: ""
+   # JWT secret
+   JWT_SECRET: ""
+
+ # External Secrets Operator integration
+ externalSecrets:
+   enabled: false
+   secretStoreRef:
+     name: vault-backend
+     kind: SecretStore
+   target:
+     name: elsabro-secrets
+   data: []
+
+ # Network Policy
+ networkPolicy:
+   enabled: true
+   ingress:
+     - from:
+         - namespaceSelector:
+             matchLabels:
+               name: ingress-nginx
+         - podSelector:
+             matchLabels:
+               app.kubernetes.io/name: prometheus
+       ports:
+         - port: 8080
+           protocol: TCP
+         - port: 9090
+           protocol: TCP
+   egress:
+     - to:
+         - namespaceSelector: {}
+       ports:
+         - port: 443
+           protocol: TCP
+         - port: 5432
+           protocol: TCP
+         - port: 6379
+           protocol: TCP
+         - port: 5672
+           protocol: TCP
+
+ # Service Monitor for Prometheus
+ serviceMonitor:
+   enabled: true
+   namespace: ""
+   interval: 30s
+   scrapeTimeout: 10s
+   labels: {}
+   relabelings: []
+   metricRelabelings: []
+
+ # Prometheus Rules
+ prometheusRule:
+   enabled: true
+   namespace: ""
+   labels: {}
+   rules:
+     - alert: ElsabroHighErrorRate
+       expr: |
+         sum(rate(elsabro_requests_total{status=~"5.."}[5m]))
+           / sum(rate(elsabro_requests_total[5m])) > 0.05
+       for: 5m
+       labels:
+         severity: critical
+       annotations:
+         summary: "High error rate detected"
+         description: "Error rate is {{ $value | humanizePercentage }} (threshold: 5%)"
+     - alert: ElsabroHighLatency
+       expr: |
+         histogram_quantile(0.95, sum(rate(elsabro_request_duration_seconds_bucket[5m]))
+           by (le)) > 2
+       for: 5m
+       labels:
+         severity: warning
+       annotations:
+         summary: "High latency detected"
+         description: "P95 latency is {{ $value | humanizeDuration }} (threshold: 2s)"
+     - alert: ElsabroPodNotReady
+       expr: |
+         kube_pod_status_ready{namespace="elsabro", condition="true"} == 0
+       for: 5m
+       labels:
+         severity: critical
+       annotations:
+         summary: "Pod not ready"
+         description: "Pod {{ $labels.pod }} has been not ready for 5 minutes"
+
+ # Redis subchart configuration
+ redis:
+   enabled: true
+   architecture: standalone
+   auth:
+     enabled: true
+     password: ""
+   master:
+     persistence:
+       enabled: true
+       size: 8Gi
+     resources:
+       requests:
+         cpu: 100m
+         memory: 128Mi
+       limits:
+         cpu: 500m
+         memory: 512Mi
+
+ # PostgreSQL subchart configuration
+ postgresql:
+   enabled: true
+   auth:
+     username: elsabro
+     password: ""
+     database: elsabro
+   primary:
+     persistence:
+       enabled: true
+       size: 20Gi
+     resources:
+       requests:
+         cpu: 250m
+         memory: 256Mi
+       limits:
+         cpu: 1000m
+         memory: 1Gi
+
+ # RabbitMQ subchart configuration
+ rabbitmq:
+   enabled: true
+   auth:
+     username: elsabro
+     password: ""
+   persistence:
+     enabled: true
+     size: 8Gi
+   resources:
+     requests:
+       cpu: 100m
+       memory: 256Mi
+     limits:
+       cpu: 500m
+       memory: 512Mi
+ ```
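To see how the `autoscaling` targets above turn into replica counts: the HPA controller's documented formula is `desired = ceil(current * currentMetric / targetMetric)`, then clamped to `[minReplicas, maxReplicas]`. A sketch with illustrative values (the function is the editor's, not part of the chart or package):

```typescript
// Sketch of the HPA scaling decision for one metric.
function hpaDesiredReplicas(
  currentReplicas: number,
  currentMetric: number,   // e.g. observed average CPU utilization, 90 (%)
  targetMetric: number,    // e.g. targetCPUUtilizationPercentage, 70
  minReplicas: number,
  maxReplicas: number
): number {
  const desired = Math.ceil(currentReplicas * (currentMetric / targetMetric));
  // Clamp to the configured replica bounds
  return Math.min(maxReplicas, Math.max(minReplicas, desired));
}

// With the defaults above (min 2, max 10, target 70% CPU):
// 4 pods averaging 90% CPU → ceil(4 * 90/70) = 6 replicas
```

The `behavior` block then rate-limits how fast the controller may move toward that desired count (e.g. scale down at most 10% per minute).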
+
+ ### values-prod.yaml (Production Overrides)
+
+ ```yaml
+ # Production-specific values
+
+ replicaCount: 5
+
+ image:
+   pullPolicy: Always
+
+ resources:
+   requests:
+     cpu: 1000m
+     memory: 2Gi
+   limits:
+     cpu: 4000m
+     memory: 8Gi
+
+ autoscaling:
+   enabled: true
+   minReplicas: 5
+   maxReplicas: 50
+   targetCPUUtilizationPercentage: 60
+   targetMemoryUtilizationPercentage: 70
+
+ vpa:
+   enabled: true
+   updateMode: "Auto"
+
+ podDisruptionBudget:
+   enabled: true
+   minAvailable: 3
+
+ ingress:
+   enabled: true
+   className: "nginx"
+   annotations:
+     nginx.ingress.kubernetes.io/ssl-redirect: "true"
+     nginx.ingress.kubernetes.io/proxy-body-size: "100m"
+     nginx.ingress.kubernetes.io/rate-limit: "100"
+     nginx.ingress.kubernetes.io/rate-limit-window: "1m"
+   hosts:
+     - host: api.elsabro.io
+       paths:
+         - path: /
+           pathType: Prefix
+   tls:
+     - secretName: elsabro-prod-tls
+       hosts:
+         - api.elsabro.io
+
+ config:
+   LOG_LEVEL: "warn"
+   MAX_CONCURRENT_AGENTS: "50"
+   MAX_TOKENS_PER_REQUEST: "200000"
+
+ externalSecrets:
+   enabled: true
+   secretStoreRef:
+     name: aws-secrets-manager
+     kind: ClusterSecretStore
+
+ redis:
+   architecture: replication
+   replica:
+     replicaCount: 3
+   master:
+     persistence:
+       size: 32Gi
+     resources:
+       requests:
+         cpu: 500m
+         memory: 1Gi
+       limits:
+         cpu: 2000m
+         memory: 4Gi
+
+ postgresql:
+   architecture: replication
+   readReplicas:
+     replicaCount: 2
+   primary:
+     persistence:
+       size: 100Gi
+     resources:
+       requests:
+         cpu: 1000m
+         memory: 2Gi
+       limits:
+         cpu: 4000m
+         memory: 8Gi
+
+ rabbitmq:
+   replicaCount: 3
+   clustering:
+     enabled: true
+   persistence:
+     size: 32Gi
+   resources:
+     requests:
+       cpu: 500m
+       memory: 1Gi
+     limits:
+       cpu: 2000m
+       memory: 4Gi
+ ```
+
+ ### HelmChartGenerator TypeScript Interface
+
+ ```typescript
+ /**
+  * HelmChartGenerator - Generates Helm charts for ELSABRO deployments
+  */
+
+ import * as path from 'path';
+ import { execFile } from 'child_process';
+ import { promisify } from 'util';
+
+ const execFileAsync = promisify(execFile);
+
+ interface HelmChartConfig {
+   name: string;
+   version: string;
+   appVersion: string;
+   description: string;
+   dependencies: ChartDependency[];
+ }
+
+ interface ChartDependency {
+   name: string;
+   version: string;
+   repository: string;
+   condition: string;
+ }
+
+ interface ValuesConfig {
+   environment: 'dev' | 'staging' | 'prod';
+   replicas: number;
+   image: ImageConfig;
+   resources: ResourceConfig;
+   autoscaling: AutoscalingConfig;
+   ingress: IngressConfig;
+   config: Record<string, string>;
+   secrets: Record<string, string>;
+   redis: RedisConfig;
+   postgresql: PostgreSQLConfig;
+   rabbitmq: RabbitMQConfig;
+ }
+
+ class HelmChartGenerator {
+   private chartDir: string;
+
+   constructor(outputDir: string) {
+     this.chartDir = path.join(outputDir, 'elsabro-chart');
+   }
+
+   // Generate complete Helm chart
+   async generate(config: HelmChartConfig): Promise<void> {
+     await this.createDirectoryStructure();
+     await this.generateChartYaml(config);
+     await this.generateValuesFiles();
+     await this.generateTemplates();
+     await this.generateHelpers();
+   }
+
+   // Generate Chart.yaml
+   private async generateChartYaml(config: HelmChartConfig): Promise<void> {
+     const chartYaml = {
+       apiVersion: 'v2',
+       name: config.name,
+       description: config.description,
+       type: 'application',
+       version: config.version,
+       appVersion: config.appVersion,
+       kubeVersion: '>=1.25.0-0',
+       dependencies: config.dependencies
+     };
+
+     await this.writeYaml(
+       path.join(this.chartDir, 'Chart.yaml'),
+       chartYaml
+     );
+   }
+
+   // Generate environment-specific values
+   async generateValuesForEnvironment(
+     environment: 'dev' | 'staging' | 'prod',
+     customValues?: Partial<ValuesConfig>
+   ): Promise<string> {
+     const baseValues = this.getBaseValues();
+     const envOverrides = this.getEnvironmentOverrides(environment);
+
+     const values = deepMerge(baseValues, envOverrides, customValues || {});
+
+     const outputPath = path.join(
+       this.chartDir,
+       `values-${environment}.yaml`
+     );
+
+     await this.writeYaml(outputPath, values);
+     return outputPath;
+   }
+
+   // Generate Kubernetes manifests from templates
+   async template(
+     releaseName: string,
+     namespace: string,
+     valuesFile: string
+   ): Promise<string> {
+     // Use execFile for safe command execution
+     const { stdout } = await execFileAsync('helm', [
+       'template',
+       releaseName,
+       this.chartDir,
+       '--namespace', namespace,
+       '--values', valuesFile
+     ]);
+     return stdout;
+   }
+
+   // Validate chart; the promisified execFile rejects on a non-zero
+   // exit code, so lint failures surface in the catch branch
+   async lint(): Promise<LintResult> {
+     try {
+       const { stdout } = await execFileAsync('helm', ['lint', this.chartDir]);
+       return { success: true, output: stdout, errors: [] };
+     } catch (error: any) {
+       return {
+         success: false,
+         output: error.stdout ?? '',
+         errors: this.parseLintErrors(error.stderr ?? String(error))
+       };
+     }
+   }
+
+   // Package chart
+   async package(destination: string): Promise<string> {
+     const { stdout } = await execFileAsync('helm', [
+       'package',
+       this.chartDir,
+       '--destination', destination
+     ]);
+     const match = stdout.match(/Successfully packaged chart and saved it to: (.+)/);
+     return match ? match[1] : '';
+   }
+
+   // Push to registry
+   async push(packagePath: string, registry: string): Promise<void> {
+     await execFileAsync('helm', ['push', packagePath, registry]);
+   }
+
+   private getEnvironmentOverrides(env: 'dev' | 'staging' | 'prod'): Partial<ValuesConfig> {
+     const overrides: Record<string, Partial<ValuesConfig>> = {
+       dev: {
+         replicas: 1,
+         resources: {
+           requests: { cpu: '250m', memory: '256Mi' },
+           limits: { cpu: '1000m', memory: '1Gi' }
+         },
+         autoscaling: { enabled: false },
+         ingress: { enabled: false }
+       },
+       staging: {
+         replicas: 2,
+         resources: {
+           requests: { cpu: '500m', memory: '512Mi' },
+           limits: { cpu: '2000m', memory: '2Gi' }
+         },
+         autoscaling: { enabled: true, minReplicas: 2, maxReplicas: 5 }
+       },
+       prod: {
+         replicas: 5,
+         resources: {
+           requests: { cpu: '1000m', memory: '2Gi' },
+           limits: { cpu: '4000m', memory: '8Gi' }
+         },
+         autoscaling: { enabled: true, minReplicas: 5, maxReplicas: 50 }
+       }
+     };
+
+     return overrides[env];
+   }
+ }
+ ```
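`generateValuesForEnvironment` relies on a `deepMerge` helper that is not defined in this reference. A minimal sketch of what that helper could look like — the recursive-merge, later-sources-win semantics are an assumption inferred from the call site, not the package's actual implementation:

```typescript
// Minimal deep-merge sketch: later sources override earlier ones;
// plain objects are merged recursively, everything else is replaced.
type PlainObject = Record<string, unknown>;

function isPlainObject(value: unknown): value is PlainObject {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

function deepMerge(...sources: PlainObject[]): PlainObject {
  const result: PlainObject = {};
  for (const source of sources) {
    for (const [key, value] of Object.entries(source)) {
      if (isPlainObject(value) && isPlainObject(result[key])) {
        // Both sides are objects: merge one level deeper
        result[key] = deepMerge(result[key] as PlainObject, value);
      } else {
        // Scalar, array, or new key: later source wins outright
        result[key] = value;
      }
    }
  }
  return result;
}
```

With this shape, environment overrides replace individual resource fields without clobbering sibling keys from the base values.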
1240
+
1241
+ ---
1242
+
1243
+ ## 3. ResourceScaler
1244
+
1245
+ ### TypeScript Interfaces
1246
+
1247
+ ```typescript
1248
+ /**
1249
+ * ResourceScaler - Manages HPA, VPA, and custom metrics scaling
1250
+ */
1251
+
1252
+ // HPA Configuration
1253
+ interface HPAConfig {
1254
+ name: string;
1255
+ namespace: string;
1256
+ targetRef: {
1257
+ apiVersion: string;
1258
+ kind: string;
1259
+ name: string;
1260
+ };
1261
+ minReplicas: number;
1262
+ maxReplicas: number;
1263
+ metrics: HPAMetric[];
1264
+ behavior?: HPABehavior;
1265
+ }
1266
+
1267
+ interface HPAMetric {
1268
+ type: 'Resource' | 'Pods' | 'Object' | 'External';
1269
+ resource?: {
1270
+ name: 'cpu' | 'memory';
1271
+ target: {
1272
+ type: 'Utilization' | 'AverageValue';
1273
+ averageUtilization?: number;
1274
+ averageValue?: string;
1275
+ };
1276
+ };
1277
+ pods?: {
1278
+ metric: { name: string; selector?: LabelSelector };
1279
+ target: { type: 'AverageValue'; averageValue: string };
1280
+ };
1281
+ object?: {
1282
+ describedObject: { apiVersion: string; kind: string; name: string };
1283
+ metric: { name: string };
1284
+ target: { type: 'Value' | 'AverageValue'; value?: string; averageValue?: string };
1285
+ };
1286
+ external?: {
1287
+ metric: { name: string; selector?: LabelSelector };
1288
+ target: { type: 'Value' | 'AverageValue'; value?: string; averageValue?: string };
1289
+ };
1290
+ }
1291
+
1292
+ interface HPABehavior {
1293
+ scaleDown?: ScalingPolicy;
1294
+ scaleUp?: ScalingPolicy;
1295
+ }
1296
+
1297
+ interface ScalingPolicy {
1298
+ stabilizationWindowSeconds?: number;
1299
+ selectPolicy?: 'Max' | 'Min' | 'Disabled';
1300
+ policies?: {
1301
+ type: 'Pods' | 'Percent';
1302
+ value: number;
1303
+ periodSeconds: number;
1304
+ }[];
1305
+ }
1306
+
1307
+ // VPA Configuration
1308
+ interface VPAConfig {
1309
+ name: string;
1310
+ namespace: string;
1311
+ targetRef: {
1312
+ apiVersion: string;
1313
+ kind: string;
1314
+ name: string;
1315
+ };
1316
+ updatePolicy: {
1317
+ updateMode: 'Off' | 'Initial' | 'Recreate' | 'Auto';
1318
+ minReplicas?: number;
1319
+ };
1320
+ resourcePolicy?: {
1321
+ containerPolicies: ContainerResourcePolicy[];
1322
+ };
1323
+ }
1324
+
1325
+ interface ContainerResourcePolicy {
1326
+ containerName: string;
1327
+ mode?: 'Auto' | 'Off';
1328
+ minAllowed?: { cpu?: string; memory?: string };
1329
+ maxAllowed?: { cpu?: string; memory?: string };
1330
+ controlledResources?: ('cpu' | 'memory')[];
1331
+ controlledValues?: 'RequestsAndLimits' | 'RequestsOnly';
1332
+ }
1333
+
1334
+ // Resource Quota Configuration
1335
+ interface ResourceQuotaConfig {
1336
+ name: string;
1337
+ namespace: string;
1338
+ hard: {
1339
+ 'requests.cpu'?: string;
1340
+ 'requests.memory'?: string;
1341
+ 'limits.cpu'?: string;
1342
+ 'limits.memory'?: string;
1343
+ 'pods'?: string;
1344
+ 'services'?: string;
1345
+ 'secrets'?: string;
1346
+ 'configmaps'?: string;
1347
+ 'persistentvolumeclaims'?: string;
1348
+ 'requests.storage'?: string;
1349
+ };
1350
+ scopeSelector?: {
1351
+ matchExpressions: {
1352
+ operator: 'In' | 'NotIn' | 'Exists' | 'DoesNotExist';
1353
+ scopeName: 'Terminating' | 'NotTerminating' | 'BestEffort' | 'NotBestEffort' | 'PriorityClass';
1354
+ values?: string[];
1355
+ }[];
1356
+ };
1357
+ }
1358
+
1359
+ // Custom Metrics for ELSABRO
1360
+ interface ElsabroCustomMetrics {
1361
+ // Queue length - number of pending tasks
1362
+ queueLength: {
1363
+ name: 'elsabro_queue_length';
1364
+ targetAverageValue: number;
1365
+ scaleThreshold: number;
1366
+ };
1367
+ // Active agents - number of currently running agents
1368
+ activeAgents: {
1369
+ name: 'elsabro_active_agents';
1370
+ targetAverageValue: number;
1371
+ maxPerPod: number;
1372
+ };
1373
+ // Token usage rate - tokens per second
1374
+ tokenUsageRate: {
1375
+ name: 'elsabro_tokens_per_second';
1376
+ targetAverageValue: number;
1377
+ };
1378
+ // Memory pressure - percentage of allocated memory
1379
+ memoryPressure: {
1380
+ name: 'elsabro_memory_pressure';
1381
+ targetPercentage: number;
1382
+ };
1383
+ }
1384
+
1385
+ class ResourceScaler {
1386
+ private k8sClient: KubernetesClient;
1387
+ private prometheusClient: PrometheusClient;
1388
+
1389
+ constructor(config: ScalerConfig) {
1390
+ this.k8sClient = new KubernetesClient(config.kubeConfig);
1391
+ this.prometheusClient = new PrometheusClient(config.prometheusUrl);
1392
+ }
1393
+
1394
+ // Create or update HPA
1395
+ async configureHPA(config: HPAConfig): Promise<V2HorizontalPodAutoscaler> {
1396
+ const hpa: V2HorizontalPodAutoscaler = {
1397
+ apiVersion: 'autoscaling/v2',
1398
+ kind: 'HorizontalPodAutoscaler',
1399
+ metadata: {
1400
+ name: config.name,
1401
+ namespace: config.namespace,
1402
+ labels: {
1403
+ 'app.kubernetes.io/name': 'elsabro',
1404
+ 'app.kubernetes.io/component': 'autoscaler'
1405
+ }
1406
+ },
1407
+ spec: {
1408
+ scaleTargetRef: config.targetRef,
1409
+ minReplicas: config.minReplicas,
1410
+ maxReplicas: config.maxReplicas,
1411
+ metrics: config.metrics,
1412
+ behavior: config.behavior
1413
+ }
1414
+ };
1415
+
1416
+ return this.k8sClient.applyResource(hpa);
1417
+ }
1418
+
1419
+ // Create or update VPA
1420
+ async configureVPA(config: VPAConfig): Promise<VerticalPodAutoscaler> {
1421
+ const vpa: VerticalPodAutoscaler = {
1422
+ apiVersion: 'autoscaling.k8s.io/v1',
1423
+ kind: 'VerticalPodAutoscaler',
1424
+ metadata: {
1425
+ name: config.name,
1426
+ namespace: config.namespace
1427
+ },
1428
+ spec: {
1429
+ targetRef: config.targetRef,
1430
+ updatePolicy: config.updatePolicy,
1431
+ resourcePolicy: config.resourcePolicy
1432
+ }
1433
+ };
1434
+
1435
+ return this.k8sClient.applyResource(vpa);
1436
+ }
1437
+
1438
+ // Create resource quota
1439
+ async createResourceQuota(config: ResourceQuotaConfig): Promise<V1ResourceQuota> {
1440
+ const quota: V1ResourceQuota = {
1441
+ apiVersion: 'v1',
1442
+ kind: 'ResourceQuota',
1443
+ metadata: {
1444
+ name: config.name,
1445
+ namespace: config.namespace
1446
+ },
1447
+ spec: {
1448
+ hard: config.hard,
1449
+ scopeSelector: config.scopeSelector
1450
+ }
1451
+ };
1452
+
1453
+ return this.k8sClient.applyResource(quota);
1454
+ }
1455
+
1456
+ // Configure ELSABRO-specific scaling
1457
+ async configureElsabroScaling(
1458
+ namespace: string,
1459
+ deploymentName: string,
1460
+ customMetrics: Partial<ElsabroCustomMetrics>
1461
+ ): Promise<ScalingConfiguration> {
1462
+ // Build HPA metrics array
1463
+ const metrics: HPAMetric[] = [
1464
+ // CPU utilization (standard)
1465
+ {
1466
+ type: 'Resource',
1467
+ resource: {
1468
+ name: 'cpu',
1469
+ target: { type: 'Utilization', averageUtilization: 70 }
1470
+ }
1471
+ },
1472
+ // Memory utilization (standard)
1473
+ {
1474
+ type: 'Resource',
1475
+ resource: {
1476
+ name: 'memory',
1477
+ target: { type: 'Utilization', averageUtilization: 80 }
1478
+ }
1479
+ }
1480
+ ];
1481
+
1482
+ // Add custom metrics
1483
+ if (customMetrics.queueLength) {
1484
+ metrics.push({
1485
+ type: 'Pods',
1486
+ pods: {
1487
+ metric: { name: customMetrics.queueLength.name },
1488
+ target: {
1489
+ type: 'AverageValue',
1490
+ averageValue: String(customMetrics.queueLength.targetAverageValue)
1491
+ }
1492
+ }
1493
+ });
1494
+ }
1495
+
1496
+ if (customMetrics.activeAgents) {
1497
+ metrics.push({
1498
+ type: 'Pods',
1499
+ pods: {
1500
+ metric: { name: customMetrics.activeAgents.name },
1501
+ target: {
1502
+ type: 'AverageValue',
1503
+ averageValue: String(customMetrics.activeAgents.targetAverageValue)
1504
+ }
1505
+ }
1506
+ });
1507
+ }
1508
+
1509
+ // Configure HPA
1510
+ const hpa = await this.configureHPA({
1511
+ name: `${deploymentName}-hpa`,
1512
+ namespace,
1513
+ targetRef: {
1514
+ apiVersion: 'apps/v1',
1515
+ kind: 'Deployment',
1516
+ name: deploymentName
1517
+ },
1518
+ minReplicas: 2,
1519
+ maxReplicas: 20,
1520
+ metrics,
1521
+ behavior: {
1522
+ scaleDown: {
1523
+ stabilizationWindowSeconds: 300,
1524
+ policies: [
1525
+ { type: 'Percent', value: 10, periodSeconds: 60 }
1526
+ ]
1527
+ },
1528
+ scaleUp: {
1529
+ stabilizationWindowSeconds: 0,
1530
+ policies: [
1531
+ { type: 'Percent', value: 100, periodSeconds: 15 },
1532
+ { type: 'Pods', value: 4, periodSeconds: 15 }
1533
+ ],
1534
+ selectPolicy: 'Max'
1535
+ }
1536
+ }
1537
+ });
1538
+
1539
+ // Configure VPA (in recommendation mode)
1540
+ const vpa = await this.configureVPA({
1541
+ name: `${deploymentName}-vpa`,
1542
+ namespace,
1543
+ targetRef: {
1544
+ apiVersion: 'apps/v1',
1545
+ kind: 'Deployment',
1546
+ name: deploymentName
1547
+ },
1548
+ updatePolicy: {
1549
+ updateMode: 'Auto',
1550
+ minReplicas: 2
1551
+ },
1552
+ resourcePolicy: {
1553
+ containerPolicies: [{
1554
+ containerName: 'elsabro',
1555
+ minAllowed: { cpu: '250m', memory: '256Mi' },
1556
+ maxAllowed: { cpu: '4000m', memory: '8Gi' },
1557
+ controlledResources: ['cpu', 'memory'],
1558
+ controlledValues: 'RequestsAndLimits'
1559
+ }]
1560
+ }
1561
+ });
1562
+
1563
+ return { hpa, vpa };
1564
+ }
1565
+
1566
+ // Get current scaling metrics
1567
+ async getScalingMetrics(
1568
+ namespace: string,
1569
+ deploymentName: string
1570
+ ): Promise<ScalingMetrics> {
1571
+ const hpaStatus = await this.k8sClient.getHPAStatus(
1572
+ `${deploymentName}-hpa`,
1573
+ namespace
1574
+ );
1575
+
1576
+ const prometheusMetrics = await this.prometheusClient.query(`
1577
+ {
1578
+ queue_length: avg(elsabro_queue_length{namespace="${namespace}"}),
1579
+ active_agents: sum(elsabro_active_agents{namespace="${namespace}"}),
1580
+ tokens_per_second: rate(elsabro_tokens_total{namespace="${namespace}"}[5m]),
1581
+ cpu_usage: avg(container_cpu_usage_seconds_total{namespace="${namespace}", container="elsabro"}),
1582
+ memory_usage: avg(container_memory_usage_bytes{namespace="${namespace}", container="elsabro"})
1583
+ }
1584
+ `);
1585
+
1586
+ return {
1587
+ currentReplicas: hpaStatus.currentReplicas,
1588
+ desiredReplicas: hpaStatus.desiredReplicas,
1589
+ metrics: {
1590
+ cpu: prometheusMetrics.cpu_usage,
1591
+ memory: prometheusMetrics.memory_usage,
1592
+ queueLength: prometheusMetrics.queue_length,
1593
+ activeAgents: prometheusMetrics.active_agents,
1594
+ tokensPerSecond: prometheusMetrics.tokens_per_second
1595
+ },
1596
+ recommendations: this.generateRecommendations(prometheusMetrics)
1597
+ };
1598
+ }
1599
+
1600
+ // Manual scaling
1601
+ async scale(
1602
+ namespace: string,
1603
+ deploymentName: string,
1604
+ replicas: number
1605
+ ): Promise<void> {
1606
+ await this.k8sClient.scaleDeployment(namespace, deploymentName, replicas);
1607
+ }
1608
+
1609
+ private generateRecommendations(metrics: any): ScalingRecommendation[] {
1610
+ const recommendations: ScalingRecommendation[] = [];
1611
+
1612
+ if (metrics.queue_length > 500) {
1613
+ recommendations.push({
1614
+ type: 'scale_up',
1615
+ reason: 'High queue length',
1616
+ suggestedReplicas: Math.ceil(metrics.queue_length / 100)
1617
+ });
1618
+ }
1619
+
1620
+ if (metrics.active_agents > 40) {
1621
+ recommendations.push({
1622
+ type: 'scale_up',
1623
+ reason: 'High agent concurrency',
1624
+ suggestedReplicas: Math.ceil(metrics.active_agents / 5)
1625
+ });
1626
+ }
1627
+
1628
+ return recommendations;
1629
+ }
1630
+ }
1631
+ ```
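The recommendation thresholds above (queue depth over 500 suggests one replica per 100 queued tasks; more than 40 concurrent agents suggests one replica per 5 agents) can be exercised in isolation. A standalone sketch of the same logic, extracted from the private method for illustration:

```typescript
interface ScalingRecommendation {
  type: 'scale_up' | 'scale_down';
  reason: string;
  suggestedReplicas: number;
}

// Mirrors ResourceScaler.generateRecommendations: a queue deeper than 500
// suggests ceil(queue / 100) replicas; more than 40 active agents
// suggests ceil(agents / 5) replicas.
function generateRecommendations(
  metrics: { queue_length: number; active_agents: number }
): ScalingRecommendation[] {
  const recommendations: ScalingRecommendation[] = [];

  if (metrics.queue_length > 500) {
    recommendations.push({
      type: 'scale_up',
      reason: 'High queue length',
      suggestedReplicas: Math.ceil(metrics.queue_length / 100)
    });
  }

  if (metrics.active_agents > 40) {
    recommendations.push({
      type: 'scale_up',
      reason: 'High agent concurrency',
      suggestedReplicas: Math.ceil(metrics.active_agents / 5)
    });
  }

  return recommendations;
}
```

For example, a queue of 750 pending tasks with only 10 active agents yields a single scale-up recommendation of 8 replicas.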
+
+ ### HPA YAML Example
+
+ ```yaml
+ apiVersion: autoscaling/v2
+ kind: HorizontalPodAutoscaler
+ metadata:
+   name: elsabro-hpa
+   namespace: elsabro
+   labels:
+     app.kubernetes.io/name: elsabro
+     app.kubernetes.io/component: autoscaler
+ spec:
+   scaleTargetRef:
+     apiVersion: apps/v1
+     kind: Deployment
+     name: elsabro
+   minReplicas: 2
+   maxReplicas: 20
+   metrics:
+     # CPU utilization
+     - type: Resource
+       resource:
+         name: cpu
+         target:
+           type: Utilization
+           averageUtilization: 70
+     # Memory utilization
+     - type: Resource
+       resource:
+         name: memory
+         target:
+           type: Utilization
+           averageUtilization: 80
+     # Custom metric: Queue length
+     - type: Pods
+       pods:
+         metric:
+           name: elsabro_queue_length
+         target:
+           type: AverageValue
+           averageValue: "100"
+     # Custom metric: Active agents
+     - type: Pods
+       pods:
+         metric:
+           name: elsabro_active_agents
+         target:
+           type: AverageValue
+           averageValue: "5"
+   behavior:
+     scaleDown:
+       stabilizationWindowSeconds: 300
+       policies:
+         - type: Percent
+           value: 10
+           periodSeconds: 60
+     scaleUp:
+       stabilizationWindowSeconds: 0
+       policies:
+         - type: Percent
+           value: 100
+           periodSeconds: 15
+         - type: Pods
+           value: 4
+           periodSeconds: 15
+       selectPolicy: Max
+ ```
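Under the `scaleUp` block above, each policy computes an allowed change per 15-second period (100% of current replicas, or 4 pods) and `selectPolicy: Max` takes the larger one. A small illustrative helper for that per-period bound (this function is a sketch for reasoning about the config, not part of any HPA API):

```typescript
// Replicas the HPA may add in one period under the scaleUp policies above:
// max(percentValue% of current replicas, podsValue), since selectPolicy is Max.
function maxScaleUpStep(
  currentReplicas: number,
  percentValue: number = 100,
  podsValue: number = 4
): number {
  const byPercent = Math.ceil((currentReplicas * percentValue) / 100);
  return Math.max(byPercent, podsValue);
}
```

At 2 replicas the pods policy dominates (up to 4 more pods per period); at 10 replicas the percent policy does (up to 10 more), so small deployments ramp up in fixed steps while large ones can double.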
1700
+
1701
+ ---
1702
+
1703
+ ## 4. HealthMonitor
1704
+
1705
+ ### TypeScript Interfaces
1706
+
1707
+ ```typescript
1708
+ /**
1709
+ * HealthMonitor - Health checks, metrics, and alerting for ELSABRO
1710
+ */
1711
+
1712
+ // Health check types
1713
+ interface HealthCheckResult {
1714
+ status: 'healthy' | 'degraded' | 'unhealthy';
1715
+ timestamp: string;
1716
+ checks: {
1717
+ [key: string]: {
1718
+ status: 'pass' | 'warn' | 'fail';
1719
+ message?: string;
1720
+ duration_ms?: number;
1721
+ data?: Record<string, any>;
1722
+ };
1723
+ };
1724
+ version: string;
1725
+ uptime_seconds: number;
1726
+ }
1727
+
1728
+ // Probe configuration
1729
+ interface ProbeConfig {
1730
+ liveness: {
1731
+ path: string;
1732
+ port: number;
1733
+ initialDelaySeconds: number;
1734
+ periodSeconds: number;
1735
+ timeoutSeconds: number;
1736
+ failureThreshold: number;
1737
+ successThreshold: number;
1738
+ };
1739
+ readiness: {
1740
+ path: string;
1741
+ port: number;
1742
+ initialDelaySeconds: number;
1743
+ periodSeconds: number;
1744
+ timeoutSeconds: number;
1745
+ failureThreshold: number;
1746
+ successThreshold: number;
1747
+ };
1748
+ startup: {
1749
+ path: string;
1750
+ port: number;
1751
+ initialDelaySeconds: number;
1752
+ periodSeconds: number;
1753
+ timeoutSeconds: number;
1754
+ failureThreshold: number;
1755
+ };
1756
+ }
1757
+
1758
+ // Metrics endpoint
1759
+ interface MetricsConfig {
1760
+ enabled: boolean;
1761
+ port: number;
1762
+ path: string;
1763
+ namespace: string;
1764
+ labels: Record<string, string>;
1765
+ histogramBuckets: {
1766
+ requestDuration: number[];
1767
+ tokenUsage: number[];
1768
+ agentDuration: number[];
1769
+ };
1770
+ }
1771
+
1772
+ // Alert rule
1773
+ interface AlertRule {
1774
+ name: string;
1775
+ expression: string;
1776
+ duration: string;
1777
+ severity: 'critical' | 'warning' | 'info';
1778
+ summary: string;
1779
+ description: string;
1780
+ runbook_url?: string;
1781
+ labels?: Record<string, string>;
1782
+ annotations?: Record<string, string>;
1783
+ }
1784
+
1785
+ class HealthMonitor {
1786
+ private startTime: Date;
1787
+ private checks: Map<string, HealthCheck>;
1788
+ private metricsRegistry: MetricsRegistry;
1789
+
1790
+ constructor(config: HealthMonitorConfig) {
1791
+ this.startTime = new Date();
1792
+ this.checks = new Map();
1793
+ this.metricsRegistry = new MetricsRegistry(config.metrics);
1794
+ this.registerDefaultChecks();
1795
+ this.registerDefaultMetrics();
1796
+ }
1797
+
1798
+ // Register default health checks
1799
+ private registerDefaultChecks(): void {
1800
+ // Database connection
1801
+ this.registerCheck('database', async () => {
1802
+ const start = Date.now();
1803
+ try {
1804
+ await this.db.query('SELECT 1');
1805
+ return {
1806
+ status: 'pass',
1807
+ duration_ms: Date.now() - start
1808
+ };
1809
+ } catch (error) {
1810
+ return {
1811
+ status: 'fail',
1812
+ message: error.message,
1813
+ duration_ms: Date.now() - start
1814
+ };
1815
+ }
1816
+ });
1817
+
1818
+ // Redis connection
1819
+ this.registerCheck('redis', async () => {
1820
+ const start = Date.now();
1821
+ try {
1822
+ await this.redis.ping();
1823
+ return {
1824
+ status: 'pass',
1825
+ duration_ms: Date.now() - start
1826
+ };
1827
+ } catch (error) {
1828
+ return {
1829
+ status: 'fail',
1830
+ message: error.message,
1831
+ duration_ms: Date.now() - start
1832
+ };
1833
+ }
1834
+ });
1835
+
1836
+ // RabbitMQ connection
1837
+ this.registerCheck('rabbitmq', async () => {
1838
+ const start = Date.now();
1839
+ try {
1840
+ const channel = await this.amqp.checkQueue('elsabro_tasks');
1841
+ return {
1842
+ status: 'pass',
1843
+ duration_ms: Date.now() - start,
1844
+ data: { messageCount: channel.messageCount }
1845
+ };
1846
+ } catch (error) {
1847
+ return {
1848
+ status: 'fail',
1849
+ message: error.message,
1850
+ duration_ms: Date.now() - start
1851
+ };
1852
+ }
1853
+ });
1854
+
1855
+ // Memory usage
1856
+ this.registerCheck('memory', async () => {
1857
+ const usage = process.memoryUsage();
1858
+ const heapUsedPercent = (usage.heapUsed / usage.heapTotal) * 100;
1859
+ return {
1860
+ status: heapUsedPercent > 90 ? 'warn' : 'pass',
1861
+ data: {
1862
+ heapUsed: usage.heapUsed,
1863
+ heapTotal: usage.heapTotal,
1864
+ external: usage.external,
1865
+ rss: usage.rss
1866
+ }
1867
+ };
1868
+ });
1869
+
1870
+ // Agent pool
1871
+ this.registerCheck('agent_pool', async () => {
1872
+ const pool = await this.agentManager.getPoolStatus();
1873
+ return {
1874
+ status: pool.available > 0 ? 'pass' : 'warn',
1875
+ data: {
1876
+ available: pool.available,
1877
+ active: pool.active,
1878
+ queued: pool.queued
1879
+ }
1880
+ };
1881
+ });
1882
+ }
1883
+
1884
+ // Register default metrics
1885
+ private registerDefaultMetrics(): void {
1886
+ // Request counter
1887
+ this.metricsRegistry.registerCounter({
1888
+ name: 'elsabro_requests_total',
1889
+ help: 'Total number of requests',
1890
+ labelNames: ['method', 'path', 'status']
1891
+ });
1892
+
1893
+ // Request duration histogram
1894
+ this.metricsRegistry.registerHistogram({
1895
+ name: 'elsabro_request_duration_seconds',
1896
+ help: 'Request duration in seconds',
1897
+ labelNames: ['method', 'path'],
1898
+ buckets: [0.01, 0.05, 0.1, 0.5, 1, 2, 5, 10, 30, 60]
1899
+ });
1900
+
1901
+ // Agent invocations
1902
+ this.metricsRegistry.registerCounter({
1903
+ name: 'elsabro_agent_invocations_total',
1904
+ help: 'Total agent invocations',
1905
+ labelNames: ['agent', 'model', 'status']
1906
+ });
1907
+
1908
+ // Agent duration
1909
+ this.metricsRegistry.registerHistogram({
1910
+ name: 'elsabro_agent_duration_seconds',
1911
+ help: 'Agent execution duration',
1912
+ labelNames: ['agent', 'model'],
1913
+ buckets: [1, 5, 10, 30, 60, 120, 300, 600]
1914
+ });
1915
+
1916
+ // Token usage
1917
+ this.metricsRegistry.registerCounter({
1918
+ name: 'elsabro_tokens_total',
1919
+ help: 'Total tokens used',
1920
+ labelNames: ['model', 'type']
1921
+ });
1922
+
1923
+ // Queue length gauge
1924
+ this.metricsRegistry.registerGauge({
1925
+ name: 'elsabro_queue_length',
1926
+ help: 'Current queue length'
1927
+ });
1928
+
1929
+ // Active agents gauge
1930
+ this.metricsRegistry.registerGauge({
1931
+ name: 'elsabro_active_agents',
1932
+ help: 'Currently active agents'
1933
+ });
1934
+
1935
+ // Flow executions
1936
+ this.metricsRegistry.registerCounter({
1937
+ name: 'elsabro_flow_executions_total',
1938
+ help: 'Total flow executions',
1939
+ labelNames: ['flow', 'status']
1940
+ });
1941
+
1942
+ // Checkpoint counter
1943
+ this.metricsRegistry.registerCounter({
1944
+ name: 'elsabro_checkpoints_total',
1945
+ help: 'Total checkpoints saved',
1946
+ labelNames: ['type']
1947
+ });
1948
+
1949
+ // Error counter
1950
+ this.metricsRegistry.registerCounter({
1951
+ name: 'elsabro_errors_total',
1952
+ help: 'Total errors',
1953
+ labelNames: ['type', 'severity']
1954
+ });
1955
+ }
1956
+
1957
+ // Liveness endpoint handler
1958
+ async handleLiveness(req: Request, res: Response): Promise<void> {
1959
+ // Liveness just checks if the process is running
1960
+ res.status(200).json({
1961
+ status: 'ok',
1962
+ timestamp: new Date().toISOString()
1963
+ });
1964
+ }
1965
+
1966
+ // Readiness endpoint handler
1967
+ async handleReadiness(req: Request, res: Response): Promise<void> {
1968
+ const result = await this.runChecks(['database', 'redis']);
1969
+ const status = result.status === 'healthy' ? 200 : 503;
1970
+ res.status(status).json(result);
1971
+ }
1972
+
1973
+ // Startup endpoint handler
1974
+ async handleStartup(req: Request, res: Response): Promise<void> {
1975
+ const result = await this.runChecks(['database', 'redis', 'rabbitmq']);
1976
+ const status = result.status === 'healthy' ? 200 : 503;
1977
+ res.status(status).json(result);
1978
+ }
1979
+
1980
+ // Full health endpoint
1981
+ async handleHealth(req: Request, res: Response): Promise<void> {
1982
+ const result = await this.runAllChecks();
1983
+ const status = result.status === 'healthy' ? 200 :
1984
+ result.status === 'degraded' ? 200 : 503;
1985
+ res.status(status).json(result);
1986
+ }
1987
+
1988
+ // Metrics endpoint
1989
+ async handleMetrics(req: Request, res: Response): Promise<void> {
1990
+ res.set('Content-Type', this.metricsRegistry.contentType);
1991
+ res.end(await this.metricsRegistry.metrics());
1992
+ }
1993
+
1994
+ // Run specific checks
1995
+ async runChecks(checkNames: string[]): Promise<HealthCheckResult> {
1996
+ const checks: HealthCheckResult['checks'] = {};
1997
+ let hasFailure = false;
1998
+ let hasWarning = false;
1999
+
2000
+ for (const name of checkNames) {
2001
+ const check = this.checks.get(name);
2002
+ if (check) {
2003
+ checks[name] = await check();
2004
+ if (checks[name].status === 'fail') hasFailure = true;
2005
+ if (checks[name].status === 'warn') hasWarning = true;
2006
+ }
2007
+ }
2008
+
2009
+ return {
2010
+ status: hasFailure ? 'unhealthy' : hasWarning ? 'degraded' : 'healthy',
2011
+ timestamp: new Date().toISOString(),
2012
+ checks,
2013
+ version: process.env.APP_VERSION || '3.6.0',
2014
+ uptime_seconds: (Date.now() - this.startTime.getTime()) / 1000
2015
+ };
2016
+ }
2017
+
2018
+ // Register custom check
2019
+ registerCheck(name: string, check: () => Promise<HealthCheckResult['checks'][string]>): void {
2020
+ this.checks.set(name, check);
2021
+ }
2022
+
2023
+ // Get alerting rules
2024
+ getAlertingRules(): AlertRule[] {
2025
+ return [
2026
+ {
2027
+ name: 'ElsabroHighErrorRate',
2028
+ expression: `
2029
+ sum(rate(elsabro_errors_total{severity="critical"}[5m]))
2030
+ / sum(rate(elsabro_requests_total[5m])) > 0.05
2031
+ `,
2032
+ duration: '5m',
2033
+ severity: 'critical',
2034
+ summary: 'ELSABRO high error rate',
2035
+ description: 'Error rate is {{ $value | humanizePercentage }} (threshold: 5%)',
2036
+ runbook_url: 'https://docs.elsabro.dev/runbooks/high-error-rate'
2037
+ },
2038
+ {
2039
+ name: 'ElsabroHighLatency',
2040
+ expression: `
2041
+ histogram_quantile(0.95,
2042
+ sum(rate(elsabro_request_duration_seconds_bucket[5m])) by (le)
2043
+ ) > 5
2044
+ `,
2045
+ duration: '5m',
2046
+ severity: 'warning',
2047
+ summary: 'ELSABRO high latency',
2048
+ description: 'P95 latency is {{ $value | humanizeDuration }}'
2049
+ },
2050
+ {
2051
+ name: 'ElsabroQueueBacklog',
2052
+ expression: 'elsabro_queue_length > 1000',
2053
+ duration: '10m',
2054
+ severity: 'warning',
2055
+ summary: 'ELSABRO queue backlog',
2056
+ description: 'Queue has {{ $value }} pending tasks'
2057
+ },
2058
+ {
2059
+ name: 'ElsabroAgentFailures',
2060
+ expression: `
2061
+ sum(rate(elsabro_agent_invocations_total{status="failed"}[5m]))
2062
+ / sum(rate(elsabro_agent_invocations_total[5m])) > 0.1
2063
+ `,
2064
+ duration: '5m',
2065
+ severity: 'critical',
2066
+ summary: 'ELSABRO high agent failure rate',
2067
+ description: 'Agent failure rate is {{ $value | humanizePercentage }}'
2068
+ },
2069
+ {
2070
+ name: 'ElsabroDatabaseConnectionFailure',
2071
+ expression: 'elsabro_database_connections_active == 0',
2072
+ duration: '1m',
2073
+ severity: 'critical',
2074
+ summary: 'ELSABRO database connection failure',
2075
+ description: 'No active database connections'
2076
+ },
2077
+ {
2078
+ name: 'ElsabroRedisConnectionFailure',
2079
+ expression: 'elsabro_redis_connections_active == 0',
2080
+ duration: '1m',
2081
+ severity: 'critical',
2082
+ summary: 'ELSABRO Redis connection failure',
2083
+ description: 'No active Redis connections'
2084
+ },
2085
+ {
2086
+ name: 'ElsabroHighMemoryUsage',
2087
+ expression: `
2088
+ container_memory_usage_bytes{container="elsabro"}
2089
+ / container_spec_memory_limit_bytes{container="elsabro"} > 0.9
2090
+ `,
2091
+ duration: '5m',
2092
+ severity: 'warning',
2093
+ summary: 'ELSABRO high memory usage',
2094
+ description: 'Memory usage is {{ $value | humanizePercentage }} of limit'
2095
+ },
2096
+ {
2097
+ name: 'ElsabroPodCrashLooping',
2098
+ expression: 'rate(kube_pod_container_status_restarts_total{container="elsabro"}[15m]) > 0',
2099
+ duration: '15m',
2100
+ severity: 'critical',
2101
+ summary: 'ELSABRO pod crash looping',
2102
+ description: 'Pod {{ $labels.pod }} is crash looping'
2103
+ }
2104
+ ];
2105
+ }
2106
+ }
2107
+ ```
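`runChecks` maps individual probe results onto an overall status: any failing check makes the service unhealthy, while warnings alone only degrade it. That aggregation, isolated as a minimal sketch:

```typescript
type CheckStatus = 'pass' | 'warn' | 'fail';

// Aggregates check results the same way runChecks does:
// any 'fail' -> unhealthy; otherwise any 'warn' -> degraded; else healthy.
function aggregateStatus(
  checks: Record<string, { status: CheckStatus }>
): 'healthy' | 'degraded' | 'unhealthy' {
  const statuses = Object.values(checks).map(c => c.status);
  if (statuses.includes('fail')) return 'unhealthy';
  if (statuses.includes('warn')) return 'degraded';
  return 'healthy';
}
```

Note the interaction with the probes: `handleHealth` still returns 200 for a degraded service, so a memory warning will not cause Kubernetes to pull the pod out of rotation, whereas a failed readiness check (503) will.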

### Grafana Dashboard JSON

```json
{
  "dashboard": {
    "id": null,
    "uid": "elsabro-overview",
    "title": "ELSABRO Overview",
    "tags": ["elsabro", "ai", "agents"],
    "timezone": "browser",
    "schemaVersion": 38,
    "version": 1,
    "refresh": "30s",
    "panels": [
      {
        "id": 1,
        "title": "Request Rate",
        "type": "stat",
        "gridPos": { "h": 4, "w": 6, "x": 0, "y": 0 },
        "targets": [{
          "expr": "sum(rate(elsabro_requests_total[5m]))",
          "legendFormat": "requests/sec"
        }]
      },
      {
        "id": 2,
        "title": "Error Rate",
        "type": "stat",
        "gridPos": { "h": 4, "w": 6, "x": 6, "y": 0 },
        "targets": [{
          "expr": "sum(rate(elsabro_errors_total[5m])) / sum(rate(elsabro_requests_total[5m])) * 100",
          "legendFormat": "error %"
        }],
        "fieldConfig": {
          "defaults": {
            "unit": "percent",
            "thresholds": {
              "steps": [
                { "value": 0, "color": "green" },
                { "value": 1, "color": "yellow" },
                { "value": 5, "color": "red" }
              ]
            }
          }
        }
      },
      {
        "id": 3,
        "title": "P95 Latency",
        "type": "stat",
        "gridPos": { "h": 4, "w": 6, "x": 12, "y": 0 },
        "targets": [{
          "expr": "histogram_quantile(0.95, sum(rate(elsabro_request_duration_seconds_bucket[5m])) by (le))",
          "legendFormat": "p95"
        }],
        "fieldConfig": {
          "defaults": {
            "unit": "s",
            "thresholds": {
              "steps": [
                { "value": 0, "color": "green" },
                { "value": 2, "color": "yellow" },
                { "value": 5, "color": "red" }
              ]
            }
          }
        }
      },
      {
        "id": 4,
        "title": "Active Agents",
        "type": "stat",
        "gridPos": { "h": 4, "w": 6, "x": 18, "y": 0 },
        "targets": [{
          "expr": "sum(elsabro_active_agents)",
          "legendFormat": "agents"
        }]
      },
      {
        "id": 5,
        "title": "Request Rate by Endpoint",
        "type": "timeseries",
        "gridPos": { "h": 8, "w": 12, "x": 0, "y": 4 },
        "targets": [{
          "expr": "sum(rate(elsabro_requests_total[5m])) by (path)",
          "legendFormat": "{{ path }}"
        }]
      },
      {
        "id": 6,
        "title": "Latency Distribution",
        "type": "heatmap",
        "gridPos": { "h": 8, "w": 12, "x": 12, "y": 4 },
        "targets": [{
          "expr": "sum(rate(elsabro_request_duration_seconds_bucket[5m])) by (le)",
          "format": "heatmap"
        }]
      },
      {
        "id": 7,
        "title": "Agent Invocations by Model",
        "type": "timeseries",
        "gridPos": { "h": 8, "w": 12, "x": 0, "y": 12 },
        "targets": [{
          "expr": "sum(rate(elsabro_agent_invocations_total[5m])) by (model)",
          "legendFormat": "{{ model }}"
        }]
      },
      {
        "id": 8,
        "title": "Token Usage",
        "type": "timeseries",
        "gridPos": { "h": 8, "w": 12, "x": 12, "y": 12 },
        "targets": [
          {
            "expr": "sum(rate(elsabro_tokens_total{type=\"input\"}[5m]))",
            "legendFormat": "Input Tokens"
          },
          {
            "expr": "sum(rate(elsabro_tokens_total{type=\"output\"}[5m]))",
            "legendFormat": "Output Tokens"
          }
        ]
      },
      {
        "id": 9,
        "title": "Queue Length",
        "type": "timeseries",
        "gridPos": { "h": 6, "w": 8, "x": 0, "y": 20 },
        "targets": [{
          "expr": "elsabro_queue_length",
          "legendFormat": "queue"
        }],
        "fieldConfig": {
          "defaults": {
            "thresholds": {
              "steps": [
                { "value": 0, "color": "green" },
                { "value": 500, "color": "yellow" },
                { "value": 1000, "color": "red" }
              ]
            }
          }
        }
      },
      {
        "id": 10,
        "title": "Pod CPU Usage",
        "type": "timeseries",
        "gridPos": { "h": 6, "w": 8, "x": 8, "y": 20 },
        "targets": [{
          "expr": "sum(rate(container_cpu_usage_seconds_total{container=\"elsabro\"}[5m])) by (pod)",
          "legendFormat": "{{ pod }}"
        }]
      },
      {
        "id": 11,
        "title": "Pod Memory Usage",
        "type": "timeseries",
        "gridPos": { "h": 6, "w": 8, "x": 16, "y": 20 },
        "targets": [{
          "expr": "container_memory_usage_bytes{container=\"elsabro\"}",
          "legendFormat": "{{ pod }}"
        }],
        "fieldConfig": {
          "defaults": { "unit": "bytes" }
        }
      }
    ]
  }
}
```
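One way to ship this dashboard declaratively is a ConfigMap picked up by the Grafana dashboard sidecar (as deployed by kube-prometheus-stack). The `monitoring` namespace and the `grafana_dashboard` label are assumptions about that setup — adjust to whatever your sidecar watches:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: elsabro-grafana-dashboard
  namespace: monitoring        # namespace where Grafana runs (assumed)
  labels:
    grafana_dashboard: "1"     # label the provisioning sidecar watches (assumed)
data:
  elsabro-overview.json: |
    { ... the dashboard JSON above, pasted verbatim ... }
```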

---

## 5. Infrastructure Components

### Redis Configuration

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-config
  namespace: elsabro
data:
  redis.conf: |
    # Memory management
    maxmemory 2gb
    maxmemory-policy allkeys-lru

    # Persistence
    appendonly yes
    appendfsync everysec

    # Performance
    tcp-backlog 511
    tcp-keepalive 300

    # Security
    protected-mode yes
    bind 0.0.0.0

    # Cluster (if enabled)
    # cluster-enabled yes
    # cluster-config-file nodes.conf
    # cluster-node-timeout 5000
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
  namespace: elsabro
spec:
  serviceName: redis
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7-alpine
          ports:
            - containerPort: 6379
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 2Gi
          volumeMounts:
            - name: redis-data
              mountPath: /data
            - name: redis-config
              mountPath: /usr/local/etc/redis
          command:
            - redis-server
            - /usr/local/etc/redis/redis.conf
          readinessProbe:
            exec:
              command: ["redis-cli", "ping"]
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            exec:
              command: ["redis-cli", "ping"]
            initialDelaySeconds: 15
            periodSeconds: 20
      volumes:
        - name: redis-config
          configMap:
            name: redis-config
  volumeClaimTemplates:
    - metadata:
        name: redis-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
---
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: elsabro
spec:
  ports:
    - port: 6379
      targetPort: 6379
  selector:
    app: redis
  clusterIP: None
```
2389
+
2390
+ ### PostgreSQL Configuration
2391
+
2392
+ ```yaml
2393
+ apiVersion: v1
2394
+ kind: ConfigMap
2395
+ metadata:
2396
+ name: postgres-config
2397
+ namespace: elsabro
2398
+ data:
2399
+ postgresql.conf: |
2400
+ # Connection settings
2401
+ listen_addresses = '*'
2402
+ max_connections = 200
2403
+
2404
+ # Memory settings
2405
+ shared_buffers = 256MB
2406
+ effective_cache_size = 768MB
2407
+ work_mem = 4MB
2408
+ maintenance_work_mem = 64MB
2409
+
2410
+ # WAL settings
2411
+ wal_level = replica
2412
+ max_wal_senders = 3
2413
+ max_replication_slots = 3
2414
+
2415
+ # Logging
2416
+ log_destination = 'stderr'
2417
+ logging_collector = on
2418
+ log_directory = 'log'
2419
+ log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
2420
+ log_statement = 'ddl'
2421
+ log_min_duration_statement = 1000
2422
+
2423
+ # Performance
2424
+ random_page_cost = 1.1
2425
+ effective_io_concurrency = 200
2426
+
2427
+ pg_hba.conf: |
2428
+ # TYPE DATABASE USER ADDRESS METHOD
2429
+ local all all trust
2430
+ host all all 127.0.0.1/32 scram-sha-256
2431
+ host all all ::1/128 scram-sha-256
2432
+ host all all 0.0.0.0/0 scram-sha-256
2433
+ host replication all 0.0.0.0/0 scram-sha-256
2434
+ ---
2435
+ apiVersion: apps/v1
2436
+ kind: StatefulSet
2437
+ metadata:
2438
+ name: postgres
2439
+ namespace: elsabro
2440
+ spec:
2441
+ serviceName: postgres
2442
+ replicas: 1
2443
+ selector:
2444
+ matchLabels:
2445
+ app: postgres
2446
+ template:
2447
+ metadata:
2448
+ labels:
2449
+ app: postgres
2450
+ spec:
2451
+ containers:
2452
+ - name: postgres
2453
+ image: postgres:16-alpine
2454
+ ports:
2455
+ - containerPort: 5432
2456
+ env:
2457
+ - name: POSTGRES_DB
2458
+ value: elsabro
2459
+ - name: POSTGRES_USER
2460
+ valueFrom:
2461
+ secretKeyRef:
2462
+ name: postgres-secret
2463
+ key: username
2464
+ - name: POSTGRES_PASSWORD
2465
+ valueFrom:
2466
+ secretKeyRef:
2467
+ name: postgres-secret
2468
+ key: password
2469
+ - name: PGDATA
2470
+ value: /var/lib/postgresql/data/pgdata
2471
+ resources:
2472
+ requests:
2473
+ cpu: 250m
2474
+ memory: 512Mi
2475
+ limits:
2476
+ cpu: 1000m
2477
+ memory: 2Gi
2478
+ volumeMounts:
2479
+ - name: postgres-data
2480
+ mountPath: /var/lib/postgresql/data
2481
+ - name: postgres-config
2482
+ mountPath: /etc/postgresql
2483
+ readinessProbe:
2484
+ exec:
2485
+ command: ["pg_isready", "-U", "$(POSTGRES_USER)"]
2486
+ initialDelaySeconds: 5
2487
+ periodSeconds: 10
2488
+ livenessProbe:
2489
+ exec:
2490
+ command: ["pg_isready", "-U", "$(POSTGRES_USER)"]
2491
+ initialDelaySeconds: 15
2492
+ periodSeconds: 20
2493
+ volumes:
2494
+ - name: postgres-config
2495
+ configMap:
2496
+ name: postgres-config
2497
+ volumeClaimTemplates:
2498
+ - metadata:
2499
+ name: postgres-data
2500
+ spec:
2501
+ accessModes: ["ReadWriteOnce"]
2502
+ resources:
2503
+ requests:
2504
+ storage: 50Gi
2505
+ ---
2506
+ apiVersion: v1
2507
+ kind: Service
2508
+ metadata:
2509
+ name: postgres
2510
+ namespace: elsabro
2511
+ spec:
2512
+ ports:
2513
+ - port: 5432
2514
+ targetPort: 5432
2515
+ selector:
2516
+ app: postgres
2517
+ clusterIP: None
2518
+ ```

### RabbitMQ Configuration

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: rabbitmq-config
  namespace: elsabro
data:
  rabbitmq.conf: |
    ## Cluster formation
    cluster_formation.peer_discovery_backend = k8s
    cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
    cluster_formation.k8s.address_type = hostname
    cluster_formation.node_cleanup.interval = 30
    cluster_formation.node_cleanup.only_log_warning = true
    cluster_partition_handling = autoheal

    ## Queue settings
    queue_master_locator = min-masters

    ## Networking
    listeners.tcp.default = 5672
    management.tcp.port = 15672

    ## Memory management
    vm_memory_high_watermark.relative = 0.7
    vm_memory_high_watermark_paging_ratio = 0.75

    ## Disk free limit
    disk_free_limit.absolute = 1GB

  enabled_plugins: |
    [rabbitmq_management,rabbitmq_peer_discovery_k8s,rabbitmq_prometheus].
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: rabbitmq
  namespace: elsabro
spec:
  serviceName: rabbitmq
  replicas: 3
  selector:
    matchLabels:
      app: rabbitmq
  template:
    metadata:
      labels:
        app: rabbitmq
    spec:
      serviceAccountName: rabbitmq
      containers:
        - name: rabbitmq
          image: rabbitmq:3.12-management-alpine
          ports:
            - containerPort: 5672
              name: amqp
            - containerPort: 15672
              name: management
            - containerPort: 15692
              name: prometheus
          env:
            # POD_NAME must be declared before the variables that reference it,
            # since Kubernetes only expands $(VAR) from earlier entries
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: RABBITMQ_DEFAULT_USER
              valueFrom:
                secretKeyRef:
                  name: rabbitmq-secret
                  key: username
            - name: RABBITMQ_DEFAULT_PASS
              valueFrom:
                secretKeyRef:
                  name: rabbitmq-secret
                  key: password
            - name: RABBITMQ_ERLANG_COOKIE
              valueFrom:
                secretKeyRef:
                  name: rabbitmq-secret
                  key: erlang-cookie
            - name: K8S_SERVICE_NAME
              value: rabbitmq
            # FQDN node names require long-name mode
            - name: RABBITMQ_USE_LONGNAME
              value: "true"
            - name: RABBITMQ_NODENAME
              value: rabbit@$(POD_NAME).rabbitmq.elsabro.svc.cluster.local
          resources:
            requests:
              cpu: 200m
              memory: 512Mi
            limits:
              cpu: 1000m
              memory: 2Gi
          volumeMounts:
            - name: rabbitmq-data
              mountPath: /var/lib/rabbitmq
            - name: rabbitmq-config
              mountPath: /etc/rabbitmq
          readinessProbe:
            exec:
              command: ["rabbitmq-diagnostics", "check_running"]
            initialDelaySeconds: 20
            periodSeconds: 10
          livenessProbe:
            exec:
              command: ["rabbitmq-diagnostics", "ping"]
            initialDelaySeconds: 60
            periodSeconds: 30
      volumes:
        - name: rabbitmq-config
          configMap:
            name: rabbitmq-config
  volumeClaimTemplates:
    - metadata:
        name: rabbitmq-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
---
apiVersion: v1
kind: Service
metadata:
  name: rabbitmq
  namespace: elsabro
spec:
  ports:
    - port: 5672
      targetPort: 5672
      name: amqp
    - port: 15672
      targetPort: 15672
      name: management
    - port: 15692
      targetPort: 15692
      name: prometheus
  selector:
    app: rabbitmq
  clusterIP: None
```
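The pod spec above references `serviceAccountName: rabbitmq`, and the `rabbitmq_peer_discovery_k8s` plugin queries the Kubernetes API for the service's endpoints, so that ServiceAccount needs RBAC to read them. A minimal sketch (names are suggestions, not mandated by RabbitMQ):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rabbitmq
  namespace: elsabro
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: rabbitmq-peer-discovery
  namespace: elsabro
rules:
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: rabbitmq-peer-discovery
  namespace: elsabro
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: rabbitmq-peer-discovery
subjects:
  - kind: ServiceAccount
    name: rabbitmq
    namespace: elsabro
```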

### Ingress Configuration (Nginx)

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: elsabro-ingress
  namespace: elsabro
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "60"
    nginx.ingress.kubernetes.io/proxy-buffering: "on"
    nginx.ingress.kubernetes.io/proxy-buffer-size: "128k"
    # 100 requests per minute per client IP
    nginx.ingress.kubernetes.io/limit-rpm: "100"
    nginx.ingress.kubernetes.io/enable-cors: "true"
    nginx.ingress.kubernetes.io/cors-allow-origin: "*"
    nginx.ingress.kubernetes.io/cors-allow-methods: "GET, POST, PUT, DELETE, OPTIONS"
    nginx.ingress.kubernetes.io/cors-allow-headers: "DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Authorization"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - api.elsabro.io
        - ws.elsabro.io
      secretName: elsabro-tls
  rules:
    - host: api.elsabro.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: elsabro
                port:
                  number: 8080
    - host: ws.elsabro.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: elsabro
                port:
                  number: 8080
```

---

## 6. Docker Multi-Stage Build

### Dockerfile

```dockerfile
# ==============================================================================
# ELSABRO Docker Multi-Stage Build
# Version: 3.6.0
# ==============================================================================

# ------------------------------------------------------------------------------
# Stage 1: Dependencies
# ------------------------------------------------------------------------------
FROM node:20-alpine AS deps

WORKDIR /app

# Install build dependencies
RUN apk add --no-cache python3 make g++ git

# Copy package files
COPY package.json package-lock.json ./

# Install all dependencies (including devDependencies);
# the builder stage prunes devDependencies after compiling
RUN npm ci

# ------------------------------------------------------------------------------
# Stage 2: Builder
# ------------------------------------------------------------------------------
FROM node:20-alpine AS builder

WORKDIR /app

# Copy dependencies from deps stage
COPY --from=deps /app/node_modules ./node_modules

# Copy source code
COPY . .

# Build TypeScript
RUN npm run build

# Run tests (optional, can be skipped with --build-arg SKIP_TESTS=true)
ARG SKIP_TESTS=false
RUN if [ "$SKIP_TESTS" = "false" ]; then npm run test; fi

# Prune dev dependencies
RUN npm prune --omit=dev

# ------------------------------------------------------------------------------
# Stage 3: Production
# ------------------------------------------------------------------------------
FROM node:20-alpine AS production

# Labels
LABEL org.opencontainers.image.title="ELSABRO"
LABEL org.opencontainers.image.description="AI-Powered Development Workflow System"
LABEL org.opencontainers.image.version="3.6.0"
LABEL org.opencontainers.image.vendor="CubaIT"
LABEL org.opencontainers.image.source="https://github.com/cubait/elsabro"

# Create non-root user
RUN addgroup -g 1000 elsabro && \
    adduser -u 1000 -G elsabro -D -h /app elsabro

WORKDIR /app

# Install runtime dependencies (tini is the init process used below)
RUN apk add --no-cache \
    tini \
    curl \
    ca-certificates

# Copy built artifacts
COPY --from=builder --chown=elsabro:elsabro /app/dist ./dist
COPY --from=builder --chown=elsabro:elsabro /app/node_modules ./node_modules
COPY --from=builder --chown=elsabro:elsabro /app/package.json ./

# Create directories for runtime data
RUN mkdir -p /app/.elsabro /app/logs /app/tmp && \
    chown -R elsabro:elsabro /app

# Switch to non-root user
USER elsabro

# Environment variables
ENV NODE_ENV=production
ENV PORT=8080
ENV METRICS_PORT=9090
ENV LOG_LEVEL=info
ENV LOG_FORMAT=json

# Expose ports
EXPOSE 8080 9090

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3 \
    CMD curl -f http://localhost:8080/health/live || exit 1

# Use tini as init system
ENTRYPOINT ["/sbin/tini", "--"]

# Start application
CMD ["node", "dist/server.js"]

# ------------------------------------------------------------------------------
# Stage 4: Development (optional)
# ------------------------------------------------------------------------------
FROM node:20-alpine AS development

WORKDIR /app

# Install development dependencies
RUN apk add --no-cache python3 make g++ git

# Copy all dependencies
COPY --from=deps /app/node_modules ./node_modules

# Copy source code
COPY . .

# Environment variables
ENV NODE_ENV=development
ENV PORT=8080

# Expose ports
EXPOSE 8080 9090 9229

# Start with hot reload
CMD ["npm", "run", "dev"]
```
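Because the builder and development stages run `COPY . .`, the entire build context is sent to the Docker daemon; a `.dockerignore` keeps installed dependencies, build output, and local secrets out of the context. The entries below are typical choices for a Node project rather than a definitive list:

```
# .dockerignore — excluded from the build context
node_modules
dist
.git
.env
*.log
coverage
```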

### docker-compose.yml

```yaml
version: '3.8'

services:
  elsabro:
    build:
      context: .
      target: production
    image: ghcr.io/cubait/elsabro:3.6.0
    container_name: elsabro
    restart: unless-stopped
    ports:
      - "8080:8080"
      - "9090:9090"
    environment:
      - NODE_ENV=production
      - DATABASE_URL=postgresql://elsabro:${POSTGRES_PASSWORD}@postgres:5432/elsabro
      - REDIS_URL=redis://:${REDIS_PASSWORD}@redis:6379/0
      - RABBITMQ_URL=amqp://elsabro:${RABBITMQ_PASSWORD}@rabbitmq:5672
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - JWT_SECRET=${JWT_SECRET}
      - LOG_LEVEL=info
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
      rabbitmq:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health/live"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 30s
    networks:
      - elsabro-network

  postgres:
    image: postgres:16-alpine
    container_name: elsabro-postgres
    restart: unless-stopped
    environment:
      - POSTGRES_DB=elsabro
      - POSTGRES_USER=elsabro
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
    volumes:
      - postgres-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U elsabro"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - elsabro-network

  redis:
    image: redis:7-alpine
    container_name: elsabro-redis
    restart: unless-stopped
    command: redis-server --requirepass ${REDIS_PASSWORD}
    volumes:
      - redis-data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "-a", "${REDIS_PASSWORD}", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - elsabro-network

  rabbitmq:
    image: rabbitmq:3.12-management-alpine
    container_name: elsabro-rabbitmq
    restart: unless-stopped
    environment:
      - RABBITMQ_DEFAULT_USER=elsabro
      - RABBITMQ_DEFAULT_PASS=${RABBITMQ_PASSWORD}
    volumes:
      - rabbitmq-data:/var/lib/rabbitmq
    healthcheck:
      test: ["CMD", "rabbitmq-diagnostics", "ping"]
      interval: 30s
      timeout: 10s
      retries: 5
    networks:
      - elsabro-network

volumes:
  postgres-data:
  redis-data:
  rabbitmq-data:

networks:
  elsabro-network:
    driver: bridge
```
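The compose file interpolates several `${…}` variables from the shell environment or an `.env` file next to it. A template with placeholder values — replace every value before use:

```
# .env — consumed by docker-compose variable interpolation (placeholders only)
POSTGRES_PASSWORD=change-me
REDIS_PASSWORD=change-me
RABBITMQ_PASSWORD=change-me
ANTHROPIC_API_KEY=your-api-key
JWT_SECRET=change-me
```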

---

## 7. CLI Commands

### /elsabro:k8s

```bash
# Deploy ELSABRO to Kubernetes
/elsabro:k8s deploy [environment] [options]

# Options:
#   --namespace <ns>        Target namespace (default: elsabro)
#   --replicas <n>          Number of replicas (default: from values)
#   --image-tag <tag>       Docker image tag (default: latest)
#   --values <file>         Custom values file
#   --dry-run               Preview without applying
#   --wait                  Wait for deployment to complete

# Examples:
/elsabro:k8s deploy dev
/elsabro:k8s deploy staging --replicas 3
/elsabro:k8s deploy prod --image-tag v3.6.0 --wait

# Scale deployment
/elsabro:k8s scale <replicas> [options]

# Options:
#   --namespace <ns>        Target namespace
#   --deployment <name>     Deployment name (default: elsabro)

# Examples:
/elsabro:k8s scale 5
/elsabro:k8s scale 10 --namespace elsabro-prod

# Get deployment status
/elsabro:k8s status [options]

# Options:
#   --namespace <ns>        Target namespace
#   --watch                 Watch for changes
#   --output <format>       Output format (table|json|yaml)

# Examples:
/elsabro:k8s status
/elsabro:k8s status --namespace elsabro-prod --watch

# View logs
/elsabro:k8s logs [options]

# Options:
#   --namespace <ns>        Target namespace
#   --pod <name>            Specific pod name
#   --container <name>      Container name
#   --follow                Follow log output
#   --tail <n>              Number of lines (default: 100)
#   --since <duration>      Show logs since duration (e.g., 1h)

# Examples:
/elsabro:k8s logs
/elsabro:k8s logs --follow --tail 500
/elsabro:k8s logs --since 30m --namespace elsabro-prod

# Generate Helm chart
/elsabro:k8s helm generate [options]

# Options:
#   --output <dir>          Output directory
#   --environment <env>     Target environment (dev|staging|prod)

# Rollout management
/elsabro:k8s rollout <action> [options]

# Actions:
#   status   - Show rollout status
#   restart  - Restart deployment
#   undo     - Rollback to previous version
#   pause    - Pause rollout
#   resume   - Resume rollout

# Examples:
/elsabro:k8s rollout status
/elsabro:k8s rollout restart
/elsabro:k8s rollout undo --to-revision 3

# Port forwarding
/elsabro:k8s port-forward [options]

# Options:
#   --namespace <ns>        Target namespace
#   --port <local:remote>   Port mapping (default: 8080:8080)
#   --pod <name>            Specific pod

# Examples:
/elsabro:k8s port-forward
/elsabro:k8s port-forward --port 3000:8080

# Execute command in pod
/elsabro:k8s exec <command> [options]

# Options:
#   --namespace <ns>        Target namespace
#   --pod <name>            Specific pod
#   --interactive           Interactive mode (-it)

# Examples:
/elsabro:k8s exec "npm run migrate"
/elsabro:k8s exec "sh" --interactive
```

---

## Quick User Commands

```bash
# Deployment commands
/elsabro:k8s deploy dev                    # Deploy to development
/elsabro:k8s deploy staging --replicas 3   # Deploy to staging with 3 replicas
/elsabro:k8s deploy prod --wait            # Deploy to production and wait

# Scaling commands
/elsabro:k8s scale 5                       # Scale to 5 replicas
/elsabro:k8s scale 10 --namespace prod     # Scale production to 10

# Status and monitoring
/elsabro:k8s status                        # Get current status
/elsabro:k8s status --watch                # Watch status in real-time
/elsabro:k8s logs --follow                 # Stream logs

# Rollout management
/elsabro:k8s rollout status                # Check rollout status
/elsabro:k8s rollout restart               # Restart all pods
/elsabro:k8s rollout undo                  # Rollback to previous version
```

---

## Changelog

- **v3.6.0**: Initial Kubernetes Deployment System implementation
  - K8sDeployer for deployment management
  - HelmChartGenerator for Helm chart creation
  - ResourceScaler with HPA/VPA support
  - HealthMonitor with probes and metrics
  - Infrastructure components (Redis, PostgreSQL, RabbitMQ)
  - Docker multi-stage build
  - CLI commands for operations