@rulebricks/cli 2.1.6 → 2.3.1
- package/README.md +75 -14
- package/cluster-setup/aws/README.md +123 -0
- package/cluster-setup/aws/check-aws-access.sh +242 -0
- package/cluster-setup/aws/parameters.json +13 -0
- package/cluster-setup/aws/rulebricks-cluster.cfn.yaml +355 -0
- package/cluster-setup/azure/README.md +141 -0
- package/cluster-setup/azure/check-aks-prereqs.sh +276 -0
- package/cluster-setup/azure/parameters.json +30 -0
- package/cluster-setup/azure/rulebricks-cluster.bicep +546 -0
- package/cluster-setup/gcp/README.md +189 -0
- package/cluster-setup/gcp/check-gke-prereqs.sh +260 -0
- package/dist/commands/backup.d.ts +5 -0
- package/dist/commands/backup.js +104 -0
- package/dist/commands/deploy.d.ts +3 -1
- package/dist/commands/deploy.js +226 -326
- package/dist/commands/destroy.d.ts +1 -1
- package/dist/commands/destroy.js +73 -123
- package/dist/commands/init.d.ts +5 -1
- package/dist/commands/init.js +78 -47
- package/dist/commands/list.d.ts +1 -0
- package/dist/commands/list.js +74 -0
- package/dist/commands/open.d.ts +1 -1
- package/dist/commands/open.js +4 -12
- package/dist/commands/redeploy.d.ts +6 -0
- package/dist/commands/redeploy.js +310 -0
- package/dist/commands/restore.d.ts +5 -0
- package/dist/commands/restore.js +338 -0
- package/dist/commands/status.js +62 -49
- package/dist/commands/upgrade.js +74 -51
- package/dist/components/DNSWaitScreen.d.ts +5 -1
- package/dist/components/DNSWaitScreen.js +47 -41
- package/dist/components/Wizard/WizardContext.d.ts +174 -29
- package/dist/components/Wizard/WizardContext.js +896 -91
- package/dist/components/Wizard/steps/CloudProviderStep.js +192 -102
- package/dist/components/Wizard/steps/DomainStep.js +5 -24
- package/dist/components/Wizard/steps/ExternalServicesStep.d.ts +6 -0
- package/dist/components/Wizard/steps/ExternalServicesStep.js +645 -0
- package/dist/components/Wizard/steps/FeatureConfigStep.d.ts +2 -1
- package/dist/components/Wizard/steps/FeatureConfigStep.js +959 -248
- package/dist/components/Wizard/steps/FeaturesStep.js +31 -35
- package/dist/components/Wizard/steps/ObservabilityStep.d.ts +6 -0
- package/dist/components/Wizard/steps/ObservabilityStep.js +137 -0
- package/dist/components/Wizard/steps/ReviewStep.d.ts +2 -1
- package/dist/components/Wizard/steps/ReviewStep.js +56 -7
- package/dist/components/Wizard/steps/StorageStep.d.ts +9 -0
- package/dist/components/Wizard/steps/StorageStep.js +592 -0
- package/dist/components/Wizard/steps/SupabaseCredentialsStep.js +20 -21
- package/dist/components/Wizard/steps/VersionStep.js +45 -23
- package/dist/components/Wizard/steps/index.d.ts +3 -3
- package/dist/components/Wizard/steps/index.js +3 -3
- package/dist/components/common/CommandApproval.d.ts +12 -0
- package/dist/components/common/CommandApproval.js +91 -0
- package/dist/components/common/DeploymentPicker.d.ts +14 -0
- package/dist/components/common/DeploymentPicker.js +16 -0
- package/dist/components/common/index.d.ts +2 -0
- package/dist/components/common/index.js +2 -0
- package/dist/index.js +94 -62
- package/dist/lib/cloudCli.d.ts +134 -63
- package/dist/lib/cloudCli.js +512 -220
- package/dist/lib/clusterSetupDefaults.d.ts +30 -0
- package/dist/lib/clusterSetupDefaults.js +64 -0
- package/dist/lib/commandApproval.d.ts +26 -0
- package/dist/lib/commandApproval.js +114 -0
- package/dist/lib/config.d.ts +12 -10
- package/dist/lib/config.js +91 -33
- package/dist/lib/configFixtures.d.ts +5 -0
- package/dist/lib/configFixtures.js +513 -0
- package/dist/lib/deploymentHealth.d.ts +32 -0
- package/dist/lib/deploymentHealth.js +157 -0
- package/dist/lib/dns.d.ts +1 -1
- package/dist/lib/dns.js +19 -1
- package/dist/lib/dns.test.d.ts +1 -0
- package/dist/lib/dns.test.js +27 -0
- package/dist/lib/dockerHub.d.ts +12 -1
- package/dist/lib/dockerHub.js +18 -8
- package/dist/lib/helm.d.ts +4 -0
- package/dist/lib/helm.js +16 -0
- package/dist/lib/helmValues.d.ts +25 -0
- package/dist/lib/helmValues.js +1937 -259
- package/dist/lib/helmValues.test.d.ts +1 -0
- package/dist/lib/helmValues.test.js +966 -0
- package/dist/lib/htpasswd.d.ts +1 -0
- package/dist/lib/htpasswd.js +15 -0
- package/dist/lib/kubernetes.d.ts +126 -13
- package/dist/lib/kubernetes.js +624 -134
- package/dist/lib/secrets.d.ts +23 -0
- package/dist/lib/secrets.js +158 -0
- package/dist/lib/validateValues.d.ts +31 -0
- package/dist/lib/validateValues.js +253 -0
- package/dist/lib/versions.d.ts +82 -11
- package/dist/lib/versions.js +131 -31
- package/dist/lib/versions.test.d.ts +1 -0
- package/dist/lib/versions.test.js +81 -0
- package/dist/lib/wizardSteps.d.ts +14 -0
- package/dist/lib/wizardSteps.js +23 -0
- package/dist/lib/workloadIdentity.d.ts +26 -0
- package/dist/lib/workloadIdentity.js +323 -0
- package/dist/lib/workloadIdentity.test.d.ts +1 -0
- package/dist/lib/workloadIdentity.test.js +57 -0
- package/dist/types/index.d.ts +2152 -95
- package/dist/types/index.js +554 -286
- package/package.json +10 -4
- package/schema/values.schema.json +1934 -0
- package/dist/components/Wizard/steps/CredentialsStep.d.ts +0 -6
- package/dist/components/Wizard/steps/CredentialsStep.js +0 -22
- package/dist/components/Wizard/steps/DeploymentModeStep.d.ts +0 -5
- package/dist/components/Wizard/steps/DeploymentModeStep.js +0 -26
- package/dist/components/Wizard/steps/TierStep.d.ts +0 -6
- package/dist/components/Wizard/steps/TierStep.js +0 -29
- package/dist/lib/terraform.d.ts +0 -66
- package/dist/lib/terraform.js +0 -754
- package/terraform/aws/main.tf +0 -355
- package/terraform/azure/main.tf +0 -371
- package/terraform/gcp/main.tf +0 -407
@@ -0,0 +1,355 @@
+AWSTemplateFormatVersion: "2010-09-09"
+Description: >-
+  Rulebricks EKS cluster, turnkey. Stands up an EKS cluster with a managed node
+  group and a single IAM role + S3 bucket for all Rulebricks data:
+    decision logs -> s3://<cluster>-data/decision-logs/ (Vector) SA: vector
+    backups -> s3://<cluster>-data/db-backups/ (backup job) SA: <ns>-backup
+    metrics -> AMP (Prometheus remote write) SA: prometheus
+  Identity uses EKS Pod Identity (AWS's recommended mechanism for new clusters):
+  one IAM role with scoped policies. The namespace-scoped Pod Identity
+  associations binding vector/backup/prometheus to this role are created by the
+  Rulebricks CLI at deploy time, so this stack stays deployment-independent and
+  one cluster can host many deployments. eksctl alone cannot create the S3 bucket
+  or AMP workspace, so the full picture lives here in one stack.
+
+Parameters:
+  ClusterName:
+    Type: String
+    Default: rulebricks-cluster
+  KubernetesVersion:
+    Type: String
+    Default: "1.34"
+  NodeInstanceType:
+    Type: String
+    Default: c7i.xlarge
+  NodeDesiredCapacity:
+    Type: Number
+    Default: 2
+  NodeMinSize:
+    Type: Number
+    Default: 2
+  NodeMaxSize:
+    Type: Number
+    # Core services need only 2-4 small nodes; burst capacity lives in the
+    # dedicated burst nodegroup, and a low core ceiling steers the autoscaler
+    # there when the worker fleet scales out.
+    Default: 4
+  NodeVolumeSizeGiB:
+    Type: Number
+    Default: 50
+  VpcCidr:
+    Type: String
+    Default: 10.0.0.0/16
+  EnableBurstPool:
+    Type: String
+    AllowedValues: ["true", "false"]
+    Default: "true"
+    Description: >-
+      Add a dedicated burst nodegroup: one large on-demand node (scales
+      0->BurstNodeMaxSize) labeled and tainted rulebricks.com/pool=burst.
+      The Rulebricks chart makes workers tolerate and softly prefer it out
+      of the box, keeping the scaled-out fleet off the core nodes.
+  BurstInstanceType:
+    Type: String
+    Default: c7i.4xlarge
+    Description: >-
+      Instance type for the burst nodegroup. Default 16 vCPU: 2x4 vCPU core
+      floor + 16 = 24 vCPU running steady-state at full burst, and exactly
+      32 vCPU even with the core nodegroup at its 4-node max.
+  BurstNodeMaxSize:
+    Type: Number
+    Default: 1
+Conditions:
+  BurstPoolEnabled: !Equals [!Ref EnableBurstPool, "true"]
+
+Resources:
+  # ---------------------------------------------------------------------------
+  # Networking (minimal: VPC + 2 public subnets across 2 AZs + IGW + routing)
+  # ---------------------------------------------------------------------------
+  Vpc:
+    Type: AWS::EC2::VPC
+    Properties:
+      CidrBlock: !Ref VpcCidr
+      EnableDnsHostnames: true
+      EnableDnsSupport: true
+      Tags:
+        - { Key: Name, Value: !Sub "${ClusterName}-vpc" }
+        - { Key: Environment, Value: rulebricks }
+
+  InternetGateway:
+    Type: AWS::EC2::InternetGateway
+
+  VpcGatewayAttachment:
+    Type: AWS::EC2::VPCGatewayAttachment
+    Properties:
+      VpcId: !Ref Vpc
+      InternetGatewayId: !Ref InternetGateway
+
+  SubnetA:
+    Type: AWS::EC2::Subnet
+    Properties:
+      VpcId: !Ref Vpc
+      AvailabilityZone: !Select [0, !GetAZs ""]
+      CidrBlock: 10.0.0.0/19
+      MapPublicIpOnLaunch: true
+      Tags:
+        - { Key: Name, Value: !Sub "${ClusterName}-subnet-a" }
+
+  SubnetB:
+    Type: AWS::EC2::Subnet
+    Properties:
+      VpcId: !Ref Vpc
+      AvailabilityZone: !Select [1, !GetAZs ""]
+      CidrBlock: 10.0.32.0/19
+      MapPublicIpOnLaunch: true
+      Tags:
+        - { Key: Name, Value: !Sub "${ClusterName}-subnet-b" }
+
+  RouteTable:
+    Type: AWS::EC2::RouteTable
+    Properties:
+      VpcId: !Ref Vpc
+
+  DefaultRoute:
+    Type: AWS::EC2::Route
+    DependsOn: VpcGatewayAttachment
+    Properties:
+      RouteTableId: !Ref RouteTable
+      DestinationCidrBlock: 0.0.0.0/0
+      GatewayId: !Ref InternetGateway
+
+  SubnetARouteAssoc:
+    Type: AWS::EC2::SubnetRouteTableAssociation
+    Properties:
+      SubnetId: !Ref SubnetA
+      RouteTableId: !Ref RouteTable
+
+  SubnetBRouteAssoc:
+    Type: AWS::EC2::SubnetRouteTableAssociation
+    Properties:
+      SubnetId: !Ref SubnetB
+      RouteTableId: !Ref RouteTable
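
The two subnet CIDRs in the networking block are literals (`10.0.0.0/19` and `10.0.32.0/19`), while the VPC range comes from the `VpcCidr` parameter, so overriding `VpcCidr` without also editing the subnets could leave them outside the VPC. The following is a minimal sketch of a pre-deploy sanity check in POSIX shell. The helper names `ip_to_int` and `cidr_contains` are hypothetical and are not part of the package.

```bash
# Hypothetical helpers (not part of @rulebricks/cli): check that each subnet
# CIDR from the template falls inside the chosen VPC CIDR.

# Convert a dotted-quad IPv4 address to a single integer.
# The body runs in a subshell so the IFS change does not leak to the caller.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Succeed when the inner CIDR (arg 2) lies entirely within the outer CIDR (arg 1).
cidr_contains() {
  outer_ip=${1%/*}; outer_len=${1#*/}
  inner_ip=${2%/*}; inner_len=${2#*/}
  # A subnet can never be larger than the range that contains it.
  [ "$inner_len" -ge "$outer_len" ] || return 1
  mask=$(( (0xFFFFFFFF << (32 - $outer_len)) & 0xFFFFFFFF ))
  outer_int=$(ip_to_int "$outer_ip")
  inner_int=$(ip_to_int "$inner_ip")
  [ $(( $outer_int & $mask )) -eq $(( $inner_int & $mask )) ]
}

vpc_cidr="10.0.0.0/16"   # template default for VpcCidr
for subnet in 10.0.0.0/19 10.0.32.0/19; do
  if cidr_contains "$vpc_cidr" "$subnet"; then
    echo "ok: $subnet is inside $vpc_cidr"
  else
    echo "warning: $subnet is outside $vpc_cidr"
  fi
done
```

With the default `VpcCidr` both subnets pass. A value such as `10.1.0.0/16` would trigger the warning for both.
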
+
+  # ---------------------------------------------------------------------------
+  # Cluster + node group IAM roles
+  # ---------------------------------------------------------------------------
+  ClusterRole:
+    Type: AWS::IAM::Role
+    Properties:
+      AssumeRolePolicyDocument:
+        Version: "2012-10-17"
+        Statement:
+          - Effect: Allow
+            Principal: { Service: eks.amazonaws.com }
+            Action: sts:AssumeRole
+      ManagedPolicyArns:
+        - arn:aws:iam::aws:policy/AmazonEKSClusterPolicy
+
+  NodeRole:
+    Type: AWS::IAM::Role
+    Properties:
+      AssumeRolePolicyDocument:
+        Version: "2012-10-17"
+        Statement:
+          - Effect: Allow
+            Principal: { Service: ec2.amazonaws.com }
+            Action: sts:AssumeRole
+      ManagedPolicyArns:
+        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
+        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
+        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
+        - arn:aws:iam::aws:policy/AmazonEBSCSIDriverPolicy
+
+  # ---------------------------------------------------------------------------
+  # EKS cluster
+  # ---------------------------------------------------------------------------
+  Cluster:
+    Type: AWS::EKS::Cluster
+    Properties:
+      Name: !Ref ClusterName
+      Version: !Ref KubernetesVersion
+      RoleArn: !GetAtt ClusterRole.Arn
+      ResourcesVpcConfig:
+        SubnetIds:
+          - !Ref SubnetA
+          - !Ref SubnetB
+        EndpointPublicAccess: true
+        EndpointPrivateAccess: true
+
+  # Pod Identity agent add-on: required for PodIdentityAssociation to work.
+  PodIdentityAddon:
+    Type: AWS::EKS::Addon
+    Properties:
+      ClusterName: !Ref Cluster
+      AddonName: eks-pod-identity-agent
+
+  EbsCsiAddon:
+    Type: AWS::EKS::Addon
+    Properties:
+      ClusterName: !Ref Cluster
+      AddonName: aws-ebs-csi-driver
+
+  NodeGroup:
+    Type: AWS::EKS::Nodegroup
+    Properties:
+      ClusterName: !Ref Cluster
+      NodegroupName: standard-nodes
+      NodeRole: !GetAtt NodeRole.Arn
+      Subnets:
+        - !Ref SubnetA
+        - !Ref SubnetB
+      InstanceTypes:
+        - !Ref NodeInstanceType
+      AmiType: AL2023_x86_64_STANDARD # c7i = Intel/x86
+      DiskSize: !Ref NodeVolumeSizeGiB
+      ScalingConfig:
+        DesiredSize: !Ref NodeDesiredCapacity
+        MinSize: !Ref NodeMinSize
+        MaxSize: !Ref NodeMaxSize
+
+  # Dedicated burst nodegroup: one large on-demand node for the scaled-out
+  # worker fleet. The taint keeps everything except workers off it; the label
+  # is what the chart's soft worker affinity targets. EKS has no parked-VM
+  # (Deallocate) equivalent, so bursts cold-provision (~2-3 min); a Karpenter
+  # NodePool with the same label/taint is the future fast path.
+  BurstNodeGroup:
+    Type: AWS::EKS::Nodegroup
+    Condition: BurstPoolEnabled
+    Properties:
+      ClusterName: !Ref Cluster
+      NodegroupName: burst-workers
+      NodeRole: !GetAtt NodeRole.Arn
+      Subnets:
+        - !Ref SubnetA
+        - !Ref SubnetB
+      InstanceTypes:
+        - !Ref BurstInstanceType
+      AmiType: AL2023_x86_64_STANDARD
+      DiskSize: !Ref NodeVolumeSizeGiB
+      Labels:
+        rulebricks.com/pool: burst
+      Taints:
+        - Key: rulebricks.com/pool
+          Value: burst
+          Effect: NO_SCHEDULE
+      ScalingConfig:
+        DesiredSize: 0
+        MinSize: 0
+        MaxSize: !Ref BurstNodeMaxSize
+
+  # ---------------------------------------------------------------------------
+  # OBJECT STORAGE (all Rulebricks data)
+  # One bucket holds everything; decision logs and backups are key prefixes
+  # (decision-logs/ and db-backups/) within it.
+  # ---------------------------------------------------------------------------
+  DataBucket:
+    Type: AWS::S3::Bucket
+    Properties:
+      BucketName: !Sub "${ClusterName}-data-${AWS::AccountId}"
+      PublicAccessBlockConfiguration:
+        BlockPublicAcls: true
+        BlockPublicPolicy: true
+        IgnorePublicAcls: true
+        RestrictPublicBuckets: true
+      BucketEncryption:
+        ServerSideEncryptionConfiguration:
+          - ServerSideEncryptionByDefault: { SSEAlgorithm: AES256 }
+
+  # ---------------------------------------------------------------------------
+  # METRICS (Amazon Managed Service for Prometheus workspace)
+  # ---------------------------------------------------------------------------
+  PrometheusWorkspace:
+    Type: AWS::APS::Workspace
+    Properties:
+      Alias: !Sub "${ClusterName}-amp"
+
+  # ---------------------------------------------------------------------------
+  # RULEBRICKS ROLE (single role for all data paths)
+  # Trusts the EKS Pod Identity service principal. The Rulebricks CLI creates the
+  # namespace-scoped Pod Identity associations (vector/backup/prometheus) at
+  # deploy time. Grants S3 read/write on the one data bucket and aps:RemoteWrite
+  # on the AMP workspace.
+  # ---------------------------------------------------------------------------
+  RulebricksRole:
+    Type: AWS::IAM::Role
+    Properties:
+      RoleName: !Sub "${ClusterName}-rulebricks"
+      AssumeRolePolicyDocument:
+        Version: "2012-10-17"
+        Statement:
+          - Effect: Allow
+            Principal: { Service: pods.eks.amazonaws.com }
+            Action:
+              - sts:AssumeRole
+              - sts:TagSession
+      Policies:
+        - PolicyName: rulebricks-s3-data
+          PolicyDocument:
+            Version: "2012-10-17"
+            Statement:
+              - Effect: Allow
+                Action:
+                  - s3:PutObject
+                  - s3:GetObject
+                  - s3:DeleteObject
+                  - s3:ListBucket
+                Resource:
+                  - !GetAtt DataBucket.Arn
+                  - !Sub "${DataBucket.Arn}/*"
+        - PolicyName: rulebricks-amp-remote-write
+          PolicyDocument:
+            Version: "2012-10-17"
+            Statement:
+              - Effect: Allow
+                Action:
+                  - aps:RemoteWrite
+                Resource:
+                  - !GetAtt PrometheusWorkspace.Arn
+        - PolicyName: rulebricks-msk-iam
+          PolicyDocument:
+            Version: "2012-10-17"
+            Statement:
+              # AWS MSK IAM access for HPS + the Vector bridge when Kafka is
+              # externalized to MSK. Account-scoped so any managed cluster works
+              # without re-provisioning; tighten Resource to a specific cluster
+              # ARN to lock it down. Harmless when Kafka runs in-cluster.
+              - Effect: Allow
+                Action:
+                  - kafka-cluster:Connect
+                  - kafka-cluster:DescribeCluster
+                  - kafka-cluster:DescribeClusterDynamicConfiguration
+                Resource:
+                  - !Sub "arn:${AWS::Partition}:kafka:*:${AWS::AccountId}:cluster/*/*"
+              - Effect: Allow
+                Action:
+                  - kafka-cluster:*Topic*
+                  - kafka-cluster:WriteData
+                  - kafka-cluster:ReadData
+                Resource:
+                  - !Sub "arn:${AWS::Partition}:kafka:*:${AWS::AccountId}:topic/*/*/*"
+              - Effect: Allow
+                Action:
+                  - kafka-cluster:AlterGroup
+                  - kafka-cluster:DescribeGroup
+                Resource:
+                  - !Sub "arn:${AWS::Partition}:kafka:*:${AWS::AccountId}:group/*/*/*"
+
+  # Pod Identity associations (hps / vector / clickhouse / backup / prometheus) are
+  # namespace-scoped, so the Rulebricks CLI creates them at `rulebricks deploy`
+  # time. This stack only provisions the deployment-independent role, bucket, and
+  # AMP workspace, so one cluster can host many deployments.
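
The template comment says the Rulebricks CLI creates the Pod Identity associations at deploy time, but this hunk does not show how. As a rough illustration of an equivalent manual step, the sketch below prints (without running) one `aws eks create-pod-identity-association` command per service account. The function name, the `rulebricks-demo` namespace, and the account ID are placeholders, and the CLI's actual behaviour may differ.

```bash
# Hypothetical dry-run helper (not part of @rulebricks/cli).
# Usage: print_pod_identity_commands CLUSTER NAMESPACE ROLE_ARN SA...
print_pod_identity_commands() {
  cluster=$1; namespace=$2; role_arn=$3
  shift 3
  for sa in "$@"; do
    # One association binds a single (namespace, service account) pair to the role.
    printf 'aws eks create-pod-identity-association --cluster-name %s --namespace %s --service-account %s --role-arn %s\n' \
      "$cluster" "$namespace" "$sa" "$role_arn"
  done
}

# Placeholder values; the role name follows the template's "${ClusterName}-rulebricks".
print_pod_identity_commands rulebricks-cluster rulebricks-demo \
  "arn:aws:iam::123456789012:role/rulebricks-cluster-rulebricks" \
  vector prometheus
```

Printing the commands instead of executing them lets the namespace and service-account names be reviewed before anything touches the cluster.
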
+
+Outputs:
+  ClusterName:
+    Value: !Ref Cluster
+  KubeconfigCommand:
+    Value: !Sub "aws eks update-kubeconfig --name ${ClusterName} --region ${AWS::Region}"
+  DataBucketName:
+    Value: !Ref DataBucket
+  RulebricksRoleArn:
+    Value: !GetAtt RulebricksRole.Arn
+  PrometheusWorkspaceId:
+    Value: !Ref PrometheusWorkspace
+  PrometheusRemoteWriteUrl:
+    Description: Set as Prometheus remote_write url (append /api/v1/remote_write).
+    Value: !GetAtt PrometheusWorkspace.PrometheusEndpoint
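
The `PrometheusRemoteWriteUrl` output carries only the workspace endpoint, and its description asks the reader to append `/api/v1/remote_write`. A minimal sketch of that step follows, assuming the endpoint may or may not end in a slash. The helper name and the sample endpoint are placeholders, not values taken from the package.

```bash
# Hypothetical helper: build the remote_write URL from the stack output value.
remote_write_url() {
  endpoint=${1%/}   # drop one trailing slash, if present
  printf '%s/api/v1/remote_write\n' "$endpoint"
}

# Placeholder endpoint; substitute the PrometheusRemoteWriteUrl stack output.
remote_write_url "https://aps-workspaces.us-east-1.amazonaws.com/workspaces/ws-EXAMPLE/"
```
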
@@ -0,0 +1,141 @@
+# Azure Cluster Setup
+
+A compact, turnkey AKS cluster for Rulebricks. One Bicep deploy creates the
+cluster **and** the object storage + Azure Monitor resources the platform needs,
+fully wired to Workload Identity. Bring-your-own infra is also supported.
+
+## Files
+
+- `rulebricks-cluster.bicep` — AKS cluster (Azure CNI, Calico, Standard LB, Disk CSI, OIDC issuer, Workload Identity) plus the single Rulebricks identity and its data paths.
+- `parameters.json` — sample parameters (turnkey defaults: all paths on).
+- `check-aks-prereqs.sh` — verifies login, providers, quota, role-assignment rights, kubectl/helm.
+
+## One identity, one container (deployment-independent)
+
+A single user-assigned identity, `<cluster>-rulebricks`, holds both data roles,
+and all data lives in one container, `<cluster>-data`, under per-purpose prefixes.
+Toggle each path with its `enable*` flag.
+
+| Path | Service account | Role / target |
+| --------------------------------- | --------------------------- | --------------------------------------------------------------- |
+| Decision logs (Vector → Blob) | `vector` | Storage Blob Data Contributor → `<cluster>-data/decision-logs/` |
+| DB backups (job → Blob) | `<release>-backup` | Storage Blob Data Contributor → `<cluster>-data/db-backups/` |
+| Metrics (Prometheus remote write) | `prometheus` | Monitoring Metrics Publisher → Azure Monitor DCR |
+
+The identity has Storage Blob Data Contributor on the storage account and
+Monitoring Metrics Publisher on the DCR.
+
+> **This template does not need a deployment name.** Federated identity credentials are
+> `namespace`-scoped (`system:serviceaccount:rulebricks-<deploymentName>:<sa>`), so they can't be
+> created until the deployment namespace is known. The **Rulebricks CLI creates them at
+> `rulebricks deploy` time** against this identity. That keeps cluster-setup generic, so one cluster
+> can host any number of deployments without re-running it — the CLI adds each deployment's
+> credentials on deploy. (Azure wildcard "flexible" FICs would avoid even that, but they're
+> unsupported on managed identities and AKS OIDC issuers.)
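
The note above gives the federated-credential subject format but not the command that registers it. The sketch below builds the subject string and prints (without running) a matching `az identity federated-credential create` command per service account. The function names, the `<deployment>-<sa>` credential naming, and the issuer URL are assumptions for illustration; the real issuer can be read with `az aks show --query oidcIssuerProfile.issuerUrl`, and the CLI's own logic is not shown in this hunk.

```bash
# Hypothetical helpers (not part of @rulebricks/cli).

# Subject format quoted in the README: system:serviceaccount:rulebricks-<deploymentName>:<sa>
fic_subject() {
  printf 'system:serviceaccount:rulebricks-%s:%s\n' "$1" "$2"
}

# Usage: print_fic_commands IDENTITY RESOURCE_GROUP ISSUER_URL DEPLOYMENT SA...
print_fic_commands() {
  identity=$1; resource_group=$2; issuer=$3; deployment=$4
  shift 4
  for sa in "$@"; do
    # The credential name "<deployment>-<sa>" is an assumed convention.
    printf 'az identity federated-credential create --name %s --identity-name %s --resource-group %s --issuer %s --subject %s --audiences api://AzureADTokenExchange\n' \
      "${deployment}-${sa}" "$identity" "$resource_group" "$issuer" \
      "$(fic_subject "$deployment" "$sa")"
  done
}

# Placeholder issuer URL; identity and resource group follow the README's examples.
print_fic_commands rulebricks-cluster-rulebricks rulebricks-rg \
  "https://example.invalid/oidc-issuer/" demo vector prometheus
```

Because the subject embeds the namespace, each new deployment needs its own set of credentials, which matches the README's reason for deferring this step to deploy time.
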
+
+## Turnkey vs. bring-your-own
+
+- `createStorage: true` provisions a storage account + the single `<cluster>-data` container (deterministic globally-unique account name). `false` → set `existingStorageAccountName`.
+- `createMonitorWorkspace: true` provisions an Azure Monitor workspace + data collection endpoint + rule, so the metrics role is scoped to a DCR we own. `false` → set `existingDataCollectionRuleId`.
+
+Defaults are turnkey: `createStorage`, `createMonitorWorkspace`, and all `enable*` flags are `true`.
+
+## Core cluster parameters
+
+`clusterName` (`rulebricks-cluster`), `location` (`eastus`), `kubernetesVersion`
+(`1.34`), `nodeCount`/`maxNodeCount` (`2`/`4`), `nodeVmSize`
+(`Standard_F4as_v6`), `maxPods` (`110`), `osDiskSizeGB` (`64`), `osDiskType`
+(`Managed`). The default (core) pool runs the always-on services on two to
+four 4-vCPU nodes; burst capacity lives in the dedicated burst pool below.
+The `110` max-pods avoids the legacy 30/node limit, and the autoscaler
+profile is tuned for bursts (`scan-interval` 10s, `least-waste` expander).
+Both pools use `Deallocate` scale-down: removed nodes are parked (disk-only
+cost, container images cached) and resume in ~30-60s instead of
+reprovisioning.
+
+### Burst worker pool (default on)
+
+`enableBurstPool` (`true`), `burstVmSize` (`Standard_F16as_v6`, 16 vCPU -
+the Fas_v6 family has no 24-vCPU size), `burstMaxCount` (`1`). One large
+`User`-mode node that scales 0 -> 1 on demand and parks between bursts. It
+is labeled and tainted `rulebricks.com/pool=burst`: the Rulebricks chart
+makes workers tolerate the taint and softly prefer the label out of the box,
+so the entire scaled-out worker fleet lands on this node while core services
+stay on the default pool. Sizing math: 2 x 4 vCPU core floor + 16 vCPU burst
+= 24 vCPU running steady-state at full burst, and exactly 32 vCPU even with
+the core pool at its 4-node max - sized to a 32-vCPU family quota.
+First-ever burst
+cold-provisions the VM (~2-4 min); every burst after resumes the parked VM
+(~30-60s). Note deallocated VMs resume into their original zone/SKU - in a
+capacity-constrained region a resume can fail and the autoscaler retries;
+the warm worker floor on the core pool carries traffic in the meantime.
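
The sizing math in the burst-pool paragraph can be re-run for other VM sizes before requesting quota. A small sketch using the README's default values follows; the variable names are illustrative only.

```bash
# Defaults quoted in the README: 4-vCPU core nodes (2 to 4 of them) and one
# 16-vCPU burst node. Variable names are hypothetical.
core_vcpu=4
core_min=2
core_max=4
burst_vcpu=16
burst_max=1

# vCPUs in use at full burst with the core pool at its floor, then at its ceiling.
steady=$(( core_min * core_vcpu + burst_max * burst_vcpu ))
peak=$(( core_max * core_vcpu + burst_max * burst_vcpu ))

echo "full burst, core pool at floor: ${steady} vCPU"
echo "full burst, core pool at max:   ${peak} vCPU"
```

With the defaults this gives 24 and 32 vCPU, the figures the README sizes against a 32-vCPU family quota. Raising `burst_max` or choosing a larger burst size pushes the peak past that quota.
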
+
+## Check access
+
+```bash
+az login
+az account set --subscription <subscription-id>
+AZURE_LOCATION=eastus bash check-aks-prereqs.sh
+```
+
+Register any flagged providers with the suggested `az provider register`
+commands and wait for completion. Note: creating role assignments needs
+**Owner** or **User Access Administrator** — Contributor alone is not enough.
+
+## Create the cluster
+
+```bash
+az group create --name rulebricks-rg --location eastus
+az deployment group create \
+  --resource-group rulebricks-rg \
+  --template-file rulebricks-cluster.bicep \
+  --parameters @parameters.json
+
+az aks get-credentials --name rulebricks-cluster --resource-group rulebricks-rg
+```
+
+Run `rulebricks init` once kubeconfig works, then select this cluster. The
+deploy emits `rulebricksClientId`, the generated `storageAccountName`, the
+`dataContainer` name, and `dceMetricsIngestionEndpoint` / `dcrImmutableId` for
+the CLI to consume.
+
+> Managed-Prometheus role assignments take ~30 min to propagate; expect HTTP 403
+> in the Prometheus log until then. This is expected, not a misconfiguration.
+
+## Delete the cluster
+
+Run `rulebricks destroy <deployment-name>` first so Kubernetes removes
+LoadBalancer services and PVC-backed disks. Then delete the resource group:
+
+```bash
+az group delete --name rulebricks-rg --yes
+```
+
+AKS cascade-deletes its `MC_<rg>_<cluster>_<region>` node resource group, so
+this removes the cluster, node pool, identities, role assignments, federated
+credentials, and (when created by the template) the storage account and Azure
+Monitor workspace.
+
+## Notes
+
+- Inbound TCP `80`/`443` are open to the AKS subnet for LoadBalancer services and cert-manager HTTP-01 validation.
+- `maxPods` is fixed at node-pool creation; changing it means a replacement pool or recreate.
+- Federated identity credentials for vector/backup/prometheus are created by the Rulebricks CLI at deploy time, so this template takes no deployment name. (The optional `external-dns` path is the one exception — set `rulebricksNamespace` if you enable it.)
+- BYO storage/monitor resources outside this resource group need an admin to assign the relevant role to the emitted identity client ID.
+
+## Fallback secret-based auth
+
+If Workload Identity is unavailable, decision-log export can use a storage
+connection string, and metrics can use OAuth client-secret auth:
+
+```bash
+kubectl create secret generic azure-blob-logs \
+  --namespace rulebricks-demo \
+  --from-literal=connection-string='<connection-string>'
+# CLI prompt: azure-blob-logs:connection-string
+
+kubectl create secret generic azure-monitor-oauth \
+  --namespace rulebricks-demo \
+  --from-literal=client-secret='<client-secret>'
+# CLI prompt: azure-monitor-oauth:client-secret
+```