@rulebricks/cli 2.1.6 → 2.3.1

Files changed (114)
  1. package/README.md +75 -14
  2. package/cluster-setup/aws/README.md +123 -0
  3. package/cluster-setup/aws/check-aws-access.sh +242 -0
  4. package/cluster-setup/aws/parameters.json +13 -0
  5. package/cluster-setup/aws/rulebricks-cluster.cfn.yaml +355 -0
  6. package/cluster-setup/azure/README.md +141 -0
  7. package/cluster-setup/azure/check-aks-prereqs.sh +276 -0
  8. package/cluster-setup/azure/parameters.json +30 -0
  9. package/cluster-setup/azure/rulebricks-cluster.bicep +546 -0
  10. package/cluster-setup/gcp/README.md +189 -0
  11. package/cluster-setup/gcp/check-gke-prereqs.sh +260 -0
  12. package/dist/commands/backup.d.ts +5 -0
  13. package/dist/commands/backup.js +104 -0
  14. package/dist/commands/deploy.d.ts +3 -1
  15. package/dist/commands/deploy.js +226 -326
  16. package/dist/commands/destroy.d.ts +1 -1
  17. package/dist/commands/destroy.js +73 -123
  18. package/dist/commands/init.d.ts +5 -1
  19. package/dist/commands/init.js +78 -47
  20. package/dist/commands/list.d.ts +1 -0
  21. package/dist/commands/list.js +74 -0
  22. package/dist/commands/open.d.ts +1 -1
  23. package/dist/commands/open.js +4 -12
  24. package/dist/commands/redeploy.d.ts +6 -0
  25. package/dist/commands/redeploy.js +310 -0
  26. package/dist/commands/restore.d.ts +5 -0
  27. package/dist/commands/restore.js +338 -0
  28. package/dist/commands/status.js +62 -49
  29. package/dist/commands/upgrade.js +74 -51
  30. package/dist/components/DNSWaitScreen.d.ts +5 -1
  31. package/dist/components/DNSWaitScreen.js +47 -41
  32. package/dist/components/Wizard/WizardContext.d.ts +174 -29
  33. package/dist/components/Wizard/WizardContext.js +896 -91
  34. package/dist/components/Wizard/steps/CloudProviderStep.js +192 -102
  35. package/dist/components/Wizard/steps/DomainStep.js +5 -24
  36. package/dist/components/Wizard/steps/ExternalServicesStep.d.ts +6 -0
  37. package/dist/components/Wizard/steps/ExternalServicesStep.js +645 -0
  38. package/dist/components/Wizard/steps/FeatureConfigStep.d.ts +2 -1
  39. package/dist/components/Wizard/steps/FeatureConfigStep.js +959 -248
  40. package/dist/components/Wizard/steps/FeaturesStep.js +31 -35
  41. package/dist/components/Wizard/steps/ObservabilityStep.d.ts +6 -0
  42. package/dist/components/Wizard/steps/ObservabilityStep.js +137 -0
  43. package/dist/components/Wizard/steps/ReviewStep.d.ts +2 -1
  44. package/dist/components/Wizard/steps/ReviewStep.js +56 -7
  45. package/dist/components/Wizard/steps/StorageStep.d.ts +9 -0
  46. package/dist/components/Wizard/steps/StorageStep.js +592 -0
  47. package/dist/components/Wizard/steps/SupabaseCredentialsStep.js +20 -21
  48. package/dist/components/Wizard/steps/VersionStep.js +45 -23
  49. package/dist/components/Wizard/steps/index.d.ts +3 -3
  50. package/dist/components/Wizard/steps/index.js +3 -3
  51. package/dist/components/common/CommandApproval.d.ts +12 -0
  52. package/dist/components/common/CommandApproval.js +91 -0
  53. package/dist/components/common/DeploymentPicker.d.ts +14 -0
  54. package/dist/components/common/DeploymentPicker.js +16 -0
  55. package/dist/components/common/index.d.ts +2 -0
  56. package/dist/components/common/index.js +2 -0
  57. package/dist/index.js +94 -62
  58. package/dist/lib/cloudCli.d.ts +134 -63
  59. package/dist/lib/cloudCli.js +512 -220
  60. package/dist/lib/clusterSetupDefaults.d.ts +30 -0
  61. package/dist/lib/clusterSetupDefaults.js +64 -0
  62. package/dist/lib/commandApproval.d.ts +26 -0
  63. package/dist/lib/commandApproval.js +114 -0
  64. package/dist/lib/config.d.ts +12 -10
  65. package/dist/lib/config.js +91 -33
  66. package/dist/lib/configFixtures.d.ts +5 -0
  67. package/dist/lib/configFixtures.js +513 -0
  68. package/dist/lib/deploymentHealth.d.ts +32 -0
  69. package/dist/lib/deploymentHealth.js +157 -0
  70. package/dist/lib/dns.d.ts +1 -1
  71. package/dist/lib/dns.js +19 -1
  72. package/dist/lib/dns.test.d.ts +1 -0
  73. package/dist/lib/dns.test.js +27 -0
  74. package/dist/lib/dockerHub.d.ts +12 -1
  75. package/dist/lib/dockerHub.js +18 -8
  76. package/dist/lib/helm.d.ts +4 -0
  77. package/dist/lib/helm.js +16 -0
  78. package/dist/lib/helmValues.d.ts +25 -0
  79. package/dist/lib/helmValues.js +1937 -259
  80. package/dist/lib/helmValues.test.d.ts +1 -0
  81. package/dist/lib/helmValues.test.js +966 -0
  82. package/dist/lib/htpasswd.d.ts +1 -0
  83. package/dist/lib/htpasswd.js +15 -0
  84. package/dist/lib/kubernetes.d.ts +126 -13
  85. package/dist/lib/kubernetes.js +624 -134
  86. package/dist/lib/secrets.d.ts +23 -0
  87. package/dist/lib/secrets.js +158 -0
  88. package/dist/lib/validateValues.d.ts +31 -0
  89. package/dist/lib/validateValues.js +253 -0
  90. package/dist/lib/versions.d.ts +82 -11
  91. package/dist/lib/versions.js +131 -31
  92. package/dist/lib/versions.test.d.ts +1 -0
  93. package/dist/lib/versions.test.js +81 -0
  94. package/dist/lib/wizardSteps.d.ts +14 -0
  95. package/dist/lib/wizardSteps.js +23 -0
  96. package/dist/lib/workloadIdentity.d.ts +26 -0
  97. package/dist/lib/workloadIdentity.js +323 -0
  98. package/dist/lib/workloadIdentity.test.d.ts +1 -0
  99. package/dist/lib/workloadIdentity.test.js +57 -0
  100. package/dist/types/index.d.ts +2152 -95
  101. package/dist/types/index.js +554 -286
  102. package/package.json +10 -4
  103. package/schema/values.schema.json +1934 -0
  104. package/dist/components/Wizard/steps/CredentialsStep.d.ts +0 -6
  105. package/dist/components/Wizard/steps/CredentialsStep.js +0 -22
  106. package/dist/components/Wizard/steps/DeploymentModeStep.d.ts +0 -5
  107. package/dist/components/Wizard/steps/DeploymentModeStep.js +0 -26
  108. package/dist/components/Wizard/steps/TierStep.d.ts +0 -6
  109. package/dist/components/Wizard/steps/TierStep.js +0 -29
  110. package/dist/lib/terraform.d.ts +0 -66
  111. package/dist/lib/terraform.js +0 -754
  112. package/terraform/aws/main.tf +0 -355
  113. package/terraform/azure/main.tf +0 -371
  114. package/terraform/gcp/main.tf +0 -407
package/cluster-setup/aws/rulebricks-cluster.cfn.yaml
@@ -0,0 +1,355 @@
1
+ AWSTemplateFormatVersion: "2010-09-09"
2
+ Description: >-
3
+ Rulebricks EKS cluster, turnkey. Stands up an EKS cluster with a managed node
4
+ group and a single IAM role + S3 bucket for all Rulebricks data:
5
+ decision logs -> s3://<cluster>-data/decision-logs/ (Vector) SA: vector
6
+ backups -> s3://<cluster>-data/db-backups/ (backup job) SA: <ns>-backup
7
+ metrics -> AMP (Prometheus remote write) SA: prometheus
8
+ Identity uses EKS Pod Identity (AWS's recommended mechanism for new clusters):
9
+ one IAM role with scoped policies. The namespace-scoped Pod Identity
10
+ associations binding vector/backup/prometheus to this role are created by the
11
+ Rulebricks CLI at deploy time, so this stack stays deployment-independent and
12
+ one cluster can host many deployments. eksctl alone cannot create the S3 bucket
13
+ or AMP workspace, so the full picture lives here in one stack.
14
+
15
+ Parameters:
16
+ ClusterName:
17
+ Type: String
18
+ Default: rulebricks-cluster
19
+ KubernetesVersion:
20
+ Type: String
21
+ Default: "1.34"
22
+ NodeInstanceType:
23
+ Type: String
24
+ Default: c7i.xlarge
25
+ NodeDesiredCapacity:
26
+ Type: Number
27
+ Default: 2
28
+ NodeMinSize:
29
+ Type: Number
30
+ Default: 2
31
+ NodeMaxSize:
32
+ Type: Number
33
+ # Core services need only 2-4 small nodes; burst capacity lives in the
34
+ # dedicated burst nodegroup, and a low core ceiling steers the autoscaler
35
+ # there when the worker fleet scales out.
36
+ Default: 4
37
+ NodeVolumeSizeGiB:
38
+ Type: Number
39
+ Default: 50
40
+ VpcCidr:
41
+ Type: String
42
+ Default: 10.0.0.0/16
43
+ EnableBurstPool:
44
+ Type: String
45
+ AllowedValues: ["true", "false"]
46
+ Default: "true"
47
+ Description: >-
48
+ Add a dedicated burst nodegroup: one large on-demand node (scales
49
+ 0->BurstNodeMaxSize) labeled and tainted rulebricks.com/pool=burst.
50
+ The Rulebricks chart makes workers tolerate and softly prefer it out
51
+ of the box, keeping the scaled-out fleet off the core nodes.
52
+ BurstInstanceType:
53
+ Type: String
54
+ Default: c7i.4xlarge
55
+ Description: >-
56
+ Instance type for the burst nodegroup. Default 16 vCPU: 2x4 vCPU core
57
+ floor + 16 = 24 vCPU running steady-state at full burst, and exactly
58
+ 32 vCPU even with the core nodegroup at its 4-node max.
59
+ BurstNodeMaxSize:
60
+ Type: Number
61
+ Default: 1
62
+ Conditions:
63
+ BurstPoolEnabled: !Equals [!Ref EnableBurstPool, "true"]
64
+
65
+ Resources:
66
+ # ---------------------------------------------------------------------------
67
+ # Networking (minimal: VPC + 2 public subnets across 2 AZs + IGW + routing)
68
+ # ---------------------------------------------------------------------------
69
+ Vpc:
70
+ Type: AWS::EC2::VPC
71
+ Properties:
72
+ CidrBlock: !Ref VpcCidr
73
+ EnableDnsHostnames: true
74
+ EnableDnsSupport: true
75
+ Tags:
76
+ - { Key: Name, Value: !Sub "${ClusterName}-vpc" }
77
+ - { Key: Environment, Value: rulebricks }
78
+
79
+ InternetGateway:
80
+ Type: AWS::EC2::InternetGateway
81
+
82
+ VpcGatewayAttachment:
83
+ Type: AWS::EC2::VPCGatewayAttachment
84
+ Properties:
85
+ VpcId: !Ref Vpc
86
+ InternetGatewayId: !Ref InternetGateway
87
+
88
+ SubnetA:
89
+ Type: AWS::EC2::Subnet
90
+ Properties:
91
+ VpcId: !Ref Vpc
92
+ AvailabilityZone: !Select [0, !GetAZs ""]
93
+ CidrBlock: 10.0.0.0/19
94
+ MapPublicIpOnLaunch: true
95
+ Tags:
96
+ - { Key: Name, Value: !Sub "${ClusterName}-subnet-a" }
97
+
98
+ SubnetB:
99
+ Type: AWS::EC2::Subnet
100
+ Properties:
101
+ VpcId: !Ref Vpc
102
+ AvailabilityZone: !Select [1, !GetAZs ""]
103
+ CidrBlock: 10.0.32.0/19
104
+ MapPublicIpOnLaunch: true
105
+ Tags:
106
+ - { Key: Name, Value: !Sub "${ClusterName}-subnet-b" }
107
+
108
+ RouteTable:
109
+ Type: AWS::EC2::RouteTable
110
+ Properties:
111
+ VpcId: !Ref Vpc
112
+
113
+ DefaultRoute:
114
+ Type: AWS::EC2::Route
115
+ DependsOn: VpcGatewayAttachment
116
+ Properties:
117
+ RouteTableId: !Ref RouteTable
118
+ DestinationCidrBlock: 0.0.0.0/0
119
+ GatewayId: !Ref InternetGateway
120
+
121
+ SubnetARouteAssoc:
122
+ Type: AWS::EC2::SubnetRouteTableAssociation
123
+ Properties:
124
+ SubnetId: !Ref SubnetA
125
+ RouteTableId: !Ref RouteTable
126
+
127
+ SubnetBRouteAssoc:
128
+ Type: AWS::EC2::SubnetRouteTableAssociation
129
+ Properties:
130
+ SubnetId: !Ref SubnetB
131
+ RouteTableId: !Ref RouteTable
132
+
133
+ # ---------------------------------------------------------------------------
134
+ # Cluster + node group IAM roles
135
+ # ---------------------------------------------------------------------------
136
+ ClusterRole:
137
+ Type: AWS::IAM::Role
138
+ Properties:
139
+ AssumeRolePolicyDocument:
140
+ Version: "2012-10-17"
141
+ Statement:
142
+ - Effect: Allow
143
+ Principal: { Service: eks.amazonaws.com }
144
+ Action: sts:AssumeRole
145
+ ManagedPolicyArns:
146
+ - arn:aws:iam::aws:policy/AmazonEKSClusterPolicy
147
+
148
+ NodeRole:
149
+ Type: AWS::IAM::Role
150
+ Properties:
151
+ AssumeRolePolicyDocument:
152
+ Version: "2012-10-17"
153
+ Statement:
154
+ - Effect: Allow
155
+ Principal: { Service: ec2.amazonaws.com }
156
+ Action: sts:AssumeRole
157
+ ManagedPolicyArns:
158
+ - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
159
+ - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
160
+ - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
161
+ - arn:aws:iam::aws:policy/AmazonEBSCSIDriverPolicy
162
+
163
+ # ---------------------------------------------------------------------------
164
+ # EKS cluster
165
+ # ---------------------------------------------------------------------------
166
+ Cluster:
167
+ Type: AWS::EKS::Cluster
168
+ Properties:
169
+ Name: !Ref ClusterName
170
+ Version: !Ref KubernetesVersion
171
+ RoleArn: !GetAtt ClusterRole.Arn
172
+ ResourcesVpcConfig:
173
+ SubnetIds:
174
+ - !Ref SubnetA
175
+ - !Ref SubnetB
176
+ EndpointPublicAccess: true
177
+ EndpointPrivateAccess: true
178
+
179
+ # Pod Identity agent add-on: required for PodIdentityAssociation to work.
180
+ PodIdentityAddon:
181
+ Type: AWS::EKS::Addon
182
+ Properties:
183
+ ClusterName: !Ref Cluster
184
+ AddonName: eks-pod-identity-agent
185
+
186
+ EbsCsiAddon:
187
+ Type: AWS::EKS::Addon
188
+ Properties:
189
+ ClusterName: !Ref Cluster
190
+ AddonName: aws-ebs-csi-driver
191
+
192
+ NodeGroup:
193
+ Type: AWS::EKS::Nodegroup
194
+ Properties:
195
+ ClusterName: !Ref Cluster
196
+ NodegroupName: standard-nodes
197
+ NodeRole: !GetAtt NodeRole.Arn
198
+ Subnets:
199
+ - !Ref SubnetA
200
+ - !Ref SubnetB
201
+ InstanceTypes:
202
+ - !Ref NodeInstanceType
203
+ AmiType: AL2023_x86_64_STANDARD # c7i = Intel/x86
204
+ DiskSize: !Ref NodeVolumeSizeGiB
205
+ ScalingConfig:
206
+ DesiredSize: !Ref NodeDesiredCapacity
207
+ MinSize: !Ref NodeMinSize
208
+ MaxSize: !Ref NodeMaxSize
209
+
210
+ # Dedicated burst nodegroup: one large on-demand node for the scaled-out
211
+ # worker fleet. The taint keeps everything except workers off it; the label
212
+ # is what the chart's soft worker affinity targets. EKS has no parked-VM
213
+ # (Deallocate) equivalent, so bursts cold-provision (~2-3 min); a Karpenter
214
+ # NodePool with the same label/taint is the future fast path.
215
+ BurstNodeGroup:
216
+ Type: AWS::EKS::Nodegroup
217
+ Condition: BurstPoolEnabled
218
+ Properties:
219
+ ClusterName: !Ref Cluster
220
+ NodegroupName: burst-workers
221
+ NodeRole: !GetAtt NodeRole.Arn
222
+ Subnets:
223
+ - !Ref SubnetA
224
+ - !Ref SubnetB
225
+ InstanceTypes:
226
+ - !Ref BurstInstanceType
227
+ AmiType: AL2023_x86_64_STANDARD
228
+ DiskSize: !Ref NodeVolumeSizeGiB
229
+ Labels:
230
+ rulebricks.com/pool: burst
231
+ Taints:
232
+ - Key: rulebricks.com/pool
233
+ Value: burst
234
+ Effect: NO_SCHEDULE
235
+ ScalingConfig:
236
+ DesiredSize: 0
237
+ MinSize: 0
238
+ MaxSize: !Ref BurstNodeMaxSize
239
+
240
+ # ---------------------------------------------------------------------------
241
+ # OBJECT STORAGE (all Rulebricks data)
242
+ # One bucket holds everything; decision logs and backups are key prefixes
243
+ # (decision-logs/ and db-backups/) within it.
244
+ # ---------------------------------------------------------------------------
245
+ DataBucket:
246
+ Type: AWS::S3::Bucket
247
+ Properties:
248
+ BucketName: !Sub "${ClusterName}-data-${AWS::AccountId}"
249
+ PublicAccessBlockConfiguration:
250
+ BlockPublicAcls: true
251
+ BlockPublicPolicy: true
252
+ IgnorePublicAcls: true
253
+ RestrictPublicBuckets: true
254
+ BucketEncryption:
255
+ ServerSideEncryptionConfiguration:
256
+ - ServerSideEncryptionByDefault: { SSEAlgorithm: AES256 }
257
+
258
+ # ---------------------------------------------------------------------------
259
+ # METRICS (Amazon Managed Service for Prometheus workspace)
260
+ # ---------------------------------------------------------------------------
261
+ PrometheusWorkspace:
262
+ Type: AWS::APS::Workspace
263
+ Properties:
264
+ Alias: !Sub "${ClusterName}-amp"
265
+
266
+ # ---------------------------------------------------------------------------
267
+ # RULEBRICKS ROLE (single role for all data paths)
268
+ # Trusts the EKS Pod Identity service principal. The Rulebricks CLI creates the
269
+ # namespace-scoped Pod Identity associations (vector/backup/prometheus) at
270
+ # deploy time. Grants S3 read/write on the one data bucket and aps:RemoteWrite
271
+ # on the AMP workspace.
272
+ # ---------------------------------------------------------------------------
273
+ RulebricksRole:
274
+ Type: AWS::IAM::Role
275
+ Properties:
276
+ RoleName: !Sub "${ClusterName}-rulebricks"
277
+ AssumeRolePolicyDocument:
278
+ Version: "2012-10-17"
279
+ Statement:
280
+ - Effect: Allow
281
+ Principal: { Service: pods.eks.amazonaws.com }
282
+ Action:
283
+ - sts:AssumeRole
284
+ - sts:TagSession
285
+ Policies:
286
+ - PolicyName: rulebricks-s3-data
287
+ PolicyDocument:
288
+ Version: "2012-10-17"
289
+ Statement:
290
+ - Effect: Allow
291
+ Action:
292
+ - s3:PutObject
293
+ - s3:GetObject
294
+ - s3:DeleteObject
295
+ - s3:ListBucket
296
+ Resource:
297
+ - !GetAtt DataBucket.Arn
298
+ - !Sub "${DataBucket.Arn}/*"
299
+ - PolicyName: rulebricks-amp-remote-write
300
+ PolicyDocument:
301
+ Version: "2012-10-17"
302
+ Statement:
303
+ - Effect: Allow
304
+ Action:
305
+ - aps:RemoteWrite
306
+ Resource:
307
+ - !GetAtt PrometheusWorkspace.Arn
308
+ - PolicyName: rulebricks-msk-iam
309
+ PolicyDocument:
310
+ Version: "2012-10-17"
311
+ Statement:
312
+ # AWS MSK IAM access for HPS + the Vector bridge when Kafka is
313
+ # externalized to MSK. Account-scoped so any managed cluster works
314
+ # without re-provisioning; tighten Resource to a specific cluster
315
+ # ARN to lock it down. Harmless when Kafka runs in-cluster.
316
+ - Effect: Allow
317
+ Action:
318
+ - kafka-cluster:Connect
319
+ - kafka-cluster:DescribeCluster
320
+ - kafka-cluster:DescribeClusterDynamicConfiguration
321
+ Resource:
322
+ - !Sub "arn:${AWS::Partition}:kafka:*:${AWS::AccountId}:cluster/*/*"
323
+ - Effect: Allow
324
+ Action:
325
+ - kafka-cluster:*Topic*
326
+ - kafka-cluster:WriteData
327
+ - kafka-cluster:ReadData
328
+ Resource:
329
+ - !Sub "arn:${AWS::Partition}:kafka:*:${AWS::AccountId}:topic/*/*/*"
330
+ - Effect: Allow
331
+ Action:
332
+ - kafka-cluster:AlterGroup
333
+ - kafka-cluster:DescribeGroup
334
+ Resource:
335
+ - !Sub "arn:${AWS::Partition}:kafka:*:${AWS::AccountId}:group/*/*/*"
336
+
337
+ # Pod Identity associations (hps / vector / clickhouse / backup / prometheus) are
338
+ # namespace-scoped, so the Rulebricks CLI creates them at `rulebricks deploy`
339
+ # time. This stack only provisions the deployment-independent role, bucket, and
340
+ # AMP workspace, so one cluster can host many deployments.
341
+
342
+ Outputs:
343
+ ClusterName:
344
+ Value: !Ref Cluster
345
+ KubeconfigCommand:
346
+ Value: !Sub "aws eks update-kubeconfig --name ${ClusterName} --region ${AWS::Region}"
347
+ DataBucketName:
348
+ Value: !Ref DataBucket
349
+ RulebricksRoleArn:
350
+ Value: !GetAtt RulebricksRole.Arn
351
+ PrometheusWorkspaceId:
352
+ Value: !Ref PrometheusWorkspace
353
+ PrometheusRemoteWriteUrl:
354
+ Description: Set as Prometheus remote_write url (append /api/v1/remote_write).
355
+ Value: !GetAtt PrometheusWorkspace.PrometheusEndpoint
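
Note on the `PrometheusRemoteWriteUrl` output above: its description says to append `/api/v1/remote_write` to the workspace endpoint. A minimal sketch of that join follows. The helper name `remote_write_url` and the example endpoint are hypothetical and not part of the package, and the template does not say whether the endpoint ends in a slash, so the sketch tolerates both forms.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of @rulebricks/cli): build a Prometheus
# remote_write URL from the PrometheusRemoteWriteUrl stack output.
remote_write_url() {
  local endpoint="$1"
  # Drop a single trailing slash, if present, so the joined URL has no "//".
  endpoint="${endpoint%/}"
  printf '%s/api/v1/remote_write\n' "$endpoint"
}

# Placeholder endpoint; the workspace ID here is made up for illustration.
remote_write_url "https://aps-workspaces.us-east-1.amazonaws.com/workspaces/ws-EXAMPLE/"
```

In practice the endpoint would come from the stack outputs rather than a literal string.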
package/cluster-setup/azure/README.md
@@ -0,0 +1,141 @@
1
+ # Azure Cluster Setup
2
+
3
+ A compact, turnkey AKS cluster for Rulebricks. One Bicep deploy creates the
4
+ cluster **and** the object storage + Azure Monitor resources the platform needs,
5
+ fully wired to Workload Identity. Bring-your-own infra is also supported.
6
+
7
+ ## Files
8
+
9
+ - `rulebricks-cluster.bicep` — AKS cluster (Azure CNI, Calico, Standard LB, Disk CSI, OIDC issuer, Workload Identity) plus the single Rulebricks identity and its data paths.
10
+ - `parameters.json` — sample parameters (turnkey defaults: all paths on).
11
+ - `check-aks-prereqs.sh` — verifies login, providers, quota, role-assignment rights, kubectl/helm.
12
+
13
+ ## One identity, one container (deployment-independent)
14
+
15
+ A single user-assigned identity, `<cluster>-rulebricks`, holds both data roles,
16
+ and all data lives in one container, `<cluster>-data`, under per-purpose prefixes.
17
+ Toggle each path with its `enable*` flag.
18
+
19
+ | Path | Service account | Role / target |
20
+ | --------------------------------- | --------------------------- | --------------------------------------------------------------- |
21
+ | Decision logs (Vector → Blob) | `vector` | Storage Blob Data Contributor → `<cluster>-data/decision-logs/` |
22
+ | DB backups (job → Blob) | `<release>-backup` | Storage Blob Data Contributor → `<cluster>-data/db-backups/` |
23
+ | Metrics (Prometheus remote write) | `prometheus` | Monitoring Metrics Publisher → Azure Monitor DCR |
24
+
25
+ The identity has Storage Blob Data Contributor on the storage account and
26
+ Monitoring Metrics Publisher on the DCR.
27
+
28
+ > **This template does not need a deployment name.** Federated identity credentials are
29
+ > `namespace`-scoped (`system:serviceaccount:rulebricks-<deploymentName>:<sa>`), so they can't be
30
+ > created until the deployment namespace is known. The **Rulebricks CLI creates them at
31
+ > `rulebricks deploy` time** against this identity. That keeps cluster-setup generic, so one cluster
32
+ > can host any number of deployments without re-running it — the CLI adds each deployment's
33
+ > credentials on deploy. (Azure wildcard "flexible" FICs would avoid even that, but they're
34
+ > unsupported on managed identities and AKS OIDC issuers.)
35
+
36
+ ## Turnkey vs. bring-your-own
37
+
38
+ - `createStorage: true` provisions a storage account + the single `<cluster>-data` container (deterministic globally-unique account name). `false` → set `existingStorageAccountName`.
39
+ - `createMonitorWorkspace: true` provisions an Azure Monitor workspace + data collection endpoint + rule, so the metrics role is scoped to a DCR we own. `false` → set `existingDataCollectionRuleId`.
40
+
41
+ Defaults are turnkey: `createStorage`, `createMonitorWorkspace`, and all `enable*` flags are `true`.
42
+
43
+ ## Core cluster parameters
44
+
45
+ `clusterName` (`rulebricks-cluster`), `location` (`eastus`), `kubernetesVersion`
46
+ (`1.34`), `nodeCount`/`maxNodeCount` (`2`/`4`), `nodeVmSize`
47
+ (`Standard_F4as_v6`), `maxPods` (`110`), `osDiskSizeGB` (`64`), `osDiskType`
48
+ (`Managed`). The default (core) pool runs the always-on services on two to
49
+ four 4-vCPU nodes; burst capacity lives in the dedicated burst pool below.
50
+ The `110` max-pods avoids the legacy 30/node limit, and the autoscaler
51
+ profile is tuned for bursts (`scan-interval` 10s, `least-waste` expander).
52
+ Both pools use `Deallocate` scale-down: removed nodes are parked (disk-only
53
+ cost, container images cached) and resume in ~30-60s instead of
54
+ reprovisioning.
55
+
56
+ ### Burst worker pool (default on)
57
+
58
+ `enableBurstPool` (`true`), `burstVmSize` (`Standard_F16as_v6`, 16 vCPU -
59
+ the Fas_v6 family has no 24-vCPU size), `burstMaxCount` (`1`). One large
60
+ `User`-mode node that scales 0 -> 1 on demand and parks between bursts. It
61
+ is labeled and tainted `rulebricks.com/pool=burst`: the Rulebricks chart
62
+ makes workers tolerate the taint and softly prefer the label out of the box,
63
+ so the entire scaled-out worker fleet lands on this node while core services
64
+ stay on the default pool. Sizing math: 2 x 4 vCPU core floor + 16 vCPU burst
65
+ = 24 vCPU running steady-state at full burst, and exactly 32 vCPU even with
66
+ the core pool at its 4-node max - sized to a 32-vCPU family quota.
67
+ First-ever burst
68
+ cold-provisions the VM (~2-4 min); every burst after resumes the parked VM
69
+ (~30-60s). Note deallocated VMs resume into their original zone/SKU - in a
70
+ capacity-constrained region a resume can fail and the autoscaler retries;
71
+ the warm worker floor on the core pool carries traffic in the meantime.
72
+
73
+ ## Check access
74
+
75
+ ```bash
76
+ az login
77
+ az account set --subscription <subscription-id>
78
+ AZURE_LOCATION=eastus bash check-aks-prereqs.sh
79
+ ```
80
+
81
+ Register any flagged providers with the suggested `az provider register`
82
+ commands and wait for completion. Note: creating role assignments needs
83
+ **Owner** or **User Access Administrator** — Contributor alone is not enough.
84
+
85
+ ## Create the cluster
86
+
87
+ ```bash
88
+ az group create --name rulebricks-rg --location eastus
89
+ az deployment group create \
90
+ --resource-group rulebricks-rg \
91
+ --template-file rulebricks-cluster.bicep \
92
+ --parameters @parameters.json
93
+
94
+ az aks get-credentials --name rulebricks-cluster --resource-group rulebricks-rg
95
+ ```
96
+
97
+ Run `rulebricks init` once kubeconfig works, then select this cluster. The
98
+ deploy emits `rulebricksClientId`, the generated `storageAccountName`, the
99
+ `dataContainer` name, and `dceMetricsIngestionEndpoint` / `dcrImmutableId` for
100
+ the CLI to consume.
101
+
102
+ > Managed-Prometheus role assignments take ~30 min to propagate; expect HTTP 403
103
+ > in the Prometheus log until then. This is expected, not a misconfiguration.
104
+
105
+ ## Delete the cluster
106
+
107
+ Run `rulebricks destroy <deployment-name>` first so Kubernetes removes
108
+ LoadBalancer services and PVC-backed disks. Then delete the resource group:
109
+
110
+ ```bash
111
+ az group delete --name rulebricks-rg --yes
112
+ ```
113
+
114
+ AKS cascade-deletes its `MC_<rg>_<cluster>_<region>` node resource group, so
115
+ this removes the cluster, node pool, identities, role assignments, federated
116
+ credentials, and (when created by the template) the storage account and Azure
117
+ Monitor workspace.
118
+
119
+ ## Notes
120
+
121
+ - Inbound TCP `80`/`443` are open to the AKS subnet for LoadBalancer services and cert-manager HTTP-01 validation.
122
+ - `maxPods` is fixed at node-pool creation; changing it means a replacement pool or recreate.
123
+ - Federated identity credentials for vector/backup/prometheus are created by the Rulebricks CLI at deploy time, so this template takes no deployment name. (The optional `external-dns` path is the one exception — set `rulebricksNamespace` if you enable it.)
124
+ - BYO storage/monitor resources outside this resource group need an admin to assign the relevant role to the emitted identity client ID.
125
+
126
+ ## Fallback secret-based auth
127
+
128
+ If Workload Identity is unavailable, decision-log export can use a storage
129
+ connection string, and metrics can use OAuth client-secret auth:
130
+
131
+ ```bash
132
+ kubectl create secret generic azure-blob-logs \
133
+ --namespace rulebricks-demo \
134
+ --from-literal=connection-string='<connection-string>'
135
+ # CLI prompt: azure-blob-logs:connection-string
136
+
137
+ kubectl create secret generic azure-monitor-oauth \
138
+ --namespace rulebricks-demo \
139
+ --from-literal=client-secret='<client-secret>'
140
+ # CLI prompt: azure-monitor-oauth:client-secret
141
+ ```
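
Note on the burst-pool sizing math in the README above (and in the `BurstInstanceType` description of the AWS template): the totals are the core pool's vCPUs plus the burst pool's vCPUs. A minimal sketch of that arithmetic, assuming 4-vCPU core nodes and one 16-vCPU burst node as the defaults state. The function name `total_vcpu` is hypothetical and not part of the package.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of @rulebricks/cli): total vCPU across the
# core pool and the burst pool for a given node count and size in each.
total_vcpu() {
  local core_nodes="$1" core_vcpu="$2" burst_nodes="$3" burst_vcpu="$4"
  echo $(( core_nodes * core_vcpu + burst_nodes * burst_vcpu ))
}

total_vcpu 2 4 1 16   # core floor (2 nodes) at full burst: prints 24
total_vcpu 4 4 1 16   # core pool at its 4-node max: prints 32
```

The second call reproduces the 32-vCPU ceiling that the README sizes against a 32-vCPU family quota.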