cluster-builder 0.3.0__tar.gz → 0.3.2__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

Files changed (44)
  1. cluster_builder-0.3.2/PKG-INFO +339 -0
  2. cluster_builder-0.3.2/README.md +323 -0
  3. {cluster_builder-0.3.0 → cluster_builder-0.3.2}/cluster_builder/config/cluster.py +35 -7
  4. {cluster_builder-0.3.0 → cluster_builder-0.3.2}/cluster_builder/config/postgres.py +4 -1
  5. cluster_builder-0.3.2/cluster_builder/infrastructure/executor.py +88 -0
  6. {cluster_builder-0.3.0 → cluster_builder-0.3.2}/cluster_builder/infrastructure/templates.py +2 -2
  7. cluster_builder-0.3.2/cluster_builder/swarmchestrate.py +593 -0
  8. cluster_builder-0.3.2/cluster_builder/templates/aws/main.tf +156 -0
  9. cluster_builder-0.3.2/cluster_builder/templates/deploy_manifest.tf +43 -0
  10. cluster_builder-0.3.2/cluster_builder/templates/edge/main.tf +98 -0
  11. cluster_builder-0.3.2/cluster_builder/templates/ha_user_data.sh.tpl +33 -0
  12. cluster_builder-0.3.2/cluster_builder/templates/master_user_data.sh.tpl +37 -0
  13. cluster_builder-0.3.2/cluster_builder/templates/openstack/main.tf +218 -0
  14. cluster_builder-0.3.2/cluster_builder/templates/openstack_provider.tf +70 -0
  15. cluster_builder-0.3.2/cluster_builder/templates/worker_user_data.sh.tpl +34 -0
  16. {cluster_builder-0.3.0 → cluster_builder-0.3.2}/cluster_builder/utils/hcl.py +91 -15
  17. cluster_builder-0.3.2/cluster_builder.egg-info/PKG-INFO +339 -0
  18. {cluster_builder-0.3.0 → cluster_builder-0.3.2}/cluster_builder.egg-info/SOURCES.txt +4 -3
  19. cluster_builder-0.3.2/cluster_builder.egg-info/requires.txt +6 -0
  20. {cluster_builder-0.3.0 → cluster_builder-0.3.2}/pyproject.toml +8 -5
  21. {cluster_builder-0.3.0 → cluster_builder-0.3.2}/tests/test_hcl.py +33 -20
  22. cluster_builder-0.3.0/PKG-INFO +0 -264
  23. cluster_builder-0.3.0/README.md +0 -250
  24. cluster_builder-0.3.0/cluster_builder/infrastructure/executor.py +0 -88
  25. cluster_builder-0.3.0/cluster_builder/swarmchestrate.py +0 -373
  26. cluster_builder-0.3.0/cluster_builder/templates/aws/main.tf +0 -93
  27. cluster_builder-0.3.0/cluster_builder/templates/edge/main.tf.j2 +0 -40
  28. cluster_builder-0.3.0/cluster_builder/templates/ha_user_data.sh.tpl +0 -2
  29. cluster_builder-0.3.0/cluster_builder/templates/master_user_data.sh.tpl +0 -6
  30. cluster_builder-0.3.0/cluster_builder/templates/openstack/main.tf.j2 +0 -76
  31. cluster_builder-0.3.0/cluster_builder/templates/openstack/network_security_group.tf.j2 +0 -34
  32. cluster_builder-0.3.0/cluster_builder/templates/worker_user_data.sh.tpl +0 -2
  33. cluster_builder-0.3.0/cluster_builder.egg-info/PKG-INFO +0 -264
  34. cluster_builder-0.3.0/cluster_builder.egg-info/requires.txt +0 -4
  35. {cluster_builder-0.3.0 → cluster_builder-0.3.2}/LICENSE +0 -0
  36. {cluster_builder-0.3.0 → cluster_builder-0.3.2}/cluster_builder/__init__.py +0 -0
  37. {cluster_builder-0.3.0 → cluster_builder-0.3.2}/cluster_builder/config/__init__.py +0 -0
  38. {cluster_builder-0.3.0 → cluster_builder-0.3.2}/cluster_builder/infrastructure/__init__.py +0 -0
  39. {cluster_builder-0.3.0 → cluster_builder-0.3.2}/cluster_builder/templates/aws_provider.tf +0 -0
  40. {cluster_builder-0.3.0 → cluster_builder-0.3.2}/cluster_builder/utils/__init__.py +0 -0
  41. {cluster_builder-0.3.0 → cluster_builder-0.3.2}/cluster_builder/utils/logging.py +0 -0
  42. {cluster_builder-0.3.0 → cluster_builder-0.3.2}/cluster_builder.egg-info/dependency_links.txt +0 -0
  43. {cluster_builder-0.3.0 → cluster_builder-0.3.2}/cluster_builder.egg-info/top_level.txt +0 -0
  44. {cluster_builder-0.3.0 → cluster_builder-0.3.2}/setup.cfg +0 -0
@@ -0,0 +1,339 @@
1
+ Metadata-Version: 2.4
2
+ Name: cluster-builder
3
+ Version: 0.3.2
4
+ Summary: Swarmchestrate cluster builder
5
+ Author-email: Gunjan <G.Kotak@westminster.ac.uk>, Jay <J.Deslauriers@westminster.ac.uk>
6
+ License: Apache2
7
+ Description-Content-Type: text/markdown
8
+ License-File: LICENSE
9
+ Requires-Dist: names_generator==0.2.0
10
+ Requires-Dist: python-hcl2==7.2
11
+ Requires-Dist: lark-parser==0.12.0
12
+ Requires-Dist: python-dotenv==1.1.1
13
+ Requires-Dist: psycopg2-binary==2.9.10
14
+ Requires-Dist: yaspin==3.1.0
15
+ Dynamic: license-file
16
+
17
+ # Swarmchestrate - Cluster Builder
18
+
19
+ This repository contains the codebase for **cluster-builder**, which builds K3s clusters for Swarmchestrate using OpenTofu.
20
+
21
+ Key features:
22
+ - **Create**: Provisions infrastructure using OpenTofu and installs K3s.
23
+ - **Add**: Add worker or HA nodes to existing clusters.
24
+ - **Remove**: Selectively remove nodes from existing clusters.
25
+ - **Delete**: Destroys the provisioned infrastructure when no longer required.
26
+
27
+ ---
28
+
29
+ ## Prerequisites
30
+
31
+ Before proceeding, ensure the following prerequisites are installed:
32
+
33
+ 1. **Git**: For cloning the repository.
34
+ 2. **Python**: Version 3.9 or higher.
35
+ 3. **pip**: Python package manager.
36
+ 4. **Make**: To run the provided `Makefile`.
37
+ 5. **PostgreSQL**: For storing OpenTofu state.
38
+ 6. (Optional) **Docker**: To create a dev Postgres
39
+ 7. For detailed instructions on **edge device requirements**, refer to the [Edge Device Requirements](docs/edge-requirements.md) document.
40
+
41
+ ---
42
+
43
+ ## Getting Started
44
+
45
+ ### 1. Clone the Repository
46
+
47
+ To get started, clone this repository:
48
+
49
+ ```bash
50
+ git clone https://github.com/Swarmchestrate/cluster-builder.git
51
+ ```
52
+
53
+ ### 2. Navigate to the Project Directory
54
+
55
+ ```bash
56
+ cd cluster-builder
57
+ ```
58
+
59
+ ### 3. Install Dependencies and Tools
60
+
61
+ Run the Makefile to install all necessary dependencies, including OpenTofu:
62
+
63
+ ```bash
64
+ make install
65
+ ```
66
+
67
+ This command will:
68
+ - Install Python dependencies listed in requirements.txt.
69
+ - Download and configure OpenTofu for infrastructure management.
70
+
71
+ ```bash
72
+ make db
73
+ ```
74
+
75
+ This command will:
76
+ - Spin up an empty dev Postgres DB (in Docker) for storing state
77
+
78
+ The Makefile provides default database details (container name `pg-db`, `POSTGRES_USER=admin`, `POSTGRES_PASSWORD=adminpass`, `POSTGRES_DB=swarmchestrate`); update them to suit your environment or use them as-is.
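The exact command lives in the `Makefile`; a minimal sketch of the kind of container it starts (the image tag and port mapping here are illustrative, not taken from the Makefile):

```bash
# Hedged sketch of a dev Postgres container roughly equivalent to `make db`
docker run -d \
  --name pg-db \
  -e POSTGRES_USER=admin \
  -e POSTGRES_PASSWORD=adminpass \
  -e POSTGRES_DB=swarmchestrate \
  -p 5432:5432 \
  postgres:16
```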
79
+
80
+ For database setup as a service, refer to the [database setup as service](docs/database_setup.md) document
81
+
82
+ ### 4. Populate .env file with access config
83
+ The .env file is used to store environment variables required by the application. It contains configuration details for connecting to your cloud providers, the PostgreSQL database, and any other necessary resources.
84
+
85
+ #### 4.1. Rename or copy the example file to **.env**
86
+
87
+ ```bash
88
+ cp .env_example .env
89
+ ```
90
+
91
+ #### 4.2. Open the **.env** file and add the necessary configuration for your cloud providers and PostgreSQL:
92
+
93
+ ```ini
94
+ ## PG Configuration
95
+ POSTGRES_USER=postgres
96
+ POSTGRES_PASSWORD=secret
97
+ POSTGRES_HOST=db.example.com
98
+ POSTGRES_DATABASE=terraform_state
99
+ POSTGRES_SSLMODE=prefer
100
+
101
+ ## AWS Auth
102
+ TF_VAR_aws_region=us-west-2
103
+ TF_VAR_aws_access_key=AKIAXXXXXXXXXXXXXXXX
104
+ TF_VAR_aws_secret_key=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
105
+
106
+ ## OpenStack Auth - AppCreds Mode
107
+ TF_VAR_openstack_auth_method=appcreds
108
+ TF_VAR_openstack_auth_url=https://openstack.example.com:5000
109
+ TF_VAR_openstack_application_credential_id=fdXXXXXXXXXXXXXXXX
110
+ TF_VAR_openstack_application_credential_secret=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
111
+ TF_VAR_openstack_region=RegionOne
112
+
113
+ ## OpenStack Auth - User/Pass Mode
114
+ # TF_VAR_openstack_auth_method=userpass
115
+ # TF_VAR_openstack_auth_url=https://openstack.example.com:5000
116
+ # TF_VAR_openstack_region=RegionOne
117
+ # TF_VAR_openstack_user_name=myuser
118
+ # TF_VAR_openstack_password=mypassword
119
+ # TF_VAR_openstack_project_id=project-id-123
120
+ # TF_VAR_openstack_user_domain_name=Default
121
+ ```
122
+
123
+ ---
124
+
125
+ ## Basic Usage
126
+
127
+ ### Initialisation
128
+
129
+ ```python
130
+ from cluster_builder import Swarmchestrate
131
+
132
+ # Initialise the orchestrator
133
+ orchestrator = Swarmchestrate(
134
+ template_dir="/path/to/templates",
135
+ output_dir="/path/to/output"
136
+ )
137
+ ```
138
+
139
+ ### Creating a New Cluster
140
+
141
+ To create a new k3s cluster, use the **add_node** method with the **master** role:
142
+
143
+ ```python
144
+ # Configuration for a new cluster using aws provider
145
+ config = {
146
+ "cloud": "aws",
147
+ "k3s_role": "master",
148
+ "ha": False, # Set to True for high availability (HA) deployments
149
+ "instance_type": "t2.small", # AWS instance type
150
+ "ssh_key_name": "g", # SSH key name for AWS or OpenStack
151
+ "ssh_user": "ec2-user", # SSH user for the instance
152
+ "ssh_private_key_path": "/workspaces/cluster-builder/scripts/g.pem", # Path to SSH private key
153
+ "ami": "ami-0c0493bbac867d427", # AMI ID for AWS (specific to region)
154
+ # Optional parameters
155
+ # If existing SG is specified, it will be used directly with no port changes
156
+ "security_group_id": "sg-0123456789abcdef0",
157
+ # No security_group_id means a new SG will be created and these ports applied as rules
158
+ # These ports will be used ONLY if creating a new SG
159
+ "tcp_ports": [10020], # Optional list of TCP ports to open
160
+ "udp_ports": [1003] # Optional list of UDP ports to open
161
+ }
162
+
163
+ # Create the cluster (returns the cluster name)
164
+ cluster_name = orchestrator.add_node(config)
165
+ print(f"Created cluster: {cluster_name}")
166
+ ```
167
+
168
+ Note: Fetch the outputs from the master node and use them when adding a worker node.
169
+
170
+ ### Adding Nodes to an Existing Cluster
171
+
172
+ To add worker or high-availability nodes to an existing cluster:
173
+
174
+ ```python
175
+ # Configuration for adding a worker node using aws provider
176
+ worker_config = {
177
+ "cloud": "aws",
178
+ "k3s_role": "worker", # Role can be 'worker' or 'ha'
179
+ "instance_type": "t2.small", # AWS instance type
180
+ "ssh_key_name": "g", # SSH key name
181
+ "ssh_user": "ec2-user", # SSH user for the instance
182
+ "ssh_private_key_path": "/workspaces/cluster-builder/scripts/g.pem", # Path to SSH private key
183
+ "ami": "ami-0c0493bbac867d427", # AMI ID for AWS
184
+ # Additional parameters obtained after deploying the master node:
185
+ "master_ip": "12.13.14.15", # IP address of the master node (required for worker/HA roles)
186
+ "cluster_name": "elastic_mcnulty", # Name of the cluster
187
+ "k3s_token": "G4lm7wEaFuCCygeU", # Token of the cluster
188
+ # Optional parameters
189
+ # If existing SG is specified, it will be used directly with no port changes
190
+ "security_group_id": "sg-0123456789abcdef0",
191
+ # No security_group_id means a new SG will be created and these ports applied as rules
192
+ # These ports will be used ONLY if creating a new SG
193
+ "tcp_ports": [10020], # Optional list of TCP ports to open
194
+ "udp_ports": [1003] # Optional list of UDP ports to open
195
+ }
196
+
197
+ # Add the worker node
198
+ cluster_name = orchestrator.add_node(worker_config)
199
+ print(f"Added worker node to cluster: {cluster_name}")
200
+ ```
201
+
202
+ ### Removing a Specific Node
203
+
204
+ To remove a specific node from a cluster:
205
+
206
+ ```python
207
+ # Remove a node by its resource name
208
+ orchestrator.remove_node(
209
+ cluster_name="your-cluster-name",
210
+ resource_name="aws_eloquent_feynman" # The resource identifier of the node
211
+ )
212
+ ```
213
+
214
+ The **remove_node** method:
215
+ 1. Destroys the node's infrastructure resources
216
+ 2. Removes the node's configuration from the cluster
217
+
218
+ ---
219
+
220
+ ### Destroying an Entire Cluster
221
+
222
+ To completely destroy a cluster and all its nodes:
223
+
224
+ ```python
225
+ # Destroy the entire cluster
226
+ orchestrator.destroy(
227
+ cluster_name="your-cluster-name"
228
+ )
229
+ ```
230
+
231
+ The **destroy** method:
232
+ 1. Destroys all infrastructure resources associated with the cluster
233
+ 2. Removes the cluster directory and configuration files
234
+
235
+ Note for **Edge Devices**:
236
+ Since the edge device is already provisioned, the `destroy` method will not remove K3s directly from the edge device. You will need to manually uninstall K3s from your edge device after the cluster is destroyed.
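A standard K3s installation drops uninstall helpers onto the node; a hedged sketch of removing K3s from an edge device by hand (paths are the K3s defaults and may differ on your device):

```bash
# Run on the edge device itself once the cluster has been destroyed.
# Agent (worker) nodes:
sudo /usr/local/bin/k3s-agent-uninstall.sh

# Server nodes use the server variant instead:
# sudo /usr/local/bin/k3s-uninstall.sh
```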
237
+
238
+ ---
239
+ ### Deploying Manifests
240
+
241
+ The deploy_manifests method copies Kubernetes manifests to the target cluster node.
242
+
243
+ ```python
244
+ orchestrator.deploy_manifests(
245
+ manifest_folder="path/to/manifests",
246
+ master_ip="MASTER_NODE_IP",
247
+ ssh_key_path="path/to/key.pem",
248
+ ssh_user="USERNAME"
249
+ )
250
+ ```
251
+
252
+ ## Important Configuration Requirements
253
+ ### High Availability Flag (ha):
254
+
255
+ - The ha flag should be set to True for high-availability deployments (usually when adding an HA or worker node to an existing master).
256
+
257
+ ### SSH Credentials:
258
+
259
+ - For all roles (k3s_role="master", k3s_role="worker", k3s_role="ha"), you must specify both ssh_user and ssh_private_key_path; edge devices are the exception.
260
+
261
+ - The ssh_private_key_path should be the path to your SSH private key file. Ensure that the SSH key is copied to the specified path before running the script.
262
+
263
+ - The ssh_key_name and the ssh_private_key_path are different—ensure that your SSH key is placed correctly at the provided ssh_private_key_path.
264
+
265
+ ### Ports:
266
+ You can specify custom ports for your nodes in the tcp_ports and udp_ports fields. However, certain ports are required for Kubernetes deployment (even if not specified explicitly):
267
+
268
+ **TCP Ports:**
269
+
270
+ - 2379-2380: For etcd communication
271
+ - 6443: K3s API server
272
+ - 10250: Kubelet metrics
273
+ - 51820-51821: WireGuard (for encrypted networking)
274
+ - 22: SSH access
275
+ - 80, 443: HTTP/HTTPS access
276
+ - 53: DNS (CoreDNS)
277
+ - 5432: PostgreSQL access (master node)
278
+
279
+ **UDP Ports:**
280
+
281
+ - 8472: VXLAN for Flannel
282
+ - 53: DNS
283
+
284
+ ### OpenStack:
285
+ When provisioning on OpenStack, provide a value for 'floating_ip_pool', the pool from which a floating IP is allocated for the instance. If not specified, OpenTofu will not assign a floating IP.
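A hedged sketch of an OpenStack node configuration carrying `floating_ip_pool` (the flavour/image field names and all values are illustrative; the authoritative variable names live in `templates/openstack/main.tf`):

```python
# Illustrative OpenStack configuration; values and some field names are placeholders
openstack_config = {
    "cloud": "openstack",
    "k3s_role": "master",
    "ha": False,
    "flavor_name": "m1.small",        # assumed field name for the instance flavour
    "image_name": "Ubuntu-22.04",     # assumed field name for the image
    "ssh_key_name": "mykey",
    "ssh_user": "ubuntu",
    "ssh_private_key_path": "/path/to/mykey.pem",
    "floating_ip_pool": "public",     # pool from which a floating IP is allocated
}

cluster_name = orchestrator.add_node(openstack_config)
```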
286
+
287
+ ---
288
+
289
+ ## Advanced Usage
290
+
291
+ ### Dry Run Mode
292
+
293
+ All operations support a **dryrun** parameter, which validates the configuration
294
+ without making changes. A node created with dryrun=True should also be removed with dryrun=True.
295
+
296
+ ```python
297
+ # Validate configuration without deploying
298
+ orchestrator.add_node(config, dryrun=True)
299
+
300
+ # Validate removal without destroying
301
+ orchestrator.remove_node(cluster_name, resource_name, dryrun=True)
302
+
303
+ # Validate destruction without destroying
304
+ orchestrator.destroy(cluster_name, dryrun=True)
305
+ ```
306
+
307
+ ### Custom Cluster Names
308
+
309
+ By default, cluster names are generated automatically. To specify a custom name:
310
+
311
+ ```python
312
+ config = {
313
+ "cloud": "aws",
314
+ "k3s_role": "master",
315
+ "cluster_name": "production-cluster",
316
+ # ... other configuration ...
317
+ }
318
+
319
+ orchestrator.add_node(config)
320
+ ```
321
+
322
+ ---
323
+
324
+ ## Template Structure
325
+
326
+ Templates should be organised as follows:
327
+ - `templates/` - Base directory for templates
328
+ - `templates/{cloud}/` - Terraform modules for each cloud provider
329
+ - `templates/{role}_user_data.sh.tpl` - Node initialisation scripts
330
+ - `templates/{cloud}_provider.tf.j2` - Provider configuration templates
331
+
332
+ ---
333
+
334
+ ## DEMO
335
+ Some test scripts have been created for demonstrating the functionality of the cluster builder. These scripts can be referred to for understanding how the system works and for testing various configurations.
336
+
337
+ For detailed service deployment examples and to explore the test scripts, refer to the [test scripts](docs/test-scripts.md) document
338
+
339
+ ---
@@ -0,0 +1,323 @@
1
+ # Swarmchestrate - Cluster Builder
2
+
3
+ This repository contains the codebase for **cluster-builder**, which builds K3s clusters for Swarmchestrate using OpenTofu.
4
+
5
+ Key features:
6
+ - **Create**: Provisions infrastructure using OpenTofu and installs K3s.
7
+ - **Add**: Add worker or HA nodes to existing clusters.
8
+ - **Remove**: Selectively remove nodes from existing clusters.
9
+ - **Delete**: Destroys the provisioned infrastructure when no longer required.
10
+
11
+ ---
12
+
13
+ ## Prerequisites
14
+
15
+ Before proceeding, ensure the following prerequisites are installed:
16
+
17
+ 1. **Git**: For cloning the repository.
18
+ 2. **Python**: Version 3.9 or higher.
19
+ 3. **pip**: Python package manager.
20
+ 4. **Make**: To run the provided `Makefile`.
21
+ 5. **PostgreSQL**: For storing OpenTofu state.
22
+ 6. (Optional) **Docker**: To create a dev Postgres
23
+ 7. For detailed instructions on **edge device requirements**, refer to the [Edge Device Requirements](docs/edge-requirements.md) document.
24
+
25
+ ---
26
+
27
+ ## Getting Started
28
+
29
+ ### 1. Clone the Repository
30
+
31
+ To get started, clone this repository:
32
+
33
+ ```bash
34
+ git clone https://github.com/Swarmchestrate/cluster-builder.git
35
+ ```
36
+
37
+ ### 2. Navigate to the Project Directory
38
+
39
+ ```bash
40
+ cd cluster-builder
41
+ ```
42
+
43
+ ### 3. Install Dependencies and Tools
44
+
45
+ Run the Makefile to install all necessary dependencies, including OpenTofu:
46
+
47
+ ```bash
48
+ make install
49
+ ```
50
+
51
+ This command will:
52
+ - Install Python dependencies listed in requirements.txt.
53
+ - Download and configure OpenTofu for infrastructure management.
54
+
55
+ ```bash
56
+ make db
57
+ ```
58
+
59
+ This command will:
60
+ - Spin up an empty dev Postgres DB (in Docker) for storing state
61
+
62
+ The Makefile provides default database details (container name `pg-db`, `POSTGRES_USER=admin`, `POSTGRES_PASSWORD=adminpass`, `POSTGRES_DB=swarmchestrate`); update them to suit your environment or use them as-is.
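The exact command lives in the `Makefile`; a minimal sketch of the kind of container it starts (the image tag and port mapping here are illustrative, not taken from the Makefile):

```bash
# Hedged sketch of a dev Postgres container roughly equivalent to `make db`
docker run -d \
  --name pg-db \
  -e POSTGRES_USER=admin \
  -e POSTGRES_PASSWORD=adminpass \
  -e POSTGRES_DB=swarmchestrate \
  -p 5432:5432 \
  postgres:16
```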
63
+
64
+ For database setup as a service, refer to the [database setup as service](docs/database_setup.md) document
65
+
66
+ ### 4. Populate .env file with access config
67
+ The .env file is used to store environment variables required by the application. It contains configuration details for connecting to your cloud providers, the PostgreSQL database, and any other necessary resources.
68
+
69
+ #### 4.1. Rename or copy the example file to **.env**
70
+
71
+ ```bash
72
+ cp .env_example .env
73
+ ```
74
+
75
+ #### 4.2. Open the **.env** file and add the necessary configuration for your cloud providers and PostgreSQL:
76
+
77
+ ```ini
78
+ ## PG Configuration
79
+ POSTGRES_USER=postgres
80
+ POSTGRES_PASSWORD=secret
81
+ POSTGRES_HOST=db.example.com
82
+ POSTGRES_DATABASE=terraform_state
83
+ POSTGRES_SSLMODE=prefer
84
+
85
+ ## AWS Auth
86
+ TF_VAR_aws_region=us-west-2
87
+ TF_VAR_aws_access_key=AKIAXXXXXXXXXXXXXXXX
88
+ TF_VAR_aws_secret_key=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
89
+
90
+ ## OpenStack Auth - AppCreds Mode
91
+ TF_VAR_openstack_auth_method=appcreds
92
+ TF_VAR_openstack_auth_url=https://openstack.example.com:5000
93
+ TF_VAR_openstack_application_credential_id=fdXXXXXXXXXXXXXXXX
94
+ TF_VAR_openstack_application_credential_secret=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
95
+ TF_VAR_openstack_region=RegionOne
96
+
97
+ ## OpenStack Auth - User/Pass Mode
98
+ # TF_VAR_openstack_auth_method=userpass
99
+ # TF_VAR_openstack_auth_url=https://openstack.example.com:5000
100
+ # TF_VAR_openstack_region=RegionOne
101
+ # TF_VAR_openstack_user_name=myuser
102
+ # TF_VAR_openstack_password=mypassword
103
+ # TF_VAR_openstack_project_id=project-id-123
104
+ # TF_VAR_openstack_user_domain_name=Default
105
+ ```
106
+
107
+ ---
108
+
109
+ ## Basic Usage
110
+
111
+ ### Initialisation
112
+
113
+ ```python
114
+ from cluster_builder import Swarmchestrate
115
+
116
+ # Initialise the orchestrator
117
+ orchestrator = Swarmchestrate(
118
+ template_dir="/path/to/templates",
119
+ output_dir="/path/to/output"
120
+ )
121
+ ```
122
+
123
+ ### Creating a New Cluster
124
+
125
+ To create a new k3s cluster, use the **add_node** method with the **master** role:
126
+
127
+ ```python
128
+ # Configuration for a new cluster using aws provider
129
+ config = {
130
+ "cloud": "aws",
131
+ "k3s_role": "master",
132
+ "ha": False, # Set to True for high availability (HA) deployments
133
+ "instance_type": "t2.small", # AWS instance type
134
+ "ssh_key_name": "g", # SSH key name for AWS or OpenStack
135
+ "ssh_user": "ec2-user", # SSH user for the instance
136
+ "ssh_private_key_path": "/workspaces/cluster-builder/scripts/g.pem", # Path to SSH private key
137
+ "ami": "ami-0c0493bbac867d427", # AMI ID for AWS (specific to region)
138
+ # Optional parameters
139
+ # If existing SG is specified, it will be used directly with no port changes
140
+ "security_group_id": "sg-0123456789abcdef0",
141
+ # No security_group_id means a new SG will be created and these ports applied as rules
142
+ # These ports will be used ONLY if creating a new SG
143
+ "tcp_ports": [10020], # Optional list of TCP ports to open
144
+ "udp_ports": [1003] # Optional list of UDP ports to open
145
+ }
146
+
147
+ # Create the cluster (returns the cluster name)
148
+ cluster_name = orchestrator.add_node(config)
149
+ print(f"Created cluster: {cluster_name}")
150
+ ```
151
+
152
+ Note: Fetch the outputs from the master node and use them when adding a worker node.
153
+
154
+ ### Adding Nodes to an Existing Cluster
155
+
156
+ To add worker or high-availability nodes to an existing cluster:
157
+
158
+ ```python
159
+ # Configuration for adding a worker node using aws provider
160
+ worker_config = {
161
+ "cloud": "aws",
162
+ "k3s_role": "worker", # Role can be 'worker' or 'ha'
163
+ "instance_type": "t2.small", # AWS instance type
164
+ "ssh_key_name": "g", # SSH key name
165
+ "ssh_user": "ec2-user", # SSH user for the instance
166
+ "ssh_private_key_path": "/workspaces/cluster-builder/scripts/g.pem", # Path to SSH private key
167
+ "ami": "ami-0c0493bbac867d427", # AMI ID for AWS
168
+ # Additional parameters obtained after deploying the master node:
169
+ "master_ip": "12.13.14.15", # IP address of the master node (required for worker/HA roles)
170
+ "cluster_name": "elastic_mcnulty", # Name of the cluster
171
+ "k3s_token": "G4lm7wEaFuCCygeU", # Token of the cluster
172
+ # Optional parameters
173
+ # If existing SG is specified, it will be used directly with no port changes
174
+ "security_group_id": "sg-0123456789abcdef0",
175
+ # No security_group_id means a new SG will be created and these ports applied as rules
176
+ # These ports will be used ONLY if creating a new SG
177
+ "tcp_ports": [10020], # Optional list of TCP ports to open
178
+ "udp_ports": [1003] # Optional list of UDP ports to open
179
+ }
180
+
181
+ # Add the worker node
182
+ cluster_name = orchestrator.add_node(worker_config)
183
+ print(f"Added worker node to cluster: {cluster_name}")
184
+ ```
185
+
186
+ ### Removing a Specific Node
187
+
188
+ To remove a specific node from a cluster:
189
+
190
+ ```python
191
+ # Remove a node by its resource name
192
+ orchestrator.remove_node(
193
+ cluster_name="your-cluster-name",
194
+ resource_name="aws_eloquent_feynman" # The resource identifier of the node
195
+ )
196
+ ```
197
+
198
+ The **remove_node** method:
199
+ 1. Destroys the node's infrastructure resources
200
+ 2. Removes the node's configuration from the cluster
201
+
202
+ ---
203
+
204
+ ### Destroying an Entire Cluster
205
+
206
+ To completely destroy a cluster and all its nodes:
207
+
208
+ ```python
209
+ # Destroy the entire cluster
210
+ orchestrator.destroy(
211
+ cluster_name="your-cluster-name"
212
+ )
213
+ ```
214
+
215
+ The **destroy** method:
216
+ 1. Destroys all infrastructure resources associated with the cluster
217
+ 2. Removes the cluster directory and configuration files
218
+
219
+ Note for **Edge Devices**:
220
+ Since the edge device is already provisioned, the `destroy` method will not remove K3s directly from the edge device. You will need to manually uninstall K3s from your edge device after the cluster is destroyed.
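A standard K3s installation drops uninstall helpers onto the node; a hedged sketch of removing K3s from an edge device by hand (paths are the K3s defaults and may differ on your device):

```bash
# Run on the edge device itself once the cluster has been destroyed.
# Agent (worker) nodes:
sudo /usr/local/bin/k3s-agent-uninstall.sh

# Server nodes use the server variant instead:
# sudo /usr/local/bin/k3s-uninstall.sh
```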
221
+
222
+ ---
223
+ ### Deploying Manifests
224
+
225
+ The deploy_manifests method copies Kubernetes manifests to the target cluster node.
226
+
227
+ ```python
228
+ orchestrator.deploy_manifests(
229
+ manifest_folder="path/to/manifests",
230
+ master_ip="MASTER_NODE_IP",
231
+ ssh_key_path="path/to/key.pem",
232
+ ssh_user="USERNAME"
233
+ )
234
+ ```
235
+
236
+ ## Important Configuration Requirements
237
+ ### High Availability Flag (ha):
238
+
239
+ - The ha flag should be set to True for high-availability deployments (usually when adding an HA or worker node to an existing master).
240
+
241
+ ### SSH Credentials:
242
+
243
+ - For all roles (k3s_role="master", k3s_role="worker", k3s_role="ha"), you must specify both ssh_user and ssh_private_key_path; edge devices are the exception.
244
+
245
+ - The ssh_private_key_path should be the path to your SSH private key file. Ensure that the SSH key is copied to the specified path before running the script.
246
+
247
+ - The ssh_key_name and the ssh_private_key_path are different—ensure that your SSH key is placed correctly at the provided ssh_private_key_path.
248
+
249
+ ### Ports:
250
+ You can specify custom ports for your nodes in the tcp_ports and udp_ports fields. However, certain ports are required for Kubernetes deployment (even if not specified explicitly):
251
+
252
+ **TCP Ports:**
253
+
254
+ - 2379-2380: For etcd communication
255
+ - 6443: K3s API server
256
+ - 10250: Kubelet metrics
257
+ - 51820-51821: WireGuard (for encrypted networking)
258
+ - 22: SSH access
259
+ - 80, 443: HTTP/HTTPS access
260
+ - 53: DNS (CoreDNS)
261
+ - 5432: PostgreSQL access (master node)
262
+
263
+ **UDP Ports:**
264
+
265
+ - 8472: VXLAN for Flannel
266
+ - 53: DNS
267
+
268
+ ### OpenStack:
269
+ When provisioning on OpenStack, provide a value for 'floating_ip_pool', the pool from which a floating IP is allocated for the instance. If not specified, OpenTofu will not assign a floating IP.
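A hedged sketch of an OpenStack node configuration carrying `floating_ip_pool` (the flavour/image field names and all values are illustrative; the authoritative variable names live in `templates/openstack/main.tf`):

```python
# Illustrative OpenStack configuration; values and some field names are placeholders
openstack_config = {
    "cloud": "openstack",
    "k3s_role": "master",
    "ha": False,
    "flavor_name": "m1.small",        # assumed field name for the instance flavour
    "image_name": "Ubuntu-22.04",     # assumed field name for the image
    "ssh_key_name": "mykey",
    "ssh_user": "ubuntu",
    "ssh_private_key_path": "/path/to/mykey.pem",
    "floating_ip_pool": "public",     # pool from which a floating IP is allocated
}

cluster_name = orchestrator.add_node(openstack_config)
```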
270
+
271
+ ---
272
+
273
+ ## Advanced Usage
274
+
275
+ ### Dry Run Mode
276
+
277
+ All operations support a **dryrun** parameter, which validates the configuration
278
+ without making changes. A node created with dryrun=True should also be removed with dryrun=True.
279
+
280
+ ```python
281
+ # Validate configuration without deploying
282
+ orchestrator.add_node(config, dryrun=True)
283
+
284
+ # Validate removal without destroying
285
+ orchestrator.remove_node(cluster_name, resource_name, dryrun=True)
286
+
287
+ # Validate destruction without destroying
288
+ orchestrator.destroy(cluster_name, dryrun=True)
289
+ ```
290
+
291
+ ### Custom Cluster Names
292
+
293
+ By default, cluster names are generated automatically. To specify a custom name:
294
+
295
+ ```python
296
+ config = {
297
+ "cloud": "aws",
298
+ "k3s_role": "master",
299
+ "cluster_name": "production-cluster",
300
+ # ... other configuration ...
301
+ }
302
+
303
+ orchestrator.add_node(config)
304
+ ```
305
+
306
+ ---
307
+
308
+ ## Template Structure
309
+
310
+ Templates should be organised as follows:
311
+ - `templates/` - Base directory for templates
312
+ - `templates/{cloud}/` - Terraform modules for each cloud provider
313
+ - `templates/{role}_user_data.sh.tpl` - Node initialisation scripts
314
+ - `templates/{cloud}_provider.tf.j2` - Provider configuration templates
315
+
316
+ ---
317
+
318
+ ## DEMO
319
+ Some test scripts have been created for demonstrating the functionality of the cluster builder. These scripts can be referred to for understanding how the system works and for testing various configurations.
320
+
321
+ For detailed service deployment examples and to explore the test scripts, refer to the [test scripts](docs/test-scripts.md) document
322
+
323
+ ---