cem_acpt 0.11.0 → 0.11.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (54)
  1. checksums.yaml +4 -4
  2. data/.gitignore +8 -0
  3. data/.worktreeinclude +1 -0
  4. data/CLAUDE.md +64 -25
  5. data/Gemfile.lock +1 -1
  6. data/README.md +20 -7
  7. data/docs/ARCHITECTURE.md +1042 -0
  8. data/docs/rfcs/0000-template.md +54 -0
  9. data/docs/rfcs/0001-fix-bolt-missing-skip-path.md +105 -0
  10. data/docs/rfcs/0002-fix-default-character-substitutions.md +119 -0
  11. data/docs/rfcs/0003-windows-image-builder-template.md +110 -0
  12. data/docs/rfcs/0004-image-name-truncation-off-by-one.md +108 -0
  13. data/docs/rfcs/0005-os-dispatch-replace-windows-heuristic.md +117 -0
  14. data/docs/rfcs/0006-configurable-windows-bucket.md +96 -0
  15. data/docs/rfcs/0007-logging-quiet-and-typos.md +121 -0
  16. data/docs/rfcs/0008-namespace-platform-classes.md +110 -0
  17. data/docs/rfcs/0009-bolt-log-formatter-cleanup.md +111 -0
  18. data/docs/rfcs/0010-dead-code-cleanup.md +83 -0
  19. data/docs/rfcs/0011-provisioner-factory-consistency.md +89 -0
  20. data/docs/rfcs/README.md +34 -0
  21. data/lib/cem_acpt/cli.rb +10 -1
  22. data/lib/cem_acpt/config/cem_acpt.rb +4 -1
  23. data/lib/cem_acpt/image_builder/errors.rb +24 -0
  24. data/lib/cem_acpt/image_builder/provision_commands.rb +15 -3
  25. data/lib/cem_acpt/image_builder.rb +29 -2
  26. data/lib/cem_acpt/image_name_builder.rb +8 -1
  27. data/lib/cem_acpt/platform/gcp.rb +112 -106
  28. data/lib/cem_acpt/platform.rb +21 -19
  29. data/lib/cem_acpt/provision/terraform/linux.rb +1 -1
  30. data/lib/cem_acpt/provision/terraform/os_data.rb +23 -0
  31. data/lib/cem_acpt/provision/terraform/windows.rb +7 -1
  32. data/lib/cem_acpt/provision/terraform.rb +20 -16
  33. data/lib/cem_acpt/test_runner/log_formatter/bolt_summary_results_formatter.rb +2 -1
  34. data/lib/cem_acpt/test_runner/log_formatter.rb +0 -1
  35. data/lib/cem_acpt/test_runner.rb +21 -8
  36. data/lib/cem_acpt/utils/winrm_runner.rb +4 -3
  37. data/lib/cem_acpt/utils.rb +0 -12
  38. data/lib/cem_acpt/version.rb +1 -1
  39. data/lib/cem_acpt.rb +19 -7
  40. data/lib/terraform/gcp/linux/main.tf +6 -1
  41. data/lib/terraform/image/gcp/linux/main.tf +8 -1
  42. data/specifications/CEM-6713.md +165 -0
  43. data/specifications/CEM-6714.md +271 -0
  44. data/specifications/CEM-6715.md +133 -0
  45. data/specifications/CEM-6716.md +160 -0
  46. data/specifications/CEM-6717.md +239 -0
  47. data/specifications/CEM-6718.md +120 -0
  48. data/specifications/CEM-6719.md +173 -0
  49. metadata +26 -11
  50. data/.claude/settings.local.json +0 -7
  51. data/lib/cem_acpt/action_result.rb +0 -91
  52. data/lib/cem_acpt/puppet_helpers.rb +0 -38
  53. data/lib/cem_acpt/test_runner/log_formatter/bolt_error_formatter.rb +0 -65
  54. data/lib/cem_acpt/test_runner/log_formatter/bolt_output_formatter.rb +0 -54
@@ -0,0 +1,1042 @@
1
+ # cem_acpt Architecture
2
+
3
+ This document describes the architecture of `cem_acpt`, a Ruby gem that provides
4
+ two CLI tools for the Puppet SCE (Security Compliance Enforcement, formerly
5
+ CEM) module suite:
6
+
7
+ - **`cem_acpt`** — runs acceptance tests against ephemeral cloud nodes. For
8
+ each acceptance test it provisions a node, applies a Puppet manifest, runs
9
+ Goss-based infrastructure assertions over HTTP, optionally runs Bolt tasks,
10
+ and tears the node down.
11
+ - **`cem_acpt_image`** — builds the base VM images that `cem_acpt` later uses
12
+ as the boot disk for its test nodes.
13
+
14
+ The intended audience is engineers working on `cem_acpt` itself or on the
15
+ consuming Puppet modules (`sce_linux`, `sce_windows`, etc.). It assumes
16
+ familiarity with Ruby, Terraform, Puppet modules, and GCP at a basic level.
17
+
18
+ > Companion docs: [README.md](../README.md) covers user-facing usage and
19
+ > configuration. [CLAUDE.md](../CLAUDE.md) summarizes commands and
20
+ > conventions. This document describes how the code is organized and how
21
+ > the parts fit together.
22
+
23
+ ## Table of contents
24
+
25
+ 1. [Repository layout](#1-repository-layout)
26
+ 2. [Runtime entry points](#2-runtime-entry-points)
27
+ 3. [Configuration system](#3-configuration-system)
28
+ 4. [`cem_acpt` test-runner lifecycle](#4-cem_acpt-test-runner-lifecycle)
29
+ 5. [Test data & image-name builder](#5-test-data--image-name-builder)
30
+ 6. [Platform abstraction](#6-platform-abstraction)
31
+ 7. [Provisioner: Terraform](#7-provisioner-terraform)
32
+ 8. [Action framework](#8-action-framework)
33
+ 9. [Goss subsystem](#9-goss-subsystem)
34
+ 10. [Bolt subsystem](#10-bolt-subsystem)
35
+ 11. [Result aggregation & log formatting](#11-result-aggregation--log-formatting)
36
+ 12. [`cem_acpt_image` image-builder lifecycle](#12-cem_acpt_image-image-builder-lifecycle)
37
+ 13. [Windows path](#13-windows-path)
38
+ 14. [Cross-cutting utilities](#14-cross-cutting-utilities)
39
+ 15. [Logging](#15-logging)
40
+ 16. [On-disk state](#16-on-disk-state)
41
+ 17. [Test suite](#17-test-suite)
42
+ 18. [External dependencies](#18-external-dependencies)
43
+ 19. [Open questions / observed dead code](#19-open-questions--observed-dead-code)
44
+
45
+ ---
46
+
47
+ ## 1. Repository layout
48
+
49
+ ```
50
+ cem_acpt/
51
+ ├── exe/
52
+ │ ├── cem_acpt # bin entry point for the test runner
53
+ │ └── cem_acpt_image # bin entry point for the image builder
54
+ ├── lib/
55
+ │ ├── cem_acpt.rb # top-level dispatcher (CemAcpt.run)
56
+ │ ├── cem_acpt/ # all Ruby code
57
+ │ │ ├── cli.rb # OptionParser definitions for both binaries
58
+ │ │ ├── config.rb # requires Config::CemAcpt and Config::CemAcptImage
59
+ │ │ ├── config/ # config schema + merge engine
60
+ │ │ ├── core_ext.rb # Hash#dot_dig, Hash#dot_store, Array#split_into_groups
61
+ │ │ ├── logging.rb # MultiLogger / GitHub Actions formatting
62
+ │ │ ├── logging/ # log line formatters
63
+ │ │ ├── platform.rb # dynamic loader for platform implementations
64
+ │ │ ├── platform/ # base + GCP platform module
65
+ │ │ ├── provision.rb # provisioner factory
66
+ │ │ ├── provision/ # Terraform driver + OS-specific backends
67
+ │ │ ├── test_runner.rb # main TestRunner::Runner
68
+ │ │ ├── test_runner/ # log_formatter/, test_results.rb
69
+ │ │ ├── actions.rb # ActionGroup / Action / ActionConfig
70
+ │ │ ├── goss.rb # requires goss/api
71
+ │ │ ├── goss/ # HTTP client + response wrapping
72
+ │ │ ├── bolt.rb # Bolt::TestRunner
73
+ │ │ ├── bolt/ # cmd/, cmd.rb, errors, helpers, inventory,
74
+ │ │ │ # project, summary_results, tasks, tests, yaml_file
75
+ │ │ ├── image_builder.rb # TerraformBuilder (image-build flow)
76
+ │ │ ├── image_builder/ # Exec wrapper + ProvisionCommands
77
+ │ │ ├── image_name_builder.rb
78
+ │ │ ├── test_data.rb # acceptance-test discovery + variable expansion
79
+ │ │ ├── utils.rb # high-level helpers (e.g. Windows password polling)
80
+ │ │ ├── utils/ # shell, ssh, files, puppet, winrm_runner, etc.
81
+ │ │ └── version.rb
82
+ │ └── terraform/ # HCL templates copied to ~/.cem_acpt at runtime
83
+ │ ├── gcp/
84
+ │ │ ├── linux/ # test-node provisioning + systemd + log_service
85
+ │ │ └── windows/ # test-node skeleton (no remote-exec; see §13)
86
+ │ └── image/
87
+ │ └── gcp/
88
+ │ ├── linux/ # image-build node provisioning
89
+ │ └── windows/ # placeholder (.keep only) — no template yet
90
+ ├── spec/ # RSpec tests mirroring lib/
91
+ ├── docs/ # this directory
92
+ ├── sample_config.yaml
93
+ ├── cem_acpt.gemspec
94
+ ├── Rakefile # delegates to bundler/gem_tasks + RSpec
95
+ └── .rubocop.yml # Ruby 3.2 target, line length 200
96
+ ```
97
+
98
+ `lib/terraform/` is shipped inside the gem. At runtime its contents are
99
+ copied into `~/.cem_acpt/terraform/`; a SHA-256 checksum (mixing in
100
+ `CemAcpt::VERSION`) decides whether the on-disk copy is stale and needs
101
+ to be replaced (`Config::Base#create_terraform_dir!`).
102
+
103
+ ## 2. Runtime entry points
104
+
105
+ Both binaries are thin shells that:
106
+
107
+ 1. `require 'dotenv/load'` so any `.env` in CWD is available before
108
+ the config layer reads `ENV`.
109
+ 2. Parse CLI options through `CemAcpt::Cli.parse_opts_for(:cem_acpt)`
110
+ or `:cem_acpt_image` (`lib/cem_acpt/cli.rb`), which returns a
111
+ `[command, options]` tuple. `command` is normally `:cem_acpt` or
112
+ `:cem_acpt_image`, but can be one of `:version`,
113
+ `:print_yaml_config`, or `:print_explain_config` if a meta flag was
114
+ passed.
115
+ 3. Hand off to `CemAcpt.run(command, original_command, options)` in
116
+ `lib/cem_acpt.rb`.
117
+
118
+ `CemAcpt.run` is the single dispatch point for both binaries. It:
119
+
120
+ - installs a SIGINT handler that flips the logger into trap-context mode
121
+ before calling `exit 1` (`set_up_signal_handlers`),
122
+ - optionally wraps the run in a `TracePoint` (filtered to
123
+ `lib/cem_acpt` paths, excluding `lib/cem_acpt/logging`) when
124
+ `--trace` is set,
125
+ - delegates to `run_cem_acpt` (which builds a `TestRunner::Runner`
126
+ and `exit`s with the runner's `exit_code`), or
127
+ - delegates to `run_cem_acpt_image` (which builds a
128
+ `ImageBuilder::TerraformBuilder` via `ImageBuilder.build_images`).
129
+
130
+ ```text
131
+ exe/cem_acpt ──▶ CemAcpt::Cli.parse_opts_for ──▶ CemAcpt.run ──▶ TestRunner::Runner#run
132
+ exe/cem_acpt_image ──────────────────────────────────────────▶ ImageBuilder::TerraformBuilder#run
133
+ ```
134
+
135
+ Both binaries default `options[:module_dir]` to `Dir.pwd` if the user
136
+ did not pass `-m / --module-dir`.
137
+
138
+ ## 3. Configuration system
139
+
140
+ Configuration lives under `lib/cem_acpt/config/`:
141
+
142
+ - `Config::Base` — merge engine, environment-variable handling,
143
+ Terraform-dir bootstrapping, secret wrapping, validation, freezing.
144
+ - `Config::CemAcpt` — schema (`VALID_KEYS`) and `defaults` for the test
145
+ runner. Top-level keys include `actions`, `bolt`, `image_name_builder`,
146
+ `module_dir`, `node_data`, `test_data`, `tests`.
147
+ - `Config::CemAcptImage` — schema and `defaults` for the image builder.
148
+ Top-level keys include `cem_acpt_image`, `dry_run`, `images`,
149
+ `image_name_filter`, `no_build_images`, `node_data`. Uses an
150
+ `env_var_prefix` of `CEM_ACPT_IMAGE` and registers a `load_hook` that
151
+ filters `images` by `image_name_filter` *after* the merge but before
152
+ validation.
153
+
154
+ ### Merge order
155
+
156
+ `Config::Base#load` merges sources from low to high precedence. The
157
+ ordering visible in code (`base.rb:125-155`) is:
158
+
159
+ 1. **Defaults** from the subclass (seeded in `init_config!`).
160
+ 2. **Environment variables** matching `CEM_ACPT_*` (or
161
+ `CEM_ACPT_IMAGE_*` for the image builder), translated by
162
+ `env_var_to_dot_key` so e.g. `CEM_ACPT_NODE_DATA__DISK_SIZE` becomes
163
+ `node_data.disk_size`. Only env vars whose top-level key is in
164
+ `valid_keys` are kept.
165
+ 3. **User config** at `~/.cem_acpt/config.yaml` (skippable with
166
+ `load_user_config: false`).
167
+ 4. **`--config FILE`** (the runtime config file). Resolved against
168
+ `module_dir` first, then `File.expand_path` fallback.
169
+ 5. **CLI `@options`** (everything else from `parse_opts_for`).
170
+ 6. **`add_static_options!`** runs last as a single phase; internally
171
+ it does two things:
172
+ - 6a. Sets the framework-owned keys `user_config.dir`,
173
+ `user_config.file`, `provisioner = 'terraform'`, and
174
+ `terraform.dir`. Only these four keys clobber anything earlier;
175
+ the rest of the merged config is untouched.
176
+ - 6b. Calls `set_third_party_env_vars!`: `RUNNER_DEBUG=1` forces
177
+ `log_level=debug` and `verbose=true`; `GITHUB_ACTIONS=true` or
178
+ `CI=true` forces `ci_mode=true`.
179
+ 7. The optional class-level `load_hook` then runs inside `load`
180
+ (currently used by `Config::CemAcptImage` to filter images).
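The environment-variable step (item 2 above) can be sketched as follows. This is a minimal illustration, not the gem's code: it assumes a double underscore separates nesting levels (inferred from the single `CEM_ACPT_NODE_DATA__DISK_SIZE` example), and `filter_env` is a hypothetical helper name.

```ruby
# Hypothetical re-implementation for illustration only; the real logic lives in
# Config::Base#env_var_to_dot_key and may differ in detail.
def env_var_to_dot_key(name, prefix: 'CEM_ACPT')
  # Drop the "<PREFIX>_" lead, then treat "__" as the nesting separator (assumed).
  stripped = name.delete_prefix("#{prefix}_")
  stripped.split('__').map(&:downcase).join('.')
end

# Hypothetical helper: keep only env vars whose top-level key is valid.
def filter_env(env, valid_keys, prefix: 'CEM_ACPT')
  env.each_with_object({}) do |(name, value), kept|
    next unless name.start_with?("#{prefix}_")

    dot_key = env_var_to_dot_key(name, prefix: prefix)
    top_level = dot_key.split('.').first.to_sym
    kept[dot_key] = value if valid_keys.include?(top_level)
  end
end

env = {
  'CEM_ACPT_NODE_DATA__DISK_SIZE' => '40',
  'CEM_ACPT_BOGUS' => '1',
  'HOME' => '/home/example'
}
p filter_env(env, %i[node_data])
```

Only the `node_data.disk_size` entry survives here, because `bogus` is not among the valid top-level keys and `HOME` lacks the prefix.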
181
+
182
+ The merge uses `deep_merge` with `overwrite_arrays: true` and
183
+ `merge_nil_values: true`. Hash keys are then symbolized via the
184
+ `ExtendedHash#format!` refinement, unknown top-level keys are dropped
185
+ with a `warn`, secrets are wrapped, and the final hash is `freeze`d
186
+ for thread safety.
187
+
188
+ The README documents the precedence order in user-facing terms; this
189
+ section reflects the actual implementation order, which is consistent.
190
+
191
+ ### Lookups
192
+
193
+ `Config::Base#get('a.b.c')` (alias `dget`, also `[]` when called with a
194
+ String) walks the frozen hash via `Hash#dot_dig`. Results are duplicated
195
+ on read (`@dot_key_cache`), so callers cannot mutate config state.
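A rough sketch of dot-key lookup with copy-on-read, under stated assumptions: the standalone `dot_dig` and `get` functions below are illustrative stand-ins for `Hash#dot_dig` and `Config::Base#get`, and the `Marshal` round trip is just one way to duplicate a value, not necessarily what the gem does.

```ruby
# Illustrative stand-in for Hash#dot_dig: walk nested hashes by dotted path.
def dot_dig(hash, dot_key)
  dot_key.split('.').reduce(hash) do |node, part|
    return nil unless node.is_a?(Hash)

    # Accept symbol or string keys at each level.
    node.key?(part.to_sym) ? node[part.to_sym] : node[part]
  end
end

# Illustrative stand-in for Config::Base#get: return a deep copy so callers
# cannot mutate the stored configuration.
def get(config, dot_key)
  Marshal.load(Marshal.dump(dot_dig(config, dot_key)))
end

config = { node_data: { disk_size: 40, tags: ['a'] } }
tags = get(config, 'node_data.tags')
tags << 'b' # mutates the copy only
p get(config, 'node_data.tags')
```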
196
+
197
+ ### Secrets
198
+
199
+ The `secrets:` top-level key can come from any merge source (env
200
+ variable, user config, runtime config file, or CLI `-O`). After the
201
+ merge and key-symbolization steps, every value under `secrets:` is
202
+ wrapped in `Config::Secret`, whose `#to_s` and `#inspect` redact the
203
+ value (`Secret(key=****)`). Provisioning code unwraps secrets only at
204
+ the moment of running Terraform (`Provision::Terraform#unwrap_secrets`,
205
+ `ImageBuilder::TerraformBuilder#terraform_vars`). The README warns that
206
+ secrets can still leak through Terraform's own logging.
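The redaction behaviour described above might look roughly like this. `SketchSecret`, `wrap_secrets`, and `unwrap` are hypothetical names, and the sketch assumes that `key` in `Secret(key=****)` stands for the secret's name.

```ruby
# Hypothetical wrapper modelled on the description of Config::Secret.
class SketchSecret
  def initialize(key, value)
    @key = key
    @value = value
  end

  # Explicit accessor for the one place that needs the real value.
  def unwrap
    @value
  end

  # Both string conversions hide the value, so logging stays safe.
  def to_s
    "Secret(#{@key}=****)"
  end
  alias inspect to_s
end

# Hypothetical helper: wrap every value under the `secrets:` key.
def wrap_secrets(secrets)
  secrets.to_h { |key, value| [key, SketchSecret.new(key, value)] }
end

wrapped = wrap_secrets(api_token: 'hunter2')
puts "config: #{wrapped[:api_token]}"
```

Keeping `unwrap` as the only path to the raw value makes the places where secrets leave the wrapper easy to find in a code search.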
207
+
208
+ ### `-Y` and `-X`
209
+
210
+ `CemAcpt.print_config` builds a config without running anything:
211
+ `-Y` prints the merged result as YAML; `-X` prints
212
+ `Base#explain` — a trace of every key and the source(s) that
213
+ contributed to it (collected via `add_config_explanation` calls
214
+ sprinkled through the merge steps).
215
+
216
+ ## 4. `cem_acpt` test-runner lifecycle
217
+
218
+ `TestRunner::Runner#run` (`lib/cem_acpt/test_runner.rb`) is the entire
219
+ test-execution flow. It is wrapped in a `begin/rescue/ensure` so that
220
+ provisioned infrastructure is always cleaned up.
221
+
222
+ ```text
223
+ ┌─ start_time recorded ─────────────────────────────────────────┐
224
+ │ │
225
+ │ 1. Dir.chdir(module_dir) │
226
+ │ 2. configure_actions (registers :goss + :bolt) │
227
+ │ 3. pre_provision_test_nodes: │
228
+ │ • build Puppet module tarball │
229
+ │ • build test_data array (one entry per test × for_each) │
230
+ │ • build platform/node objects (one per test_data entry) │
231
+ │ • create ephemeral SSH keys (unless disabled) │
232
+ │ • setup_bolt (if :bolt action is registered) │
233
+ │ 4. provision_test_nodes (terraform init/plan/apply) │
234
+ │ 5. instance_names_ips = provisioner_output (with retries) │
235
+ │ 6. partition tests by Provision::OsData.os_family_for: │
236
+ │ (the test list — not platform.name — drives this fork) │
237
+ │ • mixed Windows + Linux → raise (unsupported today) │
238
+ │ • all-Windows: upload module tarball to │
239
+ │ gs://<windows_bucket> and for each instance run │
240
+ │ WinRMRunner::WinNode.run │
241
+ │ 7. run_tests │
242
+ │ • Actions.execute over registered groups │
243
+ │ - :goss group runs async via async-http │
244
+ │ - :bolt group runs sync; threaded inside Bolt runner │
245
+ │ │
246
+ │ rescue StandardError ──▶ append error to results │
247
+ │ ensure ─▶ clean_up ─▶ destroy_test_nodes (or `terraform show`│
248
+ │ if --no-destroy-nodes), clean ephemeral keys, │
249
+ │ cleanup_bucket (Windows), Dir.chdir(@old_dir) │
250
+ │ ensure ─▶ process_test_results: pop from queue, set exit_code│
251
+ └───────────────────────────────────────────────────────────────┘
252
+ ```
253
+
254
+ ### Key state on the runner
255
+
256
+ - `@run_data` — a hash that travels with everything provisioned. Holds
257
+ `:module_package_path`, `:test_data` (Array of test-data hashes),
258
+ `:nodes` (Array of platform objects), `:private_key`, `:public_key`,
259
+ `:known_hosts`, and on Windows `:win_remote_module_name` /
260
+ `:win_remote_module_path`.
261
+ - `@instance_names_ips` — what Terraform's `instance_name_ip` output
262
+ emits: `{ <instance_name> => { 'ip' => ..., 'test_name' => ... } }`.
263
+ - `@hosts` — the IPs that Goss and Bolt run against.
264
+ - `@results` — a `TestRunner::TestResults::Results` (queue-backed).
265
+ - `@exit_code` — `0` only if every result has status in
266
+ `SUCCESS_STATUS = [200, 0]`.
267
+
268
+ ### Pass/fail rules
269
+
270
+ - `process_test_results` iterates the results queue, calls `.status` on
271
+ each, and sets `@exit_code = 1` on the first non-success.
272
+ - Empty results → `@exit_code = 1`. (i.e. "no tests ran" is a failure.)
273
+ - `provisioner_output` makes up to 3 attempts (i.e. 2 retries) with a
274
+ 3-second sleep between attempts; nil/empty after the third attempt
275
+ raises.
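The rules above can be sketched in a few lines. This is an illustration under assumptions: `exit_code_for` and `with_attempts` are hypothetical helper names, and the injectable `sleeper` exists only so the example runs without real delays.

```ruby
SUCCESS_STATUS = [200, 0].freeze

# Empty results count as failure; otherwise every status must be a success.
def exit_code_for(statuses)
  return 1 if statuses.empty?

  statuses.all? { |status| SUCCESS_STATUS.include?(status) } ? 0 : 1
end

# Up to max_attempts calls, sleeping between attempts; raise if still empty.
def with_attempts(max_attempts: 3, delay: 3, sleeper: ->(secs) { sleep(secs) })
  max_attempts.times do |attempt|
    result = yield
    empty = result.nil? || (result.respond_to?(:empty?) && result.empty?)
    return result unless empty

    sleeper.call(delay) if attempt < max_attempts - 1
  end
  raise "no provisioner output after #{max_attempts} attempts"
end

calls = 0
sleeps = []
output = with_attempts(sleeper: ->(secs) { sleeps << secs }) do
  calls += 1
  calls < 3 ? {} : { 'node-1' => { 'ip' => '10.0.0.1' } }
end
p output
```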
276
+
277
+ ### Cleanup contract
278
+
279
+ `clean_up` runs once unconditionally (in the outer `ensure`). It
279
+ skips node destruction if `--no-destroy-nodes` is set, in which case it instead logs
281
+ the SSH keys and runs `terraform show` so the user can SSH into the
282
+ provisioned nodes manually.
283
+
284
+ ## 5. Test data & image-name builder
285
+
286
+ ### Test discovery
287
+
288
+ `TestData::Fetcher` (`lib/cem_acpt/test_data.rb`) reads the
289
+ `tests:` config array. For each entry it expects a directory under
290
+ `<module_dir>/spec/acceptance/<test_name>/` containing at least:
291
+
292
+ - `goss.yaml` — Goss assertions
293
+ - `manifest.pp` — Puppet manifest applied during provisioning
294
+
295
+ and optionally:
296
+
297
+ - `bolt.yaml` — per-task validation hashes (see §10)
298
+
299
+ If `goss.yaml` or `manifest.pp` is missing, the runner raises before
300
+ provisioning anything.
301
+
302
+ ### Variable expansion
303
+
304
+ For each acceptance test, the fetcher builds a base test-data hash
305
+ (`{ test_name:, test_dir:, goss_file:, puppet_manifest:, bolt_test? }`)
306
+ and then runs four expansion passes:
307
+
308
+ 1. **`for_each`** — duplicates the hash once per item in each
309
+ `test_data.for_each.<key>` array, setting that key on the duplicate.
310
+ The default config seeds `for_each.collection = ['puppet8']`, so a
311
+ single test produces a single test-data entry by default but can be
312
+ easily fanned out.
313
+ 2. **`vars`** — merges `test_data.vars` static key/values.
314
+ 3. **`name_pattern_vars`** — runs the test name against a Regexp with
315
+ named captures and merges those captures. The default pattern
316
+ carves a name like `cis_rhel-8_firewalld_server_2` into:
317
+
318
+ | Capture | Value |
319
+ |------------------|------------|
320
+ | `framework` | `cis` |
321
+ | `image_fam` | `rhel-8` |
322
+ | `firewall` | `firewalld`|
323
+ | `framework_vars` | `server_2` |
324
+ 4. **`vars_post_processing`** — `new_vars` synthesizes new keys from
325
+ existing ones (only `string_split` is implemented today). The
326
+ default config splits `framework_vars` into `profile` and `level`.
327
+ `delete_vars` drops keys after the new ones are computed.
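To make passes 3 and 4 concrete, here is a small sketch. The regular expression is a hypothetical pattern written to reproduce the table above; it is not the gem's default pattern, and `expand_name` is an illustrative helper.

```ruby
# Hypothetical pattern that yields the captures shown in the table above.
NAME_PATTERN = /\A(?<framework>[a-z]+)_(?<image_fam>[a-z]+-\d+)_(?<firewall>[a-z]+)_(?<framework_vars>[a-z0-9_]+)\z/

def expand_name(test_name, delete_vars: [])
  match = NAME_PATTERN.match(test_name)
  raise ArgumentError, "unrecognized test name: #{test_name}" unless match

  # name_pattern_vars: merge the named captures.
  vars = match.named_captures.transform_keys(&:to_sym)

  # vars_post_processing / new_vars: a string_split of framework_vars.
  profile, level = vars[:framework_vars].split('_', 2)
  vars[:profile] = profile
  vars[:level] = level

  # delete_vars runs after the new keys have been computed.
  delete_vars.each { |key| vars.delete(key) }
  vars.merge(test_name: test_name)
end

p expand_name('cis_rhel-8_firewalld_server_2')
```

Passing `delete_vars: [:framework_vars]` would drop the intermediate capture once `profile` and `level` exist; whether the default config does so is not stated here.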
328
+
329
+ The result is an Array of hashes; each one produces one provisioned
330
+ test node.
331
+
332
+ ### Image name
333
+
334
+ If the config has an `image_name_builder` key, `Platform::TestBase`
335
+ defers to `ImageNameBuilder` (`lib/cem_acpt/image_name_builder.rb`)
336
+ for each test-data entry. The builder:
337
+
338
+ 1. Resolves each part: `'$image_fam'` → `test_data.dot_dig('image_fam')`.
339
+ 2. Joins with `join_with` (default ``''``).
340
+ 3. Optionally validates against `validation_pattern`.
341
+ 4. Applies pairwise `character_substitutions`.
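The four steps can be sketched as below. The keyword names mirror the config keys mentioned above, but the function itself, the flat symbol-key lookup (the gem uses `dot_dig`), and the `[from, to]` pair shape for `character_substitutions` are assumptions made for illustration.

```ruby
# Illustrative sketch of the four steps; not the gem's ImageNameBuilder.
def build_image_name(test_data, parts:, join_with: '', validation_pattern: nil,
                     character_substitutions: [])
  # 1. Resolve "$name" parts from test data; other parts are literals.
  resolved = parts.map do |part|
    part.start_with?('$') ? test_data.fetch(part.delete_prefix('$').to_sym).to_s : part
  end

  # 2. Join.
  name = resolved.join(join_with)

  # 3. Optional validation (before substitutions, as listed above).
  if validation_pattern && !name.match?(validation_pattern)
    raise ArgumentError, "image name #{name.inspect} failed validation"
  end

  # 4. Pairwise substitutions, assumed here to be [from, to] pairs.
  character_substitutions.each { |from, to| name = name.gsub(from, to) }
  name
end

puts build_image_name(
  { image_fam: 'rhel_8', collection: 'puppet8' },
  parts: ['cem-acpt-', '$image_fam', '-', '$collection'],
  validation_pattern: /\A[a-z0-9_-]+\z/,
  character_substitutions: [['_', '-']]
)
```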
342
+
343
+ If `image_name_builder` is not configured, the platform falls back to
344
+ `test_data[:image_name]`, which would have to come from `vars` or
345
+ `name_pattern_vars`.
346
+
347
+ ## 6. Platform abstraction
348
+
349
+ `CemAcpt::Platform` (`lib/cem_acpt/platform.rb`) is a small dynamic
350
+ loader. The active platform is `config.get('platform.name')` (default
351
+ `'gcp'`). The string is matched case-sensitively against the basenames
352
+ of `lib/cem_acpt/platform/*.rb`, which are all lowercase — so this
353
+ config value must be lowercase (e.g. `gcp`, not `GCP`) or the lookup
354
+ will fail with `Platform <name> is not supported`.
355
+
356
+ - `Platform::Base` — node identity, defines abstract `platform_data` /
357
+ `node_data` hooks.
358
+ - `Platform::TestBase < Base` — adds per-test-data context and the
359
+ `image_name` lookup hook.
360
+ - `Platform.use(platform, config, run_data)` — for each entry in
361
+ `run_data[:test_data]`, instantiates one platform-specific
362
+ `TestBase` and returns the array. Used by the test runner.
363
+ - `Platform.get(platform, base_type: :base|:test)` — returns the class
364
+ without instantiating it. Used by the image builder, which only
365
+ needs `platform_data` (no per-test context).
366
+
367
+ ### Loading mechanics
368
+
369
+ `platforms` is computed once and memoized on the module by globbing
370
+ `lib/cem_acpt/platform/*.rb` and excluding `base.rb`. To add a new platform you create
371
+ `lib/cem_acpt/platform/<name>.rb` defining a module
372
+ `CemAcpt::Platform::Mixin::<CamelCaseName>` with `#platform_data` and
373
+ `#node_data`. `platform_class` then dynamically creates a class named
374
+ after the camel-cased platform name (e.g. `gcp` → `Gcp`,
375
+ `aws_govcloud` → `AwsGovcloud`) inheriting from `Platform::Base` or
376
+ `Platform::TestBase`, and `include`s the corresponding mixin from
377
+ `CemAcpt::Platform::Mixin`. The class is cached on `CemAcpt::Platform`
378
+ under that name for subsequent lookups; the cache check uses
379
+ `const_defined?(name, false)` so unrelated same-named constants
380
+ elsewhere in the constant graph cannot win.
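The loading mechanics can be approximated with the sketch below. `SketchPlatform` and its contents are hypothetical stand-ins for `CemAcpt::Platform`; only the general technique (camel-casing the name, building a class with `Class.new`, caching it with `const_set`, and checking with `const_defined?(name, false)`) is taken from the description above.

```ruby
module SketchPlatform
  class Base; end

  module Mixin
    # Stand-in for a platform file such as platform/gcp.rb.
    module Gcp
      def platform_data
        { project: 'example-project' }
      end
    end
  end

  # "aws_govcloud" -> "AwsGovcloud"
  def self.camel_case(name)
    name.split('_').map(&:capitalize).join
  end

  def self.platform_class(name)
    const = camel_case(name)
    # inherit=false: only constants defined directly on this module count.
    return const_get(const, false) if const_defined?(const, false)

    mixin = Mixin.const_get(const, false)
    klass = Class.new(Base) { include mixin }
    const_set(const, klass)
  end
end

klass = SketchPlatform.platform_class('gcp')
p klass.name
p klass.new.platform_data
```

Passing `false` as the second argument keeps the lookup from walking up to `Object`, so a top-level constant that happens to share the name would not be mistaken for a cached platform class.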
381
+
382
+ ### GCP
383
+
384
+ `CemAcpt::Platform::Mixin::Gcp` (`lib/cem_acpt/platform/gcp.rb`) is
385
+ mixed into the dynamic `CemAcpt::Platform::Gcp` class and populates
386
+ `platform_data` and `node_data` by:
387
+
388
+ - preferring values from `config.get('platform.*')` and
389
+ `config.get('node_data.*')`, then
390
+ - shelling out to `gcloud` (`os-login describe-profile`,
391
+ `config get-value project|compute/region|compute/zone`) for any value
392
+ the user didn't set.
393
+
394
+ The platform also supplies the SSH private/public key paths from
395
+ `@run_data` (the ephemeral keys created in pre-provision), falling
396
+ back to `~/.ssh/google_compute_engine{,.pub}`.
397
+
398
+ `platform_data` returns the cluster-wide vars (project, region, zone,
399
+ subnetwork, credentials, keys, username). `node_data` returns the
400
+ per-instance vars (machine type, disk size, max run duration, image
401
+ name, test name).
402
+
403
+ ## 7. Provisioner: Terraform
404
+
405
+ There is currently one provisioner. `Provision.new_provisioner`
406
+ (`lib/cem_acpt/provision.rb`) returns
407
+ `Provision::Terraform` for `provisioner == 'terraform'` and raises
408
+ otherwise. `provisioner` is currently force-set to `'terraform'` in
409
+ `Config::Base#add_static_options!`, so this dispatch is effectively
410
+ fixed today.
411
+
412
+ ### `Provision::Terraform`
413
+
414
+ `lib/cem_acpt/provision/terraform.rb` orchestrates the four-step
415
+ Terraform run:
416
+
417
+ 1. **Pick a backend** based on the first test's name. `OsData.use_for?`
418
+ matches the `^prefix_osname-version` pattern; `Linux.valid_names`
419
+ covers `centos rhel oel alma rocky ubuntu` and `valid_versions`
420
+ covers `7 8 9 2004 2204 2404`. `Windows.valid_names` is
421
+ `[windows]` and versions are `2016 2019 2022 2025`.
422
+ 2. **Build a working dir** at
423
+ `~/.cem_acpt/terraform/test_<unix_ts>/`, populated by
424
+ `cp_r`-ing the backend's `provision_directory`
425
+ (`<terraform_dir>/<platform>/<linux|windows>/`, computed in
426
+ `new_working_dir` from `base_provision_directory + implementation_name`)
427
+ plus the Puppet module tarball. The private and public keys are
428
+ each copied only when they exist on disk; otherwise the
429
+ corresponding state field stays `nil`.
430
+ 3. **Format vars** (`formatted_vars`) — merges
431
+ `nodes.first.platform_data` with `puppet_module_package`,
432
+ `private_key`, `public_key`, and the `node_data` map (one entry per
433
+ provisioned instance). `node_data` includes test-specific paths
434
+ (`goss_file`, `puppet_manifest`) and the `provision_commands` array
435
+ that Terraform feeds into a `remote-exec`.
436
+ 4. **Run the Terraform CLI** (`init`, `plan`, `apply`, `output`,
437
+ `destroy`, `show`) via `TerraformCmd`.
438
+
439
+ ### `TerraformCmd`
440
+
441
+ `lib/cem_acpt/provision/terraform/terraform_cmd.rb` is a thin
442
+ stand-in for the abandoned `ruby-terraform` gem. It builds shell
443
+ commands (e.g. `terraform -chdir=… plan -out=…`), shells out via
444
+ `Utils::Shell.run_cmd`, streams stdout/stderr to the logger in real
445
+ time, and raises `ShellCommandError` on non-zero exit (configurable).
446
+
447
+ Notable behavior:
448
+
449
+ - The `vars` opt is rendered as `-var='key=value'` (or
450
+ `-var='key=<json>'` for hash values).
451
+ - `plan` requires `:plan` and writes it via `-out=...`. `apply`
452
+ requires `:plan` and supplies it as the trailing positional arg.
453
+ - `output` defaults to `combine_out_err: false` so JSON parsing
454
+ isn't broken by stderr noise.
455
+ - `chdir` walks `[Dir.pwd, working_dir, opts[:chdir]]` and uses the
456
+ first directory that contains a `main.tf`.
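As an illustration of the `vars` rendering rule, the sketch below turns a hash into `-var` arguments. `render_vars` is a hypothetical helper, and it does no shell escaping, which real code passing untrusted values would need.

```ruby
require 'json'

# Hypothetical helper: scalars render as key=value, hashes as key=<json>.
def render_vars(vars)
  vars.map do |key, value|
    rendered = value.is_a?(Hash) ? JSON.generate(value) : value.to_s
    "-var='#{key}=#{rendered}'"
  end.join(' ')
end

puts render_vars(
  zone: 'us-west1-b',
  node_data: { 'node-1' => { 'disk_size' => 40 } }
)
```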
457
+
458
+ ### OS-specific provision commands
459
+
460
+ `Provision::Linux#provision_commands` produces the inline list passed
461
+ to the `remote-exec` block of `lib/terraform/gcp/linux/main.tf`. The
462
+ remote module package name comes from `OsData#remote_module_package_name`
463
+ (currently `'puppet-module.tar.gz'`):
464
+
465
+ ```
466
+ sudo /opt/puppetlabs/puppet/bin/puppet module install /opt/cem_acpt/puppet-module.tar.gz
467
+ curl -fsSL https://goss.rocks/install | sudo sh
468
+ sudo /opt/puppetlabs/puppet/bin/gem install webrick
469
+ sudo chmod +x /opt/cem_acpt/log_service/log_service.rb
470
+ sudo /opt/cem_acpt/log_service/log_service.rb # daemonized HTTP server on :8083
471
+
472
+ # Only if `<provision_directory>/systemd/*.service` is non-empty
473
+ # (currently: goss-acpt, goss-idempotent, goss-noop):
474
+ sudo cp /opt/cem_acpt/systemd/<file> /etc/systemd/system/<file>
475
+ sudo systemctl daemon-reload
476
+ sudo systemctl start <file> && sudo systemctl enable <file>
477
+
478
+ # Finally — note that --logdest comes before [--debug]/[--verbose],
479
+ # and the manifest is the trailing positional argument:
480
+ sudo /opt/puppetlabs/puppet/bin/puppet apply \
481
+ --logdest console,/opt/cem_acpt/provision_apply.log \
482
+ [--debug] [--verbose] \
483
+ /opt/cem_acpt/manifest.pp
484
+ ```
485
+
486
+ `provision_commands_wrapper` prepends RPM/dnf or apt warm-up commands
487
+ for EL8-family or Ubuntu base images respectively.
488
+
489
+ The Windows backend has no analogous `provision_commands` —
490
+ `Provision::Terraform#provision_node_data` dispatches on backend
491
+ class and threads an empty list into the Windows `node_data`. The
492
+ real shell work happens via WinRM after Terraform finishes; see §13.
493
+ Calling `Provision::Windows#provision_commands` directly raises
494
+ `NotImplementedError` to signal that to anyone who tries.
495
+
496
+ The image-build flow in §12 reuses the same plan/apply/output/destroy
497
+ shape as this provisioner — that section concentrates on the
498
+ differences (image creation, `gcloud` integration, `--filter` and dry
499
+ runs) rather than re-deriving the Terraform CLI choreography.
500
+
501
+ ### Terraform templates
502
+
503
+ The on-disk templates (`lib/terraform/gcp/linux/main.tf`,
504
+ `lib/terraform/gcp/windows/main.tf`) declare the required
505
+ `google` provider (currently pinned to `7.24.0`), accept the merged
506
+ vars, and create a `google_compute_instance` per `var.node_data`
507
+ entry. The Linux template runs a sequence of `remote-exec` and
508
+ `file` provisioners to upload `provision_dir_source` (the test
509
+ artifacts), the Puppet module tarball, `goss.yaml`, and `manifest.pp`,
510
+ then runs `each.value.provision_commands` over SSH.
511
+
512
+ Each instance is tagged `cem-acpt-test-node` and tagged in metadata
513
+ with `cem-acpt-test=<test_name>`. The `instance_name_ip` output is the
514
+ `{ instance => { ip:, test_name: } }` map the runner consumes.
515
+
516
+ The systemd unit files (`goss-acpt.service`, `goss-idempotent.service`,
517
+ `goss-noop.service`) start three `goss serve` instances on
518
+ ports 8080/8081/8082 with endpoints `/acpt`, `/idempotent`, `/noop`.
519
+ `goss-idempotent` and `goss-noop` pin their ports explicitly with
520
+ `-l ":8081"` / `-l ":8082"`; `goss-acpt` has no `-l` flag and relies
521
+ on goss's default of `:8080` — functionally identical, but
522
+ asymmetric across the three units.
523
+ `log_service.rb` (also shipped under `lib/terraform/gcp/linux/`) is a
524
+ WEBrick server on port 8083 serving the contents of the apply logs.
525
+
526
+ ## 8. Action framework
527
+
528
+ `CemAcpt::Actions` (`lib/cem_acpt/actions.rb`) is a small
529
+ register/execute framework. It has three classes:
530
+
531
+ - **`Action`** — name, order, callable block.
532
+ - **`ActionGroup`** — ordered, optionally async list of `Action`s.
533
+ - **`ActionConfig`** — global config object: `groups` (Hash), `only`,
534
+ `except`. Filters happen at `filter_actions` time on each group.
535
+
536
+ The runner registers two groups in `configure_actions`:
537
+
538
+ | Group | Order | Async | Action(s) |
539
+ | ------- | ----- | ----- | ------------------------------------------------- |
540
+ | `:goss` | 0 | yes | `:acpt`, `:idempotent`, `:noop` (one per Goss endpoint) |
541
+ | `:bolt` | 1 | no | `:bolt` (delegates to `Bolt::TestRunner`) |
542
+
543
+ `Actions.execute` runs each group sequentially in `order`. Async
544
+ groups create an `Async::Barrier` and a single `Async::HTTP::Internet`
545
+ that is shared across all action calls in the group; this is the
546
+ HTTP client the Goss action callbacks use. Sync groups simply iterate
547
+ and call.
548
+
549
+ CLI flags `-a / --only-actions` and `-A / --except-actions` populate
550
+ `config.actions.only` and `config.actions.except`, which feed
551
+ `ActionConfig#only=` and `#except=`.
552
+
553
+ The default Goss action keys come from `Goss::Api::ACTIONS`
554
+ (see §9). When the `bolt` binary is missing, `pre_provision_test_nodes`
555
+ suppresses the `:bolt` action by catching `ShellCommandNotFoundError`
556
+ in `Runner#setup_bolt` and appending `:bolt` to
557
+ `CemAcpt::Actions.config.except` (preserving any user-supplied
558
+ `--except-actions` entries). The remaining action groups continue
559
+ normally.
560
+
561
+ ## 9. Goss subsystem
562
+
563
+ `lib/cem_acpt/goss/api.rb` is a tiny HTTP client.
564
+
565
+ - `Goss::Api::ACTIONS = { acpt: '8080/acpt', idempotent: '8081/idempotent', noop: '8082/noop' }`.
566
+ - `run_action(host, action, internet=nil)` does an HTTP GET against
567
+ `http://<host>:<port>/<endpoint>`, parses the JSON body, and wraps
568
+ the result in `Goss::Api::ActionResponse`.
569
+ - `get_run_logs(host, internet=nil)` GETs `http://<host>:8083/run-logs`
570
+ for the logs served by the on-node `log_service.rb`.
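The URL construction implied by `ACTIONS` can be sketched as follows. The constant value is copied from above; `action_uri` and `run_logs_uri` are hypothetical helpers, and no HTTP request is made.

```ruby
require 'uri'

ACTIONS = { acpt: '8080/acpt', idempotent: '8081/idempotent', noop: '8082/noop' }.freeze

# Each ACTIONS value is "<port>/<endpoint>", so it slots in after "host:".
def action_uri(host, action)
  URI("http://#{host}:#{ACTIONS.fetch(action)}")
end

def run_logs_uri(host)
  URI("http://#{host}:8083/run-logs")
end

ACTIONS.each_key { |action| puts action_uri('10.0.0.5', action) }
puts run_logs_uri('10.0.0.5')
```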
571
+
572
+ The goss HTTP server runs as systemd units installed by the
573
+ `provision_commands` (see §7).
574
+
575
+ `ActionResponse` (`lib/cem_acpt/goss/api/action_response.rb`):
576
+
577
+ - `#status` returns the integer HTTP status. `#success?` is `status == 200`.
578
+ - `#results` lazily wraps each item in the JSON `'results'` array as
579
+ `ActionResponseResult` (one per Goss assertion).
580
+ - `#summary` wraps the `'summary'` block as `ActionResponseSummary`
581
+ (failed_count / passed_count / total_duration / summary_line).
582
+ - `DurationHandler` provides a small `Duration(value, unit, round)`
583
+ struct so callers can ask for `:nanoseconds | :milliseconds |
584
+ :seconds`.
585
+ - `metadata` is an open hash the runner stuffs `:run_logs` into, which
586
+ the log formatter renders on failure (`log_action_test_result`).
587
+
588
+ The runner registers one Action per `Goss::Api::ACTIONS` key. Each
589
+ Action callback iterates over `@hosts` and makes **two** calls per
590
+ host using the group's shared `Async::HTTP::Internet`: one
591
+ `run_action` to the Goss endpoint, then a second `get_run_logs` to
592
+ `:8083/run-logs` whose body is attached as `metadata[:run_logs]`. The
593
+ combined response is pushed onto the shared `@results` queue. Sharing
594
+ one `Async::HTTP::Internet` across both calls (and across all hosts in
595
+ the group) is what keeps connection setup costs amortized.
596
+
597
+ ## 10. Bolt subsystem
598
+
599
+ `lib/cem_acpt/bolt.rb` defines the top-level `Bolt::TestRunner`. The
600
+ rest of the subsystem lives under `lib/cem_acpt/bolt/`, including
601
+ `errors.rb` (defines `BoltActionError`), `helpers.rb`, `yaml_file.rb`
602
+ (base class for the inventory/project YAML I/O), and
603
+ `summary_results.rb` in addition to the classes called out below.
604
+
605
+ ### Object model
606
+
607
+ | Class | Role |
608
+ |-------------------------------|------------------------------------------------------|
609
+ | `Bolt::TestRunner` | Top-level coordinator. Owns inventory, project, tests. |
610
+ | `Bolt::Inventory < YamlFile` | Generates and persists `inventory.yaml`. |
611
+ | `Bolt::Project < YamlFile` | Generates and persists `bolt-project.yaml`. |
612
+ | `Bolt::YamlFile` | Idempotent YAML I/O (`save!`, `delete!`, `latest_saved?`). |
613
+ | `Bolt::TaskList` | `bolt task show` + filter (`module_pattern`, `name_filter`, `only`, `ignore`). |
614
+ | `Bolt::TaskWrapper` | Per-task abstraction with `show` / `run` and a `last_cmd_executed` pointer. |
615
+ | `Bolt::Cmd::Base` | Generic Bolt CLI builder, options DSL (`option`, `supports_params`). |
616
+ | `Bolt::Cmd::TaskShow / TaskRun` | Concrete subcommands. |
617
+ | `Bolt::Cmd::Output` | Wraps the JSON output (`--format json` is hard-coded). Errors get coerced into a synthetic `_error` item so downstream code is uniform. |
618
+ | `Bolt::Cmd::OutputItem / OutputError` | Per-target result item / error item. |
619
+ | `Bolt::Tests::TestData` | Per-task validation hash from `bolt.yaml`. |
620
+ | `Bolt::Tests::Test` | One Bolt task × matched groups. `#run` calls `task.run` and validates. |
621
+ | `Bolt::Tests::TestList` | Loads `bolt.yaml`s, builds `Test` instances using `TaskList × test_data`. |
622
+ | `Bolt::Tests::TestResult` | Per-target validation result. |
623
+ | `Bolt::Tests::TestResults` | Collection of `TestResult`s for one task. |
624
+ | `Bolt::SummaryResults < Utils::FinalizerQueue` | Aggregate over all tests; the value pushed to the runner's results queue. |
625
+
626
+ ### Setup / teardown
627
+
628
+ `TestRunner#setup!` (called from
629
+ `pre_provision_test_nodes` before any node is provisioned) writes
630
+ `bolt-project.yaml` and `inventory.yaml` if they don't already match
631
+ the in-memory hashes, and triggers `tests.setup!`, which runs
632
+ `bolt task show` to discover tasks. If `bolt` isn't on the PATH, the
633
+ runner skips the `:bolt` action rather than failing (see §8).
634
+
635
+ After provisioning, the runner sets `bolt_test_runner.hosts =
636
+ filtered_bolt_hosts` (filtering by `bolt.tests.only` / `.ignore`)
637
+ and calls `run`. The private key was already injected into the
638
+ `Inventory` at `Bolt::TestRunner.new` time (from
639
+ `run_data[:private_key]`); there is also a `run_data=` writer that
640
+ updates it, but the runner doesn't call it in this flow. By default
641
+ Bolt tests are split into `bolt.max_threads` (default 5) groups and
642
+ each group runs sequentially in its own Ruby thread. Set
643
+ `threaded: false` (not exposed via CLI) for sync execution.
644
+
645
+ `teardown!` runs `delete!` on the inventory and project unless
646
+ `bolt.keep_inventory` / `bolt.keep_project` are set.
647
+
648
+ ### Validation
649
+
650
+ Each `bolt.yaml` entry is a hash keyed by Bolt task name:
651
+
652
+ ```yaml
653
+ 'sce_linux::audit_sssd_certmap':
654
+ status: 'success'
655
+ value:
656
+ match: '^(true|false)$'
657
+ ```
658
+
659
+ `TestData#validate_props` walks every key in the hash and checks the expected value by type:
660
+
661
+ - a string is compared to the result with `==`,
662
+ - a hash is interpreted as regex predicates (`match:` / `not_match:`),
663
+ and any other key in it is flagged as unknown,
664
+ - anything else is compared with `==`.
665
+
666
+ Failures are accumulated as `{ prop:, result:, validator:,
667
+ validation_value:, other_value: }` hashes that the
668
+ `bolt_summary_results_formatter` renders.
669
+
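The validation rules above can be sketched as a small standalone function. This is an illustration under assumptions, not the gem's implementation: the function name, the `'failure'` result string, and the `'=='` validator label are invented here; only the failure-hash keys come from the text.

```ruby
# Hypothetical stand-in for TestData#validate_props. Returns a list of failure
# hashes; an empty list means every property validated.
def validate_props(expected, actual)
  expected.each_with_object([]) do |(prop, want), failures|
    got = actual[prop]
    if want.is_a?(Hash)
      want.each do |validator, pattern|
        ok = case validator.to_s
             when 'match' then Regexp.new(pattern).match?(got.to_s)
             when 'not_match' then !Regexp.new(pattern).match?(got.to_s)
             else false # unknown validator keys are flagged as failures
             end
        next if ok

        failures << { prop: prop, result: 'failure', validator: validator,
                      validation_value: pattern, other_value: got }
      end
    elsif want != got
      # Strings and all other scalar values fall through to plain equality.
      failures << { prop: prop, result: 'failure', validator: '==',
                    validation_value: want, other_value: got }
    end
  end
end

expected = { 'status' => 'success', 'value' => { 'match' => '^(true|false)$' } }
validate_props(expected, { 'status' => 'success', 'value' => 'true' }) # => []
```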
670
+ ## 11. Result aggregation & log formatting
671
+
672
+ ### Results
673
+
674
+ `TestRunner::TestResults::Results` (`test_runner/test_results.rb`)
675
+ wraps a `Queue`. Anything pushed to it is converted on the way in:
676
+
677
+ - a `StandardError` becomes a `TestErrorActionResult`,
678
+ - everything else becomes a `TestActionResult`,
679
+
680
+ and in both cases is paired with a log formatter built by
681
+ `LogFormatter.new_formatter(result, config, instance_names_ips)`.
682
+ The formatter implementations live under
683
+ `lib/cem_acpt/test_runner/log_formatter/`:
684
+
685
+ | Formatter | For |
686
+ |--------------------------------------|----------------------------------------------------|
687
+ | `GossActionResponse` | Successful Goss responses |
688
+ | `GossErrorFormatter` | Error Goss responses (`error?` is true) |
689
+ | `BoltSummaryResultsFormatter` | Bolt subsystem aggregate |
690
+ | `StandardErrorFormatter` | Anything else that bubbles up |
691
+
692
+ `process_test_results` pops each result, calls `result.status`, sets
693
+ `@exit_code = 1` on the first non-success status, and routes the
694
+ formatted output to `logger.info|verbose|error` based on whether each
695
+ formatted line starts with `Passed:` / `Skipped:` / something else.
696
+ On non-pass it appends the captured run logs (provision/idempotent/noop
697
+ apply logs) — debug-level lines are filtered out unless
698
+ `debug?` is true and `puppet.no_debug` is not set.
699
+
700
+ ### CI groups
701
+
702
+ `logger.start_ci_group` / `end_ci_group` emit
703
+ `::group:: ... ::endgroup::` directives on stdout when running under
704
+ `GITHUB_ACTIONS`/`CI`/`-I` so the summary collapses neatly per node
705
+ in the GitHub Actions web UI.
706
+
707
+ ## 12. `cem_acpt_image` image-builder lifecycle
708
+
709
+ `ImageBuilder::TerraformBuilder` (`lib/cem_acpt/image_builder.rb`)
710
+ is similar in shape to `Provision::Terraform` but builds an image
711
+ artifact rather than running tests. The class comment explicitly
712
+ acknowledges the duplication and flags it for a future refactor.
713
+
714
+ ```text
715
+ 1. new_tfvars(config):
716
+ • require secrets.puppet_auth_token
717
+ • generate ephemeral SSH keys (unless disabled)
718
+ • for each entry in config.images:
719
+ - new_platform → platform_data
720
+ - merge in: provision_commands (via ProvisionCommands.provision_commands),
721
+ image_family, base_image, windows_image flag
722
+ 2. divide_tfvars_by_os → linux_tfvars, windows_tfvars
723
+ 3. dry_run? → log and exit
724
+ 4. new_working_dir under ~/.cem_acpt/terraform/image/<platform>/image_builder_<ts>/
725
+ 5. terraform init for each os subdir
726
+ 6. for each os:
727
+ terraform plan → terraform apply
728
+ parse `node-data` output → for each instance:
729
+ gcloud compute instances stop (unless --no-destroy-nodes)
730
+ unless --no-build-images:
731
+ gcloud compute images deprecate <old in family>
732
+ gcloud compute images create <new>
733
+ ensure terraform destroy (unless --no-destroy-nodes)
734
+ ```
735
+
736
+ ### Differences from the test runner
737
+
738
+ - Each image's `provision_commands` is OS-aware:
739
+ `ProvisionCommands` (`lib/cem_acpt/image_builder/provision_commands.rb`)
740
+ generates the right repo-setup + puppet-agent install commands for
741
+ EL family, Debian/Ubuntu, or Windows.
742
+ `secrets.puppet_auth_token` is shipped to the node via an inline
743
+ `export PUPPET_AUTH_TOKEN='...'` prepended to the `provision_commands`
744
+ in the Terraform template.
745
+ - The image-builder dir layout has `linux/` and `windows/` siblings;
746
+ the builder `Dir.chdir`s between them as it iterates
747
+ (`in_os_dir(os_str)`). `lib/terraform/image/gcp/windows/` currently
748
+ ships only a `.keep` file — no `main.tf` exists. To keep this from
749
+ surfacing as a confusing `terraform init` failure, `TerraformBuilder#run`
750
+ calls `assert_template_present!(os_str)` for every populated OS bucket
751
+ before any working-dir or Terraform work, raising
752
+ `ImageBuilder::MissingTemplateError` with the missing path and a
753
+ pointer to `--no-#{os_str}`. Authoring the Windows `main.tf` is
754
+ tracked separately in [RFC 0003](rfcs/0003-windows-image-builder-template.md).
755
+ - Image naming: the new image is created with family
756
+ `<image_family>` and name `<image_family>-v<unix_ts>`.
757
+ `TerraformBuilder#image_name_from_image_family` enforces GCE's
758
+ `GCE_IMAGE_NAME_MAX = 63` cap (RFC 1035 label rules); when the
759
+ concatenated name would overflow, the family is clipped to fit
760
+ while the `-v<unix_ts>` suffix is preserved verbatim, so two
761
+ consecutive builds in the same family cannot collide on the
762
+ truncated name. Trailing dashes from a clipped family are stripped
763
+ before the suffix is appended (GCE rejects names ending in `-`).
764
+ The previous implementation used `"…-v#{ts}"[0..64]`, an inclusive
765
+ range that allowed up to 65 characters and could land mid-timestamp;
766
+ see [RFC 0004](rfcs/0004-image-name-truncation-off-by-one.md). Old
767
+ `READY` images in the same family are deprecated with a 1-day grace
768
+ period before deletion.
769
+ - `gcloud` execution is encapsulated in
770
+ `ImageBuilder::Exec::Gcloud`. `Gcloud#run` shells out via
771
+ `Open3.capture3`, appends `--format=json`, and parses the output.
772
+ Construction-time validation is a separate `verify_gcloud!` call
773
+ that just runs `system('gcloud --version')` (not Open3) and raises
774
+ if it returns non-true.
775
+
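The image-naming rule described in the list above (clip the family, keep the timestamp suffix intact, strip trailing dashes) can be sketched as follows. The method name `image_name_from_family` and its signature are assumptions for illustration; the actual `TerraformBuilder#image_name_from_image_family` may differ in detail.

```ruby
GCE_IMAGE_NAME_MAX = 63

# Hypothetical sketch: clip the family so "<family>-v<unix_ts>" fits in 63
# characters, never truncating the suffix itself.
def image_name_from_family(family, timestamp = Time.now.to_i)
  suffix = "-v#{timestamp}"
  room = GCE_IMAGE_NAME_MAX - suffix.length
  clipped = family[0, room].sub(/-+\z/, '') # a name may not end the family part in '-'
  "#{clipped}#{suffix}"
end

# For comparison, the old inclusive range keeps indices 0 through 64, i.e. 65 characters:
(('x' * 70) + '-v1')[0..64].length # => 65
```

Because the suffix is appended after clipping, two builds in the same family always differ in their final characters as long as their timestamps differ.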
776
+ ### CLI flags for the image builder
777
+
778
+ - `--dry-run` — log the tfvars and exit.
779
+ - `--no-build-images` — apply Terraform but do not create images
780
+ (helpful for debugging the provisioner step).
781
+ - `--provision-only` — `--no-build-images` + `--no-destroy-nodes`.
782
+ - `--no-linux` / `--no-windows` — skip the corresponding OS bucket.
783
+ - `-F / --filter REGEX` — applied to image keys via the `load_hook`
784
+ in `Config::CemAcptImage`.
785
+
786
+ ## 13. Windows path
787
+
788
+ `cem_acpt`'s Windows flow is a hybrid of Terraform (just to provision
789
+ the instance) and Ruby/WinRM (everything else):
790
+
791
+ - `lib/terraform/gcp/windows/main.tf` creates the instance and pins a
792
+ hard-coded service account
793
+ (`cem-windows-acpt-test@team-sse.iam.gserviceaccount.com`, scope
794
+ `cloud-platform`). The Linux template (`lib/terraform/gcp/linux/main.tf`)
795
+ has no `service_account` block at all — this is a Windows-only
796
+ detail, not a symmetric platform feature. The Windows template
797
+ defines no `remote-exec` or `file` provisioners; the
798
+ `username`/`private_key`/`public_key` vars exist as stubs only.
799
+ - After Terraform reports back IPs, the runner partitions the
800
+ configured `tests:` by `Provision::OsData.os_family_for` (which
801
+ consults `Linux.use_for?` / `Windows.use_for?` rather than substring
802
+ matching). A mixed Linux + Windows list raises early; an all-Windows
803
+ list takes the Windows branch and:
804
+ 1. Uploads the Puppet module tarball to `gs://<windows_bucket>/<uuid>`
805
+ via `gcloud storage cp`. The bucket name comes from
806
+ `platform.gcp.windows_bucket` (default: `win_cem_acpt`) and can be
807
+ overridden via the `--windows-bucket` CLI flag, the
808
+ `CEM_ACPT_PLATFORM__GCP__WINDOWS_BUCKET` env var, or a config file.
809
+ The same bucket URI is threaded into `WinNode` so the in-instance
810
+ `gcloud storage cp` step uses the configured bucket as well.
811
+ 2. For each instance, runs
812
+ `Utils.get_windows_login_info(name, ip_hash)` which polls
813
+ `gcloud compute reset-windows-password` until the instance is ready
814
+ (max 60 × 10s = 10 minutes), then parses out the username/password.
815
+ 3. Constructs a `Utils::WinRMRunner::WinNode` per instance and
816
+ `run`s it, which opens an SSL WinRM session and issues PowerShell
817
+ commands to: enable long paths, install Puppet, fetch the module
818
+ tarball, install Goss (alpha Windows build), install NSSM, register
819
+ three NSSM-managed Goss services on ports 8080/8081/8082, open
820
+ firewall holes, and finally `puppet apply` the manifest.
821
+ - After the test, `cleanup_bucket` removes the uploaded tarball.
822
+
823
+ This duplication of Linux-side provisioning logic in PowerShell is
824
+ intentional and called out in [`README.md#Testing-with-sce_windows`](../README.md#testing-with-sce_windows),
825
+ but it does mean the Windows path is sensitive to the specific
826
+ versions hardcoded in `winrm_runner.rb`
827
+ (`puppet-agent-7.25.0-x64.msi`, `goss v0.3.23`, `nssm-2.24-101-g897c7ad`).
828
+ NSSM is required because Goss's executable cannot register as a
829
+ Windows service directly.
830
+
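The readiness polling in step 2 above (at most 60 attempts, 10 seconds apart) follows a common bounded-retry shape. The sketch below shows that shape only; `poll_until_ready`, its keyword arguments, and the injectable `sleeper` are hypothetical and are not the signature of `Utils.get_windows_login_info`.

```ruby
# Hypothetical bounded-retry helper: yields up to max_attempts times, sleeping
# `interval` seconds between attempts, and returns the first truthy result.
def poll_until_ready(max_attempts: 60, interval: 10, sleeper: Kernel.method(:sleep))
  max_attempts.times do |attempt|
    result = yield(attempt)
    return result if result

    # No need to wait after the final attempt; we are about to give up.
    sleeper.call(interval) unless attempt == max_attempts - 1
  end
  raise "not ready after #{max_attempts} attempts"
end
```

Injecting the sleeper keeps the loop testable without real delays; in the flow described above, the block would be the call that shells out to `gcloud compute reset-windows-password` and returns the parsed credentials once the instance responds.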
831
+ ## 14. Cross-cutting utilities
832
+
833
+ `lib/cem_acpt/utils/`:
834
+
835
+ - **`shell.rb`** — `Utils::Shell.run_cmd` (Open3-based, streams
836
+ stdout/stderr to a logger via `Output#<<`, optional combine), plus
837
+ `Utils::Shell.which` (mimics `which(1)` and skips Ruby bin dirs by
838
+ default to avoid `bundle exec` confusion). Defines
839
+ `ShellCommandError` / `ShellCommandNotFoundError`.
840
+ - **`ssh.rb`** — `SSH::Keygen` (wraps `ssh-keygen` with defaults
841
+ `ed25519` / 100 rounds / 4096 bits — note that `-b 4096` is silently
842
+ ignored by `ssh-keygen` when the key type is `ed25519`, since
843
+ ed25519 keys are a fixed size; the `-b` flag is effectively a no-op
844
+ unless the type is overridden) plus `SSH::Ephemeral`, the
845
+ create/clean entry points used by the runner. Setting
846
+ `CEM_ACPT_SSH_PRI_KEY` in the environment short-circuits ephemeral
847
+ generation.
848
+ - **`puppet.rb`** — `Utils::Puppet::ModulePackageBuilder`. Wraps
849
+ `Puppet::Modulebuilder::Builder` for normal modules; for modules
850
+ whose metadata `name` includes `windows`, falls back to a manual
851
+ `tar -czf`. The builder validates module metadata at construction
852
+ time (so a malformed `metadata.json` fails before any cloud work).
853
+ - **`files.rb`** — `Utils::Files.{read,write,delete}` dispatch on
854
+ file extension to a `YamlUtil`/`JsonUtil`/`FileUtil` subclass.
855
+ Reads are mtime-cached in a process-local registry to avoid
856
+ re-reading unchanged files inside loops.
857
+ - **`finalizer_queue.rb`** — `Queue → frozen Array` once-only
858
+ conversion used by `Bolt::SummaryResults` to express "all results
859
+ collected, now interrogate the aggregate".
860
+ - **`terminal.rb`** — a 1Hz "..." spinner thread used in CI mode by
861
+ the image builder so GitHub Actions doesn't kill the job for
862
+ inactivity.
863
+ - **`winrm_runner.rb`** — see §13.
864
+
865
+ `lib/cem_acpt/utils.rb` itself adds a couple of GCP-specific helpers
866
+ (`reset_password_readiness_polling`, `get_windows_login_info`).
867
+
868
+ `lib/cem_acpt/core_ext.rb` defines two refinements:
869
+
870
+ - `ExtendedHash` — `format!` (recursively symbolize keys),
871
+ `dot_dig`/`dget`, `dot_store`/`dset`, `has?`/`dhas?`. These power
872
+ the entire `Config::Base#get('a.b.c')` API.
873
+ - `ExtendedArray` — `split_into_groups(n)` used by Bolt's threaded
874
+ test runner.
875
+
876
+ Refinements must be activated per-file with `using
877
+ CemAcpt::CoreExt::ExtendedHash` / `ExtendedArray`.
878
+
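For readers unfamiliar with refinements, here is a minimal sketch of how a dotted-path lookup and a grouping helper could be written and activated. The module name `CoreExtSketch` and both method bodies are assumptions for illustration; in particular, the real `split_into_groups` may distribute elements differently.

```ruby
module CoreExtSketch
  refine Hash do
    # Look up a nested value via a dotted path, assuming symbolized keys.
    def dot_dig(path)
      dig(*path.split('.').map(&:to_sym))
    end
  end

  refine Array do
    # Split into at most n contiguous groups of near-equal size (assumed semantics).
    def split_into_groups(n)
      return [] if empty?

      each_slice((size / n.to_f).ceil).to_a
    end
  end
end

# Refinements are only visible in files (or scopes) that opt in with `using`.
using CoreExtSketch

{ a: { b: { c: 1 } } }.dot_dig('a.b.c') # => 1
[1, 2, 3, 4, 5].split_into_groups(2)    # => [[1, 2, 3], [4, 5]]
```

The per-file `using` requirement is the trade-off: the core classes stay unpatched globally, at the cost of one extra line in every file that needs the helpers.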
879
+ ## 15. Logging
880
+
881
+ `lib/cem_acpt/logging.rb` defines:
882
+
883
+ - **`Logger < ::Logger`** — overrides `info|debug|warn|error|fatal` so
884
+ that, in CI mode, log lines are emitted as
885
+ `::notice::…` / `::warning::…` / `::error::…` GitHub Actions
886
+ annotations via `<<`. In trap context (after SIGINT) it bypasses
887
+ the standard `Logger` codepath (which uses a Mutex and would
888
+ deadlock) and writes directly to the raw logdev.
889
+ `start_ci_group` / `end_ci_group` emit `::group::` / `::endgroup::`.
890
+ - **`MultiLogger`** — fan-out delegator; methods are forwarded to all
891
+ underlying loggers if every one of them responds. Used by
892
+ `CemAcpt#initialize_logger!` to log to both `$stdout` and a
893
+ `--log-file FILE` simultaneously.
894
+ - **module-level `Logging.logger`** — global accessor. Including
895
+ `CemAcpt::Logging` in any class adds both an instance-level and
896
+ class-level `logger` method.
897
+
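The fan-out behaviour described for `MultiLogger` can be approximated with `method_missing`. This is a sketch of the general technique only; the class name `FanOutLogger` is hypothetical and the real `MultiLogger` may be implemented differently.

```ruby
require 'logger'
require 'stringio'

# Hypothetical fan-out delegator: forwards a call to every wrapped logger, but
# only when all of them respond to the method.
class FanOutLogger
  def initialize(*loggers)
    @loggers = loggers
  end

  def method_missing(name, *args, &block)
    return super unless respond_to_missing?(name)

    @loggers.map { |logger| logger.public_send(name, *args, &block) }.last
  end

  def respond_to_missing?(name, include_private = false)
    @loggers.all? { |logger| logger.respond_to?(name) } || super
  end
end

console = StringIO.new
file = StringIO.new
fan = FanOutLogger.new(Logger.new(console), Logger.new(file))
fan.info('provisioning started') # written to both destinations
```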
898
+ `-D / --debug` sets `log_level = 'debug'`. `-v / --verbose` enables
899
+ the custom `verbose` severity (debug-level message gated by a
900
+ `@verbose` flag — quieter than `--debug` but louder than `info`).
901
+
902
+ `-q / --quiet` drops `$stdout` from the logger's destinations. The rules
903
+ in `cem_acpt.rb#initialize_logger!` (per RFC 0007) are:
904
+
905
+ - `--quiet` *with* `--log-file FILE`: stdout is dropped, log lines go
906
+ only to `FILE`.
907
+ - `--quiet` *without* `--log-file` *outside CI*: refused at startup
908
+ with a `RuntimeError` —
909
+ `--quiet without --log-file would silence all output; pass --log-file or drop --quiet`.
910
+ This is deliberate; the previous behavior silently kept stdout and
911
+ read as a no-op.
912
+ - CI mode (`GITHUB_ACTIONS`, `CI`, or `-I`): `$stdout` is forcibly
913
+ re-added to `logdevs` if `--quiet` would have dropped it. A single
914
+ `warn`-level line is emitted noting the override. This is intentional
915
+ — Actions needs stdout for `::group::` / `::notice::` directives —
916
+ but it does mean `--quiet` is partially overridden under CI.
917
+
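The three `--quiet` cases above reduce to a small decision function. The sketch below is an assumption-laden paraphrase of that logic, not the body of `initialize_logger!`; the function name and keyword arguments are invented for illustration, and the warning line emitted under CI is omitted.

```ruby
# Hypothetical sketch: decide where log lines go, given the flags described above.
def log_destinations(quiet:, log_file:, ci:)
  if quiet && log_file.nil? && !ci
    raise '--quiet without --log-file would silence all output; pass --log-file or drop --quiet'
  end

  destinations = []
  # Under CI, stdout is kept even when --quiet asked for it to be dropped.
  destinations << :stdout if !quiet || ci
  destinations << log_file if log_file
  destinations
end

log_destinations(quiet: true, log_file: 'run.log', ci: false) # => ["run.log"]
log_destinations(quiet: true, log_file: nil, ci: true)        # => [:stdout]
```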
918
+ ## 16. On-disk state
919
+
920
+ cem_acpt creates and uses a user-config directory at `~/.cem_acpt/`:
921
+
922
+ ```
923
+ ~/.cem_acpt/
924
+ ├── config.yaml # optional user-supplied config
925
+ ├── terraform_checksum.txt # sha256(version + tree of lib/terraform/)
926
+ └── terraform/ # copy of lib/terraform/ from the gem
927
+ ├── gcp/
928
+ │ ├── linux/ # provisioning template(s) for tests
929
+ │ └── windows/
930
+ └── image/
931
+ └── gcp/
932
+ ├── linux/ # provisioning template(s) for image-build
933
+ └── windows/
934
+ ```
935
+
936
+ `Config::Base#create_terraform_dir!` is responsible for keeping
937
+ `~/.cem_acpt/terraform/` in sync with the version of the gem
938
+ currently in use. It computes a SHA-256 over (a) `CemAcpt::VERSION` and
939
+ (b) every file/directory name under `lib/terraform/`. If the recorded
940
+ checksum differs from the current one (or doesn't exist), it
941
+ `rm_rf`s the on-disk copy and re-copies. The version is mixed into the
942
+ hash so that installing a different gem version forces a refresh even
943
+ when the template tree itself is unchanged.
944
+
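A minimal sketch of the checksum idea, assuming the digest input is simply the version string followed by the sorted relative paths under the template directory. The function name `terraform_checksum` and the exact input encoding are assumptions; note also that `Dir.glob` skips dotfiles such as `.keep` by default, which the real implementation may or may not do.

```ruby
require 'digest'

# Hypothetical sketch: hash the gem version plus every file/directory name
# (not file contents) under the Terraform template directory.
def terraform_checksum(version, terraform_dir)
  names = Dir.glob('**/*', base: terraform_dir).sort
  Digest::SHA256.hexdigest(([version] + names).join("\n"))
end
```

A caller would compare the result with the contents of `terraform_checksum.txt` and re-copy the tree on mismatch. Because only names are hashed, editing a template's contents without renaming it or bumping the version would not change the digest in this sketch.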
945
+ Per-run state is created under `~/.cem_acpt/terraform/`:
946
+
947
+ - Test runs: `~/.cem_acpt/terraform/test_<unix_ts>/`
948
+ - Image builds: `~/.cem_acpt/terraform/image/<platform>/image_builder_<unix_ts>/`
949
+
950
+ These directories are removed by `Provision::Terraform#destroy` /
951
+ the image builder's `terraform_destroy` step unless
952
+ `--no-destroy-nodes` is set, in which case the user is responsible for
953
+ cleaning them up. Ephemeral SSH keys live in `~/.ssh/acpt_test_key{,.pub}`
954
+ and are deleted on a successful destroy.
955
+
956
+ ## 17. Test suite
957
+
958
+ The repo's own RSpec suite lives under `spec/` and mirrors `lib/`:
959
+
960
+ - `spec/cem_acpt_spec.rb` — top-level smoke tests.
961
+ - `spec/cem_acpt/test_runner_spec.rb`
962
+ - `spec/cem_acpt/bolt_spec.rb`,
963
+ `spec/cem_acpt/bolt/summary_results_spec.rb`,
964
+ `spec/cem_acpt/bolt/cmd/{task_run_spec,task_show_spec,output_spec}.rb`,
965
+ `spec/cem_acpt/bolt/tests/{test_spec,testlist_spec}.rb`
966
+ - `spec/cem_acpt/config/cem_acpt_spec.rb`
967
+ - `spec/cem_acpt/platform/gcp_spec.rb`
968
+ - `spec/cem_acpt/provision/terraform/terraform_cmd_spec.rb`
969
+ - `spec/cem_acpt/test_runner/log_formatter/goss_action_response_spec.rb`
970
+ - `spec/fixtures/` — sample Goss/Bolt JSON output, sample config
971
+ YAMLs, and a `config_testing/` tree that mimics a `~/.cem_acpt/`
972
+ directory for the config-merge tests.
973
+
974
+ Conventions per `CLAUDE.md` and `cem_acpt.gemspec`:
975
+
976
+ - Ruby 3.0+ runtime; RuboCop targets Ruby 3.2.
977
+ - Document Ruby with YARD comments.
978
+ - All changes ship with tests.
979
+
980
+ These are *self-tests*. The acceptance tests under
981
+ `spec/acceptance/` referenced throughout this doc live in the
982
+ **consuming module's** repo (e.g. `sce_linux`), not here.
983
+
984
+ ## 18. External dependencies
985
+
986
+ ### Runtime gems
987
+
988
+ | Gem | Version | Used by |
989
+ |----------------------|-------------|--------------------------------------------|
990
+ | `async-http` | `~> 0.6x` | Goss action group (parallel HTTP GETs) |
991
+ | `bcrypt_pbkdf` | `~> 1.x` | (transitive — for `winrm`/`ed25519`) |
992
+ | `deep_merge` | `~> 1.x` | `Config::Base#load` |
993
+ | `dotenv` | `~> 3.x` | `exe/cem_acpt*` boot |
994
+ | `ed25519` | `~> 1.x` | (transitive) |
995
+ | `puppet-modulebuilder` | `>= 0.0.1`| `Utils::Puppet::ModulePackageBuilder` |
996
+ | `winrm` | `~> 2.x` | `Utils::WinRMRunner` |
997
+
998
+ ### External binaries
999
+
1000
+ | Binary | Required for | Discovered via |
1001
+ |--------------|-----------------------------|----------------------------|
1002
+ | `terraform` | All provisioning | `Utils::Shell.which` |
1003
+ | `gcloud` | GCP platform / image builder| `verify_gcloud!`, shell-out|
1004
+ | `ssh-keygen` | Ephemeral SSH keys | `Utils::Shell.which` |
1005
+ | `bolt` | Bolt action (optional) | `Utils::Shell.which`; when missing, the `:bolt` action is skipped (see §8) |
1006
+
1007
+ ### Stdlib / Ruby version
1008
+
1009
+ - Requires Ruby `>= 3.0.0` (gemspec) and is developed against 3.2.
1010
+ - Uses `Async`/`Async::Barrier`/`Async::HTTP::Internet`,
1011
+ `Open3.popen3`, `IO.select`, `Queue`, `Mutex`, `Thread`, `TracePoint`,
1012
+ `WEBrick` (only on the test node, in `log_service.rb`).
1013
+
1014
+ ## 19. Open questions / observed dead code
1015
+
1016
+ These are things the exploration surfaced that don't seem load-bearing
1017
+ in the current code but would warrant a quick decision before
1018
+ spec-driven refactors.
1019
+
1020
+ 1. **`provisioner` is statically forced to `'terraform'`** in
1021
+ `Config::Base#add_static_options!`, even though the `Provision`
1022
+ factory branches on it. If we want a non-Terraform path in the
1023
+ future, the static assignment would need to become an overridable default.
1024
+ 2. **Platform constant cache.** `Platform.platform_class` caches the
1025
+ created class under `CemAcpt::Platform` (e.g.
1026
+ `CemAcpt::Platform::Gcp`) and uses `const_defined?(name, false)`,
1027
+ so unrelated same-named constants elsewhere in the graph cannot
1028
+ collide. Mixins live alongside under `CemAcpt::Platform::Mixin`.
1029
+ Resolved by [CEM-6717](rfcs/0008-namespace-platform-classes.md).
1030
+ 3. **`lib/terraform/image/gcp/windows/` is an empty directory.** It
1031
+ contains only `.keep` — no `main.tf`. As of the fix for
1032
+ [CEM-6712](rfcs/0003-windows-image-builder-template.md),
1033
+ `TerraformBuilder#assert_template_present!` raises
1034
+ `ImageBuilder::MissingTemplateError` early with a clear, actionable
1035
+ message naming the missing path and pointing at `--no-windows`,
1036
+ instead of letting `terraform init` fail with "no configuration
1037
+ files". Shipping an actual Windows `main.tf` is the long-term half
1038
+ of RFC 0003 and remains open.
1039
+
1040
+ If any of these are intentional ("yes, that's load-bearing because
1041
+ …") we should annotate them in the code comments rather than rely on
1042
+ oral history.