@hasna/uptime 0.1.9 → 0.1.11

@@ -40,19 +40,31 @@ write a sourceable env file with a placeholder probe identity.

  1. Locate the real infrastructure repository or create the change in the
  approved owner repository.
- 2. Confirm the AWS caller identity:
+ 2. Set the operator shell variables used by the command snippets:

  ```bash
- aws sts get-caller-identity --profile <aws-profile>
+ : "${AWS_PROFILE_NAME:?set AWS_PROFILE_NAME to the reviewed AWS profile}"
+ AWS_REGION="${AWS_REGION:-us-east-1}"
+ TF_DIR="${TF_DIR:-infra/aws}"
+ PLAN_FILE="${PLAN_FILE:-open-uptime.tfplan}"
  ```

- 3. Confirm the target VPC, private subnets, KMS key, and EFS/Backup plan inputs
+ 3. Confirm the AWS caller identity:
+
+ ```bash
+ aws sts get-caller-identity --profile "$AWS_PROFILE_NAME"
+ ```
+
+ 4. Confirm the target VPC, private subnets, KMS key, and EFS/Backup plan inputs
  still match the plan.
- 4. Confirm the protected access mode. The first deploy can use the CloudFront
+ 5. Confirm the protected access mode. The first deploy can use the CloudFront
  default HTTPS domain without custom DNS or ACM. Custom hostname deploys still
  require Route53/edge ownership and an ACM certificate.
- 5. Confirm the deployment role uses short-lived credentials or OIDC, not copied
+ 6. Confirm the deployment role uses short-lived credentials or OIDC, not copied
  access keys.
+ 7. Create a private evidence directory outside the public repository. Store
+ command output, plan summaries, screenshots, and incident notes there. Do
+ not store tokens, database URLs, probe private keys, or secret values.

  ## Required Resources

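The guard idiom that step 2 introduces in the hunk above can be tried in isolation. The sketch below wraps the same four assignments in a function so the fallback and failure behavior is easy to observe. The function name `resolve_operator_env` is hypothetical and not part of the package, and bash is assumed.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of @hasna/uptime): resolve the operator
# variables the same way the runbook snippet does, then print them.
resolve_operator_env() {
  # ${VAR:?msg} makes a non-interactive shell exit with msg when VAR is
  # unset or empty, so later commands never run with a missing profile.
  : "${AWS_PROFILE_NAME:?set AWS_PROFILE_NAME to the reviewed AWS profile}"
  # ${VAR:-default} falls back to the default when VAR is unset or empty.
  AWS_REGION="${AWS_REGION:-us-east-1}"
  TF_DIR="${TF_DIR:-infra/aws}"
  PLAN_FILE="${PLAN_FILE:-open-uptime.tfplan}"
  printf '%s %s %s %s\n' "$AWS_PROFILE_NAME" "$AWS_REGION" "$TF_DIR" "$PLAN_FILE"
}
```

Calling it inside a subshell, for example `( resolve_operator_env )`, keeps a missing profile from ending the operator's own session.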
@@ -81,14 +93,259 @@ copy-pastable AWS mutation commands.
  Plan the included Terraform/OpenTofu starter without a backend:

  ```bash
- terraform -chdir=infra/aws fmt -check
- terraform -chdir=infra/aws init -backend=false
- terraform -chdir=infra/aws validate
- terraform -chdir=infra/aws plan -out open-uptime.tfplan
+ terraform -chdir="$TF_DIR" fmt -check
+ terraform -chdir="$TF_DIR" init -backend=false
+ terraform -chdir="$TF_DIR" validate
+ terraform -chdir="$TF_DIR" plan -out "$PLAN_FILE"
  ```

  Use Terraform/OpenTofu 1.9 or newer for this starter.

+ ## Zero-Count Apply
+
+ The first reviewed apply must create infrastructure with every ECS service at
+ desired count `0`.
+
+ 1. Confirm the plan has no deletes or replacements and that all ECS services are
+ dormant:
+
+ ```bash
+ terraform -chdir="$TF_DIR" show -json "$PLAN_FILE" \
+ | jq -r '.resource_changes[] | select(.type=="aws_ecs_service") | [.address, .change.after.desired_count] | @tsv'
+ ```
+
+ 2. Confirm Terraform is not managing secret values:
+
+ ```bash
+ terraform -chdir="$TF_DIR" show -json "$PLAN_FILE" \
+ | jq -r '.resource_changes[] | select(.type | test("secret_version|random_password|random_string")) | .address'
+ ```
+
+ This command must print nothing.
+
+ 3. Apply only the reviewed zero-count plan:
+
+ ```bash
+ terraform -chdir="$TF_DIR" apply "$PLAN_FILE"
+ ```
+
+ 4. Capture outputs, the source commit, the package version, the plan summary,
+ and the caller identity in private deployment evidence.
+
+ ## Image And Secrets
+
+ After the zero-count apply, build the image through the approved deploy pipeline
+ or the declared image builder. Record only the immutable digest, not build logs
+ that contain environment values:
+
+ ```bash
+ IMAGE_BUILDER_PROJECT="$(terraform -chdir="$TF_DIR" output -raw image_builder_project_name)"
+ aws codebuild start-build \
+ --profile "$AWS_PROFILE_NAME" \
+ --region "$AWS_REGION" \
+ --project-name "$IMAGE_BUILDER_PROJECT"
+ ```
+
+ Update the approved infra root so `container_image` is the immutable ECR digest,
+ then re-plan with all services still at `0`.
+
+ Populate Secrets Manager values out of band. Verify metadata only:
+
+ ```bash
+ terraform -chdir="$TF_DIR" output -json secret_refs | jq -r '.[]' | while read -r SECRET_ID; do
+ aws secretsmanager describe-secret \
+ --profile "$AWS_PROFILE_NAME" \
+ --region "$AWS_REGION" \
+ --secret-id "$SECRET_ID"
+ aws secretsmanager list-secret-version-ids \
+ --profile "$AWS_PROFILE_NAME" \
+ --region "$AWS_REGION" \
+ --secret-id "$SECRET_ID"
+ done
+ ```
+
+ Each required secret must have an `AWSCURRENT` version before any task is
+ started. Never run `get-secret-value` in shared logs or public evidence.
+
+ ## Protected Web Scale-Up
+
+ Before setting `desired_counts.web = 1`, verify:
+
+ - the image is an immutable digest, not a mutable tag or placeholder;
+ - required secrets have `AWSCURRENT` versions;
+ - `HASNA_UPTIME_ALLOWED_ORIGINS` matches the public HTTPS edge origin;
+ - CloudFront origin access is distribution-bound, not just narrowed to
+ CloudFront origin-facing ranges;
+ - web egress to ECR, Secrets Manager, CloudWatch Logs, S3, EFS, and any required
+ endpoints has been proven through NAT or VPC endpoints;
+ - scheduler, public-probe, reporter, and migration remain at `0`.
+
+ Scale only the web task, then capture the ECS deployment id and task definition
+ ARN:
+
+ ```bash
+ ECS_CLUSTER="$(terraform -chdir="$TF_DIR" output -raw ecs_cluster_name)"
+ WEB_SERVICE="$(terraform -chdir="$TF_DIR" output -json service_names | jq -r '.[] | select(endswith("-web"))')"
+ aws ecs describe-services \
+ --profile "$AWS_PROFILE_NAME" \
+ --region "$AWS_REGION" \
+ --cluster "$ECS_CLUSTER" \
+ --services "$WEB_SERVICE" \
+ --query 'services[0].{taskDefinition:taskDefinition,deployments:deployments[*].{id:id,status:status,desired:desiredCount,running:runningCount}}'
+ ```
+
+ ## Smoke Checks
+
+ Run these checks through the public edge URL and record status codes and request
+ ids. Use a scoped hosted token only from the operator secret store.
+
+ ```bash
+ EDGE_URL="$(terraform -chdir="$TF_DIR" output -raw protected_access_url)"
+ : "${HOSTED_TOKEN_FILE:?set HOSTED_TOKEN_FILE to a 0600 file containing the scoped hosted token}"
+ HOSTED_TOKEN="$(tr -d '\n' < "$HOSTED_TOKEN_FILE")"
+
+ curl -fsS "$EDGE_URL/health"
+ curl -i "$EDGE_URL/"
+ curl -i "$EDGE_URL/api/v1/summary"
+ curl -i -H "Authorization: Bearer $HOSTED_TOKEN" "$EDGE_URL/api/v1/summary"
+ ```
+
+ Expected results:
+
+ - `/health` returns `200` and no monitor data.
+ - Dashboard and API reads without auth return `401` or the approved identity
+ layer denial.
+ - Authenticated API reads return only the authorized workspace.
+ - Direct ALB origin access is denied unless it is the approved CloudFront origin
+ path.
+
+ ## Logs And Alarms
+
+ Inspect recent web logs without printing secrets:
+
+ ```bash
+ WEB_LOG_GROUP="$(terraform -chdir="$TF_DIR" output -json log_group_names | jq -r '.web')"
+ aws logs tail "$WEB_LOG_GROUP" \
+ --profile "$AWS_PROFILE_NAME" \
+ --region "$AWS_REGION" \
+ --since 15m
+ ```
+
+ Verify the initial web alarms exist and are not already alarming:
+
+ ```bash
+ WEB_5XX_ALARM="$(terraform -chdir="$TF_DIR" output -json alarm_names | jq -r '.web_5xx')"
+ WEB_UNHEALTHY_ALARM="$(terraform -chdir="$TF_DIR" output -json alarm_names | jq -r '.web_unhealthy')"
+ aws cloudwatch describe-alarms \
+ --profile "$AWS_PROFILE_NAME" \
+ --region "$AWS_REGION" \
+ --alarm-names "$WEB_5XX_ALARM" "$WEB_UNHEALTHY_ALARM" \
+ --query 'MetricAlarms[*].{name:AlarmName,state:StateValue,reason:StateReason}'
+ ```
+
+ Scheduler-stall, stale-probe, and report-delivery alarms stay blocked until
+ those workers are implemented, emit metrics, and are enabled.
+
+ ## Backups And Restore Evidence
+
+ Verify EFS backup coverage after the first apply:
+
+ ```bash
+ BACKUP_VAULT="$(terraform -chdir="$TF_DIR" output -raw backup_vault_name)"
+ EFS_FILE_SYSTEM_ID="$(terraform -chdir="$TF_DIR" output -raw efs_file_system_id)"
+ EFS_FILE_SYSTEM_ARN="$(aws efs describe-file-systems \
+ --profile "$AWS_PROFILE_NAME" \
+ --region "$AWS_REGION" \
+ --file-system-id "$EFS_FILE_SYSTEM_ID" \
+ --query 'FileSystems[0].FileSystemArn' \
+ --output text)"
+
+ aws backup list-protected-resources \
+ --profile "$AWS_PROFILE_NAME" \
+ --region "$AWS_REGION" \
+ --query "Results[?ResourceArn=='$EFS_FILE_SYSTEM_ARN'].[ResourceArn,LastBackupTime]"
+ aws backup list-recovery-points-by-backup-vault \
+ --profile "$AWS_PROFILE_NAME" \
+ --region "$AWS_REGION" \
+ --backup-vault-name "$BACKUP_VAULT" \
+ --query "RecoveryPoints[?ResourceArn=='$EFS_FILE_SYSTEM_ARN'].[RecoveryPointArn,Status,CreationDate]"
+ ```
+
+ A restore drill must restore to a separate file system or staging target first.
+ Do not overwrite the production EFS file system during a drill. Record the
+ recovery point ARN, restore job id, target resource, validation result, and
+ cleanup action.
+
+ Run the restore drill with a dedicated restore role and a staging security group
+ and subnet. The metadata keys are AWS Backup EFS restore metadata; keep the
+ staging file system encrypted with the Open Uptime KMS key.
+
+ ```bash
+ : "${RECOVERY_POINT_ARN:?set RECOVERY_POINT_ARN to the selected recovery point ARN}"
+ : "${RESTORE_ROLE_ARN:?set RESTORE_ROLE_ARN to the AWS Backup restore role ARN}"
+ : "${STAGING_SUBNET_ID:?set STAGING_SUBNET_ID to the staging private subnet id}"
+ : "${STAGING_SECURITY_GROUP_ID:?set STAGING_SECURITY_GROUP_ID to the staging EFS security group id}"
+ KMS_KEY_ARN="$(terraform -chdir="$TF_DIR" output -raw kms_key_arn)"
+
+ RESTORE_JOB_ID="$(aws backup start-restore-job \
+ --profile "$AWS_PROFILE_NAME" \
+ --region "$AWS_REGION" \
+ --recovery-point-arn "$RECOVERY_POINT_ARN" \
+ --iam-role-arn "$RESTORE_ROLE_ARN" \
+ --resource-type EFS \
+ --metadata "file-system-id=$EFS_FILE_SYSTEM_ID,newFileSystem=true,encrypted=true,kmsKeyId=$KMS_KEY_ARN,performanceMode=generalPurpose,throughputMode=bursting" \
+ --query 'RestoreJobId' \
+ --output text)"
+
+ aws backup describe-restore-job \
+ --profile "$AWS_PROFILE_NAME" \
+ --region "$AWS_REGION" \
+ --restore-job-id "$RESTORE_JOB_ID" \
+ --query '{status:Status,createdResourceArn:CreatedResourceArn,statusMessage:StatusMessage}'
+ ```
+
+ Poll `describe-restore-job` until `Status` is `COMPLETED`, then create a
+ temporary mount target for the restored file system in the staging subnet:
+
+ ```bash
+ RESTORED_EFS_ID="$(aws backup describe-restore-job \
+ --profile "$AWS_PROFILE_NAME" \
+ --region "$AWS_REGION" \
+ --restore-job-id "$RESTORE_JOB_ID" \
+ --query 'CreatedResourceArn' \
+ --output text | awk -F/ '{print $NF}')"
+
+ aws efs create-mount-target \
+ --profile "$AWS_PROFILE_NAME" \
+ --region "$AWS_REGION" \
+ --file-system-id "$RESTORED_EFS_ID" \
+ --subnet-id "$STAGING_SUBNET_ID" \
+ --security-groups "$STAGING_SECURITY_GROUP_ID"
+ ```
+
+ Validate the restored `/data/uptime/uptime.db` from a staging host or task with
+ read-only SQLite integrity checks. Capture only counts and integrity status, not
+ monitor targets or secrets:
+
+ ```bash
+ sqlite3 /mnt/restore/uptime/uptime.db 'PRAGMA integrity_check;'
+ sqlite3 /mnt/restore/uptime/uptime.db 'SELECT COUNT(*) FROM monitors;'
+ ```
+
+ After evidence is recorded, delete the staging mount target and restored file
+ system. Never mount the restored file system over production during a drill.
+
+ ## Reports And Reporter Gate
+
+ Report preview can be tested locally or through authenticated read APIs. Hosted
+ delivery attempts through Mailery, Telephony, or Open Logs must stay disabled
+ until the reporter has cloud channel refs, idempotency storage, retry/backoff
+ state, audit rows, and delivery alarms.
+
+ Do not set `desired_counts.reporter = 1` until a reviewed runbook section exists
+ for report retry, duplicate suppression, provider failure handling, and delivery
+ audit export.
+
  ## Private Probe Operator

  The operator machine should be a private probe/operator machine, not the hosted
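The "Backups And Restore Evidence" section added in the hunk above asks the operator to poll `describe-restore-job` until `Status` is `COMPLETED` but does not show a loop. The following is a minimal sketch of one way to do that. The function names `poll_restore_status` and `restore_job_status` and the `MAX_ATTEMPTS` and `POLL_DELAY_SECONDS` variables are assumptions made for illustration and are not part of the package. The terminal states `ABORTED` and `FAILED` are taken from the AWS Backup restore job status values.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run the given status command until it prints COMPLETED.
# Returns 0 on COMPLETED, 1 on a terminal failure, 2 if the status command
# itself fails, and 3 when MAX_ATTEMPTS is exhausted.
poll_restore_status() {
  local max_attempts="${MAX_ATTEMPTS:-60}"
  local delay="${POLL_DELAY_SECONDS:-30}"
  local attempt status
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    status="$("$@")" || return 2
    case "$status" in
      COMPLETED) return 0 ;;
      ABORTED | FAILED)
        echo "restore job ended with status $status" >&2
        return 1
        ;;
    esac
    sleep "$delay"
  done
  echo "restore job still not COMPLETED after $max_attempts checks" >&2
  return 3
}

# Status command for the runbook's variables; it prints only the Status field.
restore_job_status() {
  aws backup describe-restore-job \
    --profile "$AWS_PROFILE_NAME" \
    --region "$AWS_REGION" \
    --restore-job-id "$RESTORE_JOB_ID" \
    --query 'Status' \
    --output text
}
```

With the runbook variables set, `poll_restore_status restore_job_status` would block until the job finishes. Passing the status command as an argument keeps the loop testable without AWS access.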
@@ -112,6 +369,11 @@ routes are backed by cloud check jobs and cloud audit rows.
  URLs, or probe private keys in task definitions. Use ECS `secrets.valueFrom`
  refs such as `HASNA_UPTIME_HOSTED_TOKEN`.
  - Do not run public probe workers against private targets.
+ - Do not enable public probe workers until runtime target policy resolves and
+ pins DNS answers, rejects redirects and DNS rebinding into denied ranges, and
+ emits target-policy decision records. The current configuration-time policy
+ blocks direct denied hosts, including IPv4-mapped IPv6 forms, but it is not a
+ substitute for execution-time DNS and redirect enforcement.
  - Do not enable scheduler, public-probe, reporter, or migration workers against
  the EFS SQLite bridge; those services need Postgres/cloud leases first.
  - Do not expose dashboard/API routes without hosted auth and workspace checks.
@@ -128,8 +390,59 @@ routes are backed by cloud check jobs and cloud audit rows.

  ## Rollback

- Before each service update, record the previous task definition ARN. Roll back
- by disabling scheduler/reporter work first, then restoring the previous web or
- worker task definition. EFS backup restore requires separate operator approval,
- a selected recovery point, a replacement mount target/access point cutover, and
- an audit event.
+ Before each service update, record the previous task definition ARN and current
+ desired counts:
+
+ ```bash
+ ECS_CLUSTER="$(terraform -chdir="$TF_DIR" output -raw ecs_cluster_name)"
+ WEB_SERVICE="$(terraform -chdir="$TF_DIR" output -json service_names | jq -r '.[] | select(endswith("-web"))')"
+ aws ecs describe-services \
+ --profile "$AWS_PROFILE_NAME" \
+ --region "$AWS_REGION" \
+ --cluster "$ECS_CLUSTER" \
+ --services "$WEB_SERVICE" \
+ --query 'services[0].{taskDefinition:taskDefinition,desired:desiredCount,running:runningCount}'
+ ```
+
+ If web health fails after scale-up, first scale web back to `0`:
+
+ ```bash
+ aws ecs update-service \
+ --profile "$AWS_PROFILE_NAME" \
+ --region "$AWS_REGION" \
+ --cluster "$ECS_CLUSTER" \
+ --service "$WEB_SERVICE" \
+ --desired-count 0
+ ```
+
+ If a later task definition is bad, restore the previous task definition and keep
+ workers disabled:
+
+ ```bash
+ : "${PREVIOUS_TASK_DEFINITION_ARN:?set PREVIOUS_TASK_DEFINITION_ARN from the pre-update evidence}"
+ aws ecs update-service \
+ --profile "$AWS_PROFILE_NAME" \
+ --region "$AWS_REGION" \
+ --cluster "$ECS_CLUSTER" \
+ --service "$WEB_SERVICE" \
+ --task-definition "$PREVIOUS_TASK_DEFINITION_ARN" \
+ --desired-count 1
+ ```
+
+ Disable scheduler/reporter/probe work before data rollback. EFS backup restore
+ requires separate operator approval, a selected recovery point, a replacement
+ mount target/access point cutover, validation in staging, and an audit event.
+
+ ## Evidence Checklist
+
+ A deployment record is not complete until it contains:
+
+ - source commit, package version, published package integrity, and image digest;
+ - Terraform plan summary and zero-count desired-count proof;
+ - secret metadata proof showing `AWSCURRENT` without secret values;
+ - protected edge smoke results and direct-origin denial evidence;
+ - ECS service/task definition evidence;
+ - CloudWatch log tail and alarm-state readback;
+ - backup vault, protected-resource, recovery-point, and restore-drill evidence;
+ - rollback command transcript or dry-run notes;
+ - explicit list of remaining disabled workers and why they remain disabled.
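The rollback snippet in the hunk above only checks that `PREVIOUS_TASK_DEFINITION_ARN` is non-empty. A shape check before calling `aws ecs update-service` can catch a pasted family name or an ARN without a revision. The sketch below is one possible guard. `is_task_definition_arn` is a hypothetical name, and the pattern is an assumption based on the usual `arn:<partition>:ecs:<region>:<account>:task-definition/<family>:<revision>` layout.

```bash
#!/usr/bin/env bash
# Hypothetical guard: succeed only for a full ECS task definition ARN that
# includes a numeric revision, e.g. ...:task-definition/open-uptime-web:7
is_task_definition_arn() {
  local re='^arn:aws[a-z-]*:ecs:[a-z0-9-]+:[0-9]{12}:task-definition/[A-Za-z0-9_-]+:[0-9]+$'
  [[ "$1" =~ $re ]]
}
```

A possible use is `is_task_definition_arn "$PREVIOUS_TASK_DEFINITION_ARN" || exit 1` immediately before the rollback command.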
@@ -26,6 +26,41 @@ output "evidence_bucket" {
  value = aws_s3_bucket.evidence.bucket
  }

+ output "kms_key_arn" {
+ value = var.kms_key_arn
+ }
+
+ output "secret_refs" {
+ value = {
+ app_env = var.app_env_secret_arn
+ hosted_token = var.hosted_token_secret_arn
+ public_probe = var.public_probe_secret_arn
+ reporting = var.reporting_secret_arn
+ }
+ }
+
+ output "log_group_names" {
+ value = merge(
+ { image_builder = aws_cloudwatch_log_group.image_builder.name },
+ { for role, group in aws_cloudwatch_log_group.service : role => group.name },
+ )
+ }
+
+ output "alarm_names" {
+ value = {
+ web_5xx = aws_cloudwatch_metric_alarm.web_5xx.alarm_name
+ web_unhealthy = aws_cloudwatch_metric_alarm.web_unhealthy.alarm_name
+ }
+ }
+
+ output "backup_vault_name" {
+ value = aws_backup_vault.data.name
+ }
+
+ output "backup_plan_id" {
+ value = aws_backup_plan.data.id
+ }
+
  output "efs_file_system_id" {
  value = aws_efs_file_system.data.id
  }
@@ -15,7 +15,7 @@ public_subnet_ids = ["subnet-replace-public-a", "subnet-replace-public-b"
  alb_ingress_cidr_blocks = []
  private_subnet_ids = ["subnet-replace-private-a", "subnet-replace-private-b"]
  container_image = "123456789012.dkr.ecr.us-east-1.amazonaws.com/open-uptime@sha256:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
- runtime_package_version = "0.1.9"
+ runtime_package_version = "0.1.11"
  certificate_arn = null
  hosted_zone_id = null
  app_env_secret_arn = "arn:aws:secretsmanager:us-east-1:123456789012:secret:open-uptime/prod/app/env"
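The runbook changes in this release require `container_image` to be "an immutable digest, not a mutable tag or placeholder", while the example value in the hunk above is a digest made of a single repeated character. A small pre-flight check can tell the three cases apart. This is a sketch only. `is_digest_pinned` and `is_placeholder_digest` are hypothetical names, and treating a digest of one repeated character as a placeholder is an assumption drawn from the example value, not a rule stated by the package.

```bash
#!/usr/bin/env bash
# Hypothetical check: the image reference ends in @sha256:<64 lowercase hex>.
is_digest_pinned() {
  local re='@sha256:[0-9a-f]{64}$'
  [[ "$1" =~ $re ]]
}

# Hypothetical check: the digest is well formed but consists of one repeated
# character (for example all "a"), which suggests an unedited example value.
is_placeholder_digest() {
  is_digest_pinned "$1" || return 1
  local hex="${1##*@sha256:}"
  # Removing every copy of the first character leaves nothing only when
  # all 64 characters are identical.
  [[ -z "${hex//"${hex:0:1}"/}" ]]
}
```

An operator could run `is_digest_pinned "$IMAGE" && ! is_placeholder_digest "$IMAGE"` before the re-plan, where `IMAGE` holds the value intended for `container_image`.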
@@ -116,7 +116,7 @@ variable "container_image" {
  variable "runtime_package_version" {
  description = "Published @hasna/uptime package version that CodeBuild should build into the ECR image."
  type = string
- default = "0.1.9"
+ default = "0.1.11"

  validation {
  condition = can(regex("^[0-9]+\\.[0-9]+\\.[0-9]+(-[0-9A-Za-z.-]+)?$", var.runtime_package_version))
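The validation pattern in the hunk above accepts `MAJOR.MINOR.PATCH` with an optional pre-release suffix. To check a candidate value before editing the tfvars file, the same pattern can be applied in bash. The sketch below reuses the expression from the diff, with the HCL double backslashes reduced to single ones. The function name `is_valid_runtime_version` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical pre-check mirroring the Terraform validation regex for
# runtime_package_version (bash uses POSIX extended regular expressions).
is_valid_runtime_version() {
  local re='^[0-9]+\.[0-9]+\.[0-9]+(-[0-9A-Za-z.-]+)?$'
  [[ "$1" =~ $re ]]
}
```

Under this pattern `0.1.11` and `1.0.0-rc.1` are accepted, while `0.1`, `v0.1.11`, and values carrying `+build` metadata are rejected.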
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@hasna/uptime",
- "version": "0.1.9",
+ "version": "0.1.11",
  "description": "Local-first uptime and downtime monitoring service with CLI, MCP, SDK, SQLite persistence, and a dashboard.",
  "license": "Apache-2.0",
  "type": "module",