@hasna/uptime 0.1.20 → 0.1.22
- package/CHANGELOG.md +22 -0
- package/README.md +2 -1
- package/dist/api.js +5 -2
- package/dist/cli/index.js +6 -3
- package/dist/cloud-plan.js +1 -1
- package/dist/index.js +6 -3
- package/docs/architecture.md +43 -0
- package/docs/aws-runtime-security.md +473 -0
- package/docs/cloud-source-of-truth.md +482 -0
- package/docs/deployment-metadata.example.json +52 -0
- package/docs/monitoring-product-contract.md +493 -0
- package/docs/operational-tracking.md +91 -0
- package/infra/aws/terraform.tfvars.example +1 -1
- package/infra/aws/variables.tf +1 -1
- package/package.json +3 -2
@@ -0,0 +1,482 @@
# Cloud Source Of Truth

This document defines the target source-of-truth model for running Open Uptime as
an internal Hasna cloud service while keeping local developer workflows intact.

The current release is local-first: Open Uptime stores SQLite under
`~/.hasna/uptime`, and the dashboard/API are intended for local or trusted
loopback use. Hosted cloud mode is a separate operating mode and must not be
implemented as "sync the local SQLite database and expose it on the web".

Current deployment bridge as of 2026-06-28: the deployable AWS runtime uses an
explicit EFS-mounted SQLite database at `HASNA_UPTIME_HOSTED_SQLITE_DB` with AWS
Backup. That is cloud-backed file storage for the first protected service
deployment, not the final cloud source-of-truth design. The target state remains
a first-class Postgres adapter with workspace-scoped migrations, leases,
tombstones, and audit rows.

## Principles

- Cloud-primary must mean an explicit cloud mode, not `hybrid` fallback.
- Every service owns its own durable data. Cross-service integration stores
  stable references and snapshots, not copied private records or secrets.
- Local SQLite and Markdown stores become caches, import sources, or developer
  fallback stores after cutover. They are not authoritative in cloud mode.
- Deletes use tombstones and versions. A local stale write must never overwrite
  a newer cloud row.
- A private operator/probe machine can be preferred for local checks, but authority comes from
  cloud leases, service credentials, migrations, and audit records.
- Secret values stay in AWS Secrets Manager or the owning service. Cloud records
  store secret references, channel ids, and redacted metadata only.
- Rendered dashboards and canvases are JSON Render/React Flow specs generated
  from cloud records. They must not embed credentials, raw local paths that leak
  private state, or provider tokens.

## Canonical Stores

| Surface | Cloud source of truth | Local role | Notes |
| --- | --- | --- | --- |
| Open Uptime | Dedicated `uptime` Postgres schema or database on the approved apps RDS, plus object storage for browser evidence. | `~/.hasna/uptime/uptime.db` is local/dev fallback and migration source only. | Needs first-class Postgres store, migrations, distributed check leases, audit tables, and tombstones. |
| Projects registry | `projects` database on the approved apps RDS. | `~/.hasna/projects/projects.db` is cache/fallback. | Open Uptime project id is deployment-specific; link external service ids rather than duplicating project rows. |
| Per-project stores | Cloud rows keyed by `workspace_id` and app namespace, with local cache at `$HASNA_PROJECTS_HOME/data/<workspace_id>/project.db`. | Existing `by-id/<workspace_id>/project.db` paths remain compatibility imports until migrated. | Stores todos links, canvases, JSON Render specs, loops, handoffs, mementos refs, notes refs, and knowledge refs. |
| Project canvases | Project cloud store tables: canvases, nodes, edges, layout, render spec refs. | Local cache for offline inspection only. | Multiple canvases per project are allowed. React Flow is the editing surface; JSON Render specs are the view payload. |
| Todos | Todos cloud database after conflict/tombstone fixes. | `~/.hasna/todos/todos.db` is cache. | Reuse the existing `open-uptime` todos project instead of creating a duplicate. Current unresolved conflicts must be reconciled before cutover. |
| Conversations | Conversations cloud database after messages, reactions, receipts, tasks, and activity are included or assigned to another owner. | `~/.hasna/conversations/messages.db` is cache. | Until then, conversation metadata can be linked, but the service is not cloud-primary for full conversation history. |
| Mementos | Mementos cloud database after versioned tombstones and conflict semantics are fixed. | `~/.hasna/mementos/mementos.db` is cache. | Store refs from Open Uptime/project stores; avoid copying large memory bodies into Uptime. |
| Knowledge | Knowledge cloud artifact/index store. | Local knowledge DB/files are authoring cache. | Generated architecture records should go through the knowledge/artifact API once cloud auth is ready. |
| Notes | Notes cloud metadata plus object storage for Markdown/audio. | Local Markdown/audio files are cache and authoring source during migration. | Fleet `rsync` is not sufficient for cloud-primary. |
| Servers | Open Servers remains owner of server inventory. | Local SQLite remains source until Open Servers gets cloud-primary mode. | Uptime imports refs/snapshots and runs private probes, not arbitrary operator-entered private targets. |
| Domains | Open Domains remains owner of domain/DNS inventory. | Local SQLite remains source until Open Domains gets cloud-primary mode. | Uptime imports domain refs for DNS, TLS, expiry, and root HTTP monitors. |
| Deployment | Open Deployment remains owner of deployment inventory and run state. | Local SQLite remains source until Open Deployment gets cloud-primary mode. | Uptime imports latest environment/resource refs; it must not expose Open Deployment's server publicly. |
| Mailery, Telephony, Logs | Owning services and AWS secrets own delivery configuration. | Local URLs/keys are dev-only. | Hosted Uptime uses configured channel ids and secret refs. Requests must not submit raw `apiUrl`, `sendKey`, or `apiKey`. |

## Open Uptime Cloud Data Model

Open Uptime cloud mode needs tables for:

- `assets`: imported or manual monitorable things, keyed by
  `source_service`, `source_table`, `source_id`, `workspace_id`, and `kind`.
- `monitors`: cloud monitor config, owner/team/env/tags, source asset ref,
  selected probe policy, assertion config, retry/timeout/interval config, and
  status.
- `probes`: public and private probe registrations with capabilities, machine
  id, region/location, version, last heartbeat, and trust policy.
- `check_jobs`: scheduled work leased transactionally by probes.
- `check_results`: immutable final results with timing, normalized error,
  probe id, monitor version, and evidence refs.
- `incidents`: duration-based downtime windows with ack/silence/assignment,
  escalation state, timeline, affected assets, and report suppression state.
- `browser_evidence`: redacted screenshot/trace/console/network artifact refs.
- `report_schedules` and `report_runs`: SLA/report windows, recipients or
  channel refs, delivery attempts, and generated JSON/HTML summary refs.
- `audit_events`: actor, source IP/proxy identity, machine id, reason,
  mutation target, before/after hashes, and idempotency key.
- `sync_tombstones`: deleted ids with entity type, version, actor, and expiry.

All rows that can change must carry at least `id`, `workspace_id`, `version`,
`created_at`, `updated_at`, `deleted_at`, `origin_machine_id`, `actor_ref`,
and `idempotency_key` where applicable.

The first cloud schema must be implemented as a real Postgres adapter. A generic
`@hasna/cloud` snapshot sync may support discovery, status, migration reporting,
or backfill, but it is not the runtime data store for checks, probes, incidents,
reports, or operator actions.

## Auth, Workspace, And Audit Contract

Hosted mode is closed by default:

- `/health` is the only unauthenticated endpoint.
- dashboard HTML, JSON APIs, MCP-over-HTTP, JSON Render specs, canvas records,
  browser artifacts, report previews, import previews, and every mutation
  require an authenticated actor and workspace context.
- service tokens are scoped by purpose: `uptime:read`, `uptime:write`,
  `uptime:probe`, `uptime:report`, `uptime:admin`, and service-specific import
  scopes.
- probe tokens and operator tokens are separate. A probe can claim jobs and
  submit results, but cannot read unrelated monitor configuration, export
  reports, mutate imports, or administer workspaces.
- workspace isolation is enforced in the storage layer through RLS or equivalent
  scoped queries, workspace-scoped unique indexes, and service methods that
  require explicit workspace context.
- tokens must be rotatable and revocable. Rotation and revocation are audit
  events.

Every cloud mutation writes an immutable `audit_events` record:

- actor, workspace, machine id, probe id when relevant, source IP or proxy
  identity, user agent or service name, action, target entity, target version,
  idempotency key, reason, and before/after hashes;
- create, update, delete/tombstone, check, check-result ingest, report send,
  import preview, import apply, probe registration, probe lease, rollback, and
  migration actions are audited;
- audit rows never store secret values, full browser evidence, raw request
  bodies, or unredacted private target URLs.

Hosted tests must prove unauthenticated reads fail, mutation requests without
the right scope fail, and workspace A cannot read, mutate, check, report, import,
or delete workspace B data.
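
The closed-by-default rules above can be sketched as a single authorization function. This is a minimal illustration, not the package's implementation: the `Actor`, `RequestContext`, and `authorize` names are hypothetical, and real enforcement would also live in the storage layer (RLS or scoped queries) as the section requires.

```typescript
// Hypothetical sketch: none of these names come from @hasna/uptime.
type Scope =
  | "uptime:read"
  | "uptime:write"
  | "uptime:probe"
  | "uptime:report"
  | "uptime:admin";

interface Actor {
  tokenKind: "operator" | "probe";
  workspaceId: string;
  scopes: ReadonlySet<Scope>;
}

interface RequestContext {
  path: string;
  workspaceId: string; // workspace the request targets
  requiredScope: Scope;
}

type Decision = { allowed: true } | { allowed: false; reason: string };

function authorize(actor: Actor | null, req: RequestContext): Decision {
  // `/health` is the only unauthenticated endpoint.
  if (req.path === "/health") return { allowed: true };
  if (actor === null) return { allowed: false, reason: "unauthenticated" };
  // Workspace A must never reach workspace B data.
  if (actor.workspaceId !== req.workspaceId) {
    return { allowed: false, reason: "workspace_mismatch" };
  }
  // Probe tokens are limited to the probe scope (claim jobs, submit results).
  if (actor.tokenKind === "probe" && req.requiredScope !== "uptime:probe") {
    return { allowed: false, reason: "probe_token_out_of_scope" };
  }
  if (!actor.scopes.has(req.requiredScope)) {
    return { allowed: false, reason: "missing_scope" };
  }
  return { allowed: true };
}
```

Returning a reason string alongside the denial keeps the three hosted test cases (unauthenticated read, wrong scope, cross-workspace access) distinguishable in assertions and audit rows.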

## Target Policy

The target-state architecture uses one shared target policy at both
configuration time and execution time. The current hosted API implements
configuration-time checks for direct targets, and the SDK exposes
`runHostedHttpCheck` for hosted public HTTP probes. That runner performs runtime
DNS resolution, address pinning, redirect validation, DNS-rebinding protection,
and decision-record evidence. Public probe execution stays disabled until cloud
check-job leases and the public-probe worker are wired to that runner and
validated in AWS.

Public probes must deny:

- loopback, wildcard, unspecified, link-local, multicast, RFC1918, carrier-grade
  NAT, IPv6 ULA/link-local, and cloud metadata ranges;
- `localhost` and names resolving to denied ranges;
- URL userinfo such as `https://user:pass@example.com`;
- redirects to a denied target;
- DNS rebinding between validation and execution;
- arbitrary TCP hosts that are not approved public targets.

Private targets are allowed only when they come from an approved inventory ref,
such as Open Servers or a deployment resource, and only on private probes whose
egress policy permits that target class. Operators cannot bypass this by typing
an arbitrary private IP or hostname into a hosted monitor form.

The target policy must expose a decision record with target class, resolved
addresses, rule id, probe class, and redacted target display. Tests must cover
localhost, link-local, metadata endpoints, private IPs, IPv6 denied ranges,
redirects, DNS rebinding, private DNS names, Tailscale/private names, and
secret-like URL query strings.
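
As a rough illustration of the deny list for public probes, the sketch below classifies an already-resolved IPv4 address and returns a small decision record. It is not the `runHostedHttpCheck` implementation: the function names, rule ids, and record shape are assumptions, the record is a subset of the fields listed above, and IPv6, DNS resolution, redirects, and rebinding checks are deliberately left out.

```typescript
// Hypothetical sketch: rule ids and names are illustrative only.
interface TargetDecision {
  allowed: boolean;
  ruleId: string;
  targetClass: "public" | "denied" | "invalid";
}

// Parse a dotted-quad IPv4 address into an unsigned 32-bit number, or null if malformed.
function parseIpv4(address: string): number | null {
  const parts = address.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// [rule id, CIDR base address, prefix length]
const DENIED_V4: ReadonlyArray<readonly [string, string, number]> = [
  ["unspecified", "0.0.0.0", 8],
  ["rfc1918-10", "10.0.0.0", 8],
  ["carrier-grade-nat", "100.64.0.0", 10],
  ["loopback", "127.0.0.0", 8],
  ["link-local-or-metadata", "169.254.0.0", 16],
  ["rfc1918-172", "172.16.0.0", 12],
  ["rfc1918-192", "192.168.0.0", 16],
  ["multicast", "224.0.0.0", 4],
];

// Two addresses share a CIDR block when they agree after dropping the host bits.
function inCidr(ip: number, base: number, prefix: number): boolean {
  const blockSize = 2 ** (32 - prefix);
  return Math.floor(ip / blockSize) === Math.floor(base / blockSize);
}

function classifyPublicProbeAddress(address: string): TargetDecision {
  const ip = parseIpv4(address);
  if (ip === null) {
    return { allowed: false, ruleId: "invalid-ipv4", targetClass: "invalid" };
  }
  for (const [ruleId, baseText, prefix] of DENIED_V4) {
    const base = parseIpv4(baseText);
    if (base !== null && inCidr(ip, base, prefix)) {
      return { allowed: false, ruleId, targetClass: "denied" };
    }
  }
  return { allowed: true, ruleId: "public-default", targetClass: "public" };
}
```

To guard against DNS rebinding, a check like this would need to run on the address actually used for the connection at execution time, not only on the hostname at configuration time.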

## Monitor Taxonomy

The cloud product must define monitor kinds and assertion schemas before
implementation. Initial first-class kinds:

- `http`: URL, method, redirects, expected status, header assertions, body text
  assertions, JSON assertions, latency threshold, retry policy.
- `browser_page`: URL, viewport/device, navigation timeout, console-error
  policy, uncaught-exception policy, failed-resource policy, screenshot/trace
  policy, Core Web Vitals-lite timing, and optional DOM assertions.
- `tcp`: host/port connect from approved public or private probe.
- `server_health`: inventory-backed health URL or port check from Open Servers,
  always routed through private probes unless the source is explicitly public.
- `dns`: record type, authoritative and recursive resolvers, expected values,
  propagation/drift policy.
- `tls`: hostname, expiry threshold, chain validity, hostname match, issuer
  metadata.
- `domain_expiry`: registry/RDAP expiry threshold and registrar metadata.
- `deployment`: imported deployment/environment resource status, latest live URL
  health, rollback/failure signal.
- `heartbeat`: external job or service check-in before a deadline.
- `report_delivery`: scheduled report generation and delivery health.

Each kind needs a config schema, normalized result schema, summary fields,
failure reason taxonomy, CLI/API/MCP/SDK representation, JSON Render view, and
contract tests.

## Browser Evidence And PII

Browser monitoring is not part of the local-first release. It becomes cloud
scope only when the evidence pipeline is in place:

- screenshots, traces, HAR-like network records, console logs, page errors, and
  HTML snippets are stored in an encrypted object bucket with versioning,
  lifecycle, retention, and public access blocks;
- Postgres stores artifact refs, redaction status, checksum, content type,
  size, retention class, workspace id, monitor id, result id, and evidence
  grouping key;
- signed URLs are short-lived, workspace-scoped, and audit logged;
- scrubbing removes cookies, auth headers, tokens, secret-like query params,
  form values, local storage/session storage values, and credential-looking
  console/network payloads before persistence;
- screenshot capture must support masking selectors and page areas;
- default retention is short, with longer retention only by policy.

If this evidence pipeline is not implemented, browser/page checks must stay out
of hosted cutover.
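
One small piece of the scrubbing step, redacting URLs before they are persisted in network records, might look like the sketch below. The `redactUrl` name and the `SECRET_LIKE` pattern are hypothetical, parameter names are matched without URL-decoding, and the pattern is intentionally broad, so it can over-redact harmless parameters.

```typescript
// Hypothetical sketch: the name list is an assumption, not a vetted policy.
const SECRET_LIKE = /(token|secret|password|passwd|key|signature|auth)/i;

function redactUrl(raw: string): string {
  // Drop the fragment entirely; it can carry tokens in some auth flows.
  const hashIndex = raw.indexOf("#");
  const withoutHash = hashIndex === -1 ? raw : raw.slice(0, hashIndex);

  const queryIndex = withoutHash.indexOf("?");
  const base = queryIndex === -1 ? withoutHash : withoutHash.slice(0, queryIndex);

  // Strip userinfo such as `user:pass@` from the authority part.
  const safeBase = base.replace(/^([a-z][a-z0-9+.-]*:\/\/)[^/@]*@/i, "$1");
  if (queryIndex === -1) return safeBase;

  const pairs = withoutHash
    .slice(queryIndex + 1)
    .split("&")
    .filter((pair) => pair.length > 0)
    .map((pair) => {
      const eq = pair.indexOf("=");
      if (eq === -1) return pair; // bare flag with no value to hide
      const name = pair.slice(0, eq);
      return SECRET_LIKE.test(name) ? `${name}=REDACTED` : pair;
    });

  return pairs.length > 0 ? `${safeBase}?${pairs.join("&")}` : safeBase;
}
```

Redacting by parameter name rather than by value keeps the URL shape readable for debugging while hiding the credential itself.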

## Probe And Check Job Protocol

Cloud checks are never scheduled by independent local loops over local SQLite.
They use cloud jobs and cloud leases:

1. scheduler creates deterministic `check_jobs` for monitor/version/schedule
   slots;
2. probes heartbeat with machine id, version, capabilities, location, trust
   class, and current load;
3. probes claim jobs transactionally with a TTL and fencing token;
4. probes execute only jobs matching their capabilities and target policy;
5. result ingest requires the active fencing token and idempotency key;
6. expired leases can be reclaimed, but duplicate result ingest for the same
   job slot is rejected or marked duplicate;
7. probe health is derived from heartbeat lag, failed claims, execution errors,
   result latency, and version drift.

Two-probe race tests must prove that one schedule slot produces one authoritative
result and that stale fencing tokens cannot submit or overwrite results.
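
The claim, reclaim, and ingest steps above can be illustrated with an in-memory lease table. This is a sketch under simplifying assumptions: `LeaseTable` and its methods are hypothetical, a real implementation would run each operation in a Postgres transaction, and the idempotency key is omitted so the fencing logic stays visible.

```typescript
// Hypothetical in-memory sketch; not a substitute for transactional leases.
interface CheckJob {
  jobId: string; // deterministic: monitor/version/schedule slot
  leaseHolder: string | null;
  leaseExpiresAt: number; // epoch milliseconds
  fencingToken: number; // increases on every successful claim
  resultSubmitted: boolean;
}

type IngestOutcome = "accepted" | "stale_token" | "duplicate" | "unknown_job";

class LeaseTable {
  private jobs = new Map<string, CheckJob>();

  addJob(jobId: string): void {
    if (!this.jobs.has(jobId)) {
      this.jobs.set(jobId, {
        jobId,
        leaseHolder: null,
        leaseExpiresAt: 0,
        fencingToken: 0,
        resultSubmitted: false,
      });
    }
  }

  // Returns a fencing token on success, or null when the job is held or finished.
  claim(jobId: string, probeId: string, now: number, ttlMs: number): number | null {
    const job = this.jobs.get(jobId);
    if (job === undefined || job.resultSubmitted) return null;
    const leaseActive = job.leaseHolder !== null && job.leaseExpiresAt > now;
    if (leaseActive) return null;
    job.leaseHolder = probeId;
    job.leaseExpiresAt = now + ttlMs;
    job.fencingToken += 1; // any earlier token is now stale
    return job.fencingToken;
  }

  // Only the most recently issued token may submit, and only once.
  ingest(jobId: string, token: number): IngestOutcome {
    const job = this.jobs.get(jobId);
    if (job === undefined) return "unknown_job";
    if (token !== job.fencingToken) return "stale_token";
    if (job.resultSubmitted) return "duplicate";
    job.resultSubmitted = true;
    return "accepted";
  }
}
```

Because the token increases on every claim, a probe whose lease expired and was reclaimed cannot overwrite the result of the probe that took over, which is the property the two-probe race tests need to demonstrate.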

## Import Preview And Apply Contract

Import is a reviewable workflow, not direct bulk creation.

Preview output includes:

- source service, source table/model, source id, source updated time, workspace,
  candidate kind, redacted target display, proposed monitor config, tags,
  owner/team/env, source checksum, freshness, conflicts, and policy warnings;
- action: create, update, unchanged, stale, blocked, or conflict;
- idempotency key: `source_service/source_table/source_id/kind`;
- no secret values and no raw local-only paths in hosted render payloads.

Apply input references preview ids and chosen actions. Apply writes assets,
monitors, provenance snapshots, audit events, and rollback records. Stale source
records mark assets stale; they do not delete monitor history automatically.
Rollback can undo newly-created monitor config from an import batch without
deleting historical check results.

CLI, API, MCP, and SDK surfaces must expose dry-run preview and apply with the
same schema.
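
A minimal sketch of how the idempotency key could drive the preview action follows. The `ImportCandidate` shape and both function names are assumptions, and only three of the six actions (`create`, `update`, `unchanged`) are modeled; `stale`, `blocked`, and `conflict` would need freshness and policy inputs that are not shown.

```typescript
// Hypothetical sketch: field and function names are illustrative.
interface ImportCandidate {
  sourceService: string;
  sourceTable: string;
  sourceId: string;
  kind: string;
  sourceChecksum: string;
}

type PreviewAction = "create" | "update" | "unchanged";

// Key format from the contract: source_service/source_table/source_id/kind
function importIdempotencyKey(candidate: ImportCandidate): string {
  return [
    candidate.sourceService,
    candidate.sourceTable,
    candidate.sourceId,
    candidate.kind,
  ].join("/");
}

// `applied` maps idempotency keys to the checksum stored at the last apply.
function previewAction(
  applied: ReadonlyMap<string, string>,
  candidate: ImportCandidate,
): PreviewAction {
  const knownChecksum = applied.get(importIdempotencyKey(candidate));
  if (knownChecksum === undefined) return "create";
  return knownChecksum === candidate.sourceChecksum ? "unchanged" : "update";
}
```

Keying on source identity plus kind means re-running the same preview after an apply yields `unchanged` instead of a second monitor.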

## Incident And Operator Workflow

Cloud incidents are duration-based operational records. They need:

- severity, owner/team, assignee, affected asset refs, source monitor refs,
  status, opened/resolved timestamps, detection result id, recovery result id,
  and SLA impact;
- actions: acknowledge, unacknowledge, silence, unsilence, create maintenance
  window, assign, comment, manual close, reopen, link related incident, and
  attach evidence;
- escalation state, notification suppression state, and report inclusion policy;
- immutable timeline events for checks, operator actions, notification attempts,
  imports, maintenance changes, and report events.

Every action requires actor, scope, reason, idempotency key, and audit row.
Dashboard and JSON Render views must support filtering by owner, environment,
source, probe, kind, severity, status, assignment, silence, and maintenance.

## Reports And Delivery

Local development can keep direct Mailery/Telephony/Logs options. Hosted mode
uses workspace-authorized delivery channel refs and secret refs only.

Hosted report APIs must reject raw `apiUrl`, `sendKey`, `apiKey`, arbitrary
Open Logs project ids, and arbitrary delivery destinations. The server resolves
approved channel refs through the owning service or AWS secret refs and records
only redacted delivery metadata.

Scheduled reports use duration-based SLA windows, not check-count percentages.
Report config includes workspace, owner/team/env filters, monitor kinds, time
window, timezone, maintenance exclusion policy, recipients/channel refs,
template, and retention. `report_runs` records generated JSON/HTML refs,
delivery attempts, delivery failures, idempotency keys, and audit refs.

Payloads are audience scoped. Private target hostnames/URLs are masked unless
the recipient/channel is authorized for that workspace and target class.
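
To make "duration-based SLA windows" concrete, the sketch below computes availability from incident intervals clipped to the report window, with overlapping incidents counted once. The `Interval` type and function names are hypothetical, and maintenance exclusion is not modeled.

```typescript
// Hypothetical sketch: times are epoch milliseconds, intervals are [start, end).
interface Interval {
  start: number;
  end: number;
}

// Total downtime inside the window, counting overlapping incidents once.
function downtimeMs(window: Interval, incidents: ReadonlyArray<Interval>): number {
  const clipped = incidents
    .map((i) => ({
      start: Math.max(i.start, window.start),
      end: Math.min(i.end, window.end),
    }))
    .filter((i) => i.end > i.start)
    .sort((a, b) => a.start - b.start);

  let total = 0;
  let cursor = window.start; // everything before the cursor is already counted
  for (const incident of clipped) {
    const start = Math.max(incident.start, cursor);
    if (incident.end > start) {
      total += incident.end - start;
      cursor = incident.end;
    }
  }
  return total;
}

// Availability as a fraction of the window, e.g. 0.975 for 97.5 percent.
function availability(window: Interval, incidents: ReadonlyArray<Interval>): number {
  const span = window.end - window.start;
  if (span <= 0) throw new Error("empty report window");
  return 1 - downtimeMs(window, incidents) / span;
}
```

Unlike a failed-checks-over-total-checks ratio, this figure does not change when the check interval changes, because it measures how long the target was down rather than how many probes happened to observe it.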

## Sync And Conflict Contract

The minimum cloud-primary contract for Hasna services is:

1. `status` command shows effective mode, redacted database env name, schema
   version, machine id, sync cursor, conflict count, and whether local storage
   is cache-only.
2. Cloud mode opens Postgres directly. It does not silently fall back to local
   SQLite when a cloud connection fails.
3. Pull applies tombstones. Deletes are propagated, not filtered away.
4. Push uses optimistic concurrency on `version` or `updated_at`. Stale writes
   produce conflicts instead of overwriting cloud rows.
5. Migration dry-runs report counts, schema versions, and conflict counts only.
   They do not print records or secret values.
6. Cutover freezes legacy local stores for the migrated service until the
   rollback window closes.
7. Per-project stores, canvases, JSON Render specs, and project data
   records are cloud-backed. Local `project.db` files are caches or import
   sources, not the source of truth.
8. Dependent services are explicitly classified before Uptime cutover:
   cloud-capable, local-only/link-only, or blocked. Link-only services can be
   referenced but cannot be treated as authoritative cloud records.

`@hasna/cloud` can remain the shared discovery/status/migration layer, but
Open Uptime should implement a real cloud storage adapter instead of trying to
reuse a generic SQLite snapshot sync as runtime storage.
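
Items 3 and 4 of the contract (tombstone-aware pull and optimistic-concurrency push) can be sketched with plain maps standing in for the cloud and local stores. The `SyncRow` shape and the `push` and `pull` names are assumptions for illustration; a real adapter would perform the version check inside a Postgres transaction.

```typescript
// Hypothetical sketch: Maps stand in for the cloud table and the local cache.
interface SyncRow {
  id: string;
  version: number;
  deletedAt: number | null; // non-null marks a tombstone
  payload: string;
}

interface PushChange {
  id: string;
  baseVersion: number; // cloud version the writer last saw (0 for a new row)
  payload: string;
  deletedAt: number | null;
}

type PushResult =
  | { status: "applied"; row: SyncRow }
  | { status: "conflict"; cloudVersion: number };

// Apply a change only if the writer saw the current cloud version.
function push(cloud: Map<string, SyncRow>, change: PushChange): PushResult {
  const current = cloud.get(change.id);
  const cloudVersion = current === undefined ? 0 : current.version;
  if (change.baseVersion !== cloudVersion) {
    return { status: "conflict", cloudVersion }; // stale write, nothing overwritten
  }
  const row: SyncRow = {
    id: change.id,
    version: cloudVersion + 1,
    deletedAt: change.deletedAt,
    payload: change.payload,
  };
  cloud.set(row.id, row);
  return { status: "applied", row };
}

// Pull propagates deletes: tombstoned rows are removed from the local cache.
function pull(cloud: Map<string, SyncRow>, local: Map<string, SyncRow>): void {
  cloud.forEach((row) => {
    if (row.deletedAt !== null) {
      local.delete(row.id);
    } else {
      local.set(row.id, row);
    }
  });
}
```

Keeping the tombstone as a versioned row, instead of physically deleting it, is what lets a later stale write from another machine be detected as a conflict.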

## Private Probe Cloud-Primary Mode

An operator/probe machine should become cloud-primary only after a preflight proves:

- canonical service database endpoints resolve and connect using scoped service
  credentials loaded from secret refs;
- `cloud status` no longer points at stale or non-resolving hosts;
- `projects`, `todos`, `conversations`, `mementos`, `knowledge`, `notes`, and
  `uptime` all report an explicit cloud-capable mode or are marked local-only
  with a documented owner;
- outstanding todos conflicts are reconciled or quarantined;
- schema versions match the release being deployed;
- the machine has a unique cloud machine registration and a time-limited primary
  operator lease;
- local `~/.hasna` databases are backed up before migration and then treated as
  cache/fallback, not active authority.

Operator primary status is a time-limited lease, not a boolean flag. The lease
must include a fencing token, heartbeat deadline, revocation path, audit rows,
and clear behavior for expiration. Cloud writers and probe workers must include
the current fencing token for primary-only actions.

Rollback is a planned mode:

1. pause new cloud writes for the affected service;
2. revoke the operator machine's primary lease;
3. point CLIs back to the preflight local fallback store;
4. keep cloud rows read-only for comparison until the incident is resolved;
5. record the rollback in Projects, Todos, Logs, and the service audit table.

Migration and rollback require pre-cutover backups, conflict quarantine,
count-only dry runs, legacy-store freeze, restore rehearsal, cloud/local read-only
comparison evidence, and a documented window for deleting or archiving temporary
fallback data.

## Inventory Imports

Imports are explicit preview/apply workflows:

- preview reads source services and emits candidates without secret values;
- apply creates or updates `assets` and `monitors` with provenance;
- idempotency key is `source_service/source_table/source_id/kind`;
- stale source records mark Uptime assets as stale; they do not delete monitor
  history automatically;
- private targets are only created from approved inventory refs and are assigned
  to private probes.

Initial import sources:

- Projects: repo/service groups, owner, stage, priority, GitHub refs, project
  stores, canvases, and JSON Render dashboard refs.
- Servers: hostname, health URL, ports, Tailscale fields, project refs, and
  readiness snapshots for private probe monitors.
- Domains: registered domains, DNS records, TLS expiry, domain expiry, root
  HTTP checks, and discovered page candidates.
- Deployment: environment URLs, provider/resource status, latest deployment
  refs, region, and rollback/failure signals.

## JSON Render And Canvases

Every project can have multiple cloud-backed canvases. A canvas stores React
Flow nodes/edges and links each node to a JSON Render spec, a source record, or
an Open Uptime dashboard query.

Open Uptime should expose JSON Render specs for:

- fleet overview by owner/environment/source/probe;
- incidents and SLA windows;
- browser error evidence;
- probe health;
- import preview diffs;
- report schedules and report runs.

The rendering layer reads cloud records through authenticated APIs. It does not
read local SQLite databases directly and does not receive raw secrets.

Canvas/render APIs must include:

- project canvas CRUD, node/edge CRUD, and stable links to monitors, incidents,
  probes, imports, report runs, and source inventory refs;
- JSON Render spec versioning and validation;
- redaction tests for private targets, local paths, secret refs, and browser
  evidence;
- workspace/project authorization on every render and canvas endpoint.

## AWS Runtime Contract

The preferred hosted runtime is:

- approved apps RDS with a dedicated Uptime database or schema,
  least-privileged user, TLS, migrations, automated backups, PITR, deletion
  protection, and pre-cutover snapshots;
- hardened S3 bucket for browser evidence and report artifacts, with KMS
  encryption, versioning, lifecycle/retention, public access block, and scoped
  IAM policies;
- ECR image repository, ECS/Fargate service, task role, execution role, private
  subnets, security groups, protected edge, public ALB origin, CloudFront
  default-domain HTTPS or custom TLS/DNS, log groups, metrics, alarms, and
  deployment circuit breaker;
- hosted web task public-origin configuration through
  `HASNA_UPTIME_ALLOWED_ORIGINS`, matching the selected HTTPS edge origin;
- Secrets Manager or SSM parameter `valueFrom` refs in task definitions, never
  plaintext secret values in task environment;
- owner/project/environment tags, budget alarms, and a monthly cost estimate.

Except for the current single-web-task deployment bridge documented above, EFS
is not part of the target architecture unless a future requirement proves a
shared filesystem is necessary. SQLite-on-EFS is not the final cloud source of
truth and must not be expanded to scheduler/probe/reporter workers.

Open Deployment can orchestrate or record deployment metadata only after its
hosted mode is private or authenticated and its provider model stops storing raw
secret values. It must not be exposed as a public upstream for Uptime.

An infra change in the approved infrastructure repository must exist before live
deployment. It must define the runtime resources, outputs consumed by Open
Uptime/Open Deployment, backup/restore drills, CloudWatch alarms for
ECS/API/RDS/S3/probe lag/job backlog/delivery failures, and rollback commands.

## Current Blockers

- Open Uptime has no Postgres/cloud store and no distributed leases.
- Hosted API reads are protected only by a broad bootstrap token, and the hosted
  dashboard shell still fails closed; production-grade identity/RBAC is not
  implemented yet.
- Outbound target policy for hosted HTTP probes exists in the SDK, but the
  cloud public-probe worker and lease path are not wired to it yet.
- `@hasna/cloud` hybrid mode still returns SQLite, so it is not cloud-primary.
- The local cloud config currently points at a stale/non-resolving database host.
- Todos has unresolved conflicts that must be reconciled before cloud cutover.
- Conversations, notes, mementos, servers, domains, and deployment have partial
  or local-first storage models that need explicit ownership decisions.
- Open Uptime is not registered in `@hasna/cloud` known service lists.
- A private Hasna AWS bridge now has zero-count runtime resources, including
  ECR, dormant ECS services, ALB, CloudFront default-domain edge, logs, alarms,
  EFS, Backup, and Secrets Manager containers. The public Terraform module now
  defines explicit ECS container health checks for every task definition. The
  active private secret refs under `hasna/xyz/opensource/uptime/prod/*` have
  `AWSCURRENT` versions, and one-off web task smokes proved image pull/startup,
  secret injection, CloudWatch log delivery, EFS read/write, S3 PutObject, and
  NAT HTTPS egress while services stayed at desired count `0`. The Terraform
  private deployment has CloudFront-only origin verification headers applied so
  the ALB rejects direct origin requests that only share CloudFront's managed
  prefix list; that header value is secret-bearing Terraform/AWS configuration,
  not public evidence. The private deployment evidence also includes a
  representative SQLite EFS backup/restore drill with integrity/count checks.
  It is not live: live scale-up is still blocked by edge/auth smokes, alarm
  actions, budget recipients, and production auth hardening beyond scoped
  static operator tokens.
- Projects per-project cloud stores do not exist yet; current local
  `project.db` stores are not enough for cloud-backed canvases or JSON Render.
- Browser/page monitoring lacks the artifact, redaction, retention, and storage
  controls required for hosted mode.
- Open Deployment stores and injects secrets in ways that conflict with the
  hosted secret-ref model and must stay private until hardened.
|
|
461
|
+
|
|
462
|
+
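The origin verification header mentioned in the AWS bridge item can be illustrated with a small sketch. In the deployment described above the ALB enforces the header, so the application-level check below is only an illustration of the idea, not the project's code. The header name `x-origin-verify` and the function names are hypothetical; the real header name and value are private configuration.

```typescript
// Hypothetical header name; the real one is private configuration.
const ORIGIN_VERIFY_HEADER = "x-origin-verify";

// Compare two strings without returning early on the first mismatch.
function constantTimeEqual(a: string, b: string): boolean {
  const length = Math.max(a.length, b.length);
  let diff = a.length ^ b.length;
  for (let i = 0; i < length; i++) {
    // charCodeAt returns NaN past the end of a string; `| 0` maps NaN to 0.
    diff |= (a.charCodeAt(i) | 0) ^ (b.charCodeAt(i) | 0);
  }
  return diff === 0;
}

// Accept a request only when it carries the shared secret that the edge adds.
// Header keys are assumed to be lower-cased by the caller.
function isFromTrustedEdge(
  headers: Record<string, string | undefined>,
  expectedSecret: string,
): boolean {
  if (expectedSecret.length === 0) return false; // fail closed when unconfigured
  const presented = headers[ORIGIN_VERIFY_HEADER];
  if (presented === undefined) return false;
  return constantTimeEqual(presented, expectedSecret);
}
```

A request that reaches the origin from an allowed IP range but without the header is rejected, which is the property the bullet describes. In a Node.js runtime, `crypto.timingSafeEqual` would be the usual choice over a hand-written comparison.
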
## Acceptance Criteria

- A fresh operator-machine local state directory can read and mutate cloud-backed
  projects, project stores, todos, knowledge, notes, mementos, and uptime data
  without copying a database.
- Open Uptime cloud mode refuses to start if the cloud database is unavailable
  unless an explicit `--local-fallback` dev flag is used.
- Public API/dashboard reads and every mutation require auth except `/health`.
- Uptime checks use transactional cloud leases and cannot double-run from two
  probes for the same schedule slot.
- Import preview/apply is idempotent and stores source provenance for every
  monitorable asset.
- Tombstones propagate across push/pull, and stale writes are conflicts.
- Project canvases and JSON Render specs are stored per project in cloud-backed
  project stores and support multiple canvases per project.
- Hosted report delivery uses service-owned channel refs and secret refs only.
- Rollback from cloud-primary to local fallback is documented and tested before
  private-probe cutover.
- The implementation contract covers monitor kinds/assertions, probe leases,
  target policy, incident actions, report schedules, import preview/apply,
  browser evidence, auth/RBAC, schema migrations, backups, and infra outputs.
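
The startup criterion above (refuse to start when the cloud database is unavailable unless `--local-fallback` is set) can be sketched as a pure decision function. The type and function names are hypothetical, and the reachability flag is assumed to come from a connectivity probe that is not shown here.

```typescript
type StartupMode = "cloud" | "local-fallback";

interface StartupOptions {
  cloudDatabaseReachable: boolean; // result of a connectivity probe (assumed)
  localFallbackFlag: boolean;      // true when --local-fallback was passed
}

// Decide the operating mode, failing closed when the cloud store is down.
function resolveStartupMode(options: StartupOptions): StartupMode {
  if (options.cloudDatabaseReachable) return "cloud";
  if (options.localFallbackFlag) return "local-fallback";
  throw new Error(
    "cloud database unavailable; refusing to start without --local-fallback",
  );
}
```

In this sketch a reachable cloud database always wins, even when the flag is present, so the flag can never silently divert a healthy deployment to local state. That ordering is an assumption, not something the document specifies.
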
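
The lease criterion (no double-run from two probes for the same schedule slot) can be modelled in memory as follows. This is only a sketch: the class, field, and function names are hypothetical, and a real cloud adapter would presumably rely on a database transaction with a unique key on the monitor and slot rather than a `Map`.

```typescript
interface ProbeLease {
  monitorId: string;
  slotStart: number; // epoch ms of the schedule slot start
  probeId: string;
  expiresAt: number; // epoch ms after which another probe may take over
  completed: boolean;
}

// Round a timestamp down to the start of its schedule slot.
function slotStartFor(timestampMs: number, intervalMs: number): number {
  return Math.floor(timestampMs / intervalMs) * intervalMs;
}

class InMemoryLeaseTable {
  private readonly rows = new Map<string, ProbeLease>();

  // Returns the lease if this probe holds the slot, otherwise null.
  tryAcquire(
    monitorId: string,
    slotStart: number,
    probeId: string,
    now: number,
    ttlMs: number,
  ): ProbeLease | null {
    const key = `${monitorId}:${slotStart}`;
    const existing = this.rows.get(key);
    if (existing !== undefined) {
      if (existing.completed) return null; // the slot already ran
      if (existing.expiresAt > now) {
        // Live lease: only the current holder may keep working on it.
        return existing.probeId === probeId ? existing : null;
      }
    }
    // No row, or an expired and unfinished lease: take the slot.
    const lease: ProbeLease = {
      monitorId,
      slotStart,
      probeId,
      expiresAt: now + ttlMs,
      completed: false,
    };
    this.rows.set(key, lease);
    return lease;
  }

  // Record the result only if the caller still holds a live lease.
  complete(lease: ProbeLease, now: number): boolean {
    const current = this.rows.get(`${lease.monitorId}:${lease.slotStart}`);
    if (current === undefined) return false;
    if (current.probeId !== lease.probeId || current.expiresAt <= now) return false;
    current.completed = true;
    return true;
  }
}
```

The `complete` check matters as much as `tryAcquire`: a probe whose lease expired and was taken over cannot record a result, so a slow probe cannot produce a second result for the slot.
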
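
The tombstone and stale-write criterion can be read as optimistic concurrency with per-record versions. The sketch below assumes that model; the document does not say how versions are tracked, and all names here are hypothetical.

```typescript
interface SyncRecord {
  id: string;
  version: number;          // increments on every accepted write
  payload: string | null;   // null once the record is tombstoned
  deletedAt: number | null; // non-null marks a tombstone
}

type WriteResult =
  | { status: "applied"; record: SyncRecord }
  | { status: "conflict"; current: SyncRecord | null };

class VersionedStore {
  private readonly records = new Map<string, SyncRecord>();

  // baseVersion is the version the writer last saw (0 for a new record).
  write(id: string, baseVersion: number, payload: string): WriteResult {
    return this.apply(id, baseVersion, payload, null);
  }

  // Deletion keeps the row as a tombstone so that pulls can observe it.
  remove(id: string, baseVersion: number, now: number): WriteResult {
    return this.apply(id, baseVersion, null, now);
  }

  // A pull returns tombstones too, which is how deletions propagate.
  pull(): SyncRecord[] {
    const out: SyncRecord[] = [];
    this.records.forEach((record) => out.push(record));
    return out;
  }

  private apply(
    id: string,
    baseVersion: number,
    payload: string | null,
    deletedAt: number | null,
  ): WriteResult {
    const found = this.records.get(id);
    const current = found === undefined ? null : found;
    const currentVersion = current === null ? 0 : current.version;
    // A writer that has not seen the latest version is stale.
    if (baseVersion !== currentVersion) return { status: "conflict", current };
    const record: SyncRecord = { id, version: currentVersion + 1, payload, deletedAt };
    this.records.set(id, record);
    return { status: "applied", record };
  }
}
```

A client that edited a record offline and pushes after another client deleted it receives a conflict that carries the tombstone, rather than silently resurrecting the record.
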
@@ -0,0 +1,52 @@
{
  "service": "open-uptime",
  "package": "@hasna/uptime",
  "intendedVersion": "0.1.22",
  "accountProfile": "<aws-profile>",
  "accountId": "<aws-account-id>",
  "region": "us-east-1",
  "workspaceId": "<workspace-id>",
  "repo": "hasna/uptime",
  "deploymentStatus": "preflighted-not-deployed",
  "runtimeStore": {
    "mode": "hosted-efs-sqlite",
    "environmentVariable": "HASNA_UPTIME_HOSTED_SQLITE_DB",
    "path": "/data/uptime/uptime.db",
    "backup": "AWS Backup on encrypted EFS"
  },
  "protectedAccess": {
    "mode": "cloudfront_default_domain",
    "url": "https://<cloudfront-domain>",
    "allowedOriginsEnv": "HASNA_UPTIME_ALLOWED_ORIGINS=https://<cloudfront-domain>",
    "originPolicy": "ALB HTTP ingress restricted to CloudFront origin-facing ranges plus CloudFront-only origin verification header before scale-up",
    "originVerifyHeaderRequiredBeforeScaleUp": true,
    "originVerifyHeaderEnabled": "<true-after-private-root-apply>",
    "originVerifyHeaderName": "<private-header-name>",
    "originVerifyHeaderValueLocation": "private Terraform variable / encrypted state; never public docs or logs"
  },
  "awsInventory": {
    "vpcId": "<target-vpc-id>",
    "vpcCidr": "<target-vpc-cidr>",
    "rdsInstanceObserved": "<optional-rds-instance-id>",
    "rdsUsedByCurrentRuntime": false,
    "ecrRepositoryObserved": false,
    "route53HostedZoneObservedForHostname": false,
    "acmCertificateObservedInRegion": false,
    "cloudFrontDistributionObserved": false,
    "dockerDaemonAccessibleFromOperatorMachine": false
  },
  "repoArtifacts": {
    "terraform": "infra/aws",
    "runtimeDockerfile": "Dockerfile",
    "packageDockerfile": "Dockerfile.package",
    "runbook": "docs/aws-deployment-runbook.md"
  },
  "remainingExternalPrerequisites": [
    "Approved CloudFront default-domain or custom TLS/DNS access path",
    "Approved private application subnets",
    "Terraform apply from the approved infrastructure source of truth",
    "CodeBuild image-builder run after the npm package is published",
    "Hosted auth/RBAC replacement for broad static hosted tokens before broad exposure"
  ],
  "lastUpdated": "YYYY-MM-DD"
}
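
Because the example metadata above ships with `<...>` placeholders and a `YYYY-MM-DD` date, a copy filled in for a real deployment could be checked for leftovers before use. The helper below is a hypothetical sketch of such a check and is not part of the package; the placeholder pattern is inferred from the example values only.

```typescript
// Matches angle-bracket placeholders such as <aws-profile> and the date stub.
const PLACEHOLDER_PATTERN = /<[^<>\s]+>|YYYY-MM-DD/;

// Walk any parsed JSON value and collect the paths that still hold placeholders.
function findUnfilledPlaceholders(value: unknown, path: string = "$"): string[] {
  if (typeof value === "string") {
    return PLACEHOLDER_PATTERN.test(value) ? [path] : [];
  }
  const found: string[] = [];
  if (Array.isArray(value)) {
    for (let index = 0; index < value.length; index++) {
      const nested = findUnfilledPlaceholders(value[index], `${path}[${index}]`);
      for (const item of nested) found.push(item);
    }
  } else if (value !== null && typeof value === "object") {
    const objectValue = value as Record<string, unknown>;
    for (const key of Object.keys(objectValue)) {
      const nested = findUnfilledPlaceholders(objectValue[key], `${path}.${key}`);
      for (const item of nested) found.push(item);
    }
  }
  return found; // booleans, numbers, and null never count as placeholders
}
```

Running it over `JSON.parse` output of the metadata file would list entries such as `$.accountId` or `$.protectedAccess.url` until every placeholder has been replaced.
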