@m-kopa/launchpad-cli 0.26.1 → 0.27.1

@@ -1,7 +1,7 @@
1
1
  ---
2
2
  name: launchpad-deploy
3
- description: Walk a Launchpad user through deploying an app from their local working directory (Model A — `launchpad init` + `launchpad deploy`). Wraps the CLI verbs end-to-end: detects the app shape, scaffolds `launchpad.yaml`, resolves the allowed Entra group via `launchpad groups`, bundles the CWD via `launchpad deploy`, and tails the resulting content PR. Use when someone says "deploy a new app", "ship my app to Launchpad", "/launchpad-deploy", "I have an app locally — get it on Launchpad", or any variant. Resume/abandon for legacy in-flight provisioning is at the bottom.
4
- version: 0.26.1
3
+ description: Walk a Launchpad user through deploying an app from their local working directory (Model A — `launchpad init` + `launchpad deploy`). Wraps the CLI verbs end-to-end: detects the app shape, scaffolds `launchpad.yaml`, resolves the allowed Entra group via `launchpad groups`, bundles the CWD via `launchpad deploy`, and watches the rollout via `launchpad status`. Use when someone says "deploy a new app", "ship my app to Launchpad", "/launchpad-deploy", "I have an app locally — get it on Launchpad", or any variant. Resume/abandon for legacy in-flight provisioning is at the bottom.
4
+ version: 0.27.1
5
5
  ---
6
6
 
7
7
  <!-- BEGIN shell-contract (managed by scripts/sync-skill-contract.sh — edit skills/_partials/shell-contract.md) -->
@@ -31,12 +31,13 @@ esac
31
31
  # /launchpad-deploy
32
32
 
33
33
  Model A deploy flow: the user already has an app in their working
34
- directory (Vite/React, static, or container-shape), and wants it
34
+ directory (Vite/React or static), and wants it
35
35
  running on Launchpad. The CLI handles everything end-to-end —
36
36
  detection, scaffolding, group resolution, bundling, upload, and the
37
- content PR the bot opens on their behalf. **No `gh`, `jq`, or `curl`
38
- required; no M-KOPA GitHub access required.** External users with a
39
- Cf Access account are first-class.
37
+ commit the bot lands on the app repo. **No `gh`, `jq`, or `curl`
38
+ required; no M-KOPA GitHub access required.** External users without
39
+ M-KOPA GitHub access are first-class — a Launchpad sign-in
40
+ (`launchpad login`, M-KOPA Microsoft account) is all they need.
40
41
 
41
42
  ## Constants (single source of truth)
42
43
 
@@ -106,7 +107,10 @@ If the user is unsure, `launchpad init` itself runs an auto-detector
106
107
  (M-1216 T2): it looks for `package.json`, `vite.config.{ts,js}`, a
107
108
  lockfile, a `functions/` subdirectory, etc., and infers app-type +
108
109
  package manager + build command + dest dir. Surface what it
109
- discovered before committing to a slug.
110
+ discovered before committing to a slug. Genuine ambiguity (multiple
111
+ lockfiles, missing vite config, monorepo-shape root) aborts loudly —
112
+ resolve it by passing `--type` explicitly or skipping detection with
113
+ `--no-detect`.
110
114
 
111
115
  ### A.2 Scaffold `launchpad.yaml`
112
116
 
@@ -116,31 +120,34 @@ launchpad init
116
120
 
117
121
  This is interactive by default. It will ask for:
118
122
 
119
- - **slug** — `lowercase-with-hyphens`, 3–30 chars, must not start or
120
- end with a hyphen. Reuse-rejection is server-side; you don't need
121
- to pre-check.
122
- - **display name** — free text, shown in the portal catalogue.
123
- - **app type** — `static`, `react`, `react+api`, or `container`. The
124
- detector pre-selects the right one for Vite/React layouts; the
125
- user overrides if needed.
126
- - **team / owner** for the registry.
127
- - **allowed Entra group** — see §A.3.
123
+ - **name (slug)** — lowercase letters, digits, and hyphens; 2–63
124
+ chars; must start and end alphanumeric. Reuse-rejection is
125
+ server-side; you don't need to pre-check.
126
+ - **team** — for the registry.
127
+ - **owner email** — must be a valid email address.
128
+ - **deployment type**: `static`, `react`, `react+api`, or
129
+ `container`. The detector pre-selects the right one for Vite/React
130
+ layouts; the user overrides if needed.
131
+ - **allowed_entra_groups** — one or more, comma-separated; see §A.3.
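The slug rule in the list above can be checked locally before running `launchpad init`. A minimal sketch, assuming the rule is exactly as stated (lowercase letters, digits, and hyphens; 2 to 63 characters; alphanumeric at both ends). `is_valid_slug` is a hypothetical helper, not part of the CLI, and the server stays the authority on reuse:

```bash
# Hypothetical helper (not a launchpad verb): checks a candidate slug against
# the documented rule before `launchpad init` is run.
is_valid_slug() {
  local LC_ALL=C   # keep [a-z] strictly ASCII regardless of the user's locale
  local slug="$1"
  # First and last character alphanumeric, with up to 61 characters of
  # [a-z0-9-] in between, for a total length of 2 to 63.
  local re='^[a-z0-9][a-z0-9-]{0,61}[a-z0-9]$'
  [[ "$slug" =~ $re ]]
}
```

For example, `is_valid_slug demo-9 && launchpad init` would only start the prompt when the name already fits the rule.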
128
132
 
129
133
  Non-interactive form for scripting:
130
134
 
131
135
  ```bash
132
136
  launchpad init --non-interactive \
133
- --slug <slug> \
134
- --display-name "<display name>" \
135
- --app-type <apptype> \
137
+ --name <slug> \
138
+ --type <apptype> \
136
139
  --team <team> \
137
140
  --owner <owner-email> \
138
141
  --group <group>
139
142
  ```
140
143
 
141
- `--app-type` must be one of `static`, `react`, `react+api`, or
142
- `container`. `--group` accepts the Entra group's display name or
143
- UUID (the bot canonicalises to the UUID).
144
+ `--type` must be one of `static`, `react`, `react+api`, or
145
+ `container`. `--group` (preferred long form `--allowed-group`,
146
+ repeatable for multiple groups) accepts the Entra group's display
147
+ name or UUID (the bot canonicalises to the UUID). Other flags:
148
+ `--description <text>`, `--auth <mode>` (see §A.5),
149
+ `--session-duration <d>`, `--hostname <host>` (repeatable),
150
+ `--out <path>`, `--force`, `--no-gitignore`, `--no-detect`.
144
151
 
145
152
  `launchpad init` writes `./launchpad.yaml` and exits 0. Re-running
146
153
  against an existing file is rejected unless `--force` is passed —
@@ -157,20 +164,19 @@ commit to a slug:
157
164
  | `static` | n/a | n/a | n/a | No bundler, no TS, no React. |
158
165
  | `react` | n/a | client-side or remote-fetch only | n/a | SPA only — no server-side anything. |
159
166
  | `react+api` | `hono` (exclusive) | Cloudflare D1 / Neon (HTTP) / KV / R2 | **Sibling cron Worker** (Pages has no `scheduled` handler — see "Two-tier apps" below) | Requires `compatibility_flags = ["nodejs_compat"]` in `wrangler.toml` (ADR-0011 carve-out). |
160
- | `container` | any HTTP server | any (container-local or HTTP backend) | container-local | Single HTTP port; deployed via Cloudflare Containers. |
161
-
162
- **Will not run on Launchpad** (representative, not exhaustive):
163
- `fastify` / `@fastify/*` / `express` / `koa` / `@koa/*` /
164
- `@nestjs/*` / `hapi` / `@hapi/*`, `better-sqlite3` / native
165
- `sqlite3`, native TCP drivers (`pg`, `mysql`, `mysql2`, `mongodb`,
166
- `mongoose`, `redis`, `ioredis`), `dotenv`, `setInterval` /
167
- `setTimeout` daemons, top-level `Dockerfile` / `docker-compose.yml`
168
- on non-container app-types, `nginx.conf`, `Procfile`, `pm2`,
169
- `pm2.config.js`, `ecosystem.config.js`, `forever`, `nodemon`. The
170
- **canonical validation list** is `/launchpad-content-pr` § Stack-fit
171
- pre-flight; that skill enforces the gate at deploy time. If the
172
- existing app uses any of these, plan the swap **before** picking a
173
- slug — do not deploy first and port later.
167
+ | `container` | any HTTP server | any (container-local or HTTP backend) | container-local | Single HTTP port. Schema-valid, but the guided flow in this skill covers `static`/`react`/`react+api` — container guidance ships separately. |
168
+
169
+ **Will not run on the Workers runtime** (advisory; representative,
170
+ not exhaustive): `fastify` / `@fastify/*` / `express` / `koa` /
171
+ `@koa/*` / `@nestjs/*` / `hapi` / `@hapi/*`, `better-sqlite3` /
172
+ native `sqlite3`, native TCP drivers (`pg`, `mysql`, `mysql2`,
173
+ `mongodb`, `mongoose`, `redis`, `ioredis`), `dotenv`, `setInterval` /
174
+ `setTimeout` daemons, `pm2` / `forever` / `nodemon`. The bot does
175
+ **not** reject these: there is no dependency or source-pattern gate;
176
+ they simply fail at build or runtime on Cloudflare Pages/Workers. If
177
+ the existing app uses any of these, plan the swap **before** picking
178
+ a slug; do not deploy first and port later. What the bot *does*
179
+ enforce at deploy time is the gate set in §A.5.
174
180
 
175
181
  ### A.3 Allowed Entra group
176
182
 
@@ -179,11 +185,17 @@ The CLI resolves Entra groups via the bot's `/groups` endpoint
179
185
  endpoint directly. Two helpers:
180
186
 
181
187
  ```bash
182
- launchpad groups list # every group the caller can see
188
+ launchpad groups list # every group assigned to the Launchpad app in Entra
183
189
  launchpad groups search <query> # fuzzy match by name / nickname / id
184
190
  launchpad groups show <name> # UUID + displayName + mailNickname
191
+ launchpad groups resolve <name> # just the Entra Object-ID UUID (script-friendly)
185
192
  ```
186
193
 
194
+ `groups list` is **not** the whole tenant and not caller-scoped: it
195
+ lists the groups **assigned to the Launchpad enterprise application**
196
+ in Entra — the only groups that can actually grant sign-in to a
197
+ deployed app.
198
+
187
199
  When the user picks a group, pass either the **displayName** or the
188
200
  **UUID** to `launchpad init --group <…>` — the CLI accepts both and
189
201
  the bot canonicalises to the UUID.
@@ -195,12 +207,14 @@ If `launchpad groups list` fails with:
195
207
  and `ENTRA_GRAPH_CLIENT_ID` are non-secret identifiers (live in
196
208
  `wrangler.toml` `[vars]`); only `ENTRA_GRAPH_CLIENT_SECRET`
197
209
  requires `wrangler secret put`.
198
- - `502 graph_auth_failed` / `graph_fetch_failed` → the Entra app's
199
- Graph permission grant (`Group.Read.All` or
200
- `GroupMember.Read.All`) is missing admin consent, or Graph is
201
- unreachable. Surface the error body.
202
- - empty list → no groups visible to the bot; check the Graph
203
- permission scope.
210
+ - `502 graph_auth_failed` / `graph_fetch_failed` → the bot's Graph
211
+ credential can't read the Launchpad service principal's
212
+ `appRoleAssignedTo` assignment list (missing admin consent on the
213
+ Graph application permission), or Graph is unreachable. Surface
214
+ the error body.
215
+ - empty list → no groups are assigned to the Launchpad enterprise
216
+ app in Entra. A group must be **assigned to the app** before it
217
+ can gate anything — membership alone is not enough.
204
218
 
205
219
  Use `launchpad groups whoami` to remind the user which groups
206
220
  **they** are currently a member of — handy when an app is gated and
@@ -213,8 +227,14 @@ launchpad validate
213
227
  ```
214
228
 
215
229
  Parses `launchpad.yaml` against the v1alpha1 schema and reports
216
- problems. Doesn't talk to the bot. Useful when the user wants a
217
- second look before paying for an upload + PR round-trip.
230
+ problems. Doesn't talk to the bot by default. Useful when the user
231
+ wants a second look before paying for an upload round-trip.
232
+
233
+ Add `--strict-groups` to *additionally* resolve the manifest's
234
+ allowed Entra group online against the bot's group list — it catches
235
+ typo'd or renamed group names before deploy time. This mode needs a
236
+ session and the network (exit 1 = group not found / ambiguous,
237
+ 2 = network error, 3 = no valid session).
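The exit codes of the strict mode lend themselves to a small dispatcher in scripts. A sketch under stated assumptions: codes 1, 2, and 3 are the ones documented above, treating 0 as success is an assumption, and `explain_strict_groups_exit` is a hypothetical helper rather than anything the CLI ships:

```bash
# Hypothetical helper: translates the exit code of
# `launchpad validate --strict-groups` into a suggested next step.
# Codes 1-3 come from the text above; 0 = success is an assumption.
explain_strict_groups_exit() {
  case "$1" in
    0) echo "ok: manifest parsed and group resolved" ;;
    1) echo "group not found or ambiguous: check the name with 'launchpad groups search'" ;;
    2) echo "network error: retry when the bot is reachable" ;;
    3) echo "no valid session: run 'launchpad login'" ;;
    *) echo "unexpected exit code $1" ;;
  esac
}
```

It would typically be called right after the command, for example `launchpad validate --strict-groups; explain_strict_groups_exit "$?"`.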
218
238
 
219
239
  ```bash
220
240
  launchpad plan
@@ -229,10 +249,13 @@ build command, destination directory, allowed group. Still offline.
229
249
  launchpad deploy
230
250
  ```
231
251
 
232
- Bundles the working tree (using `git ls-files -co --exclude-standard`
233
- where available; falls back to a pure-FS walker honouring
234
- `.gitignore` and a default-ignore set), gzips it, and POSTs to the
235
- bot's `/apps/<slug>/deploy/bundle` endpoint.
252
+ Bundles the working tree with a pure-FS walker honouring `.gitignore`
253
+ plus a built-in default-ignore set (it never shells out to
254
+ `git ls-files`, so users with no `git` installed can still deploy),
255
+ gzips it, and POSTs to the bot's `/apps/<slug>/deploy/bundle`
256
+ endpoint. Files outside the manifest's app boundary (`app.root` /
257
+ `app.include`) and never-shippable files (private keys, `.env`
258
+ material) are stripped with a warning before upload.
236
259
 
237
260
  **First deploy vs subsequent deploys** (M-1234). Under Model A there
238
261
  is no separate "create the app" step — `launchpad deploy` against a
@@ -245,14 +268,43 @@ fresh slug auto-provisions:
245
268
  (`auth: gateway`); pass `--auth access` to `launchpad init` to use a
246
269
  per-app Cloudflare Access app instead. The CLI prints `✓ First-time deploy — provisioning
247
270
  workflow started` and exits 0. Provisioning typically takes
248
- 5–10 minutes. Watch it with `launchpad status <slug>` and re-run
249
- `launchpad deploy` once lifecycle hits `live`.
271
+ 5–10 minutes. **Your bundle ships with the provisioning run** when
272
+ lifecycle reaches `live`, this deploy's content is what's serving.
273
+ **No second deploy needed**; re-deploying is only for the rare case
274
+ where the app comes up live *without* your content. Watch with
275
+ `launchpad status <slug>`.
250
276
  - **Subsequent deploys** (slug already live): the bot extracts the
251
- tarball, runs the ingest gates (forbidden file types, oversized
252
- binaries, secret patterns, build-command allowlist), commits the
253
- bundle into `launchpad-app-<slug>` via the GitHub App, and CF Pages
254
- auto-deploys on the push. The CLI prints `✓ Bundle accepted —
255
- committed as <sha>`.
277
+ tarball, runs the ingest gates (see below), commits the bundle
278
+ straight onto `main` of `launchpad-app-<slug>` via the GitHub App
279
+ (no PR, no merge step), and CF Pages auto-builds on the push. The
280
+ CLI prints `✓ Bundle accepted — committed as <sha>`. A successful
281
+ deploy is **not** a live app yet — the Pages build runs
282
+ asynchronously and can fail after the commit lands; confirm with
283
+ `launchpad status <slug>`.
284
+
285
+ **What the bot enforces at deploy time** (the real gate set — there
286
+ is no dependency/stack-fit gate, see §A.2):
287
+
288
+ - **Bundle policy** — hard caps (5000 files, 50 MB bundle, 10 MB per
289
+ file); symlinks; path traversal / absolute paths; `.github/workflows`;
290
+ `CODEOWNERS`; `.npmrc`/`.yarnrc` carrying auth tokens; `.git/`
291
+ directories. All violations are returned in one pass, verbatim, so
292
+ the user can fix everything at once.
293
+ - **Secret scan** — high-signal patterns (AWS keys, GitHub tokens,
294
+ Slack tokens, SSH/RSA/EC private keys, generic api-key shapes).
295
+ Rejections name `{path, rule}` only — never the matched bytes.
296
+ - **Build-command allowlist** — the manifest's `spec.build.command`
297
+ is checked against a safety policy.
298
+
299
+ These gates are **delta-judged** (ADR 0025): the bot evaluates what
300
+ your deploy *changes* against what is already on `main`, not the
301
+ whole workspace. Pre-existing violations in files you didn't touch
302
+ become non-blocking **standing exceptions** — the CLI prints them
303
+ after the deploy and `launchpad status <slug>` lists the full
304
+ inventory. If the delta can't be computed, the bot falls back to
305
+ judging the whole bundle (fail-closed — never a skipped gate). The
306
+ bot may also strip never-shippable files server-side; the CLI
307
+ surfaces those as a `boundary_stripped` warning.
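Some of the hard caps in the bundle policy can be approximated locally before an upload. The sketch below is not a launchpad verb and does not reproduce the bot's gates: it assumes "MB" means MiB, skips only `.git/` (the real bundler also honours `.gitignore` and its default-ignore set), and covers just the file count, per-file size, and symlink rules. `preflight_bundle` is a hypothetical name:

```bash
# Hypothetical local pre-flight (not part of the launchpad CLI). It mirrors a
# few documented hard caps so obvious violations surface before the upload.
# Assumptions: "MB" is read as MiB; only .git/ is skipped during the walk.
preflight_bundle() {
  local root="${1:-.}" problems=0 count f
  count=$(find "$root" -path '*/.git' -prune -o -type f -print | wc -l | tr -d ' ')
  if [ "$count" -gt 5000 ]; then
    echo "too many files: $count (cap 5000)"; problems=1
  fi
  # Files larger than 10 MiB (10485760 bytes).
  while IFS= read -r f; do
    echo "oversized file: $f"; problems=1
  done < <(find "$root" -path '*/.git' -prune -o -type f -size +10485760c -print)
  # Symlinks are rejected by the bundle policy.
  while IFS= read -r f; do
    echo "symlink: $f"; problems=1
  done < <(find "$root" -path '*/.git' -prune -o -type l -print)
  return "$problems"
}
```

A zero return only means these three checks passed; the secret scan, the build-command allowlist, and the delta judgement still happen on the bot.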
256
308
 
257
309
  Concurrent first-deploys against the same slug: the first request
258
310
  wins (gets the `provisioning_started` response); the second gets HTTP
@@ -265,13 +317,16 @@ subsequent deploys it prints the commit short-SHA + repo; for first
265
317
  deploys it prints provisioning guidance. Use `launchpad status
266
318
  <slug>` to watch lifecycle progress to its terminal state.
267
319
 
268
- Common flags:
320
+ Flags — there is nothing useful to pass on the Model A path:
269
321
 
270
- - **`--slug <slug>`**: explicit override. Defaults to the slug from
271
- `./launchpad.yaml`.
272
- - **`--message <text>`** — threaded as the PR description. Useful
273
- for change logs.
274
- - **`--file <path>`**: point at a non-default manifest path.
322
+ - The slug always comes from `./launchpad.yaml`; a `--slug` flag
323
+ does **not** override it (it only applies to the legacy clone flow
324
+ and `--new`).
325
+ - `--message` is sent as a request header on the legacy path only,
326
+ and the bot currently **ignores** it, so don't offer it as a
327
+ change-log mechanism.
328
+ - `--file <path>` is valid only with the `--dry-run` / `--apply`
329
+ manifest modes, not with a bundle deploy.
275
330
 
276
331
  ### A.6 Terminal handling
277
332
 
@@ -279,12 +334,19 @@ Common flags:
279
334
  Run `/launchpad-status` to confirm; `/launchpad-content-pr` is no
280
335
  longer needed under Model A (the first deploy already shipped your
281
336
  content)."
282
- - **`failed` / `bundle_rejected` / `cf-pages-poll-unrecoverable`** →
283
- run `/launchpad-deploy-status <slug>` and surface the failure
284
- reason. The bundle policy errors (oversized files, forbidden
337
+ - **Terminal failure stages** (`validator_rejected`,
338
+ `tf_apply_failed`, `bot_pr_ci_failed`, `abandoned`, `failed`) → run
339
+ `/launchpad-deploy-status <slug>` and surface the failure reason
340
+ (a string like `cf-pages-poll-unrecoverable` is a failure *reason*,
341
+ not a stage). Bundle-policy errors (oversized files, forbidden
285
342
  symlinks, secret-pattern hits, build-command violations) are
286
- self-describing in the CLI's stderr; surface them verbatim and let
287
- the user fix and re-`launchpad deploy`.
343
+ rejected at upload and self-describing in the CLI's stderr; surface
344
+ them verbatim and let the user fix and re-`launchpad deploy`.
345
+ - **Terminal-failed but the app is actually serving** → `launchpad
346
+ recover <slug>`. The bot re-derives reality from live Cloudflare
347
+ state and repairs the record to `live` only when the app is
348
+ verifiably serving; if it isn't, it refuses with exactly what was
349
+ checked — it never fabricates a live state.
288
350
  - **Anything else terminal** → run `/launchpad-deploy-status <slug>`
289
351
  and surface the diagnostic.
290
352
 
@@ -324,6 +386,30 @@ pushes both with its broker token. Verify with `launchpad secrets status
324
386
  <slug>` (PRESENT on both surfaces). **Never `wrangler secret put`** — the
325
387
  operator has no `wrangler`.
326
388
 
389
+ ## Pages-tier D1 (`d1_binding` on a `pages` target)
390
+
391
+ A pure `react+api` app (no cron Worker) that needs a database declares the
392
+ binding on a `pages` target (sp-pgd1b7):
393
+
394
+ ```yaml
395
+ targets:
396
+ - kind: pages
397
+ d1_binding: DB # env binding name your /api code reads
398
+ ```
399
+
400
+ On `launchpad deploy` the bot auto-provisions the shared D1 named after the
401
+ slug (**create-or-adopt by slug, never deleted** — a re-provision adopts the
402
+ existing database) and binds it to the Pages app, so `env.DB` works with no
403
+ manual `wrangler d1 create`. The bot also pins the matching
404
+ `[[d1_databases]]` block into the **committed** `wrangler.toml` — Pages
405
+ git-source builds read bindings from that file, and a build without the
406
+ block silently resets the binding to empty.
407
+
408
+ **Empty-DB gotcha:** the platform provisions an **empty** database — schema
409
+ and migrations are the app's job. The usual pattern is idempotent
410
+ `CREATE TABLE IF NOT EXISTS …` at runtime (startup / first request); there
411
+ is no platform-side migration step.
412
+
327
413
  ## Gateway auth — the failure classes (assert these; do not relearn them)
328
414
 
329
415
  New apps default to `auth: gateway` (the platform Entra-OIDC gateway). For a
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  name: launchpad-deploy-status
3
- description: Show the current provisioning stage + failure reason for a Launchpad app via `launchpad status` (Model A drift + deployment_verified) and `launchpad apps` (lifecycle bucket). Renders the M-892 stage trace for legacy in-flight provisioning. Use when someone says "what's the status of demo-X", "/launchpad-deploy-status", "is my deploy stuck", or after `/launchpad-deploy` reports a non-`done` terminal stage.
4
- version: 0.26.1
3
+ description: Show the current provisioning stage + failure reason for a Launchpad app via `launchpad status` (Model A drift + deployment_verified) and `launchpad apps` (lifecycle bucket). Renders the M-892 stage trace for in-flight provisioning, and is the canonical home for `launchpad recover` (repair a terminal-failed app record that is actually live). Use when someone says "what's the status of demo-X", "/launchpad-deploy-status", "is my deploy stuck", "my app says failed but it's serving", or after `/launchpad-deploy` reports a non-`done` terminal stage.
4
+ version: 0.27.1
5
5
  ---
6
6
 
7
7
  <!-- BEGIN shell-contract (managed by scripts/sync-skill-contract.sh — edit skills/_partials/shell-contract.md) -->
@@ -61,6 +61,7 @@ inference source resolves.
61
61
  | What was actually deployed? | `launchpad pull <slug>` |
62
62
  | Is the app stuck in the legacy M-892 zero-touch flow? | See § Legacy below |
63
63
  | What broke the most recent deploy? | `launchpad status <slug> --json` + the bot's PR check trail |
64
+ | It says `failed` but the app is live in a browser | `launchpad recover <slug>` — see § Recover below |
64
65
 
65
66
  The Model A default is `launchpad status <slug>`. The other verbs
66
67
  are specialisations.
@@ -71,10 +72,20 @@ are specialisations.
71
72
  launchpad status <slug>
72
73
  ```
73
74
 
74
- Output is one of three states (see `/launchpad-status` for the
75
- canonical reference):
75
+ Output is one of a closed set of states (see `/launchpad-status` for
76
+ the canonical reference). Lifecycle-shaped states:
76
77
 
77
- - **`in sync`** — local `./launchpad.yaml` matches what's deployed.
78
+ - **`provisioning`** — first deploy still in flight. The live
79
+ workflow stage is shown inline (`stage: …`); see § Stage taxonomy.
80
+ - **`provisioning_failed`** — provisioning failed; the failing stage
81
+ and reason are shown inline. If the app is actually live, see
82
+ § Recover.
83
+ - **`destroying` / `destroyed` / `destroy_failed`** — teardown
84
+ states. Route to `/launchpad-destroy`.
85
+
86
+ Live-app states:
87
+
88
+ - **`in_sync`** — local `./launchpad.yaml` matches what's deployed.
78
89
  Nothing pending. App is live and the most-recent deploy verified.
79
90
  - **`drift: <field list>`** — local and deployed differ on at least
80
91
  one v1 closed-set field (`metadata.name`, `metadata.team`,
@@ -82,10 +93,17 @@ canonical reference):
82
93
  `access.allowed_entra_group`, `hostnames[0]`, `build.command`,
83
94
  `build.destination_dir`, `build.root_dir`, `production_env.*`).
84
95
  Run `launchpad deploy` to roll the local manifest out.
85
- - **`no deployed manifest yet`**: the bot reports no
86
- `output "<slug>_manifest_sha"`. Either the first deploy is still
87
- in flight, or it failed. Check `launchpad apps` for the lifecycle
88
- bucket.
96
+ - **`live_no_content`**: provisioned, but no content deployed yet.
97
+ Run `launchpad deploy`.
98
+ - **`live_content_untracked`**: live Cloudflare Pages content
99
+ exists but there is no platform-tracked manifest: the app deploys
100
+ via git integration / outside `launchpad deploy`.
101
+ - **`live_drift_unknown`** — live and tracked, but no local
102
+ `launchpad.yaml` to compare against; status degrades to the
103
+ live-truth-only view.
104
+ - **`no_deployed_manifest`** — nothing deployed through the platform
105
+ yet. Either the first deploy is still in flight, or it failed.
106
+ Check `launchpad apps` for the lifecycle bucket.
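Because the states form a closed set, a playbook can dispatch on the state string instead of reading prose. A minimal sketch: the state names come from the list above, while `next_step_for_state` and the wording of each hint are hypothetical, and how the state value is obtained is deliberately left out:

```bash
# Hypothetical helper: maps a `launchpad status` state string to the next step
# described in the list above. The hint text is illustrative only.
next_step_for_state() {
  case "$1" in
    provisioning)           echo "wait; first deploy still in flight" ;;
    provisioning_failed)    echo "read the failing stage and reason; if the app is live, see Recover" ;;
    destroying|destroyed|destroy_failed)
                            echo "route to /launchpad-destroy" ;;
    in_sync)                echo "nothing pending" ;;
    drift*)                 echo "run launchpad deploy" ;;
    live_no_content)        echo "run launchpad deploy" ;;
    live_content_untracked) echo "content is deployed outside launchpad deploy" ;;
    live_drift_unknown)     echo "no local launchpad.yaml to compare against" ;;
    no_deployed_manifest)   echo "check launchpad apps for the lifecycle bucket" ;;
    *)                      echo "unknown state: $1" ;;
  esac
}
```

The catch-all branch is there so that a state added in a later release shows up as unknown rather than being silently mapped to the wrong action.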
89
107
 
90
108
  Add `--json` for structured output:
91
109
 
@@ -93,10 +111,13 @@ Add `--json` for structured output:
93
111
  launchpad status <slug> --json
94
112
  ```
95
113
 
96
- The JSON envelope includes `deployedSha`, `headSha`, `hasOpenPr`,
97
- `openPrNumber`, `driftFields`, and per-field `driftDetails`. This
98
- is the shape downstream tooling should parse — never grep the prose
99
- output.
114
+ The JSON envelope is discriminated by `state` (the union above) and
115
+ includes `deployedSha`, `headSha`, `hasOpenPr`, `openPrNumber`,
116
+ `driftFields`, and per-field `driftDetails`. Provisioning/failed
117
+ states carry `stage` + `failedReason`; live states carry a
118
+ `deployment` block (last CF Pages deployment, trigger, build outcome
119
+ + failure-log excerpt). This is the shape downstream tooling should
120
+ parse — never grep the prose output.
100
121
 
101
122
  ## Lifecycle bucket
102
123
 
@@ -117,12 +138,15 @@ field. Common values:
117
138
  the apply may have failed silently — surface to platform-team.
118
139
  - **`failed`** — provisioning failed. See `launchpad status <slug>
119
140
  --json` for the most recent error, and the bot's open PR on
120
- `launchpad-platform` for the apply trace.
141
+ `launchpad-platform` for the apply trace. If the app is actually
142
+ live and serving, repair the record with `launchpad recover
143
+ <slug>` (§ Recover).
121
144
  - **`destroying` / `destroyed` / `destroy_failed`** — teardown
122
145
  states. Route to `/launchpad-destroy`.
123
146
 
124
- Restrict to a single slug with `grep` (or `--json` parsing) if the
125
- list is long. `launchpad apps` is read-only and cheap.
147
+ Restrict to a single slug with `grep` if the list is long —
148
+ `launchpad apps` has no `--json` flag, so the table is the only
149
+ surface. `launchpad apps` is read-only and cheap.
126
150
 
127
151
  ## Render
128
152
 
@@ -155,7 +179,7 @@ In-flight first deploy:
155
179
  ```
156
180
  App: demo-9
157
181
  Lifecycle: provisioning
158
- State: no deployed manifest yet
182
+ State: provisioning (stage: tf_applied)
159
183
 
160
184
  Next steps:
161
185
  Wait for the first deploy to complete. Re-run this skill in a few
@@ -168,28 +192,99 @@ Failed deploy:
168
192
  ```
169
193
  App: demo-7
170
194
  Lifecycle: failed
171
- State: no deployed manifest yet; bundle_rejected (3 oversized files)
195
+ State: provisioning_failed (stage: content_seeded — 3 oversized files)
172
196
 
173
197
  Next steps:
174
- /launchpad-deploy "Recover a legacy in-flight deploy" → resume,
175
- OR fix the bundle (per the reasons above) and run `launchpad deploy`
176
- again. The bot is idempotent on retries against a failed slug.
198
+ If the app is actually live in a browser, run `launchpad recover
199
+ demo-7` to reconcile the record against live state.
200
+ Otherwise fix the bundle (per the reasons above) and run
201
+ `launchpad deploy` again — the bot is idempotent on retries
202
+ against a failed slug. For a stuck legacy in-flight deploy,
203
+ /launchpad-deploy "Recover a legacy in-flight deploy" → resume.
204
+ ```
205
+
206
+ ## Recover — terminal-failed but actually live
207
+
208
+ The observed class (live fixture: `ai-audit`): provisioning hit a
209
+ since-fixed platform bug *after* the app's content was already
210
+ serving, so the registry record is stuck at `lifecycle: failed`
211
+ while the app itself is live. `launchpad status` short-circuits on
212
+ the failed lifecycle, and no other CLI verb can mutate the record.
213
+ The shipped repair verb:
214
+
215
+ ```bash
216
+ launchpad recover <slug>
217
+ # or, structured:
218
+ launchpad recover <slug> --json
177
219
  ```
178
220
 
221
+ Slug inference matches `launchpad status` (manifest slug first, then
222
+ `launchpad-app-<slug>` dirname), so a bare `launchpad recover` works
223
+ from inside the app directory.
224
+
225
+ What it does (bot `POST /apps/<slug>/recover`):
226
+
227
+ - **Re-derives reality from the live Cloudflare Pages API** — never
228
+ TF state, never the stale record. The record is repaired (flipped
229
+ to `live`) only when the Pages project exists and a successful
230
+ production deployment is serving. The repair touches ONLY the
231
+ lifecycle/failure fields; owners, editors, targets, and auth
232
+ config are preserved verbatim.
233
+ - **Never fabricates a live state.** A not-live app is refused
234
+ (exit 1) with exactly what was checked (project existence, latest
235
+ production deployment + build status) and the next steps.
236
+ - **Fail-closed.** Cloudflare unreachable → refusal, nothing
237
+ changed; retry shortly.
238
+ - **Idempotent.** Recovering an already-healthy app is a no-op
239
+ success.
240
+ - **Scoped to terminal `failed` records.** `provisioning` apps are
241
+ refused (let the workflow finish); destroy-side states are owned
242
+ by `launchpad destroy`; container apps are unsupported.
243
+
244
+ Exit codes: `0` repaired or no-op (already healthy); `1` refusal,
245
+ fail-closed, or auth/transport error; `64` usage. Owner or editor
246
+ role required; every decision is audited server-side.
247
+
248
+ Recover is the one **mutating** verb this skill may reach for, and
249
+ its only mutation is the registry lifecycle record — repaired
250
+ strictly after live verification. It never touches Cloudflare
251
+ resources, the app repo, or TF. After a repair, re-run `launchpad
252
+ status <slug>` — it then reports the live deployment truth.
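The recover-then-recheck sequence described above can be wrapped in a few lines. A sketch only: `recover_and_report` is a hypothetical wrapper, the messages are illustrative, and the only facts taken from the text are the two commands and the exit codes 0, 1, and 64:

```bash
# Hypothetical wrapper around `launchpad recover`. Exit codes 0, 1 and 64 are
# the documented ones; the messages are illustrative.
recover_and_report() {
  local slug="$1" rc=0
  launchpad recover "$slug" || rc=$?
  case "$rc" in
    0)  echo "record repaired or already healthy; re-checking status"
        launchpad status "$slug" ;;
    1)  echo "refused, fail-closed, or auth/transport error; nothing was repaired" ;;
    64) echo "usage error; check the slug argument" ;;
    *)  echo "unexpected exit code $rc" ;;
  esac
  return "$rc"
}
```

Returning the original exit code keeps the wrapper transparent to any caller that already branches on it.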
253
+
179
254
  ## Legacy — M-892 stage taxonomy
180
255
 
181
- For apps that are stuck in the pre-Model-A zero-touch provisioning
182
- flow (`launchpad deploy --new` / `--resume` / `--abandon`), the bot
183
- still emits the original stage trace. The taxonomy is:
256
+ The provisioning workflow (Model A first deploys and the legacy
257
+ zero-touch flow `launchpad deploy --new` / `--resume` / `--abandon`
258
+ alike) emits a stage trace; `launchpad status` shows the live stage
259
+ inline while `provisioning`. The happy-path order is:
184
260
 
185
261
  ```
186
262
  pending → repo_created → bootstrap_pr_opened → bootstrap_pr_merged →
187
- tf_pr_opened → tf_pr_merged → tf_applied → cert_active →
188
- policy_attached → ready_for_content → deployment_verified → done
263
+ content_seeded → tf_pr_opened → tf_pr_merged → tf_applied →
264
+ cert_active → policy_attached → tf_env_pr_opened → tf_env_pr_merged →
265
+ tf_env_applied → ready_for_content → deployment_verified → done
189
266
  ```
190
267
 
191
- Terminal failure stages: `failed`, `bot_pr_ci_failed`, `abandoned`,
192
- `cf-pages-poll-unrecoverable`.
268
+ Notes on the non-obvious stages:
269
+
270
+ - **`content_seeded`** (M-1235) — commits the bundle staged at
271
+ deploy time onto the app repo's `main` *before* the Pages project
272
+ exists, so the first Pages build has real content. No-op when no
273
+ bundle was staged (wizard path).
274
+ - **`tf_env_pr_opened` → `tf_env_pr_merged` → `tf_env_applied`**
275
+ (5c.2) — per-app-workspace audience stages. Legacy slugs set the
276
+ audience inline within `policy_attached` and skip straight to
277
+ `ready_for_content`.
278
+ - **`deployment_verified`** (M-1217) — polls the CF Pages
279
+ production-deployment API; the `lifecycle: live` flip is gated
280
+ behind a terminal build state.
281
+ - **`scheduled_tier`** appears only as an error-prefix pseudo-stage
282
+ for a two-tier app's cron-Worker writes — never in the happy path.
283
+
284
+ Terminal failure stages: `validator_rejected`, `tf_apply_failed`,
285
+ `bot_pr_ci_failed`, `abandoned`, `failed`. Note that
286
+ `cf-pages-poll-unrecoverable` is a failure *reason* string (the
287
+ `deployment_verified` poll giving up), not a stage.
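To show how the stage trace can be turned into a progress indicator, here is a small sketch. The stage names and the terminal failure list are copied from the text above; `stage_progress` and the two array names are hypothetical, and the position count assumes the full happy path (legacy slugs skip the three `tf_env_*` stages):

```bash
# Hypothetical helper: reports how far along the happy path a stage is.
# Stage names are copied from the trace above; nothing here is CLI output.
HAPPY_PATH=(pending repo_created bootstrap_pr_opened bootstrap_pr_merged
  content_seeded tf_pr_opened tf_pr_merged tf_applied cert_active
  policy_attached tf_env_pr_opened tf_env_pr_merged tf_env_applied
  ready_for_content deployment_verified done)
TERMINAL_FAILURES=(validator_rejected tf_apply_failed bot_pr_ci_failed abandoned failed)

stage_progress() {
  local stage="$1" i s
  # Terminal failure stages are reported first and return non-zero.
  for s in "${TERMINAL_FAILURES[@]}"; do
    if [ "$s" = "$stage" ]; then echo "terminal failure: $stage"; return 1; fi
  done
  # Otherwise print the 1-based position within the happy path.
  for i in "${!HAPPY_PATH[@]}"; do
    if [ "${HAPPY_PATH[$i]}" = "$stage" ]; then
      echo "$((i + 1))/${#HAPPY_PATH[@]} $stage"; return 0
    fi
  done
  echo "unknown stage: $stage"; return 2
}
```

A failure reason such as `cf-pages-poll-unrecoverable` falls into the unknown branch on purpose, since it is a reason string and not a stage.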
193
288
 
194
289
  The CLI surfaces this through `launchpad apps` (lifecycle bucket)
195
290
  and `launchpad status <slug> --json` (the most recent transition).
@@ -211,16 +306,19 @@ Both flows live in `/launchpad-deploy` § Legacy.
211
306
 
212
307
  If the user wants to see open bot PRs for the app, route through
213
308
  `launchpad status <slug> --json` — the `hasOpenPr` / `openPrNumber`
214
- pair points at the platform-repo TF PR for in-flight changes. For
215
- the app repo, `launchpad apps` includes the deploy PR URL on the
216
- most-recent transition row when relevant. The playbook does not
217
- shell out to `gh pr list` — the bot owns GH credentials, not the
218
- CLI, and external users without M-KOPA GH access will not be able
219
- to follow such a link anyway.
309
+ pair points at the platform-repo TF PR for in-flight changes.
310
+ (`launchpad apps` renders only SLUG / NAME / ROLE / LIFECYCLE /
311
+ UPDATED; it carries no PR URLs.) The playbook does not shell out
312
+ to `gh pr list` — the bot owns GH credentials, not the CLI, and
313
+ external users without M-KOPA GH access will not be able to follow
314
+ such a link anyway.
220
315
 
221
316
  ## Don'ts
222
317
 
223
- - Do **not** mutate anything from this skill. Status is read-only.
318
+ - Do **not** mutate anything from this skill, with one sanctioned
319
+ exception: `launchpad recover` (§ Recover), whose only mutation is
320
+ the registry lifecycle record, repaired strictly after live
321
+ verification. Every other verb here is read-only.
224
322
  - Do **not** shell out to `gh`, `jq`, `curl`, or `git`. The CLI
225
323
  verbs cover every surface this skill needs.
226
324
  - Do **not** invent stage names or skip stages from the legacy
@@ -230,4 +328,6 @@ to follow such a link anyway.
230
328
  their own bounded retries; one playbook call, one error message,
231
329
  suggest the user re-run.
232
330
  - Do **not** parse the prose output of `launchpad status` /
233
- `launchpad apps`. Use `--json` for any downstream automation.
331
+ `launchpad apps`. Use `launchpad status <slug> --json` for any
332
+ downstream automation (`launchpad apps` has no `--json`; for
333
+ per-app automation go through `status --json` instead).