npm - mustflow - Versions diffs - 2.107.9 → 2.108.2 - Mend

mustflow 2.107.9 → 2.108.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (26) hide show

package/templates/default/locales/en/.mustflow/skills/browser-automation-reliability-review/SKILL.md ADDED Viewed

@@ -0,0 +1,279 @@
+---
+mustflow_doc: skill.browser-automation-reliability-review
+locale: en
+canonical: true
+revision: 1
+lifecycle: mustflow-owned
+authority: procedure
+name: browser-automation-reliability-review
+description: Apply this skill when browser automation, UI automation, Playwright, Selenium, Puppeteer, WebDriver, computer-use/browser-driving agents, visual browser verification, flaky selectors, page readiness, authentication state, CAPTCHA or anti-bot handling, rate limits, screenshot checks, retry, timeout, human approval, or browser automation observability is created, changed, reviewed, triaged, or reported.
+metadata:
+  mustflow_schema: "1"
+  mustflow_kind: procedure
+  pack_id: mustflow.core
+  skill_id: mustflow.core.browser-automation-reliability-review
+  command_intents:
+    - changes_status
+    - changes_diff_summary
+    - lint
+    - build
+    - test_related
+    - test
+    - docs_validate_fast
+    - test_release
+    - mustflow_check
+---
+# Browser Automation Reliability Review
+<!-- mustflow-section: purpose -->
+## Purpose
+Review browser automation as a stateful, evidence-producing system, not as a sequence of clicks.
+The core question is: "Does the automation know what state the browser, user, page, network,
+session, target data, and approval gate are in before it acts and before it claims success?" If not,
+the flow will look fine in a demo and then fail under rerenders, slow CI, auth drift, anti-bot
+gates, rate limits, visual noise, stale approvals, or agent hallucination.
+<!-- mustflow-section: use-when -->
+## Use When
+- Code, tests, docs, templates, or reviews touch browser automation, UI automation, end-to-end
+  harnesses, Playwright, Selenium, Puppeteer, WebDriver, browser contexts, remote browsers,
+  screenshots, videos, traces, HAR files, synthetic user flows, or computer-use browser agents.
+- A task mentions flaky selectors, unstable locators, actionability, stale elements, rerenders,
+  page readiness, `networkidle`, sleeps, waits, timeouts, retries, screenshot diffs, visual checks,
+  popups, downloads, native dialogs, iframes, shadow DOM, virtualized lists, or input typing.
+- Automation logs into a product, reuses storage state, shares accounts across workers, handles SSO,
+  OAuth, MFA, passkeys, cookies, localStorage, sessionStorage, IndexedDB, account lockout, or
+  permission changes.
+- Browser automation touches third-party sites, CAPTCHA, anti-bot or WAF challenges, rate limits,
+  robots or terms boundaries, IP reputation, headless fingerprints, provider throttling, or manual
+  fallback paths.
+- A browser-driving agent reads page content, follows page instructions, clicks by screenshot or
+  coordinates, extracts table data visually, enters forms, sends messages, purchases, deletes,
+  mutates external state, or asks for human approval before continuing.
+<!-- mustflow-section: do-not-use-when -->
+## Do Not Use When
+- The task is a pure LLM agent control-flow change with no browser or UI automation surface. Use
+  `agent-execution-control-review`.
+- The task is only prompt, RAG, model, tool schema, cost, latency, hallucination, or eval behavior
+  without browser execution. Use the matching LLM or agent specialist skill.
+- The task is only a product auth bug that is not being automated through a browser. Use
+  `auth-flow-triage` or `auth-permission-change`.
+- The task is only a browser request, CORS, CDN, API, or provider failure before the browser
+  automation layer is relevant. Use `api-failure-triage`.
+- The task is only frontend UI quality, layout resilience, accessibility, render stability, or web
+  performance for human users rather than automation harness reliability. Use the matching frontend
+  skill first.
+- The task is only test-suite runtime optimization, shard balance, retry policy, or flaky-test
+  handling without browser-specific failure modes. Use `test-suite-performance-review` or
+  `test-maintenance`.
+<!-- mustflow-section: required-inputs -->
+## Required Inputs
+- Automation intent ledger: target site or app, owner, internal versus third-party boundary,
+  allowed actions, forbidden actions, expected user role, data class, write risk, and whether the
+  browser path is the right tool rather than an API, fixture, or deterministic adapter.
+- State ledger: current URL, frame, page, route, modal, popup, selected account, auth storage,
+  browser context, viewport, locale, timezone, permissions, feature flags, test data, worker ID,
+  correlation ID, and previous step result.
+- Readiness ledger: page-ready signal, data-ready signal, actionable-control signal, business-ready
+  signal, network and background-work assumptions, and any waits or assertions that prove them.
+- Selector and action ledger: locators, user-facing roles or labels, test IDs or automation
+  contracts, shadow DOM and iframe boundaries, virtualized list handling, click target, keyboard and
+  focus path, input acceptance proof, and actionability override use.
+- Auth and identity ledger: login strategy, storage owner, token or cookie storage surface, session
+  expiry, refresh behavior, per-worker account isolation, SSO or MFA gates, CAPTCHA policy, account
+  lockout policy, and logout or cleanup behavior.
+- External pressure ledger: rate limit unit, retry budget, anti-bot or challenge detection,
+  provider terms boundary, manual fallback, backoff behavior, and circuit-breaker threshold.
+- Verification ledger: success criteria, API or database confirmation when available, screenshot
+  or visual artifact role, trace/video/HAR policy, console and network capture, redaction,
+  retention, and failure artifact sampling.
+- Agent and approval ledger: page content trust boundary, prompt-injection exposure, tool
+  permissions, coordinate mapping, stale approval checks, approval snapshot, exact post-approval
+  action, resume state, and human escalation path.
+<!-- mustflow-section: preconditions -->
+## Preconditions
+- The task matches the Use When conditions and does not match the Do Not Use When exclusions.
+- Current repository instructions, command contract, automation harness code, test fixtures, browser
+  config, auth fixtures, screenshots or traces, and docs directly tied to the automation path have
+  been inspected before editing.
+- Browser vendor, automation library, remote-browser provider, CAPTCHA, anti-bot, and Agents SDK or
+  computer-use details are stale-sensitive. Use `source-freshness-check` before embedding exact
+  current API claims, provider limits, default timeouts, or compliance requirements.
+- External pages, emails, documents, ads, support threads, and rendered web content are untrusted
+  input for browser-driving agents.
+- Command execution remains governed by `.mustflow/config/commands.toml`; this skill does not
+  authorize launching development servers, unmanaged browsers, long-running workers, production
+  browser sessions, CAPTCHA bypasses, provider dashboards, or live side-effect runs.
+<!-- mustflow-section: allowed-edits -->
+## Allowed Edits
+- Add or refine browser automation state machines, locator contracts, test IDs, accessible names,
+  readiness assertions, frame or popup handlers, input verification, auth fixtures, per-worker
+  account isolation, retry classification, timeout hierarchy, idempotency checks, rate-limit
+  handling, approval gates, manual fallback states, traces, screenshots, redaction, cleanup, and
+  directly synchronized docs or templates.
+- Move fixture setup, result verification, cleanup, idempotency checks, and data creation from
+  browser clicks to API or deterministic helpers when the browser UI is not the behavior under test.
+- Add focused tests for selector drift, readiness failure, stale element rerender, iframe or shadow
+  DOM handling, auth-state expiration, per-worker isolation, retry non-idempotency, stale approval,
+  screenshot noise, trace redaction, and agent prompt-injection defense when behavior evidence
+  supports them.
+- Do not fix flakiness by adding blind sleeps, force-clicking as the default, hiding failures behind
+  broad retries, weakening visual thresholds without evidence, sharing one mutable account across
+  parallel workers, or claiming browser success from an unverified screenshot.
+- Do not add CAPTCHA bypass, anti-bot evasion, headless fingerprint spoofing, or terms-violating
+  third-party automation as a normal product feature.
+<!-- mustflow-section: procedure -->
+## Procedure
+1. Decide whether the browser is the right boundary. Use API, fixtures, or service adapters for data
+   setup, teardown, and result verification when the browser UI itself is not the behavior being
+   tested or automated.
+2. Classify the automation owner: internal app E2E, internal operations tool, third-party site
+   workflow, browser-driving LLM agent, visual regression, scraping-like extraction, support tool,
+   or production user-assistance flow.
+3. Define a state machine before actions. Name the states such as unauthenticated, authenticated,
+   searching, selecting target, filling form, awaiting approval, submitting, verifying result,
+   retrying, blocked by challenge, manual fallback, succeeded, and failed.
+4. Replace sleeps with readiness evidence. For each step, define what proves the page is ready, the
+   data is ready, the target control is actionable, and the business state is safe to advance.
+5. Treat `networkidle` and selector-visible waits as weak signals. Prefer domain assertions such as
+   expected row identity, enabled submit state, loaded data count, settled validation, known URL,
+   confirmation ID, provider event, or backend result.
+6. Review locator contracts. Prefer stable user-facing roles, labels, names, and explicit test IDs
+   over CSS layout paths, generated classes, index-based XPath, translated prose only, or first-match
+   selectors.
+7. Check ambiguous DOM. Handle hidden duplicate controls, responsive desktop and mobile DOM at the
+   same time, skeletons that resemble real content, virtualized rows, portals, sticky overlays,
+   cookie banners, focus traps, iframes, cross-origin frames, shadow DOM, and custom components.
+8. Avoid stale element handles. Re-resolve locators at action time, and keep find-check-act-verify
+   close together so rerenders cannot invalidate old DOM references silently.
+9. Review actionability honestly. A forced click, coordinate click, JS-dispatched event, or disabled
+   actionability check must be exceptional, documented, and followed by proof that a real user path
+   is not being bypassed.
+10. Verify input acceptance. After typing, pasting, selecting dates, entering currency, using IME,
+    triggering autocomplete, or blurring a field, confirm the stored value, validation state, submit
+    readiness, or outbound payload rather than assuming keystrokes were accepted.
+11. Make auth state explicit. Identify whether auth lives in cookies, localStorage, sessionStorage,
+    IndexedDB, memory, or provider redirects; isolate accounts by worker; avoid shared mutable user
+    state; and handle expiry, rotation, SSO, MFA, passkeys, lockout, and logout contamination.
+12. Treat CAPTCHA and anti-bot as product states. In test or staging, use allowed test keys,
+    allowlists, or disabled challenge paths. In production or third-party flows, detect challenges,
+    stop safely, and route to human review or manual fallback instead of trying to evade them.
+13. Add rate control before retries. Identify the rate-limit subject, whether a single browser action
+    fans out into many requests, how backoff is computed, when to stop, and how the system avoids a
+    retry storm.
+14. Classify retryable failures. Retry only transient navigation, detached element, timeout,
+    temporary backend, or eventual-consistency classes within a bounded budget. Do not retry
+    permission denied, invalid input, CAPTCHA, account lockout, provider policy blocks, unknown
+    write outcome, or business-rule failures without a recovery-specific check.
+15. Make writes idempotent or confirm-before-replay. For purchases, payments, deletes, sends,
+    refunds, admin changes, support actions, and external mutations, record stable operation IDs and
+    check whether the effect already happened before any retry or resume can repeat it.
+16. Design timeout hierarchy. Align action, assertion, navigation, test, job, queue lease, browser
+    provider session, and external API timeouts so cancellation saves evidence, releases resources,
+    and resumes from a known state.
+17. Separate visual proof from business proof. Use screenshots for layout or visual regression, but
+    use confirmation IDs, API reads, database rows, provider events, downloads with checksums, audit
+    logs, or received messages to prove business success.
+18. Stabilize screenshot assertions. Freeze or mask nondeterministic content such as time, caret,
+    animation, ads, maps, charts, lazy images, random data, locale, theme, viewport, font, GPU,
+    scrollbar, and cookie banners before changing thresholds or baselines.
+19. Capture failure context. Save current URL, frame, viewport, locale, timezone, screenshot, DOM or
+    accessibility snapshot when safe, console errors, network statuses, trace, video, retry count,
+    worker ID, account ID class, and correlation ID with sensitive-data redaction.
+20. Protect artifacts. Browser traces, videos, screenshots, HAR files, storage state, and console
+    logs can contain cookies, tokens, personal data, addresses, order details, and messages; set
+    redaction, retention, encryption, access, and sampling before broad collection.
+21. For browser-driving agents, distrust page content. Treat rendered instructions, hidden DOM,
+    emails, PDFs, comments, ads, and third-party text as untrusted data that must not override the
+    system task, tool policy, approval rules, or data-exfiltration limits.
+22. Split agent roles where risk justifies it. Keep planner, browser executor, verifier, policy
+    gate, and human approval separate for high-impact flows. If one model does multiple roles, add
+    deterministic gates before side effects and before success claims.
+23. Make coordinate and screenshot actions verifiable. Recheck screenshot-to-DOM scale, scrolling,
+    focus, active modal, target bounds, visible label, disabled state, and post-action state when a
+    model or computer-use tool clicks by image or coordinates.
+24. Treat human approval as durable state. Show the exact account, URL, target, amount, recipient,
+    data, screenshot, form values, risk class, reversibility, and exact next action. Before resume,
+    re-read critical fields and compare them with the approved snapshot.
+25. Clean up resources. Close pages, contexts, browsers, downloads, temp files, videos, traces,
+    mock servers, websockets, and test data deliberately; detect zombie browser processes and
+    artifact growth in long runs.
+26. Verify with the narrowest configured tests, docs checks, release checks, and mustflow validation
+    that cover the changed automation contract.
+<!-- mustflow-section: postconditions -->
+## Postconditions
+- The automation has explicit states, readiness signals, locator contracts, auth isolation, retry
+  classes, timeout hierarchy, and success evidence.
+- Browser-only proof is separated from business-result proof.
+- CAPTCHA, anti-bot, rate-limit, human-approval, prompt-injection, and third-party boundary risks
+  are detected, stopped, or routed to manual fallback instead of hidden behind retries.
+- Failure artifacts are useful enough to debug and constrained enough not to leak secrets or
+  personal data.
+<!-- mustflow-section: verification -->
+## Verification
+Use configured oneshot command intents when available:
+- `changes_status`
+- `changes_diff_summary`
+- `lint`
+- `build`
+- `test_related`
+- `test`
+- `docs_validate_fast`
+- `test_release`
+- `mustflow_check`
+Use the narrowest configured fixture, unit, integration, docs, package, or release check that proves
+the changed browser automation contract. Do not infer raw browser launches, dev servers, headed
+browsers, provider dashboards, CAPTCHA-solving services, or production automation runs from local
+files.
+<!-- mustflow-section: failure-handling -->
+## Failure Handling
+- If the failure is not localized to browser automation, use `api-failure-triage`,
+  `auth-flow-triage`, `frontend-render-stability`, `test-maintenance`, or another narrower skill
+  first.
+- If a selector is flaky, do not patch only the selector string until locator ownership, duplicate
+  DOM, responsive DOM, skeletons, frames, shadow DOM, and readiness have been checked.
+- If a retry would replay an unknown write, stop and add idempotency or effect-confirmation before
+  enabling retry.
+- If CAPTCHA, anti-bot, account lockout, provider policy, or terms boundaries are detected, stop the
+  automation path and report the manual or contractual fallback instead of bypassing it.
+- If human approval resumes after state changed, expire the approval or request a new approval with
+  the changed fields.
+- If artifacts would leak secrets or personal data, collect a smaller redacted evidence set and
+  report the observability gap.
+- If a configured command fails, use `failure-triage` before continuing.
+<!-- mustflow-section: output-format -->
+## Output Format
+- Browser automation surface reviewed
+- Browser-versus-API boundary and automation owner
+- State machine, readiness, locator, actionability, auth, rate-limit, retry, timeout, and
+  idempotency decisions
+- Screenshot, trace, artifact, redaction, and business-success evidence
+- Agent page-content trust, coordinate action, tool permission, approval, and resume checks
+- Files changed
+- Command intents run
+- Skipped checks and reasons
+- Remaining browser automation reliability risk

package/templates/default/locales/en/.mustflow/skills/ci-pipeline-triage/SKILL.md CHANGED Viewed

@@ -2,11 +2,11 @@
 mustflow_doc: skill.ci-pipeline-triage
 locale: en
 canonical: true
-revision: 1
+revision: 2
 lifecycle: mustflow-owned
 authority: procedure
 name: ci-pipeline-triage
-description: Apply this skill when a CI/CD workflow, pipeline, job, runner, matrix, trigger, cache, artifact, deployment job, required check, or post-deploy verification is failing, skipped, queued, flaky, slow, green despite broken output, or not yet localized to trigger, runner, environment, build, test, artifact, deploy, or verification boundaries.
+description: Apply this skill when a CI/CD workflow, pipeline, job, runner, matrix, trigger, cache, artifact, runner-minute billing, artifact storage or retention, deployment job, required check, or post-deploy verification is failing, skipped, queued, flaky, slow, unexpectedly expensive, green despite broken output, or not yet localized to trigger, runner, environment, build, test, cache, artifact, billing, deploy, or verification boundaries.
 metadata:
   mustflow_schema: "1"
   mustflow_kind: procedure
@@ -46,6 +46,9 @@ changed from the last known-good run, and what evidence would disprove each boun
   deployment permissions, rollout completion, or post-deploy verification.
 - A pipeline suddenly breaks without application-code changes, or only fails on forks, protected
   branches, specific runners, specific regions, specific matrix entries, or reruns.
+- A CI workflow becomes unexpectedly expensive, burns private-repository minutes too quickly,
+  exhausts artifact storage, keeps long-lived test artifacts, or needs a release matrix cost review
+  before the expensive boundary is known.
 <!-- mustflow-section: do-not-use-when -->
 ## Do Not Use When
@@ -66,6 +69,10 @@ changed from the last known-good run, and what evidence would disprove each boun
 - Run identity ledger: commit SHA, branch or tag, trigger event, workflow file revision, matrix
   entry, runner label and image, architecture, region, toolchain versions, package-manager version,
   execution time, and run or job id.
+- CI billing ledger when cost is in scope: public versus private repository behavior, plan or
+  allowance snapshot, provider billing page or docs date, runner OS and size, job count, matrix
+  shape, per-job rounding behavior, queue versus execution time, artifact retention days, cache
+  retention or quota, and release asset handoff.
 - Last-good comparison: last successful commit and first failing commit, including workflow files,
   lockfiles, base images, shared scripts, secrets or permission scopes, runner labels, cache keys,
   feature flags, deployment config, and required-check settings.
@@ -88,9 +95,9 @@ changed from the last known-good run, and what evidence would disprove each boun
 ## Allowed Edits
 - Add or tighten workflow triggers, path filters, matrix guards, version pinning, cache keys,
-  artifact manifests, status aggregation, debug evidence collection, secret-safe diagnostics,
-  timeout classification, runner labels, concurrency locks, environment validation, smoke checks,
-  test isolation, docs, and focused fixtures.
+  artifact manifests, artifact retention, release-asset promotion, status aggregation, debug
+  evidence collection, secret-safe diagnostics, timeout classification, runner labels, concurrency
+  locks, environment validation, smoke checks, test isolation, docs, and focused fixtures.
 - Add tests or docs that prove workflow contract behavior, package metadata, template output,
   release checks, artifact identity, or command-contract mapping when the repository owns those
   surfaces.
@@ -134,21 +141,37 @@ changed from the last known-good run, and what evidence would disprove each boun
     dimensions. Artifacts need file list, size, hash, build SHA, and download verification.
 14. Verify that the tested artifact is the deployed artifact. Rebuilding during deploy can make CI
     test one thing and production receive another.
-15. Check auth and permissions by execution context. Fork PRs, protected branches, environments,
+15. For CI cost or quota questions, split the bill before optimizing:
+    - runner execution minutes, not artifact bytes, usually dominate native app release cost;
+    - macOS or other premium runners can dominate a matrix even when Linux jobs are longer;
+    - job-level minimum billing or rounding can make many tiny split jobs cost more than one
+      grouped job;
+    - public repository standard-runner rules can differ from private repository included minutes;
+    - billing pages may display currency spend while plan allowances are minute or storage quotas,
+      so confirm the unit before comparing options.
+16. Separate Actions artifacts, caches, package registries, and release assets. Short-lived test
+    bundles should use short retention. Long-lived distributables should be promoted through the
+    repository's release or package channel when that is the intended public artifact. Do not treat
+    cache quota as artifact storage or release assets as CI retention.
+17. For native desktop matrices, avoid full bundles on every PR unless the repository explicitly
+    requires it. Prefer PR checks that prove frontend build plus native compile or type contracts on
+    the cheapest adequate runner, then run signed or full OS package matrices only on release tags,
+    release branches, or protected manual gates.
+18. Check auth and permissions by execution context. Fork PRs, protected branches, environments,
     OIDC identity, package publishing identity, cloud role, and repository token scopes can differ
     across otherwise similar runs.
-16. For deployment jobs, require rollout evidence, readiness, smoke checks, error and latency
+19. For deployment jobs, require rollout evidence, readiness, smoke checks, error and latency
     thresholds, and environment concurrency locks instead of treating a zero exit code as success.
-17. Preserve evidence before cleanup. Do not delete runners, caches, artifacts, temporary dirs, or
+20. Preserve evidence before cleanup. Do not delete runners, caches, artifacts, temporary dirs, or
     diagnostic logs until the boundary and redaction plan are clear.
-18. Apply the smallest localized fix and verify with the narrowest configured intent that covers the
+21. Apply the smallest localized fix and verify with the narrowest configured intent that covers the
     changed workflow, package, docs, template, or test surface.
 <!-- mustflow-section: postconditions -->
 ## Postconditions
-- The pipeline failure is localized to trigger, runner, environment, build, test, artifact, deploy,
-  verification, or a named evidence gap.
+- The pipeline failure is localized to trigger, runner, environment, build, test, artifact, billing
+  or storage quota, deploy, verification, or a named evidence gap.
 - Last-good versus first-failure comparison, run identity, false-green risk, cache and artifact
   behavior, permission scope, and rerun determinism are explicit where relevant.
 - Follow-up deployment, test performance, security, command-contract, or package-release work is
@@ -178,6 +201,9 @@ CI reruns, deploys, cloud shell commands, or provider dashboard writes outside t
 - If run identity, last-good comparison, trigger graph, runner, cache, artifact, or permission
   evidence is missing, report the missing field instead of guessing.
+- If CI pricing, included minutes, storage quotas, or runner rates are time-sensitive and not
+  locally available, avoid exact price claims and name the provider billing evidence that must be
+  checked.
 - If debug logs contain secrets or private data, stop copying raw output and summarize safely.
 - If CI evidence requires remote provider access that is unavailable or unconfigured, report the
   manual evidence boundary and continue with local workflow or static evidence.
@@ -191,6 +217,8 @@ CI reruns, deploys, cloud shell commands, or provider dashboard writes outside t
 - Failure shape and localized boundary
 - Run identity and last-good comparison
 - Trigger, runner, environment, build, test, cache, artifact, deploy, and verification findings
+- Billing unit, runner-minute, matrix rounding, artifact retention, cache quota, and release asset
+  findings when cost is in scope
 - Hypotheses killed, still open, and selected follow-up boundary
 - Fix applied or recommended
 - Evidence level: provider run evidence, configured-test evidence, static review risk, manual-only,

package/templates/default/locales/en/.mustflow/skills/cloud-cost-guardrail-review/SKILL.md CHANGED Viewed

@@ -2,7 +2,7 @@
 mustflow_doc: skill.cloud-cost-guardrail-review
 locale: en
 canonical: true
-revision: 1
+revision: 2
 lifecycle: mustflow-owned
 authority: procedure
 name: cloud-cost-guardrail-review
@@ -65,6 +65,9 @@ lifecycle cleanup, and service-specific caps before the bill becomes the first a
   narrower security skill first, then use this skill for spend blast radius.
 - The task only changes local development code with no cloud, provider, telemetry, storage,
   network, external API, or deployable infrastructure surface.
+- The task is primarily CI runner minutes, workflow matrix cost, Actions artifact retention,
+  build-cache quota, release asset handoff, or CI job billing; use `ci-pipeline-triage` first, then
+  return here only when broader cloud, SaaS, or provider spend guardrails remain.
 <!-- mustflow-section: required-inputs -->
 ## Required Inputs

package/templates/default/locales/en/.mustflow/skills/database-change-safety/SKILL.md CHANGED Viewed

@@ -2,7 +2,7 @@
 mustflow_doc: skill.database-change-safety
 locale: en
 canonical: true
-revision: 16
+revision: 17
 lifecycle: mustflow-owned
 authority: procedure
 name: database-change-safety
@@ -79,6 +79,7 @@ Use the smallest persistence boundary that proves the risk. Do not introduce rep
 - Event role: operational event, audit log, behavior analytics event, integration outbox message, reporting aggregate, or replayable domain event.
 - Data owner and affected tables, collections, stores, indexes, caches, generated files, or read models.
 - Entity identity rules, including stable ids, external provider ids, mutable slugs, titles, locale-specific addresses, redirects, and public API identifiers when content or user-facing resources are involved.
+- Regret-prone schema shape rules, including internal versus public ids, normalized unique keys, tenant-scoped uniqueness, foreign keys, join tables, enum or lookup-table ownership, nullable-field meaning, JSON promotion criteria, custom-field boundaries, status history, optimistic locking, and operational trace fields.
 - Exit and restore rules, including whether exported data preserves relationships, permissions, files, versions, events, audit history, automation rules, provider id mappings, schema metadata, and enough import or restore evidence to reconstruct product state.
 - Identifier ownership rules, including which ids are product-owned, which ids are public, which ids are provider mappings, and whether external auth, payment, CRM, analytics, storage, or CMS ids can change without breaking internal references.
 - Authentication identity rules, including app-owned user id, provider subject records, email-as-attribute behavior, social provider subject preservation, account merge or relink policy, session migration expectations, and whether memberships, roles, permissions, and entitlements live in product-owned tables rather than only provider metadata.
@@ -173,8 +174,25 @@ Use the smallest persistence boundary that proves the risk. Do not introduce rep
    - External-service core facts, such as current entitlement, subscription or plan state, processed payment event id, email consent state, customer lifecycle state, file identity and ownership, search source document metadata, job processing state, and audit evidence. Do not let a provider dashboard be the only place that can explain these facts.
    - Search and queue reconstruction records, such as index document builders, ranking or synonym policy versions, search logs, queue message schema versions, job idempotency keys, retry state, dead-letter state, and manual replay markers.
 4. Check schema shape: primary keys, foreign keys, unique constraints, nullable fields, defaults, check constraints, status values, timestamps, soft delete fields, tenant scope, audit fields, and retention rules.
+   - Use immutable internal primary keys for joins and separate public identifiers for URLs and APIs. Do not make email, slug, username, external provider id, or mutable display code the primary key for product-owned rows.
+   - Enforce uniqueness in the database, not only in application prechecks. Normalize comparison keys such as email, slug, provider id, and idempotency key explicitly, preserve the display value separately when needed, and name the unique constraint or index so operations can diagnose failures.
+   - Scope unique constraints to the real owner. Tenant-owned slugs, emails, invitations, memberships, idempotency keys, and external references usually need `tenant_id`, `workspace_id`, `operation_type`, or `provider` in the key. Global uniqueness should be a deliberate product rule, not an accident.
+   - Design soft-delete uniqueness before shipping. Active-only uniqueness, nullable unique behavior, restore conflicts, deleted-id reuse, and tombstone requirements must be explicit; otherwise deleted rows either block valid new records or allow duplicate active records.
+   - Prefer database foreign keys for core ownership and reference integrity. If an FK is intentionally omitted for scale, import staging, sharding, or asynchronous reconciliation, name the replacement invariant, cleanup path, and orphan-detection evidence. Index FK columns when joins, parent deletion checks, or tenant deletion depend on them.
+   - Treat `ON DELETE CASCADE` as a lifecycle promise, not cleanup convenience. Use it only when child rows truly share the parent's lifetime and audit, retention, restore, and legal obligations do not require separate survival.
+   - Model many-to-many relationships with join tables that can own role, status, order, source, timestamps, and actor fields. Avoid comma-separated ids, arrays of ids, or JSON lists for relationships that need joins, uniqueness, permissions, deletes, or audit.
+   - Treat polymorphic `entity_type` plus `entity_id` relations as integrity debt for core data because ordinary FKs cannot prove the target exists. Prefer target-specific tables, a shared parent table, or explicit constraint and cleanup machinery when the relation is business-critical.
+   - Choose enum, lookup table, or state machine based on change behavior. Stable technical codes may be enums; operator-managed values, values with display or sort metadata, plan or category catalogs, roles, and jurisdiction-specific rules usually need lookup tables. Workflow status needs allowed transitions and history, not only a value list.
+   - Avoid boolean state soup such as several independent `is_*` flags for one lifecycle. Use one current status plus timestamps or event history when states are mutually exclusive, ordered, reversible, or policy-driven.
+   - Give nullable fields exactly one meaning. Separate unknown, not applicable, not entered yet, deleted, failed, and pending states with explicit status or reason fields when queries or reports depend on the distinction.
+   - Avoid EAV or generic `entities`/`attributes`/`values` tables for core domain facts. If customer-defined fields are required, keep them in a bounded custom-field area with definitions, type validation, quotas, ownership, export semantics, and a promotion path once values drive search, sort, permission, billing, or reporting.
+   - Do not hide behavior-driving data in JSON. Keys used for filters, ordering, joins, uniqueness, permissions, tenant scope, status, retention, money, dates, quotas, indexes, or operational dashboards should be typed columns, child tables, or generated/computed columns with a migration path. Use `database-json-modeling-review` when JSON is part of the diff.
+   - Keep tenant ownership close to the owned row when tenant-scoped operations, billing, audit, export, restore, delete, or performance matter. B2B products should usually separate global users from tenant memberships, roles, invitations, entitlements, and billing records.
    - Treat deletion as lifecycle when recovery, audit, search behavior, support handling, or retention matters. Consider `deleted_at`, `deleted_by`, `delete_reason`, `restored_at`, `restored_by`, and `purge_after` instead of a lone boolean or timestamp.
    - Separate business records that should be soft-deleted or archived from personal data that should be anonymized, purged, or retained under a narrower legal rule.
+   - Keep status history for states that affect money, access, fulfillment, support, compliance, or user-visible commitments. A current status alone rarely explains who changed it, why, under which request, and whether a late webhook, retry, or admin action should still apply.
+   - Add optimistic versioning or conditional updates when two users, admins, workers, or webhooks can edit the same important row. Last-write-wins is usually data loss unless the product explicitly accepts it.
+   - Add operational trace fields where incident response will need them: server timestamps, actor ids, `created_by`, `updated_by`, `request_id`, `source`, import or provider reference, and safe reason codes. Do not add them blindly to every table, but do not leave high-value rows untraceable.
    - Treat mutable high-value records as versioned when reproducibility matters, such as AI prompts, documents, contracts, price policies, experiment configs, comparison data, permission policies, automation rules, and model settings. Prefer a stable parent row with a current-version pointer plus immutable version rows.
    - Use ledgers for money-like or quota-like balances, such as points, credits, inventory reservations, refunds, coupon issuance, entitlement grants, and manual adjustments. Treat cached balances as derived from ledger entries unless the local design proves otherwise.
    - For audit logs, store actor type, actor id when safe, action, target type and id, bounded before and after values, reason, request id, idempotency key, and timestamp in the same local transaction as the audited change when possible. Audit logs should be append-only to normal operators and should redact or omit personal data that is not needed to explain the change.
@@ -318,6 +336,7 @@ Use the smallest persistence boundary that proves the risk. Do not introduce rep
 ## Postconditions
 - The database role and source of truth are explicit.
+- Regret-prone schema shortcuts such as mutable primary keys, app-only uniqueness, unscoped tenant uniqueness, missing FK or cascade ownership, ambiguous nulls, boolean state soup, polymorphic core relations, EAV core facts, behavior-driving JSON, and user-as-tenant coupling are fixed, explicitly accepted, or reported.
 - Database rows, ORM models, generated caches, and read models do not leak into domain truth unless the local architecture intentionally owns that boundary.
 - Queries preserve authorization, tenant or user scope, deterministic ordering, expected absence behavior, and retention rules.
 - Content and resource models separate stable identity from mutable titles, slugs, URLs, translations, display fields, revisions, facts, sources, projections, and analytics dimensions when those concerns exist.
@@ -375,7 +394,7 @@ Prefer the narrowest configured test, build, docs, release, or mustflow intent t
 - Database role and owner
 - Affected read and write paths
-- Schema, constraint, and query semantics reviewed
+- Schema-regret, constraint, relation, enum, JSON, custom-field, status-history, traceability, and query semantics reviewed
 - Identity, slug, lifecycle, asset, body block, taxonomy, relationship, attribute, filter URL, landing-page, translation, locale, country, currency, timezone, local-date, money, price snapshot, revision, claim, fact, source, collection, verification, comparison methodology, affiliate link, data-ownership, behavior analytics, audit log, API projection, public identifier, backup or restore, bulk update, admin audit, user-state, aggregate, cache-key, projection, and cache-invalidation checks where relevant
 - Export, import, product-owned id, provider-id mapping, relationship, permission, file, automation, event-history, and reconstruction checks where relevant
 - Authorization, tenant scope, retention, and privacy checks

package/templates/default/locales/en/.mustflow/skills/database-migration-change/SKILL.md CHANGED Viewed

@@ -2,11 +2,11 @@
 mustflow_doc: skill.database-migration-change
 locale: en
 canonical: true
-revision: 3
+revision: 4
 lifecycle: mustflow-owned
 authority: procedure
 name: database-migration-change
-description: Apply this skill when database migration files, schema migration history, ORM schema migrations, generated clients, schema dumps, SQL snapshots, online DDL, large indexes, constraints, state-dependent CHECK constraints, backfills, rolling deploy compatibility, expand-and-contract changes, destructive database changes, migration rollback or roll-forward claims, cut-over plans, lock or timeout policy, replication lag risk, migration observability, or production database migration procedures are created, changed, reviewed, or reported.
+description: Apply this skill when database migration files, schema migration history, ORM schema migrations, generated clients, schema dumps, SQL snapshots, online DDL, large indexes, constraints, state-dependent CHECK constraints, background-job backfills, zero-downtime migration claims, rolling deploy compatibility, expand-and-contract changes, destructive database changes, migration rollback or roll-forward claims, cut-over plans, feature-flagged read/write switches, lock or timeout policy, replication lag risk, migration observability, or production database migration procedures are created, changed, reviewed, or reported.
 metadata:
   mustflow_schema: "1"
   mustflow_kind: procedure
@@ -33,13 +33,14 @@ Keep database migrations safe for running systems by checking deploy compatibili
 Do not treat migration authoring as "make a file that applies locally." Treat it as "old code and new code must survive the same database during rollout."
 Migration incidents usually happen in the interval where old code, new code, old data, and new data are all alive at once. Design that interval first.
+Do not collapse schema expansion, data backfill, read or write cut-over, and destructive cleanup into one deploy-time migration just because that worked on a developer database.
 <!-- mustflow-section: use-when -->
 ## Use When
 - A database migration file, migration history entry, schema dump, ORM schema, SQL snapshot, generated client, seed, fixture, schema validator, or migration documentation is created or changed.
 - A change adds, removes, renames, splits, merges, backfills, rewrites, validates, constrains, indexes, foreign-keys, type-changes, defaults, nullable rules, enum values, tables, columns, generated columns, triggers, views, functions, row-level policies, or data migrations.
-- A task mentions rolling deploy, expand-and-contract, online migration, backfill, production schema change, rollback, roll-forward, down migration, migration lock, lock timeout, statement timeout, DDL transaction, `CREATE INDEX CONCURRENTLY`, MySQL `ALGORITHM=INSTANT`, MySQL `LOCK=NONE`, generated ORM client, migration drift, schema drift, or database migration safety.
+- A task mentions rolling deploy, zero-downtime migration, expand-and-contract, online migration, long-running migration, background job backfill, feature flag migration, dual-write, dual-read, compatibility read fallback, production schema change, rollback, roll-forward, down migration, migration lock, lock timeout, statement timeout, DDL transaction, `CREATE INDEX CONCURRENTLY`, MySQL `ALGORITHM=INSTANT`, MySQL `LOCK=NONE`, generated ORM client, migration drift, schema drift, or database migration safety.
 - Prisma, Drizzle, TypeORM, Rails Active Record, Django migrations, Alembic, Diesel, Ecto, Flyway, Liquibase, Knex, Sequelize, SQLx, or another migration tool changes schema, generated output, migration metadata, or deployment behavior.
 - A final report claims a database migration is safe, reversible, applied, validated, production-ready, no-downtime, rollback-safe, or tested from an old schema.
@@ -58,6 +59,8 @@ Migration incidents usually happen in the interval where old code, new code, old
 - Deployment shape: single-step deploy, rolling deploy, blue-green, multiple app versions, background workers, read replicas, multiple services, serverless functions, mobile clients, or external integrations.
 - Database engine and operational surface: PostgreSQL, MySQL, SQLite, SQL Server, managed database, migration lock behavior, DDL transaction behavior, online DDL options, table size, write load, long-running transactions, replication or CDC topology, expected lock time, statement timeout, lock timeout, and restore capability when known.
 - Data preservation needs, compatibility window, backfill size, batch strategy, cursor or checkpoint marker, validation query, observability query, rollback or roll-forward type, cut-over control, and whether old code can run after the new schema lands.
+- Application transition controls: feature flags, tenant gates, read fallback, dual-write window, old-write cutoff, old-read cutoff, worker rollout order, admin/reporting/BI dependency review, and how to disable the new path without restoring the database.
+- Production runbook boundary: execution owner, intended window, expected lock time, expected replication lag, metrics to watch, stop or pause thresholds, retry policy, partial-apply handling, customer-impact communication trigger, and manual approval points when relevant.
 - State and timestamp invariant matrix when a migration introduces lifecycle statuses, terminal
   timestamps, retry or dead-letter states, delivery states, soft-delete states, approval states, or
   other columns whose valid nullability depends on status.
@@ -78,6 +81,7 @@ Migration incidents usually happen in the interval where old code, new code, old
 - Update migration files, ORM schema files, generated client expectations, schema dumps, SQL snapshots, seeds, fixtures, compatibility code, backfill code, validation checks, docs, and tests directly required by the migration.
 - Prefer expand-and-contract for live systems: add compatible shape, dual-write or compatibility-read where needed, backfill safely, switch reads and writes, then contract only after compatibility is proven.
+- Move long-running data rewrites out of deploy-time schema migrations into bounded, restartable background jobs when production-sized data or live traffic can be affected.
 - Keep destructive cleanup separate from expansion unless the repository explicitly proves a single-step deployment is safe.
 - Do not weaken tests, delete migration history, hand-edit generated client output, suppress migration drift, or claim rollback safety for lossy changes.
@@ -93,19 +97,21 @@ Migration incidents usually happen in the interval where old code, new code, old
    - Django: migration files, state operations, historical models, schema editor behavior, generated SQL when relevant, and data migration functions.
    - Alembic or SQLAlchemy: migration revisions, autogenerate output, branch heads, model metadata, downgrade functions, naming conventions, and generated SQL.
    - Diesel, Ecto, Flyway, Liquibase, Knex, Sequelize, SQLx, and raw SQL: migration history, checked-in SQL, generated metadata, compile-time query checks, rollback files, and schema dumps.
-3. Build a migration ledger: old shape, new shape, rows affected, old code behavior, new code behavior, rollback expectation, generated artifact changes, dependent callers, and validation query.
+3. Build a migration ledger: old shape, new shape, rows affected, old code behavior, new code behavior, worker and batch behavior, admin/reporting/BI behavior, rollback expectation, generated artifact changes, dependent callers, and validation query.
 4. Classify compatibility.
    - Old code on old schema.
    - Old code on expanded schema.
    - New code on expanded schema.
    - New code after backfill.
    - New code after contract.
+   - Old background workers, cron jobs, admin tools, reporting queries, and external integrations during the same window.
    If any required state fails, the migration is not rolling-deploy safe.
 5. Split the deployment plan into expand, backfill, switch, and contract phases.
    - Expansion adds shapes old code can ignore and new code can start writing.
-   - Backfill is bounded, restartable, idempotent, observable, and separately validated.
-   - Switch changes read paths through a feature flag, rollout gate, tenant gate, or compatible deploy step where possible.
+   - Backfill is bounded, restartable, idempotent, observable, separately validated, and separated from the deployment pipeline when it can run long.
+   - Switch changes read and write paths through a feature flag, rollout gate, tenant gate, or compatible deploy step where possible.
    - Contract removes old shapes only after at least one compatibility window proves no code, job, report, or manual SQL still depends on them.
+   - A single migration file that expands, rewrites data, flips reads, and drops old structures is not zero-downtime evidence unless the repository proves the single-step path is safe for its deployment model.
 6. For column add, decide nullability, default behavior, backfill strategy, write path, read fallback, index need, and when a future `NOT NULL` or constraint can be enforced.
    - Add nullable first unless a proven engine/version/table-size path makes the non-null default safe.
    - Do not assume a database default backfills existing rows or matches ORM, API, batch, or application defaults.
@@ -139,6 +145,10 @@ Migration incidents usually happen in the interval where old code, new code, old
     - Partition attach can scan existing rows unless a suitable `CHECK` constraint proves the range first.
     - Table split, table merge, or relationship rewrite must preserve stable identifiers, foreign keys, audit references, external IDs, permissions, search documents, exports, and old-to-new mapping until all callers switch.
 15. For backfills, make them bounded, restartable, observable, and validated. Define batch size, cursor-based ordering key such as `id > last_id`, checkpoint, retry behavior, idempotency, timeout, lock expectation, throttle or pause/resume control, dead-letter or manual review behavior, and validation queries.
+    - Keep long-running data rewrites out of deploy-time migrations unless the affected row count, lock behavior, WAL/binlog or undo impact, replication lag, and timeout behavior prove the operation is short and bounded.
+    - Commit in small batches instead of one huge transaction when live data volume can be large.
+    - Process only rows that still need work, so reruns and retries cannot corrupt already migrated rows.
+    - Track progress with a durable cursor or checkpoint; do not rely on offset pagination for mutable production tables.
 16. Do not run or recommend full-table updates on production-sized data without measured volume, lock expectation, WAL or undo impact, replication lag risk, batch plan, timeout policy, and recovery plan.
 17. Review replication, CDC, and long-running transaction interactions.
     - Online DDL can leave replicas, read traffic, backups, CDC connectors, or failover readiness behind even when the primary looks healthy.
@@ -150,14 +160,17 @@ Migration incidents usually happen in the interval where old code, new code, old
     - Monitor dual-write mismatch and sample old/new values during the compatibility window; code intent is not proof that every path writes both sides.
 19. Prepare observability before apply.
     - Pair the migration with read-only progress and safety queries for lock waits, index build progress, replication lag, backfill cursor, skipped rows, failed rows, duplicate rows, missing rows, dead tuples, or estimated remaining range when the engine supports them.
+    - Watch application error rate, p95 or p99 latency, connection pressure, fallback-read rate, dual-write mismatch rate, and critical business event failures when the migration changes a live request path.
     - Log or report dry-run selection counts, apply counts, skip reasons, batch durations, and recovery handles.
     - A final `done` line is not enough evidence for a live migration.
+    - Prepare a runbook before apply. It should name the operator, execution window, expected duration, expected lock and replication behavior, stop thresholds, pause or abort action, partial-apply behavior, code rollback order, feature-flag fallback, validation queries, and customer-impact communication trigger.
 20. Decide rollback honestly and prefer roll-forward for partial live changes.
     - Reversible: schema-only and data-preserving.
     - App rollback: old and new code both tolerate the expanded shape, so the read path can move back without losing new writes.
     - Forward-fix preferred: partial live migration can be corrected without restoring.
     - Restore required: deletes, table merges, generated IDs, hashing, encryption, irreversible type conversions, external side effects, or lossy transforms.
     Do not promise rollback for changes that cannot reconstruct old values.
+    Treat backups as disaster recovery evidence, not ordinary deploy rollback, unless a restore drill proves that restoring the database would not lose acceptable live writes, external side effects, or dependent service state.
 21. Keep external side effects out of database migrations unless the repository has an explicit recovery model. Sending emails, calling payment APIs, deleting files, or mutating external providers from a migration usually breaks rollback.
 22. Check generated surfaces after schema changes: ORM clients, types, SQL snapshots, schema dumps, OpenAPI or GraphQL projections, API mocks, fixtures, seeds, admin screens, analytics, ETL, BI queries, and docs examples.
 23. Review ORM-specific traps.
@@ -180,11 +193,13 @@ Migration incidents usually happen in the interval where old code, new code, old
 - Source schema, target schema, migration files, generated artifacts, schema dumps, seeds, fixtures, and dependent code agree.
 - Expand, backfill, switch, and contract phases are separated or explicitly proven unnecessary.
 - Old-code/new-schema and new-code/expanded-schema compatibility is classified.
+- Read-path fallback, write-path transition, dual-write mismatch detection, feature-flag control, and old worker/admin/reporting dependency review are explicit when a live rollout can overlap versions.
 - Backfill and validation behavior is cursor-based or otherwise bounded, restartable, idempotent, observable, and checkable where relevant.
 - State-dependent CHECK constraints, terminal timestamp exclusivity, and valid nullability matrices
   are explicit where status columns can otherwise contradict timestamp or reason columns.
 - Lock levels, online DDL support, long-running transaction waits, replication lag, cut-over control, timeout policy, and observability queries are explicit where production data may be affected.
 - Rollback claims distinguish schema rollback, data rollback, app rollback, roll-forward, forward-fix, and restore-required cases.
+- Production runbook stop thresholds, pause or abort behavior, partial-apply handling, and communication triggers are explicit where the migration can affect live service behavior.
 - Destructive changes and production lock risks are either deferred, measured, guarded, or reported as remaining risk.
 <!-- mustflow-section: verification -->
@@ -212,6 +227,8 @@ Prefer configured migration dry-run, generated-output, schema-diff, or database
 - If online DDL support, long-running transaction behavior, replication lag, or cut-over control is unknown, report the migration as operationally unproven.
 - If an autogenerator proposes drop/create for a rename, stop and rewrite the migration plan.
 - If a migration is lossy, do not claim rollback beyond restore or forward corrective migration.
+- If rollback depends only on a backup restore, label it disaster recovery instead of deploy rollback and report live-write loss or external-state reconciliation risk.
+- If the migration plan lacks feature-flag fallback, read/write cut-over order, stop thresholds, or partial-apply handling for a live rollout, do not call it zero-downtime.
 - If a backfill is not idempotent, restartable, observable, and throttled or bounded, keep it out of a production migration claim.
 - If generated clients or schema dumps drift, fix the source of truth and regenerated surfaces together.
 - If configured verification is missing, report the missing command intent instead of inferring package-manager, ORM, or migration-tool commands.
@@ -224,7 +241,8 @@ Prefer configured migration dry-run, generated-output, schema-diff, or database
 - Source schema, target schema, and migration phase
 - Old-code/new-schema and new-code/expanded-schema compatibility
 - Expand/backfill/switch/contract plan and destructive cleanup timing
-- Backfill cursor, idempotency, throttle, pause/resume, validation, lock, timeout, replication, cut-over, and observability classification
+- Read/write transition, feature-flag fallback, dual-write or compatibility-read window, old worker/admin/reporting dependency review
+- Backfill cursor, idempotency, throttle, pause/resume, validation, lock, timeout, replication, cut-over, runbook stop threshold, and observability classification
 - Status, timestamp, CHECK constraint, and existing-row validation matrix where relevant
 - Rollback, app rollback, roll-forward, forward-fix, and restore-required classification
 - ORM/generated client/schema dump/snapshot surfaces synchronized