npm - mustflow - Versions diffs - 2.107.9 → 2.108.0 - Mend

mustflow 2.107.9 → 2.108.0


Files changed (18)

package/templates/default/locales/en/.mustflow/skills/admin-control-plane-safety-review/SKILL.md ADDED

@@ -0,0 +1,200 @@
+---
+mustflow_doc: skill.admin-control-plane-safety-review
+locale: en
+canonical: true
+revision: 1
+lifecycle: mustflow-owned
+authority: procedure
+name: admin-control-plane-safety-review
+description: Apply this skill when code is created, changed, reviewed, or reported and admin panels, backoffice tools, operator consoles, support tools, internal dashboards, staff APIs, admin RBAC or ABAC, scoped operator roles, audit logs, change history, impersonation, dangerous action confirmation, approval flows, exports, imports, bulk operations, admin search and filters, production guardrails, PII masking, admin sessions, MFA, or operational evidence need review as a high-risk control plane rather than a convenience UI.
+metadata:
+  mustflow_schema: "1"
+  mustflow_kind: procedure
+  pack_id: mustflow.core
+  skill_id: mustflow.core.admin-control-plane-safety-review
+  command_intents:
+    - changes_status
+    - changes_diff_summary
+    - lint
+    - build
+    - test_related
+    - test
+    - test_audit
+    - docs_validate_fast
+    - test_release
+    - mustflow_check
+---
+# Admin Control Plane Safety Review
+<!-- mustflow-section: purpose -->
+## Purpose
+Review administrator and operator surfaces as production control planes.
+An admin page is not just a place where staff edits rows faster. It is where one mistaken click, stale permission, unscoped search, silent export, or confused impersonation session can delete customer data, leak private information, grant privilege, break billing, or make incident reconstruction impossible.
+<!-- mustflow-section: use-when -->
+## Use When
+- Admin panels, backoffice tools, support consoles, operator dashboards, staff APIs, internal routes, admin GraphQL or RPC procedures, moderation queues, billing consoles, tenant management tools, feature-flag consoles, data repair tools, or emergency-access tools are created, changed, reviewed, or reported.
+- Work touches scoped admin roles, RBAC or ABAC for operators, support permissions, organization or tenant admin scope, role assignment, privilege factories, approval flows, impersonation, break-glass access, or admin session boundaries.
+- Work touches admin audit logs, object change history, before/after snapshots, security event logs, denied admin attempts, read access to sensitive data, export logs, import logs, bulk-operation logs, or operational evidence.
+- Work touches dangerous actions such as delete, restore, refund, charge, ban, unban, transfer ownership, change roles, reset MFA, invalidate sessions, purge cache, reindex search, run migration-like repair, edit production config, or bulk update.
+- Work touches admin search, filtering, sorting, pagination, saved views, exports, imports, bulk actions, CSV or spreadsheet output, PII masking, tenant isolation, production versus staging indicators, or environment guardrails.
+<!-- mustflow-section: do-not-use-when -->
+## Do Not Use When
+- The task changes the core authentication, authorization, session, token, RBAC, ABAC, tenant, or permission engine. Use `auth-permission-change` first, then this skill for the admin control-plane surface around it.
+- The task needs API-specific BOLA, IDOR, object, property, or function authorization review. Use `api-access-control-review` for the API proof and this skill for admin workflow controls, auditability, and operator safety.
+- The task is a broad sensitive-data, retention, logging, dependency, or privacy review with no admin surface. Use `security-privacy-review`.
+- The task primarily changes a domain workflow such as payment, notification, credit ledger, deletion, deployment, or database migration. Use that domain skill for domain correctness and this skill only for the admin operation that exposes or overrides it.
+- The task only changes public website UI copy or layout with no staff-only action, permission, audit, export, or operational evidence surface.
+<!-- mustflow-section: required-inputs -->
+## Required Inputs
+- User goal, current diff or target files, affected admin surfaces, operator personas, protected resources, tenant or organization scopes, and environments.
+- Admin actor and session ledger: staff identity, authentication method, MFA or passkey status, session lifetime, reauthentication points, device or network constraints, support account status, and revocation behavior.
+- Permission ledger: role, capability, scope, tenant, resource, field, condition, explicit deny, policy version, assignment path, approval path, and why the operator may perform the action now.
+- Resource and field ledger: object type, tenant, owner, lifecycle state, sensitive fields, masked fields, read-only fields, mutable fields, derived fields, and privileged field update policy.
+- Dangerous action ledger: action type, impact, target count, preview, confirmation, reauthentication, approval, idempotency key, queued job, cancel or stop behavior, rollback or compensation path, and final server-side recomputation.
+- Audit and change-history ledger: actor, subject, target, action, reason, ticket, before and after values, redaction, denied attempts, read/export events, retention, immutability, correlation id, and reviewer or SIEM handoff when relevant.
+- Impersonation ledger: who acts, who is viewed or acted as, purpose, ticket or reason, TTL, visible banner, prohibited actions, notification policy, exit path, and separate actor plus subject logging.
+- Search, filter, export, import, and bulk ledger: query dimensions, tenant constraints, row limits, pagination, PII masking, field allowlist, file lifetime, watermarking, download audit, dry run, per-item authorization, and partial-failure policy.
+- Existing tests, role matrix, admin docs, runbooks, support procedures, incident expectations, and configured command intents.
+<!-- mustflow-section: preconditions -->
+## Preconditions
+- Treat admin access as an elevated security boundary, not a feature flag or hidden route.
+- Treat UI hiding, disabled buttons, menu filtering, and route redirects as operator UX only. Server, service, policy, or database checks must own permission.
+- Treat one `is_admin` flag as insufficient unless the product truly has one global superuser role and has accepted the audit and blast-radius risk.
+- Identify whether the surface is production, staging, preview, local, or sandbox. Production-like tools need visible environment identity and stronger action friction.
+- Separate audit log from object change history before claiming traceability. Audit explains who did what and why; change history explains how the object changed over time.
+- Do not let support tooling bypass tenant, field, export, payment, deletion, notification, or privacy policy merely because staff can see the UI.
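Reviewer note (illustration only, not part of the packaged file): the preconditions above say that a single `is_admin` flag is rarely enough and that a trusted layer must own the permission decision. A minimal sketch of what a scoped, default-deny check could look like follows. The `Grant` type, the `"*"` wildcard convention, and the capability names are hypothetical and do not come from mustflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """One scoped permission entry (hypothetical shape, for illustration)."""
    capability: str      # e.g. "refund.create"
    tenant_id: str       # "*" means every tenant (assumed convention)
    deny: bool = False   # an explicit deny overrides any matching allow


def is_allowed(grants, capability, tenant_id):
    """Default-deny: allow only when a grant matches and no deny matches."""
    matching = [
        g for g in grants
        if g.capability == capability and g.tenant_id in (tenant_id, "*")
    ]
    if any(g.deny for g in matching):
        return False
    return bool(matching)


# Example grant set for a support operator (made-up data).
grants = [
    Grant("ticket.read", "*"),
    Grant("refund.create", "tenant-a"),
    Grant("refund.create", "tenant-b", deny=True),
]
```

In this sketch a role label never appears in the decision: the check sees only capability, tenant scope, and explicit deny, which is the evidence the skill asks a reviewer to look for.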
+<!-- mustflow-section: allowed-edits -->
+## Allowed Edits
+- Add or tighten admin permission checks, scoped role checks, server-side field allowlists, tenant-scoped lookups, step-up authentication, dangerous-action confirmations, approval gates, audit logs, change-history records, export controls, bulk-operation controls, tests, docs, and role matrices.
+- Add explicit operator safety states such as dry-run, preview, pending approval, queued, running, partially failed, cancelled, completed, compensated, or blocked.
+- Add masking, redaction, purpose capture, ticket capture, download expiry, and operational evidence fields when the changed admin surface justifies them.
+- Keep edits scoped to the admin control plane and directly synchronized domain surfaces. Do not redesign the entire auth model or domain workflow under this skill alone.
+<!-- mustflow-section: procedure -->
+## Procedure
+1. Classify the admin surface.
+   - Identify whether it is read-only support, customer support mutation, content moderation, billing, security, tenant management, feature/config operation, data repair, incident response, or emergency access.
+   - Name the operator persona, tenant scope, environment, target resource, and blast radius.
+2. Write the control-plane question before editing.
+   - Ask whether this operator can do this action on this tenant, object, field, and environment right now, and whether the system can explain it after an incident.
+3. Replace role labels with effective capability evidence.
+   - `admin`, `staff`, `support`, and `owner` are labels. The decision needs scoped capability, resource relationship, tenant boundary, explicit deny behavior, and policy version.
+4. Enforce permissions at the trusted layer.
+   - Check server routes, service methods, jobs, database policies, queue workers, export generators, and storage signers.
+   - UI guards may make the console easier to use, but they do not authorize anything.
+5. Separate admin role types.
+   - Global admin, tenant admin, billing admin, security admin, content moderator, support viewer, support mutator, impersonating operator, and break-glass operator are different roles.
+   - Role assignment, API-key issuance, MFA reset, session invalidation, ownership transfer, impersonation, export approval, and emergency access are privilege factories.
+6. Review admin sessions.
+   - Require MFA or passkeys for elevated admin access when product risk justifies it.
+   - Use idle timeout, absolute timeout, session revocation, CSRF protection for browser admin tools, and step-up reauthentication for sensitive actions.
+   - Keep customer and staff identity planes separate unless the product intentionally models and audits the shared path.
+7. Build an audit record for every meaningful operator action.
+   - Log actor, target, tenant, action, result, reason or ticket when useful, source IP or safe session id, request id, correlation id, before and after values when safe, and denial reason for rejected attempts.
+   - Audit reads of sensitive PII, exports, impersonation start and stop, role changes, approval decisions, failed dangerous confirmations, and bulk jobs.
+   - Do not log secrets, tokens, full payment data, raw private documents, or excessive PII just to make the audit feel complete.
+8. Keep audit logs durable and tamper-resistant enough for the product risk.
+   - Append-only storage, restricted delete access, retention policy, clock source, hash chaining, WORM storage, SIEM export, or reviewer workflow may be needed for high-impact products.
+   - If the log can be edited by the same operator whose actions it records, report audit-integrity risk.
+9. Keep object change history separate.
+   - Change history should show field-level before and after values, who or what changed them, effective time, source action, and restore or comparison behavior when useful.
+   - Redact or omit sensitive values. Prefer value hashes, labels, or structured diffs when full before/after snapshots would leak data.
+10. Harden impersonation.
+    - Require explicit capability, reason or ticket, reauthentication, TTL, visible banner, easy exit, and actor plus subject logs.
+    - Prohibit or step-up high-risk actions during impersonation: password or MFA changes, payment method changes, role changes, exports, destructive actions, and irreversible account actions unless policy explicitly allows them.
+    - Consider user notification or post-action review for sensitive impersonation.
+11. Add friction to dangerous actions.
+    - Show an impact preview, target identity, tenant, count, irreversible consequences, related resources, and recovery path before the action.
+    - Use typed confirmation, step-up authentication, approval, or delay for destructive or broad actions.
+    - Recompute target counts and permission server-side at execution time; do not trust the preview payload.
+12. Treat bulk operations as jobs, not giant button clicks.
+    - Require dry run, sample rows, target count, per-item authorization, idempotency, concurrency control, progress state, cancel behavior, partial-failure report, and audit trail.
+    - Define whether one denied item fails the whole job or produces per-item failures.
+13. Govern exports.
+    - Use field allowlists, masking, row limits, async jobs for large exports, purpose or ticket capture, approval for sensitive data, file expiry, watermarking when useful, download audit, and tenant-scoped storage.
+    - Do not treat CSV generation as a harmless read. A local spreadsheet can become the real data breach.
+14. Govern imports.
+    - Use preview, validation report, schema version, row limits, duplicate policy, dry run, idempotency key, rollback or compensation plan, and per-row error reporting.
+    - Never let import files mass-assign privileged fields such as role, tenant, owner, balance, entitlement, deletion state, or verified status without explicit policy.
+15. Review admin search and filters.
+    - Enforce tenant and field scopes in the query, not only in rendered rows.
+    - Use cursor pagination or bounded result windows for scale, stable sorting, query limits, and PII masking.
+    - Audit sensitive lookups such as email, phone, account id, payment id, VIP user, security event, or cross-tenant search when product risk requires it.
+16. Protect production from environment confusion.
+    - Display the environment, tenant, and target resource clearly.
+    - Require stronger confirmation for production than staging, block production-only dangerous actions from preview environments, and avoid shared cookies or staff sessions across environments when risk justifies it.
+17. Review observability and incident reconstruction.
+    - Admin jobs need progress, result, failure reason, retry state, initiator, target count, affected item ids or safe references, and correlation ids.
+    - Security-relevant admin events should be alertable: role grant, break-glass access, export, mass deletion, impersonation, MFA reset, ownership transfer, and repeated denied attempts.
+18. Align admin docs and role matrices.
+    - Update role matrices, support runbooks, approval policy, escalation notes, and user-facing support commitments when behavior changes.
+19. Add hostile-path tests when the project has a usable test surface.
+    - Cover direct API calls with hidden buttons bypassed, wrong tenant, wrong admin scope, denied field update, role assignment by low privilege actor, impersonation prohibited action, export field masking, export expiry, audit event creation, bulk dry-run versus execution consistency, partial failures, reauthorization at execution time, and production guardrails where relevant.
+20. Report every admin surface that remains only "trusted because staff will be careful" as a risk.
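Reviewer note (illustration only, not part of the packaged file): step 8 of the procedure above lists hash chaining as one option for tamper-resistant audit logs. The sketch below shows the idea with an in-memory list. The function names and the entry layout are assumptions made for this example, and a real deployment would also need durable append-only storage and restricted delete access.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, event):
    """Hash the previous entry's hash together with a canonical event encoding."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(log, event):
    """Append an audit event linked to the entry before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {
        "event": dict(event),
        "prev_hash": prev_hash,
        "hash": _digest(prev_hash, event),
    }
    log.append(entry)
    return entry


def verify_chain(log):
    """Return True only if every entry still matches its recorded hash and link."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A chain like this detects edits and removals in the middle of the log, but not truncation of the newest entries, so the latest hash would still need to be anchored somewhere the recorded operator cannot write.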
+<!-- mustflow-section: postconditions -->
+## Postconditions
+- Admin actions are authorized at a trusted layer with scoped roles, resource relationships, tenant boundaries, and default-deny behavior.
+- Audit logs and object change history are distinct, durable enough for the risk, and redacted where needed.
+- Impersonation, dangerous actions, export, import, bulk operations, search, filters, production guardrails, admin sessions, and approval paths are checked or explicitly scoped out.
+- Server-side recomputation, idempotency, job evidence, cancellation, partial-failure, and recovery or compensation are explicit for broad or irreversible actions.
+- Tests, docs, role matrices, runbooks, and release notes are synchronized when the admin behavior is public, packaged, or operationally promised.
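Reviewer note (illustration only, not part of the packaged file): the postconditions above require server-side recomputation for broad or irreversible actions. One way to read that requirement is sketched below. The store layout, the function names, and the `confirmed_count` parameter are hypothetical.

```python
def _select_targets(store, actor_tenants, tenant_id, status):
    """Re-check tenant scope and recompute the target set from current data."""
    if tenant_id not in actor_tenants:
        raise PermissionError("operator has no scope for this tenant")
    return [
        row_id for row_id, row in store.items()
        if row["tenant"] == tenant_id and row["status"] == status
    ]


def preview_bulk_delete(store, actor_tenants, tenant_id, status):
    """Dry run: report the impact without changing anything."""
    targets = _select_targets(store, actor_tenants, tenant_id, status)
    return {"tenant_id": tenant_id, "status": status, "count": len(targets)}


def execute_bulk_delete(store, actor_tenants, tenant_id, status, confirmed_count):
    """Execute only if the recomputed target count matches what was confirmed."""
    targets = _select_targets(store, actor_tenants, tenant_id, status)
    if len(targets) != confirmed_count:
        raise ValueError("target set changed since preview; confirm again")
    for row_id in targets:
        del store[row_id]
    return sorted(targets)
```

The execution path never accepts a list of ids from the preview payload. It repeats the permission check and the query, then refuses to run if the result no longer matches what the operator confirmed.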
+<!-- mustflow-section: verification -->
+## Verification
+Use configured oneshot command intents when available:
+- `changes_status`
+- `changes_diff_summary`
+- `lint`
+- `build`
+- `test_related`
+- `test`
+- `test_audit`
+- `docs_validate_fast`
+- `test_release`
+- `mustflow_check`
+Prefer the narrowest configured tests that prove denial behavior, audit completeness, export masking or expiry, impersonation limits, bulk dry-run consistency, and dangerous-action server-side recomputation. Use release and docs checks when installed skills, templates, public docs, role matrices, or package metadata change.
+<!-- mustflow-section: failure-handling -->
+## Failure Handling
+- If operator scope, tenant boundary, target resource, field policy, or decision source of truth is unclear, report the missing ledger instead of adding another broad admin condition.
+- If audit evidence is missing, do not claim the action is traceable. Add the audit event or report the traceability gap.
+- If the UI hides a dangerous action but the API still permits it, treat the control as incomplete.
+- If a dangerous action cannot be rolled back, require stronger preview, approval, delay, or compensation evidence before treating the workflow as safe.
+- If tests cannot be added in the current task, name the exact untested admin actor, tenant, action, field, export, impersonation, or bulk-job case.
+- If sensitive data appears in logs, exports, fixtures, screenshots, or command output, stop repeating it and use `secret-exposure-response` or `security-privacy-review` as appropriate.
+<!-- mustflow-section: output-format -->
+## Output Format
+- Admin control plane reviewed
+- Operator persona, tenant, resource, action, field, and environment scope
+- Effective permission, explicit deny, policy version, and trusted enforcement findings
+- Admin session, MFA or step-up, CSRF, revocation, and environment guardrail findings
+- Audit log versus change-history findings
+- Impersonation, dangerous action, approval, export, import, bulk, search, and filter findings
+- Job, idempotency, recomputation, cancellation, rollback, compensation, and incident evidence
+- Tests or hostile cases covered
+- Files changed
+- Command intents run
+- Skipped checks and reasons
+- Remaining admin-control-plane risk

package/templates/default/locales/en/.mustflow/skills/ai-product-readiness-review/SKILL.md ADDED

@@ -0,0 +1,158 @@
+---
+mustflow_doc: skill.ai-product-readiness-review
+locale: en
+canonical: true
+revision: 1
+lifecycle: mustflow-owned
+authority: procedure
+name: ai-product-readiness-review
+description: Apply this skill when an AI product feature, AI Gateway, model or provider integration, prompt/RAG/tool path, AI cache, fallback path, streaming path, evaluation gate, user-data flow, model registry, AI observability surface, or model portability plan is created, changed, reviewed, or reported and the risk is end-to-end product readiness rather than one narrow prompt, cost, latency, or agent concern.
+metadata:
+  mustflow_schema: "1"
+  mustflow_kind: procedure
+  pack_id: mustflow.core
+  skill_id: mustflow.core.ai-product-readiness-review
+  command_intents:
+    - changes_status
+    - changes_diff_summary
+    - lint
+    - build
+    - test_related
+    - test
+    - prompt_cache_audit
+    - docs_validate_fast
+    - test_release
+    - mustflow_check
+---
+# AI Product Readiness Review
+<!-- mustflow-section: purpose -->
+## Purpose
+Review AI features as product and operating systems, not as model calls. A production AI path should make the AI role modest, route every request through a controlled gateway, separate instructions from untrusted data, cap cost, isolate caches, evaluate regressions, expose fallback states, protect user data, stream safely, observe without raw-content leakage, and allow model replacement.
+<!-- mustflow-section: use-when -->
+## Use When
+- A change adds, edits, reviews, or reports an AI feature, AI Gateway, LLM provider adapter, model router, prompt registry, RAG path, tool proposal flow, AI cache, fallback path, streaming path, eval gate, user-data flow, model registry, AI observability surface, AI incident runbook, or model migration plan.
+- A task asks whether an AI feature is launch-ready, production-ready, safe to automate, cost-bounded, prompt-injection-resistant, privacy-safe, eval-ready, fallback-ready, cache-safe, streaming-safe, or provider/model-portable.
+- The feature sends user input, documents, conversation history, files, images, retrieved content, tool observations, logs, or business records to a model.
+- The AI can influence money, permissions, account state, external messages, deletion, workflow status, support outcomes, generated code, compliance claims, or customer-visible answers.
+- Multiple narrower AI skills could apply and the first question is whether the product boundary itself is sound.
+<!-- mustflow-section: do-not-use-when -->
+## Do Not Use When
+- The whole task is one narrow prompt-contract, hallucination, token-cost, response-latency, agent-execution, agent-eval, UX, prompt-injection, privacy, cache, adapter, rate-limit, idempotency, or cloud-cost concern. Use the narrower skill directly.
+- The task only changes ordinary non-AI code with no model, prompt, retrieval, tool, generated output, or AI telemetry path.
+- The task only chooses between model vendors or current model versions. Use `source-freshness-check` or the provider-specific workflow before making stale-sensitive claims.
+- The task only edits marketing copy that says a feature uses AI but does not change product behavior, documentation contract, or release claims.
+<!-- mustflow-section: required-inputs -->
+## Required Inputs
+- Product role ledger: feature goal, user, AI role, human-review boundary, harm if wrong, non-AI fallback, and whether the AI drafts, suggests, classifies, searches, executes, or decides.
+- Gateway ledger: request entrypoint, auth, quota, rate limit, request shaping, prompt version, model routing, redaction, cache, retry, fallback, logging, eval hooks, and kill switch.
+- Authority and action ledger: system/developer instructions, user input, retrieved data, tool results, allowed tool proposals, server-side policy engine, side-effect execution, approval state, idempotency key, and audit trail.
+- Data ledger: data classes, tenant and permission scope, uploaded files, RAG corpus, vector store, provider region or endpoint, retention, logs, analytics, crash reports, staff access, and deletion path.
+- Cost and cache ledger: request volume, user/org/feature budget, input/output/reasoning token caps, retry caps, model tiers, provider prompt-cache boundary, app cache keys, TTL, invalidation, and cache redaction.
+- Eval ledger: golden set, adversarial set, regression set, cost and latency set, privacy leak set, prompt injection set, refusal set, locale set, pass criteria, owner, and CI or release gate.
+- Fallback and streaming ledger: provider failure, rate-limit failure, quality fallback, safety fallback, partial-output policy, cancellation, final validation, and recovery UX.
+- Model portability ledger: model registry, provider adapters, capability map, prompt and schema versions, migration evals, rollout plan, rollback plan, and deprecation monitoring owner.
+- Observability ledger: request ID, user or tenant hash, feature, prompt version, provider, model, tokens, cached tokens, latency breakdown, retrieval IDs, retries, fallback, refusal, validation failures, tool calls, user feedback, cost, and redaction.
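Reviewer note (illustration only, not part of the packaged file): the observability ledger above asks for a user or tenant hash, versions, and counts rather than raw content. A minimal sketch of such a record, assuming a keyed hash for the tenant identifier, might look like this. The function name, field names, and `hash_key` parameter are assumptions for this example.

```python
import hashlib
import hmac


def build_ai_log_record(request_id, tenant_id, prompt_version, model,
                        prompt_text, response_text, hash_key):
    """Build a log record that carries sizes and hashes, never raw text."""
    tenant_hash = hmac.new(
        hash_key, tenant_id.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return {
        "request_id": request_id,
        "tenant_hash": tenant_hash,            # keyed hash, not the raw id
        "prompt_version": prompt_version,
        "model": model,
        "prompt_chars": len(prompt_text),      # size only
        "response_chars": len(response_text),  # size only
    }
```

Token counts, latency, retries, and fallback reason codes from the ledger would be added as further numeric or enumerated fields in the same style.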
+<!-- mustflow-section: preconditions -->
+## Preconditions
+- The task matches the Use When conditions and does not match the exclusions.
+- Current repository instructions, command contract, AI service code, prompt files, retrieval paths, provider adapters, telemetry, tests, and docs in scope have been inspected before editing.
+- External AI-safety guidance, model documentation, vendor deprecation schedules, pricing, rate limits, retention promises, cache behavior, and legal requirements are stale-sensitive. Do not embed exact claims unless refreshed through an authorized source path.
+- Command execution remains governed by `.mustflow/config/commands.toml`; this skill does not authorize raw model calls, vendor dashboard checks, network requests, eval harness runs, migrations, releases, or billing commands.
+<!-- mustflow-section: allowed-edits -->
+## Allowed Edits
+- Add or refine AI Gateway boundaries, provider adapters, model registries, prompt registries, policy engines, tool proposal gates, quota guards, token budgets, cache keys, redaction, retry and fallback state machines, streaming validators, eval fixtures, observability fields, incident runbooks, tests, docs, route metadata, and directly synchronized templates.
+- Add launch-readiness docs, risk ledgers, model migration notes, privacy-safe telemetry contracts, and release notes tied to implemented behavior.
+- Route narrow subproblems to specialist skills instead of duplicating deep prompt, RAG, latency, cost, eval, agent-control, cache, privacy, or UX guidance inside this skill.
+- Do not approve fully automated high-risk decisions without a server-side policy gate, human review path, audit trail, and rollback or appeal path.
+- Do not let clients call model providers directly when product auth, quota, redaction, logging, eval, or policy enforcement must be controlled server-side.
+- Do not log raw prompts, raw responses, retrieved documents, uploaded files, secrets, tokens, private identifiers, or personal data by default while adding AI observability.
+- Do not treat streaming, a second provider, a stronger model, or a longer prompt as proof of safety, quality, privacy, or cost control.
+<!-- mustflow-section: procedure -->
+## Procedure
+1. Classify the AI role. Prefer draft, suggestion, search, classification, extraction, summarization, or proposal roles. Flag automatic decisions about money, permissions, legal, medical, hiring, credit, account status, external sending, deletion, or production execution as high-risk and require human review or a non-AI decision owner.
+2. Name the harm model. Write the failure case in product terms: wrong answer, leaked data, unauthorized action, cost spike, stale source, over-refusal, unsafe advice, duplicate side effect, unsupported citation, broken locale, or missing fallback.
+3. Require an AI Gateway boundary. The client should call the product backend, and the gateway should own auth, quota, request shaping, prompt version, model routing, redaction, cache, retry, fallback, logging, eval hooks, and kill switches.
+4. Separate model speech from product authority. Treat model output as a proposal or draft until server code validates schema, business rules, permission, ownership, rate limits, action risk, confirmation state, idempotency, and audit requirements.
+5. Treat external text as data. Retrieved documents, webpages, emails, tickets, Slack messages, GitHub issues, uploads, logs, tool results, and RAG chunks must not become instructions. Use `external-prompt-injection-defense` when untrusted text can override scope, tools, secrets, or policy.
+6. Check permission and tenant narrowing before retrieval or tool use. The model should receive only data the authenticated user may access. Vector search, document search, and tool calls must enforce tenant, workspace, object, field, and role boundaries outside the prompt.
+7. Validate output before use. Use structured schemas, semantic validators, business-rule checks, HTML or Markdown sanitization, URL allowlists, path sandboxing, SQL parameterization, command allowlists, and refusal or failure envelopes where relevant.
+8. Put cost controls at the product layer. Add user, tenant, org, and feature budgets; request input and output caps; retry caps; model tiering; queue or batch separation; dedupe; cancellation accounting; budget breach events; and an operator kill switch.
+9. Design cache keys as security boundaries. Include tenant, permission scope, feature, prompt version, model, provider, source corpus version, locale, policy version, and TTL when they affect correctness. Never use raw personal data, secrets, tokens, uploaded-file bodies, or sensitive text as cache keys.
+10. Separate cache layers. Review provider prompt-cache prefix stability, app response cache safety, retrieval-result cache invalidation, and RAG corpus versioning separately. Use `llm-token-cost-control-review` or `cache-integrity-review` when cache shape is the main risk.
+11. Build evals before launch claims. Require golden, adversarial, regression, privacy leak, prompt injection, refusal, cost, latency, and locale cases appropriate to the feature. Treat manual spot checks as exploration, not readiness evidence.
+12. Make fallback a product state machine. Distinguish provider outage, rate limit, quality failure, safety failure, validation failure, no evidence, and human-review fallback. Do not make fallback only "try another model".
+13. Guard side effects against retry. Tool actions that send messages, charge or refund money, delete data, change permissions, update records, execute code, or call external APIs need idempotency keys, approval state, request IDs, resource IDs, policy checks, and audit logs.
+14. Review streaming by risk. Low-risk text can stream if cancellation and partial-output policy are clear. High-risk answers, tool arguments, external actions, private data, generated code execution, and compliance-sensitive output should be validated before display or execution.
+15. Protect data beyond provider training policy. Check redaction before model calls, retention in provider state, product DB, vector DB, logs, analytics, crash reports, eval traces, staff tools, exports, backups, and deletion workflows.
+16. Make model portability explicit. Use a task-level model registry or router instead of scattering provider model names through code. Keep a capability map for structured output, tool use, streaming, caching, context limits, refusal behavior, token accounting, regions, and retention.
+17. Plan model changes as migrations. Compare accuracy, hallucination, citation behavior, JSON validity, refusal rate, tool precision, latency, cost, prompt-injection robustness, locale quality, tone, and long-context behavior before rollout. Keep rollback possible.
+18. Instrument without raw-content leakage. Log IDs, hashes, versions, counts, sizes, reason codes, validation status, retrieval IDs, token counts, cached-token counts, latency, retries, fallback, tool outcomes, user feedback, and cost. Store raw content only under explicit debug approval, masking, retention, and deletion policy.
+19. Check UX expectation boundaries. The interface should show when AI is involved, what evidence was used, what was not seen, when manual approval is needed, how to report bad output, and how to recover. Use `llm-service-ux-review` when UI state is the main risk.
+20. Route the sharp edges. Apply specialist skills for prompt contracts, hallucination/RAG grounding, token cost, latency, agent execution, agent evals, prompt injection, privacy, cache integrity, adapter boundaries, rate limits, idempotency, and cloud cost when one of those owns the remaining work.
+21. Verify with the narrowest configured tests, fixture runs, docs validation, release checks, prompt-cache audit, and mustflow validation that cover the changed AI product surface.
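Reviewer note (illustration only, not part of the packaged file): step 9 of the procedure above treats cache keys as security boundaries. The sketch below derives an application cache key from the dimensions that step lists and emits only a digest, so no raw request text appears in the key. The function name, parameter names, and the `ai-cache:` prefix are hypothetical.

```python
import hashlib
import json


def build_cache_key(*, tenant_id, permission_scope, feature, prompt_version,
                    model, provider, corpus_version, locale, policy_version,
                    request_text):
    """Derive a cache key that changes whenever any isolation dimension changes."""
    parts = {
        "tenant_id": tenant_id,
        "permission_scope": permission_scope,
        "feature": feature,
        "prompt_version": prompt_version,
        "model": model,
        "provider": provider,
        "corpus_version": corpus_version,
        "locale": locale,
        "policy_version": policy_version,
        "request_text": request_text,
    }
    payload = json.dumps(parts, sort_keys=True, separators=(",", ":"))
    # Only the SHA-256 digest leaves this function; the raw text is not stored
    # in the key. For low-entropy inputs a keyed hash (HMAC) may be preferable.
    return "ai-cache:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Because tenant and permission scope are part of the hashed payload, two users with different access cannot share a cached answer even when their request text is identical. TTL is an expiry setting on the cache entry and is therefore left out of the key in this sketch.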
+<!-- mustflow-section: postconditions -->
+## Postconditions
+- The AI role, harm model, human-review boundary, and non-AI or degraded fallback are explicit.
+- AI calls pass through a controlled gateway or the absence of a gateway is reported as residual risk.
+- Prompt injection, output validation, tool side effects, tenant permissions, privacy, cost, cache, eval, fallback, streaming, observability, and model-portability surfaces are each accepted, fixed, or routed to a narrower skill.
+- Launch or readiness claims are backed by implemented controls, tests, eval evidence, docs, or explicit remaining risks.
+<!-- mustflow-section: verification -->
+## Verification
+Use configured oneshot command intents when available:
+- `changes_status`
+- `changes_diff_summary`
+- `lint`
+- `build`
+- `test_related`
+- `test`
+- `prompt_cache_audit`
+- `docs_validate_fast`
+- `test_release`
+- `mustflow_check`
+Use the narrowest configured fixture, eval, schema, integration, docs, package, release, or mustflow check that proves the changed AI product-readiness contract. Do not infer raw provider, billing, eval, migration, or dashboard commands.
+<!-- mustflow-section: failure-handling -->
+## Failure Handling
+- If the AI role or harm model is unknown, do not claim readiness. Record the missing product decision and keep automation conservative.
+- If no gateway exists, report which controls are missing and avoid spreading partial controls across clients, prompts, and provider dashboards.
+- If evals are missing, classify the change as static risk reduction or draft readiness rather than proven quality improvement.
+- If cost, privacy, retention, or provider behavior depends on current vendor facts, route exact claims through `source-freshness-check` before embedding them in docs or release notes.
+- If fallback can duplicate side effects, stop and route through `idempotency-integrity-review` before retry or provider-fallback changes.
+- If streaming reveals unvalidated private, unsafe, or irreversible content, switch to buffered validation or report the missing safe state.
+- If model portability conflicts with a provider-specific capability, keep the capability map explicit instead of hiding differences behind a leaky abstraction.
+<!-- mustflow-section: output-format -->
+## Output Format
+- AI product surface reviewed
+- AI role, harm model, human-review boundary, and fallback state
+- AI Gateway, provider adapter, policy engine, tool execution, and kill-switch status
+- Prompt injection, data protection, output validation, tenant permission, and side-effect boundaries checked
+- Cost budget, cache key, eval gate, streaming, observability, and model portability checked
+- Specialist skills applied or intentionally deferred
+- Files changed
+- Command intents run
+- Skipped checks and reasons
+- Remaining AI product-readiness risk

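Step 18 of the procedure added above (instrument without raw-content leakage) can be illustrated with a small sketch. It is not part of the package. The `AiCallRecord` type, the `build_record` helper, and every field name are hypothetical and chosen only for illustration. The sketch assumes that a SHA-256 digest plus sizes and counts is enough for correlation; a real system might prefer a keyed hash so that short or guessable prompts cannot be recovered by brute force.

```python
import hashlib
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AiCallRecord:
    """Telemetry for one AI call (hypothetical shape, not a mustflow schema)."""
    request_id: str
    prompt_version: str
    model_id: str
    prompt_sha256: str      # digest of the prompt, never the prompt itself
    prompt_chars: int
    output_chars: int
    input_tokens: int
    cached_tokens: int
    output_tokens: int
    latency_ms: int
    retries: int
    used_fallback: bool
    validation_status: str  # e.g. "ok" or "schema_error"


def build_record(*, request_id: str, prompt_version: str, model_id: str,
                 prompt: str, output: str, input_tokens: int,
                 cached_tokens: int, output_tokens: int, latency_ms: int,
                 retries: int, used_fallback: bool,
                 validation_status: str) -> dict:
    """Reduce raw prompt and output text to a hash and sizes before logging."""
    record = AiCallRecord(
        request_id=request_id,
        prompt_version=prompt_version,
        model_id=model_id,
        prompt_sha256=hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        prompt_chars=len(prompt),
        output_chars=len(output),
        input_tokens=input_tokens,
        cached_tokens=cached_tokens,
        output_tokens=output_tokens,
        latency_ms=latency_ms,
        retries=retries,
        used_fallback=used_fallback,
        validation_status=validation_status,
    )
    # The returned dict has no field that can carry raw prompt or output text.
    return asdict(record)
```

Because the record type has no free-text field, raw content cannot reach the log by accident; storing it would need a separate, explicitly approved path, as the step describes.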
package/templates/default/locales/en/.mustflow/skills/auth-permission-change/SKILL.md CHANGED

@@ -2,11 +2,11 @@
 mustflow_doc: skill.auth-permission-change
 locale: en
 canonical: true
-revision: 3
+revision: 4
 lifecycle: mustflow-owned
 authority: procedure
 name: auth-permission-change
-description: Apply this skill when authentication, authorization, permissions, roles, effective permissions, policy decisions, tenants, sessions, JWTs, OAuth or OIDC, API keys, route guards, admin access, database policies, object-level access control, signed delivery URLs, credentialed event streams, or private cache behavior are created or changed.
+description: Apply this skill when authentication, authorization, permissions, roles, RBAC or ABAC policy decisions, tenants, organization or team memberships, sessions, cookies, JWTs, refresh tokens, OAuth or OIDC, passkeys, MFA, account recovery, API keys, route guards, admin access, database policies, object-level access control, signed delivery URLs, credentialed event streams, private cache behavior, or account-takeover response paths are created or changed.
 metadata:
   mustflow_schema: "1"
   mustflow_kind: procedure
@@ -36,7 +36,7 @@ Authentication answers who the requester is. Authorization answers what that pri
 <!-- mustflow-section: use-when -->
 ## Use When
-- Authentication, authorization, role, permission, capability, policy, tenant, workspace, organization, session, JWT, OAuth, OIDC, API key, invite, reset token, route guard, admin, impersonation, audit, or database policy behavior changes.
+- Authentication, authorization, role, permission, capability, RBAC, ABAC, policy, tenant, workspace, organization, team, session, cookie, JWT, refresh token, OAuth, OIDC, passkey, MFA, API key, invite, reset token, email change, account recovery, route guard, admin, impersonation, audit, or database policy behavior changes.
 - A route, resolver, controller, service, command handler, job, webhook, API client, generated SDK, UI guard, or database query starts relying on user, tenant, role, ownership, membership, or resource identity.
 - A change affects object-level access control, multi-tenant isolation, shared resources, signed URLs, exports, search, autocomplete, background jobs, webhooks, or admin/support tooling.
 - A change affects credentialed EventSource/SSE streams, WebTransport sessions, WebSocket fallback, signed delivery URLs, private file delivery, CDN/proxy cache keys, CORS credentials, cookies, or auth tokens embedded in delivery URLs.
@@ -53,21 +53,29 @@ Authentication answers who the requester is. Authorization answers what that pri
 <!-- mustflow-section: required-inputs -->
 ## Required Inputs
-- Changed files, user goal, affected actors, protected resources, actions, tenants, roles, and status-code expectations.
+- Changed files, user goal, affected actors, protected resources, actions, tenants, organizations, teams, roles, and status-code expectations.
 - Permission decision tuple for each changed protected action: subject, action, object, tenant or
   organization, relationship path, request environment, policy version, data revision, token issue
   time, and final allow or deny reason.
 - Effective-permission evidence, not only role names: computed capabilities, inherited
   relationships, explicit denies, wildcard policies, policy-combination rules, and default-deny
   behavior.
-- Auth middleware, framework hooks, gateway checks, session config, cookie config, JWT verifier, OAuth/OIDC callback, API key verifier, and logout or revocation code when relevant.
+- Attribute and relationship evidence for ABAC or relationship-based policy: attribute source of
+  truth, freshness, trust boundary, failure behavior, policy version, and explainable deny reason.
+- Auth middleware, framework hooks, gateway checks, session config, cookie config, JWT verifier,
+  refresh-token store, OAuth/OIDC callback, MFA or passkey verifier, API key verifier, and logout
+  or revocation code when relevant.
 - Route guards, client guards, server controllers, resolvers, command handlers, services, policy functions, role or permission tables, database queries, RLS, views, stored procedures, and ORM scopes.
-- Tenant, organization, workspace, project, membership, invite, suspension, ownership, sharing, and admin-support data models.
+- Tenant, organization, workspace, project, team, membership, invite, suspension, ownership,
+  last-owner, sharing, account-recovery, and admin-support data models.
 - Background jobs, queue payloads, webhooks, import/export flows, search or autocomplete indexes, signed URL generation, storage keys, SSE or streaming channels, WebTransport sessions, WebSocket fallback, CDN cache, proxy cache, and permission caches when they can expose protected data.
 - Audit logs, admin action logs, impersonation records, denied-access logs, API docs, role matrix docs, migrations, and tests.
 - Revocation and cache evidence: token claim freshness, session lifetime, permission-cache key and
   TTL, policy replication delay, search-index or export lag, stale queue payloads, and rollback or
   shadow-evaluation behavior when policy changes.
+- Account-takeover and recovery evidence: login throttling, credential-stuffing controls, password
+  reset, magic-link, MFA reset, email change, trusted-device, session-kill, notification, and
+  high-risk-action hold behavior.
 - Configured verification intents.
 <!-- mustflow-section: preconditions -->
@@ -76,6 +84,9 @@ Authentication answers who the requester is. Authorization answers what that pri
 - Classify the change as authentication, authorization, or both before editing.
 - Identify principal, tenant, resource, action, and context for each protected operation.
 - Treat client-provided `userId`, `tenantId`, `workspaceId`, `role`, `isAdmin`, object id, API key label, token claim, and local storage value as untrusted.
+- For browser web apps, identify whether auth state is server-held, BFF-mediated, or token-held by
+  browser code. Treat local storage, session storage, URL tokens, and JavaScript-readable long-lived
+  tokens as exposure risks, not neutral implementation details.
 - Find the current policy source of truth. If authorization is scattered across routes, do not add another scattered condition without first considering a central policy function.
 - Know whether the product intentionally hides resource existence with 404 or exposes permission denial with 403.
 - If SSE, WebTransport, WebSocket fallback, signed URL, CORS, cookie, CDN cache, proxy cache, or streaming delivery behavior changes, use `http-delivery-streaming` for the transport contract while this skill checks access control.
@@ -92,12 +103,16 @@ Authentication answers who the requester is. Authorization answers what that pri
 ## Procedure
 1. Classify the boundary:
-   - authentication: verifies a principal from a session, JWT, OAuth/OIDC account, API key, service account, or anonymous state;
+   - authentication: verifies a principal from a session, cookie, JWT, refresh token, OAuth/OIDC
+     account, passkey, MFA event, API key, service account, reset token, magic link, or anonymous
+     state;
    - authorization: decides whether that principal can perform an action on a resource within a tenant and context.
 2. Read the mandatory surfaces that apply:
-   - auth middleware, hooks, gateway, session store, cookies, JWT, OAuth/OIDC, API key verification, logout, revocation, and token rotation;
+   - auth middleware, hooks, gateway, session store, cookies, JWT, refresh tokens, OAuth/OIDC,
+     passkeys, MFA, API key verification, logout, revocation, and token rotation;
    - route guards, client redirects, server controllers, resolvers, services, command handlers, policy calls, and admin tools;
-   - tenant resolver, membership schema, role assignment, invite flow, suspension, ownership, sharing, and impersonation model;
+   - tenant resolver, membership schema, role assignment, invite flow, suspension, ownership,
+     last-owner, sharing, account-recovery, and impersonation model;
    - database queries, tenant scopes, ownership joins, soft-delete filters, RLS, views, stored procedures, and ORM helpers;
    - background jobs, queues, webhooks, import/export, file storage, signed URLs, SSE or streaming channels, WebTransport sessions, WebSocket fallback, search, autocomplete, caches, CDN or proxy cache rules, audit logs, docs, migrations, and tests.
 3. Write the permission decision inputs for each protected action: principal, tenant, resource, action, and context.
@@ -118,41 +133,75 @@ Authentication answers who the requester is. Authorization answers what that pri
 9. Define policy-combination semantics. Check whether multiple policies combine by union,
    intersection, priority, explicit deny, boundary policy, or tenant override; write tests for
    allow-plus-deny and no-match cases.
-10. Validate every request. Do not rely on login-time checks, client guards, disabled buttons, hidden menus, generated types, OpenAPI docs, or mobile local checks.
-11. Load resources safely before final authorization:
+10. Check ABAC and relationship-policy inputs.
+    - Subject attributes, object attributes, device posture, employment status, department,
+      resource classification, IP or region, and time windows are only useful when their source of
+      truth, freshness, tamper resistance, and unknown-value behavior are explicit.
+    - Attribute fetch failure, policy-store timeout, stale policy bundle, or missing relationship
+      data should deny or degrade to a documented low-risk mode, not silently allow.
+11. Validate every request. Do not rely on login-time checks, client guards, disabled buttons, hidden menus, generated types, OpenAPI docs, or mobile local checks.
+12. Load resources safely before final authorization:
    - include tenant, membership, owner, sharing, and soft-delete constraints in the resource lookup when possible;
    - when existence must be hidden, keep wrong-tenant and missing-resource behavior consistent with the project's 404 policy.
-12. Check multi-tenant risks:
+13. Check multi-tenant and organization/team risks:
    - body, query, header, path, JWT claim, or local storage tenant ids must not become trusted tenant context;
    - tenant-scoped queries must include tenant or membership constraints;
    - pending, suspended, removed, revoked, deleted, disabled, and invited states must not be treated as active access;
+   - user-level roles must not stand in for organization, team, project, billing, security, or
+     support scope;
+   - invitations need target email, target organization, target role, single-use, expiration, and
+     accepter-identity binding;
+   - owner transfer, owner removal, organization deletion, and account deletion must preserve a
+     last-owner or break-glass policy with audit and time bounds.
    - shared links, exports, signed URLs, previews, search, cache, and CDN entries must stay inside the permission model.
    - event streams, WebTransport sessions, WebSocket fallback channels, private downloads, and reconnect URLs must not bypass tenant, resource, role, token expiry, or revocation policy.
-13. Check session, JWT, OAuth/OIDC, and API key contracts when touched:
-   - sessions need expiry, refresh, rotation, logout, revocation, cookie flags, and CSRF posture;
+14. Check session, browser-token, JWT, refresh-token, OAuth/OIDC, MFA, recovery, and API-key
+    contracts when touched:
+   - browser web apps should prefer server sessions or BFF-mediated tokens with `HttpOnly`,
+     `Secure`, appropriately scoped `SameSite`, and preferably host-only cookies; broad domain
+     cookies and JavaScript-readable long-lived tokens need an explicit risk decision.
+   - sessions need idle expiry, absolute expiry, renewal, session-id rotation, logout, all-device
+     revocation, server-side invalidation, cookie flags, and CSRF posture;
+   - refresh tokens need hash storage, family or device binding, rotation, reuse detection,
+     single-flight or short grace handling for multi-tab and retry races, and revocation triggers;
    - JWTs need signature verification, algorithm allowlist, issuer, audience, subject, expiry, not-before, key rotation, and stale-claim handling;
    - OAuth/OIDC needs exact redirect binding, state, nonce, PKCE when relevant, provider account binding, and safe account linking;
+   - password reset, magic-link, email change, and account recovery tokens need purpose binding,
+     single use, short expiry, safe storage, enumeration-safe responses, and post-success session
+     invalidation;
+   - MFA and passkey flows need step-up policy, recovery-code storage, reset policy, phishing
+     resistance expectations, trusted-device boundaries, and failure notifications;
    - API keys need hashing, prefix-only display, owner type, scope, tenant/resource constraints, expiry, rotation, revocation, last-used, rate limit, and audit.
-14. Check permission creators more strictly than ordinary permissions.
+15. Check permission creators more strictly than ordinary permissions.
     - Role creation, role assignment, invitation, service-account issuance, API-key creation,
-      impersonation, support access, and token minting are privilege factories.
+      impersonation, support access, MFA reset, account recovery, email change, ownership transfer,
+      and token minting are privilege factories.
     - A low-privilege actor that can attach a high privilege to itself collapses the whole model.
-15. Check revocation time.
+16. Check revocation time.
     - Demotion, removal, suspension, role deletion, membership expiry, subscription cancellation,
-      and ownership transfer should say how long old sessions, JWT claims, caches, search indexes,
-      queued jobs, and replicas can keep authorizing the old state.
+      ownership transfer, password change, MFA change, email change, account recovery, refresh-token
+      reuse, suspected account takeover, and support access expiry should say how long old sessions,
+      refresh tokens, JWT claims, caches, search indexes, queued jobs, signed URLs, and replicas can
+      keep authorizing the old state.
     - Sensitive actions need server-side recheck, short-lived tokens, revocation lists, or policy
       version gates when stale tokens would be harmful.
-16. Check dependent surfaces: API routes, controllers, services, DB schema, DB queries, RLS, UI navigation, UI actions, API clients, audit logs, notifications, jobs, webhooks, search, file storage, docs, migrations, monitoring, and tests.
+17. Check account-takeover response paths.
+    - Login, signup, password reset, magic link, OTP, MFA, invite acceptance, email verification,
+      API-key creation, and support impersonation need rate limits, account/IP/device dimensions,
+      enumeration-safe responses, audit events, and notification policy.
+    - High-risk sequences such as new-device login followed by MFA reset, email change, API-key
+      creation, organization invite, export, billing change, or ownership transfer should trigger
+      step-up, hold, read-only quarantine, or session revocation according to product risk.
+18. Check dependent surfaces: API routes, controllers, services, DB schema, DB queries, RLS, UI navigation, UI actions, API clients, audit logs, notifications, jobs, webhooks, search, file storage, docs, migrations, monitoring, and tests.
     - For credentialed delivery surfaces, check whether EventSource can supply the intended credentials, whether CORS and cookies match the policy, whether signed URLs expire and scope correctly, and whether caches vary on auth, tenant, and private response dimensions.
-17. Require denial-first tests for changed protected actions when the project has a usable test surface. Cover anonymous, expired, revoked, no role, wrong tenant, wrong owner, suspended or removed member, stale token, stale cache, unknown role, unknown action, wildcard policy, explicit deny, shared-link, read-only API key, org admin, global admin, and impersonating admin cases as applicable.
-18. When changing policies, consider shadow evaluation.
+19. Require denial-first tests for changed protected actions when the project has a usable test surface. Cover anonymous, expired, revoked, no role, wrong tenant, wrong team, wrong owner, suspended or removed member, stale token, refresh-token reuse, stale cache, unknown role, unknown action, wildcard policy, explicit deny, shared-link, read-only API key, org admin, team admin, billing admin, global admin, support user, and impersonating admin cases as applicable.
+20. When changing policies, consider shadow evaluation.
     - Compute old and new decisions side by side for representative requests before flipping broad
       policy changes when the product has the infrastructure to do so.
     - Log policy version and data revision for both decisions without exposing secrets or sensitive
       object contents.
-19. Update docs and role matrices when external behavior, status codes, role names, permission names, admin scope, or API errors change.
-20. Report the policy source of truth, effective-permission evidence, decision explanation, server/database enforcement, client UX-only guards, revocation window, test coverage, skipped checks, and remaining permission risk.
+21. Update docs and role matrices when external behavior, status codes, role names, permission names, admin scope, or API errors change.
+22. Report the policy source of truth, effective-permission evidence, decision explanation, server/database enforcement, client UX-only guards, revocation window, account-takeover response, test coverage, skipped checks, and remaining permission risk.
 ## Boundary Rules
@@ -161,8 +210,15 @@ Authentication answers who the requester is. Authorization answers what that pri
 - Database RLS, tenant-scoped queries, views, and stored procedures are defense-in-depth and may be the strongest final boundary, but they do not excuse unclear application policy.
 - Background jobs, webhooks, imports, exports, and cron paths need actor or service-principal context. User-request paths are not the only permission paths.
 - Admin is scoped. Global admin, org admin, project admin, billing admin, support, and impersonating admin are different roles and need different audit and action rules.
+- Organization and team membership are authorization relationships, not user attributes. A user can
+  be owner in one organization, viewer in another, removed from a team, and still retain a direct
+  resource grant unless the policy says otherwise.
 - Owner is not a wildcard permission. Owners may still lack delete, export, invite, transfer, billing, or admin powers.
 - API keys are principals with scopes and owners, not user sessions with unlimited power.
+- ABAC inputs are security dependencies. Unknown, stale, or client-controlled attributes should not
+  become allow decisions.
+- Account recovery, MFA reset, email change, and invite acceptance are login-equivalent or
+  privilege-factory flows.
 - Wildcard permissions are future permissions. Adding a new action under an old wildcard should be
   treated as a permission expansion and reviewed accordingly.
 - Unknown roles, unknown actions, malformed identifiers, and missing relationship data deny by
@@ -177,8 +233,15 @@ Authentication answers who the requester is. Authorization answers what that pri
 - Treating membership row existence as active membership without checking status.
 - Treating OAuth provider scopes as internal app permissions.
 - Treating JWT role claims as fresh authorization after role changes.
+- Storing long-lived browser auth tokens in local storage or URLs without an explicit browser-threat
+  model and revocation plan.
+- Treating refresh tokens as harmless because access tokens are short-lived.
+- Accepting OAuth account links by email alone instead of provider issuer and subject plus
+  reauthentication.
 - Treating API keys as normal user sessions.
 - Mixing org admin, global admin, support admin, and impersonation into one `admin` check.
+- Treating team membership, invitation, account recovery, or last-owner handling as UI workflow
+  only.
 - Changing 403 to 404, or 404 to 403, without naming the information-disclosure policy.
 - Relaxing permissions without denial-case tests or a written migration and audit plan.
 - Letting role changes ship without permission-cache or token-staleness handling.
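The refresh-token items in this hunk and in the procedure (hash storage, family binding, rotation, reuse detection) can be made concrete with a minimal sketch. It is not code from the package. `RefreshTokenStore`, `TokenFamily`, and the in-memory dictionaries are hypothetical stand-ins for a real persistent store, and the sketch deliberately omits the single-flight or short grace handling for multi-tab and retry races that the skill also asks for.

```python
import hashlib
import secrets
from dataclasses import dataclass
from typing import Optional


def _digest(token: str) -> str:
    # Tokens are high-entropy random strings, so a plain SHA-256 digest is
    # assumed to be acceptable for at-rest storage in this sketch.
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


@dataclass
class TokenFamily:
    user_id: str
    current_hash: str
    revoked: bool = False


class RefreshTokenStore:
    """In-memory illustration only; names and layout are assumptions."""

    def __init__(self) -> None:
        self._families: dict[str, TokenFamily] = {}
        self._family_by_hash: dict[str, str] = {}  # token hash -> family id

    def issue(self, user_id: str) -> str:
        """Start a new token family and return the raw token once."""
        family_id = secrets.token_hex(8)
        token = secrets.token_urlsafe(32)
        token_hash = _digest(token)
        self._families[family_id] = TokenFamily(user_id, token_hash)
        self._family_by_hash[token_hash] = family_id
        return token

    def rotate(self, presented: str) -> Optional[str]:
        """Exchange the current token for a new one, or deny with None."""
        presented_hash = _digest(presented)
        family_id = self._family_by_hash.get(presented_hash)
        if family_id is None:
            return None  # unknown token: deny by default
        family = self._families[family_id]
        if family.revoked:
            return None
        if presented_hash != family.current_hash:
            # An already-rotated token came back: treat it as a theft
            # signal and revoke the whole family.
            family.revoked = True
            return None
        new_token = secrets.token_urlsafe(32)
        new_hash = _digest(new_token)
        family.current_hash = new_hash
        self._family_by_hash[new_hash] = family_id
        return new_token
```

In this sketch, replaying an old token revokes the legitimate holder's current token as well, which is the intended trade-off: the store cannot tell which party is the attacker, so both must reauthenticate.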
@@ -195,11 +258,11 @@ Authentication answers who the requester is. Authorization answers what that pri
 - Authentication and authorization are separated in code and report language.
 - Every changed protected action has a server-side or database-side permission boundary.
-- Tenant isolation, resource ownership, sharing, admin scope, and status-code behavior are explicit.
+- Tenant isolation, organization or team membership, resource ownership, sharing, admin scope, and status-code behavior are explicit.
 - Effective permissions, policy-combination rules, decision explanation, policy version, data
   revision, and revocation window are explicit when relevant.
 - Client guards are described as UX only.
-- Session, token, OAuth/OIDC, API key, cache, audit, docs, migration, and tests are synchronized when touched.
+- Session, cookie, browser-token, refresh-token, OAuth/OIDC, MFA, passkey, recovery, API key, cache, audit, docs, migration, and tests are synchronized when touched.
 - Signed URL, event-stream, WebTransport, WebSocket fallback, CORS, cookie, CDN/proxy cache, and reconnect behavior remains inside the permission model when touched.
 <!-- mustflow-section: verification -->
@@ -217,7 +280,7 @@ Use configured oneshot command intents when available:
 - `test_release`
 - `mustflow_check`
-Prefer the narrowest configured test intent that covers the changed protected actions and denial cases. Report missing auth-specific policy tests, tenant-isolation tests, token/session tests, API-key tests, cache-staleness tests, audit-log checks, docs validation, or database-policy verification when relevant.
+Prefer the narrowest configured test intent that covers the changed protected actions and denial cases. Report missing auth-specific policy tests, tenant-isolation tests, organization/team permission tests, token/session/refresh-token tests, account-recovery tests, API-key tests, cache-staleness tests, audit-log checks, docs validation, or database-policy verification when relevant.
 <!-- mustflow-section: failure-handling -->
 ## Failure Handling
@@ -238,7 +301,7 @@ Prefer the narrowest configured test intent that covers the changed protected ac
 - Effective permission, policy version, data revision, decision explanation, and revocation window
 - Server/database enforcement notes
 - Client guard UX-only notes
-- Tenant, ownership, sharing, admin, token/session/API-key, cache, and audit notes
+- Tenant, organization/team, ownership, sharing, admin, token/session/refresh-token/MFA/API-key, account-recovery, cache, and audit notes
 - Event stream, WebTransport, WebSocket fallback, signed URL, CORS, cookie, and CDN/proxy cache notes when relevant
 - Tests or denial cases covered
 - Files changed