npm - mustflow - Versions diffs - 2.85.4 → 2.99.0 - Mend

mustflow 2.85.4 → 2.99.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (78)

package/templates/default/locales/en/.mustflow/skills/payment-integrity-review/SKILL.md CHANGED

@@ -2,7 +2,7 @@
 mustflow_doc: skill.payment-integrity-review
 locale: en
 canonical: true
-revision: 1
+revision: 2
 lifecycle: mustflow-owned
 authority: procedure
 name: payment-integrity-review
@@ -53,14 +53,23 @@ Review payment code as money-event integrity, not provider API success. The core
 ## Required Inputs
 - Money-event ledger: every create, authorize, capture, fulfill, refund, dispute, chargeback, settlement, adjustment, cancellation, expiration, and entitlement event that can move money or access.
-- Provider interaction ledger: payment provider calls, webhook event types, redirect handlers, polling, reconciliation jobs, SDK clients, idempotency keys, provider object IDs, and provider environment selection.
-- State-transition ledger: internal states, provider states, allowed transitions, terminal states, retry states, async states, and transition owners.
+- Provider interaction ledger: payment provider calls, webhook event types, redirect handlers,
+  polling, reconciliation jobs, SDK clients, idempotency keys, internal order IDs, internal payment
+  IDs, attempt IDs, provider object IDs, provider reference IDs, and provider environment selection.
+- State-transition ledger: internal states, provider states, allowed transitions, terminal states,
+  retry states, async states, hold states, kill-switch states, and transition owners.
+- Event log ledger: request submission, provider response, redirect, webhook receipt, webhook
+  application, state transition, queue handoff, reconciliation decision, fulfillment, refund,
+  dispute, admin override, and correction events with ordering, actor, reason, and immutable
+  evidence.
 - Idempotency and uniqueness ledger: logical operation IDs, provider idempotency keys, database uniqueness constraints, webhook event dedupe keys, fulfillment dedupe keys, and retry behavior.
 - Amount and currency ledger: product/cart snapshot, server-side calculation path, quantity, discounts, coupons, tax, shipping, minor-unit representation, currency, provider amount, internal ledger amount, receipt amount, and settlement amount.
 - Ownership ledger: user, tenant, account, order, payment session, refund, subscription, invoice, entitlement, admin actor, and provider customer ownership checks.
 - Fulfillment and entitlement ledger: when access, inventory, shipment, credits, licenses, notifications, or downstream side effects are granted or revoked.
 - Webhook and retry ledger: signature verification, raw-body handling, event storage, queue handoff, duplicate and out-of-order handling, timeout classification, backoff, and dead-letter behavior.
-- Audit and sensitive-data ledger: logs, metrics, traces, admin overrides, before/after values, reason fields, approval paths, rollback paths, and payment-sensitive data redaction.
+- Audit and sensitive-data ledger: logs, metrics, traces, segmented approval or decline metrics,
+  payment-path segments, orphan authorization monitors, admin overrides, before/after values, reason
+  fields, approval paths, rollback paths, and payment-sensitive data redaction.
 <!-- mustflow-section: preconditions -->
 ## Preconditions
@@ -80,40 +89,55 @@ Review payment code as money-event integrity, not provider API success. The core
 ## Procedure
 1. Model payment as a state machine. Reject a single `paid`, `success`, or `active` boolean when the code must distinguish created, pending, authorized, captured, failed, cancelled, expired, refunded, partially refunded, disputed, unpaid, retrying, grace, or settled states.
-2. Calculate amount on the server. Treat client-supplied amount, currency, quantity, discount, coupon, tax, shipping, product ID, plan ID, or cart totals as input claims only; rebuild the payable total from trusted product, cart, account, and policy snapshots.
-3. Bind every payment object to its owner. Verify user, tenant, order, payment session, refund, subscription, invoice, provider customer, and admin actor ownership before read, write, refund, cancel, fulfillment, or entitlement changes.
-4. Compare every amount ledger. Trace order amount, provider request amount, provider response amount, internal money ledger, receipt, settlement, fee, refund, and entitlement amount. Flag any path where one amount can drift without reconciliation.
-5. Use integer minor units. Reject float, double, string-concatenated, rounded-late, locale-formatted, or JavaScript-number money math when it can cross currency or precision boundaries.
-6. Make payment creation idempotent. Use a stable key for one logical payment attempt, not a fresh UUID per retry. Include operation identity such as order or attempt ID, but do not include secrets or raw personal data.
-7. Use database uniqueness as the last gate. Add or verify unique constraints for provider payment IDs, provider session IDs, provider event IDs, provider refund IDs, internal ledger IDs, and fulfillment records where duplicates would move money or access twice.
-8. Assume webhooks are duplicated. Store event IDs or object/type pairs before applying effects, make handlers idempotent, and treat duplicate delivery as expected behavior.
-9. Assume webhooks are out of order. Do not let a late `created`, `pending`, or stale failure event overwrite a captured, refunded, disputed, or terminal internal state. Re-fetch provider state when event order is insufficient.
-10. Verify webhook signatures on the raw body. Check signatures before JSON mutation, parsing wrappers, body normalizers, or middleware that changes bytes. Do not keep a debug path that disables signature verification.
-11. Return from webhook endpoints quickly. Persist the event, enqueue durable work, and return a provider-acceptable response without doing slow fulfillment, network fan-out, file work, or long transactions in the webhook request.
-12. Never use success redirects as proof. Treat checkout success pages, return URLs, frontend callbacks, and local storage flags as user navigation only; fulfillment must depend on verified provider state or signed server-side evidence.
-13. Run fulfillment exactly once. Guard entitlement grants, shipments, credit issuance, license creation, invoice finalization, emails with money meaning, and inventory release with unique records or state transitions.
-14. Handle asynchronous payment methods. Do not fulfill on checkout completion when the provider can still move through pending, requires_action, processing, delayed success, delayed failure, or expiry states.
-15. Separate authorization from capture. Do not treat an authorization hold as captured money. Review capture windows, partial captures, expired authorizations, cancellations, and post-authorization amount changes.
-16. Review refunds as money-out events. Check partial refunds, double refunds, refund failures, refund idempotency, refund ownership, refund amount/currency, ledger reversal, entitlement revocation, and receipt updates.
-17. Handle disputes and chargebacks. Ensure dispute events affect access, account risk, support workflow, ledger entries, settlement reports, and customer-visible state without pretending the original capture still stands unchanged.
-18. Review subscriptions as state machines. Separate trialing, active, past_due, grace, unpaid, cancelled, pending cancellation, retrying, upgraded, downgraded, invoice-open, invoice-paid, and invoice-failed states.
-19. Reserve inventory before confirming it. Check that payment, inventory, cancellation, expiration, refund, and fulfillment cannot oversell, lose stock, or keep stock reserved after an abandoned payment.
-20. Reserve coupons before consuming them. Under concurrent attempts, a coupon should not be spent twice or lost forever after a failed or expired payment. Review reservation, consumption, release, and expiry paths.
-21. Treat timeouts as unknown outcomes. A provider timeout after request submission is not proof of failure. Verify by idempotency key, provider object lookup, webhook, or reconciliation before retrying or cancelling.
-22. Classify retries by failure kind. Separate retryable network failures, provider rate limits, validation failures, insufficient funds, authentication failures, duplicate operation responses, and unknown outcomes with bounded backoff.
-23. Keep an append-only money ledger. Prefer immutable entries for payment, capture, refund, fee, settlement, chargeback, adjustment, and correction. Flag mutable balance-only code with no event history.
-24. Reconcile provider and internal state. Check scheduled or manual reconciliation for missed webhooks, stale internal states, provider-side refunds, settlement fees, disputes, and permanently unknown operations.
-25. Redact payment-sensitive data. Never log card numbers, CVV, track data, PINs, raw payment credentials, webhook secrets, bearer tokens, provider secret keys, or full provider payloads containing sensitive fields.
-26. Separate test and live payment planes. Verify API keys, webhook secrets, product IDs, price IDs, environment flags, provider account IDs, and fixtures cannot cross between test and live modes.
-27. Audit manual payment operations. Require role, reason, target object, before/after values, approver or policy evidence, operator identity, timestamp, and rollback or correction path for admin overrides.
-28. Search for stale payment endpoints. Review old checkout paths, hidden callback URLs, deprecated provider versions, old mobile endpoints, webhook v1 handlers, and manual scripts that still mutate money state.
-29. Check public errors and support evidence. Payment failures must not lie about success, leak sensitive payment facts, or leave support with no safe correlation ID, provider object ID, or internal event ID.
-30. Test the nightmare paths. Include repeated pay-button clicks, replayed webhooks, out-of-order webhooks, success redirect plus database failure, database success plus provider timeout, amount or currency tampering, wrong order ID, concurrent double refund, pay then cancel, expired-session completion, subscription retry, provider missed webhook, and admin override rollback.
+2. Separate the identifiers. Do not let one `order_id` or provider session ID stand in for every
+   concept. Track order ID, payment ID, attempt ID, provider customer ID, provider payment/session
+   ID, provider refund ID, provider event ID, and internal ledger entry ID separately so retries,
+   provider redirects, webhooks, refunds, and reconciliation do not overwrite each other.
+3. Keep an immutable event trail. Store request submission, provider response, redirect, webhook,
+   state transition, queue handoff, reconciliation, fulfillment, refund, dispute, and admin override
+   events with actor, reason, timestamp, provider reference, and before/after state when relevant.
+4. Calculate amount on the server. Treat client-supplied amount, currency, quantity, discount, coupon, tax, shipping, product ID, plan ID, or cart totals as input claims only; rebuild the payable total from trusted product, cart, account, and policy snapshots.
+5. Bind every payment object to its owner. Verify user, tenant, order, payment session, refund, subscription, invoice, provider customer, and admin actor ownership before read, write, refund, cancel, fulfillment, or entitlement changes.
+6. Compare every amount ledger. Trace order amount, provider request amount, provider response amount, internal money ledger, receipt, settlement, fee, refund, and entitlement amount. Flag any path where one amount can drift without reconciliation.
+7. Use integer minor units. Reject float, double, string-concatenated, rounded-late, locale-formatted, or JavaScript-number money math when it can cross currency or precision boundaries.
+8. Make payment creation idempotent. Use a stable key for one logical payment attempt, not a fresh UUID per retry. Include operation identity such as order or attempt ID, but do not include secrets or raw personal data.
+9. Use database uniqueness as the last gate. Add or verify unique constraints for provider payment IDs, provider session IDs, provider event IDs, provider refund IDs, internal ledger IDs, and fulfillment records where duplicates would move money or access twice.
+10. Assume webhooks are duplicated. Store event IDs or object/type pairs before applying effects, make handlers idempotent, and treat duplicate delivery as expected behavior.
+11. Assume webhooks are out of order. Do not let a late `created`, `pending`, or stale failure event overwrite a captured, refunded, disputed, or terminal internal state. Re-fetch provider state when event order is insufficient.
+12. Verify webhook signatures on the raw body. Check signatures before JSON mutation, parsing wrappers, body normalizers, or middleware that changes bytes. Do not keep a debug path that disables signature verification.
+13. Return from webhook endpoints quickly. Persist the event, enqueue durable work, and return a provider-acceptable response without doing slow fulfillment, network fan-out, file work, or long transactions in the webhook request.
+14. Never use success redirects as proof. Treat checkout success pages, return URLs, frontend callbacks, and local storage flags as user navigation only; fulfillment must depend on verified provider state or signed server-side evidence.
+15. Run fulfillment exactly once. Guard entitlement grants, shipments, credit issuance, license creation, invoice finalization, emails with money meaning, and inventory release with unique records or state transitions.
+16. Handle asynchronous payment methods. Do not fulfill on checkout completion when the provider can still move through pending, requires_action, processing, delayed success, delayed failure, or expiry states.
+17. Separate authorization from capture. Do not treat an authorization hold as captured money. Review capture windows, partial captures, expired authorizations, cancellations, orphan authorized-but-not-captured operations, and post-authorization amount changes.
+18. Review refunds as money-out events. Check requested, pending, completed, failed, cancelled, and partial refund states; double refunds; refund failures; refund idempotency; refund ownership; refund amount/currency; ledger reversal; entitlement revocation; and receipt updates.
+19. Handle disputes and chargebacks. Ensure dispute events affect access, account risk, support workflow, ledger entries, settlement reports, and customer-visible state without pretending the original capture still stands unchanged.
+20. Review subscriptions as state machines. Separate trialing, active, past_due, grace, unpaid, cancelled, pending cancellation, retrying, upgraded, downgraded, invoice-open, invoice-paid, and invoice-failed states.
+21. Reserve inventory before confirming it. Check that payment, inventory, cancellation, expiration, refund, and fulfillment cannot oversell, lose stock, or keep stock reserved after an abandoned payment.
+22. Reserve coupons before consuming them. Under concurrent attempts, a coupon should not be spent twice or lost forever after a failed or expired payment. Review reservation, consumption, release, and expiry paths.
+23. Treat timeouts as unknown outcomes. A provider timeout after request submission is not proof of failure. Verify by idempotency key, provider object lookup, webhook, or reconciliation before retrying or cancelling.
+24. Classify retries by failure kind. Separate retryable network failures, provider rate limits, validation failures, authentication-required states, insufficient funds, issuer declines, suspected fraud, duplicate operation responses, and unknown outcomes, and retry only the retryable kinds with bounded backoff.
+25. Segment the payment path. When diagnosing approval rate or decline spikes, separate frontend validation, backend request creation, provider gateway, acquirer, card network, issuer, bank, 3DS or additional authentication, and settlement evidence instead of reading one blended failure count.
+26. Keep an append-only money ledger. Prefer immutable entries for payment, capture, refund, fee, settlement, chargeback, adjustment, and correction. Flag mutable balance-only code with no event history.
+27. Reconcile provider and internal state. Check scheduled or manual reconciliation for missed webhooks, stale internal states, provider-side refunds, settlement fees, disputes, orphan authorizations, and permanently unknown operations.
+28. Redact payment-sensitive data. Never log card numbers, CVV, track data, PINs, raw payment credentials, webhook secrets, bearer tokens, provider secret keys, or full provider payloads containing sensitive fields.
+29. Separate test and live payment planes. Verify API keys, webhook secrets, product IDs, price IDs, environment flags, provider account IDs, and fixtures cannot cross between test and live modes.
+30. Audit manual payment operations. Require role, reason, target object, before/after values, approver or policy evidence, operator identity, timestamp, and rollback or correction path for admin overrides.
+31. Add a payment hold or kill-switch path for unsafe flows. Risky provider migrations, webhook
+    regressions, reconciliation uncertainty, fraud spikes, or duplicate-money incidents need a way
+    to hold fulfillment, stop captures, pause refunds, or disable a provider path without corrupting
+    ledger state.
+32. Search for stale payment endpoints. Review old checkout paths, hidden callback URLs, deprecated provider versions, old mobile endpoints, webhook v1 handlers, and manual scripts that still mutate money state.
+33. Check public errors and support evidence. Payment failures must not lie about success, leak sensitive payment facts, or leave support with no safe correlation ID, provider object ID, or internal event ID.
+34. Test the nightmare paths. Include repeated pay-button clicks, replayed webhooks, out-of-order webhooks, success redirect plus database failure, database success plus provider timeout, amount or currency tampering, wrong order ID, concurrent double refund, pay then cancel, expired-session completion, subscription retry, provider missed webhook, orphan authorization cleanup, provider kill switch or hold state, and admin override rollback.
 <!-- mustflow-section: postconditions -->
 ## Postconditions
-- The payment surface has a money-event map, provider interaction map, state-transition map, idempotency and uniqueness map, amount and currency map, ownership map, fulfillment and entitlement map, webhook/retry map, and audit/sensitive-data map.
+- The payment surface has a money-event map, provider interaction map, identifier map,
+  state-transition map, immutable event log, idempotency and uniqueness map, amount and currency map,
+  ownership map, fulfillment and entitlement map, webhook/retry map, reconciliation and hold-state
+  map, and audit/sensitive-data map.
 - Any false success, duplicate money movement, duplicate fulfillment, wrong-owner action, wrong amount, wrong currency, stale event overwrite, timeout misclassification, or missing reconciliation is fixed or reported with evidence.
 - Tests or explicit verification cover the highest-risk nightmare paths available in the current scope.
@@ -148,7 +172,7 @@ Prefer focused tests for duplicate operations, webhook replay, out-of-order even
 ## Output Format
 - Payment surface and provider boundary reviewed
-- Money-event, provider, state, idempotency, amount, ownership, fulfillment, webhook, retry, audit, and sensitive-data ledgers
+- Money-event, provider, identifier, state, event-log, idempotency, amount, ownership, fulfillment, webhook, retry, reconciliation, hold-state, audit, and sensitive-data ledgers
 - Findings or fixes for duplicate, late, out-of-order, wrong-actor, wrong-amount, wrong-currency, timeout, retry, reconciliation, and audit risks
 - Nightmare-path tests or evidence added, run, skipped, or still missing
 - Command intents run
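
The idempotency, duplicate-webhook, and out-of-order-webhook steps in the payment-integrity procedure above can be sketched roughly as follows. This is a minimal illustration, not code from the mustflow package: `idempotency_key`, `Payment`, `apply_webhook`, and the linear `STATE_RANK` ordering are hypothetical names and simplifications. A real system would most likely back the seen-event set with a database uniqueness constraint and use a richer transition table than a single rank.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical ranks: a higher-ranked state must never be overwritten by a
# lower-ranked one. The state names come from the procedure text; the strictly
# linear ordering is an assumption made to keep the sketch short.
STATE_RANK = {"created": 0, "pending": 1, "authorized": 2, "captured": 3, "refunded": 4}


def idempotency_key(order_id: str, attempt_id: str) -> str:
    """Derive a stable key for one logical payment attempt.

    The same order and attempt always yield the same key, so a retry reuses it
    instead of generating a fresh UUID. No secrets or personal data go in.
    """
    raw = f"payment-create:{order_id}:{attempt_id}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


@dataclass
class Payment:
    state: str = "created"
    seen_event_ids: set[str] = field(default_factory=set)

    def apply_webhook(self, event_id: str, new_state: str) -> str:
        """Apply a provider event and return 'duplicate', 'stale', or 'applied'."""
        if event_id in self.seen_event_ids:
            return "duplicate"  # duplicate delivery is expected behavior
        self.seen_event_ids.add(event_id)  # record the event before applying effects
        if STATE_RANK[new_state] <= STATE_RANK[self.state]:
            return "stale"  # a late event must not overwrite a later state
        self.state = new_state
        return "applied"
```

With this sketch, a `captured` event followed by a late `pending` event leaves the payment in `captured`, and replaying the `captured` event is reported as a duplicate rather than applied twice.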

package/templates/default/locales/en/.mustflow/skills/powershell-code-change/SKILL.md CHANGED

@@ -2,7 +2,7 @@
 mustflow_doc: skill.powershell-code-change
 locale: en
 canonical: true
-revision: 1
+revision: 2
 lifecycle: mustflow-owned
 authority: procedure
 name: powershell-code-change
@@ -29,7 +29,7 @@ metadata:
 <!-- mustflow-section: purpose -->
 ## Purpose
-Preserve PowerShell parsing, quoting, native argument passing, script portability, and command-injection boundaries.
+Preserve PowerShell parsing, quoting, native argument passing, deterministic file rewrites, script portability, and command-injection boundaries.
 PowerShell quoting bugs usually come from parser layering, not from one wrong quote character. A command may be parsed by the host shell, PowerShell expression or argument mode, PowerShell string expansion, and then a native program parser such as `git.exe`, `cmd.exe`, `ssh`, `curl`, `python`, or `node`.
@@ -40,6 +40,7 @@ PowerShell quoting bugs usually come from parser layering, not from one wrong qu
 - PowerShell strings, here-strings, interpolation, splatting, parameter binding, call operator usage, `Start-Process`, `Invoke-Expression`, `--`, `--%`, `-Command`, `-File`, `-EncodedCommand`, or stdin execution changes.
 - PowerShell code calls native commands such as `git.exe`, `cmd.exe`, `.bat`, `.cmd`, `ssh`, `curl`, `python`, `node`, `npm`, `bun`, `docker`, `winget`, `msiexec`, or vendor CLIs.
 - Regex, wildcard, replacement strings, JSON, YAML, XML, SQL, paths with spaces, literal `$`, literal quotes, or shell metacharacters are passed through PowerShell.
+- PowerShell is used for mechanical repository file rewrites, text replacement, generated-file updates, or line-ending-sensitive edits.
 <!-- mustflow-section: do-not-use-when -->
 ## Do Not Use When
@@ -56,6 +57,7 @@ PowerShell quoting bugs usually come from parser layering, not from one wrong qu
 - Invocation path: direct script, module import, profile load, package script, CI step, scheduled task, `pwsh -Command`, `pwsh -File`, stdin, encoded command, or another shell invoking PowerShell.
 - Parser layers involved: host shell, PowerShell expression mode, PowerShell argument mode, expandable or verbatim strings, native command parser, regex parser, wildcard parser, replacement parser, JSON/YAML/XML/SQL parser, or remote shell.
 - Native command boundary: executable path, argument list, wrapper extension, expected argv shape, whether literal quote characters are required, and whether `$PSNativeCommandArgumentPassing` affects behavior.
+- File rewrite boundary: target file policy, expected encoding, expected newline style, replacement count, whether the file may contain mixed line endings, and whether the repository declares an EOL policy.
 - Dynamic input boundaries: user input, paths, URLs, commit messages, regex patterns, replacement strings, JSON bodies, headers, credentials, environment variables, and values that may contain spaces or metacharacters.
 - Existing test, lint, docs, package, workflow, and command-intent verification surfaces.
@@ -72,9 +74,11 @@ PowerShell quoting bugs usually come from parser layering, not from one wrong qu
 - Replace string-built command lines with arrays, hashtables, splatting, direct invocation, or repository-local helpers.
 - Convert fragile multiline commands to splatting or here-strings when behavior stays equivalent.
+- Replace lossy text rewrite pipelines with deterministic read/write APIs that preserve or intentionally normalize encoding and newlines.
 - Add focused tests or fixtures that prove argv shape, parser behavior, escaping, failure paths, or documented examples.
 - Update docs, command examples, CI snippets, or package scripts directly tied to the PowerShell behavior being changed.
 - Do not add `Invoke-Expression`, broad `cmd /c`, broad `--%`, global profile mutation, policy bypasses, or command-string reconstruction to make quoting appear to work.
+- Do not use `Get-Content` piped to `Set-Content`, `Out-File`, or shell redirection for repository-wide mechanical rewrites when line endings, encoding, or BOM behavior matters.
 <!-- mustflow-section: procedure -->
 ## Procedure
@@ -106,10 +110,13 @@ PowerShell quoting bugs usually come from parser layering, not from one wrong qu
    - use single-quoted regex patterns unless PowerShell interpolation is intentional;
    - escape replacement `$` according to replacement-string rules, not only PowerShell string rules;
    - escape wildcard metacharacters for wildcard matching even inside single-quoted PowerShell strings.
-20. For cross-shell PowerShell calls, avoid complex inline `-Command` strings. Prefer `-File`, stdin, or an encoded command when the repository already uses that pattern and the encoding boundary is tested. If `-Command` is used, document the host shell and PowerShell parser layers.
-21. Keep paths as path values, not shell fragments. Prefer `-LiteralPath` when wildcard expansion is not intended, and do not compose destructive filesystem actions through a different shell.
-22. Add or reuse verification that observes behavior, not only spelling. Useful evidence includes argv echo fixtures, Pester cases, dry-run output, parser-specific tests, or configured CI/package/docs checks.
-23. Choose configured verification intents that cover the changed script, docs example, package metadata, CI wrapper, public command behavior, and mustflow contract surface.
+20. For mechanical text replacement, count expected matches before writing. If the count is not exactly the intended number, stop and report the mismatch instead of writing a broad replacement.
+21. For repository file rewrites, prefer .NET file APIs with explicit encoding over PowerShell content cmdlets when deterministic output matters. Normalize CRLF and lone CR to LF only when the repository policy expects LF, and write UTF-8 without BOM explicitly. Treat `Set-Content -Encoding utf8` as version-sensitive: Windows PowerShell 5.1 writes a BOM for `utf8`, while PowerShell 6+ writes UTF-8 without a BOM.
+22. If line-ending warnings appear after a PowerShell rewrite, do not assume the last read command caused them. Inspect repository EOL policy and per-file EOL evidence, then activate `line-ending-hygiene` for cause analysis or normalization decisions.
+23. For cross-shell PowerShell calls, avoid complex inline `-Command` strings. Prefer `-File`, stdin, or an encoded command when the repository already uses that pattern and the encoding boundary is tested. If `-Command` is used, document the host shell and PowerShell parser layers.
+24. Keep paths as path values, not shell fragments. Prefer `-LiteralPath` when wildcard expansion is not intended, and do not compose destructive filesystem actions through a different shell.
+25. Add or reuse verification that observes behavior, not only spelling. Useful evidence includes argv echo fixtures, Pester cases, dry-run output, parser-specific tests, deterministic encoding or line-ending checks, or configured CI/package/docs checks.
+26. Choose configured verification intents that cover the changed script, docs example, package metadata, CI wrapper, public command behavior, and mustflow contract surface.
 <!-- mustflow-section: postconditions -->
 ## Postconditions
@@ -117,6 +124,7 @@ PowerShell quoting bugs usually come from parser layering, not from one wrong qu
 - The parser layers and target command type are explicit.
 - Literal strings, expandable strings, here-strings, regex patterns, wildcard patterns, replacement strings, paths, and native argv are not conflated.
 - Native command calls keep executable path and arguments separated unless a documented target requires otherwise.
+- Mechanical rewrites have explicit replacement counts, encoding, and newline decisions.
 - Dynamic values remain data-bound and are not reinterpreted as shell code.
 - PowerShell version, native argument passing mode, and cross-shell boundaries are verified or reported as remaining risks.
@@ -145,6 +153,7 @@ Report missing PowerShell-version, argv-shape, Pester, CI-shell, Windows-native-
 - If regex, wildcard, or replacement escaping breaks, test the second parser explicitly instead of only changing PowerShell string quotes.
 - If cross-shell `pwsh -Command` quoting becomes complex, move the script body to `-File`, stdin, or a tested encoded boundary rather than adding another quoting layer.
 - If untrusted input is interpolated into command text, treat it as a command-injection risk and restructure around argument binding.
+- If a rewrite must preserve or normalize LF, avoid version-sensitive content cmdlets and use an explicit writer. If line-ending policy is absent, report the missing policy instead of silently normalizing.
 <!-- mustflow-section: output-format -->
 ## Output Format
@@ -152,6 +161,7 @@ Report missing PowerShell-version, argv-shape, Pester, CI-shell, Windows-native-
 - PowerShell version and invocation boundary
 - Parser ledger
 - String, here-string, regex, wildcard, replacement, and native argv decisions
+- File rewrite, encoding, and newline decisions
 - Files changed
 - Command intents run
 - Skipped checks and reasons
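
The rewrite rules added to the PowerShell skill above (count matches before writing, normalize newlines only by policy, write UTF-8 without BOM) are not specific to PowerShell. The sketch below expresses the same logic in Python rather than PowerShell, purely for illustration. The `rewrite_exact` helper is hypothetical, and it assumes the repository policy expects LF and UTF-8.

```python
from pathlib import Path


def rewrite_exact(path: Path, old: str, new: str, expected_count: int) -> None:
    """Replace `old` with `new` only if it occurs exactly `expected_count` times."""
    # Read bytes and decode explicitly so no platform default encoding or
    # newline translation applies; "utf-8-sig" also drops a leading BOM.
    text = path.read_bytes().decode("utf-8-sig")
    # Assumed policy: LF only. Normalize CRLF first, then any lone CR.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    found = text.count(old)
    if found != expected_count:
        # Stop and report the mismatch instead of writing a broad replacement.
        raise ValueError(
            f"expected {expected_count} matches, found {found}; nothing written"
        )
    # Write bytes so the output is UTF-8 without BOM and LF stays LF everywhere.
    path.write_bytes(text.replace(old, new).encode("utf-8"))
```

Reading and writing bytes keeps the encoding and newline decisions in the script instead of in runtime defaults, which is the same reason the skill prefers explicit .NET writers over content cmdlets.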

package/templates/default/locales/en/.mustflow/skills/prompt-contract-quality-review/SKILL.md CHANGED

@@ -2,7 +2,7 @@
 mustflow_doc: skill.prompt-contract-quality-review
 locale: en
 canonical: true
-revision: 1
+revision: 2
 lifecycle: mustflow-owned
 authority: procedure
 name: prompt-contract-quality-review
@@ -45,6 +45,9 @@ Review prompts as product contracts, not prose polish. A production prompt shoul
 - The change is only LLM chat, copilot, citation, streaming, history, or prompt-composer UI; use `llm-service-ux-review`.
 - The main risk is unsupported factual output, fabricated citations, weak evidence coverage, retrieval thresholds, claim maps, or abstain behavior; use `llm-hallucination-control-review`.
+- The task is an end-to-end RAG failure and it is not yet clear whether ingestion, retrieval,
+  context assembly, prompt construction, generation, citation validation, or answerability failed;
+  use `rag-pipeline-triage` first.
 - The main risk is token spend, provider prompt-cache hit rate, chat-history bloat, RAG context size, model routing cost, reasoning budget, retry replay, or cost observability; use `llm-token-cost-control-review`.
 - The main risk is time to first token, first useful output, streaming latency, LLM round trips, tool wait, prompt-cache latency, model routing speed, realtime continuation, priority tier, predicted-output latency, or user-perceived response speed; use `llm-response-latency-review`.
 - The main risk is autonomous agent control flow, planner/executor/verifier separation, tool-call gates, approval or interrupt state, durable resume behavior, loop budgets, retry classification, handoffs, guardrails, or trace outcome evaluation; use `agent-execution-control-review`.

package/templates/default/locales/en/.mustflow/skills/python-code-change/SKILL.md CHANGED

@@ -2,7 +2,7 @@
 mustflow_doc: skill.python-code-change
 locale: en
 canonical: true
-revision: 3
+revision: 4
 lifecycle: mustflow-owned
 authority: procedure
 name: python-code-change
@@ -94,12 +94,20 @@ Preserve Python runtime, standard-library, packaging, import, async resource, pu
    - prefer `importlib.resources` for packaged data, `tomllib` for TOML reads, and `Path.walk()` only after checking version support, pruning behavior, symlink recursion, ordering, and cycle risks;
    - use dataclass options such as `slots`, `frozen`, and `kw_only`, `StrEnum`, `TypedDict`, or `Protocol` only when they match the public shape and runtime/type-checker support;
    - treat `functools.cache`, `lru_cache`, `cached_property`, `partial`, and Python 3.14+ `Placeholder` as state, memory, concurrency, and versioned-API choices rather than harmless terseness.
-11. Keep process, archive, and concurrency safety explicit:
+11. Treat newer syntax and typing features as semantic tools, not style trophies:
+   - use template string literals only when a handler needs the static and interpolated parts separately, such as SQL builders, shell command objects, logging templates, or markup renderers; do not replace ordinary f-strings when the result is just a string;
+   - when runtime code reads annotations, use the supported annotation inspection API and choose the intended format explicitly instead of assuming `__annotations__` already contains runtime values;
+   - use sentinel values to distinguish "argument omitted" from `None`, but compare sentinels by identity and keep public signatures readable;
+   - prefer `Mapping` or narrower read-only protocols for read-only inputs so immutable mapping implementations are not rejected accidentally;
+   - use closed or extra-key `TypedDict` forms only when the supported Python and type-checker versions agree with that shape.
+12. Keep `finally` as cleanup, not outcome selection. Do not add `return`, `break`, or `continue` inside `finally` blocks because they can mask exceptions and cancellation; move result decisions outside cleanup or make suppression an explicit documented contract.
+13. Use explicit lazy imports only for startup-sensitive module-scope dependencies after checking version support and import-time side effects. Do not lazily import plugins, registries, monkey patches, model definitions, ORM mappings, or observability setup whose import side effects are part of startup correctness.
+14. Keep process, archive, and concurrency safety explicit:
    - subprocess calls use argument lists, checked failure handling, timeouts, bounded captured output, and a narrow `shell=True` exception when the project already permits it;
    - archive extraction, including `tarfile`, keeps untrusted archive inspection, extraction filters, partial-extract cleanup, and older-runtime defaults visible;
    - `asyncio.TaskGroup`, `asyncio.timeout`, and `asyncio.to_thread` are used only when their cancellation, timeout, blocking-work, and Python-version semantics fit the surrounding lifecycle.
-12. Use runtime diagnostics as evidence, not as permanent workaround code. Interpreter or library diagnostics such as import timing, `tracemalloc`, `faulthandler`, profiling, and allocation tracing should go through configured diagnostic or verification intents when available, and missing intents should be reported instead of adding ad hoc command recipes to the skill.
-13. Preserve async and resource ownership:
+15. Use runtime diagnostics as evidence, not as permanent workaround code. Interpreter or library diagnostics such as import timing, `tracemalloc`, `faulthandler`, profiling, and allocation tracing should go through configured diagnostic or verification intents when available, and missing intents should be reported instead of adding ad hoc command recipes to the skill.
+16. Preserve async and resource ownership:
    - every coroutine is awaited, returned by contract, or scheduled as an owned and tracked task;
    - raw background task creation is allowed only through the project's owner or spawn helper, a task group, or an equivalent lifecycle mechanism;
    - background tasks keep a strong reference, have a shutdown path, and retrieve failures instead of leaving never-retrieved exceptions;
@@ -108,15 +116,15 @@ Preserve Python runtime, standard-library, packaging, import, async resource, pu
    - context managers and async context managers do not suppress exceptions unless suppression is the feature;
    - context-manager helpers that catch exceptions for logging re-raise after logging;
    - early-exit async generators have an explicit close path.
-14. Preserve traceback evidence. Logging inside exception handlers should retain exception information instead of logging only the exception message.
-15. Preserve public contracts:
+17. Preserve traceback evidence. Logging inside exception handlers should retain exception information instead of logging only the exception message.
+18. Preserve public contracts:
    - treat public imports, public signatures, exceptions, return shapes, CLI behavior, entry points, config keys, environment variables, dependency metadata, extras, Python version support, and typing stubs as compatibility-sensitive;
    - do not change sync functions into async functions, accepted input shapes, nullable behavior, documented exception types, tuple/dict/dataclass return shapes, config precedence, or environment variable semantics without a compatibility review;
    - typed packages should keep runtime and typing surfaces aligned, including `py.typed` and stubs when present.
-16. Avoid mutable default arguments, broad `except Exception: pass`, broad `BaseException` catches outside process boundaries, global state hidden behind module imports, and path handling that ignores existing `pathlib` or OS conventions.
-17. Use `# type: ignore[...]` only when tightly scoped, justified, and consistent with local policy.
-18. If packaging, public API, CLI, config, or typing contracts change, synchronize README examples, entry point tests, build metadata, docs, fixtures, and downstream-style examples that describe installation or usage.
-19. Choose configured verification intents that cover formatting, lint, type checking, tests, package build, installed-package smoke checks, and CLI smoke risk when available.
+19. Avoid mutable default arguments, broad `except Exception: pass`, broad `BaseException` catches outside process boundaries, global state hidden behind module imports, `finally` masking, and path handling that ignores existing `pathlib` or OS conventions.
+20. Use `# type: ignore[...]` only when tightly scoped, justified, and consistent with local policy.
+21. If packaging, public API, CLI, config, or typing contracts change, synchronize README examples, entry point tests, build metadata, docs, fixtures, and downstream-style examples that describe installation or usage.
+22. Choose configured verification intents that cover formatting, lint, type checking, tests, package build, installed-package smoke checks, and CLI smoke risk when available.
 <!-- mustflow-section: postconditions -->
 ## Postconditions
@@ -152,6 +160,7 @@ Report missing package, type, or test intents rather than inventing raw tool com
 - If packaging correctness matters but only repository-root tests can run, report that wheel or installed-artifact verification is missing.
 - If the supported Python version blocks a syntax choice, rewrite to the supported form.
 - If the supported Python version blocks a standard-library feature, changed default, diagnostic flag, or helper API, use the supported equivalent or report the runtime-support decision instead of silently raising `requires-python`.
+- If template strings, annotation runtime access, lazy imports, sentinels, immutable mappings, or typed extra keys are useful but version-gated, keep a fallback or report the required support bump instead of smuggling the newer feature into a lower-runtime project.
 - If third-party stubs or package metadata are wrong, document the local workaround and keep it narrow.
 - If a background task lacks owner, shutdown, strong reference, or exception retrieval, do not add it.
 - If cancellation or context-manager behavior is swallowed accidentally, restore propagation or document the intentional suppression contract.
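The `finally` rule added in this revision (no `return`, `break`, or `continue` inside `finally`) can be illustrated with a minimal sketch. The names `parse`, `load_masked`, `load_clean`, and `cleanup_log` are hypothetical and are not part of mustflow; the example only shows how a `return` in `finally` hides an in-flight exception and how keeping cleanup separate restores propagation.

```python
def parse(text: str) -> int:
    """Hypothetical parser: raises ValueError on non-numeric input."""
    return int(text)


def load_masked(text: str, cleanup_log: list[str]) -> int:
    """Anti-pattern: the `return` in `finally` discards any in-flight exception."""
    try:
        return parse(text)
    finally:
        cleanup_log.append("cleanup")
        return -1  # masks ValueError; recent interpreters may warn about this line


def load_clean(text: str, cleanup_log: list[str]) -> int:
    """Cleanup stays in `finally`; the outcome is decided by the `try` body."""
    try:
        return parse(text)
    finally:
        cleanup_log.append("cleanup")  # cleanup only, exceptions keep propagating


cleanup_log: list[str] = []

# The bad input raises inside `try`, but the caller never sees the error.
masked_result = load_masked("not-a-number", cleanup_log)

# With cleanup-only `finally`, the same input surfaces the ValueError.
try:
    load_clean("not-a-number", cleanup_log)
    propagated = False
except ValueError:
    propagated = True
```

In the masked variant the caller receives `-1` and cannot tell a parse failure from a real value; in the clean variant cleanup still runs, and the caller decides how to handle the error.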

package/templates/default/locales/en/.mustflow/skills/rag-pipeline-triage/SKILL.md ADDED

@@ -0,0 +1,206 @@
+---
+mustflow_doc: skill.rag-pipeline-triage
+locale: en
+canonical: true
+revision: 1
+lifecycle: mustflow-owned
+authority: procedure
+name: rag-pipeline-triage
+description: Apply this skill when a RAG, knowledge-base answer, grounded chat, citation answer, retrieval-augmented support bot, or document QA flow is wrong, stale, unsupported, slow, leaking data, over-refusing, or not yet localized to ingestion, parsing, chunking, retrieval, filtering, reranking, context assembly, prompt construction, generation, citation validation, or answerability boundaries.
+metadata:
+  mustflow_schema: "1"
+  mustflow_kind: procedure
+  pack_id: mustflow.core
+  skill_id: mustflow.core.rag-pipeline-triage
+  command_intents:
+    - changes_status
+    - changes_diff_summary
+    - lint
+    - build
+    - test_related
+    - test
+    - docs_validate_fast
+    - test_release
+    - mustflow_check
+---
+# RAG Pipeline Triage
+<!-- mustflow-section: purpose -->
+## Purpose
+Localize RAG failures by splitting ingestion, retrieval, context assembly, and answer generation
+before changing models, prompts, chunk sizes, or vector settings.
+The first question is not "is the model bad?" It is "did the correct evidence exist, get retrieved,
+survive filtering and context assembly, and constrain the answer?"
+<!-- mustflow-section: use-when -->
+## Use When
+- A RAG answer, knowledge-base answer, grounded support bot, citation answer, document QA flow, or
+  retrieval-augmented agent is wrong, stale, unsupported, too slow, leaking data, citing the wrong
+  source, refusing answerable questions, or answering unanswerable questions.
+- The failure is not yet localized to source availability, parsing, chunking, embedding, indexing,
+  filters, vector or keyword retrieval, hybrid fusion, reranking, context packing, prompt assembly,
+  generation, validators, citations, answerability, or access control.
+- A review would otherwise tune the model, top-k, chunk size, reranker, or prompt before proving
+  which RAG layer failed.
+<!-- mustflow-section: do-not-use-when -->
+## Do Not Use When
+- The failure is already localized to retrieval mechanics, filters, ANN, embeddings, or vector DB
+  behavior; use `vector-search-integrity-review`.
+- The failure is already localized to unsupported claims, citations, answerability, evidence IDs, or
+  validators; use `llm-hallucination-control-review`.
+- The failure is already localized to prompt structure, tool policy, output schema, or model runtime
+  settings; use `prompt-contract-quality-review`.
+- The task asks for production document dumps, raw embeddings, private prompts, customer text, or
+  tenant-identifying data. Use ids, hashes, safe synthetic fixtures, aggregate metrics, and redacted
+  traces.
+<!-- mustflow-section: required-inputs -->
+## Required Inputs
+- Symptom classification: missing correct document, correct document retrieved but unused, stale
+  answer, unsupported answer, wrong citation, access-control leak, over-refusal, under-refusal,
+  latency, cost, or nondeterministic result.
+- Trace ledger: trace id, original question, normalized question, rewritten query, user or tenant
+  context, filters, embedding model version, index version, candidate ids and scores, reranker
+  output, final context ids and order, prompt version, model version, answer, citations, validators,
+  latency, and cost when safe.
+- Source ledger: authoritative source availability, parsed text, chunk boundaries, metadata, title,
+  section path, version, effective dates, stale or deleted documents, duplicates, and conflicting
+  sources.
+- Comparison ledger: no-retrieval answer, retrieved-context answer, human-selected gold-context
+  answer, exact or keyword search result, vector search result, hybrid result, and expected
+  answerability state.
+- Eval ledger: real failed queries, unanswerable questions, stale-doc cases, conflicting-doc cases,
+  similar-name cases, IDs and error codes, multilingual or typo cases, multi-hop cases, and
+  unauthorized-doc cases.
+- Privacy ledger: raw text, prompts, embeddings, tenant ids, user ids, provider payloads, and which
+  evidence can be stored or reported safely.
+<!-- mustflow-section: preconditions -->
+## Preconditions
+- The task matches the Use When conditions and does not match the Do Not Use When exclusions.
+- Higher-priority instructions and `.mustflow/config/commands.toml` have been checked.
+- Raw documents, prompts, embeddings, user data, and tenant-identifying payloads are not copied into
+  docs, tests, commits, or reports unless they are safe synthetic fixtures.
+<!-- mustflow-section: allowed-edits -->
+## Allowed Edits
+- Add or tighten trace fields, fixture queries, parsing checks, chunk metadata, duplicate or stale
+  source handling, retrieval comparisons, filter checks, context-packing rules, prompt source
+  separation, citation validators, answerability states, dirty eval fixtures, metrics, docs, and
+  directly synchronized templates.
+- Add safe synthetic fixtures for missing-doc, correct-doc-unused, stale-doc, conflicting-doc,
+  unauthorized-doc, exact-id, keyword, vector, hybrid, reranker, citation, and abstain behavior.
+- Do not change models, re-embed data, rebuild production indexes, widen access filters, disable
+  authorization, dump private corpora, or claim quality improvement before the failing layer is
+  localized.
+<!-- mustflow-section: procedure -->
+## Procedure
+1. Classify the symptom. Separate no source, bad parsing, bad chunking, retrieval miss, filter miss,
+   reranker loss, context truncation, prompt misuse, generation drift, false citation, stale answer,
+   access leak, latency, cost, and answerability errors.
+2. Preserve one safe end-to-end trace. Keep ids, versions, scores, context order, prompt version,
+   model version, validator result, latency, and cost. Redact raw sensitive content.
+3. Prove the answer exists in the source of truth. Do not tune retrieval for an answer that is not in
+   the indexed corpus or an allowed tool.
+4. Inspect parsed text before original documents. Tables, PDFs, multi-column pages, code blocks,
+   headers, footers, and OCR can be broken even when the original file looks correct.
+5. Inspect chunk boundaries and metadata. Verify title, parent section, version, dates, audience,
+   product, source authority, and neighboring context survive into chunks or parent retrieval.
+6. Compare source versions and deletes. Duplicates, obsolete documents, tombstones, and conflicting
+   effective dates must not be silently mixed into one answer.
+7. Run the isolation comparison when evidence is available: no retrieval, current retrieved context,
+   and human-selected gold context. Gold-context failure points to generation or prompt; current
+   context failure with gold success points to retrieval or context assembly.
+8. Compare keyword, vector, hybrid, and exact-id retrieval by data shape. IDs, error codes, SKUs,
+   names, dates, and numbers need exact or lexical safeguards; semantic questions may need vector or
+   hybrid retrieval.
+9. Check filters before blaming embeddings. Record pre-filter candidate count, post-filter count,
+   tenant and permission filters, metadata types, time zones, empty arrays, case sensitivity, and
+   stale policy copies.
+10. Check reranker candidate starvation. If the correct source never enters the candidate set, the
+    reranker cannot fix it. If it enters and then drops, inspect reranker inputs and scoring.
+11. Check context assembly. Verify `top_k`, score thresholds, source order, truncation, deduping,
+    conflict handling, source authority, and whether important evidence is buried or cut off.
+12. Check prompt construction. User input, retrieved text, examples, tool observations, and system or
+    developer instructions must remain separated. Retrieved text is data, not authority.
+13. Check answerability and abstain behavior. Track no-evidence, low-confidence, conflicting-source,
+    stale-source, access-denied, tool-failed, and needs-human states separately.
+14. Validate citations claim-by-claim. A citation id proves nothing unless the cited chunk supports
+    the specific generated claim.
+15. Measure each layer separately. Track parsing success, index freshness, Recall@k, MRR or nDCG,
+    rerank survival, context token budget, answer accuracy, citation accuracy, abstain accuracy,
+    access leaks, and retrieval/rerank/generation latency and cost.
+16. Use dirty eval cases from real failures. Include typos, abbreviations, multilingual questions,
+    unanswerable questions, date-sensitive questions, similar names, product codes, multi-hop
+    questions, unauthorized documents, stale documents, and conflicting documents.
+17. Apply the smallest localized fix and switch to the narrower matching skill for retrieval,
+    hallucination control, prompt contract, token cost, latency, access control, or prompt-injection
+    defense once the boundary is known.
+<!-- mustflow-section: postconditions -->
+## Postconditions
+- The RAG failure is localized to ingestion, parsing, chunking, indexing, retrieval, filters,
+  reranking, context assembly, prompt construction, generation, citation validation, answerability,
+  access control, or a named evidence gap.
+- Trace, source, comparison, eval, metric, and privacy ledgers are explicit where relevant.
+- Model, prompt, chunk, top-k, reranker, or index changes are justified by layer evidence rather than
+  by general "RAG quality" claims.
+<!-- mustflow-section: verification -->
+## Verification
+Use configured oneshot command intents when available:
+- `changes_status`
+- `changes_diff_summary`
+- `lint`
+- `build`
+- `test_related`
+- `test`
+- `docs_validate_fast`
+- `test_release`
+- `mustflow_check`
+Prefer the narrowest configured eval, fixture, schema, docs, package, or release check that proves
+the localized RAG boundary. Report missing retrieval, gold-context, citation, answerability,
+privacy, latency, or production-index evidence instead of inventing live diagnostics.
+<!-- mustflow-section: failure-handling -->
+## Failure Handling
+- If the end-to-end trace cannot be reconstructed, report the missing trace fields before tuning
+  models, prompts, chunks, filters, or retrieval parameters.
+- If evidence contains raw private text, embeddings, prompts, personal data, or tenant-identifying
+  data, redact to ids, hashes, dimensions, scores, snippets from safe fixtures, and aggregate
+  metrics.
+- If the fix requires model replacement, re-embedding, index rebuild, private corpus access, or live
+  provider calls outside the command contract, report the manual boundary.
+- If retrieved text can inject instructions, pause RAG quality work and apply
+  `external-prompt-injection-defense`.
+<!-- mustflow-section: output-format -->
+## Output Format
+- RAG pipeline triaged
+- Symptom classification and localized boundary
+- Trace, source, comparison, eval, metric, and privacy ledgers
+- Ingestion, parsing, chunking, retrieval, filter, rerank, context, prompt, generation, citation,
+  answerability, access-control, latency, and cost findings
+- Fix applied or recommended
+- Evidence level: end-to-end trace, gold-context comparison, configured-test evidence, static review
+  risk, manual-only, missing, or not applicable
+- Command intents run
+- Skipped diagnostics and reasons
+- Remaining RAG pipeline risk
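Two pieces of the procedure in the added skill lend themselves to a short sketch: the isolation comparison (current retrieved context versus human-selected gold context) and per-layer retrieval metrics such as Recall@k and MRR. The function names, boundary labels, and chunk ids below are illustrative assumptions built on safe synthetic fixtures; the skill itself does not prescribe any implementation.

```python
def localize(current_context_ok: bool, gold_context_ok: bool) -> str:
    """Map the isolation comparison onto a coarse failing boundary (labels are assumed)."""
    if not gold_context_ok:
        # Even hand-picked evidence did not produce a correct answer.
        return "generation_or_prompt"
    if not current_context_ok:
        # Gold context works, so the evidence was lost before generation.
        return "retrieval_or_context_assembly"
    return "not_reproduced"


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant chunk ids that appear in the top-k retrieved ids."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1/rank of the first relevant chunk id, or 0.0 when none was retrieved."""
    for rank, chunk_id in enumerate(retrieved, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0


# Synthetic eval fixtures: ids only, no raw document text.
eval_cases = [
    {"retrieved": ["c3", "c1", "c9"], "relevant": {"c1"}},
    {"retrieved": ["c4", "c5", "c6"], "relevant": {"c7"}},
]

mean_recall_at_3 = sum(
    recall_at_k(case["retrieved"], case["relevant"], k=3) for case in eval_cases
) / len(eval_cases)
mrr = sum(
    reciprocal_rank(case["retrieved"], case["relevant"]) for case in eval_cases
) / len(eval_cases)

boundary = localize(current_context_ok=False, gold_context_ok=True)
```

Here the second fixture never retrieves its relevant chunk, so it lowers both metrics, and the isolation comparison points at retrieval or context assembly rather than at the model or prompt.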