mustflow 2.85.4 → 2.99.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (78)
  1. package/dist/cli/commands/script-pack.js +10 -0
  2. package/dist/cli/i18n/en.js +183 -0
  3. package/dist/cli/i18n/es.js +183 -0
  4. package/dist/cli/i18n/fr.js +183 -0
  5. package/dist/cli/i18n/hi.js +183 -0
  6. package/dist/cli/i18n/ko.js +183 -0
  7. package/dist/cli/i18n/zh.js +183 -0
  8. package/dist/cli/lib/script-pack-registry.js +284 -1
  9. package/dist/cli/script-packs/code-change-impact.js +6 -0
  10. package/dist/cli/script-packs/code-import-cycle.js +193 -0
  11. package/dist/cli/script-packs/docs-link-integrity.js +145 -0
  12. package/dist/cli/script-packs/repo-approval-gate.js +100 -0
  13. package/dist/cli/script-packs/repo-git-ignore-audit.js +119 -0
  14. package/dist/cli/script-packs/repo-manifest-lock-drift.js +122 -0
  15. package/dist/cli/script-packs/repo-merge-conflict-scan.js +123 -0
  16. package/dist/cli/script-packs/repo-skill-route-audit.js +86 -0
  17. package/dist/cli/script-packs/repo-version-source.js +92 -0
  18. package/dist/cli/script-packs/test-performance-report.js +247 -0
  19. package/dist/cli/script-packs/test-regression-selector.js +167 -0
  20. package/dist/core/change-impact.js +23 -51
  21. package/dist/core/change-surface-classification.js +198 -0
  22. package/dist/core/docs-link-integrity.js +443 -0
  23. package/dist/core/import-cycle.js +152 -0
  24. package/dist/core/public-json-contracts.js +116 -0
  25. package/dist/core/repo-approval-gate.js +116 -0
  26. package/dist/core/repo-git-ignore-audit.js +302 -0
  27. package/dist/core/repo-manifest-lock-drift.js +321 -0
  28. package/dist/core/repo-merge-conflict-scan.js +335 -0
  29. package/dist/core/repo-version-source.js +82 -0
  30. package/dist/core/script-pack-suggestions.js +77 -1
  31. package/dist/core/skill-route-audit.js +354 -0
  32. package/dist/core/test-performance-report.js +697 -0
  33. package/dist/core/test-regression-selector.js +335 -0
  34. package/package.json +1 -1
  35. package/schemas/README.md +40 -2
  36. package/schemas/change-impact-report.schema.json +35 -1
  37. package/schemas/import-cycle-report.schema.json +157 -0
  38. package/schemas/link-integrity-report.schema.json +176 -0
  39. package/schemas/repo-approval-gate-report.schema.json +115 -0
  40. package/schemas/repo-git-ignore-audit-report.schema.json +201 -0
  41. package/schemas/repo-manifest-lock-drift-report.schema.json +202 -0
  42. package/schemas/repo-merge-conflict-scan-report.schema.json +169 -0
  43. package/schemas/repo-version-source-report.schema.json +127 -0
  44. package/schemas/skill-route-audit-report.schema.json +144 -0
  45. package/schemas/test-performance-report.schema.json +319 -0
  46. package/schemas/test-regression-selector-report.schema.json +187 -0
  47. package/templates/default/i18n.toml +66 -18
  48. package/templates/default/locales/en/.mustflow/skills/INDEX.md +45 -8
  49. package/templates/default/locales/en/.mustflow/skills/api-access-control-review/SKILL.md +48 -27
  50. package/templates/default/locales/en/.mustflow/skills/api-failure-triage/SKILL.md +270 -0
  51. package/templates/default/locales/en/.mustflow/skills/auth-flow-triage/SKILL.md +192 -0
  52. package/templates/default/locales/en/.mustflow/skills/auth-permission-change/SKILL.md +59 -13
  53. package/templates/default/locales/en/.mustflow/skills/backend-log-evidence-review/SKILL.md +14 -5
  54. package/templates/default/locales/en/.mustflow/skills/cache-integrity-review/SKILL.md +30 -15
  55. package/templates/default/locales/en/.mustflow/skills/change-blast-radius-review/SKILL.md +45 -32
  56. package/templates/default/locales/en/.mustflow/skills/ci-pipeline-triage/SKILL.md +200 -0
  57. package/templates/default/locales/en/.mustflow/skills/clarifying-question-gate/SKILL.md +87 -13
  58. package/templates/default/locales/en/.mustflow/skills/docker-runtime-triage/SKILL.md +191 -0
  59. package/templates/default/locales/en/.mustflow/skills/go-code-change/SKILL.md +18 -13
  60. package/templates/default/locales/en/.mustflow/skills/line-ending-hygiene/SKILL.md +18 -10
  61. package/templates/default/locales/en/.mustflow/skills/llm-hallucination-control-review/SKILL.md +4 -1
  62. package/templates/default/locales/en/.mustflow/skills/motion-system-contract-review/SKILL.md +155 -0
  63. package/templates/default/locales/en/.mustflow/skills/next-action-menu/SKILL.md +177 -0
  64. package/templates/default/locales/en/.mustflow/skills/observability-debuggability-review/SKILL.md +15 -7
  65. package/templates/default/locales/en/.mustflow/skills/payment-integrity-review/SKILL.md +59 -35
  66. package/templates/default/locales/en/.mustflow/skills/powershell-code-change/SKILL.md +16 -6
  67. package/templates/default/locales/en/.mustflow/skills/prompt-contract-quality-review/SKILL.md +4 -1
  68. package/templates/default/locales/en/.mustflow/skills/python-code-change/SKILL.md +19 -10
  69. package/templates/default/locales/en/.mustflow/skills/rag-pipeline-triage/SKILL.md +206 -0
  70. package/templates/default/locales/en/.mustflow/skills/routes.toml +54 -0
  71. package/templates/default/locales/en/.mustflow/skills/rust-code-change/SKILL.md +10 -4
  72. package/templates/default/locales/en/.mustflow/skills/search-index-integrity-review/SKILL.md +181 -0
  73. package/templates/default/locales/en/.mustflow/skills/service-boundary-architecture/SKILL.md +37 -23
  74. package/templates/default/locales/en/.mustflow/skills/test-suite-performance-review/SKILL.md +9 -0
  75. package/templates/default/locales/en/.mustflow/skills/typescript-code-change/SKILL.md +14 -9
  76. package/templates/default/locales/en/.mustflow/skills/vector-search-integrity-review/SKILL.md +209 -0
  77. package/templates/default/locales/en/.mustflow/skills/version-freshness-check/SKILL.md +16 -14
  78. package/templates/default/manifest.toml +64 -1
@@ -2,7 +2,7 @@
  mustflow_doc: skill.payment-integrity-review
  locale: en
  canonical: true
- revision: 1
+ revision: 2
  lifecycle: mustflow-owned
  authority: procedure
  name: payment-integrity-review
@@ -53,14 +53,23 @@ Review payment code as money-event integrity, not provider API success. The core
  ## Required Inputs
 
  - Money-event ledger: every create, authorize, capture, fulfill, refund, dispute, chargeback, settlement, adjustment, cancellation, expiration, and entitlement event that can move money or access.
- - Provider interaction ledger: payment provider calls, webhook event types, redirect handlers, polling, reconciliation jobs, SDK clients, idempotency keys, provider object IDs, and provider environment selection.
- - State-transition ledger: internal states, provider states, allowed transitions, terminal states, retry states, async states, and transition owners.
+ - Provider interaction ledger: payment provider calls, webhook event types, redirect handlers,
+ polling, reconciliation jobs, SDK clients, idempotency keys, internal order IDs, internal payment
+ IDs, attempt IDs, provider object IDs, provider reference IDs, and provider environment selection.
+ - State-transition ledger: internal states, provider states, allowed transitions, terminal states,
+ retry states, async states, hold states, kill-switch states, and transition owners.
+ - Event log ledger: request submission, provider response, redirect, webhook receipt, webhook
+ application, state transition, queue handoff, reconciliation decision, fulfillment, refund,
+ dispute, admin override, and correction events with ordering, actor, reason, and immutable
+ evidence.
  - Idempotency and uniqueness ledger: logical operation IDs, provider idempotency keys, database uniqueness constraints, webhook event dedupe keys, fulfillment dedupe keys, and retry behavior.
  - Amount and currency ledger: product/cart snapshot, server-side calculation path, quantity, discounts, coupons, tax, shipping, minor-unit representation, currency, provider amount, internal ledger amount, receipt amount, and settlement amount.
  - Ownership ledger: user, tenant, account, order, payment session, refund, subscription, invoice, entitlement, admin actor, and provider customer ownership checks.
  - Fulfillment and entitlement ledger: when access, inventory, shipment, credits, licenses, notifications, or downstream side effects are granted or revoked.
  - Webhook and retry ledger: signature verification, raw-body handling, event storage, queue handoff, duplicate and out-of-order handling, timeout classification, backoff, and dead-letter behavior.
- - Audit and sensitive-data ledger: logs, metrics, traces, admin overrides, before/after values, reason fields, approval paths, rollback paths, and payment-sensitive data redaction.
+ - Audit and sensitive-data ledger: logs, metrics, traces, segmented approval or decline metrics,
+ payment-path segments, orphan authorization monitors, admin overrides, before/after values, reason
+ fields, approval paths, rollback paths, and payment-sensitive data redaction.
 
  <!-- mustflow-section: preconditions -->
  ## Preconditions
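The new event log ledger and the separated identifiers in the hunk above can be pictured as an append-only log of frozen records. The following is only a minimal sketch in Python: the class names, field names, and the in-memory list are assumptions made for illustration and do not come from the mustflow package.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class PaymentEvent:
    """One immutable event-log row; every field name here is hypothetical."""
    sequence: int           # ordering within the log
    kind: str               # e.g. "webhook_receipt" or "state_transition"
    actor: str              # who or what caused the event
    reason: str
    occurred_at: datetime
    # Identifiers stay separate so one ID never stands in for another.
    order_id: str
    payment_id: str
    attempt_id: str
    provider_object_id: Optional[str] = None
    provider_reference_id: Optional[str] = None


class EventLog:
    """Append-only in-memory log; a real store would enforce this in the database."""

    def __init__(self) -> None:
        self._events: List[PaymentEvent] = []

    def append(self, **fields) -> PaymentEvent:
        # The log assigns ordering and timestamp itself; callers cannot set them.
        event = PaymentEvent(
            sequence=len(self._events) + 1,
            occurred_at=datetime.now(timezone.utc),
            **fields,
        )
        self._events.append(event)
        return event

    def events(self) -> Tuple[PaymentEvent, ...]:
        # Return a tuple so callers cannot mutate the log through the view.
        return tuple(self._events)
```

Because the dataclass is frozen, a correction has to be appended as a new event rather than written over an old one, which is the property the ledger description asks reviewers to look for.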
@@ -80,40 +89,55 @@ Review payment code as money-event integrity, not provider API success. The core
  ## Procedure
 
  1. Model payment as a state machine. Reject a single `paid`, `success`, or `active` boolean when the code must distinguish created, pending, authorized, captured, failed, cancelled, expired, refunded, partially refunded, disputed, unpaid, retrying, grace, or settled states.
- 2. Calculate amount on the server. Treat client-supplied amount, currency, quantity, discount, coupon, tax, shipping, product ID, plan ID, or cart totals as input claims only; rebuild the payable total from trusted product, cart, account, and policy snapshots.
- 3. Bind every payment object to its owner. Verify user, tenant, order, payment session, refund, subscription, invoice, provider customer, and admin actor ownership before read, write, refund, cancel, fulfillment, or entitlement changes.
- 4. Compare every amount ledger. Trace order amount, provider request amount, provider response amount, internal money ledger, receipt, settlement, fee, refund, and entitlement amount. Flag any path where one amount can drift without reconciliation.
- 5. Use integer minor units. Reject float, double, string-concatenated, rounded-late, locale-formatted, or JavaScript-number money math when it can cross currency or precision boundaries.
- 6. Make payment creation idempotent. Use a stable key for one logical payment attempt, not a fresh UUID per retry. Include operation identity such as order or attempt ID, but do not include secrets or raw personal data.
- 7. Use database uniqueness as the last gate. Add or verify unique constraints for provider payment IDs, provider session IDs, provider event IDs, provider refund IDs, internal ledger IDs, and fulfillment records where duplicates would move money or access twice.
- 8. Assume webhooks are duplicated. Store event IDs or object/type pairs before applying effects, make handlers idempotent, and treat duplicate delivery as expected behavior.
- 9. Assume webhooks are out of order. Do not let a late `created`, `pending`, or stale failure event overwrite a captured, refunded, disputed, or terminal internal state. Re-fetch provider state when event order is insufficient.
- 10. Verify webhook signatures on the raw body. Check signatures before JSON mutation, parsing wrappers, body normalizers, or middleware that changes bytes. Do not keep a debug path that disables signature verification.
- 11. Return from webhook endpoints quickly. Persist the event, enqueue durable work, and return a provider-acceptable response without doing slow fulfillment, network fan-out, file work, or long transactions in the webhook request.
- 12. Never use success redirects as proof. Treat checkout success pages, return URLs, frontend callbacks, and local storage flags as user navigation only; fulfillment must depend on verified provider state or signed server-side evidence.
- 13. Run fulfillment exactly once. Guard entitlement grants, shipments, credit issuance, license creation, invoice finalization, emails with money meaning, and inventory release with unique records or state transitions.
- 14. Handle asynchronous payment methods. Do not fulfill on checkout completion when the provider can still move through pending, requires_action, processing, delayed success, delayed failure, or expiry states.
- 15. Separate authorization from capture. Do not treat an authorization hold as captured money. Review capture windows, partial captures, expired authorizations, cancellations, and post-authorization amount changes.
- 16. Review refunds as money-out events. Check partial refunds, double refunds, refund failures, refund idempotency, refund ownership, refund amount/currency, ledger reversal, entitlement revocation, and receipt updates.
- 17. Handle disputes and chargebacks. Ensure dispute events affect access, account risk, support workflow, ledger entries, settlement reports, and customer-visible state without pretending the original capture still stands unchanged.
- 18. Review subscriptions as state machines. Separate trialing, active, past_due, grace, unpaid, cancelled, pending cancellation, retrying, upgraded, downgraded, invoice-open, invoice-paid, and invoice-failed states.
- 19. Reserve inventory before confirming it. Check that payment, inventory, cancellation, expiration, refund, and fulfillment cannot oversell, lose stock, or keep stock reserved after an abandoned payment.
- 20. Reserve coupons before consuming them. Under concurrent attempts, a coupon should not be spent twice or lost forever after a failed or expired payment. Review reservation, consumption, release, and expiry paths.
- 21. Treat timeouts as unknown outcomes. A provider timeout after request submission is not a failure proof. Verify by idempotency key, provider object lookup, webhook, or reconciliation before retrying or cancelling.
- 22. Classify retries by failure kind. Separate retryable network failures, provider rate limits, validation failures, insufficient funds, authentication failures, duplicate operation responses, and unknown outcomes with bounded backoff.
- 23. Keep an append-only money ledger. Prefer immutable entries for payment, capture, refund, fee, settlement, chargeback, adjustment, and correction. Flag mutable balance-only code with no event history.
- 24. Reconcile provider and internal state. Check scheduled or manual reconciliation for missed webhooks, stale internal states, provider-side refunds, settlement fees, disputes, and permanently unknown operations.
- 25. Redact payment-sensitive data. Never log card numbers, CVV, track data, PINs, raw payment credentials, webhook secrets, bearer tokens, provider secret keys, or full provider payloads containing sensitive fields.
- 26. Separate test and live payment planes. Verify API keys, webhook secrets, product IDs, price IDs, environment flags, provider account IDs, and fixtures cannot cross between test and live modes.
- 27. Audit manual payment operations. Require role, reason, target object, before/after values, approver or policy evidence, operator identity, timestamp, and rollback or correction path for admin overrides.
- 28. Search for stale payment endpoints. Review old checkout paths, hidden callback URLs, deprecated provider versions, old mobile endpoints, webhook v1 handlers, and manual scripts that still mutate money state.
- 29. Check public errors and support evidence. Payment failures must not lie about success, leak sensitive payment facts, or leave support with no safe correlation ID, provider object ID, or internal event ID.
- 30. Test the nightmare paths. Include repeated pay-button clicks, replayed webhooks, out-of-order webhooks, success redirect plus database failure, database success plus provider timeout, amount or currency tampering, wrong order ID, concurrent double refund, pay then cancel, expired-session completion, subscription retry, provider missed webhook, and admin override rollback.
+ 2. Separate the identifiers. Do not let one `order_id` or provider session ID stand in for every
+ concept. Track order ID, payment ID, attempt ID, provider customer ID, provider payment/session
+ ID, provider refund ID, provider event ID, and internal ledger entry ID separately so retries,
+ provider redirects, webhooks, refunds, and reconciliation do not overwrite each other.
+ 3. Keep an immutable event trail. Store request submission, provider response, redirect, webhook,
+ state transition, queue handoff, reconciliation, fulfillment, refund, dispute, and admin override
+ events with actor, reason, timestamp, provider reference, and before/after state when relevant.
+ 4. Calculate amount on the server. Treat client-supplied amount, currency, quantity, discount, coupon, tax, shipping, product ID, plan ID, or cart totals as input claims only; rebuild the payable total from trusted product, cart, account, and policy snapshots.
+ 5. Bind every payment object to its owner. Verify user, tenant, order, payment session, refund, subscription, invoice, provider customer, and admin actor ownership before read, write, refund, cancel, fulfillment, or entitlement changes.
+ 6. Compare every amount ledger. Trace order amount, provider request amount, provider response amount, internal money ledger, receipt, settlement, fee, refund, and entitlement amount. Flag any path where one amount can drift without reconciliation.
+ 7. Use integer minor units. Reject float, double, string-concatenated, rounded-late, locale-formatted, or JavaScript-number money math when it can cross currency or precision boundaries.
+ 8. Make payment creation idempotent. Use a stable key for one logical payment attempt, not a fresh UUID per retry. Include operation identity such as order or attempt ID, but do not include secrets or raw personal data.
+ 9. Use database uniqueness as the last gate. Add or verify unique constraints for provider payment IDs, provider session IDs, provider event IDs, provider refund IDs, internal ledger IDs, and fulfillment records where duplicates would move money or access twice.
+ 10. Assume webhooks are duplicated. Store event IDs or object/type pairs before applying effects, make handlers idempotent, and treat duplicate delivery as expected behavior.
+ 11. Assume webhooks are out of order. Do not let a late `created`, `pending`, or stale failure event overwrite a captured, refunded, disputed, or terminal internal state. Re-fetch provider state when event order is insufficient.
+ 12. Verify webhook signatures on the raw body. Check signatures before JSON mutation, parsing wrappers, body normalizers, or middleware that changes bytes. Do not keep a debug path that disables signature verification.
+ 13. Return from webhook endpoints quickly. Persist the event, enqueue durable work, and return a provider-acceptable response without doing slow fulfillment, network fan-out, file work, or long transactions in the webhook request.
+ 14. Never use success redirects as proof. Treat checkout success pages, return URLs, frontend callbacks, and local storage flags as user navigation only; fulfillment must depend on verified provider state or signed server-side evidence.
+ 15. Run fulfillment exactly once. Guard entitlement grants, shipments, credit issuance, license creation, invoice finalization, emails with money meaning, and inventory release with unique records or state transitions.
+ 16. Handle asynchronous payment methods. Do not fulfill on checkout completion when the provider can still move through pending, requires_action, processing, delayed success, delayed failure, or expiry states.
+ 17. Separate authorization from capture. Do not treat an authorization hold as captured money. Review capture windows, partial captures, expired authorizations, cancellations, orphan authorized-but-not-captured operations, and post-authorization amount changes.
+ 18. Review refunds as money-out events. Check requested, pending, completed, failed, cancelled, and partial refund states; double refunds; refund failures; refund idempotency; refund ownership; refund amount/currency; ledger reversal; entitlement revocation; and receipt updates.
+ 19. Handle disputes and chargebacks. Ensure dispute events affect access, account risk, support workflow, ledger entries, settlement reports, and customer-visible state without pretending the original capture still stands unchanged.
+ 20. Review subscriptions as state machines. Separate trialing, active, past_due, grace, unpaid, cancelled, pending cancellation, retrying, upgraded, downgraded, invoice-open, invoice-paid, and invoice-failed states.
+ 21. Reserve inventory before confirming it. Check that payment, inventory, cancellation, expiration, refund, and fulfillment cannot oversell, lose stock, or keep stock reserved after an abandoned payment.
+ 22. Reserve coupons before consuming them. Under concurrent attempts, a coupon should not be spent twice or lost forever after a failed or expired payment. Review reservation, consumption, release, and expiry paths.
+ 23. Treat timeouts as unknown outcomes. A provider timeout after request submission is not a failure proof. Verify by idempotency key, provider object lookup, webhook, or reconciliation before retrying or cancelling.
+ 24. Classify retries by failure kind. Separate retryable network failures, provider rate limits, validation failures, authentication-required states, insufficient funds, issuer declines, suspected fraud, duplicate operation responses, and unknown outcomes with bounded backoff.
+ 25. Segment the payment path. When diagnosing approval rate or decline spikes, separate frontend validation, backend request creation, provider gateway, acquirer, card network, issuer, bank, 3DS or additional authentication, and settlement evidence instead of reading one blended failure count.
+ 26. Keep an append-only money ledger. Prefer immutable entries for payment, capture, refund, fee, settlement, chargeback, adjustment, and correction. Flag mutable balance-only code with no event history.
+ 27. Reconcile provider and internal state. Check scheduled or manual reconciliation for missed webhooks, stale internal states, provider-side refunds, settlement fees, disputes, orphan authorizations, and permanently unknown operations.
+ 28. Redact payment-sensitive data. Never log card numbers, CVV, track data, PINs, raw payment credentials, webhook secrets, bearer tokens, provider secret keys, or full provider payloads containing sensitive fields.
+ 29. Separate test and live payment planes. Verify API keys, webhook secrets, product IDs, price IDs, environment flags, provider account IDs, and fixtures cannot cross between test and live modes.
+ 30. Audit manual payment operations. Require role, reason, target object, before/after values, approver or policy evidence, operator identity, timestamp, and rollback or correction path for admin overrides.
+ 31. Add a payment hold or kill-switch path for unsafe flows. Risky provider migrations, webhook
+ regressions, reconciliation uncertainty, fraud spikes, or duplicate-money incidents need a way
+ to hold fulfillment, stop captures, pause refunds, or disable a provider path without corrupting
+ ledger state.
+ 32. Search for stale payment endpoints. Review old checkout paths, hidden callback URLs, deprecated provider versions, old mobile endpoints, webhook v1 handlers, and manual scripts that still mutate money state.
+ 33. Check public errors and support evidence. Payment failures must not lie about success, leak sensitive payment facts, or leave support with no safe correlation ID, provider object ID, or internal event ID.
+ 34. Test the nightmare paths. Include repeated pay-button clicks, replayed webhooks, out-of-order webhooks, success redirect plus database failure, database success plus provider timeout, amount or currency tampering, wrong order ID, concurrent double refund, pay then cancel, expired-session completion, subscription retry, provider missed webhook, orphan authorization cleanup, provider kill switch or hold state, and admin override rollback.
 
  <!-- mustflow-section: postconditions -->
  ## Postconditions
 
- - The payment surface has a money-event map, provider interaction map, state-transition map, idempotency and uniqueness map, amount and currency map, ownership map, fulfillment and entitlement map, webhook/retry map, and audit/sensitive-data map.
+ - The payment surface has a money-event map, provider interaction map, identifier map,
+ state-transition map, immutable event log, idempotency and uniqueness map, amount and currency map,
+ ownership map, fulfillment and entitlement map, webhook/retry map, reconciliation and hold-state
+ map, and audit/sensitive-data map.
  - Any false success, duplicate money movement, duplicate fulfillment, wrong-owner action, wrong amount, wrong currency, stale event overwrite, timeout misclassification, or missing reconciliation is fixed or reported with evidence.
  - Tests or explicit verification cover the highest-risk nightmare paths available in the current scope.
 
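The procedure steps on integer minor units and stable idempotency keys can be illustrated with a short Python sketch. The exponent table, function names, and key layout below are assumptions chosen for the example, not anything defined by the skill or by a payment provider.

```python
import hashlib
from decimal import Decimal

# Assumed exponent table for illustration; a real system would cover every currency it accepts.
MINOR_UNIT_EXPONENT = {"USD": 2, "EUR": 2, "JPY": 0, "KWD": 3}


def to_minor_units(amount: str, currency: str) -> int:
    """Convert a decimal string to integer minor units, rejecting excess precision."""
    exponent = MINOR_UNIT_EXPONENT[currency]
    scaled = Decimal(amount).scaleb(exponent)  # shift the decimal point, no float math
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{amount} has more precision than {currency} allows")
    return int(scaled)


def idempotency_key(operation: str, order_id: str, attempt_id: str) -> str:
    """Derive a stable key from the logical operation so a retry reuses the same key."""
    # Only operation identity goes into the key: no secrets, no raw personal data.
    material = f"{operation}:{order_id}:{attempt_id}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()
```

Deriving the key from the order and attempt identifiers means a network retry of the same attempt produces the same key, while a genuinely new attempt produces a different one.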
@@ -148,7 +172,7 @@ Prefer focused tests for duplicate operations, webhook replay, out-of-order even
  ## Output Format
 
  - Payment surface and provider boundary reviewed
- - Money-event, provider, state, idempotency, amount, ownership, fulfillment, webhook, retry, audit, and sensitive-data ledgers
+ - Money-event, provider, identifier, state, event-log, idempotency, amount, ownership, fulfillment, webhook, retry, reconciliation, hold-state, audit, and sensitive-data ledgers
  - Findings or fixes for duplicate, late, out-of-order, wrong-actor, wrong-amount, wrong-currency, timeout, retry, reconciliation, and audit risks
  - Nightmare-path tests or evidence added, run, skipped, or still missing
  - Command intents run
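The revised procedure asks reviewers to assume that webhooks arrive duplicated and out of order. A minimal Python sketch of a handler that tolerates both is shown here; the transition table, state names, and class are hypothetical and far smaller than a real payment state machine.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical allowed transitions; anything not listed is treated as stale or invalid.
ALLOWED_TRANSITIONS: Dict[str, FrozenSet[str]] = {
    "created": frozenset({"pending", "authorized", "captured", "failed"}),
    "pending": frozenset({"authorized", "captured", "failed"}),
    "authorized": frozenset({"captured", "cancelled", "expired"}),
    "captured": frozenset({"refunded", "disputed"}),
}


class PaymentRecord:
    def __init__(self) -> None:
        self.state = "created"
        self.seen_event_ids: Set[str] = set()

    def apply_webhook(self, event_id: str, new_state: str) -> str:
        """Apply one webhook event and report what happened to it."""
        if event_id in self.seen_event_ids:
            return "duplicate"  # redelivery is expected behavior, not an error
        self.seen_event_ids.add(event_id)  # record the event before applying effects
        allowed = ALLOWED_TRANSITIONS.get(self.state, frozenset())
        if new_state not in allowed:
            return "stale"  # a late event must not overwrite a newer state
        self.state = new_state
        return "applied"
```

A "stale" result is where a real handler would re-fetch provider state rather than silently drop the event, and the dedupe set would normally be a unique constraint in the database rather than an in-memory set.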
@@ -2,7 +2,7 @@
  mustflow_doc: skill.powershell-code-change
  locale: en
  canonical: true
- revision: 1
+ revision: 2
  lifecycle: mustflow-owned
  authority: procedure
  name: powershell-code-change
@@ -29,7 +29,7 @@ metadata:
  <!-- mustflow-section: purpose -->
  ## Purpose
 
- Preserve PowerShell parsing, quoting, native argument passing, script portability, and command-injection boundaries.
+ Preserve PowerShell parsing, quoting, native argument passing, deterministic file rewrites, script portability, and command-injection boundaries.
 
  PowerShell quoting bugs usually come from parser layering, not from one wrong quote character. A command may be parsed by the host shell, PowerShell expression or argument mode, PowerShell string expansion, and then a native program parser such as `git.exe`, `cmd.exe`, `ssh`, `curl`, `python`, or `node`.
 
@@ -40,6 +40,7 @@ PowerShell quoting bugs usually come from parser layering, not from one wrong qu
  - PowerShell strings, here-strings, interpolation, splatting, parameter binding, call operator usage, `Start-Process`, `Invoke-Expression`, `--`, `--%`, `-Command`, `-File`, `-EncodedCommand`, or stdin execution changes.
  - PowerShell code calls native commands such as `git.exe`, `cmd.exe`, `.bat`, `.cmd`, `ssh`, `curl`, `python`, `node`, `npm`, `bun`, `docker`, `winget`, `msiexec`, or vendor CLIs.
  - Regex, wildcard, replacement strings, JSON, YAML, XML, SQL, paths with spaces, literal `$`, literal quotes, or shell metacharacters are passed through PowerShell.
+ - PowerShell is used for mechanical repository file rewrites, text replacement, generated-file updates, or line-ending-sensitive edits.
 
  <!-- mustflow-section: do-not-use-when -->
  ## Do Not Use When
@@ -56,6 +57,7 @@ PowerShell quoting bugs usually come from parser layering, not from one wrong qu
  - Invocation path: direct script, module import, profile load, package script, CI step, scheduled task, `pwsh -Command`, `pwsh -File`, stdin, encoded command, or another shell invoking PowerShell.
  - Parser layers involved: host shell, PowerShell expression mode, PowerShell argument mode, expandable or verbatim strings, native command parser, regex parser, wildcard parser, replacement parser, JSON/YAML/XML/SQL parser, or remote shell.
  - Native command boundary: executable path, argument list, wrapper extension, expected argv shape, whether literal quote characters are required, and whether `$PSNativeCommandArgumentPassing` affects behavior.
+ - File rewrite boundary: target file policy, expected encoding, expected newline style, replacement count, whether the file may contain mixed line endings, and whether the repository declares an EOL policy.
  - Dynamic input boundaries: user input, paths, URLs, commit messages, regex patterns, replacement strings, JSON bodies, headers, credentials, environment variables, and values that may contain spaces or metacharacters.
  - Existing test, lint, docs, package, workflow, and command-intent verification surfaces.
 
@@ -72,9 +74,11 @@ PowerShell quoting bugs usually come from parser layering, not from one wrong qu
 
  - Replace string-built command lines with arrays, hashtables, splatting, direct invocation, or repository-local helpers.
  - Convert fragile multiline commands to splatting or here-strings when behavior stays equivalent.
+ - Replace lossy text rewrite pipelines with deterministic read/write APIs that preserve or intentionally normalize encoding and newlines.
  - Add focused tests or fixtures that prove argv shape, parser behavior, escaping, failure paths, or documented examples.
  - Update docs, command examples, CI snippets, or package scripts directly tied to the PowerShell behavior being changed.
  - Do not add `Invoke-Expression`, broad `cmd /c`, broad `--%`, global profile mutation, policy bypasses, or command-string reconstruction to make quoting appear to work.
+ - Do not use `Get-Content` piped to `Set-Content`, `Out-File`, or shell redirection for repository-wide mechanical rewrites when line endings, encoding, or BOM behavior matters.
 
  <!-- mustflow-section: procedure -->
  ## Procedure
@@ -106,10 +110,13 @@ PowerShell quoting bugs usually come from parser layering, not from one wrong qu
106
110
  - use single-quoted regex patterns unless PowerShell interpolation is intentional;
107
111
  - escape replacement `$` according to replacement-string rules, not only PowerShell string rules;
108
112
  - escape wildcard metacharacters for wildcard matching even inside single-quoted PowerShell strings.
109
- 20. For cross-shell PowerShell calls, avoid complex inline `-Command` strings. Prefer `-File`, stdin, or an encoded command when the repository already uses that pattern and the encoding boundary is tested. If `-Command` is used, document the host shell and PowerShell parser layers.
110
- 21. Keep paths as path values, not shell fragments. Prefer `-LiteralPath` when wildcard expansion is not intended, and do not compose destructive filesystem actions through a different shell.
111
- 22. Add or reuse verification that observes behavior, not only spelling. Useful evidence includes argv echo fixtures, Pester cases, dry-run output, parser-specific tests, or configured CI/package/docs checks.
112
- 23. Choose configured verification intents that cover the changed script, docs example, package metadata, CI wrapper, public command behavior, and mustflow contract surface.
113
+ 20. For mechanical text replacement, count expected matches before writing. If the count is not exactly the intended number, stop and report the mismatch instead of writing a broad replacement.
114
+ 21. For repository file rewrites, prefer .NET file APIs with explicit encoding over PowerShell content cmdlets when deterministic output matters. Normalize CRLF and lone CR to LF only when the repository policy expects LF, and write UTF-8 without BOM explicitly. Treat `Set-Content -Encoding utf8` as version-sensitive because Windows PowerShell 5.1 and PowerShell 6+ differ in default UTF-8 BOM behavior.
115
+ 22. If line-ending warnings appear after a PowerShell rewrite, do not assume the last read command caused them. Inspect repository EOL policy and per-file EOL evidence, then activate `line-ending-hygiene` for cause analysis or normalization decisions.
116
+ 23. For cross-shell PowerShell calls, avoid complex inline `-Command` strings. Prefer `-File`, stdin, or an encoded command when the repository already uses that pattern and the encoding boundary is tested. If `-Command` is used, document the host shell and PowerShell parser layers.
117
+ 24. Keep paths as path values, not shell fragments. Prefer `-LiteralPath` when wildcard expansion is not intended, and do not compose destructive filesystem actions through a different shell.
118
+ 25. Add or reuse verification that observes behavior, not only spelling. Useful evidence includes argv echo fixtures, Pester cases, dry-run output, parser-specific tests, deterministic encoding or line-ending checks, or configured CI/package/docs checks.
119
+ 26. Choose configured verification intents that cover the changed script, docs example, package metadata, CI wrapper, public command behavior, and mustflow contract surface.
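The count-before-write and explicit-encoding steps above can be sketched in a language-neutral way. The skill itself targets PowerShell and .NET file APIs; the Python below only illustrates the same logic, and `rewrite_exact` and `MatchCountError` are hypothetical names that do not come from mustflow. It assumes the repository policy expects LF and UTF-8 without a BOM.

```python
from pathlib import Path


class MatchCountError(Exception):
    """Raised when the number of matches differs from the expected count."""


def rewrite_exact(path: Path, old: str, new: str, expected: int) -> None:
    """Replace `old` with `new` only if it occurs exactly `expected` times."""
    # Read raw bytes and decode explicitly; "utf-8-sig" drops a BOM if present.
    text = path.read_bytes().decode("utf-8-sig")
    # Normalize CRLF and lone CR to LF (assumption: the repository expects LF).
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    found = text.count(old)
    if found != expected:
        # Stop before writing anything so the mismatch can be reported.
        raise MatchCountError(f"{path}: expected {expected} matches, found {found}")
    # Write bytes directly: no newline translation and no BOM is added.
    path.write_bytes(text.replace(old, new).encode("utf-8"))
```

Working on bytes at both ends keeps the encoding and newline decisions visible in the code instead of depending on platform or runtime defaults.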
113
120
 
114
121
  <!-- mustflow-section: postconditions -->
115
122
  ## Postconditions
@@ -117,6 +124,7 @@ PowerShell quoting bugs usually come from parser layering, not from one wrong qu
117
124
  - The parser layers and target command type are explicit.
118
125
  - Literal strings, expandable strings, here-strings, regex patterns, wildcard patterns, replacement strings, paths, and native argv are not conflated.
119
126
  - Native command calls keep executable path and arguments separated unless a documented target requires otherwise.
127
+ - Mechanical rewrites have explicit replacement counts, encoding, and newline decisions.
120
128
  - Dynamic values remain data-bound and are not reinterpreted as shell code.
121
129
  - PowerShell version, native argument passing mode, and cross-shell boundaries are verified or reported as remaining risks.
122
130
 
@@ -145,6 +153,7 @@ Report missing PowerShell-version, argv-shape, Pester, CI-shell, Windows-native-
145
153
  - If regex, wildcard, or replacement escaping breaks, test the second parser explicitly instead of only changing PowerShell string quotes.
146
154
  - If cross-shell `pwsh -Command` quoting becomes complex, move the script body to `-File`, stdin, or a tested encoded boundary rather than adding another quoting layer.
147
155
  - If untrusted input is interpolated into command text, treat it as a command-injection risk and restructure around argument binding.
156
+ - If a rewrite must preserve or normalize LF, avoid version-sensitive content cmdlets and use an explicit writer. If line-ending policy is absent, report the missing policy instead of silently normalizing.
148
157
 
149
158
  <!-- mustflow-section: output-format -->
150
159
  ## Output Format
@@ -152,6 +161,7 @@ Report missing PowerShell-version, argv-shape, Pester, CI-shell, Windows-native-
152
161
  - PowerShell version and invocation boundary
153
162
  - Parser ledger
154
163
  - String, here-string, regex, wildcard, replacement, and native argv decisions
164
+ - File rewrite, encoding, and newline decisions
155
165
  - Files changed
156
166
  - Command intents run
157
167
  - Skipped checks and reasons
@@ -2,7 +2,7 @@
2
2
  mustflow_doc: skill.prompt-contract-quality-review
3
3
  locale: en
4
4
  canonical: true
5
- revision: 1
5
+ revision: 2
6
6
  lifecycle: mustflow-owned
7
7
  authority: procedure
8
8
  name: prompt-contract-quality-review
@@ -45,6 +45,9 @@ Review prompts as product contracts, not prose polish. A production prompt shoul
45
45
 
46
46
  - The change is only LLM chat, copilot, citation, streaming, history, or prompt-composer UI; use `llm-service-ux-review`.
47
47
  - The main risk is unsupported factual output, fabricated citations, weak evidence coverage, retrieval thresholds, claim maps, or abstain behavior; use `llm-hallucination-control-review`.
48
+ - The task is an end-to-end RAG failure and it is not yet clear whether ingestion, retrieval,
49
+ context assembly, prompt construction, generation, citation validation, or answerability failed;
50
+ use `rag-pipeline-triage` first.
48
51
  - The main risk is token spend, provider prompt-cache hit rate, chat-history bloat, RAG context size, model routing cost, reasoning budget, retry replay, or cost observability; use `llm-token-cost-control-review`.
49
52
  - The main risk is time to first token, first useful output, streaming latency, LLM round trips, tool wait, prompt-cache latency, model routing speed, realtime continuation, priority tier, predicted-output latency, or user-perceived response speed; use `llm-response-latency-review`.
50
53
  - The main risk is autonomous agent control flow, planner/executor/verifier separation, tool-call gates, approval or interrupt state, durable resume behavior, loop budgets, retry classification, handoffs, guardrails, or trace outcome evaluation; use `agent-execution-control-review`.
@@ -2,7 +2,7 @@
2
2
  mustflow_doc: skill.python-code-change
3
3
  locale: en
4
4
  canonical: true
5
- revision: 3
5
+ revision: 4
6
6
  lifecycle: mustflow-owned
7
7
  authority: procedure
8
8
  name: python-code-change
@@ -94,12 +94,20 @@ Preserve Python runtime, standard-library, packaging, import, async resource, pu
94
94
  - prefer `importlib.resources` for packaged data, `tomllib` for TOML reads, and `Path.walk()` only after checking version support, pruning behavior, symlink recursion, ordering, and cycle risks;
95
95
  - use dataclass options such as `slots`, `frozen`, and `kw_only`, `StrEnum`, `TypedDict`, or `Protocol` only when they match the public shape and runtime/type-checker support;
96
96
  - treat `functools.cache`, `lru_cache`, `cached_property`, `partial`, and Python 3.14+ `Placeholder` as state, memory, concurrency, and versioned-API choices rather than harmless terseness.
97
- 11. Keep process, archive, and concurrency safety explicit:
97
+ 11. Treat newer syntax and typing features as semantic tools, not style trophies:
98
+ - use template string literals only when a handler needs the static and interpolated parts separately, such as SQL builders, shell command objects, logging templates, or markup renderers; do not replace ordinary f-strings when the result is just a string;
99
+ - when runtime code reads annotations, use the supported annotation inspection API and choose the intended format explicitly instead of assuming `__annotations__` already contains runtime values;
100
+ - use sentinel values to distinguish "argument omitted" from `None`, but compare sentinels by identity and keep public signatures readable;
101
+ - prefer `Mapping` or narrower read-only protocols for read-only inputs so immutable mapping implementations are not rejected accidentally;
102
+ - use closed or extra-key `TypedDict` forms only when the supported Python and type-checker versions agree with that shape.
103
+ 12. Keep `finally` as cleanup, not outcome selection. Do not add `return`, `break`, or `continue` inside `finally` blocks because they can mask exceptions and cancellation; move result decisions outside cleanup or make suppression an explicit documented contract.
104
+ 13. Use explicit lazy imports only for startup-sensitive module-scope dependencies after checking version support and import-time side effects. Do not lazily import plugins, registries, monkey patches, model definitions, ORM mappings, or observability setup whose import side effects are part of startup correctness.
105
+ 14. Keep process, archive, and concurrency safety explicit:
98
106
  - subprocess calls use argument lists, checked failure handling, timeouts, bounded captured output, and a narrow `shell=True` exception when the project already permits it;
99
107
  - archive extraction, including `tarfile`, keeps untrusted archive inspection, extraction filters, partial-extract cleanup, and older-runtime defaults visible;
100
108
  - `asyncio.TaskGroup`, `asyncio.timeout`, and `asyncio.to_thread` are used only when their cancellation, timeout, blocking-work, and Python-version semantics fit the surrounding lifecycle.
101
- 12. Use runtime diagnostics as evidence, not as permanent workaround code. Interpreter or library diagnostics such as import timing, `tracemalloc`, `faulthandler`, profiling, and allocation tracing should go through configured diagnostic or verification intents when available, and missing intents should be reported instead of adding ad hoc command recipes to the skill.
102
- 13. Preserve async and resource ownership:
109
+ 15. Use runtime diagnostics as evidence, not as permanent workaround code. Interpreter or library diagnostics such as import timing, `tracemalloc`, `faulthandler`, profiling, and allocation tracing should go through configured diagnostic or verification intents when available, and missing intents should be reported instead of adding ad hoc command recipes to the skill.
110
+ 16. Preserve async and resource ownership:
103
111
  - every coroutine is awaited, returned by contract, or scheduled as an owned and tracked task;
104
112
  - raw background task creation is allowed only through the project's owner or spawn helper, a task group, or an equivalent lifecycle mechanism;
105
113
  - background tasks keep a strong reference, have a shutdown path, and retrieve failures instead of leaving never-retrieved exceptions;
@@ -108,15 +116,15 @@ Preserve Python runtime, standard-library, packaging, import, async resource, pu
108
116
  - context managers and async context managers do not suppress exceptions unless suppression is the feature;
109
117
  - context-manager helpers that catch exceptions for logging re-raise after logging;
110
118
  - early-exit async generators have an explicit close path.
111
- 14. Preserve traceback evidence. Logging inside exception handlers should retain exception information instead of logging only the exception message.
112
- 15. Preserve public contracts:
119
+ 17. Preserve traceback evidence. Logging inside exception handlers should retain exception information instead of logging only the exception message.
120
+ 18. Preserve public contracts:
113
121
  - treat public imports, public signatures, exceptions, return shapes, CLI behavior, entry points, config keys, environment variables, dependency metadata, extras, Python version support, and typing stubs as compatibility-sensitive;
114
122
  - do not change sync functions into async functions, accepted input shapes, nullable behavior, documented exception types, tuple/dict/dataclass return shapes, config precedence, or environment variable semantics without a compatibility review;
115
123
  - typed packages should keep runtime and typing surfaces aligned, including `py.typed` and stubs when present.
116
- 16. Avoid mutable default arguments, broad `except Exception: pass`, broad `BaseException` catches outside process boundaries, global state hidden behind module imports, and path handling that ignores existing `pathlib` or OS conventions.
117
- 17. Use `# type: ignore[...]` only when tightly scoped, justified, and consistent with local policy.
118
- 18. If packaging, public API, CLI, config, or typing contracts change, synchronize README examples, entry point tests, build metadata, docs, fixtures, and downstream-style examples that describe installation or usage.
119
- 19. Choose configured verification intents that cover formatting, lint, type checking, tests, package build, installed-package smoke checks, and CLI smoke risk when available.
124
+ 19. Avoid mutable default arguments, broad `except Exception: pass`, broad `BaseException` catches outside process boundaries, global state hidden behind module imports, `finally` masking, and path handling that ignores existing `pathlib` or OS conventions.
125
+ 20. Use `# type: ignore[...]` only when tightly scoped, justified, and consistent with local policy.
126
+ 21. If packaging, public API, CLI, config, or typing contracts change, synchronize README examples, entry point tests, build metadata, docs, fixtures, and downstream-style examples that describe installation or usage.
127
+ 22. Choose configured verification intents that cover formatting, lint, type checking, tests, package build, installed-package smoke checks, and CLI smoke risk when available.
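Two of the newer procedure items, sentinel defaults and cleanup-only `finally` blocks, can be illustrated with a minimal sketch. The names `MISSING`, `update_nickname`, and `load_value` are hypothetical and chosen only for this example.

```python
from typing import Union


class _Missing:
    """Type of the private sentinel; exactly one instance is created below."""

    def __repr__(self) -> str:
        return "<MISSING>"


MISSING = _Missing()


def update_nickname(profile: dict, nickname: Union[str, None, _Missing] = MISSING) -> dict:
    """Return a copy of `profile`; omitted means unchanged, None means cleared."""
    updated = dict(profile)
    if nickname is MISSING:  # compare the sentinel by identity, not equality
        return updated
    updated["nickname"] = nickname  # an explicit None is a real value here
    return updated


def load_value(source: dict, key: str, events: list) -> str:
    """Look up `key`, always recording cleanup, and let lookup errors propagate."""
    try:
        value = source[key]  # may raise KeyError
    finally:
        events.append("cleanup")  # cleanup only: no return, break, or continue
    return value  # the outcome is decided outside the finally block
```

Because `load_value` returns after the `finally` block rather than inside it, a `KeyError` from the lookup still reaches the caller while the cleanup entry is recorded either way.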
120
128
 
121
129
  <!-- mustflow-section: postconditions -->
122
130
  ## Postconditions
@@ -152,6 +160,7 @@ Report missing package, type, or test intents rather than inventing raw tool com
152
160
  - If packaging correctness matters but only repository-root tests can run, report that wheel or installed-artifact verification is missing.
153
161
  - If the supported Python version blocks a syntax choice, rewrite to the supported form.
154
162
  - If the supported Python version blocks a standard-library feature, changed default, diagnostic flag, or helper API, use the supported equivalent or report the runtime-support decision instead of silently raising `requires-python`.
163
+ - If template strings, annotation runtime access, lazy imports, sentinels, immutable mappings, or typed extra keys are useful but version-gated, keep a fallback or report the required support bump instead of smuggling the newer feature into a lower-runtime project.
155
164
  - If third-party stubs or package metadata are wrong, document the local workaround and keep it narrow.
156
165
  - If a background task lacks owner, shutdown, strong reference, or exception retrieval, do not add it.
157
166
  - If cancellation or context-manager behavior is swallowed accidentally, restore propagation or document the intentional suppression contract.
@@ -0,0 +1,206 @@
1
+ ---
2
+ mustflow_doc: skill.rag-pipeline-triage
3
+ locale: en
4
+ canonical: true
5
+ revision: 1
6
+ lifecycle: mustflow-owned
7
+ authority: procedure
8
+ name: rag-pipeline-triage
9
+ description: Apply this skill when a RAG, knowledge-base answer, grounded chat, citation answer, retrieval-augmented support bot, or document QA flow is wrong, stale, unsupported, slow, leaking data, over-refusing, or not yet localized to ingestion, parsing, chunking, retrieval, filtering, reranking, context assembly, prompt construction, generation, citation validation, or answerability boundaries.
10
+ metadata:
11
+ mustflow_schema: "1"
12
+ mustflow_kind: procedure
13
+ pack_id: mustflow.core
14
+ skill_id: mustflow.core.rag-pipeline-triage
15
+ command_intents:
16
+ - changes_status
17
+ - changes_diff_summary
18
+ - lint
19
+ - build
20
+ - test_related
21
+ - test
22
+ - docs_validate_fast
23
+ - test_release
24
+ - mustflow_check
25
+ ---
26
+
27
+ # RAG Pipeline Triage
28
+
29
+ <!-- mustflow-section: purpose -->
30
+ ## Purpose
31
+
32
+ Localize RAG failures by splitting ingestion, retrieval, context assembly, and answer generation
33
+ before changing models, prompts, chunk sizes, or vector settings.
34
+
35
+ The first question is not "is the model bad?" It is "did the correct evidence exist, get retrieved,
36
+ survive filtering and context assembly, and constrain the answer?"
37
+
38
+ <!-- mustflow-section: use-when -->
39
+ ## Use When
40
+
41
+ - A RAG answer, knowledge-base answer, grounded support bot, citation answer, document QA flow, or
42
+ retrieval-augmented agent is wrong, stale, unsupported, too slow, leaking data, citing the wrong
43
+ source, refusing answerable questions, or answering unanswerable questions.
44
+ - The failure is not yet localized to source availability, parsing, chunking, embedding, indexing,
45
+ filters, vector or keyword retrieval, hybrid fusion, reranking, context packing, prompt assembly,
46
+ generation, validators, citations, answerability, or access control.
47
+ - A review would otherwise tune the model, top-k, chunk size, reranker, or prompt before proving
48
+ which RAG layer failed.
49
+
50
+ <!-- mustflow-section: do-not-use-when -->
51
+ ## Do Not Use When
52
+
53
+ - The failure is already localized to retrieval mechanics, filters, ANN, embeddings, or vector DB
54
+ behavior; use `vector-search-integrity-review`.
55
+ - The failure is already localized to unsupported claims, citations, answerability, evidence IDs, or
56
+ validators; use `llm-hallucination-control-review`.
57
+ - The failure is already localized to prompt structure, tool policy, output schema, or model runtime
58
+ settings; use `prompt-contract-quality-review`.
59
+ - The task asks for production document dumps, raw embeddings, private prompts, customer text, or
60
+ tenant-identifying data. Use ids, hashes, safe synthetic fixtures, aggregate metrics, and redacted
61
+ traces.
62
+
63
+ <!-- mustflow-section: required-inputs -->
64
+ ## Required Inputs
65
+
66
+ - Symptom classification: missing correct document, correct document retrieved but unused, stale
67
+ answer, unsupported answer, wrong citation, access-control leak, over-refusal, under-refusal,
68
+ latency, cost, or nondeterministic result.
69
+ - Trace ledger: trace id, original question, normalized question, rewritten query, user or tenant
70
+ context, filters, embedding model version, index version, candidate ids and scores, reranker
71
+ output, final context ids and order, prompt version, model version, answer, citations, validators,
72
+ latency, and cost when safe.
73
+ - Source ledger: authoritative source availability, parsed text, chunk boundaries, metadata, title,
74
+ section path, version, effective dates, stale or deleted documents, duplicates, and conflicting
75
+ sources.
76
+ - Comparison ledger: no-retrieval answer, retrieved-context answer, human-selected gold-context
77
+ answer, exact or keyword search result, vector search result, hybrid result, and expected
78
+ answerability state.
79
+ - Eval ledger: real failed queries, unanswerable questions, stale-doc cases, conflicting-doc cases,
80
+ similar-name cases, IDs and error codes, multilingual or typo cases, multi-hop cases, and
81
+ unauthorized-doc cases.
82
+ - Privacy ledger: raw text, prompts, embeddings, tenant ids, user ids, provider payloads, and which
83
+ evidence can be stored or reported safely.
84
+
85
+ <!-- mustflow-section: preconditions -->
86
+ ## Preconditions
87
+
88
+ - The task matches the Use When conditions and does not match the Do Not Use When exclusions.
89
+ - Higher-priority instructions and `.mustflow/config/commands.toml` have been checked.
90
+ - Raw documents, prompts, embeddings, user data, and tenant-identifying payloads are not copied into
91
+ docs, tests, commits, or reports unless they are safe synthetic fixtures.
92
+
93
+ <!-- mustflow-section: allowed-edits -->
94
+ ## Allowed Edits
95
+
96
+ - Add or tighten trace fields, fixture queries, parsing checks, chunk metadata, duplicate or stale
97
+ source handling, retrieval comparisons, filter checks, context-packing rules, prompt source
98
+ separation, citation validators, answerability states, dirty eval fixtures, metrics, docs, and
99
+ directly synchronized templates.
100
+ - Add safe synthetic fixtures for missing-doc, correct-doc-unused, stale-doc, conflicting-doc,
101
+ unauthorized-doc, exact-id, keyword, vector, hybrid, reranker, citation, and abstain behavior.
102
+ - Do not change models, re-embed data, rebuild production indexes, widen access filters, disable
103
+ authorization, dump private corpora, or claim quality improvement before the failing layer is
104
+ localized.
105
+
106
+ <!-- mustflow-section: procedure -->
107
+ ## Procedure
108
+
109
+ 1. Classify the symptom. Separate no source, bad parsing, bad chunking, retrieval miss, filter miss,
110
+ reranker loss, context truncation, prompt misuse, generation drift, false citation, stale answer,
111
+ access leak, latency, cost, and answerability errors.
112
+ 2. Preserve one safe end-to-end trace. Keep ids, versions, scores, context order, prompt version,
113
+ model version, validator result, latency, and cost. Redact raw sensitive content.
114
+ 3. Prove the answer exists in the source of truth. Do not tune retrieval for an answer that is not in
115
+ the indexed corpus or an allowed tool.
116
+ 4. Inspect parsed text before original documents. Tables, PDFs, multi-column pages, code blocks,
117
+ headers, footers, and OCR can be broken even when the original file looks correct.
118
+ 5. Inspect chunk boundaries and metadata. Verify title, parent section, version, dates, audience,
119
+ product, source authority, and neighboring context survive into chunks or parent retrieval.
120
+ 6. Compare source versions and deletes. Duplicates, obsolete documents, tombstones, and conflicting
121
+ effective dates must not be silently mixed into one answer.
122
+ 7. Run the isolation comparison when evidence is available: no retrieval, current retrieved context,
123
+ and human-selected gold context. Gold-context failure points to generation or prompt; current
124
+ context failure with gold success points to retrieval or context assembly.
125
+ 8. Compare keyword, vector, hybrid, and exact-id retrieval by data shape. IDs, error codes, SKUs,
126
+ names, dates, and numbers need exact or lexical safeguards; semantic questions may need vector or
127
+ hybrid retrieval.
128
+ 9. Check filters before blaming embeddings. Record pre-filter candidate count, post-filter count,
129
+ tenant and permission filters, metadata types, time zones, empty arrays, case sensitivity, and
130
+ stale policy copies.
131
+ 10. Check reranker candidate starvation. If the correct source never enters the candidate set, the
132
+ reranker cannot fix it. If it enters and then drops, inspect reranker inputs and scoring.
133
+ 11. Check context assembly. Verify `top_k`, score thresholds, source order, truncation, deduping,
134
+ conflict handling, source authority, and whether important evidence is buried or cut off.
135
+ 12. Check prompt construction. User input, retrieved text, examples, tool observations, and system or
136
+ developer instructions must remain separated. Retrieved text is data, not authority.
137
+ 13. Check answerability and abstain behavior. Track no-evidence, low-confidence, conflicting-source,
138
+ stale-source, access-denied, tool-failed, and needs-human states separately.
139
+ 14. Validate citations claim-by-claim. A citation id proves nothing unless the cited chunk supports
140
+ the specific generated claim.
141
+ 15. Measure each layer separately. Track parsing success, index freshness, Recall@k, MRR or nDCG,
142
+ rerank survival, context token budget, answer accuracy, citation accuracy, abstain accuracy,
143
+ access leaks, and retrieval/rerank/generation latency and cost.
144
+ 16. Use dirty eval cases from real failures. Include typos, abbreviations, multilingual questions,
145
+ unanswerable questions, date-sensitive questions, similar names, product codes, multi-hop
146
+ questions, unauthorized documents, stale documents, and conflicting documents.
147
+ 17. Apply the smallest localized fix and switch to the narrower matching skill for retrieval,
148
+ hallucination control, prompt contract, token cost, latency, access control, or prompt-injection
149
+ defense once the boundary is known.
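The isolation comparison (step 7) and the reranker starvation check (step 10) reduce to small decision rules. The sketch below is one possible encoding of those rules, not mustflow code; `IsolationResult`, `Suspect`, `localize`, and `rerank_stage` are hypothetical names, and the boolean "correct" flags are assumed to come from a human or configured eval judgment.

```python
from dataclasses import dataclass
from enum import Enum


class Suspect(Enum):
    GENERATION_OR_PROMPT = "generation or prompt"
    RETRIEVAL_OR_CONTEXT_ASSEMBLY = "retrieval or context assembly"
    NOT_REPRODUCED = "not reproduced in this comparison"


@dataclass(frozen=True)
class IsolationResult:
    no_retrieval_correct: bool     # recorded as a baseline; not used in the decision
    current_context_correct: bool  # answer using the currently retrieved context
    gold_context_correct: bool     # answer using human-selected gold context


def localize(result: IsolationResult) -> Suspect:
    """Apply the step 7 rule to one comparison."""
    if not result.gold_context_correct:
        # Even ideal evidence did not produce a correct answer.
        return Suspect.GENERATION_OR_PROMPT
    if not result.current_context_correct:
        # Gold context works but the current context does not.
        return Suspect.RETRIEVAL_OR_CONTEXT_ASSEMBLY
    return Suspect.NOT_REPRODUCED


def rerank_stage(gold_id: str, candidate_ids: list[str], reranked_ids: list[str]) -> str:
    """Apply the step 10 rule: did the correct source enter, and did it survive?"""
    if gold_id not in candidate_ids:
        return "candidate starvation"  # the reranker never saw the source
    if gold_id not in reranked_ids:
        return "dropped by reranker"   # inspect reranker inputs and scoring
    return "survived rerank"
```

Only ids and booleans are passed around, so the comparison can be stored in a trace without copying raw document text.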
150
+
151
+ <!-- mustflow-section: postconditions -->
152
+ ## Postconditions
153
+
154
+ - The RAG failure is localized to ingestion, parsing, chunking, indexing, retrieval, filters,
155
+ reranking, context assembly, prompt construction, generation, citation validation, answerability,
156
+ access control, or a named evidence gap.
157
+ - Trace, source, comparison, eval, metric, and privacy ledgers are explicit where relevant.
158
+ - Model, prompt, chunk, top-k, reranker, or index changes are justified by layer evidence rather than
159
+ by general "RAG quality" claims.
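As an example of layer evidence for the retrieval layer, Recall@k and MRR can be computed over safe fixture queries. This is a minimal sketch using the standard definitions; the function names and the shape of `runs` (a list of retrieved-id lists paired with relevant-id sets) are assumptions made for illustration.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant ids that appear in the first k retrieved ids."""
    if not relevant:
        raise ValueError("relevant set must not be empty")
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """Average of 1/rank of the first relevant id per query (0 when none is found)."""
    if not runs:
        raise ValueError("runs must not be empty")
    total = 0.0
    for retrieved, relevant in runs:
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(runs)
```

Comparing these numbers before and after a proposed `top_k`, filter, or index change on the same fixtures shows whether the change moved the retrieval layer at all.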
160
+
161
+ <!-- mustflow-section: verification -->
162
+ ## Verification
163
+
164
+ Use configured oneshot command intents when available:
165
+
166
+ - `changes_status`
167
+ - `changes_diff_summary`
168
+ - `lint`
169
+ - `build`
170
+ - `test_related`
171
+ - `test`
172
+ - `docs_validate_fast`
173
+ - `test_release`
174
+ - `mustflow_check`
175
+
176
+ Prefer the narrowest configured eval, fixture, schema, docs, package, or release check that proves
177
+ the localized RAG boundary. Report missing retrieval, gold-context, citation, answerability,
178
+ privacy, latency, or production-index evidence instead of inventing live diagnostics.
179
+
180
+ <!-- mustflow-section: failure-handling -->
181
+ ## Failure Handling
182
+
183
+ - If the end-to-end trace cannot be reconstructed, report the missing trace fields before tuning
184
+ models, prompts, chunks, filters, or retrieval parameters.
185
+ - If evidence contains raw private text, embeddings, prompts, personal data, or tenant-identifying
186
+ data, redact to ids, hashes, dimensions, scores, snippets from safe fixtures, and aggregate
187
+ metrics.
188
+ - If the fix requires model replacement, re-embedding, index rebuild, private corpus access, or live
189
+ provider calls outside the command contract, report the manual boundary.
190
+ - If retrieved text can inject instructions, pause RAG quality work and apply
191
+ `external-prompt-injection-defense`.
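The redaction rule above (ids, hashes, dimensions, scores) can be sketched as follows. `redact_chunk` is a hypothetical helper, and the sketch assumes chunk ids are opaque and already safe to report. An unsalted hash of short or guessable text can still leak content, so a real pipeline may need a keyed hash instead.

```python
import hashlib


def redact_chunk(chunk_id: str, text: str, score: float) -> dict:
    """Return a trace entry that carries no raw chunk text."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return {
        "chunk_id": chunk_id,         # assumption: ids are opaque and non-identifying
        "text_sha256": digest[:16],   # truncated digest, enough to spot changed or duplicated text
        "text_chars": len(text),      # size only, not content
        "score": score,
    }
```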
192
+
193
+ <!-- mustflow-section: output-format -->
194
+ ## Output Format
195
+
196
+ - RAG pipeline triaged
197
+ - Symptom classification and localized boundary
198
+ - Trace, source, comparison, eval, metric, and privacy ledgers
199
+ - Ingestion, parsing, chunking, retrieval, filter, rerank, context, prompt, generation, citation,
200
+ answerability, access-control, latency, and cost findings
201
+ - Fix applied or recommended
202
+ - Evidence level: end-to-end trace, gold-context comparison, configured-test evidence, static review
203
+ risk, manual-only, missing, or not applicable
204
+ - Command intents run
205
+ - Skipped diagnostics and reasons
206
+ - Remaining RAG pipeline risk