mustflow 2.18.20 → 2.21.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (46)
  1. package/dist/cli/commands/classify.js +2 -3
  2. package/dist/cli/commands/doctor.js +46 -6
  3. package/dist/cli/commands/run/output.js +1 -1
  4. package/dist/cli/commands/run/receipt.js +1 -0
  5. package/dist/cli/commands/verify.js +52 -23
  6. package/dist/cli/i18n/en.js +1 -0
  7. package/dist/cli/i18n/es.js +1 -0
  8. package/dist/cli/i18n/fr.js +1 -0
  9. package/dist/cli/i18n/hi.js +1 -0
  10. package/dist/cli/i18n/ko.js +1 -0
  11. package/dist/cli/i18n/zh.js +1 -0
  12. package/dist/cli/lib/git-changes.js +7 -1
  13. package/dist/cli/lib/local-index/index.js +9 -30
  14. package/dist/cli/lib/repo-map.js +3 -2
  15. package/dist/cli/lib/run-plan.js +8 -4
  16. package/dist/core/change-classification.js +24 -2
  17. package/dist/core/check-issues.js +1 -1
  18. package/dist/core/command-contract-rules.js +6 -0
  19. package/dist/core/command-contract-validation.js +24 -10
  20. package/dist/core/command-output-limits.js +2 -1
  21. package/dist/core/line-endings.js +12 -4
  22. package/dist/core/repeated-failure.js +3 -3
  23. package/dist/core/run-performance-history.js +4 -4
  24. package/dist/core/run-profile.js +2 -3
  25. package/dist/core/run-receipt.js +11 -3
  26. package/dist/core/run-write-drift.js +64 -12
  27. package/dist/core/safe-filesystem.js +155 -0
  28. package/package.json +1 -1
  29. package/schemas/commands.schema.json +1 -0
  30. package/schemas/doctor-report.schema.json +23 -1
  31. package/schemas/run-receipt.schema.json +6 -2
  32. package/templates/default/i18n.toml +13 -13
  33. package/templates/default/locales/en/.mustflow/skills/INDEX.md +13 -13
  34. package/templates/default/locales/en/.mustflow/skills/adapter-boundary/SKILL.md +72 -4
  35. package/templates/default/locales/en/.mustflow/skills/command-contract-authoring/SKILL.md +16 -10
  36. package/templates/default/locales/en/.mustflow/skills/command-pattern/SKILL.md +64 -7
  37. package/templates/default/locales/en/.mustflow/skills/database-change-safety/SKILL.md +249 -16
  38. package/templates/default/locales/en/.mustflow/skills/dependency-reality-check/SKILL.md +37 -7
  39. package/templates/default/locales/en/.mustflow/skills/migration-safety-check/SKILL.md +74 -10
  40. package/templates/default/locales/en/.mustflow/skills/performance-budget-check/SKILL.md +132 -5
  41. package/templates/default/locales/en/.mustflow/skills/pure-core-imperative-shell/SKILL.md +12 -5
  42. package/templates/default/locales/en/.mustflow/skills/result-option/SKILL.md +4 -2
  43. package/templates/default/locales/en/.mustflow/skills/security-privacy-review/SKILL.md +112 -29
  44. package/templates/default/locales/en/.mustflow/skills/state-machine-pattern/SKILL.md +17 -4
  45. package/templates/default/locales/en/.mustflow/skills/structure-discovery-gate/SKILL.md +193 -2
  46. package/templates/default/manifest.toml +1 -1
package/templates/default/locales/en/.mustflow/skills/command-pattern/SKILL.md +64 -7
@@ -2,11 +2,11 @@
  mustflow_doc: skill.command-pattern
  locale: en
  canonical: true
- revision: 4
+ revision: 13
  lifecycle: mustflow-owned
  authority: procedure
  name: command-pattern
- description: Apply this skill when a state-changing user or system intent needs to become one traceable, retryable, idempotent, authorized, transactional, and testable execution unit.
+ description: Apply this skill when a state-changing user or system intent needs to become one traceable, retryable, idempotent, authorized, transactional, auditable, observable, replayable, and testable execution unit, especially for payment, credit, point, inventory, entitlement, subscription, permission, document, prompt, AI cost-bearing work, AI budget reservation, agent loop execution, long-running job, queue message contract, external-side-effect workflow, provider intent record, webhook follow-up, cron work, worker work, manual recovery action, or core-state change that should accept work in HTTP and hand off analytics, email, AI, search indexing, statistics, cache rebuild, or other auxiliary work after commit while preserving request, trace, causation, and job identifiers.
  metadata:
  mustflow_schema: "1"
  mustflow_kind: procedure
@@ -46,8 +46,19 @@ Use commands to make these questions answerable later:
  ## Use When
 
  - A request creates, updates, deletes, approves, cancels, captures, refunds, archives, sends, publishes, imports, exports, or otherwise changes durable state.
+ - A request changes payment, point, credit, inventory, coupon, subscription, entitlement, permission, AI prompt version, document version, policy version, or automation rule state.
+ - A request starts, retries, cancels, records, or limits an AI operation where model usage, token cost, cache behavior, retry cost, plan limits, or provider-call reconciliation matters.
+ - A request starts an agentic or multi-step AI operation where maximum steps, tool calls, tokens, cost, time, model fallback, policy decisions, or emergency stop behavior must be recorded before work fan-out.
+ - A request consumes high-cost resources such as AI calls, image or video conversion, search, automation, file processing, webhooks, realtime fan-out, or provider calls that need credits, quotas, tenant limits, or usage records.
  - A user or system action calls an external service, sends a message, writes a file, sends email, charges payment, publishes a webhook, or schedules work.
+ - A user or system action depends on an external API and must preserve the product's intent before the provider call so failed, unknown, delayed, or duplicate work can be retried, reconciled, or manually recovered later.
+ - An operator needs to replay a failed email, reprocess a webhook, retry an AI job, rebuild a search index, disable an external feature, reconcile a payment, or move an exhausted job out of manual review without guessing from logs.
  - The operation needs authorization, audit logs, idempotency, retry classification, concurrency protection, an outbox, or a transaction boundary.
+ - The operation must update several local records atomically and then coordinate with email, notification, webhook, analytics, payment, AI, or other external systems after commit.
+ - A user request currently waits for auxiliary work such as analytics logging, search indexing, recommendation refresh, statistics aggregation, email delivery, AI summarization, file conversion, cache purge, or reporting updates even though the core state change could safely complete first.
+ - An HTTP route should accept a long-running or externally dependent operation, persist the requested work, and return a queued, processing, or accepted result instead of completing email, AI, embedding, import, export, webhook follow-up, or statistics work inline.
+ - A worker job, outbox dispatcher, webhook processor, or retryable background step needs durable status, duplicate prevention, attempts, locking, retry time, and dead-letter behavior.
+ - A queue system, worker framework, or scheduler is introduced or replaced and the service needs job message shape, schema versioning, idempotency, retry, dead-letter, ordering, priority, observability, or manual replay to remain product-owned rather than queue-product-owned.
  - HTTP, queue, cron, CLI, or worker entrypoints should run the same state-changing intent.
  - An existing handler, job, service, or controller mixes intent parsing, authorization, domain decisions, persistence, side effects, event publishing, and error mapping.
  - A command bus may be justified because many commands repeat the same tracing, logging, idempotency, or middleware concerns.
@@ -69,8 +80,17 @@ Use commands to make these questions answerable later:
  - The user or system intent and the command name that would describe it with an imperative verb and target noun.
  - The source boundary: HTTP, queue, cron, CLI, worker, test, webhook, or internal system action.
  - The command payload, actor, tenant, request identifier, correlation identifier, causation identifier, source, and current time.
+ - Trace or observability identifiers when relevant: trace id, span id, request id, command id, job run id, webhook event id, cron run id, user or anonymous id, tenant or organization id, and which identifiers are safe to log or propagate.
  - The durable resources loaded or changed by the command.
  - Authorization policy, domain rules, lifecycle state transitions, transaction needs, outbox or event needs, audit requirements, idempotency needs, retry policy, and concurrency risks.
+ - Core state that must be committed before response, auxiliary work that can run after commit, acceptable delay or loss for each follow-up, and the dependency failure policy.
+ - Provider intent and recovery policy, including the internal operation id, provider operation, safe payload hash, provider id when known, idempotency key, unknown-result handling, retry budget, reconciliation rule, manual replay rule, and whether a provider swap is an immediate fallback or a later migration path.
+ - Work-acceptance response policy, such as immediate success, queued status, processing status, or accepted response; job status vocabulary; deduplication key; attempt limit; next-run time; lock expiry; dead-letter handling; and worker ownership.
+ - Queue contract details when work crosses a queue: queue name, business urgency, job id, job type, schema version, created time, run-after time, attempt count, idempotency key, request or trace context, safe payload reference, retry categories, timeout, dead-letter target, ordering requirement, and manual replay rule.
+ - AI work accounting when relevant: feature key, model key, usage ledger entry, user request id, provider call id, pricing snapshot, cache-hit type, retry grouping, cost limit, and whether failed or unknown calls require reconciliation before retry.
+ - AI policy decision when relevant: estimated cost, remaining budget, selected model, fallback model, blocked reason, maximum input tokens, maximum output tokens, maximum tool calls, maximum agent steps, timeout, and whether provider budgets are only secondary guardrails.
+ - Cost-bearing work accounting when relevant: value unit, cost unit, workspace or account quota, shared tenant credit pool, free-plan limit, user-action fan-out, usage event, rollup target, and whether retries or duplicate jobs can double-count cost.
+ - Idempotency layers for request acceptance, job execution, provider calls, and incoming webhooks, including scope, request hash, duplicate result behavior, and different-payload conflict behavior.
  - Existing local conventions for result types, option types, domain errors, repositories, gateways, unit of work, outbox, audit logs, command buses, and tests.
  - Relevant command-intent contract entries for verification.
 
@@ -116,8 +136,10 @@ Use commands to make these questions answerable later:
  - Command payloads must be serializable data.
  - Do not put request objects, response objects, ORM entities, database connections, file streams, SDK clients, functions, class instances, or loggers in the payload.
  - Use an envelope when the command may be queued, retried, audited, or stored: command type, schema version, command identifier, optional idempotency key, and payload.
+ - Use a job envelope when work is queued or scheduled: job id, job type, schema version, idempotency key, created time, run-after time, attempt, trace or request context, and a safe payload or payload reference.
  4. Model execution context separately from payload.
  - Context should carry trusted actor, request identifier, correlation identifier, optional causation identifier, current time or time context, source, and tenant or account scope.
+ - When observability continuity matters, context should also carry or create safe trace, command, job run, webhook, or cron identifiers. Use internal ids or hashes for user and tenant context when those identifiers can leave the protected boundary.
  - Do not trust client-supplied actor identifiers, roles, or tenant identifiers without server-side authentication and membership checks.
  - Inject time through context. Do not read current time inside the handler except at the outer boundary that builds the context.
  5. Define the handler contract.
@@ -149,10 +171,22 @@ Use commands to make these questions answerable later:
  - Commit before publishing external messages or sending external effects.
  - Record audit evidence for security, payment, permission, and administrator commands.
  - Schedule follow-up work only after the command decision is persisted.
+ - For payment, point, credit, inventory, entitlement, subscription, coupon, and refund commands, prefer append-only ledgers or action records as the evidence source. Treat summary balances or statuses as derived or transactionally updated read state.
+ - For ordinary content, account, and workflow commands, persist the core state and outbox or job records before triggering analytics, email, search indexing, AI processing, statistics, cache purge, or feed refresh work.
+ - For cost-bearing AI commands, persist the accepted work, idempotency decision, usage-limit decision, and job or outbox record before a worker performs model calls. Record actual usage, retry grouping, cache-hit type, pricing snapshot, and provider outcome after the call.
+ - For agentic AI commands, persist the policy decision and hard caps before the first model call. Steps, tool calls, total tokens, elapsed time, and cost should be bounded by the command or job contract, not by provider defaults or operator memory.
+ - For any cost-bearing command, check plan, tenant, quota, credit, request-size, and rate limits before accepting the work when possible. Record usage intent or reserved quota before fan-out work starts, then reconcile actual usage after workers and provider calls complete.
+ - When one command creates many internal jobs, record the causation relationship so thumbnails, OCR, AI calls, embeddings, search indexing, notifications, logs, analytics exports, and webhooks can be attributed to the original user action without losing retry or cost detail.
+ - For HTTP acceptance of long-running work, persist the command result, job row, or outbox row in the same local transaction, then return the created resource identifier and current status. Do not make the HTTP request wait for the worker's external side effect unless the product contract truly requires immediate completion.
+ - For external API work, persist the internal intent before the provider call becomes the only record. Payment, email, map, AI, search, file, and webhook follow-up commands should leave enough local evidence to answer what was attempted, why, for whom, and how to retry or reconcile it later.
  9. Keep external effects out of local transactions.
  - Do not send email, webhooks, push notifications, SMS, files, AI requests, long network requests, payment captures, or refunds while holding a database transaction open.
  - Use outbox records, pending-effect records, job records, or a later worker command when local state and external work must both be reliable.
  - For payment or other harmful repeated effects, store a pending state or action ledger, pass idempotency keys to the provider when supported, and confirm the result through a follow-up command or workflow step.
+ - For workflows such as "payment approved then grant credits", persist the attempt, provider reference, ledger entry, balance/status update, and outbox event inside the local transaction after the provider result is known; send receipts, notifications, and analytics through outbox or worker steps after commit.
+ - Do not let optional analytics, email, AI, search, statistics, cache, or recommendation dependencies decide whether the core command succeeded. Record retryable follow-up work or a degraded status instead.
+ - Do not treat queue publication alone as proof that work exists. When possible, store the job or outbox record durably first so a dispatcher can recover after a process crash or queue-publish failure.
+ - Prefer single-provider plus adapter, failure queue, replay, and reconciliation over premature multi-provider orchestration unless the product contract truly needs live failover. Multiple providers do not remove the need for local intent records, idempotency, and manual recovery.
  10. Make idempotency explicit.
  - Require idempotency keys for payments, refunds, order creation, subscription starts, invite emails, password reset emails, file upload confirmation, external webhooks, point grants, coupon issuance, and administrator approvals.
  - Scope idempotency by actor, tenant, workspace, account, or other ownership boundary. Do not treat a raw idempotency key as globally safe.
@@ -160,6 +194,10 @@ Use commands to make these questions answerable later:
  - Return the previous success result for the same scope, type, key, and payload hash.
  - Return an idempotency conflict for the same scope, type, and key with a different payload hash.
  - Distinguish in-progress, succeeded, final failure, and retryable failure records.
+ - For retryable jobs, use durable deduplication keys and database uniqueness where possible. Assume queues and workers can deliver or run a job more than once.
+ - For provider calls, store the provider, operation, local idempotency key, safe request hash, provider object identifier when known, outcome, and last safe error. Distinguish `failed` from `unknown`: `failed` means the provider is known not to have completed the effect; `unknown` means reconciliation is required before retrying.
+ - For provider webhooks, store provider event ids or normalized event hashes so duplicate callbacks cannot double-charge, double-grant, double-refund, or repeat a state transition.
+ - For provider replacement, keep idempotency and operation ids in product language first. Provider-specific payment ids, message ids, place ids, model call ids, and search task ids are mappings, not the command identity.
  11. Record events safely.
  - Command names are imperative. Event names are past-tense facts.
  - Store domain events or outbox messages only after the state change decision succeeds.
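Reviewer note on the idempotency rules in the hunk above (scoped keys, replay on a matching payload hash, conflict on a different hash, and distinct in-progress and failure states): the following is a minimal TypeScript sketch of that decision, not code from the mustflow package. All names (`IdempotencyRecord`, `decideIdempotency`, `recordId`) and the in-memory `Map` store are hypothetical. Letting a `failed_retryable` record run again is also an assumption made for this sketch.

```typescript
// Hypothetical types for illustration only; not part of mustflow.
type IdempotencyStatus = "in_progress" | "succeeded" | "failed_final" | "failed_retryable";

interface IdempotencyRecord {
  scope: string;        // ownership boundary, e.g. tenant or account id
  commandType: string;
  key: string;          // client- or system-supplied idempotency key
  payloadHash: string;  // hash of the serialized command payload
  status: IdempotencyStatus;
  result?: unknown;     // stored result of a previous success
}

type IdempotencyDecision =
  | { kind: "execute" }
  | { kind: "replay"; result: unknown }
  | { kind: "conflict" }
  | { kind: "in_progress" }
  | { kind: "final_failure" };

// Keys are scoped: the same raw key under another scope or type is a different record.
function recordId(scope: string, commandType: string, key: string): string {
  return JSON.stringify([scope, commandType, key]);
}

function decideIdempotency(
  store: Map<string, IdempotencyRecord>,
  scope: string,
  commandType: string,
  key: string,
  payloadHash: string,
): IdempotencyDecision {
  const existing = store.get(recordId(scope, commandType, key));
  if (!existing) return { kind: "execute" };
  // Same scope, type, and key but a different payload: report a conflict.
  if (existing.payloadHash !== payloadHash) return { kind: "conflict" };
  switch (existing.status) {
    case "succeeded":
      return { kind: "replay", result: existing.result };
    case "in_progress":
      return { kind: "in_progress" };
    case "failed_final":
      return { kind: "final_failure" };
    case "failed_retryable":
      return { kind: "execute" }; // assumption: retryable failures may run again
  }
}
```

In a real handler the lookup and the insert of the `in_progress` record would need to be atomic, for example through a unique constraint on scope, type, and key; the `Map` here only illustrates the branching.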
@@ -170,13 +208,18 @@ Use commands to make these questions answerable later:
  - Return typed command errors for validation, authorization, not found, conflict, invariant, idempotency, dependency, and internal failures.
  - Do not throw for expected business failures.
  - Mark dependency failures as retryable or non-retryable.
- - Retry transient network, timeout, rate-limit, lock-contention, queue-delay, or temporary persistence failures only when duplicate execution is safe.
- - Do not retry invalid input, denied access, missing resource, domain-rule violation, idempotency conflict, or already-processed terminal states.
+ - Retry transient network, timeout, rate-limit, lock-contention, queue-delay, or temporary persistence failures only when duplicate execution is safe.
+ - Do not retry invalid input, denied access, missing resource, domain-rule violation, idempotency conflict, or already-processed terminal states.
+ - Give external dependencies explicit timeouts, retry budgets, backoff with jitter, and retryable error categories. Do not allow an auxiliary dependency to consume the whole user-request budget unless the command's core outcome depends on it.
+ - Classify follow-up failures separately from command failures. A failed email, analytics event, search index update, cache purge, or AI summary usually means pending or degraded follow-up work, not a failed core state change.
+ - Do not retry invalid input, denied authorization, permission rejection, malformed provider requests, or idempotency conflicts. Retry transient network failures, timeouts, rate limits, and temporary provider outages only when duplicate effects are prevented.
  13. Protect concurrency.
  - Use unique constraints, optimistic locking, pessimistic locking, conditional updates, state-transition checks, idempotency keys, or compare-and-swap saves when simultaneous commands may affect the same resource.
  - If a version conflict occurs, reload and recompute, return a conflict, enqueue a retry, or apply a domain-specific merge only when that policy is explicit.
+ - For slow worker or AI results, include the command version, target version, or expected state so stale results cannot overwrite newer state.
  14. Add observability and audit evidence.
  - Logs should include command type, command identifier, schema version, actor identifier, tenant identifier, request identifier, correlation identifier, causation identifier, source, affected resource identifier, duration, outcome, error kind, and error code.
+ - For commands that cross HTTP, queue, worker, cron, or webhook boundaries, keep the request id, trace id, causation id, command id, job run id, and webhook event id linked so a later backend change does not break incident reconstruction.
  - Logs and audits must not include passwords, tokens, cookies, raw card data, raw personal data, raw files, security answers, or raw sensitive provider responses.
  - Audit logs are required for permission changes, administrator invites, organization deletion, payment capture, refund, subscription cancellation, personal data export, account deletion, security setting changes, API key creation, and API key revocation.
  15. Introduce a command bus only with evidence.
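Reviewer note on the retry rules added in the hunk above (retry only transient dependency failures when duplicate effects are prevented, and use backoff with jitter): the following is a minimal TypeScript sketch under stated assumptions, not code from the mustflow package. The `CommandError` shape, the `transient` flag, and both function names are hypothetical. Mapping every retryable failure onto the `dependency` kind is a simplification of the skill's list of transient causes, and "full jitter" is one common choice rather than something the skill text prescribes.

```typescript
// Hypothetical error shape for illustration only; not part of mustflow.
type ErrorKind =
  | "validation" | "authorization" | "not_found" | "conflict"
  | "invariant" | "idempotency" | "dependency" | "internal";

interface CommandError {
  kind: ErrorKind;
  code: string;
  transient?: boolean; // set by the gateway for timeouts, rate limits, outages
}

// Retry only transient dependency failures, and only when a duplicate
// execution cannot repeat the effect (e.g. an idempotency key is in place).
function isRetryable(error: CommandError, duplicateSafe: boolean): boolean {
  if (error.kind !== "dependency") return false;
  return error.transient === true && duplicateSafe;
}

// Exponential backoff with full jitter: a random delay in [0, ceiling),
// where the ceiling doubles per attempt and is capped at capMs.
function backoffDelayMs(
  attempt: number,               // 1-based attempt number
  baseMs: number,
  capMs: number,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** (attempt - 1));
  return Math.floor(random() * ceiling);
}
```

Passing `random` as a parameter keeps the delay deterministic in tests, in the same spirit as the skill's rule to inject time through context.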
@@ -184,9 +227,15 @@ Use commands to make these questions answerable later:
  - The bus may locate handlers, apply middleware, add common tracing, and normalize outer errors.
  - The bus must not own domain rules, know every handler branch, centralize business logic, or force one transaction policy on all commands.
  16. Split long-running work.
- - Do not make a user request wait for bulk email, bulk file processing, AI document analysis, large imports, or external synchronization.
- - Use a start command to create a job and return a queued status.
- - Use worker commands for processing steps and completion or failure transitions.
+ - Do not make a user request wait for bulk email, bulk file processing, AI document analysis, large imports, or external synchronization.
+ - Use a start command to create a job and return a queued status.
+ - Use worker commands for processing steps and completion or failure transitions.
+ - Keep worker commands idempotent, version-aware, and stale-result safe so a slow AI, import, search, or conversion result cannot overwrite newer user state.
+ - Separate queues or worker pools when one class of work can starve another. Payment, webhook, email, AI, embedding, analytics, and dead-letter processing should not all compete for one unbounded worker path when delay or failure policy differs.
+ - Name queues by business domain, urgency, and failure policy where useful, such as billing webhook critical, transactional email, marketing email, media conversion, search reindex, analytics rollup, and dead-letter review. Avoid a single vague default queue when unrelated work can block critical rights, payment, or security updates.
+ - Put exhausted or poison jobs into a dead-letter or manual-review state with safe error metadata instead of retrying forever.
+ - Treat a queued failure as hidden until metrics, alerts, or operator review make it visible. Track queue depth, job age, retry count, failure rate, dead-letter growth, provider rate-limit pressure, and manual replay results for important queues.
+ - Define the smallest operator actions that make the command recoverable at 03:00: resend a specific email, reprocess a specific webhook, retry a specific AI job, rebuild a specific search index, reconcile a specific payment attempt, or temporarily disable one provider-backed feature.
  17. Test command behavior.
  - Cover success, required input absence, invalid input, unauthorized actor, missing resource, state conflict, domain invariant failure, duplicate retry with same payload, duplicate key with different payload, transaction rollback, outbox creation, dependency failure, retryability, non-retryability, and concurrency conflicts.
  - Use fake repositories and gateways for handler unit tests.
@@ -200,8 +249,12 @@ Use commands to make these questions answerable later:
  - The payload is serializable and free of framework, ORM, SDK, connection, stream, and function objects.
  - The handler has injected dependencies and handles one command.
  - Authorization, idempotency, transaction boundaries, outbox behavior, retry classification, concurrency protection, observability, and audit requirements are explicit where relevant.
+ - Request, trace, command, job, cron, webhook, correlation, and causation identifiers are explicit where the command crosses asynchronous or external boundaries.
+ - HTTP acceptance, durable job or outbox creation, worker ownership, queue separation, deduplication, retry budget, dead-letter behavior, and reconciliation rules are explicit when long-running or external work is involved.
+ - Credit, quota, tenant-limit, usage-event, fan-out attribution, and retry-cost behavior are explicit when one command consumes high-cost resources or creates multiple internal jobs.
  - Expected command failures are returned as typed values.
  - External effects do not run inside local database transactions.
+ - Auxiliary work that can lag, retry, degrade, or be lost is separated from the core command outcome instead of blocking the user request.
  - The final response reports any command bus, outbox, idempotency, audit, or retry behavior that was intentionally skipped because the operation did not need it.
 
  <!-- mustflow-section: verification -->
@@ -242,6 +295,10 @@ Choose the narrowest configured verification that proves the changed command pat
  - Strategy family used or intentionally avoided
  - Facade workflow used or intentionally avoided
  - Transaction, outbox, idempotency, retry, concurrency, audit, and observability choices
+ - Job, worker, queue, deduplication, dead-letter, and provider reconciliation choices when relevant
+ - AI or other high-cost usage ledger, pricing snapshot, cache-hit, credit, quota, tenant-limit, fan-out, retry-cost, provider-call, and limit-enforcement choices when relevant
+ - AI policy decision, preflight budget, agent-step, tool-call, token, cost, timeout, fallback, and emergency-stop choices when relevant
+ - Core versus auxiliary work split, including delayed, degraded, or lossy follow-up behavior when relevant
  - Command bus used or intentionally avoided
  - Tests or verification evidence
  - Skipped checks and remaining command safety risk
package/templates/default/locales/en/.mustflow/skills/database-change-safety/SKILL.md +249 -16
@@ -2,11 +2,11 @@
  mustflow_doc: skill.database-change-safety
  locale: en
  canonical: true
- revision: 1
+ revision: 16
  lifecycle: mustflow-owned
  authority: procedure
  name: database-change-safety
- description: Apply this skill when database schema, queries, transactions, ORM models, repositories, stores, indexes, cache-backed read models, retention, pagination, concurrency, idempotency, audit logs, or persistence boundaries are introduced, changed, reviewed, or reported.
+ description: Apply this skill when database schema, database engine choice, managed database extensions, provider-specific database features, SQLite or PostgreSQL suitability, queries, transactions, ORM models, repositories, stores, indexes, cache-backed read models, read/write models, content metadata, content block records, content graph records, lifecycle states, versioned records, ledgers, job tables, outbox events, inbox events, idempotency records, processed webhook records, external API call records, provider intent records, manual recovery records, usage metering records, plan-limit records, AI budget and feature-policy records, cost rollups, claim or fact registries, comparison methodology records, affiliate links, filter URL policies, SEO landing records, source provenance, admin audit logs, behavior analytics events, core event stores, search document or ranking metadata, semantic export and import data, provider id mappings, app-owned identity records, public URL and storage metadata records, data residency records, external-service truth ownership, operational versus analytics data boundaries, cache-as-store decisions, API response projections, cache invalidation data, user activity state, global-ready locale country currency timezone and money models, AI usage and pricing ledgers, hybrid file/database storage, file metadata records, retention, pagination, concurrency, idempotency, audit logs, data ownership boundaries, or persistence boundaries are introduced, changed, reviewed, or reported.
  metadata:
  mustflow_schema: "1"
  mustflow_kind: procedure
@@ -37,6 +37,28 @@ Use the smallest persistence boundary that proves the risk. Do not introduce rep
  ## Use When
 
  - A schema, migration, table, collection, ORM model, query, repository, store, transaction, index, cache, read model, audit log, or retention rule is introduced or changed.
+ - A content, product, review, comparison, marketplace, knowledge-base, search, API, analytics, localization, SEO, redirect, or CMS-like data model is introduced, changed, reviewed, or reported.
+ - Long-form content may stay in files or a CMS while metadata, facts, relationships, site exposure, permissions, workflow state, search fields, or analytics dimensions need queryable persistence.
+ - Content relationships, controlled tags, aliases, category hierarchy, typed filter attributes, source collection records, verification states, comments, likes, bookmarks, user activity events, or aggregate counters are introduced, changed, reviewed, or reported.
+ - Structured content blocks, type-specific content fields, advertisement slots, SEO metadata, filter definitions, URL normalization policy, curated landing pages, cache keys, cache tags, ranking snapshots, search index jobs, or admin audit logs are introduced, changed, reviewed, or reported.
+ - Content lifecycle states, image or file assets, claim registries, policy references, source references, effective dates, risk tiers, review owners, comparison methodology versions, affiliate relationship records, or bulk update jobs are introduced, changed, reviewed, or reported.
+ - Identity, privacy, editorial, catalog, community, analytics, billing, messaging, or audit data ownership boundaries are introduced, mixed, split, reviewed, or reported.
+ - Behavior logs, domain events, audit logs, analytics stores, reporting aggregates, event schemas, cache-backed state, public identifiers, or API response projections are introduced, mixed, split, reviewed, or reported.
+ - Data export, import, restore, provider migration, self-hosting migration, internal id ownership, external provider id mapping, relationship portability, permission portability, file portability, audit/event portability, or automation-rule portability is introduced, changed, reviewed, or reported.
48
+ - An external provider, hosted dashboard, no-code tool, CMS, email system, analytics system, search engine, queue, log store, file store, authentication provider, or payment provider could become the only owner of a core customer, entitlement, consent, file, content, event, audit, ranking, or retry fact.
49
+ - Search documents, ranking snapshots, synonym or boost policy references, query logs, click logs, queue job envelopes, dead-letter records, product events, billing events, job events, or security events are introduced, changed, reviewed, or reported.
50
+ - SQLite, PostgreSQL, MySQL, local-file persistence, managed database operations, backup and restore expectations, concurrent writes, tenant scoping, or database-as-operations-center choices are planned, edited, reviewed, or reported.
51
+ - Managed PostgreSQL or another database service uses extensions, provider auth functions, generated APIs, row-level security policies, triggers, stored procedures, console-only settings, vector search, spatial search, full-text search, or other provider-specific conveniences that could become domain rules or migration blockers.
52
+ - User identity, provider identities, emails, sessions, memberships, roles, permissions, entitlements, file ownership, public resource URLs, storage object keys, or provider ids are persisted or referenced by other records.
53
+ - Data residency, processing region, storage region, backup region, log region, analytics region, AI processing location, retention policy, or external processing permission is modeled or claimed.
54
+ - Locale, country, billing country, residence or operating country, currency, timezone, date-only values, recurring local-time schedules, money, taxes, exchange rates, historical price snapshots, or global-ready storage models are introduced, changed, reviewed, or reported.
55
+ - AI usage, AI model pricing, model call cost, token accounting, provider-call cost, feature-level cost, plan limits, retry cost, cache-hit savings, or usage-ledger records are introduced, changed, reviewed, or reported.
56
+ - AI budget, AI feature policy, AI policy decision, provider budget reliance, hard-limit enforcement, model fallback, token cap, tool-call cap, agent-step cap, timeout cap, or emergency disable state is introduced, changed, reviewed, or reported.
57
+ - Product usage metering, customer cost estimation, plan margin analysis, free-plan limits, credit pools, tenant quotas, high-cost feature limits, or cost rollup records are introduced, changed, reviewed, or reported.
58
+ - File upload metadata, object-storage keys, signed upload or download flows, local disk storage, database blob storage, asset status, orphan cleanup, or storage/database consistency rules are introduced, changed, reviewed, or reported.
59
+ - Deletion, restore, purge, versioning, payment, point, credit, inventory, entitlement, subscription, coupon, prompt, document, policy, or automation-rule behavior is introduced, changed, reviewed, or reported.
60
+ - Background jobs, outbox dispatch, dead-letter state, retry scheduling, worker locks, external API call tracking, webhook receipt tracking, or request idempotency records are introduced, changed, reviewed, or reported.
61
+ - A list, feed, search, admin table, dashboard, or API response is at risk of hidden N+1 queries, ORM lazy-loading surprises, unbounded relation loading, expensive counts, or screen-shaped persistence.
40
62
  - Code reads from or writes to a database, browser storage, cache, local SQLite file, external database, or generated data store.
41
63
  - A task changes authorization, tenant scoping, pagination, sorting, soft delete, status filters, idempotency, duplicate handling, retry, or concurrency behavior around persisted data.
42
64
  - Documentation, tests, or final reports claim that a database change is safe, fast, indexed, migrated, reversible, idempotent, or verified.
@@ -53,9 +75,37 @@ Use the smallest persistence boundary that proves the risk. Do not introduce rep
53
75
  ## Required Inputs
54
76
 
55
77
  - Database role: source of truth, rebuildable cache, read model, runtime state, analytics store, external provider, or browser storage.
78
+ - Database operating model: SQLite file, managed PostgreSQL, self-managed PostgreSQL, MySQL, local disk, object metadata store, or other store; single server or many app servers; concurrent write pressure; backup and restore target; and whether the deployment can preserve local files safely.
79
+ - Event role: operational event, audit log, behavior analytics event, integration outbox message, reporting aggregate, or replayable domain event.
56
80
  - Data owner and affected tables, collections, stores, indexes, caches, generated files, or read models.
81
+ - Entity identity rules, including stable ids, external provider ids, mutable slugs, titles, locale-specific addresses, redirects, and public API identifiers when content or user-facing resources are involved.
82
+ - Exit and restore rules, including whether exported data preserves relationships, permissions, files, versions, events, audit history, automation rules, provider id mappings, schema metadata, and enough import or restore evidence to reconstruct product state.
83
+ - Identifier ownership rules, including which ids are product-owned, which ids are public, which ids are provider mappings, and whether external auth, payment, CRM, analytics, storage, or CMS ids can change without breaking internal references.
84
+ - Authentication identity rules, including app-owned user id, provider subject records, email-as-attribute behavior, social provider subject preservation, account merge or relink policy, session migration expectations, and whether memberships, roles, permissions, and entitlements live in product-owned tables rather than only provider metadata.
85
+ - Managed database dependency rules, including extension inventory, provider-specific function usage, generated API usage, row-level policy ownership, trigger or stored-procedure ownership, console-only schema or permission settings, and whether a plain or explicitly equivalent database migration rehearsal exists.
86
+ - File and URL ownership rules, including public URL owner, storage provider, bucket, object key, content type, checksum, visibility, file status, variant name, storage region, immutable versioning, private download authorization, and whether raw storage URLs or CDN transform syntax may appear in persisted content.
87
+ - Data location rules, including data classification, home region, storage region, processing region, backup region, log region, analytics region, AI provider region, support-tool access, external transfer notice, deletion path, and whether provider system metadata is outside the selected residency scope.
88
+ - Core-state ownership rules for external services, including which facts must be stored internally even when a provider handles the workflow: entitlement state, plan state, payment event cursor, consent and unsubscribe state, file owner and storage metadata, customer status, search index source document metadata, processed job or webhook state, and administrator audit evidence.
89
+ - External API recovery rules, including which internal intent, attempt, job, webhook receipt, provider reference, dead-letter, manual-review, and reconciliation records are needed before a provider result or dashboard becomes the only evidence.
90
+ - Search, queue, log, metric, and analytics data rules, including search document source, ranking or boost metadata, event names and versions, event retention, job schema versions, idempotency keys, dead-letter retention, and whether SaaS-held data is also exportable or stored internally.
91
+ - Storage split rules for body files, frontmatter, database rows, generated indexes, site-specific overrides, central facts, and external source data when a hybrid model is used.
92
+ - Filter, sort, search, localization, SEO, analytics, revision, source, and cache-invalidation needs that could turn display-only values into persisted typed data.
93
+ - Content graph rules, taxonomy governance, source and collection provenance, verification states, user or anonymous actor state, comment moderation, and aggregate-versus-event ownership when those surfaces exist.
94
+ - Body block vocabulary, block schema versions, content-type-specific fields, filter definition policy, SEO landing policy, cache key normalization, invalidation tags, admin operation logs, and generated-index ownership when those surfaces exist.
95
+ - Lifecycle status vocabulary, delete alternatives, asset original and variant ownership, claim or fact registry shape, source reference shape, jurisdiction, risk tier, effective dates, verification cadence, review owner, usage mapping, bulk update job model, comparison methodology, affiliate link policy, and data-domain owner when those surfaces exist.
57
96
  - Read and write paths, query or ORM behavior, authorization scope, tenant or user scope, and retention expectations.
58
- - Transaction boundary, idempotency, retry, duplicate-delivery, concurrency, migration, rollback, or rebuild expectations.
97
+ - Read and write workload shape, including repeated reads, freshness requirements, same-row write conflicts, write bursts, retry safety, index write cost, and whether a ledger, read model, or projection is needed.
98
+ - Global data shape, including locale, country, billing country, currency, timezone, local date, local time, UTC instant, market-specific price, tax inclusion, rounding, exchange-rate snapshot, and whether account defaults differ from user preferences.
99
+ - Money and value movement rules, including minor-unit integer storage, decimal calculation precision, currency-code ownership, non-two-decimal currency support, tax, discount, refund, exchange-rate, and historical snapshot policy.
100
+ - AI usage and cost ownership, including feature key, account or workspace scope, user request id, provider call id, idempotency key, model, token counts, cache-hit type, pricing snapshot, cost integer unit, retry cost, and per-plan budget limits.
101
+ - AI policy ownership, including budget records, feature policies, preflight policy decisions, hard limit versus provider alert behavior, selected model, fallback model, blocked reason, remaining budget, maximum input and output tokens, maximum tool calls, maximum agent steps, maximum retries, timeout, and emergency disable state.
102
+ - Usage metering ownership, including account, workspace or organization id, user id, feature key, request type, input size, output size, processing time, external API call flag, retry count, failure count, plan at time of use, credit or quota source, rollup period, and whether user actions fan out into multiple cost records.
103
+ - Plan economics ownership, including which records support customer-level variable cost, contribution margin, P50/P90/P99 usage analysis, free-plan loss ceilings, and plan-limit enforcement without relying only on provider dashboards or monthly invoices.
104
+ - Operational query path versus analytics or reporting path, including whether large scans, grouped aggregates, dashboards, experiments, or behavior logs share operational database resources.
105
+ - Cache role and rebuild expectation, including whether the cache can be cleared without losing logical service state.
106
+ - Public API, mobile API, admin API, integration API, search projection, and internal model boundaries when persisted values leave the database layer.
107
+ - Delete lifecycle, versioning rule, ledger rule, read-model shape, expected query count, ORM dependency boundary, transaction boundary, idempotency, retry, duplicate-delivery, concurrency, migration, rollback, or rebuild expectations.
108
+ - Job and integration tables when relevant: queue, job type, deduplication key, payload shape, status values, attempt count, maximum attempts, next run time, lock expiry, last safe error, dead-letter state, outbox publication state, inbox or webhook receipt state, processed webhook identifiers, manual replay state, and external provider call outcome.
59
109
  - Local database, ORM, repository, fixture, migration, cache, and test patterns.
60
110
  - Relevant command-intent contract entries for tests, builds, docs, release checks, and mustflow validation.
61
111
 
@@ -91,18 +141,178 @@ Use the smallest persistence boundary that proves the risk. Do not introduce rep
91
141
  - Rebuildable cache: can be deleted and regenerated from files, provider data, or another source.
92
142
  - Read model: derived for lookup, search, reporting, or dashboard use.
93
143
  - Runtime state: coordinates in-flight work, locks, sessions, jobs, or retries.
144
+ - Provenance store: records sources, collection runs, raw items, field-level evidence, verification, and source changes.
145
+ - User state store: records actor-specific reactions, bookmarks, comments, reports, read state, follows, notifications, and personalization inputs.
94
146
  - Analytics store, external provider, or browser storage: owned outside the core domain boundary.
95
- 2. Identify the data owner and derived surfaces. Name which table, file, provider, event log, configuration, or generated artifact owns each value.
96
- 3. Check schema shape: primary keys, foreign keys, unique constraints, nullable fields, defaults, check constraints, status values, timestamps, soft delete fields, tenant scope, audit fields, and retention rules.
97
- 4. Check query semantics: authorization scope, tenant or user scope, role or visibility filters, deleted or archived rows, draft or unpublished rows, effective dates, null handling, stale-data behavior, and error or absence handling.
98
- 5. Check pagination and ordering. Lists need deterministic ordering; cursor pagination needs a stable tie breaker such as a unique id in addition to a timestamp.
99
- 6. Check transaction boundaries. Keep database writes and external side effects separate by default; use explicit states, an outbox, an action ledger, or reconciliation when both must be coordinated.
100
- 7. Check idempotency, retries, duplicate delivery, and concurrency. Look for webhook duplicates, job retries, import reruns, payment callbacks, optimistic locks, compare-and-swap updates, unique-constraint races, and double state transitions.
101
- 8. Check indexes and workload cost. Match indexes to `WHERE`, `JOIN`, `ORDER BY`, and `GROUP BY` behavior, but account for write cost. Look for N+1 queries, expensive counts, full scans, materialized read-model needs, and search-index boundaries.
102
- 9. Check privacy and retention. Prefer omission or bounded metadata over storing raw payloads. Do not persist secrets, hidden reasoning, full transcripts, unbounded logs, or personal data without a clear product rule and retention path.
103
- 10. Check migration, rollback, and rebuild paths. If a migration claim exists, prove idempotency and recovery with `migration-safety-check` or report the gap. If the store is a cache, name the rebuild source and stale-index detection.
104
- 11. Check tests and fixtures. Reuse or add repository/store tests, migration fixtures, query fixtures, adapter fixtures, permission regressions, idempotency or concurrency regressions, and cache rebuild checks as justified by the risk.
105
- 12. Verify and report. Separate proven behavior from unverified rollback, migration, privacy, performance, live-data, or concurrency risks.
147
+ - Behavior analytics store: records high-volume user actions, impressions, searches, scrolls, clicks, experiments, and attribution data without becoming part of the core write path.
148
+ - Audit log store: records high-impact human or system changes that must be attributable, bounded, and more durable than ordinary behavior analytics.
149
+ 2. Classify the database operating model before treating the store as a neutral implementation detail.
150
+ - SQLite is an embedded relational database that runs in-process without a separate server, not merely a toy store. It can be a strong default for a single durable server, a solo or small operator, mostly-read workloads, modest concurrent writes, simple local development, and product-validation phases where simpler operations reduce failure cost.
151
+ - Prefer PostgreSQL or another server database when the system already needs multiple app servers, high concurrent writes, team or tenant access, payment, credit, point, entitlement, permission, settlement, external operators, read replicas, point-in-time restore, managed backups, database-level collaboration, or stronger locking and operational tooling.
152
+ - Treat deployment as part of the database decision. A local SQLite file is risky when containers, serverless hosts, redeploys, or ephemeral volumes can lose or split the file. A managed PostgreSQL service can be operationally simpler than SQLite when backups, dashboards, access control, and restore tooling are already provided.
153
+ - Check SQLite backup details when SQLite is used. A live database with a write-ahead log needs SQLite's backup mechanism or a storage snapshot that captures the database and log consistently; copying only one visible file can be unsafe.
154
+ - Do not choose PostgreSQL only because the product might grow. Choose it when operating shape, concurrent writes, data responsibility, restore needs, or collaboration pressure already justify the cost.
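
The SQLite backup point in this step can be sketched as follows. This is a minimal illustration, not part of the skill text: it assumes Python's standard `sqlite3` module and a throwaway database, and the file names, the `notes` table, and the `backup_sqlite` helper are hypothetical. It uses `Connection.backup`, which wraps SQLite's online backup API, instead of copying the main database file while a write-ahead log may still hold committed pages.

```python
import os
import sqlite3
import tempfile


def backup_sqlite(source_path: str, backup_path: str) -> None:
    """Copy a live SQLite database through SQLite's online backup API."""
    source = sqlite3.connect(source_path)
    target = sqlite3.connect(backup_path)
    try:
        # The backup reads through SQLite itself, so committed pages that
        # still live in the write-ahead log are included in the copy.
        source.backup(target)
    finally:
        target.close()
        source.close()


# Hypothetical demo database in WAL mode.
workdir = tempfile.mkdtemp()
live_path = os.path.join(workdir, "app.db")
copy_path = os.path.join(workdir, "app-backup.db")

live = sqlite3.connect(live_path)
live.execute("PRAGMA journal_mode=WAL")
live.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
live.execute("INSERT INTO notes (body) VALUES ('hello')")
live.commit()

# The writer connection stays open while the backup runs.
backup_sqlite(live_path, copy_path)

restored = sqlite3.connect(copy_path)
restored_rows = restored.execute("SELECT id, body FROM notes").fetchall()
restored.close()
live.close()
```

A storage snapshot that captures the database and its log together is the other option the step names; the sketch covers only the backup-API path.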
155
+ 3. Identify the data owner and derived surfaces. Name which table, file, provider, event log, configuration, or generated artifact owns each value.
156
+ - Keep app-owned identity separate from provider identity. Product tables should reference the internal user or organization id; external auth, payment, CRM, analytics, storage, and support ids should live in mapping records.
157
+ - Treat email as a mutable contact attribute, not as a permanent user identifier. Preserve provider subject ids when social login or authentication-provider migration might need account relinking.
158
+ - Keep memberships, roles, permissions, plan entitlements, and product access decisions in product-owned records. Provider tokens or metadata may carry hints, but they should not be the only authority.
159
+ - Inventory managed-database dependencies separately from domain meaning. Extensions, provider auth functions, row-level policies, generated APIs, triggers, stored procedures, and console-created settings are acceptable only when their purpose, migration path, and replacement risk are explicit.
160
+ - Keep storage object identity separate from public resource identity. Persist storage provider, bucket, object key, checksum, content type, size, visibility, status, region, and variant metadata without making raw storage URLs or CDN transformation syntax the public contract.
161
+ - Add data-location fields when customer, legal, or AI-processing requirements need proof of where data lives or is processed. Home region, storage region, processing region, retention policy, and external-processing permission belong with the entity or organization that governs the data.
162
+ - Model AI cost control as data, not only code. Budgets, feature policies, policy decisions, usage ledgers, provider-call outcomes, blocked reasons, and fallback decisions need records when AI cost or compliance matters.
163
+ For hybrid content systems, state which values are owned by:
164
+ - The body document, such as Markdown, MDX, CMS rich text, or editor blocks.
165
+ - Strict frontmatter, such as stable id, type, locale, title, slug, summary, status, author, tags, category, related entities, and SEO defaults.
166
+ - Database tables, such as permissions, workflow, site exposure, publication targets, redirects, facts, relationships, analytics, and operational queries.
167
+ - Structured block records, such as headings, images, review boxes, comparison tables, maps, video, FAQ, quotes, call-to-action blocks, ad placeholders, gated sections, and their schema versions when they must be queried or reused.
168
+ - Generated indexes or projections, such as search documents, sitemap records, feeds, API views, static page dependencies, landing pages, ranking snapshots, cache entries, and admin lists.
169
+ - Domain-owned data areas, such as identity, privacy and consent, editorial content, catalog facts, comparison results, community content, analytics events, billing references, messaging logs, and audit records. Keep these boundaries explicit even when they share one physical database.
170
+ - API projection owners, such as public resource responses, admin views, mobile responses, search documents, and integration payloads. Treat these as contracts over source data, not direct exposure of current tables.
171
+ - Exit-owned data areas, such as export manifests, import manifests, relationship maps, permission maps, file inventories, version or event streams, automation rules, external integration mappings, and schema descriptions. Treat these as reconstruction evidence, not as a decorative download.
172
+ - Provider id mappings, such as payment customer ids, authentication subject ids, storage object ids, CRM contact ids, analytics user ids, and CMS entry ids. Internal ids should remain stable when any provider id changes.
173
+ - External-service core facts, such as current entitlement, subscription or plan state, processed payment event id, email consent state, customer lifecycle state, file identity and ownership, search source document metadata, job processing state, and audit evidence. Do not let a provider dashboard be the only place that can explain these facts.
174
+ - Search and queue reconstruction records, such as index document builders, ranking or synonym policy versions, search logs, queue message schema versions, job idempotency keys, retry state, dead-letter state, and manual replay markers.
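
One possible shape for the identity guidance in this step is sketched below, using an in-memory SQLite database. The table names, column names, provider labels, and the `resolve_user_id` helper are assumptions made for illustration. The idea is that product tables reference an app-owned user id, email is a mutable attribute, and provider subjects live in a mapping table.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE users (
        id    TEXT PRIMARY KEY,   -- product-owned id, never a provider id
        email TEXT NOT NULL       -- mutable contact attribute
    );
    CREATE TABLE provider_identities (
        user_id  TEXT NOT NULL REFERENCES users(id),
        provider TEXT NOT NULL,   -- hypothetical provider label
        subject  TEXT NOT NULL,   -- provider subject id, preserved for relinking
        PRIMARY KEY (provider, subject)
    );
    """
)


def resolve_user_id(db: sqlite3.Connection, provider: str, subject: str) -> Optional[str]:
    """Map an external provider subject to the internal user id, if linked."""
    row = db.execute(
        "SELECT user_id FROM provider_identities WHERE provider = ? AND subject = ?",
        (provider, subject),
    ).fetchone()
    return row[0] if row else None


with conn:
    conn.execute("INSERT INTO users (id, email) VALUES (?, ?)", ("u_1", "old@example.com"))
    conn.execute(
        "INSERT INTO provider_identities (user_id, provider, subject) VALUES (?, ?, ?)",
        ("u_1", "auth-a", "sub-123"),
    )

# Email changes and a second provider is linked; the internal id stays stable.
with conn:
    conn.execute("UPDATE users SET email = ? WHERE id = ?", ("new@example.com", "u_1"))
    conn.execute(
        "INSERT INTO provider_identities (user_id, provider, subject) VALUES (?, ?, ?)",
        ("u_1", "auth-b", "xyz-9"),
    )

resolved_a = resolve_user_id(conn, "auth-a", "sub-123")
resolved_b = resolve_user_id(conn, "auth-b", "xyz-9")
resolved_unknown = resolve_user_id(conn, "auth-a", "missing")
```

Because other tables would reference `users.id` only, an authentication-provider migration in this sketch adds or replaces mapping rows without rewriting product records.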
175
+ 4. Check schema shape: primary keys, foreign keys, unique constraints, nullable fields, defaults, check constraints, status values, timestamps, soft delete fields, tenant scope, audit fields, and retention rules.
176
+ - Treat deletion as lifecycle when recovery, audit, search behavior, support handling, or retention matters. Consider `deleted_at`, `deleted_by`, `delete_reason`, `restored_at`, `restored_by`, and `purge_after` instead of a lone boolean or timestamp.
177
+ - Separate business records that should be soft-deleted or archived from personal data that should be anonymized, purged, or retained under a narrower legal rule.
178
+ - Treat mutable high-value records as versioned when reproducibility matters, such as AI prompts, documents, contracts, price policies, experiment configs, comparison data, permission policies, automation rules, and model settings. Prefer a stable parent row with a current-version pointer plus immutable version rows.
179
+ - Use ledgers for money-like or quota-like balances, such as points, credits, inventory reservations, refunds, coupon issuance, entitlement grants, and manual adjustments. Treat cached balances as derived from ledger entries unless the local design proves otherwise.
180
+ - For audit logs, store actor type, actor id when safe, action, target type and id, bounded before and after values, reason, request id, idempotency key, and timestamp in the same local transaction as the audited change when possible. Audit logs should be append-only to normal operators and should redact or omit personal data that is not needed to explain the change.
181
+ - Keep ledgers and audit logs separate. Audit logs explain who changed what and why; ledgers explain how money-like, quota-like, inventory-like, or entitlement-like value moved.
182
+ - Use tenant, workspace, organization, or team scope keys early when a product can become B2B, team-based, workspace-based, or account-scoped. Retrofitting tenant boundaries later is usually a data migration and authorization rewrite, not a small column add.
183
+ - Separate locale, country, billing country, currency, and timezone. Language is a display preference, country is a rule boundary, currency is a money unit, and timezone is a date boundary; do not infer one from another.
184
+ - Store UTC instants for events such as creation, payment, refund, login, and update times. Store date-only values as dates, not midnight timestamps. Store recurring or locally scheduled work as local time plus IANA timezone and recurrence rule, with the next UTC run time as a derived execution aid.
185
+ - Store final charge, refund, ledger, and invoice values as integer minor units plus currency code. Use decimal or numeric precision for intermediate unit prices, rates, tax, and exchange calculations, then persist the rounded final amount with the rounding policy that produced it.
186
+ - Do not assume all currencies have two decimal places. Treat currency exponent, tax inclusion, discount order, refund basis, and negative amount direction as explicit policy.
187
+ - Snapshot historical prices, taxes, discounts, exchange rates, payment fees, and AI model pricing when those values explain past orders, ledgers, or provider costs. Do not recompute old business records from current prices or current exchange rates.
188
+ - Keep account defaults and user preferences separate when a team, workspace, or organization can have billing currency, reporting timezone, country, or locale defaults that differ from an individual user's UI language or timezone.
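
The money rules in this step can be illustrated with a small sketch. It assumes Python's `decimal` module, a tiny hand-written exponent table (`CURRENCY_EXPONENT`), and half-up rounding; all three are illustrative choices, and a real system would own its own currency table and rounding policy. Intermediate math stays in `Decimal`, and only the rounded final amount is converted to integer minor units.

```python
from decimal import Decimal, ROUND_HALF_UP

# Illustrative exponents only; not a complete currency table.
CURRENCY_EXPONENT = {"USD": 2, "JPY": 0, "KWD": 3}


def to_minor_units(amount: Decimal, currency: str) -> int:
    """Round a decimal amount to the currency's minor unit and return an integer."""
    exponent = CURRENCY_EXPONENT[currency]
    quantum = Decimal(10) ** -exponent  # 0.01 for USD, 1 for JPY, 0.001 for KWD
    rounded = amount.quantize(quantum, rounding=ROUND_HALF_UP)
    return int(rounded * 10 ** exponent)


# Intermediate values keep decimal precision; the final charge is rounded once.
unit_price = Decimal("19.99")
quantity = 3
tax_rate = Decimal("0.0825")
subtotal = unit_price * quantity
total = subtotal + subtotal * tax_rate

charge_record = {
    "amount_minor": to_minor_units(total, "USD"),
    "currency": "USD",
    "rounding_policy": "ROUND_HALF_UP",  # persisted with the amount
}

yen_minor = to_minor_units(Decimal("1234.5"), "JPY")
dinar_minor = to_minor_units(Decimal("1.2345"), "KWD")
```

Storing the rounding policy next to the amount is one way to keep old records explainable if the policy later changes.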
189
+ 5. Check query semantics: authorization scope, tenant or user scope, role or visibility filters, deleted or archived rows, draft or unpublished rows, effective dates, null handling, stale-data behavior, and error or absence handling.
190
+ For content and data-product models, check that:
191
+ - Stable ids do not depend on title, slug, URL, locale, provider display name, category, or current screen placement.
192
+ - Slug and redirect history can survive title changes, localization, section moves, and canonical URL changes.
193
+ - Content type, workflow status, category, tag, typed attribute, relation, author, editor, asset, and source data are separate when they have different query, filtering, permission, localization, or retention behavior.
194
+ - Values likely to be filtered, sorted, aggregated, compared, localized, or verified use typed columns or typed attribute records instead of free-form tags or display strings.
195
+ - Localized display names, translated slugs, labels, and formatted money or date strings are not stable identity. Use internal ids for joins and references, and treat localized names or slugs as projection or localization records.
196
+ - User-visible locale-aware sorting may differ from stable database pagination. Use deterministic secondary ordering, such as a stable id, so pages do not duplicate or skip rows when display strings collide or collation behavior changes.
197
+ - Local-day reporting should convert the user's or workspace's timezone-local half-open day range into UTC and query with `>= start` and `< end`; avoid inclusive `23:59:59` style endings.
198
+ - Semantic body units that drive filtering, search, advertisements, analytics, structured data, access control, or reuse are represented as typed blocks or structured fields instead of opaque `body_html`.
199
+ - Content block records have stable ids, order, type, schema version, visibility or access policy, and enough data shape to migrate old blocks safely.
200
+ - Type-specific fields are not hidden in body text when the type depends on them, such as review ratings, comparison criteria, place coordinates, video duration, course lesson state, product prices, FAQ items, or event dates.
201
+ - Post, item, concept, product, place, person, series, comparison, collection, and redirect relationships use typed relationship records when direction, order, confidence, source, creator, or reason matters.
202
+ - Generic "related content" arrays do not hide distinct relations such as sequel, prerequisite, update, replacement, comparison target, same-series item, rebuttal, summary, or same-topic item.
203
+ - Lifecycle states such as draft, scheduled, published, unlisted, private, archived, deprecated, redirected, gone, and soft-deleted are distinct when they differ in search indexing, access, redirect behavior, retention, or recovery.
204
+ - Tags are governed records with stable ids, slugs, aliases, status, parent links, and merge history when they affect search, filters, topic hubs, recommendations, analytics, or SEO.
205
+ - Categories, navigation, URL hierarchy, and topic taxonomy are not collapsed into one mutable field when a section move should not rewrite content identity.
206
+ - Filter attributes such as region, price range, difficulty, audience, platform, availability, reading time, evergreen status, and download/video availability are typed data, not tags.
207
+ - Translations model "one entity with multiple representations" instead of adding a new schema column per language.
208
+ - SEO metadata, canonical URL, robots behavior, sitemap inclusion, and structured-data type are not buried inside body text.
209
+ - SEO metadata supports automatic defaults and manual overrides without making generated values indistinguishable from editor-provided values.
210
+ - Media assets have enough metadata for accessibility, responsive display, reuse, rights, and ownership instead of only storing an image URL.
211
+ - Media assets keep immutable originals separate from rebuildable variants, and asset storage keys do not depend on mutable post slugs, titles, categories, or locale paths.
212
+ - Site-specific title, slug, SEO text, call-to-action, access level, publish status, and display overrides are separate from canonical content when the same content can appear in multiple sites.
213
+ - Content reuse is represented by references to central content, blocks, entities, or facts rather than copied articles that drift independently.
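
The local-day reporting check in this step can be sketched as below. The helper name `local_day_utc_range` and the sample events are hypothetical, and the demo uses a fixed `+09:00` offset so it does not depend on installed timezone data. The function turns one local calendar day into a half-open UTC range suitable for a `>= start` and `< end` query.

```python
from datetime import date, datetime, time, timedelta, timezone, tzinfo
from typing import Tuple


def local_day_utc_range(day: date, tz: tzinfo) -> Tuple[datetime, datetime]:
    """Return [start, end) in UTC for one calendar day in the given timezone."""
    start_local = datetime.combine(day, time.min, tzinfo=tz)
    end_local = datetime.combine(day + timedelta(days=1), time.min, tzinfo=tz)
    return start_local.astimezone(timezone.utc), end_local.astimezone(timezone.utc)


# Fixed offset used only to keep the demo self-contained.
workspace_tz = timezone(timedelta(hours=9))
start_utc, end_utc = local_day_utc_range(date(2024, 3, 10), workspace_tz)

# Hypothetical event instants stored as UTC.
events = [
    datetime(2024, 3, 9, 14, 59, 59, tzinfo=timezone.utc),           # previous local day
    datetime(2024, 3, 9, 15, 0, 0, tzinfo=timezone.utc),             # first instant of the day
    datetime(2024, 3, 10, 14, 59, 59, 500000, tzinfo=timezone.utc),  # last half second of the day
    datetime(2024, 3, 10, 15, 0, 0, tzinfo=timezone.utc),            # next local day
]
in_day = [e for e in events if start_utc <= e < end_utc]
```

The third event falls after `23:59:59` local time, so an inclusive ending at that second would miss it while the half-open range keeps it. For an IANA zone, passing something like `zoneinfo.ZoneInfo("Asia/Seoul")` as `tz` should work the same way, assuming local midnight exists on that date in that zone.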
214
+ 6. Check pagination and ordering. Lists need deterministic ordering; cursor pagination needs a stable tie breaker such as a unique id in addition to a timestamp.
215
+ - For list and feed APIs, define the intended read model and expected query shape before returning full ORM entities. A list response should usually fetch only fields, aggregates, and viewer-specific flags required by that response.
216
+ - Look for N+1 risks from lazy-loaded relations, per-row author lookups, counts, viewer-specific reaction checks, tags, attachments, permissions, or nested data. Prefer explicit joins for small required relations, batch queries for one-to-many data, aggregate or denormalized counters for counts, and query services or read models for complex feeds.
217
+ - Do not expose ORM relation loading style as the public contract of a service. Complex admin, search, reporting, and feed queries may need a dedicated query service, projection, materialized view, or raw SQL with explicit authorization and pagination.
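
The cursor-pagination rule in this step can be sketched with an in-memory SQLite table. The `posts` table, its columns, and the `fetch_page` helper are assumptions for illustration. The cursor carries both the timestamp and the unique id, so rows that share a timestamp are neither repeated nor skipped between pages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL, title TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO posts (id, created_at, title) VALUES (?, ?, ?)",
    [
        (1, "2024-01-01T00:00:00Z", "a"),
        (2, "2024-01-01T00:00:00Z", "b"),  # same timestamp as ids 1 and 3
        (3, "2024-01-01T00:00:00Z", "c"),
        (4, "2024-01-02T00:00:00Z", "d"),
        (5, "2024-01-02T00:00:00Z", "e"),
    ],
)
conn.commit()


def fetch_page(db, limit, cursor=None):
    """Return (rows, next_cursor) ordered by (created_at, id)."""
    if cursor is None:
        rows = db.execute(
            "SELECT id, created_at, title FROM posts ORDER BY created_at, id LIMIT ?",
            (limit,),
        ).fetchall()
    else:
        created_at, last_id = cursor
        # The id comparison breaks ties among rows with an equal timestamp.
        rows = db.execute(
            "SELECT id, created_at, title FROM posts "
            "WHERE created_at > ? OR (created_at = ? AND id > ?) "
            "ORDER BY created_at, id LIMIT ?",
            (created_at, created_at, last_id, limit),
        ).fetchall()
    next_cursor = (rows[-1][1], rows[-1][0]) if len(rows) == limit else None
    return rows, next_cursor


seen_ids = []
cursor = None
while True:
    page, cursor = fetch_page(conn, 2, cursor)
    seen_ids.extend(row[0] for row in page)
    if cursor is None:
        break
```

A cursor built from `created_at` alone would, with a page size of 2, either repeat id 2 or drop id 3 depending on the comparison operator; the tie breaker avoids both.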
218
+ 7. Check revision, fact, and source semantics when data changes independently from prose.
219
+ - Distinguish record creation time, publish time, meaningful content update time, fact verification time, observation time, source retrieval time, effective date, archive time, deletion time, and anonymization time.
220
+ - Use content revisions for body or editorial changes, and fact/version records for prices, legal requirements, model specs, ratings, availability, release dates, compliance status, or other changing facts.
221
+ - Store source records with origin, retrieval time, and hash or version metadata when trust, audit, correction, or refresh behavior matters.
222
+ - Do not claim a page was substantively updated when only formatting, copyright year, or copy polish changed.
223
+ - Distinguish current facts from historical facts so reviews, comparisons, and archive content do not silently rewrite past context.
224
+ - Prefer entity-id references for products, companies, plans, features, people, services, prices, ratings, release dates, support status, and availability; names and display text can change.
225
+ - Use a central claim, fact, or policy registry for legal, privacy, finance, health, price, eligibility, risk-disclosure, compliance, product-spec, and recommendation claims that may need impact analysis.
226
+ - Claim or fact records should carry source references, jurisdiction or market, risk tier, effective dates, verification date, review owner, status, and usage mappings such as post-claim or block-claim records.
227
+ - Bulk policy, price, legal, or fact updates should be modeled as jobs with affected records, proposed changes, reviewer, result, rollback or recovery notes, and cache or index invalidation targets.
228
+ - Comparison and ranking data should preserve methodology id, methodology version, criteria, weights, excluded factors, evidence references, score, rank, calculation time, affiliate policy, and reviewer rather than storing only prose order.
229
+ - Affiliate links should store destination, relationship type, campaign or provider metadata, active status, and outbound-link policy such as sponsored or user-generated link treatment.
230
+ 8. Check source collection and verification semantics when data comes from external pages, feeds, APIs, users, crawlers, or imports.
231
+ - Separate raw collected records from canonical records shown to users. A canonical item may have many source records, and a source record should not automatically become trusted display data.
232
+ - Model source, collection run, raw collected item, canonical item, verification record, and change history separately when source trust, deduplication, conflict handling, or refresh behavior matters.
233
+ - Distinguish source published time, source modified time, collection time, parse time, verification time, stale time, source removal time, and rejection time.
234
+ - Use a verification state flow when needed, such as collected, parsed, auto-verified, human-verified, stale, disputed, source-changed, source-removed, and rejected.
235
+ - Store field-level provenance for high-risk values such as price, address, availability, ratings, legal requirements, official status, feature support, and dates when different fields can come from different sources.
236
+ - Define official versus unofficial source priority, conflict resolution, source takedown behavior, and whether raw snapshots are internal evidence or public content.
237
+ 9. Check user activity and community state separately from canonical content.
238
+ - Store likes, bookmarks, comments, reports, reads, follows, notifications, saved searches, hidden items, and direct-visit state in separate records from content rows.
239
+ - Treat aggregate counts such as likes, comments, views, saves, and recent activity as caches or read models unless the local architecture intentionally makes them the source of truth.
240
+ - Use uniqueness rules for actor-content reactions, cancellation or undo semantics, and deterministic rebuild paths for aggregates.
241
+ - Allow the actor model to represent a signed-in user, anonymous session, admin, system process, crawler, or importer when those actions need attribution or later merging.
242
+ - Treat comments as user-generated content with status, parent relationship, moderation reason, report count, edit/delete timestamps, and parent-deletion behavior instead of as a simple reaction counter.
243
+ 10. Check API and projection boundaries. Public, admin, search, analytics, feed, sitemap, and frontend views should be projections over source data, not copies of the current page layout.
244
+ - Public APIs need stable resource identifiers, schema versioning, pagination, allowed filters and sorts, visibility rules, deleted/private/redirected states, and public/admin field separation.
245
+ - Do not return raw database rows, ORM entities, table column names, internal booleans, or mutable implementation ids as the external contract when a response mapper can expose product concepts instead.
246
+ - Do not shape persisted records or API responses around the current frontend component tree. A screen-specific endpoint can exist, but it should still return domain resources, stable ids, machine-readable statuses, safe labels, pagination, and explicit errors rather than card titles, button text, modal flags, internal storage keys, or display-only color decisions.
247
+ - Keep database refactors and API compatibility separate where possible. Table splits, joins, column renames, or subscription-model changes should be absorbed by projections before they force client changes.
248
+ - Use public identifiers or stable resource ids when exposing user-facing resources; avoid making predictable internal numeric ids the only external handle unless the product deliberately accepts that disclosure.
249
+ - Analytics events should reference stable entity ids and typed dimensions instead of mutable URLs or display text when later attribution, filtering, or aggregation matters.
250
+ - Analytics and experimentation events should avoid direct personal data such as email, names, phone numbers, or payment identifiers; reference anonymous ids or internal user ids and keep re-identification inside the identity boundary.
251
+ - Analytics event records should carry event name, event schema version, occurrence time, source, actor or anonymous id, object type and id, and typed properties. Do not let renamed JSON keys silently mix incompatible event eras.
252
+ - Core events that support billing, entitlement, permission, file lifecycle, search, queue recovery, security, and customer support should not live only in an analytics SaaS. Keep an internal event, audit, billing, job, or product-event record when the event is needed to reconstruct what happened.
253
+ - Email-platform tags, analytics cohorts, search ranking settings, no-code views, and provider dashboard fields should usually be derived from internal state. If any of them becomes source of truth, document the recovery and export path explicitly.
254
+ - Account deletion, anonymization, retention, and export behavior should be defined per data owner, because identity, consent, billing, community content, analytics, messaging, and audit records rarely share one deletion rule.
255
+ - Separate behavior analytics logs from audit logs. Behavior analytics can often tolerate delay or bounded loss; audit logs for administrator, payment, permission, publication, or data-access changes usually require stronger durability, attribution, and retention.
256
+ - Keep behavior logs off the synchronous core write path when losing a click, view, search, or impression event should not fail signup, payment, publish, save, or permission changes.
257
+ 11. Check filter, URL, landing-page, and crawl policy data.
258
+ - Filter definitions name allowed keys, allowed values or value ranges, default behavior, canonical order, multi-value ordering, invalid-value behavior, and whether values belong in the URL.
259
+ - Shareable filter-state URLs, curated SEO landing pages, and temporary UI state are different records or policies when they differ in canonical URL, sitemap, indexing, analytics, or cache behavior.
260
+ - Canonical filter state removes defaults, normalizes case and numeric ranges, sorts multi-values, rejects or drops unknown keys intentionally, and produces one stable URL and one stable analytics/cache shape for the same meaning.
261
+ - Curated landing pages own their path, filter preset, search title, description, canonical target, indexability, and sitemap inclusion instead of being every possible filter combination.
262
+ 12. Check admin-controlled operations and audit data.
263
+ - Admin changes to content status, slug, redirect, canonical URL, robots policy, SEO metadata, filter definitions, advertisement slots, cache purge, search reindexing, ranking refresh, and role assignments need authorization, reason, before/after evidence, and rollback or recovery expectations.
264
+ - Audit snapshots are bounded and redacted where needed; they should explain what changed without turning logs into a raw content or personal-data archive.
265
+ 13. Check rendering, cache, and dependency invalidation data. If changed data should refresh static pages, API caches, search indexes, feeds, sitemaps, ranking snapshots, or comparison pages, model the dependency or report the missing invalidation path.
266
+ - Cache keys for filter, search, listing, and comparison APIs are derived from normalized state and include a version when logic or response shape can change.
267
+ - Cache entries, cache tags, purge rules, page dependencies, search index jobs, sitemap rebuild jobs, and ranking snapshots are derived surfaces unless explicitly designated as source of truth.
268
+ - Admin and personalized responses are not stored in shared caches; private or no-store behavior belongs with the response or route policy, not only with caller convention.
269
+ - Use the cache flush question as a boundary test: if clearing Redis or another cache only makes the service slower, it is a cache; if it loses sessions, queues, rate-limit state, permissions, payment state, or user-visible state, document it as runtime storage and design durability accordingly.
270
+ - Do not use cache as the sole authority for permissions, ownership, subscription, payment, entitlement, inventory, or destructive-action decisions unless the cache is intentionally operated as the authoritative store with backup and recovery guarantees.
271
+ 14. Check file-based and uploaded-asset limits when files own content data or user-uploaded bytes. A file-only model needs strict metadata validation, stable ids, deterministic derived indexes, and an explicit escape path once admin filtering, multi-author workflow, multi-site reuse, fact updates, source verification, user state, structured blocks, or cross-content queries outgrow scripts.
272
+ - User-uploaded originals should usually live in object storage when uploads are a product feature. Store metadata, ownership, storage key, size, content type, checksum, visibility, status, dimensions, and timestamps in the database; do not store large original bytes in ordinary relational rows unless the product has a deliberate blob-storage reason.
273
+ - Model upload and processing states such as pending, uploaded, processing, ready, failed, and deleted when the database and storage cannot commit atomically.
274
+ - Define cleanup for stale pending uploads, storage objects without database records, database records without storage objects, failed conversions, and deleted assets waiting for physical removal.
275
+ 15. Check transaction boundaries. Keep database writes and external side effects separate by default; use explicit states, an outbox, an action ledger, or reconciliation when both must be coordinated.
276
+ - Classify which writes must succeed or fail together. Payment, point, credit, inventory, entitlement, subscription, coupon, refund, and permission changes usually need a transaction for local state and a separate outbox or action ledger for external work.
277
+ - Do not call payment providers, email services, notification services, AI providers, webhooks, or other slow external systems while holding a database transaction open. Persist local state and outbox records first, then let follow-up work run after commit.
278
+ 16. Check durable job, outbox, and provider-call state when HTTP requests hand work to workers.
279
+ - Store the existence of accepted work before depending on a queue publish. Durable job or outbox rows should be recoverable by a dispatcher if the process dies after commit.
280
+ - Job records should carry queue, type, safe payload reference, status, attempts, maximum attempts, run-at time, lock expiry, last safe error, and deduplication key when repeating the work could duplicate effects.
281
+ - Outbox records should identify the aggregate, event type, safe payload, idempotency key when relevant, creation time, and publication state. Outbox publication is derived delivery state, not the domain event source of truth unless the local architecture says so.
282
+ - External provider call records should distinguish `pending`, `succeeded`, `failed`, and `unknown`. Unknown means the system must reconcile with the provider before retrying because the provider may have completed the effect.
283
+ - Processed webhook records should store provider, event id, event type, object id when safe, and receipt time. If the provider can send equivalent events with different event ids, also define a normalized deduplication key.
284
+ - Payment data should separate internal order or entitlement state from provider attempts, provider object ids, webhook receipts, state history, and reconciliation status. Provider payment ids are mappings, not the product's order identity.
285
+ - Email data should separate the product event that requested delivery from provider message ids, template keys, recipient snapshots, retry attempts, bounce events, and manual resend state.
286
+ - AI job data should separate the user request from provider calls, prompt version, input hash, selected model, result state, error state, cost estimate, actual usage, and retry grouping.
287
+ - Search and map data should remain reconstructable from product-owned source records. Search indexes, search jobs, external place ids, geocode payloads, ranking snapshots, and click logs are derived or mapping data unless explicitly made a source of truth.
288
+ - Dead-letter or manual-review records should preserve enough safe metadata to diagnose exhausted retries without storing secrets, raw payment data, full prompts, or unnecessary personal data.
289
+ - AI usage records should be written through one application-owned AI call boundary, not scattered provider SDK calls. At minimum, track account or workspace, user when safe, request id, idempotency key, feature key, provider, model, input and output usage, cached input usage when available, cache-hit type, status, latency, pricing snapshot, and integer cost unit.
290
+ - Distinguish one user request from one or more provider calls. Retries, fallbacks, tool calls, embeddings, reranking, image or audio processing, evaluations, vector storage, and logging can all change cost without changing the user's perceived action count.
291
+ - Store pricing snapshots or pricing version references for AI cost calculations. Current model prices should not rewrite the meaning of historical usage.
292
+ - Store cache key hashes or safe cache identifiers for AI and embedding caches. Do not persist raw prompts, confidential documents, personal data, or provider request bodies merely to explain cache behavior.
293
+ - General usage records should support product economics as well as enforcement. Store tenant scope, user scope when safe, feature key, request type, input and output size, duration, external provider involvement, retry and failure counts, plan snapshot, and quota or credit source when those fields explain cost or limit decisions.
294
+ - Separate raw usage events from rollups. Raw or bounded events explain disputes and debugging; daily or monthly rollups support plan limits, contribution margin, and P50/P90/P99 heavy-user analysis.
295
+ - Do not make provider dashboards, monthly invoices, analytics SaaS reports, or log-search queries the only source for customer-level usage, high-cost feature use, or plan-limit enforcement.
296
+ 17. Check idempotency, retries, duplicate delivery, and concurrency. Look for webhook duplicates, double-clicks, job retries, import reruns, payment callbacks, upload confirmations, optimistic locks, compare-and-swap updates, unique-constraint races, and double state transitions.
297
+ - Use idempotency keys, provider event ids, unique constraints, optimistic locking, row locks, version numbers, processed-event records, or ledger source identifiers when repeating a request could create duplicate money, credit, entitlement, coupon, email, file, or state-change effects.
298
+ - Store a request hash with request idempotency keys. Same scope, operation, and key with the same hash can return the prior result; the same key with a different hash should conflict.
299
+ - Use conditional state updates or insert-on-conflict records for repeated effects. Avoid direct arithmetic updates for value movement unless a ledger or unique source record prevents duplicate application.
300
+ - Define the correct outcome for simultaneous updates by two users, two admins, two webhooks, or a slow worker result arriving after a newer result. Return conflict, retry, merge, ignore stale result, or apply a state-machine event intentionally.
301
+ 18. Check indexes and workload cost. Match indexes to `WHERE`, `JOIN`, `ORDER BY`, and `GROUP BY` behavior, but account for write cost. Look for N+1 queries, expensive counts, full scans, materialized read-model needs, graph traversal needs, aggregate rebuilds, ranking recomputation, and search-index boundaries.
302
+ - Set an expected query-count budget for high-traffic list APIs when possible. Query count should not grow linearly with rows because an ORM relation is accessed inside a loop.
303
+ - Classify large scans, grouped aggregates, long time windows, experiment reports, search-term rankings, and repeated admin dashboards as analytics or reporting work unless they are required for one current user or resource.
304
+ - If an operational database must temporarily support reporting queries, bound them by time, index, row count, connection pool, read replica, or precomputed aggregate and report the escape path.
305
+ - For read-heavy paths, prefer query-pattern clarity, indexes, and precomputed projections before adding caches. A cache without a clear invalidation owner becomes data debt.
306
+ - For write-heavy paths, account for index maintenance, audit writes, lock contention, same-row counters, balance or inventory conflicts, and retry safety before claiming the database is sufficient.
307
+ 19. Check privacy and retention. Prefer omission or bounded metadata over storing raw payloads. Do not persist secrets, hidden reasoning, full transcripts, unbounded logs, unnecessary raw source copies, unnecessary audit snapshots, or personal data without a clear product rule and retention path. Distinguish soft delete, hard delete, anonymization, archive, legal retention, and user-visible removal.
308
+ - Do not let user profile rows become the dumping ground for consent, billing, analytics, editorial authorship, messaging, or audit state when those records have different access, retention, deletion, or legal requirements.
309
+ - Treat content versions, claim snapshots, methodology snapshots, and audit logs as bounded evidence records; they should support rollback and review without becoming unbounded archives of personal data or raw provider payloads.
310
+ 20. Check migration, rollback, and rebuild paths. If a migration claim exists, prove idempotency and recovery with `migration-safety-check` or report the gap. If the store is a cache, name the rebuild source and stale-index detection.
311
+ 21. Check backup and restore assumptions when data durability is claimed. Name the restore surface, including database, uploaded files, environment and secret configuration, migration history, external service settings, queue or job state, and any cache used as storage. Do not treat "backup exists" as evidence until restore has been tested or the gap is reported.
312
+ - Check export and import assumptions separately from backup. A useful export should preserve product meaning, including relationships, permissions, files, states, versions, events, audits, automations, provider mappings, and schema notes. A backup proves recovery inside the same system; an export proves the product can leave or be reconstructed elsewhere.
313
+ - Check whether open-source, self-hosted, or replacement deployments can read the exported shape. If the hosted product relies on cloud-only permission, audit, SSO, backup, webhook, bulk-processing, or admin features, report that the data model may not have a real exit path.
314
+ 22. Check tests and fixtures. Reuse or add repository/store tests, migration fixtures, query fixtures, adapter fixtures, permission regressions, idempotency or concurrency regressions, job-state fixtures, outbox fixtures, webhook-deduplication fixtures, provider-unknown-outcome fixtures, event-schema fixtures, API-projection fixtures, restore drills, and cache rebuild checks as justified by the risk.
315
+ 23. Verify and report. Separate proven behavior from unverified rollback, migration, restore, privacy, performance, live-data, or concurrency risks.
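The stable tie breaker named in step 6 can be sketched as keyset pagination over `(created_at, id)`. This is a minimal illustration, not the package's implementation: the `posts` table, its columns, and the helper names `make_demo_db`, `fetch_page`, and `collect_ids` are all hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3


def make_demo_db():
    """Build a hypothetical posts table where three rows share one timestamp."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts ("
        "id INTEGER PRIMARY KEY, created_at INTEGER NOT NULL, title TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO posts (id, created_at, title) VALUES (?, ?, ?)",
        [(1, 100, "a"), (2, 200, "b"), (3, 200, "c"), (4, 200, "d"), (5, 300, "e")],
    )
    return conn


def fetch_page(conn, limit, cursor=None):
    """Return (rows, next_cursor), ordered by created_at DESC then id DESC.

    The cursor is the (created_at, id) pair of the last row already returned.
    The unique id breaks ties between rows that share a timestamp.
    """
    if cursor is None:
        rows = conn.execute(
            "SELECT id, created_at, title FROM posts "
            "ORDER BY created_at DESC, id DESC LIMIT ?",
            (limit,),
        ).fetchall()
    else:
        created_at, last_id = cursor
        rows = conn.execute(
            "SELECT id, created_at, title FROM posts "
            "WHERE created_at < ? OR (created_at = ? AND id < ?) "
            "ORDER BY created_at DESC, id DESC LIMIT ?",
            (created_at, created_at, last_id, limit),
        ).fetchall()
    # A full page may have a successor; a short page is the last one.
    next_cursor = (rows[-1][1], rows[-1][0]) if len(rows) == limit else None
    return rows, next_cursor


def collect_ids(conn, limit):
    """Walk every page and return the ids in the order they were served."""
    ids, cursor = [], None
    while True:
        rows, cursor = fetch_page(conn, limit, cursor)
        ids.extend(row[0] for row in rows)
        if cursor is None:
            return ids


print(collect_ids(make_demo_db(), 2))
```

With a timestamp-only cursor, the rows that share `created_at = 200` could be skipped or repeated at a page boundary; adding the unique id makes the ordering total, so each row is served exactly once.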
106
316
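The request-hash rule from step 17 (same key and same hash returns the prior result, same key with a different hash conflicts) might look like the following sketch. The table `idempotency_keys`, the column `idem_key`, and the names `run_once`, `request_hash`, and `IdempotencyConflict` are assumptions made for illustration. The handler is assumed to perform only local work, since external calls belong outside the transaction.

```python
import hashlib
import json
import sqlite3


class IdempotencyConflict(Exception):
    """Raised when a key is reused with a different request body."""


def make_store():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE idempotency_keys ("
        "scope TEXT NOT NULL, operation TEXT NOT NULL, idem_key TEXT NOT NULL, "
        "request_hash TEXT NOT NULL, result TEXT NOT NULL, "
        "PRIMARY KEY (scope, operation, idem_key))"
    )
    return conn


def request_hash(payload):
    """Hash a canonical JSON encoding so key order does not change the hash."""
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def run_once(conn, scope, operation, key, payload, handler):
    """Apply handler at most once per (scope, operation, key)."""
    digest = request_hash(payload)
    with conn:  # commits on success, rolls back on exception
        row = conn.execute(
            "SELECT request_hash, result FROM idempotency_keys "
            "WHERE scope = ? AND operation = ? AND idem_key = ?",
            (scope, operation, key),
        ).fetchone()
        if row is not None:
            if row[0] != digest:
                raise IdempotencyConflict(key)
            return json.loads(row[1])  # replay the stored result
        result = handler(payload)  # local-only work in this sketch
        # The primary key also rejects a concurrent duplicate insert.
        conn.execute(
            "INSERT INTO idempotency_keys VALUES (?, ?, ?, ?, ?)",
            (scope, operation, key, digest, json.dumps(result)),
        )
        return result


calls = []


def credit(payload):
    calls.append(payload)
    return {"credited": payload["amount"]}


conn = make_store()
first = run_once(conn, "acct-1", "credit", "k-1", {"amount": 500}, credit)
again = run_once(conn, "acct-1", "credit", "k-1", {"amount": 500}, credit)
print(first, again, len(calls))
```

The second call replays the stored result without invoking the handler again. A production store would also need an expiry policy for old keys, which this sketch omits.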
 
107
317
  <!-- mustflow-section: postconditions -->
108
318
  ## Postconditions
@@ -110,7 +320,27 @@ Use the smallest persistence boundary that proves the risk. Do not introduce rep
110
320
  - The database role and source of truth are explicit.
111
321
  - Database rows, ORM models, generated caches, and read models do not leak into domain truth unless the local architecture intentionally owns that boundary.
112
322
  - Queries preserve authorization, tenant or user scope, deterministic ordering, expected absence behavior, and retention rules.
323
+ - Content and resource models separate stable identity from mutable titles, slugs, URLs, translations, display fields, revisions, facts, sources, projections, and analytics dimensions when those concerns exist.
324
+ - Content graph, taxonomy, source provenance, verification, user-state, and aggregate records are separated when their ownership, lifecycle, permissions, query load, privacy, or trust behavior differs.
325
+ - Hybrid file/database systems have clear ownership: documents for authoring, typed stores for facts and operations, generated projections for delivery, and migration paths for limits.
326
+ - Filtering, sorting, searching, localization, SEO, API, cache-invalidation, and rendering needs are represented by typed data or explicitly deferred before display-only shortcuts become migration debt.
327
+ - Structured body blocks, filter definitions, curated landing pages, admin audit logs, cache entries, search index jobs, and ranking snapshots are treated as typed records or derived surfaces with explicit owners.
328
+ - Lifecycle states, asset originals and variants, central claims or facts, comparison methodologies, affiliate links, data-domain owners, deletion rules, and bulk update jobs are explicit when those concerns exist.
329
+ - SQLite, PostgreSQL, managed database, local file, object storage, and other persistence choices are tied to operating shape, concurrent-write pressure, restore expectations, and data responsibility rather than vague size assumptions.
330
+ - Locale, country, billing country, currency, timezone, local date, UTC instant, local recurring schedule, and formatted display values are separated when global-ready storage matters.
331
+ - Money, exchange rates, taxes, discounts, refunds, AI model prices, and other cost values have currency or unit ownership, precision policy, rounding policy, and historical snapshots when old records must stay explainable.
113
332
  - Transaction, external side effect, idempotency, duplicate, retry, and concurrency decisions are intentional and reported.
333
+ - Job, outbox, processed-webhook, external-provider-call, dead-letter, and reconciliation records are explicit when asynchronous or external work needs durable recovery.
334
+ - Payment attempts, email deliveries, AI jobs, search indexing, map or location provider references, and manual recovery records preserve internal ids, provider mappings, idempotency, status, retry, and reconciliation evidence when those providers can fail or be replaced.
335
+ - AI usage, provider-call, retry, cache-hit, feature-level cost, pricing snapshot, and budget-limit records are explicit when AI calls can affect user limits, plan economics, or operational cost.
336
+ - Usage metering, quota, credit, rollup, and plan-snapshot records are explicit when customer-level cost, free-plan loss, contribution margin, heavy-user behavior, or limit enforcement affects product viability.
337
+ - Delete lifecycle, versioned update behavior, ledger ownership, read/write model split, ORM boundary, N+1 risk, transaction scope, external side-effect handoff, idempotency, duplicate, retry, and concurrency decisions are intentional and reported.
338
+ - Behavior analytics, audit logs, operational data, reporting aggregates, cache-backed state, API projections, and public identifiers have explicit owners when those surfaces exist.
339
+ - Product-owned identifiers, provider id mappings, semantic export or import records, relationship maps, permission maps, file inventories, automation rules, and event or audit histories have explicit owners when exit or restore is part of the data responsibility.
340
+ - App-owned user ids, provider identity mappings, email-as-attribute behavior, membership and permission records, managed-database feature inventory, public URL records, storage object metadata, data-location fields, and AI budget or feature-policy records are explicit when those concerns exist.
341
+ - Core customer, entitlement, consent, file, content, search, queue, billing, and audit facts are not owned only by external SaaS dashboards or proprietary tool formats unless that lock-in is explicitly accepted and reported.
342
+ - Operational databases are not silently made responsible for high-volume future analytics or reporting scans without a bounded escape path.
343
+ - Caches are classified as disposable derived data or intentional runtime storage with matching durability expectations.
114
344
  - Index, query-cost, migration, rollback, rebuild, privacy, and verification claims are tied to evidence or marked as unverified.
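The postcondition on cost values with historical snapshots could be illustrated as below, under stated assumptions: costs are stored as integer micro-units, rounding is half-up per component, and each usage record keeps the pricing version it was computed with. `PricingSnapshot`, `UsageRecord`, `cost_micros`, and the rates shown are hypothetical and do not reflect any real provider's prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PricingSnapshot:
    """Prices in integer micro-units per 1,000 tokens (hypothetical unit)."""
    version: str
    input_micros_per_1k: int
    output_micros_per_1k: int


@dataclass(frozen=True)
class UsageRecord:
    """Stores the computed cost and the pricing version that produced it."""
    request_id: str
    input_tokens: int
    output_tokens: int
    pricing_version: str
    cost_micros: int


def cost_micros(snapshot, input_tokens, output_tokens):
    """Integer cost with half-up rounding applied to each component."""
    def part(tokens, rate_per_1k):
        return (tokens * rate_per_1k + 500) // 1000
    return (part(input_tokens, snapshot.input_micros_per_1k)
            + part(output_tokens, snapshot.output_micros_per_1k))


def record_usage(request_id, snapshot, input_tokens, output_tokens):
    return UsageRecord(
        request_id=request_id,
        input_tokens=input_tokens,
        output_tokens=output_tokens,
        pricing_version=snapshot.version,
        cost_micros=cost_micros(snapshot, input_tokens, output_tokens),
    )


v1 = PricingSnapshot("2024-01", input_micros_per_1k=3000, output_micros_per_1k=15000)
v2 = PricingSnapshot("2024-06", input_micros_per_1k=2000, output_micros_per_1k=10000)

old = record_usage("req-1", v1, input_tokens=1200, output_tokens=350)
new = record_usage("req-2", v2, input_tokens=1200, output_tokens=350)
print(old.pricing_version, old.cost_micros)
print(new.pricing_version, new.cost_micros)
```

Because the record stores both the cost and the pricing version, a later price change produces new records at the new rate while the older record stays explainable.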
115
345
 
116
346
  <!-- mustflow-section: verification -->
@@ -146,9 +376,12 @@ Prefer the narrowest configured test, build, docs, release, or mustflow intent t
146
376
  - Database role and owner
147
377
  - Affected read and write paths
148
378
  - Schema, constraint, and query semantics reviewed
379
+ - Identity, slug, lifecycle, asset, body block, taxonomy, relationship, attribute, filter URL, landing-page, translation, locale, country, currency, timezone, local-date, money, price snapshot, revision, claim, fact, source, collection, verification, comparison methodology, affiliate link, data-ownership, behavior analytics, audit log, API projection, public identifier, backup or restore, bulk update, admin audit, user-state, aggregate, cache-key, projection, and cache-invalidation checks where relevant
380
+ - Export, import, product-owned id, provider-id mapping, relationship, permission, file, automation, event-history, and reconstruction checks where relevant
149
381
  - Authorization, tenant scope, retention, and privacy checks
150
- - Transaction, idempotency, retry, and concurrency decisions
151
- - Index, pagination, and performance notes
382
+ - Delete lifecycle, versioning, money, usage metering, quota or credit, AI usage ledger, job and outbox state, provider-call reconciliation, transaction, idempotency, retry, and concurrency decisions
383
+ - App-owned identity, provider mapping, managed-database dependency, public URL, storage metadata, data residency, and AI budget or policy-decision checks where relevant
384
+ - Read/write model, ORM boundary, N+1 risk, index, pagination, and performance notes
152
385
  - Migration, rollback, dry-run, rebuild, or compatibility status
153
386
  - Tests, fixtures, or verification command intents run
154
387
  - Skipped checks and reasons