mustflow (npm) 2.103.3 → 2.103.10

Files changed (38)

package/dist/cli/commands/run.js +11 -0
package/dist/cli/i18n/en.js +2 -0
package/dist/cli/i18n/es.js +2 -0
package/dist/cli/i18n/fr.js +2 -0
package/dist/cli/i18n/hi.js +2 -0
package/dist/cli/i18n/ko.js +2 -0
package/dist/cli/i18n/zh.js +2 -0
package/dist/cli/lib/external-skill-import.js +78 -14
package/dist/cli/lib/local-index/sql.js +9 -1
package/dist/cli/lib/run-plan.js +37 -0
package/dist/core/change-impact.js +16 -0
package/dist/core/code-outline.js +3 -13
package/dist/core/config-chain.js +3 -13
package/dist/core/dependency-graph.js +3 -13
package/dist/core/docs-link-integrity.js +23 -4
package/dist/core/env-contract.js +3 -13
package/dist/core/export-diff.js +3 -3
package/dist/core/ignored-directories.js +40 -0
package/dist/core/reference-drift.js +4 -2
package/dist/core/related-files.js +3 -13
package/dist/core/repo-merge-conflict-scan.js +3 -9
package/dist/core/route-outline.js +3 -13
package/dist/core/script-pack-suggestions.js +23 -12
package/dist/core/secret-risk-scan.js +3 -13
package/dist/core/skill-route-resolution.js +21 -1
package/package.json +2 -2
package/schemas/link-integrity-report.schema.json +1 -0
package/schemas/reference-drift-report.schema.json +1 -0
package/templates/default/i18n.toml +19 -7
package/templates/default/locales/en/.mustflow/skills/ai-generated-code-hardening/SKILL.md +30 -7
package/templates/default/locales/en/.mustflow/skills/api-request-performance-review/SKILL.md +12 -6
package/templates/default/locales/en/.mustflow/skills/completion-evidence-gate/SKILL.md +20 -9
package/templates/default/locales/en/.mustflow/skills/hot-path-performance-review/SKILL.md +20 -15
package/templates/default/locales/en/.mustflow/skills/next-action-menu/SKILL.md +22 -7
package/templates/default/locales/en/.mustflow/skills/quadratic-scan-review/SKILL.md +21 -19
package/templates/default/locales/en/.mustflow/skills/react-code-change/SKILL.md +54 -8
package/templates/default/locales/en/.mustflow/skills/vertical-slice-tdd/SKILL.md +22 -8
package/templates/default/manifest.toml +1 -1

package/templates/default/locales/en/.mustflow/skills/hot-path-performance-review/SKILL.md CHANGED

@@ -2,11 +2,11 @@
 mustflow_doc: skill.hot-path-performance-review
 locale: en
 canonical: true
-revision: 1
+revision: 2
 lifecycle: mustflow-owned
 authority: procedure
 name: hot-path-performance-review
-description: Apply this skill when code is created, changed, reviewed, or reported and the main performance risk is ordinary work repeated many times, such as repeated I/O, repeated scans, hidden quadratic lookup, per-item allocation, lock hold time, sequential async waits, unbounded fan-out, or missing observability for hot paths.
+description: Apply this skill when code is created, changed, reviewed, or reported and the main performance risk is ordinary work repeated many times, such as repeated I/O, repeated scans, hidden quadratic lookup, allocation or GC churn, per-item parsing or serialization, lock hold time, sequential async waits, unbounded fan-out, or missing observability for hot paths.
 metadata:
   mustflow_schema: "1"
   mustflow_kind: procedure
@@ -54,7 +54,7 @@ The review question is not only "which line looks slow?" It is "how often does t
 - Hot path: the request, loop, render, job, export, import, queue consumer, sync, report, or command path under review.
 - Multipliers: requests, rows, items, files, users, tenants, retries, pages, renders, workers, queue messages, shards, or nested loops that multiply the work.
-- Per-iteration cost: external calls, queries, filesystem reads, allocations, clones, DTO conversions, JSON parse/stringify, logging, formatting, regex, sorting, hashing, image or crypto work, and lock hold time.
+- Per-iteration cost: external calls, queries, filesystem reads, temporary arrays, object spreads, array spreads, concat copies, clones, DTO conversions, JSON parse/stringify, string splitting, logging, formatting, regex, sorting, hashing, image or crypto work, and lock hold time.
 - Boundary ledger: DB, network, cache, filesystem, IPC, provider SDK, queue, logger, metrics sink, transaction, pool, mutex, thread, goroutine, task, or UI main thread crossed by the path.
 - Data-size and tail-latency evidence when available: p50, p95, p99, row count, payload size, allocation count, query count, round-trip count, queue depth, pool wait, lock wait, cache hit rate, retry count, or timeout behavior.
 - Correctness boundaries: order, duplicates, idempotency, authorization, tenant isolation, consistency, partial failure, stale data, cancellation, retry semantics, and error behavior.
@@ -100,22 +100,27 @@ The review question is not only "which line looks slow?" It is "how often does t
 10. Reuse expensive clients and sessions. Per-request or per-item HTTP clients, DB clients, ORM clients, SDK clients, connection pools, TLS handshakes, regexes, date formatters, and thread pools are performance traps unless the API requires that lifecycle.
 11. Check cache honesty. A cache needs a bounded key space, invalidation or TTL, max size, authorization dimensions, negative-cache policy, stale behavior, and cache stampede protection such as locking, singleflight, early refresh, or request coalescing.
 12. Check logging and telemetry in hot paths. Repeated debug logs, eager log-string creation, whole-object serialization, high-cardinality metrics, and JSON formatting for discarded logs can dominate CPU and I/O during incidents.
-13. Check string, JSON, DTO, and clone churn. Repeated string concatenation, `JSON.parse(JSON.stringify(...))`, `cloneDeep`, broad object spread, deep copy, repeated DTO-to-DTO conversion, and repeated serialization can move the bottleneck into "clean" mapping code.
-14. Check large value passing and materialization. In value-copy languages or APIs, large structs, arrays, buffers, spread copies, full file reads, full JSON loads, and eager `collect` calls can turn neat code into memory traffic.
-15. Check regex, parsing, formatting, and locale work. Nested or ambiguous regexes, repeated date parsing, timezone conversion, numeric or locale formatting, and per-row formatter creation should be reviewed with worst-case input in mind.
-16. Check CPU-heavy work in request or UI paths. Image resizing, compression, encryption, hashing, diffing, report generation, spreadsheet export, and search indexing may need batching, worker offload, queueing, or streaming, but only with clear backpressure and failure behavior.
-17. Check queues and workers. Moving work to a queue only moves the bottleneck unless consumers batch DB writes, bulk external calls where safe, bound retries, apply jitter, define poison-message handling, and expose backlog.
-18. Check retry and timeout multiplication. A request with several calls, long timeouts, and several retries can become a tail-latency monster. Count worst-case wait and verify idempotency before adding more attempts.
-19. Review tail behavior, not just average. p50 can look fine while p95 or p99 holds locks, connections, workers, or thread-pool slots long enough to hurt everyone else.
-20. Add observability before large optimization when evidence is missing. Prefer query count, external-call count, payload bytes, allocation count, cache hit rate, queue backlog, pool wait, lock wait, retry count, and span timing over guessing.
-21. Rank the likely payoff. Usually fix repeated external round trips, N+1 access, hidden quadratic scans, overfetching, wide transactions, lock hold time, unbounded fan-out, and missing timeouts before micro-optimizing arithmetic.
-22. Label evidence honestly. If there is no configured benchmark or production trace, report the finding as static complexity or hot-path risk, not measured speedup.
+13. Check allocation and GC churn before micro-optimizing arithmetic.
+    - `filter().map().reduce()`, `flatMap`, `Object.values`, `Object.entries`, `split().map(trim)`, `slice`, and `sort` chains can allocate large temporary arrays.
+    - Spread accumulation, `concat` in loops, repeated object spread while building indexes, and `cloneDeep` can copy growing data many times.
+    - `JSON.stringify` or `JSON.parse(JSON.stringify(...))` used for comparison, cloning, cache keys, or logging can dominate CPU and allocation while losing type semantics.
+    - Repeated `RegExp`, `Date`, `Intl`, formatter, `Set`, or `Map` construction inside hot loops should move outside the loop or become request-scoped only when ownership and memory bounds are clear.
+14. Check string, JSON, DTO, and clone churn. Repeated string concatenation, `JSON.parse(JSON.stringify(...))`, `cloneDeep`, broad object spread, deep copy, repeated DTO-to-DTO conversion, and repeated serialization can move the bottleneck into "clean" mapping code.
+15. Check large value passing and materialization. In value-copy languages or APIs, large structs, arrays, buffers, spread copies, full file reads, full JSON loads, all-pages accumulation, and eager `collect` calls can turn neat code into memory traffic.
+16. Check regex, parsing, formatting, and locale work. Nested or ambiguous regexes, repeated date parsing, timezone conversion, numeric or locale formatting, and per-row formatter creation should be reviewed with worst-case input in mind.
+17. Check CPU-heavy work in request or UI paths. Image resizing, compression, encryption, hashing, diffing, report generation, spreadsheet export, and search indexing may need batching, worker offload, queueing, or streaming, but only with clear backpressure and failure behavior.
+18. Check queues and workers. Moving work to a queue only moves the bottleneck unless consumers batch DB writes, bulk external calls where safe, bound retries, apply jitter, define poison-message handling, and expose backlog.
+19. Check retry and timeout multiplication. A request with several calls, long timeouts, and several retries can become a tail-latency monster. Count worst-case wait and verify idempotency before adding more attempts.
+20. Review tail behavior, not just average. p50 can look fine while p95 or p99 holds locks, connections, workers, or thread-pool slots long enough to hurt everyone else.
+21. Add observability before large optimization when evidence is missing. Prefer query count, external-call count, payload bytes, allocation count, heap growth, GC pause, event-loop delay, cache hit rate, queue backlog, queue wait, pool wait, lock wait, retry count, and span timing over guessing.
+22. Rank the likely payoff. Usually fix repeated external round trips, N+1 access, hidden quadratic scans, overfetching, wide transactions, lock hold time, allocation churn, unbounded fan-out, and missing timeouts before micro-optimizing arithmetic.
+23. Label evidence honestly. If there is no configured benchmark or production trace, report the finding as static complexity or hot-path risk, not measured speedup.
 <!-- mustflow-section: postconditions -->
 ## Postconditions
 - Hot path, cost multipliers, data size, round-trip count, wait points, and copy or allocation points are explicit.
-- N+1 queries, repeated external calls, hidden quadratic scans, unbounded materialization, sequential waits, unbounded fan-out, per-item client creation, broad logging, repeated serialization, and lock or transaction hold time are fixed or reported.
+- N+1 queries, repeated external calls, hidden quadratic scans, unbounded materialization, temporary-array chains, spread or concat copy accumulation, sequential waits, unbounded fan-out, per-item client creation, broad logging, repeated parsing or serialization, allocation churn, and lock or transaction hold time are fixed or reported.
 - Cache, queue, retry, timeout, batching, bulk-write, concurrency, pagination, projection, index-fit, and observability behavior are explicit where relevant.
 - Correctness, authorization, tenant isolation, ordering, duplicates, partial failure, cancellation, and stale-data behavior remain intact or are called out as tradeoffs.
 - Performance claims are backed by configured evidence or labeled as static review risk.
@@ -151,7 +156,7 @@ Use the narrowest configured test, build, docs, release, or mustflow intent that
 - Hot path reviewed
 - Cost ledger: iteration count, data size, round trips, wait time, copy or allocation count
 - Repeated external access, N+1, hidden quadratic scans, and multi-pass collection findings
-- DB, pagination, index-fit, transaction, lock, async, client reuse, cache, queue, retry, timeout, logging, serialization, clone, regex, parsing, formatting, and CPU-heavy work checked where relevant
+- DB, pagination, index-fit, transaction, lock, async, client reuse, cache, queue, retry, timeout, logging, temporary arrays, spread or concat accumulation, serialization, clone, regex, parsing, formatting, allocation, GC, and CPU-heavy work checked where relevant
 - Optimization or review recommendation
 - Evidence level: measured, configured-test evidence, static complexity risk, manual-only, missing, or not applicable
 - Command intents run
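
The allocation and copy-accumulation items added to the procedure above can be illustrated with a minimal sketch. It is not taken from the mustflow package: the `Order` type and the function names `indexBySpread`, `indexByMap`, and `formatTotals` are hypothetical and exist only to show the shape of the patterns the skill names (object spread while building an index, and formatter construction that belongs outside the loop).

```typescript
// Hypothetical record type used only for this illustration.
interface Order {
  id: string;
  customerId: string;
  totalCents: number;
}

// Copy-accumulation shape: every step copies the growing accumulator,
// so building the index performs O(n^2) property copies overall.
function indexBySpread(orders: Order[]): Record<string, Order> {
  return orders.reduce<Record<string, Order>>(
    (acc, order) => ({ ...acc, [order.id]: order }),
    {},
  );
}

// Single-pass shape: one Map, one insertion per item, no intermediate copies.
// Duplicate ids resolve to the last item, the same as the spread version.
function indexByMap(orders: Order[]): Map<string, Order> {
  const byId = new Map<string, Order>();
  for (const order of orders) {
    byId.set(order.id, order);
  }
  return byId;
}

// Formatter constructed once, outside the loop. Constructing
// Intl.NumberFormat per row is the repeated-construction pattern.
const usd = new Intl.NumberFormat("en-US", {
  style: "currency",
  currency: "USD",
});

function formatTotals(orders: Order[]): string[] {
  const out: string[] = [];
  for (const order of orders) {
    out.push(usd.format(order.totalCents / 100));
  }
  return out;
}
```

Whether the rewrite matters in practice depends on input size and call frequency. Without a configured benchmark, a finding like this would be reported as static hot-path risk rather than measured speedup, as the skill's evidence rule requires.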

package/templates/default/locales/en/.mustflow/skills/next-action-menu/SKILL.md CHANGED

@@ -2,11 +2,11 @@
 mustflow_doc: skill.next-action-menu
 locale: en
 canonical: true
-revision: 1
+revision: 3
 lifecycle: mustflow-owned
 authority: procedure
 name: next-action-menu
-description: Apply this skill when a final report, completion note, repository improvement loop, or follow-up workflow should offer a bounded numbered next-action menu that a user can select with a single digit in the next turn.
+description: Apply this skill when a final report, completion note, repository improvement loop, or follow-up workflow should offer a bounded numbered next-action menu that a user can select with a single digit in the next turn. Use especially after non-trivial completed or paused work, commits, pushes, release or deploy preparation, verification, or remaining approval gates when concrete follow-up actions exist.
 metadata:
   mustflow_schema: "1"
   mustflow_kind: procedure
@@ -36,6 +36,8 @@ scope, approval, verification, command contracts, release gates, or safety rules
 - A final report, completion note, handoff, or repository improvement cycle has one or more useful
   follow-up tasks.
+- A non-trivial task is being reported after changed files, a created commit, completed verification,
+  push readiness, release or deploy preparation, paused work, or another concrete approval gate.
 - The user repeatedly asks for "next recommended work", "continue", "proceed", or selects follow-up
   items after previous completion reports.
 - The agent needs to present a bounded backlog that can be selected by a single digit in the next
@@ -48,9 +50,10 @@ scope, approval, verification, command contracts, release gates, or safety rules
 - The current answer is a tiny direct response with no meaningful follow-up.
 - There are no evidence-backed next actions, or all plausible next actions are speculative.
 - The user asked not to include recommendations, menus, or follow-up prompts.
-- The next action requires a blocking product, security, privacy, legal, release, migration,
-  destructive, dependency, credential, deployment, or payment decision that has not been authorized.
-  Report the decision gate instead of offering it as a one-digit action.
+- The only possible next action requires a blocking product, security, privacy, legal, release,
+  migration, destructive, dependency, credential, deployment, or payment decision that has not been
+  authorized and there is no safe bounded action to describe. Report the decision gate instead of
+  offering it as a one-digit action.
 - Another interface already owns selection state and has a stricter picker, ticket, or work-order
   contract.
@@ -89,8 +92,11 @@ scope, approval, verification, command contracts, release gates, or safety rules
 <!-- mustflow-section: procedure -->
 ## Procedure
-1. Decide whether a menu is useful.
-   - Include a menu only when at least one concrete follow-up task is valuable.
+1. Decide whether a menu is useful or required.
+   - Include a menu when at least one concrete follow-up task is valuable.
+   - For non-trivial completion reports, commits, completed verification, push readiness, release or
+     deploy preparation, paused work, or unresolved approval gates, treat the menu as required when
+     any concrete next action exists.
    - Do not fabricate filler items to reach a fixed row count.
 2. Build at most nine items.
    - Use digits `1` through `9`.
@@ -108,6 +114,13 @@ scope, approval, verification, command contracts, release gates, or safety rules
    the host format allows it.
    - Use four columns: number, next task title, description, and recommendation score.
    - In Korean final reports, use `추천도` for the recommendation-score column label.
+   - Use non-breaking padding in short header cells so narrow renderers do not wrap Korean headers
+     vertically. Prefer this template:
+     `| 번호&nbsp;&nbsp; | 다음 작업 | 설명 | 추천도&nbsp;&nbsp; |`
+     `|---:|---|---|:---:|`
+     For English, prefer:
+     `| No.&nbsp;&nbsp; | Next task | Description | Score&nbsp;&nbsp; |`
+     `|---:|---|---|:---:|`
    - Keep descriptions short enough to scan but specific enough to execute.
    - Localize column labels to the report language when appropriate.
 6. Mark gated items plainly.
@@ -116,6 +129,8 @@ scope, approval, verification, command contracts, release gates, or safety rules
      genuinely plausible follow-ups.
    - The description must state the gate, such as explicit user approval, configured command intent,
      owner decision, or manual verification.
+   - A gated item in the table is only a visible next-action option; it is not approval to perform
+     that action.
 7. Interpret a single-digit next user message as a menu selection only when all conditions hold:
    - the immediately relevant previous assistant final report contained a next-action menu;
    - the digit maps to an item in that menu;

package/templates/default/locales/en/.mustflow/skills/quadratic-scan-review/SKILL.md CHANGED

@@ -2,7 +2,7 @@
 mustflow_doc: skill.quadratic-scan-review
 locale: en
 canonical: true
-revision: 1
+revision: 2
 lifecycle: mustflow-owned
 authority: procedure
 name: quadratic-scan-review
@@ -84,33 +84,35 @@ The review question is not "is there a loop inside a loop?" That catches only th
 ## Procedure
 1. Name the repeated path and multiply call count by inner scan length. Review the product `outer_count * inner_count`, not the apparent number of loops.
-2. Search for the obvious collection-combinator shapes: `map` plus `filter`, `map` plus `find`, `forEach` plus `includes`, `filter` plus `indexOf`, `reduce` plus spread, and chained `filter().map().sort()` inside a repeated path.
+2. Search for the obvious collection-combinator shapes: `map` plus `filter`, `map` plus `find`, `forEach` plus `includes`, `filter` plus `indexOf`, `filter` plus `findIndex`, `reduce` plus spread, and chained `filter().map().sort()` inside a repeated path.
 3. Search for membership checks over arrays. `includes`, `indexOf`, `contains`, `find`, `some`, and list membership inside a loop usually want `Set.has` or `Map.has` unless the searched list is tiny and hard-capped.
 4. Search for code joins by ID. `posts.map(post => users.find(...))`, `users.map(user => orders.filter(...))`, permission lookups, likes, bookmarks, read state, tags, and relation lists usually need a `Map` or grouped `Map` keyed by ID or composite key.
 5. Check duplicate removal. `filter((x, i) => arr.indexOf(x) === i)` is O(N^2). Prefer `Set` for scalar values and `Map` keyed by stable identity for objects.
 6. Check sorted arrays. Sorting does not make `find` fast. If code repeatedly searches a sorted array, use a prebuilt map, binary search with a proven comparator, or a single sorted merge.
 7. Check repeated sorting. Sorting inside a per-item loop is usually worse than scanning once, keeping a top candidate, using a heap, or sorting once before the loop.
-8. Check copy-accumulation patterns. `reduce` with `[...acc, item]`, repeated object spread over a growing object, repeated string `+=`, and repeated concatenation can become quadratic copy work. Prefer push, builders, buffers, or one final copy at the boundary.
-9. Check JSON and serialization comparisons. Repeated `JSON.stringify` inside search, equality, sort, dedupe, or render logic multiplies object size by item count. Use explicit keys and precomputed normalized keys.
-10. Open helper bodies called from loops or render paths. Harmless helper names can hide full-list scans, database calls, resolver calls, serialization, sorting, or permission checks.
-11. Check ORM and lazy relations. A single visible loop can become one query per entity. Replace per-entity relation access with eager loading, joins, `WHERE id IN (...)`, batch loading, or DataLoader-style batching.
-12. Check GraphQL and nested resolvers. Parent-list resolvers plus per-field DB or API calls create hidden pairwise fan-out. Batch by parent IDs and preserve field-level authorization semantics.
-13. Check render-time lookup. `rows.map(row => columns.find(...))`, `items.map(item => selectedIds.includes(item.id))`, derived data recomputed on every render, and per-row helper scans should move to memoized sets or maps when inputs are large or stable.
-14. Check all-data-in-app joins. Fetching `allUsers`, `allOrders`, or `allLogs` and joining in application arrays is often a database join without an index. Push join, filter, sort, and pagination to the data store when the data store owns the index and semantics allow it.
-15. Check tree and graph construction. `nodes.map(node => nodes.filter(child => child.parentId === node.id))` should usually become `childrenByParentId` plus one assembly pass. `visited.includes(id)` in traversal should be a `Set`.
-16. Check event-log and time-window scans. Repeatedly scanning all previous events per event should usually become grouping, sorting once, and one pointer or rolling aggregate per key.
-17. Check interval overlap. All-pairs range checks are sometimes necessary, but overlap detection often only needs sorting by start and comparing adjacent or active intervals.
-18. Check incremental updates. Adding one item should not recompute a full ranking, group map, unread count, cart total, or dashboard aggregate unless the collection is fixed and tiny.
-19. Separate index from cache. A `Map` built from current input is an index. A cache stores results across calls or time. Use an index for repeated lookup over already-owned data before introducing cache invalidation.
-20. Require a hard cap for "small list" exceptions. Countries, enum options, or fixed config lists may stay arrays if the cap is real. User data, logs, orders, comments, permissions, tags, events, and uploaded rows need scalable lookup.
-21. Preserve behavior while changing shape. Before replacing scans with indexes, state how order, duplicates, first or last match, missing references, authorization filtering, and stable keys are preserved.
-22. Add growth evidence when feasible. If configured tests or fixtures can scale input size, prefer a small growth test that compares behavior at larger counts. If benchmarking is not configured, report complexity-only evidence instead of a speedup claim.
+8. Check queue and deletion patterns. JavaScript `shift()` in a large BFS or queue loop moves the remaining array repeatedly; use a head index or real queue. `findIndex` plus `splice` while matching requests to available items can scan and move the same growing array repeatedly; bucket by key and advance a consumption pointer instead.
+9. Check copy-accumulation patterns. `reduce` with `[...acc, item]`, repeated object spread over a growing object, repeated string `+=`, repeated `concat`, and repeated array spread over a growing result can become quadratic copy work. Prefer push, builders, buffers, or one final copy at the boundary.
+10. Check JSON and serialization comparisons. Repeated `JSON.stringify` inside search, equality, sort, dedupe, or render logic multiplies object size by item count. Use explicit keys and precomputed normalized keys.
+11. Open helper bodies called from loops or render paths. Harmless helper names can hide full-list scans, database calls, resolver calls, serialization, sorting, or permission checks.
+12. Check ORM and lazy relations. A single visible loop can become one query per entity. Replace per-entity relation access with eager loading, joins, `WHERE id IN (...)`, batch loading, or DataLoader-style batching.
+13. Check GraphQL and nested resolvers. Parent-list resolvers plus per-field DB or API calls create hidden pairwise fan-out. Batch by parent IDs and preserve field-level authorization semantics.
+14. Check render-time lookup. `rows.map(row => columns.find(...))`, `items.map(item => selectedIds.includes(item.id))`, derived data recomputed on every render, and per-row helper scans should move to memoized sets or maps when inputs are large or stable.
+15. Check all-data-in-app joins. Fetching `allUsers`, `allOrders`, or `allLogs` and joining in application arrays is often a database join without an index. Push join, filter, sort, and pagination to the data store when the data store owns the index and semantics allow it.
+16. Check tree and graph construction. `nodes.map(node => nodes.filter(child => child.parentId === node.id))` should usually become `childrenByParentId` plus one assembly pass. `visited.includes(id)` in traversal should be a `Set`. Very deep trees may also need an explicit stack to avoid call-stack failure.
+17. Check event-log and time-window scans. Repeatedly scanning all previous events per event should usually become grouping, sorting once, and one pointer or rolling aggregate per key.
+18. Check interval overlap. All-pairs range checks are sometimes necessary, but overlap detection often only needs sorting by start and comparing adjacent or active intervals.
+19. Check true all-pairs similarity separately. If every item must be compared with every other item, do not promise a linear rewrite. First narrow candidates with stable keys, categories, buckets, hashes, n-grams, ranges, or database indexes, then compare only within the candidate set.
+20. Check incremental updates. Adding one item should not recompute a full ranking, group map, unread count, cart total, or dashboard aggregate unless the collection is fixed and tiny.
+21. Separate index from cache. A `Map` built from current input is an index. A cache stores results across calls or time. Use an index for repeated lookup over already-owned data before introducing cache invalidation.
+22. Require a hard cap for "small list" exceptions. Countries, enum options, or fixed config lists may stay arrays if the cap is real. User data, logs, orders, comments, permissions, tags, events, and uploaded rows need scalable lookup.
+23. Preserve behavior while changing shape. Before replacing scans with indexes, state how order, duplicates, first or last match, missing references, authorization filtering, and stable keys are preserved.
+24. Add growth evidence when feasible. If configured tests or fixtures can scale input size, prefer a small growth test that compares behavior at larger counts. If benchmarking is not configured, report complexity-only evidence instead of a speedup claim.
 <!-- mustflow-section: postconditions -->
 ## Postconditions
 - Each suspected O(N^2) path has an outer count, inner count, and data-growth classification.
-- Repeated membership checks, code joins, duplicate removal, tree building, resolver fan-out, render-time lookup, helper-hidden scans, repeated sort, copy accumulation, and JSON comparison are fixed or reported.
+- Repeated membership checks, code joins, duplicate removal, tree building, resolver fan-out, render-time lookup, helper-hidden scans, repeated sort, queue `shift()`, `findIndex` plus `splice`, copy accumulation, interval scans, all-pairs candidate narrowing, and JSON comparison are fixed or reported.
 - Array-to-set or array-to-map changes preserve order, duplicates, missing records, first or last winner, authorization, and stable key behavior.
 - Small-list exceptions have an explicit hard cap or are reported as residual risk.
 - Performance claims are backed by configured evidence or labeled as static complexity risk.
@@ -146,7 +148,7 @@ Use the narrowest configured test, build, docs, release, or mustflow intent that
 - Repeated path reviewed
 - Outer count, inner count, and data-growth classification
 - Hidden scan patterns found or ruled out
-- Membership, join, dedupe, helper, ORM, resolver, render, tree, graph, event, interval, sort, copy, string, and JSON checks where relevant
+- Membership, join, dedupe, helper, ORM, resolver, render, tree, graph, event, interval, all-pairs, queue, deletion, sort, copy, string, and JSON checks where relevant
 - Index, grouping, sorted merge, database join, or intentional all-pairs decision
 - Semantics preserved: order, duplicates, first or last winner, missing IDs, authorization, and stable keys
 - Evidence level: configured test, static complexity risk, manual-only, missing, or not applicable
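
Two of the rewrites named in the procedure above, a code join by ID and a queue loop without `shift()`, can be sketched as follows. This is an illustrative sketch under stated assumptions, not code from the mustflow package: the `User` and `Post` shapes and the functions `joinPostsToAuthors` and `bfsOrder` are hypothetical.

```typescript
// Hypothetical shapes used only for this illustration.
interface User {
  id: string;
  name: string;
}
interface Post {
  id: string;
  authorId: string;
  title: string;
}

// Quadratic shape: posts.map(post => users.find(...)) scans users per post.
// Indexed shape: build the Map once, then look up each author in O(1).
function joinPostsToAuthors(
  posts: Post[],
  users: User[],
): Array<{ title: string; author: string | null }> {
  const userById = new Map<string, User>();
  for (const user of users) {
    // Keep the first occurrence so the result matches what find() returned.
    if (!userById.has(user.id)) userById.set(user.id, user);
  }
  // Missing references stay explicit (null) instead of being dropped.
  return posts.map((post) => ({
    title: post.title,
    author: userById.get(post.authorId)?.name ?? null,
  }));
}

// BFS with a head index instead of queue.shift(), and a Set instead of
// visited.includes(id).
function bfsOrder(
  start: string,
  childrenById: Map<string, string[]>,
): string[] {
  const queue: string[] = [start];
  const visited = new Set<string>([start]);
  let head = 0;
  while (head < queue.length) {
    const id = queue[head++];
    for (const child of childrenById.get(id) ?? []) {
      if (!visited.has(child)) {
        visited.add(child);
        queue.push(child);
      }
    }
  }
  // The queue array is never shifted, so it doubles as the visit order.
  return queue;
}
```

The join keeps the first user per ID and reports missing authors as `null`, which is one way to make the "first or last winner" and "missing IDs" semantics from the output checklist explicit before swapping a scan for an index.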

package/templates/default/locales/en/.mustflow/skills/react-code-change/SKILL.md CHANGED

@@ -2,11 +2,11 @@
 mustflow_doc: skill.react-code-change
 locale: en
 canonical: true
-revision: 1
+revision: 2
 lifecycle: mustflow-owned
 authority: procedure
 name: react-code-change
-description: Apply this skill when React, React DOM, React Server Components, Server Actions, React Compiler, Hooks, Suspense, Actions, forms, refs, context, concurrent rendering, SSR streaming, resource hints, package metadata, or React-related tests are created, changed, reviewed, or upgraded.
+description: Apply this skill when React, React DOM, React Server Components, Server Actions, React Compiler, Hooks, Suspense, Actions, forms, refs, context, render performance, concurrent rendering, SSR streaming, resource hints, package metadata, or React-related tests are created, changed, reviewed, or upgraded.
 metadata:
   mustflow_schema: "1"
   mustflow_kind: procedure
@@ -75,6 +75,10 @@ expect current React guidance and small, compatible changes.
 - State and mutation evidence: local state owner, derived values, external
   stores, context providers, forms, Actions, optimistic updates, and rollback
   behavior.
+- Render performance evidence: React DevTools Profiler or `<Profiler>` data when
+  available, render count, render duration, prop identity changes, context update
+  scope, list size, DOM node count, key stability, layout effect use, first-load
+  bundle ownership, and offscreen DOM cost.
 - Configured verification intents for lint, build, tests, docs, package, and
   mustflow checks.
@@ -186,14 +190,49 @@ expect current React guidance and small, compatible changes.
      errors, resets, progressive enhancement, and rollback.
    - Keep explicit error handling, authorization, validation, idempotency, and
      rollback behavior. Do not hide server failures behind optimistic UI.
-10. **Respect React 19.2 rendering and performance APIs.**
+10. **Review React render hot paths with evidence.**
+    - Use React DevTools Profiler, `<Profiler>`, framework traces, or existing
+      render-count evidence before claiming a render-performance fix. If none is
+      configured, report static render risk instead of measured speedup.
+    - Check whether state is owned too high in the tree. Search inputs, tabs,
+      modal flags, hover state, and local drafts should not rerender a whole page
+      unless that page truly owns the state.
+    - Check `memo` failures from unstable props. Inline objects, arrays, functions,
+      and selector results can make `React.memo` ineffective; prefer primitive
+      props, stable callbacks, or moving object creation behind a real dependency.
+    - Move expensive render-time `filter`, `sort`, `map`, grouping, and lookup work
+      behind `useMemo`, server-side pagination, route loaders, or pre-indexed data
+      when input size can grow.
+    - Large lists need pagination, infinite query boundaries, virtualization, or a
+      documented hard cap. Do not render thousands of rows because the sample data
+      has twenty.
+    - Reject unstable keys such as array index for reorderable data and
+      `Math.random()` for any list. Use stable item identity so React preserves
+      row state and avoids forced remounts.
+    - Split oversized context values by change frequency and ownership. `memo`
+      does not stop rerenders caused by a fresh context value.
+    - Do not use `useEffect` plus `setState` for values derived from current props
+      or state. Compute during render or memoize the calculation to avoid the
+      extra render pass.
+    - For search and filtering, keep the controlled input urgent and move heavy
+      result updates behind `useDeferredValue`, `useTransition`, server filtering,
+      or pagination when the supported React version and UX allow it.
+    - Use `useLayoutEffect` only when pre-paint measurement is required. Avoid
+      DOM read/write interleaving that causes layout thrashing.
+    - Lazy-load heavy charts, editors, maps, markdown renderers, syntax
+      highlighters, and modal-only widgets when they are not needed for the first
+      interaction path.
+    - For large offscreen sections, consider `content-visibility` plus
+      `contain-intrinsic-size`, framework lazy boundaries, or route splitting when
+      browser support and layout stability are acceptable.
+11. **Respect React 19.2 rendering and performance APIs.**
     - Treat `<Activity>` as hidden UI with preserved state, unmounted effects,
       and lower-priority hidden updates, not as `display: none` or ordinary
       conditional rendering.
     - Use React Performance Tracks, React DevTools, or existing profiler evidence
       when claiming render, effect, Scheduler, transition, or component
       performance improvements.
-11. **Keep server rendering and RSC boundaries exact.**
+12. **Keep server rendering and RSC boundaries exact.**
     - Distinguish Server Components from Server Actions. `"use server"` marks
       server functions or modules for actions; it is not a Server Component tag.
     - Keep browser APIs, client state, and event handlers out of Server
@@ -206,13 +245,13 @@ expect current React guidance and small, compatible changes.
     - In Node environments, do not assume Web Streams are faster than Node
       streams; preserve the existing SSR stream API unless the task proves the
       runtime benefit and compression behavior.
-12. **Use React DOM document and resource APIs close to the owner.**
+13. **Use React DOM document and resource APIs close to the owner.**
     - Metadata, stylesheets with `precedence`, async scripts, `preinit`,
       `preload`, `preconnect`, and `prefetchDNS` may belong near the component
       that needs them when React and the framework support that behavior.
     - Avoid duplicate head managers, resource hint spam, and hints for assets
       whose timing or priority is unproven.
-13. **Verify through the repository contract.**
+14. **Verify through the repository contract.**
     - Run the smallest configured checks that cover changed React code, package
       metadata, build output, docs, and tests.
     - Report missing browser, hydration, SSR, RSC, compiler, profiler, or
@@ -225,12 +264,16 @@ expect current React guidance and small, compatible changes.
   status are known or explicitly reported as unknown.
 - Effects, state, memoization, context, refs, forms, Suspense, and async
   boundaries follow React's current model for the supported version.
+- Render performance claims are backed by profiler or render-count evidence, or
+  static risks such as state too high, unstable props, render-time transforms,
+  huge lists, unstable keys, oversized context, derived-state effects, layout
+  thrashing, eager heavy widgets, and offscreen DOM cost are reported honestly.
 - React 19 and React 19.2 APIs are not introduced into code that still promises
   older React compatibility.
 - SSR, RSC, Server Action, browser-only, and resource-hint boundaries are
   preserved.
-- Performance claims have profiler or benchmark evidence, or are reported as
-  unverified.
+- Performance claims have profiler, benchmark, render-count, or configured
+  evidence, or are reported as unverified.
 <!-- mustflow-section: verification -->
 ## Verification
@@ -271,6 +314,9 @@ surfaces changed.
 - React surface and supported version checked
 - Compiler, lint, effect, state, memoization, context, ref, form, Suspense, SSR,
   RSC, and resource-boundary notes
+- Render performance notes: profiler evidence, state ownership, prop identity,
+  render-time work, list size, key stability, context scope, derived state,
+  layout effects, lazy loading, and offscreen DOM
 - Freshness-sensitive React claims checked or left conservative
 - Files changed
 - Command intents run
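
Note on the new step 10 above: the bullet about `memo` failures from unstable props can be illustrated without React at all. The sketch below is a minimal illustration, assuming that `React.memo`'s default behavior is a per-prop `Object.is` comparison. `shallowEqualProps`, `renderWithInlineObject`, and `renderWithPrimitive` are hypothetical names that appear in neither the package nor React.

```typescript
// Hypothetical helper: mimics the per-prop Object.is check that React.memo
// is assumed to apply by default. This is an illustration, not React source.
type Props = Record<string, unknown>;

function shallowEqualProps(prev: Props, next: Props): boolean {
  const prevKeys = Object.keys(prev);
  const nextKeys = Object.keys(next);
  if (prevKeys.length !== nextKeys.length) return false;
  return prevKeys.every(
    (key) =>
      Object.prototype.hasOwnProperty.call(next, key) &&
      Object.is(prev[key], next[key]),
  );
}

// Simulates a parent render that builds props for a memoized child.
function renderWithInlineObject(userId: string): Props {
  // A fresh object literal on every call, so the prop identity changes.
  return { filter: { userId } };
}

function renderWithPrimitive(userId: string): Props {
  // A primitive prop compares by value, so it stays equal across renders.
  return { userId };
}

// Compare the props produced by two consecutive "renders" with the same input.
const inlineSkipsRerender = shallowEqualProps(
  renderWithInlineObject("u1"),
  renderWithInlineObject("u1"),
);
const primitiveSkipsRerender = shallowEqualProps(
  renderWithPrimitive("u1"),
  renderWithPrimitive("u1"),
);

console.log({ inlineSkipsRerender, primitiveSkipsRerender });
```

Under these assumptions the inline-object variant never compares equal, which is the static render risk the skill asks reviewers to report when no profiler evidence is configured.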

package/templates/default/locales/en/.mustflow/skills/vertical-slice-tdd/SKILL.md CHANGED

@@ -2,7 +2,7 @@
 mustflow_doc: skill.vertical-slice-tdd
 locale: en
 canonical: true
-revision: 1
+revision: 2
 lifecycle: mustflow-owned
 authority: procedure
 name: vertical-slice-tdd
@@ -30,7 +30,7 @@ metadata:
 Support explicit test-driven development without making test-first work mandatory for every mustflow task.
-This skill keeps TDD work in narrow vertical behavior slices: one observable contract, one focused test change, the smallest implementation that proves it, and only then a local refactor inside the covered slice.
+This skill keeps TDD work in one vertical behavior slice at a time: choose the next test by risk and evidence value, prove one observable contract, attack the test for false-green weakness, implement only enough behavior to pass, and only then refactor inside the covered slice.
 <!-- mustflow-section: use-when -->
 ## Use When
@@ -54,6 +54,7 @@ This skill keeps TDD work in narrow vertical behavior slices: one observable con
 - User request or issue evidence that makes TDD or slice-by-slice work appropriate.
 - The observable behavior contract for the first slice.
+- A short test list or risk list, ordered by which test would expose the most important uncertainty next.
 - Existing tests, fixtures, and helpers near that behavior.
 - The expected RED category and baseline status before implementation.
 - Relevant command-intent contract entries for the narrowest verification path.
@@ -78,9 +79,11 @@ This skill keeps TDD work in narrow vertical behavior slices: one observable con
 <!-- mustflow-section: procedure -->
 ## Procedure
-1. Select one vertical behavior slice.
+1. Select the next evidence-bearing slice.
    - Name the user-visible or public behavior.
    - Define the smallest input, action, and observable output that prove the slice.
+   - Prefer the test that would reveal the riskiest unknown, boundary, integration contract, or regression path, not merely the easiest happy path.
+   - Treat Red-Green-Refactor as the inner loop, not the whole method. Do not start adding tests before choosing why this test is the next useful evidence.
    - Keep cross-cutting infrastructure, broad refactors, and speculative future cases outside the slice.
 2. Find existing coverage.
    - Prefer extending a nearby existing test when it already owns the behavior surface.
@@ -90,30 +93,39 @@ This skill keeps TDD work in narrow vertical behavior slices: one observable con
    - Use `test-design-guard` to select the test shape and assertion.
    - Assert observable behavior such as a return value, exit code, output, file effect, state transition, schema result, or error shape.
    - Keep mocks supportive rather than the only behavior evidence, unless the interaction itself is the public contract.
-4. Classify the RED result before implementation.
+4. Attack the test before trusting it.
+   - Ask what bug could still pass this test. Strengthen the assertion when the answer is concrete and in scope.
+   - Prefer property, contract, approval, integration, or mutation-style evidence only when `test-design-guard` shows that shape fits the contract and stays bounded.
+   - For legacy code, use characterization or approval-style evidence to freeze current behavior before refactoring when the intended behavior is not yet trusted.
+   - For API or service boundaries, prefer consumer, schema, or contract evidence over mocks of the provider's imagined behavior.
+   - If implementation was AI-assisted, check that generated code did not outrun the selected test by adding untested branches, features, or public behavior.
+5. Classify the RED result before implementation.
    - `behavior_red` is the only valid behavior RED.
    - `api_scaffold_red` may be reported only for an explicitly new public API scaffold and must not be counted as behavior coverage.
    - `invalid_red` includes setup failures, wrong imports, missing unrelated symbols, runner failures, fixture failures, syntax or type errors, bad mocks, missing awaits, environment failures, and unrelated baseline failures.
    - If RED is invalid, fix the test setup or report the invalid evidence before changing implementation behavior.
-5. Implement the smallest behavior change.
+6. Implement the smallest behavior change.
    - Change only the code needed for the current observable contract.
    - Preserve existing public behavior outside the slice.
    - Avoid introducing abstractions unless they directly reduce complexity in the current slice.
-6. Verify GREEN with the narrowest configured command intent.
+   - Do not accept a broad AI-generated implementation just because the narrow test turned green; trim or defer unproven behavior.
+7. Verify GREEN with the narrowest configured command intent.
    - Start with the intent that covers the changed test and implementation surface.
    - Escalate only when the slice crosses public surfaces, package or template contracts, or the related selector cannot cover the changed files.
    - Keep command evidence separate from RED evidence and implementation notes.
-7. Refactor only after GREEN.
+8. Refactor only after GREEN.
    - Limit refactoring to code covered by the slice.
    - Re-run the same configured verification intent after behavior-preserving cleanup when the refactor is non-trivial.
-8. Decide whether to continue.
+9. Decide whether to continue.
    - Repeat only when the next slice is clearly in scope.
+   - Reorder the remaining test list when new evidence changes the highest-risk unknown.
    - Stop and report deferred slices when the remaining work is broader than the user request or needs a new decision.
 <!-- mustflow-section: postconditions -->
 ## Postconditions
 - Each completed slice has a named behavior contract, RED category, implementation summary, and GREEN verification evidence.
+- Each completed slice records why that test was chosen next and how false-green risk was checked.
 - Invalid RED and scaffold-only RED are not reported as behavior coverage.
 - Deferred slices, rejected speculative cases, skipped checks, and remaining risks are explicit.
 - No command execution claim relies on anything outside the configured command intents.
@@ -145,10 +157,12 @@ Prefer the narrowest configured intent that proves the current slice. Escalate o
 ## Output Format
 - TDD trigger and slice scope
+- Next-test selection rationale
 - Existing coverage reused
 - Slices completed
 - Slices deferred
 - Cases rejected as duplicate or speculative
+- False-green checks and test-strength limits
 - RED Evidence:
   - category: `behavior_red`, `api_scaffold_red`, `invalid_red`, or `not_applicable`
   - command intent
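
Note on the new step 4 above ("Attack the test before trusting it"): the question "what bug could still pass this test" can be made concrete by running a deliberately broken implementation against the assertion. The sketch below is a minimal illustration, not the skill's prescribed method. `applyDiscount`, `applyDiscountBuggy`, `weakTest`, and `strongTest` are hypothetical names invented for the example.

```typescript
// Hypothetical function under test: price after a percentage discount.
function applyDiscount(price: number, percent: number): number {
  return price - (price * percent) / 100;
}

// A plausible bug: the discount is ignored entirely.
function applyDiscountBuggy(price: number, _percent: number): number {
  return price;
}

type Impl = (price: number, percent: number) => number;

// Weak check: asserts only the type and sign of the result.
function weakTest(impl: Impl): boolean {
  const result = impl(200, 25);
  return typeof result === "number" && result >= 0;
}

// Stronger check: asserts the exact observable value plus both boundaries.
function strongTest(impl: Impl): boolean {
  return (
    impl(200, 25) === 150 &&
    impl(200, 0) === 200 &&
    impl(200, 100) === 0
  );
}

// A test is false-green when the buggy implementation still passes it.
const results = {
  weakPassesCorrect: weakTest(applyDiscount),
  weakPassesBuggy: weakTest(applyDiscountBuggy),
  strongPassesCorrect: strongTest(applyDiscount),
  strongPassesBuggy: strongTest(applyDiscountBuggy),
};

console.log(results);
```

Here the weak check stays green for the buggy version while the stronger check turns red, which is the kind of evidence the new "False-green checks and test-strength limits" output line is meant to record.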

package/templates/default/manifest.toml CHANGED

@@ -1,6 +1,6 @@
 id = "default"
 name = "default"
-version = "2.103.3"
+version = "2.103.10"
 description = "Minimal workflow for LLM agents to read, edit, and verify their work in a repository."
 common_root = "common"
 locales_root = "locales"