npm package mustflow: version diff 2.22.4 → 2.22.9

Files changed (72)

package/README.md +17 -75
package/dist/cli/commands/classify.js +2 -0
package/dist/cli/commands/contract-lint.js +2 -2
package/dist/cli/commands/dashboard.js +23 -75
package/dist/cli/commands/help.js +8 -9
package/dist/cli/commands/impact.js +2 -3
package/dist/cli/commands/init.js +61 -5
package/dist/cli/commands/run/receipt.js +1 -0
package/dist/cli/commands/run.js +14 -1
package/dist/cli/commands/update.js +2 -2
package/dist/cli/commands/verify/evidence-input.js +269 -0
package/dist/cli/commands/verify/input.js +212 -0
package/dist/cli/commands/verify.js +23 -482
package/dist/cli/commands/version-sources.js +2 -3
package/dist/cli/i18n/en.js +5 -0
package/dist/cli/i18n/es.js +5 -0
package/dist/cli/i18n/fr.js +5 -0
package/dist/cli/i18n/hi.js +5 -0
package/dist/cli/i18n/ko.js +5 -0
package/dist/cli/i18n/zh.js +5 -0
package/dist/cli/lib/agent-context.js +6 -11
package/dist/cli/lib/dashboard-export.js +2 -0
package/dist/cli/lib/dashboard-mutations.js +79 -0
package/dist/cli/lib/local-index/command-effect-index.js +25 -0
package/dist/cli/lib/local-index/hashing.js +7 -0
package/dist/cli/lib/local-index/index.js +127 -823
package/dist/cli/lib/local-index/source-index.js +137 -0
package/dist/cli/lib/local-index/verification-evidence.js +451 -0
package/dist/cli/lib/local-index/workflow-documents.js +204 -0
package/dist/cli/lib/mustflow-read.js +41 -0
package/dist/cli/lib/project-root.js +1 -2
package/dist/cli/lib/repo-map.js +65 -16
package/dist/cli/lib/run-root-trust.js +27 -0
package/dist/cli/lib/templates.js +124 -8
package/dist/cli/lib/toml.js +6 -1
package/dist/cli/lib/validation/constants.js +2 -0
package/dist/cli/lib/validation/index.js +291 -22
package/dist/cli/lib/validation/primitives.js +2 -2
package/dist/cli/lib/validation/test-selection.js +2 -2
package/dist/core/bounded-output.js +32 -7
package/dist/core/change-classification-policy.js +47 -0
package/dist/core/change-classification.js +10 -43
package/dist/core/check-issues.js +7 -1
package/dist/core/command-contract-validation.js +28 -4
package/dist/core/command-env.js +1 -1
package/dist/core/config-loading.js +9 -3
package/dist/core/contract-lint.js +8 -3
package/dist/core/correlation-id.js +16 -0
package/dist/core/run-receipt.js +1 -0
package/dist/core/safe-filesystem.js +11 -4
package/dist/core/skill-route-alignment.js +1 -0
package/dist/core/skill-route-explanation.js +9 -3
package/dist/core/test-selection.js +2 -3
package/dist/core/verification-scheduler.js +7 -6
package/dist/core/version-sources.js +2 -3
package/package.json +4 -1
package/schemas/README.md +4 -0
package/schemas/change-verification-report.schema.json +4 -0
package/schemas/classify-report.schema.json +4 -0
package/schemas/commands.schema.json +1 -0
package/schemas/dashboard-export.schema.json +4 -0
package/schemas/latest-run-pointer.schema.json +4 -0
package/schemas/run-receipt.schema.json +4 -0
package/schemas/verify-report.schema.json +4 -0
package/schemas/verify-run-manifest.schema.json +4 -0
package/templates/default/i18n.toml +3 -3
package/templates/default/locales/en/.mustflow/skills/INDEX.md +10 -6
package/templates/default/locales/en/.mustflow/skills/architecture-deepening-review/SKILL.md +25 -2
package/templates/default/locales/en/.mustflow/skills/routes.toml +2 -2
package/templates/default/locales/en/.mustflow/skills/security-privacy-review/SKILL.md +9 -1
package/templates/default/locales/en/.mustflow/skills/test-design-guard/SKILL.md +9 -1
package/templates/default/manifest.toml +1 -1

package/templates/default/locales/en/.mustflow/skills/INDEX.md CHANGED

@@ -36,14 +36,18 @@ refer to `AGENTS.md` and `.mustflow/config/commands.toml` to implement the most
 ## Selection Convention
-- Choose one primary skill that best describes the main work. Prefer the most specific matching
-  skill over a broad architecture or review skill.
+- Choose one main route: a `primary` route for ordinary work or an `authoring` route for
+  mustflow authoring and maintenance work. Prefer the most specific matching skill over a broad
+  architecture or review skill.
+- Treat `authoring` routes as selectable main routes, not adjunct routes. Use them when the task
+  creates or maintains mustflow-owned skills, context files, command contracts, route metadata, or
+  public documentation entries.
 - Add no more than two adjunct skills for secondary risks such as tests, documentation, security,
   privacy, release, or contract drift.
 - Treat event-triggered skills as inactive until the event occurs. For example, read
   `failure-triage` only after a configured command intent or verification step fails.
-- If several primary skills appear to match, choose the one tied to the files and behavior being
-  changed now, then report the skipped plausible skills instead of reading every route.
+- If several main routes appear to match, choose the one tied to the files and behavior being
+  changed now, then report the skipped plausible routes instead of reading every route.
 ## Classification Prefilter
@@ -78,8 +82,8 @@ test tasks from requiring a full read of architecture-pattern routes. Categories
 ## Specific Routes
-After choosing a category, choose one primary route and at most two adjunct routes. Event routes
-stay inactive until their event occurs.
+After choosing a category, choose one main route (`primary` or `authoring`) and at most two adjunct
+routes. Event routes stay inactive until their event occurs.
 ### Bug and Failure
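
The revised selection convention (one main route of type `primary` or `authoring`, at most two adjunct routes, event routes inactive until their event fires) can be sketched as follows. This is an illustration only, not mustflow's implementation: the `Route` and `Selection` types, the `selectRoutes` function, and the `example-*` route names are hypothetical, and the sketch assumes the caller has already ranked candidates by how closely they match the files and behavior being changed.

```typescript
// Hypothetical types: only the route_type values come from the skill index text.
type RouteType = "primary" | "authoring" | "adjunct" | "event";

interface Route {
  name: string;
  routeType: RouteType;
}

interface Selection {
  main: Route | undefined; // exactly one main route when any candidate qualifies
  adjuncts: Route[];       // at most two
  activeEvents: Route[];   // event routes whose event has occurred
  skipped: Route[];        // plausible routes to report instead of reading
}

// Assumption: `ranked` is ordered best match first by the caller.
function selectRoutes(ranked: Route[], firedEvents: ReadonlySet<string>): Selection {
  const isMain = (r: Route): boolean =>
    r.routeType === "primary" || r.routeType === "authoring";
  const mains = ranked.filter(isMain);
  const adjuncts = ranked.filter((r) => r.routeType === "adjunct");
  return {
    main: mains[0],
    adjuncts: adjuncts.slice(0, 2),
    activeEvents: ranked.filter(
      (r) => r.routeType === "event" && firedEvents.has(r.name),
    ),
    skipped: [...mains.slice(1), ...adjuncts.slice(2)],
  };
}

// Usage: no event has fired, so `failure-triage` stays inactive.
const selection = selectRoutes(
  [
    { name: "security-privacy-review", routeType: "primary" },
    { name: "example-authoring-route", routeType: "authoring" },
    { name: "example-adjunct-a", routeType: "adjunct" },
    { name: "example-adjunct-b", routeType: "adjunct" },
    { name: "example-adjunct-c", routeType: "adjunct" },
    { name: "failure-triage", routeType: "event" },
  ],
  new Set<string>(),
);
```

The unselected authoring route and the third adjunct land in `skipped`, which mirrors the instruction to report skipped plausible routes rather than read every route.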

package/templates/default/locales/en/.mustflow/skills/architecture-deepening-review/SKILL.md CHANGED

@@ -2,7 +2,7 @@
 mustflow_doc: skill.architecture-deepening-review
 locale: en
 canonical: true
-revision: 1
+revision: 2
 lifecycle: mustflow-owned
 authority: procedure
 name: architecture-deepening-review
@@ -37,6 +37,7 @@ This is a review-first skill. It helps decide whether code needs a deeper module
 ## Use When
 - The user asks for architecture review, module boundaries, structural improvement, codebase deepening, maintainability review, or testability improvement.
+- The user asks where a design will break first as it grows, which responsibility boundary is most likely to blur, or whether a module, service, database owner, permission model, deployment unit, or failure boundary is still clear enough.
 - A file, module, service, handler, command, controller, or test suite looks broad enough that the next edit may add another responsibility.
 - Code exposes internal steps to many callers, repeats orchestration, or makes tests hard because policy, I/O, formatting, and dispatch are mixed.
 - A shallow wrapper adds naming without hiding complexity, or a helper has become a pass-through layer around many unrelated concerns.
@@ -57,6 +58,7 @@ This is a review-first skill. It helps decide whether code needs a deeper module
 - Target area, current pain, and the user-facing or maintainer-facing reason to inspect architecture.
 - Relevant source files, call sites, exports, tests, fixtures, schemas, templates, or documentation that show current behavior and ownership.
 - Local patterns for modules, boundaries, naming, errors, dependency direction, and tests.
+- The data owner, write path, failure mode, and expected 3x, 10x, or 100x growth pressure when the review is about a design rather than only a file split.
 - Current changed-file list when the worktree is already dirty.
 - Relevant command-intent contract entries for verification.
@@ -88,7 +90,22 @@ This is a review-first skill. It helps decide whether code needs a deeper module
 3. Identify one to three candidate boundaries.
    - Each candidate must name the responsibility it would hide or clarify.
    - Reject candidates that only rename, wrap, or move code without lowering caller complexity or test cost.
-4. Score each candidate from 1 to 9.
+4. Force the design through the ownership and failure questions before scoring.
+   - Name the first likely mixed-responsibility boundary. Common early failures are business rules leaking into controllers, repositories, external adapters, UI components, or framework-specific handlers.
+   - Name the final owner for important data. The owner is the module that protects the invariant, not necessarily the module that reads the value most often.
+   - Separate original state from cache, search index, analytics, summary, AI output, provider response, or other derived state.
+   - Identify every direct write path for high-impact fields such as status, role, permission, balance, quota, plan, entitlement, deleted state, payment state, or ownership.
+   - Ask whether a failure creates a visible failure state or silently creates false success. High-impact paths such as authorization, payment, entitlement, deletion, and destructive administration should fail closed.
+   - Ask whether duplicate requests, retries, webhook redelivery, queue replay, or worker restart can repeat a harmful effect. If yes, require an idempotency, ledger, outbox, or reconciliation boundary before calling the design safe.
+5. Check growth pressure in concrete stages.
+   - At 3x scale, look first for implementation-quality failures: missing indexes, N+1 reads, large responses, synchronous file or image work, repeated external calls, and insufficient connection pools.
+   - At 10x scale, look first for ownership and state failures: write hot spots, queue delay, cache invalidation, server-local files, scattered permission rules, external API rate limits, and deployment units that change for unrelated reasons.
+   - At 100x scale, look first for partitioning and operational failures: data split boundaries, tenant or region hot spots, retry storms, external dependency isolation, long deploy recovery, missing observability, and manual-only recovery paths.
+6. Check scaling direction without forcing premature distribution.
+   - A small team may start with one larger server or a simple server set, but request handlers should not depend on process memory, local uploads, duplicate cron execution, in-transaction external calls, or server-specific job state.
+   - Application servers should be able to become stateless. Databases may start with vertical scaling, but the design should not block read replicas, read models, queue-backed work, or future data partitioning.
+   - Horizontal scaling is only real if any server can handle the same request, workers can safely duplicate or retry work, and database writes do not all converge on an uncontrolled hot spot.
+7. Score each candidate from 1 to 9.
    - User value: whether the structure protects a user-visible or public contract.
    - Maintenance value: whether future changes become smaller or less error-prone.
    - Blast radius: how many callers, files, schemas, templates, or docs would change.
@@ -111,6 +128,8 @@ This is a review-first skill. It helps decide whether code needs a deeper module
 - The output contains a ranked architecture candidate list or one scoped structural change.
 - Any chosen change has a named reason tied to lower change cost, lower defect risk, or better testability.
+- Important data has a named owner, write path, original-or-derived classification, and failure behavior when the reviewed design touches durable state.
+- Growth pressure is either checked at 3x, 10x, and 100x or explicitly marked not relevant to the current architecture decision.
 - Behavior changes are excluded or explicitly moved to a separate follow-up.
 - Verification evidence or verification gaps are reported without claiming unrun checks passed.
@@ -145,6 +164,10 @@ Use documentation and release checks only when the review or chosen change touch
 - Review target and current pain
 - Evidence inspected
+- Data owner, write path, and original-versus-derived state when relevant
+- Failure mode, idempotency, and recovery boundary when relevant
+- 3x, 10x, and 100x growth pressure when relevant
+- Vertical versus horizontal scaling direction when relevant
 - Candidate boundaries and scores
 - Selected next action
 - Narrower skill used or intentionally avoided
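
The new step 4 asks whether webhook redelivery, retries, or queue replay can repeat a harmful effect, and requires an idempotency or ledger boundary if so. A minimal sketch of such a boundary is shown below. The `CreditLedger` class and the delivery ids are hypothetical, and the in-memory map stands in for a durable ledger that a real design would presumably write in the same transaction as the balance.

```typescript
// Hypothetical in-memory ledger, for illustration only.
class CreditLedger {
  // Delivery ids that have already been applied, with the credited amount.
  private readonly applied = new Map<string, number>();
  private balance = 0;

  // Returns true when the credit was applied, false when the delivery is a replay.
  apply(deliveryId: string, amount: number): boolean {
    if (this.applied.has(deliveryId)) {
      return false; // redelivery: the effect is not repeated
    }
    this.applied.set(deliveryId, amount);
    this.balance += amount;
    return true;
  }

  currentBalance(): number {
    return this.balance;
  }
}

// Usage: the same webhook delivery arrives twice.
const ledger = new CreditLedger();
const firstDelivery = ledger.apply("evt-1", 50);
const redelivery = ledger.apply("evt-1", 50);
```

Here the owner of the balance invariant is the ledger, not the webhook handler, which matches the skill's point that the owner is the module that protects the invariant.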

package/templates/default/locales/en/.mustflow/skills/routes.toml CHANGED

@@ -2,7 +2,7 @@ schema_version = "1"
 [routes."artifact-integrity-check"]
 category = "ui_assets"
-route_type = "adjunct"
+route_type = "primary"
 priority = 80
 applies_to_reasons = ["package_metadata_change", "release_risk"]
@@ -212,7 +212,7 @@ applies_to_reasons = ["test_change"]
 [routes."security-privacy-review"]
 category = "security_privacy"
-route_type = "adjunct"
+route_type = "primary"
 priority = 30
 applies_to_reasons = ["security_change", "privacy_change"]

package/templates/default/locales/en/.mustflow/skills/security-privacy-review/SKILL.md CHANGED

@@ -2,7 +2,7 @@
 mustflow_doc: skill.security-privacy-review
 locale: en
 canonical: true
-revision: 16
+revision: 17
 lifecycle: mustflow-owned
 authority: procedure
 name: security-privacy-review
@@ -31,6 +31,7 @@ Catch security, privacy, and disclosure risks introduced by ordinary code, docum
 ## Use When
 - A change touches authentication, authorization, sessions, admin behavior, tenant boundaries, personal data, secrets, tokens, credentials, API keys, or private files.
+- A feature adds role, permission, administrator, internal-tool, feature-flag, emergency-access, support, or back-office exceptions that could make the authorization model less explicit over time.
 - A change comes from AI-generated code, vibe-coded output, copied examples, or a broad assistant patch that may have optimized for the happy path without proving abuse boundaries.
 - A change adds or modifies logging, telemetry, diagnostics, receipts, reports, caches, generated state, retention, redaction, export, or external transmission.
 - A change adds or modifies behavior analytics events, event schemas, page views, clicks, searches, impressions, scroll data, experiments, attribution, request traces, or observability data that may include personal data or sensitive context.
@@ -76,6 +77,7 @@ Catch security, privacy, and disclosure risks introduced by ordinary code, docum
 - Changed files, diff summary, and the user goal.
 - Sensitive data, actor, trust boundary, storage, logging, retention, export, or external disclosure surfaces involved.
 - Actor, resource owner, tenant boundary, server-side authorization rule, state-changing route, external network target, dependency source, and agent/tool permission surface involved.
+- Permission model shape when authorization is involved: actor, resource, action, scope, condition, default decision, exception path, emergency-access path, and audit expectation.
 - Read, list, search, update, delete, upload, attach, download, invite, billing, and admin actions affected, including whether the server scopes each action by actor, owner, workspace, organization, team, role, or capability.
 - Cookie, JWT, OAuth, file upload, file download, business-value, database mutation, ORM bulk operation, CI/CD permission, deployment setting, or secret-source surface involved.
 - Cryptographic primitive, password hashing, random-token, secure transport, certificate validation, scanner gate, or security invariant involved.
@@ -126,6 +128,9 @@ Catch security, privacy, and disclosure risks introduced by ordinary code, docum
    - Treat client-provided actor ids, role names, workspace ids, plan names, prices, discounts, entitlement flags, and status values as untrusted input. Derive trusted actor and tenant context from server-side authentication and membership checks.
    - Check list, search, detail, attachment, export, and download paths as carefully as mutation paths. Read access is still data access.
    - Reject mass assignment. Server code should allowlist mutable fields instead of passing raw request bodies into database updates where privileged fields could be set by the client.
+   - Review permission rules as actor, resource, action, scope, and condition rather than role name alone. "Admin can do it" is not enough; the rule should say which administrator can perform which action on which resource and under which tenant or system scope.
+   - Treat growing exceptions such as `isAdmin`, hardcoded user ids, company-email suffixes, internal-tool bypasses, feature-flag bypasses, or support-only shortcuts as authorization-model decay. Replace them with explicit capabilities, scoped roles, or time-limited emergency access.
+   - Emergency access should have a reason, time limit, notification or approval path, and audit log. It should not become a permanent silent superuser branch.
 7. For high-impact admin operations, require a server-side capability or role check, actor attribution, target identity, reason or change note where useful, before/after evidence, and a rollback, preview, or recovery path proportionate to the impact.
    High-impact examples include publish/unpublish, slug change, redirect change, canonical change, robots or sitemap change, filter definition change, advertisement slot or policy change, cache purge, search reindex, ranking refresh, bulk edit, and role or permission change.
 8. For high-risk content claims, require source attribution, jurisdiction or market, effective date, verification date, risk tier, review owner, affected-content lookup, and human approval before publication when the domain is legal, privacy, finance, health, safety, eligibility, pricing, ranking, comparison, or compliance.
@@ -194,6 +199,8 @@ Catch security, privacy, and disclosure risks introduced by ordinary code, docum
 - Public and packaged surfaces do not include unnecessary secrets, personal data, or misleading privacy guarantees.
 - Admin operations, shared-cache behavior, generated-state rebuilds, and audit logs are treated as security-sensitive when they affect private data, permissions, public indexing, traffic, or monetization.
 - Client-side permission displays, file upload or download flows, private asset URLs, and API response fields are treated as disclosure and access-control surfaces.
+- Permission models define actor, resource, action, scope, condition, and default-deny behavior when authorization is involved, or the missing model is reported as a risk.
+- Administrator, support, internal-tool, feature-flag, and emergency-access exceptions are audited, time-bounded, or reported as authorization-model drift.
 - Behavior analytics, observability, and audit logs are separated by durability, retention, attribution, personal-data, and loss-tolerance expectations.
 - Core security, privacy, billing, entitlement, file, search, job, webhook, and administrator events are internally owned or explicitly reported as SaaS-only with the resulting export, retention, and incident-reconstruction risk.
 - Trace context, baggage, request ids, user ids, tenant ids, job ids, and webhook ids are reviewed for sensitive data, external propagation, retention, and backend portability when those surfaces exist.
@@ -240,6 +247,7 @@ Use a narrower configured test, build, or documentation intent when it better pr
 - Data residency, data classification, AI processing location, runtime patch, and hard-limit policy checked when relevant
 - Claim, comparison, affiliate, user-generated content, data-ownership, deletion, anonymization, export, and retention boundaries checked when relevant
 - Authorization, session, token, input, file, network, business-logic, dependency, cryptography, transport, deployment, scanner, and agent-tool boundaries checked
+- Permission exception and emergency-access boundaries checked when relevant
 - Redaction, omission, or wording changes made
 - Related security-regression test need
 - Command intents run
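
The permission-model additions (rules expressed as actor, resource, action, scope, and condition, with default deny and time-limited emergency access) can be sketched roughly as below. Every type, field, and function name here is hypothetical. The sketch assumes tenant-scoped rules require the actor and resource to share a tenant, and it leaves out the notification or approval path the skill also asks for.

```typescript
type Scope = "tenant" | "system";

interface AccessRequest {
  actorId: string;
  actorRole: string;
  actorTenant: string;
  resource: string;
  resourceTenant: string;
  action: string;
}

interface Rule {
  role: string;
  resource: string;
  action: string;
  scope: Scope;
  condition?: (req: AccessRequest) => boolean;
}

interface EmergencyGrant {
  actorId: string;
  resource: string;
  action: string;
  reason: string;
  expiresAt: number; // epoch milliseconds
}

interface Decision {
  allowed: boolean;
  via: "rule" | "emergency" | "default-deny";
  auditNote: string; // the caller is expected to write this to an audit log
}

function decide(
  req: AccessRequest,
  rules: Rule[],
  grants: EmergencyGrant[],
  now: number,
): Decision {
  const rule = rules.find(
    (r) =>
      r.role === req.actorRole &&
      r.resource === req.resource &&
      r.action === req.action &&
      (r.scope === "system" || req.actorTenant === req.resourceTenant) &&
      (r.condition === undefined || r.condition(req)),
  );
  if (rule) {
    return { allowed: true, via: "rule", auditNote: `${req.actorId} ${req.action} ${req.resource}` };
  }
  // Emergency access needs a non-empty reason and an unexpired time limit.
  const grant = grants.find(
    (g) =>
      g.actorId === req.actorId &&
      g.resource === req.resource &&
      g.action === req.action &&
      g.reason.trim() !== "" &&
      now < g.expiresAt,
  );
  if (grant) {
    return { allowed: true, via: "emergency", auditNote: `emergency: ${grant.reason}` };
  }
  return { allowed: false, via: "default-deny", auditNote: "no matching rule" };
}
```

There is no `isAdmin` branch: a request that matches neither a rule nor a live grant falls through to the default-deny decision.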

package/templates/default/locales/en/.mustflow/skills/test-design-guard/SKILL.md CHANGED

@@ -2,7 +2,7 @@
 mustflow_doc: skill.test-design-guard
 locale: en
 canonical: true
-revision: 1
+revision: 2
 lifecycle: mustflow-owned
 authority: procedure
 name: test-design-guard
@@ -31,6 +31,8 @@ Guard the design quality of new tests and new test cases. This skill prevents in
 This skill does not force TDD order. It requires evidence that each new or changed test proves an observable behavior contract.
+Good tests prove that important assumptions fail loudly. They should protect the risky behavior, boundary, state, permission, cost, or integration condition that would matter in production rather than only proving that the happy path can be demonstrated once.
 <!-- mustflow-section: use-when -->
 ## Use When
@@ -54,6 +56,7 @@ This skill does not force TDD order. It requires evidence that each new or chang
 - Behavior contract source: user request, issue, bug report, schema, command contract, public docs, fixture, template, or current behavior.
 - Existing tests, fixtures, and helpers near the behavior.
 - Intended test objective and changed files.
+- Risk list for the changed behavior, including money, permissions, deletion, external calls, AI cost, queues, files, data ownership, retries, timeouts, partial failure, or concurrency when those risks exist.
 - Baseline status when using a failing test as evidence.
 - Relevant command-intent contract entries.
@@ -78,6 +81,7 @@ This skill does not force TDD order. It requires evidence that each new or chang
 1. Confirm the contract and coverage.
    - Name the observable behavior being protected.
+   - Name the production risk the test is supposed to catch. If no risk can be named, prefer reusing existing coverage or reporting the idea as speculative.
    - Reuse or strengthen existing tests when they already cover the behavior.
    - Treat uncovered ideas without a contract source as suggestions, not tests.
 2. Select the smallest useful test shape.
@@ -98,6 +102,8 @@ This skill does not force TDD order. It requires evidence that each new or chang
 5. Check assertion quality.
    - Assert at least one observable result: return value, exit code, stdout or stderr, state change, file output, emitted effect, schema result, error shape, or user-visible contract.
    - Mock interaction assertions may support a test, but they must not be the only evidence of behavior unless the mock interaction itself is the public contract.
+   - For high-risk boundaries, prefer assertions over final state, stored records, rejected access, idempotency outcome, usage record, emitted event, or durable failure status rather than only asserting that a mocked collaborator was called.
+   - Treat tests that mock every database, transaction, authorization, serialization, queue, provider, or filesystem boundary as unit evidence only. Require a nearby integration, contract, fixture, or schema check when the real boundary is the risk.
 6. Choose verification by objective.
    - Use a semantic objective such as `new_behavior`, `bug_regression`, `security_negative`, `stale_test_cleanup`, `contract_sync`, `release_surface`, or `docs_or_template_contract`.
    - Start with the narrowest configured intent that proves the objective.
@@ -110,6 +116,7 @@ This skill does not force TDD order. It requires evidence that each new or chang
 ## Postconditions
 - Each new or changed test has a contract source, selected test shape, and observable assertion.
+- Each new or changed test has a named risk, or the final report explains why the change is low-risk or already covered.
 - RED evidence is classified as `behavior_red`, `api_scaffold_red`, `invalid_red`, or `not_applicable`.
 - Speculative edge cases and duplicate coverage are reported instead of silently added.
 - Verification uses configured command intents and reports any missing or skipped coverage.
@@ -142,6 +149,7 @@ Prefer the narrowest configured intent that proves the selected objective. `test
 ## Output Format
 - Contract source
+- Production risk being protected
 - Verification objective
 - Selected test shape: `example`, `boundary`, `property`, `mixed`, or `not_applicable`
 - Cases reused
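
The added assertion guidance prefers checking final state and rejected access over checking that a mocked collaborator was called. A small sketch of that style is shown below, under stated assumptions: `DocStore`, `deleteDoc`, and the test function are hypothetical stand-ins, and the test throws on failure instead of relying on any particular test framework.

```typescript
// Hypothetical store and operation, used only to illustrate the assertion style.
interface Doc {
  id: string;
  ownerId: string;
  deleted: boolean;
}

class DocStore {
  private readonly docs = new Map<string, Doc>();

  put(doc: Doc): void {
    this.docs.set(doc.id, { ...doc });
  }

  get(id: string): Doc | undefined {
    const doc = this.docs.get(id);
    return doc ? { ...doc } : undefined;
  }
}

type DeleteResult = "deleted" | "forbidden" | "not_found";

function deleteDoc(store: DocStore, actorId: string, docId: string): DeleteResult {
  const doc = store.get(docId);
  if (!doc) return "not_found";
  if (doc.ownerId !== actorId) return "forbidden"; // fail closed for non-owners
  store.put({ ...doc, deleted: true });
  return "deleted";
}

// Objective: security_negative.
// Named production risk: a non-owner deletes another user's document.
function testNonOwnerCannotDelete(): void {
  const store = new DocStore();
  store.put({ id: "d1", ownerId: "alice", deleted: false });

  const result = deleteDoc(store, "mallory", "d1");

  // Observable result: the access is rejected...
  if (result !== "forbidden") throw new Error(`expected forbidden, got ${result}`);
  // ...and the stored record is unchanged, checked against final state, not a mock call.
  if (store.get("d1")?.deleted !== false) throw new Error("document must not be deleted");
}

testNonOwnerCannotDelete();
```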

package/templates/default/manifest.toml CHANGED

@@ -1,6 +1,6 @@
 id = "default"
 name = "default"
-version = "2.22.4"
+version = "2.22.9"
 description = "Minimal workflow for LLM agents to read, edit, and verify their work in a repository."
 common_root = "common"
 locales_root = "locales"