mustflow 2.22.5 → 2.22.9

Files changed (42)
  1. package/README.md +8 -0
  2. package/dist/cli/commands/classify.js +2 -0
  3. package/dist/cli/commands/dashboard.js +9 -69
  4. package/dist/cli/commands/run/receipt.js +1 -0
  5. package/dist/cli/commands/run.js +14 -1
  6. package/dist/cli/commands/verify/evidence-input.js +269 -0
  7. package/dist/cli/commands/verify/input.js +212 -0
  8. package/dist/cli/commands/verify.js +23 -482
  9. package/dist/cli/i18n/en.js +3 -0
  10. package/dist/cli/i18n/es.js +3 -0
  11. package/dist/cli/i18n/fr.js +3 -0
  12. package/dist/cli/i18n/hi.js +3 -0
  13. package/dist/cli/i18n/ko.js +3 -0
  14. package/dist/cli/i18n/zh.js +3 -0
  15. package/dist/cli/lib/dashboard-export.js +2 -0
  16. package/dist/cli/lib/dashboard-mutations.js +79 -0
  17. package/dist/cli/lib/local-index/command-effect-index.js +25 -0
  18. package/dist/cli/lib/local-index/hashing.js +7 -0
  19. package/dist/cli/lib/local-index/index.js +127 -826
  20. package/dist/cli/lib/local-index/source-index.js +137 -0
  21. package/dist/cli/lib/local-index/verification-evidence.js +451 -0
  22. package/dist/cli/lib/local-index/workflow-documents.js +204 -0
  23. package/dist/cli/lib/run-root-trust.js +27 -0
  24. package/dist/core/change-classification-policy.js +47 -0
  25. package/dist/core/change-classification.js +10 -43
  26. package/dist/core/contract-lint.js +6 -2
  27. package/dist/core/correlation-id.js +16 -0
  28. package/dist/core/run-receipt.js +1 -0
  29. package/package.json +4 -1
  30. package/schemas/README.md +4 -0
  31. package/schemas/change-verification-report.schema.json +4 -0
  32. package/schemas/classify-report.schema.json +4 -0
  33. package/schemas/dashboard-export.schema.json +4 -0
  34. package/schemas/latest-run-pointer.schema.json +4 -0
  35. package/schemas/run-receipt.schema.json +4 -0
  36. package/schemas/verify-report.schema.json +4 -0
  37. package/schemas/verify-run-manifest.schema.json +4 -0
  38. package/templates/default/i18n.toml +3 -3
  39. package/templates/default/locales/en/.mustflow/skills/architecture-deepening-review/SKILL.md +25 -2
  40. package/templates/default/locales/en/.mustflow/skills/security-privacy-review/SKILL.md +9 -1
  41. package/templates/default/locales/en/.mustflow/skills/test-design-guard/SKILL.md +9 -1
  42. package/templates/default/manifest.toml +1 -1

package/templates/default/locales/en/.mustflow/skills/security-privacy-review/SKILL.md +9 -1
@@ -2,7 +2,7 @@
  mustflow_doc: skill.security-privacy-review
  locale: en
  canonical: true
- revision: 16
+ revision: 17
  lifecycle: mustflow-owned
  authority: procedure
  name: security-privacy-review
@@ -31,6 +31,7 @@ Catch security, privacy, and disclosure risks introduced by ordinary code, docum
  ## Use When
 
  - A change touches authentication, authorization, sessions, admin behavior, tenant boundaries, personal data, secrets, tokens, credentials, API keys, or private files.
+ - A feature adds role, permission, administrator, internal-tool, feature-flag, emergency-access, support, or back-office exceptions that could make the authorization model less explicit over time.
  - A change comes from AI-generated code, vibe-coded output, copied examples, or a broad assistant patch that may have optimized for the happy path without proving abuse boundaries.
  - A change adds or modifies logging, telemetry, diagnostics, receipts, reports, caches, generated state, retention, redaction, export, or external transmission.
  - A change adds or modifies behavior analytics events, event schemas, page views, clicks, searches, impressions, scroll data, experiments, attribution, request traces, or observability data that may include personal data or sensitive context.
@@ -76,6 +77,7 @@ Catch security, privacy, and disclosure risks introduced by ordinary code, docum
  - Changed files, diff summary, and the user goal.
  - Sensitive data, actor, trust boundary, storage, logging, retention, export, or external disclosure surfaces involved.
  - Actor, resource owner, tenant boundary, server-side authorization rule, state-changing route, external network target, dependency source, and agent/tool permission surface involved.
+ - Permission model shape when authorization is involved: actor, resource, action, scope, condition, default decision, exception path, emergency-access path, and audit expectation.
  - Read, list, search, update, delete, upload, attach, download, invite, billing, and admin actions affected, including whether the server scopes each action by actor, owner, workspace, organization, team, role, or capability.
  - Cookie, JWT, OAuth, file upload, file download, business-value, database mutation, ORM bulk operation, CI/CD permission, deployment setting, or secret-source surface involved.
  - Cryptographic primitive, password hashing, random-token, secure transport, certificate validation, scanner gate, or security invariant involved.
@@ -126,6 +128,9 @@ Catch security, privacy, and disclosure risks introduced by ordinary code, docum
  - Treat client-provided actor ids, role names, workspace ids, plan names, prices, discounts, entitlement flags, and status values as untrusted input. Derive trusted actor and tenant context from server-side authentication and membership checks.
  - Check list, search, detail, attachment, export, and download paths as carefully as mutation paths. Read access is still data access.
  - Reject mass assignment. Server code should allowlist mutable fields instead of passing raw request bodies into database updates where privileged fields could be set by the client.
+ - Review permission rules as actor, resource, action, scope, and condition rather than role name alone. "Admin can do it" is not enough; the rule should say which administrator can perform which action on which resource and under which tenant or system scope.
+ - Treat growing exceptions such as `isAdmin`, hardcoded user ids, company-email suffixes, internal-tool bypasses, feature-flag bypasses, or support-only shortcuts as authorization-model decay. Replace them with explicit capabilities, scoped roles, or time-limited emergency access.
+ - Emergency access should have a reason, time limit, notification or approval path, and audit log. It should not become a permanent silent superuser branch.
  7. For high-impact admin operations, require a server-side capability or role check, actor attribution, target identity, reason or change note where useful, before/after evidence, and a rollback, preview, or recovery path proportionate to the impact.
  High-impact examples include publish/unpublish, slug change, redirect change, canonical change, robots or sitemap change, filter definition change, advertisement slot or policy change, cache purge, search reindex, ranking refresh, bulk edit, and role or permission change.
  8. For high-risk content claims, require source attribution, jurisdiction or market, effective date, verification date, risk tier, review owner, affected-content lookup, and human approval before publication when the domain is legal, privacy, finance, health, safety, eligibility, pricing, ranking, comparison, or compliance.
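The permission-rule guidance added in this hunk (actor, resource, action, scope, condition, with a default-deny decision) can be illustrated with a minimal sketch. Every type, capability name, and function below is hypothetical and is not taken from mustflow or from the skill text; it only shows one possible shape of an explicit rule table that replaces a bare `isAdmin` check.

```typescript
// Hypothetical types for illustration only; none of these names come from mustflow.
type Actor = { id: string; tenantId: string; capabilities: string[] };
type Resource = { kind: string; ownerId: string; tenantId: string };

type Rule = {
  capability: string; // which actors the rule applies to
  resourceKind: string; // which resource
  action: string; // which action
  scope: "tenant" | "system"; // tenant-bound or system-wide
  condition?: (actor: Actor, resource: Resource) => boolean; // optional extra check
};

// Assumed example rules: a tenant-scoped refund right and an owner-only delete.
const rules: Rule[] = [
  { capability: "billing.admin", resourceKind: "invoice", action: "refund", scope: "tenant" },
  {
    capability: "member",
    resourceKind: "document",
    action: "delete",
    scope: "tenant",
    condition: (actor, resource) => resource.ownerId === actor.id,
  },
];

function isAllowed(actor: Actor, action: string, resource: Resource): boolean {
  for (const rule of rules) {
    if (rule.action !== action || rule.resourceKind !== resource.kind) continue;
    if (!actor.capabilities.includes(rule.capability)) continue;
    // Tenant-scoped rules never cross a tenant boundary.
    if (rule.scope === "tenant" && actor.tenantId !== resource.tenantId) continue;
    if (rule.condition && !rule.condition(actor, resource)) continue;
    return true;
  }
  return false; // default deny: no matching rule means no access
}
```

In this sketch a "billing.admin" in one tenant cannot refund an invoice in another tenant, and an action with no matching rule is rejected rather than falling through to a role-name shortcut.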
@@ -194,6 +199,8 @@ Catch security, privacy, and disclosure risks introduced by ordinary code, docum
  - Public and packaged surfaces do not include unnecessary secrets, personal data, or misleading privacy guarantees.
  - Admin operations, shared-cache behavior, generated-state rebuilds, and audit logs are treated as security-sensitive when they affect private data, permissions, public indexing, traffic, or monetization.
  - Client-side permission displays, file upload or download flows, private asset URLs, and API response fields are treated as disclosure and access-control surfaces.
+ - Permission models define actor, resource, action, scope, condition, and default-deny behavior when authorization is involved, or the missing model is reported as a risk.
+ - Administrator, support, internal-tool, feature-flag, and emergency-access exceptions are audited, time-bounded, or reported as authorization-model drift.
  - Behavior analytics, observability, and audit logs are separated by durability, retention, attribution, personal-data, and loss-tolerance expectations.
  - Core security, privacy, billing, entitlement, file, search, job, webhook, and administrator events are internally owned or explicitly reported as SaaS-only with the resulting export, retention, and incident-reconstruction risk.
  - Trace context, baggage, request ids, user ids, tenant ids, job ids, and webhook ids are reviewed for sensitive data, external propagation, retention, and backend portability when those surfaces exist.
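The "audited, time-bounded" expectation for emergency access in this hunk might look like the following sketch. The grant shape, function names, and in-memory audit log are assumptions made for illustration; a real system would persist the log and wire approval or notification to its own tooling.

```typescript
// Hypothetical grant and audit shapes; not part of mustflow.
type EmergencyGrant = {
  actorId: string;
  capability: string;
  reason: string;
  approvedBy: string;
  expiresAt: number; // epoch milliseconds
};

type AuditEntry = {
  actorId: string;
  capability: string;
  allowed: boolean;
  reason: string;
  at: number;
};

// In-memory stand-in for a durable audit log.
const auditLog: AuditEntry[] = [];

function grantEmergencyAccess(
  actorId: string,
  capability: string,
  reason: string,
  approvedBy: string,
  ttlMs: number,
  now: number,
): EmergencyGrant {
  // A grant without a reason or a time limit is the silent superuser branch to avoid.
  if (reason.trim() === "") throw new Error("emergency access requires a reason");
  if (ttlMs <= 0) throw new Error("emergency access requires a positive time limit");
  return { actorId, capability, reason, approvedBy, expiresAt: now + ttlMs };
}

function useEmergencyAccess(grant: EmergencyGrant, capability: string, now: number): boolean {
  const allowed = grant.capability === capability && now < grant.expiresAt;
  // Every attempt is recorded, including denied and expired ones.
  auditLog.push({ actorId: grant.actorId, capability, allowed, reason: grant.reason, at: now });
  return allowed;
}
```

Passing `now` explicitly keeps the expiry check deterministic in tests. The caller would supply `Date.now()` in production code.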
@@ -240,6 +247,7 @@ Use a narrower configured test, build, or documentation intent when it better pr
  - Data residency, data classification, AI processing location, runtime patch, and hard-limit policy checked when relevant
  - Claim, comparison, affiliate, user-generated content, data-ownership, deletion, anonymization, export, and retention boundaries checked when relevant
  - Authorization, session, token, input, file, network, business-logic, dependency, cryptography, transport, deployment, scanner, and agent-tool boundaries checked
+ - Permission exception and emergency-access boundaries checked when relevant
  - Redaction, omission, or wording changes made
  - Related security-regression test need
  - Command intents run

package/templates/default/locales/en/.mustflow/skills/test-design-guard/SKILL.md +9 -1
@@ -2,7 +2,7 @@
  mustflow_doc: skill.test-design-guard
  locale: en
  canonical: true
- revision: 1
+ revision: 2
  lifecycle: mustflow-owned
  authority: procedure
  name: test-design-guard
@@ -31,6 +31,8 @@ Guard the design quality of new tests and new test cases. This skill prevents in
 
  This skill does not force TDD order. It requires evidence that each new or changed test proves an observable behavior contract.
 
+ Good tests prove that important assumptions fail loudly. They should protect the risky behavior, boundary, state, permission, cost, or integration condition that would matter in production rather than only proving that the happy path can be demonstrated once.
+
  <!-- mustflow-section: use-when -->
  ## Use When
 
@@ -54,6 +56,7 @@ This skill does not force TDD order. It requires evidence that each new or chang
  - Behavior contract source: user request, issue, bug report, schema, command contract, public docs, fixture, template, or current behavior.
  - Existing tests, fixtures, and helpers near the behavior.
  - Intended test objective and changed files.
+ - Risk list for the changed behavior, including money, permissions, deletion, external calls, AI cost, queues, files, data ownership, retries, timeouts, partial failure, or concurrency when those risks exist.
  - Baseline status when using a failing test as evidence.
  - Relevant command-intent contract entries.
 
@@ -78,6 +81,7 @@ This skill does not force TDD order. It requires evidence that each new or chang
 
  1. Confirm the contract and coverage.
  - Name the observable behavior being protected.
+ - Name the production risk the test is supposed to catch. If no risk can be named, prefer reusing existing coverage or reporting the idea as speculative.
  - Reuse or strengthen existing tests when they already cover the behavior.
  - Treat uncovered ideas without a contract source as suggestions, not tests.
  2. Select the smallest useful test shape.
@@ -98,6 +102,8 @@ This skill does not force TDD order. It requires evidence that each new or chang
  5. Check assertion quality.
  - Assert at least one observable result: return value, exit code, stdout or stderr, state change, file output, emitted effect, schema result, error shape, or user-visible contract.
  - Mock interaction assertions may support a test, but they must not be the only evidence of behavior unless the mock interaction itself is the public contract.
+ - For high-risk boundaries, prefer assertions over final state, stored records, rejected access, idempotency outcome, usage record, emitted event, or durable failure status rather than only asserting that a mocked collaborator was called.
+ - Treat tests that mock every database, transaction, authorization, serialization, queue, provider, or filesystem boundary as unit evidence only. Require a nearby integration, contract, fixture, or schema check when the real boundary is the risk.
  6. Choose verification by objective.
  - Use a semantic objective such as `new_behavior`, `bug_regression`, `security_negative`, `stale_test_cleanup`, `contract_sync`, `release_surface`, or `docs_or_template_contract`.
  - Start with the narrowest configured intent that proves the objective.
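The added assertion-quality bullets prefer checks on final state and idempotency outcome over "the mock was called". A minimal sketch of that idea follows, assuming a hypothetical in-memory `Ledger` in place of a real payment store. The class, key format, and check function are illustrative only and do not come from mustflow.

```typescript
// Hypothetical in-memory ledger standing in for a real store.
class Ledger {
  private charges = new Map<string, number>();

  // Records a charge once per idempotency key and reports what happened.
  charge(idempotencyKey: string, amountCents: number): "created" | "duplicate" {
    if (this.charges.has(idempotencyKey)) return "duplicate";
    this.charges.set(idempotencyKey, amountCents);
    return "created";
  }

  count(): number {
    return this.charges.size;
  }

  totalCents(): number {
    let total = 0;
    this.charges.forEach((amount) => {
      total += amount;
    });
    return total;
  }
}

// Named production risk: a retried request charges the customer twice.
// The check inspects the stored records and the idempotency outcome,
// not whether some collaborator's method was invoked.
function checkRetryDoesNotDoubleCharge(): string[] {
  const ledger = new Ledger();
  const first = ledger.charge("order-42", 1500);
  const retry = ledger.charge("order-42", 1500);

  const failures: string[] = [];
  if (first !== "created") failures.push("first charge should be created");
  if (retry !== "duplicate") failures.push("retry should be reported as duplicate");
  if (ledger.count() !== 1) failures.push("exactly one charge should be stored");
  if (ledger.totalCents() !== 1500) failures.push("total should equal a single charge");
  return failures;
}
```

A test that only asserted `charge` was called twice would pass even if both calls stored a record. The state-based checks fail in that case, which is the loud failure the skill asks for.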
@@ -110,6 +116,7 @@ This skill does not force TDD order. It requires evidence that each new or chang
  ## Postconditions
 
  - Each new or changed test has a contract source, selected test shape, and observable assertion.
+ - Each new or changed test has a named risk, or the final report explains why the change is low-risk or already covered.
  - RED evidence is classified as `behavior_red`, `api_scaffold_red`, `invalid_red`, or `not_applicable`.
  - Speculative edge cases and duplicate coverage are reported instead of silently added.
  - Verification uses configured command intents and reports any missing or skipped coverage.
@@ -142,6 +149,7 @@ Prefer the narrowest configured intent that proves the selected objective. `test
  ## Output Format
 
  - Contract source
+ - Production risk being protected
  - Verification objective
  - Selected test shape: `example`, `boundary`, `property`, `mixed`, or `not_applicable`
  - Cases reused

package/templates/default/manifest.toml +1 -1
@@ -1,6 +1,6 @@
  id = "default"
  name = "default"
- version = "2.22.5"
+ version = "2.22.9"
  description = "Minimal workflow for LLM agents to read, edit, and verify their work in a repository."
  common_root = "common"
  locales_root = "locales"