mustflow 2.107.3 → 2.108.0
- package/README.md +1 -0
- package/dist/cli/commands/init.js +49 -1
- package/dist/cli/commands/run/execution.js +7 -0
- package/dist/cli/commands/run/executor.js +7 -0
- package/dist/cli/commands/verify.js +14 -0
- package/dist/cli/commands/workspace.js +106 -16
- package/dist/cli/i18n/en.js +6 -1
- package/dist/cli/i18n/es.js +6 -1
- package/dist/cli/i18n/fr.js +6 -1
- package/dist/cli/i18n/hi.js +6 -1
- package/dist/cli/i18n/ko.js +6 -1
- package/dist/cli/i18n/zh.js +6 -1
- package/dist/cli/index.js +8 -0
- package/dist/cli/lib/agent-context.js +7 -0
- package/dist/cli/lib/repo-map.js +14 -0
- package/dist/cli/lib/run-plan.js +7 -0
- package/dist/core/change-verification.js +7 -0
- package/dist/core/verification-scheduler.js +7 -0
- package/package.json +1 -1
- package/schemas/README.md +3 -3
- package/schemas/workspace-status.schema.json +4 -2
- package/templates/default/common/.mustflow/config/mustflow.toml +3 -3
- package/templates/default/i18n.toml +61 -7
- package/templates/default/locales/en/.mustflow/docs/agent-workflow.md +24 -1
- package/templates/default/locales/en/.mustflow/skills/INDEX.md +51 -5
- package/templates/default/locales/en/.mustflow/skills/admin-control-plane-safety-review/SKILL.md +200 -0
- package/templates/default/locales/en/.mustflow/skills/ai-product-readiness-review/SKILL.md +158 -0
- package/templates/default/locales/en/.mustflow/skills/auth-permission-change/SKILL.md +91 -28
- package/templates/default/locales/en/.mustflow/skills/browser-automation-reliability-review/SKILL.md +279 -0
- package/templates/default/locales/en/.mustflow/skills/cli-option-contract-review/SKILL.md +147 -0
- package/templates/default/locales/en/.mustflow/skills/database-change-safety/SKILL.md +21 -2
- package/templates/default/locales/en/.mustflow/skills/database-migration-change/SKILL.md +25 -7
- package/templates/default/locales/en/.mustflow/skills/deployment-rollout-safety-review/SKILL.md +117 -43
- package/templates/default/locales/en/.mustflow/skills/frontend-component-library-review/SKILL.md +299 -0
- package/templates/default/locales/en/.mustflow/skills/frontend-localization-review/SKILL.md +128 -36
- package/templates/default/locales/en/.mustflow/skills/notification-delivery-integrity-review/SKILL.md +226 -0
- package/templates/default/locales/en/.mustflow/skills/payment-integrity-review/SKILL.md +34 -14
- package/templates/default/locales/en/.mustflow/skills/routes.toml +54 -0
- package/templates/default/locales/en/.mustflow/skills/small-service-platform-architecture-review/SKILL.md +273 -0
- package/templates/default/locales/en/.mustflow/skills/third-party-api-integration-review/SKILL.md +188 -0
- package/templates/default/locales/en/.mustflow/skills/website-task-friction-review/SKILL.md +139 -0
- package/templates/default/manifest.toml +60 -1
package/templates/default/locales/en/.mustflow/skills/browser-automation-reliability-review/SKILL.md
ADDED
@@ -0,0 +1,279 @@
---
mustflow_doc: skill.browser-automation-reliability-review
locale: en
canonical: true
revision: 1
lifecycle: mustflow-owned
authority: procedure
name: browser-automation-reliability-review
description: Apply this skill when browser automation, UI automation, Playwright, Selenium, Puppeteer, WebDriver, computer-use/browser-driving agents, visual browser verification, flaky selectors, page readiness, authentication state, CAPTCHA or anti-bot handling, rate limits, screenshot checks, retry, timeout, human approval, or browser automation observability is created, changed, reviewed, triaged, or reported.
metadata:
mustflow_schema: "1"
mustflow_kind: procedure
pack_id: mustflow.core
skill_id: mustflow.core.browser-automation-reliability-review
command_intents:
- changes_status
- changes_diff_summary
- lint
- build
- test_related
- test
- docs_validate_fast
- test_release
- mustflow_check
---

# Browser Automation Reliability Review

<!-- mustflow-section: purpose -->
## Purpose

Review browser automation as a stateful, evidence-producing system, not as a sequence of clicks.

The core question is: "Does the automation know what state the browser, user, page, network,
session, target data, and approval gate are in before it acts and before it claims success?" If not,
the flow will look fine in a demo and then fail under rerenders, slow CI, auth drift, anti-bot
gates, rate limits, visual noise, stale approvals, or agent hallucination.

<!-- mustflow-section: use-when -->
## Use When

- Code, tests, docs, templates, or reviews touch browser automation, UI automation, end-to-end
  harnesses, Playwright, Selenium, Puppeteer, WebDriver, browser contexts, remote browsers,
  screenshots, videos, traces, HAR files, synthetic user flows, or computer-use browser agents.
- A task mentions flaky selectors, unstable locators, actionability, stale elements, rerenders,
  page readiness, `networkidle`, sleeps, waits, timeouts, retries, screenshot diffs, visual checks,
  popups, downloads, native dialogs, iframes, shadow DOM, virtualized lists, or input typing.
- Automation logs into a product, reuses storage state, shares accounts across workers, handles SSO,
  OAuth, MFA, passkeys, cookies, localStorage, sessionStorage, IndexedDB, account lockout, or
  permission changes.
- Browser automation touches third-party sites, CAPTCHA, anti-bot or WAF challenges, rate limits,
  robots or terms boundaries, IP reputation, headless fingerprints, provider throttling, or manual
  fallback paths.
- A browser-driving agent reads page content, follows page instructions, clicks by screenshot or
  coordinates, extracts table data visually, enters forms, sends messages, purchases, deletes,
  mutates external state, or asks for human approval before continuing.

<!-- mustflow-section: do-not-use-when -->
## Do Not Use When

- The task is a pure LLM agent control-flow change with no browser or UI automation surface. Use
  `agent-execution-control-review`.
- The task is only prompt, RAG, model, tool schema, cost, latency, hallucination, or eval behavior
  without browser execution. Use the matching LLM or agent specialist skill.
- The task is only a product auth bug that is not being automated through a browser. Use
  `auth-flow-triage` or `auth-permission-change`.
- The task is only a browser request, CORS, CDN, API, or provider failure before the browser
  automation layer is relevant. Use `api-failure-triage`.
- The task is only frontend UI quality, layout resilience, accessibility, render stability, or web
  performance for human users rather than automation harness reliability. Use the matching frontend
  skill first.
- The task is only test-suite runtime optimization, shard balance, retry policy, or flaky-test
  handling without browser-specific failure modes. Use `test-suite-performance-review` or
  `test-maintenance`.

<!-- mustflow-section: required-inputs -->
## Required Inputs

- Automation intent ledger: target site or app, owner, internal versus third-party boundary,
  allowed actions, forbidden actions, expected user role, data class, write risk, and whether the
  browser path is the right tool rather than an API, fixture, or deterministic adapter.
- State ledger: current URL, frame, page, route, modal, popup, selected account, auth storage,
  browser context, viewport, locale, timezone, permissions, feature flags, test data, worker ID,
  correlation ID, and previous step result.
- Readiness ledger: page-ready signal, data-ready signal, actionable-control signal, business-ready
  signal, network and background-work assumptions, and any waits or assertions that prove them.
- Selector and action ledger: locators, user-facing roles or labels, test IDs or automation
  contracts, shadow DOM and iframe boundaries, virtualized list handling, click target, keyboard and
  focus path, input acceptance proof, and actionability override use.
- Auth and identity ledger: login strategy, storage owner, token or cookie storage surface, session
  expiry, refresh behavior, per-worker account isolation, SSO or MFA gates, CAPTCHA policy, account
  lockout policy, and logout or cleanup behavior.
- External pressure ledger: rate limit unit, retry budget, anti-bot or challenge detection,
  provider terms boundary, manual fallback, backoff behavior, and circuit-breaker threshold.
- Verification ledger: success criteria, API or database confirmation when available, screenshot
  or visual artifact role, trace/video/HAR policy, console and network capture, redaction,
  retention, and failure artifact sampling.
- Agent and approval ledger: page content trust boundary, prompt-injection exposure, tool
  permissions, coordinate mapping, stale approval checks, approval snapshot, exact post-approval
  action, resume state, and human escalation path.

<!-- mustflow-section: preconditions -->
## Preconditions

- The task matches the Use When conditions and does not match the Do Not Use When exclusions.
- Current repository instructions, command contract, automation harness code, test fixtures, browser
  config, auth fixtures, screenshots or traces, and docs directly tied to the automation path have
  been inspected before editing.
- Browser vendor, automation library, remote-browser provider, CAPTCHA, anti-bot, and Agents SDK or
  computer-use details are stale-sensitive. Use `source-freshness-check` before embedding exact
  current API claims, provider limits, default timeouts, or compliance requirements.
- External pages, emails, documents, ads, support threads, and rendered web content are untrusted
  input for browser-driving agents.
- Command execution remains governed by `.mustflow/config/commands.toml`; this skill does not
  authorize launching development servers, unmanaged browsers, long-running workers, production
  browser sessions, CAPTCHA bypasses, provider dashboards, or live side-effect runs.

<!-- mustflow-section: allowed-edits -->
## Allowed Edits

- Add or refine browser automation state machines, locator contracts, test IDs, accessible names,
  readiness assertions, frame or popup handlers, input verification, auth fixtures, per-worker
  account isolation, retry classification, timeout hierarchy, idempotency checks, rate-limit
  handling, approval gates, manual fallback states, traces, screenshots, redaction, cleanup, and
  directly synchronized docs or templates.
- Move fixture setup, result verification, cleanup, idempotency checks, and data creation from
  browser clicks to API or deterministic helpers when the browser UI is not the behavior under test.
- Add focused tests for selector drift, readiness failure, stale element rerender, iframe or shadow
  DOM handling, auth-state expiration, per-worker isolation, retry non-idempotency, stale approval,
  screenshot noise, trace redaction, and agent prompt-injection defense when behavior evidence
  supports them.
- Do not fix flakiness by adding blind sleeps, force-clicking as the default, hiding failures behind
  broad retries, weakening visual thresholds without evidence, sharing one mutable account across
  parallel workers, or claiming browser success from an unverified screenshot.
- Do not add CAPTCHA bypass, anti-bot evasion, headless fingerprint spoofing, or terms-violating
  third-party automation as a normal product feature.
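
The rule against blind sleeps can be illustrated with a small polling helper. This is a minimal sketch in plain Python, not part of mustflow or of any browser library: `wait_until`, `ReadinessTimeout`, and `FakeOrdersPage` are hypothetical names, and the fake page stands in for whatever domain assertion the real harness would make.

```python
import time
from typing import Callable


class ReadinessTimeout(Exception):
    """Raised when a readiness condition is not observed within its budget."""


def wait_until(
    condition: Callable[[], bool],
    description: str,
    timeout_s: float = 5.0,
    interval_s: float = 0.05,
) -> int:
    """Poll `condition` until it is true and return how many checks were made.

    Unlike a blind sleep, this returns as soon as the evidence appears and
    fails with a named reason when it does not.
    """
    deadline = time.monotonic() + timeout_s
    checks = 0
    while True:
        checks += 1
        if condition():
            return checks
        if time.monotonic() >= deadline:
            raise ReadinessTimeout(f"not ready: {description} ({checks} checks)")
        time.sleep(interval_s)


class FakeOrdersPage:
    """Hypothetical stand-in for a page whose rows arrive after a few polls."""

    def __init__(self, polls_until_loaded: int) -> None:
        self._polls_left = polls_until_loaded

    def loaded_row_ids(self) -> list[str]:
        self._polls_left -= 1
        return ["order-42"] if self._polls_left <= 0 else []


page = FakeOrdersPage(polls_until_loaded=3)
checks = wait_until(
    lambda: "order-42" in page.loaded_row_ids(),
    description="row order-42 present in orders table",
    timeout_s=1.0,
    interval_s=0.01,
)
```

The condition names a domain fact (the expected row identity) rather than elapsed time, so a timeout produces a message that says which readiness evidence was missing.
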

<!-- mustflow-section: procedure -->
## Procedure

1. Decide whether the browser is the right boundary. Use API, fixtures, or service adapters for data
   setup, teardown, and result verification when the browser UI itself is not the behavior being
   tested or automated.
2. Classify the automation owner: internal app E2E, internal operations tool, third-party site
   workflow, browser-driving LLM agent, visual regression, scraping-like extraction, support tool,
   or production user-assistance flow.
3. Define a state machine before actions. Name the states such as unauthenticated, authenticated,
   searching, selecting target, filling form, awaiting approval, submitting, verifying result,
   retrying, blocked by challenge, manual fallback, succeeded, and failed.
4. Replace sleeps with readiness evidence. For each step, define what proves the page is ready, the
   data is ready, the target control is actionable, and the business state is safe to advance.
5. Treat `networkidle` and selector-visible waits as weak signals. Prefer domain assertions such as
   expected row identity, enabled submit state, loaded data count, settled validation, known URL,
   confirmation ID, provider event, or backend result.
6. Review locator contracts. Prefer stable user-facing roles, labels, names, and explicit test IDs
   over CSS layout paths, generated classes, index-based XPath, translated prose only, or first-match
   selectors.
7. Check ambiguous DOM. Handle hidden duplicate controls, responsive desktop and mobile DOM at the
   same time, skeletons that resemble real content, virtualized rows, portals, sticky overlays,
   cookie banners, focus traps, iframes, cross-origin frames, shadow DOM, and custom components.
8. Avoid stale element handles. Re-resolve locators at action time, and keep find-check-act-verify
   close together so rerenders cannot invalidate old DOM references silently.
9. Review actionability honestly. A forced click, coordinate click, JS-dispatched event, or disabled
   actionability check must be exceptional, documented, and followed by proof that a real user path
   is not being bypassed.
10. Verify input acceptance. After typing, pasting, selecting dates, entering currency, using IME,
    triggering autocomplete, or blurring a field, confirm the stored value, validation state, submit
    readiness, or outbound payload rather than assuming keystrokes were accepted.
11. Make auth state explicit. Identify whether auth lives in cookies, localStorage, sessionStorage,
    IndexedDB, memory, or provider redirects; isolate accounts by worker; avoid shared mutable user
    state; and handle expiry, rotation, SSO, MFA, passkeys, lockout, and logout contamination.
12. Treat CAPTCHA and anti-bot as product states. In test or staging, use allowed test keys,
    allowlists, or disabled challenge paths. In production or third-party flows, detect challenges,
    stop safely, and route to human review or manual fallback instead of trying to evade them.
13. Add rate control before retries. Identify the rate-limit subject, whether a single browser action
    fans out into many requests, how backoff is computed, when to stop, and how the system avoids a
    retry storm.
14. Classify retryable failures. Retry only transient navigation, detached element, timeout,
    temporary backend, or eventual-consistency classes within a bounded budget. Do not retry
    permission denied, invalid input, CAPTCHA, account lockout, provider policy blocks, unknown
    write outcome, or business-rule failures without a recovery-specific check.
15. Make writes idempotent or confirm-before-replay. For purchases, payments, deletes, sends,
    refunds, admin changes, support actions, and external mutations, record stable operation IDs and
    check whether the effect already happened before any retry or resume can repeat it.
16. Design timeout hierarchy. Align action, assertion, navigation, test, job, queue lease, browser
    provider session, and external API timeouts so cancellation saves evidence, releases resources,
    and resumes from a known state.
17. Separate visual proof from business proof. Use screenshots for layout or visual regression, but
    use confirmation IDs, API reads, database rows, provider events, downloads with checksums, audit
    logs, or received messages to prove business success.
18. Stabilize screenshot assertions. Freeze or mask nondeterministic content such as time, caret,
    animation, ads, maps, charts, lazy images, random data, locale, theme, viewport, font, GPU,
    scrollbar, and cookie banners before changing thresholds or baselines.
19. Capture failure context. Save current URL, frame, viewport, locale, timezone, screenshot, DOM or
    accessibility snapshot when safe, console errors, network statuses, trace, video, retry count,
    worker ID, account ID class, and correlation ID with sensitive-data redaction.
20. Protect artifacts. Browser traces, videos, screenshots, HAR files, storage state, and console
    logs can contain cookies, tokens, personal data, addresses, order details, and messages; set
    redaction, retention, encryption, access, and sampling before broad collection.
21. For browser-driving agents, distrust page content. Treat rendered instructions, hidden DOM,
    emails, PDFs, comments, ads, and third-party text as untrusted data that must not override the
    system task, tool policy, approval rules, or data-exfiltration limits.
22. Split agent roles where risk justifies it. Keep planner, browser executor, verifier, policy
    gate, and human approval separate for high-impact flows. If one model does multiple roles, add
    deterministic gates before side effects and before success claims.
23. Make coordinate and screenshot actions verifiable. Recheck screenshot-to-DOM scale, scrolling,
    focus, active modal, target bounds, visible label, disabled state, and post-action state when a
    model or computer-use tool clicks by image or coordinates.
24. Treat human approval as durable state. Show the exact account, URL, target, amount, recipient,
    data, screenshot, form values, risk class, reversibility, and exact next action. Before resume,
    re-read critical fields and compare them with the approved snapshot.
25. Clean up resources. Close pages, contexts, browsers, downloads, temp files, videos, traces,
    mock servers, websockets, and test data deliberately; detect zombie browser processes and
    artifact growth in long runs.
26. Verify with the narrowest configured tests, docs checks, release checks, and mustflow validation
    that cover the changed automation contract.
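
The retry classification in step 14 can be sketched as a small wrapper that retries only named transient classes within a bounded budget. This is an illustrative sketch under stated assumptions: `FailureClass`, `StepFailure`, and `run_with_retry_budget` are hypothetical names, only a subset of the classes from the step is modeled, and backoff and rate control (step 13) are left out for brevity.

```python
from enum import Enum
from typing import Callable, TypeVar

T = TypeVar("T")


class FailureClass(Enum):
    TRANSIENT_NAVIGATION = "transient_navigation"
    DETACHED_ELEMENT = "detached_element"
    TIMEOUT = "timeout"
    PERMISSION_DENIED = "permission_denied"
    CAPTCHA = "captcha"
    UNKNOWN_WRITE_OUTCOME = "unknown_write_outcome"


# Only these classes may be retried; everything else stops immediately.
RETRYABLE = {
    FailureClass.TRANSIENT_NAVIGATION,
    FailureClass.DETACHED_ELEMENT,
    FailureClass.TIMEOUT,
}


class StepFailure(Exception):
    def __init__(self, failure_class: FailureClass, detail: str) -> None:
        super().__init__(f"{failure_class.value}: {detail}")
        self.failure_class = failure_class


def run_with_retry_budget(step: Callable[[], T], max_attempts: int = 3) -> T:
    """Run `step`, retrying only retryable classes within a bounded budget."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except StepFailure as failure:
            retryable = failure.failure_class in RETRYABLE
            if not retryable or attempt == max_attempts:
                raise
    raise AssertionError("unreachable: the loop always returns or raises")


attempts = {"flaky": 0, "captcha": 0}


def flaky_click() -> str:
    # Fails twice with a retryable class, then succeeds.
    attempts["flaky"] += 1
    if attempts["flaky"] < 3:
        raise StepFailure(FailureClass.DETACHED_ELEMENT, "row rerendered")
    return "clicked"


def blocked_by_captcha() -> str:
    # Always fails with a class that must never be retried.
    attempts["captcha"] += 1
    raise StepFailure(FailureClass.CAPTCHA, "challenge page shown")


result = run_with_retry_budget(flaky_click)
```

Calling `run_with_retry_budget(blocked_by_captcha)` raises on the first attempt, which leaves the caller free to route the flow to manual fallback instead of looping.
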

<!-- mustflow-section: postconditions -->
## Postconditions

- The automation has explicit states, readiness signals, locator contracts, auth isolation, retry
  classes, timeout hierarchy, and success evidence.
- Browser-only proof is separated from business-result proof.
- CAPTCHA, anti-bot, rate-limit, human-approval, prompt-injection, and third-party boundary risks
  are detected, stopped, or routed to manual fallback instead of hidden behind retries.
- Failure artifacts are useful enough to debug and constrained enough not to leak secrets or
  personal data.
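
One way to make "explicit states" checkable is a transition table that rejects any jump the flow did not declare. The sketch below is hypothetical: the state names are a subset of those listed in the Procedure, and `ALLOWED_TRANSITIONS`, `AutomationFlow`, and `IllegalTransition` are illustrative names rather than anything mustflow ships.

```python
from enum import Enum, auto


class FlowState(Enum):
    UNAUTHENTICATED = auto()
    AUTHENTICATED = auto()
    FILLING_FORM = auto()
    AWAITING_APPROVAL = auto()
    SUBMITTING = auto()
    VERIFYING_RESULT = auto()
    BLOCKED_BY_CHALLENGE = auto()
    MANUAL_FALLBACK = auto()
    SUCCEEDED = auto()
    FAILED = auto()


S = FlowState

# Hypothetical transition table; a real flow would derive it from its own steps.
# States without an entry (SUCCEEDED, FAILED, MANUAL_FALLBACK) are terminal here.
ALLOWED_TRANSITIONS = {
    S.UNAUTHENTICATED: {S.AUTHENTICATED, S.BLOCKED_BY_CHALLENGE, S.FAILED},
    S.AUTHENTICATED: {S.FILLING_FORM, S.FAILED},
    S.FILLING_FORM: {S.AWAITING_APPROVAL, S.FAILED},
    S.AWAITING_APPROVAL: {S.SUBMITTING, S.FAILED},
    S.SUBMITTING: {S.VERIFYING_RESULT, S.FAILED},
    S.VERIFYING_RESULT: {S.SUCCEEDED, S.FAILED},
    S.BLOCKED_BY_CHALLENGE: {S.MANUAL_FALLBACK, S.FAILED},
}


class IllegalTransition(Exception):
    """Raised when a step tries to move to a state the table does not allow."""


class AutomationFlow:
    def __init__(self) -> None:
        self.state = S.UNAUTHENTICATED
        self.history = [self.state]

    def advance(self, target: FlowState) -> None:
        allowed = ALLOWED_TRANSITIONS.get(self.state, set())
        if target not in allowed:
            raise IllegalTransition(f"{self.state.name} -> {target.name}")
        self.state = target
        self.history.append(target)


flow = AutomationFlow()
flow.advance(S.AUTHENTICATED)
flow.advance(S.FILLING_FORM)
```

With this table, an attempt to go from `FILLING_FORM` straight to `SUBMITTING` raises `IllegalTransition`, so skipping the approval gate becomes a visible error rather than a silent shortcut.
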

<!-- mustflow-section: verification -->
## Verification

Use configured oneshot command intents when available:

- `changes_status`
- `changes_diff_summary`
- `lint`
- `build`
- `test_related`
- `test`
- `docs_validate_fast`
- `test_release`
- `mustflow_check`

Use the narrowest configured fixture, unit, integration, docs, package, or release check that proves
the changed browser automation contract. Do not infer raw browser launches, dev servers, headed
browsers, provider dashboards, CAPTCHA-solving services, or production automation runs from local
files.

<!-- mustflow-section: failure-handling -->
## Failure Handling

- If the failure is not localized to browser automation, use `api-failure-triage`,
  `auth-flow-triage`, `frontend-render-stability`, `test-maintenance`, or another narrower skill
  first.
- If a selector is flaky, do not patch only the selector string until locator ownership, duplicate
  DOM, responsive DOM, skeletons, frames, shadow DOM, and readiness have been checked.
- If a retry would replay an unknown write, stop and add idempotency or effect-confirmation before
  enabling retry.
- If CAPTCHA, anti-bot, account lockout, provider policy, or terms boundaries are detected, stop the
  automation path and report the manual or contractual fallback instead of bypassing it.
- If human approval resumes after state changed, expire the approval or request a new approval with
  the changed fields.
- If artifacts would leak secrets or personal data, collect a smaller redacted evidence set and
  report the observability gap.
- If a configured command fails, use `failure-triage` before continuing.
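
The stale-approval rule above can be sketched as a field-by-field comparison between the approved snapshot and the state re-read just before resuming. The field set and the names `ApprovalSnapshot`, `changed_fields`, and `may_resume` are assumptions made for this example only.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ApprovalSnapshot:
    """Critical fields shown to the approver (hypothetical field set)."""

    account: str
    url: str
    recipient: str
    amount_cents: int
    next_action: str


def changed_fields(approved: ApprovalSnapshot, current: ApprovalSnapshot) -> list[str]:
    """Return the sorted names of fields that differ from the approved snapshot."""
    approved_values = asdict(approved)
    current_values = asdict(current)
    return sorted(
        name for name in approved_values if approved_values[name] != current_values[name]
    )


def may_resume(approved: ApprovalSnapshot, current: ApprovalSnapshot) -> bool:
    """Resume only when nothing the approver saw has changed."""
    return not changed_fields(approved, current)


approved = ApprovalSnapshot(
    account="ops-worker-1",
    url="https://app.example.test/refunds/123",
    recipient="customer-77",
    amount_cents=2500,
    next_action="submit_refund",
)
# State re-read from the page just before resuming: the amount has drifted.
current = ApprovalSnapshot(
    account="ops-worker-1",
    url="https://app.example.test/refunds/123",
    recipient="customer-77",
    amount_cents=25000,
    next_action="submit_refund",
)

drift = changed_fields(approved, current)
```

Here `drift` names the changed field, which is what a renewed approval request would show to the human instead of silently submitting the new amount.
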

<!-- mustflow-section: output-format -->
## Output Format

- Browser automation surface reviewed
- Browser-versus-API boundary and automation owner
- State machine, readiness, locator, actionability, auth, rate-limit, retry, timeout, and
  idempotency decisions
- Screenshot, trace, artifact, redaction, and business-success evidence
- Agent page-content trust, coordinate action, tool permission, approval, and resume checks
- Files changed
- Command intents run
- Skipped checks and reasons
- Remaining browser automation reliability risk
package/templates/default/locales/en/.mustflow/skills/cli-option-contract-review/SKILL.md
ADDED
@@ -0,0 +1,147 @@
---
mustflow_doc: skill.cli-option-contract-review
locale: en
canonical: true
revision: 1
lifecycle: mustflow-owned
authority: procedure
name: cli-option-contract-review
description: Apply this skill when CLI options, flags, positional arguments, aliases, defaults, parser behavior, prompt controls, config or environment precedence, or automation-facing argument contracts are created, changed, reviewed, or reported.
metadata:
mustflow_schema: "1"
mustflow_kind: procedure
pack_id: mustflow.core
skill_id: mustflow.core.cli-option-contract-review
command_intents:
- changes_status
- changes_diff_summary
- test_related
- docs_validate_fast
- test_release
- mustflow_check
---

# CLI Option Contract Review

<!-- mustflow-section: purpose -->
## Purpose

Preserve the contract between CLI syntax and the humans, scripts, CI jobs, shells, terminals, config files, and docs that depend on it.

CLI options are public API. A convenient flag can still be unsafe if it collides with existing shorthand, hides destructive behavior behind a vague name, prompts in CI, writes to stdout when scripts expect JSON, or turns a path, format, selector, or environment into an ambiguous value.

<!-- mustflow-section: use-when -->
## Use When

- A command adds, removes, renames, aliases, deprecates, validates, or changes a flag, option, positional argument, variadic argument, default value, inherited global flag, or option parser rule.
- A task designs or reviews standard CLI controls such as dry-run, check, plan, diff, yes, force, confirm, no-input, interactive, verbose, quiet, debug, format, output, color, pager, progress, config, profile, env, timeout, retry, jobs, cache, stdin, token, endpoint, region, project, pagination, target, prune, rollback, or AI-agent permission flags.
- A command changes prompt behavior, TTY behavior, non-interactive behavior, CI behavior, option terminator support, repeated flags, boolean negation, duration or size parsing, path handling, glob handling, stdin handling, or list parsing.
- A final report claims that CLI options are safe, automatable, compatible, conventional, discoverable, or aligned with docs and tests.

<!-- mustflow-section: do-not-use-when -->
## Do Not Use When

- The task changes only stdout, stderr, JSON fields, JSONL packets, exit codes, color rendering, progress output, warning text, error text, or help wording without changing option or argument semantics. Use `cli-output-contract-review`.
- The task changes only public JSON, JSONL, schema-backed reports, or machine-readable stdout and stderr contracts. Use `public-json-contract-change`.
- The task changes only `.mustflow/config/commands.toml` command intents or command authority. Use `command-contract-authoring`.
- The task changes only environment variables, secrets, config keys, feature flags, or runtime/build-time exposure. Use `config-env-change`.
- The task changes only docs prose that mentions an unchanged command syntax. Use the matching docs skill.

<!-- mustflow-section: required-inputs -->
## Required Inputs

- The affected command, command tree, parser library or command router, inherited global flags, positional arguments, variadic arguments, current aliases, defaults, validation rules, and help metadata.
- Existing docs, README snippets, examples, tests, snapshots, fixtures, shell completions, schemas, template copies, package tests, and release notes that mention the syntax.
- The operation type: read-only, planning, validation, write, destructive write, remote write, deploy, migration, deletion, cleanup, generated-file write, or AI-agent action.
- The intended consumers: humans at a TTY, scripts, CI jobs, package tests, shell completion users, remote APIs, installed templates, release automation, or downstream wrappers.
- Current config and environment precedence, including config files, profiles, env vars, CLI flags, defaults, and explicit override rules.
- Current non-interactive, prompt, color, pager, progress, timeout, retry, cache, lock, and exit-code expectations when they exist.
- Relevant command-intent entries for related tests, docs validation, release checks, and mustflow validation.
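
The config and environment precedence input above is easier to review when the resolution order is written down as data and every resolved value reports its source. The sketch below assumes one common order (CLI flag, then environment, then config file, then default); the skill itself does not prescribe an order, and `PRECEDENCE`, `resolve_setting`, and the layer contents are hypothetical.

```python
from typing import Mapping, Optional, Tuple

Layer = Mapping[str, Optional[str]]

# Assumed order for this sketch only, highest priority first.
PRECEDENCE = ("cli", "env", "config", "default")


def resolve_setting(name: str, layers: Mapping[str, Layer]) -> Tuple[str, str]:
    """Return (value, source) from the first layer that sets `name`."""
    for source in PRECEDENCE:
        value = layers.get(source, {}).get(name)
        if value is not None:
            return value, source
    raise KeyError(f"no value for {name!r} in any layer")


layers = {
    "cli": {"timeout": None},  # flag not passed on the command line
    "env": {"timeout": "30"},  # e.g. a hypothetical EXAMPLE_TIMEOUT variable
    "config": {"timeout": "10", "format": "json"},
    "default": {"timeout": "5", "format": "text"},
}

timeout, timeout_source = resolve_setting("timeout", layers)
output_format, format_source = resolve_setting("format", layers)
```

Returning the source alongside the value gives tests and debug output a direct way to assert which layer won, which is the part of a precedence contract that tends to drift unnoticed.
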

<!-- mustflow-section: preconditions -->
## Preconditions

- The task matches the Use When conditions and does not match the Do Not Use When exclusions.
- Existing command syntax, aliases, docs examples, tests, and parser behavior have been inspected before changing or recommending a flag.
- Short flags are treated as scarce public API. Do not assign them from generic CLI advice without checking collisions, command frequency, and established project conventions.
- External articles, AI summaries, package defaults, and other CLIs are evidence only. The repository's current parser, command contract, compatibility policy, and user instructions remain authoritative.
- Command execution remains governed by `.mustflow/config/commands.toml`; this skill does not authorize raw command execution.

<!-- mustflow-section: allowed-edits -->
## Allowed Edits

- Update CLI parser code, command metadata, help text, completions, docs examples, tests, fixtures, schemas, template copies, and release-sensitive package metadata that describe the same option contract.
- Add explicit long flags, validation errors, compatibility aliases, deprecation notices, negative tests, or parser edge-case tests when they reduce ambiguity.
- Prefer clear long options over clever short aliases. Add a short option only when it is frequent, unambiguous, and consistent with existing command conventions.
- Do not merge different safety meanings into one flag. For example, prompt acceptance, safety bypass, preview, destructive overwrite, and non-interactive failure should remain separable.
- Do not introduce unsafe defaults, vague automation flags, broad bypass flags, hidden prompts, or silent output-mode changes.
- Do not add parser behavior that breaks paths beginning with a dash, negative numbers, option terminators, repeated values, or non-interactive scripts unless that incompatibility is intentional and documented.
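
Two of the rules above, keeping safety meanings in separate flags and preserving the `--` option terminator, can be shown with Python's standard `argparse` module. This is a minimal sketch for a hypothetical `example-tool`; the flag set is illustrative and is not mustflow's own CLI.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="example-tool")
    # Each safety meaning gets its own long flag instead of one broad switch.
    parser.add_argument("--yes", action="store_true", help="accept confirmation prompts")
    parser.add_argument("--force", action="store_true", help="bypass a safety guard")
    parser.add_argument("--dry-run", action="store_true", help="report planned writes without writing")
    # Format and destination use different names so neither is ambiguous.
    parser.add_argument("--format", choices=["text", "json"], default="text", help="output format")
    parser.add_argument("--output", metavar="PATH", help="output destination")
    parser.add_argument("paths", nargs="*", help="input paths")
    return parser


# Everything after `--` is treated as a positional value, so a path that
# begins with a dash, or even one spelled like a flag, is not parsed as an option.
args = build_parser().parse_args(
    ["--dry-run", "--format", "json", "--", "-notes.txt", "--yes"]
)
```

A parser edge-case test built on this sketch would assert that `args.yes` stays false even though the string `--yes` appears after the terminator.
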
|
|
79
|
+
|
|
80
|
+
<!-- mustflow-section: procedure -->
|
|
81
|
+
## Procedure
|
|
82
|
+
|
|
83
|
+
1. Inventory the command syntax: subcommands, positional arguments, variadic arguments, options, inherited global flags, aliases, defaults, environment variables, config files, and generated completions.
|
|
84
|
+
2. Classify each option by role: safety and preview, confirmation and prompts, output and formatting, logging and diagnostics, config and environment, selection and filtering, file input and output, remote endpoint and auth, performance and cache, concurrency and locking, CI automation, destructive lifecycle, or AI-agent authority.
|
|
85
|
+
3. Decide whether the behavior belongs in a subcommand, positional argument, option, config key, environment variable, or separate command. Destructive lifecycle changes often deserve explicit verbs rather than a broad boolean flag.
|
|
86
|
+
4. Review naming collisions before adding names. Pay special attention to common conflicts such as verbose versus version, force versus file, dry-run versus debug or delete or directory, output format versus output path, interactive versus input, and shorthand reused differently across subcommands.
|
|
87
|
+
5. Separate near-neighbor semantics. `--yes` accepts prompts; `--force` bypasses a safety guard; `--dry-run` avoids writes; `--check` reports whether a change is needed; `--diff` shows the proposed change; `--output` should mean a destination only if format uses another name such as `--format`.
|
|
88
|
+
6. Prefer explicit paired controls for risky workflows: dry-run, plan, diff, check, validate, no-input, confirm, yes, force, no-clobber, overwrite, backup, rollback, atomic, lock-timeout, fail-fast, and continue-on-error.
|
|
89
|
+
7. Check non-interactive behavior. Prompts should be TTY-only; `--no-input` should fail instead of waiting; CI-oriented paths should be compatible with quiet, JSON, no-color, no-progress, no-pager, timeout, wait, and detailed exit-code behavior when the repository supports those controls.
|
|
90
|
+
8. Check human and machine output interaction. If an option changes output format, route machine-readable results and diagnostics consistently, and use `cli-output-contract-review` or `public-json-contract-change` for the output contract details.
|
|
91
|
+
9. Define config and environment precedence. Document and test whether CLI flags override environment variables, profiles, config files, defaults, and inline `--set` style overrides.
|
|
92
|
+
10. Review parser edge cases: `--` option terminator, paths beginning with `-`, negative numbers, repeated flags, comma-separated lists versus repeated values, boolean negation with `--no-*`, optional values, duration and size units, shell quoting, globs, symlinks, hidden files, recursive flags, and stdin markers.
|
|
93
|
+
11. Check file and generation behavior. Separate input path, output path, output directory, create-dirs, overwrite, no-clobber, backup, atomic write, recursive traversal, hidden files, symlink following, ignore files, and validation-only modes.
|
|
94
|
+
12. Check remote and SaaS behavior when relevant. Separate endpoint URL, region, account, project, token source, token stdin, CA or proxy settings, connect timeout, read timeout, pagination, query filters, and retries.
|
|
95
|
+
13. Check infra or deploy behavior when relevant. Separate plan, apply, refresh, target, replace, prune, rollback, lock, lock-timeout, wait, parallelism, and detailed-exit-code semantics.
|
|
96
|
+
14. Check AI-agent behavior when relevant. Separate model, prompt source, context include or exclude, max files, max bytes, write permissions, command permissions, network permissions, approval policy, checkpoint, dry-run, diff, and apply.
|
|
97
|
+
15. Preserve compatibility. For renamed or split flags, consider aliases, deprecation warnings, migration help, and tests before removing old syntax. Treat breaking option removals, changed defaults, changed prompt behavior, and changed parser grammar as public API changes.
|
|
98
|
+
16. Synchronize every surface that teaches or consumes the syntax: parser code, help text, completions, docs, README, examples, tests, fixtures, schemas, templates, package metadata, and release notes when applicable.
|
|
99
|
+
17. Verify with the narrowest configured related tests first, then docs, release, template, and mustflow checks when syntax, docs, profiles, templates, or package metadata changed.
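As an illustration of steps 5 and 7, the following is a minimal sketch of how prompt acceptance, safety bypass, preview, and non-interactive failure might stay separable. It assumes Python's standard `argparse`; the command name `example-sync`, the `decide` helper, and the returned action labels are hypothetical and are not part of mustflow.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical command; every option here is illustrative only.
    # allow_abbrev=False stops "--for" from silently matching "--force" or "--format".
    parser = argparse.ArgumentParser(prog="example-sync", allow_abbrev=False)
    parser.add_argument("--yes", action="store_true", help="accept confirmation prompts")
    parser.add_argument("--force", action="store_true", help="bypass the overwrite guard")
    parser.add_argument("--dry-run", action="store_true", help="report planned writes only")
    parser.add_argument("--no-input", action="store_true", help="never prompt; fail instead")
    parser.add_argument("--format", choices=["text", "json"], default="text")
    parser.add_argument("--output", metavar="PATH", help="destination path, not a format")
    return parser


def decide(args: argparse.Namespace, target_exists: bool, is_tty: bool) -> str:
    """Return the action to take, keeping each flag's meaning independent."""
    if args.dry_run:
        return "preview"              # never writes, whatever else is set
    if target_exists and not args.force:
        return "refuse-overwrite"     # --yes alone does not bypass the guard
    if not args.yes:
        if args.no_input or not is_tty:
            return "fail-no-consent"  # do not wait for input or assume consent
        return "prompt"
    return "write"
```

In this sketch `--yes` never implies `--force`, and a missing TTY is treated the same way as `--no-input`, so a script fails instead of hanging on a prompt.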
|
|
100
|
+
|
|
101
|
+
<!-- mustflow-section: postconditions -->
|
|
102
|
+
## Postconditions
|
|
103
|
+
|
|
104
|
+
- Option names, aliases, defaults, parser behavior, config precedence, prompt behavior, and non-interactive behavior are explicit and synchronized.
|
|
105
|
+
- Short flags have a documented reason or are omitted in favor of clear long flags.
|
|
106
|
+
- Destructive, write, preview, confirmation, force, and non-interactive controls are not conflated.
|
|
107
|
+
- Automation-facing use has stable output-mode, no-prompt, no-color, no-progress, no-pager, timeout, retry, and exit-code behavior when relevant.
|
|
108
|
+
- Parser edge cases are covered by tests or reported as remaining risk.
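A focused parser test can make the last postcondition concrete. The sketch below assumes Python's standard `argparse` (3.9 or later for `BooleanOptionalAction`); the `example-rm` command, its options, and the `try_parse` helper are hypothetical stand-ins for whatever parser the repository actually uses.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical command used only to exercise parser edge cases.
    parser = argparse.ArgumentParser(prog="example-rm", allow_abbrev=False)
    # Registers both --color and --no-color.
    parser.add_argument("--color", action=argparse.BooleanOptionalAction, default=True)
    parser.add_argument("--offset", type=int, default=0)
    # Repeated values: each --tag appends one item.
    parser.add_argument("--tag", action="append", default=[])
    parser.add_argument("paths", nargs="*")
    return parser


def try_parse(argv: List[str]) -> Optional[argparse.Namespace]:
    """Parse argv and return None instead of exiting on a usage error."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        return None


def check_edge_cases() -> None:
    # A path beginning with a dash is only safe after the "--" terminator.
    assert try_parse(["--", "-notes.txt"]).paths == ["-notes.txt"]
    assert try_parse(["-notes.txt"]) is None
    # A negative number is accepted as an option value.
    assert try_parse(["--offset", "-5"]).offset == -5
    # Repeated flags accumulate instead of overwriting.
    assert try_parse(["--tag", "a", "--tag", "b"]).tag == ["a", "b"]
    # Boolean negation with --no-*.
    assert try_parse(["--no-color"]).color is False
```

Cases that cannot be covered this way, such as shell quoting or glob expansion, would still need to be reported as remaining risk.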
|
|
109
|
+
|
|
110
|
+
<!-- mustflow-section: verification -->
|
|
111
|
+
## Verification
|
|
112
|
+
|
|
113
|
+
Use configured oneshot command intents when available:
|
|
114
|
+
|
|
115
|
+
- `changes_status`
|
|
116
|
+
- `changes_diff_summary`
|
|
117
|
+
- `test_related`
|
|
118
|
+
- `docs_validate_fast`
|
|
119
|
+
- `test_release`
|
|
120
|
+
- `mustflow_check`
|
|
121
|
+
|
|
122
|
+
Use broader configured tests when option parsing is cross-cutting or no narrower related test covers the syntax.
|
|
123
|
+
|
|
124
|
+
<!-- mustflow-section: failure-handling -->
|
|
125
|
+
## Failure Handling
|
|
126
|
+
|
|
127
|
+
- If an option name conflicts with existing syntax, keep the old contract and choose a clearer long option unless a breaking change is intentionally routed through compatibility and versioning.
|
|
128
|
+
- If a parser edge case cannot be verified directly, add focused coverage or report the missing coverage before claiming safety.
|
|
129
|
+
- If docs, help text, completions, or templates cannot be synchronized in the same change, avoid claiming the option contract is installed or documented.
|
|
130
|
+
- If non-interactive behavior is unclear, default to failing safely rather than prompting, writing, deleting, or assuming consent.
|
|
131
|
+
- If an external recommendation conflicts with repository conventions, document the rejected recommendation and the repository-specific reason.
|
|
132
|
+
- If a breaking option change is intentional, route the version impact through the repository versioning policy and report affected consumers.
|
|
133
|
+
|
|
134
|
+
<!-- mustflow-section: output-format -->
|
|
135
|
+
## Output Format
|
|
136
|
+
|
|
137
|
+
- CLI command and options reviewed
|
|
138
|
+
- Option role classification and naming decision
|
|
139
|
+
- Short and long flag collision review
|
|
140
|
+
- Safety, preview, destructive, prompt, and non-interactive controls
|
|
141
|
+
- Parser edge cases checked or reported missing
|
|
142
|
+
- Config and environment precedence
|
|
143
|
+
- Human, machine, CI, color, pager, progress, timeout, retry, and exit-code interaction
|
|
144
|
+
- Docs, help, completions, tests, schemas, templates, and package metadata synchronized
|
|
145
|
+
- Command intents run
|
|
146
|
+
- Skipped checks and reasons
|
|
147
|
+
- Remaining CLI-option contract risk
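For the "Config and environment precedence" item, a report is easier to check when the resolution order is written down as code. The sketch below assumes one common order (flag, then environment variable, then config file, then default); the skill only asks that the order be documented and tested, so this order, the `EXAMPLE_` prefix, and the `resolve_setting` helper are assumptions for illustration.

```python
from typing import Mapping, Optional, Tuple

# Hypothetical built-in defaults.
DEFAULTS = {"format": "text", "timeout": "30"}


def resolve_setting(
    key: str,
    cli: Mapping[str, Optional[str]],
    env: Mapping[str, str],
    config_file: Mapping[str, str],
    env_prefix: str = "EXAMPLE_",
) -> Tuple[str, str]:
    """Return (value, source) using the assumed order: flag > env > config > default."""
    flag_value = cli.get(key)
    if flag_value is not None:
        return flag_value, "flag"
    env_name = env_prefix + key.upper()
    if env_name in env:
        return env[env_name], "env"
    if key in config_file:
        return config_file[key], "config"
    return DEFAULTS[key], "default"
```

Returning the source alongside the value lets a test, or a diagnostic mode, show which layer supplied each setting.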
|
|
@@ -2,7 +2,7 @@
|
|
|
2
2
|
mustflow_doc: skill.database-change-safety
|
|
3
3
|
locale: en
|
|
4
4
|
canonical: true
|
|
5
|
-
revision:
|
|
5
|
+
revision: 17
|
|
6
6
|
lifecycle: mustflow-owned
|
|
7
7
|
authority: procedure
|
|
8
8
|
name: database-change-safety
|
|
@@ -79,6 +79,7 @@ Use the smallest persistence boundary that proves the risk. Do not introduce rep
|
|
|
79
79
|
- Event role: operational event, audit log, behavior analytics event, integration outbox message, reporting aggregate, or replayable domain event.
|
|
80
80
|
- Data owner and affected tables, collections, stores, indexes, caches, generated files, or read models.
|
|
81
81
|
- Entity identity rules, including stable ids, external provider ids, mutable slugs, titles, locale-specific addresses, redirects, and public API identifiers when content or user-facing resources are involved.
|
|
82
|
+
- Regret-prone schema shape rules, including internal versus public ids, normalized unique keys, tenant-scoped uniqueness, foreign keys, join tables, enum or lookup-table ownership, nullable-field meaning, JSON promotion criteria, custom-field boundaries, status history, optimistic locking, and operational trace fields.
|
|
82
83
|
- Exit and restore rules, including whether exported data preserves relationships, permissions, files, versions, events, audit history, automation rules, provider id mappings, schema metadata, and enough import or restore evidence to reconstruct product state.
|
|
83
84
|
- Identifier ownership rules, including which ids are product-owned, which ids are public, which ids are provider mappings, and whether external auth, payment, CRM, analytics, storage, or CMS ids can change without breaking internal references.
|
|
84
85
|
- Authentication identity rules, including app-owned user id, provider subject records, email-as-attribute behavior, social provider subject preservation, account merge or relink policy, session migration expectations, and whether memberships, roles, permissions, and entitlements live in product-owned tables rather than only provider metadata.
|
|
@@ -173,8 +174,25 @@ Use the smallest persistence boundary that proves the risk. Do not introduce rep
|
|
|
173
174
|
- External-service core facts, such as current entitlement, subscription or plan state, processed payment event id, email consent state, customer lifecycle state, file identity and ownership, search source document metadata, job processing state, and audit evidence. Do not let a provider dashboard be the only place that can explain these facts.
|
|
174
175
|
- Search and queue reconstruction records, such as index document builders, ranking or synonym policy versions, search logs, queue message schema versions, job idempotency keys, retry state, dead-letter state, and manual replay markers.
|
|
175
176
|
4. Check schema shape: primary keys, foreign keys, unique constraints, nullable fields, defaults, check constraints, status values, timestamps, soft delete fields, tenant scope, audit fields, and retention rules.
|
|
177
|
+
- Use immutable internal primary keys for joins and separate public identifiers for URLs and APIs. Do not make email, slug, username, external provider id, or mutable display code the primary key for product-owned rows.
|
|
178
|
+
- Enforce uniqueness in the database, not only in application prechecks. Normalize comparison keys such as email, slug, provider id, and idempotency key explicitly, preserve the display value separately when needed, and name the unique constraint or index so operations can diagnose failures.
|
|
179
|
+
- Scope unique constraints to the real owner. Tenant-owned slugs, emails, invitations, memberships, idempotency keys, and external references usually need `tenant_id`, `workspace_id`, `operation_type`, or `provider` in the key. Global uniqueness should be a deliberate product rule, not an accident.
|
|
180
|
+
- Design soft-delete uniqueness before shipping. Active-only uniqueness, nullable unique behavior, restore conflicts, deleted-id reuse, and tombstone requirements must be explicit; otherwise deleted rows either block valid new records or allow duplicate active records.
|
|
181
|
+
- Prefer database foreign keys for core ownership and reference integrity. If an FK is intentionally omitted for scale, import staging, sharding, or asynchronous reconciliation, name the replacement invariant, cleanup path, and orphan-detection evidence. Index FK columns when joins, parent deletion checks, or tenant deletion depend on them.
|
|
182
|
+
- Treat `ON DELETE CASCADE` as a lifecycle promise, not a cleanup convenience. Use it only when child rows truly share the parent's lifetime and audit, retention, restore, and legal obligations do not require separate survival.
|
|
183
|
+
- Model many-to-many relationships with join tables that can own role, status, order, source, timestamps, and actor fields. Avoid comma-separated ids, arrays of ids, or JSON lists for relationships that need joins, uniqueness, permissions, deletes, or audit.
|
|
184
|
+
- Treat polymorphic `entity_type` plus `entity_id` relations as integrity debt for core data because ordinary FKs cannot prove the target exists. Prefer target-specific tables, a shared parent table, or explicit constraint and cleanup machinery when the relation is business-critical.
|
|
185
|
+
- Choose enum, lookup table, or state machine based on change behavior. Stable technical codes may be enums; operator-managed values, values with display or sort metadata, plan or category catalogs, roles, and jurisdiction-specific rules usually need lookup tables. Workflow status needs allowed transitions and history, not only a value list.
|
|
186
|
+
- Avoid boolean state soup such as several independent `is_*` flags for one lifecycle. Use one current status plus timestamps or event history when states are mutually exclusive, ordered, reversible, or policy-driven.
|
|
187
|
+
- Give nullable fields exactly one meaning. Separate unknown, not applicable, not entered yet, deleted, failed, and pending states with explicit status or reason fields when queries or reports depend on the distinction.
|
|
188
|
+
- Avoid EAV or generic `entities`/`attributes`/`values` tables for core domain facts. If customer-defined fields are required, keep them in a bounded custom-field area with definitions, type validation, quotas, ownership, export semantics, and a promotion path once values drive search, sort, permission, billing, or reporting.
|
|
189
|
+
- Do not hide behavior-driving data in JSON. Keys used for filters, ordering, joins, uniqueness, permissions, tenant scope, status, retention, money, dates, quotas, indexes, or operational dashboards should be typed columns, child tables, or generated/computed columns with a migration path. Use `database-json-modeling-review` when JSON is part of the diff.
|
|
190
|
+
- Keep tenant ownership close to the owned row when tenant-scoped operations, billing, audit, export, restore, delete, or performance matter. B2B products should usually separate global users from tenant memberships, roles, invitations, entitlements, and billing records.
|
|
176
191
|
- Treat deletion as lifecycle when recovery, audit, search behavior, support handling, or retention matters. Consider `deleted_at`, `deleted_by`, `delete_reason`, `restored_at`, `restored_by`, and `purge_after` instead of a lone boolean or timestamp.
|
|
177
192
|
- Separate business records that should be soft-deleted or archived from personal data that should be anonymized, purged, or retained under a narrower legal rule.
|
|
193
|
+
- Keep status history for states that affect money, access, fulfillment, support, compliance, or user-visible commitments. A current status alone rarely explains who changed it, why, under which request, and whether a late webhook, retry, or admin action should still apply.
|
|
194
|
+
- Add optimistic versioning or conditional updates when two users, admins, workers, or webhooks can edit the same important row. Last-write-wins is usually data loss unless the product explicitly accepts it.
|
|
195
|
+
- Add operational trace fields where incident response will need them: server timestamps, actor ids, `created_by`, `updated_by`, `request_id`, `source`, import or provider reference, and safe reason codes. Do not add them blindly to every table, but do not leave high-value rows untraceable.
|
|
178
196
|
- Treat mutable high-value records as versioned when reproducibility matters, such as AI prompts, documents, contracts, price policies, experiment configs, comparison data, permission policies, automation rules, and model settings. Prefer a stable parent row with a current-version pointer plus immutable version rows.
|
|
179
197
|
- Use ledgers for money-like or quota-like balances, such as points, credits, inventory reservations, refunds, coupon issuance, entitlement grants, and manual adjustments. Treat cached balances as derived from ledger entries unless the local design proves otherwise.
|
|
180
198
|
- For audit logs, store actor type, actor id when safe, action, target type and id, bounded before and after values, reason, request id, idempotency key, and timestamp in the same local transaction as the audited change when possible. Audit logs should be append-only to normal operators and should redact or omit personal data that is not needed to explain the change.
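Several of the schema-shape rules above (immutable internal keys, database-enforced uniqueness on a normalized key, tenant-scoped and active-only uniqueness, foreign keys, and optimistic versioning) can be sketched together. The example assumes SQLite through Python's standard `sqlite3` module; the `tenant` and `member` tables, the column names, and the lowercase-and-trim email normalization are hypothetical simplifications, not mustflow's schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE tenant (
    id INTEGER PRIMARY KEY
);
CREATE TABLE member (
    id INTEGER PRIMARY KEY,              -- immutable internal key used for joins
    tenant_id INTEGER NOT NULL REFERENCES tenant(id),
    email_display TEXT NOT NULL,         -- value as entered
    email_key TEXT NOT NULL,             -- normalized comparison key
    deleted_at TEXT,                     -- NULL means active
    version INTEGER NOT NULL DEFAULT 1   -- optimistic-lock counter
);
-- Named, tenant-scoped, active-only uniqueness (partial unique index).
CREATE UNIQUE INDEX member_active_email_per_tenant
    ON member (tenant_id, email_key)
    WHERE deleted_at IS NULL;
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO tenant (id) VALUES (?)", [(1,), (2,)])
    return conn


def add_member(conn: sqlite3.Connection, tenant_id: int, email: str) -> int:
    # Simplified normalization; real email comparison rules may differ.
    cur = conn.execute(
        "INSERT INTO member (tenant_id, email_display, email_key) VALUES (?, ?, ?)",
        (tenant_id, email, email.strip().lower()),
    )
    return cur.lastrowid


def rename_member(conn: sqlite3.Connection, member_id: int,
                  new_email: str, expected_version: int) -> bool:
    """Conditional update: succeeds only if nobody changed the row since it was read."""
    cur = conn.execute(
        "UPDATE member SET email_display = ?, email_key = ?, version = version + 1 "
        "WHERE id = ? AND version = ? AND deleted_at IS NULL",
        (new_email, new_email.strip().lower(), member_id, expected_version),
    )
    return cur.rowcount == 1
```

With this shape, a duplicate active email inside one tenant raises `sqlite3.IntegrityError` from the database rather than relying on an application precheck, a soft-deleted row no longer blocks a new record, and a stale `expected_version` makes `rename_member` return `False` instead of silently overwriting.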
|
|
@@ -318,6 +336,7 @@ Use the smallest persistence boundary that proves the risk. Do not introduce rep
|
|
|
318
336
|
## Postconditions
|
|
319
337
|
|
|
320
338
|
- The database role and source of truth are explicit.
|
|
339
|
+
- Regret-prone schema shortcuts such as mutable primary keys, app-only uniqueness, unscoped tenant uniqueness, missing FK or cascade ownership, ambiguous nulls, boolean state soup, polymorphic core relations, EAV core facts, behavior-driving JSON, and user-as-tenant coupling are fixed, explicitly accepted, or reported.
|
|
321
340
|
- Database rows, ORM models, generated caches, and read models do not leak into domain truth unless the local architecture intentionally owns that boundary.
|
|
322
341
|
- Queries preserve authorization, tenant or user scope, deterministic ordering, expected absence behavior, and retention rules.
|
|
323
342
|
- Content and resource models separate stable identity from mutable titles, slugs, URLs, translations, display fields, revisions, facts, sources, projections, and analytics dimensions when those concerns exist.
|
|
@@ -375,7 +394,7 @@ Prefer the narrowest configured test, build, docs, release, or mustflow intent t
|
|
|
375
394
|
|
|
376
395
|
- Database role and owner
|
|
377
396
|
- Affected read and write paths
|
|
378
|
-
- Schema, constraint, and query semantics reviewed
|
|
397
|
+
- Schema-regret, constraint, relation, enum, JSON, custom-field, status-history, traceability, and query semantics reviewed
|
|
379
398
|
- Identity, slug, lifecycle, asset, body block, taxonomy, relationship, attribute, filter URL, landing-page, translation, locale, country, currency, timezone, local-date, money, price snapshot, revision, claim, fact, source, collection, verification, comparison methodology, affiliate link, data-ownership, behavior analytics, audit log, API projection, public identifier, backup or restore, bulk update, admin audit, user-state, aggregate, cache-key, projection, and cache-invalidation checks where relevant
|
|
380
399
|
- Export, import, product-owned id, provider-id mapping, relationship, permission, file, automation, event-history, and reconstruction checks where relevant
|
|
381
400
|
- Authorization, tenant scope, retention, and privacy checks
|