npm - @mediadatafusion/pi-workflow-suite - Versions diffs - 0.0.10 → 0.0.11 - Mend

@mediadatafusion/pi-workflow-suite 0.0.10 → 0.0.11


Files changed (10)

package/CHANGELOG.md +31 -0
package/README.md +126 -9
package/VERSION +1 -1
package/config/prompts/mission-review-prompt.md +42 -0
package/config/prompts/workflow-reviewer-prompt.md +44 -0
package/extensions/workflow-model-router.ts +28 -14
package/extensions/workflow-modes.ts +1184 -311
package/extensions/workflow-state.ts +62 -5
package/extensions/workflow-tool-guard.ts +105 -15
package/package.json +3 -2

package/CHANGELOG.md CHANGED

@@ -2,6 +2,37 @@
 All notable public releases will be documented in this file.
+## [0.0.11] - 2026-05-30
+### Added
+- `workflow_browser_check` tool — headless browser verification for validator mode with interactive actions (click, type, read, reload, screenshot, evaluate). Uses Puppeteer from the Pi runtime so validators can verify web app behavior regardless of the target project's dependencies.
+- Token budget controls (`maxTokens`, `maxRuntimeHours`) for Plan, Mission, and Standard modes.
+- Structured validation boolean fields (`concreteRepairableIssue`, `manualVerificationRequired`, `evidenceGap`) on workflow and mission state for infallible classification.
+- `/plan complete` command for explicit Plan Mode completion.
+- Mission reviewer and workflow reviewer prompt templates.
+### Hardened
+- **Validation pipeline**: PARTIAL PASS with no concrete code defects now completes the workflow. Classifier reordered so concrete fixes route to repair over evidence gaps. Expanded automatable-evidence-gap detection. Mission validation manual-only outcomes advance milestones; final-validation manual-only outcomes complete the mission.
+- **Reviewer and repair**: Mission reviewer auto-repair uses centralized default-enabled configuration matching Plan Mode. Built-in mission defaults aligned (reviewer auto-repair enabled, retry mode `safe_only`, max retries 2). Consistent retry gating across both modes.
+- **Tool guard and Repo Lock**: Safe-command recognition expanded across package managers and ecosystems (pnpm, yarn, bun, npx serve, python3, curl localhost, ps, pgrep, sleep). Shell preamble handling (`stripSafePreamble`) for `set -e` and `export` patterns. `/tmp/` and `/dev/` paths exempted from Repo Lock blocking. Reduced false-positive destructive-command blocks.
+- **Prompts and agents**: Unified inline-diagram guidance across all prompts, agents, and skills. Web app verification procedures with structured evidence output requirements in validator prompts. Sub-agent write-ownership clarified in mission run prompt. Agent Mermaid guidance updated for raw blocks since sub-agents lack `workflow_diagram`.
+- **Runtime accounting**: Wall-clock age uses terminal timestamp when workflow is stopped. Active-runtime tracking preserved for workflows that began before the current session.
+- **Plan Mode progress tracking**: Plan execution step progress now reliably tracks across all steps. The `workflow_progress` tool is guaranteed on the agent's active tool surface every turn. The progress guard enforces step activation before file writes while allowing sub-agents to run freely for research and preparation. Multi-step prompt guidance with inline step status display keeps the agent aligned with the approved plan from first step through final validation.
+### Fixed
+- Plan execution sub-agents no longer deadlock against the step progress guard when forced sub-agent policies are active.
+- Plan Mode step progress widget consistently updates across multi-step execution instead of staying stuck on step 1.
+### Updated
+- Validators can write temporary evidence-gathering scripts (was read-only).
+- Mission `maxRuntimeHours` default 8→13.
+- All 4 built-in presets updated with `planShowProgressBar`.
+- Internal architecture documentation expanded with model routing, settings merge chain, and preset bundle architecture.
 ## [0.0.10] - 2026-05-25
 ### Changed
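
The 0.0.11 entry above says validation outcomes are classified from three structured booleans (`concreteRepairableIssue`, `manualVerificationRequired`, `evidenceGap`), with concrete fixes routed to repair ahead of evidence gaps. A minimal TypeScript sketch of that ordering might look like the following. Only the three field names come from the changelog; the `ValidationFlags` shape, the `routeValidation` function, and the route names are hypothetical illustrations, not code from the package.

```typescript
// Hypothetical shape: only the three field names appear in the changelog.
interface ValidationFlags {
  concreteRepairableIssue: boolean;
  manualVerificationRequired: boolean;
  evidenceGap: boolean;
}

// Hypothetical route names, chosen for illustration only.
type ValidationRoute = "repair" | "gather_evidence" | "proceed_manual_only" | "complete";

function routeValidation(flags: ValidationFlags): ValidationRoute {
  // Concrete fixes take priority over evidence gaps, per the changelog.
  if (flags.concreteRepairableIssue) return "repair";
  // Assumption: an automatable evidence gap triggers more evidence gathering.
  if (flags.evidenceGap) return "gather_evidence";
  // Manual-only outcomes do not block progress (milestone advances or mission completes).
  if (flags.manualVerificationRequired) return "proceed_manual_only";
  // No concrete defects and nothing outstanding: the workflow completes.
  return "complete";
}
```

Checking the flags in a fixed order keeps the result deterministic even when a validator sets more than one flag at once.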

package/README.md CHANGED

@@ -4,7 +4,7 @@
 [![Install](https://cdn.jsdelivr.net/npm/@mediadatafusion/pi-workflow-suite@0.0.6/docs/assets/readme-link-install.svg)](#installation) [![Quick Start](https://cdn.jsdelivr.net/npm/@mediadatafusion/pi-workflow-suite@0.0.6/docs/assets/readme-link-quick-start.svg)](#quick-start) [![Commands](https://cdn.jsdelivr.net/npm/@mediadatafusion/pi-workflow-suite@0.0.6/docs/assets/readme-link-commands.svg)](#core-commands) [![Settings](https://cdn.jsdelivr.net/npm/@mediadatafusion/pi-workflow-suite@0.0.6/docs/assets/readme-link-settings.svg)](#settings-reference)
-**Workflow Suite Version:** `v0.0.10`
+**Workflow Suite Version:** `v0.0.11`
 ## Overview
@@ -55,6 +55,7 @@ See Pi Workflow Suite in action: structured workflow modes, settings, runtime st
 - [Compaction Support](#compaction-support)
 - [Diagram Support](#diagram-support)
 - [Web Access](#web-access)
+- [Browser Verification](#browser-verification)
 - [Repository Lock](#repository-lock)
 - [Plan History](#plan-history)
 - [Mission Progress, Checkpoints, And Runtime Tracking](#mission-progress-checkpoints-and-runtime-tracking)
@@ -115,9 +116,10 @@ Pi Workflow Suite turns Pi into a guided workflow environment:
 | Mission Mode | Long-running milestone workflows with approval, checkpoints, Mission-specific model overrides, validation gates, repair/retry, pause/resume, final-validation controls, and continuity tracking. |
 | Themes And Startup UI | Workflow Suite themes, startup visual cards, startup logo modes, custom terminal logo text, custom brand cards, footer/status styling, widgets, and optional input border styling. |
 | Interactive Diagrams | `workflow_diagram` Mermaid support with terminal preview, SVG-first clickable artifacts, PNG/runtime rendering support, dark-mode-friendly styling, and runtime artifact storage. |
-| Web Research | First-party `workflow_web_search` and `workflow_web_fetch` tools for public web search/fetch with source URLs, blocked local/private/internal hosts, time/size limits, and untrusted-content handling. |
+| Web Research & Browser Verification | First-party `workflow_web_search`, `workflow_web_fetch`, and `workflow_browser_check` tools. Search and fetch for public web evidence with source URLs, blocked local/private/internal hosts, and time/size limits. Headless browser verification for runtime web app validation with interactive UI actions (click, type, read, screenshot, evaluate). |
 | Repo Lock | Project-scoped Global Safety control that constrains normal file tools, bash path checks, and sub-agents to the active repository, with protected configuration paths and clear non-sandbox caveats. |
 | Compaction | Pi default, custom model, or disabled Workflow Suite compaction so context summarization can use its own provider/model, proactive threshold checks, idle-boundary execution, custom token tuning, adaptive fitting, status reporting, and safe fallback. |
+| Token Budgets | Optional per-mode token and runtime caps (`maxTokens`, `maxRuntimeHours`) for Plan, Mission, and Standard Mode. Off by default (unlimited). When enabled, Workflow Suite tracks cumulative usage and blocks further agent turns when the budget is exceeded. |
 | Workflow Roles | Planner, Executor, Reviewer, Validator, Mission, and compaction responsibilities are separated by phase so each job has clear boundaries and can be matched to the right model. |
 | Model Selection | Configure which provider/model and thinking level powers each workflow role, with shared defaults plus Standard-specific and Mission-specific overrides for simpler or higher-rigor setups. |
 | Presets | Built-in and custom workflow profiles with selector commands and Ctrl+Shift+U cycling while active modes are running. |
@@ -134,13 +136,15 @@ Pi Workflow Suite turns Pi into a guided workflow environment:
 - Mission Mode through `/mission`, `/m`, and `Ctrl+Shift+M` for durable milestone workflows.
 - Configurable clarification in Standard Mode, plus dynamic clarification in Plan Mode and Mission Mode.
 - Review, execution, validation, repair, retry, checkpoint, and final-validation controls where the selected mode supports them.
-- Plan history, mission checkpoint history, Standard runtime tracking, Mission runtime tracking, and mode-aware progress widgets.
+- Plan history, mission checkpoint history, Standard runtime tracking, and Mission runtime tracking.
+- Mode-aware progress widgets: Plan step tracking with step-by-step progress and validation gates, Mission milestone tracking with checkpoint history, and Standard Mode dynamic To Do progress.
 - Workflow settings UI for Standard Mode, Plan Mode, Mission Mode, model selection, sub-agents, compaction, widgets, themes, startup visuals, and safety.
 - Workflow themes with a `none` option, startup visual cards, startup logo modes, custom terminal logo text, custom brand cards, and optional themed input borders.
-- Integrated `workflow_web_search` and `workflow_web_fetch` tools for current public evidence and source-backed URL reading.
+- Integrated `workflow_web_search`, `workflow_web_fetch`, and `workflow_browser_check` tools for current public evidence, source-backed URL reading, and headless browser verification of web app runtime behavior.
 - Interactive `workflow_diagram` Mermaid rendering with terminal preview, clickable SVG artifacts, and PNG/runtime rendering support.
 - Repo Lock for project-scoped path safety around repository work, protected project configuration, and sub-agent inheritance.
 - Role-aware model selection so planning, execution, review, validation, Mission work, and compaction can each use the provider/model and thinking level that fits the job.
+- Optional per-mode token and runtime budgets (`maxTokens`, `maxRuntimeHours`) to cap usage in Plan, Mission, and Standard Mode. Off by default; enable when you need predictable cost or time limits.
 - Sub-agent usage policies for planning, execution, repair, review, and validation, with explicit documentation that these are orchestration settings, not a universal permission manager.
 - Safe install, backup, audit, quarantine, verification, and package validation scripts.
@@ -433,6 +437,80 @@ The grouped settings menus expose shared role selection, Standard-specific model
 Shared model selection is available through `/workflow settings Shared Models`. Standard-specific and mission-specific model selection is available through their mode settings menus.
+## Efficiency Guidance
+Workflow Suite gives you independent control over workflow rigor and model cost. This section explains which settings affect token usage and how to tune them.
+### Thinking Levels
+Six thinking levels control how much reasoning the model applies per inference:
+```text
+off < minimal < low < medium < high < xhigh
+```
+Higher levels use more tokens. Recommendations by role:
+| Role | Guidance |
+|------|----------|
+| Planner | Higher thinking is often worth the cost — the plan defines scope, assumptions, risk, and validation strategy. |
+| Reviewer | Higher thinking helps catch missing requirements, unsafe steps, or weak plans before execution begins. |
+| Validator | Higher thinking reduces shared blind spots with the executor. An independent validator with a different model is more valuable than maxing thinking on the same model. |
+| Executor | Medium-high is usually sufficient. Execution is about precise tool use and instruction following, not open-ended analysis. |
+| Compaction | Summarization is mechanical. If using a custom compaction model, a lower thinking level or a cheaper model is often appropriate. |
+Note: Pi may clamp thinking levels for model/provider combinations that do not support the requested level. This is Pi runtime behavior, not a Workflow Suite setting.
+### Token Budgets
+Each mode supports an optional configurable token budget:
+- `planning.maxTokens` — caps estimated token usage for a Plan Mode workflow.
+- `missions.maxTokens` — caps estimated token usage for a Mission Mode workflow.
+- `standard.maxTokens` — caps estimated token usage for a Standard Mode session.
+Default is `0` (unlimited). When set to a positive value, Workflow Suite tracks cumulative usage and blocks further agent turns when the budget is exceeded. Set budgets with headroom — the tracker uses context window size as a proxy since Pi does not expose cumulative token counts.
+Configure through `/workflow settings Plan Mode`, `/workflow settings Mission Mode`, or `/workflow settings Standard Mode`.
+### Sub-Agent Policy And Token Cost
+Sub-agent policies control how many parallel workers are requested per phase:
+```text
+off < auto < deep < maximum < forced
+```
+Each worker has its own context window, so more workers multiply token spend. The built-in presets scale worker counts from 1 (simple) to 4 (maximum). For cost-sensitive work:
+- Use `auto` to let the model decide when workers are worth the cost.
+- Lower worker counts in `deep`/`maximum`/`forced` policies.
+- Disable phases that are not needed for the current task.
+Sub-agent settings are configured through `/workflow settings Shared Sub-agents`, with per-mode overrides in Standard Mode and Mission Mode settings.
+### Settings That Affect Token Usage
+| Setting | Impact |
+|---------|--------|
+| Thinking levels (per role) | Directly controls tokens per inference for each workflow phase. |
+| Sub-agent policies and worker counts | More workers = more parallel context windows = higher total spend. |
+| `maxClarificationQuestions` | Higher counts mean more clarification turns before work begins. |
+| `planning.depth` | `deep`/`maximum` prompt for more sub-agent research before planning. |
+| `validationRetryMode` | `aggressive_within_scope` can trigger more repair cycles. |
+| `finalValidationEnabled` | Adds a whole-mission validation pass after all milestones complete. |
+| `maxTokens` (per mode) | Optional budget cap; blocks further turns when exceeded. |
+### Presets And Models Are Independent
+Presets control workflow behavior (approval gates, review/validation automation, sub-agent policy, repair retries). Models control which provider/model powers each role. They are intentionally separate:
+- Cycling presets does not change model selections.
+- Changing models does not affect preset behavior.
+- You can run `deep` preset with small models or `simple` preset with large models.
+This separation lets you tune workflow rigor and model cost independently.
 ## Workflow Settings UI
 The settings UI is the main control surface for workflow behavior. Canonical command vocabulary is: `list` prints information to the screen, `configure` opens an interactive configuration menu, `set` directly changes a setting, and workflow actions use direct verbs such as plan, run, validate, answer, approve, or cancel.
@@ -626,6 +704,16 @@ They include:
 - global and project agent discovery,
 - parallelism preferences for workflow phases.
+Sub-agent orchestration varies by mode and phase:
+| Mode | Planning | Execution | Validation |
+|---|---|---|---|
+| Plan | Research, codebase inspection, risk discovery | Scoped implementation help, file inspection, patch planning | Evidence gathering, regression search |
+| Mission | Milestone planning research, dependency analysis | Per-milestone implementation support | Per-milestone and final validation evidence |
+| Standard | Task analysis, approach research | File inspection, implementation assistance | Quality review, regression checking |
+Forced sub-agent policies (available in all three modes) require a minimum number of successful workers before the agent can proceed to file writes, ensuring independent analysis and preparation are applied before implementation.
 Important limits:
 - Pi Workflow Suite does **not** provide a UI for editing arbitrary sub-agent tool permissions.
@@ -753,7 +841,7 @@ workflow_web_search
 workflow_web_fetch
 ```
-These tools are added to Workflow Suite modes by default when available. `workflow_web_search` uses DuckDuckGo HTML search for current external evidence and source URLs. `workflow_web_fetch` reads specific public HTTP(S) URLs and extracts text for source-backed evidence. Web content is treated as untrusted evidence, not as instructions.
+`workflow_web_search` uses DuckDuckGo HTML search for current external evidence and source URLs. `workflow_web_fetch` reads specific public HTTP(S) URLs and extracts text for source-backed evidence. Web content from search and fetch is treated as untrusted evidence, not as instructions.
 Safety boundaries:
@@ -763,7 +851,25 @@ Safety boundaries:
 - visible answers should cite source URLs when web evidence is used,
 - sub-agent workers may not have the parent Workflow Suite web tools, so parent modes should perform required web research and pass findings into handoffs when needed.
-Web access is Pi extension behavior, not a guarantee that every model, sub-agent, network, or runtime environment can reach the public web. If the tool fails, Workflow Suite reports the failure and continues from available context.
+Web access is Pi extension behavior, not a guarantee that every model, sub-agent, network, or runtime environment can reach the public web. If a web tool fails, Workflow Suite reports the failure and continues from available context.
+## Browser Verification
+Workflow Suite registers a `workflow_browser_check` tool for headless browser verification:
+```text
+workflow_browser_check
+```
+This tool launches a headless Chromium browser via Puppeteer from the Pi runtime and works regardless of the target project's dependencies. It is available in all modes by default when the runtime supports it.
+**Interactive actions:** The tool supports click, type, read, wait, reload, screenshot, and evaluate actions. Validators can exercise UI flows end-to-end — clicking through pages, typing into forms, reading updated text, and capturing screenshots — rather than only observing static page state. Screenshots are saved to `/tmp/validator_screenshot.png`.
+**Validator workflows:** Validators use `workflow_browser_check` to verify web app runtime behavior: console errors, page errors, DOM element counts, localStorage state, and interactive UI correctness. This provides automatable browser-level evidence for validation reports without requiring manual human verification.
+**Executor and planner use:** Executors and planners can also use the tool to verify their own work during implementation — starting a dev server, running browser checks, and confirming behavior before handing off to validation.
+**Graceful fallback:** If Puppeteer is not available in the Pi runtime, the tool reports the error and the workflow continues from available context. Browser verification is an enhancement, not a hard dependency.
 ## Repository Lock
@@ -781,6 +887,8 @@ Repo Lock does not grant normal agent tools access to the live Pi runtime under
 Repo Lock helps prevent accidental cross-repository work. It is not an operating-system sandbox, a complete shell parser, or a guarantee that every possible child process is contained. Review commands before running them, especially commands that invoke other tools or scripts.
+Repo Lock is enabled by default in the Standard, Deep, and Maximum built-in presets. If your workflow requires cross-repository access, disable it through `/workflow settings Global Safety` or by setting `safety.repoLockEnabled` to `false`.
 ## Plan History
 Plan history can persist workflow plans under Pi runtime state. Saved plans include:
@@ -887,6 +995,15 @@ cd /path/to/project
 pi install -l npm:@mediadatafusion/pi-workflow-suite
 ```
+### Installing specific versions
+```bash
+pi install npm:@mediadatafusion/pi-workflow-suite@0.0.11
+pi install -l npm:@mediadatafusion/pi-workflow-suite@0.0.11
+```
+An unversioned install follows normal update behavior: `pi update` and `pi update --extensions` will pick up new package releases. A versioned install pins the package to that version. Pinned package specs are intentionally skipped by Pi's normal package update commands. To move a pinned install to a newer version, reinstall with the desired version. To switch back to latest tracking, use the unversioned install command without `@<version>`.
 ### Source install
 ```bash
@@ -1088,10 +1205,10 @@ See `docs/TROUBLESHOOTING.md` for detailed diagnostics.
 ## Versioning
-The current preparation version is `v0.0.10`. Version information is intentionally aligned across:
+The current preparation version is `v0.0.11`. Version information is intentionally aligned across:
-- `VERSION` (`v0.0.10`),
-- `package.json` (`0.0.10`),
+- `VERSION` (`v0.0.11`),
+- `package.json` (`0.0.11`),
 - `package-lock.json`,
 - this README,
 - Workflow Suite settings/about output.
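
The Token Budgets section added in this README diff states that `maxTokens` defaults to `0` (unlimited) and that further agent turns are blocked once cumulative usage exceeds a positive budget. A rough TypeScript sketch of such a gate is shown below. The `BudgetSettings` and `UsageSnapshot` shapes and the `budgetBlockReason` function are hypothetical, and treating `maxRuntimeHours: 0` as unlimited is an assumption that mirrors the documented `maxTokens` behavior. The package's actual check is not visible in this part of the diff.

```typescript
// Hypothetical settings shape; the field names match the README.
interface BudgetSettings {
  maxTokens?: number; // 0 or undefined: unlimited (documented default)
  maxRuntimeHours?: number; // 0 or undefined: assumed unlimited
}

// Hypothetical usage snapshot. The README notes that token usage is an estimate.
interface UsageSnapshot {
  estimatedTokens: number;
  runtimeMs: number;
}

const MS_PER_HOUR = 60 * 60 * 1000;

// Returns a human-readable reason when the next turn should be blocked, else null.
function budgetBlockReason(settings: BudgetSettings, usage: UsageSnapshot): string | null {
  const maxTokens = settings.maxTokens ?? 0;
  if (maxTokens > 0 && usage.estimatedTokens > maxTokens) {
    return `token budget exceeded (${usage.estimatedTokens} > ${maxTokens})`;
  }
  const maxHours = settings.maxRuntimeHours ?? 0;
  if (maxHours > 0 && usage.runtimeMs > maxHours * MS_PER_HOUR) {
    return `runtime budget exceeded (limit ${maxHours}h)`;
  }
  return null;
}
```

Because the tracked usage is an estimate, a caller would normally set the budget with some headroom, as the README recommends.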

package/VERSION CHANGED

@@ -1 +1 @@
-v0.0.10
+v0.0.11

package/config/prompts/mission-review-prompt.md ADDED

@@ -0,0 +1,42 @@
+CRITICAL: Call workflow_review_result as your FIRST action in this turn. Do not output any text, analysis, or diagrams before the tool call. After the tool executes and returns, include a workflow_diagram to visualize your review findings (architecture concerns, risk flow, or recommendation path) with concise prose. Place the diagram inline -- not batched at the end.
+---
+description: Review the mission milestone plan before approval and execution
+---
+You are in PI MISSION MODE REVIEWER MODE.
+Use read-only tools only. Do not edit, write, or run bash. Review the Mission milestone plan before Mission approval and execution. Reviewer is not validation. Reviewer checks whether the mission plan is safe, complete, properly scoped, and has validation-ready milestones before executor work begins.
+Review checklist:
+- Milestones are parser-safe, ordered, and scoped to the mission goal.
+- Each milestone has clear objective, steps, acceptance criteria, required evidence, and risks.
+- The mission plan does not authorize destructive, secret, auth/session/log/runtime-state, database, deployment, push, or out-of-scope work without explicit approval.
+- Validation strategy is strong enough for per-milestone validation and optional final comprehensive validation.
+- Autonomy and pause/continue behavior are safe for the mission scope.
+- Any repair recommendation must revise the mission plan only; do not execute.
+Output exactly:
+# Review Report
+## Verdict
+PASS — plan is complete, safe, scoped correctly, and ready for approval.
+NOTES — plan is acceptable but has non-blocking observations for the executor.
+NEEDS REPAIR — plan has concrete gaps that should be repaired before approval (missing requirements, unclear milestones, insufficient validation).
+FAIL — plan has serious issues that block safe execution (missing safety constraints, out-of-scope work, broken dependencies).
+BLOCKED — plan cannot proceed without external resolution (missing credentials, unavailable services, blocked dependencies).
+Verdict criteria:
+- PASS only when: no repairable issues remain and the plan is ready for approval.
+- NOTES when: plan is sound but has minor observations the executor should consider.
+- NEEDS REPAIR when: milestones lack acceptance criteria, validation plan is weak, scope is unclear, or concrete missing requirements are identified.
+- FAIL when: plan authorizes destructive/secret/auth/database/deploy/push work without explicit approval, or safety constraints are absent.
+- BLOCKED when: plan requires unavailable resources or external dependencies that cannot be resolved by repair.
+## Reason
+## Mission Plan Coverage
+## Milestone Quality
+## Validation Plan Review
+## Safety And Scope Review
+## Missing Requirements
+## Repairable Plan Issues
+## Regression Risks
+## Recommended Next Action
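
The prompt above fixes five verdict labels (PASS, NOTES, NEEDS REPAIR, FAIL, BLOCKED) under a `## Verdict` heading. One way a consumer could read the label back out of a finished report is sketched below in TypeScript. `parseVerdict` is a hypothetical helper written for illustration. The package itself receives the verdict through the `workflow_review_result` tool call, and that handling is not shown in this part of the diff.

```typescript
// The five labels listed in the prompt's "Output exactly" template.
const VERDICT_LABELS = ["PASS", "NOTES", "NEEDS REPAIR", "FAIL", "BLOCKED"] as const;
type Verdict = (typeof VERDICT_LABELS)[number];

// Hypothetical helper: returns the label on the first non-empty line after
// the "## Verdict" heading, or null if the heading or a known label is missing.
function parseVerdict(report: string): Verdict | null {
  const lines = report.split(/\r?\n/).map((line) => line.trim());
  const headingIndex = lines.findIndex((line) => line.toLowerCase() === "## verdict");
  if (headingIndex === -1) return null;

  for (const line of lines.slice(headingIndex + 1)) {
    if (line === "") continue;
    // Reaching the next heading means the verdict section was empty.
    if (line.startsWith("#")) return null;
    // Accept the bare label, or the label followed by a space and explanation.
    const label = VERDICT_LABELS.find((v) => line === v || line.startsWith(`${v} `));
    return label ?? null;
  }
  return null;
}
```

Matching on the start of the line means unlisted labels such as `APPROVED` or `PARTIAL PASS` yield `null` rather than being misread as `PASS`.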

package/config/prompts/workflow-reviewer-prompt.md ADDED

@@ -0,0 +1,44 @@
+CRITICAL: Call workflow_review_result as your FIRST action in this turn. Do not output any text, analysis, or diagrams before the tool call. After the tool executes and returns, include a workflow_diagram to visualize your review findings (architecture concerns, risk flow, or recommendation path) with concise prose. Place the diagram inline -- not batched at the end.
+---
+description: Review the approved plan before execution
+---
+You are in PI WORKFLOW REVIEWER MODE.
+Use read-only tools only. Do not edit, write, or run bash. Review the approved plan before execution for scope, risk, missing requirements, and files that should remain untouched.
+Reviewer is not validation. Reviewer checks whether the plan or implementation approach is safe, complete, and aligned before execution. Validation checks whether work passes after or during implementation.
+Review checklist:
+- Plan scope is clear, bounded, and aligned with the user's request.
+- Implementation steps are ordered correctly with no circular dependencies.
+- Required files are identified and files to avoid are listed.
+- Validation strategy covers all deliverables with concrete acceptance criteria.
+- Risk assessment covers security, data loss, breaking changes, and deployment concerns.
+- The plan does not authorize destructive, secret, auth/session/log/runtime-state, database, deployment, push, or out-of-scope work without explicit approval.
+- Test and build verification is included where applicable.
+Output exactly:
+# Reviewer Report
+## Verdict
+PASS — plan is complete, safe, properly scoped, and ready for execution.
+NOTES — plan is sound with non-blocking observations for the executor.
+NEEDS REPAIR — plan has concrete gaps (missing steps, unclear files, weak validation, scope creep, risks not addressed).
+FAIL — plan has serious blockers (safety violations, missing security constraints, broken dependencies, impossible steps).
+BLOCKED — plan cannot proceed without external resolution.
+Do not write APPROVED, APPROVE, OK, or PROCEED as the verdict label.
+Verdict criteria:
+- PASS only when: all checklist items are satisfied and no repairable issues remain.
+- NOTES when: minor observations exist (suggested file order, additional test ideas, optional improvements).
+- NEEDS REPAIR when: concrete missing requirements, unclear scope boundaries, insufficient validation, or unaddressed risks.
+- FAIL when: safety/security violations, circular dependencies, impossible steps, or work that exceeds approved scope without authorization.
+- BLOCKED when: plan requires unavailable resources or external dependencies that cannot be resolved by repair.
+## Reason
+## Scope Risks
+## Missing Information
+## Files To Be Careful With
+## Required Plan Revisions
+## Recommended Execution Notes

package/extensions/workflow-model-router.ts CHANGED

@@ -106,6 +106,8 @@ export interface WorkflowSettings {
     clarificationQualityGate?: boolean;
     allowClarificationWithoutAnalysis?: boolean;
     useSubagentsBeforeClarification?: boolean;
+    maxTokens?: number;
+    maxRuntimeHours?: number;
   };
   workflow: {
     requirePlanApprovalBeforeExecute: boolean;
@@ -138,6 +140,7 @@ export interface WorkflowSettings {
     planHistoryLimit?: number;
     planProgressEnabled?: boolean;
     planRuntimeEnabled?: boolean;
+    planShowProgressBar?: boolean;
   };
   standard: {
     enabled: boolean;
@@ -160,11 +163,13 @@ export interface WorkflowSettings {
     useStandardSpecificModels?: boolean;
     modelRole?: StandardModelRole;
     models?: Record<WorkflowRole, RoleModelSettings>;
+    maxTokens?: number;
   };
   missions: {
     enabled: boolean;
     defaultAutonomy: MissionAutonomy;
     maxRuntimeHours: number;
+    maxTokens?: number;
     checkpointIntervalMinutes: number;
     requireApprovalForDestructiveActions: boolean;
     requireValidationPerMilestone: boolean;
@@ -214,6 +219,7 @@ export interface WorkflowSettings {
     disableBashInPlanMode: boolean;
     disableBashInValidatorMode: boolean;
     blockDestructiveCommands: boolean;
+    allowPackageInstallInExecution: boolean;
   };
   ui: {
     showWorkflowStatus: boolean;
@@ -242,6 +248,7 @@ export interface WorkflowSettings {
     customBrandEnabled?: boolean;
     customBrandText?: string;
     customBrandBaseVisual?: CustomBrandBaseVisual;
+    debugPlanStepTracking?: boolean;
   };
   subagents: WorkflowSubagentSettings;
   context: {
@@ -365,6 +372,7 @@ const BUILTIN_DEFAULT_WORKFLOW_SETTINGS = {
     "planHistoryLimit": 50,
     "planProgressEnabled": true,
     "planRuntimeEnabled": true,
+    "planShowProgressBar": true,
     "requireApprovalBeforeExecution": true,
     "requireApprovalPerStep": false,
     "validateAfterEachStep": false,
@@ -415,7 +423,8 @@ const BUILTIN_DEFAULT_WORKFLOW_SETTINGS = {
         "model": null,
         "thinkingLevel": "xhigh"
       }
-    }
+    },
+    "maxTokens": 0
   },
   "missions": {
     "enabled": true,
@@ -429,9 +438,9 @@ const BUILTIN_DEFAULT_WORKFLOW_SETTINGS = {
     "autoRunAfterApproval": true,
     "offerReviewerBeforeApprove": false,
     "autoRunReviewerBeforeApprove": false,
-    "autoRepairReviewFailures": false,
-    "reviewRetryMode": "off",
-    "maxReviewRetriesPerMission": 0,
+    "autoRepairReviewFailures": true,
+    "reviewRetryMode": "safe_only",
+    "maxReviewRetriesPerMission": 2,
     "continueAcrossMilestones": true,
     "pauseBetweenMilestones": false,
     "progressWidgetEnabled": true,
@@ -488,13 +497,15 @@ const BUILTIN_DEFAULT_WORKFLOW_SETTINGS = {
     "clarificationTiming": "after_initial_analysis",
     "clarificationQualityGate": true,
     "allowClarificationWithoutAnalysis": false,
-    "useSubagentsBeforeClarification": true
+    "useSubagentsBeforeClarification": true,
+    "maxTokens": 0
   },
   "safety": {
-    "repoLockEnabled": false,
-    "disableBashInPlanMode": true,
+    "repoLockEnabled": true,
+    "disableBashInPlanMode": false,
     "disableBashInValidatorMode": true,
-    "blockDestructiveCommands": true
+    "blockDestructiveCommands": true,
+    "allowPackageInstallInExecution": true
   },
   "ui": {
     "showWorkflowStatus": true,
@@ -522,7 +533,8 @@ const BUILTIN_DEFAULT_WORKFLOW_SETTINGS = {
     "startupVisualOnSessionStart": true,
     "customBrandEnabled": false,
     "customBrandText": "",
-    "customBrandBaseVisual": "mission_control"
+    "customBrandBaseVisual": "mission_control",
+    "debugPlanStepTracking": false
   },
   "shortcuts": {
     "planMode": null
@@ -572,7 +584,9 @@ const BUILTIN_DEFAULT_WORKFLOW_SETTINGS = {
     "clarificationTiming": "after_initial_analysis",
     "clarificationQualityGate": true,
     "allowClarificationWithoutAnalysis": false,
-    "useSubagentsBeforeClarification": true
+    "useSubagentsBeforeClarification": true,
+    "maxTokens": 0,
+    "maxRuntimeHours": 0
   },
   "context": {
     "compactionMode": "pi_default",
@@ -869,7 +883,7 @@ export function builtInWorkflowPresets(): Record<string, WorkflowPresetBundle> {
       displayName: "Simple",
       description: "Fast end-to-end Plan/Mission/Standard workflow with minimal ceremony, automatic validation when work runs, low safe repair retries, and one-worker sub-agent support.",
       planning: { depth: "fast", clarificationMode: "auto", maxClarificationQuestions: 2, interactiveClarificationEnabled: true, clarificationQualityGate: false, useSubagentsBeforeClarification: true },
-      workflow: { offerReviewerBeforeExecute: false, autoRunReviewerBeforeExecute: false, offerValidationAfterExecute: true, autoRunValidationAfterExecute: true, validateAfterExecution: true, requirePlanApprovalBeforeExecute: false, requireApprovalBeforeExecution: false, autoRepairReviewFailures: false, autoRepairValidationFailures: true, reviewRetryMode: "off", validationRetryMode: "safe_only", maxReviewRetriesPerPlan: 0, maxReviewRetriesPerWorkflow: 0, maxValidationRetriesPerPlan: 1, maxValidationRetriesPerWorkflow: 2, pauseAfterReviewFailure: true, pauseAfterValidationFailure: false, planProgressEnabled: true, planRuntimeEnabled: true },
+      workflow: { offerReviewerBeforeExecute: false, autoRunReviewerBeforeExecute: false, offerValidationAfterExecute: true, autoRunValidationAfterExecute: true, validateAfterExecution: true, requirePlanApprovalBeforeExecute: false, requireApprovalBeforeExecution: false, autoRepairReviewFailures: false, autoRepairValidationFailures: true, reviewRetryMode: "off", validationRetryMode: "safe_only", maxReviewRetriesPerPlan: 0, maxReviewRetriesPerWorkflow: 0, maxValidationRetriesPerPlan: 1, maxValidationRetriesPerWorkflow: 2, pauseAfterReviewFailure: true, pauseAfterValidationFailure: false, planProgressEnabled: true, planRuntimeEnabled: true, planShowProgressBar: true },
       standard: { enabled: true, autoTodoEnabled: true, todoProgressVisible: true, todoTriggerMode: "auto", clarificationEnabled: true, clarificationMode: "auto", maxClarificationQuestions: 1, interactiveClarificationEnabled: true, clarificationTiming: "after_initial_analysis", clarificationQualityGate: false, allowClarificationWithoutAnalysis: false, useSubagentsBeforeClarification: false, allowSubagents: true, subagentScope: "user", subagents: { planningPolicy: "forced", executionPolicy: "forced", repairPolicy: "forced", reviewPolicy: "auto", validationPolicy: "forced", autoUseDuringPlanning: true, autoUseDuringExecution: true, autoUseDuringRepair: true, autoUseDuringReview: true, autoUseDuringValidation: true, minPlanningWorkersForDeep: 1, minPlanningWorkersForMaximum: 1, minExecutionWorkersForDeep: 1, minExecutionWorkersForMaximum: 1, minRepairWorkersForDeep: 1, minRepairWorkersForMaximum: 1, minReviewWorkersForDeep: 1, minReviewWorkersForMaximum: 1, minValidationWorkersForDeep: 1, minValidationWorkersForMaximum: 1 }, statusWidgetVisible: true, useSharedExecutorModel: true, useStandardSpecificModels: false, modelRole: "executor" },
       missions: { defaultAutonomy: "approval_gated", requireValidationPerMilestone: true, autoRunAfterApproval: true, continueAcrossMilestones: true, pauseBetweenMilestones: false, clarificationMode: "auto", maxClarificationQuestions: 2, planningDepth: "fast", useSubagentsBeforeClarification: true, autoRepairValidationFailures: true, validationRetryMode: "safe_only", maxValidationRetriesPerMilestone: 1, maxValidationRetriesPerMission: 2, finalValidationEnabled: false, autoRepairFinalValidationFailures: false, maxFinalValidationRetries: 0, subagentPolicy: "forced", minWorkersForDeep: 1, minWorkersForMaximum: 1 },
       subagents: { planningPolicy: "forced", executionPolicy: "forced", repairPolicy: "forced", reviewPolicy: "auto", validationPolicy: "forced", autoUseDuringPlanning: true, autoUseDuringExecution: true, autoUseDuringRepair: true, autoUseDuringReview: true, autoUseDuringValidation: true, minPlanningWorkersForDeep: 1, minPlanningWorkersForMaximum: 1, minExecutionWorkersForDeep: 1, minExecutionWorkersForMaximum: 1, minRepairWorkersForDeep: 1, minRepairWorkersForMaximum: 1, minReviewWorkersForDeep: 1, minReviewWorkersForMaximum: 1, minValidationWorkersForDeep: 1, minValidationWorkersForMaximum: 1 },
@@ -878,7 +892,7 @@ export function builtInWorkflowPresets(): Record<string, WorkflowPresetBundle> {
       displayName: "Standard",
       description: "Default end-to-end workflow with useful clarification, automatic execution/validation after approval, safe repair retries, and balanced worker support.",
       planning: { depth: "standard", clarificationMode: "auto", maxClarificationQuestions: 3, interactiveClarificationEnabled: true, clarificationQualityGate: true, useSubagentsBeforeClarification: true },
-      workflow: { offerReviewerBeforeExecute: false, autoRunReviewerBeforeExecute: false, offerValidationAfterExecute: true, autoRunValidationAfterExecute: true, validateAfterExecution: true, requirePlanApprovalBeforeExecute: false, requireApprovalBeforeExecution: false, autoRepairReviewFailures: true, autoRepairValidationFailures: true, reviewRetryMode: "safe_only", validationRetryMode: "safe_only", maxReviewRetriesPerPlan: 2, maxReviewRetriesPerWorkflow: 4, maxValidationRetriesPerPlan: 2, maxValidationRetriesPerWorkflow: 4, pauseAfterReviewFailure: false, pauseAfterValidationFailure: false, planProgressEnabled: true, planRuntimeEnabled: true },
+      workflow: { offerReviewerBeforeExecute: false, autoRunReviewerBeforeExecute: false, offerValidationAfterExecute: true, autoRunValidationAfterExecute: true, validateAfterExecution: true, requirePlanApprovalBeforeExecute: false, requireApprovalBeforeExecution: false, autoRepairReviewFailures: true, autoRepairValidationFailures: true, reviewRetryMode: "safe_only", validationRetryMode: "safe_only", maxReviewRetriesPerPlan: 2, maxReviewRetriesPerWorkflow: 4, maxValidationRetriesPerPlan: 2, maxValidationRetriesPerWorkflow: 4, pauseAfterReviewFailure: false, pauseAfterValidationFailure: false, planProgressEnabled: true, planRuntimeEnabled: true, planShowProgressBar: true },
       standard: { enabled: true, autoTodoEnabled: true, todoProgressVisible: true, todoTriggerMode: "auto", clarificationEnabled: true, clarificationMode: "auto", maxClarificationQuestions: 1, interactiveClarificationEnabled: true, clarificationTiming: "after_initial_analysis", clarificationQualityGate: true, allowClarificationWithoutAnalysis: false, useSubagentsBeforeClarification: false, allowSubagents: true, subagentScope: "user", subagents: { planningPolicy: "forced", executionPolicy: "forced", repairPolicy: "forced", reviewPolicy: "forced", validationPolicy: "forced", autoUseDuringPlanning: true, autoUseDuringExecution: true, autoUseDuringRepair: true, autoUseDuringReview: true, autoUseDuringValidation: true, minPlanningWorkersForDeep: 1, minPlanningWorkersForMaximum: 1, minExecutionWorkersForDeep: 2, minExecutionWorkersForMaximum: 2, minRepairWorkersForDeep: 2, minRepairWorkersForMaximum: 2, minReviewWorkersForDeep: 2, minReviewWorkersForMaximum: 2, minValidationWorkersForDeep: 2, minValidationWorkersForMaximum: 2 }, statusWidgetVisible: true, useSharedExecutorModel: true, useStandardSpecificModels: false, modelRole: "executor" },
       missions: { defaultAutonomy: "approval_gated", requireValidationPerMilestone: true, autoRunAfterApproval: true, continueAcrossMilestones: true, pauseBetweenMilestones: false, clarificationMode: "auto", maxClarificationQuestions: 3, planningDepth: "standard", useSubagentsBeforeClarification: true, autoRepairValidationFailures: true, validationRetryMode: "safe_only", maxValidationRetriesPerMilestone: 2, maxValidationRetriesPerMission: 6, finalValidationEnabled: false, autoRepairFinalValidationFailures: false, maxFinalValidationRetries: 1, subagentPolicy: "forced", minWorkersForDeep: 1, minWorkersForMaximum: 1 },
       subagents: { planningPolicy: "forced", executionPolicy: "forced", repairPolicy: "forced", reviewPolicy: "forced", validationPolicy: "forced", autoUseDuringPlanning: true, autoUseDuringExecution: true, autoUseDuringRepair: true, autoUseDuringReview: true, autoUseDuringValidation: true, minPlanningWorkersForDeep: 1, minPlanningWorkersForMaximum: 1, minExecutionWorkersForDeep: 2, minExecutionWorkersForMaximum: 2, minRepairWorkersForDeep: 2, minRepairWorkersForMaximum: 2, minReviewWorkersForDeep: 2, minReviewWorkersForMaximum: 2, minValidationWorkersForDeep: 2, minValidationWorkersForMaximum: 2 },
@@ -887,7 +901,7 @@ export function builtInWorkflowPresets(): Record<string, WorkflowPresetBundle> {
       displayName: "Deep",
       description: "Careful end-to-end workflow for risky or codebase-heavy work with stronger clarification, automatic review/validation, final mission validation, and larger worker teams.",
       planning: { depth: "deep", clarificationMode: "always_for_nontrivial", maxClarificationQuestions: 5, interactiveClarificationEnabled: true, clarificationQualityGate: true, useSubagentsBeforeClarification: true },
-      workflow: { offerReviewerBeforeExecute: false, autoRunReviewerBeforeExecute: true, offerValidationAfterExecute: true, autoRunValidationAfterExecute: true, validateAfterExecution: true, requirePlanApprovalBeforeExecute: false, requireApprovalBeforeExecution: false, autoRepairReviewFailures: true, autoRepairValidationFailures: true, reviewRetryMode: "safe_only", validationRetryMode: "safe_only", maxReviewRetriesPerPlan: 3, maxReviewRetriesPerWorkflow: 6, maxValidationRetriesPerPlan: 3, maxValidationRetriesPerWorkflow: 6, pauseAfterReviewFailure: false, pauseAfterValidationFailure: false, planProgressEnabled: true, planRuntimeEnabled: true },
+      workflow: { offerReviewerBeforeExecute: false, autoRunReviewerBeforeExecute: true, offerValidationAfterExecute: true, autoRunValidationAfterExecute: true, validateAfterExecution: true, requirePlanApprovalBeforeExecute: false, requireApprovalBeforeExecution: false, autoRepairReviewFailures: true, autoRepairValidationFailures: true, reviewRetryMode: "safe_only", validationRetryMode: "safe_only", maxReviewRetriesPerPlan: 3, maxReviewRetriesPerWorkflow: 6, maxValidationRetriesPerPlan: 3, maxValidationRetriesPerWorkflow: 6, pauseAfterReviewFailure: false, pauseAfterValidationFailure: false, planProgressEnabled: true, planRuntimeEnabled: true, planShowProgressBar: true },
       standard: { enabled: true, autoTodoEnabled: true, todoProgressVisible: true, todoTriggerMode: "required", clarificationEnabled: true, clarificationMode: "always_for_nontrivial", maxClarificationQuestions: 2, interactiveClarificationEnabled: true, clarificationTiming: "after_initial_analysis", clarificationQualityGate: true, allowClarificationWithoutAnalysis: false, useSubagentsBeforeClarification: true, allowSubagents: true, subagentScope: "user", subagents: { planningPolicy: "forced", executionPolicy: "forced", repairPolicy: "forced", reviewPolicy: "forced", validationPolicy: "forced", autoUseDuringPlanning: true, autoUseDuringExecution: true, autoUseDuringRepair: true, autoUseDuringReview: true, autoUseDuringValidation: true, minPlanningWorkersForDeep: 2, minPlanningWorkersForMaximum: 2, minExecutionWorkersForDeep: 3, minExecutionWorkersForMaximum: 3, minRepairWorkersForDeep: 2, minRepairWorkersForMaximum: 2, minReviewWorkersForDeep: 3, minReviewWorkersForMaximum: 3, minValidationWorkersForDeep: 3, minValidationWorkersForMaximum: 3 }, statusWidgetVisible: true, useSharedExecutorModel: true, useStandardSpecificModels: false, modelRole: "executor" },
       missions: { defaultAutonomy: "approval_gated", requireValidationPerMilestone: true, autoRunAfterApproval: true, continueAcrossMilestones: true, pauseBetweenMilestones: false, clarificationMode: "always_for_nontrivial", maxClarificationQuestions: 5, planningDepth: "deep", useSubagentsBeforeClarification: true, autoRepairValidationFailures: true, validationRetryMode: "safe_only", maxValidationRetriesPerMilestone: 3, maxValidationRetriesPerMission: 8, finalValidationEnabled: true, autoRepairFinalValidationFailures: true, maxFinalValidationRetries: 2, subagentPolicy: "forced", minWorkersForDeep: 3, minWorkersForMaximum: 3 },
       subagents: { planningPolicy: "forced", executionPolicy: "forced", repairPolicy: "forced", reviewPolicy: "forced", validationPolicy: "forced", autoUseDuringPlanning: true, autoUseDuringExecution: true, autoUseDuringRepair: true, autoUseDuringReview: true, autoUseDuringValidation: true, minPlanningWorkersForDeep: 3, minPlanningWorkersForMaximum: 3, minExecutionWorkersForDeep: 3, minExecutionWorkersForMaximum: 3, minRepairWorkersForDeep: 2, minRepairWorkersForMaximum: 2, minReviewWorkersForDeep: 3, minReviewWorkersForMaximum: 3, minValidationWorkersForDeep: 3, minValidationWorkersForMaximum: 3 },
@@ -896,7 +910,7 @@ export function builtInWorkflowPresets(): Record<string, WorkflowPresetBundle> {
       displayName: "Maximum",
       description: "Highest-rigor end-to-end workflow with strongest clarification, automatic review/validation, final mission validation, aggressive in-scope repair, and maximum worker teams.",
       planning: { depth: "maximum", clarificationMode: "always_for_nontrivial", maxClarificationQuestions: 5, interactiveClarificationEnabled: true, clarificationQualityGate: true, useSubagentsBeforeClarification: true },
-      workflow: { offerReviewerBeforeExecute: false, autoRunReviewerBeforeExecute: true, offerValidationAfterExecute: true, autoRunValidationAfterExecute: true, validateAfterExecution: true, requirePlanApprovalBeforeExecute: false, requireApprovalBeforeExecution: false, autoRepairReviewFailures: true, autoRepairValidationFailures: true, reviewRetryMode: "aggressive_within_scope", validationRetryMode: "aggressive_within_scope", maxReviewRetriesPerPlan: 5, maxReviewRetriesPerWorkflow: 8, maxValidationRetriesPerPlan: 5, maxValidationRetriesPerWorkflow: 8, pauseAfterReviewFailure: false, pauseAfterValidationFailure: false, planProgressEnabled: true, planRuntimeEnabled: true },
+      workflow: { offerReviewerBeforeExecute: false, autoRunReviewerBeforeExecute: true, offerValidationAfterExecute: true, autoRunValidationAfterExecute: true, validateAfterExecution: true, requirePlanApprovalBeforeExecute: false, requireApprovalBeforeExecution: false, autoRepairReviewFailures: true, autoRepairValidationFailures: true, reviewRetryMode: "aggressive_within_scope", validationRetryMode: "aggressive_within_scope", maxReviewRetriesPerPlan: 5, maxReviewRetriesPerWorkflow: 8, maxValidationRetriesPerPlan: 5, maxValidationRetriesPerWorkflow: 8, pauseAfterReviewFailure: false, pauseAfterValidationFailure: false, planProgressEnabled: true, planRuntimeEnabled: true, planShowProgressBar: true },
       standard: { enabled: true, autoTodoEnabled: true, todoProgressVisible: true, todoTriggerMode: "required", clarificationEnabled: true, clarificationMode: "always_for_nontrivial", maxClarificationQuestions: 2, interactiveClarificationEnabled: true, clarificationTiming: "after_initial_analysis", clarificationQualityGate: true, allowClarificationWithoutAnalysis: false, useSubagentsBeforeClarification: true, allowSubagents: true, subagentScope: "user", subagents: { planningPolicy: "forced", executionPolicy: "forced", repairPolicy: "forced", reviewPolicy: "forced", validationPolicy: "forced", autoUseDuringPlanning: true, autoUseDuringExecution: true, autoUseDuringRepair: true, autoUseDuringReview: true, autoUseDuringValidation: true, minPlanningWorkersForDeep: 3, minPlanningWorkersForMaximum: 3, minExecutionWorkersForDeep: 4, minExecutionWorkersForMaximum: 4, minRepairWorkersForDeep: 3, minRepairWorkersForMaximum: 3, minReviewWorkersForDeep: 4, minReviewWorkersForMaximum: 4, minValidationWorkersForDeep: 4, minValidationWorkersForMaximum: 4 }, statusWidgetVisible: true, useSharedExecutorModel: true, useStandardSpecificModels: false, modelRole: "executor" },
       missions: { defaultAutonomy: "supervised_auto", requireValidationPerMilestone: true, autoRunAfterApproval: true, continueAcrossMilestones: true, pauseBetweenMilestones: false, clarificationMode: "always_for_nontrivial", maxClarificationQuestions: 6, planningDepth: "maximum", useSubagentsBeforeClarification: true, autoRepairValidationFailures: true, validationRetryMode: "aggressive_within_scope", maxValidationRetriesPerMilestone: 4, maxValidationRetriesPerMission: 12, finalValidationEnabled: true, autoRepairFinalValidationFailures: true, maxFinalValidationRetries: 4, subagentPolicy: "forced", minWorkersForDeep: 4, minWorkersForMaximum: 4 },
       subagents: { planningPolicy: "forced", executionPolicy: "forced", repairPolicy: "forced", reviewPolicy: "forced", validationPolicy: "forced", autoUseDuringPlanning: true, autoUseDuringExecution: true, autoUseDuringRepair: true, autoUseDuringReview: true, autoUseDuringValidation: true, minPlanningWorkersForDeep: 4, minPlanningWorkersForMaximum: 4, minExecutionWorkersForDeep: 4, minExecutionWorkersForMaximum: 4, minRepairWorkersForDeep: 3, minRepairWorkersForMaximum: 3, minReviewWorkersForDeep: 4, minReviewWorkersForMaximum: 4, minValidationWorkersForDeep: 4, minValidationWorkersForMaximum: 4 },