npm - sagaz-ai - Versions diffs - 0.1.5 → 0.2.0 - Mend

sagaz-ai 0.1.5 → 0.2.0

Files changed (68)

package/CHANGELOG.md ADDED

@@ -0,0 +1,67 @@
+# Changelog
+## [0.2.0] - 2026-06-08
+### Release Type
+Minor
+### Added
+- Formal workflow contracts and handoff validation across Sagaz workflows.
+- Stronger durable workflow state contract and run-state template.
+- Task-first contracts for reusable Sagaz task definitions.
+- Internal ecosystem manifest and dependency graph validation.
+- Component governance protocol for creating, updating, renaming, deprecating, and removing ecosystem components.
+- Stronger Sagaz evaluation suite with scenario IDs, required evidence, scoring, and release gates.
+- Release/versioning gate for version bumps, tags, GitHub releases, and npm publishes.
+- GitHub Actions enforcement across Linux, Windows, and macOS package checks.
+- Formal changelog and release notes templates.
+- Installed skill synchronization protocol and `npx sagaz-ai sync` command.
+### Changed
+- `npm test` now validates workflow contracts, task contracts, manifest coverage, dependency graph integrity, release gates, GitHub Actions enforcement, evaluation coverage, and release artifact templates.
+- `npx sagaz-ai doctor` now checks whether the installed Codex Desktop skill is synchronized with the source skill.
+- README and install documentation now include system requirements, Node.js guidance, platform notes, and skill sync instructions.
+- Package release policy now requires manifest validation, dependency graph validation, evaluation evidence, changelog or release notes, and installed skill sync evidence.
+### Fixed
+- Corrected malformed Markdown code fences in several protocol files.
+- Added missing references so critical protocols remain reachable from the dependency graph.
+### Removed
+- None.
+### Security
+- Release and GitHub operations now require clearer approval gates before publishing, tagging, pushing, or releasing.
+### Compatibility
+- Windows: supported through Codex Desktop and validated by GitHub Actions on `windows-latest`.
+- macOS: supported through Codex Desktop and validated by GitHub Actions on `macos-latest`.
+- Node.js: package baseline remains `>=22.14`; GitHub Actions use Node.js 24.
+- Codex Desktop: Sagaz remains a Codex Desktop skill, not a standalone terminal agent runtime.
+### Migration Notes
+- Existing users should run `npx sagaz-ai sync` or `npx sagaz-ai install --force` to refresh the installed Codex Desktop skill.
+- Open a new Codex Desktop thread after syncing so the updated skill can be discovered.
+### Verification
+- npm test: passed locally on Windows.
+- npm run doctor: passed locally on Windows with `Synchronized with source: yes`.
+- npm pack --dry-run: passed locally on Windows after allowing npm cache access outside the sandbox.
+- Evaluation scenarios: covered by the strengthened `evals/sagaz-evaluation-suite.md` contract.
+### Release Evidence
+- Commit: pending.
+- Tag: pending.
+- GitHub release: pending.
+- npm package: pending.

package/INSTALL.md CHANGED

@@ -72,6 +72,7 @@ Sagaz: explain the available workflows.
 ```powershell
 npx sagaz-ai status
 npx sagaz-ai doctor
+npx sagaz-ai sync
 npx sagaz-ai install --force
 ```
@@ -80,9 +81,30 @@ npx sagaz-ai install --force
 ```bash
 npx sagaz-ai status
 npx sagaz-ai doctor
+npx sagaz-ai sync
 npx sagaz-ai install --force
 ```
+## Sync The Installed Skill
+When this repository changes, refresh the installed Codex Desktop skill before relying on new Sagaz behavior.
+Windows PowerShell:
+```powershell
+npx sagaz-ai sync
+npx sagaz-ai doctor
+```
+macOS Terminal:
+```bash
+npx sagaz-ai sync
+npx sagaz-ai doctor
+```
+Then open a new Codex Desktop thread so the updated skill is discovered.
 ## Manual Install
 Copy the Sagaz skill folder from the repository.

package/README.md CHANGED

@@ -43,6 +43,7 @@ Sagaz also guides the user through the process. At the end of each phase, it exp
 - **Static site discipline:** hand-built static sites use clean directory URLs by default, GitHub Pages-ready files, and a practical SEO baseline.
 - **Sagaz evaluations:** scenario-based checks help prevent regressions in the orchestration system itself.
 - **Compatibility audits:** Sagaz can check whether Windows, macOS, npm, Node.js, Codex Desktop, AI model behavior, GitHub, package contents, or external platform changes require a Sagaz update.
+- **Future-change safety:** generated projects include detailed documentation for future refactors, improvements, feature additions, design consistency, UX preservation, invariants, and regression checks.
 ## How It Works
@@ -75,6 +76,37 @@ Key areas:
 - `brownfield-refactor-safe`: refactor an existing project safely.
 - `bugfix-to-release`: fix a bug through verification and release.
+## System Requirements
+Install these before using Sagaz:
+- **Codex Desktop:** required. Sagaz is designed to run as a Codex Desktop skill, not as a standalone terminal agent.
+- **Node.js and npm:** required for the recommended `npx sagaz-ai install` flow. Use Node.js `22.14+` at minimum; Node.js `24 LTS` is preferred for new installations.
+- **Git:** recommended for cloning this repository, inspecting changes, and using Sagaz GitHub workflows.
+- **Operating system:** Windows or macOS with access to the local Codex skills folder.
+Optional but recommended for common Sagaz workflows:
+- **GitHub CLI (`gh`):** needed for guided GitHub operations such as authentication, pull requests, checks, issues, releases, and repository automation.
+- **Project runtime tools:** install the tools required by the project Sagaz will work on, such as `pnpm`, `yarn`, `bun`, Python, Java, Android Studio, Xcode, Expo/EAS, or database CLIs when that project needs them.
+- **Browser or web testing tools:** useful for visual QA, Playwright flows, accessibility checks, and local web app verification.
+- **Design/tool connectors:** optional connectors such as Figma MCP can be used when available for app-like mockups, design systems, and visual QA.
+Verify the core local tools:
+```bash
+node --version
+npm --version
+git --version
+```
+Verify GitHub CLI only if you want GitHub Ops:
+```bash
+gh --version
+gh auth status
+```
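The version requirement above (`22.14+` at minimum) can be checked mechanically against the output of `node --version`. The following is a minimal sketch, not part of the Sagaz package: the function name `meets_node_baseline` and the `MINIMUM` constant are hypothetical, and it assumes the usual `vMAJOR.MINOR.PATCH` output format.

```python
import re

# Documented package baseline (`>=22.14`); the constant name is illustrative only.
MINIMUM = (22, 14)


def meets_node_baseline(version_output, minimum=MINIMUM):
    """Return True if `node --version` output (e.g. 'v22.14.0') is at least `minimum`."""
    match = re.match(r"v?(\d+)\.(\d+)", version_output.strip())
    if match is None:
        raise ValueError(f"unrecognized Node.js version: {version_output!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison orders by major version first, then minor.
    return (major, minor) >= minimum


for sample in ("v22.14.0", "v22.13.1", "v24.0.0"):
    print(sample, meets_node_baseline(sample))
```

Comparing `(major, minor)` tuples avoids the string-comparison trap where `"22.9"` would sort above `"22.14"`.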
 ## Installation In Codex Desktop
 ### Recommended: Install With npx
@@ -105,6 +137,7 @@ Check installation:
 ```powershell
 npx sagaz-ai status
 npx sagaz-ai doctor
+npx sagaz-ai sync
 ```
 #### macOS Terminal
@@ -124,8 +157,11 @@ Check installation:
 ```bash
 npx sagaz-ai status
 npx sagaz-ai doctor
+npx sagaz-ai sync
 ```
+Use `npx sagaz-ai sync` after updating this repository or package to refresh the installed Codex Desktop skill. Sagaz is rediscovered only in a new Codex Desktop thread.
 Then open a new Codex Desktop thread and run:
 ```text
@@ -206,6 +242,8 @@ Sagaz should choose the appropriate workflow, create or update persistent run st
 For production-grade work, Sagaz can also apply SRE readiness, DORA metrics, secure SDLC, dependency governance, data privacy lifecycle, architecture fitness functions, API contracts, performance budgets, accessibility compliance, database migrations, release strategy, and AI application quality protocols.
+For medium, large, production, web, mobile, refactor, or feature-extension work, Sagaz should create or update a future-change guide covering product intent, architecture, design system, UX rules, components, invariants, testing, safe refactor procedure, safe feature-addition procedure, deployment, and known risks.
 For tool-heavy work, Sagaz uses a tool registry to verify local availability and recommend the right connector or platform before asking permission to install, authenticate, deploy, publish, or modify external resources.
 For common project types, Sagaz can start from documented stack presets such as Next.js on Vercel, React with Vite, Expo mobile, React Native, Supabase, Firebase, Node APIs, static sites, and admin dashboards. For hand-built static sites, Sagaz should default to clean URLs through directory `index.html` files and verify SEO essentials including canonical URLs, Open Graph/Twitter metadata, Schema.org JSON-LD, sitemap, robots, optimized images, and GitHub Pages files when applicable.

package/RELEASE_NOTES.md ADDED

@@ -0,0 +1,84 @@
+# Release Notes
+## Release
+Version: 0.2.0
+Date: 2026-06-08
+Release type: Minor
+GitHub commit: pending
+Git tag: pending
+GitHub release: pending
+npm package: pending
+## Summary
+Sagaz 0.2.0 strengthens the orchestration ecosystem around formal workflows, handoffs, task contracts, registry validation, release gates, GitHub Actions enforcement, evaluation coverage, and installed skill synchronization.
+## Audience Impact
+- New users: clearer README, requirements, installation guidance, and sync instructions.
+- Existing users: should refresh the installed skill with `npx sagaz-ai sync`.
+- Maintainers: stronger `npm test` coverage catches manifest, dependency graph, release, workflow, task, and evaluation drift.
+- Design team: Figma MCP and app-like mockup guidance are now part of Sagaz operating rules.
+- Engineering team: workflows now behave more like formal contracts with gates, state, handoffs, and evidence.
+## What Changed
+- Added `manifest.json` as an internal component registry.
+- Added dependency graph validation.
+- Added component governance.
+- Added release/versioning gate.
+- Added GitHub Actions package checks across Linux, Windows, and macOS.
+- Added formal changelog and release notes templates.
+- Added installed skill sync protocol and CLI command.
+- Strengthened workflow, task, run-state, and evaluation contracts.
+## Why It Matters
+It is now harder for Sagaz to drift, or to be released or published, in an inconsistent state. The system can validate its own structure, release gates, installed skill sync, and GitHub automation before maintainers ship changes.
+## Compatibility
+- Windows: supported and locally verified.
+- macOS: supported through Codex Desktop and GitHub Actions runner validation.
+- Node.js: `>=22.14` remains the package minimum; Node.js 24 is preferred for new installs and CI.
+- Codex Desktop: required.
+- GitHub Actions: package checks run on Ubuntu, Windows, and macOS.
+- npm package: still an installer/distribution package, not a standalone Sagaz runtime.
+## Migration Notes
+Run:
+```bash
+npx sagaz-ai sync
+npx sagaz-ai doctor
+```
+Then open a new Codex Desktop thread so Sagaz is rediscovered.
+## Verification
+- `npm test`: passed locally.
+- `npm run doctor`: passed locally with installed skill synchronization confirmed.
+- `npm pack --dry-run`: passed locally after npm cache access was allowed outside the sandbox.
+- Evaluation scenarios: enforced by the strengthened Sagaz evaluation suite.
+- Manual checks: Git status reviewed before release preparation.
+## Known Limitations
+- GitHub release and npm publishing remain manual approval steps.
+- The package still installs Sagaz for Codex Desktop; it is not a standalone terminal orchestration runtime.
+## Rollback Plan
+- Revert the release commit if the GitHub repository update fails.
+- If published to npm, publish a patch version that restores the previous known-good package contents.
+- Users can reinstall a previous npm version with `npx sagaz-ai@<version> install --force` if needed.
+## Release Decision
+Approved by: Thiago Cabral
+Approval date: 2026-06-08
+Residual risk: GitHub Actions and npm publishing still need remote execution after push.

package/ai-orchestration-ecosystem/INDEX.md CHANGED

@@ -5,6 +5,7 @@
 - `ACTIVATE.md`: ready-to-use activation prompts.
 - `quickstart.md`: minimum operating rules.
 - `README.md`: ecosystem overview.
+- `manifest.json`: internal component registry for validation and navigation.
 ## Core
@@ -50,17 +51,27 @@ See `protocols/` for quality gates, testing matrix, stack selection, design qual
 - `protocols/dora-metrics.md`
 - `protocols/secure-sdlc.md`
 - `protocols/dependency-governance.md`
+- `protocols/dependency-graph-validation.md`
 - `protocols/data-privacy-lifecycle.md`
 - `protocols/architecture-fitness-functions.md`
 - `protocols/api-contracts.md`
 - `protocols/performance-budgets.md`
 - `protocols/accessibility-compliance.md`
 - `protocols/database-migrations.md`
+- `protocols/release-versioning-gate.md`
 - `protocols/release-strategy.md`
 - `protocols/ai-application-quality.md`
 - `protocols/agent-observability.md`
+- `protocols/component-governance.md`
+- `protocols/communication.md`
+- `protocols/delegation.md`
 - `protocols/durable-run-state.md`
 - `protocols/compatibility-update-audit.md`
+- `protocols/future-change-safety.md`
+- `protocols/installed-skill-sync.md`
+- `protocols/memory.md`
+- `protocols/model-routing.md`
+- `protocols/post-delivery-monitoring.md`
 ## Tools
@@ -88,7 +99,7 @@ See `protocols/` for quality gates, testing matrix, stack selection, design qual
 ## Templates
-See `templates/` for task briefs, product specs, technical specs, design systems, stack recommendations, run state, squad handoffs, QA reports, release checklists, changelogs, release notes, and final handoffs.
+See `templates/` for task briefs, product specs, technical specs, design systems, future-change guides, refactor safety contracts, stack recommendations, run state, squad handoffs, QA reports, release checklists, changelogs, release notes, and final handoffs.
 ## Governance

package/ai-orchestration-ecosystem/README.md CHANGED

@@ -12,6 +12,7 @@ A local AI orchestration ecosystem for Codex, focused on autonomous teams, consi
 ## Structure
+- `manifest.json`: internal component registry used to validate and navigate the ecosystem.
 - `workflows/`: named end-to-end flows.
 - `squads/`: specialized teams.
 - `agents/`: role definitions.
@@ -25,6 +26,14 @@ A local AI orchestration ecosystem for Codex, focused on autonomous teams, consi
 No delivery is complete without verification evidence proportional to the risk.
+## Ecosystem Maintenance
+Use `manifest.json` as the component registry and `protocols/component-governance.md` when creating, updating, renaming, deprecating, or removing Sagaz ecosystem components.
+Use `protocols/release-versioning-gate.md` before version bumps, Git tags, GitHub releases, or npm publishes. A Sagaz release is not ready until package checks, doctor, manifest coverage, dependency graph validation, relevant evaluation scenarios, and changelog or release notes are complete.
+Use `protocols/installed-skill-sync.md` after changing the Sagaz skill or release rules so the installed Codex Desktop skill does not drift from the repository copy.
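The diff does not show how `doctor` decides that the installed skill is synchronized with the source. One plausible way to detect drift between two copies of a skill folder is to compare per-file content hashes. The sketch below is an assumption for illustration, not the Sagaz implementation; `tree_digest` and `is_synchronized` are hypothetical names.

```python
import hashlib
from pathlib import Path


def tree_digest(root):
    """Map each file path (relative to `root`) to the SHA-256 of its bytes."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def is_synchronized(source_dir, installed_dir):
    """True when both trees hold the same files with identical contents."""
    return tree_digest(source_dir) == tree_digest(installed_dir)
```

Comparing the two dictionaries catches added, removed, and modified files in one step. Using POSIX-style relative paths keeps the comparison stable across Windows and macOS.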
 ## Advanced Engineering Coverage
 Sagaz includes protocols for SRE readiness, DORA metrics, secure SDLC, dependency governance, data privacy lifecycle, architecture fitness functions, API contracts, performance budgets, accessibility compliance, database migrations, release strategy, and AI application quality.

package/ai-orchestration-ecosystem/agents/ui-systems-designer.md CHANGED

@@ -9,6 +9,8 @@ Create design systems, tokens, components, and consistent visual rules.
 - Define colors, typography, spacing, radii, borders, elevation, icons, and motion.
 - Create base components and variants.
 - Standardize forms, feedback, cards, tables, navigation, and modals.
+- When Figma MCP is available, create implementation-ready Figma components, variants, tokens, and screen frames for the mockup.
+- Ensure Figma components map cleanly to the chosen frontend stack, component library, or internal design system.
 ## Standard Output
@@ -16,5 +18,6 @@ Create design systems, tokens, components, and consistent visual rules.
 - Component inventory
 - Responsive rules
 - Component states
+- Figma component and frame plan when applicable
 - Consistency checklist

package/ai-orchestration-ecosystem/agents/ux-architect.md CHANGED

@@ -10,6 +10,8 @@ Design flows, journeys, information architecture, and interactions that reduce f
 - Map happy paths, errors, and empty states.
 - Organize navigation and information hierarchy.
 - Set usability criteria.
+- When Figma MCP is available, define navigable mockup flows that behave like the intended application.
+- Specify interaction states, transitions, and screen-to-screen behavior clearly enough for implementation.
 ## Standard Output
@@ -17,5 +19,6 @@ Design flows, journeys, information architecture, and interactions that reduce f
 - Navigation map
 - Screen states
 - Interaction requirements
+- Figma mockup flow requirements when applicable
 - Usability criteria

package/ai-orchestration-ecosystem/agents/visual-qa.md CHANGED

@@ -10,10 +10,13 @@ Validate interfaces visually before delivery and block layout, hierarchy, respon
 - Find overlap, overflow, misalignment, clipping, and weak contrast.
 - Validate interactive states.
 - Compare implementation against the design system.
+- When Figma MCP was used, inspect Figma frames or screenshots before handoff and verify that the mockup supports the intended user journeys.
+- Confirm that the mockup includes realistic states and does not create impossible implementation expectations.
 ## Standard Output
 - Viewports tested
+- Figma frames or screenshots reviewed when applicable
 - Issues found
 - Recommended fixes
 - Verdict

package/ai-orchestration-ecosystem/evals/sagaz-evaluation-suite.md CHANGED

@@ -2,70 +2,261 @@
 ## Purpose
-Evaluate whether Sagaz itself produces consistent, reliable, low-token, production-oriented results.
+Evaluate whether Sagaz produces consistent, reliable, low-token, production-oriented orchestration across Codex Desktop on Windows and macOS.
+The suite exists to catch regressions in workflow selection, handoffs, state tracking, ecosystem governance, design collaboration, verification depth, and GitHub delivery behavior before a Sagaz release.
 ## Evaluation Cadence
-Run these evaluations before major Sagaz releases and after changing core workflows, squads, protocols, or installation behavior.
+Run this suite before every major Sagaz release and after changing any workflow, squad, task, protocol, stack preset, manifest entry, installation path, or Codex Desktop invocation guidance.
+Run the relevant scenario subset after smaller changes. For example, a change to `protocols/durable-run-state.md` must rerun `EVAL-RUN-STATE-RESUME`, and a change to `manifest.json` must rerun `EVAL-MANIFEST-DRIFT` and `EVAL-DEPENDENCY-GRAPH-DRIFT`.
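The cadence rule above amounts to a lookup from changed files to the scenario IDs that must be rerun. The following is a minimal sketch assuming such a table exists. `RERUN_RULES` and `scenarios_to_rerun` are hypothetical names, and only the two entries named in the suite text are filled in.

```python
# Hypothetical lookup table; only these two rules are stated in the suite text.
RERUN_RULES = {
    "protocols/durable-run-state.md": ["EVAL-RUN-STATE-RESUME"],
    "manifest.json": ["EVAL-MANIFEST-DRIFT", "EVAL-DEPENDENCY-GRAPH-DRIFT"],
}


def scenarios_to_rerun(changed_paths):
    """Return the sorted, de-duplicated scenario IDs required by the changed files."""
    required = set()
    for path in changed_paths:
        # Paths without a rule contribute nothing; a real table would cover more files.
        required.update(RERUN_RULES.get(path, []))
    return sorted(required)


print(scenarios_to_rerun(["manifest.json", "protocols/durable-run-state.md"]))
```

A file with no matching rule returns an empty list here. In practice that case would more likely fall back to the full suite than to no scenarios at all.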
+## Evaluation Inputs
+- The current workspace tree.
+- `ai-orchestration-ecosystem/manifest.json`.
+- `codex-skill/sagaz/SKILL.md`.
+- The active workflow, task, protocol, stack preset, and template files.
+- Codex Desktop environment assumptions for Windows and macOS.
+- The user prompt used by each scenario.
+- Evidence from `npm test` and `node ./bin/sagaz.js doctor`.
+## Core Evaluation Matrix
-## Core Evaluations
+| Evaluation | Goal | Pass Criteria | Required Evidence |
+| --- | --- | --- | --- |
+| Invocation | Sagaz is easy to start by name | User can invoke Sagaz with one prompt in a new Codex Desktop project | The response points to the skill invocation and required project context |
+| Cross-platform readiness | Windows and macOS are handled correctly | Commands, paths, and setup notes do not assume one OS unless stated | Windows and macOS considerations are explicit |
+| Language intake | User can write in any language | Sagaz understands the request and follows the configured response language rule | The request is interpreted without losing intent |
+| Workflow selection | Correct workflow is selected | Selected workflow matches project type, maturity, and risk | Workflow ID and reason are stated |
+| Formal contract use | Workflows and handoffs are contract-driven | Phases, owner squads, resources, gates, and handoffs match the formal workflow contract | Phase ledger and handoff evidence are present |
+| Workflow state | Long work can resume safely | Current phase, squad, task, blockers, skipped phases, and next action are recorded | `templates/run-state.md` fields are populated or updated |
+| Token discipline | Only needed files are loaded | No broad file loading without a stated reason | Loaded context is scoped to the active phase |
+| Task-first execution | Work maps to explicit tasks | Active work references a task contract and its evidence requirements | Task ID, outputs, and acceptance criteria are named |
+| Ecosystem governance | Component changes preserve the registry | Manifest, index, skill references, and dependency graph stay aligned | `npm test` catches missing or dangling components |
+| Stack advisory | Stack is justified | Cost, speed, scale, maintainability, deployment, and future changes are covered | Stack recommendation names tradeoffs and alternatives |
+| Design quality | UI work reaches high standards | Design system, responsiveness, accessibility, and visual QA are included | Design QA evidence and Figma/MCP path are stated when relevant |
+| Verification depth | Tests match risk | Build, lint, unit, integration, e2e, accessibility, and manual checks are considered | Test plan and executed checks are reported |
+| GitHub guidance | User is guided proactively | Commits, pushes, PRs, releases, and issues are suggested or performed at the right time | GitHub operation evidence or permission request is present |
+| Production readiness | Launch risk is explicit | Security, env vars, rollback, monitoring, and residual risks are documented | Production readiness checklist is complete |
-| Evaluation | Goal | Pass Criteria |
-| --- | --- | --- |
-| Invocation | Sagaz is easy to start by name | User can invoke Sagaz with one prompt |
-| Language intake | User can write in any language | Sagaz understands the request and answers in American English |
-| Workflow selection | Correct workflow is selected | Selected workflow matches project type and risk |
-| Token discipline | Only needed files are loaded | No broad file loading without reason |
-| Handoff quality | Teams transition clearly | Current work, evidence, next work, and permission are stated |
-| Stack advisory | Stack is justified | Cost, speed, scale, maintainability, deployment, and future changes are covered |
-| Design quality | UI work reaches high standards | Design system, responsiveness, accessibility, and visual QA are included |
-| Verification depth | Tests match risk | Build, lint, unit, integration, e2e, accessibility, and manual checks are considered |
-| GitHub guidance | User is guided proactively | Commits, pushes, PRs, releases, and issues are suggested at the right time |
-| Production readiness | Launch risk is explicit | Security, env vars, rollback, monitoring, and residual risks are documented |
+## Scenario Contracts
-## Scenario Tests
+| Scenario ID | Workflow Or Focus | Prompt | Expected Evidence | Minimum Score |
+| --- | --- | --- | --- | --- |
+| EVAL-WEB-GREENFIELD | `workflows/greenfield-web-app.md` | Create a complete appointment scheduling SaaS with premium design and Vercel deployment. | Workflow contract, stack recommendation, design system, implementation plan, verification plan, deployment notes | 3 |
+| EVAL-WEB-PRODUCTION | `workflows/web-production-release.md` | Prepare this existing web app for production release on GitHub and Vercel. | Production readiness gate, CI/CD checks, env var audit, rollback plan, GitHub release path | 3 |
+| EVAL-MOBILE-PRODUCTION | `workflows/mobile-app-production.md` | Create an Android/iOS habit tracker and recommend the best stack. | React Native or alternative stack rationale, store-readiness risks, mobile QA, release checklist | 3 |
+| EVAL-BUGFIX-RELEASE | `workflows/bugfix-to-release.md` | Fix this production bug, test it, and prepare a GitHub release. | Reproduction, root cause, focused fix, regression test, release evidence, handoff | 3 |
+| EVAL-BROWNFIELD-REFACTOR | `workflows/brownfield-refactor-safe.md` | Refactor this existing project safely without changing behavior. | Baseline behavior, scoped refactor plan, tests before and after, rollback notes | 3 |
+| EVAL-DESIGN-FIGMA | Design MCP readiness | Prepare the design team to use the Figma MCP to create mockups that behave like real apps. | Figma MCP guidance, interactive mockup expectations, design handoff, implementation constraints | 3 |
+| EVAL-GITHUB-OPS | `tasks/github-release-ops.md` | Update everything on GitHub after the approved changes. | Permission-aware git status, commit, push, optional PR/release guidance, no unrelated reverts | 3 |
+| EVAL-RUN-STATE-RESUME | `templates/run-state.md` and `protocols/durable-run-state.md` | Resume a paused Sagaz run after context compaction. | Current phase, squad, task, completed work, blockers, skipped phases, next action | 3 |
+| EVAL-MANIFEST-DRIFT | `manifest.json` governance | Add a new protocol and make sure the ecosystem registry stays correct. | Manifest update, INDEX/SKILL references, component governance checklist, validation result | 3 |
+| EVAL-DEPENDENCY-GRAPH-DRIFT | `protocols/dependency-graph-validation.md` | Rename a task used by a workflow without breaking references. | Updated workflow contract, task contract, manifest path, dependency graph validation | 3 |
+| EVAL-BEGINNER-GUIDANCE | Guided proactivity | I am a beginner. Guide me through everything and ask permission before major actions. | Plain-language guidance, permission gates, no hidden destructive steps, next action clarity | 2 |
-Use these prompts as smoke tests:
+## Scenario Prompts
+### EVAL-WEB-GREENFIELD
 ```text
 Sagaz: create a complete appointment scheduling SaaS with premium design and Vercel deployment.
 ```
+Expected behavior:
+- Select `workflows/greenfield-web-app.md`.
+- Start from intake and product requirements before implementation.
+- Recommend a stack with cost, scale, speed, deployment, and maintainability tradeoffs.
+- Include shadcn/ui only when it fits the selected stack and project constraints.
+- Define design system expectations, responsive QA, accessibility checks, and production readiness gates.
+### EVAL-WEB-PRODUCTION
+```text
+Sagaz: prepare this existing web app for production release on GitHub and Vercel.
+```
+Expected behavior:
+- Select `workflows/web-production-release.md`.
+- Audit environment variables, build scripts, tests, monitoring, rollback, and CI/CD.
+- Ask permission before committing, pushing, creating PRs, or changing release state.
+- Report residual risk and release evidence.
+### EVAL-MOBILE-PRODUCTION
 ```text
 Sagaz: create an Android/iOS habit tracker and recommend the best stack.
 ```
+Expected behavior:
+- Select `workflows/mobile-app-production.md`.
+- Compare mobile stack options and name the recommended path.
+- Account for app store readiness, device QA, offline behavior, notifications, and analytics.
+- Keep commands and setup guidance compatible with Windows and macOS where possible.
+### EVAL-BUGFIX-RELEASE
+```text
+Sagaz: fix this production bug, test it, and prepare a GitHub release.
+```
+Expected behavior:
+- Select `workflows/bugfix-to-release.md`.
+- Reproduce or characterize the bug before changing code.
+- Implement the smallest safe fix.
+- Add or run regression checks.
+- Prepare GitHub release operations only after evidence is available.
+### EVAL-BROWNFIELD-REFACTOR
 ```text
 Sagaz: refactor this existing project safely without changing behavior.
 ```
+Expected behavior:
+- Select `workflows/brownfield-refactor-safe.md`.
+- Establish baseline behavior and test coverage.
+- Keep refactor scope narrow.
+- Preserve user changes and avoid unrelated churn.
+- Document rollback and verification evidence.
+### EVAL-DESIGN-FIGMA
 ```text
-Sagaz: fix this production bug, test it, and prepare a GitHub release.
+Sagaz: prepare the design team to use the Figma MCP to create mockups that function like real applications inside Figma.
+```
+Expected behavior:
+- Route to design quality and Figma MCP guidance.
+- Explain that Figma mockups should include realistic states, interactions, flows, content density, responsive behavior, and implementation constraints.
+- Mention when to use Figma tools, design system tokens, Code Connect, and handoff evidence.
+- Avoid promising unsupported runtime behavior inside Figma.
+### EVAL-GITHUB-OPS
+```text
+Sagaz: update everything on GitHub after the approved changes.
 ```
+Expected behavior:
+- Inspect git status before staging.
+- Preserve unrelated user changes unless explicitly asked to include them.
+- Ask permission when sandbox or remote operations require approval.
+- Report commit hash, branch, push status, and any PR/release follow-up.
+### EVAL-RUN-STATE-RESUME
+```text
+Sagaz: continue from the last checkpoint after the context was compacted.
+```
+Expected behavior:
+- Reconstruct current workflow state from available notes and files.
+- Continue from the newest user-approved point.
+- Update or reference the phase ledger, blockers, handoffs, and next action.
+- Avoid restarting the whole run without evidence that restart is needed.
+### EVAL-MANIFEST-DRIFT
+```text
+Sagaz: add a new protocol for deployment rollback and make sure the ecosystem registry remains correct.
+```
+Expected behavior:
+- Add the protocol under `protocols/`.
+- Register it in `manifest.json`.
+- Add references in `INDEX.md`, `README.md`, or `SKILL.md` when user-facing discovery requires it.
+- Run `npm test` and `node ./bin/sagaz.js doctor`.
+### EVAL-DEPENDENCY-GRAPH-DRIFT
+```text
+Sagaz: rename the verification task and update every workflow that depends on it.
+```
+Expected behavior:
+- Update the task file, manifest entry, workflow formal contracts, task references, and documentation references.
+- Verify that dependency graph validation reports no dangling or unregistered references.
+- Preserve task contract sections and workflow phase sequencing.
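The dangling-reference check in this scenario can be illustrated with a small graph walk. The diff does not show the structure of `manifest.json`, so the input shapes below (a set of registered paths and a mapping from each file to the paths it references) are assumptions. The function name `find_graph_problems` and the task file names in the example are hypothetical.

```python
def find_graph_problems(registered, references):
    """Check a reference graph against a registry.

    registered: set of component paths listed in the registry (assumed shape).
    references: {source path: [paths it references]} (assumed shape).
    Returns (dangling, unregistered_sources).
    """
    # A dangling edge points at a path the registry does not know about.
    dangling = sorted(
        (source, target)
        for source, targets in references.items()
        for target in targets
        if target not in registered
    )
    # An unregistered source is a file that has references but no registry entry.
    unregistered_sources = sorted(s for s in references if s not in registered)
    return dangling, unregistered_sources


# Example: a task was renamed, but one workflow still points at the old name.
registered = {"workflows/bugfix-to-release.md", "tasks/verify-delivery.md"}
references = {"workflows/bugfix-to-release.md": ["tasks/verification.md"]}
print(find_graph_problems(registered, references))
```

In this example the rename leaves one dangling edge, from the workflow to the old task path. That is the kind of drift the scenario expects validation to surface.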
+### EVAL-BEGINNER-GUIDANCE
 ```text
 Sagaz: I am a beginner. Guide me through everything and ask permission before major actions.
 ```
-## Scoring
+Expected behavior:
+- Use plain explanations and short steps.
+- Ask before destructive, remote, installation, or account-changing operations.
+- Keep the user oriented to what is happening and why.
+- Still make progress where safe without forcing unnecessary choices.
+## Scoring Rubric
-Score each scenario from 0 to 3:
+Score each core evaluation and scenario from 0 to 3:
-- 0: failed or unsafe
-- 1: partially usable
-- 2: usable with gaps
-- 3: production-grade for the scenario
+- 0: Failed, unsafe, or materially misleading.
+- 1: Partially usable but missing important evidence or using the wrong workflow.
+- 2: Usable with minor gaps, no unsafe behavior, and clear next action.
+- 3: Production-grade for the scenario, with correct workflow, state, evidence, gates, and handoff.
-Sagaz should not release a major workflow change with any core evaluation below 2.
+Any score of 0 is a release blocker. Any score below the scenario minimum requires a fix and retest.
+## Release Gate
+Sagaz should not release a major workflow, protocol, task, or registry change unless:
+- Every core evaluation scores at least 2.
+- Every scenario meets its minimum score.
+- Critical scenarios with minimum score 3 have direct evidence, not only intent.
+- `npm test` passes.
+- `node ./bin/sagaz.js doctor` passes.
+- Regression log entries are recorded for every failed retest.
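The numeric parts of this gate can be expressed as a single check. The sketch below is illustrative only and is not shipped with Sagaz. `release_gate` and its parameter names are hypothetical, and it deliberately leaves out the two conditions that need human judgment: direct evidence for critical scenarios, and regression log entries for failed retests.

```python
def release_gate(core_scores, scenario_scores, scenario_minimums,
                 npm_test_passed, doctor_passed):
    """Return (ok, reasons) for the score-based release conditions.

    core_scores: {evaluation name: score 0-3}
    scenario_scores: {scenario ID: score 0-3}
    scenario_minimums: {scenario ID: minimum score from the scenario table}
    """
    reasons = []
    for name, score in core_scores.items():
        if score < 2:
            reasons.append(f"core evaluation '{name}' scored {score}, needs at least 2")
    for scenario_id, minimum in scenario_minimums.items():
        score = scenario_scores.get(scenario_id)
        if score is None:
            reasons.append(f"scenario {scenario_id} has no recorded score")
        elif score < minimum:
            reasons.append(f"scenario {scenario_id} scored {score}, minimum is {minimum}")
    if not npm_test_passed:
        reasons.append("npm test did not pass")
    if not doctor_passed:
        reasons.append("doctor did not pass")
    return (not reasons, reasons)


ok, reasons = release_gate(
    core_scores={"Workflow selection": 3, "Token discipline": 2},
    scenario_scores={"EVAL-WEB-GREENFIELD": 3, "EVAL-BEGINNER-GUIDANCE": 1},
    scenario_minimums={"EVAL-WEB-GREENFIELD": 3, "EVAL-BEGINNER-GUIDANCE": 2},
    npm_test_passed=True,
    doctor_passed=True,
)
print(ok, reasons)
```

Because every scenario minimum in the table is at least 2, a score of 0 always fails one of these checks. This matches the rule that any 0 blocks a release.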
 ## Regression Log
 ```md
 Date:
-Version:
-Scenario:
+Version or commit:
+Scenario ID:
+Core evaluation affected:
 Score:
 Failure:
+Root cause:
 Fix:
 Retest evidence:
+Residual risk:
+Owner:
+```
+## Evidence Template
+```md
+Date:
+Evaluator:
+Platform:
+Scenario ID:
+Prompt:
+Selected workflow or focus:
+Loaded files:
+Actions taken:
+Checks run:
+Core scores:
+Scenario score:
+Evidence links or file paths:
+Open risks:
+Release decision:
 ```