sagaz-ai 0.1.5 → 0.2.0

Files changed (68)
  1. package/CHANGELOG.md +67 -0
  2. package/INSTALL.md +22 -0
  3. package/README.md +38 -0
  4. package/RELEASE_NOTES.md +84 -0
  5. package/ai-orchestration-ecosystem/INDEX.md +12 -1
  6. package/ai-orchestration-ecosystem/README.md +9 -0
  7. package/ai-orchestration-ecosystem/agents/ui-systems-designer.md +3 -0
  8. package/ai-orchestration-ecosystem/agents/ux-architect.md +3 -0
  9. package/ai-orchestration-ecosystem/agents/visual-qa.md +3 -0
  10. package/ai-orchestration-ecosystem/evals/sagaz-evaluation-suite.md +218 -27
  11. package/ai-orchestration-ecosystem/governance/package-release-policy.md +28 -1
  12. package/ai-orchestration-ecosystem/governance/versioning.md +10 -0
  13. package/ai-orchestration-ecosystem/manifest.json +168 -0
  14. package/ai-orchestration-ecosystem/protocols/ci-cd-readiness.md +1 -1
  15. package/ai-orchestration-ecosystem/protocols/communication.md +1 -1
  16. package/ai-orchestration-ecosystem/protocols/component-governance.md +72 -0
  17. package/ai-orchestration-ecosystem/protocols/delegation.md +1 -1
  18. package/ai-orchestration-ecosystem/protocols/dependency-graph-validation.md +49 -0
  19. package/ai-orchestration-ecosystem/protocols/design-quality.md +17 -1
  20. package/ai-orchestration-ecosystem/protocols/durable-run-state.md +66 -4
  21. package/ai-orchestration-ecosystem/protocols/future-change-safety.md +98 -0
  22. package/ai-orchestration-ecosystem/protocols/github-operations.md +1 -1
  23. package/ai-orchestration-ecosystem/protocols/guided-proactivity.md +1 -1
  24. package/ai-orchestration-ecosystem/protocols/installed-skill-sync.md +42 -0
  25. package/ai-orchestration-ecosystem/protocols/memory.md +1 -1
  26. package/ai-orchestration-ecosystem/protocols/model-routing.md +1 -1
  27. package/ai-orchestration-ecosystem/protocols/performance-budgets.md +7 -0
  28. package/ai-orchestration-ecosystem/protocols/post-delivery-monitoring.md +1 -1
  29. package/ai-orchestration-ecosystem/protocols/production-readiness.md +1 -1
  30. package/ai-orchestration-ecosystem/protocols/quality-gates.md +1 -1
  31. package/ai-orchestration-ecosystem/protocols/release-versioning-gate.md +85 -0
  32. package/ai-orchestration-ecosystem/protocols/secure-sdlc.md +7 -0
  33. package/ai-orchestration-ecosystem/protocols/squad-pipeline-handoffs.md +1 -1
  34. package/ai-orchestration-ecosystem/protocols/stack-selection.md +1 -1
  35. package/ai-orchestration-ecosystem/protocols/testing-matrix.md +1 -1
  36. package/ai-orchestration-ecosystem/skills/refactor-proofing.md +8 -0
  37. package/ai-orchestration-ecosystem/squads/design-studio.md +19 -3
  38. package/ai-orchestration-ecosystem/stack-presets/admin-dashboard.md +25 -1
  39. package/ai-orchestration-ecosystem/stack-presets/firebase.md +15 -0
  40. package/ai-orchestration-ecosystem/stack-presets/nextjs-vercel.md +7 -0
  41. package/ai-orchestration-ecosystem/stack-presets/node-api.md +6 -0
  42. package/ai-orchestration-ecosystem/stack-presets/react-native.md +6 -0
  43. package/ai-orchestration-ecosystem/stack-presets/react-vite.md +6 -0
  44. package/ai-orchestration-ecosystem/stack-presets/static-site.md +6 -0
  45. package/ai-orchestration-ecosystem/stack-presets/supabase.md +14 -0
  46. package/ai-orchestration-ecosystem/tasks/design-system.md +41 -14
  47. package/ai-orchestration-ecosystem/tasks/github-release-ops.md +37 -14
  48. package/ai-orchestration-ecosystem/tasks/implementation-build.md +40 -14
  49. package/ai-orchestration-ecosystem/tasks/intake-brief.md +40 -12
  50. package/ai-orchestration-ecosystem/tasks/product-requirements.md +39 -13
  51. package/ai-orchestration-ecosystem/tasks/production-readiness.md +39 -14
  52. package/ai-orchestration-ecosystem/tasks/stack-recommendation.md +40 -14
  53. package/ai-orchestration-ecosystem/tasks/verification-qa.md +40 -14
  54. package/ai-orchestration-ecosystem/templates/changelog.md +39 -2
  55. package/ai-orchestration-ecosystem/templates/future-change-guide.md +121 -0
  56. package/ai-orchestration-ecosystem/templates/refactor-safety-contract.md +43 -0
  57. package/ai-orchestration-ecosystem/templates/release-notes.md +41 -1
  58. package/ai-orchestration-ecosystem/templates/run-state.md +80 -0
  59. package/ai-orchestration-ecosystem/tools/tool-registry.md +2 -0
  60. package/ai-orchestration-ecosystem/workflows/brownfield-refactor-safe.md +27 -1
  61. package/ai-orchestration-ecosystem/workflows/bugfix-to-release.md +27 -1
  62. package/ai-orchestration-ecosystem/workflows/greenfield-web-app.md +27 -1
  63. package/ai-orchestration-ecosystem/workflows/mobile-app-production.md +27 -1
  64. package/ai-orchestration-ecosystem/workflows/web-production-release.md +27 -1
  65. package/bin/sagaz.js +35 -7
  66. package/codex-skill/sagaz/SKILL.md +12 -2
  67. package/package.json +3 -1
  68. package/scripts/verify-package.js +1378 -11
package/CHANGELOG.md ADDED
@@ -0,0 +1,67 @@
1
+ # Changelog
2
+
3
+ ## [0.2.0] - 2026-06-08
4
+
5
+ ### Release Type
6
+
7
+ Minor
8
+
9
+ ### Added
10
+
11
+ - Formal workflow contracts and handoff validation across Sagaz workflows.
12
+ - Stronger durable workflow state contract and run-state template.
13
+ - Task-first contracts for reusable Sagaz task definitions.
14
+ - Internal ecosystem manifest and dependency graph validation.
15
+ - Component governance protocol for creating, updating, renaming, deprecating, and removing ecosystem components.
16
+ - Stronger Sagaz evaluation suite with scenario IDs, required evidence, scoring, and release gates.
17
+ - Release/versioning gate for version bumps, tags, GitHub releases, and npm publishes.
18
+ - GitHub Actions enforcement across Linux, Windows, and macOS package checks.
19
+ - Formal changelog and release notes templates.
20
+ - Installed skill synchronization protocol and `npx sagaz-ai sync` command.
21
+
22
+ ### Changed
23
+
24
+ - `npm test` now validates workflow contracts, task contracts, manifest coverage, dependency graph integrity, release gates, GitHub Actions enforcement, evaluation coverage, and release artifact templates.
25
+ - `npx sagaz-ai doctor` now checks whether the installed Codex Desktop skill is synchronized with the source skill.
26
+ - README and install documentation now include system requirements, Node.js guidance, platform notes, and skill sync instructions.
27
+ - Package release policy now requires manifest validation, dependency graph validation, evaluation evidence, changelog or release notes, and installed skill sync evidence.
28
+
29
+ ### Fixed
30
+
31
+ - Corrected malformed Markdown code fences in several protocol files.
32
+ - Added missing references so critical protocols remain reachable from the dependency graph.
33
+
34
+ ### Removed
35
+
36
+ - None.
37
+
38
+ ### Security
39
+
40
+ - Release and GitHub operations now require clearer approval gates before publishing, tagging, pushing, or releasing.
41
+
42
+ ### Compatibility
43
+
44
+ - Windows: supported through Codex Desktop and validated by GitHub Actions on `windows-latest`.
45
+ - macOS: supported through Codex Desktop and validated by GitHub Actions on `macos-latest`.
46
+ - Node.js: package baseline remains `>=22.14`; GitHub Actions use Node.js 24.
47
+ - Codex Desktop: Sagaz remains a Codex Desktop skill, not a standalone terminal agent runtime.
48
+
49
+ ### Migration Notes
50
+
51
+ - Existing users should run `npx sagaz-ai sync` or `npx sagaz-ai install --force` to refresh the installed Codex Desktop skill.
52
+ - Open a new Codex Desktop thread after syncing so the updated skill can be discovered.
53
+
54
+ ### Verification
55
+
56
+ - npm test: passed locally on Windows.
57
+ - npm run doctor: passed locally on Windows with `Synchronized with source: yes`.
58
+ - npm pack --dry-run: passed locally on Windows after allowing npm cache access outside the sandbox.
59
+ - Evaluation scenarios: covered by the strengthened `evals/sagaz-evaluation-suite.md` contract.
60
+
61
+ ### Release Evidence
62
+
63
+ - Commit: pending.
64
+ - Tag: pending.
65
+ - GitHub release: pending.
66
+ - npm package: pending.
67
+
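The Compatibility notes above state a Node.js baseline of `>=22.14`. As a rough illustration only, a check of that baseline could look like the sketch below. The helper name `meetsNodeBaseline` is hypothetical and is not taken from `bin/sagaz.js` or `scripts/verify-package.js`.

```javascript
// Hypothetical helper (not part of the package): compares a Node.js version
// string such as "22.14.0" or "v24.1.0" against the documented 22.14 baseline.
function meetsNodeBaseline(version, minMajor = 22, minMinor = 14) {
  const parts = String(version).replace(/^v/, "").split(".");
  const major = Number.parseInt(parts[0], 10);
  const minor = Number.parseInt(parts[1] ?? "0", 10);
  // Reject anything that does not parse as a numeric major.minor pair.
  if (Number.isNaN(major) || Number.isNaN(minor)) return false;
  return major > minMajor || (major === minMajor && minor >= minMinor);
}

// Check the runtime that is executing this script.
console.log(meetsNodeBaseline(process.versions.node));
```

Only the major and minor components are compared because the documented baseline does not name a patch level.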
package/INSTALL.md CHANGED
@@ -72,6 +72,7 @@ Sagaz: explain the available workflows.
72
72
  ```powershell
73
73
  npx sagaz-ai status
74
74
  npx sagaz-ai doctor
75
+ npx sagaz-ai sync
75
76
  npx sagaz-ai install --force
76
77
  ```
77
78
 
@@ -80,9 +81,30 @@ npx sagaz-ai install --force
80
81
  ```bash
81
82
  npx sagaz-ai status
82
83
  npx sagaz-ai doctor
84
+ npx sagaz-ai sync
83
85
  npx sagaz-ai install --force
84
86
  ```
85
87
 
88
+ ## Sync The Installed Skill
89
+
90
+ When this repository changes, refresh the installed Codex Desktop skill before relying on new Sagaz behavior.
91
+
92
+ Windows PowerShell:
93
+
94
+ ```powershell
95
+ npx sagaz-ai sync
96
+ npx sagaz-ai doctor
97
+ ```
98
+
99
+ macOS Terminal:
100
+
101
+ ```bash
102
+ npx sagaz-ai sync
103
+ npx sagaz-ai doctor
104
+ ```
105
+
106
+ Then open a new Codex Desktop thread so the updated skill is discovered.
107
+
86
108
  ## Manual Install
87
109
 
88
110
  Copy the Sagaz skill folder from the repository.
package/README.md CHANGED
@@ -43,6 +43,7 @@ Sagaz also guides the user through the process. At the end of each phase, it exp
43
43
  - **Static site discipline:** hand-built static sites use clean directory URLs by default, GitHub Pages-ready files, and a practical SEO baseline.
44
44
  - **Sagaz evaluations:** scenario-based checks help prevent regressions in the orchestration system itself.
45
45
  - **Compatibility audits:** Sagaz can check whether Windows, macOS, npm, Node.js, Codex Desktop, AI model behavior, GitHub, package contents, or external platform changes require a Sagaz update.
46
+ - **Future-change safety:** generated projects include detailed documentation for future refactors, improvements, feature additions, design consistency, UX preservation, invariants, and regression checks.
46
47
 
47
48
  ## How It Works
48
49
 
@@ -75,6 +76,37 @@ Key areas:
75
76
  - `brownfield-refactor-safe`: refactor an existing project safely.
76
77
  - `bugfix-to-release`: fix a bug through verification and release.
77
78
 
79
+ ## System Requirements
80
+
81
+ Install these before using Sagaz:
82
+
83
+ - **Codex Desktop:** required. Sagaz is designed to run as a Codex Desktop skill, not as a standalone terminal agent.
84
+ - **Node.js and npm:** required for the recommended `npx sagaz-ai install` flow. Use Node.js `22.14+` at minimum; Node.js `24 LTS` is preferred for new installations.
85
+ - **Git:** recommended for cloning this repository, inspecting changes, and using Sagaz GitHub workflows.
86
+ - **Operating system:** Windows or macOS with access to the local Codex skills folder.
87
+
88
+ Optional but recommended for common Sagaz workflows:
89
+
90
+ - **GitHub CLI (`gh`):** needed for guided GitHub operations such as authentication, pull requests, checks, issues, releases, and repository automation.
91
+ - **Project runtime tools:** install the tools required by the project Sagaz will work on, such as `pnpm`, `yarn`, `bun`, Python, Java, Android Studio, Xcode, Expo/EAS, or database CLIs when that project needs them.
92
+ - **Browser or web testing tools:** useful for visual QA, Playwright flows, accessibility checks, and local web app verification.
93
+ - **Design/tool connectors:** optional connectors such as Figma MCP can be used when available for app-like mockups, design systems, and visual QA.
94
+
95
+ Verify the core local tools:
96
+
97
+ ```bash
98
+ node --version
99
+ npm --version
100
+ git --version
101
+ ```
102
+
103
+ Verify GitHub CLI only if you want GitHub Ops:
104
+
105
+ ```bash
106
+ gh --version
107
+ gh auth status
108
+ ```
109
+
78
110
  ## Installation In Codex Desktop
79
111
 
80
112
  ### Recommended: Install With npx
@@ -105,6 +137,7 @@ Check installation:
105
137
  ```powershell
106
138
  npx sagaz-ai status
107
139
  npx sagaz-ai doctor
140
+ npx sagaz-ai sync
108
141
  ```
109
142
 
110
143
  #### macOS Terminal
@@ -124,8 +157,11 @@ Check installation:
124
157
  ```bash
125
158
  npx sagaz-ai status
126
159
  npx sagaz-ai doctor
160
+ npx sagaz-ai sync
127
161
  ```
128
162
 
163
+ Use `npx sagaz-ai sync` after updating this repository or package to refresh the installed Codex Desktop skill. Then open a new Codex Desktop thread so Sagaz is rediscovered.
164
+
129
165
  Then open a new Codex Desktop thread and run:
130
166
 
131
167
  ```text
@@ -206,6 +242,8 @@ Sagaz should choose the appropriate workflow, create or update persistent run st
206
242
 
207
243
  For production-grade work, Sagaz can also apply SRE readiness, DORA metrics, secure SDLC, dependency governance, data privacy lifecycle, architecture fitness functions, API contracts, performance budgets, accessibility compliance, database migrations, release strategy, and AI application quality protocols.
208
244
 
245
+ For medium, large, production, web, mobile, refactor, or feature-extension work, Sagaz should create or update a future-change guide covering product intent, architecture, design system, UX rules, components, invariants, testing, safe refactor procedure, safe feature-addition procedure, deployment, and known risks.
246
+
209
247
  For tool-heavy work, Sagaz uses a tool registry to verify local availability and recommend the right connector or platform before asking permission to install, authenticate, deploy, publish, or modify external resources.
210
248
 
211
249
  For common project types, Sagaz can start from documented stack presets such as Next.js on Vercel, React with Vite, Expo mobile, React Native, Supabase, Firebase, Node APIs, static sites, and admin dashboards. For hand-built static sites, Sagaz should default to clean URLs through directory `index.html` files and verify SEO essentials including canonical URLs, Open Graph/Twitter metadata, Schema.org JSON-LD, sitemap, robots, optimized images, and GitHub Pages files when applicable.
package/RELEASE_NOTES.md ADDED
@@ -0,0 +1,84 @@
1
+ # Release Notes
2
+
3
+ ## Release
4
+
5
+ Version: 0.2.0
6
+ Date: 2026-06-08
7
+ Release type: Minor
8
+ GitHub commit: pending
9
+ Git tag: pending
10
+ GitHub release: pending
11
+ npm package: pending
12
+
13
+ ## Summary
14
+
15
+ Sagaz 0.2.0 strengthens the orchestration ecosystem around formal workflows, handoffs, task contracts, registry validation, release gates, GitHub Actions enforcement, evaluation coverage, and installed skill synchronization.
16
+
17
+ ## Audience Impact
18
+
19
+ - New users: clearer README, requirements, installation guidance, and sync instructions.
20
+ - Existing users: should refresh the installed skill with `npx sagaz-ai sync`.
21
+ - Maintainers: stronger `npm test` coverage catches manifest, dependency graph, release, workflow, task, and evaluation drift.
22
+ - Design team: Figma MCP and app-like mockup guidance are now part of Sagaz operating rules.
23
+ - Engineering team: workflows now behave more like formal contracts with gates, state, handoffs, and evidence.
24
+
25
+ ## What Changed
26
+
27
+ - Added `manifest.json` as an internal component registry.
28
+ - Added dependency graph validation.
29
+ - Added component governance.
30
+ - Added release/versioning gate.
31
+ - Added GitHub Actions package checks across Linux, Windows, and macOS.
32
+ - Added formal changelog and release notes templates.
33
+ - Added installed skill sync protocol and CLI command.
34
+ - Strengthened workflow, task, run-state, and evaluation contracts.
35
+
36
+ ## Why It Matters
37
+
38
+ Sagaz is now harder to accidentally drift, release, or publish in an inconsistent state. The system can validate its own structure, release gates, installed skill sync, and GitHub automation before maintainers ship changes.
39
+
40
+ ## Compatibility
41
+
42
+ - Windows: supported and locally verified.
43
+ - macOS: supported through Codex Desktop and GitHub Actions runner validation.
44
+ - Node.js: `>=22.14` remains the package minimum; Node.js 24 is preferred for new installs and CI.
45
+ - Codex Desktop: required.
46
+ - GitHub Actions: package checks run on Ubuntu, Windows, and macOS.
47
+ - npm package: still an installer/distribution package, not a standalone Sagaz runtime.
48
+
49
+ ## Migration Notes
50
+
51
+ Run:
52
+
53
+ ```bash
54
+ npx sagaz-ai sync
55
+ npx sagaz-ai doctor
56
+ ```
57
+
58
+ Then open a new Codex Desktop thread so Sagaz is rediscovered.
59
+
60
+ ## Verification
61
+
62
+ - `npm test`: passed locally.
63
+ - `npm run doctor`: passed locally with installed skill synchronization confirmed.
64
+ - `npm pack --dry-run`: passed locally after npm cache access was allowed outside the sandbox.
65
+ - Evaluation scenarios: enforced by the strengthened Sagaz evaluation suite.
66
+ - Manual checks: Git status reviewed before release preparation.
67
+
68
+ ## Known Limitations
69
+
70
+ - GitHub release and npm publishing remain manual approval steps.
71
+ - The package still installs Sagaz for Codex Desktop; it is not a standalone terminal orchestration runtime.
72
+
73
+ ## Rollback Plan
74
+
75
+ - Revert the release commit if the GitHub repository update fails.
76
+ - If published to npm, publish a patch version that restores the previous known-good package contents.
77
+ - Users can reinstall a previous npm version with `npx sagaz-ai@<version> install --force` if needed.
78
+
79
+ ## Release Decision
80
+
81
+ Approved by: Thiago Cabral
82
+ Approval date: 2026-06-08
83
+ Residual risk: GitHub Actions and npm publishing still need remote execution after push.
84
+
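The release notes above still list the GitHub commit, Git tag, GitHub release, and npm package as `pending`. A maintainer who wanted to spot such unfilled evidence fields automatically might use something like the following sketch. The function `findPendingFields` is a hypothetical illustration and is not claimed to exist in the package.

```javascript
// Hypothetical helper (not part of the package): returns the labels of
// "Label: pending" lines, with or without a leading list marker.
function findPendingFields(markdown) {
  const pending = [];
  for (const line of markdown.split(/\r?\n/)) {
    const match = line.match(/^\s*(?:[-*]\s+)?([^:]+):\s*pending\.?\s*$/i);
    if (match) pending.push(match[1].trim());
  }
  return pending;
}

// Sample input copied from the release header fields shown above.
const sampleNotes = [
  "Version: 0.2.0",
  "GitHub commit: pending",
  "Git tag: pending",
  "GitHub release: pending",
  "npm package: pending",
].join("\n");

console.log(findPendingFields(sampleNotes));
```

A non-empty result would indicate that release evidence still needs to be recorded. Whether that should block a publish is a policy decision this sketch does not make.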
package/ai-orchestration-ecosystem/INDEX.md CHANGED
@@ -5,6 +5,7 @@
5
5
  - `ACTIVATE.md`: ready-to-use activation prompts.
6
6
  - `quickstart.md`: minimum operating rules.
7
7
  - `README.md`: ecosystem overview.
8
+ - `manifest.json`: internal component registry for validation and navigation.
8
9
 
9
10
  ## Core
10
11
 
@@ -50,17 +51,27 @@ See `protocols/` for quality gates, testing matrix, stack selection, design qual
50
51
  - `protocols/dora-metrics.md`
51
52
  - `protocols/secure-sdlc.md`
52
53
  - `protocols/dependency-governance.md`
54
+ - `protocols/dependency-graph-validation.md`
53
55
  - `protocols/data-privacy-lifecycle.md`
54
56
  - `protocols/architecture-fitness-functions.md`
55
57
  - `protocols/api-contracts.md`
56
58
  - `protocols/performance-budgets.md`
57
59
  - `protocols/accessibility-compliance.md`
58
60
  - `protocols/database-migrations.md`
61
+ - `protocols/release-versioning-gate.md`
59
62
  - `protocols/release-strategy.md`
60
63
  - `protocols/ai-application-quality.md`
61
64
  - `protocols/agent-observability.md`
65
+ - `protocols/component-governance.md`
66
+ - `protocols/communication.md`
67
+ - `protocols/delegation.md`
62
68
  - `protocols/durable-run-state.md`
63
69
  - `protocols/compatibility-update-audit.md`
70
+ - `protocols/future-change-safety.md`
71
+ - `protocols/installed-skill-sync.md`
72
+ - `protocols/memory.md`
73
+ - `protocols/model-routing.md`
74
+ - `protocols/post-delivery-monitoring.md`
64
75
 
65
76
  ## Tools
66
77
 
@@ -88,7 +99,7 @@ See `protocols/` for quality gates, testing matrix, stack selection, design qual
88
99
 
89
100
  ## Templates
90
101
 
91
- See `templates/` for task briefs, product specs, technical specs, design systems, stack recommendations, run state, squad handoffs, QA reports, release checklists, changelogs, release notes, and final handoffs.
102
+ See `templates/` for task briefs, product specs, technical specs, design systems, future-change guides, refactor safety contracts, stack recommendations, run state, squad handoffs, QA reports, release checklists, changelogs, release notes, and final handoffs.
92
103
 
93
104
  ## Governance
94
105
 
package/ai-orchestration-ecosystem/README.md CHANGED
@@ -12,6 +12,7 @@ A local AI orchestration ecosystem for Codex, focused on autonomous teams, consi
12
12
 
13
13
  ## Structure
14
14
 
15
+ - `manifest.json`: internal component registry used to validate and navigate the ecosystem.
15
16
  - `workflows/`: named end-to-end flows.
16
17
  - `squads/`: specialized teams.
17
18
  - `agents/`: role definitions.
@@ -25,6 +26,14 @@ A local AI orchestration ecosystem for Codex, focused on autonomous teams, consi
25
26
 
26
27
  No delivery is complete without verification evidence proportional to the risk.
27
28
 
29
+ ## Ecosystem Maintenance
30
+
31
+ Use `manifest.json` as the component registry and `protocols/component-governance.md` when creating, updating, renaming, deprecating, or removing Sagaz ecosystem components.
32
+
33
+ Use `protocols/release-versioning-gate.md` before version bumps, Git tags, GitHub releases, or npm publishes. A Sagaz release is not ready until package checks, doctor, manifest coverage, dependency graph validation, relevant evaluation scenarios, and changelog or release notes are complete.
34
+
35
+ Use `protocols/installed-skill-sync.md` after changing the Sagaz skill or release rules so the installed Codex Desktop skill does not drift from the repository copy.
36
+
28
37
  ## Advanced Engineering Coverage
29
38
 
30
39
  Sagaz includes protocols for SRE readiness, DORA metrics, secure SDLC, dependency governance, data privacy lifecycle, architecture fitness functions, API contracts, performance budgets, accessibility compliance, database migrations, release strategy, and AI application quality.
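The Ecosystem Maintenance notes above treat `manifest.json` as a component registry and require dependency graph validation before a release. The diff shown here does not reveal the manifest schema, so the sketch below assumes a hypothetical shape in which each component has an `id` and a `references` array. It illustrates the two failure modes the changelog mentions: dangling references and components that are unreachable from an entry point.

```javascript
// Assumed, hypothetical component shape: { id: string, references: string[] }.
// The real manifest.json layout may differ.
function validateGraph(components, entryIds) {
  const byId = new Map(components.map((c) => [c.id, c]));

  // A dangling reference points at an id that is not registered.
  const dangling = [];
  for (const component of components) {
    for (const ref of component.references ?? []) {
      if (!byId.has(ref)) dangling.push({ from: component.id, to: ref });
    }
  }

  // Breadth-first walk from the entry points to collect reachable ids.
  const reachable = new Set();
  const queue = entryIds.filter((id) => byId.has(id));
  while (queue.length > 0) {
    const id = queue.shift();
    if (reachable.has(id)) continue;
    reachable.add(id);
    for (const ref of byId.get(id).references ?? []) {
      if (byId.has(ref) && !reachable.has(ref)) queue.push(ref);
    }
  }

  const unreachable = components
    .map((c) => c.id)
    .filter((id) => !reachable.has(id));
  return { dangling, unreachable };
}

// Illustrative data only; these ids are made up for the example.
const sampleComponents = [
  { id: "skill", references: ["workflow", "protocol-missing"] },
  { id: "workflow", references: ["task"] },
  { id: "task", references: [] },
  { id: "orphan", references: [] },
];

console.log(validateGraph(sampleComponents, ["skill"]));
```

Reporting both lists separately keeps a missing file distinct from a file that exists but is never referenced, which are different maintenance problems.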
package/ai-orchestration-ecosystem/agents/ui-systems-designer.md CHANGED
@@ -9,6 +9,8 @@ Create design systems, tokens, components, and consistent visual rules.
9
9
  - Define colors, typography, spacing, radii, borders, elevation, icons, and motion.
10
10
  - Create base components and variants.
11
11
  - Standardize forms, feedback, cards, tables, navigation, and modals.
12
+ - When Figma MCP is available, create implementation-ready Figma components, variants, tokens, and screen frames for the mockup.
13
+ - Ensure Figma components map cleanly to the chosen frontend stack, component library, or internal design system.
12
14
 
13
15
  ## Standard Output
14
16
 
@@ -16,5 +18,6 @@ Create design systems, tokens, components, and consistent visual rules.
16
18
  - Component inventory
17
19
  - Responsive rules
18
20
  - Component states
21
+ - Figma component and frame plan when applicable
19
22
  - Consistency checklist
20
23
 
package/ai-orchestration-ecosystem/agents/ux-architect.md CHANGED
@@ -10,6 +10,8 @@ Design flows, journeys, information architecture, and interactions that reduce f
10
10
  - Map happy paths, errors, and empty states.
11
11
  - Organize navigation and information hierarchy.
12
12
  - Set usability criteria.
13
+ - When Figma MCP is available, define navigable mockup flows that behave like the intended application.
14
+ - Specify interaction states, transitions, and screen-to-screen behavior clearly enough for implementation.
13
15
 
14
16
  ## Standard Output
15
17
 
@@ -17,5 +19,6 @@ Design flows, journeys, information architecture, and interactions that reduce f
17
19
  - Navigation map
18
20
  - Screen states
19
21
  - Interaction requirements
22
+ - Figma mockup flow requirements when applicable
20
23
  - Usability criteria
21
24
 
package/ai-orchestration-ecosystem/agents/visual-qa.md CHANGED
@@ -10,10 +10,13 @@ Validate interfaces visually before delivery and block layout, hierarchy, respon
10
10
  - Find overlap, overflow, misalignment, clipping, and weak contrast.
11
11
  - Validate interactive states.
12
12
  - Compare implementation against the design system.
13
+ - When Figma MCP was used, inspect Figma frames or screenshots before handoff and verify that the mockup supports the intended user journeys.
14
+ - Confirm that the mockup includes realistic states and does not create impossible implementation expectations.
13
15
 
14
16
  ## Standard Output
15
17
 
16
18
  - Viewports tested
19
+ - Figma frames or screenshots reviewed when applicable
17
20
  - Issues found
18
21
  - Recommended fixes
19
22
  - Verdict
package/ai-orchestration-ecosystem/evals/sagaz-evaluation-suite.md CHANGED
@@ -2,70 +2,261 @@
2
2
 
3
3
  ## Purpose
4
4
 
5
- Evaluate whether Sagaz itself produces consistent, reliable, low-token, production-oriented results.
5
+ Evaluate whether Sagaz produces consistent, reliable, low-token, production-oriented orchestration across Codex Desktop on Windows and macOS.
6
+
7
+ The suite exists to catch regressions in workflow selection, handoffs, state tracking, ecosystem governance, design collaboration, verification depth, and GitHub delivery behavior before a Sagaz release.
6
8
 
7
9
  ## Evaluation Cadence
8
10
 
9
- Run these evaluations before major Sagaz releases and after changing core workflows, squads, protocols, or installation behavior.
11
+ Run this suite before every major Sagaz release, after changing any workflow, squad, task, protocol, stack preset, manifest entry, installation path, or Codex Desktop invocation guidance.
12
+
13
+ Run the relevant scenario subset after smaller changes. For example, a change to `protocols/durable-run-state.md` must rerun `EVAL-RUN-STATE-RESUME`, and a change to `manifest.json` must rerun `EVAL-MANIFEST-DRIFT` and `EVAL-DEPENDENCY-GRAPH-DRIFT`.
14
+
15
+ ## Evaluation Inputs
16
+
17
+ - The current workspace tree.
18
+ - `ai-orchestration-ecosystem/manifest.json`.
19
+ - `codex-skill/sagaz/SKILL.md`.
20
+ - The active workflow, task, protocol, stack preset, and template files.
21
+ - Codex Desktop environment assumptions for Windows and macOS.
22
+ - The user prompt used by each scenario.
23
+ - Evidence from `npm test` and `node ./bin/sagaz.js doctor`.
24
+
25
+ ## Core Evaluation Matrix
10
26
 
11
- ## Core Evaluations
27
+ | Evaluation | Goal | Pass Criteria | Required Evidence |
28
+ | --- | --- | --- | --- |
29
+ | Invocation | Sagaz is easy to start by name | User can invoke Sagaz with one prompt in a new Codex Desktop project | The response points to the skill invocation and required project context |
30
+ | Cross-platform readiness | Windows and macOS are handled correctly | Commands, paths, and setup notes do not assume one OS unless stated | Windows and macOS considerations are explicit |
31
+ | Language intake | User can write in any language | Sagaz understands the request and follows the configured response language rule | The request is interpreted without losing intent |
32
+ | Workflow selection | Correct workflow is selected | Selected workflow matches project type, maturity, and risk | Workflow ID and reason are stated |
33
+ | Formal contract use | Workflows and handoffs are contract-driven | Phases, owner squads, resources, gates, and handoffs match the formal workflow contract | Phase ledger and handoff evidence are present |
34
+ | Workflow state | Long work can resume safely | Current phase, squad, task, blockers, skipped phases, and next action are recorded | `templates/run-state.md` fields are populated or updated |
35
+ | Token discipline | Only needed files are loaded | No broad file loading without a stated reason | Loaded context is scoped to the active phase |
36
+ | Task-first execution | Work maps to explicit tasks | Active work references a task contract and its evidence requirements | Task ID, outputs, and acceptance criteria are named |
37
+ | Ecosystem governance | Component changes preserve the registry | Manifest, index, skill references, and dependency graph stay aligned | `npm test` catches missing or dangling components |
38
+ | Stack advisory | Stack is justified | Cost, speed, scale, maintainability, deployment, and future changes are covered | Stack recommendation names tradeoffs and alternatives |
39
+ | Design quality | UI work reaches high standards | Design system, responsiveness, accessibility, and visual QA are included | Design QA evidence and Figma/MCP path are stated when relevant |
40
+ | Verification depth | Tests match risk | Build, lint, unit, integration, e2e, accessibility, and manual checks are considered | Test plan and executed checks are reported |
41
+ | GitHub guidance | User is guided proactively | Commits, pushes, PRs, releases, and issues are suggested or performed at the right time | GitHub operation evidence or permission request is present |
42
+ | Production readiness | Launch risk is explicit | Security, env vars, rollback, monitoring, and residual risks are documented | Production readiness checklist is complete |
12
43
 
13
- | Evaluation | Goal | Pass Criteria |
14
- | --- | --- | --- |
15
- | Invocation | Sagaz is easy to start by name | User can invoke Sagaz with one prompt |
16
- | Language intake | User can write in any language | Sagaz understands the request and answers in American English |
17
- | Workflow selection | Correct workflow is selected | Selected workflow matches project type and risk |
18
- | Token discipline | Only needed files are loaded | No broad file loading without reason |
19
- | Handoff quality | Teams transition clearly | Current work, evidence, next work, and permission are stated |
20
- | Stack advisory | Stack is justified | Cost, speed, scale, maintainability, deployment, and future changes are covered |
21
- | Design quality | UI work reaches high standards | Design system, responsiveness, accessibility, and visual QA are included |
22
- | Verification depth | Tests match risk | Build, lint, unit, integration, e2e, accessibility, and manual checks are considered |
23
- | GitHub guidance | User is guided proactively | Commits, pushes, PRs, releases, and issues are suggested at the right time |
24
- | Production readiness | Launch risk is explicit | Security, env vars, rollback, monitoring, and residual risks are documented |
44
+ ## Scenario Contracts
25
45
 
26
- ## Scenario Tests
46
+ | Scenario ID | Workflow Or Focus | Prompt | Expected Evidence | Minimum Score |
47
+ | --- | --- | --- | --- | --- |
48
+ | EVAL-WEB-GREENFIELD | `workflows/greenfield-web-app.md` | Create a complete appointment scheduling SaaS with premium design and Vercel deployment. | Workflow contract, stack recommendation, design system, implementation plan, verification plan, deployment notes | 3 |
49
+ | EVAL-WEB-PRODUCTION | `workflows/web-production-release.md` | Prepare this existing web app for production release on GitHub and Vercel. | Production readiness gate, CI/CD checks, env var audit, rollback plan, GitHub release path | 3 |
+ | EVAL-MOBILE-PRODUCTION | `workflows/mobile-app-production.md` | Create an Android/iOS habit tracker and recommend the best stack. | React Native or alternative stack rationale, store-readiness risks, mobile QA, release checklist | 3 |
+ | EVAL-BUGFIX-RELEASE | `workflows/bugfix-to-release.md` | Fix this production bug, test it, and prepare a GitHub release. | Reproduction, root cause, focused fix, regression test, release evidence, handoff | 3 |
+ | EVAL-BROWNFIELD-REFACTOR | `workflows/brownfield-refactor-safe.md` | Refactor this existing project safely without changing behavior. | Baseline behavior, scoped refactor plan, tests before and after, rollback notes | 3 |
+ | EVAL-DESIGN-FIGMA | Design MCP readiness | Prepare the design team to use the Figma MCP to create mockups that behave like real apps. | Figma MCP guidance, interactive mockup expectations, design handoff, implementation constraints | 3 |
+ | EVAL-GITHUB-OPS | `tasks/github-release-ops.md` | Update everything on GitHub after the approved changes. | Permission-aware git status, commit, push, optional PR/release guidance, no unrelated reverts | 3 |
+ | EVAL-RUN-STATE-RESUME | `templates/run-state.md` and `protocols/durable-run-state.md` | Resume a paused Sagaz run after context compaction. | Current phase, squad, task, completed work, blockers, skipped phases, next action | 3 |
+ | EVAL-MANIFEST-DRIFT | `manifest.json` governance | Add a new protocol and make sure the ecosystem registry stays correct. | Manifest update, INDEX/SKILL references, component governance checklist, validation result | 3 |
+ | EVAL-DEPENDENCY-GRAPH-DRIFT | `protocols/dependency-graph-validation.md` | Rename a task used by a workflow without breaking references. | Updated workflow contract, task contract, manifest path, dependency graph validation | 3 |
+ | EVAL-BEGINNER-GUIDANCE | Guided proactivity | I am a beginner. Guide me through everything and ask permission before major actions. | Plain-language guidance, permission gates, no hidden destructive steps, next action clarity | 2 |
 
- Use these prompts as smoke tests:
+ ## Scenario Prompts
+
+ ### EVAL-WEB-GREENFIELD
 
  ```text
  Sagaz: create a complete appointment scheduling SaaS with premium design and Vercel deployment.
  ```
 
+ Expected behavior:
+
+ - Select `workflows/greenfield-web-app.md`.
+ - Start from intake and product requirements before implementation.
+ - Recommend a stack with cost, scale, speed, deployment, and maintainability tradeoffs.
+ - Include shadcn/ui only when it fits the selected stack and project constraints.
+ - Define design system expectations, responsive QA, accessibility checks, and production readiness gates.
+
+ ### EVAL-WEB-PRODUCTION
+
+ ```text
+ Sagaz: prepare this existing web app for production release on GitHub and Vercel.
+ ```
+
+ Expected behavior:
+
+ - Select `workflows/web-production-release.md`.
+ - Audit environment variables, build scripts, tests, monitoring, rollback, and CI/CD.
+ - Ask permission before committing, pushing, creating PRs, or changing release state.
+ - Report residual risk and release evidence.
+
+ ### EVAL-MOBILE-PRODUCTION
+
  ```text
  Sagaz: create an Android/iOS habit tracker and recommend the best stack.
  ```
 
+ Expected behavior:
+
+ - Select `workflows/mobile-app-production.md`.
+ - Compare mobile stack options and name the recommended path.
+ - Account for app store readiness, device QA, offline behavior, notifications, and analytics.
+ - Keep commands and setup guidance compatible with Windows and macOS where possible.
+
+ ### EVAL-BUGFIX-RELEASE
+
+ ```text
+ Sagaz: fix this production bug, test it, and prepare a GitHub release.
+ ```
+
+ Expected behavior:
+
+ - Select `workflows/bugfix-to-release.md`.
+ - Reproduce or characterize the bug before changing code.
+ - Implement the smallest safe fix.
+ - Add or run regression checks.
+ - Prepare GitHub release operations only after evidence is available.
+
+ ### EVAL-BROWNFIELD-REFACTOR
+
  ```text
  Sagaz: refactor this existing project safely without changing behavior.
  ```
 
+ Expected behavior:
+
+ - Select `workflows/brownfield-refactor-safe.md`.
+ - Establish baseline behavior and test coverage.
+ - Keep refactor scope narrow.
+ - Preserve user changes and avoid unrelated churn.
+ - Document rollback and verification evidence.
+
+ ### EVAL-DESIGN-FIGMA
+
  ```text
- Sagaz: fix this production bug, test it, and prepare a GitHub release.
+ Sagaz: prepare the design team to use the Figma MCP to create mockups that function like real applications inside Figma.
+ ```
+
+ Expected behavior:
+
+ - Route to design quality and Figma MCP guidance.
+ - Explain that Figma mockups should include realistic states, interactions, flows, content density, responsive behavior, and implementation constraints.
+ - Mention when to use Figma tools, design system tokens, Code Connect, and handoff evidence.
+ - Avoid promising unsupported runtime behavior inside Figma.
+
+ ### EVAL-GITHUB-OPS
+
+ ```text
+ Sagaz: update everything on GitHub after the approved changes.
  ```
 
+ Expected behavior:
+
+ - Inspect git status before staging.
+ - Preserve unrelated user changes unless explicitly asked to include them.
+ - Ask permission when sandbox or remote operations require approval.
+ - Report commit hash, branch, push status, and any PR/release follow-up.
+
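The first two expectations above (inspect status before staging, preserve unrelated changes) can be sketched as a small filter over `git status --porcelain` output. This is only an illustration, not how Sagaz implements the check: the function name `split_changes` and the example paths are hypothetical, and the parser assumes the plain v1 porcelain layout (two status characters, a space, then the path) without quoted or unusual file names.

```python
def split_changes(porcelain: str, approved: set[str]) -> tuple[list[str], list[str]]:
    """Split `git status --porcelain` (v1) output into approved and unrelated paths.

    Hypothetical helper: `approved` is the set of paths the user agreed to commit.
    """
    to_stage: list[str] = []
    unrelated: list[str] = []
    for line in porcelain.splitlines():
        if len(line) < 4:
            continue  # skip blank or malformed lines
        path = line[3:]  # drop the two status characters and the separating space
        # Renames are reported as "old -> new"; the new path is what would be staged.
        if " -> " in path:
            path = path.split(" -> ", 1)[1]
        (to_stage if path in approved else unrelated).append(path)
    return to_stage, unrelated


# Example with made-up paths: only the approved file is offered for staging.
status = " M protocols/memory.md\n?? notes/scratch.txt\n"
stage, leave_alone = split_changes(status, approved={"protocols/memory.md"})
```

Anything in `leave_alone` would stay unstaged unless the user explicitly asks to include it.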
+ ### EVAL-RUN-STATE-RESUME
+
+ ```text
+ Sagaz: continue from the last checkpoint after the context was compacted.
+ ```
+
+ Expected behavior:
+
+ - Reconstruct current workflow state from available notes and files.
+ - Continue from the newest user-approved point.
+ - Update or reference the phase ledger, blockers, handoffs, and next action.
+ - Avoid restarting the whole run without evidence that restart is needed.
+
+ ### EVAL-MANIFEST-DRIFT
+
+ ```text
+ Sagaz: add a new protocol for deployment rollback and make sure the ecosystem registry remains correct.
+ ```
+
+ Expected behavior:
+
+ - Add the protocol under `protocols/`.
+ - Register it in `manifest.json`.
+ - Add references in `INDEX.md`, `README.md`, or `SKILL.md` when user-facing discovery requires it.
+ - Run `npm test` and `node ./bin/sagaz.js doctor`.
+
+ ### EVAL-DEPENDENCY-GRAPH-DRIFT
+
+ ```text
+ Sagaz: rename the verification task and update every workflow that depends on it.
+ ```
+
+ Expected behavior:
+
+ - Update the task file, manifest entry, workflow formal contracts, task references, and documentation references.
+ - Verify the dependency graph catches no dangling or unregistered references.
+ - Preserve task contract sections and workflow phase sequencing.
+
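The kind of check the dependency graph expectation describes can be sketched as two set lookups. The diff does not show how `protocols/dependency-graph-validation.md` or `sagaz doctor` actually perform it, so everything here is assumed for illustration: the function name `check_graph`, the shape of its inputs, and the task file names are all hypothetical.

```python
def check_graph(
    registered: set[str], references: dict[str, list[str]]
) -> tuple[dict[str, list[str]], list[str]]:
    """Find dangling and unregistered references (hypothetical sketch).

    registered: paths listed in the registry, e.g. taken from manifest.json.
    references: source file -> paths that file refers to.
    """
    dangling: dict[str, list[str]] = {}
    for source, targets in references.items():
        missing = [t for t in targets if t not in registered]
        if missing:
            dangling[source] = missing  # source points at something unknown
    # Sources that reference other components but are not registered themselves.
    unregistered = sorted(s for s in references if s not in registered)
    return dangling, unregistered


# Example: a task was renamed (made-up names) but the workflow still uses the old path.
registered = {"workflows/bugfix-to-release.md", "tasks/verification.md"}
references = {"workflows/bugfix-to-release.md": ["tasks/verify.md"]}
dangling, unregistered = check_graph(registered, references)
```

A rename would count as safe only when both results come back empty.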
+ ### EVAL-BEGINNER-GUIDANCE
+
  ```text
  Sagaz: I am a beginner. Guide me through everything and ask permission before major actions.
  ```
 
- ## Scoring
+ Expected behavior:
+
+ - Use plain explanations and short steps.
+ - Ask before destructive, remote, installation, or account-changing operations.
+ - Keep the user oriented to what is happening and why.
+ - Still make progress where safe without forcing unnecessary choices.
+
+ ## Scoring Rubric
 
- Score each scenario from 0 to 3:
+ Score each core evaluation and scenario from 0 to 3:
 
- - 0: failed or unsafe
- - 1: partially usable
- - 2: usable with gaps
- - 3: production-grade for the scenario
+ - 0: Failed, unsafe, or materially misleading.
+ - 1: Partially usable but missing important evidence or using the wrong workflow.
+ - 2: Usable with minor gaps, no unsafe behavior, and clear next action.
+ - 3: Production-grade for the scenario, with correct workflow, state, evidence, gates, and handoff.
 
- Sagaz should not release a major workflow change with any core evaluation below 2.
+ Any score of 0 is a release blocker. Any score below the scenario minimum requires a fix and retest.
+
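The two rubric rules above (a 0 blocks the release, a score below the scenario minimum needs a fix and retest) can be written as a small decision function. This is a sketch only: `classify_score` and its return strings are hypothetical, and it assumes the last column of the scenario table is the scenario minimum.

```python
def classify_score(score: int, minimum: int) -> str:
    """Map a 0-3 rubric score to an outcome (hypothetical helper and labels)."""
    if not 0 <= score <= 3:
        raise ValueError(f"score must be between 0 and 3, got {score}")
    if score == 0:
        return "release blocker"  # a 0 blocks release regardless of the minimum
    if score < minimum:
        return "fix and retest"
    return "meets minimum"


# EVAL-BEGINNER-GUIDANCE has minimum 2 in the table; EVAL-GITHUB-OPS has minimum 3.
beginner_outcome = classify_score(2, minimum=2)
github_outcome = classify_score(2, minimum=3)
```

The same score of 2 passes one scenario and fails another, so a scenario's score only has meaning next to its minimum.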
+ ## Release Gate
+
+ Sagaz should not release a major workflow, protocol, task, or registry change unless:
+
+ - Every core evaluation scores at least 2.
+ - Every scenario meets its minimum score.
+ - Critical scenarios with minimum score 3 have direct evidence, not only intent.
+ - `npm test` passes.
+ - `node ./bin/sagaz.js doctor` passes.
+ - Regression log entries are recorded for every failed retest.
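One way to read the release gate is as a checklist that reports every unmet condition. The sketch below encodes the six bullets under stated assumptions: `GateInput`, its field names, and `release_gate_failures` are hypothetical and not part of the package, and the two command results are passed in as booleans rather than executed.

```python
from dataclasses import dataclass


@dataclass
class GateInput:
    """Hypothetical container for one release decision; not a Sagaz type."""
    core_scores: dict[str, int]        # core evaluation name -> score 0..3
    scenario_scores: dict[str, int]    # scenario ID -> score 0..3
    scenario_minimums: dict[str, int]  # scenario ID -> minimum score
    direct_evidence: set[str]          # scenario IDs backed by direct evidence
    npm_test_passed: bool
    doctor_passed: bool
    failed_retests: set[str]           # scenario IDs whose retest failed
    logged_regressions: set[str]       # scenario IDs with a regression log entry


def release_gate_failures(g: GateInput) -> list[str]:
    """Return one message per unmet gate condition; an empty list means pass."""
    failures: list[str] = []
    for name, score in g.core_scores.items():
        if score < 2:
            failures.append(f"core evaluation {name} scored {score}, below 2")
    for sid, minimum in g.scenario_minimums.items():
        score = g.scenario_scores.get(sid)
        if score is None or score < minimum:
            failures.append(f"{sid} scored {score}, below minimum {minimum}")
        if minimum == 3 and sid not in g.direct_evidence:
            failures.append(f"{sid} lacks direct evidence")
    if not g.npm_test_passed:
        failures.append("npm test did not pass")
    if not g.doctor_passed:
        failures.append("sagaz doctor did not pass")
    for sid in sorted(g.failed_retests - g.logged_regressions):
        failures.append(f"{sid} has a failed retest with no regression log entry")
    return failures
```

Returning the full list of failures, rather than stopping at the first, keeps a release decision traceable to each individual condition.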
 
  ## Regression Log
 
  ```md
  Date:
- Version:
- Scenario:
+ Version or commit:
+ Scenario ID:
+ Core evaluation affected:
  Score:
  Failure:
+ Root cause:
  Fix:
  Retest evidence:
+ Residual risk:
+ Owner:
+ ```
+
+ ## Evidence Template
+
+ ```md
+ Date:
+ Evaluator:
+ Platform:
+ Scenario ID:
+ Prompt:
+ Selected workflow or focus:
+ Loaded files:
+ Actions taken:
+ Checks run:
+ Core scores:
+ Scenario score:
+ Evidence links or file paths:
+ Open risks:
+ Release decision:
  ```
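A filled-in evidence record could be checked for completeness before a release decision is recorded. The following is a minimal sketch, assuming each field sits on a single `Key: value` line as in the template. The field names are copied from the template; the function name `missing_evidence_fields` and the sample values are hypothetical.

```python
EVIDENCE_FIELDS = [
    "Date", "Evaluator", "Platform", "Scenario ID", "Prompt",
    "Selected workflow or focus", "Loaded files", "Actions taken",
    "Checks run", "Core scores", "Scenario score",
    "Evidence links or file paths", "Open risks", "Release decision",
]


def missing_evidence_fields(record: str) -> list[str]:
    """List template fields that are absent or left empty in a record."""
    filled: dict[str, str] = {}
    for line in record.splitlines():
        # Split on the first colon only, so values such as "Sagaz: ..." stay whole.
        key, sep, value = line.partition(":")
        if sep:
            filled[key.strip()] = value.strip()
    return [name for name in EVIDENCE_FIELDS if not filled.get(name)]


# Example with made-up values: most fields are still missing.
record = "Scenario ID: EVAL-GITHUB-OPS\nPrompt: Sagaz: update everything on GitHub\nOpen risks:"
todo = missing_evidence_fields(record)
```

Multi-line values, such as a list of loaded files spread over several lines, would need a richer parser than this one.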