npm - @fernado03/zoo-flow - Versions diffs - 0.5.2 → 0.7.0 - Mend

@fernado03/zoo-flow 0.5.2 → 0.7.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (135)

package/templates/full/.zoo-flow/evals/routing-cases.md CHANGED

@@ -1,203 +1,227 @@
-# Routing Eval Cases
-Use these cases to check whether the orchestrator chooses the expected workflow.
-In every case:
-- The user did **not** type a slash command.
-- A free-form request is never self-approving. The orchestrator proposes, then waits.
-- Slash commands, mode names, and executable routing text must not appear in clickable suggestions.
-- Slash commands are optional. The user should never be told to type one to use Zoo Flow.
-## Case 1 — Tiny copy change
-User:
-"Change the Save button text to Submit."
-Expected:
-Recommend the small implementation workflow.
-Must not:
-- Route to feature.
-- Read architecture docs by default.
-- Ask the user to type a slash command.
-## Case 2 — Unknown crash
-User:
-"Checkout randomly crashes after payment. It used to work."
-Expected:
-Recommend the diagnosis workflow.
-Must:
-- Reproduce before hypothesizing.
-- Present hypotheses before fix.
-## Case 3 — New capability
-User:
-"Add team invitations with email invites and pending invite states."
-Expected:
-Recommend feature planning.
-Must:
-- Plan before implementation.
-- Use phase gates.
-## Case 4 — Structural cleanup
-User:
-"The auth module is getting hard to change. I want to decouple provider-specific logic."
-Expected:
-Recommend refactor workflow.
-Must:
-- Preserve behavior.
-- Explore architecture candidates before implementation.
-## Case 5 — Unknown area
-User:
-"I need to change billing but I don't know where that logic lives."
-Expected:
-Recommend exploration first.
-Must:
-- Produce a map before choosing feature/fix/refactor.
-## Case 6 — Known mechanical fix
-User:
-"The env var name changed from API_KEY to ZOO_API_KEY. Update the config loader."
-Expected:
-Recommend small implementation workflow.
-Must not:
-- Route to diagnosis.
-- Route to feature.
-## Case 7 — TDD with clear interface
-User:
-"Add a slugify helper for article URLs. I want it test-first."
-Expected:
-Recommend TDD workflow.
-Must:
-- Write the failing test first.
-- Confirm the public interface (input, output, edge cases) is clear before coding.
-## Case 8 — Stale documentation
-User:
-"The ARCHITECTURE.md file describes a checkout flow we removed last quarter. Bring it in line with the code."
-Expected:
-Recommend the documentation update workflow.
-Must:
-- Audit code first, then make surgical doc edits.
-- Not rewrite the file wholesale.
-## Case 9 — Ready to commit
-User:
-"I finished the small tweak. Please commit it and add a journal entry."
-Expected:
-Recommend the commit + journal workflow.
-Must:
-- Propose a Conventional Commit message and wait for approval before running `git commit` or `git push`.
+# Routing Eval Cases
+Use these cases to check whether the orchestrator chooses the expected workflow.
+In every case:
+- The user did **not** type a slash command.
+- A free-form request is never self-approving. The orchestrator proposes, then waits.
+- Slash commands, mode names, and executable routing text must not appear in clickable suggestions.
+- Slash commands are optional. The user should never be told to type one to use Zoo Flow.
+## Case 1 — Tiny copy change
+User:
+"Change the Save button text to Submit."
+Expected:
+Recommend the small implementation workflow.
+Must not:
+- Route to feature.
+- Read architecture docs by default.
+- Ask the user to type a slash command.
+## Case 2 — Unknown crash
+User:
+"Checkout randomly crashes after payment. It used to work."
+Expected:
+Recommend the diagnosis workflow.
+Must:
+- Reproduce before hypothesizing.
+- Present hypotheses before fix.
+## Case 3 — New capability
+User:
+"Add team invitations with email invites and pending invite states."
+Expected:
+Recommend feature planning.
+Must:
+- Plan before implementation.
+- Use phase gates.
+## Case 4 — Structural cleanup
+User:
+"The auth module is getting hard to change. I want to decouple provider-specific logic."
+Expected:
+Recommend refactor workflow.
+Must:
+- Preserve behavior.
+- Explore architecture candidates before implementation.
+## Case 5 — Unknown area
+User:
+"I need to change billing but I don't know where that logic lives."
+Expected:
+Recommend exploration first.
+Must:
+- Produce a map before choosing feature/fix/refactor.
+## Case 6 — Known mechanical fix
+User:
+"The env var name changed from API_KEY to ZOO_API_KEY. Update the config loader."
+Expected:
+Recommend small implementation workflow.
+Must not:
+- Route to diagnosis.
+- Route to feature.
+## Case 7 — TDD with clear interface
+User:
+"Add a slugify helper for article URLs. I want it test-first."
+Expected:
+Recommend TDD workflow.
+Must:
+- Write the failing test first.
+- Confirm the public interface (input, output, edge cases) is clear before coding.
+## Case 8 — Stale documentation
+User:
+"The ARCHITECTURE.md file describes a checkout flow we removed last quarter. Bring it in line with the code."
+Expected:
+Recommend the documentation update workflow.
+Must:
+- Audit code first, then make surgical doc edits.
+- Not rewrite the file wholesale.
+## Case 9 — Ready to commit
+User:
+"I finished the small tweak. Please commit it and add a journal entry."
+Expected:
+Recommend the commit + journal workflow.
+Must:
+- Propose a Conventional Commit message and wait for approval before running `git commit` or `git push`.
+## Case 10 — Issue triage
+User:
+"We have 30 incoming bug reports from the support team. Triage them into the issue tracker."
+Expected:
+Recommend the triage workflow.
+Must:
+- Ask before publishing, labeling, closing, or making any irreversible tracker change.
+## Case 11 — Throwaway design probe
+User:
+"I'm not sure if the new search ranking should run inline or in a queue. Can we try both and see?"
+Expected:
+Recommend a throwaway prototype.
+Must:
+- Keep the work on a prototype branch or `.scratch/prototypes/<slug>/` so it is clearly throwaway.
+- Resolve the design question, not commit to a real implementation.
+## Case 12 — Explicit slash command
+User:
+"/tweak rename the cancel button to close."
+Expected:
+Route immediately. Do not second-guess the explicit command.
+Must not:
+- Repropose the workflow as a numbered choice.
+- Treat the explicit command as if approval were still pending.
+## Case 13 — Ambiguous "fix" for a known mechanical change
+User:
+"Fix the typo in the cancel-button label and update the aria-label to match."
+Expected:
+Recommend the small implementation workflow, not diagnosis.
+Must:
+- Recognize the cause and target are known.
+- Not run a full diagnosis loop for a one-line copy fix.
+## Case — Free-form request must not expose slash commands
+User:
+"Change the Save button text to Submit."
+Expected:
+Recommend the small implementation workflow in plain language.
+Good response:
+"This looks like a small implementation change because the target is known and the risk is low.
+1. Make the small implementation change
+2. Explore the area first"
+Must not:
+- Say "use `/tweak`" in the user-facing recommendation.
+- Offer `/tweak` as a selectable option.
+- Tell the user to type a slash command.
+Allowed:
+- Internally delegate using `/tweak` after the user approves.
+- Mention slash commands only if the user explicitly asks for command syntax.
+## Case — Deep inspection must not route to Ask mode
+User:
+"Do you think these changes are beneficial or not? Inspect deeply if it affects the system."
+Expected:
+Recommend analysis/review through the architecture/inspection workflow.
+Delegation target after approval:
+`system-architect`
+Must not:
+- Delegate to Ask mode.
+- Delegate to default Architect mode.
+- Use any mode other than `system-architect` or `code-tweaker`.
-## Case 10 — Issue triage
+## Case — Review
 User:
-"We have 30 incoming bug reports from the support team. Triage them into the issue tracker."
+"Review this branch before I commit it."
 Expected:
-Recommend the triage workflow.
+Recommend the review workflow.
 Must:
-- Ask before publishing, labeling, closing, or making any irreversible tracker change.
+- Route to `system-architect` after approval.
+- Report findings by severity.
-## Case 11 — Throwaway design probe
+## Case — Verification
 User:
-"I'm not sure if the new search ranking should run inline or in a queue. Can we try both and see?"
+"Run tests for this change and make sure nothing broke."
 Expected:
-Recommend a throwaway prototype.
+Recommend the verification workflow.
 Must:
-- Keep the work on a prototype branch or `.scratch/prototypes/<slug>/` so it is clearly throwaway.
-- Resolve the design question, not commit to a real implementation.
-## Case 12 — Explicit slash command
-User:
-"/tweak rename the cancel button to close."
-Expected:
-Route immediately. Do not second-guess the explicit command.
-Must not:
-- Repropose the workflow as a numbered choice.
-- Treat the explicit command as if approval were still pending.
-## Case 13 — Ambiguous "fix" for a known mechanical change
-User:
-"Fix the typo in the cancel-button label and update the aria-label to match."
-Expected:
-Recommend the small implementation workflow, not diagnosis.
-Must:
-- Recognize the cause and target are known.
-- Not run a full diagnosis loop for a one-line copy fix.
-## Case — Free-form request must not expose slash commands
-User:
-"Change the Save button text to Submit."
-Expected:
-Recommend the small implementation workflow in plain language.
-Good response:
-"This looks like a small implementation change because the target is known and the risk is low.
-1. Make the small implementation change
-2. Explore the area first"
-Must not:
-- Say "use `/tweak`" in the user-facing recommendation.
-- Offer `/tweak` as a selectable option.
-- Tell the user to type a slash command.
-Allowed:
-- Internally delegate using `/tweak` after the user approves.
-- Mention slash commands only if the user explicitly asks for command syntax.
-## Case — Deep inspection must not route to Ask mode
-User:
-"Do you think these changes are beneficial or not? Inspect deeply if it affects the system."
-Expected:
-Recommend analysis/review through the architecture/inspection workflow.
-Delegation target after approval:
-`system-architect`
-Must not:
-- Delegate to Ask mode.
-- Delegate to default Architect mode.
-- Use any mode other than `system-architect` or `code-tweaker`.
+- Route to `code-tweaker` after approval.
+- Report exact commands run and results.
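
The rule that slash commands and mode names must not appear in user-facing suggestions lends itself to a mechanical check. The following is a minimal sketch, not code from the package: `findLeaks`, the regular expression, and the mode list are hypothetical names introduced here, and the mode slugs are taken from the cases above.

```typescript
// Hypothetical helper: none of these names come from the zoo-flow package.
// Matches a slash command such as "/tweak" at the start of a string, after
// whitespace, or after an opening quote, backtick, or parenthesis.
// A path-like fragment such as "feature/fix/refactor" is not flagged.
const SLASH_COMMAND = /(^|\s|["'`(])\/[a-z][a-z0-9-]*\b/i;

// Mode slugs named in the routing cases above.
const MODE_SLUGS = ["system-architect", "code-tweaker"];

// Returns the suggestions that expose a slash command or a mode slug.
function findLeaks(suggestions: string[]): string[] {
  return suggestions.filter(
    (s) => SLASH_COMMAND.test(s) || MODE_SLUGS.some((m) => s.includes(m)),
  );
}
```

Applied to the "Good response" options from the free-form case, a check like this should return an empty list, while an option such as "Use /tweak" would be flagged.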

package/templates/full/.zoo-flow/project-profile.json ADDED

@@ -0,0 +1,24 @@
+{
+  "schemaVersion": 1,
+  "projectShape": null,
+  "packageManager": null,
+  "issueTracker": {
+    "kind": null,
+    "project": null
+  },
+  "verification": {
+    "targetedTest": null,
+    "typecheck": null,
+    "lint": null,
+    "build": null,
+    "fullTest": null
+  },
+  "docsPolicy": {
+    "localContext": ".zoo-flow/",
+    "sharedDocs": ["AGENTS.md", "docs/adr/", "docs/architecture/"]
+  },
+  "commitPolicy": {
+    "conventionalCommits": true,
+    "journal": "docs/journal/"
+  }
+}
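
For readers consuming this template, the shape of the file can be written down as a type. This is a sketch inferred only from the keys and value kinds visible in the diff above; the interface name and the `configuredChecks` helper are hypothetical, and the package may model the profile differently.

```typescript
// Field names mirror project-profile.json as shown in the diff.
// The type names and helper below are illustrative assumptions.
type CheckName = "targetedTest" | "typecheck" | "lint" | "build" | "fullTest";

interface ProjectProfile {
  schemaVersion: number;
  projectShape: string | null;
  packageManager: string | null;
  issueTracker: { kind: string | null; project: string | null };
  verification: Record<CheckName, string | null>;
  docsPolicy: { localContext: string; sharedDocs: string[] };
  commitPolicy: { conventionalCommits: boolean; journal: string };
}

// Lists the verification commands that have been filled in (non-null),
// in the order the keys appear in the profile.
function configuredChecks(profile: ProjectProfile): string[] {
  return Object.entries(profile.verification)
    .filter(([, cmd]) => cmd !== null)
    .map(([name, cmd]) => `${name}: ${cmd}`);
}
```

With the template as shipped, every verification entry is `null`, so a helper like this would report no configured checks until the profile is filled in.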

package/tests/fixtures/bad-routing-cases/bad-json.jsonl ADDED

@@ -0,0 +1 @@
+{"name":"bad json","user":"oops",

package/tests/fixtures/bad-routing-cases/bad-mode.jsonl ADDED

@@ -0,0 +1 @@
+{"name":"bad mode","user":"Review this","expected_workflow":"review","expected_command":"/review","expected_mode":"architect","must_require_approval":true,"must_not_include":[]}

package/tests/fixtures/bad-routing-cases/missing-command.jsonl ADDED

@@ -0,0 +1 @@
+{"name":"missing command","user":"Do nonexistent thing","expected_workflow":"small implementation","expected_command":"/does-not-exist","expected_mode":"code-tweaker","must_require_approval":true,"must_not_include":[]}
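
Each of these three fixtures appears designed to trip one validation rule: unparseable JSON, a mode outside the permitted set, and a command that does not exist. A minimal sketch of such a validator follows. It is an assumption-laden illustration, not the package's validator: `validateRoutingCase` is a hypothetical name, the allowed modes are taken from the routing cases above, and the command list contains only the commands mentioned in this diff.

```typescript
// Assumption: only these two modes are valid delegation targets
// (per the routing cases in routing-cases.md).
const ALLOWED_MODES = new Set(["system-architect", "code-tweaker"]);

// Assumption: a stand-in for the real command registry, limited to the
// commands that appear in this diff.
const KNOWN_COMMANDS = new Set(["/tweak", "/fix", "/verify", "/review", "/diagnose"]);

// Validates one JSONL line and returns a list of problems (empty if none).
function validateRoutingCase(line: string): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(line);
  } catch {
    return ["invalid JSON"];
  }
  if (typeof parsed !== "object" || parsed === null) {
    return ["not a JSON object"];
  }
  const c = parsed as Record<string, unknown>;
  const errors: string[] = [];
  if (typeof c.expected_mode !== "string" || !ALLOWED_MODES.has(c.expected_mode)) {
    errors.push(`invalid mode: ${String(c.expected_mode)}`);
  }
  if (typeof c.expected_command !== "string" || !KNOWN_COMMANDS.has(c.expected_command)) {
    errors.push(`unknown command: ${String(c.expected_command)}`);
  }
  return errors;
}
```

Under these assumptions, each fixture line would produce exactly one error, which matches the one-fault-per-file layout of the `bad-routing-cases` directory.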

package/tests/fixtures/doctor/bad-built-in-delegation/fixture.json ADDED

@@ -0,0 +1 @@
+{"expect":"fail","mutation":"bad-built-in-delegation","message":"Built-in/default delegation target"}

package/tests/fixtures/doctor/bad-mode-slug/fixture.json ADDED

@@ -0,0 +1 @@
+{"expect":"fail","mutation":"bad-mode-slug","message":"uses invalid mode: architect"}

package/tests/fixtures/doctor/bad-skill-wrapper/fixture.json ADDED

@@ -0,0 +1 @@
+{"expect":"fail","mutation":"bad-skill-wrapper","message":"non-canonical skill wrapper"}

package/tests/fixtures/doctor/bad-zoo-path/fixture.json ADDED

@@ -0,0 +1 @@
+{"expect":"fail","mutation":"bad-zoo-path","message":"Bad pattern \".zoo/\""}

package/tests/fixtures/doctor/helper-missing-mode/fixture.json ADDED

@@ -0,0 +1 @@
+{"expect":"fail","mutation":"helper-missing-mode","message":"must declare mode: system-architect"}

package/tests/fixtures/doctor/helper-not-permitted/fixture.json ADDED

@@ -0,0 +1 @@
+{"expect":"fail","mutation":"helper-not-permitted","message":"does not permit documented command /diagnose"}

package/tests/fixtures/doctor/manual-good-template/fixture.json ADDED

@@ -0,0 +1 @@
+{"expect":"pass","mutation":"none","message":"doctor passed"}

package/tests/fixtures/doctor/missing-command/fixture.json ADDED

@@ -0,0 +1 @@
+{"expect":"fail","mutation":"missing-command","message":"missing command file"}

package/tests/fixtures/doctor/missing-roomodes/fixture.json ADDED

@@ -0,0 +1 @@
+{"expect":"fail","mutation":"missing-roomodes","message":"Missing .roomodes"}

package/tests/fixtures/doctor/missing-skill/fixture.json ADDED

@@ -0,0 +1 @@
+{"expect":"fail","mutation":"missing-skill","message":"references missing skill"}
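
The doctor fixtures share one shape: an expected outcome, the mutation applied to a known-good template, and a message fragment. How the test harness consumes them is not shown in this diff. The sketch below assumes the simplest reading, namely that the doctor's pass/fail status must match `expect` and its output must contain `message` as a substring. `DoctorOutcome` and `fixtureSatisfied` are hypothetical names.

```typescript
// Shape of fixture.json as shown in the diff.
interface DoctorFixture {
  expect: "pass" | "fail";
  mutation: string;
  message: string;
}

// Hypothetical result of running the doctor against a mutated template.
interface DoctorOutcome {
  ok: boolean;
  output: string;
}

// Assumption: a fixture is satisfied when the status matches `expect`
// and the output contains the fixture's message as a substring.
function fixtureSatisfied(fixture: DoctorFixture, outcome: DoctorOutcome): boolean {
  const statusMatches = fixture.expect === "pass" ? outcome.ok : !outcome.ok;
  return statusMatches && outcome.output.includes(fixture.message);
}
```

Matching on a substring rather than the whole output would keep the fixtures stable if the doctor adds file paths or other context around each message.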

package/tests/fixtures/project-shapes/cli-tool/cmd/root.go ADDED

@@ -0,0 +1 @@
+package main; func main() { println("hello") }

package/tests/fixtures/project-shapes/cli-tool/fixture.json ADDED

@@ -0,0 +1 @@
+{"expected_shape": "cli-tool", "keywords": ["cli"]}

package/tests/fixtures/project-shapes/cli-tool/package.json ADDED

@@ -0,0 +1 @@
+{"name": "test-cli", "keywords": ["cli"]}

package/tests/fixtures/project-shapes/data-pipeline/fixture.json ADDED

@@ -0,0 +1 @@
+{"expected_shape": "data-pipeline", "keywords": ["data-pipeline"]}

package/tests/fixtures/project-shapes/data-pipeline/pipelines/invoices.py ADDED

@@ -0,0 +1 @@
+def run_invoice_pipeline(): pass

package/tests/fixtures/project-shapes/data-pipeline/pyproject.toml ADDED

@@ -0,0 +1,2 @@
+[project]
+name = "test-pipeline"

package/tests/fixtures/project-shapes/library/fixture.json ADDED

@@ -0,0 +1 @@
+{"expected_shape": "library", "keywords": ["library"]}

package/tests/fixtures/project-shapes/library/package.json ADDED

@@ -0,0 +1 @@
+{"name": "test-lib", "keywords": ["library"]}

package/tests/fixtures/project-shapes/library/src/index.ts ADDED

@@ -0,0 +1 @@
+export const greet = (name: string) => `Hello, ${name}`;

package/tests/fixtures/project-shapes/monorepo/fixture.json ADDED

@@ -0,0 +1 @@
+{"expected_shape": "monorepo", "keywords": ["monorepo"]}

package/tests/fixtures/project-shapes/monorepo/package.json ADDED

@@ -0,0 +1 @@
+{"name": "test-monorepo", "keywords": ["monorepo"]}

package/tests/fixtures/project-shapes/monorepo/packages/core/index.ts ADDED

@@ -0,0 +1 @@
+export const core = "core";

package/tests/fixtures/project-shapes/monorepo/packages/web/index.ts ADDED

@@ -0,0 +1 @@
+export const web = "web";

package/tests/fixtures/project-shapes/serverless/fixture.json ADDED

@@ -0,0 +1 @@
+{"expected_shape": "serverless", "keywords": ["serverless"]}

package/tests/fixtures/project-shapes/serverless/functions/webhook.ts ADDED

@@ -0,0 +1 @@
+export const handler = async () => ({ statusCode: 200 });

package/tests/fixtures/project-shapes/web-app/app/routes/index.tsx ADDED

@@ -0,0 +1 @@
+export default function Home() { return <div>Home</div>; }

package/tests/fixtures/project-shapes/web-app/fixture.json ADDED

@@ -0,0 +1 @@
+{"expected_shape": "web-app", "keywords": ["web-app"]}

package/tests/fixtures/project-shapes/web-app/package.json ADDED

@@ -0,0 +1 @@
+{"name": "test-webapp", "keywords": ["web-app"]}
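
Each project-shape fixture pairs an `expected_shape` with a `keywords` list, and in every fixture shown the keyword either equals the shape or, for `cli`, maps to `cli-tool`. The detection logic itself is not part of this diff, so the following is only a guess at the simplest rule consistent with the fixtures: a keyword lookup table. `shapeFromKeywords` and the table are hypothetical, and the real detector may also inspect directory layout (for example `packages/` or `functions/`).

```typescript
// Assumption: a keyword-to-shape table built solely from the fixture pairs
// shown in this diff. The package's real detector is not shown here.
const KEYWORD_TO_SHAPE = new Map<string, string>([
  ["cli", "cli-tool"],
  ["data-pipeline", "data-pipeline"],
  ["library", "library"],
  ["monorepo", "monorepo"],
  ["serverless", "serverless"],
  ["web-app", "web-app"],
]);

// Returns the shape for the first recognized keyword, or null if none match.
function shapeFromKeywords(keywords: string[]): string | null {
  for (const keyword of keywords) {
    const shape = KEYWORD_TO_SHAPE.get(keyword);
    if (shape !== undefined) {
      return shape;
    }
  }
  return null;
}
```

Returning `null` for unrecognized input lines up with the `"projectShape": null` default in the project profile template, although whether the package connects the two this way is not confirmed by the diff.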

package/tests/golden-transcripts/01-small-tweak-golden.md ADDED

@@ -0,0 +1,21 @@
+# Golden: Small Tweak
+## User
+Change the Save button text to Submit.
+## Workflow
+small implementation -> /tweak -> code-tweaker
+## Expected structure
+1. Orchestrator proposes "small implementation" in plain language.
+2. User approves.
+3. Orchestrator delegates /tweak to code-tweaker.
+4. Code Tweaker reads the command, applies the change.
+5. Code Tweaker reports: files changed, what changed, status.
+6. Code Tweaker uses attempt_completion with evidence.
+## Must not include
+- /tweak in user-facing options
+- code-tweaker as a clickable choice
+- Architecture doc reads
+- Domain doc reads

package/tests/golden-transcripts/02-diagnosis-golden.md ADDED

@@ -0,0 +1,26 @@
+# Golden: Diagnosis
+## User
+Checkout randomly crashes after payment. It used to work.
+## Workflow
+diagnosis -> /fix -> system-architect -> code-tweaker -> system-architect
+## Expected structure
+1. Orchestrator proposes "diagnosis" in plain language.
+2. User approves.
+3. Orchestrator delegates /fix to system-architect.
+4. System Architect reads domain docs, reproduces, minimizes, hypothesizes.
+5. System Architect hands to code-tweaker for fix.
+6. Code Tweaker implements, verifies.
+7. Control returns to system-architect for post-mortem.
+8. System Architect uses attempt_completion.
+## Must include
+- Reproduction before hypothesis
+- Hypothesis before fix
+- Post-mortem after fix
+## Must not include
+- Built-in mode delegation
+- /fix in user-facing suggestions

package/tests/golden-transcripts/03-verification-golden.md ADDED

@@ -0,0 +1,24 @@
+# Golden: Verification
+## User
+Run tests for this change and make sure nothing broke.
+## Workflow
+verification -> /verify -> code-tweaker
+## Expected structure
+1. Orchestrator proposes "verification" in plain language.
+2. User approves.
+3. Orchestrator delegates /verify to code-tweaker.
+4. Code Tweaker reads verify skill.
+5. Code Tweaker inspects project type and changed files.
+6. Code Tweaker picks smallest useful checks.
+7. Code Tweaker runs checks, captures output.
+8. Code Tweaker reports: verification result with status, commands run, evidence, remaining risk.
+9. Code Tweaker uses attempt_completion.
+## Evidence format
+- Status: pass | fail | partial | blocked
+- Commands run with pass/fail per command
+- Evidence: short summary of output
+- Remaining risk: what was not checked
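
The evidence format above maps naturally onto a small record type. The sketch below shows one way to model and render it. The type names, field names, and the exact text layout are assumptions made for illustration; the golden transcript lists only the four items, not a concrete serialization.

```typescript
// The four status values come from the golden transcript above.
type VerificationStatus = "pass" | "fail" | "partial" | "blocked";

// Hypothetical record types; the field names are illustrative.
interface CommandResult {
  command: string;
  passed: boolean;
}

interface VerificationReport {
  status: VerificationStatus;
  commands: CommandResult[];
  evidence: string;
  remainingRisk: string;
}

// Renders the report as one line per item, with pass/fail per command.
function formatReport(report: VerificationReport): string {
  const lines = [
    `Status: ${report.status}`,
    ...report.commands.map((c) => `- ${c.command}: ${c.passed ? "pass" : "fail"}`),
    `Evidence: ${report.evidence}`,
    `Remaining risk: ${report.remainingRisk}`,
  ];
  return lines.join("\n");
}
```

Keeping `remainingRisk` as a required field, rather than optional, would force every report to state what was not checked, which is the point of the last item in the format.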

package/tests/golden-transcripts/04-review-golden.md ADDED

@@ -0,0 +1,26 @@
+# Golden: Review
+## User
+Review this branch before I commit it.
+## Workflow
+review -> /review -> system-architect
+## Expected structure
+1. Orchestrator proposes "review" in plain language.
+2. User approves.
+3. Orchestrator delegates /review to system-architect.
+4. System Architect reads review skill.
+5. System Architect identifies target (branch diff).
+6. System Architect reads targeted diffs.
+7. System Architect evaluates axes: standards, spec, security/risk.
+8. System Architect reports findings by severity.
+9. System Architect ends with canonical result line.
+10. System Architect uses attempt_completion.
+## Result line
+- Review result: approve | approve with nits | changes requested | blocked
+## Must include
+- Severity-ordered findings
+- Security/Risk axis when change touches auth/payments/PII/data