@fernado03/zoo-flow 0.5.2 → 0.5.3
- package/README.md +3 -11
- package/bin/zoo-flow.js +7 -6
- package/package.json +1 -1
- package/templates/full/.roo/commands/explore.md +13 -13
- package/templates/full/.roo/commands/scaffold-context.md +13 -13
- package/templates/full/.roo/commands/setup-matt-pocock-skills.md +8 -8
- package/templates/full/.roo/commands/update-docs.md +22 -22
- package/templates/full/.roo/rules/04-context-economy.md +29 -29
- package/templates/full/.roo/rules-custom-orchestrator/00-routing.md +69 -69
- package/templates/full/.roo/rules-custom-orchestrator/01-delegation-message.md +62 -62
- package/templates/full/.roo/skills/engineering/grill-with-docs/CONTEXT-FORMAT.md +61 -61
- package/templates/full/.roo/skills/engineering/prototype/SKILL.md +37 -37
- package/templates/full/.roo/skills/engineering/scaffold-context/SKILL.md +152 -152
- package/templates/full/.roo/skills/engineering/to-prd/SKILL.md +57 -57
- package/templates/full/.roomodes +47 -47
- package/templates/full/.zoo-flow/CONTEXT.md +8 -8
- package/templates/full/.zoo-flow/START_HERE.md +61 -61
- package/templates/full/.zoo-flow/docs/adr/0001-record-architecture-decisions.md +22 -22
- package/templates/full/.zoo-flow/evals/no-regression-checklist.md +26 -26
- package/templates/full/.zoo-flow/evals/routing-cases.md +203 -203
@@ -1,203 +1,203 @@
-# Routing Eval Cases
-
-Use these cases to check whether the orchestrator chooses the expected workflow.
-
-In every case:
-
-- The user did **not** type a slash command.
-- A free-form request is never self-approving. The orchestrator proposes, then waits.
-- Slash commands, mode names, and executable routing text must not appear in clickable suggestions.
-- Slash commands are optional. The user should never be told to type one to use Zoo Flow.
-
-## Case 1 — Tiny copy change
-
-User:
-"Change the Save button text to Submit."
-
-Expected:
-Recommend the small implementation workflow.
-
-Must not:
-- Route to feature.
-- Read architecture docs by default.
-- Ask the user to type a slash command.
-
-## Case 2 — Unknown crash
-
-User:
-"Checkout randomly crashes after payment. It used to work."
-
-Expected:
-Recommend the diagnosis workflow.
-
-Must:
-- Reproduce before hypothesizing.
-- Present hypotheses before fix.
-
-## Case 3 — New capability
-
-User:
-"Add team invitations with email invites and pending invite states."
-
-Expected:
-Recommend feature planning.
-
-Must:
-- Plan before implementation.
-- Use phase gates.
-
-## Case 4 — Structural cleanup
-
-User:
-"The auth module is getting hard to change. I want to decouple provider-specific logic."
-
-Expected:
-Recommend refactor workflow.
-
-Must:
-- Preserve behavior.
-- Explore architecture candidates before implementation.
-
-## Case 5 — Unknown area
-
-User:
-"I need to change billing but I don't know where that logic lives."
-
-Expected:
-Recommend exploration first.
-
-Must:
-- Produce a map before choosing feature/fix/refactor.
-
-## Case 6 — Known mechanical fix
-
-User:
-"The env var name changed from API_KEY to ZOO_API_KEY. Update the config loader."
-
-Expected:
-Recommend small implementation workflow.
-
-Must not:
-- Route to diagnosis.
-- Route to feature.
-
-## Case 7 — TDD with clear interface
-
-User:
-"Add a slugify helper for article URLs. I want it test-first."
-
-Expected:
-Recommend TDD workflow.
-
-Must:
-- Write the failing test first.
-- Confirm the public interface (input, output, edge cases) is clear before coding.
-
-## Case 8 — Stale documentation
-
-User:
-"The ARCHITECTURE.md file describes a checkout flow we removed last quarter. Bring it in line with the code."
-
-Expected:
-Recommend the documentation update workflow.
-
-Must:
-- Audit code first, then make surgical doc edits.
-- Not rewrite the file wholesale.
-
-## Case 9 — Ready to commit
-
-User:
-"I finished the small tweak. Please commit it and add a journal entry."
-
-Expected:
-Recommend the commit + journal workflow.
-
-Must:
-- Propose a Conventional Commit message and wait for approval before running `git commit` or `git push`.
-
-## Case 10 — Issue triage
-
-User:
-"We have 30 incoming bug reports from the support team. Triage them into the issue tracker."
-
-Expected:
-Recommend the triage workflow.
-
-Must:
-- Ask before publishing, labeling, closing, or making any irreversible tracker change.
-
-## Case 11 — Throwaway design probe
-
-User:
-"I'm not sure if the new search ranking should run inline or in a queue. Can we try both and see?"
-
-Expected:
-Recommend a throwaway prototype.
-
-Must:
-- Keep the work on a prototype branch or `.scratch/prototypes/<slug>/` so it is clearly throwaway.
-- Resolve the design question, not commit to a real implementation.
-
-## Case 12 — Explicit slash command
-
-User:
-"/tweak rename the cancel button to close."
-
-Expected:
-Route immediately. Do not second-guess the explicit command.
-
-Must not:
-- Repropose the workflow as a numbered choice.
-- Treat the explicit command as if approval were still pending.
-
-## Case 13 — Ambiguous "fix" for a known mechanical change
-
-User:
-"Fix the typo in the cancel-button label and update the aria-label to match."
-
-Expected:
-Recommend the small implementation workflow, not diagnosis.
-
-Must:
-- Recognize the cause and target are known.
-- Not run a full diagnosis loop for a one-line copy fix.
-
-## Case — Free-form request must not expose slash commands
-
-User:
-"Change the Save button text to Submit."
-
-Expected:
-Recommend the small implementation workflow in plain language.
-
-Good response:
-"This looks like a small implementation change because the target is known and the risk is low.
-
-1. Make the small implementation change
-2. Explore the area first"
-
-Must not:
-- Say "use `/tweak`" in the user-facing recommendation.
-- Offer `/tweak` as a selectable option.
-- Tell the user to type a slash command.
-
-Allowed:
-- Internally delegate using `/tweak` after the user approves.
-- Mention slash commands only if the user explicitly asks for command syntax.
-
-## Case — Deep inspection must not route to Ask mode
-
-User:
-"Do you think these changes are beneficial or not? Inspect deeply if it affects the system."
-
-Expected:
-Recommend analysis/review through the architecture/inspection workflow.
-
-Delegation target after approval:
-`system-architect`
-
-Must not:
-- Delegate to Ask mode.
-- Delegate to default Architect mode.
-- Use any mode other than `system-architect` or `code-tweaker`.
+# Routing Eval Cases
+
+Use these cases to check whether the orchestrator chooses the expected workflow.
+
+In every case:
+
+- The user did **not** type a slash command.
+- A free-form request is never self-approving. The orchestrator proposes, then waits.
+- Slash commands, mode names, and executable routing text must not appear in clickable suggestions.
+- Slash commands are optional. The user should never be told to type one to use Zoo Flow.
+
+## Case 1 — Tiny copy change
+
+User:
+"Change the Save button text to Submit."
+
+Expected:
+Recommend the small implementation workflow.
+
+Must not:
+- Route to feature.
+- Read architecture docs by default.
+- Ask the user to type a slash command.
+
+## Case 2 — Unknown crash
+
+User:
+"Checkout randomly crashes after payment. It used to work."
+
+Expected:
+Recommend the diagnosis workflow.
+
+Must:
+- Reproduce before hypothesizing.
+- Present hypotheses before fix.
+
+## Case 3 — New capability
+
+User:
+"Add team invitations with email invites and pending invite states."
+
+Expected:
+Recommend feature planning.
+
+Must:
+- Plan before implementation.
+- Use phase gates.
+
+## Case 4 — Structural cleanup
+
+User:
+"The auth module is getting hard to change. I want to decouple provider-specific logic."
+
+Expected:
+Recommend refactor workflow.
+
+Must:
+- Preserve behavior.
+- Explore architecture candidates before implementation.
+
+## Case 5 — Unknown area
+
+User:
+"I need to change billing but I don't know where that logic lives."
+
+Expected:
+Recommend exploration first.
+
+Must:
+- Produce a map before choosing feature/fix/refactor.
+
+## Case 6 — Known mechanical fix
+
+User:
+"The env var name changed from API_KEY to ZOO_API_KEY. Update the config loader."
+
+Expected:
+Recommend small implementation workflow.
+
+Must not:
+- Route to diagnosis.
+- Route to feature.
+
+## Case 7 — TDD with clear interface
+
+User:
+"Add a slugify helper for article URLs. I want it test-first."
+
+Expected:
+Recommend TDD workflow.
+
+Must:
+- Write the failing test first.
+- Confirm the public interface (input, output, edge cases) is clear before coding.
+
+## Case 8 — Stale documentation
+
+User:
+"The ARCHITECTURE.md file describes a checkout flow we removed last quarter. Bring it in line with the code."
+
+Expected:
+Recommend the documentation update workflow.
+
+Must:
+- Audit code first, then make surgical doc edits.
+- Not rewrite the file wholesale.
+
+## Case 9 — Ready to commit
+
+User:
+"I finished the small tweak. Please commit it and add a journal entry."
+
+Expected:
+Recommend the commit + journal workflow.
+
+Must:
+- Propose a Conventional Commit message and wait for approval before running `git commit` or `git push`.
+
+## Case 10 — Issue triage
+
+User:
+"We have 30 incoming bug reports from the support team. Triage them into the issue tracker."
+
+Expected:
+Recommend the triage workflow.
+
+Must:
+- Ask before publishing, labeling, closing, or making any irreversible tracker change.
+
+## Case 11 — Throwaway design probe
+
+User:
+"I'm not sure if the new search ranking should run inline or in a queue. Can we try both and see?"
+
+Expected:
+Recommend a throwaway prototype.
+
+Must:
+- Keep the work on a prototype branch or `.scratch/prototypes/<slug>/` so it is clearly throwaway.
+- Resolve the design question, not commit to a real implementation.
+
+## Case 12 — Explicit slash command
+
+User:
+"/tweak rename the cancel button to close."
+
+Expected:
+Route immediately. Do not second-guess the explicit command.
+
+Must not:
+- Repropose the workflow as a numbered choice.
+- Treat the explicit command as if approval were still pending.
+
+## Case 13 — Ambiguous "fix" for a known mechanical change
+
+User:
+"Fix the typo in the cancel-button label and update the aria-label to match."
+
+Expected:
+Recommend the small implementation workflow, not diagnosis.
+
+Must:
+- Recognize the cause and target are known.
+- Not run a full diagnosis loop for a one-line copy fix.
+
+## Case — Free-form request must not expose slash commands
+
+User:
+"Change the Save button text to Submit."
+
+Expected:
+Recommend the small implementation workflow in plain language.
+
+Good response:
+"This looks like a small implementation change because the target is known and the risk is low.
+
+1. Make the small implementation change
+2. Explore the area first"
+
+Must not:
+- Say "use `/tweak`" in the user-facing recommendation.
+- Offer `/tweak` as a selectable option.
+- Tell the user to type a slash command.
+
+Allowed:
+- Internally delegate using `/tweak` after the user approves.
+- Mention slash commands only if the user explicitly asks for command syntax.
+
+## Case — Deep inspection must not route to Ask mode
+
+User:
+"Do you think these changes are beneficial or not? Inspect deeply if it affects the system."
+
+Expected:
+Recommend analysis/review through the architecture/inspection workflow.
+
+Delegation target after approval:
+`system-architect`
+
+Must not:
+- Delegate to Ask mode.
+- Delegate to default Architect mode.
+- Use any mode other than `system-architect` or `code-tweaker`.
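
The "must not expose slash commands" and "explicit slash command" cases above lend themselves to a mechanical check. The following is a minimal sketch, not part of the package: the function names, the regular expression, and the idea of scanning response text are all assumptions made for illustration. The pattern is a heuristic and will misfire on some file paths (for example `./bin`).

```python
import re

# Assumption: a slash command is "/" plus a lowercase name, and it is not
# part of an "a/b" alternative or a nested path (no word char or "/" before it).
SLASH_COMMAND = re.compile(r"(?<![\w/])/[a-z][a-z0-9-]*")

# The "Good response" quoted in the eval case, used here as sample input.
GOOD_RESPONSE = (
    "This looks like a small implementation change because the target "
    "is known and the risk is low.\n\n"
    "1. Make the small implementation change\n"
    "2. Explore the area first"
)


def exposes_slash_command(response: str) -> bool:
    """Return True if user-facing text contains something like /tweak."""
    return SLASH_COMMAND.search(response) is not None


def is_explicit_command(request: str) -> bool:
    """Return True if the user's request itself starts with a slash command."""
    return SLASH_COMMAND.match(request.lstrip()) is not None


if __name__ == "__main__":
    print(exposes_slash_command(GOOD_RESPONSE))
    print(exposes_slash_command("Use `/tweak` for this."))
    print(is_explicit_command("/tweak rename the cancel button to close."))
```

A harness built this way would only cover the textual rules. Whether the orchestrator picked the expected workflow still needs a human or a separate judgment step.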