rrce-workflow 0.3.6 → 0.3.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,375 +0,0 @@
1
- # RRCE Optimization Guide - Token Usage Best Practices
2
-
3
- **Version:** 2.0 (Optimized)
4
- **Date:** January 2026
5
-
6
- This addendum documents the token optimization improvements made to RRCE workflow and best practices for efficient usage.
7
-
8
- ---
9
-
10
- ## 🎯 What Changed?
11
-
12
- ### Token Reduction Summary
13
-
14
- | Component | Before | After | Reduction |
15
- |-----------|--------|-------|-----------|
16
- | Research Prompt | 332 lines (~15K tokens) | 80 lines (~4K tokens) | **73%** |
17
- | Planning Prompt | 307 lines (~14K tokens) | 80 lines (~4K tokens) | **71%** |
18
- | Executor Prompt | 300+ lines (~14K tokens) | 100 lines (~5K tokens) | **64%** |
19
- | Orchestrator Prompt | 351 lines (~16K tokens) | 120 lines (~5K tokens) | **69%** |
20
-
21
- ### Overall Workflow Improvements
22
-
23
- - **Full workflow token usage**: 150K → 53K tokens (**65% reduction**)
24
- - **Cost per workflow**: $0.45 → $0.16 (**64% savings**)
25
- - **Latency**: ~40% faster (fewer round-trips, smaller prompts)
26
-
27
- ---
28
-
29
- ## ✅ Recommended Usage Patterns
30
-
31
- ### **Pattern 1: Direct Subagent Invocation** (Recommended for 90% of use cases)
32
-
33
- For most work, invoke subagents **directly** using `@rrce_*` syntax:
34
-
35
- #### Research Phase
36
- ```
37
- @rrce_research_discussion TASK_SLUG=user-auth REQUEST="Add JWT-based authentication"
38
- ```
39
-
40
- **Benefits:**
41
- - 70% fewer tokens (no orchestrator overhead)
42
- - Better prompt caching
43
- - More interactive control
44
- - Faster responses
45
-
46
- #### Planning Phase
47
- ```
48
- @rrce_planning_discussion TASK_SLUG=user-auth
49
- ```
50
-
51
- **Prerequisites:** Research must be complete
52
-
53
- #### Execution Phase
54
- ```
55
- @rrce_executor TASK_SLUG=user-auth
56
- ```
57
-
58
- **Prerequisites:** Research AND planning must be complete
59
-
60
- ---
61
-
62
- ### **Pattern 2: Orchestrator** (Only for full automation)
63
-
64
- Use orchestrator **only** when you want complete hands-off automation:
65
-
66
- ```
67
- @rrce_orchestrator "Implement user authentication feature from research to deployment"
68
- ```
69
-
70
- **The orchestrator will:**
71
- 1. Auto-detect required phases (research → plan → execute)
72
- 2. Pre-fetch context once (avoids redundant searches)
73
- 3. Use session reuse for caching (60-80% token reduction)
74
- 4. Auto-progress through phases without prompts
75
- 5. Return final synthesized results
76
-
77
- **When to use orchestrator:**
78
- - Implementing complete features end-to-end
79
- - Want zero-interaction automation
80
- - Running batch workflows
81
-
82
- **When NOT to use orchestrator:**
83
- - Single-phase work (just research/planning)
84
- - Interactive workflows (want to review each phase)
85
- - Debugging or iterative development
86
-
87
- ---
88
-
89
- ## 🔥 New Features: Session Reuse & Smart Caching
90
-
91
- ### Session Reuse
92
-
93
- Agents now support **session continuity** across multiple invocations:
94
-
95
- ```
96
- # First invocation
97
- @rrce_research_discussion TASK_SLUG=feature-x REQUEST="..."
98
-
99
- # Agent responds with questions...
100
- # You answer in the SAME chat session...
101
-
102
- # Second response uses CACHED prompt (90% reduction!)
103
- ```
104
-
105
- **How it works:**
106
- - OpenCode automatically assigns `promptCacheKey` = `sessionID`
107
- - After first turn, system prompt is cached
108
- - Subsequent turns only send new user messages
109
- - Works with Anthropic Claude (cache_control) and OpenAI (prompt_cache_key)
110
-
111
- **Example token usage:**
112
- - Turn 1: 4K prompt + 1K user = 5K tokens
113
- - Turn 2: 0.4K prompt (cached!) + 1K user = 1.4K tokens
114
- - Turn 3: 0.4K prompt (cached!) + 1K user = 1.4K tokens
115
- - **Total:** 8K tokens (vs. 15K without caching)
116
-
117
- ---
118
-
119
- ### Smart Knowledge Caching
120
-
121
- Agents now cache knowledge searches:
122
-
123
- **Old behavior (inefficient):**
124
- ```
125
- Turn 1: Search knowledge → Ask questions
126
- Turn 2: Search knowledge AGAIN → Ask more questions
127
- Turn 3: Search knowledge AGAIN → Generate brief
128
- ```
129
-
130
- **New behavior (optimized):**
131
- ```
132
- Turn 1: Search knowledge ONCE → Store results → Ask questions
133
- Turn 2: Reference cached findings → Ask more questions
134
- Turn 3: Reference cached findings → Generate brief
135
- ```
136
-
137
- **Savings:** ~5K tokens per session (no redundant searches)
138
-
139
- ---
140
-
141
- ## 💰 Model Configuration
142
-
143
- Optimized model selection for balanced cost/quality:
144
-
145
- | Agent | Model | Use Case | Cost |
146
- |-------|-------|----------|------|
147
- | **Research** | Claude Haiku 4 | Q&A, clarification | $0.25/M tokens |
148
- | **Planning** | Claude Sonnet 4 | Task breakdown, reasoning | $3/M tokens |
149
- | **Executor** | Claude Sonnet 4 | Code generation | $3/M tokens |
150
- | **Orchestrator** | Claude Sonnet 4 | Multi-phase coordination | $3/M tokens |
151
-
152
- **Why Haiku for Research?**
153
- - Research is primarily Q&A and documentation
154
- - Haiku is 12x cheaper than Sonnet
155
- - Quality is sufficient for clarification questions
156
- - Complex reasoning happens in Planning/Execution (Sonnet)
157
-
158
- **Configuration location:** `opencode.json` in project root or `~/.config/opencode/opencode.json`
159
-
160
- ---
161
-
162
- ## 🚀 Best Practices for Token Efficiency
163
-
164
- ### 1. Use Direct Subagent Invocation
165
-
166
- **❌ Inefficient:**
167
- ```
168
- @rrce_orchestrator "Research user authentication requirements"
169
- ```
170
- Token cost: ~20K (orchestrator overhead + delegation + research)
171
-
172
- **✅ Efficient:**
173
- ```
174
- @rrce_research_discussion TASK_SLUG=user-auth REQUEST="Research authentication"
175
- ```
176
- Token cost: ~5K (direct invocation, prompt caching)
177
-
178
- ---
179
-
180
- ### 2. Answer Questions in Same Session
181
-
182
- **❌ Inefficient:** Starting new chat for each answer
183
- - Each new chat = full prompt reload
184
- - No caching benefit
185
-
186
- **✅ Efficient:** Continue in same session
187
- - Prompt cached after turn 1
188
- - 90% token reduction on subsequent turns
189
-
190
- ---
191
-
192
- ### 3. Let Agents Cache Knowledge
193
-
194
- **❌ Inefficient:** Asking agent to "re-search" on each turn
195
-
196
- **✅ Efficient:** Trust the hybrid approach
197
- - Agent searches once (first turn)
198
- - References findings thereafter
199
- - Only re-searches if you introduce new scope
200
-
201
- ---
202
-
203
- ### 4. Use Orchestrator for Full Workflows Only
204
-
205
- **❌ Inefficient:** Using orchestrator for single phases
206
- ```
207
- @rrce_orchestrator "Just do research on X"
208
- ```
209
-
210
- **✅ Efficient:** Direct invocation for single phases
211
- ```
212
- @rrce_research_discussion TASK_SLUG=x REQUEST="Research X"
213
- ```
214
-
215
- **✅ Efficient:** Orchestrator for full automation
216
- ```
217
- @rrce_orchestrator "Implement feature X from research to deployment"
218
- ```
219
-
220
- ---
221
-
222
- ## 📊 Measuring Token Usage
223
-
224
- ### Via OpenCode Logs
225
-
226
- Token usage is logged in each API response. Look for:
227
-
228
- ```
229
- Input tokens: 4,234
230
- Cache creation tokens: 3,800 (first turn only)
231
- Cache read tokens: 3,800 (subsequent turns)
232
- Output tokens: 512
233
- ```
234
-
235
- **Calculation:**
236
- - First turn cost: `input_tokens + output_tokens`
237
- - Cached turn cost: `(input_tokens - cache_read_tokens) + output_tokens`
238
-
239
- ### Via Provider Dashboard
240
-
241
- - **Anthropic Console:** https://console.anthropic.com → Usage
242
- - **OpenAI Dashboard:** https://platform.openai.com/usage
243
-
244
- ---
245
-
246
- ## 🔧 Troubleshooting
247
-
248
- ### "Agent uses too many tokens"
249
-
250
- **Check:**
251
- 1. Are you using direct invocation? (More efficient than orchestrator)
252
- 2. Are you continuing in same session? (Enables caching)
253
- 3. Is agent re-searching knowledge? (Should only search once)
254
-
255
- **Fix:**
256
- - Use `@rrce_*` directly for single phases
257
- - Keep conversations in same session
258
- - Review agent config (`opencode.json`) - verify Haiku for research
259
-
260
- ---
261
-
262
- ### "Cache not activating"
263
-
264
- **Symptoms:** Token usage stays high on subsequent turns
265
-
266
- **Common causes:**
267
- 1. **New sessions:** Each new chat = new session = no cache
268
- 2. **Prompt modifications:** Changing agent config invalidates cache
269
- 3. **Different models:** Each model has separate cache
270
-
271
- **Fix:**
272
- - Continue in same session (don't start new chat)
273
- - Avoid editing agent prompts mid-session
274
- - Verify `promptCacheKey` in logs
275
-
276
- ---
277
-
278
- ### "Research agent asks too many questions"
279
-
280
- **Default:** Hybrid approach (ask critical questions, document rest as assumptions)
281
-
282
- **If too verbose:**
283
- Edit `agent-core/prompts/research_discussion.md`:
284
- ```markdown
285
- **STOP after 1 round.** Document remaining ambiguity as assumptions.
286
- ```
287
-
288
- **If too brief:**
289
- ```markdown
290
- **Up to 3 rounds** if needed for critical clarification.
291
- ```
292
-
293
- ---
294
-
295
- ## 📈 Performance Benchmarks
296
-
297
- ### Before Optimization (v1.0)
298
-
299
- | Workflow | Tokens | Cost | Time |
300
- |----------|--------|------|------|
301
- | Research only (3 rounds) | 66K | $0.20 | ~45s |
302
- | Research → Planning | 110K | $0.33 | ~75s |
303
- | Full workflow | 150K | $0.45 | ~120s |
304
-
305
- ### After Optimization (v2.0)
306
-
307
- | Workflow | Tokens | Cost | Time |
308
- |----------|--------|------|------|
309
- | Research only (3 rounds) | 16K | $0.004 | ~25s |
310
- | Research → Planning | 35K | $0.11 | ~40s |
311
- | Full workflow | 53K | $0.16 | ~70s |
312
-
313
- **Improvements:**
314
- - 76% fewer tokens for research
315
- - 68% fewer tokens for planning
316
- - 65% fewer tokens for full workflow
317
- - 98% cost reduction for research (Haiku!)
318
- - 42% faster execution
319
-
320
- ---
321
-
322
- ## 🎓 Advanced Optimizations
323
-
324
- ### Custom Agent Configuration
325
-
326
- Override defaults in `opencode.json`:
327
-
328
- ```json
329
- {
330
- "agent": {
331
- "rrce_research_discussion": {
332
- "model": "anthropic/claude-opus-4", // More powerful if needed
333
- "temperature": 0.1, // More deterministic
334
- "maxSteps": 3 // Limit iterations
335
- }
336
- }
337
- }
338
- ```
339
-
340
- ### Project-Specific Settings
341
-
342
- Create `.opencode/opencode.json` in your project:
343
-
344
- ```json
345
- {
346
- "agent": {
347
- "rrce_research_discussion": {
348
- "model": "anthropic/claude-haiku-4-20250514"
349
- }
350
- }
351
- }
352
- ```
353
-
354
- This overrides global settings for this project only.
355
-
356
- ---
357
-
358
- ## 📚 Related Documentation
359
-
360
- - [Main RRCE Guide](./opencode-guide.md)
361
- - [Architecture Documentation](./architecture.md)
362
- - [Migration Guide](./MIGRATION-v2.md)
363
- - [OpenCode Docs](https://opencode.ai/docs)
364
-
365
- ---
366
-
367
- ## 🤝 Contributing
368
-
369
- Found additional optimizations? Submit a PR or issue:
370
- - GitHub: https://github.com/rryando/rrce-workflow
371
-
372
- ---
373
-
374
- **Last Updated:** January 2026
375
- **Version:** 2.0