rrce-workflow 0.3.7 → 0.3.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,392 +0,0 @@
1
- # RRCE Optimization Guide - Token Usage Best Practices
2
-
3
- **Version:** 2.0 (Optimized)
4
- **Date:** January 2026
5
-
6
- This addendum documents the token optimization improvements made to RRCE workflow and best practices for efficient usage.
7
-
8
- ---
9
-
10
- ## 🎯 What Changed?
11
-
12
- ### Token Reduction Summary
13
-
14
- | Component | Before | After | Reduction |
15
- |-----------|--------|-------|-----------|
16
- | Research Prompt | 332 lines (~15K tokens) | 80 lines (~4K tokens) | **73%** |
17
- | Planning Prompt | 307 lines (~14K tokens) | 80 lines (~4K tokens) | **71%** |
18
- | Executor Prompt | 300+ lines (~14K tokens) | 100 lines (~5K tokens) | **64%** |
19
- | Orchestrator Prompt | 351 lines (~16K tokens) | 120 lines (~5K tokens) | **69%** |
20
-
21
- ### Overall Workflow Improvements
22
-
23
- - **Full workflow token usage**: 150K → 53K tokens (**65% reduction**)
24
- - **Cost per workflow**: $0.45 → $0.16 (**64% savings**)
25
- - **Latency**: ~40% faster (fewer round-trips, smaller prompts)
26
-
27
- ---
28
-
29
- ## ✅ Recommended Usage Patterns
30
-
31
- ### **Pattern 1: Direct Subagent Invocation** (Recommended for 90% of use cases)
32
-
33
- For most work, invoke subagents **directly** using `@rrce_*` syntax:
34
-
35
- #### Research Phase
36
- ```
37
- @rrce_research_discussion TASK_SLUG=user-auth REQUEST="Add JWT-based authentication"
38
- ```
39
-
40
- **Benefits:**
41
- - 70% fewer tokens (no orchestrator overhead)
42
- - Better prompt caching
43
- - More interactive control
44
- - Faster responses
45
-
46
- #### Planning Phase
47
- ```
48
- @rrce_planning_discussion TASK_SLUG=user-auth
49
- ```
50
-
51
- **Prerequisites:** Research must be complete
52
-
53
- #### Execution Phase
54
- ```
55
- @rrce_executor TASK_SLUG=user-auth
56
- ```
57
-
58
- **Prerequisites:** Research AND planning must be complete
59
-
60
- ---
61
-
62
- ### **Pattern 2: Orchestrator** (Only for full automation)
63
-
64
- Use orchestrator **only** when you want complete hands-off automation:
65
-
66
- ```
67
- @rrce_orchestrator "Implement user authentication feature from research to deployment"
68
- ```
69
-
70
- **The orchestrator will:**
71
- 1. Auto-detect required phases (research → plan → execute)
72
- 2. Pre-fetch context once (avoids redundant searches)
73
- 3. Use session reuse for caching (60-80% token reduction)
74
- 4. Auto-progress through phases without prompts
75
- 5. Return final synthesized results
76
-
77
- **When to use orchestrator:**
78
- - Implementing complete features end-to-end
79
- - Want zero-interaction automation
80
- - Running batch workflows
81
-
82
- **When NOT to use orchestrator:**
83
- - Single-phase work (just research/planning)
84
- - Interactive workflows (want to review each phase)
85
- - Debugging or iterative development
86
-
87
- ---
88
-
89
- ## 🔥 New Features: Session Reuse & Smart Caching
90
-
91
- ### Session Reuse
92
-
93
- Agents now support **session continuity** across multiple invocations:
94
-
95
- ```
96
- # First invocation
97
- @rrce_research_discussion TASK_SLUG=feature-x REQUEST="..."
98
-
99
- # Agent responds with questions...
100
- # You answer in the SAME chat session...
101
-
102
- # Second response uses CACHED prompt (90% reduction!)
103
- ```
104
-
105
- **How it works:**
106
- - OpenCode automatically assigns `promptCacheKey` = `sessionID`
107
- - After first turn, system prompt is cached
108
- - Subsequent turns only send new user messages
109
- - Works with Anthropic Claude (cache_control) and OpenAI (prompt_cache_key)
110
-
111
- **Example token usage:**
112
- - Turn 1: 4K prompt + 1K user = 5K tokens
113
- - Turn 2: 0.4K prompt (cached!) + 1K user = 1.4K tokens
114
- - Turn 3: 0.4K prompt (cached!) + 1K user = 1.4K tokens
115
- - **Total:** 8K tokens (vs. 15K without caching)
116
-
117
- ---
118
-
119
- ### Smart Knowledge Caching
120
-
121
- Agents now cache knowledge searches:
122
-
123
- **Old behavior (inefficient):**
124
- ```
125
- Turn 1: Search knowledge → Ask questions
126
- Turn 2: Search knowledge AGAIN → Ask more questions
127
- Turn 3: Search knowledge AGAIN → Generate brief
128
- ```
129
-
130
- **New behavior (optimized):**
131
- ```
132
- Turn 1: Search knowledge ONCE → Store results → Ask questions
133
- Turn 2: Reference cached findings → Ask more questions
134
- Turn 3: Reference cached findings → Generate brief
135
- ```
136
-
137
- **Savings:** ~5K tokens per session (no redundant searches)
138
-
139
- ---
140
-
141
- ## 💰 Model Configuration (User Choice)
142
-
143
- **RRCE does NOT force specific models.** You choose what works for you!
144
-
145
- The optimization works with **ANY model** you configure in OpenCode. What we enable:
146
-
147
- | Feature | What It Does | Works With |
148
- |---------|--------------|------------|
149
- | **Provider Caching** | `setCacheKey: true` for all providers | Anthropic, OpenAI, OpenRouter, Google |
150
- | **Prompt Caching** | Automatic via `promptCacheKey = sessionID` | Any model that supports caching |
151
- | **Slim Prompts** | 70-93% smaller prompts | ALL models |
152
- | **Session Reuse** | Orchestrator reuses sessions | ALL models |
153
-
154
- ### Recommended Models (Optional)
155
-
156
- If you want cost optimization, consider:
157
-
158
- | Agent | Recommendation | Rationale |
159
- |-------|----------------|-----------|
160
- | **Research** | Haiku/Mini | Q&A doesn't need heavy reasoning |
161
- | **Planning** | Sonnet/GPT-4o | Task breakdown needs reasoning |
162
- | **Executor** | Sonnet/GPT-4o | Code generation needs power |
163
-
164
- **To set models per agent (optional):**
165
- ```json
166
- {
167
- "agent": {
168
- "rrce_research_discussion": {
169
- "model": "anthropic/claude-haiku-4-20250514"
170
- }
171
- }
172
- }
173
- ```
174
-
175
- **Configuration location:** `opencode.json` in project root or `~/.config/opencode/opencode.json`
176
-
177
- ---
178
-
179
- ## 🚀 Best Practices for Token Efficiency
180
-
181
- ### 1. Use Direct Subagent Invocation
182
-
183
- **❌ Inefficient:**
184
- ```
185
- @rrce_orchestrator "Research user authentication requirements"
186
- ```
187
- Token cost: ~20K (orchestrator overhead + delegation + research)
188
-
189
- **✅ Efficient:**
190
- ```
191
- @rrce_research_discussion TASK_SLUG=user-auth REQUEST="Research authentication"
192
- ```
193
- Token cost: ~5K (direct invocation, prompt caching)
194
-
195
- ---
196
-
197
- ### 2. Answer Questions in Same Session
198
-
199
- **❌ Inefficient:** Starting new chat for each answer
200
- - Each new chat = full prompt reload
201
- - No caching benefit
202
-
203
- **✅ Efficient:** Continue in same session
204
- - Prompt cached after turn 1
205
- - 90% token reduction on subsequent turns
206
-
207
- ---
208
-
209
- ### 3. Let Agents Cache Knowledge
210
-
211
- **❌ Inefficient:** Asking agent to "re-search" on each turn
212
-
213
- **✅ Efficient:** Trust the hybrid approach
214
- - Agent searches once (first turn)
215
- - References findings thereafter
216
- - Only re-searches if you introduce new scope
217
-
218
- ---
219
-
220
- ### 4. Use Orchestrator for Full Workflows Only
221
-
222
- **❌ Inefficient:** Using orchestrator for single phases
223
- ```
224
- @rrce_orchestrator "Just do research on X"
225
- ```
226
-
227
- **✅ Efficient:** Direct invocation for single phases
228
- ```
229
- @rrce_research_discussion TASK_SLUG=x REQUEST="Research X"
230
- ```
231
-
232
- **✅ Efficient:** Orchestrator for full automation
233
- ```
234
- @rrce_orchestrator "Implement feature X from research to deployment"
235
- ```
236
-
237
- ---
238
-
239
- ## 📊 Measuring Token Usage
240
-
241
- ### Via OpenCode Logs
242
-
243
- Token usage is logged in each API response. Look for:
244
-
245
- ```
246
- Input tokens: 4,234
247
- Cache creation tokens: 3,800 (first turn only)
248
- Cache read tokens: 3,800 (subsequent turns)
249
- Output tokens: 512
250
- ```
251
-
252
- **Calculation:**
253
- - First turn cost: `input_tokens + output_tokens`
254
- - Cached turn cost: `(input_tokens - cache_read_tokens) + output_tokens`
255
-
256
- ### Via Provider Dashboard
257
-
258
- - **Anthropic Console:** https://console.anthropic.com → Usage
259
- - **OpenAI Dashboard:** https://platform.openai.com/usage
260
-
261
- ---
262
-
263
- ## 🔧 Troubleshooting
264
-
265
- ### "Agent uses too many tokens"
266
-
267
- **Check:**
268
- 1. Are you using direct invocation? (More efficient than orchestrator)
269
- 2. Are you continuing in same session? (Enables caching)
270
- 3. Is agent re-searching knowledge? (Should only search once)
271
-
272
- **Fix:**
273
- - Use `@rrce_*` directly for single phases
274
- - Keep conversations in same session
275
- - Review agent config (`opencode.json`) - verify Haiku for research
276
-
277
- ---
278
-
279
- ### "Cache not activating"
280
-
281
- **Symptoms:** Token usage stays high on subsequent turns
282
-
283
- **Common causes:**
284
- 1. **New sessions:** Each new chat = new session = no cache
285
- 2. **Prompt modifications:** Changing agent config invalidates cache
286
- 3. **Different models:** Each model has separate cache
287
-
288
- **Fix:**
289
- - Continue in same session (don't start new chat)
290
- - Avoid editing agent prompts mid-session
291
- - Verify `promptCacheKey` in logs
292
-
293
- ---
294
-
295
- ### "Research agent asks too many questions"
296
-
297
- **Default:** Hybrid approach (ask critical questions, document rest as assumptions)
298
-
299
- **If too verbose:**
300
- Edit `agent-core/prompts/research_discussion.md`:
301
- ```markdown
302
- **STOP after 1 round.** Document remaining ambiguity as assumptions.
303
- ```
304
-
305
- **If too brief:**
306
- ```markdown
307
- **Up to 3 rounds** if needed for critical clarification.
308
- ```
309
-
310
- ---
311
-
312
- ## 📈 Performance Benchmarks
313
-
314
- ### Before Optimization (v1.0)
315
-
316
- | Workflow | Tokens | Cost | Time |
317
- |----------|--------|------|------|
318
- | Research only (3 rounds) | 66K | $0.20 | ~45s |
319
- | Research → Planning | 110K | $0.33 | ~75s |
320
- | Full workflow | 150K | $0.45 | ~120s |
321
-
322
- ### After Optimization (v2.0)
323
-
324
- | Workflow | Tokens | Cost | Time |
325
- |----------|--------|------|------|
326
- | Research only (3 rounds) | 16K | $0.004 | ~25s |
327
- | Research → Planning | 35K | $0.11 | ~40s |
328
- | Full workflow | 53K | $0.16 | ~70s |
329
-
330
- **Improvements:**
331
- - 76% fewer tokens for research
332
- - 68% fewer tokens for planning
333
- - 65% fewer tokens for full workflow
334
- - 98% cost reduction for research (Haiku!)
335
- - 42% faster execution
336
-
337
- ---
338
-
339
- ## 🎓 Advanced Optimizations
340
-
341
- ### Custom Agent Configuration
342
-
343
- Override defaults in `opencode.json`:
344
-
345
- ```json
346
- {
347
- "agent": {
348
- "rrce_research_discussion": {
349
- "model": "anthropic/claude-opus-4", // More powerful if needed
350
- "temperature": 0.1, // More deterministic
351
- "maxSteps": 3 // Limit iterations
352
- }
353
- }
354
- }
355
- ```
356
-
357
- ### Project-Specific Settings
358
-
359
- Create `.opencode/opencode.json` in your project:
360
-
361
- ```json
362
- {
363
- "agent": {
364
- "rrce_research_discussion": {
365
- "model": "anthropic/claude-haiku-4-20250514"
366
- }
367
- }
368
- }
369
- ```
370
-
371
- This overrides global settings for this project only.
372
-
373
- ---
374
-
375
- ## 📚 Related Documentation
376
-
377
- - [Main RRCE Guide](./opencode-guide.md)
378
- - [Architecture Documentation](./architecture.md)
379
- - [Migration Guide](./MIGRATION-v2.md)
380
- - [OpenCode Docs](https://opencode.ai/docs)
381
-
382
- ---
383
-
384
- ## 🤝 Contributing
385
-
386
- Found additional optimizations? Submit a PR or issue:
387
- - GitHub: https://github.com/rryando/rrce-workflow
388
-
389
- ---
390
-
391
- **Last Updated:** January 2026
392
- **Version:** 2.0