npm - rrce-workflow - Versions diffs - 0.3.7 → 0.3.8 - Mend

rrce-workflow 0.3.7 → 0.3.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (10) hide show

package/agent-core/prompts/executor.md +80 -65
package/agent-core/prompts/orchestrator.md +162 -112
package/agent-core/prompts/planning_discussion.md +67 -58
package/agent-core/prompts/research_discussion.md +74 -47
package/package.json +1 -1
package/docs/MIGRATION-v2.md +0 -427
package/docs/TOKEN-OPTIMIZATION-README.md +0 -231
package/docs/architecture.md +0 -482
package/docs/opencode-guide-optimization-addendum.md +0 -392
package/docs/opencode-guide.md +0 -631

package/docs/opencode-guide-optimization-addendum.md DELETED Viewed

@@ -1,392 +0,0 @@
-# RRCE Optimization Guide - Token Usage Best Practices
-**Version:** 2.0 (Optimized)
-**Date:** January 2026
-This addendum documents the token optimization improvements made to RRCE workflow and best practices for efficient usage.
----
-## 🎯 What Changed?
-### Token Reduction Summary
-| Component | Before | After | Reduction |
-|-----------|--------|-------|-----------|
-| Research Prompt | 332 lines (~15K tokens) | 80 lines (~4K tokens) | **73%** |
-| Planning Prompt | 307 lines (~14K tokens) | 80 lines (~4K tokens) | **71%** |
-| Executor Prompt | 300+ lines (~14K tokens) | 100 lines (~5K tokens) | **64%** |
-| Orchestrator Prompt | 351 lines (~16K tokens) | 120 lines (~5K tokens) | **69%** |
-### Overall Workflow Improvements
-- **Full workflow token usage**: 150K → 53K tokens (**65% reduction**)
-- **Cost per workflow**: $0.45 → $0.16 (**64% savings**)
-- **Latency**: ~40% faster (fewer round-trips, smaller prompts)
----
-## ✅ Recommended Usage Patterns
-### **Pattern 1: Direct Subagent Invocation** (Recommended for 90% of use cases)
-For most work, invoke subagents **directly** using `@rrce_*` syntax:
-#### Research Phase
-```
-@rrce_research_discussion TASK_SLUG=user-auth REQUEST="Add JWT-based authentication"
-```
-**Benefits:**
-- 70% fewer tokens (no orchestrator overhead)
-- Better prompt caching
-- More interactive control
-- Faster responses
-#### Planning Phase
-```
-@rrce_planning_discussion TASK_SLUG=user-auth
-```
-**Prerequisites:** Research must be complete
-#### Execution Phase
-```
-@rrce_executor TASK_SLUG=user-auth
-```
-**Prerequisites:** Research AND planning must be complete
----
-### **Pattern 2: Orchestrator** (Only for full automation)
-Use orchestrator **only** when you want complete hands-off automation:
-```
-@rrce_orchestrator "Implement user authentication feature from research to deployment"
-```
-**The orchestrator will:**
-1. Auto-detect required phases (research → plan → execute)
-2. Pre-fetch context once (avoids redundant searches)
-3. Use session reuse for caching (60-80% token reduction)
-4. Auto-progress through phases without prompts
-5. Return final synthesized results
-**When to use orchestrator:**
-- Implementing complete features end-to-end
-- Want zero-interaction automation
-- Running batch workflows
-**When NOT to use orchestrator:**
-- Single-phase work (just research/planning)
-- Interactive workflows (want to review each phase)
-- Debugging or iterative development
----
-## 🔥 New Features: Session Reuse & Smart Caching
-### Session Reuse
-Agents now support **session continuity** across multiple invocations:
-```
-# First invocation
-@rrce_research_discussion TASK_SLUG=feature-x REQUEST="..."
-# Agent responds with questions...
-# You answer in the SAME chat session...
-# Second response uses CACHED prompt (90% reduction!)
-```
-**How it works:**
-- OpenCode automatically assigns `promptCacheKey` = `sessionID`
-- After first turn, system prompt is cached
-- Subsequent turns only send new user messages
-- Works with Anthropic Claude (cache_control) and OpenAI (prompt_cache_key)
-**Example token usage:**
-- Turn 1: 4K prompt + 1K user = 5K tokens
-- Turn 2: 0.4K prompt (cached!) + 1K user = 1.4K tokens
-- Turn 3: 0.4K prompt (cached!) + 1K user = 1.4K tokens
-- **Total:** 8K tokens (vs. 15K without caching)
----
-### Smart Knowledge Caching
-Agents now cache knowledge searches:
-**Old behavior (inefficient):**
-```
-Turn 1: Search knowledge → Ask questions
-Turn 2: Search knowledge AGAIN → Ask more questions
-Turn 3: Search knowledge AGAIN → Generate brief
-```
-**New behavior (optimized):**
-```
-Turn 1: Search knowledge ONCE → Store results → Ask questions
-Turn 2: Reference cached findings → Ask more questions
-Turn 3: Reference cached findings → Generate brief
-```
-**Savings:** ~5K tokens per session (no redundant searches)
----
-## 💰 Model Configuration (User Choice)
-**RRCE does NOT force specific models.** You choose what works for you!
-The optimization works with **ANY model** you configure in OpenCode. What we enable:
-| Feature | What It Does | Works With |
-|---------|--------------|------------|
-| **Provider Caching** | `setCacheKey: true` for all providers | Anthropic, OpenAI, OpenRouter, Google |
-| **Prompt Caching** | Automatic via `promptCacheKey = sessionID` | Any model that supports caching |
-| **Slim Prompts** | 70-93% smaller prompts | ALL models |
-| **Session Reuse** | Orchestrator reuses sessions | ALL models |
-### Recommended Models (Optional)
-If you want cost optimization, consider:
-| Agent | Recommendation | Rationale |
-|-------|----------------|-----------|
-| **Research** | Haiku/Mini | Q&A doesn't need heavy reasoning |
-| **Planning** | Sonnet/GPT-4o | Task breakdown needs reasoning |
-| **Executor** | Sonnet/GPT-4o | Code generation needs power |
-**To set models per agent (optional):**
-```json
-{
-  "agent": {
-    "rrce_research_discussion": {
-      "model": "anthropic/claude-haiku-4-20250514"
-    }
-  }
-}
-```
-**Configuration location:** `opencode.json` in project root or `~/.config/opencode/opencode.json`
----
-## 🚀 Best Practices for Token Efficiency
-### 1. Use Direct Subagent Invocation
-**❌ Inefficient:**
-```
-@rrce_orchestrator "Research user authentication requirements"
-```
-Token cost: ~20K (orchestrator overhead + delegation + research)
-**✅ Efficient:**
-```
-@rrce_research_discussion TASK_SLUG=user-auth REQUEST="Research authentication"
-```
-Token cost: ~5K (direct invocation, prompt caching)
----
-### 2. Answer Questions in Same Session
-**❌ Inefficient:** Starting new chat for each answer
-- Each new chat = full prompt reload
-- No caching benefit
-**✅ Efficient:** Continue in same session
-- Prompt cached after turn 1
-- 90% token reduction on subsequent turns
----
-### 3. Let Agents Cache Knowledge
-**❌ Inefficient:** Asking agent to "re-search" on each turn
-**✅ Efficient:** Trust the hybrid approach
-- Agent searches once (first turn)
-- References findings thereafter
-- Only re-searches if you introduce new scope
----
-### 4. Use Orchestrator for Full Workflows Only
-**❌ Inefficient:** Using orchestrator for single phases
-```
-@rrce_orchestrator "Just do research on X"
-```
-**✅ Efficient:** Direct invocation for single phases
-```
-@rrce_research_discussion TASK_SLUG=x REQUEST="Research X"
-```
-**✅ Efficient:** Orchestrator for full automation
-```
-@rrce_orchestrator "Implement feature X from research to deployment"
-```
----
-## 📊 Measuring Token Usage
-### Via OpenCode Logs
-Token usage is logged in each API response. Look for:
-```
-Input tokens: 4,234
-Cache creation tokens: 3,800  (first turn only)
-Cache read tokens: 3,800      (subsequent turns)
-Output tokens: 512
-```
-**Calculation:**
-- First turn cost: `input_tokens + output_tokens`
-- Cached turn cost: `(input_tokens - cache_read_tokens) + output_tokens`
-### Via Provider Dashboard
-- **Anthropic Console:** https://console.anthropic.com → Usage
-- **OpenAI Dashboard:** https://platform.openai.com/usage
----
-## 🔧 Troubleshooting
-### "Agent uses too many tokens"
-**Check:**
-1. Are you using direct invocation? (More efficient than orchestrator)
-2. Are you continuing in same session? (Enables caching)
-3. Is agent re-searching knowledge? (Should only search once)
-**Fix:**
-- Use `@rrce_*` directly for single phases
-- Keep conversations in same session
-- Review agent config (`opencode.json`) - verify Haiku for research
----
-### "Cache not activating"
-**Symptoms:** Token usage stays high on subsequent turns
-**Common causes:**
-1. **New sessions:** Each new chat = new session = no cache
-2. **Prompt modifications:** Changing agent config invalidates cache
-3. **Different models:** Each model has separate cache
-**Fix:**
-- Continue in same session (don't start new chat)
-- Avoid editing agent prompts mid-session
-- Verify `promptCacheKey` in logs
----
-### "Research agent asks too many questions"
-**Default:** Hybrid approach (ask critical questions, document rest as assumptions)
-**If too verbose:**
-Edit `agent-core/prompts/research_discussion.md`:
-```markdown
-**STOP after 1 round.** Document remaining ambiguity as assumptions.
-```
-**If too brief:**
-```markdown
-**Up to 3 rounds** if needed for critical clarification.
-```
----
-## 📈 Performance Benchmarks
-### Before Optimization (v1.0)
-| Workflow | Tokens | Cost | Time |
-|----------|--------|------|------|
-| Research only (3 rounds) | 66K | $0.20 | ~45s |
-| Research → Planning | 110K | $0.33 | ~75s |
-| Full workflow | 150K | $0.45 | ~120s |
-### After Optimization (v2.0)
-| Workflow | Tokens | Cost | Time |
-|----------|--------|------|------|
-| Research only (3 rounds) | 16K | $0.004 | ~25s |
-| Research → Planning | 35K | $0.11 | ~40s |
-| Full workflow | 53K | $0.16 | ~70s |
-**Improvements:**
-- 76% fewer tokens for research
-- 68% fewer tokens for planning
-- 65% fewer tokens for full workflow
-- 98% cost reduction for research (Haiku!)
-- 42% faster execution
----
-## 🎓 Advanced Optimizations
-### Custom Agent Configuration
-Override defaults in `opencode.json`:
-```json
-{
-  "agent": {
-    "rrce_research_discussion": {
-      "model": "anthropic/claude-opus-4",  // More powerful if needed
-      "temperature": 0.1,                  // More deterministic
-      "maxSteps": 3                        // Limit iterations
-    }
-  }
-}
-```
-### Project-Specific Settings
-Create `.opencode/opencode.json` in your project:
-```json
-{
-  "agent": {
-    "rrce_research_discussion": {
-      "model": "anthropic/claude-haiku-4-20250514"
-    }
-  }
-}
-```
-This overrides global settings for this project only.
----
-## 📚 Related Documentation
-- [Main RRCE Guide](./opencode-guide.md)
-- [Architecture Documentation](./architecture.md)
-- [Migration Guide](./MIGRATION-v2.md)
-- [OpenCode Docs](https://opencode.ai/docs)
----
-## 🤝 Contributing
-Found additional optimizations? Submit a PR or issue:
-- GitHub: https://github.com/rryando/rrce-workflow
----
-**Last Updated:** January 2026
-**Version:** 2.0