npm - rhachet-roles-bhrain - Versions diffs - 0.1.1 → 0.3.0 - Mend

rhachet-roles-bhrain 0.1.1 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (159) hide show

package/dist/roles/architect/briefs/brains.replic/arc201.blueprint.claude-code.zoomin.reason.[article].md ADDED Viewed

@@ -0,0 +1,507 @@
+# blueprint: claude code — reasoning strategies (zoom-in)
+## .what
+a detailed analysis of how claude code leverages reasoning strategies from the research literature, mapping theoretical patterns to production implementation choices.
+## .why
+understanding which reasoning strategies claude code uses (and doesn't use) reveals the tradeoffs between theoretical power and practical constraints like latency, cost, and user experience. this zoom-in connects the atomic concepts to their concrete implementation.
+---
+## reasoning strategy landscape
+the research literature offers many reasoning strategies. claude code implements a pragmatic subset:
+| strategy | implemented? | how |
+|----------|--------------|-----|
+| react (implicit) | ✓ yes | interleaved reasoning + tool use |
+| extended thinking | ✓ yes | configurable token budget |
+| think tool | ✓ yes | explicit mid-task reasoning pause |
+| plan-and-solve | ✓ yes | plan mode + EnterPlanMode tool |
+| evaluator-optimizer | ✓ partial | feedback loop with user/tests |
+| reflexion | ✗ not explicit | user feedback serves similar role |
+| tree-of-thoughts | ✗ no | too expensive for interactive use |
+| self-consistency | ✗ no | single path, no voting |
+| lats | ✗ no | no monte carlo tree search |
+---
+## implemented strategies
+### 1. implicit react
+claude code's core loop is fundamentally react-shaped:
+```
+[Thought] "I need to understand why the test is failing"
+[Action]  Read test file
+[Observe] test contents
+[Thought] "The test expects X but function returns Y"
+[Action]  Read implementation
+[Observe] function code
+[Thought] "I see the bug — missing null check"
+[Action]  Edit file
+...
+```
+**key difference from paper**: reasoning traces are implicit (not labeled `Thought:`/`Action:`). the model reasons naturally in its response, then calls tools.
+**why implicit**: explicit labels add token overhead and feel mechanical. implicit react preserves conversational flow while maintaining the grounding benefits.
+### sources
+- [ReAct: Synergizing Reasoning and Acting](https://arxiv.org/abs/2210.03629) — original pattern
+- [Building Effective Agents](https://www.anthropic.com/research/building-effective-agents) — anthropic's approach
+---
+### 2. extended thinking
+extended thinking allocates a dedicated token budget for reasoning *before* response generation:
+```
+┌─────────────────────────────────────────────────┐
+│              EXTENDED THINKING                  │
+│  budget: 16k tokens                             │
+│                                                 │
+│  "This refactoring is complex. Let me consider: │
+│   1. Current architecture patterns...          │
+│   2. Which files will be affected...           │
+│   3. Migration strategy options...             │
+│   4. Risk of breaking changes..."              │
+└─────────────────────────────────────────────────┘
+                      │
+                      ↓
+┌─────────────────────────────────────────────────┐
+│         RESPONSE (informed by thinking)         │
+│  "I'll refactor in three phases: ..."          │
+└─────────────────────────────────────────────────┘
+```
+**when it helps**: complex coding, math, multi-file refactoring, architectural decisions.
+**budget guidelines**:
+| budget | suitable for |
+|--------|--------------|
+| 1k-4k | simple analysis |
+| 4k-16k | moderate complexity |
+| 16k-64k | complex coding/math |
+| 64k-128k | deep research |
+**interleaved thinking** (claude 4+): with the `interleaved-thinking-2025-05-14` beta header, claude can think *between* tool calls, not just before the first response.
+### sources
+- [Claude's Extended Thinking](https://www.anthropic.com/news/visible-extended-thinking) — announcement
+- [Building with Extended Thinking](https://docs.claude.com/en/docs/build-with-claude/extended-thinking) — implementation guide
+- [Extended Thinking Tips](https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/extended-thinking-tips) — best practices
+---
+### 3. think tool
+the think tool is a *tool* claude can call mid-task to pause and reason:
+```python
+{
+  "name": "think",
+  "description": "Record reasoning during complex tasks. Does not
+                  obtain new information or change state."
+}
+```
+**vs extended thinking**:
+| aspect | extended thinking | think tool |
+|--------|-------------------|------------|
+| when | before first response | during response, between tool calls |
+| purpose | deep planning | process new information |
+| trigger | api parameter | claude decides to call it |
+| visibility | may be hidden | logged in conversation |
+**when think tool excels**:
+- long chains of tool calls needing intermediate reasoning
+- processing complex tool results before next action
+- policy-heavy environments with detailed guidelines
+- sequential decisions where each step builds on previous
+**performance impact**: +54% on complex customer service scenarios (tau-bench), +1.6% on SWE-bench.
+### sources
+- [The "think" Tool: Enabling Claude to Stop and Think](https://www.anthropic.com/engineering/claude-think-tool) — official guide
+- [Anthropic's new 'think tool'](https://the-decoder.com/anthropics-new-think-tool-lets-claude-take-notes-to-solve-complex-problems/) — analysis
+---
+### 4. plan mode (plan-and-solve)
+plan mode implements the plan-and-solve pattern as a first-class workflow:
+```
+┌─────────────────────────────────────────────────┐
+│                  PLAN MODE                       │
+│  (shift+tab twice to enter)                     │
+│                                                 │
+│  ┌───────────────────────────────────────────┐  │
+│  │ 1. READ: explore codebase (read-only)     │  │
+│  │ 2. ANALYZE: understand patterns           │  │
+│  │ 3. PLAN: write markdown implementation    │  │
+│  │    plan to .claude/plans/                 │  │
+│  │ 4. PRESENT: call ExitPlanMode for review  │  │
+│  └───────────────────────────────────────────┘  │
+│                                                 │
+│  [NO file writes, NO bash execution]            │
+└─────────────────────────────────────────────────┘
+                      │
+              user approves
+                      │
+                      ↓
+┌─────────────────────────────────────────────────┐
+│               EXECUTION MODE                     │
+│  - execute plan step by step                    │
+│  - user can interrupt at any time               │
+└─────────────────────────────────────────────────┘
+```
+**architectural insight**: plans are stored as markdown files in `.claude/plans/`, creating an audit trail of architectural decisions that outlives sessions.
+**why it works**: mirrors how experienced engineers work — read, understand, plan, *then* build. prevents premature coding and reduces rework.
+**plan mode benefits**:
+- faster iteration (read-only is cheap)
+- token-efficient (opus shines in planning)
+- reviewable artifacts (plans are diffable)
+- reversible (no changes until approved)
+### sources
+- [Understanding Claude Code Plan Mode](https://lord.technology/2025/07/03/understanding-claude-code-plan-mode-and-the-architecture-of-intent.html) — deep dive
+- [What Actually Is Claude Code's Plan Mode?](https://lucumr.pocoo.org/2025/12/17/what-is-plan-mode/) — armin ronacher's analysis
+- [Plan-and-Solve Prompting](https://arxiv.org/abs/2305.04091) — original research
+---
+### 5. evaluator-optimizer loop
+claude code implements a natural evaluator-optimizer pattern through its feedback loop:
+```
+┌─────────┐     ┌─────────┐     ┌─────────┐
+│ GATHER  │────▶│  ACT    │────▶│ VERIFY  │
+│ CONTEXT │     │         │     │         │
+└─────────┘     └─────────┘     └─────────┘
+     ▲                               │
+     │                               │
+     └───────────────────────────────┘
+              (if not satisfied)
+```
+**forms of verification**:
+- run tests after code changes
+- check build output
+- user approval for risky operations
+- self-verification via file re-reading
+**not formal reflexion**: claude code doesn't maintain an explicit "reflection memory" across attempts. however, the iterative loop with test feedback serves a similar function — failures inform subsequent attempts within the same session.
+### sources
+- [Building Agents with the Claude Agent SDK](https://www.anthropic.com/engineering/building-agents-with-the-claude-agent-sdk) — feedback loop pattern
+---
+## strategies NOT implemented
+### tree of thoughts
+**why not**: ToT requires multiple LLM calls per step (generate candidates → evaluate each → select best). for interactive coding, this adds unacceptable latency. a single ToT "step" might require 5-10 LLM calls.
+**alternative**: extended thinking provides some deliberation benefits without the call overhead.
+### self-consistency
+**why not**: SC samples multiple reasoning paths and votes on the answer. this multiplies cost by the number of samples (typically 5-40x).
+**alternative**: extended thinking allows exploring multiple approaches within a single forward pass.
+### lats (language agent tree search)
+**why not**: LATS combines tree search with monte carlo methods, requiring many rollouts. prohibitively expensive for real-time interaction.
+**alternative**: subagents can explore different approaches in parallel, with human selecting the best result.
+### formal reflexion
+**why not**: reflexion requires explicit episodic memory of past failures. claude code sessions are typically short enough that context window serves as implicit memory.
+**alternative**: user feedback serves the "evaluator" role. test failures inform next attempt.
+---
+## how strategies are triggered
+a key architectural question: where does each reasoning strategy live?
+| strategy | trigger mechanism | location |
+|----------|-------------------|----------|
+| implicit react | always on | baked into model weights |
+| extended thinking | api parameter | caller configures budget |
+| think tool | model decides | system prompt instructs when to use |
+| plan mode | user or model | system prompt + EnterPlanMode tool |
+| subagent types | model decides | system prompt defines types + when to use |
+---
+## what's trained into model weights
+these capabilities exist in the model regardless of system prompt:
+| capability | how it's trained | evidence |
+|------------|------------------|----------|
+| **chain-of-thought reasoning** | constitutional AI + RLHF | model reasons step-by-step without prompting |
+| **implicit react** | RLHF with tool use examples | naturally interleaves reasoning with tool calls |
+| **tool call formatting** | fine-tuning on function calling | knows JSON schema format without examples |
+| **instruction following** | RLHF | follows instructions without few-shot examples |
+| **coding ability** | pre-training + RLHF | writes code across languages |
+| **constitutional values** | constitutional AI (RLAIF) | helpful, harmless, honest without prompting |
+### constitutional AI training
+anthropic's constitutional AI process:
+1. **supervised phase**: model generates responses → self-critiques against principles → revises → fine-tuned on revisions
+2. **RLAIF phase**: AI compares response pairs against constitution → preference model trained → model fine-tuned via RL
+this embeds values and reasoning patterns into weights, not just prompts.
+### implicit react is trained behavior
+**implicit react** is inherent to how claude reasons. no system prompt needed — the model naturally interleaves reasoning with tool calls. this emerged from:
+- RLHF training on agentic tasks
+- exposure to tool use during training
+- reward for grounded reasoning (reasoning that references observations)
+### sources
+- [Constitutional AI: Harmlessness from AI Feedback](https://www.anthropic.com/research/constitutional-ai-harmlessness-from-ai-feedback) — training method
+- [Tool Use Now Generally Available](https://www.maginative.com/article/tool-use-now-generally-available-in-anthropics-claude-ai/) — function calling
+---
+## what's configured by api caller (not in system prompt)
+these are set by the application calling the API, invisible to the system prompt:
+| parameter | effect | who controls |
+|-----------|--------|--------------|
+| `thinking.budget_tokens` | extended thinking token limit | application developer |
+| `model` | which claude model to use | application developer |
+| `max_tokens` | response length limit | application developer |
+| `tools` | available tool definitions | application developer |
+**extended thinking** specifically is NOT in the system prompt — it's an API parameter:
+```python
+response = client.messages.create(
+    model="claude-opus-4-5-20250101",
+    thinking={
+        "type": "enabled",
+        "budget_tokens": 16000  # caller sets budget
+    },
+    ...
+)
+```
+the caller decides the budget. higher budget = more thinking = higher cost + latency.
+---
+## what's instructed via system prompt
+**plan mode** is heavily prompted in the system prompt:
+```
+## EnterPlanMode tool
+Use this tool proactively when you're about to start a
+non-trivial implementation task. Getting user sign-off
+on your approach before writing code prevents wasted
+effort and ensures alignment.
+When to Use:
+1. New Feature Implementation
+2. Multiple Valid Approaches
+3. Code Modifications affecting existing behavior
+4. Architectural Decisions
+...
+```
+the system prompt explicitly tells claude *when* to use plan mode proactively, not just when asked.
+**subagent delegation** is similarly prompted:
+```
+## Task tool
+Use this tool when the task matches the agent's description.
+You should proactively use the Task tool with specialized
+agents when the task at hand matches the agent's description.
+Available agent types:
+- Explore: Fast agent for codebase exploration
+- Plan: Software architect agent for designing plans
+- general-purpose: Multi-step tasks
+...
+```
+### model autonomy
+critically, the system prompt gives claude *judgment* about when to apply these strategies:
+```
+"if unsure whether to use [plan mode], err on the side of
+planning — it's better to get alignment upfront than to
+redo work"
+```
+so the model decides based on task complexity, not just user request.
+---
+## summary: three layers of reasoning control
+```
+┌─────────────────────────────────────────────────────────────────┐
+│  LAYER 1: MODEL WEIGHTS (trained, always present)               │
+├─────────────────────────────────────────────────────────────────┤
+│  • chain-of-thought reasoning                                   │
+│  • implicit react (interleaved reasoning + tool calls)          │
+│  • tool call formatting (JSON schema)                           │
+│  • constitutional values (helpful, harmless, honest)            │
+│  • instruction following                                        │
+│  • coding ability                                               │
+└─────────────────────────────────────────────────────────────────┘
+                              │
+                              ▼
+┌─────────────────────────────────────────────────────────────────┐
+│  LAYER 2: API PARAMETERS (caller configures)                    │
+├─────────────────────────────────────────────────────────────────┤
+│  • extended thinking budget (thinking.budget_tokens)            │
+│  • model selection (claude-opus-4, claude-sonnet-4)             │
+│  • available tools (tool definitions)                           │
+│  • max tokens, temperature, etc.                                │
+└─────────────────────────────────────────────────────────────────┘
+                              │
+                              ▼
+┌─────────────────────────────────────────────────────────────────┐
+│  LAYER 3: SYSTEM PROMPT (anthropic configures for claude code)  │
+├─────────────────────────────────────────────────────────────────┤
+│  • when to use plan mode (criteria + proactive guidance)        │
+│  • when to use subagents (types + selection heuristics)         │
+│  • when to use think tool (complex chains guidance)             │
+│  • specific tool usage instructions                             │
+│  • behavioral constraints (git safety, permission model)        │
+└─────────────────────────────────────────────────────────────────┘
+```
+**key insight**: the reasoning strategy is a product of all three layers. claude code's effectiveness comes from carefully orchestrating trained capabilities (layer 1) with appropriate compute budget (layer 2) and specific behavioral guidance (layer 3).
+---
+## strategy selection heuristic
+claude code dynamically applies reasoning strategies based on task complexity:
+```
+[Simple task]
+  └─▶ implicit react only
+[Moderate complexity]
+  └─▶ implicit react + extended thinking (if budget configured)
+[High complexity / new feature]
+  └─▶ plan mode → approval → execution with think tool
+[Long tool chains]
+  └─▶ think tool between critical steps
+[Multi-concern task]
+  └─▶ subagent delegation (explore, plan, execute separately)
+```
+### decision flow
+```
+┌─────────────────────────────────────────────────┐
+│  USER REQUEST                                    │
+└─────────────────────────────────────────────────┘
+                      │
+                      ▼
+┌─────────────────────────────────────────────────┐
+│  Is this non-trivial implementation?            │
+│  (system prompt criteria)                       │
+├─────────────────────────────────────────────────┤
+│  YES → EnterPlanMode (proactive)                │
+│  NO  → proceed with implicit react              │
+└─────────────────────────────────────────────────┘
+                      │
+                      ▼
+┌─────────────────────────────────────────────────┐
+│  Does this need codebase exploration?           │
+├─────────────────────────────────────────────────┤
+│  YES → Task(subagent_type="Explore")            │
+│  NO  → continue in main context                 │
+└─────────────────────────────────────────────────┘
+                      │
+                      ▼
+┌─────────────────────────────────────────────────┐
+│  [Execute with implicit react]                  │
+│  - if extended thinking enabled by caller:      │
+│    use budget for complex reasoning             │
+│  - if long tool chain:                          │
+│    model may invoke think tool                  │
+└─────────────────────────────────────────────────┘
+```
+---
+## architectural insight
+claude code's reasoning architecture reflects a key insight from the research:
+> **simple patterns, well-orchestrated, outperform complex patterns poorly executed.**
+rather than implementing sophisticated techniques like ToT or LATS, claude code focuses on:
+1. **implicit react** — grounded reasoning at every step
+2. **extended thinking** — more compute when needed
+3. **plan mode** — explicit separation of concerns
+4. **think tool** — targeted reasoning mid-task
+5. **human-in-the-loop** — user as evaluator/optimizer
+this pragmatic approach achieves ~49% on SWE-bench verified while maintaining interactive response times.
+---
+## related concepts
+- [react-pattern](./arc111.concept.react-pattern.[article].md)
+- [extended-thinking](./arc118.concept.extended-thinking.[article].md)
+- [plan-and-solve](./arc122.concept.plan-and-solve.[article].md)
+- [reflexion-pattern](./arc112.concept.reflexion-pattern.[article].md)
+- [tree-of-thoughts](./arc113.concept.tree-of-thoughts.[article].md)
+- [agentic-loop](./arc109.concept.agentic-loop.[article].md)
+---
+## sources
+- [Building Effective Agents](https://www.anthropic.com/research/building-effective-agents) — foundational patterns
+- [The "think" Tool](https://www.anthropic.com/engineering/claude-think-tool) — think tool guide
+- [Claude's Extended Thinking](https://www.anthropic.com/news/visible-extended-thinking) — extended thinking
+- [Building Agents with Claude Agent SDK](https://www.anthropic.com/engineering/building-agents-with-the-claude-agent-sdk) — SDK patterns
+- [Understanding Claude Code Plan Mode](https://lord.technology/2025/07/03/understanding-claude-code-plan-mode-and-the-architecture-of-intent.html) — plan mode analysis
+- [What Actually Is Claude Code's Plan Mode?](https://lucumr.pocoo.org/2025/12/17/what-is-plan-mode/) — plan mode deep dive