npm - anvil-dev-framework - Versions diffs - 0.1.6 → 0.1.8 - Mend

anvil-dev-framework 0.1.6 → 0.1.8

Files changed (87)

package/README.md CHANGED

@@ -1,7 +1,7 @@
 ```
      ___    _   ___     _____ _
     /   \  | \ | \ \   / /_ _| |
-   / /_\ \ |  \| |\ \ / / | || |      v0.1.6.0 (alpha)
+   / /_\ \ |  \| |\ \ / / | || |      v0.1.8.0 (alpha)
   / _____ \| |\  | \ V /  | || |___
  /_/     \_\_| \_|  \_/  |___|_____|
@@ -10,7 +10,7 @@
    ══════════════════════════════════════════════════════════
 ```
-# Anvil Development Framework <sup>v0.1.6.0</sup>
+# Anvil Development Framework <sup>v0.1.8.0</sup>
 > **A structured AI development system for solo builders who demand production-quality output.**
@@ -18,17 +18,20 @@ Anvil is a comprehensive framework for AI-assisted software development that com
 ---
-## 📦 Latest Changes in v0.1.6.0
+## 📦 Latest Changes in v0.1.8.0
-*Released: 2026-01-07*
+*Released: 2026-01-16*
-- **Ralph Wiggum Autonomous Execution** — Long-running unattended AI execution mode
-  - `/ralph start` — Initialize autonomous loop with task breakdown
-  - Circuit breaker safety (stops stuck loops automatically)
-  - Progress tracking to prevent repeated mistakes
-  - Git checkpointing before each restart
-- **PostToolUse Formatting Hook** — Auto-format files after Edit/Write operations
-- **Usage Guidelines** — Clear guidance on when Ralph is appropriate vs standard workflow
+- **Token Efficiency Audit Framework** — Complete token consumption tracking and optimization
+  - `/efficiency` command for historical analysis with weekly/monthly reports
+  - `/token-budget` command for session budget management with alerts
+  - Efficiency scoring (0-100), trend detection, and automated recommendations
+- **CodeRabbit Deep Integration** — Automated code review workflow
+  - Enhanced `.coderabbit.yaml` with pre-merge checks and custom Anvil validations
+  - `/evidence` command integration with enforcement levels (soft/hard)
+- **Insights Watermark System** — Prevents re-analyzing processed retrospectives
+  - Manifest tracking in `.claude/insights/.manifest.json`
+  - `--all` flag to force re-analysis when needed
 See [CHANGELOG.md](CHANGELOG.md) for complete history.
@@ -154,7 +157,7 @@ npm install -g anvil-dev-framework
 anvil init
 ```
-**Option 3: Homebrew (macOS)**
+**Option 3: Homebrew (macOS)** *(coming soon)*
 ```bash
 brew tap alexandercahiz/anvil
 brew install anvil
@@ -498,10 +501,27 @@ This remains your **default approach** for all normal development work.
 ### Ralph Mode (Special Scenarios Only)
-```
+```bash
+# Manual task description
 /ralph start "Migrate all tests from Jest to Vitest" --max-iterations 50
+# From Linear issue (recommended) - fetches subtasks automatically
+/ralph start --issue ANV-209
+# From Linear project - process all issues in a project
+/ralph start --project "HUD Development"
 ```
+**Linear Integration Flags:**
+| Flag | Description |
+|------|-------------|
+| `--issue` | Linear issue ID to fetch subtasks from (e.g., `ANV-209`) |
+| `--project` | Linear project name to process all issues |
+| `--subtasks` | Filter subtasks (e.g., `ANV-1..ANV-5` or `ANV-1,ANV-3`) |
+| `--include-done` | Include already-completed issues in project mode |
+| `--no-sync` | Disable syncing status back to Linear |
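The `--subtasks` filter above accepts either a range (`ANV-1..ANV-5`) or a comma-separated list (`ANV-1,ANV-3`). A minimal Python sketch of how such a filter could be expanded is shown here. The helper names `parse_issue` and `expand_subtasks` are hypothetical, and the rule that a range must stay within one prefix and ascend is an assumption; Anvil's actual parser is not part of this diff.

```python
import re

# Hypothetical helpers: Anvil's real parser is not shown in this diff.
ISSUE_RE = re.compile(r"^([A-Z][A-Z0-9]*)-(\d+)$")


def parse_issue(issue_id: str) -> tuple[str, int]:
    """Split 'ANV-209' into ('ANV', 209); raise ValueError on anything else."""
    match = ISSUE_RE.match(issue_id.strip())
    if match is None:
        raise ValueError(f"not an issue ID: {issue_id!r}")
    return match.group(1), int(match.group(2))


def expand_subtasks(spec: str) -> list[str]:
    """Expand 'ANV-1..ANV-5' or 'ANV-1,ANV-3' into an ordered list of IDs."""
    result: list[str] = []
    for part in spec.split(","):
        if ".." in part:
            start, end = part.split("..", 1)
            prefix, first = parse_issue(start)
            end_prefix, last = parse_issue(end)
            # Assumption: a range stays within one prefix and ascends.
            if prefix != end_prefix or last < first:
                raise ValueError(f"invalid range: {part!r}")
            result.extend(f"{prefix}-{n}" for n in range(first, last + 1))
        else:
            prefix, number = parse_issue(part)
            result.append(f"{prefix}-{number}")
    return result
```

Under these assumptions, `expand_subtasks("ANV-210..ANV-213")` yields the four IDs from `ANV-210` through `ANV-213`, and a malformed ID raises `ValueError` instead of being silently skipped.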
 | Good For | Not Good For |
 |----------|--------------|
 | ✅ Large-scale refactoring with clear completion criteria | ❌ Exploratory work (figuring things out) |

package/VERSION CHANGED

@@ -1 +1 @@
-0.1.6.0
+0.1.8.0

package/docs/ANV-263-hook-logging-investigation.md ADDED

@@ -0,0 +1,116 @@
+# Investigation: Pre/Post Tool Use Hook Logging Discrepancy
+**Issue**: ANV-263
+**Date**: 2026-01-15
+**Status**: Root cause identified
+---
+## Problem Statement
+The healthcheck report from 2026-01-15-1721 flagged:
+> "pre_tool_use entries (2) much lower than post_tool_use (39) suggests potential hook logging inconsistency"
+## Investigation Summary
+### Initial Observations
+| Metric | pre_tool_use.json | post_tool_use.json |
+|--------|-------------------|-------------------|
+| File size | 23 KB, later grew | 125 KB, later shrank |
+| Total entries | 42 | 2 |
+| Unique tool_use_ids | 25 | 1 |
+| Duplicate entries | 17 (40%) | 1 (50%) |
+The discrepancy reversed direction during the investigation: at the time of analysis, pre_tool_use had more entries than post_tool_use.
+### Root Cause: Duplicate Hook Registration
+**Both hooks are registered in TWO locations:**
+1. **Global settings** (`~/.claude/settings.json`):
+   ```json
+   "PreToolUse": [{
+     "hooks": [{
+       "command": "uv run .claude/hooks/pre_tool_use.py"
+     }]
+   }]
+   ```
+2. **Local settings** (`.claude/settings.local.json`):
+   ```json
+   "PreToolUse": [{
+     "hooks": [{
+       "command": "uv run .claude/hooks/pre_tool_use.py --announce --track-tokens"
+     }]
+   }]
+   ```
+**Result**: Every tool invocation triggers the hook TWICE, producing duplicate log entries.
+### Evidence
+Duplicate tool_use_ids found (each appears exactly 2 times):
+- `toolu_01SLbsZ2Mqn3eP...`
+- `toolu_018xwr4o547Byp...`
+- `toolu_01SrAGnm2mQ7vp...`
+- `toolu_018SeysS4mLJih...`
+- `toolu_01KAMaFWvecFKQ...`
+The consistent 2x duplication pattern confirms the dual registration cause.
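The duplicate counts in the observations table can be reproduced with a short script. This is a sketch only: it assumes each log file is a JSON array of objects carrying a `tool_use_id` key (the shape implied by the deduplication snippet in this document), and the function names are illustrative rather than part of the hooks.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_entries(entries: list[dict]) -> dict:
    """Count total, unique, and duplicate log entries keyed by tool_use_id."""
    # Entries without a tool_use_id are ignored in this sketch.
    counts = Counter(e["tool_use_id"] for e in entries if "tool_use_id" in e)
    total = sum(counts.values())
    unique = len(counts)
    duplicates = total - unique
    return {
        "total": total,
        "unique": unique,
        "duplicates": duplicates,
        "duplicate_pct": round(100 * duplicates / total) if total else 0,
        # IDs logged more than once, e.g. by a doubly registered hook.
        "repeated_ids": sorted(i for i, n in counts.items() if n > 1),
    }


def summarize_file(path: str) -> dict:
    """Load a JSON-array log file (assumed format) and summarize it."""
    return summarize_entries(json.loads(Path(path).read_text()))
```

For a log with 42 entries and 25 unique IDs, this reports 17 duplicates (40%), which matches the pre_tool_use column in the table.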
+### Why Counts Varied
+The original healthcheck showed pre_tool_use with fewer entries because:
+1. Logs may have been rotated or cleared at different times
+2. Different sessions accumulated different counts
+3. The healthcheck snapshot was taken at a specific moment
+## Recommendations
+### Option A: Remove Global Registration (Recommended)
+Remove the hook from `~/.claude/settings.json` since local settings are project-specific and include the correct flags.
+```bash
+# Edit ~/.claude/settings.json and remove the PreToolUse and PostToolUse entries
+```
+### Option B: Add Deduplication Logic
+Add tool_use_id deduplication in the hook itself:
+```python
+# Before appending, check if tool_use_id already exists
+if not any(e.get('tool_use_id') == input_data.get('tool_use_id') for e in log_data):
+    log_data.append(input_data)
+```
+### Option C: Use File Locking
+If concurrent execution is needed, use proper file locking:
+```python
+import fcntl
+with open(log_path, 'r+') as f:
+    fcntl.flock(f.fileno(), fcntl.LOCK_EX)
+    # ... read, modify, write ...
+    fcntl.flock(f.fileno(), fcntl.LOCK_UN)
+```
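Options B and C can be combined into a single locked, deduplicating append. The sketch below is one way to do that, not the hook's actual implementation. It assumes the log is a JSON array, that `append_unique` is a hypothetical helper name, and that the platform is Unix (the `fcntl` module is not available on Windows).

```python
import fcntl
import json
import os


def append_unique(log_path: str, entry: dict) -> bool:
    """Append entry to a JSON-array log unless its tool_use_id is already there.

    Returns True when the entry was written. Unix-only because of fcntl.
    """
    # O_CREAT creates the file on first use without truncating existing data.
    fd = os.open(log_path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as f:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)  # block until we hold the lock
        try:
            raw = f.read()
            log_data = json.loads(raw) if raw.strip() else []
            tool_use_id = entry.get("tool_use_id")
            # Assumption: entries without an ID are always appended.
            if tool_use_id is not None and any(
                e.get("tool_use_id") == tool_use_id for e in log_data
            ):
                return False
            log_data.append(entry)
            f.seek(0)
            f.truncate()
            json.dump(log_data, f, indent=2)
            f.flush()  # make sure data is written before the lock is released
            return True
        finally:
            fcntl.flock(f.fileno(), fcntl.LOCK_UN)
```

Holding the lock across the read, check, and write is what makes the deduplication safe when two hook processes run at the same moment; checking for duplicates without the lock would still allow both to append.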
+## Related Issues
+- ANV-264: Log Rotation (Phase 3b) - will need the log files to be consistent first
+- ANV-260: Parent issue for Insights Recommended Actions
+## Files Examined
+| File | Purpose |
+|------|---------|
+| `.claude/hooks/pre_tool_use.py` | Pre-tool hook implementation (lines 378-398 for logging) |
+| `.claude/hooks/post_tool_use.py` | Post-tool hook implementation (lines 171-189 for logging) |
+| `.claude/settings.local.json` | Project-specific hook registration |
+| `~/.claude/settings.json` | Global hook registration (duplicate) |
+| `logs/pre_tool_use.json` | Pre-tool log file |
+| `logs/post_tool_use.json` | Post-tool log file |
+---
+**Next Steps**: Fix the duplicate registration, then proceed to ANV-264 (log rotation).

package/docs/INSTALLATION.md CHANGED

@@ -16,6 +16,24 @@ A context engineering framework that transforms Claude Code from a reactive assi
 ---
+## Quick Install
+For most users, the fastest way to get started:
+```bash
+# Using bun (recommended)
+bun install -g anvil-dev-framework
+anvil init
+# Or using npm
+npm install -g anvil-dev-framework
+anvil init
+```
+This installs the CLI and initializes your project with the Anvil framework. For customization options and detailed setup, continue reading below.
+---
 ## Background: Research Foundation
 This framework synthesizes patterns from 15+ production systems:

package/docs/command-reference.md CHANGED

@@ -28,6 +28,9 @@
 | `/release` | Consolidate [Unreleased] into versioned release |
 | `/healthcheck` | Framework diagnostics and session health |
 | `/retro` | Write retrospective to capture learnings |
+| `/audit` | Real-time token consumption analysis |
+| `/efficiency` | Historical token efficiency reports |
+| `/token-budget` | Session token budget management |
 | `/shard` | Break large specs into atomic pieces |
 | `/decay-review` | Archive old issues and clean handoffs |
 | `/weekly-review` | Weekly analytics and improvement recommendations |
@@ -69,6 +72,10 @@
   - [/evidence](#evidence)
   - [/healthcheck](#healthcheck)
   - [/retro](#retro)
+- [Token Efficiency Commands](#token-efficiency-commands)
+  - [/audit](#audit)
+  - [/efficiency](#efficiency)
+  - [/token-budget](#token-budget)
 - [Multi-Agent Commands](#multi-agent-commands)
   - [/hud](#hud)
 - [Maintenance Commands](#maintenance-commands)
@@ -104,7 +111,7 @@ bun install -g anvil-dev-framework
 npm install -g anvil-dev-framework
 ```
-**Option 3: Homebrew (macOS)**
+**Option 3: Homebrew (macOS)** *(coming soon)*
 ```bash
 brew tap alexandercahiz/anvil
 brew install anvil
@@ -1358,6 +1365,250 @@ Make a requirements checklist BEFORE copying patterns.
 ---
+## Token Efficiency Commands
+### /audit
+**Purpose**: Real-time token consumption analysis for the current session.
+**When to Use**: During a session to understand token usage, detect waste patterns, and get optimization recommendations.
+**What It Does**:
+1. Analyzes current session's token consumption
+2. Breaks down usage by component type (commands, hooks, tools)
+3. Identifies waste patterns (redundant loads, unused components)
+4. Calculates efficiency score (0-100)
+5. Generates actionable recommendations
+**Output Format**:
+```markdown
+## Token Audit Report
+**Session**: `abc12345...`
+**Analyzed**: 2026-01-16 14:30
+**Efficiency Score**: 78/100
+### Context Usage
+- **Total Tokens**: 45,000
+- **Context Used**: 30% of effective limit
+- **Peak Tokens**: 52,000
+### Breakdown by Type
+| Type | Tokens | % of Total | Count |
+|------|--------|------------|-------|
+| command | 15,000 | 33.3% | 8 |
+| system | 12,000 | 26.7% | 3 |
+| tools | 18,000 | 40.0% | 45 |
+### Detected Waste Patterns
+- 🔄 **orient**: Loaded 3 times, costing 1,200 extra tokens
+- ⚠️ **patterns**: Loaded but never used, wasting 800 tokens
+```
+**Related Commands**:
+- `/efficiency` — Historical analysis over days/weeks
+- `/token-budget` — Set and track session token budgets
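The two waste patterns in the sample report (redundant loads and unused components) can be illustrated with a small detector. This is a sketch under stated assumptions: the `Load` record, its `used` flag, and `detect_waste` are hypothetical, and the real `/audit` data model is not shown in this diff.

```python
from dataclasses import dataclass


@dataclass
class Load:
    """One component load in a session (hypothetical record shape)."""
    component: str
    tokens: int
    used: bool  # assumption: the session log records whether the load was used


def detect_waste(loads: list[Load]) -> list[str]:
    """Report redundant loads and components that were loaded but never used."""
    by_component: dict[str, list[Load]] = {}
    for load in loads:
        by_component.setdefault(load.component, []).append(load)

    findings: list[str] = []
    for name, group in by_component.items():
        if len(group) > 1:
            # Every load after the first counts as redundant.
            extra = sum(item.tokens for item in group[1:])
            findings.append(
                f"{name}: loaded {len(group)} times, costing {extra:,} extra tokens"
            )
        if not any(item.used for item in group):
            # Count only the first load here; repeats are reported above.
            findings.append(
                f"{name}: loaded but never used, wasting {group[0].tokens:,} tokens"
            )
    return findings
```

With three 600-token loads of `orient` (used) and one unused 800-token load of `patterns`, the detector reports 1,200 extra tokens for `orient` and 800 wasted tokens for `patterns`, the same figures as the sample report.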
+---
+### /efficiency
+**Purpose**: Historical efficiency analysis over time periods (weekly/monthly).
+**When to Use**: Weekly reviews, tracking optimization impact, identifying consistently low-efficiency components.
+**Variants**:
+| Command | Description |
+|---------|-------------|
+| `/efficiency` | Weekly report (default, last 7 days) |
+| `/efficiency --weekly` | Explicit weekly report |
+| `/efficiency --monthly` | Monthly report (last 30 days) |
+| `/efficiency --recommendations` | Show only recommendations |
+**What It Does**:
+1. Analyzes component usage across multiple sessions
+2. Calculates efficiency scores per component
+3. Detects trends (improving/stable/degrading)
+4. Compares to previous period
+5. Generates optimization recommendations
+**Efficiency Score Calculation**:
+| Factor | Points | Criteria |
+|--------|--------|----------|
+| Utilization | 0-50 | % of loads where component was used |
+| Token Cost | 0-30 | Lower avg tokens = higher score |
+| Consistency | 0-20 | Frequent use with high utilization |
+**Score Interpretation**:
+| Score Range | Interpretation |
+|-------------|----------------|
+| 90-100 | Excellent—keep as is |
+| 70-89 | Good—minor optimization possible |
+| 50-69 | Fair—consider optimization |
+| <50 | Poor—candidate for removal/deferral |
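The scoring tables above can be read as a sum of three clamped factors followed by a band lookup. The sketch below follows that reading. The linear scaling in `utilization_points` is an assumption, the token-cost and consistency points are taken as inputs because their formulas are not given here, and all function names are illustrative.

```python
def utilization_points(used_loads: int, total_loads: int) -> float:
    """Assumed linear scaling: 0% of loads used -> 0 points, 100% -> 50 points."""
    if total_loads <= 0:
        return 0.0
    return 50.0 * min(used_loads, total_loads) / total_loads


def _clamp(value: float, upper: float) -> float:
    """Keep a factor inside its documented 0..upper range."""
    return max(0.0, min(value, upper))


def efficiency_score(utilization: float, token_cost: float, consistency: float) -> int:
    """Sum the three factors (max 50 + 30 + 20) into a 0-100 score."""
    total = _clamp(utilization, 50) + _clamp(token_cost, 30) + _clamp(consistency, 20)
    return round(total)


def interpret(score: int) -> str:
    """Map a score to the bands in the Score Interpretation table."""
    if score >= 90:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Fair"
    return "Poor"
```

For example, a component used in 92 of 100 loads earns 46 utilization points under the assumed scaling. A score of 35 falls in the "Poor" band, making the component a candidate for removal or deferral.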
+**Trend Indicators**:
+| Icon | Meaning |
+|------|---------|
+| ↑ | Improving (utilization increasing) |
+| → | Stable (no significant change) |
+| ↓ | Degrading (utilization decreasing) |
+| ★ | New (no previous data) |
+**Output Format**:
+```markdown
+## Weekly Efficiency Report
+**Period**: Last 7 days
+**Generated**: 2026-01-16 14:30
+**Overall Efficiency**: 72/100
+### Summary
+- **Sessions Analyzed**: 42
+- **Total Tokens**: 1,250,000
+- **Avg per Session**: 29,762
+### Component Efficiency Scores
+| Component | Type | Score | Utilization | Trend |
+|-----------|------|-------|-------------|-------|
+| patterns | command | 35 | 15% | ↓ |
+| checklist | command | 42 | 22% | → |
+| orient | command | 85 | 92% | ↑ |
+| CLAUDE.md | system | 78 | 100% | → |
+### Top Recommendations
+- 🔴 **Defer loading patterns**: Used only 15% of the time
+  - Potential savings: ~1,020 tokens
+- 🟡 **Optimize large-context**: Averaging 3,500 tokens
+  - Potential savings: ~1,050 tokens
+```
+**Recommendations Workflow**:
+1. Review low-score components (score < 50)
+2. Check trends for degrading patterns
+3. Apply recommendations:
+   - **defer**: Move to on-demand command
+   - **optimize**: Reduce component size
+   - **review**: Consider removal
+4. Track impact in next week's report
+**Related Commands**:
+- `/audit` — Real-time session analysis
+- `/token-budget` — Proactive budget management
+---
+### /token-budget
+**Purpose**: Proactive session token budget management with intelligent alerts.
+**When to Use**: Before starting work to set a budget, during long sessions to monitor usage, when approaching context limits.
+**Variants**:
+| Command | Description |
+|---------|-------------|
+| `/token-budget` | Show current budget status (default) |
+| `/token-budget status` | Same as `/token-budget` |
+| `/token-budget set <tokens>` | Set budget (e.g., `set 100000`) |
+| `/token-budget alert <level> <percent>` | Set custom threshold |
+| `/token-budget clear` | Remove budget constraint |
+**What It Does**:
+1. Sets a token budget for the current session
+2. Tracks usage against budget
+3. Estimates remaining turns based on consumption patterns
+4. Alerts when configurable thresholds are crossed
+5. Integrates with hooks for automatic alerts
+**Default Alert Thresholds**:
+| Level | Threshold | Trigger |
+|-------|-----------|---------|
+| Info | 60% | Informational notice |
+| Warning | 80% | Recommend `/handoff` soon |
+| Critical | 90% | Urgent action required |
+**Setting Custom Thresholds**:
+```bash
+/token-budget alert warning 75
+```
+This sets the warning threshold to 75% instead of the default 80%.
+**Output Format** (when budget is set):
+```markdown
+## Token Budget Status
+**Session**: `abc12345...`
+**Checked**: 2026-01-16 10:30
+### Budget Overview
+| Metric | Value |
+|--------|-------|
+| Budget | 100,000 tokens |
+| Used | 45,230 tokens |
+| Remaining | 54,770 tokens |
+| Used (%) | 45.2% |
+### Remaining Capacity
+- **Estimated Turns**: ~109 turns remaining
+- **Alert Status**: ✅ Normal (under 60%)
+### Alert Thresholds
+| Level | Threshold | Status |
+|-------|-----------|--------|
+| Info | 60% | Not triggered |
+| Warning | 80% | Not triggered |
+| Critical | 90% | Not triggered |
+```
+**Alert Messages** (when thresholds crossed):
+```
+ℹ️ **Budget Notice**: 60% of token budget used (60,000/100,000).
+   ~80 turns remaining. Consider planning session wrap-up.
+⚠️ **Budget Warning**: 80% of token budget used (80,000/100,000).
+   ~40 turns remaining. Recommend running `/handoff` soon.
+🔴 **Budget Critical**: 90% of token budget used (90,000/100,000).
+   ~20 turns remaining. Run `/handoff` or `/clear` immediately.
+```
+**Recommended Budget Sizes**:
+| Session Type | Budget | Approx Turns |
+|--------------|--------|--------------|
+| Light session | 50,000 | ~100 |
+| Standard session | 100,000 | ~200 |
+| Heavy session | 150,000 | ~300 |
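The status figures and alert levels above follow from simple arithmetic. The sketch below reproduces them under two assumptions: a fixed cost of about 500 tokens per turn (inferred from "50,000 ≈ 100 turns"; the real command is described as estimating from consumption patterns), and a hypothetical `budget_status` helper that is not Anvil's API.

```python
from typing import Mapping, Optional

DEFAULT_THRESHOLDS = {"info": 60, "warning": 80, "critical": 90}
ASSUMED_TOKENS_PER_TURN = 500  # assumption inferred from the budget table


def budget_status(
    budget: int,
    used: int,
    thresholds: Optional[Mapping[str, float]] = None,
    tokens_per_turn: int = ASSUMED_TOKENS_PER_TURN,
) -> dict:
    """Compute remaining tokens, percent used, estimated turns, and alert level."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    levels = dict(DEFAULT_THRESHOLDS if thresholds is None else thresholds)
    remaining = max(budget - used, 0)
    percent_used = round(100 * used / budget, 1)
    alert = None
    # Walk thresholds from lowest to highest; the last one crossed wins.
    for name, threshold in sorted(levels.items(), key=lambda kv: kv[1]):
        if percent_used >= threshold:
            alert = name
    return {
        "remaining": remaining,
        "percent_used": percent_used,
        "estimated_turns": remaining // tokens_per_turn,
        "alert": alert,
    }
```

With a 100,000-token budget and 45,230 tokens used, this gives 54,770 remaining, 45.2% used, about 109 turns, and no alert. Passing `{**DEFAULT_THRESHOLDS, "warning": 75}` mirrors `/token-budget alert warning 75`.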
+**Related Commands**:
+- `/audit` — Real-time session analysis
+- `/efficiency` — Historical patterns
+- `/handoff` — Session continuity (recommended at 80%)
+---
 ## Multi-Agent Commands
 ### /hud
@@ -1823,12 +2074,21 @@ This reminder appears in the hook output, prompting you to run the configured co
 **Usage**:
 ```bash
-# Start autonomous execution
+# Start autonomous execution (manual task description)
 /ralph start "Migrate all tests from Jest to Vitest"
 # With options
 /ralph start "Add OAuth authentication" --max-iterations 30
+# From Linear issue (recommended) - fetches subtasks automatically
+/ralph start --issue ANV-209
+# From Linear with subtask filter
+/ralph start --issue ANV-209 --subtasks ANV-210..ANV-213
+# From Linear project - process all issues in a project
+/ralph start --project "HUD Development"
 # Check progress
 /ralph status
@@ -1850,6 +2110,11 @@ This reminder appears in the hook output, prompting you to run the configured co
 |------|---------|-------------|
 | `--max-iterations` | 50 | Maximum iterations before stopping |
 | `--completion-promise` | COMPLETE | Text that signals completion |
+| `--issue` | — | Linear issue ID to fetch subtasks from (e.g., `ANV-209`) |
+| `--project` | — | Linear project name to process all issues |
+| `--subtasks` | — | Filter subtasks (e.g., `ANV-1..ANV-5` or `ANV-1,ANV-3`) |
+| `--include-done` | false | Include already-completed issues in project mode |
+| `--no-sync` | false | Disable syncing status back to Linear |
 **What It Does**:
@@ -1906,6 +2171,41 @@ Ralph includes automatic safety stops:
 - Iteration 3: Implemented OAuth redirect
 ```
+**With Linear integration**, additional fields are shown:
+```markdown
+## Ralph Wiggum Status
+| Metric | Value |
+|--------|-------|
+| Status | Running |
+| Iteration | 5 of 50 |
+| Items Complete | 3 of 8 |
+| Progress | 38% |
+### Linear Integration
+| Field | Value |
+|-------|-------|
+| Parent Issue | [ANV-209](https://linear.app/your-org/issue/ANV-209) |
+| Subtasks | 3 done, 0 skipped, 5 remaining |
+| Last Sync | 2026-01-07T10:45:00Z |
+| Sync Status | Enabled |
+### Current Subtask
+[ANV-212] Phase 3: Command Interface Update
+```
+**Error Handling** (Linear integration):
+Ralph handles errors from Linear operations as follows:
+| Error Type | Behavior |
+|------------|----------|
+| Rate limit (429) | Retry with exponential backoff (1s, 2s, 4s) |
+| Timeout | Retry up to 3 times |
+| Missing issue | Skip gracefully, continue session |
+| Sync failure | Log warning, don't block progress |
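The rate-limit row describes retries with exponential backoff at 1s, 2s, and 4s. A minimal sketch of that pattern follows. `RateLimited` is a hypothetical stand-in for an HTTP 429 from Linear, `with_backoff` is an illustrative name, and the injectable `sleep` parameter exists only to make the helper easy to test.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for an HTTP 429 response from Linear."""


def with_backoff(
    call: Callable[[], T],
    delays: Sequence[float] = (1.0, 2.0, 4.0),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(); on RateLimited, wait 1s, 2s, then 4s between retries."""
    for delay in delays:
        try:
            return call()
        except RateLimited:
            sleep(delay)  # each wait doubles the previous one
    # Final attempt: a RateLimited raised here propagates to the caller.
    return call()
```

The caller decides what happens after the last attempt fails. Given the table above, a sync failure would be logged as a warning rather than stopping the session.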
 **Environment Variables**:
 | Variable | Default | Description |