wdyt 0.1.16 → 0.1.17

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (17)

package/README.md +146 -22
package/package.json +4 -2
package/skills/quality-auditor.md +61 -26
package/skills/review-exploration.md +92 -0
package/skills/review-quick.md +48 -0
package/skills/review-router.md +47 -0
package/skills/review-security.md +79 -0
package/skills/review-thorough.md +69 -0
package/src/cli.ts +20 -0
package/src/commands/chat.ts +132 -129
package/src/commands/skill.ts +457 -0
package/src/context/builder.ts +412 -0
package/src/context/codemap.ts +598 -0
package/src/context/exploration.ts +262 -0
package/src/context/index.ts +52 -0
package/src/context/multipass.ts +494 -0
package/src/context/strategy.ts +258 -0

package/README.md CHANGED

@@ -2,24 +2,104 @@
 Code review context builder for LLMs - get a second opinion on your code.
-A CLI tool that exports code context for AI review, compatible with flowctl/flow-next.
+A CLI tool that exports code context for AI review, with adaptive review strategies that automatically select the best approach for each task.
+## What is wdyt?
+wdyt is a drop-in replacement for rp-cli (RepoPrompt) that provides:
+- **Adaptive review strategies** - Automatically selects single-pass, multi-pass, or exploration based on task characteristics
+- **Smart context building** - Code maps for large files, full content for changed files
+- **Token budget management** - Stays within Claude's context limits
+- **Confidence scoring** - Filters findings to 80%+ confidence to reduce false positives
+- **flowctl compatibility** - 100% interface compatibility with rp-cli
 ## Installation
 ```bash
-# Run directly with bunx (recommended)
-bunx wdyt init
-# Or install globally
+# Install globally (provides both wdyt and rp-cli commands)
 bun add -g wdyt
+# Or run directly with bunx
+bunx wdyt -e 'windows'
 ```
+## How it Differs from RepoPrompt
+| Aspect | rp-cli (RepoPrompt) | wdyt |
+|--------|---------------------|------|
+| **Philosophy** | Pack everything | Smart selection |
+| **File handling** | All files = full content | Changed = full, imports = code maps |
+| **Context limit** | Hard fail at ~120k tokens | Adaptive strategies |
+| **Review quality** | Depends on context size | Optimized for accuracy |
+| **Cost** | 1 large API call | Adaptive (1-4 calls) |
+## The 3-Pronged Adaptive Approach
+wdyt automatically selects the best review strategy:
+```
+┌─────────────────────────────────────────────────────────────┐
+│                  STRATEGY SELECTION                          │
+├─────────────────────────────────────────────────────────────┤
+│                                                             │
+│  Small changes (≤3 files, <500 lines)                       │
+│  ┌─────────────────────────────────────┐                    │
+│  │ OPTION A: Single-Pass              │                    │
+│  │ • Spec + Guidelines + Code          │                    │
+│  │ • 1 API call                        │                    │
+│  │ • Fast, cheap, good for most cases  │                    │
+│  └─────────────────────────────────────┘                    │
+│                                                             │
+│  Large/Critical changes (>10 files OR security review)      │
+│  ┌─────────────────────────────────────┐                    │
+│  │ OPTION B: Multi-Pass               │                    │
+│  │ • 3 parallel review agents          │                    │
+│  │ • Confidence scoring (80+ filter)   │                    │
+│  │ • Higher accuracy, catches more     │                    │
+│  └─────────────────────────────────────┘                    │
+│                                                             │
+│  Unknown scope (audit, exploration)                         │
+│  ┌─────────────────────────────────────┐                    │
+│  │ OPTION C: Agentic Exploration      │                    │
+│  │ • Agent uses glob/grep/read tools   │                    │
+│  │ • Discovers relevant files itself   │                    │
+│  │ • No context limit constraints      │                    │
+│  └─────────────────────────────────────┘                    │
+│                                                             │
+└─────────────────────────────────────────────────────────────┘
+```
+### When Each Strategy is Used
+| Scenario | Strategy | Why |
+|----------|----------|-----|
+| Bug fix (1-2 files) | Single-Pass | Fast, focused |
+| Feature PR (5 files) | Single-Pass | Spec + changes fit |
+| Major refactor (15+ files) | Multi-Pass | Need multiple perspectives |
+| Security audit | Multi-Pass | Critical, need confidence scoring |
+| "Review this repo" | Exploration | Unknown scope |
+| CI/CD quick check | Single-Pass | Speed matters |
+## Research-Backed Decisions
+wdyt's architecture is based on research findings:
+1. **Problem descriptions boost accuracy +22%** ([arxiv 2505.20206](https://arxiv.org/abs/2505.20206))
+   - wdyt prioritizes task specs in context
+2. **Less context = better reviews** (Claude official code-review)
+   - 60 files hurts accuracy; changed files + code maps is better
+3. **Multiple perspectives catch more issues** (Claude documentation)
+   - Multi-pass for large changes > single pass
+4. **Confidence scoring reduces false positives** (Claude official)
+   - Filter to 80+ confidence only
 ## Quick Start
 ```bash
-# Interactive setup - creates data directory and optionally adds rp-cli alias
-bunx wdyt init
 # List windows
 wdyt -e 'windows'
@@ -29,21 +109,12 @@ wdyt -w 1 -e 'builder {}'
 # Add files to selection
 wdyt -w 1 -t <tab-id> -e 'select add src/cli.ts'
-# Export context for review
+# Export context for review (strategy auto-selected)
 wdyt -w 1 -t <tab-id> -e 'call chat_send {"mode":"review"}'
 ```
 ## Commands
-### Setup
-```bash
-wdyt init              # Interactive setup
-wdyt init --global     # Install binary globally
-wdyt init --rp-alias   # Create rp-cli alias (for flowctl)
-wdyt init --no-alias   # Skip rp-cli alias prompt
-```
 ### Expressions
 | Expression | Description |
@@ -67,16 +138,69 @@ wdyt init --no-alias   # Skip rp-cli alias prompt
 ## flowctl Compatibility
-This tool is compatible with [flow-next](https://github.com/gmickel/claude-marketplace) and provides the `rp-cli` interface expected by flowctl.
+wdyt is a 100% drop-in replacement for rp-cli. No flowctl changes needed.
-```bash
-# Create the rp-cli alias during init
-bunx wdyt init --rp-alias
+```
+flowctl calls:
+  rp-cli -w <window> -t <tab> -e "call chat_send {json}"
+wdyt accepts same call:
+  wdyt -w <window> -t <tab> -e "call chat_send {json}"
+```
+The package.json includes both `wdyt` and `rp-cli` bin entries, so installing wdyt globally provides the rp-cli command automatically.
+### Payload Format (unchanged)
+```json
+{
+  "message": "Review this code",
+  "mode": "chat",
+  "new_chat": true,
+  "chat_name": "fn-5-review",
+  "selected_paths": ["src/file1.ts", "src/file2.ts"]
+}
+```
+### Output Format (unchanged)
+```
+Chat: `<uuid>`
+<review text>
+<verdict>SHIP|NEEDS_WORK|MAJOR_RETHINK</verdict>
+```
+## Architecture
+```
+src/
+├── cli.ts                  # CLI entry point
+├── commands/
+│   ├── chat.ts            # chat_send command (main review logic)
+│   ├── builder.ts         # Tab creation
+│   ├── prompt.ts          # Prompt management
+│   ├── select.ts          # File selection
+│   └── windows.ts         # Window listing
+├── context/
+│   ├── strategy.ts        # Adaptive strategy selection
+│   ├── builder.ts         # Context XML building
+│   ├── codemap.ts         # Code map extraction (signatures only)
+│   ├── multipass.ts       # Multi-pass review handler
+│   ├── exploration.ts     # Agentic exploration handler
+│   └── index.ts           # Exports
+└── state/
+    └── index.ts           # Window/tab state management
+skills/
+└── quality-auditor.md     # Review prompt with chain-of-thought
 ```
 ## Requirements
 - [Bun](https://bun.sh) runtime
+- [Claude CLI](https://www.anthropic.com/claude) (for actual reviews)
 ## License
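
The strategy table in the README hunk above can be read as a small decision function. The following is a minimal sketch under stated assumptions, not the contents of `src/context/strategy.ts` (which this diff view does not show). The names `ReviewTask` and `selectStrategy` are hypothetical, and treating an empty selection as "unknown scope" is an assumption. The README also leaves the 4 to 10 file range unspecified; the sketch falls back to single-pass there, which matches the "Feature PR (5 files)" row.

```typescript
// Hypothetical sketch of the README's strategy table.
type Strategy = "single-pass" | "multi-pass" | "exploration";

interface ReviewTask {
  files: string[];          // paths selected for review (empty = unknown scope, by assumption)
  securityReview?: boolean; // caller flagged the review as security-critical
}

function selectStrategy(task: ReviewTask): Strategy {
  // Unknown scope: nothing was selected, so the agent discovers files itself.
  if (task.files.length === 0) return "exploration";
  // Large or critical changes (>10 files OR security review) get multiple passes.
  if (task.files.length > 10 || task.securityReview === true) return "multi-pass";
  // Everything else, including the unspecified 4-10 file range, uses one pass.
  return "single-pass";
}
```

Under these assumptions, a one-file bug fix maps to `"single-pass"`, the same file with `securityReview: true` maps to `"multi-pass"`, and an empty selection maps to `"exploration"`.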

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "wdyt",
-  "version": "0.1.16",
+  "version": "0.1.17",
   "type": "module",
   "description": "Code review context builder for LLMs - what do you think?",
   "license": "MIT",
@@ -47,7 +47,9 @@
   },
   "dependencies": {
     "citty": "^0.1.6",
-    "shell-quote": "^1.8.3"
+    "shell-quote": "^1.8.3",
+    "typescript": "^5.9.3",
+    "web-tree-sitter": "^0.26.3"
   },
   "devDependencies": {
     "@types/bun": "^1.1.0"

package/skills/quality-auditor.md CHANGED

@@ -3,83 +3,118 @@ name: quality-auditor
 description: Review recent changes for correctness, simplicity, security, and test coverage.
 ---
-You are a pragmatic code auditor. Your job is to find real risks in recent changes - fast.
+You are a senior engineer reviewing code changes. Your job is to find real risks - not style nitpicks.
-## Audit Strategy
+## Review Process (Chain of Thought)
-### 1. Quick Scan (find obvious issues fast)
+Think through each step before giving findings:
+### 1. Understand Intent
+- What is this change trying to accomplish?
+- Is there a task spec or description provided?
+- What are the acceptance criteria?
+### 2. Quick Scan (obvious issues)
 - **Secrets**: API keys, passwords, tokens in code
 - **Debug code**: console.log, debugger, TODO/FIXME
 - **Commented code**: Dead code that should be deleted
 - **Large files**: Accidentally committed binaries, logs
-### 2. Correctness Review
+### 3. Correctness Review
 - Does the code match the stated intent?
 - Are there off-by-one errors, wrong operators, inverted conditions?
 - Do error paths actually handle errors?
 - Are promises/async properly awaited?
+- Edge cases: null/undefined, empty arrays, boundary conditions
-### 3. Security Scan
+### 4. Security Scan
 - **Injection**: SQL, XSS, command injection vectors
 - **Auth/AuthZ**: Are permissions checked? Can they be bypassed?
 - **Data exposure**: Is sensitive data logged, leaked, or over-exposed?
 - **Dependencies**: Any known vulnerable packages added?
-### 4. Simplicity Check
+### 5. Simplicity Check
 - Could this be simpler?
 - Is there duplicated code that should be extracted?
 - Are there unnecessary abstractions?
 - Over-engineering for hypothetical future needs?
-### 5. Test Coverage
+### 6. Test Coverage
 - Are new code paths tested?
 - Do tests actually assert behavior (not just run)?
-- Are edge cases from gap analysis covered?
+- Are edge cases covered?
 - Are error paths tested?
-### 6. Performance Red Flags
+### 7. Performance Red Flags
 - N+1 queries or O(n²) loops
 - Unbounded data fetching
 - Missing pagination/limits
 - Blocking operations on hot paths
+### 8. Confidence Check
+For each issue you find, ask yourself:
+- Am I confident this is a real problem (not a style preference)?
+- Could this actually cause bugs, security issues, or outages?
+- Is my suggested fix correct?
+Only report issues where you have **80%+ confidence**.
 ## Output Format
 ```markdown
 ## Quality Audit: [Branch/Feature]
 ### Summary
-- Files changed: N
+- Files reviewed: N
+- Spec compliance: Yes / Partial / No
 - Risk level: Low / Medium / High
-- Ship recommendation: ✅ Ship / ⚠️ Fix first / ❌ Major rework
 ### Critical (MUST fix before shipping)
-- **[File:line]**: [Issue]
-  - Risk: [What could go wrong]
+- **[file.ts:42]** - [Issue description]
+  - Evidence: [Why this is a problem]
   - Fix: [Specific suggestion]
+  - Confidence: [X]%
-### Should Fix (High priority)
-- **[File:line]**: [Issue]
-  - [Brief fix suggestion]
+### Major (Should fix)
+- **[file.ts:100]** - [Issue]
+  - Fix: [Brief suggestion]
+  - Confidence: [X]%
-### Consider (Nice to have)
-- [Minor improvement suggestion]
+### Minor (Consider fixing)
+- **[file.ts:200]** - [Issue]
 ### Test Gaps
-- [ ] [Untested scenario]
+- [ ] [Untested scenario that should be tested]
 ### Security Notes
-- [Any security observations]
+- [Any security observations, even if not issues]
 ### What's Good
 - [Positive observations - patterns followed, good decisions]
 ```
+**REQUIRED**: End every review with a verdict tag:
+```
+<verdict>SHIP</verdict>       # Ready to merge
+<verdict>NEEDS_WORK</verdict> # Has issues that should be fixed first
+<verdict>MAJOR_RETHINK</verdict> # Fundamental problems, needs redesign
+```
 ## Rules
-- Find real risks, not style nitpicks
-- Be specific: file:line + concrete fix
-- Critical = could cause outage, data loss, security breach
-- Don't block shipping for minor issues
-- Acknowledge what's done well
-- If no issues found, say so clearly
+1. **Find real risks, not style nitpicks** - Don't comment on naming, formatting, or preferences
+2. **Be specific** - file:line + concrete fix for every issue
+3. **Critical = high impact** - Could cause outage, data loss, security breach
+4. **Don't block shipping for minor issues** - Minor issues can be follow-up tasks
+5. **Acknowledge what's done well** - Positive feedback is important
+6. **If no issues found, say so clearly** - It's OK to say "SHIP" with no issues
+7. **80% confidence threshold** - Don't report uncertain findings
+## Severity Guide
+| Severity | Criteria | Examples |
+|----------|----------|----------|
+| Critical | Could cause outage, data loss, security breach | SQL injection, auth bypass, data corruption |
+| Major | Bug that affects users or requires immediate fix | Logic error, race condition, missing validation |
+| Minor | Improvement opportunity, tech debt | Complexity, missing tests, minor inefficiency |
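
The 80% confidence rule and the severity guide in this skill describe a filter-then-rank step. As a rough illustration only, the sketch below applies that rule to structured findings. The `Finding` shape and the `filterFindings` name are hypothetical; the skill itself is a prompt and does not define any such data structure.

```typescript
// Hypothetical data shape for findings; the skill file only defines prose output.
type Severity = "critical" | "major" | "minor";

interface Finding {
  location: string;   // e.g. "file.ts:42"
  issue: string;
  severity: Severity;
  confidence: number; // percentage, 0-100
}

const CONFIDENCE_THRESHOLD = 80;

function filterFindings(
  findings: Finding[],
  threshold: number = CONFIDENCE_THRESHOLD,
): Finding[] {
  // Rank order follows the severity guide: Critical, then Major, then Minor.
  const rank: Record<Severity, number> = { critical: 0, major: 1, minor: 2 };
  return findings
    .filter((f) => f.confidence >= threshold)            // drop uncertain findings
    .sort((a, b) => rank[a.severity] - rank[b.severity]); // most severe first
}
```

A finding at 79% confidence is dropped regardless of severity, which mirrors rule 7 ("Don't report uncertain findings").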

package/skills/review-exploration.md ADDED

@@ -0,0 +1,92 @@
+---
+name: review-exploration
+description: Agentic discovery for unknown scope or audit tasks
+---
+# Exploration Review Mode
+No specific files were provided. Use tools to discover what needs reviewing.
+## Available Tools
+- **Read** - Examine file contents
+- **Glob** - Find files by pattern (`**/*.ts`, `src/**/*.js`)
+- **Grep** - Search for patterns in code
+- **Bash** - Run git commands (git diff, git log, git status)
+## Discovery Strategy
+### 1. Understand What Changed
+```bash
+git status
+git log --oneline -10
+git diff --stat HEAD~5
+```
+### 2. Find High-Risk Areas
+Search for patterns that often have issues:
+- Recent changes (git diff files)
+- Auth/authz code
+- Input validation
+- Database queries
+- API endpoints
+- Error handling
+### 3. Deep Dive
+For each suspicious area:
+1. Read the file
+2. Trace data flow
+3. Check error handling
+4. Look for related tests
+### 4. Confidence Filter
+For each finding, ask:
+- Am I **80%+ confident** this is real?
+- Could it cause bugs, security issues, or outages?
+**Only report 80%+ confidence issues.**
+## Output Format
+```markdown
+## Exploration Review
+### Scope
+- [What you discovered about the codebase]
+- [What changed recently]
+### Areas Examined
+1. **[Area]** - [why you looked here]
+### Issues (80%+ confidence)
+#### Critical
+- **file:line** - [issue]
+  - Impact: [damage]
+  - Fix: [suggestion]
+#### Major
+- **file:line** - [issue] → [fix]
+### Not Covered
+- [What you skipped]
+### What's Good
+- [Positive observations]
+```
+Then:
+```
+<verdict>SHIP|NEEDS_WORK|MAJOR_RETHINK</verdict>
+```
+## Rules
+1. **Prioritize recent changes** - most likely to have issues
+2. **80% confidence threshold** - no uncertain findings
+3. **Follow the data** - trace user input paths
+4. **Document coverage** - be clear what you did/didn't review

package/skills/review-quick.md ADDED

@@ -0,0 +1,48 @@
+---
+name: review-quick
+description: Fast review for small, focused changes
+---
+# Quick Review
+Small, focused change. Be efficient but thorough.
+## Process (Chain of Thought)
+1. **Understand Intent** - What is this change trying to do?
+2. **Check Correctness** - Does the code match the intent? Off-by-one? Null checks?
+3. **Check Error Handling** - Are errors caught and handled properly?
+4. **Security Basics** - No hardcoded secrets? No obvious injection vectors?
+## Confidence Rule
+For each issue, ask: Am I **80%+ confident** this is real?
+Only report high-confidence issues.
+## Skip
+- Style nitpicks
+- Refactoring suggestions
+- "Nice to have" improvements
+## Output
+```markdown
+## Quick Review
+**Change:** [1-sentence summary]
+**Risk:** Low / Medium
+### Issues (80%+ confidence)
+- **file:line** - [issue] → [fix]
+### Looks Good
+- [positive observation]
+```
+Then:
+```
+<verdict>SHIP|NEEDS_WORK</verdict>
+```
+Keep it brief. If it's clean, say so and ship it.

package/skills/review-router.md ADDED

@@ -0,0 +1,47 @@
+---
+name: review-router
+description: Adaptive code review - analyzes context and loads appropriate skill
+---
+You are an expert code reviewer. First, analyze the context to decide the best review approach.
+## Step 1: Analyze What You've Been Given
+Look at the context provided:
+- How many files?
+- How many lines changed?
+- What kind of change? (bug fix, feature, refactor, security-related)
+- Is there a clear spec or requirements?
+## Step 2: Choose and Load Your Skill
+Based on your analysis, choose ONE approach and load its instructions:
+### Quick Review
+**When:** 1-3 files, <500 lines, clear purpose, bug fix or small feature
+**Load:** Run `wdyt -e 'skill get review-quick'`
+### Thorough Review
+**When:** 4-10 files, significant changes, needs careful analysis
+**Load:** Run `wdyt -e 'skill get review-thorough'`
+### Security Review
+**When:** Code touches auth, crypto, user input, sensitive data, or APIs
+**Load:** Run `wdyt -e 'skill get review-security'`
+### Exploration Review
+**When:** No specific files given, asked to "audit" or "explore"
+**Load:** Run `wdyt -e 'skill get review-exploration'`
+## Step 3: Execute
+1. State which approach you're using and why (1 sentence)
+2. Run the wdyt command to load the skill instructions
+3. Follow those instructions exactly
+## Output
+Always end with:
+```
+<verdict>SHIP|NEEDS_WORK|MAJOR_RETHINK</verdict>
+```
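
Every skill in this release ends its output with a `<verdict>` tag, so a caller will likely want to extract it from the review text. The sketch below shows one way to do that; `parseVerdict` is a hypothetical helper, and taking the last tag in the text is an assumption (reviews may quote the tag before giving their final answer). It is not taken from the package source.

```typescript
type Verdict = "SHIP" | "NEEDS_WORK" | "MAJOR_RETHINK";

// Hypothetical helper: returns the last well-formed verdict tag, or null if none.
function parseVerdict(reviewText: string): Verdict | null {
  const pattern = /<verdict>(SHIP|NEEDS_WORK|MAJOR_RETHINK)<\/verdict>/g;
  let last: Verdict | null = null;
  let match: RegExpExecArray | null;
  // exec() with the g flag advances through the text one match at a time.
  while ((match = pattern.exec(reviewText)) !== null) {
    last = match[1] as Verdict;
  }
  return last;
}
```

Because the pattern requires exactly one verdict value, the unfilled template line `<verdict>SHIP|NEEDS_WORK|MAJOR_RETHINK</verdict>` is not mistaken for a real verdict.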

package/skills/review-security.md ADDED

@@ -0,0 +1,79 @@
+---
+name: review-security
+description: Security-focused review for sensitive code
+---
+# Security Review
+This code touches security-sensitive areas. Be paranoid.
+## Process (Chain of Thought)
+### 1. Trace Data Flow
+Follow user input from entry to storage/output. Every trust boundary needs validation.
+### 2. Check Input Validation
+- All user input validated before use?
+- SQL queries parameterized?
+- File paths checked for traversal (../)?
+- URLs validated before fetch/redirect?
+### 3. Check Auth
+- Auth checks on all protected routes?
+- Session tokens cryptographically secure?
+- Password handling uses proper hashing?
+- No auth bypass paths?
+### 4. Check Data Protection
+- Sensitive data not logged?
+- Secrets not hardcoded?
+- PII properly handled?
+- Encryption for sensitive storage/transit?
+### 5. Common Vulnerabilities
+- XSS vectors (user content escaped)?
+- CSRF protection on state-changing operations?
+- Open redirects?
+- Rate limiting on sensitive endpoints?
+- Error messages don't leak stack traces?
+## Confidence Rule
+For each issue, ask: Am I **80%+ confident** this is exploitable?
+When in doubt about security, flag it anyway - better to discuss than to miss.
+## Output
+```markdown
+## Security Review
+### Threat Summary
+- Attack surface: [what's exposed]
+- Sensitive data: [what's at risk]
+- Risk level: Low / Medium / High / Critical
+### Vulnerabilities (by severity)
+#### Critical
+- **file:line** - [vulnerability type]
+  - Attack: [how it could be exploited]
+  - Impact: [what damage could result]
+  - Fix: [specific remediation]
+#### High
+- **file:line** - [issue] → [fix]
+#### Medium/Low
+- **file:line** - [observation]
+### Security Positives
+- [good security practices observed]
+### Recommendations
+- [additional hardening suggestions]
+```
+Then:
+```
+<verdict>SHIP|NEEDS_WORK|MAJOR_RETHINK</verdict>
+```
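
One checklist item above, "File paths checked for traversal (../)?", is easy to illustrate. The sketch below is a deliberately simple, string-only check with a hypothetical name, `isSafeRelativePath`. It assumes the caller joins the path onto a trusted base directory afterwards, and it does not account for symlinks or URL-encoded input, which a real review would also need to consider.

```typescript
// Hypothetical helper: true if userPath stays inside the directory it is joined to.
function isSafeRelativePath(userPath: string): boolean {
  // Reject absolute paths: POSIX root, UNC/backslash root, or a Windows drive letter.
  if (
    userPath.startsWith("/") ||
    userPath.startsWith("\\") ||
    /^[A-Za-z]:/.test(userPath)
  ) {
    return false;
  }
  let depth = 0; // how many directories below the base we currently are
  for (const segment of userPath.split(/[\\/]+/)) {
    if (segment === "" || segment === ".") continue; // no movement
    if (segment === "..") {
      depth -= 1;
      if (depth < 0) return false; // climbed above the base directory
    } else {
      depth += 1;
    }
  }
  return true;
}
```

With this check, `src/cli.ts` and `a/../b` are accepted, while `../etc/passwd`, `a/../../b`, and `/etc/passwd` are rejected.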