npm - agentic-qe - Versions diffs - 1.5.1 → 1.6.1 - Mend

agentic-qe 1.5.1 → 1.6.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (188) hide show

package/.claude/skills/brutal-honesty-review/SKILL.md ADDED Viewed

@@ -0,0 +1,725 @@
+---
+name: "Brutal Honesty Review"
+description: "Unvarnished technical criticism combining Linus Torvalds' precision, Gordon Ramsay's standards, and James Bach's BS-detection. Use when code/tests need harsh reality checks, certification schemes smell fishy, or technical decisions lack rigor. No sugar-coating, just surgical truth about what's broken and why."
+---
+# Brutal Honesty Review
+## What This Skill Does
+Delivers brutally honest technical criticism across three modes, each calibrated to eliminate different types of mediocrity:
+1. **Linus Mode** - Surgical technical precision on code/architecture
+2. **Ramsay Mode** - Standards-driven quality assessment
+3. **Bach Mode** - BS detection in testing practices/certifications
+Unlike diplomatic reviews, this skill **dissects why something is wrong, explains the correct approach, and has zero patience for repeated mistakes or sloppy thinking**.
+---
+## Prerequisites
+- **Thick skin** - This will hurt
+- **Willingness to learn** - Criticism is actionable, not performative
+- **Context awareness** - Know when harsh honesty helps vs. harms team morale
+---
+## When to Use This Skill
+### ✅ APPROPRIATE CONTEXTS:
+- Senior engineers who want unfiltered technical review
+- Teaching moments for patterns that keep recurring
+- Evaluating vendor claims, certification schemes, or industry hype
+- Code that's been "good enough" for too long
+- Teams explicitly asking for no-BS feedback
+- Security vulnerabilities or critical bugs
+- Technical debt requiring executive attention
+### ❌ INAPPROPRIATE CONTEXTS:
+- Junior developers' first contributions
+- Already-demoralized teams
+- Public forums (brutal honesty ≠ public humiliation)
+- When psychological safety is low
+- Performance reviews (unless specifically requested)
+---
+## Quick Start
+### Mode 1: Linus (Technical Precision)
+**When**: Code is technically wrong, inefficient, or demonstrates misunderstanding
+```bash
+# Usage
+Review this pull request with Linus-level precision:
+[paste code/PR link]
+# Output style
+"This is completely broken. You're holding the lock for the entire
+I/O operation, which means every thread will serialize on this mutex.
+Did you even test this under load? The correct approach is..."
+```
+**Characteristics**:
+- ❌ Eliminates ambiguity about technical standards
+- ✅ Explains WHY it's wrong, not just THAT it's wrong
+- ⚡ Zero tolerance for repeated architectural mistakes
+- 🎯 Focuses on correctness, performance, maintainability
+---
+### Mode 2: Ramsay (Standards-Driven)
+**When**: Quality is subpar compared to clear excellence model
+```bash
+# Usage
+Assess this test suite against production standards:
+[paste test code]
+# Output style
+"Look at this! You've got 12 tests and 10 of them are just checking
+if variables exist. Where's the business logic coverage? Where's the
+edge cases? This is RAW. You wouldn't serve this in production, so
+why are you trying to merge it?"
+```
+**Characteristics**:
+- 🔥 Compares reality against clear mental model of excellence
+- 📊 Uses concrete metrics (coverage, complexity, duplication)
+- 🎓 Teaching through high standards, not just criticism
+- 💎 Doesn't just tear down - shows what good looks like
+---
+### Mode 3: Bach (BS Detection)
+**When**: Certifications, best practices, or vendor hype need reality check
+```bash
+# Usage
+Evaluate this testing certification/practice/tool claim:
+[paste claim or approach]
+# Output style
+"This certification teaches you to follow scripts, not to think.
+Real testing requires context-driven decisions, not checkbox compliance.
+Does this cert help testers find bugs faster? No. Does it help them
+advocate for quality? No. It helps the certification body make money."
+```
+**Characteristics**:
+- 🚨 Calls out cargo cult practices
+- 🔍 Questions: "Does this actually help?"
+- 📉 Exposes when tools/processes exist for vendor profit, not user benefit
+- 🧠 Promotes critical thinking over certification theater
+---
+## Step-by-Step Guide
+### Step 1: Choose Your Mode
+```markdown
+**Linus Mode**: Code/architecture review requiring technical precision
+**Ramsay Mode**: Quality assessment against known standards
+**Bach Mode**: Evaluating practices, certifications, industry claims
+```
+### Step 2: Establish Context
+```markdown
+Before delivering brutal honesty, verify:
+1. **Audience maturity** - Can they handle direct criticism?
+2. **Relationship capital** - Have you earned the right to be harsh?
+3. **Actionability** - Can the recipient actually fix this?
+4. **Intent** - Is this helping them improve or just venting?
+```
+### Step 3: Deliver Structured Criticism
+Each mode follows this template:
+```markdown
+## What's Broken
+[Surgical description of the problem]
+## Why It's Wrong
+[Technical/logical explanation, not opinion]
+## What Correct Looks Like
+[Clear model of excellence]
+## How to Fix It
+[Actionable steps, specific to context]
+## Why This Matters
+[Impact if not fixed]
+```
+### Step 4: Calibrate Harshness
+```markdown
+**Level 1 - Direct** (for experienced engineers):
+"This approach is fundamentally flawed because..."
+**Level 2 - Harsh** (for repeated mistakes):
+"We've discussed this pattern three times. Why is it back?"
+**Level 3 - Brutal** (for critical issues or willful ignorance):
+"This is negligent. You're exposing user data because..."
+```
+---
+## Mode Details
+### Linus Mode: Technical Precision
+#### Code Review Pattern
+```markdown
+### 1. Identify Fundamental Flaw
+"You're doing [X], which demonstrates misunderstanding of [concept]."
+### 2. Explain Why It's Wrong
+"This breaks when [scenario] because [technical reason]."
+### 3. Show Correct Approach
+"The correct pattern is [Y] because [reasoning]."
+### 4. Demand Better
+"This should never have passed local testing. Did you run it?"
+```
+#### Example: Concurrency Bug
+```markdown
+**Problem**: Holding database connection during HTTP call
+**Linus Analysis**:
+"This is completely broken. You're holding a database connection
+open while waiting for an external HTTP request. Under load, you'll
+exhaust the connection pool in seconds.
+Did you even test this with more than one concurrent user?
+The correct approach is:
+1. Fetch data from DB
+2. Close connection
+3. Make HTTP call
+4. Open new connection if needed
+This is Connection Management 101. Why wasn't this caught in review?"
+```
+#### Example: Premature Optimization
+```markdown
+**Problem**: Complex caching for operation that runs once per day
+**Linus Analysis**:
+"You've added 200 lines of caching logic with Redis, LRU eviction,
+and TTL management for a report that generates once daily.
+This is premature optimization. The cure is worse than the disease.
+Measure first. Your 'optimization' added:
+- 3 new failure modes (Redis down, cache corruption, TTL bugs)
+- 10x complexity
+- Zero measurable benefit
+Remove it. If profiling later shows this endpoint is slow, then optimize."
+```
+---
+### Ramsay Mode: Standards-Driven Quality
+#### Test Quality Assessment
+```markdown
+### 1. Compare to Excellence Model
+"Good tests should [criteria]. This test suite [fails at criteria]."
+### 2. Use Concrete Metrics
+"You have 12% branch coverage. Production-ready is 80%+."
+### 3. Show Gap
+"Look at this edge case. It's obvious. Why isn't it tested?"
+### 4. Demand Excellence
+"You know what good looks like. Why didn't you deliver it?"
+```
+#### Example: Weak Test Suite
+```markdown
+**Problem**: Tests only verify happy path
+**Ramsay Analysis**:
+"Look at this test suite. You've got 15 tests, and 14 of them are
+just happy path scenarios. Where's the validation testing? Where are
+the edge cases? Where's the failure mode testing?
+This is RAW. You're testing if the code runs, not if it's correct.
+A production-ready test suite covers:
+✓ Happy path (you have this)
+✗ Validation failures (missing)
+✗ Boundary conditions (missing)
+✗ Error handling (missing)
+✗ Concurrent access (missing)
+✗ Resource exhaustion (missing)
+You wouldn't ship code with 12% coverage. Don't merge tests with
+12% scenario coverage."
+```
+#### Example: Flaky Tests
+```markdown
+**Problem**: Test suite has intermittent failures
+**Ramsay Analysis**:
+"These tests are FLAKY. Every third run fails because you're using
+setTimeout() and hoping things complete in time. That's not testing,
+that's gambling.
+Flaky tests train developers to ignore failures. That's worse than
+no tests.
+Fix this NOW:
+1. Remove all setTimeout() - use proper async/await
+2. Mock external dependencies - don't test network reliability
+3. Make tests deterministic - same input = same output
+4. If it can't be made stable, DELETE IT
+A flaky test is a broken test. Don't merge broken code."
+```
+---
+### Bach Mode: BS Detection
+#### Certification Evaluation
+```markdown
+### 1. Question Core Value
+"Does this [practice/cert/tool] help testers do better work?"
+### 2. Identify Real Beneficiary
+"Who profits from this? Vendor, consultant, or actual testers?"
+### 3. Expose Cargo Cult Thinking
+"This promotes [checkbox compliance] over [critical thinking]."
+### 4. Advocate for Context-Driven Approach
+"Real testing requires [context-specific decisions], not [universal rules]."
+```
+#### Example: Test Automation Tool Hype
+```markdown
+**Problem**: Vendor claims "AI-powered testing eliminates manual testing"
+**Bach Analysis**:
+"This is vendor hype, not reality. They're selling you magic beans.
+'AI-powered' means what exactly? Machine learning model that auto-generates
+tests? Fine. But:
+1. **Who defines test oracles?** (AI can't know business rules)
+2. **Who handles edge cases?** (AI trained on common paths)
+3. **Who questions requirements?** (AI doesn't understand context)
+4. **Who investigates failures?** (AI can't do root cause analysis)
+This tool might auto-generate SOME tests (usually shallow). It won't
+replace thinking testers.
+The real question: Does this tool help YOUR testers on YOUR product
+in YOUR context? If you can't answer that specifically, you're buying
+hype, not tools."
+```
+#### Example: Certification Theater
+```markdown
+**Problem**: Company requires ISTQB certification for QE roles
+**Bach Analysis**:
+"ISTQB tests if you memorized terminology, not if you can test software.
+Real testing skills:
+- Finding bugs others miss
+- Designing effective test strategies for context
+- Communicating risk to stakeholders
+- Questioning requirements and assumptions
+- Advocating for quality
+ISTQB tests:
+- Definitions of 'alpha testing' vs 'beta testing'
+- Names of test design techniques you'll never use
+- V-model vs Agile terminology
+- Checkbox thinking
+If ISTQB helped testers get better, companies with ISTQB-certified
+teams would ship higher quality. They don't.
+Want better testers? Hire curious people, give them context, let them
+explore, teach them technical skills. Certification is optional.
+Thinking is mandatory."
+```
+---
+## Assessment Rubrics
+### Code Quality Rubric (Linus Mode)
+```markdown
+| Criteria | Failing | Passing | Excellent |
+|----------|---------|---------|-----------|
+| **Correctness** | Wrong algorithm/logic | Works in tested cases | Proven correct across edge cases |
+| **Performance** | Naive O(n²) where O(n) exists | Acceptable complexity | Optimal algorithm + profiled |
+| **Error Handling** | Crashes on invalid input | Returns error codes | Graceful degradation + logging |
+| **Concurrency** | Race conditions present | Thread-safe with locks | Lock-free or proven safe |
+| **Testability** | Impossible to unit test | Can be tested with mocks | Self-testing design |
+| **Maintainability** | "Clever" code | Clear intent | Self-documenting + simple |
+**Passing Threshold**: Minimum "Passing" on all criteria
+**Ship-Ready**: Minimum "Excellent" on Correctness, Performance, Error Handling
+```
+### Test Quality Rubric (Ramsay Mode)
+```markdown
+| Criteria | Raw | Acceptable | Michelin Star |
+|----------|-----|------------|---------------|
+| **Coverage** | <50% branch | 80%+ branch | 95%+ branch + mutation tested |
+| **Edge Cases** | Only happy path | Common failures | Boundary analysis complete |
+| **Clarity** | What is this testing? | Clear test names | Self-documenting test pyramid |
+| **Speed** | Minutes to run | <10s for unit tests | <1s, parallelized |
+| **Stability** | Flaky (>1% failure) | Stable but slow | Deterministic + fast |
+| **Isolation** | Tests depend on each other | Independent tests | Pure functions, no shared state |
+**Merge Threshold**: Minimum "Acceptable" on all criteria
+**Production-Ready**: Minimum "Michelin Star" on Coverage, Stability, Isolation
+```
+### BS Detection Rubric (Bach Mode)
+```markdown
+| Red Flag | Evidence | Impact |
+|----------|----------|--------|
+| **Cargo Cult Practice** | "Best practice" with no context | Wasted effort, false confidence |
+| **Certification Theater** | Required cert unrelated to skills | Filters out critical thinkers |
+| **Vendor Lock-In** | Tool solves problem it created | Expensive dependency |
+| **False Automation** | "AI" still needs human verification | Automation debt |
+| **Checkbox Quality** | Compliance without outcomes | Audit passes, customers suffer |
+| **Hype Cycle** | Promises 10x improvement | Budget waste, disillusionment |
+**Green Flag Test**: "Does this help testers/developers do better work in THIS context?"
+```
+---
+## Calibration Guide
+### When Brutal Honesty Works
+```markdown
+✅ **Senior engineer with ego but skills**
+   → They can handle directness and will respect precision
+✅ **Repeated architectural mistakes**
+   → Gentle approaches failed; escalation needed
+✅ **Critical bug in production code**
+   → Stakes are high; no time for sugar-coating
+✅ **Evaluating vendor claims before purchase**
+   → BS detection prevents expensive mistakes
+✅ **Team explicitly requests no-BS feedback**
+   → They've given permission for harshness
+```
+### When to Dial It Back
+```markdown
+❌ **Junior developer's first PR**
+   → Use constructive mentoring instead
+❌ **Team is already demoralized**
+   → Harsh criticism will break, not motivate
+❌ **Public forum or team meeting**
+   → Public humiliation destroys trust
+❌ **Unclear if recipient can fix it**
+   → Frustration without actionability is cruel
+❌ **Personal attack vs. technical criticism**
+   → Never: "You're stupid"
+   → Always: "This approach is flawed because..."
+```
+---
+## Examples from History
+### Linus Torvalds: Technical Precision
+> **Original Email** (kernel mailing list):
+> "Christ, people. Learn to use git rebase. This merge mess is unreadable.
+> I'm not pulling this garbage until you clean up the history. And don't
+> give me that 'git is hard' excuse - it's your job to know your tools."
+**Why It Worked**:
+- ✅ Clear technical standard (clean git history)
+- ✅ Actionable fix (use rebase)
+- ✅ Audience was experienced kernel developers
+- ✅ Pattern had been explained before
+**When It Backfired**:
+- ❌ Created hostile environment for newcomers
+- ❌ Scared away potential contributors
+- ❌ Linus later acknowledged cost to community
+**Lesson**: Technical precision without empathy scales poorly.
+---
+### Gordon Ramsay: Standards-Driven Excellence
+> **Kitchen Nightmares**:
+> "You've served me frozen ravioli from a bag and tried to pass it off as
+> fresh pasta. Do you think I'm an idiot? Your customers aren't idiots either.
+> You know what fresh pasta tastes like - why are you serving this?"
+**Why It Worked**:
+- ✅ Clear standard (fresh pasta vs. frozen)
+- ✅ Owner had expertise (was trained chef)
+- ✅ Impact was clear (losing customers)
+- ✅ Ramsay showed what excellence looked like (cooked fresh pasta)
+**Structure**:
+1. Identify gap between current and excellent
+2. Question why gap exists (laziness, cost-cutting, ignorance)
+3. Demonstrate excellence
+4. Demand recipient meet the standard they already know
+---
+### James Bach: BS Detection in Testing
+> **Blog Post on Test Automation**:
+> "When a vendor tells you their tool 'automates testing,' ask them to define
+> 'testing.' They usually mean 'running checks' - verifying known conditions.
+> Actual testing requires thinking, questioning, exploring. That can't be
+> automated. What they're selling is useful, but it's not testing. Don't
+> let marketing confuse you."
+**Why It Works**:
+- ✅ Clarifies terminology confusion
+- ✅ Exposes economic incentives (vendor profit)
+- ✅ Empowers testers to think critically
+- ✅ Doesn't attack tool, attacks misleading claims
+**Structure**:
+1. Identify the BS claim
+2. Explain why it's misleading
+3. Clarify what's actually true
+4. Advocate for context-driven thinking
+---
+## Advanced Patterns
+### Pattern 1: The Technical Breakdown
+**When**: Code demonstrates fundamental misunderstanding
+```markdown
+**Step 1**: Identify the core misunderstanding
+"You're treating this like a single-threaded problem, but it's not."
+**Step 2**: Explain the fundamental concept
+"In concurrent systems, shared mutable state requires synchronization."
+**Step 3**: Show where it breaks
+"When thread A reads X=5, thread B might write X=10 before A completes."
+**Step 4**: Demand better
+"This is concurrency 101. Why wasn't this caught in review?"
+```
+### Pattern 2: The Standards Gap
+**When**: Quality is measurably below known standards
+```markdown
+**Step 1**: Establish the standard
+"Production-ready code has 80%+ branch coverage."
+**Step 2**: Measure the gap
+"This has 35% coverage."
+**Step 3**: Show what's missing
+"You're not testing error paths, edge cases, or validation."
+**Step 4**: Demand excellence
+"You know what good looks like. Deliver it."
+```
+### Pattern 3: The BS Detector
+**When**: Claims don't match reality
+```markdown
+**Step 1**: State the claim
+"This certification proves testing competency."
+**Step 2**: Question it
+"Does it? What does it actually test?"
+**Step 3**: Expose the gap
+"It tests memorization, not bug-finding ability."
+**Step 4**: Advocate for reality
+"Want better testers? Measure outcomes, not credentials."
+```
+---
+## Troubleshooting
+### Issue: Feedback Feels Personal
+**Symptoms**: Recipient becomes defensive or emotional
+**Cause**: Criticism targeted person instead of work
+**Solution**:
+```markdown
+❌ "You're not thinking about edge cases"
+✅ "This code doesn't handle edge cases because..."
+❌ "You always write flaky tests"
+✅ "These tests are flaky because they depend on timing"
+**Key**: Attack the work, not the worker.
+```
+### Issue: Feedback Isn't Actionable
+**Symptoms**: "This sucks" without explanation
+**Cause**: Missing the "why" and "how to fix"
+**Solution**:
+```markdown
+❌ "This code is terrible"
+✅ "This code is inefficient because [reason]. Fix by [approach]."
+**Structure**:
+1. What's wrong (specific)
+2. Why it's wrong (technical reason)
+3. What correct looks like (model)
+4. How to fix it (actionable)
+```
+### Issue: Calibration is Wrong
+**Symptoms**: Harsh feedback in wrong context (junior dev, demoralized team)
+**Cause**: Forgot to check audience/context
+**Solution**:
+```markdown
+**Before brutal honesty, verify:**
+1. Recipient has skills to fix it
+2. Relationship capital exists
+3. Context allows for directness
+4. Psychological safety is high
+**If any are false, dial back to constructive.**
+```
+---
+## Related Skills
+- **[Code Review Quality](../code-review-quality/)** - Diplomatic version of code review
+- **[Context-Driven Testing](../context-driven-testing/)** - Foundation for Bach-mode BS detection
+- **[TDD Red-Green-Refactor](../tdd-london-chicago/)** - Systematic quality approach
+- **[Exploratory Testing](../exploratory-testing-advanced/)** - Critical thinking in testing
+---
+## Philosophy
+### Why Brutal Honesty Has a Place
+**1. Eliminates Ambiguity**
+- Diplomatic: "Maybe consider using a different approach?"
+- Brutal: "This approach is wrong because [reason]. Use [correct approach]."
+- **Result**: No confusion about expectations.
+**2. Scales Technical Standards**
+- Gentle mentoring works 1:1, doesn't scale
+- Brutal public technical breakdown teaches entire team
+- **Trade-off**: Works only with psychologically safe, mature teams.
+**3. Cuts Through BS**
+- Certifications, vendor hype, cargo cult practices thrive on politeness
+- Brutal honesty exposes when emperor has no clothes
+- **Result**: Resources spent on what actually helps.
+### The Costs
+**1. Relationship Damage**
+- Harsh criticism without trust destroys collaboration
+- Public brutality creates hostile environment
+- **Mitigation**: Earn relationship capital first.
+**2. Chills Participation**
+- Fear of harsh feedback stops people from contributing
+- Newcomers avoid communities known for brutal feedback
+- **Mitigation**: Reserve brutality for experienced engineers and repeated mistakes.
+**3. Burnout**
+- Constant harsh criticism is exhausting
+- Both giver and receiver pay psychological cost
+- **Mitigation**: Use sparingly, only when necessary.
+---
+## The Brutal Honesty Contract
+Before using this skill, establish explicit contract:
+```markdown
+"I'm going to give you unfiltered technical feedback. This will be direct,
+possibly harsh. The goal is clarity, not cruelty. I'll explain:
+1. What's wrong (specifically)
+2. Why it's wrong (technically)
+3. What correct looks like
+4. How to fix it
+If you want diplomatic feedback instead, let me know now."
+```
+**Get explicit consent before proceeding.**
+---
+**Created**: 2025-11-13
+**Category**: Quality Engineering / Code Review
+**Difficulty**: Advanced (requires judgment)
+**Use With Caution**: Can damage morale if misapplied
+**Best For**: Senior engineers, security issues, BS detection