clawpowers 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (42)
  1. package/.claude-plugin/manifest.json +19 -0
  2. package/.codex/INSTALL.md +36 -0
  3. package/.cursor-plugin/manifest.json +21 -0
  4. package/.opencode/INSTALL.md +52 -0
  5. package/ARCHITECTURE.md +69 -0
  6. package/README.md +381 -0
  7. package/bin/clawpowers.js +390 -0
  8. package/bin/clawpowers.sh +91 -0
  9. package/gemini-extension.json +32 -0
  10. package/hooks/session-start +205 -0
  11. package/hooks/session-start.cmd +43 -0
  12. package/hooks/session-start.js +163 -0
  13. package/package.json +54 -0
  14. package/runtime/feedback/analyze.js +621 -0
  15. package/runtime/feedback/analyze.sh +546 -0
  16. package/runtime/init.js +172 -0
  17. package/runtime/init.sh +145 -0
  18. package/runtime/metrics/collector.js +361 -0
  19. package/runtime/metrics/collector.sh +308 -0
  20. package/runtime/persistence/store.js +433 -0
  21. package/runtime/persistence/store.sh +303 -0
  22. package/skill.json +74 -0
  23. package/skills/agent-payments/SKILL.md +411 -0
  24. package/skills/brainstorming/SKILL.md +233 -0
  25. package/skills/content-pipeline/SKILL.md +282 -0
  26. package/skills/dispatching-parallel-agents/SKILL.md +305 -0
  27. package/skills/executing-plans/SKILL.md +255 -0
  28. package/skills/finishing-a-development-branch/SKILL.md +260 -0
  29. package/skills/learn-how-to-learn/SKILL.md +235 -0
  30. package/skills/market-intelligence/SKILL.md +288 -0
  31. package/skills/prospecting/SKILL.md +313 -0
  32. package/skills/receiving-code-review/SKILL.md +225 -0
  33. package/skills/requesting-code-review/SKILL.md +206 -0
  34. package/skills/security-audit/SKILL.md +308 -0
  35. package/skills/subagent-driven-development/SKILL.md +244 -0
  36. package/skills/systematic-debugging/SKILL.md +279 -0
  37. package/skills/test-driven-development/SKILL.md +299 -0
  38. package/skills/using-clawpowers/SKILL.md +137 -0
  39. package/skills/using-git-worktrees/SKILL.md +261 -0
  40. package/skills/verification-before-completion/SKILL.md +254 -0
  41. package/skills/writing-plans/SKILL.md +276 -0
  42. package/skills/writing-skills/SKILL.md +260 -0
@@ -0,0 +1,244 @@
---
name: subagent-driven-development
description: Orchestrate complex tasks by dispatching fresh subagents with isolated context, two-stage review, and Git worktree isolation. Activate when a task is large enough to benefit from parallelism or context separation.
version: 1.0.0
requires:
  tools: [git, bash]
  runtime: false
metrics:
  tracks: [tasks_dispatched, subagent_success_rate, review_pass_rate, time_to_completion]
  improves: [task_decomposition_quality, spec_clarity, review_threshold]
---

# Subagent-Driven Development

## When to Use

Apply this skill when you encounter:

- A task with 3+ logically independent workstreams
- A task so large it would exhaust a single context window
- A feature requiring multiple specialists (frontend + backend + tests + docs)
- Any work where a bug in one component shouldn't block another
- A task with clear interfaces between components (you can spec them up front)

**Skip this skill when:**
- The task is tightly coupled — one change cascades everywhere
- You need to maintain narrative continuity across all components
- The task is < 2 hours of work for a single agent
- You don't have enough information to spec subagent boundaries yet

**Decision tree:**
```
Can the task be split into N parts with defined interfaces?
├── No → single-agent execution
└── Yes → Can subagents work concurrently without blocking each other?
    ├── No → sequential execution with checkpointing (executing-plans)
    └── Yes → subagent-driven-development ← YOU ARE HERE
```

## Core Methodology

### Stage 0: Task Decomposition (do this yourself, not in a subagent)

Before dispatching anything, produce:

1. **Task tree** — hierarchical breakdown of the full work
2. **Subagent boundaries** — where one agent's output is another's input
3. **Interface contracts** — what each subagent accepts and delivers
4. **Dependency order** — which can run in parallel, which must sequence

**Decomposition heuristic:** Each subagent task should be completable in one context window (roughly 2-5K tokens of output). If larger, decompose further.
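
This heuristic can be sketched as a recursive split. The token budget comes from the heuristic above; the `estimate`/`split` functions and task names are hypothetical stand-ins, not part of the package:

```python
# Illustrative sketch only: split tasks until each leaf fits one context window.
# The 2-5K token budget is the heuristic above; estimate/split are assumptions.
MAX_OUTPUT_TOKENS = 5000

def decompose(task, estimate, split):
    """estimate(task) -> projected output tokens; split(task) -> subtasks."""
    if estimate(task) <= MAX_OUTPUT_TOKENS:
        return [task]
    leaves = []
    for sub in split(task):
        leaves.extend(decompose(sub, estimate, split))
    return leaves

# Toy example: a task's estimate is its size; splitting halves it.
est = lambda t: t["size"]
spl = lambda t: [{"name": t["name"] + ".1", "size": t["size"] // 2},
                 {"name": t["name"] + ".2", "size": t["size"] // 2}]

leaves = decompose({"name": "auth", "size": 16000}, est, spl)
print(len(leaves), max(est(t) for t in leaves))  # → 4 4000
```

A 16K-token task decomposes into four subagent-sized leaves; estimation in practice is rough, so err toward smaller leaves.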

**Example decomposition for "Build authentication service":**
```
auth-service/
├── Subagent A: API design + OpenAPI spec [no dependencies]
├── Subagent B: Database schema + migrations [no dependencies]
├── Subagent C: Core auth logic (JWT, bcrypt) [depends on: A, B specs]
├── Subagent D: Integration tests [depends on: C output]
└── Subagent E: Documentation [depends on: A, C, D output]
```
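
The dependency annotations above determine dispatch order mechanically. A minimal sketch, with the graph literal mirroring the auth-service example (the Kahn-style layering is one way to do it, not something the package prescribes):

```python
# Compute parallel dispatch "waves" from the subagent dependency graph above.
deps = {
    "A": set(),            # API design
    "B": set(),            # DB schema
    "C": {"A", "B"},       # core auth logic
    "D": {"C"},            # integration tests
    "E": {"A", "C", "D"},  # documentation
}

def dispatch_waves(deps):
    done, waves = set(), []
    remaining = dict(deps)
    while remaining:
        # Everything whose dependencies are all satisfied can run concurrently.
        ready = {t for t, d in remaining.items() if d <= done}
        if not ready:
            raise ValueError("dependency cycle")
        waves.append(sorted(ready))
        done |= ready
        for t in ready:
            del remaining[t]
    return waves

print(dispatch_waves(deps))  # → [['A', 'B'], ['C'], ['D'], ['E']]
```

A and B dispatch concurrently; C waits for both; D and E sequence after.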

### Stage 1: Spec Writing (per subagent)

For each subagent, write a precise spec that includes:

```markdown
## Subagent Spec: [Component Name]

**Objective:** [Single sentence — what this subagent produces]

**Context provided:**
- [File or artifact they receive as input]
- [Interface contract from upstream subagent]

**Deliverables:**
- [Specific file or artifact, not vague output]
- [Test file covering the deliverable]

**Constraints:**
- [Language/framework requirements]
- [Performance requirements if applicable]
- [Must not break: existing interfaces]

**Done criteria:**
- [ ] All tests pass
- [ ] Interface contract satisfied
- [ ] No TODOs or stubs in production code
```

**Anti-pattern:** Vague specs produce vague output. "Build the auth logic" is not a spec. "Implement JWT issuance and validation with RS256, returning {token, expiresAt, userId} from issue() and {valid, userId, error} from validate()" is a spec.
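
One way to make the good spec above machine-checkable is to write its interface contract down as types before dispatch. A sketch, assuming Python deliverables; the type names and the structural check are illustrative, not part of the package:

```python
# The JWT spec from the anti-pattern note, pinned down as a typed contract.
# Field names follow that sentence; everything else here is an assumption.
from typing import Optional, Protocol, TypedDict

class IssueResult(TypedDict):
    token: str
    expiresAt: int    # unix epoch seconds
    userId: str

class ValidateResult(TypedDict):
    valid: bool
    userId: Optional[str]
    error: Optional[str]

class AuthService(Protocol):
    def issue(self, user_id: str) -> IssueResult: ...
    def validate(self, token: str) -> ValidateResult: ...

# A Stage 4a reviewer can then check a deliverable structurally:
def satisfies_contract(obj) -> bool:
    return all(callable(getattr(obj, m, None)) for m in ("issue", "validate"))
```

Handing the downstream subagent this contract, instead of prose, removes a whole class of interface mismatches at integration time.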
92
+
93
+ ### Stage 2: Worktree Isolation
94
+
95
+ Each subagent works in an isolated Git worktree to prevent interference:
96
+
97
+ ```bash
98
+ # Create worktrees for parallel subagents
99
+ git worktree add ../task-auth-api feature/auth-api
100
+ git worktree add ../task-auth-db feature/auth-db
101
+ git worktree add ../task-auth-core feature/auth-core
102
+
103
+ # Verify isolation
104
+ git worktree list
105
+ ```
106
+
107
+ Worktrees share the repo history but have independent working directories. A subagent working in `../task-auth-api` cannot accidentally overwrite files in `../task-auth-core`.
108
+
109
+ See: `skills/using-git-worktrees/SKILL.md` for full worktree management protocol.
110
+
111
+ ### Stage 3: Subagent Dispatch
112
+
113
+ Dispatch each subagent with:
114
+ 1. The spec (complete, not abbreviated)
115
+ 2. All input artifacts (relevant files, interface contracts)
116
+ 3. Access to their assigned worktree
117
+ 4. No instruction to "skip complicated parts" or "use a stub"
118
+
119
+ **Dispatch instruction template:**
120
+ ```
121
+ You are implementing [component]. Your spec is below. Work only in the provided
122
+ worktree directory. Produce real, working code with tests — no stubs, no TODOs.
123
+ Deliver: [specific files]. When done, output a JSON summary of what you built.
124
+
125
+ [Full spec here]
126
+ ```
127
+
128
+ ### Stage 4: Two-Stage Review
129
+
130
+ **Stage 4a: Spec review** — Before running any subagent code, review that:
131
+ - The output matches the spec's deliverables
132
+ - Interface contracts are satisfied (types match, method signatures match)
133
+ - No stubs or mocks in production code paths
134
+ - Tests exist and cover the critical paths
135
+
136
+ **Stage 4b: Quality review** — After running the code:
137
+ - All tests pass (zero failing)
138
+ - No linting errors
139
+ - Performance meets requirements
140
+ - Security: no hardcoded credentials, no SQL injection vectors, no unvalidated inputs
141
+
142
+ **Review failure protocol:**
143
+ ```
144
+ If Stage 4a fails → return spec to subagent with specific failure reason
145
+ If Stage 4b fails → return to subagent with exact failing test output
146
+ Never merge code that fails either review stage
147
+ ```
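
The protocol above can be sketched as a routing function. The `Review` fields are hypothetical stand-ins for whatever your review actually produces:

```python
# Sketch of the review-failure protocol as a routing function.
# Review's fields are assumptions, not the package's real schema.
from dataclasses import dataclass

@dataclass
class Review:
    spec_satisfied: bool   # Stage 4a: deliverables + interface contracts met
    has_stubs: bool        # Stage 4a: stubs/mocks found in production paths
    failing_tests: list    # Stage 4b: names of failing tests

def route(review: Review) -> str:
    if not review.spec_satisfied or review.has_stubs:
        return "return-to-subagent: spec failure"        # Stage 4a fails
    if review.failing_tests:
        return "return-to-subagent: failing tests " + ", ".join(review.failing_tests)
    return "merge"  # only reachable when both stages pass

print(route(Review(True, False, [])))            # → merge
print(route(Review(True, False, ["test_jwt"])))  # → return-to-subagent: failing tests test_jwt
```

Note that "merge" is unreachable unless both stages pass, which is exactly the "never merge code that fails either review stage" rule.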

### Stage 5: Integration

After all subagents pass review:

1. Merge worktrees in dependency order
2. Run full integration test suite
3. Resolve any interface mismatches (typically minor type issues)
4. Clean up worktrees

```bash
# Merge in dependency order (A and B are independent; merge them first)
git checkout main
git merge feature/auth-db
git merge feature/auth-api
git merge feature/auth-core   # depends on both
git merge feature/auth-tests
git merge feature/auth-docs

# Clean up
git worktree remove ../task-auth-api
git worktree remove ../task-auth-db
# ... etc
```

## ClawPowers Enhancement

When `~/.clawpowers/` runtime is initialized:

**Persistent Execution DB:** Every subagent dispatch is logged with spec hash, start time, subagent ID, and outcome. If a session is interrupted, you know exactly which subagents completed and which to re-run.

```bash
# Record dispatch
bash runtime/persistence/store.sh set "subagent:auth-api:status" "dispatched"
bash runtime/persistence/store.sh set "subagent:auth-api:spec_hash" "$(echo "$SPEC" | sha256sum | cut -c1-8)"

# Check on resume
bash runtime/persistence/store.sh get "subagent:auth-api:status"
```
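
The spec hash above is just the first 8 hex digits of a SHA-256. A Python equivalent for reference (note that `echo` appends a trailing newline before `sha256sum`, so the Python version must include it to produce identical hashes):

```python
# Python equivalent of the spec-hash shell line above. echo appends a
# newline before sha256sum, so we append one too for identical output.
import hashlib

def spec_hash(spec: str) -> str:
    return hashlib.sha256((spec + "\n").encode()).hexdigest()[:8]

# On resume, compare the stored hash against the current spec to detect drift:
stored = spec_hash("hello")
assert spec_hash("hello") == stored          # unchanged spec → safe to resume
assert spec_hash("hello, edited") != stored  # spec drift → re-dispatch
print(spec_hash("hello"))  # → 5891b5b5
```

Comparing hashes on resume catches the case where the spec was edited between dispatch and recovery, which would otherwise silently resume a stale plan.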

**Resumable Checkpoints:** The framework saves the task tree and each subagent's completion state. A session that crashes mid-dispatch resumes from the last successful checkpoint, not from scratch.

**Outcome Metrics:** After integration, record:
```bash
bash runtime/metrics/collector.sh record \
  --skill subagent-driven-development \
  --outcome success \
  --duration 3600 \
  --notes "auth-service: 5 subagents, 2 review cycles, 0 integration failures"
```

**Metric-driven decomposition:** After 10+ executions, `runtime/feedback/analyze.sh` identifies your optimal subagent granularity — tasks that are too small (high coordination overhead) or too large (high review failure rate).

## Anti-Patterns

| Anti-Pattern | Why It Fails | Correct Approach |
|-------------|-------------|-----------------|
| Vague spec ("build the auth thing") | Subagent guesses, output is wrong | Write a spec with deliverables and done criteria |
| Skip the failure witness | Review catches nothing | Require all tests to pass in the review stage |
| Merge before review | Bad code enters main | Two-stage review is non-negotiable |
| Single worktree for multiple agents | Files overwrite each other | One worktree per subagent, always |
| Decompose too fine | Excessive coordination cost | Target 1-context-window tasks (2-5K token output) |
| Decompose too coarse | Subagent context exhaustion | If output > 1 context window, split further |
| Stub the hard parts | Tech debt accumulates | "No stubs" is a hard constraint in the spec |

## Examples

### Example 1: Simple (2 subagents)

**Task:** Add email verification to existing user signup

**Decomposition:**
- Subagent A: Email service integration (SendGrid/SES wrapper, template rendering)
- Subagent B: Verification flow (token generation, storage, verification endpoint)
- Sequential: B depends on A's interface

**Specs:** A delivers an `EmailService` class with `send(to, template, vars)` → B uses that interface

### Example 2: Complex (5 subagents)

**Task:** Build a real-time dashboard

**Decomposition:**
- Subagent A: WebSocket server (connection mgmt, message routing) [parallel]
- Subagent B: Data aggregation service (query engine, caching) [parallel]
- Subagent C: Frontend dashboard components (React, chart library) [parallel]
- Subagent D: Integration tests (WebSocket + aggregation E2E) [depends on A, B]
- Subagent E: Dashboard state management (connects C to A/B) [depends on A, B, C]

**Parallel dispatch:** A, B, C run concurrently. D and E run after A, B, and C complete review.

## Integration with Other Skills

- Use `writing-plans` first if you don't have a clear task tree yet
- Apply `using-git-worktrees` for worktree lifecycle management
- Use `dispatching-parallel-agents` if subagents run as independent processes
- Apply `verification-before-completion` before the final integration merge
@@ -0,0 +1,279 @@
---
name: systematic-debugging
description: Hypothesis-driven debugging with evidence collection. Activate when you encounter unexpected behavior, a failing test, or a bug report.
version: 1.0.0
requires:
  tools: [bash, git]
  runtime: false
metrics:
  tracks: [hypotheses_tested, time_to_root_cause, false_positives, reopen_rate]
  improves: [hypothesis_quality, evidence_collection_speed, known_issue_match_rate]
---

# Systematic Debugging

## When to Use

Apply this skill when:

- A test is failing and the cause isn't immediately obvious
- A bug report describes behavior that shouldn't happen
- Code that worked before suddenly doesn't
- A production alert is firing
- You've tried 2+ fixes without understanding why they work or don't

**Skip when:**
- The cause is obvious from the error message (typo, missing import, syntax error)
- You've seen this exact error before and know the fix
- It's a configuration issue, not a logic bug

**Decision tree:**
```
Is the error message self-explanatory?
├── Yes → fix it directly
└── No → Have you seen this pattern before?
    ├── Yes → apply the known fix, verify, document
    └── No → systematic-debugging ← YOU ARE HERE
```

## Core Methodology

### The Scientific Debugging Loop

```
Observe → Form hypothesis → Design experiment → Execute → Collect evidence → Conclude → Repeat
```

Never skip steps. The most common debugging failure is jumping from "observe" directly to "try a fix" — which produces random mutations until something accidentally works, with no understanding of why.
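
The loop can be sketched as a driver where each hypothesis carries its own experiment and the loop stops only on confirming evidence. All names here are illustrative:

```python
# The scientific debugging loop as a driver. Hypotheses are ranked
# most-likely first; an experiment returns True only if it confirms.
def debug_loop(hypotheses):
    """hypotheses: list of (name, experiment); experiment() -> bool."""
    evidence_log = []
    for name, experiment in hypotheses:
        confirmed = experiment()              # execute + collect evidence
        evidence_log.append((name, confirmed))
        if confirmed:
            return name, evidence_log         # conclude: root cause candidate
    return None, evidence_log                 # all refuted → form new hypotheses

# Toy run mirroring Example 1 later in this file:
root, log = debug_loop([
    ("timing race", lambda: False),           # refuted by experiment
    ("lost update (no row lock)", lambda: True),
])
print(root)  # → lost update (no row lock)
```

The point of the structure is the log: every refuted hypothesis is recorded, so "try a fix" can never silently replace "collect evidence."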

### Step 1: Observation (Gather All Evidence First)

Before forming any hypothesis, collect:

**Required evidence:**
- [ ] Exact error message (full stack trace, not a summary)
- [ ] Steps to reproduce (minimal reproducible case)
- [ ] What changed recently (git log since last known good)
- [ ] Environment (OS, language version, dependency versions)
- [ ] Frequency (always, intermittent, under specific conditions)

**Observation template:**
```markdown
## Bug Observation

**Error:** [Paste exact error/stack trace]
**Reproduces:** [Always / Intermittent (N/M times) / Only when X]
**Environment:** [OS, runtime version, key dependency versions]
**Last known good:** [commit hash or date when this worked]
**Recent changes:** [output of: git log --oneline --since="3 days ago"]
**Minimal repro:**
[Smallest possible code that triggers the error]
```

**The minimal repro is not optional.** Debugging without a minimal repro is debugging the wrong problem. Strip everything away until you have the smallest code that still fails.
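
The stripping itself can be mechanized. A greedy sketch (a simplified cousin of delta debugging; the repro lines and the failure predicate are made up for illustration):

```python
# Greedy repro minimizer: keep deleting lines while the failure still
# reproduces. `fails` is your repro check (run the test, grep the error...).
def minimize(lines, fails):
    assert fails(lines), "start from a reproducing case"
    changed = True
    while changed:
        changed = False
        for i in range(len(lines)):
            candidate = lines[:i] + lines[i + 1:]
            if candidate and fails(candidate):
                lines, changed = candidate, True
                break
    return lines

# Toy failure: the bug triggers whenever the pool.get() line is present.
repro = ["a = setup()", "b = fetch()", "c = pool.get()", "d = teardown()"]
print(minimize(repro, lambda ls: "c = pool.get()" in ls))  # → ['c = pool.get()']
```

Greedy line removal is quadratic in the worst case, but for typical repro sizes it turns "strip everything" from a discipline into a one-liner you can rerun after every change.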

### Step 2: Hypothesis Formation

From the observation, generate 2-4 hypotheses. Rules:

- Each hypothesis must be **specific** (it names a cause, not a category)
- Each hypothesis must be **falsifiable** (an experiment can prove it wrong)
- Hypotheses must be **ranked by probability** (investigate the most likely first)

**Bad hypothesis:** "There might be an issue with the database"
**Good hypothesis:** "The connection pool is exhausted because we're not releasing connections in the error path of `process_payment()`"

**Hypothesis template:**
```markdown
## Hypothesis N: [Specific cause]

**Mechanism:** [How this cause produces the observed symptom]
**Probability:** [High/Medium/Low] because [reason]
**Experiment:** [Specific test that proves or disproves this hypothesis]
**Expected evidence if TRUE:** [What you'd see if this is the cause]
**Expected evidence if FALSE:** [What you'd see if this is not the cause]
```

### Step 3: Experiments (Investigate, Don't Fix)

**Critical rule:** Run experiments to gather evidence, not to fix the bug. The fix comes after you understand the cause.

**Experiment types:**

**Isolation:** Narrow the failure scope
```bash
# Does it fail with a fresh database?
docker run --rm -e POSTGRES_DB=test postgres:15
python -m pytest tests/test_payment.py --db-url postgresql://localhost/test

# Does it fail with a specific user only?
python -m pytest tests/test_payment.py -k "user_123"
```

**Binary search:** Git bisect for regressions
```bash
git bisect start
git bisect bad HEAD
git bisect good v2.3.1   # last known good
git bisect run python -m pytest tests/test_payment.py -x
# Git finds the exact commit that introduced the bug
```

**Logging:** Add targeted logging at the hypothesis boundary
```python
# Don't add logging everywhere — add it exactly where the hypothesis predicts the failure
import logging

logger = logging.getLogger(__name__)

def process_payment(payment_id: str):
    conn = get_db_connection()
    logger.debug(f"process_payment: got connection {id(conn)}, pool size: {pool.size()}")
    try:
        # ... payment logic
        return result
    except Exception as e:
        logger.error(f"process_payment FAILED: {e}, connection {id(conn)} not released")
        # BUG: connection not released here → pool exhaustion
        raise  # Fix: conn.close() before raise
```

**State inspection:** Check system state at the failure point
```bash
# Check connection pool state before/during/after
psql -c "SELECT count(*), state FROM pg_stat_activity GROUP BY state;"

# Check event queue depth
redis-cli LLEN payment_queue

# Check file descriptor usage
lsof -p $(pgrep -f payment_service) | wc -l
```
+ ```
151
+
152
+ ### Step 4: Evidence Collection
153
+
154
+ After each experiment, record what you found:
155
+
156
+ ```markdown
157
+ ## Evidence: Hypothesis N Test
158
+
159
+ **Experiment run:** [command or action taken]
160
+ **Result:** [what actually happened]
161
+ **Conclusion:** [does this support or refute the hypothesis?]
162
+ **Next step:** [if supported: deeper investigation | if refuted: next hypothesis]
163
+ ```
164
+
165
+ **Never interpret evidence to fit the hypothesis.** If the experiment contradicts the hypothesis, the hypothesis is wrong. Form a new one.
166
+
167
+ ### Step 5: Root Cause Identification
168
+
169
+ When an experiment strongly confirms a hypothesis:
170
+
171
+ 1. State the root cause precisely: "The root cause is [mechanism], which occurs because [condition], resulting in [symptom]"
172
+ 2. Trace back: Is this the root cause or a symptom of a deeper cause? Ask "why" 3-5 times.
173
+ 3. Identify the fix that addresses the root cause, not just the symptom.
174
+
175
+ **Root cause template:**
176
+ ```markdown
177
+ ## Root Cause
178
+
179
+ **Statement:** [precise description of the cause]
180
+ **Why it happens:** [condition that triggers it]
181
+ **Why it wasn't caught:** [test gap, code review miss, etc.]
182
+
183
+ **Fix:** [specific code change that addresses the root cause]
184
+ **Regression test:** [test that would have caught this]
185
+ **Prevention:** [process change to prevent this class of bug]
186
+ ```
187
+
188
+ ### Step 6: Fix and Verify
189
+
190
+ 1. Apply the minimal fix (don't refactor while fixing — that's scope creep)
191
+ 2. Verify the original reproduction case no longer fails
192
+ 3. Verify the fix doesn't break other tests
193
+ 4. Write the regression test
194
+ 5. Commit fix and test together
195
+
196
+ ## ClawPowers Enhancement
197
+
198
+ When `~/.clawpowers/` runtime is initialized:
199
+
200
+ **Persistent Hypothesis Tree:**
201
+
202
+ The full investigation is saved and never lost between sessions:
203
+
204
+ ```bash
205
+ # Save investigation state
206
+ bash runtime/persistence/store.sh set "debug:payment-pool-exhaustion:observation" "ConnectionPool timeout after 50 requests"
207
+ bash runtime/persistence/store.sh set "debug:payment-pool-exhaustion:hypothesis1" "Connection not released in error path"
208
+ bash runtime/persistence/store.sh set "debug:payment-pool-exhaustion:h1_result" "CONFIRMED: no conn.close() in except block"
209
+ bash runtime/persistence/store.sh set "debug:payment-pool-exhaustion:root_cause" "Missing conn.close() in process_payment error path"
210
+ bash runtime/persistence/store.sh set "debug:payment-pool-exhaustion:fix_commit" "a3f9b2c"
211
+ ```
212
+
213
+ If debugging spans multiple sessions, resume with:
214
+ ```bash
215
+ bash runtime/persistence/store.sh list "debug:payment-pool-exhaustion:*"
216
+ ```
217
+
218
+ **Known-Issue Pattern Matching:**
219
+
220
+ Past root causes are searchable. Before forming hypotheses:
221
+ ```bash
222
+ bash runtime/persistence/store.sh list "debug:*:root_cause" | grep -i "connection"
223
+ # → Found 2 prior connection-related bugs
224
+ # → Shows fixes applied, saving re-investigation time
225
+ ```
226
+
227
+ **Debugging Metrics:**
228
+
229
+ ```bash
230
+ bash runtime/metrics/collector.sh record \
231
+ --skill systematic-debugging \
232
+ --outcome success \
233
+ --duration 1800 \
234
+ --notes "payment-pool: 3 hypotheses, 1 correct, git bisect narrowed to 1 commit"
235
+ ```
236
+
237
+ Tracks: time-to-root-cause, hypothesis accuracy rate, which experiment types are most effective.
238
+
239
+ ## Anti-Patterns
240
+
241
+ | Anti-Pattern | Why It Fails | Correct Approach |
242
+ |-------------|-------------|-----------------|
243
+ | "Try-and-see" debugging | Random mutations, no understanding | Form hypothesis before changing code |
244
+ | Fixing without reproducing | Can't verify the fix worked | Minimal repro first, always |
245
+ | Investigating without isolation | Debugging the wrong level | Binary search / isolate the scope first |
246
+ | Multiple changes at once | Can't attribute which change fixed it | One change per experiment |
247
+ | Interpreting evidence to fit hypothesis | Confirmation bias, wrong fix | Evidence disproves or confirms; update hypothesis |
248
+ | Debugging by adding logs everywhere | Signal-to-noise ratio collapses | Targeted logging at hypothesis boundary only |
249
+ | Not writing regression test | Same bug recurs | Regression test is non-optional |
250
+ | Fixing symptoms, not root cause | Bug returns in a different form | Ask "why" 3-5 times to reach root cause |
251
+
252
+ ## Examples
253
+
254
+ ### Example 1: Intermittent Test Failure
255
+
256
+ **Observation:** `test_concurrent_writes` fails 20% of the time with `AssertionError: expected 100 rows, got 97-99`
257
+
258
+ **Hypothesis 1:** Race condition — concurrent writes arrive after the assertion reads
259
+ - Experiment: Add sleep(0.1) before assertion
260
+ - Result: Still fails
261
+ - Conclusion: Not a timing issue
262
+
263
+ **Hypothesis 2:** Lost update — concurrent transactions overwrite each other
264
+ - Experiment: Add row-level locking to write path
265
+ - Result: 0 failures in 100 runs
266
+ - Conclusion: CONFIRMED — missing `SELECT FOR UPDATE` in the read-modify-write cycle
267
+
268
+ **Root cause:** `update_counter()` reads then writes without a lock — concurrent execution loses updates.
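
The lost-update mechanism can be shown deterministically without a database. A sketch of the interleaving (not the actual test code; the function names are illustrative):

```python
# Deterministic illustration of Example 1's lost update: two transactions
# interleave read-modify-write on a counter without a lock.
def interleaved_read_modify_write(start):
    counter = start
    t1_read = counter          # T1 reads
    t2_read = counter          # T2 reads the same stale value
    counter = t1_read + 1      # T1 writes
    counter = t2_read + 1      # T2 overwrites T1's write — lost update
    return counter

def locked_read_modify_write(start):
    counter = start
    counter = counter + 1      # T1 holds the row lock for its whole cycle
    counter = counter + 1      # T2 runs only after T1 commits
    return counter

print(interleaved_read_modify_write(0))  # → 1 (one update lost)
print(locked_read_modify_write(0))       # → 2 (locking serializes the cycle)
```

This is exactly what `SELECT FOR UPDATE` buys you: the second transaction's read is forced to wait until the first one's write commits.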

### Example 2: Production Alert

**Observation:** Memory usage grows 50MB/hour until an OOM restart

**Hypothesis 1:** Memory leak — objects not garbage collected
- Experiment: `objgraph.most_common_types()` before and after request batches
- Result: `WeakValueDictionary` count grows monotonically
- Conclusion: CONFIRMED — the cache holds strong refs despite `WeakValue` (values are themselves containers)

**Root cause:** The cache stores lists as values; lists are containers that keep strong references to their contents and prevent them from being garbage collected.