npm - shipwright-cli - Versions diffs - 3.2.0 → 3.3.0 - Mend

shipwright-cli 3.2.0 → 3.3.0


Files changed (279)

package/.claude/agents/code-reviewer.md +2 -0
package/.claude/agents/devops-engineer.md +2 -0
package/.claude/agents/doc-fleet-agent.md +2 -0
package/.claude/agents/pipeline-agent.md +2 -0
package/.claude/agents/shell-script-specialist.md +2 -0
package/.claude/agents/test-specialist.md +2 -0
package/.claude/hooks/agent-crash-capture.sh +32 -0
package/.claude/hooks/post-tool-use.sh +3 -2
package/.claude/hooks/pre-tool-use.sh +35 -3
package/README.md +4 -4
package/claude-code/hooks/config-change.sh +18 -0
package/claude-code/hooks/instructions-reloaded.sh +7 -0
package/claude-code/hooks/worktree-create.sh +25 -0
package/claude-code/hooks/worktree-remove.sh +20 -0
package/config/code-constitution.json +130 -0
package/dashboard/middleware/auth.ts +134 -0
package/dashboard/middleware/constants.ts +21 -0
package/dashboard/public/index.html +2 -6
package/dashboard/public/styles.css +100 -97
package/dashboard/routes/auth.ts +38 -0
package/dashboard/server.ts +66 -25
package/dashboard/services/config.ts +26 -0
package/dashboard/services/db.ts +118 -0
package/dashboard/src/canvas/pixel-agent.ts +298 -0
package/dashboard/src/canvas/pixel-sprites.ts +440 -0
package/dashboard/src/canvas/shipyard-effects.ts +367 -0
package/dashboard/src/canvas/shipyard-scene.ts +616 -0
package/dashboard/src/canvas/submarine-layout.ts +267 -0
package/dashboard/src/components/header.ts +8 -7
package/dashboard/src/core/router.ts +1 -0
package/dashboard/src/design/submarine-theme.ts +253 -0
package/dashboard/src/main.ts +2 -0
package/dashboard/src/types/api.ts +2 -1
package/dashboard/src/views/activity.ts +2 -1
package/dashboard/src/views/shipyard.ts +39 -0
package/dashboard/types/index.ts +166 -0
package/docs/plans/2026-02-28-compound-audit-and-shipyard-design.md +186 -0
package/docs/plans/2026-02-28-skipper-shipwright-implementation-plan.md +1182 -0
package/docs/plans/2026-02-28-skipper-shipwright-integration-design.md +531 -0
package/docs/plans/2026-03-01-ai-powered-skill-injection-design.md +298 -0
package/docs/plans/2026-03-01-ai-powered-skill-injection-plan.md +1109 -0
package/docs/plans/2026-03-01-capabilities-cleanup-plan.md +658 -0
package/docs/plans/2026-03-01-clean-architecture-plan.md +924 -0
package/docs/plans/2026-03-01-compound-audit-cascade-design.md +191 -0
package/docs/plans/2026-03-01-compound-audit-cascade-plan.md +921 -0
package/docs/plans/2026-03-01-deep-integration-plan.md +851 -0
package/docs/plans/2026-03-01-pipeline-audit-trail-design.md +145 -0
package/docs/plans/2026-03-01-pipeline-audit-trail-plan.md +770 -0
package/docs/plans/2026-03-01-refined-depths-brand-design.md +382 -0
package/docs/plans/2026-03-01-refined-depths-implementation.md +599 -0
package/docs/plans/2026-03-01-skipper-kernel-integration-design.md +203 -0
package/docs/plans/2026-03-01-unified-platform-design.md +272 -0
package/docs/plans/2026-03-07-claude-code-feature-integration-design.md +189 -0
package/docs/plans/2026-03-07-claude-code-feature-integration-plan.md +1165 -0
package/docs/research/BACKLOG_QUICK_REFERENCE.md +352 -0
package/docs/research/CUTTING_EDGE_RESEARCH_2026.md +546 -0
package/docs/research/RESEARCH_INDEX.md +439 -0
package/docs/research/RESEARCH_SOURCES.md +440 -0
package/docs/research/RESEARCH_SUMMARY.txt +275 -0
package/docs/superpowers/specs/2026-03-10-pipeline-quality-revolution-design.md +341 -0
package/package.json +2 -2
package/scripts/lib/adaptive-model.sh +427 -0
package/scripts/lib/adaptive-timeout.sh +316 -0
package/scripts/lib/audit-trail.sh +309 -0
package/scripts/lib/auto-recovery.sh +471 -0
package/scripts/lib/bandit-selector.sh +431 -0
package/scripts/lib/bootstrap.sh +104 -2
package/scripts/lib/causal-graph.sh +455 -0
package/scripts/lib/compat.sh +126 -0
package/scripts/lib/compound-audit.sh +337 -0
package/scripts/lib/constitutional.sh +454 -0
package/scripts/lib/context-budget.sh +359 -0
package/scripts/lib/convergence.sh +594 -0
package/scripts/lib/cost-optimizer.sh +634 -0
package/scripts/lib/daemon-adaptive.sh +10 -0
package/scripts/lib/daemon-dispatch.sh +106 -17
package/scripts/lib/daemon-failure.sh +34 -4
package/scripts/lib/daemon-patrol.sh +23 -2
package/scripts/lib/daemon-poll-github.sh +361 -0
package/scripts/lib/daemon-poll-health.sh +299 -0
package/scripts/lib/daemon-poll.sh +27 -611
package/scripts/lib/daemon-state.sh +112 -66
package/scripts/lib/daemon-triage.sh +10 -0
package/scripts/lib/dod-scorecard.sh +442 -0
package/scripts/lib/error-actionability.sh +300 -0
package/scripts/lib/formal-spec.sh +461 -0
package/scripts/lib/helpers.sh +177 -4
package/scripts/lib/intent-analysis.sh +409 -0
package/scripts/lib/loop-convergence.sh +350 -0
package/scripts/lib/loop-iteration.sh +682 -0
package/scripts/lib/loop-progress.sh +48 -0
package/scripts/lib/loop-restart.sh +185 -0
package/scripts/lib/memory-effectiveness.sh +506 -0
package/scripts/lib/mutation-executor.sh +352 -0
package/scripts/lib/outcome-feedback.sh +521 -0
package/scripts/lib/pipeline-cli.sh +336 -0
package/scripts/lib/pipeline-commands.sh +1216 -0
package/scripts/lib/pipeline-detection.sh +100 -2
package/scripts/lib/pipeline-execution.sh +897 -0
package/scripts/lib/pipeline-github.sh +28 -3
package/scripts/lib/pipeline-intelligence-compound.sh +431 -0
package/scripts/lib/pipeline-intelligence-scoring.sh +407 -0
package/scripts/lib/pipeline-intelligence-skip.sh +181 -0
package/scripts/lib/pipeline-intelligence.sh +100 -1136
package/scripts/lib/pipeline-quality-bash-compat.sh +182 -0
package/scripts/lib/pipeline-quality-checks.sh +17 -715
package/scripts/lib/pipeline-quality-gates.sh +563 -0
package/scripts/lib/pipeline-stages-build.sh +730 -0
package/scripts/lib/pipeline-stages-delivery.sh +965 -0
package/scripts/lib/pipeline-stages-intake.sh +1133 -0
package/scripts/lib/pipeline-stages-monitor.sh +407 -0
package/scripts/lib/pipeline-stages-review.sh +1022 -0
package/scripts/lib/pipeline-stages.sh +59 -2929
package/scripts/lib/pipeline-state.sh +36 -5
package/scripts/lib/pipeline-util.sh +487 -0
package/scripts/lib/policy-learner.sh +438 -0
package/scripts/lib/process-reward.sh +493 -0
package/scripts/lib/project-detect.sh +649 -0
package/scripts/lib/quality-profile.sh +334 -0
package/scripts/lib/recruit-commands.sh +885 -0
package/scripts/lib/recruit-learning.sh +739 -0
package/scripts/lib/recruit-roles.sh +648 -0
package/scripts/lib/reward-aggregator.sh +458 -0
package/scripts/lib/rl-optimizer.sh +362 -0
package/scripts/lib/root-cause.sh +427 -0
package/scripts/lib/scope-enforcement.sh +445 -0
package/scripts/lib/session-restart.sh +493 -0
package/scripts/lib/skill-memory.sh +300 -0
package/scripts/lib/skill-registry.sh +775 -0
package/scripts/lib/spec-driven.sh +476 -0
package/scripts/lib/test-helpers.sh +18 -7
package/scripts/lib/test-holdout.sh +429 -0
package/scripts/lib/test-optimizer.sh +511 -0
package/scripts/shipwright-file-suggest.sh +45 -0
package/scripts/skills/adversarial-quality.md +61 -0
package/scripts/skills/api-design.md +44 -0
package/scripts/skills/architecture-design.md +50 -0
package/scripts/skills/brainstorming.md +43 -0
package/scripts/skills/data-pipeline.md +44 -0
package/scripts/skills/deploy-safety.md +64 -0
package/scripts/skills/documentation.md +38 -0
package/scripts/skills/frontend-design.md +45 -0
package/scripts/skills/generated/.gitkeep +0 -0
package/scripts/skills/generated/_refinements/.gitkeep +0 -0
package/scripts/skills/generated/_refinements/adversarial-quality.patch.md +3 -0
package/scripts/skills/generated/_refinements/architecture-design.patch.md +3 -0
package/scripts/skills/generated/_refinements/brainstorming.patch.md +3 -0
package/scripts/skills/generated/cli-version-management.md +29 -0
package/scripts/skills/generated/collection-system-validation.md +99 -0
package/scripts/skills/generated/large-scale-c-refactoring-coordination.md +97 -0
package/scripts/skills/generated/pattern-matching-similarity-scoring.md +195 -0
package/scripts/skills/generated/test-parallelization-detection.md +65 -0
package/scripts/skills/observability.md +79 -0
package/scripts/skills/performance.md +48 -0
package/scripts/skills/pr-quality.md +49 -0
package/scripts/skills/product-thinking.md +43 -0
package/scripts/skills/security-audit.md +49 -0
package/scripts/skills/systematic-debugging.md +40 -0
package/scripts/skills/testing-strategy.md +47 -0
package/scripts/skills/two-stage-review.md +52 -0
package/scripts/skills/validation-thoroughness.md +55 -0
package/scripts/sw +9 -3
package/scripts/sw-activity.sh +9 -2
package/scripts/sw-adaptive.sh +2 -1
package/scripts/sw-adversarial.sh +2 -1
package/scripts/sw-architecture-enforcer.sh +3 -1
package/scripts/sw-auth.sh +12 -2
package/scripts/sw-autonomous.sh +5 -1
package/scripts/sw-changelog.sh +4 -1
package/scripts/sw-checkpoint.sh +2 -1
package/scripts/sw-ci.sh +5 -1
package/scripts/sw-cleanup.sh +4 -26
package/scripts/sw-code-review.sh +10 -4
package/scripts/sw-connect.sh +2 -1
package/scripts/sw-context.sh +2 -1
package/scripts/sw-cost.sh +48 -3
package/scripts/sw-daemon.sh +66 -9
package/scripts/sw-dashboard.sh +3 -1
package/scripts/sw-db.sh +59 -16
package/scripts/sw-decide.sh +8 -2
package/scripts/sw-decompose.sh +360 -17
package/scripts/sw-deps.sh +4 -1
package/scripts/sw-developer-simulation.sh +4 -1
package/scripts/sw-discovery.sh +325 -2
package/scripts/sw-doc-fleet.sh +4 -1
package/scripts/sw-docs-agent.sh +3 -1
package/scripts/sw-docs.sh +2 -1
package/scripts/sw-doctor.sh +453 -2
package/scripts/sw-dora.sh +4 -1
package/scripts/sw-durable.sh +4 -3
package/scripts/sw-e2e-orchestrator.sh +17 -16
package/scripts/sw-eventbus.sh +7 -1
package/scripts/sw-evidence.sh +364 -12
package/scripts/sw-feedback.sh +550 -9
package/scripts/sw-fix.sh +20 -1
package/scripts/sw-fleet-discover.sh +6 -2
package/scripts/sw-fleet-viz.sh +4 -1
package/scripts/sw-fleet.sh +5 -1
package/scripts/sw-github-app.sh +16 -3
package/scripts/sw-github-checks.sh +3 -2
package/scripts/sw-github-deploy.sh +3 -2
package/scripts/sw-github-graphql.sh +18 -7
package/scripts/sw-guild.sh +5 -1
package/scripts/sw-heartbeat.sh +5 -30
package/scripts/sw-hello.sh +67 -0
package/scripts/sw-hygiene.sh +6 -1
package/scripts/sw-incident.sh +265 -1
package/scripts/sw-init.sh +18 -2
package/scripts/sw-instrument.sh +10 -2
package/scripts/sw-intelligence.sh +42 -6
package/scripts/sw-jira.sh +5 -1
package/scripts/sw-launchd.sh +2 -1
package/scripts/sw-linear.sh +4 -1
package/scripts/sw-logs.sh +4 -1
package/scripts/sw-loop.sh +432 -1128
package/scripts/sw-memory.sh +356 -2
package/scripts/sw-mission-control.sh +6 -1
package/scripts/sw-model-router.sh +481 -26
package/scripts/sw-otel.sh +13 -4
package/scripts/sw-oversight.sh +14 -5
package/scripts/sw-patrol-meta.sh +334 -0
package/scripts/sw-pipeline-composer.sh +5 -1
package/scripts/sw-pipeline-vitals.sh +2 -1
package/scripts/sw-pipeline.sh +53 -2664
package/scripts/sw-pm.sh +12 -5
package/scripts/sw-pr-lifecycle.sh +2 -1
package/scripts/sw-predictive.sh +7 -1
package/scripts/sw-prep.sh +185 -2
package/scripts/sw-ps.sh +5 -25
package/scripts/sw-public-dashboard.sh +15 -3
package/scripts/sw-quality.sh +2 -1
package/scripts/sw-reaper.sh +8 -25
package/scripts/sw-recruit.sh +156 -2303
package/scripts/sw-regression.sh +19 -12
package/scripts/sw-release-manager.sh +3 -1
package/scripts/sw-release.sh +4 -1
package/scripts/sw-remote.sh +3 -1
package/scripts/sw-replay.sh +7 -1
package/scripts/sw-retro.sh +158 -1
package/scripts/sw-review-rerun.sh +3 -1
package/scripts/sw-scale.sh +10 -3
package/scripts/sw-security-audit.sh +6 -1
package/scripts/sw-self-optimize.sh +6 -3
package/scripts/sw-session.sh +9 -3
package/scripts/sw-setup.sh +3 -1
package/scripts/sw-stall-detector.sh +406 -0
package/scripts/sw-standup.sh +15 -7
package/scripts/sw-status.sh +3 -1
package/scripts/sw-strategic.sh +4 -1
package/scripts/sw-stream.sh +7 -1
package/scripts/sw-swarm.sh +18 -6
package/scripts/sw-team-stages.sh +13 -6
package/scripts/sw-templates.sh +5 -29
package/scripts/sw-testgen.sh +7 -1
package/scripts/sw-tmux-pipeline.sh +4 -1
package/scripts/sw-tmux-role-color.sh +2 -0
package/scripts/sw-tmux-status.sh +1 -1
package/scripts/sw-tmux.sh +3 -1
package/scripts/sw-trace.sh +3 -1
package/scripts/sw-tracker-github.sh +3 -0
package/scripts/sw-tracker-jira.sh +3 -0
package/scripts/sw-tracker-linear.sh +3 -0
package/scripts/sw-tracker.sh +3 -1
package/scripts/sw-triage.sh +2 -1
package/scripts/sw-upgrade.sh +3 -1
package/scripts/sw-ux.sh +5 -2
package/scripts/sw-webhook.sh +3 -1
package/scripts/sw-widgets.sh +3 -1
package/scripts/sw-worktree.sh +15 -3
package/scripts/test-skill-injection.sh +1233 -0
package/templates/pipelines/autonomous.json +27 -3
package/templates/pipelines/cost-aware.json +34 -8
package/templates/pipelines/deployed.json +12 -0
package/templates/pipelines/enterprise.json +12 -0
package/templates/pipelines/fast.json +6 -0
package/templates/pipelines/full.json +27 -3
package/templates/pipelines/hotfix.json +6 -0
package/templates/pipelines/standard.json +12 -0
package/templates/pipelines/tdd.json +12 -0

package/docs/plans/2026-03-01-compound-audit-cascade-design.md ADDED

@@ -0,0 +1,191 @@
+# Compound Audit Cascade — Design Document
+## Goal
+Replace the one-shot `compound_quality` stage with an adaptive multi-agent cascade that iteratively probes for bugs across specialized categories until confidence is high.
+## Problem
+The current `compound_quality` stage runs a single adversarial review + negative testing pass. It catches surface-level issues but misses deeper problems because:
+1. **Single perspective** — one Claude call can't specialize in logic, integration, security, AND completeness simultaneously.
+2. **No iteration** — runs once and moves on. If the review misses something, it stays missed.
+3. **No convergence signal** — can't tell whether findings are exhaustive or just scratching the surface.
+4. **No deduplication** — if multiple checks flag the same issue, they report it separately.
+## Architecture
+### Adaptive Cascade Loop
+```
+stage_compound_quality()
+│
+├─ Pre-flight: validate meaningful code changes exist (existing)
+│
+├─ Cycle 1: Core Agents (parallel via claude -p --model haiku)
+│   ├─ Logic Auditor    → bugs, wrong algorithms, edge cases
+│   ├─ Integration Auditor → wiring gaps, missing connections
+│   └─ Completeness Auditor → spec coverage, missing features
+│   │
+│   └─ Dedup + Classify → { critical, high, medium, low }
+│       │
+│       ├─ If critical/high found → trigger specialist escalation
+│       │   ├─ "security" keyword → add Security Auditor
+│       │   ├─ "error handling" keyword → add Error Handling Auditor
+│       │   ├─ "performance" keyword → add Performance Auditor
+│       │   └─ "edge case" keyword → add Edge Case Auditor
+│       │
+│       └─ Check convergence → continue or stop
+│
+├─ Cycle 2..N: Core + Triggered Specialists (parallel)
+│   └─ Run agents → dedup → classify → check convergence
+│
+├─ Convergence: stop when ANY of:
+│   ├─ No new critical/high findings in latest cycle
+│   ├─ Duplicate rate > 98% (diminishing returns)
+│   └─ max_cycles reached (hard cap, default 3)
+│
+├─ Emit audit trail events for each cycle + finding
+│
+└─ Output: structured findings JSON + pass/fail verdict
+```
+### Agent Specializations
+**Core 3 (always run):**
+| Agent | Focus | Example findings |
+|---|---|---|
+| Logic Auditor | Control flow bugs, off-by-one, wrong conditions, null paths | "Function returns early before cleanup on error path" |
+| Integration Auditor | Missing imports, broken call chains, mismatched interfaces | "Handler registered but route never wired in router" |
+| Completeness Auditor | Spec vs. implementation gaps, missing tests, placeholders | "Plan requires --force flag but implementation omits it" |
+**Specialists (triggered by core findings):**
+| Specialist | Trigger keywords | Focus |
+|---|---|---|
+| Security | auth, injection, secrets, permissions, credential | OWASP top 10, credential exposure, input validation |
+| Error Handling | catch, error, fail, exception, silent | Silent swallows, missing error paths, inconsistent handling |
+| Performance | loop, query, memory, scale, O(n) | O(n^2) patterns, unbounded allocations, missing pagination |
+| Edge Cases | boundary, limit, empty, null, zero, max | Zero-length inputs, max values, concurrent access |
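Reviewer note: the trigger-keyword table above can be read as a simple text scan. The sketch below is an illustration only, not the package's `compound_audit_escalate()`; the function names `_has_keyword` and `escalate_specialists` are hypothetical, and plain substring matching is an assumption (the design does not say how keywords are matched).

```shell
# Hypothetical helper: succeeds if the lowercased text ($1) matches any keyword in the regex ($2).
_has_keyword() {
  printf '%s\n' "$1" | grep -Eq "$2"
}

# Hypothetical sketch: print one specialist per line for every keyword group that matches.
# Keyword lists are copied from the table; substring matching is an assumption.
escalate_specialists() {
  local text
  text=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  _has_keyword "$text" 'auth|injection|secrets|permissions|credential' && echo security
  _has_keyword "$text" 'catch|error|fail|exception|silent' && echo error_handling
  _has_keyword "$text" 'loop|query|memory|scale|o\(n\)' && echo performance
  _has_keyword "$text" 'boundary|limit|empty|null|zero|max' && echo edge_case
  return 0
}

escalate_specialists 'Token leaked: missing auth check in loop'
```

Because this matches substrings, a word such as "author" would also trigger the security specialist; a real implementation might prefer word-boundary matching.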
+### Context Bundle (shared by all agents)
+Each agent receives:
+- Cumulative git diff from branch point
+- Test evidence JSON (from audit trail)
+- Plan/spec summary (from pipeline artifacts)
+- Previous cycle findings (so agents don't repeat known issues)
+### Finding Schema
+```json
+{
+  "findings": [
+    {
+      "severity": "critical|high|medium|low",
+      "category": "logic|integration|completeness|security|error_handling|performance|edge_case",
+      "file": "path/to/file.sh",
+      "line": 42,
+      "description": "One-sentence description",
+      "evidence": "The specific code or pattern that's wrong",
+      "suggestion": "How to fix it"
+    }
+  ]
+}
+```
+### Deduplication Strategy
+**Tier 1: Structural match (free, instant)**
+- Same file + same category + lines within 5 of each other = duplicate
+- Catches 60-70% of duplicates without any LLM call
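Reviewer note: the Tier 1 rule above (same file, same category, lines within 5) needs no LLM call and could be sketched with `awk`. This is a minimal illustration under stated assumptions: findings arrive as tab-separated `file`, `category`, `line`, `description` rows rather than JSON, the first finding seen is kept as canonical, and the name `tier1_dedup` is hypothetical.

```shell
# Hypothetical sketch of Tier 1 structural dedup.
# Input (stdin): TSV rows "file<TAB>category<TAB>line<TAB>description" (an assumed format).
# Output: rows that are not within 5 lines of an earlier kept row with the same file and category.
tier1_dedup() {
  awk -F '\t' '
    {
      key = $1 SUBSEP $2
      n = count[key] + 0          # number of findings already kept for this file+category
      dup = 0
      for (i = 1; i <= n; i++) {
        d = $3 - kept[key, i]
        if (d < 0) d = -d
        if (d <= 5) { dup = 1; break }
      }
      if (!dup) {
        count[key] = n + 1
        kept[key, n + 1] = $3
        print
      }
    }'
}

printf 'a.sh\tlogic\t10\tfirst\na.sh\tlogic\t13\tsame issue\na.sh\tlogic\t40\tfar away\n' | tier1_dedup
```

Only the rows at lines 10 and 40 survive in the usage line; the row at line 13 is dropped as a structural duplicate of line 10.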
+**Tier 2: LLM dedup judge (cheap)**
+- After all agents complete, send findings to `claude -p --model haiku`:
+  "Group findings by whether they describe the SAME underlying issue. Two findings are the same if fixing one would fix the other."
+- Returns groups: `[{"canonical": 0, "duplicates": [2, 5]}, ...]`
+- Canonical finding in each group keeps the best description
+### Convergence Calculation
+```
+new_unique = findings_this_cycle - duplicates_of_previous_cycles
+duplicate_rate = duplicates / total_findings_this_cycle
+converged = (no critical/high in new_unique) OR (duplicate_rate > 0.98) OR (cycle >= max_cycles)
+```
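Reviewer note: the convergence pseudocode above translates fairly directly into shell. The sketch below is not the package's `compound_audit_converged()`; the name `audit_converged` and its argument order are hypothetical. The reason strings follow the `compound.converged` payload listed in the design, and the `> 0.98` test is done in integer arithmetic to avoid floating point.

```shell
# Hypothetical sketch: prints the stop reason and returns 0 when converged, returns 1 otherwise.
# Args: new_critical_high duplicates total_findings cycle [max_cycles, default 3]
audit_converged() {
  local new_crit_high="$1" duplicates="$2" total="$3" cycle="$4" max_cycles="${5:-3}"
  if [ "$new_crit_high" -eq 0 ]; then
    echo no_criticals
    return 0
  fi
  # duplicates / total > 0.98, rewritten as duplicates * 100 > total * 98
  if [ "$total" -gt 0 ] && [ $(( duplicates * 100 )) -gt $(( total * 98 )) ]; then
    echo dup_rate
    return 0
  fi
  if [ "$cycle" -ge "$max_cycles" ]; then
    echo max_cycles
    return 0
  fi
  return 1
}

audit_converged 2 99 100 1
```

With 99 duplicates out of 100 findings the rate is 0.99, so the usage line reports `dup_rate` even though two new critical/high findings remain.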
+## Implementation
+### New File: `scripts/lib/compound-audit.sh`
+Four functions:
+| Function | Purpose |
+|---|---|
+| `compound_audit_run_cycle()` | Runs N agents in parallel, collects JSON findings |
+| `compound_audit_dedup()` | Tier 1 structural + Tier 2 haiku judge |
+| `compound_audit_escalate()` | Scans findings for trigger keywords, returns specialist list |
+| `compound_audit_converged()` | Checks stop conditions |
+### Integration
+Replace body of `stage_compound_quality()` in `pipeline-intelligence.sh` (~line 1148). Existing pre-flight checks (bash compat, coverage) stay. Existing adversarial/negative/e2e/dod checks replaced by cascade.
+### Agent Execution
+Each agent: `claude -p "$prompt" --model haiku`
+Core 3 run in parallel: bash background jobs (`&` + `wait`).
+Parse JSON output from each agent's stdout.
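Reviewer note: the background-job pattern described above, combined with the design's fail-open rule, could look roughly like the sketch below. `AGENT_CMD`, `fake_agent`, and `run_agents_parallel` are hypothetical names; `fake_agent` stands in for a real `claude -p "$prompt" --model haiku` call so the example runs without network access.

```shell
# Hypothetical hook: name of the command or function that runs one agent.
AGENT_CMD=${AGENT_CMD:-fake_agent}

# Stand-in for an LLM call; the "integration" agent fails on purpose to show fail-open.
fake_agent() {
  [ "$1" = "integration" ] && return 1
  printf '{"agent":"%s","findings":[]}\n' "$1"
}

# Run every named agent as a background job, wait for all, then print surviving outputs
# in argument order. A failing agent is skipped instead of aborting the cycle.
run_agents_parallel() {
  local outdir agent
  outdir=$(mktemp -d)
  for agent in "$@"; do
    ( "$AGENT_CMD" "$agent" > "$outdir/$agent.json" 2>/dev/null || rm -f "$outdir/$agent.json" ) &
  done
  wait
  for agent in "$@"; do
    [ -s "$outdir/$agent.json" ] && cat "$outdir/$agent.json"
  done
  rm -rf "$outdir"
  return 0
}

run_agents_parallel logic integration completeness
```

Writing each agent's output to its own file keeps the results from interleaving, and reading them back in argument order makes the combined output deterministic.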
+### Audit Trail Events
+| Event | Payload |
+|---|---|
+| `compound.cycle_start` | cycle, agents list |
+| `compound.finding` | severity, category, file, line, description |
+| `compound.dedup` | unique count, duplicate count, duplicate rate |
+| `compound.cycle_complete` | new unique findings, triggered specialists |
+| `compound.converged` | reason (no_criticals, dup_rate, max_cycles) |
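Reviewer note: as an illustration of the event table above, a `compound.dedup` event could be written as one JSON line. The field names (`event`, `unique`, `duplicates`, `duplicate_rate`) and the function name `emit_dedup_event` are assumptions; the design lists the payload contents but not the exact keys, and the package's own audit-trail helpers are not shown in this diff. The rate assumes the cycle total is unique plus duplicate findings.

```shell
# Hypothetical sketch: print a compound.dedup event as a single JSON line.
# Args: unique_count duplicate_count
emit_dedup_event() {
  local unique="$1" dups="$2" total rate
  total=$(( unique + dups ))
  # Assumed definition: duplicate_rate = duplicates / (unique + duplicates), 0 when there are no findings.
  rate=$(awk -v d="$dups" -v t="$total" 'BEGIN { printf "%.2f", (t > 0 ? d / t : 0) }')
  printf '{"event":"compound.dedup","unique":%d,"duplicates":%d,"duplicate_rate":%s}\n' \
    "$unique" "$dups" "$rate"
}

emit_dedup_event 3 1
```

Appending the output to a log file (`emit_dedup_event 3 1 >> audit.jsonl`, file name hypothetical) would give the JSONL form the design refers to.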
+### Template Config
+```json
+{
+  "id": "compound_quality",
+  "config": {
+    "max_cycles": 3,
+    "dedup_model": "haiku",
+    "escalation_enabled": true,
+    "block_on_critical": true
+  }
+}
+```
+### Cost Estimate
+| Scenario | Agents | Cost |
+|---|---|---|
+| Cycle 1 (core only, clean code) | 3 + 1 dedup | ~$0.004 |
+| Cycle 2 (with 2 specialists) | 5 + 1 dedup | ~$0.006 |
+| Worst case (3 cycles, full escalation) | ~15 + 3 dedup | ~$0.02 |
+Negligible vs. build stage costs ($0.50-2.00).
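Reviewer note: the table's figures are consistent with a flat per-call price. The check below assumes roughly $0.001 per haiku call, a value inferred from the first row ($0.004 for 4 calls) and not stated in the design; `estimate_cost` is a hypothetical name.

```shell
# Hypothetical sketch: cost of (agents + dedup calls) at an assumed $0.001 per call.
# Works in thousandths of a dollar so plain integer arithmetic is enough (valid below 1000 calls).
estimate_cost() {
  local agents="$1" dedups="$2"
  printf '$0.%03d\n' $(( agents + dedups ))
}

estimate_cost 3 1    # cycle 1, core agents only
estimate_cost 5 1    # cycle 2 with two specialists
estimate_cost 15 3   # worst case from the table
```

Under that assumption the worst case comes to $0.018, which matches the table's rounded ~$0.02.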
+### Fail-Open Principle
+All compound audit calls wrapped in `|| true`. If any agent call fails or times out, that agent's findings are skipped. The cascade continues with remaining agents. Pipeline never blocks on audit infrastructure failures.
+## Testing Strategy
+1. Unit tests in `scripts/sw-lib-compound-audit-test.sh` — dedup, escalation, convergence functions
+2. Integration: mock agent outputs, verify cascade loop behavior
+3. Regression: existing pipeline tests still pass
+4. E2E: run pipeline, verify compound audit events in JSONL
+## Non-Goals
+- No external LLM providers — Claude Code only (`claude -p`)
+- No persistent finding database — findings live in audit trail JSONL
+- No auto-fix — findings are reported, not automatically resolved
+- No UI — findings appear in pipeline-audit.md report only