npm - opencode-swarm-plugin - Versions diffs - 0.44.2 → 0.45.0 - Mend

opencode-swarm-plugin 0.44.2 → 0.45.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (42) hide show

package/README.md +277 -54
package/bin/swarm.ts +1 -1
package/dist/decision-trace-integration.d.ts +204 -0
package/dist/decision-trace-integration.d.ts.map +1 -0
package/dist/hive.d.ts.map +1 -1
package/dist/hive.js +9 -9
package/dist/index.d.ts +32 -2
package/dist/index.d.ts.map +1 -1
package/dist/index.js +535 -27
package/dist/plugin.js +295 -27
package/dist/query-tools.d.ts +20 -12
package/dist/query-tools.d.ts.map +1 -1
package/dist/swarm-decompose.d.ts +4 -4
package/dist/swarm-decompose.d.ts.map +1 -1
package/dist/swarm-prompts.d.ts.map +1 -1
package/dist/swarm-prompts.js +220 -22
package/dist/swarm-review.d.ts.map +1 -1
package/dist/swarm-signature.d.ts +106 -0
package/dist/swarm-signature.d.ts.map +1 -0
package/dist/swarm-strategies.d.ts +16 -3
package/dist/swarm-strategies.d.ts.map +1 -1
package/dist/swarm.d.ts +4 -2
package/dist/swarm.d.ts.map +1 -1
package/examples/commands/swarm.md +745 -0
package/examples/plugin-wrapper-template.ts +2611 -0
package/examples/skills/hive-workflow/SKILL.md +212 -0
package/examples/skills/skill-creator/SKILL.md +223 -0
package/examples/skills/swarm-coordination/SKILL.md +292 -0
package/global-skills/cli-builder/SKILL.md +344 -0
package/global-skills/cli-builder/references/advanced-patterns.md +244 -0
package/global-skills/learning-systems/SKILL.md +644 -0
package/global-skills/skill-creator/LICENSE.txt +202 -0
package/global-skills/skill-creator/SKILL.md +352 -0
package/global-skills/skill-creator/references/output-patterns.md +82 -0
package/global-skills/skill-creator/references/workflows.md +28 -0
package/global-skills/swarm-coordination/SKILL.md +995 -0
package/global-skills/swarm-coordination/references/coordinator-patterns.md +235 -0
package/global-skills/swarm-coordination/references/strategies.md +138 -0
package/global-skills/system-design/SKILL.md +213 -0
package/global-skills/testing-patterns/SKILL.md +430 -0
package/global-skills/testing-patterns/references/dependency-breaking-catalog.md +586 -0
package/package.json +5 -2

package/global-skills/learning-systems/SKILL.md ADDED Viewed

@@ -0,0 +1,644 @@
+---
+name: learning-systems
+description: Implicit feedback scoring, confidence decay, and anti-pattern detection. Use when understanding how the swarm plugin learns from outcomes, implementing learning loops, or debugging why patterns are being promoted or deprecated. Unique to opencode-swarm-plugin.
+---
+# Learning Systems
+The swarm plugin learns from task outcomes to improve decomposition quality over time. Three interconnected systems track pattern effectiveness: implicit feedback scoring, confidence decay, and pattern maturity progression.
+## Implicit Feedback Scoring
+Convert task outcomes into learning signals without explicit user feedback.
+### What Gets Scored
+**Duration signals:**
+- Fast (<5 min) = helpful (1.0)
+- Medium (5-30 min) = neutral (0.6)
+- Slow (>30 min) = harmful (0.2)
+**Error signals:**
+- 0 errors = helpful (1.0)
+- 1-2 errors = neutral (0.6)
+- 3+ errors = harmful (0.2)
+**Retry signals:**
+- 0 retries = helpful (1.0)
+- 1 retry = neutral (0.7)
+- 2+ retries = harmful (0.3)
+**Success signal:**
+- Success = 1.0 (40% weight)
+- Failure = 0.0
+### Weighted Score Calculation
+```typescript
+rawScore = success * 0.4 + duration * 0.2 + errors * 0.2 + retries * 0.2;
+```
+**Thresholds:**
+- rawScore >= 0.7 → helpful
+- rawScore <= 0.4 → harmful
+- 0.4 < rawScore < 0.7 → neutral
+### Recording Outcomes
+Call `swarm_record_outcome` after subtask completion:
+```typescript
+swarm_record_outcome({
+  bead_id: "bd-123.1",
+  duration_ms: 180000, // 3 minutes
+  error_count: 0,
+  retry_count: 0,
+  success: true,
+  files_touched: ["src/auth.ts"],
+  strategy: "file-based",
+});
+```
+**Fields tracked:**
+- `bead_id` - subtask identifier
+- `duration_ms` - time from start to completion
+- `error_count` - errors encountered (from ErrorAccumulator)
+- `retry_count` - number of retry attempts
+- `success` - whether subtask completed successfully
+- `files_touched` - modified file paths
+- `strategy` - decomposition strategy used (optional)
+- `failure_mode` - classification if success=false (optional)
+- `failure_details` - error context (optional)
+## Confidence Decay
+Evaluation criteria weights fade unless revalidated. Prevents stale patterns from dominating future decompositions.
+### Half-Life Formula
+```
+decayed_value = raw_value * 0.5^(age_days / 90)
+```
+**Decay timeline:**
+- Day 0: 100% weight
+- Day 90: 50% weight
+- Day 180: 25% weight
+- Day 270: 12.5% weight
+### Criterion Weight Calculation
+Aggregate decayed feedback events:
+```typescript
+helpfulSum = sum(helpful_events.map((e) => e.raw_value * decay(e.timestamp)));
+harmfulSum = sum(harmful_events.map((e) => e.raw_value * decay(e.timestamp)));
+weight = max(0.1, helpfulSum / (helpfulSum + harmfulSum));
+```
+**Weight floor:** minimum 0.1 prevents complete zeroing
+### Revalidation
+Recording new feedback resets decay timer for that criterion:
+```typescript
+{
+  criterion: "type_safe",
+  weight: 0.85,
+  helpful_count: 12,
+  harmful_count: 3,
+  last_validated: "2024-12-12T00:00:00Z",  // Reset on new feedback
+  half_life_days: 90,
+}
+```
+### When Criteria Get Deprecated
+```typescript
+total = helpful_count + harmful_count;
+harmfulRatio = harmful_count / total;
+if (total >= 3 && harmfulRatio > 0.3) {
+  // Deprecate criterion - reduce impact to 0
+}
+```
+## Pattern Maturity States
+Patterns progress through lifecycle based on feedback accumulation:
+**candidate** → **established** → **proven** (or **deprecated**)
+### State Transitions
+**candidate (initial state):**
+- Total feedback < 3 events
+- Not enough data to judge
+- Multiplier: 0.5x
+**established:**
+- Total feedback >= 3 events
+- Has track record but not proven
+- Multiplier: 1.0x
+**proven:**
+- Decayed helpful >= 5 AND
+- Harmful ratio < 15%
+- Multiplier: 1.5x
+**deprecated:**
+- Harmful ratio > 30% AND
+- Total feedback >= 3 events
+- Multiplier: 0x (excluded)
+### Decay Applied to State Calculation
+State determination uses decayed counts, not raw counts:
+```typescript
+const { decayedHelpful, decayedHarmful } =
+  calculateDecayedCounts(feedbackEvents);
+const total = decayedHelpful + decayedHarmful;
+const harmfulRatio = decayedHarmful / total;
+// State logic applies to decayed values
+```
+Old feedback matters less. Pattern must maintain recent positive signal to stay proven.
+### Manual State Changes
+**Promote to proven:**
+```typescript
+promotePattern(maturity); // External validation confirms effectiveness
+```
+**Deprecate:**
+```typescript
+deprecatePattern(maturity, "Causes file conflicts in 80% of cases");
+```
+Cannot promote deprecated patterns. Must reset.
+### Multipliers in Decomposition
+Apply maturity multiplier to pattern scores:
+```typescript
+const multipliers = {
+  candidate: 0.5,
+  established: 1.0,
+  proven: 1.5,
+  deprecated: 0,
+};
+pattern_score = base_score * multipliers[maturity.state];
+```
+Proven patterns get 50% boost, deprecated patterns excluded entirely.
+## Anti-Pattern Inversion
+Failed patterns auto-convert to anti-patterns at >60% failure rate.
+### Inversion Threshold
+```typescript
+const total = pattern.success_count + pattern.failure_count;
+if (total >= 3 && pattern.failure_count / total >= 0.6) {
+  invertToAntiPattern(pattern, reason);
+}
+```
+**Minimum observations:** 3 total (prevents hasty inversion)
+**Failure ratio:** 60% (3+ failures in 5 attempts)
+### Inversion Process
+**Original pattern:**
+```typescript
+{
+  id: "pattern-123",
+  content: "Split by file type",
+  kind: "pattern",
+  is_negative: false,
+  success_count: 2,
+  failure_count: 5,
+}
+```
+**Inverted anti-pattern:**
+```typescript
+{
+  id: "anti-pattern-123",
+  content: "AVOID: Split by file type. Failed 5/7 times (71% failure rate)",
+  kind: "anti_pattern",
+  is_negative: true,
+  success_count: 2,
+  failure_count: 5,
+  reason: "Failed 5/7 times (71% failure rate)",
+}
+```
+### Recording Observations
+Track pattern outcomes to accumulate success/failure counts:
+```typescript
+recordPatternObservation(
+  pattern,
+  success: true,  // or false
+  beadId: "bd-123.1",
+)
+// Returns:
+{
+  pattern: updatedPattern,
+  inversion?: {
+    original: pattern,
+    inverted: antiPattern,
+    reason: "Failed 5/7 times (71% failure rate)",
+  }
+}
+```
+### Pattern Extraction
+Auto-detect strategies from decomposition descriptions:
+```typescript
+extractPatternsFromDescription(
+  "We'll split by file type, one file per subtask",
+);
+// Returns: ["Split by file type", "One file per subtask"]
+```
+**Detected strategies:**
+- Split by file type
+- Split by component
+- Split by layer (UI/logic/data)
+- Split by feature
+- One file per subtask
+- Handle shared types first
+- Separate API routes
+- Tests alongside implementation
+- Tests in separate subtask
+- Maximize parallelization
+- Sequential execution order
+- Respect dependency chain
+### Using Anti-Patterns in Prompts
+Format for decomposition prompt inclusion:
+```typescript
+formatAntiPatternsForPrompt(patterns);
+```
+**Output:**
+```markdown
+## Anti-Patterns to Avoid
+Based on past failures, avoid these decomposition strategies:
+- AVOID: Split by file type. Failed 12/15 times (80% failure rate)
+- AVOID: One file per subtask. Failed 8/10 times (80% failure rate)
+```
+## Error Accumulator
+Track errors during subtask execution for retry prompts and outcome scoring.
+### Error Types
+```typescript
+type ErrorType =
+  | "validation" // Schema/type errors
+  | "timeout" // Task exceeded time limit
+  | "conflict" // File reservation conflicts
+  | "tool_failure" // Tool invocation failed
+  | "unknown"; // Unclassified
+```
+### Recording Errors
+```typescript
+errorAccumulator.recordError(
+  beadId: "bd-123.1",
+  errorType: "validation",
+  message: "Type error in src/auth.ts",
+  options: {
+    stack_trace: "...",
+    tool_name: "typecheck",
+    context: "After adding OAuth types",
+  }
+)
+```
+### Generating Error Context
+Format accumulated errors for retry prompts:
+```typescript
+const context = await errorAccumulator.getErrorContext(
+  beadId: "bd-123.1",
+  includeResolved: false,
+)
+```
+**Output:**
+```markdown
+## Previous Errors
+The following errors were encountered during execution:
+### validation (2 errors)
+- **Type error in src/auth.ts**
+  - Context: After adding OAuth types
+  - Tool: typecheck
+  - Time: 12/12/2024, 10:30 AM
+- **Missing import in src/session.ts**
+  - Tool: typecheck
+  - Time: 12/12/2024, 10:35 AM
+**Action Required**: Address these errors before proceeding. Consider:
+- What caused each error?
+- How can you prevent similar errors?
+- Are there patterns across error types?
+```
+### Resolving Errors
+Mark errors resolved after fixing:
+```typescript
+await errorAccumulator.resolveError(errorId);
+```
+Resolved errors excluded from retry context by default.
+### Error Statistics
+Get error counts for outcome tracking:
+```typescript
+const stats = await errorAccumulator.getErrorStats("bd-123.1")
+// Returns:
+{
+  total: 5,
+  unresolved: 2,
+  by_type: {
+    validation: 3,
+    timeout: 1,
+    tool_failure: 1,
+  }
+}
+```
+Use `total` for `error_count` in outcome signals.
+## Using the Learning System
+### Integration Points
+**1. During decomposition (swarm_plan_prompt):**
+- Query CASS for similar tasks
+- Load pattern maturity records
+- Include proven patterns in prompt
+- Exclude deprecated patterns
+**2. During execution:**
+- ErrorAccumulator tracks errors
+- Record retry attempts
+- Track duration from start to completion
+**3. After completion (swarm_complete):**
+- Record outcome signals
+- Score implicit feedback
+- Update pattern observations
+- Check for anti-pattern inversions
+- Update maturity states
+### Full Workflow Example
+```typescript
+// 1. Decomposition phase
+const cass_results = cass_search({ query: "user authentication", limit: 5 });
+const patterns = loadPatterns(); // Get maturity records
+const prompt = swarm_plan_prompt({
+  task: "Add OAuth",
+  context: formatPatternsWithMaturityForPrompt(patterns),
+  query_cass: true,
+});
+// 2. Execution phase
+const errorAccumulator = new ErrorAccumulator();
+const startTime = Date.now();
+try {
+  // Work happens...
+  await implement_subtask();
+} catch (error) {
+  await errorAccumulator.recordError(
+    bead_id,
+    classifyError(error),
+    error.message,
+  );
+  retryCount++;
+}
+// 3. Completion phase
+const duration = Date.now() - startTime;
+const errorStats = await errorAccumulator.getErrorStats(bead_id);
+swarm_record_outcome({
+  bead_id,
+  duration_ms: duration,
+  error_count: errorStats.total,
+  retry_count: retryCount,
+  success: true,
+  files_touched: modifiedFiles,
+  strategy: "file-based",
+});
+// 4. Learning updates
+const scored = scoreImplicitFeedback({
+  bead_id,
+  duration_ms: duration,
+  error_count: errorStats.total,
+  retry_count: retryCount,
+  success: true,
+  timestamp: new Date().toISOString(),
+  strategy: "file-based",
+});
+// Update patterns
+for (const pattern of extractedPatterns) {
+  const { pattern: updated, inversion } = recordPatternObservation(
+    pattern,
+    scored.type === "helpful",
+    bead_id,
+  );
+  if (inversion) {
+    console.log(`Pattern inverted: ${inversion.reason}`);
+    storeAntiPattern(inversion.inverted);
+  }
+}
+```
+### Configuration Tuning
+Adjust thresholds based on project characteristics:
+```typescript
+const learningConfig = {
+  halfLifeDays: 90, // Decay speed
+  minFeedbackForAdjustment: 3, // Min observations for weight adjustment
+  maxHarmfulRatio: 0.3, // Max harmful % before deprecating criterion
+  fastCompletionThresholdMs: 300000, // 5 min = fast
+  slowCompletionThresholdMs: 1800000, // 30 min = slow
+  maxErrorsForHelpful: 2, // Max errors before marking harmful
+};
+const antiPatternConfig = {
+  minObservations: 3, // Min before inversion
+  failureRatioThreshold: 0.6, // 60% failure triggers inversion
+  antiPatternPrefix: "AVOID: ",
+};
+const maturityConfig = {
+  minFeedback: 3, // Min for leaving candidate state
+  minHelpful: 5, // Decayed helpful threshold for proven
+  maxHarmful: 0.15, // Max 15% harmful for proven
+  deprecationThreshold: 0.3, // 30% harmful triggers deprecation
+  halfLifeDays: 90,
+};
+```
+### Debugging Pattern Issues
+**Why is pattern not proven?**
+Check decayed counts:
+```typescript
+const feedback = await getFeedback(patternId);
+const { decayedHelpful, decayedHarmful } = calculateDecayedCounts(feedback);
+console.log({ decayedHelpful, decayedHarmful });
+// Need: decayedHelpful >= 5 AND harmfulRatio < 0.15
+```
+**Why was pattern inverted?**
+Check observation counts:
+```typescript
+const total = pattern.success_count + pattern.failure_count;
+const failureRatio = pattern.failure_count / total;
+console.log({ total, failureRatio });
+// Inverts if: total >= 3 AND failureRatio >= 0.6
+```
+**Why is criterion weight low?**
+Check feedback events:
+```typescript
+const events = await getFeedbackByCriterion("type_safe");
+const weight = calculateCriterionWeight(events);
+console.log(weight);
+// Shows: helpful vs harmful counts, last_validated date
+```
+## Storage Interfaces
+### FeedbackStorage
+Persist feedback events for criterion weight calculation:
+```typescript
+interface FeedbackStorage {
+  store(event: FeedbackEvent): Promise<void>;
+  getByCriterion(criterion: string): Promise<FeedbackEvent[]>;
+  getByBead(beadId: string): Promise<FeedbackEvent[]>;
+  getAll(): Promise<FeedbackEvent[]>;
+}
+```
+### ErrorStorage
+Persist errors for retry prompts:
+```typescript
+interface ErrorStorage {
+  store(entry: ErrorEntry): Promise<void>;
+  getByBead(beadId: string): Promise<ErrorEntry[]>;
+  getUnresolvedByBead(beadId: string): Promise<ErrorEntry[]>;
+  markResolved(id: string): Promise<void>;
+  getAll(): Promise<ErrorEntry[]>;
+}
+```
+### PatternStorage
+Persist decomposition patterns:
+```typescript
+interface PatternStorage {
+  store(pattern: DecompositionPattern): Promise<void>;
+  get(id: string): Promise<DecompositionPattern | null>;
+  getAll(): Promise<DecompositionPattern[]>;
+  getAntiPatterns(): Promise<DecompositionPattern[]>;
+  getByTag(tag: string): Promise<DecompositionPattern[]>;
+  findByContent(content: string): Promise<DecompositionPattern[]>;
+}
+```
+### MaturityStorage
+Persist pattern maturity records:
+```typescript
+interface MaturityStorage {
+  store(maturity: PatternMaturity): Promise<void>;
+  get(patternId: string): Promise<PatternMaturity | null>;
+  getAll(): Promise<PatternMaturity[]>;
+  getByState(state: MaturityState): Promise<PatternMaturity[]>;
+  storeFeedback(feedback: MaturityFeedback): Promise<void>;
+  getFeedback(patternId: string): Promise<MaturityFeedback[]>;
+}
+```
+In-memory implementations provided for testing. Production should use persistent storage (file-based JSONL or SQLite).