prjct-cli 1.19.0 → 1.20.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +74 -2
- package/core/__tests__/utils/retry.test.ts +381 -0
- package/core/agentic/tool-registry.ts +40 -12
- package/core/services/agent-generator.ts +35 -8
- package/core/services/agent-service.ts +17 -12
- package/core/utils/retry.ts +318 -0
- package/dist/bin/prjct.mjs +249 -18
- package/package.json +1 -1
package/CHANGELOG.md
CHANGED
@@ -1,11 +1,83 @@
 # Changelog

-## [1.
+## [1.20.0] - 2026-02-10

 ### Features

--
+- add retry with exponential backoff for agent and tool operations (#162)
+
+
+## [1.20.0] - 2026-02-09
+
+### Features
+
+- **Retry with exponential backoff for agent and tool operations** (PRJ-271): Comprehensive retry infrastructure with error classification and circuit breaker
+  - RetryPolicy utility with configurable attempts, delays, and exponential backoff (1s→2s→4s)
+  - Automatic error classification: transient (EBUSY, EAGAIN, ETIMEDOUT) vs permanent (ENOENT, EPERM)
+  - Circuit breaker protection: opens after 5 consecutive failures, auto-closes after 60s
+  - Agent initialization retries (3 attempts with 1s base delay)
+  - Tool operations retry (Read/Write/Bash with 2 attempts)
+  - Resilient parallel agent generation using Promise.allSettled()
+
+### Implementation Details
+
+Built RetryPolicy utility with exponential backoff, error classification, and circuit breaker. Integrated across agent initialization, tool operations, and parallel agent generation. The system now automatically retries transient failures while failing fast on permanent errors.
+
+**New modules:**
+- `core/utils/retry.ts` (320 lines) — Core retry infrastructure with RetryPolicy class, error classification, circuit breaker
+- `core/__tests__/utils/retry.test.ts` (380 lines) — 21 comprehensive tests with 53 assertions
+- `ACCEPTANCE-PRJ-271.md` — Full acceptance criteria verification (22 criteria verified)
+
+**Modified modules:**
+- `core/services/agent-service.ts` — Wrapped initialize() with retry policy (3 attempts)
+- `core/agentic/tool-registry.ts` — Added retry to Read/Write/Bash tools (2 attempts each)
+- `core/services/agent-generator.ts` — Changed to Promise.allSettled() with per-agent retry
+
+**Key features:**
+- Exponential backoff: 1s, 2s, 4s (configurable base/max)
+- Error classification: automatic transient vs permanent detection
+- Circuit breaker: per-operation tracking, 5 failure threshold, 60s cooldown
+- Two default policies: defaultAgentRetryPolicy (3 attempts), defaultToolRetryPolicy (2 attempts)
+- Zero breaking changes: all 968 existing tests pass
+
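For orientation, here is a minimal usage sketch of the RetryPolicy API these notes describe. The constructor options and `execute()` signature come from `core/utils/retry.ts` later in this diff; the wrapped file read, the operation ID, and the import path are illustrative placeholders, not part of the package.

```typescript
// Hypothetical consumer code; RetryPolicy matches the API in core/utils/retry.ts below.
import fs from 'node:fs/promises'
import { RetryPolicy, isPermanentError } from './core/utils/retry' // assumed path to the vendored module

// Same settings as defaultAgentRetryPolicy: 3 attempts, 1s base delay, 8s cap
const policy = new RetryPolicy({ maxAttempts: 3, baseDelayMs: 1000, maxDelayMs: 8000 })

async function loadConfig(path: string): Promise<string | null> {
  try {
    // Transient errors (EBUSY, EAGAIN, ETIMEDOUT, ...) are retried with exponential backoff;
    // permanent errors (ENOENT, EPERM, ...) and open-circuit errors are rethrown immediately.
    return await policy.execute(async () => fs.readFile(path, 'utf-8'), `load-config-${path}`)
  } catch (error) {
    if (isPermanentError(error)) return null // e.g. the file genuinely does not exist
    throw error
  }
}
```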
+### Learnings
+
+- **RetryPolicy pattern:** Wrapping operations with retry execution provides clean separation of retry logic from business logic
+- **Error classification strategies:** Using error code sets (EBUSY, EAGAIN) for transient vs (ENOENT, EPERM) for permanent enables automatic decision-making
+- **Promise.allSettled() for resilient parallel operations:** Prevents one failure from blocking other operations, enables partial success
+- **Circuit breaker implementation:** Per-operation state tracking prevents cascading failures while allowing recovery
+
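The Promise.allSettled() learning is the pattern agent-generator.ts adopts later in this diff; as a generic sketch under assumed names (`tasks` and `runTask` are illustrative and not part of the package):

```typescript
import { RetryPolicy } from './core/utils/retry' // assumed path to the vendored module

const retry = new RetryPolicy({ maxAttempts: 2, baseDelayMs: 500, maxDelayMs: 2000 })

async function runAll(tasks: string[], runTask: (name: string) => Promise<void>) {
  // Each task gets its own retry wrapper; allSettled keeps one rejection from aborting the batch.
  const results = await Promise.allSettled(
    tasks.map((name) => retry.execute(() => runTask(name), `task-${name}`))
  )
  const failed = tasks.filter((_, i) => results[i].status === 'rejected')
  return { succeeded: tasks.length - failed.length, failed }
}
```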
+### Test Plan
+
+#### For QA
+
+1. **Agent Initialization Retry**
+   - Temporarily make file system busy during agent initialization
+   - Verify agent initialization retries up to 3 times
+   - Confirm permanent errors (unsupported agent) fail immediately
+
+2. **Tool Operations Retry**
+   - Test Read/Write/Bash with transient errors (EBUSY, ETIMEDOUT)
+   - Verify operations retry automatically (2 attempts)
+   - Confirm permanent errors (ENOENT, EPERM) return null/false without retry
+
+3. **Circuit Breaker**
+   - Trigger 5 consecutive failures on same operation
+   - Verify circuit breaker opens and blocks further attempts
+   - Wait 60 seconds and verify circuit closes automatically
+
+4. **Parallel Agent Generation**
+   - Simulate one agent generation failure during sync
+   - Verify other agents generate successfully (Promise.allSettled behavior)
+   - Check logs for failure warnings
+
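One way QA could exercise the transient-retry path without touching the real file system is to feed RetryPolicy a fake operation that fails with EBUSY before succeeding. A sketch using the bun:test APIs already used by this package's test suite; the test name, operation ID, and timing values are assumptions for illustration:

```typescript
import { describe, expect, it } from 'bun:test'
import { RetryPolicy } from '../../utils/retry' // path as used by the package's own tests

describe('QA: transient failure simulation', () => {
  it('retries EBUSY and eventually succeeds', async () => {
    const policy = new RetryPolicy({ maxAttempts: 3, baseDelayMs: 10, maxDelayMs: 40 })
    let attempts = 0
    const flaky = async () => {
      attempts++
      if (attempts < 3) throw { code: 'EBUSY' } // simulated "file system busy"
      return 'ok'
    }
    expect(await policy.execute(flaky, 'qa-sim')).toBe('ok')
    expect(attempts).toBe(3)
  })
})
```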
+#### For Users
+
+**What changed:** The system is now more resilient against transient failures. Operations like agent initialization, file reads/writes, and command execution will automatically retry when they encounter temporary errors (disk busy, timeouts, etc).
+
+**How to use:** No action required - retry logic works automatically. Users will experience fewer random failures during normal operations.

+**Breaking changes:** None. All changes are backward compatible. Existing tests (968 total) all pass.

 ## [1.19.0] - 2026-02-09


package/core/__tests__/utils/retry.test.ts
ADDED

/**
 * Retry Policy Tests
 * Tests for exponential backoff, error classification, and circuit breaker
 */

import { afterEach, beforeEach, describe, expect, it } from 'bun:test'
import {
  defaultAgentRetryPolicy,
  defaultToolRetryPolicy,
  isPermanentError,
  isTransientError,
  RetryPolicy,
} from '../../utils/retry'

describe('RetryPolicy', () => {
  let policy: RetryPolicy

  beforeEach(() => {
    policy = new RetryPolicy({
      maxAttempts: 3,
      baseDelayMs: 100, // Shorter delays for tests
      maxDelayMs: 400,
      circuitBreakerThreshold: 5,
      circuitBreakerTimeoutMs: 1000,
    })
    // Reset all circuits before each test
    policy.resetAllCircuits()
  })

  afterEach(() => {
    policy.resetAllCircuits()
  })

  describe('Error Classification', () => {
    it('should identify transient errors correctly', () => {
      const transientErrors = [
        { code: 'EBUSY' },
        { code: 'EAGAIN' },
        { code: 'ETIMEDOUT' },
        { code: 'ECONNRESET' },
        { code: 'ECONNREFUSED' },
        { message: 'Operation timed out' },
        { message: 'Request timeout' },
      ]

      for (const error of transientErrors) {
        expect(isTransientError(error)).toBe(true)
      }
    })

    it('should identify permanent errors correctly', () => {
      const permanentErrors = [
        { code: 'ENOENT' },
        { code: 'EACCES' },
        { code: 'EPERM' },
        { code: 'EISDIR' },
        { code: 'ENOTDIR' },
        { code: 'EINVAL' },
      ]

      for (const error of permanentErrors) {
        expect(isPermanentError(error)).toBe(true)
        expect(isTransientError(error)).toBe(false)
      }
    })

    it('should not classify unknown errors as transient', () => {
      const unknownErrors = [
        { code: 'UNKNOWN' },
        { message: 'Unknown error' },
        new Error('Generic error'),
      ]

      for (const error of unknownErrors) {
        expect(isTransientError(error)).toBe(false)
      }
    })
  })

  describe('Successful Operations', () => {
    it('should execute successful operation without retry', async () => {
      let attempts = 0
      const operation = async () => {
        attempts++
        return 'success'
      }

      const result = await policy.execute(operation, 'test-op')

      expect(result).toBe('success')
      expect(attempts).toBe(1)
    })

    it('should reset circuit breaker after success', async () => {
      // Force some failures to increment circuit state
      let failCount = 0
      const failOperation = async () => {
        failCount++
        throw { code: 'EBUSY' }
      }

      try {
        await policy.execute(failOperation, 'test-op')
      } catch {
        // Expected to fail
      }

      expect(failCount).toBe(3) // maxAttempts

      // Now succeed - should reset circuit
      const successOperation = async () => 'success'
      await policy.execute(successOperation, 'test-op')

      const circuitState = policy.getCircuitState('test-op')
      expect(circuitState).toBeUndefined()
    })
  })

  describe('Transient Error Retry', () => {
    it('should retry transient errors and succeed', async () => {
      let attempts = 0
      const operation = async () => {
        attempts++
        if (attempts < 3) {
          throw { code: 'EBUSY' } // Transient error
        }
        return 'success'
      }

      const result = await policy.execute(operation, 'test-op')

      expect(result).toBe('success')
      expect(attempts).toBe(3)
    })

    it('should apply exponential backoff between retries', async () => {
      const timestamps: number[] = []
      let attempts = 0

      const operation = async () => {
        timestamps.push(Date.now())
        attempts++
        if (attempts < 3) {
          throw { code: 'ETIMEDOUT' }
        }
        return 'success'
      }

      await policy.execute(operation, 'test-op')

      // Check delays: should be ~100ms, ~200ms (with some tolerance)
      const delay1 = timestamps[1] - timestamps[0]
      const delay2 = timestamps[2] - timestamps[1]

      expect(delay1).toBeGreaterThanOrEqual(90) // 100ms with tolerance
      expect(delay1).toBeLessThan(150)

      expect(delay2).toBeGreaterThanOrEqual(180) // 200ms with tolerance
      expect(delay2).toBeLessThan(250)
    })

    it('should respect maxDelayMs cap', async () => {
      const policy = new RetryPolicy({
        maxAttempts: 5,
        baseDelayMs: 100,
        maxDelayMs: 200,
      })

      const timestamps: number[] = []
      let attempts = 0

      const operation = async () => {
        timestamps.push(Date.now())
        attempts++
        if (attempts < 5) {
          throw { code: 'EBUSY' }
        }
        return 'success'
      }

      await policy.execute(operation, 'test-op')

      // Last delay should not exceed maxDelayMs (check delay between attempt 3 and 4)
      // Attempt 1: no delay
      // Attempt 2: 100ms delay (baseDelayMs * 2^0)
      // Attempt 3: 200ms delay (baseDelayMs * 2^1, capped at maxDelayMs)
      // Attempt 4: 200ms delay (baseDelayMs * 2^2 = 400ms, capped at 200ms)
      const delay3 = timestamps[3] - timestamps[2]
      expect(delay3).toBeLessThanOrEqual(250) // 200ms + tolerance
      expect(delay3).toBeGreaterThanOrEqual(180)
    })

    it('should throw if all retry attempts fail with transient error', async () => {
      let attempts = 0
      const operation = async () => {
        attempts++
        throw { code: 'EAGAIN' }
      }

      await expect(policy.execute(operation, 'test-op')).rejects.toMatchObject({
        code: 'EAGAIN',
      })

      expect(attempts).toBe(3) // maxAttempts
    })
  })

  describe('Permanent Error Handling', () => {
    it('should fail fast on permanent errors without retry', async () => {
      let attempts = 0
      const operation = async () => {
        attempts++
        throw { code: 'ENOENT' } // Permanent error
      }

      await expect(policy.execute(operation, 'test-op')).rejects.toMatchObject({
        code: 'ENOENT',
      })

      expect(attempts).toBe(1) // No retry
    })

    it('should fail fast on permission denied', async () => {
      let attempts = 0
      const operation = async () => {
        attempts++
        throw { code: 'EPERM' }
      }

      await expect(policy.execute(operation, 'test-op')).rejects.toMatchObject({
        code: 'EPERM',
      })

      expect(attempts).toBe(1)
    })

    it('should record failure for permanent errors', async () => {
      const operation = async () => {
        throw { code: 'ENOENT' }
      }

      try {
        await policy.execute(operation, 'perm-op')
      } catch {
        // Expected
      }

      const state = policy.getCircuitState('perm-op')
      expect(state?.consecutiveFailures).toBe(1)
    })
  })

  describe('Circuit Breaker', () => {
    it('should open circuit after threshold failures', async () => {
      const operation = async () => {
        throw { code: 'EBUSY' }
      }

      // Execute 5 times to reach threshold (each attempt counts as 1 failure)
      for (let i = 0; i < 5; i++) {
        try {
          await policy.execute(operation, 'circuit-op')
        } catch {
          // Expected
        }
      }

      // Circuit should now be open
      expect(policy.isCircuitOpen('circuit-op')).toBe(true)

      // Next call should fail immediately with circuit breaker error
      await expect(policy.execute(operation, 'circuit-op')).rejects.toThrow(
        /Circuit breaker is open/
      )
    })

    it('should close circuit after timeout', async () => {
      const policy = new RetryPolicy({
        maxAttempts: 3,
        baseDelayMs: 10,
        maxDelayMs: 50,
        circuitBreakerThreshold: 3,
        circuitBreakerTimeoutMs: 100, // Short timeout for test
      })

      const operation = async () => {
        throw { code: 'ETIMEDOUT' }
      }

      // Trigger circuit breaker
      for (let i = 0; i < 3; i++) {
        try {
          await policy.execute(operation, 'timeout-op')
        } catch {
          // Expected
        }
      }

      expect(policy.isCircuitOpen('timeout-op')).toBe(true)

      // Wait for timeout
      await new Promise((resolve) => setTimeout(resolve, 150))

      // Circuit should be closed now
      expect(policy.isCircuitOpen('timeout-op')).toBe(false)
    })

    it('should track failures per operation independently', async () => {
      const operation = async () => {
        throw { code: 'EAGAIN' }
      }

      // Fail operation A multiple times
      for (let i = 0; i < 3; i++) {
        try {
          await policy.execute(operation, 'op-a')
        } catch {
          // Expected
        }
      }

      const stateA = policy.getCircuitState('op-a')
      const stateB = policy.getCircuitState('op-b')

      expect(stateA?.consecutiveFailures).toBe(3)
      expect(stateB).toBeUndefined()
    })
  })

  describe('Default Policies', () => {
    it('should have agent retry policy with correct defaults', () => {
      // Test that default agent policy is configured correctly
      expect(defaultAgentRetryPolicy).toBeInstanceOf(RetryPolicy)
      // We can't directly inspect options, but we can test behavior
    })

    it('should have tool retry policy with correct defaults', () => {
      expect(defaultToolRetryPolicy).toBeInstanceOf(RetryPolicy)
    })

    it('should retry agent operations 3 times', async () => {
      let attempts = 0
      const operation = async () => {
        attempts++
        if (attempts < 3) {
          throw { code: 'EBUSY' }
        }
        return 'success'
      }

      await defaultAgentRetryPolicy.execute(operation, 'agent-test')
      expect(attempts).toBe(3)
    })
  })

  describe('Edge Cases', () => {
    it('should handle non-Error objects', async () => {
      const operation = async () => {
        throw 'string error'
      }

      await expect(policy.execute(operation, 'edge-op')).rejects.toBe('string error')
    })

    it('should handle null/undefined errors', async () => {
      const operation = async () => {
        throw null
      }

      await expect(policy.execute(operation, 'null-op')).rejects.toBeNull()
    })

    it('should handle errors without code property', async () => {
      const operation = async () => {
        throw new Error('Generic error')
      }

      await expect(policy.execute(operation, 'generic-op')).rejects.toThrow('Generic error')
    })
  })
})

package/core/agentic/tool-registry.ts
CHANGED
@@ -10,6 +10,7 @@ import { exec } from 'node:child_process'
 import fs from 'node:fs/promises'
 import { promisify } from 'node:util'
 import type { ToolFunction, ToolRegistryInterface } from '../types'
+import { defaultToolRetryPolicy, isPermanentError, isTransientError } from '../utils/retry'

 // Re-export types for convenience
 export type { ToolFunction, ToolRegistryInterface } from '../types'
@@ -61,36 +62,63 @@ const toolRegistry: ToolRegistryInterface = {

 // Register built-in tools

-// Read file
+// Read file with retry for transient errors
 toolRegistry.register('Read', async (filePath: unknown): Promise<string | null> => {
   try {
-    return await
-
-
+    return await defaultToolRetryPolicy.execute(
+      async () => await fs.readFile(filePath as string, 'utf-8'),
+      `read-${filePath}`
+    )
+  } catch (error) {
+    // Permanent errors (ENOENT, EPERM) - return null (expected)
+    if (isPermanentError(error)) {
+      return null
+    }
+    // Transient errors exhausted retries - return null
+    if (isTransientError(error)) {
+      return null
+    }
+    // Unknown errors - return null (fail gracefully)
     return null
   }
 })

-// Write file
+// Write file with retry for transient errors
 toolRegistry.register('Write', async (filePath: unknown, content: unknown): Promise<boolean> => {
   try {
-    await
+    await defaultToolRetryPolicy.execute(
+      async () => await fs.writeFile(filePath as string, content as string, 'utf-8'),
+      `write-${filePath}`
+    )
     return true
-  } catch (
-    //
+  } catch (error) {
+    // Permanent errors (EPERM, EISDIR) - return false (expected)
+    if (isPermanentError(error)) {
+      return false
+    }
+    // Transient errors exhausted retries - return false
+    if (isTransientError(error)) {
+      return false
+    }
+    // Unknown errors - return false (fail gracefully)
     return false
   }
 })

-// Execute bash command
+// Execute bash command with retry for transient errors
 toolRegistry.register(
   'Bash',
   async (command: unknown): Promise<{ stdout: string; stderr: string }> => {
     try {
-
-
+      return await defaultToolRetryPolicy.execute(
+        async () => await execAsync(command as string),
+        `bash-${command}`
+      )
     } catch (error) {
-      const err = error as { stdout?: string; stderr?: string; message?: string }
+      const err = error as { stdout?: string; stderr?: string; message?: string; code?: string }
+
+      // For command execution errors, return output with error in stderr
+      // This maintains the existing behavior while adding retry for transient errors
       return {
         stdout: err.stdout || '',
         stderr: err.stderr || err.message || 'Command failed',

package/core/services/agent-generator.ts
CHANGED
@@ -14,6 +14,7 @@ import {
   mergePreservedSections,
   validatePreserveBlocks,
 } from '../utils/preserve-sections'
+import { defaultToolRetryPolicy } from '../utils/retry'
 import type { StackDetection } from './stack-detector'

 // ============================================================================
@@ -169,16 +170,42 @@ export class AgentGenerator {
       agentsToGenerate.push({ name: 'devops', skill: 'developer-kit' })
     }

-    // Generate all domain agents IN PARALLEL
-
-
+    // Generate all domain agents IN PARALLEL with individual retry
+    // Using Promise.allSettled() so one failure doesn't block others
+    const results = await Promise.allSettled(
+      agentsToGenerate.map((agent) =>
+        defaultToolRetryPolicy.execute(
+          async () => await this.generateDomainAgent(agent.name, stats, stack),
+          `generate-agent-${agent.name}`
+        )
+      )
     )

-
-
-
-
-
+    // Track which agents succeeded and which failed
+    const successfulAgents: AgentInfo[] = []
+    const failedAgents: string[] = []
+
+    for (let i = 0; i < results.length; i++) {
+      const result = results[i]
+      const agent = agentsToGenerate[i]
+
+      if (result.status === 'fulfilled') {
+        successfulAgents.push({
+          name: agent.name,
+          type: 'domain' as const,
+          skill: agent.skill,
+        })
+      } else {
+        failedAgents.push(agent.name)
+        // Log failure but continue (don't throw)
+        console.warn(`[prjct] Warning: Failed to generate agent: ${agent.name}`)
+        if (result.reason) {
+          console.warn(`[prjct] Reason: ${result.reason.message || result.reason}`)
+        }
+      }
+    }
+
+    return successfulAgents
   }

   /**

package/core/services/agent-service.ts
CHANGED
@@ -8,6 +8,7 @@ import AgentRouter from '../agentic/agent-router'
 import { AgentError } from '../errors'
 import * as agentDetector from '../infrastructure/agent-detector'
 import type { AgentAssignmentResult, AgentInfo, ProjectContext } from '../types'
+import { defaultAgentRetryPolicy } from '../utils/retry'

 // Valid agent types - whitelist for security (prevents path traversal)
 const VALID_AGENT_TYPES = ['claude'] as const
@@ -24,26 +25,30 @@ export class AgentService {

   /**
    * Initialize agent (Claude Code, Desktop, or Terminal)
+   * Wrapped with retry policy to handle transient failures
    */
   async initialize(): Promise<unknown> {
     if (this.agent) return this.agent

-
+    // Wrap initialization with retry policy (3 attempts, exponential backoff)
+    return await defaultAgentRetryPolicy.execute(async () => {
+      this.agentInfo = await agentDetector.detect()

-
-
-
+      if (!this.agentInfo?.isSupported) {
+        throw AgentError.notSupported(this.agentInfo?.type ?? 'unknown')
+      }

-
-
-
-
-
+      // Security: validate agent type against whitelist to prevent path traversal
+      const agentType = this.agentInfo.type as ValidAgentType
+      if (!agentType || !VALID_AGENT_TYPES.includes(agentType)) {
+        throw AgentError.notSupported(this.agentInfo?.type ?? 'unknown')
+      }

-
-
+      const { default: Agent } = await import(`../infrastructure/${agentType}-agent`)
+      this.agent = new Agent()

-
+      return this.agent
+    }, 'agent-initialization')
   }

   /**

package/core/utils/retry.ts
ADDED

/**
 * Retry Policy Utility
 *
 * Provides exponential backoff retry logic with error classification and circuit breaker.
 * Used to make agent and tool operations resilient against transient failures.
 *
 * @module utils/retry
 * @version 1.0.0
 */

// =============================================================================
// Types
// =============================================================================

export interface RetryOptions {
  /** Maximum number of retry attempts (default: 3) */
  maxAttempts: number

  /** Base delay in milliseconds for exponential backoff (default: 1000) */
  baseDelayMs: number

  /** Maximum delay in milliseconds (default: 8000) */
  maxDelayMs: number

  /** Number of consecutive failures before opening circuit (default: 5) */
  circuitBreakerThreshold?: number

  /** Time in milliseconds to keep circuit open (default: 60000) */
  circuitBreakerTimeoutMs?: number
}

export interface CircuitState {
  consecutiveFailures: number
  openedAt: number | null
}

// =============================================================================
// Error Classification
// =============================================================================

/**
 * Node.js error codes that indicate transient failures worth retrying
 */
const TRANSIENT_ERROR_CODES = new Set([
  'EBUSY', // Resource busy
  'EAGAIN', // Resource temporarily unavailable
  'ETIMEDOUT', // Operation timed out
  'ECONNRESET', // Connection reset by peer
  'ECONNREFUSED', // Connection refused (may be temporary)
  'ENOTFOUND', // DNS lookup failed (may be temporary)
  'EAI_AGAIN', // DNS temporary failure
])

/**
 * Node.js error codes that indicate permanent failures (fail fast)
 */
const PERMANENT_ERROR_CODES = new Set([
  'ENOENT', // No such file or directory
  'EACCES', // Permission denied
  'EPERM', // Operation not permitted
  'EISDIR', // Is a directory
  'ENOTDIR', // Not a directory
  'EINVAL', // Invalid argument
])

/**
 * Check if an error is transient (worth retrying)
 */
export function isTransientError(error: unknown): boolean {
  if (!error || typeof error !== 'object') {
    return false
  }

  const err = error as { code?: string; errno?: number; message?: string }

  // Check error code
  if (err.code && TRANSIENT_ERROR_CODES.has(err.code)) {
    return true
  }

  // Permanent errors should never be retried
  if (err.code && PERMANENT_ERROR_CODES.has(err.code)) {
    return false
  }

  // Check message for timeout indicators
  if (err.message) {
    const msg = err.message.toLowerCase()
    if (msg.includes('timeout') || msg.includes('timed out')) {
      return true
    }
  }

  // Unknown errors are not retried by default (fail fast)
  return false
}

/**
 * Check if an error is permanent (should not retry)
 */
export function isPermanentError(error: unknown): boolean {
  if (!error || typeof error !== 'object') {
    return false
  }

  const err = error as { code?: string }
  return !!(err.code && PERMANENT_ERROR_CODES.has(err.code))
}

// =============================================================================
// Circuit Breaker
// =============================================================================

/**
 * Circuit breaker state registry (per operation ID)
 */
const circuitStates = new Map<string, CircuitState>()

/**
 * Check if circuit is open for a given operation
 */
function isCircuitOpen(operationId: string, threshold: number, timeoutMs: number): boolean {
  const state = circuitStates.get(operationId)
  if (!state) {
    return false
  }

  // Circuit is open if threshold exceeded
  if (state.consecutiveFailures >= threshold && state.openedAt) {
    const elapsed = Date.now() - state.openedAt
    // Circuit closes after timeout
    if (elapsed >= timeoutMs) {
      // Reset circuit
      circuitStates.delete(operationId)
      return false
    }
    return true
  }

  return false
}

/**
 * Record a failure for circuit breaker
 */
function recordFailure(operationId: string, threshold: number): void {
  const state = circuitStates.get(operationId) || {
    consecutiveFailures: 0,
    openedAt: null,
  }

  state.consecutiveFailures++

  // Open circuit if threshold reached
  if (state.consecutiveFailures >= threshold && !state.openedAt) {
    state.openedAt = Date.now()
  }

  circuitStates.set(operationId, state)
}

/**
 * Record a success (reset circuit breaker)
 */
function recordSuccess(operationId: string): void {
  circuitStates.delete(operationId)
}

// =============================================================================
// Retry Policy
// =============================================================================

export class RetryPolicy {
  private options: Required<RetryOptions>

  constructor(options: Partial<RetryOptions> = {}) {
    this.options = {
      maxAttempts: options.maxAttempts ?? 3,
      baseDelayMs: options.baseDelayMs ?? 1000,
      maxDelayMs: options.maxDelayMs ?? 8000,
      circuitBreakerThreshold: options.circuitBreakerThreshold ?? 5,
      circuitBreakerTimeoutMs: options.circuitBreakerTimeoutMs ?? 60000,
    }
  }

  /**
   * Execute an operation with retry logic
   *
   * @param operation - Async function to execute
   * @param operationId - Optional ID for circuit breaker tracking
   * @returns Result of the operation
   * @throws Error if all attempts fail or circuit is open
   */
  async execute<T>(operation: () => Promise<T>, operationId: string = 'default'): Promise<T> {
    // Check circuit breaker
    if (
      isCircuitOpen(
        operationId,
        this.options.circuitBreakerThreshold,
        this.options.circuitBreakerTimeoutMs
      )
    ) {
      throw new Error(
        `Circuit breaker is open for operation: ${operationId}. Too many consecutive failures.`
      )
    }

    let lastError: unknown
    let attempt = 0

    while (attempt < this.options.maxAttempts) {
      try {
        const result = await operation()
        // Success - reset circuit breaker
        recordSuccess(operationId)
        return result
      } catch (error) {
        lastError = error
        attempt++

        // Check if error is permanent (fail fast)
        if (isPermanentError(error)) {
          recordFailure(operationId, this.options.circuitBreakerThreshold)
          throw error
        }

        // Check if error is transient and we have attempts left
        const shouldRetry = isTransientError(error) && attempt < this.options.maxAttempts

        if (!shouldRetry) {
          // Not transient or out of attempts
          recordFailure(operationId, this.options.circuitBreakerThreshold)
          throw error
        }

        // Calculate delay with exponential backoff
        const delay = Math.min(
          this.options.baseDelayMs * 2 ** (attempt - 1),
          this.options.maxDelayMs
        )

        // Wait before retry
        await new Promise((resolve) => setTimeout(resolve, delay))
      }
    }

    // All attempts failed
    recordFailure(operationId, this.options.circuitBreakerThreshold)
    throw lastError
  }

  /**
   * Check if an error is transient (exposed for testing)
   */
  isTransientError(error: unknown): boolean {
    return isTransientError(error)
  }

  /**
   * Check if circuit is open for an operation (exposed for testing)
   */
  isCircuitOpen(operationId: string): boolean {
    return isCircuitOpen(
      operationId,
      this.options.circuitBreakerThreshold,
      this.options.circuitBreakerTimeoutMs
    )
  }

  /**
   * Get current circuit state for an operation (exposed for testing)
   */
  getCircuitState(operationId: string): CircuitState | undefined {
    return circuitStates.get(operationId)
  }

  /**
   * Reset circuit breaker for an operation (exposed for testing)
   */
  resetCircuit(operationId: string): void {
    circuitStates.delete(operationId)
  }

  /**
   * Reset all circuit breakers (exposed for testing)
   */
  resetAllCircuits(): void {
    circuitStates.clear()
  }
}

// =============================================================================
// Exports
// =============================================================================

/**
 * Default retry policy for agent operations
 * - 3 attempts
 * - 1s base delay
 * - Up to 8s max delay
 */
export const defaultAgentRetryPolicy = new RetryPolicy({
  maxAttempts: 3,
  baseDelayMs: 1000,
  maxDelayMs: 8000,
})

/**
 * Retry policy for tool operations (less aggressive)
 * - 2 attempts
 * - 500ms base delay
 * - Up to 2s max delay
 */
export const defaultToolRetryPolicy = new RetryPolicy({
  maxAttempts: 2,
  baseDelayMs: 500,
  maxDelayMs: 2000,
})
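To make the backoff schedule above concrete: the delay before retry n is min(baseDelayMs * 2^(n-1), maxDelayMs). A small sketch evaluating that formula for the two default policies (the helper below is illustrative, not part of the package):

```typescript
// Worked example of the delay formula used in RetryPolicy.execute() above.
const delay = (attempt: number, baseDelayMs: number, maxDelayMs: number) =>
  Math.min(baseDelayMs * 2 ** (attempt - 1), maxDelayMs)

// defaultAgentRetryPolicy (base 1000ms, cap 8000ms): 1000, 2000, 4000, 8000, 8000, ...
console.log([1, 2, 3, 4, 5].map((n) => delay(n, 1000, 8000)))

// defaultToolRetryPolicy (base 500ms, cap 2000ms): 500, 1000, 2000, 2000, ...
console.log([1, 2, 3, 4].map((n) => delay(n, 500, 2000)))
```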
package/dist/bin/prjct.mjs
CHANGED
@@ -18857,6 +18857,211 @@ When fragmenting tasks:
   }
 });

+// core/utils/retry.ts
+function isTransientError(error) {
+  if (!error || typeof error !== "object") {
+    return false;
+  }
+  const err = error;
+  if (err.code && TRANSIENT_ERROR_CODES.has(err.code)) {
+    return true;
+  }
+  if (err.code && PERMANENT_ERROR_CODES.has(err.code)) {
+    return false;
+  }
+  if (err.message) {
+    const msg = err.message.toLowerCase();
+    if (msg.includes("timeout") || msg.includes("timed out")) {
+      return true;
+    }
+  }
+  return false;
+}
+function isPermanentError(error) {
+  if (!error || typeof error !== "object") {
+    return false;
+  }
+  const err = error;
+  return !!(err.code && PERMANENT_ERROR_CODES.has(err.code));
+}
+function isCircuitOpen(operationId, threshold, timeoutMs) {
+  const state = circuitStates.get(operationId);
+  if (!state) {
+    return false;
+  }
+  if (state.consecutiveFailures >= threshold && state.openedAt) {
+    const elapsed = Date.now() - state.openedAt;
+    if (elapsed >= timeoutMs) {
+      circuitStates.delete(operationId);
+      return false;
+    }
+    return true;
+  }
+  return false;
+}
+function recordFailure(operationId, threshold) {
+  const state = circuitStates.get(operationId) || {
+    consecutiveFailures: 0,
+    openedAt: null
+  };
+  state.consecutiveFailures++;
+  if (state.consecutiveFailures >= threshold && !state.openedAt) {
+    state.openedAt = Date.now();
+  }
+  circuitStates.set(operationId, state);
+}
+function recordSuccess(operationId) {
+  circuitStates.delete(operationId);
+}
+var TRANSIENT_ERROR_CODES, PERMANENT_ERROR_CODES, circuitStates, RetryPolicy, defaultAgentRetryPolicy, defaultToolRetryPolicy;
+var init_retry = __esm({
+  "core/utils/retry.ts"() {
+    "use strict";
+    TRANSIENT_ERROR_CODES = /* @__PURE__ */ new Set([
+      "EBUSY",
+      // Resource busy
+      "EAGAIN",
+      // Resource temporarily unavailable
+      "ETIMEDOUT",
+      // Operation timed out
+      "ECONNRESET",
+      // Connection reset by peer
+      "ECONNREFUSED",
+      // Connection refused (may be temporary)
+      "ENOTFOUND",
+      // DNS lookup failed (may be temporary)
+      "EAI_AGAIN"
+      // DNS temporary failure
+    ]);
+    PERMANENT_ERROR_CODES = /* @__PURE__ */ new Set([
+      "ENOENT",
+      // No such file or directory
+      "EACCES",
+      // Permission denied
+      "EPERM",
+      // Operation not permitted
+      "EISDIR",
+      // Is a directory
+      "ENOTDIR",
+      // Not a directory
+      "EINVAL"
+      // Invalid argument
+    ]);
+    __name(isTransientError, "isTransientError");
+    __name(isPermanentError, "isPermanentError");
+    circuitStates = /* @__PURE__ */ new Map();
+    __name(isCircuitOpen, "isCircuitOpen");
+    __name(recordFailure, "recordFailure");
+    __name(recordSuccess, "recordSuccess");
+    RetryPolicy = class {
+      static {
+        __name(this, "RetryPolicy");
+      }
+      options;
+      constructor(options = {}) {
+        this.options = {
+          maxAttempts: options.maxAttempts ?? 3,
+          baseDelayMs: options.baseDelayMs ?? 1e3,
+          maxDelayMs: options.maxDelayMs ?? 8e3,
+          circuitBreakerThreshold: options.circuitBreakerThreshold ?? 5,
+          circuitBreakerTimeoutMs: options.circuitBreakerTimeoutMs ?? 6e4
+        };
+      }
+      /**
+       * Execute an operation with retry logic
+       *
+       * @param operation - Async function to execute
+       * @param operationId - Optional ID for circuit breaker tracking
+       * @returns Result of the operation
+       * @throws Error if all attempts fail or circuit is open
+       */
+      async execute(operation, operationId = "default") {
+        if (isCircuitOpen(
+          operationId,
+          this.options.circuitBreakerThreshold,
+          this.options.circuitBreakerTimeoutMs
+        )) {
+          throw new Error(
+            `Circuit breaker is open for operation: ${operationId}. Too many consecutive failures.`
+          );
+        }
+        let lastError;
+        let attempt = 0;
+        while (attempt < this.options.maxAttempts) {
+          try {
+            const result = await operation();
+            recordSuccess(operationId);
+            return result;
+          } catch (error) {
+            lastError = error;
+            attempt++;
+            if (isPermanentError(error)) {
+              recordFailure(operationId, this.options.circuitBreakerThreshold);
+              throw error;
+            }
+            const shouldRetry = isTransientError(error) && attempt < this.options.maxAttempts;
+            if (!shouldRetry) {
+              recordFailure(operationId, this.options.circuitBreakerThreshold);
+              throw error;
+            }
+            const delay = Math.min(
+              this.options.baseDelayMs * 2 ** (attempt - 1),
+              this.options.maxDelayMs
+            );
+            await new Promise((resolve) => setTimeout(resolve, delay));
+          }
+        }
+        recordFailure(operationId, this.options.circuitBreakerThreshold);
+        throw lastError;
+      }
+      /**
+       * Check if an error is transient (exposed for testing)
+       */
+      isTransientError(error) {
+        return isTransientError(error);
+      }
+      /**
+       * Check if circuit is open for an operation (exposed for testing)
+       */
+      isCircuitOpen(operationId) {
+        return isCircuitOpen(
+          operationId,
+          this.options.circuitBreakerThreshold,
+          this.options.circuitBreakerTimeoutMs
+        );
+      }
+      /**
+       * Get current circuit state for an operation (exposed for testing)
+       */
+      getCircuitState(operationId) {
+        return circuitStates.get(operationId);
+      }
+      /**
+       * Reset circuit breaker for an operation (exposed for testing)
+       */
+      resetCircuit(operationId) {
+        circuitStates.delete(operationId);
+      }
+      /**
+       * Reset all circuit breakers (exposed for testing)
+       */
+      resetAllCircuits() {
+        circuitStates.clear();
+      }
+    };
+    defaultAgentRetryPolicy = new RetryPolicy({
+      maxAttempts: 3,
+      baseDelayMs: 1e3,
+      maxDelayMs: 8e3
+    });
+    defaultToolRetryPolicy = new RetryPolicy({
+      maxAttempts: 2,
+      baseDelayMs: 500,
+      maxDelayMs: 2e3
+    });
+  }
+});
+
 // core/agentic/tool-registry.ts
 import { exec as exec7 } from "node:child_process";
 import fs36 from "node:fs/promises";
@@ -18865,6 +19070,7 @@ var execAsync4, toolRegistry, tool_registry_default;
 var init_tool_registry = __esm({
   "core/agentic/tool-registry.ts"() {
     "use strict";
+    init_retry();
     execAsync4 = promisify8(exec7);
     toolRegistry = {
       tools: /* @__PURE__ */ new Map(),
@@ -18903,16 +19109,34 @@ var init_tool_registry = __esm({
     };
     toolRegistry.register("Read", async (filePath) => {
       try {
-        return await
-
+        return await defaultToolRetryPolicy.execute(
+          async () => await fs36.readFile(filePath, "utf-8"),
+          `read-${filePath}`
+        );
+      } catch (error) {
+        if (isPermanentError(error)) {
+          return null;
+        }
+        if (isTransientError(error)) {
+          return null;
+        }
        return null;
      }
    });
    toolRegistry.register("Write", async (filePath, content) => {
      try {
-        await
+        await defaultToolRetryPolicy.execute(
+          async () => await fs36.writeFile(filePath, content, "utf-8"),
+          `write-${filePath}`
+        );
        return true;
-      } catch (
+      } catch (error) {
+        if (isPermanentError(error)) {
+          return false;
+        }
+        if (isTransientError(error)) {
+          return false;
+        }
        return false;
      }
    });
@@ -18920,8 +19144,10 @@ var init_tool_registry = __esm({
       "Bash",
       async (command) => {
         try {
-
-
+          return await defaultToolRetryPolicy.execute(
+            async () => await execAsync4(command),
+            `bash-${command}`
+          );
         } catch (error) {
           const err = error;
           return {
@@ -19564,6 +19790,7 @@ var init_agent_generator = __esm({
   "core/services/agent-generator.ts"() {
     "use strict";
     init_preserve_sections();
+    init_retry();
   }
 });

@@ -19792,6 +20019,7 @@ var init_agent_service = __esm({
     init_agent_router();
     init_errors();
     init_agent_detector();
+    init_retry();
     init_();
     VALID_AGENT_TYPES = ["claude"];
     AgentService = class {
@@ -19806,20 +20034,23 @@ var init_agent_service = __esm({
       }
       /**
        * Initialize agent (Claude Code, Desktop, or Terminal)
+       * Wrapped with retry policy to handle transient failures
        */
       async initialize() {
         if (this.agent) return this.agent;
-
-
-
-
-
-
-
-
-
-
-
+        return await defaultAgentRetryPolicy.execute(async () => {
+          this.agentInfo = await detect2();
+          if (!this.agentInfo?.isSupported) {
+            throw AgentError.notSupported(this.agentInfo?.type ?? "unknown");
+          }
+          const agentType = this.agentInfo.type;
+          if (!agentType || !VALID_AGENT_TYPES.includes(agentType)) {
+            throw AgentError.notSupported(this.agentInfo?.type ?? "unknown");
+          }
+          const { default: Agent } = await globImport_infrastructure_agent(`../infrastructure/${agentType}-agent`);
+          this.agent = new Agent();
+          return this.agent;
+        }, "agent-initialization");
       }
       /**
        * Get current agent info
@@ -34241,7 +34472,7 @@ var require_package = __commonJS({
   "package.json"(exports, module) {
     module.exports = {
       name: "prjct-cli",
-      version: "1.
+      version: "1.20.0",
      description: "Context layer for AI agents. Project context for Claude Code, Gemini CLI, and more.",
      main: "core/index.ts",
      bin: {