@fluentcommerce/fc-connect-sdk 0.1.54 → 0.1.55
This diff covers publicly available package versions released to a supported registry, and is provided for informational purposes only; it reflects the changes between versions as they appear in their respective public registries.
- package/CHANGELOG.md +12 -0
- package/dist/cjs/clients/fluent-client.js +13 -6
- package/dist/cjs/utils/pagination-helpers.js +38 -2
- package/dist/cjs/versori/fluent-versori-client.js +11 -5
- package/dist/esm/clients/fluent-client.js +13 -6
- package/dist/esm/utils/pagination-helpers.js +38 -2
- package/dist/esm/versori/fluent-versori-client.js +11 -5
- package/dist/tsconfig.esm.tsbuildinfo +1 -1
- package/dist/tsconfig.tsbuildinfo +1 -1
- package/dist/tsconfig.types.tsbuildinfo +1 -1
- package/docs/00-START-HERE/EXPORT-VALIDATION.md +158 -158
- package/docs/00-START-HERE/cli-analyze-source-structure-guide.md +655 -655
- package/docs/00-START-HERE/cli-documentation-index.md +202 -202
- package/docs/00-START-HERE/cli-quick-reference.md +252 -252
- package/docs/00-START-HERE/decision-tree.md +552 -552
- package/docs/00-START-HERE/getting-started.md +1070 -1070
- package/docs/00-START-HERE/mapper-quick-decision-guide.md +235 -235
- package/docs/00-START-HERE/readme.md +237 -237
- package/docs/00-START-HERE/retailerid-configuration.md +404 -404
- package/docs/00-START-HERE/sdk-philosophy.md +794 -794
- package/docs/00-START-HERE/troubleshooting-quick-reference.md +1086 -1086
- package/docs/01-TEMPLATES/faq.md +686 -686
- package/docs/01-TEMPLATES/patterns/pattern-templates-guide.md +68 -68
- package/docs/01-TEMPLATES/patterns/patterns-csv-schema-validation-and-rejection-report.md +233 -233
- package/docs/01-TEMPLATES/patterns/patterns-custom-resolvers.md +407 -407
- package/docs/01-TEMPLATES/patterns/patterns-error-handling-retry.md +511 -511
- package/docs/01-TEMPLATES/patterns/patterns-field-mapping-universal.md +701 -701
- package/docs/01-TEMPLATES/patterns/patterns-large-file-splitting.md +1430 -1430
- package/docs/01-TEMPLATES/patterns/patterns-master-data-etl.md +2399 -2399
- package/docs/01-TEMPLATES/patterns/patterns-pagination-streaming.md +447 -447
- package/docs/01-TEMPLATES/patterns/patterns-state-duplicate-prevention.md +385 -385
- package/docs/01-TEMPLATES/readme.md +957 -957
- package/docs/01-TEMPLATES/standalone/standalone-asn-inbound-processing.md +1209 -1209
- package/docs/01-TEMPLATES/standalone/standalone-graphql-query-export.md +1140 -1140
- package/docs/01-TEMPLATES/standalone/standalone-graphql-to-parquet-partitioned-s3.md +432 -432
- package/docs/01-TEMPLATES/standalone/standalone-multi-channel-inventory-sync.md +1185 -1185
- package/docs/01-TEMPLATES/standalone/standalone-multi-source-aggregation.md +1462 -1462
- package/docs/01-TEMPLATES/standalone/standalone-s3-csv-batch-api.md +1390 -1390
- package/docs/01-TEMPLATES/standalone/standalone-s3-csv-inventory-to-batch.md +330 -330
- package/docs/01-TEMPLATES/standalone/standalone-scripts-guide.md +87 -87
- package/docs/01-TEMPLATES/standalone/standalone-sftp-xml-graphql.md +1444 -1444
- package/docs/01-TEMPLATES/standalone/standalone-webhook-payload-processing.md +688 -688
- package/docs/01-TEMPLATES/versori/business-examples/business-examples-dropship-order-routing.md +193 -193
- package/docs/01-TEMPLATES/versori/business-examples/business-examples-graphql-parquet-extraction.md +518 -518
- package/docs/01-TEMPLATES/versori/business-examples/business-examples-inter-location-transfers.md +2162 -2162
- package/docs/01-TEMPLATES/versori/business-examples/business-examples-pre-order-allocation.md +2226 -2226
- package/docs/01-TEMPLATES/versori/business-examples/business-scenarios-guide.md +87 -87
- package/docs/01-TEMPLATES/versori/patterns/versori-patterns-connection-validation-pattern.md +656 -656
- package/docs/01-TEMPLATES/versori/patterns/versori-patterns-dual-workflow-connector.md +835 -835
- package/docs/01-TEMPLATES/versori/patterns/versori-patterns-guide.md +108 -108
- package/docs/01-TEMPLATES/versori/patterns/versori-patterns-kv-state-management.md +1533 -1533
- package/docs/01-TEMPLATES/versori/patterns/versori-patterns-xml-response-patterns.md +1160 -1160
- package/docs/01-TEMPLATES/versori/versori-platform-guide.md +201 -201
- package/docs/01-TEMPLATES/versori/webhooks/template-webhook-asn-purchase-order.md +1906 -1906
- package/docs/01-TEMPLATES/versori/webhooks/template-webhook-dropship-routing.md +1074 -1074
- package/docs/01-TEMPLATES/versori/webhooks/template-webhook-flash-sale-reserve.md +1395 -1395
- package/docs/01-TEMPLATES/versori/webhooks/template-webhook-generic-xml-order.md +888 -888
- package/docs/01-TEMPLATES/versori/webhooks/template-webhook-payment-gateway-integration.md +2478 -2478
- package/docs/01-TEMPLATES/versori/webhooks/template-webhook-rma-returns-comprehensive.md +2240 -2240
- package/docs/01-TEMPLATES/versori/webhooks/template-webhook-xml-order-ingestion.md +2029 -2029
- package/docs/01-TEMPLATES/versori/webhooks/webhook-templates-guide.md +140 -140
- package/docs/01-TEMPLATES/versori/workflows/_examples/sample-data/inventory-mapping.json +20 -20
- package/docs/01-TEMPLATES/versori/workflows/_examples/sample-data/products_2025-01-22.csv +11 -11
- package/docs/01-TEMPLATES/versori/workflows/_examples/sample-data/sample-data-guide.md +34 -34
- package/docs/01-TEMPLATES/versori/workflows/_examples/workflow-examples-guide.md +36 -36
- package/docs/01-TEMPLATES/versori/workflows/extraction/extraction-modes-guide.md +1038 -1038
- package/docs/01-TEMPLATES/versori/workflows/extraction/extraction-workflows-guide.md +138 -138
- package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/graphql-extraction-guide.md +63 -63
- package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-fulfillments-to-sftp-csv.md +2062 -2062
- package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-fulfillments-to-sftp-xml.md +2294 -2294
- package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-inventory-positions-to-s3-csv.md +2461 -2461
- package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-inventory-positions-to-sftp-xml.md +2529 -2529
- package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-inventory-quantities-to-s3-csv.md +2464 -2464
- package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-inventory-quantities-to-s3-json.md +1959 -1959
- package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-orders-to-s3-csv.md +1953 -1953
- package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-orders-to-sftp-xml.md +2541 -2541
- package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-products-to-s3-json.md +2384 -2384
- package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-products-to-sftp-xml.md +2445 -2445
- package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-virtual-positions-to-s3-csv.md +2355 -2355
- package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-virtual-positions-to-s3-json.md +2042 -2042
- package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-virtual-positions-to-sftp-xml.md +2726 -2726
- package/docs/01-TEMPLATES/versori/workflows/ingestion/batch-api/batch-api-guide.md +206 -206
- package/docs/01-TEMPLATES/versori/workflows/ingestion/batch-api/template-ingestion-cycle-count-reconciliation.md +2030 -2030
- package/docs/01-TEMPLATES/versori/workflows/ingestion/batch-api/template-ingestion-multi-channel-inventory-sync.md +1882 -1882
- package/docs/01-TEMPLATES/versori/workflows/ingestion/batch-api/template-ingestion-s3-csv-inventory-batch.md +2827 -2827
- package/docs/01-TEMPLATES/versori/workflows/ingestion/batch-api/template-ingestion-s3-json-inventory-batch.md +1952 -1952
- package/docs/01-TEMPLATES/versori/workflows/ingestion/batch-api/template-ingestion-s3-xml-inventory-batch.md +3289 -3289
- package/docs/01-TEMPLATES/versori/workflows/ingestion/batch-api/template-ingestion-sftp-csv-inventory-batch.md +3064 -3064
- package/docs/01-TEMPLATES/versori/workflows/ingestion/batch-api/template-ingestion-sftp-json-inventory-batch.md +3238 -3238
- package/docs/01-TEMPLATES/versori/workflows/ingestion/batch-api/template-ingestion-sftp-xml-inventory-batch.md +2977 -2977
- package/docs/01-TEMPLATES/versori/workflows/ingestion/event-api/event-api-guide.md +321 -321
- package/docs/01-TEMPLATES/versori/workflows/ingestion/event-api/template-ingestion-payload-json-order-cancel-event.md +959 -959
- package/docs/01-TEMPLATES/versori/workflows/ingestion/event-api/template-ingestion-payload-xml-order-cancel-event.md +1170 -1170
- package/docs/01-TEMPLATES/versori/workflows/ingestion/event-api/template-ingestion-s3-csv-product-event.md +2312 -2312
- package/docs/01-TEMPLATES/versori/workflows/ingestion/event-api/template-ingestion-s3-json-product-event.md +2999 -2999
- package/docs/01-TEMPLATES/versori/workflows/ingestion/event-api/template-ingestion-s3-parquet-product-event.md +2836 -2836
- package/docs/01-TEMPLATES/versori/workflows/ingestion/event-api/template-ingestion-s3-xml-product-event.md +2395 -2395
- package/docs/01-TEMPLATES/versori/workflows/ingestion/event-api/template-ingestion-sftp-csv-product-event.md +2295 -2295
- package/docs/01-TEMPLATES/versori/workflows/ingestion/event-api/template-ingestion-sftp-json-product-event.md +2602 -2602
- package/docs/01-TEMPLATES/versori/workflows/ingestion/event-api/template-ingestion-sftp-parquet-product-event.md +2589 -2589
- package/docs/01-TEMPLATES/versori/workflows/ingestion/event-api/template-ingestion-sftp-xml-product-event.md +3578 -3578
- package/docs/01-TEMPLATES/versori/workflows/ingestion/graphql-mutations/graphql-mutations-guide.md +93 -93
- package/docs/01-TEMPLATES/versori/workflows/ingestion/graphql-mutations/template-ingestion-payload-json-order-update-graphql.md +1260 -1260
- package/docs/01-TEMPLATES/versori/workflows/ingestion/graphql-mutations/template-ingestion-payload-xml-order-update-graphql.md +1472 -1472
- package/docs/01-TEMPLATES/versori/workflows/ingestion/graphql-mutations/template-ingestion-s3-csv-control-graphql.md +2417 -2417
- package/docs/01-TEMPLATES/versori/workflows/ingestion/graphql-mutations/template-ingestion-s3-csv-location-graphql.md +2811 -2811
- package/docs/01-TEMPLATES/versori/workflows/ingestion/graphql-mutations/template-ingestion-s3-csv-price-graphql.md +2619 -2619
- package/docs/01-TEMPLATES/versori/workflows/ingestion/graphql-mutations/template-ingestion-s3-json-location-graphql.md +2807 -2807
- package/docs/01-TEMPLATES/versori/workflows/ingestion/graphql-mutations/template-ingestion-s3-xml-location-graphql.md +2373 -2373
- package/docs/01-TEMPLATES/versori/workflows/ingestion/graphql-mutations/template-ingestion-sftp-csv-control-graphql.md +2740 -2740
- package/docs/01-TEMPLATES/versori/workflows/ingestion/graphql-mutations/template-ingestion-sftp-csv-location-graphql.md +2760 -2760
- package/docs/01-TEMPLATES/versori/workflows/ingestion/graphql-mutations/template-ingestion-sftp-json-location-graphql.md +1710 -1710
- package/docs/01-TEMPLATES/versori/workflows/ingestion/ingestion-workflows-guide.md +136 -136
- package/docs/01-TEMPLATES/versori/workflows/rubix-webhooks/rubix-webhooks-guide.md +520 -520
- package/docs/01-TEMPLATES/versori/workflows/rubix-webhooks/template-webhook-rubix-fulfilment-to-sftp-xml-inline.md +1418 -1418
- package/docs/01-TEMPLATES/versori/workflows/rubix-webhooks/template-webhook-rubix-fulfilment-to-sftp-xml-universal-mapper.md +1785 -1785
- package/docs/01-TEMPLATES/versori/workflows/rubix-webhooks/template-webhook-rubix-order-attribute-update.md +824 -824
- package/docs/01-TEMPLATES/versori/workflows/workflows-overview-guide.md +646 -646
- package/docs/02-CORE-GUIDES/advanced-services/advanced-services-batch-archival.md +724 -724
- package/docs/02-CORE-GUIDES/advanced-services/advanced-services-job-tracker.md +627 -627
- package/docs/02-CORE-GUIDES/advanced-services/advanced-services-partial-batch-recovery.md +561 -561
- package/docs/02-CORE-GUIDES/advanced-services/advanced-services-quick-reference.md +367 -367
- package/docs/02-CORE-GUIDES/advanced-services/advanced-services-readme.md +407 -407
- package/docs/02-CORE-GUIDES/advanced-services/readme.md +49 -49
- package/docs/02-CORE-GUIDES/api-reference/api-reference-quick-reference.md +548 -548
- package/docs/02-CORE-GUIDES/api-reference/event-api-input-output-reference.md +702 -1171
- package/docs/02-CORE-GUIDES/api-reference/examples/client-initialization.ts +286 -286
- package/docs/02-CORE-GUIDES/api-reference/graphql-error-classification.md +337 -337
- package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-01-client-api.md +399 -520
- package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-03-authentication.md +199 -199
- package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-04-graphql-mapping.md +925 -925
- package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-05-services.md +1198 -1198
- package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-06-data-sources.md +1083 -1083
- package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-07-parsers.md +1097 -1097
- package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-08-pagination.md +513 -513
- package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-08-types.md +545 -597
- package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-09-error-handling.md +527 -527
- package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-09-webhook-validation.md +514 -514
- package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-10-extraction.md +557 -557
- package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-10-utilities.md +412 -412
- package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-11-cli-tools.md +423 -423
- package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-11-error-handling.md +716 -716
- package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-12-analyze-source-structure.md +518 -518
- package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-12-partial-responses.md +212 -212
- package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-12-testing.md +300 -300
- package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-13-resolver-builder.md +322 -322
- package/docs/02-CORE-GUIDES/api-reference/readme.md +279 -279
- package/docs/02-CORE-GUIDES/auto-pagination/auto-pagination-quick-reference.md +351 -351
- package/docs/02-CORE-GUIDES/auto-pagination/auto-pagination-readme.md +277 -277
- package/docs/02-CORE-GUIDES/auto-pagination/examples/auto-pagination-readme.md +178 -178
- package/docs/02-CORE-GUIDES/auto-pagination/examples/common-patterns.ts +351 -351
- package/docs/02-CORE-GUIDES/auto-pagination/examples/paginate-products.ts +384 -384
- package/docs/02-CORE-GUIDES/auto-pagination/examples/paginate-virtual-positions.ts +308 -308
- package/docs/02-CORE-GUIDES/auto-pagination/modules/auto-pagination-01-foundations.md +470 -470
- package/docs/02-CORE-GUIDES/auto-pagination/modules/auto-pagination-02-quick-start.md +713 -713
- package/docs/02-CORE-GUIDES/auto-pagination/modules/auto-pagination-03-configuration.md +754 -754
- package/docs/02-CORE-GUIDES/auto-pagination/modules/auto-pagination-04-advanced-patterns.md +732 -732
- package/docs/02-CORE-GUIDES/auto-pagination/modules/auto-pagination-05-sdk-integration.md +847 -847
- package/docs/02-CORE-GUIDES/auto-pagination/modules/auto-pagination-06-troubleshooting.md +359 -359
- package/docs/02-CORE-GUIDES/auto-pagination/modules/auto-pagination-07-api-reference.md +462 -462
- package/docs/02-CORE-GUIDES/auto-pagination/readme.md +54 -54
- package/docs/02-CORE-GUIDES/data-sources/data-sources-file-operations-error-handling.md +1487 -1487
- package/docs/02-CORE-GUIDES/data-sources/data-sources-quick-reference.md +836 -836
- package/docs/02-CORE-GUIDES/data-sources/data-sources-readme.md +276 -276
- package/docs/02-CORE-GUIDES/data-sources/data-sources-sftp-credential-access-security.md +553 -553
- package/docs/02-CORE-GUIDES/data-sources/examples/common-patterns.ts +409 -409
- package/docs/02-CORE-GUIDES/data-sources/examples/data-sources-readme.md +178 -178
- package/docs/02-CORE-GUIDES/data-sources/examples/s3-operations.ts +308 -308
- package/docs/02-CORE-GUIDES/data-sources/examples/sftp-operations.ts +371 -371
- package/docs/02-CORE-GUIDES/data-sources/modules/data-sources-01-foundations.md +735 -735
- package/docs/02-CORE-GUIDES/data-sources/modules/data-sources-02-s3-operations.md +1302 -1302
- package/docs/02-CORE-GUIDES/data-sources/modules/data-sources-03-sftp-operations.md +1379 -1379
- package/docs/02-CORE-GUIDES/data-sources/modules/data-sources-04-file-patterns.md +941 -941
- package/docs/02-CORE-GUIDES/data-sources/modules/data-sources-05-advanced-topics.md +813 -813
- package/docs/02-CORE-GUIDES/data-sources/modules/data-sources-06-integration-patterns.md +486 -486
- package/docs/02-CORE-GUIDES/data-sources/modules/data-sources-07-troubleshooting.md +387 -387
- package/docs/02-CORE-GUIDES/data-sources/modules/data-sources-08-api-reference.md +417 -417
- package/docs/02-CORE-GUIDES/data-sources/readme.md +77 -77
- package/docs/02-CORE-GUIDES/error-handling-guide.md +936 -936
- package/docs/02-CORE-GUIDES/extraction/examples/02-core-guides-extraction-readme.md +116 -116
- package/docs/02-CORE-GUIDES/extraction/examples/common-patterns.ts +428 -428
- package/docs/02-CORE-GUIDES/extraction/examples/extract-inventory-basic.ts +187 -187
- package/docs/02-CORE-GUIDES/extraction/extraction-quick-reference.md +596 -596
- package/docs/02-CORE-GUIDES/extraction/modules/02-core-guides-extraction-01-foundations.md +514 -514
- package/docs/02-CORE-GUIDES/extraction/modules/02-core-guides-extraction-02-basic-extraction.md +823 -823
- package/docs/02-CORE-GUIDES/extraction/modules/02-core-guides-extraction-03-parquet-processing.md +507 -507
- package/docs/02-CORE-GUIDES/extraction/modules/02-core-guides-extraction-04-data-enrichment.md +546 -546
- package/docs/02-CORE-GUIDES/extraction/modules/02-core-guides-extraction-05-transformation.md +494 -494
- package/docs/02-CORE-GUIDES/extraction/modules/02-core-guides-extraction-06-export-formats.md +458 -458
- package/docs/02-CORE-GUIDES/extraction/modules/02-core-guides-extraction-06-performance.md +138 -138
- package/docs/02-CORE-GUIDES/extraction/modules/02-core-guides-extraction-07-api-reference.md +148 -148
- package/docs/02-CORE-GUIDES/extraction/modules/02-core-guides-extraction-07-optimization.md +692 -692
- package/docs/02-CORE-GUIDES/extraction/modules/02-core-guides-extraction-08-extraction-orchestrator.md +1008 -1008
- package/docs/02-CORE-GUIDES/extraction/readme.md +151 -151
- package/docs/02-CORE-GUIDES/ingestion/examples/_simple-kv-store.ts +40 -40
- package/docs/02-CORE-GUIDES/ingestion/examples/error-recovery.ts +728 -728
- package/docs/02-CORE-GUIDES/ingestion/examples/event-driven.ts +501 -501
- package/docs/02-CORE-GUIDES/ingestion/examples/local-file-ingestion.ts +88 -88
- package/docs/02-CORE-GUIDES/ingestion/examples/parquet-ingestion.ts +117 -117
- package/docs/02-CORE-GUIDES/ingestion/examples/performance-optimized.ts +647 -647
- package/docs/02-CORE-GUIDES/ingestion/examples/s3-csv-ingestion.ts +169 -169
- package/docs/02-CORE-GUIDES/ingestion/examples/sftp-csv-ingestion.ts +134 -134
- package/docs/02-CORE-GUIDES/ingestion/ingestion-quick-reference.md +546 -546
- package/docs/02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-01-introduction.md +626 -626
- package/docs/02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-02-quick-start.md +658 -658
- package/docs/02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-03-data-sources.md +1052 -1052
- package/docs/02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-04-field-mapping.md +763 -763
- package/docs/02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-05-advanced-parsers.md +676 -676
- package/docs/02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-06-batch-api.md +1295 -1295
- package/docs/02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-07-api-reference.md +138 -138
- package/docs/02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-07-state-management.md +1037 -1037
- package/docs/02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-08-performance-optimization.md +1349 -1349
- package/docs/02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-09-best-practices.md +1893 -1893
- package/docs/02-CORE-GUIDES/ingestion/readme.md +160 -160
- package/docs/02-CORE-GUIDES/logging-guide.md +585 -585
- package/docs/02-CORE-GUIDES/mapping/error-handling-patterns.md +401 -401
- package/docs/02-CORE-GUIDES/mapping/examples/02-core-guides-mapping-readme.md +128 -128
- package/docs/02-CORE-GUIDES/mapping/examples/common-patterns.ts +273 -273
- package/docs/02-CORE-GUIDES/mapping/examples/csv-location-ingestion.json +36 -36
- package/docs/02-CORE-GUIDES/mapping/examples/csv-mapping.ts +242 -242
- package/docs/02-CORE-GUIDES/mapping/examples/graphql-to-parquet-extraction.json +36 -36
- package/docs/02-CORE-GUIDES/mapping/examples/json-mapping.ts +213 -213
- package/docs/02-CORE-GUIDES/mapping/examples/json-product-to-mutation.json +48 -48
- package/docs/02-CORE-GUIDES/mapping/examples/xml-mapping.ts +291 -291
- package/docs/02-CORE-GUIDES/mapping/examples/xml-order-to-mutation.json +45 -45
- package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/graphql-mutation-mapping-quick-reference.md +463 -463
- package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/graphql-mutation-mapping-readme.md +227 -227
- package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-01-introduction.md +222 -222
- package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-02-quick-start.md +351 -351
- package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-03-schema-validation.md +569 -569
- package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-04-mapping-patterns.md +471 -471
- package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-05-configuration-reference.md +611 -611
- package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-06-advanced-xpath.md +148 -148
- package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-06-path-syntax.md +464 -464
- package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-07-api-reference.md +94 -94
- package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-07-array-handling.md +307 -307
- package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-08-custom-resolvers.md +544 -544
- package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-09-advanced-patterns.md +427 -427
- package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-10-hooks-and-variables.md +336 -336
- package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-11-error-handling.md +488 -488
- package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-12-arguments-vs-nodes.md +383 -383
- package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-13-best-practices.md +477 -477
- package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/readme.md +62 -62
- package/docs/02-CORE-GUIDES/mapping/mapping-format-decision-tree.md +480 -480
- package/docs/02-CORE-GUIDES/mapping/mapping-graphql-alias-batching-guide.md +820 -820
- package/docs/02-CORE-GUIDES/mapping/mapping-javascript-objects.md +2369 -2369
- package/docs/02-CORE-GUIDES/mapping/mapping-mapper-comparison-guide.md +682 -682
- package/docs/02-CORE-GUIDES/mapping/modules/02-core-guides-mapping-07-api-reference.md +1327 -1327
- package/docs/02-CORE-GUIDES/mapping/modules/02-core-guides-mapping-08-error-handling.md +1142 -1142
- package/docs/02-CORE-GUIDES/mapping/modules/mapping-04-use-cases.md +891 -891
- package/docs/02-CORE-GUIDES/mapping/modules/mapping-06-helpers-resolvers.md +1126 -1126
- package/docs/02-CORE-GUIDES/mapping/modules/mapping-06-sdk-resolvers.md +199 -199
- package/docs/02-CORE-GUIDES/mapping/modules/mapping-07-api-reference.md +1319 -1319
- package/docs/02-CORE-GUIDES/mapping/readme.md +178 -178
- package/docs/02-CORE-GUIDES/mapping/resolver-registration.md +410 -410
- package/docs/02-CORE-GUIDES/mapping/resolvers/examples/common-patterns.ts +226 -226
- package/docs/02-CORE-GUIDES/mapping/resolvers/examples/custom-resolvers.ts +227 -227
- package/docs/02-CORE-GUIDES/mapping/resolvers/examples/sdk-resolvers-usage.ts +203 -203
- package/docs/02-CORE-GUIDES/mapping/resolvers/mapping-resolvers-readme.md +274 -274
- package/docs/02-CORE-GUIDES/mapping/resolvers/mapping-resolvers-resolver-api-reference.md +679 -679
- package/docs/02-CORE-GUIDES/mapping/resolvers/mapping-resolvers-resolver-cookbook.md +826 -826
- package/docs/02-CORE-GUIDES/mapping/resolvers/mapping-resolvers-resolver-guide.md +1330 -1330
- package/docs/02-CORE-GUIDES/mapping/resolvers/mapping-resolvers-resolver-helpers-reference.md +1437 -1437
- package/docs/02-CORE-GUIDES/mapping/resolvers/mapping-resolvers-resolver-parameters-reference.md +553 -553
- package/docs/02-CORE-GUIDES/mapping/resolvers/mapping-resolvers-resolver-troubleshooting.md +854 -854
- package/docs/02-CORE-GUIDES/mapping/resolvers/readme.md +75 -75
- package/docs/02-CORE-GUIDES/parsers/examples/02-core-guides-parsers-readme.md +161 -161
- package/docs/02-CORE-GUIDES/parsers/examples/csv-parser-examples.ts +110 -110
- package/docs/02-CORE-GUIDES/parsers/examples/json-parser-examples.ts +33 -33
- package/docs/02-CORE-GUIDES/parsers/examples/parquet-parser-examples.ts +47 -47
- package/docs/02-CORE-GUIDES/parsers/examples/xml-parser-examples.ts +38 -38
- package/docs/02-CORE-GUIDES/parsers/modules/02-core-guides-parsers-01-foundations.md +355 -355
- package/docs/02-CORE-GUIDES/parsers/modules/02-core-guides-parsers-02-csv-parser.md +772 -772
- package/docs/02-CORE-GUIDES/parsers/modules/02-core-guides-parsers-03-json-parser.md +789 -789
- package/docs/02-CORE-GUIDES/parsers/modules/02-core-guides-parsers-04-xml-parser.md +857 -857
- package/docs/02-CORE-GUIDES/parsers/modules/02-core-guides-parsers-05-parquet-parser.md +603 -603
- package/docs/02-CORE-GUIDES/parsers/modules/02-core-guides-parsers-06-integration-patterns.md +702 -702
- package/docs/02-CORE-GUIDES/parsers/modules/02-core-guides-parsers-06-streaming.md +121 -121
- package/docs/02-CORE-GUIDES/parsers/modules/02-core-guides-parsers-07-api-reference.md +89 -89
- package/docs/02-CORE-GUIDES/parsers/modules/02-core-guides-parsers-07-troubleshooting.md +727 -727
- package/docs/02-CORE-GUIDES/parsers/parsers-quick-reference.md +482 -482
- package/docs/02-CORE-GUIDES/parsers/parsers-readme.md +258 -258
- package/docs/02-CORE-GUIDES/parsers/readme.md +65 -65
- package/docs/02-CORE-GUIDES/readme.md +194 -194
- package/docs/02-CORE-GUIDES/webhook-validation/examples/basic-validation.ts +108 -108
- package/docs/02-CORE-GUIDES/webhook-validation/examples/common-patterns.ts +316 -316
- package/docs/02-CORE-GUIDES/webhook-validation/examples/webhook-validation-readme.md +61 -61
- package/docs/02-CORE-GUIDES/webhook-validation/modules/webhook-validation-01-foundations.md +440 -440
- package/docs/02-CORE-GUIDES/webhook-validation/modules/webhook-validation-02-quick-start.md +525 -525
- package/docs/02-CORE-GUIDES/webhook-validation/modules/webhook-validation-03-versori-integration.md +741 -741
- package/docs/02-CORE-GUIDES/webhook-validation/modules/webhook-validation-04-platform-integration.md +629 -629
- package/docs/02-CORE-GUIDES/webhook-validation/modules/webhook-validation-05-configuration.md +535 -535
- package/docs/02-CORE-GUIDES/webhook-validation/modules/webhook-validation-06-error-handling.md +611 -611
- package/docs/02-CORE-GUIDES/webhook-validation/modules/webhook-validation-06-troubleshooting.md +124 -124
- package/docs/02-CORE-GUIDES/webhook-validation/modules/webhook-validation-07-api-reference.md +511 -511
- package/docs/02-CORE-GUIDES/webhook-validation/modules/webhook-validation-08-rubix-webhooks.md +590 -590
- package/docs/02-CORE-GUIDES/webhook-validation/modules/webhook-validation-09-rubix-event-vs-http-call.md +432 -432
- package/docs/02-CORE-GUIDES/webhook-validation/readme.md +239 -239
- package/docs/02-CORE-GUIDES/webhook-validation/webhook-validation-quick-reference.md +392 -392
- package/docs/03-PATTERN-GUIDES/connector-scenarios/connector-scenarios-quick-reference.md +498 -498
- package/docs/03-PATTERN-GUIDES/connector-scenarios/connector-scenarios-readme.md +313 -313
- package/docs/03-PATTERN-GUIDES/connector-scenarios/examples/common-patterns.ts +612 -612
- package/docs/03-PATTERN-GUIDES/connector-scenarios/examples/connector-scenarios-readme.md +253 -253
- package/docs/03-PATTERN-GUIDES/connector-scenarios/modules/connector-scenarios-01-foundations.md +452 -452
- package/docs/03-PATTERN-GUIDES/connector-scenarios/modules/connector-scenarios-02-simple-scenarios.md +681 -681
- package/docs/03-PATTERN-GUIDES/connector-scenarios/modules/connector-scenarios-03-intermediate-scenarios.md +637 -637
- package/docs/03-PATTERN-GUIDES/connector-scenarios/modules/connector-scenarios-04-advanced-scenarios.md +650 -650
- package/docs/03-PATTERN-GUIDES/connector-scenarios/modules/connector-scenarios-05-bidirectional-sync.md +233 -233
- package/docs/03-PATTERN-GUIDES/connector-scenarios/modules/connector-scenarios-06-production-patterns.md +442 -442
- package/docs/03-PATTERN-GUIDES/connector-scenarios/modules/connector-scenarios-07-reference.md +445 -445
- package/docs/03-PATTERN-GUIDES/connector-scenarios/readme.md +31 -31
- package/docs/03-PATTERN-GUIDES/enterprise-integration-patterns.md +1528 -1528
- package/docs/03-PATTERN-GUIDES/error-handling/comprehensive-error-handling-guide.md +1437 -1437
- package/docs/03-PATTERN-GUIDES/error-handling/error-handling-quick-reference.md +390 -390
- package/docs/03-PATTERN-GUIDES/error-handling/examples/common-patterns.ts +438 -438
- package/docs/03-PATTERN-GUIDES/error-handling/modules/error-handling-01-foundations.md +362 -362
- package/docs/03-PATTERN-GUIDES/error-handling/modules/error-handling-02-error-types.md +850 -850
- package/docs/03-PATTERN-GUIDES/error-handling/modules/error-handling-03-utf8-handling.md +456 -456
- package/docs/03-PATTERN-GUIDES/error-handling/modules/error-handling-04-error-scenarios.md +658 -658
- package/docs/03-PATTERN-GUIDES/error-handling/modules/error-handling-05-calling-patterns.md +671 -671
- package/docs/03-PATTERN-GUIDES/error-handling/modules/error-handling-06-retry-strategies.md +1034 -1034
- package/docs/03-PATTERN-GUIDES/error-handling/modules/error-handling-07-monitoring.md +653 -653
- package/docs/03-PATTERN-GUIDES/error-handling/modules/error-handling-08-api-reference.md +847 -847
- package/docs/03-PATTERN-GUIDES/error-handling/readme.md +36 -36
- package/docs/03-PATTERN-GUIDES/examples/__tests__/readme.md +40 -40
- package/docs/03-PATTERN-GUIDES/examples/__tests__/resolver-examples.test.js +282 -282
- package/docs/03-PATTERN-GUIDES/examples/test-data/03-pattern-guides-readme.md +110 -110
- package/docs/03-PATTERN-GUIDES/examples/test-data/canonical-inventory.json +123 -123
- package/docs/03-PATTERN-GUIDES/examples/test-data/canonical-order.json +171 -171
- package/docs/03-PATTERN-GUIDES/examples/test-data/readme.md +28 -28
- package/docs/03-PATTERN-GUIDES/extraction/extraction-readme.md +15 -15
- package/docs/03-PATTERN-GUIDES/extraction/readme.md +25 -25
- package/docs/03-PATTERN-GUIDES/file-operations/examples/common-patterns.ts +407 -407
- package/docs/03-PATTERN-GUIDES/file-operations/examples/file-operations-readme.md +142 -142
- package/docs/03-PATTERN-GUIDES/file-operations/file-operations-quick-reference.md +462 -462
- package/docs/03-PATTERN-GUIDES/file-operations/file-operations-readme.md +379 -379
- package/docs/03-PATTERN-GUIDES/file-operations/modules/file-operations-01-foundations.md +430 -430
- package/docs/03-PATTERN-GUIDES/file-operations/modules/file-operations-02-quick-start.md +484 -484
- package/docs/03-PATTERN-GUIDES/file-operations/modules/file-operations-03-s3-operations.md +507 -507
- package/docs/03-PATTERN-GUIDES/file-operations/modules/file-operations-04-sftp-operations.md +963 -963
- package/docs/03-PATTERN-GUIDES/file-operations/modules/file-operations-05-streaming-performance.md +503 -503
- package/docs/03-PATTERN-GUIDES/file-operations/modules/file-operations-06-archive-patterns.md +386 -386
- package/docs/03-PATTERN-GUIDES/file-operations/modules/file-operations-06-error-handling.md +117 -117
- package/docs/03-PATTERN-GUIDES/file-operations/modules/file-operations-07-api-reference.md +78 -78
- package/docs/03-PATTERN-GUIDES/file-operations/modules/file-operations-07-testing-troubleshooting.md +567 -567
- package/docs/03-PATTERN-GUIDES/file-operations/modules/file-operations-08-api-reference.md +1055 -1055
- package/docs/03-PATTERN-GUIDES/file-operations/readme.md +32 -32
- package/docs/03-PATTERN-GUIDES/ingestion/ingestion-readme.md +15 -15
- package/docs/03-PATTERN-GUIDES/ingestion/readme.md +25 -25
- package/docs/03-PATTERN-GUIDES/integration-patterns/examples/batch-processing.ts +130 -130
- package/docs/03-PATTERN-GUIDES/integration-patterns/examples/common-patterns.ts +360 -360
- package/docs/03-PATTERN-GUIDES/integration-patterns/examples/delta-sync.ts +130 -130
- package/docs/03-PATTERN-GUIDES/integration-patterns/examples/integration-patterns-readme.md +100 -100
- package/docs/03-PATTERN-GUIDES/integration-patterns/examples/real-time-webhook.ts +398 -398
- package/docs/03-PATTERN-GUIDES/integration-patterns/integration-patterns-quick-reference.md +962 -962
- package/docs/03-PATTERN-GUIDES/integration-patterns/integration-patterns-readme.md +134 -134
- package/docs/03-PATTERN-GUIDES/integration-patterns/modules/integration-patterns-01-real-time-processing.md +991 -991
- package/docs/03-PATTERN-GUIDES/integration-patterns/modules/integration-patterns-02-batch-processing.md +1547 -1547
- package/docs/03-PATTERN-GUIDES/integration-patterns/modules/integration-patterns-03-delta-sync.md +1108 -1108
- package/docs/03-PATTERN-GUIDES/integration-patterns/modules/integration-patterns-04-webhook-patterns.md +1181 -1181
- package/docs/03-PATTERN-GUIDES/integration-patterns/modules/integration-patterns-05-error-handling.md +1061 -1061
- package/docs/03-PATTERN-GUIDES/integration-patterns/modules/integration-patterns-06-advanced-integration-services.md +1547 -1547
- package/docs/03-PATTERN-GUIDES/integration-patterns/modules/integration-patterns-06-performance.md +109 -109
- package/docs/03-PATTERN-GUIDES/integration-patterns/modules/integration-patterns-07-api-reference.md +34 -34
- package/docs/03-PATTERN-GUIDES/integration-patterns/readme.md +30 -30
- package/docs/03-PATTERN-GUIDES/logging-minimal-mode.md +128 -128
- package/docs/03-PATTERN-GUIDES/multiple-connections/examples/common-patterns.ts +380 -380
- package/docs/03-PATTERN-GUIDES/multiple-connections/examples/multiple-connections-readme.md +139 -139
- package/docs/03-PATTERN-GUIDES/multiple-connections/examples/parallel-root-connections.ts +149 -149
- package/docs/03-PATTERN-GUIDES/multiple-connections/examples/real-world-scenarios.ts +405 -405
- package/docs/03-PATTERN-GUIDES/multiple-connections/modules/multiple-connections-01-foundations.md +378 -378
- package/docs/03-PATTERN-GUIDES/multiple-connections/modules/multiple-connections-02-quick-start.md +566 -566
- package/docs/03-PATTERN-GUIDES/multiple-connections/modules/multiple-connections-03-targeting-connections.md +659 -659
- package/docs/03-PATTERN-GUIDES/multiple-connections/modules/multiple-connections-04-parallel-queries.md +656 -656
- package/docs/03-PATTERN-GUIDES/multiple-connections/modules/multiple-connections-05-best-practices.md +624 -624
- package/docs/03-PATTERN-GUIDES/multiple-connections/modules/multiple-connections-06-api-reference.md +824 -824
- package/docs/03-PATTERN-GUIDES/multiple-connections/modules/multiple-connections-06-versori.md +119 -119
- package/docs/03-PATTERN-GUIDES/multiple-connections/modules/multiple-connections-07-api-reference.md +87 -87
- package/docs/03-PATTERN-GUIDES/multiple-connections/multiple-connections-quick-reference.md +353 -353
- package/docs/03-PATTERN-GUIDES/multiple-connections/multiple-connections-readme.md +270 -270
- package/docs/03-PATTERN-GUIDES/multiple-connections/readme.md +30 -30
- package/docs/03-PATTERN-GUIDES/pagination/pagination-readme.md +14 -14
- package/docs/03-PATTERN-GUIDES/pagination/readme.md +24 -24
- package/docs/03-PATTERN-GUIDES/parquet/examples/common-patterns.ts +180 -180
- package/docs/03-PATTERN-GUIDES/parquet/examples/read-parquet.ts +48 -48
- package/docs/03-PATTERN-GUIDES/parquet/examples/write-parquet.ts +65 -65
- package/docs/03-PATTERN-GUIDES/parquet/modules/03-pattern-guides-parquet-01-introduction.md +393 -393
- package/docs/03-PATTERN-GUIDES/parquet/modules/03-pattern-guides-parquet-02-quick-start.md +572 -572
- package/docs/03-PATTERN-GUIDES/parquet/modules/03-pattern-guides-parquet-03-reading-parquet.md +525 -525
- package/docs/03-PATTERN-GUIDES/parquet/modules/03-pattern-guides-parquet-04-writing-parquet.md +554 -554
- package/docs/03-PATTERN-GUIDES/parquet/modules/03-pattern-guides-parquet-05-graphql-extraction.md +405 -405
- package/docs/03-PATTERN-GUIDES/parquet/modules/03-pattern-guides-parquet-06-performance.md +104 -104
- package/docs/03-PATTERN-GUIDES/parquet/modules/03-pattern-guides-parquet-06-s3-integration.md +511 -511
- package/docs/03-PATTERN-GUIDES/parquet/modules/03-pattern-guides-parquet-07-api-reference.md +90 -90
- package/docs/03-PATTERN-GUIDES/parquet/modules/03-pattern-guides-parquet-07-performance-optimization.md +525 -525
- package/docs/03-PATTERN-GUIDES/parquet/modules/03-pattern-guides-parquet-08-best-practices.md +712 -712
- package/docs/03-PATTERN-GUIDES/parquet/parquet-quick-reference.md +683 -683
- package/docs/03-PATTERN-GUIDES/parquet/parquet-readme.md +248 -248
- package/docs/03-PATTERN-GUIDES/parquet/readme.md +32 -32
- package/docs/03-PATTERN-GUIDES/parsers/parsers-readme.md +12 -12
- package/docs/03-PATTERN-GUIDES/parsers/readme.md +24 -24
- package/docs/03-PATTERN-GUIDES/readme.md +159 -159
- package/docs/03-PATTERN-GUIDES/webhooks/readme.md +24 -24
- package/docs/03-PATTERN-GUIDES/webhooks/webhooks-readme.md +8 -8
- package/docs/04-REFERENCE/architecture/architecture-01-overview.md +427 -427
- package/docs/04-REFERENCE/architecture/architecture-02-client-architecture.md +424 -424
- package/docs/04-REFERENCE/architecture/architecture-03-data-flow.md +690 -690
- package/docs/04-REFERENCE/architecture/architecture-04-service-layer.md +834 -834
- package/docs/04-REFERENCE/architecture/architecture-05-integration-architecture.md +655 -655
- package/docs/04-REFERENCE/architecture/architecture-06-state-management.md +653 -653
- package/docs/04-REFERENCE/architecture/architecture-adding-new-data-sources.md +686 -686
- package/docs/04-REFERENCE/architecture/readme.md +279 -279
- package/docs/04-REFERENCE/platforms/deno/readme.md +117 -117
- package/docs/04-REFERENCE/platforms/nodejs/readme.md +146 -146
- package/docs/04-REFERENCE/platforms/readme.md +135 -135
- package/docs/04-REFERENCE/platforms/versori/modules/platforms-versori-01-introduction.md +398 -398
- package/docs/04-REFERENCE/platforms/versori/modules/platforms-versori-02-quick-start.md +560 -560
- package/docs/04-REFERENCE/platforms/versori/modules/platforms-versori-03-authentication.md +757 -757
- package/docs/04-REFERENCE/platforms/versori/modules/platforms-versori-04-workflows.md +2476 -2476
- package/docs/04-REFERENCE/platforms/versori/modules/platforms-versori-05-connections.md +1167 -1167
- package/docs/04-REFERENCE/platforms/versori/modules/platforms-versori-06-kv-storage.md +990 -990
- package/docs/04-REFERENCE/platforms/versori/modules/platforms-versori-06-state-management.md +121 -121
- package/docs/04-REFERENCE/platforms/versori/modules/platforms-versori-07-api-reference.md +68 -68
- package/docs/04-REFERENCE/platforms/versori/modules/platforms-versori-07-deployment.md +731 -731
- package/docs/04-REFERENCE/platforms/versori/modules/platforms-versori-08-best-practices.md +1111 -1111
- package/docs/04-REFERENCE/platforms/versori/modules/platforms-versori-09-signature-reference.md +766 -766
- package/docs/04-REFERENCE/platforms/versori/platforms-versori-readme.md +299 -299
- package/docs/04-REFERENCE/platforms/versori/platforms-versori-s3-sftp-configuration-guide.md +1425 -1425
- package/docs/04-REFERENCE/platforms/versori/platforms-versori-webhook-api-key-security.md +816 -816
- package/docs/04-REFERENCE/platforms/versori/platforms-versori-webhook-connection-security.md +681 -681
- package/docs/04-REFERENCE/platforms/versori/platforms-versori-workflow-task-types.md +708 -708
- package/docs/04-REFERENCE/platforms/versori/readme.md +108 -108
- package/docs/04-REFERENCE/readme.md +148 -148
- package/docs/04-REFERENCE/resolver-signature/examples/advanced-resolvers.ts +482 -482
- package/docs/04-REFERENCE/resolver-signature/examples/async-resolvers.ts +496 -496
- package/docs/04-REFERENCE/resolver-signature/examples/basic-resolvers.ts +343 -343
- package/docs/04-REFERENCE/resolver-signature/examples/resolver-signature-readme.md +188 -188
- package/docs/04-REFERENCE/resolver-signature/examples/testing-resolvers.ts +463 -463
- package/docs/04-REFERENCE/resolver-signature/modules/resolver-signature-01-foundations.md +286 -286
- package/docs/04-REFERENCE/resolver-signature/modules/resolver-signature-02-parameter-reference.md +643 -643
- package/docs/04-REFERENCE/resolver-signature/modules/resolver-signature-03-basic-examples.md +521 -521
- package/docs/04-REFERENCE/resolver-signature/modules/resolver-signature-04-advanced-patterns.md +739 -739
- package/docs/04-REFERENCE/resolver-signature/modules/resolver-signature-05-sdk-resolvers.md +531 -531
- package/docs/04-REFERENCE/resolver-signature/modules/resolver-signature-06-migration-guide.md +650 -650
- package/docs/04-REFERENCE/resolver-signature/modules/resolver-signature-06-testing.md +125 -125
- package/docs/04-REFERENCE/resolver-signature/modules/resolver-signature-07-api-reference.md +794 -794
- package/docs/04-REFERENCE/resolver-signature/readme.md +64 -64
- package/docs/04-REFERENCE/resolver-signature/resolver-signature-quick-reference.md +270 -270
- package/docs/04-REFERENCE/resolver-signature/resolver-signature-readme.md +351 -351
- package/docs/04-REFERENCE/schema/fluent-commerce-schema.json +764 -764
- package/docs/04-REFERENCE/schema/readme.md +141 -141
- package/docs/04-REFERENCE/testing/examples/04-reference-testing-readme.md +158 -158
- package/docs/04-REFERENCE/testing/examples/fluent-testing.ts +62 -62
- package/docs/04-REFERENCE/testing/examples/health-check.ts +155 -155
- package/docs/04-REFERENCE/testing/examples/integration-test.ts +119 -119
- package/docs/04-REFERENCE/testing/examples/performance-test.ts +183 -183
- package/docs/04-REFERENCE/testing/examples/s3-testing.ts +127 -127
- package/docs/04-REFERENCE/testing/modules/04-reference-testing-01-foundations.md +267 -267
- package/docs/04-REFERENCE/testing/modules/04-reference-testing-02-s3-testing.md +599 -599
- package/docs/04-REFERENCE/testing/modules/04-reference-testing-03-fluent-testing.md +589 -589
- package/docs/04-REFERENCE/testing/modules/04-reference-testing-04-integration-testing.md +699 -699
- package/docs/04-REFERENCE/testing/modules/04-reference-testing-05-debugging.md +478 -478
- package/docs/04-REFERENCE/testing/modules/04-reference-testing-06-cicd-integration.md +463 -463
- package/docs/04-REFERENCE/testing/modules/04-reference-testing-06-preflight-validation.md +131 -131
- package/docs/04-REFERENCE/testing/modules/04-reference-testing-07-best-practices.md +499 -499
- package/docs/04-REFERENCE/testing/modules/04-reference-testing-07-coverage-ci.md +165 -165
- package/docs/04-REFERENCE/testing/modules/04-reference-testing-08-api-reference.md +634 -634
- package/docs/04-REFERENCE/testing/readme.md +86 -86
- package/docs/04-REFERENCE/testing/testing-quick-reference.md +667 -667
- package/docs/04-REFERENCE/testing/testing-readme.md +286 -286
- package/docs/04-REFERENCE/troubleshooting/readme.md +144 -144
- package/docs/04-REFERENCE/troubleshooting/troubleshooting-deno-sftp-compatibility.md +392 -392
- package/docs/template-loading-matrix.md +242 -242
- package/package.json +5 -3
- package/docs/02-CORE-GUIDES/api-reference/cli-profile-integration.md +0 -377
@@ -1,1430 +1,1430 @@
# Pattern: Large File Processing & Chunking

**FC Connect SDK Use Case Guide**

> **SDK**: [@fluentcommerce/fc-connect-sdk](https://www.npmjs.com/package/@fluentcommerce/fc-connect-sdk)
> **Version**: Use latest - `npm install @fluentcommerce/fc-connect-sdk@latest`

**Context**: Enterprise-scale file ingestion with streaming, splitting, and parallel processing

**Type**: Advanced Pattern

**Complexity**: High

**Volume**: 500MB-5GB files, 1M-10M records

**Latency**: Batch processing (< 30-60 min for 10M records)

**Pattern**: Streaming + chunking + parallel Batch API

## When to Use This Pattern

Use this pattern when dealing with:

- **Large CSV files** (>500MB, >1M records)
- **Memory-constrained environments** (Lambda, containers with limited RAM)
- **Time-sensitive ingestion** (need parallel processing for speed)
- **Reliability requirements** (checkpoint/resume on failure)
- **Progress tracking** (real-time status updates)

**Volume Guidance:**

- **Small** (<1K records): Use basic ingestion pattern
- **Medium** (1K-100K records): Use streaming pattern (Pattern 1)
- **Large** (100K-1M records): Use file chunking pattern (Pattern 2)
- **Huge** (1M-10M records): Use parallel processing pattern (Pattern 3)
- **Enterprise** (10M+ records): Use distributed processing pattern (Pattern 4)
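The thresholds above can be expressed as a small decision helper. This is purely illustrative — `pickPattern` and the pattern names are not part of the SDK:

```typescript
// Illustrative helper mapping record volume to the recommended pattern.
// Thresholds mirror the guidance above; not an SDK function.
type IngestionPattern =
  | 'basic-ingestion'        // Small
  | 'streaming'              // Pattern 1
  | 'file-chunking'          // Pattern 2
  | 'parallel-processing'    // Pattern 3
  | 'distributed-processing'; // Pattern 4

function pickPattern(recordCount: number): IngestionPattern {
  if (recordCount < 1_000) return 'basic-ingestion';
  if (recordCount < 100_000) return 'streaming';
  if (recordCount < 1_000_000) return 'file-chunking';
  if (recordCount < 10_000_000) return 'parallel-processing';
  return 'distributed-processing';
}

console.log(pickPattern(5_000_000)); // 'parallel-processing'
```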

## Problem Statement

### Why Splitting is Needed

**Memory Constraints:**

```typescript
// ❌ WRONG - Loads entire 2GB file into memory
const csvContent = await fs.readFile('huge-inventory.csv', 'utf-8');
const records = await csvParser.parse(csvContent); // 💥 Out of memory
```

**Impact:**

- Lambda 512MB: Crashes on 500MB+ files
- Container 1GB: Struggles with 1GB+ files
- Node.js default heap (4GB): Fails on 5GB+ files
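For contrast, here is a minimal sketch of the streaming alternative using only Node.js built-ins (`node:fs` and `node:readline`), independent of the SDK's `parseStreaming`. Memory stays roughly flat no matter how large the file is, because each line is handled and then discarded:

```typescript
import { createReadStream } from 'node:fs';
import { createInterface } from 'node:readline';

// ✅ Stream the file line by line - only one line is in memory at a time.
async function countCsvRows(path: string): Promise<number> {
  const rl = createInterface({
    input: createReadStream(path, 'utf8'),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });
  let rows = 0;
  for await (const line of rl) {
    if (line.trim().length > 0) rows++; // map/validate the row here instead of buffering it
  }
  return rows;
}
```

The same shape applies to the SDK's streaming parser in Pattern 1 below: process each record as it arrives instead of parsing the whole file into an array.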

**Time Constraints:**

```typescript
// ❌ WRONG - Sequential processing takes 90+ minutes
for (const record of records) {
  await processRecord(record); // Too slow for 10M records
}
```
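The fix is bounded concurrency: process a chunk of records in parallel with `Promise.all`, then move to the next chunk. A minimal sketch — `processInChunks` is a hypothetical helper, not an SDK method, and `processRecord` stands in for your per-record work:

```typescript
// ✅ Sketch: process records chunk by chunk, with chunkSize records in flight at once.
// Bounded concurrency: faster than sequential, without unbounded parallel requests.
async function processInChunks<T>(
  records: T[],
  chunkSize: number,
  processRecord: (record: T) => Promise<void>
): Promise<void> {
  for (let i = 0; i < records.length; i += chunkSize) {
    const chunk = records.slice(i, i + chunkSize);
    await Promise.all(chunk.map((record) => processRecord(record)));
  }
}
```

With 10M records and a chunk size of 1000, each round trip amortizes over 1000 records instead of one, which is the same idea the Batch API patterns below rely on.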

**Reliability Requirements:**

```typescript
// ❌ WRONG - Network failure loses all progress
await processAllRecords(records); // If fails at record 5M, restart from 0
```
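The fix is a checkpoint: persist the index of the last successfully committed batch and resume from it after a crash. A minimal sketch — `processWithCheckpoint` is hypothetical, and a `Map` stands in for the durable KV store that Pattern 2 uses via `VersoriKVAdapter`:

```typescript
// ✅ Sketch: checkpoint after each committed batch so a failure resumes, not restarts.
// `store` stands in for a durable KV store; `commitBatch` for your batch send.
async function processWithCheckpoint<T>(
  records: T[],
  batchSize: number,
  commitBatch: (batch: T[]) => Promise<void>,
  store: Map<string, number>
): Promise<void> {
  const start = store.get('nextIndex') ?? 0; // resume point (0 on first run)
  for (let i = start; i < records.length; i += batchSize) {
    await commitBatch(records.slice(i, i + batchSize));
    store.set('nextIndex', i + batchSize); // checkpoint only after a successful commit
  }
}
```

If the run fails at batch N, a retry skips the N-1 batches already committed instead of re-sending them.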

### Solution Overview

This guide demonstrates 4 progressive patterns:

1. **Basic Streaming** (~200 lines) - Process records as they arrive, memory-efficient
2. **File Chunking** (~300 lines) - Split large files into manageable chunks
3. **Parallel Processing** (~400 lines) - Process chunks concurrently with progress tracking
4. **Distributed Processing** (~300 lines) - Use Versori scheduled workflows for enterprise scale

## SDK Methods Used

```typescript
import {
  createClient,        // Client factory (auto-detects context)
  CSVParserService,    // Streaming CSV parser
  S3DataSource,        // S3 file operations
  UniversalMapper,     // Field mapping
  StateService,        // Progress tracking
  VersoriKVAdapter,    // Versori state management
  createConsoleLogger, // Structured logging
  toStructuredLogger
} from '@fluentcommerce/fc-connect-sdk';
```

---

## Pattern 1: Basic Streaming (Memory-Efficient)

**Best for:** 100K-1M records, single-threaded processing, memory-constrained environments

**Memory Usage:**

- ❌ Without streaming: 2GB file = 2GB+ RAM (file + parsed objects)
- ✅ With streaming: 2GB file = ~50MB RAM (processes records incrementally)

### Implementation

```typescript
import {
  createClient,
  CSVParserService,
  S3DataSource,
  UniversalMapper,
  createConsoleLogger,
  toStructuredLogger
} from '@fluentcommerce/fc-connect-sdk';

const logger = createConsoleLogger();

async function streamingIngestion(ctx: any) {
  logger.info('Starting streaming ingestion');

  // Create client (auto-detects Versori context)
  const client = await createClient(ctx);

  // Initialize S3 data source
  const s3 = new S3DataSource(
    {
      type: 'S3_CSV',
      connectionId: 'my-s3',
      name: 'Inventory Files S3',
      s3Config: {
        bucket: 'inventory-files',
        region: 'us-east-1',
        accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
        secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
      },
    },
    logger
  );

  // Define field mapping
  const mapper = new UniversalMapper({
    fields: {
      skuRef: { source: 'sku', required: true },
      locationRef: { source: 'location_code', required: true },
      qty: { source: 'quantity', resolver: 'sdk.parseInt' },
      expectedOn: { source: 'expected_date', resolver: 'sdk.formatDate' },
    },
  });

  // Create CSV parser with streaming enabled
  const csvParser = new CSVParserService();

  // Download file content; records below are parsed incrementally (not all at once)
  logger.info('Downloading file from S3', {
    key: 'inventory/large-file.csv',
  });

  const fileContent = (await s3.downloadFile('inventory/large-file.csv', {
    encoding: 'utf8',
  })) as string;

  // Create job for batch ingestion
  const job = await client.createJob({
    name: 'streaming-inventory-ingestion',
    retailerId: client.getRetailerId() || ctx.connections?.fluent_commerce?.retailerId,
  });

  logger.info('Job created', { jobId: job.id });

  // Statistics tracking
  let recordsProcessed = 0;
  let batchCount = 0;
  let errors = 0;
  const BATCH_SIZE = 1000;
  let currentBatch: any[] = [];

  // Stream records with batching (memory-efficient)
  // Records are parsed incrementally, not all at once
  for await (const record of csvParser.parseStreaming(fileContent, {}, 1)) {
    try {
      // Map record
      const mapped = await mapper.map(record);

      if (mapped.success && mapped.data) {
        currentBatch.push(mapped.data);
        recordsProcessed++;

        // Send batch when full
        if (currentBatch.length >= BATCH_SIZE) {
          await client.sendBatch(job.id, {
            entities: currentBatch,
          });

          batchCount++;

          logger.info('Batch sent', {
            batchNumber: batchCount,
            recordsProcessed,
            currentBatchSize: currentBatch.length,
          });

          currentBatch = []; // Clear batch (frees memory)
        }
      } else {
        errors++;
        logger.warn('Record mapping failed', {
          record,
          errors: mapped.errors,
        });
      }
    } catch (error) {
      errors++;
      logger.error('Record processing failed', error as Error, { record });
    }

    // Progress logging every 10K records
    if (recordsProcessed % 10000 === 0) {
      logger.info('Progress update', {
        recordsProcessed,
        batchesSent: batchCount,
        errors,
        memoryUsage: process.memoryUsage().heapUsed / 1024 / 1024 + ' MB',
      });
    }
  }

  // Send remaining records
  if (currentBatch.length > 0) {
    await client.sendBatch(job.id, {
      entities: currentBatch,
    });
    batchCount++;
  }

  logger.info('Streaming ingestion complete', {
    totalRecords: recordsProcessed,
    batchesSent: batchCount,
    errors,
    jobId: job.id,
  });

  return {
    success: true,
    jobId: job.id,
    recordsProcessed,
    batchesSent: batchCount,
    errors,
  };
}
```

**Memory Profile:**

```
File Size: 2GB (5M records)
RAM Usage: ~50MB peak (1000 record batches)
Processing Time: ~45 minutes (sequential)
```

---

## Pattern 2: File Chunking (Split & Track)

**Best for:** 1M-5M records, need checkpoint/resume, want progress visibility

**Strategy:**

1. Split large file into 100K record chunks
2. Write chunks to temp S3 locations
3. Track chunk metadata in VersoriKV
4. Process chunks sequentially (can resume on failure)
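Step 1's bookkeeping can be previewed with a tiny planning function. This is illustrative only — `ChunkPlan` and `planChunks` are hypothetical simplifications of the `ChunkMetadata` handling in the implementation:

```typescript
// Illustrative: plan chunk boundaries for a given record count (step 1 above).
interface ChunkPlan {
  chunkId: string;
  startRecord: number; // inclusive, 0-based
  endRecord: number;   // inclusive
  recordCount: number;
}

function planChunks(totalRecords: number, chunkSize: number): ChunkPlan[] {
  const plans: ChunkPlan[] = [];
  for (let start = 0, n = 0; start < totalRecords; start += chunkSize, n++) {
    const end = Math.min(start + chunkSize, totalRecords) - 1;
    plans.push({
      chunkId: `chunk-${n.toString().padStart(5, '0')}`, // same naming as below
      startRecord: start,
      endRecord: end,
      recordCount: end - start + 1,
    });
  }
  return plans;
}

// 250K records in 100K chunks → three chunks: 100K, 100K, 50K
```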

### Implementation

```typescript
import {
  createClient,
  CSVParserService,
  S3DataSource,
  UniversalMapper,
  StateService,
  VersoriKVAdapter,
  createConsoleLogger,
  toStructuredLogger
} from '@fluentcommerce/fc-connect-sdk';

const logger = createConsoleLogger();

interface ChunkMetadata {
  chunkId: string;
  startRecord: number;
  endRecord: number;
  s3Key: string;
  recordCount: number;
  status: 'pending' | 'processing' | 'completed' | 'failed';
  processedAt?: string;
  error?: string;
}

async function chunkedIngestion(ctx: any) {
  logger.info('Starting chunked ingestion');

  // Initialize services
  const client = await createClient(ctx);

  const s3 = new S3DataSource(
    {
      type: 'S3_CSV',
      connectionId: 'my-s3-chunked',
      name: 'Inventory Files S3 Chunked',
      s3Config: {
        bucket: 'inventory-files',
        region: 'us-east-1',
        accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
        secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
      },
    },
    logger
  );

  // Initialize state management
  const kv = ctx.openKv();
  const kvAdapter = new VersoriKVAdapter(kv);
  const stateService = new StateService(logger);

  const SOURCE_FILE = 'inventory/huge-inventory.csv';
  const CHUNK_SIZE = 100000; // 100K records per chunk
  const workflowId = 'chunked-ingestion';

  // STEP 1: Check if chunking is already in progress
  const existingState = await stateService.getSyncState(kvAdapter, workflowId);

  if (existingState.isInitialized && existingState.lastSyncResult === 'partial') {
    logger.info('Resuming from previous run', {
      lastProcessedFile: existingState.lastProcessedFile,
      lastProcessedCount: existingState.lastProcessedCount,
    });
  }

  // STEP 2: Split file into chunks
  logger.info('Splitting file into chunks', {
    sourceFile: SOURCE_FILE,
    chunkSize: CHUNK_SIZE,
  });

  const chunks = await splitFileIntoChunks(s3, SOURCE_FILE, CHUNK_SIZE, workflowId, kvAdapter);

  logger.info('File split complete', {
    totalChunks: chunks.length,
    totalRecords: chunks.reduce((sum, c) => sum + c.recordCount, 0),
  });

  // STEP 3: Create job for ingestion
  const job = await client.createJob({
    name: `chunked-inventory-ingestion-${Date.now()}`,
    retailerId: client.getRetailerId() || ctx.connections?.fluent_commerce?.retailerId,
  });

  logger.info('Job created', { jobId: job.id });

  // STEP 4: Process each chunk sequentially
  let successCount = 0;
  let failureCount = 0;

  for (const chunk of chunks) {
    try {
      // Skip if already processed (status lives on the chunk's metadata record)
      const chunkState = await kvAdapter.get(['chunk', workflowId, chunk.chunkId]);

      if (chunkState?.value?.status === 'completed') {
        logger.info('Chunk already processed, skipping', {
          chunkId: chunk.chunkId,
        });
        successCount++;
        continue;
      }

      // Mark chunk as processing
      await kvAdapter.set(['chunk', workflowId, chunk.chunkId], {
        ...chunk,
        status: 'processing',
      } as ChunkMetadata);

      logger.info('Processing chunk', {
        chunkId: chunk.chunkId,
        recordCount: chunk.recordCount,
        progress: `${successCount + failureCount}/${chunks.length}`,
      });

      // Process chunk
      await processChunk(s3, client, job.id, chunk);

      // Mark chunk as completed
      await kvAdapter.set(['chunk', workflowId, chunk.chunkId], {
        ...chunk,
        status: 'completed',
        processedAt: new Date().toISOString(),
      } as ChunkMetadata);

      successCount++;

      logger.info('Chunk completed', {
        chunkId: chunk.chunkId,
        successCount,
        failureCount,
        percentComplete: (((successCount + failureCount) / chunks.length) * 100).toFixed(1),
      });
    } catch (error) {
      failureCount++;
      logger.error('Chunk processing failed', error as Error, {
        chunkId: chunk.chunkId,
      });

      // Mark chunk as failed
      await kvAdapter.set(['chunk', workflowId, chunk.chunkId], {
        ...chunk,
        status: 'failed',
        error: (error as Error).message,
      } as ChunkMetadata);
    }
  }

  // STEP 5: Update final state
  await stateService.updateSyncState(
    kvAdapter,
    [
      {
        fileName: SOURCE_FILE,
        lastModified: new Date().toISOString(),
        recordCount: chunks.reduce((sum, c) => sum + c.recordCount, 0),
      },
    ],
    workflowId
  );

  logger.info('Chunked ingestion complete', {
    totalChunks: chunks.length,
    successCount,
    failureCount,
    jobId: job.id,
  });

  return {
    success: failureCount === 0,
    jobId: job.id,
    chunksProcessed: successCount,
    chunksFailed: failureCount,
    totalChunks: chunks.length,
  };
}

/**
 * Split file into chunks and upload to S3
 */
async function splitFileIntoChunks(
  s3: S3DataSource,
  sourceKey: string,
  chunkSize: number,
  workflowId: string,
  kv: VersoriKVAdapter
): Promise<ChunkMetadata[]> {
  const csvParser = new CSVParserService();
  const chunks: ChunkMetadata[] = [];

  // Download source file
  const fileContent = (await s3.downloadFile(sourceKey, {
    encoding: 'utf8',
  })) as string;

  let currentChunk: any[] = [];
  let chunkNumber = 0;
  let recordNumber = 0;

  // Stream through file and create chunks
  for await (const record of csvParser.parseStreaming(fileContent, {}, 1)) {
    currentChunk.push(record);
    recordNumber++;

    // Create chunk when size reached
    if (currentChunk.length >= chunkSize) {
      const chunkId = `chunk-${chunkNumber.toString().padStart(5, '0')}`;
      const chunkKey = `temp/${workflowId}/${chunkId}.csv`;

      // Convert chunk to CSV
      const chunkCSV = csvParser.stringify(currentChunk, { headers: true });

      // Upload chunk to S3
      await s3.uploadFile(chunkKey, chunkCSV, {
        contentType: 'text/csv',
      });

      // Create chunk metadata
      const metadata: ChunkMetadata = {
        chunkId,
        startRecord: recordNumber - currentChunk.length,
        endRecord: recordNumber - 1,
        s3Key: chunkKey,
        recordCount: currentChunk.length,
        status: 'pending',
      };

      chunks.push(metadata);

      // Store chunk metadata in KV
      await kv.set(['chunk', workflowId, chunkId], metadata);

      logger.info('Chunk created', {
        chunkId,
        recordCount: currentChunk.length,
        s3Key: chunkKey,
      });

      // Clear chunk (free memory)
      currentChunk = [];
      chunkNumber++;
    }
  }

  // Handle remaining records
  if (currentChunk.length > 0) {
    const chunkId = `chunk-${chunkNumber.toString().padStart(5, '0')}`;
    const chunkKey = `temp/${workflowId}/${chunkId}.csv`;

    const chunkCSV = csvParser.stringify(currentChunk, { headers: true });
    await s3.uploadFile(chunkKey, chunkCSV, { contentType: 'text/csv' });

    const metadata: ChunkMetadata = {
      chunkId,
      startRecord: recordNumber - currentChunk.length,
      endRecord: recordNumber - 1,
      s3Key: chunkKey,
      recordCount: currentChunk.length,
      status: 'pending',
    };

    chunks.push(metadata);
    await kv.set(['chunk', workflowId, chunkId], metadata);
  }

  return chunks;
}

/**
 * Process a single chunk
 */
async function processChunk(
  s3: S3DataSource,
  client: any,
  jobId: string,
  chunk: ChunkMetadata
): Promise<void> {
  const csvParser = new CSVParserService();
  const mapper = new UniversalMapper({
    fields: {
      skuRef: { source: 'sku', required: true },
      locationRef: { source: 'location_code', required: true },
      qty: { source: 'quantity', resolver: 'sdk.parseInt' },
      expectedOn: { source: 'expected_date', resolver: 'sdk.formatDate' },
    },
  });

  // Download chunk
  const chunkContent = (await s3.downloadFile(chunk.s3Key, {
    encoding: 'utf8',
  })) as string;

  // Parse chunk
  const records = await csvParser.parse(chunkContent);

  // Map records
  const entities: any[] = [];
  for (const record of records) {
    const mapped = await mapper.map(record);
    if (mapped.success && mapped.data) {
      entities.push(mapped.data);
    }
  }

  // Send batch
  await client.sendBatch(jobId, { entities });
|
|
588
|
-
|
|
589
|
-
logger.info('Chunk batch sent', {
|
|
590
|
-
chunkId: chunk.chunkId,
|
|
591
|
-
entityCount: entities.length,
|
|
592
|
-
});
|
|
593
|
-
}
|
|
594
|
-
```
|
|
595
|
-
|
|
596
|
-
**VersoriKV Schema:**

```typescript
// Chunk metadata
['chunk', workflowId, chunkId] => ChunkMetadata

// Chunk status
['chunk', workflowId, chunkId, 'status'] => 'pending' | 'processing' | 'completed' | 'failed'

// Workflow state
['state', workflowId, 'sync'] => SyncState
```

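Coordinator and worker code both build these keys by hand, which is easy to get subtly wrong. One way to keep the shapes consistent is to centralize them in small builder functions. The helpers below (`chunkKey`, `chunkStatusKey`, `syncStateKey`) are a sketch, not SDK exports:

```typescript
// Hypothetical key builders mirroring the schema above (not part of the SDK).
type ChunkStatus = 'pending' | 'processing' | 'completed' | 'failed';

const chunkKey = (workflowId: string, chunkId: string): string[] =>
  ['chunk', workflowId, chunkId];

const chunkStatusKey = (workflowId: string, chunkId: string): string[] =>
  [...chunkKey(workflowId, chunkId), 'status'];

const syncStateKey = (workflowId: string): string[] =>
  ['state', workflowId, 'sync'];
```

With these in place, `kv.set(chunkStatusKey(workflowId, chunkId), 'processing')` replaces the hand-written tuples used throughout the patterns below.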
**Performance:**

```
File Size: 5GB (10M records)
Chunk Size: 100K records
Total Chunks: 100
Processing Time: ~60 minutes (sequential)
RAM Usage: ~100MB (processes one chunk at a time)
```

---

## Pattern 3: Parallel Processing (High Performance)

**Best for:** 5M-10M records, time-sensitive ingestion, need speed with reliability

**Strategy:**

1. Split file into chunks (same as Pattern 2)
2. Spawn 5 parallel Batch API jobs
3. Process chunks concurrently
4. Track progress in VersoriKV
5. Resume on failure

### Implementation

```typescript
import {
  createClient,
  CSVParserService,
  S3DataSource,
  UniversalMapper,
  StateService,
  VersoriKVAdapter,
  createConsoleLogger,
  toStructuredLogger
} from '@fluentcommerce/fc-connect-sdk';

const logger = createConsoleLogger();

interface ParallelJob {
  jobId: string;
  assignedChunks: string[];
  status: 'pending' | 'processing' | 'completed' | 'failed';
  recordsProcessed: number;
  startedAt?: string;
  completedAt?: string;
}

async function parallelIngestion(ctx: any) {
  logger.info('Starting parallel ingestion');

  // Initialize services
  const client = await createClient(ctx);

  const s3 = new S3DataSource(
    {
      type: 'S3_CSV',
      connectionId: 'my-s3-parallel',
      name: 'Inventory Files S3 Parallel',
      s3Config: {
        bucket: 'inventory-files',
        region: 'us-east-1',
        accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
        secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
      },
    },
    logger
  );

  const kv = ctx.openKv();
  const kvAdapter = new VersoriKVAdapter(kv);
  const stateService = new StateService(logger);

  const SOURCE_FILE = 'inventory/huge-inventory.csv';
  const CHUNK_SIZE = 100000; // 100K records per chunk
  const PARALLEL_JOBS = 5; // Process 5 chunks concurrently
  const workflowId = 'parallel-ingestion';

  // STEP 1: Split file into chunks (reuse from Pattern 2)
  const chunks = await splitFileIntoChunks(s3, SOURCE_FILE, CHUNK_SIZE, workflowId, kvAdapter);

  logger.info('File split complete', {
    totalChunks: chunks.length,
    totalRecords: chunks.reduce((sum, c) => sum + c.recordCount, 0),
  });

  // STEP 2: Create multiple jobs for parallel processing
  const jobs: ParallelJob[] = [];

  for (let i = 0; i < PARALLEL_JOBS; i++) {
    const job = await client.createJob({
      name: `parallel-inventory-ingestion-job-${i + 1}`,
      retailerId: client.getRetailerId() || ctx.connections?.fluent_commerce?.retailerId,
    });

    jobs.push({
      jobId: job.id,
      assignedChunks: [],
      status: 'pending',
      recordsProcessed: 0,
    });

    logger.info('Parallel job created', {
      jobNumber: i + 1,
      jobId: job.id,
    });
  }

  // STEP 3: Distribute chunks across jobs (round-robin)
  chunks.forEach((chunk, index) => {
    const jobIndex = index % PARALLEL_JOBS;
    jobs[jobIndex].assignedChunks.push(chunk.chunkId);
  });

  logger.info('Chunks distributed', {
    totalChunks: chunks.length,
    jobCount: PARALLEL_JOBS,
    chunksPerJob: jobs.map(j => j.assignedChunks.length),
  });

  // STEP 4: Process chunks in parallel
  const startTime = Date.now();

  const jobPromises = jobs.map((job, jobIndex) =>
    processJobChunks(
      s3,
      client,
      job,
      chunks.filter(c => job.assignedChunks.includes(c.chunkId)),
      workflowId,
      kvAdapter,
      jobIndex + 1
    )
  );

  // Wait for all jobs to complete
  const results = await Promise.allSettled(jobPromises);
  const duration = (Date.now() - startTime) / 1000;

  // STEP 5: Analyze results
  let successfulJobs = 0;
  let failedJobs = 0;
  let totalRecordsProcessed = 0;

  results.forEach((result, index) => {
    if (result.status === 'fulfilled') {
      successfulJobs++;
      totalRecordsProcessed += result.value.recordsProcessed;

      logger.info('Job completed', {
        jobNumber: index + 1,
        jobId: jobs[index].jobId,
        recordsProcessed: result.value.recordsProcessed,
        chunksProcessed: result.value.chunksProcessed,
      });
    } else {
      failedJobs++;
      logger.error('Job failed', result.reason, {
        jobNumber: index + 1,
        jobId: jobs[index].jobId,
      });
    }
  });

  // STEP 6: Update final state
  await stateService.updateSyncState(
    kvAdapter,
    [
      {
        fileName: SOURCE_FILE,
        lastModified: new Date().toISOString(),
        recordCount: totalRecordsProcessed,
      },
    ],
    workflowId
  );

  logger.info('Parallel ingestion complete', {
    totalChunks: chunks.length,
    parallelJobs: PARALLEL_JOBS,
    successfulJobs,
    failedJobs,
    totalRecordsProcessed,
    durationSeconds: duration,
    recordsPerSecond: Math.round(totalRecordsProcessed / duration),
  });

  return {
    success: failedJobs === 0,
    totalChunks: chunks.length,
    totalRecordsProcessed,
    successfulJobs,
    failedJobs,
    durationSeconds: duration,
    recordsPerSecond: Math.round(totalRecordsProcessed / duration),
  };
}

/**
 * Process all chunks assigned to a job
 */
async function processJobChunks(
  s3: S3DataSource,
  client: any,
  job: ParallelJob,
  chunks: ChunkMetadata[],
  workflowId: string,
  kv: VersoriKVAdapter,
  jobNumber: number
): Promise<{ recordsProcessed: number; chunksProcessed: number }> {
  logger.info(`Job ${jobNumber} starting`, {
    jobId: job.jobId,
    assignedChunks: chunks.length,
  });

  let recordsProcessed = 0;
  let chunksProcessed = 0;

  for (const chunk of chunks) {
    try {
      // Check if chunk already processed
      const chunkState = await kv.get(['chunk', workflowId, chunk.chunkId, 'status']);

      if (chunkState?.value === 'completed') {
        logger.info(`Job ${jobNumber}: Chunk already processed`, {
          chunkId: chunk.chunkId,
        });
        chunksProcessed++;
        continue;
      }

      // Mark chunk as processing
      await kv.set(['chunk', workflowId, chunk.chunkId, 'status'], 'processing');

      logger.info(`Job ${jobNumber}: Processing chunk`, {
        chunkId: chunk.chunkId,
        recordCount: chunk.recordCount,
        progress: `${chunksProcessed}/${chunks.length}`,
      });

      // Process chunk
      await processChunk(s3, client, job.jobId, chunk);

      // Mark chunk as completed
      await kv.set(['chunk', workflowId, chunk.chunkId], {
        ...chunk,
        status: 'completed',
        processedAt: new Date().toISOString(),
      } as ChunkMetadata);

      recordsProcessed += chunk.recordCount;
      chunksProcessed++;

      logger.info(`Job ${jobNumber}: Chunk completed`, {
        chunkId: chunk.chunkId,
        recordsProcessed,
        chunksProcessed,
        percentComplete: ((chunksProcessed / chunks.length) * 100).toFixed(1),
      });
    } catch (error) {
      logger.error(`Job ${jobNumber}: Chunk failed`, error as Error, {
        chunkId: chunk.chunkId,
      });

      // Mark chunk as failed (don't throw - continue with remaining chunks)
      await kv.set(['chunk', workflowId, chunk.chunkId], {
        ...chunk,
        status: 'failed',
        error: (error as Error).message,
      } as ChunkMetadata);
    }
  }

  logger.info(`Job ${jobNumber} completed`, {
    jobId: job.jobId,
    recordsProcessed,
    chunksProcessed,
  });

  return { recordsProcessed, chunksProcessed };
}
```

**Progress Tracking:**

```typescript
// Real-time progress query
async function getIngestionProgress(
  workflowId: string,
  kv: VersoriKVAdapter
): Promise<{
  totalChunks: number;
  completedChunks: number;
  failedChunks: number;
  processingChunks: number;
  percentComplete: number;
}> {
  // This would query all chunk statuses from KV
  // Simplified example:
  const chunks = await getAllChunkMetadata(workflowId, kv);

  const completed = chunks.filter(c => c.status === 'completed').length;
  const failed = chunks.filter(c => c.status === 'failed').length;
  const processing = chunks.filter(c => c.status === 'processing').length;

  return {
    totalChunks: chunks.length,
    completedChunks: completed,
    failedChunks: failed,
    processingChunks: processing,
    percentComplete: (completed / chunks.length) * 100,
  };
}
```

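The `getAllChunkMetadata` helper referenced above is not defined in this guide. One way to implement it, assuming the KV adapter exposes a Deno-KV-style `list({ prefix })` iterator (an assumption, since VersoriKVAdapter's listing API is not shown here), is to scan the `['chunk', workflowId]` prefix and keep only the metadata entries, skipping the longer `...status` sub-keys:

```typescript
// Sketch only: the `KVLike.list({ prefix })` shape is assumed, not an SDK contract.
interface ChunkMetadata {
  chunkId: string;
  s3Key: string;
  recordCount: number;
  status: 'pending' | 'processing' | 'completed' | 'failed';
}

interface KVEntry { key: unknown[]; value: unknown }

interface KVLike {
  list(options: { prefix: unknown[] }): AsyncIterable<KVEntry>;
}

async function getAllChunkMetadata(workflowId: string, kv: KVLike): Promise<ChunkMetadata[]> {
  const chunks: ChunkMetadata[] = [];
  for await (const entry of kv.list({ prefix: ['chunk', workflowId] })) {
    // Metadata lives at ['chunk', workflowId, chunkId] (3 parts);
    // skip sub-keys like ['chunk', workflowId, chunkId, 'status'] (4 parts).
    if (entry.key.length === 3) {
      chunks.push(entry.value as ChunkMetadata);
    }
  }
  return chunks;
}
```

If the adapter only supports point reads, the coordinator can instead store the chunk-ID list under a single key at split time and fetch each metadata entry individually.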
**Performance:**

```
File Size: 5GB (10M records)
Chunk Size: 100K records
Total Chunks: 100
Parallel Jobs: 5
Processing Time: ~15 minutes (4x speedup)
RAM Usage: ~500MB (5 chunks in parallel)
Throughput: ~11,111 records/second
```

---

## Pattern 4: Distributed Processing (Versori Workflows)

**Best for:** 10M+ records, enterprise scale, need maximum reliability and observability

**Strategy:**

1. Coordinator workflow splits file and creates scheduled tasks
2. Each worker workflow processes one chunk
3. Coordinator tracks completion via VersoriKV
4. Automatic retry on worker failure

### Coordinator Workflow

```typescript
import { fn, schedule } from '@versori/run';
import {
  createClient,
  S3DataSource,
  VersoriKVAdapter,
  createConsoleLogger,
  toStructuredLogger
} from '@fluentcommerce/fc-connect-sdk';

const logger = createConsoleLogger();

/**
 * Coordinator workflow - splits file and spawns workers
 */
export const coordinatorWorkflow = schedule('coordinator')
  .cron('0 2 * * *') // Run daily at 2 AM
  .then(
    fn('split-and-schedule', async ({ activation, connections, kv }) => {
      logger.info('Coordinator: Starting distributed ingestion');

      const s3 = new S3DataSource(
        {
          type: 'S3_CSV',
          connectionId: 'my-s3-3',
          name: 'Inventory Files S3 3',
          s3Config: {
            bucket: 'inventory-files',
            region: 'us-east-1',
            accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
            secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
          },
        },
        logger
      );

      const kvAdapter = new VersoriKVAdapter(kv);
      const workflowId = `distributed-${Date.now()}`;
      const SOURCE_FILE = 'inventory/enterprise-inventory.csv';
      const CHUNK_SIZE = 100000;

      // Split file into chunks
      const chunks = await splitFileIntoChunks(s3, SOURCE_FILE, CHUNK_SIZE, workflowId, kvAdapter);

      logger.info('Coordinator: File split complete', {
        totalChunks: chunks.length,
        workflowId,
      });

      // Store coordinator state
      await kvAdapter.set(['coordinator', workflowId], {
        workflowId,
        sourceFile: SOURCE_FILE,
        totalChunks: chunks.length,
        status: 'scheduled',
        createdAt: new Date().toISOString(),
      });

      // Schedule worker for each chunk
      for (const chunk of chunks) {
        // Trigger worker workflow (Versori will handle scheduling)
        await activation.triggerWorkflow('chunk-worker', {
          workflowId,
          chunkId: chunk.chunkId,
          chunkKey: chunk.s3Key,
          recordCount: chunk.recordCount,
        });

        logger.info('Coordinator: Worker scheduled', {
          chunkId: chunk.chunkId,
          workflowId,
        });
      }

      return {
        workflowId,
        totalChunks: chunks.length,
        message: `Scheduled ${chunks.length} worker workflows`,
      };
    })
  );

/**
 * Monitor workflow - checks completion status
 */
export const monitorWorkflow = schedule('monitor')
  .cron('*/5 * * * *') // Run every 5 minutes
  .then(
    fn('check-progress', async ({ kv }) => {
      const kvAdapter = new VersoriKVAdapter(kv);

      // Get all active coordinators
      const coordinators = await getActiveCoordinators(kvAdapter);

      for (const coordinator of coordinators) {
        const progress = await getIngestionProgress(coordinator.workflowId, kvAdapter);

        logger.info('Monitor: Progress update', {
          workflowId: coordinator.workflowId,
          ...progress,
        });

        // Check if complete
        if (progress.completedChunks + progress.failedChunks === progress.totalChunks) {
          // Mark coordinator as complete
          await kvAdapter.set(['coordinator', coordinator.workflowId], {
            ...coordinator,
            status: 'completed',
            completedAt: new Date().toISOString(),
            progress,
          });

          logger.info('Monitor: Ingestion complete', {
            workflowId: coordinator.workflowId,
            ...progress,
          });
        }
      }

      return { coordinatorsChecked: coordinators.length };
    })
  );
```

### Worker Workflow

```typescript
import { fn, webhook } from '@versori/run';
import {
  createClient,
  S3DataSource,
  CSVParserService,
  UniversalMapper,
  VersoriKVAdapter,
  createConsoleLogger,
  toStructuredLogger
} from '@fluentcommerce/fc-connect-sdk';

const logger = createConsoleLogger();

/**
 * Worker workflow - processes a single chunk
 */
export const chunkWorker = webhook('chunk-worker').then(
  fn('process-chunk', async (ctx: any) => {
    const { data, kv } = ctx;
    const { workflowId, chunkId, chunkKey, recordCount } = data;

    logger.info('Worker: Starting chunk processing', {
      workflowId,
      chunkId,
      recordCount,
    });

    const kvAdapter = new VersoriKVAdapter(kv);

    // Check if already processed
    const chunkState = await kvAdapter.get(['chunk', workflowId, chunkId, 'status']);

    if (chunkState?.value === 'completed') {
      logger.info('Worker: Chunk already processed', { chunkId });
      return { chunkId, status: 'skipped', message: 'Already processed' };
    }

    // Mark as processing
    await kvAdapter.set(['chunk', workflowId, chunkId, 'status'], 'processing');

    try {
      // Initialize services
      const client = await createClient(ctx);

      const s3 = new S3DataSource(
        {
          type: 'S3_CSV',
          connectionId: 'my-s3-4',
          name: 'Inventory Files S3 4',
          s3Config: {
            bucket: 'inventory-files',
            region: 'us-east-1',
            accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
            secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
          },
        },
        logger
      );

      const csvParser = new CSVParserService();
      const mapper = new UniversalMapper({
        fields: {
          skuRef: { source: 'sku', required: true },
          locationRef: { source: 'location_code', required: true },
          qty: { source: 'quantity', resolver: 'sdk.parseInt' },
          expectedOn: { source: 'expected_date', resolver: 'sdk.formatDate' },
        },
      });

      // Get or create job for this workflow
      let jobId = await kvAdapter.get(['job', workflowId, 'jobId']);

      if (!jobId?.value) {
        const job = await client.createJob({
          name: `distributed-ingestion-${workflowId}`,
          retailerId: client.getRetailerId() || ctx.connections?.fluent_commerce?.retailerId,
        });

        await kvAdapter.set(['job', workflowId, 'jobId'], job.id);
        jobId = { value: job.id };
      }

      // Download chunk
      const chunkContent = (await s3.downloadFile(chunkKey, {
        encoding: 'utf8',
      })) as string;

      // Parse chunk
      const records = await csvParser.parse(chunkContent);

      // Map records
      const entities: any[] = [];
      for (const record of records) {
        const mapped = await mapper.map(record);
        if (mapped.success && mapped.data) {
          entities.push(mapped.data);
        }
      }

      // Send batch
      await client.sendBatch(jobId.value as string, { entities });

      // Mark as completed
      await kvAdapter.set(['chunk', workflowId, chunkId], {
        chunkId,
        s3Key: chunkKey,
        recordCount: entities.length,
        status: 'completed',
        processedAt: new Date().toISOString(),
      });

      logger.info('Worker: Chunk completed', {
        workflowId,
        chunkId,
        recordCount: entities.length,
      });

      return {
        chunkId,
        status: 'completed',
        recordsProcessed: entities.length,
      };
    } catch (error) {
      logger.error('Worker: Chunk failed', error as Error, {
        workflowId,
        chunkId,
      });

      // Mark as failed
      await kvAdapter.set(['chunk', workflowId, chunkId], {
        chunkId,
        s3Key: chunkKey,
        recordCount,
        status: 'failed',
        error: (error as Error).message,
      });

      throw error;
    }
  })
);
```

**Performance:**

```
File Size: 10GB (20M records)
Chunk Size: 100K records
Total Chunks: 200
Worker Workflows: 200 (parallel)
Processing Time: ~10 minutes (Versori handles parallelism)
RAM Usage: ~50MB per worker
Throughput: ~33,333 records/second
```

---

## Memory Optimization Tips

### 1. Use Streaming APIs

```typescript
// ❌ WRONG - Parses the entire file into one in-memory array
const fileContent = await fs.readFile('huge.csv', 'utf-8');
const records = await csvParser.parse(fileContent);

// ✅ CORRECT - Streams parsed records incrementally (one at a time)
for await (const record of csvParser.parseStreaming(fileContent, {}, 1)) {
  await processRecord(record);
}
```

### 2. Clear Batches After Processing

```typescript
let batch: any[] = [];
for await (const record of records) {
  batch.push(record);

  if (batch.length >= 1000) {
    await sendBatch(batch);
    batch = []; // ✅ Clear batch to free memory
  }
}
```

### 3. Monitor Memory Usage

```typescript
function logMemoryUsage() {
  const used = process.memoryUsage();
  console.log({
    heapUsed: Math.round(used.heapUsed / 1024 / 1024) + ' MB',
    heapTotal: Math.round(used.heapTotal / 1024 / 1024) + ' MB',
    rss: Math.round(used.rss / 1024 / 1024) + ' MB',
  });
}

// Log every 10K records
if (recordsProcessed % 10000 === 0) {
  logMemoryUsage();
}
```

### 4. Use Garbage Collection Hints

```typescript
// Force garbage collection (requires --expose-gc flag)
if (recordsProcessed % 100000 === 0 && global.gc) {
  global.gc();
  logger.info('Garbage collection triggered', { recordsProcessed });
}
```

---

## Performance Benchmarks

### Pattern Comparison (10M records, 5GB file)

| Pattern                   | Time   | RAM    | Throughput     | Complexity |
| ------------------------- | ------ | ------ | -------------- | ---------- |
| 1. Basic Streaming        | 90 min | 50MB   | 1,852 rec/sec  | Low        |
| 2. File Chunking          | 60 min | 100MB  | 2,778 rec/sec  | Medium     |
| 3. Parallel Processing    | 15 min | 500MB  | 11,111 rec/sec | High       |
| 4. Distributed Processing | 10 min | 50MB\* | 16,667 rec/sec | Very High  |

\*Per worker; total RAM = 50MB × worker count

### Optimization Impact

| Optimization              | Before  | After    | Improvement |
| ------------------------- | ------- | -------- | ----------- |
| Streaming vs Loading      | 5GB RAM | 50MB RAM | 100x        |
| Batching (1K vs 10K)      | 90 min  | 60 min   | 1.5x        |
| Parallel (1 vs 5 jobs)    | 60 min  | 15 min   | 4x          |
| Distributed (200 workers) | 15 min  | 10 min   | 1.5x        |

---

## Common Issues & Solutions

### Issue 1: Out of Memory

**Symptoms:**

```
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
```

**Solutions:**

1. Switch to streaming pattern (Pattern 1)
2. Reduce batch size (1000 => 500)
3. Increase Node.js heap: `node --max-old-space-size=4096`
4. Use file chunking (Pattern 2)

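Solutions 1 and 2 can also be combined at runtime: instead of picking one fixed batch size up front, shrink the batch when heap pressure rises. A minimal sketch (the 75% threshold and halving strategy are arbitrary choices, not SDK behavior):

```typescript
// Halve the batch size once heap usage crosses a pressure threshold;
// never shrink below a floor, so batches stay worth sending.
function nextBatchSize(
  current: number,
  heapUsedMB: number,
  heapLimitMB: number,
  floor = 100
): number {
  if (heapUsedMB / heapLimitMB > 0.75) {
    return Math.max(floor, Math.floor(current / 2));
  }
  return current;
}

// Example: recompute periodically inside the ingestion loop
// batchSize = nextBatchSize(batchSize, process.memoryUsage().heapUsed / 1024 / 1024, 4096);
```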
### Issue 2: Timeout on Large Files

**Symptoms:**

```
TimeoutError: Operation timed out after 300000ms
```

**Solutions:**

1. Increase timeout: `config.timeout = 600000` (10 min)
2. Split file into chunks (Pattern 2)
3. Use parallel processing (Pattern 3)

### Issue 3: Chunks Not Resuming

**Symptoms:**

- Re-processing same chunks on failure

**Solutions:**

```typescript
// Check chunk status before processing
const chunkState = await kv.get(['chunk', workflowId, chunkId, 'status']);
if (chunkState?.value === 'completed') {
  logger.info('Chunk already processed, skipping', { chunkId });
  continue;
}
```

### Issue 4: Progress Tracking Inconsistent

**Symptoms:**

- Progress percentage doesn't match reality

**Solutions:**

```typescript
// Always update chunk status atomically
const atomic = kv.atomic();
atomic.set(['chunk', workflowId, chunkId, 'status'], 'completed');
atomic.set(['chunk', workflowId, chunkId, 'processedAt'], new Date().toISOString());
await atomic.commit();
```

### Issue 5: Duplicate Processing

**Symptoms:**

- Same records sent multiple times

**Solutions:**

```typescript
// Use idempotency keys in Fluent batch payload
await client.sendBatch(jobId, {
  entities,
  meta: {
    chunkId: chunk.chunkId,
    workflowId,
    idempotencyKey: `${workflowId}-${chunk.chunkId}`,
  },
});
```

---

## Related Guides
|
|
1405
|
-
|
|
1406
|
-
- [Basic Ingestion Pattern](../standalone/s3-csv-batch-api.md) - For small files (<100K records)
|
|
1407
|
-
- [Streaming Pattern](../../02-CORE-GUIDES/ingestion/ingestion-readme.md) - For medium files (100K-1M records)
|
|
1408
|
-
- [Error Handling & Retry](./error-handling-retry.md) - Robust error handling strategies
|
|
1409
|
-
- [Progress Tracking](../../02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-08-performance-optimization.md) - Real-time progress monitoring
|
|
1410
|
-
- [State Management](../../02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-07-state-management.md) - VersoriKV patterns
|
|
1411
|
-
|
|
1412
|
-
---
|
|
1413
|
-
|
|
1414
|
-
## Summary
|
|
1415
|
-
|
|
1416
|
-
**Choose Your Pattern:**
|
|
1417
|
-
|
|
1418
|
-
- **Pattern 1 (Streaming)**: Simple, memory-efficient, suitable for 100K-1M records
|
|
1419
|
-
- **Pattern 2 (Chunking)**: Checkpoint/resume, suitable for 1M-5M records
|
|
1420
|
-
- **Pattern 3 (Parallel)**: High performance, suitable for 5M-10M records
|
|
1421
|
-
- **Pattern 4 (Distributed)**: Enterprise scale, suitable for 10M+ records
|
|
1422
|
-
|
|
1423
|
-
**Key Takeaways:**
|
|
1424
|
-
|
|
1425
|
-
1. Always use streaming APIs for large files
|
|
1426
|
-
2. Clear batches after processing to free memory
|
|
1427
|
-
3. Use chunks + VersoriKV for checkpoint/resume
|
|
1428
|
-
4. Parallel processing trades RAM for speed
|
|
1429
|
-
5. Monitor memory usage throughout processing
|
|
1430
|
-
6. Test with representative file sizes before production
|
|

# Pattern: Large File Processing & Chunking

**FC Connect SDK Use Case Guide**

> **SDK**: [@fluentcommerce/fc-connect-sdk](https://www.npmjs.com/package/@fluentcommerce/fc-connect-sdk)
> **Version**: Use latest - `npm install @fluentcommerce/fc-connect-sdk@latest`

**Context**: Enterprise-scale file ingestion with streaming, splitting, and parallel processing

**Type**: Advanced Pattern

**Complexity**: High

**Volume**: 500MB-5GB files, 1M-10M records

**Latency**: Batch processing (30-60 min for 10M records)

**Pattern**: Streaming + chunking + parallel Batch API
## When to Use This Pattern

Use this pattern when dealing with:

- **Large CSV files** (>500MB, >1M records)
- **Memory-constrained environments** (Lambda, containers with limited RAM)
- **Time-sensitive ingestion** (need parallel processing for speed)
- **Reliability requirements** (checkpoint/resume on failure)
- **Progress tracking** (real-time status updates)

**Volume Guidance:**

- **Small** (<100K records): Use basic ingestion pattern
- **Medium** (100K-1M records): Use streaming pattern (Pattern 1)
- **Large** (1M-5M records): Use file chunking pattern (Pattern 2)
- **Huge** (5M-10M records): Use parallel processing pattern (Pattern 3)
- **Enterprise** (10M+ records): Use distributed processing pattern (Pattern 4)
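
These thresholds can be encoded in a small helper so a job can pick a strategy programmatically. This is an illustrative sketch, not an SDK API; the ranges follow the per-pattern "Best for" guidance used throughout this guide:

```typescript
// Hypothetical helper encoding the volume guidance above (not part of the SDK).
type IngestionPattern = 'basic' | 'streaming' | 'chunking' | 'parallel' | 'distributed';

function choosePattern(recordCount: number): IngestionPattern {
  if (recordCount < 100_000) return 'basic';          // Small: basic ingestion
  if (recordCount < 1_000_000) return 'streaming';    // Medium: Pattern 1
  if (recordCount < 5_000_000) return 'chunking';     // Large: Pattern 2
  if (recordCount < 10_000_000) return 'parallel';    // Huge: Pattern 3
  return 'distributed';                               // Enterprise: Pattern 4
}
```

Treat the boundaries as starting points; measure against your own record sizes and runtime limits before committing to a pattern.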

## Problem Statement

### Why Splitting is Needed

**Memory Constraints:**

```typescript
// ❌ WRONG - Loads entire 2GB file into memory
const csvContent = await fs.readFile('huge-inventory.csv', 'utf-8');
const records = await csvParser.parse(csvContent); // 💥 Out of memory
```

**Impact:**

- Lambda 512MB: Crashes on 500MB+ files
- Container 1GB: Struggles with 1GB+ files
- Node.js default heap (4GB): Fails on 5GB+ files

**Time Constraints:**

```typescript
// ❌ WRONG - Sequential processing takes 90+ minutes
for (const record of records) {
  await processRecord(record); // Too slow for 10M records
}
```

**Reliability Requirements:**

```typescript
// ❌ WRONG - Network failure loses all progress
await processAllRecords(records); // If fails at record 5M, restart from 0
```
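
The patterns below address this last failure mode by checkpointing progress in durable storage. A minimal, self-contained sketch of the idea, with a plain `Map` standing in for a durable store like VersoriKV:

```typescript
// Checkpoint/resume sketch: persist the last committed offset periodically,
// so a restart continues from the checkpoint instead of record 0.
async function processWithCheckpoint(
  records: string[],
  store: Map<string, number>,          // stand-in for a durable KV store
  process: (r: string) => Promise<void>,
  checkpointEvery = 2
): Promise<number> {
  let offset = store.get('offset') ?? 0; // resume point (0 on first run)
  for (; offset < records.length; offset++) {
    await process(records[offset]);
    if ((offset + 1) % checkpointEvery === 0) {
      store.set('offset', offset + 1);   // persist progress periodically
    }
  }
  store.set('offset', records.length);
  return offset;
}
```

On failure, only the records since the last checkpoint are reprocessed, which is why the downstream batches should carry idempotency keys.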

### Solution Overview

This guide demonstrates 4 progressive patterns:

1. **Basic Streaming** (~200 lines) - Process records as they arrive, memory-efficient
2. **File Chunking** (~300 lines) - Split large files into manageable chunks
3. **Parallel Processing** (~400 lines) - Process chunks concurrently with progress tracking
4. **Distributed Processing** (~300 lines) - Use Versori scheduled workflows for enterprise scale

## SDK Methods Used

```typescript
import {
  createClient,        // Client factory (auto-detects context)
  CSVParserService,    // Streaming CSV parser
  S3DataSource,        // S3 file operations
  UniversalMapper,     // Field mapping
  StateService,        // Progress tracking
  VersoriKVAdapter,    // Versori state management
  // Structured logging
  createConsoleLogger,
  toStructuredLogger,
} from '@fluentcommerce/fc-connect-sdk';
```

---

## Pattern 1: Basic Streaming (Memory-Efficient)

**Best for:** 100K-1M records, single-threaded processing, memory-constrained environments

**Memory Usage:**

- ❌ Without streaming: 2GB file = 2GB+ RAM (file + parsed objects)
- ✅ With streaming: 2GB file = ~50MB RAM (processes records incrementally)

### Implementation

```typescript
import {
  createClient,
  CSVParserService,
  S3DataSource,
  UniversalMapper,
  createConsoleLogger,
  toStructuredLogger
} from '@fluentcommerce/fc-connect-sdk';

const logger = createConsoleLogger();

async function streamingIngestion(ctx: any) {
  logger.info('Starting streaming ingestion');

  // Create client (auto-detects Versori context)
  const client = await createClient(ctx);

  // Initialize S3 data source
  const s3 = new S3DataSource(
    {
      type: 'S3_CSV',
      connectionId: 'my-s3',
      name: 'Inventory Files S3',
      s3Config: {
        bucket: 'inventory-files',
        region: 'us-east-1',
        accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
        secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
      },
    },
    logger
  );

  // Define field mapping
  const mapper = new UniversalMapper({
    fields: {
      skuRef: { source: 'sku', required: true },
      locationRef: { source: 'location_code', required: true },
      qty: { source: 'quantity', resolver: 'sdk.parseInt' },
      expectedOn: { source: 'expected_date', resolver: 'sdk.formatDate' },
    },
  });

  // Create CSV parser with streaming enabled
  const csvParser = new CSVParserService();

  // Download file as stream (not loaded into memory)
  logger.info('Downloading file from S3', {
    key: 'inventory/large-file.csv',
  });

  const fileContent = (await s3.downloadFile('inventory/large-file.csv', {
    encoding: 'utf8',
  })) as string;

  // Create job for batch ingestion
  const job = await client.createJob({
    name: 'streaming-inventory-ingestion',
    retailerId: client.getRetailerId() || ctx.connections?.fluent_commerce?.retailerId,
  });

  logger.info('Job created', { jobId: job.id });

  // Statistics tracking
  let recordsProcessed = 0;
  let batchCount = 0;
  let errors = 0;
  const BATCH_SIZE = 1000;
  let currentBatch: any[] = [];

  // Stream records with batching (memory-efficient)
  // Records are parsed incrementally, not all at once
  for await (const record of csvParser.parseStreaming(fileContent, {}, 1)) {
    try {
      // Map record
      const mapped = await mapper.map(record);

      if (mapped.success && mapped.data) {
        currentBatch.push(mapped.data);
        recordsProcessed++;

        // Send batch when full
        if (currentBatch.length >= BATCH_SIZE) {
          await client.sendBatch(job.id, {
            entities: currentBatch,
          });

          batchCount++;

          logger.info('Batch sent', {
            batchNumber: batchCount,
            recordsProcessed,
            currentBatchSize: currentBatch.length,
          });

          currentBatch = []; // Clear batch (frees memory)
        }
      } else {
        errors++;
        logger.warn('Record mapping failed', {
          record,
          errors: mapped.errors,
        });
      }
    } catch (error) {
      errors++;
      logger.error('Record processing failed', error as Error, { record });
    }

    // Progress logging every 10K records
    if (recordsProcessed % 10000 === 0) {
      logger.info('Progress update', {
        recordsProcessed,
        batchesSent: batchCount,
        errors,
        memoryUsage: process.memoryUsage().heapUsed / 1024 / 1024 + ' MB',
      });
    }
  }

  // Send remaining records
  if (currentBatch.length > 0) {
    await client.sendBatch(job.id, {
      entities: currentBatch,
    });
    batchCount++;
  }

  logger.info('Streaming ingestion complete', {
    totalRecords: recordsProcessed,
    batchesSent: batchCount,
    errors,
    jobId: job.id,
  });

  return {
    success: true,
    jobId: job.id,
    recordsProcessed,
    batchesSent: batchCount,
    errors,
  };
}
```

**Memory Profile:**

```
File Size: 2GB (5M records)
RAM Usage: ~50MB peak (1000 record batches)
Processing Time: ~45 minutes (sequential)
```
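
The ~50MB peak follows from only one batch of mapped records being resident at a time. A back-of-envelope estimator; the ~2KB average mapped-record size here is an assumption for illustration, not a measurement:

```typescript
// Rough RAM estimate for streaming ingestion: batch size × average record size.
// Parser buffers and V8 object overhead sit on top of this figure.
function estimateBatchMemoryMB(batchSize: number, avgRecordBytes: number): number {
  return (batchSize * avgRecordBytes) / (1024 * 1024);
}

// 1000-record batches at an assumed ~2KB per mapped record ≈ 2MB in flight.
const perBatchMB = estimateBatchMemoryMB(1000, 2048);
```

If your records are wider, scale `BATCH_SIZE` down so the in-flight batch stays well under the runtime's heap limit.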

---

## Pattern 2: File Chunking (Split & Track)

**Best for:** 1M-5M records, need checkpoint/resume, want progress visibility

**Strategy:**

1. Split large file into 100K record chunks
2. Write chunks to temp S3 locations
3. Track chunk metadata in VersoriKV
4. Process chunks sequentially (can resume on failure)
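
The chunk metadata stored in step 3 records inclusive start/end offsets per chunk. The boundary arithmetic can be sketched standalone (illustrative only; the real splitter below streams records rather than planning from a known count):

```typescript
// Plan chunk boundaries for a known record count, mirroring the
// startRecord/endRecord (inclusive) fields the splitter stores per chunk.
interface ChunkRange {
  chunkId: string;
  startRecord: number;
  endRecord: number;
  recordCount: number;
}

function planChunks(totalRecords: number, chunkSize: number): ChunkRange[] {
  const chunks: ChunkRange[] = [];
  for (let start = 0, n = 0; start < totalRecords; start += chunkSize, n++) {
    const end = Math.min(start + chunkSize, totalRecords) - 1; // last chunk may be short
    chunks.push({
      chunkId: `chunk-${n.toString().padStart(5, '0')}`,
      startRecord: start,
      endRecord: end,
      recordCount: end - start + 1,
    });
  }
  return chunks;
}
```

Zero-padded chunk IDs keep lexicographic ordering equal to numeric ordering, which makes KV listings and S3 prefixes easy to scan.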

### Implementation

```typescript
import {
  createClient,
  CSVParserService,
  S3DataSource,
  UniversalMapper,
  StateService,
  VersoriKVAdapter,
  createConsoleLogger,
  toStructuredLogger
} from '@fluentcommerce/fc-connect-sdk';

const logger = createConsoleLogger();

interface ChunkMetadata {
  chunkId: string;
  startRecord: number;
  endRecord: number;
  s3Key: string;
  recordCount: number;
  status: 'pending' | 'processing' | 'completed' | 'failed';
  processedAt?: string;
  error?: string;
}

async function chunkedIngestion(ctx: any) {
  logger.info('Starting chunked ingestion');

  // Initialize services
  const client = await createClient(ctx);

  const s3 = new S3DataSource(
    {
      type: 'S3_CSV',
      connectionId: 'my-s3-chunked',
      name: 'Inventory Files S3 Chunked',
      s3Config: {
        bucket: 'inventory-files',
        region: 'us-east-1',
        accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
        secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
      },
    },
    logger
  );

  // Initialize state management (the Versori runtime context provides the KV store)
  const kv = ctx.openKv();
  const kvAdapter = new VersoriKVAdapter(kv);
  const stateService = new StateService(logger);

  const SOURCE_FILE = 'inventory/huge-inventory.csv';
  const CHUNK_SIZE = 100000; // 100K records per chunk
  const workflowId = 'chunked-ingestion';

  // STEP 1: Check if chunking is already in progress
  const existingState = await stateService.getSyncState(kvAdapter, workflowId);

  if (existingState.isInitialized && existingState.lastSyncResult === 'partial') {
    logger.info('Resuming from previous run', {
      lastProcessedFile: existingState.lastProcessedFile,
      lastProcessedCount: existingState.lastProcessedCount,
    });
  }

  // STEP 2: Split file into chunks
  logger.info('Splitting file into chunks', {
    sourceFile: SOURCE_FILE,
    chunkSize: CHUNK_SIZE,
  });

  const chunks = await splitFileIntoChunks(s3, SOURCE_FILE, CHUNK_SIZE, workflowId, kvAdapter);

  logger.info('File split complete', {
    totalChunks: chunks.length,
    totalRecords: chunks.reduce((sum, c) => sum + c.recordCount, 0),
  });

  // STEP 3: Create job for ingestion
  const job = await client.createJob({
    name: `chunked-inventory-ingestion-${Date.now()}`,
    retailerId: client.getRetailerId() || ctx.connections?.fluent_commerce?.retailerId,
  });

  logger.info('Job created', { jobId: job.id });

  // STEP 4: Process each chunk sequentially
  let successCount = 0;
  let failureCount = 0;

  for (const chunk of chunks) {
    try {
      // Skip if already processed
      const chunkState = await kvAdapter.get(['chunk', workflowId, chunk.chunkId, 'status']);

      if (chunkState?.value === 'completed') {
        logger.info('Chunk already processed, skipping', {
          chunkId: chunk.chunkId,
        });
        successCount++;
        continue;
      }

      // Mark chunk as processing
      await kvAdapter.set(['chunk', workflowId, chunk.chunkId, 'status'], 'processing');

      logger.info('Processing chunk', {
        chunkId: chunk.chunkId,
        recordCount: chunk.recordCount,
        progress: `${successCount + failureCount}/${chunks.length}`,
      });

      // Process chunk
      await processChunk(s3, client, job.id, chunk);

      // Mark chunk as completed
      await kvAdapter.set(['chunk', workflowId, chunk.chunkId], {
        ...chunk,
        status: 'completed',
        processedAt: new Date().toISOString(),
      } as ChunkMetadata);

      successCount++;

      logger.info('Chunk completed', {
        chunkId: chunk.chunkId,
        successCount,
        failureCount,
        percentComplete: (((successCount + failureCount) / chunks.length) * 100).toFixed(1),
      });
    } catch (error) {
      failureCount++;
      logger.error('Chunk processing failed', error as Error, {
        chunkId: chunk.chunkId,
      });

      // Mark chunk as failed
      await kvAdapter.set(['chunk', workflowId, chunk.chunkId], {
        ...chunk,
        status: 'failed',
        error: (error as Error).message,
      } as ChunkMetadata);
    }
  }

  // STEP 5: Update final state
  await stateService.updateSyncState(
    kvAdapter,
    [
      {
        fileName: SOURCE_FILE,
        lastModified: new Date().toISOString(),
        recordCount: chunks.reduce((sum, c) => sum + c.recordCount, 0),
      },
    ],
    workflowId
  );

  logger.info('Chunked ingestion complete', {
    totalChunks: chunks.length,
    successCount,
    failureCount,
    jobId: job.id,
  });

  return {
    success: failureCount === 0,
    jobId: job.id,
    chunksProcessed: successCount,
    chunksFailed: failureCount,
    totalChunks: chunks.length,
  };
}

/**
 * Split file into chunks and upload to S3
 */
async function splitFileIntoChunks(
  s3: S3DataSource,
  sourceKey: string,
  chunkSize: number,
  workflowId: string,
  kv: VersoriKVAdapter
): Promise<ChunkMetadata[]> {
  const csvParser = new CSVParserService();
  const chunks: ChunkMetadata[] = [];

  // Download source file
  const fileContent = (await s3.downloadFile(sourceKey, {
    encoding: 'utf8',
  })) as string;

  let currentChunk: any[] = [];
  let chunkNumber = 0;
  let recordNumber = 0;

  // Stream through file and create chunks
  for await (const record of csvParser.parseStreaming(fileContent, {}, 1)) {
    currentChunk.push(record);
    recordNumber++;

    // Create chunk when size reached
    if (currentChunk.length >= chunkSize) {
      const chunkId = `chunk-${chunkNumber.toString().padStart(5, '0')}`;
      const chunkKey = `temp/${workflowId}/${chunkId}.csv`;

      // Convert chunk to CSV
      const chunkCSV = csvParser.stringify(currentChunk, { headers: true });

      // Upload chunk to S3
      await s3.uploadFile(chunkKey, chunkCSV, {
        contentType: 'text/csv',
      });

      // Create chunk metadata
      const metadata: ChunkMetadata = {
        chunkId,
        startRecord: recordNumber - currentChunk.length,
        endRecord: recordNumber - 1,
        s3Key: chunkKey,
        recordCount: currentChunk.length,
        status: 'pending',
      };

      chunks.push(metadata);

      // Store chunk metadata in KV
      await kv.set(['chunk', workflowId, chunkId], metadata);

      logger.info('Chunk created', {
        chunkId,
        recordCount: currentChunk.length,
        s3Key: chunkKey,
      });

      // Clear chunk (free memory)
      currentChunk = [];
      chunkNumber++;
    }
  }

  // Handle remaining records
  if (currentChunk.length > 0) {
    const chunkId = `chunk-${chunkNumber.toString().padStart(5, '0')}`;
    const chunkKey = `temp/${workflowId}/${chunkId}.csv`;

    const chunkCSV = csvParser.stringify(currentChunk, { headers: true });
    await s3.uploadFile(chunkKey, chunkCSV, { contentType: 'text/csv' });

    const metadata: ChunkMetadata = {
      chunkId,
      startRecord: recordNumber - currentChunk.length,
      endRecord: recordNumber - 1,
      s3Key: chunkKey,
      recordCount: currentChunk.length,
      status: 'pending',
    };

    chunks.push(metadata);
    await kv.set(['chunk', workflowId, chunkId], metadata);
  }

  return chunks;
}

/**
 * Process a single chunk
 */
async function processChunk(
  s3: S3DataSource,
  client: any,
  jobId: string,
  chunk: ChunkMetadata
): Promise<void> {
  const csvParser = new CSVParserService();
  const mapper = new UniversalMapper({
    fields: {
      skuRef: { source: 'sku', required: true },
      locationRef: { source: 'location_code', required: true },
      qty: { source: 'quantity', resolver: 'sdk.parseInt' },
      expectedOn: { source: 'expected_date', resolver: 'sdk.formatDate' },
    },
  });

  // Download chunk
  const chunkContent = (await s3.downloadFile(chunk.s3Key, {
    encoding: 'utf8',
  })) as string;

  // Parse chunk
  const records = await csvParser.parse(chunkContent);

  // Map records
  const entities: any[] = [];
  for (const record of records) {
    const mapped = await mapper.map(record);
    if (mapped.success && mapped.data) {
      entities.push(mapped.data);
    }
  }

  // Send batch
  await client.sendBatch(jobId, { entities });

  logger.info('Chunk batch sent', {
    chunkId: chunk.chunkId,
    entityCount: entities.length,
  });
}
```

**VersoriKV Schema:**

```typescript
// Chunk metadata
['chunk', workflowId, chunkId] => ChunkMetadata

// Chunk status
['chunk', workflowId, chunkId, 'status'] => 'pending' | 'processing' | 'completed' | 'failed'

// Workflow state
['state', workflowId, 'sync'] => SyncState
```
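
When several workflows read and write these keys, small helpers keep the tuples consistent. These helpers are hypothetical, not SDK exports:

```typescript
// Hypothetical key builders for the schema above, so every reader and
// writer constructs identical tuples.
const chunkKey = (workflowId: string, chunkId: string) =>
  ['chunk', workflowId, chunkId] as const;

const chunkStatusKey = (workflowId: string, chunkId: string) =>
  [...chunkKey(workflowId, chunkId), 'status'] as const;

const syncStateKey = (workflowId: string) =>
  ['state', workflowId, 'sync'] as const;
```

Centralizing key construction avoids the classic bug where one code path writes `['chunk', id, workflowId]` while another reads the reversed order.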

**Performance:**

```
File Size: 5GB (10M records)
Chunk Size: 100K records
Total Chunks: 100
Processing Time: ~60 minutes (sequential)
RAM Usage: ~100MB (processes one chunk at a time)
```

---

## Pattern 3: Parallel Processing (High Performance)

**Best for:** 5M-10M records, time-sensitive ingestion, need speed with reliability

**Strategy:**

1. Split file into chunks (same as Pattern 2)
2. Spawn 5 parallel Batch API jobs
3. Process chunks concurrently
4. Track progress in VersoriKV
5. Resume on failure
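
The chunk-to-job assignment in step 2 is a plain round-robin, which can be isolated as a pure function; chunk counts across jobs then differ by at most one. A standalone sketch:

```typescript
// Round-robin distribution: chunk i is assigned to job i % jobCount.
function distributeRoundRobin(chunkIds: string[], jobCount: number): string[][] {
  const assignments: string[][] = Array.from({ length: jobCount }, () => []);
  chunkIds.forEach((chunkId, i) => assignments[i % jobCount].push(chunkId));
  return assignments;
}
```

Round-robin assumes chunks are roughly equal in cost; if chunk sizes vary widely, a greedy "assign to the least-loaded job" strategy balances better.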

### Implementation

```typescript
import {
  createClient,
  CSVParserService,
  S3DataSource,
  UniversalMapper,
  StateService,
  VersoriKVAdapter,
  createConsoleLogger,
  toStructuredLogger
} from '@fluentcommerce/fc-connect-sdk';

const logger = createConsoleLogger();

interface ParallelJob {
  jobId: string;
  assignedChunks: string[];
  status: 'pending' | 'processing' | 'completed' | 'failed';
  recordsProcessed: number;
  startedAt?: string;
  completedAt?: string;
}

async function parallelIngestion(ctx: any) {
  logger.info('Starting parallel ingestion');

  // Initialize services
  const client = await createClient(ctx);

  const s3 = new S3DataSource(
    {
      type: 'S3_CSV',
      connectionId: 'my-s3-parallel',
      name: 'Inventory Files S3 Parallel',
      s3Config: {
        bucket: 'inventory-files',
        region: 'us-east-1',
        accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
        secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
      },
    },
    logger
  );

  // The Versori runtime context provides the KV store
  const kv = ctx.openKv();
  const kvAdapter = new VersoriKVAdapter(kv);
  const stateService = new StateService(logger);

  const SOURCE_FILE = 'inventory/huge-inventory.csv';
  const CHUNK_SIZE = 100000; // 100K records per chunk
  const PARALLEL_JOBS = 5; // Process 5 chunks concurrently
  const workflowId = 'parallel-ingestion';

  // STEP 1: Split file into chunks (reuse from Pattern 2)
  const chunks = await splitFileIntoChunks(s3, SOURCE_FILE, CHUNK_SIZE, workflowId, kvAdapter);

  logger.info('File split complete', {
    totalChunks: chunks.length,
    totalRecords: chunks.reduce((sum, c) => sum + c.recordCount, 0),
  });

  // STEP 2: Create multiple jobs for parallel processing
  const jobs: ParallelJob[] = [];

  for (let i = 0; i < PARALLEL_JOBS; i++) {
    const job = await client.createJob({
      name: `parallel-inventory-ingestion-job-${i + 1}`,
      retailerId: client.getRetailerId() || ctx.connections?.fluent_commerce?.retailerId,
    });

    jobs.push({
      jobId: job.id,
      assignedChunks: [],
      status: 'pending',
      recordsProcessed: 0,
    });

    logger.info('Parallel job created', {
      jobNumber: i + 1,
      jobId: job.id,
    });
  }

  // STEP 3: Distribute chunks across jobs (round-robin)
  chunks.forEach((chunk, index) => {
    const jobIndex = index % PARALLEL_JOBS;
    jobs[jobIndex].assignedChunks.push(chunk.chunkId);
  });

  logger.info('Chunks distributed', {
    totalChunks: chunks.length,
    jobCount: PARALLEL_JOBS,
    chunksPerJob: jobs.map(j => j.assignedChunks.length),
  });

  // STEP 4: Process chunks in parallel
  const startTime = Date.now();

  const jobPromises = jobs.map((job, jobIndex) =>
    processJobChunks(
      s3,
      client,
      job,
      chunks.filter(c => job.assignedChunks.includes(c.chunkId)),
      workflowId,
      kvAdapter,
      jobIndex + 1
    )
  );

  // Wait for all jobs to complete
  const results = await Promise.allSettled(jobPromises);
  const duration = (Date.now() - startTime) / 1000;

  // STEP 5: Analyze results
  let successfulJobs = 0;
  let failedJobs = 0;
  let totalRecordsProcessed = 0;

  results.forEach((result, index) => {
    if (result.status === 'fulfilled') {
      successfulJobs++;
      totalRecordsProcessed += result.value.recordsProcessed;

      logger.info('Job completed', {
        jobNumber: index + 1,
        jobId: jobs[index].jobId,
        recordsProcessed: result.value.recordsProcessed,
        chunksProcessed: result.value.chunksProcessed,
      });
    } else {
      failedJobs++;
      logger.error('Job failed', result.reason, {
        jobNumber: index + 1,
        jobId: jobs[index].jobId,
      });
    }
  });

  // STEP 6: Update final state
  await stateService.updateSyncState(
    kvAdapter,
    [
      {
        fileName: SOURCE_FILE,
        lastModified: new Date().toISOString(),
        recordCount: totalRecordsProcessed,
      },
    ],
    workflowId
  );

  logger.info('Parallel ingestion complete', {
    totalChunks: chunks.length,
    parallelJobs: PARALLEL_JOBS,
    successfulJobs,
    failedJobs,
    totalRecordsProcessed,
    durationSeconds: duration,
    recordsPerSecond: Math.round(totalRecordsProcessed / duration),
  });

  return {
    success: failedJobs === 0,
    totalChunks: chunks.length,
    totalRecordsProcessed,
    successfulJobs,
    failedJobs,
    durationSeconds: duration,
    recordsPerSecond: Math.round(totalRecordsProcessed / duration),
  };
}

/**
 * Process all chunks assigned to a job
 */
async function processJobChunks(
  s3: S3DataSource,
  client: any,
  job: ParallelJob,
  chunks: ChunkMetadata[],
  workflowId: string,
  kv: VersoriKVAdapter,
  jobNumber: number
): Promise<{ recordsProcessed: number; chunksProcessed: number }> {
  logger.info(`Job ${jobNumber} starting`, {
    jobId: job.jobId,
    assignedChunks: chunks.length,
  });

  let recordsProcessed = 0;
  let chunksProcessed = 0;

  for (const chunk of chunks) {
    try {
      // Check if chunk already processed
      const chunkState = await kv.get(['chunk', workflowId, chunk.chunkId, 'status']);

      if (chunkState?.value === 'completed') {
        logger.info(`Job ${jobNumber}: Chunk already processed`, {
          chunkId: chunk.chunkId,
        });
        chunksProcessed++;
        continue;
      }

      // Mark chunk as processing
      await kv.set(['chunk', workflowId, chunk.chunkId, 'status'], 'processing');

      logger.info(`Job ${jobNumber}: Processing chunk`, {
        chunkId: chunk.chunkId,
        recordCount: chunk.recordCount,
        progress: `${chunksProcessed}/${chunks.length}`,
      });

      // Process chunk
      await processChunk(s3, client, job.jobId, chunk);

      // Mark chunk as completed
      await kv.set(['chunk', workflowId, chunk.chunkId], {
        ...chunk,
        status: 'completed',
        processedAt: new Date().toISOString(),
      } as ChunkMetadata);

      recordsProcessed += chunk.recordCount;
      chunksProcessed++;

      logger.info(`Job ${jobNumber}: Chunk completed`, {
        chunkId: chunk.chunkId,
        recordsProcessed,
        chunksProcessed,
        percentComplete: ((chunksProcessed / chunks.length) * 100).toFixed(1),
      });
    } catch (error) {
      logger.error(`Job ${jobNumber}: Chunk failed`, error as Error, {
        chunkId: chunk.chunkId,
      });

      // Mark chunk as failed (don't throw - continue with remaining chunks)
      await kv.set(['chunk', workflowId, chunk.chunkId], {
        ...chunk,
        status: 'failed',
        error: (error as Error).message,
      } as ChunkMetadata);
    }
  }

  logger.info(`Job ${jobNumber} completed`, {
    jobId: job.jobId,
    recordsProcessed,
    chunksProcessed,
  });

  return { recordsProcessed, chunksProcessed };
}
```

**Progress Tracking:**

```typescript
// Real-time progress query
async function getIngestionProgress(
  workflowId: string,
  kv: VersoriKVAdapter
): Promise<{
  totalChunks: number;
  completedChunks: number;
  failedChunks: number;
  processingChunks: number;
  percentComplete: number;
}> {
  // Query all chunk statuses from KV (simplified example)
  const chunks = await getAllChunkMetadata(workflowId, kv);

  const completed = chunks.filter(c => c.status === 'completed').length;
  const failed = chunks.filter(c => c.status === 'failed').length;
  const processing = chunks.filter(c => c.status === 'processing').length;

  return {
    totalChunks: chunks.length,
    completedChunks: completed,
    failedChunks: failed,
    processingChunks: processing,
    // Guard against division by zero before any chunks are registered
    percentComplete: chunks.length === 0 ? 0 : (completed / chunks.length) * 100,
  };
}
```
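`getAllChunkMetadata` is referenced above but not shown. Below is a minimal sketch, assuming the KV adapter exposes a Deno-KV-style `list({ prefix })` async iterator; the method name and the `ChunkRecord` shape are assumptions standing in for the SDK's actual scan API and `ChunkMetadata` type, so adjust them to your adapter.

```typescript
// Hypothetical helper backing getIngestionProgress above.
// Assumption: the adapter supports prefix scans via list({ prefix }),
// as Deno KV does; adjust to your adapter's real API.
type ChunkStatus = 'pending' | 'processing' | 'completed' | 'failed';

interface ChunkRecord {
  chunkId: string;
  status: ChunkStatus;
}

interface PrefixListableKV {
  list(options: { prefix: unknown[] }): AsyncIterable<{ key: unknown[]; value: ChunkRecord }>;
}

async function getAllChunkMetadata(
  workflowId: string,
  kv: PrefixListableKV
): Promise<ChunkRecord[]> {
  const chunks: ChunkRecord[] = [];
  // Collect every chunk record stored under ['chunk', workflowId]
  for await (const entry of kv.list({ prefix: ['chunk', workflowId] })) {
    chunks.push(entry.value);
  }
  return chunks;
}
```

If your adapter only supports point reads, store the list of chunk IDs under a single key at split time and fetch each record individually instead.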

**Performance:**

```
File Size: 5GB (10M records)
Chunk Size: 100K records
Total Chunks: 100
Parallel Jobs: 5
Processing Time: ~15 minutes (4x speedup)
RAM Usage: ~500MB (5 chunks in parallel)
Throughput: ~11,111 records/second
```

---

## Pattern 4: Distributed Processing (Versori Workflows)

**Best for:** 10M+ records, enterprise scale, workloads that need maximum reliability and observability

**Strategy:**

1. Coordinator workflow splits the file and creates scheduled tasks
2. Each worker workflow processes one chunk
3. Coordinator tracks completion via VersoriKV
4. Automatic retry on worker failure

### Coordinator Workflow

```typescript
import { fn, schedule } from '@versori/run';
import {
  createClient,
  S3DataSource,
  VersoriKVAdapter,
  createConsoleLogger,
  toStructuredLogger
} from '@fluentcommerce/fc-connect-sdk';

const logger = createConsoleLogger();

/**
 * Coordinator workflow - splits file and spawns workers
 */
export const coordinatorWorkflow = schedule('coordinator')
  .cron('0 2 * * *') // Run daily at 2 AM
  .then(
    fn('split-and-schedule', async ({ activation, connections, kv }) => {
      logger.info('Coordinator: Starting distributed ingestion');

      const s3 = new S3DataSource(
        {
          type: 'S3_CSV',
          connectionId: 'my-s3-3',
          name: 'Inventory Files S3 3',
          s3Config: {
            bucket: 'inventory-files',
            region: 'us-east-1',
            accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
            secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
          },
        },
        logger
      );

      const kvAdapter = new VersoriKVAdapter(kv);
      const workflowId = `distributed-${Date.now()}`;
      const SOURCE_FILE = 'inventory/enterprise-inventory.csv';
      const CHUNK_SIZE = 100000;

      // Split file into chunks
      const chunks = await splitFileIntoChunks(s3, SOURCE_FILE, CHUNK_SIZE, workflowId, kvAdapter);

      logger.info('Coordinator: File split complete', {
        totalChunks: chunks.length,
        workflowId,
      });

      // Store coordinator state
      await kvAdapter.set(['coordinator', workflowId], {
        workflowId,
        sourceFile: SOURCE_FILE,
        totalChunks: chunks.length,
        status: 'scheduled',
        createdAt: new Date().toISOString(),
      });

      // Schedule worker for each chunk
      for (const chunk of chunks) {
        // Trigger worker workflow (Versori will handle scheduling)
        await activation.triggerWorkflow('chunk-worker', {
          workflowId,
          chunkId: chunk.chunkId,
          chunkKey: chunk.s3Key,
          recordCount: chunk.recordCount,
        });

        logger.info('Coordinator: Worker scheduled', {
          chunkId: chunk.chunkId,
          workflowId,
        });
      }

      return {
        workflowId,
        totalChunks: chunks.length,
        message: `Scheduled ${chunks.length} worker workflows`,
      };
    })
  );

/**
 * Monitor workflow - checks completion status
 */
export const monitorWorkflow = schedule('monitor')
  .cron('*/5 * * * *') // Run every 5 minutes
  .then(
    fn('check-progress', async ({ kv }) => {
      const kvAdapter = new VersoriKVAdapter(kv);

      // Get all active coordinators
      const coordinators = await getActiveCoordinators(kvAdapter);

      for (const coordinator of coordinators) {
        const progress = await getIngestionProgress(coordinator.workflowId, kvAdapter);

        logger.info('Monitor: Progress update', {
          workflowId: coordinator.workflowId,
          ...progress,
        });

        // Check if complete
        if (progress.completedChunks + progress.failedChunks === progress.totalChunks) {
          // Mark coordinator as complete
          await kvAdapter.set(['coordinator', coordinator.workflowId], {
            ...coordinator,
            status: 'completed',
            completedAt: new Date().toISOString(),
            progress,
          });

          logger.info('Monitor: Ingestion complete', {
            workflowId: coordinator.workflowId,
            ...progress,
          });
        }
      }

      return { coordinatorsChecked: coordinators.length };
    })
  );
```
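`getActiveCoordinators` is likewise assumed rather than shown. Here is a sketch under the same prefix-scan assumption (a Deno-KV-style `list({ prefix })` iterator, which your adapter may or may not provide), returning coordinator records not yet marked complete:

```typescript
// Hypothetical helper used by the monitor workflow above.
// Assumption: coordinator state lives under ['coordinator', workflowId]
// and the adapter supports Deno-KV-style prefix scans.
interface CoordinatorState {
  workflowId: string;
  totalChunks: number;
  status: 'scheduled' | 'completed';
}

interface CoordinatorKV {
  list(options: { prefix: unknown[] }): AsyncIterable<{ key: unknown[]; value: CoordinatorState }>;
}

async function getActiveCoordinators(kv: CoordinatorKV): Promise<CoordinatorState[]> {
  const active: CoordinatorState[] = [];
  for await (const entry of kv.list({ prefix: ['coordinator'] })) {
    // Only coordinators that have not been marked complete need monitoring
    if (entry.value.status !== 'completed') {
      active.push(entry.value);
    }
  }
  return active;
}
```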

### Worker Workflow

```typescript
import { fn, webhook } from '@versori/run';
import {
  createClient,
  S3DataSource,
  CSVParserService,
  UniversalMapper,
  VersoriKVAdapter,
  createConsoleLogger,
  toStructuredLogger
} from '@fluentcommerce/fc-connect-sdk';

const logger = createConsoleLogger();

/**
 * Worker workflow - processes a single chunk
 */
export const chunkWorker = webhook('chunk-worker').then(
  fn('process-chunk', async (ctx) => {
    // Keep a handle on the full context: createClient and the retailerId
    // fallback below need it, not just the destructured fields
    const { data, kv } = ctx;
    const { workflowId, chunkId, chunkKey, recordCount } = data;

    logger.info('Worker: Starting chunk processing', {
      workflowId,
      chunkId,
      recordCount,
    });

    const kvAdapter = new VersoriKVAdapter(kv);

    // Check if already processed
    const chunkState = await kvAdapter.get(['chunk', workflowId, chunkId, 'status']);

    if (chunkState?.value === 'completed') {
      logger.info('Worker: Chunk already processed', { chunkId });
      return { chunkId, status: 'skipped', message: 'Already processed' };
    }

    // Mark as processing
    await kvAdapter.set(['chunk', workflowId, chunkId, 'status'], 'processing');

    try {
      // Initialize services
      const client = await createClient(ctx);

      const s3 = new S3DataSource(
        {
          type: 'S3_CSV',
          connectionId: 'my-s3-4',
          name: 'Inventory Files S3 4',
          s3Config: {
            bucket: 'inventory-files',
            region: 'us-east-1',
            accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
            secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
          },
        },
        logger
      );

      const csvParser = new CSVParserService();
      const mapper = new UniversalMapper({
        fields: {
          skuRef: { source: 'sku', required: true },
          locationRef: { source: 'location_code', required: true },
          qty: { source: 'quantity', resolver: 'sdk.parseInt' },
          expectedOn: { source: 'expected_date', resolver: 'sdk.formatDate' },
        },
      });

      // Get or create job for this workflow
      let jobId = await kvAdapter.get(['job', workflowId, 'jobId']);

      if (!jobId?.value) {
        const job = await client.createJob({
          name: `distributed-ingestion-${workflowId}`,
          retailerId: client.getRetailerId() || ctx.connections?.fluent_commerce?.retailerId,
        });

        await kvAdapter.set(['job', workflowId, 'jobId'], job.id);
        jobId = { value: job.id };
      }

      // Download chunk
      const chunkContent = (await s3.downloadFile(chunkKey, {
        encoding: 'utf8',
      })) as string;

      // Parse chunk
      const records = await csvParser.parse(chunkContent);

      // Map records
      const entities: any[] = [];
      for (const record of records) {
        const mapped = await mapper.map(record);
        if (mapped.success && mapped.data) {
          entities.push(mapped.data);
        }
      }

      // Send batch
      await client.sendBatch(jobId.value as string, { entities });

      // Mark as completed; also update the status subkey so the
      // resume check above sees the chunk as done
      await kvAdapter.set(['chunk', workflowId, chunkId], {
        chunkId,
        s3Key: chunkKey,
        recordCount: entities.length,
        status: 'completed',
        processedAt: new Date().toISOString(),
      });
      await kvAdapter.set(['chunk', workflowId, chunkId, 'status'], 'completed');

      logger.info('Worker: Chunk completed', {
        workflowId,
        chunkId,
        recordCount: entities.length,
      });

      return {
        chunkId,
        status: 'completed',
        recordsProcessed: entities.length,
      };
    } catch (error) {
      logger.error('Worker: Chunk failed', error as Error, {
        workflowId,
        chunkId,
      });

      // Mark as failed (status subkey kept in sync as above)
      await kvAdapter.set(['chunk', workflowId, chunkId], {
        chunkId,
        s3Key: chunkKey,
        recordCount,
        status: 'failed',
        error: (error as Error).message,
      });
      await kvAdapter.set(['chunk', workflowId, chunkId, 'status'], 'failed');

      throw error;
    }
  })
);
```

**Performance:**

```
File Size: 10GB (20M records)
Chunk Size: 100K records
Total Chunks: 200
Worker Workflows: 200 (parallel)
Processing Time: ~10 minutes (Versori handles parallelism)
RAM Usage: ~50MB per worker
Throughput: ~33,333 records/second
```

---

## Memory Optimization Tips

### 1. Use Streaming APIs

```typescript
import { promises as fs } from 'fs';

// ❌ WRONG - parse() materializes every parsed record in memory at once
const fileContent = await fs.readFile('huge.csv', 'utf-8');
const records = await csvParser.parse(fileContent);

// ✅ CORRECT - parseStreaming() yields records one at a time
// (combine with a read stream to avoid buffering the raw file as well)
for await (const record of csvParser.parseStreaming(fileContent, {}, 1)) {
  await processRecord(record);
}
```

### 2. Clear Batches After Processing

```typescript
let batch: any[] = [];
for await (const record of records) {
  batch.push(record);

  if (batch.length >= 1000) {
    await sendBatch(batch);
    batch = []; // ✅ Clear batch to free memory
  }
}

// Flush any remaining partial batch
if (batch.length > 0) {
  await sendBatch(batch);
}
```

### 3. Monitor Memory Usage

```typescript
function logMemoryUsage() {
  const used = process.memoryUsage();
  console.log({
    heapUsed: Math.round(used.heapUsed / 1024 / 1024) + ' MB',
    heapTotal: Math.round(used.heapTotal / 1024 / 1024) + ' MB',
    rss: Math.round(used.rss / 1024 / 1024) + ' MB',
  });
}

// Log every 10K records
if (recordsProcessed % 10000 === 0) {
  logMemoryUsage();
}
```

### 4. Use Garbage Collection Hints

```typescript
// Force garbage collection (requires the node --expose-gc flag)
if (recordsProcessed % 100000 === 0 && global.gc) {
  global.gc();
  logger.info('Garbage collection triggered', { recordsProcessed });
}
```

---

## Performance Benchmarks

### Pattern Comparison (10M records, 5GB file)

| Pattern                   | Time   | RAM    | Throughput     | Complexity |
| ------------------------- | ------ | ------ | -------------- | ---------- |
| 1. Basic Streaming        | 90 min | 50MB   | 1,852 rec/sec  | Low        |
| 2. File Chunking          | 60 min | 100MB  | 2,778 rec/sec  | Medium     |
| 3. Parallel Processing    | 15 min | 500MB  | 11,111 rec/sec | High       |
| 4. Distributed Processing | 10 min | 50MB\* | 16,667 rec/sec | Very High  |

\*Per worker; total RAM = 50MB × worker count
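The throughput column is simply records divided by elapsed seconds; this small helper reproduces the figures in the table above for the 10M-record benchmark:

```typescript
// Throughput as reported in the comparison table: records / elapsed seconds
function throughput(records: number, minutes: number): number {
  return Math.round(records / (minutes * 60));
}

// throughput(10_000_000, 90) -> 1852 rec/sec (Pattern 1)
// throughput(10_000_000, 15) -> 11111 rec/sec (Pattern 3)
```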

### Optimization Impact

| Optimization              | Before  | After    | Improvement |
| ------------------------- | ------- | -------- | ----------- |
| Streaming vs Loading      | 5GB RAM | 50MB RAM | 100x        |
| Batching (1K vs 10K)      | 90 min  | 60 min   | 1.5x        |
| Parallel (1 vs 5 jobs)    | 60 min  | 15 min   | 4x          |
| Distributed (200 workers) | 15 min  | 10 min   | 1.5x        |

---

## Common Issues & Solutions

### Issue 1: Out of Memory

**Symptoms:**

```
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
```

**Solutions:**

1. Switch to the streaming pattern (Pattern 1)
2. Reduce the batch size (e.g., 1000 → 500)
3. Increase the Node.js heap: `node --max-old-space-size=4096`
4. Use file chunking (Pattern 2)
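Solutions 1 and 2 can be combined into an adaptive flush that shrinks the batch size when the heap is under pressure. A sketch with illustrative thresholds (`nextBatchSize` is a hypothetical helper, not an SDK function):

```typescript
// Hypothetical helper: halve the batch size once heap usage crosses a
// threshold. The 1GB threshold and the 100-record floor are illustrative.
function nextBatchSize(base: number, thresholdMB = 1024): number {
  const heapUsedMB = process.memoryUsage().heapUsed / 1024 / 1024;
  return heapUsedMB > thresholdMB ? Math.max(100, Math.floor(base / 2)) : base;
}
```

Call it before each flush and slice the pending batch accordingly.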

### Issue 2: Timeout on Large Files

**Symptoms:**

```
TimeoutError: Operation timed out after 300000ms
```

**Solutions:**

1. Increase the timeout: `config.timeout = 600000` (10 min)
2. Split the file into chunks (Pattern 2)
3. Use parallel processing (Pattern 3)
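If your SDK version does not expose a timeout option, a long-running step can be bounded with a generic wrapper instead (a sketch, not an SDK API):

```typescript
// Generic timeout wrapper: rejects if the wrapped promise takes longer than ms
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Operation timed out after ${ms}ms`)),
      ms
    );
    promise.then(
      value => { clearTimeout(timer); resolve(value); },
      error => { clearTimeout(timer); reject(error); }
    );
  });
}
```

Note the losing promise keeps running in the background; pair this with an `AbortController` if the underlying operation supports cancellation.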

### Issue 3: Chunks Not Resuming

**Symptoms:**

- The same chunks are re-processed after a failure

**Solutions:**

```typescript
// Check chunk status before processing
const chunkState = await kv.get(['chunk', workflowId, chunkId, 'status']);
if (chunkState?.value === 'completed') {
  logger.info('Chunk already processed, skipping', { chunkId });
  continue;
}
```

### Issue 4: Progress Tracking Inconsistent

**Symptoms:**

- The progress percentage doesn't match reality

**Solutions:**

```typescript
// Always update chunk status atomically
const atomic = kv.atomic();
atomic.set(['chunk', workflowId, chunkId, 'status'], 'completed');
atomic.set(['chunk', workflowId, chunkId, 'processedAt'], new Date().toISOString());
await atomic.commit();
```

### Issue 5: Duplicate Processing

**Symptoms:**

- The same records are sent multiple times

**Solutions:**

```typescript
// Use idempotency keys in the Fluent batch payload
await client.sendBatch(jobId, {
  entities,
  meta: {
    chunkId: chunk.chunkId,
    workflowId,
    idempotencyKey: `${workflowId}-${chunk.chunkId}`,
  },
});
```

---

## Related Guides

- [Basic Ingestion Pattern](../standalone/s3-csv-batch-api.md) - For small files (<100K records)
- [Streaming Pattern](../../02-CORE-GUIDES/ingestion/ingestion-readme.md) - For medium files (100K-1M records)
- [Error Handling & Retry](./error-handling-retry.md) - Robust error handling strategies
- [Progress Tracking](../../02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-08-performance-optimization.md) - Real-time progress monitoring
- [State Management](../../02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-07-state-management.md) - VersoriKV patterns

---

## Summary

**Choose Your Pattern:**

- **Pattern 1 (Streaming)**: Simple, memory-efficient, suitable for 100K-1M records
- **Pattern 2 (Chunking)**: Checkpoint/resume, suitable for 1M-5M records
- **Pattern 3 (Parallel)**: High performance, suitable for 5M-10M records
- **Pattern 4 (Distributed)**: Enterprise scale, suitable for 10M+ records

**Key Takeaways:**

1. Always use streaming APIs for large files
2. Clear batches after processing to free memory
3. Use chunks + VersoriKV for checkpoint/resume
4. Parallel processing trades RAM for speed
5. Monitor memory usage throughout processing
6. Test with representative file sizes before production