@fluentcommerce/fc-connect-sdk 0.1.53 → 0.1.55

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (495)
  1. package/CHANGELOG.md +30 -2
  2. package/README.md +39 -0
  3. package/dist/cjs/auth/index.d.ts +3 -0
  4. package/dist/cjs/auth/index.js +13 -0
  5. package/dist/cjs/auth/profile-loader.d.ts +18 -0
  6. package/dist/cjs/auth/profile-loader.js +208 -0
  7. package/dist/cjs/client-factory.d.ts +4 -0
  8. package/dist/cjs/client-factory.js +10 -0
  9. package/dist/cjs/clients/fluent-client.js +13 -6
  10. package/dist/cjs/index.d.ts +3 -1
  11. package/dist/cjs/index.js +8 -2
  12. package/dist/cjs/utils/pagination-helpers.js +38 -2
  13. package/dist/cjs/versori/fluent-versori-client.js +11 -5
  14. package/dist/esm/auth/index.d.ts +3 -0
  15. package/dist/esm/auth/index.js +2 -0
  16. package/dist/esm/auth/profile-loader.d.ts +18 -0
  17. package/dist/esm/auth/profile-loader.js +169 -0
  18. package/dist/esm/client-factory.d.ts +4 -0
  19. package/dist/esm/client-factory.js +9 -0
  20. package/dist/esm/clients/fluent-client.js +13 -6
  21. package/dist/esm/index.d.ts +3 -1
  22. package/dist/esm/index.js +2 -1
  23. package/dist/esm/utils/pagination-helpers.js +38 -2
  24. package/dist/esm/versori/fluent-versori-client.js +11 -5
  25. package/dist/tsconfig.esm.tsbuildinfo +1 -1
  26. package/dist/tsconfig.tsbuildinfo +1 -1
  27. package/dist/tsconfig.types.tsbuildinfo +1 -1
  28. package/dist/types/auth/index.d.ts +3 -0
  29. package/dist/types/auth/profile-loader.d.ts +18 -0
  30. package/dist/types/client-factory.d.ts +4 -0
  31. package/dist/types/index.d.ts +3 -1
  32. package/docs/00-START-HERE/EXPORT-VALIDATION.md +158 -158
  33. package/docs/00-START-HERE/cli-analyze-source-structure-guide.md +655 -655
  34. package/docs/00-START-HERE/cli-documentation-index.md +202 -202
  35. package/docs/00-START-HERE/cli-quick-reference.md +252 -252
  36. package/docs/00-START-HERE/decision-tree.md +552 -552
  37. package/docs/00-START-HERE/getting-started.md +1070 -1070
  38. package/docs/00-START-HERE/mapper-quick-decision-guide.md +235 -235
  39. package/docs/00-START-HERE/readme.md +237 -237
  40. package/docs/00-START-HERE/retailerid-configuration.md +404 -404
  41. package/docs/00-START-HERE/sdk-philosophy.md +794 -794
  42. package/docs/00-START-HERE/troubleshooting-quick-reference.md +1086 -1086
  43. package/docs/01-TEMPLATES/faq.md +686 -686
  44. package/docs/01-TEMPLATES/patterns/pattern-templates-guide.md +68 -68
  45. package/docs/01-TEMPLATES/patterns/patterns-csv-schema-validation-and-rejection-report.md +233 -233
  46. package/docs/01-TEMPLATES/patterns/patterns-custom-resolvers.md +407 -407
  47. package/docs/01-TEMPLATES/patterns/patterns-error-handling-retry.md +511 -511
  48. package/docs/01-TEMPLATES/patterns/patterns-field-mapping-universal.md +701 -701
  49. package/docs/01-TEMPLATES/patterns/patterns-large-file-splitting.md +1430 -1430
  50. package/docs/01-TEMPLATES/patterns/patterns-master-data-etl.md +2399 -2399
  51. package/docs/01-TEMPLATES/patterns/patterns-pagination-streaming.md +447 -447
  52. package/docs/01-TEMPLATES/patterns/patterns-state-duplicate-prevention.md +385 -385
  53. package/docs/01-TEMPLATES/readme.md +957 -957
  54. package/docs/01-TEMPLATES/standalone/standalone-asn-inbound-processing.md +1209 -1209
  55. package/docs/01-TEMPLATES/standalone/standalone-graphql-query-export.md +1140 -1140
  56. package/docs/01-TEMPLATES/standalone/standalone-graphql-to-parquet-partitioned-s3.md +432 -432
  57. package/docs/01-TEMPLATES/standalone/standalone-multi-channel-inventory-sync.md +1185 -1185
  58. package/docs/01-TEMPLATES/standalone/standalone-multi-source-aggregation.md +1462 -1462
  59. package/docs/01-TEMPLATES/standalone/standalone-s3-csv-batch-api.md +1390 -1390
  60. package/docs/01-TEMPLATES/standalone/standalone-s3-csv-inventory-to-batch.md +330 -330
  61. package/docs/01-TEMPLATES/standalone/standalone-scripts-guide.md +87 -87
  62. package/docs/01-TEMPLATES/standalone/standalone-sftp-xml-graphql.md +1444 -1444
  63. package/docs/01-TEMPLATES/standalone/standalone-webhook-payload-processing.md +688 -688
  64. package/docs/01-TEMPLATES/versori/business-examples/business-examples-dropship-order-routing.md +193 -193
  65. package/docs/01-TEMPLATES/versori/business-examples/business-examples-graphql-parquet-extraction.md +518 -518
  66. package/docs/01-TEMPLATES/versori/business-examples/business-examples-inter-location-transfers.md +2162 -2162
  67. package/docs/01-TEMPLATES/versori/business-examples/business-examples-pre-order-allocation.md +2226 -2226
  68. package/docs/01-TEMPLATES/versori/business-examples/business-scenarios-guide.md +87 -87
  69. package/docs/01-TEMPLATES/versori/patterns/versori-patterns-connection-validation-pattern.md +656 -656
  70. package/docs/01-TEMPLATES/versori/patterns/versori-patterns-dual-workflow-connector.md +835 -835
  71. package/docs/01-TEMPLATES/versori/patterns/versori-patterns-guide.md +108 -108
  72. package/docs/01-TEMPLATES/versori/patterns/versori-patterns-kv-state-management.md +1533 -1533
  73. package/docs/01-TEMPLATES/versori/patterns/versori-patterns-xml-response-patterns.md +1160 -1160
  74. package/docs/01-TEMPLATES/versori/versori-platform-guide.md +201 -201
  75. package/docs/01-TEMPLATES/versori/webhooks/template-webhook-asn-purchase-order.md +1906 -1906
  76. package/docs/01-TEMPLATES/versori/webhooks/template-webhook-dropship-routing.md +1074 -1074
  77. package/docs/01-TEMPLATES/versori/webhooks/template-webhook-flash-sale-reserve.md +1395 -1395
  78. package/docs/01-TEMPLATES/versori/webhooks/template-webhook-generic-xml-order.md +888 -888
  79. package/docs/01-TEMPLATES/versori/webhooks/template-webhook-payment-gateway-integration.md +2478 -2478
  80. package/docs/01-TEMPLATES/versori/webhooks/template-webhook-rma-returns-comprehensive.md +2240 -2240
  81. package/docs/01-TEMPLATES/versori/webhooks/template-webhook-xml-order-ingestion.md +2029 -2029
  82. package/docs/01-TEMPLATES/versori/webhooks/webhook-templates-guide.md +140 -140
  83. package/docs/01-TEMPLATES/versori/workflows/_examples/sample-data/inventory-mapping.json +20 -20
  84. package/docs/01-TEMPLATES/versori/workflows/_examples/sample-data/products_2025-01-22.csv +11 -11
  85. package/docs/01-TEMPLATES/versori/workflows/_examples/sample-data/sample-data-guide.md +34 -34
  86. package/docs/01-TEMPLATES/versori/workflows/_examples/workflow-examples-guide.md +36 -36
  87. package/docs/01-TEMPLATES/versori/workflows/extraction/extraction-modes-guide.md +1038 -1038
  88. package/docs/01-TEMPLATES/versori/workflows/extraction/extraction-workflows-guide.md +138 -138
  89. package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/graphql-extraction-guide.md +63 -63
  90. package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-fulfillments-to-sftp-csv.md +2062 -2062
  91. package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-fulfillments-to-sftp-xml.md +2294 -2294
  92. package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-inventory-positions-to-s3-csv.md +2461 -2461
  93. package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-inventory-positions-to-sftp-xml.md +2529 -2529
  94. package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-inventory-quantities-to-s3-csv.md +2464 -2464
  95. package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-inventory-quantities-to-s3-json.md +1959 -1959
  96. package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-orders-to-s3-csv.md +1953 -1953
  97. package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-orders-to-sftp-xml.md +2541 -2541
  98. package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-products-to-s3-json.md +2384 -2384
  99. package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-products-to-sftp-xml.md +2445 -2445
  100. package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-virtual-positions-to-s3-csv.md +2355 -2355
  101. package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-virtual-positions-to-s3-json.md +2042 -2042
  102. package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-virtual-positions-to-sftp-xml.md +2726 -2726
  103. package/docs/01-TEMPLATES/versori/workflows/ingestion/batch-api/batch-api-guide.md +206 -206
  104. package/docs/01-TEMPLATES/versori/workflows/ingestion/batch-api/template-ingestion-cycle-count-reconciliation.md +2030 -2030
  105. package/docs/01-TEMPLATES/versori/workflows/ingestion/batch-api/template-ingestion-multi-channel-inventory-sync.md +1882 -1882
  106. package/docs/01-TEMPLATES/versori/workflows/ingestion/batch-api/template-ingestion-s3-csv-inventory-batch.md +2827 -2827
  107. package/docs/01-TEMPLATES/versori/workflows/ingestion/batch-api/template-ingestion-s3-json-inventory-batch.md +1952 -1952
  108. package/docs/01-TEMPLATES/versori/workflows/ingestion/batch-api/template-ingestion-s3-xml-inventory-batch.md +3289 -3289
  109. package/docs/01-TEMPLATES/versori/workflows/ingestion/batch-api/template-ingestion-sftp-csv-inventory-batch.md +3064 -3064
  110. package/docs/01-TEMPLATES/versori/workflows/ingestion/batch-api/template-ingestion-sftp-json-inventory-batch.md +3238 -3238
  111. package/docs/01-TEMPLATES/versori/workflows/ingestion/batch-api/template-ingestion-sftp-xml-inventory-batch.md +2977 -2977
  112. package/docs/01-TEMPLATES/versori/workflows/ingestion/event-api/event-api-guide.md +321 -321
  113. package/docs/01-TEMPLATES/versori/workflows/ingestion/event-api/template-ingestion-payload-json-order-cancel-event.md +959 -959
  114. package/docs/01-TEMPLATES/versori/workflows/ingestion/event-api/template-ingestion-payload-xml-order-cancel-event.md +1170 -1170
  115. package/docs/01-TEMPLATES/versori/workflows/ingestion/event-api/template-ingestion-s3-csv-product-event.md +2312 -2312
  116. package/docs/01-TEMPLATES/versori/workflows/ingestion/event-api/template-ingestion-s3-json-product-event.md +2999 -2999
  117. package/docs/01-TEMPLATES/versori/workflows/ingestion/event-api/template-ingestion-s3-parquet-product-event.md +2836 -2836
  118. package/docs/01-TEMPLATES/versori/workflows/ingestion/event-api/template-ingestion-s3-xml-product-event.md +2395 -2395
  119. package/docs/01-TEMPLATES/versori/workflows/ingestion/event-api/template-ingestion-sftp-csv-product-event.md +2295 -2295
  120. package/docs/01-TEMPLATES/versori/workflows/ingestion/event-api/template-ingestion-sftp-json-product-event.md +2602 -2602
  121. package/docs/01-TEMPLATES/versori/workflows/ingestion/event-api/template-ingestion-sftp-parquet-product-event.md +2589 -2589
  122. package/docs/01-TEMPLATES/versori/workflows/ingestion/event-api/template-ingestion-sftp-xml-product-event.md +3578 -3578
  123. package/docs/01-TEMPLATES/versori/workflows/ingestion/graphql-mutations/graphql-mutations-guide.md +93 -93
  124. package/docs/01-TEMPLATES/versori/workflows/ingestion/graphql-mutations/template-ingestion-payload-json-order-update-graphql.md +1260 -1260
  125. package/docs/01-TEMPLATES/versori/workflows/ingestion/graphql-mutations/template-ingestion-payload-xml-order-update-graphql.md +1472 -1472
  126. package/docs/01-TEMPLATES/versori/workflows/ingestion/graphql-mutations/template-ingestion-s3-csv-control-graphql.md +2417 -2417
  127. package/docs/01-TEMPLATES/versori/workflows/ingestion/graphql-mutations/template-ingestion-s3-csv-location-graphql.md +2811 -2811
  128. package/docs/01-TEMPLATES/versori/workflows/ingestion/graphql-mutations/template-ingestion-s3-csv-price-graphql.md +2619 -2619
  129. package/docs/01-TEMPLATES/versori/workflows/ingestion/graphql-mutations/template-ingestion-s3-json-location-graphql.md +2807 -2807
  130. package/docs/01-TEMPLATES/versori/workflows/ingestion/graphql-mutations/template-ingestion-s3-xml-location-graphql.md +2373 -2373
  131. package/docs/01-TEMPLATES/versori/workflows/ingestion/graphql-mutations/template-ingestion-sftp-csv-control-graphql.md +2740 -2740
  132. package/docs/01-TEMPLATES/versori/workflows/ingestion/graphql-mutations/template-ingestion-sftp-csv-location-graphql.md +2760 -2760
  133. package/docs/01-TEMPLATES/versori/workflows/ingestion/graphql-mutations/template-ingestion-sftp-json-location-graphql.md +1710 -1710
  134. package/docs/01-TEMPLATES/versori/workflows/ingestion/ingestion-workflows-guide.md +136 -136
  135. package/docs/01-TEMPLATES/versori/workflows/rubix-webhooks/rubix-webhooks-guide.md +520 -520
  136. package/docs/01-TEMPLATES/versori/workflows/rubix-webhooks/template-webhook-rubix-fulfilment-to-sftp-xml-inline.md +1418 -1418
  137. package/docs/01-TEMPLATES/versori/workflows/rubix-webhooks/template-webhook-rubix-fulfilment-to-sftp-xml-universal-mapper.md +1785 -1785
  138. package/docs/01-TEMPLATES/versori/workflows/rubix-webhooks/template-webhook-rubix-order-attribute-update.md +824 -824
  139. package/docs/01-TEMPLATES/versori/workflows/workflows-overview-guide.md +646 -646
  140. package/docs/02-CORE-GUIDES/advanced-services/advanced-services-batch-archival.md +724 -724
  141. package/docs/02-CORE-GUIDES/advanced-services/advanced-services-job-tracker.md +627 -627
  142. package/docs/02-CORE-GUIDES/advanced-services/advanced-services-partial-batch-recovery.md +561 -561
  143. package/docs/02-CORE-GUIDES/advanced-services/advanced-services-quick-reference.md +367 -367
  144. package/docs/02-CORE-GUIDES/advanced-services/advanced-services-readme.md +407 -407
  145. package/docs/02-CORE-GUIDES/advanced-services/readme.md +49 -49
  146. package/docs/02-CORE-GUIDES/api-reference/api-reference-quick-reference.md +548 -548
  147. package/docs/02-CORE-GUIDES/api-reference/event-api-input-output-reference.md +702 -1171
  148. package/docs/02-CORE-GUIDES/api-reference/examples/client-initialization.ts +286 -286
  149. package/docs/02-CORE-GUIDES/api-reference/graphql-error-classification.md +337 -337
  150. package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-01-client-api.md +399 -482
  151. package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-03-authentication.md +199 -199
  152. package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-04-graphql-mapping.md +925 -925
  153. package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-05-services.md +1198 -1198
  154. package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-06-data-sources.md +1083 -1083
  155. package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-07-parsers.md +1097 -1097
  156. package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-08-pagination.md +513 -513
  157. package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-08-types.md +545 -597
  158. package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-09-error-handling.md +527 -527
  159. package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-09-webhook-validation.md +514 -514
  160. package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-10-extraction.md +557 -557
  161. package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-10-utilities.md +412 -412
  162. package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-11-cli-tools.md +423 -423
  163. package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-11-error-handling.md +716 -716
  164. package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-12-analyze-source-structure.md +518 -518
  165. package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-12-partial-responses.md +212 -212
  166. package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-12-testing.md +300 -300
  167. package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-13-resolver-builder.md +322 -322
  168. package/docs/02-CORE-GUIDES/api-reference/readme.md +279 -279
  169. package/docs/02-CORE-GUIDES/auto-pagination/auto-pagination-quick-reference.md +351 -351
  170. package/docs/02-CORE-GUIDES/auto-pagination/auto-pagination-readme.md +277 -277
  171. package/docs/02-CORE-GUIDES/auto-pagination/examples/auto-pagination-readme.md +178 -178
  172. package/docs/02-CORE-GUIDES/auto-pagination/examples/common-patterns.ts +351 -351
  173. package/docs/02-CORE-GUIDES/auto-pagination/examples/paginate-products.ts +384 -384
  174. package/docs/02-CORE-GUIDES/auto-pagination/examples/paginate-virtual-positions.ts +308 -308
  175. package/docs/02-CORE-GUIDES/auto-pagination/modules/auto-pagination-01-foundations.md +470 -470
  176. package/docs/02-CORE-GUIDES/auto-pagination/modules/auto-pagination-02-quick-start.md +713 -713
  177. package/docs/02-CORE-GUIDES/auto-pagination/modules/auto-pagination-03-configuration.md +754 -754
  178. package/docs/02-CORE-GUIDES/auto-pagination/modules/auto-pagination-04-advanced-patterns.md +732 -732
  179. package/docs/02-CORE-GUIDES/auto-pagination/modules/auto-pagination-05-sdk-integration.md +847 -847
  180. package/docs/02-CORE-GUIDES/auto-pagination/modules/auto-pagination-06-troubleshooting.md +359 -359
  181. package/docs/02-CORE-GUIDES/auto-pagination/modules/auto-pagination-07-api-reference.md +462 -462
  182. package/docs/02-CORE-GUIDES/auto-pagination/readme.md +54 -54
  183. package/docs/02-CORE-GUIDES/data-sources/data-sources-file-operations-error-handling.md +1487 -1487
  184. package/docs/02-CORE-GUIDES/data-sources/data-sources-quick-reference.md +836 -836
  185. package/docs/02-CORE-GUIDES/data-sources/data-sources-readme.md +276 -276
  186. package/docs/02-CORE-GUIDES/data-sources/data-sources-sftp-credential-access-security.md +553 -553
  187. package/docs/02-CORE-GUIDES/data-sources/examples/common-patterns.ts +409 -409
  188. package/docs/02-CORE-GUIDES/data-sources/examples/data-sources-readme.md +178 -178
  189. package/docs/02-CORE-GUIDES/data-sources/examples/s3-operations.ts +308 -308
  190. package/docs/02-CORE-GUIDES/data-sources/examples/sftp-operations.ts +371 -371
  191. package/docs/02-CORE-GUIDES/data-sources/modules/data-sources-01-foundations.md +735 -735
  192. package/docs/02-CORE-GUIDES/data-sources/modules/data-sources-02-s3-operations.md +1302 -1302
  193. package/docs/02-CORE-GUIDES/data-sources/modules/data-sources-03-sftp-operations.md +1379 -1379
  194. package/docs/02-CORE-GUIDES/data-sources/modules/data-sources-04-file-patterns.md +941 -941
  195. package/docs/02-CORE-GUIDES/data-sources/modules/data-sources-05-advanced-topics.md +813 -813
  196. package/docs/02-CORE-GUIDES/data-sources/modules/data-sources-06-integration-patterns.md +486 -486
  197. package/docs/02-CORE-GUIDES/data-sources/modules/data-sources-07-troubleshooting.md +387 -387
  198. package/docs/02-CORE-GUIDES/data-sources/modules/data-sources-08-api-reference.md +417 -417
  199. package/docs/02-CORE-GUIDES/data-sources/readme.md +77 -77
  200. package/docs/02-CORE-GUIDES/error-handling-guide.md +936 -936
  201. package/docs/02-CORE-GUIDES/extraction/examples/02-core-guides-extraction-readme.md +116 -116
  202. package/docs/02-CORE-GUIDES/extraction/examples/common-patterns.ts +428 -428
  203. package/docs/02-CORE-GUIDES/extraction/examples/extract-inventory-basic.ts +187 -187
  204. package/docs/02-CORE-GUIDES/extraction/extraction-quick-reference.md +596 -596
  205. package/docs/02-CORE-GUIDES/extraction/modules/02-core-guides-extraction-01-foundations.md +514 -514
  206. package/docs/02-CORE-GUIDES/extraction/modules/02-core-guides-extraction-02-basic-extraction.md +823 -823
  207. package/docs/02-CORE-GUIDES/extraction/modules/02-core-guides-extraction-03-parquet-processing.md +507 -507
  208. package/docs/02-CORE-GUIDES/extraction/modules/02-core-guides-extraction-04-data-enrichment.md +546 -546
  209. package/docs/02-CORE-GUIDES/extraction/modules/02-core-guides-extraction-05-transformation.md +494 -494
  210. package/docs/02-CORE-GUIDES/extraction/modules/02-core-guides-extraction-06-export-formats.md +458 -458
  211. package/docs/02-CORE-GUIDES/extraction/modules/02-core-guides-extraction-06-performance.md +138 -138
  212. package/docs/02-CORE-GUIDES/extraction/modules/02-core-guides-extraction-07-api-reference.md +148 -148
  213. package/docs/02-CORE-GUIDES/extraction/modules/02-core-guides-extraction-07-optimization.md +692 -692
  214. package/docs/02-CORE-GUIDES/extraction/modules/02-core-guides-extraction-08-extraction-orchestrator.md +1008 -1008
  215. package/docs/02-CORE-GUIDES/extraction/readme.md +151 -151
  216. package/docs/02-CORE-GUIDES/ingestion/examples/_simple-kv-store.ts +40 -40
  217. package/docs/02-CORE-GUIDES/ingestion/examples/error-recovery.ts +728 -728
  218. package/docs/02-CORE-GUIDES/ingestion/examples/event-driven.ts +501 -501
  219. package/docs/02-CORE-GUIDES/ingestion/examples/local-file-ingestion.ts +88 -88
  220. package/docs/02-CORE-GUIDES/ingestion/examples/parquet-ingestion.ts +117 -117
  221. package/docs/02-CORE-GUIDES/ingestion/examples/performance-optimized.ts +647 -647
  222. package/docs/02-CORE-GUIDES/ingestion/examples/s3-csv-ingestion.ts +169 -169
  223. package/docs/02-CORE-GUIDES/ingestion/examples/sftp-csv-ingestion.ts +134 -134
  224. package/docs/02-CORE-GUIDES/ingestion/ingestion-quick-reference.md +546 -546
  225. package/docs/02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-01-introduction.md +626 -626
  226. package/docs/02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-02-quick-start.md +658 -658
  227. package/docs/02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-03-data-sources.md +1052 -1052
  228. package/docs/02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-04-field-mapping.md +763 -763
  229. package/docs/02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-05-advanced-parsers.md +676 -676
  230. package/docs/02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-06-batch-api.md +1295 -1295
  231. package/docs/02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-07-api-reference.md +138 -138
  232. package/docs/02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-07-state-management.md +1037 -1037
  233. package/docs/02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-08-performance-optimization.md +1349 -1349
  234. package/docs/02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-09-best-practices.md +1893 -1893
  235. package/docs/02-CORE-GUIDES/ingestion/readme.md +160 -160
  236. package/docs/02-CORE-GUIDES/logging-guide.md +585 -585
  237. package/docs/02-CORE-GUIDES/mapping/error-handling-patterns.md +401 -401
  238. package/docs/02-CORE-GUIDES/mapping/examples/02-core-guides-mapping-readme.md +128 -128
  239. package/docs/02-CORE-GUIDES/mapping/examples/common-patterns.ts +273 -273
  240. package/docs/02-CORE-GUIDES/mapping/examples/csv-location-ingestion.json +36 -36
  241. package/docs/02-CORE-GUIDES/mapping/examples/csv-mapping.ts +242 -242
  242. package/docs/02-CORE-GUIDES/mapping/examples/graphql-to-parquet-extraction.json +36 -36
  243. package/docs/02-CORE-GUIDES/mapping/examples/json-mapping.ts +213 -213
  244. package/docs/02-CORE-GUIDES/mapping/examples/json-product-to-mutation.json +48 -48
  245. package/docs/02-CORE-GUIDES/mapping/examples/xml-mapping.ts +291 -291
  246. package/docs/02-CORE-GUIDES/mapping/examples/xml-order-to-mutation.json +45 -45
  247. package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/graphql-mutation-mapping-quick-reference.md +463 -463
  248. package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/graphql-mutation-mapping-readme.md +227 -227
  249. package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-01-introduction.md +222 -222
  250. package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-02-quick-start.md +351 -351
  251. package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-03-schema-validation.md +569 -569
  252. package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-04-mapping-patterns.md +471 -471
  253. package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-05-configuration-reference.md +611 -611
  254. package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-06-advanced-xpath.md +148 -148
  255. package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-06-path-syntax.md +464 -464
  256. package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-07-api-reference.md +94 -94
  257. package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-07-array-handling.md +307 -307
  258. package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-08-custom-resolvers.md +544 -544
  259. package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-09-advanced-patterns.md +427 -427
  260. package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-10-hooks-and-variables.md +336 -336
  261. package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-11-error-handling.md +488 -488
  262. package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-12-arguments-vs-nodes.md +383 -383
  263. package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-13-best-practices.md +477 -477
  264. package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/readme.md +62 -62
  265. package/docs/02-CORE-GUIDES/mapping/mapping-format-decision-tree.md +480 -480
  266. package/docs/02-CORE-GUIDES/mapping/mapping-graphql-alias-batching-guide.md +820 -820
  267. package/docs/02-CORE-GUIDES/mapping/mapping-javascript-objects.md +2369 -2369
  268. package/docs/02-CORE-GUIDES/mapping/mapping-mapper-comparison-guide.md +682 -682
  269. package/docs/02-CORE-GUIDES/mapping/modules/02-core-guides-mapping-07-api-reference.md +1327 -1327
  270. package/docs/02-CORE-GUIDES/mapping/modules/02-core-guides-mapping-08-error-handling.md +1142 -1142
  271. package/docs/02-CORE-GUIDES/mapping/modules/mapping-04-use-cases.md +891 -891
  272. package/docs/02-CORE-GUIDES/mapping/modules/mapping-06-helpers-resolvers.md +1126 -1126
  273. package/docs/02-CORE-GUIDES/mapping/modules/mapping-06-sdk-resolvers.md +199 -199
  274. package/docs/02-CORE-GUIDES/mapping/modules/mapping-07-api-reference.md +1319 -1319
  275. package/docs/02-CORE-GUIDES/mapping/readme.md +178 -178
  276. package/docs/02-CORE-GUIDES/mapping/resolver-registration.md +410 -410
  277. package/docs/02-CORE-GUIDES/mapping/resolvers/examples/common-patterns.ts +226 -226
  278. package/docs/02-CORE-GUIDES/mapping/resolvers/examples/custom-resolvers.ts +227 -227
  279. package/docs/02-CORE-GUIDES/mapping/resolvers/examples/sdk-resolvers-usage.ts +203 -203
  280. package/docs/02-CORE-GUIDES/mapping/resolvers/mapping-resolvers-readme.md +274 -274
  281. package/docs/02-CORE-GUIDES/mapping/resolvers/mapping-resolvers-resolver-api-reference.md +679 -679
  282. package/docs/02-CORE-GUIDES/mapping/resolvers/mapping-resolvers-resolver-cookbook.md +826 -826
  283. package/docs/02-CORE-GUIDES/mapping/resolvers/mapping-resolvers-resolver-guide.md +1330 -1330
  284. package/docs/02-CORE-GUIDES/mapping/resolvers/mapping-resolvers-resolver-helpers-reference.md +1437 -1437
  285. package/docs/02-CORE-GUIDES/mapping/resolvers/mapping-resolvers-resolver-parameters-reference.md +553 -553
  286. package/docs/02-CORE-GUIDES/mapping/resolvers/mapping-resolvers-resolver-troubleshooting.md +854 -854
  287. package/docs/02-CORE-GUIDES/mapping/resolvers/readme.md +75 -75
  288. package/docs/02-CORE-GUIDES/parsers/examples/02-core-guides-parsers-readme.md +161 -161
  289. package/docs/02-CORE-GUIDES/parsers/examples/csv-parser-examples.ts +110 -110
  290. package/docs/02-CORE-GUIDES/parsers/examples/json-parser-examples.ts +33 -33
  291. package/docs/02-CORE-GUIDES/parsers/examples/parquet-parser-examples.ts +47 -47
  292. package/docs/02-CORE-GUIDES/parsers/examples/xml-parser-examples.ts +38 -38
  293. package/docs/02-CORE-GUIDES/parsers/modules/02-core-guides-parsers-01-foundations.md +355 -355
  294. package/docs/02-CORE-GUIDES/parsers/modules/02-core-guides-parsers-02-csv-parser.md +772 -772
  295. package/docs/02-CORE-GUIDES/parsers/modules/02-core-guides-parsers-03-json-parser.md +789 -789
  296. package/docs/02-CORE-GUIDES/parsers/modules/02-core-guides-parsers-04-xml-parser.md +857 -857
  297. package/docs/02-CORE-GUIDES/parsers/modules/02-core-guides-parsers-05-parquet-parser.md +603 -603
  298. package/docs/02-CORE-GUIDES/parsers/modules/02-core-guides-parsers-06-integration-patterns.md +702 -702
  299. package/docs/02-CORE-GUIDES/parsers/modules/02-core-guides-parsers-06-streaming.md +121 -121
  300. package/docs/02-CORE-GUIDES/parsers/modules/02-core-guides-parsers-07-api-reference.md +89 -89
  301. package/docs/02-CORE-GUIDES/parsers/modules/02-core-guides-parsers-07-troubleshooting.md +727 -727
  302. package/docs/02-CORE-GUIDES/parsers/parsers-quick-reference.md +482 -482
  303. package/docs/02-CORE-GUIDES/parsers/parsers-readme.md +258 -258
  304. package/docs/02-CORE-GUIDES/parsers/readme.md +65 -65
  305. package/docs/02-CORE-GUIDES/readme.md +194 -194
  306. package/docs/02-CORE-GUIDES/webhook-validation/examples/basic-validation.ts +108 -108
  307. package/docs/02-CORE-GUIDES/webhook-validation/examples/common-patterns.ts +316 -316
  308. package/docs/02-CORE-GUIDES/webhook-validation/examples/webhook-validation-readme.md +61 -61
  309. package/docs/02-CORE-GUIDES/webhook-validation/modules/webhook-validation-01-foundations.md +440 -440
  310. package/docs/02-CORE-GUIDES/webhook-validation/modules/webhook-validation-02-quick-start.md +525 -525
  311. package/docs/02-CORE-GUIDES/webhook-validation/modules/webhook-validation-03-versori-integration.md +741 -741
  312. package/docs/02-CORE-GUIDES/webhook-validation/modules/webhook-validation-04-platform-integration.md +629 -629
  313. package/docs/02-CORE-GUIDES/webhook-validation/modules/webhook-validation-05-configuration.md +535 -535
  314. package/docs/02-CORE-GUIDES/webhook-validation/modules/webhook-validation-06-error-handling.md +611 -611
  315. package/docs/02-CORE-GUIDES/webhook-validation/modules/webhook-validation-06-troubleshooting.md +124 -124
  316. package/docs/02-CORE-GUIDES/webhook-validation/modules/webhook-validation-07-api-reference.md +511 -511
  317. package/docs/02-CORE-GUIDES/webhook-validation/modules/webhook-validation-08-rubix-webhooks.md +590 -590
  318. package/docs/02-CORE-GUIDES/webhook-validation/modules/webhook-validation-09-rubix-event-vs-http-call.md +432 -432
  319. package/docs/02-CORE-GUIDES/webhook-validation/readme.md +239 -239
  320. package/docs/02-CORE-GUIDES/webhook-validation/webhook-validation-quick-reference.md +392 -392
  321. package/docs/03-PATTERN-GUIDES/connector-scenarios/connector-scenarios-quick-reference.md +498 -498
  322. package/docs/03-PATTERN-GUIDES/connector-scenarios/connector-scenarios-readme.md +313 -313
  323. package/docs/03-PATTERN-GUIDES/connector-scenarios/examples/common-patterns.ts +612 -612
  324. package/docs/03-PATTERN-GUIDES/connector-scenarios/examples/connector-scenarios-readme.md +253 -253
  325. package/docs/03-PATTERN-GUIDES/connector-scenarios/modules/connector-scenarios-01-foundations.md +452 -452
  326. package/docs/03-PATTERN-GUIDES/connector-scenarios/modules/connector-scenarios-02-simple-scenarios.md +681 -681
  327. package/docs/03-PATTERN-GUIDES/connector-scenarios/modules/connector-scenarios-03-intermediate-scenarios.md +637 -637
  328. package/docs/03-PATTERN-GUIDES/connector-scenarios/modules/connector-scenarios-04-advanced-scenarios.md +650 -650
  329. package/docs/03-PATTERN-GUIDES/connector-scenarios/modules/connector-scenarios-05-bidirectional-sync.md +233 -233
  330. package/docs/03-PATTERN-GUIDES/connector-scenarios/modules/connector-scenarios-06-production-patterns.md +442 -442
  331. package/docs/03-PATTERN-GUIDES/connector-scenarios/modules/connector-scenarios-07-reference.md +445 -445
  332. package/docs/03-PATTERN-GUIDES/connector-scenarios/readme.md +31 -31
  333. package/docs/03-PATTERN-GUIDES/enterprise-integration-patterns.md +1528 -1528
  334. package/docs/03-PATTERN-GUIDES/error-handling/comprehensive-error-handling-guide.md +1437 -1437
  335. package/docs/03-PATTERN-GUIDES/error-handling/error-handling-quick-reference.md +390 -390
  336. package/docs/03-PATTERN-GUIDES/error-handling/examples/common-patterns.ts +438 -438
  337. package/docs/03-PATTERN-GUIDES/error-handling/modules/error-handling-01-foundations.md +362 -362
  338. package/docs/03-PATTERN-GUIDES/error-handling/modules/error-handling-02-error-types.md +850 -850
  339. package/docs/03-PATTERN-GUIDES/error-handling/modules/error-handling-03-utf8-handling.md +456 -456
  340. package/docs/03-PATTERN-GUIDES/error-handling/modules/error-handling-04-error-scenarios.md +658 -658
  341. package/docs/03-PATTERN-GUIDES/error-handling/modules/error-handling-05-calling-patterns.md +671 -671
  342. package/docs/03-PATTERN-GUIDES/error-handling/modules/error-handling-06-retry-strategies.md +1034 -1034
  343. package/docs/03-PATTERN-GUIDES/error-handling/modules/error-handling-07-monitoring.md +653 -653
  344. package/docs/03-PATTERN-GUIDES/error-handling/modules/error-handling-08-api-reference.md +847 -847
  345. package/docs/03-PATTERN-GUIDES/error-handling/readme.md +36 -36
  346. package/docs/03-PATTERN-GUIDES/examples/__tests__/readme.md +40 -40
  347. package/docs/03-PATTERN-GUIDES/examples/__tests__/resolver-examples.test.js +282 -282
  348. package/docs/03-PATTERN-GUIDES/examples/test-data/03-pattern-guides-readme.md +110 -110
  349. package/docs/03-PATTERN-GUIDES/examples/test-data/canonical-inventory.json +123 -123
  350. package/docs/03-PATTERN-GUIDES/examples/test-data/canonical-order.json +171 -171
  351. package/docs/03-PATTERN-GUIDES/examples/test-data/readme.md +28 -28
  352. package/docs/03-PATTERN-GUIDES/extraction/extraction-readme.md +15 -15
  353. package/docs/03-PATTERN-GUIDES/extraction/readme.md +25 -25
  354. package/docs/03-PATTERN-GUIDES/file-operations/examples/common-patterns.ts +407 -407
  355. package/docs/03-PATTERN-GUIDES/file-operations/examples/file-operations-readme.md +142 -142
  356. package/docs/03-PATTERN-GUIDES/file-operations/file-operations-quick-reference.md +462 -462
  357. package/docs/03-PATTERN-GUIDES/file-operations/file-operations-readme.md +379 -379
  358. package/docs/03-PATTERN-GUIDES/file-operations/modules/file-operations-01-foundations.md +430 -430
  359. package/docs/03-PATTERN-GUIDES/file-operations/modules/file-operations-02-quick-start.md +484 -484
  360. package/docs/03-PATTERN-GUIDES/file-operations/modules/file-operations-03-s3-operations.md +507 -507
  361. package/docs/03-PATTERN-GUIDES/file-operations/modules/file-operations-04-sftp-operations.md +963 -963
  362. package/docs/03-PATTERN-GUIDES/file-operations/modules/file-operations-05-streaming-performance.md +503 -503
  363. package/docs/03-PATTERN-GUIDES/file-operations/modules/file-operations-06-archive-patterns.md +386 -386
  364. package/docs/03-PATTERN-GUIDES/file-operations/modules/file-operations-06-error-handling.md +117 -117
  365. package/docs/03-PATTERN-GUIDES/file-operations/modules/file-operations-07-api-reference.md +78 -78
  366. package/docs/03-PATTERN-GUIDES/file-operations/modules/file-operations-07-testing-troubleshooting.md +567 -567
  367. package/docs/03-PATTERN-GUIDES/file-operations/modules/file-operations-08-api-reference.md +1055 -1055
  368. package/docs/03-PATTERN-GUIDES/file-operations/readme.md +32 -32
  369. package/docs/03-PATTERN-GUIDES/ingestion/ingestion-readme.md +15 -15
  370. package/docs/03-PATTERN-GUIDES/ingestion/readme.md +25 -25
  371. package/docs/03-PATTERN-GUIDES/integration-patterns/examples/batch-processing.ts +130 -130
  372. package/docs/03-PATTERN-GUIDES/integration-patterns/examples/common-patterns.ts +360 -360
  373. package/docs/03-PATTERN-GUIDES/integration-patterns/examples/delta-sync.ts +130 -130
  374. package/docs/03-PATTERN-GUIDES/integration-patterns/examples/integration-patterns-readme.md +100 -100
  375. package/docs/03-PATTERN-GUIDES/integration-patterns/examples/real-time-webhook.ts +398 -398
  376. package/docs/03-PATTERN-GUIDES/integration-patterns/integration-patterns-quick-reference.md +962 -962
  377. package/docs/03-PATTERN-GUIDES/integration-patterns/integration-patterns-readme.md +134 -134
  378. package/docs/03-PATTERN-GUIDES/integration-patterns/modules/integration-patterns-01-real-time-processing.md +991 -991
  379. package/docs/03-PATTERN-GUIDES/integration-patterns/modules/integration-patterns-02-batch-processing.md +1547 -1547
  380. package/docs/03-PATTERN-GUIDES/integration-patterns/modules/integration-patterns-03-delta-sync.md +1108 -1108
  381. package/docs/03-PATTERN-GUIDES/integration-patterns/modules/integration-patterns-04-webhook-patterns.md +1181 -1181
  382. package/docs/03-PATTERN-GUIDES/integration-patterns/modules/integration-patterns-05-error-handling.md +1061 -1061
  383. package/docs/03-PATTERN-GUIDES/integration-patterns/modules/integration-patterns-06-advanced-integration-services.md +1547 -1547
  384. package/docs/03-PATTERN-GUIDES/integration-patterns/modules/integration-patterns-06-performance.md +109 -109
  385. package/docs/03-PATTERN-GUIDES/integration-patterns/modules/integration-patterns-07-api-reference.md +34 -34
  386. package/docs/03-PATTERN-GUIDES/integration-patterns/readme.md +30 -30
  387. package/docs/03-PATTERN-GUIDES/logging-minimal-mode.md +128 -128
  388. package/docs/03-PATTERN-GUIDES/multiple-connections/examples/common-patterns.ts +380 -380
  389. package/docs/03-PATTERN-GUIDES/multiple-connections/examples/multiple-connections-readme.md +139 -139
  390. package/docs/03-PATTERN-GUIDES/multiple-connections/examples/parallel-root-connections.ts +149 -149
  391. package/docs/03-PATTERN-GUIDES/multiple-connections/examples/real-world-scenarios.ts +405 -405
  392. package/docs/03-PATTERN-GUIDES/multiple-connections/modules/multiple-connections-01-foundations.md +378 -378
  393. package/docs/03-PATTERN-GUIDES/multiple-connections/modules/multiple-connections-02-quick-start.md +566 -566
  394. package/docs/03-PATTERN-GUIDES/multiple-connections/modules/multiple-connections-03-targeting-connections.md +659 -659
  395. package/docs/03-PATTERN-GUIDES/multiple-connections/modules/multiple-connections-04-parallel-queries.md +656 -656
  396. package/docs/03-PATTERN-GUIDES/multiple-connections/modules/multiple-connections-05-best-practices.md +624 -624
  397. package/docs/03-PATTERN-GUIDES/multiple-connections/modules/multiple-connections-06-api-reference.md +824 -824
  398. package/docs/03-PATTERN-GUIDES/multiple-connections/modules/multiple-connections-06-versori.md +119 -119
  399. package/docs/03-PATTERN-GUIDES/multiple-connections/modules/multiple-connections-07-api-reference.md +87 -87
  400. package/docs/03-PATTERN-GUIDES/multiple-connections/multiple-connections-quick-reference.md +353 -353
  401. package/docs/03-PATTERN-GUIDES/multiple-connections/multiple-connections-readme.md +270 -270
  402. package/docs/03-PATTERN-GUIDES/multiple-connections/readme.md +30 -30
  403. package/docs/03-PATTERN-GUIDES/pagination/pagination-readme.md +14 -14
  404. package/docs/03-PATTERN-GUIDES/pagination/readme.md +24 -24
  405. package/docs/03-PATTERN-GUIDES/parquet/examples/common-patterns.ts +180 -180
  406. package/docs/03-PATTERN-GUIDES/parquet/examples/read-parquet.ts +48 -48
  407. package/docs/03-PATTERN-GUIDES/parquet/examples/write-parquet.ts +65 -65
  408. package/docs/03-PATTERN-GUIDES/parquet/modules/03-pattern-guides-parquet-01-introduction.md +393 -393
  409. package/docs/03-PATTERN-GUIDES/parquet/modules/03-pattern-guides-parquet-02-quick-start.md +572 -572
  410. package/docs/03-PATTERN-GUIDES/parquet/modules/03-pattern-guides-parquet-03-reading-parquet.md +525 -525
  411. package/docs/03-PATTERN-GUIDES/parquet/modules/03-pattern-guides-parquet-04-writing-parquet.md +554 -554
  412. package/docs/03-PATTERN-GUIDES/parquet/modules/03-pattern-guides-parquet-05-graphql-extraction.md +405 -405
  413. package/docs/03-PATTERN-GUIDES/parquet/modules/03-pattern-guides-parquet-06-performance.md +104 -104
  414. package/docs/03-PATTERN-GUIDES/parquet/modules/03-pattern-guides-parquet-06-s3-integration.md +511 -511
  415. package/docs/03-PATTERN-GUIDES/parquet/modules/03-pattern-guides-parquet-07-api-reference.md +90 -90
  416. package/docs/03-PATTERN-GUIDES/parquet/modules/03-pattern-guides-parquet-07-performance-optimization.md +525 -525
  417. package/docs/03-PATTERN-GUIDES/parquet/modules/03-pattern-guides-parquet-08-best-practices.md +712 -712
  418. package/docs/03-PATTERN-GUIDES/parquet/parquet-quick-reference.md +683 -683
  419. package/docs/03-PATTERN-GUIDES/parquet/parquet-readme.md +248 -248
  420. package/docs/03-PATTERN-GUIDES/parquet/readme.md +32 -32
  421. package/docs/03-PATTERN-GUIDES/parsers/parsers-readme.md +12 -12
  422. package/docs/03-PATTERN-GUIDES/parsers/readme.md +24 -24
  423. package/docs/03-PATTERN-GUIDES/readme.md +159 -159
  424. package/docs/03-PATTERN-GUIDES/webhooks/readme.md +24 -24
  425. package/docs/03-PATTERN-GUIDES/webhooks/webhooks-readme.md +8 -8
  426. package/docs/04-REFERENCE/architecture/architecture-01-overview.md +427 -427
  427. package/docs/04-REFERENCE/architecture/architecture-02-client-architecture.md +424 -424
  428. package/docs/04-REFERENCE/architecture/architecture-03-data-flow.md +690 -690
  429. package/docs/04-REFERENCE/architecture/architecture-04-service-layer.md +834 -834
  430. package/docs/04-REFERENCE/architecture/architecture-05-integration-architecture.md +655 -655
  431. package/docs/04-REFERENCE/architecture/architecture-06-state-management.md +653 -653
  432. package/docs/04-REFERENCE/architecture/architecture-adding-new-data-sources.md +686 -686
  433. package/docs/04-REFERENCE/architecture/readme.md +279 -279
  434. package/docs/04-REFERENCE/platforms/deno/readme.md +117 -117
  435. package/docs/04-REFERENCE/platforms/nodejs/readme.md +146 -146
  436. package/docs/04-REFERENCE/platforms/readme.md +135 -135
  437. package/docs/04-REFERENCE/platforms/versori/modules/platforms-versori-01-introduction.md +398 -398
  438. package/docs/04-REFERENCE/platforms/versori/modules/platforms-versori-02-quick-start.md +560 -560
  439. package/docs/04-REFERENCE/platforms/versori/modules/platforms-versori-03-authentication.md +757 -757
  440. package/docs/04-REFERENCE/platforms/versori/modules/platforms-versori-04-workflows.md +2476 -2476
  441. package/docs/04-REFERENCE/platforms/versori/modules/platforms-versori-05-connections.md +1167 -1167
  442. package/docs/04-REFERENCE/platforms/versori/modules/platforms-versori-06-kv-storage.md +990 -990
  443. package/docs/04-REFERENCE/platforms/versori/modules/platforms-versori-06-state-management.md +121 -121
  444. package/docs/04-REFERENCE/platforms/versori/modules/platforms-versori-07-api-reference.md +68 -68
  445. package/docs/04-REFERENCE/platforms/versori/modules/platforms-versori-07-deployment.md +731 -731
  446. package/docs/04-REFERENCE/platforms/versori/modules/platforms-versori-08-best-practices.md +1111 -1111
  447. package/docs/04-REFERENCE/platforms/versori/modules/platforms-versori-09-signature-reference.md +766 -766
  448. package/docs/04-REFERENCE/platforms/versori/platforms-versori-readme.md +299 -299
  449. package/docs/04-REFERENCE/platforms/versori/platforms-versori-s3-sftp-configuration-guide.md +1425 -1425
  450. package/docs/04-REFERENCE/platforms/versori/platforms-versori-webhook-api-key-security.md +816 -816
  451. package/docs/04-REFERENCE/platforms/versori/platforms-versori-webhook-connection-security.md +681 -681
  452. package/docs/04-REFERENCE/platforms/versori/platforms-versori-workflow-task-types.md +708 -708
  453. package/docs/04-REFERENCE/platforms/versori/readme.md +108 -108
  454. package/docs/04-REFERENCE/readme.md +148 -148
  455. package/docs/04-REFERENCE/resolver-signature/examples/advanced-resolvers.ts +482 -482
  456. package/docs/04-REFERENCE/resolver-signature/examples/async-resolvers.ts +496 -496
  457. package/docs/04-REFERENCE/resolver-signature/examples/basic-resolvers.ts +343 -343
  458. package/docs/04-REFERENCE/resolver-signature/examples/resolver-signature-readme.md +188 -188
  459. package/docs/04-REFERENCE/resolver-signature/examples/testing-resolvers.ts +463 -463
  460. package/docs/04-REFERENCE/resolver-signature/modules/resolver-signature-01-foundations.md +286 -286
  461. package/docs/04-REFERENCE/resolver-signature/modules/resolver-signature-02-parameter-reference.md +643 -643
  462. package/docs/04-REFERENCE/resolver-signature/modules/resolver-signature-03-basic-examples.md +521 -521
  463. package/docs/04-REFERENCE/resolver-signature/modules/resolver-signature-04-advanced-patterns.md +739 -739
  464. package/docs/04-REFERENCE/resolver-signature/modules/resolver-signature-05-sdk-resolvers.md +531 -531
  465. package/docs/04-REFERENCE/resolver-signature/modules/resolver-signature-06-migration-guide.md +650 -650
  466. package/docs/04-REFERENCE/resolver-signature/modules/resolver-signature-06-testing.md +125 -125
  467. package/docs/04-REFERENCE/resolver-signature/modules/resolver-signature-07-api-reference.md +794 -794
  468. package/docs/04-REFERENCE/resolver-signature/readme.md +64 -64
  469. package/docs/04-REFERENCE/resolver-signature/resolver-signature-quick-reference.md +270 -270
  470. package/docs/04-REFERENCE/resolver-signature/resolver-signature-readme.md +351 -351
  471. package/docs/04-REFERENCE/schema/fluent-commerce-schema.json +764 -764
  472. package/docs/04-REFERENCE/schema/readme.md +141 -141
  473. package/docs/04-REFERENCE/testing/examples/04-reference-testing-readme.md +158 -158
  474. package/docs/04-REFERENCE/testing/examples/fluent-testing.ts +62 -62
  475. package/docs/04-REFERENCE/testing/examples/health-check.ts +155 -155
  476. package/docs/04-REFERENCE/testing/examples/integration-test.ts +119 -119
  477. package/docs/04-REFERENCE/testing/examples/performance-test.ts +183 -183
  478. package/docs/04-REFERENCE/testing/examples/s3-testing.ts +127 -127
  479. package/docs/04-REFERENCE/testing/modules/04-reference-testing-01-foundations.md +267 -267
  480. package/docs/04-REFERENCE/testing/modules/04-reference-testing-02-s3-testing.md +599 -599
  481. package/docs/04-REFERENCE/testing/modules/04-reference-testing-03-fluent-testing.md +589 -589
  482. package/docs/04-REFERENCE/testing/modules/04-reference-testing-04-integration-testing.md +699 -699
  483. package/docs/04-REFERENCE/testing/modules/04-reference-testing-05-debugging.md +478 -478
  484. package/docs/04-REFERENCE/testing/modules/04-reference-testing-06-cicd-integration.md +463 -463
  485. package/docs/04-REFERENCE/testing/modules/04-reference-testing-06-preflight-validation.md +131 -131
  486. package/docs/04-REFERENCE/testing/modules/04-reference-testing-07-best-practices.md +499 -499
  487. package/docs/04-REFERENCE/testing/modules/04-reference-testing-07-coverage-ci.md +165 -165
  488. package/docs/04-REFERENCE/testing/modules/04-reference-testing-08-api-reference.md +634 -634
  489. package/docs/04-REFERENCE/testing/readme.md +86 -86
  490. package/docs/04-REFERENCE/testing/testing-quick-reference.md +667 -667
  491. package/docs/04-REFERENCE/testing/testing-readme.md +286 -286
  492. package/docs/04-REFERENCE/troubleshooting/readme.md +144 -144
  493. package/docs/04-REFERENCE/troubleshooting/troubleshooting-deno-sftp-compatibility.md +392 -392
  494. package/docs/template-loading-matrix.md +242 -242
  495. package/package.json +5 -3
@@ -1,1430 +1,1430 @@
# Pattern: Large File Processing & Chunking

**FC Connect SDK Use Case Guide**

> **SDK**: [@fluentcommerce/fc-connect-sdk](https://www.npmjs.com/package/@fluentcommerce/fc-connect-sdk)
> **Version**: Use latest - `npm install @fluentcommerce/fc-connect-sdk@latest`

**Context**: Enterprise-scale file ingestion with streaming, splitting, and parallel processing

**Type**: Advanced Pattern

**Complexity**: High

**Volume**: 500MB-5GB files, 1M-10M records

**Latency**: Batch processing (30-60 min for 10M records)

**Pattern**: Streaming + chunking + parallel Batch API

## When to Use This Pattern

Use this pattern when dealing with:

- **Large CSV files** (>500MB, >1M records)
- **Memory-constrained environments** (Lambda, containers with limited RAM)
- **Time-sensitive ingestion** (need parallel processing for speed)
- **Reliability requirements** (checkpoint/resume on failure)
- **Progress tracking** (real-time status updates)

**Volume Guidance:**

- **Small** (<1K records): Use basic ingestion pattern
- **Medium** (1K-100K records): Use streaming pattern (Pattern 1)
- **Large** (100K-1M records): Use file chunking pattern (Pattern 2)
- **Huge** (1M-10M records): Use parallel processing pattern (Pattern 3)
- **Enterprise** (10M+ records): Use distributed processing pattern (Pattern 4)

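Expressed as code, the guidance above is just a set of record-count thresholds. A minimal sketch (illustrative only; `choosePattern` and the pattern names are placeholders for this guide's patterns, not SDK APIs):

```typescript
// Illustrative threshold selector for the volume guidance above.
type IngestionPattern = 'basic' | 'streaming' | 'chunking' | 'parallel' | 'distributed';

function choosePattern(recordCount: number): IngestionPattern {
  if (recordCount < 1_000) return 'basic';          // Small
  if (recordCount <= 100_000) return 'streaming';   // Medium -> Pattern 1
  if (recordCount <= 1_000_000) return 'chunking';  // Large -> Pattern 2
  if (recordCount <= 10_000_000) return 'parallel'; // Huge -> Pattern 3
  return 'distributed';                             // Enterprise -> Pattern 4
}
```
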
## Problem Statement

### Why Splitting is Needed

**Memory Constraints:**

```typescript
// ❌ WRONG - Loads entire 2GB file into memory
const csvContent = await fs.readFile('huge-inventory.csv', 'utf-8');
const records = await csvParser.parse(csvContent); // 💥 Out of memory
```

**Impact:**

- Lambda 512MB: Crashes on 500MB+ files
- Container 1GB: Struggles with 1GB+ files
- Node.js default heap (4GB): Fails on 5GB+ files

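For contrast, a line-by-line read keeps memory roughly constant regardless of file size. A minimal sketch using only built-in Node APIs (`node:fs` and `node:readline`; no SDK involved):

```typescript
import { createReadStream } from 'node:fs';
import { createInterface } from 'node:readline';

// Stream the file one line at a time: memory stays flat however large it is.
async function countCsvRows(path: string): Promise<number> {
  const rl = createInterface({
    input: createReadStream(path),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });
  let rows = 0;
  for await (const line of rl) {
    if (line.trim().length > 0) rows++;
  }
  return rows;
}
```
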
**Time Constraints:**

```typescript
// ❌ WRONG - Sequential processing takes 90+ minutes
for (const record of records) {
  await processRecord(record); // Too slow for 10M records
}
```

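The fix, developed fully in Pattern 3, is bounded concurrency. A minimal sketch using plain `Promise.all` (the worker function and the concurrency value are placeholders):

```typescript
// Process records in fixed-size concurrent waves instead of one at a time.
async function processConcurrently<T>(
  records: T[],
  worker: (record: T) => Promise<void>,
  concurrency = 25
): Promise<void> {
  for (let i = 0; i < records.length; i += concurrency) {
    await Promise.all(records.slice(i, i + concurrency).map(worker));
  }
}
```
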
**Reliability Requirements:**

```typescript
// ❌ WRONG - Network failure loses all progress
await processAllRecords(records); // If fails at record 5M, restart from 0
```

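The fix, developed fully in Pattern 2, is checkpointing. In miniature (the `Checkpoint` interface is a stand-in for VersoriKV, the key name is arbitrary, and `processRecord` is the same placeholder as above):

```typescript
declare function processRecord(record: any): Promise<void>; // placeholder from the snippet above

interface Checkpoint {
  get(key: string): Promise<number | undefined>;
  set(key: string, value: number): Promise<void>;
}

// Persist the last completed index so a retry resumes instead of restarting.
async function processWithResume(records: any[], kv: Checkpoint): Promise<void> {
  const start = (await kv.get('lastProcessed')) ?? 0;
  for (let i = start; i < records.length; i++) {
    await processRecord(records[i]);
    if (i % 1_000 === 0) await kv.set('lastProcessed', i);
  }
}
```
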
### Solution Overview

This guide demonstrates 4 progressive patterns:

1. **Basic Streaming** (~200 lines) - Process records as they arrive, memory-efficient
2. **File Chunking** (~300 lines) - Split large files into manageable chunks
3. **Parallel Processing** (~400 lines) - Process chunks concurrently with progress tracking
4. **Distributed Processing** (~300 lines) - Use Versori scheduled workflows for enterprise scale

## SDK Methods Used

```typescript
import {
  createClient,        // Client factory (auto-detects context)
  CSVParserService,    // Streaming CSV parser
  S3DataSource,        // S3 file operations
  UniversalMapper,     // Field mapping
  StateService,        // Progress tracking
  VersoriKVAdapter,    // Versori state management
  createConsoleLogger, // Structured logging
  toStructuredLogger   // Structured logging
} from '@fluentcommerce/fc-connect-sdk';
```

---

## Pattern 1: Basic Streaming (Memory-Efficient)

**Best for:** 100K-1M records, single-threaded processing, memory-constrained environments

**Memory Usage:**

- ❌ Without streaming: 2GB file = 2GB+ RAM (file + parsed objects)
- ✅ With streaming: 2GB file = ~50MB RAM (processes records incrementally)

### Implementation

```typescript
import {
  createClient,
  CSVParserService,
  S3DataSource,
  UniversalMapper,
  createConsoleLogger,
  toStructuredLogger
} from '@fluentcommerce/fc-connect-sdk';

const logger = createConsoleLogger();

async function streamingIngestion(ctx: any) {
  logger.info('Starting streaming ingestion');

  // Create client (auto-detects Versori context)
  const client = await createClient(ctx);

  // Initialize S3 data source
  const s3 = new S3DataSource(
    {
      type: 'S3_CSV',
      connectionId: 'my-s3',
      name: 'Inventory Files S3',
      s3Config: {
        bucket: 'inventory-files',
        region: 'us-east-1',
        accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
        secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
      },
    },
    logger
  );

  // Define field mapping
  const mapper = new UniversalMapper({
    fields: {
      skuRef: { source: 'sku', required: true },
      locationRef: { source: 'location_code', required: true },
      qty: { source: 'quantity', resolver: 'sdk.parseInt' },
      expectedOn: { source: 'expected_date', resolver: 'sdk.formatDate' },
    },
  });

  // Create CSV parser with streaming enabled
  const csvParser = new CSVParserService();

  // Download the file contents (the streaming parser below consumes them incrementally)
  logger.info('Downloading file from S3', {
    key: 'inventory/large-file.csv',
  });

  const fileContent = (await s3.downloadFile('inventory/large-file.csv', {
    encoding: 'utf8',
  })) as string;

  // Create job for batch ingestion
  const job = await client.createJob({
    name: 'streaming-inventory-ingestion',
    retailerId: client.getRetailerId() || ctx.connections?.fluent_commerce?.retailerId,
  });

  logger.info('Job created', { jobId: job.id });

  // Statistics tracking
  let recordsProcessed = 0;
  let batchCount = 0;
  let errors = 0;
  const BATCH_SIZE = 1000;
  let currentBatch: any[] = [];

  // Stream records with batching (memory-efficient)
  // Records are parsed incrementally, not all at once
  for await (const record of csvParser.parseStreaming(fileContent, {}, 1)) {
    try {
      // Map record
      const mapped = await mapper.map(record);

      if (mapped.success && mapped.data) {
        currentBatch.push(mapped.data);
        recordsProcessed++;

        // Send batch when full
        if (currentBatch.length >= BATCH_SIZE) {
          await client.sendBatch(job.id, {
            entities: currentBatch,
          });

          batchCount++;

          logger.info('Batch sent', {
            batchNumber: batchCount,
            recordsProcessed,
            currentBatchSize: currentBatch.length,
          });

          currentBatch = []; // Clear batch (frees memory)
        }
      } else {
        errors++;
        logger.warn('Record mapping failed', {
          record,
          errors: mapped.errors,
        });
      }
    } catch (error) {
      errors++;
      logger.error('Record processing failed', error as Error, { record });
    }

    // Progress logging every 10K records
    if (recordsProcessed % 10000 === 0) {
      logger.info('Progress update', {
        recordsProcessed,
        batchesSent: batchCount,
        errors,
        memoryUsage: process.memoryUsage().heapUsed / 1024 / 1024 + ' MB',
      });
    }
  }

  // Send remaining records
  if (currentBatch.length > 0) {
    await client.sendBatch(job.id, {
      entities: currentBatch,
    });
    batchCount++;
  }

  logger.info('Streaming ingestion complete', {
    totalRecords: recordsProcessed,
    batchesSent: batchCount,
    errors,
    jobId: job.id,
  });

  return {
    success: true,
    jobId: job.id,
    recordsProcessed,
    batchesSent: batchCount,
    errors,
  };
}
```

**Memory Profile:**

```
File Size: 2GB (5M records)
RAM Usage: ~50MB peak (1000 record batches)
Processing Time: ~45 minutes (sequential)
```

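To verify a profile like this against your own runs, heap usage can be sampled with Node's standard `process.memoryUsage()` (a stand-alone helper, not an SDK API):

```typescript
// Periodically sample heap usage while an ingestion run is in flight.
function startHeapSampler(intervalMs = 10_000): () => void {
  const timer = setInterval(() => {
    const mb = process.memoryUsage().heapUsed / 1024 / 1024;
    console.log(`heapUsed: ${mb.toFixed(1)} MB`);
  }, intervalMs);
  timer.unref(); // don't keep the process alive just for sampling
  return () => clearInterval(timer);
}
```
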
---

## Pattern 2: File Chunking (Split & Track)

**Best for:** 1M-5M records, need checkpoint/resume, want progress visibility

**Strategy:**

1. Split large file into 100K record chunks
2. Write chunks to temp S3 locations
3. Track chunk metadata in VersoriKV
4. Process chunks sequentially (can resume on failure)

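Before the full implementation, here is the chunk-boundary arithmetic in isolation. A sketch that mirrors the `ChunkMetadata` shape used below (illustrative only; it is independent of the `splitFileIntoChunks` helper the implementation calls):

```typescript
// Illustrative only: derive chunk boundaries from a record count and chunk
// size. Field names mirror ChunkMetadata; s3Key/status handling is omitted.
function planChunks(totalRecords: number, chunkSize: number) {
  const chunks: Array<{ chunkId: string; startRecord: number; endRecord: number; recordCount: number }> = [];
  for (let start = 0; start < totalRecords; start += chunkSize) {
    const end = Math.min(start + chunkSize, totalRecords) - 1;
    chunks.push({
      chunkId: `chunk-${String(chunks.length).padStart(4, '0')}`,
      startRecord: start,
      endRecord: end,
      recordCount: end - start + 1,
    });
  }
  return chunks;
}

// e.g. planChunks(5_000_000, 100_000) -> 50 chunks of 100K records each
```
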
283
- ### Implementation
284
-
285
- ```typescript
286
- import {
287
- createClient,
288
- CSVParserService,
289
- S3DataSource,
290
- UniversalMapper,
291
- StateService,
292
- VersoriKVAdapter,
293
- createConsoleLogger,
294
- toStructuredLogger
295
- } from '@fluentcommerce/fc-connect-sdk';
296
-
297
- const logger = createConsoleLogger();
298
-
299
- interface ChunkMetadata {
300
- chunkId: string;
301
- startRecord: number;
302
- endRecord: number;
303
- s3Key: string;
304
- recordCount: number;
305
- status: 'pending' | 'processing' | 'completed' | 'failed';
306
- processedAt?: string;
307
- error?: string;
308
- }
309
-
310
- async function chunkedIngestion(ctx: any) {
311
- logger.info('Starting chunked ingestion');
312
-
313
- // Initialize services
314
- const client = await createClient(ctx);
315
-
316
- const s3 = new S3DataSource(
317
- {
318
- type: 'S3_CSV',
319
- connectionId: 'my-s3-chunked',
320
- name: 'Inventory Files S3 Chunked',
321
- s3Config: {
322
- bucket: 'inventory-files',
323
- region: 'us-east-1',
324
- accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
325
- secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
326
- },
327
- },
328
- logger
329
- );
330
-
331
- // Initialize state management
332
- const kv = ctx.openKv();
333
- const kvAdapter = new VersoriKVAdapter(kv);
334
- const stateService = new StateService(logger);
335
-
336
- const SOURCE_FILE = 'inventory/huge-inventory.csv';
337
- const CHUNK_SIZE = 100000; // 100K records per chunk
338
- const workflowId = 'chunked-ingestion';
339
-
340
- // STEP 1: Check if chunking is already in progress
341
- const existingState = await stateService.getSyncState(kvAdapter, workflowId);
342
-
343
- if (existingState.isInitialized && existingState.lastSyncResult === 'partial') {
344
- logger.info('Resuming from previous run', {
345
- lastProcessedFile: existingState.lastProcessedFile,
346
- lastProcessedCount: existingState.lastProcessedCount,
347
- });
348
- }
349
-
350
- // STEP 2: Split file into chunks
351
- logger.info('Splitting file into chunks', {
352
- sourceFile: SOURCE_FILE,
353
- chunkSize: CHUNK_SIZE,
354
- });
355
-
356
- const chunks = await splitFileIntoChunks(s3, SOURCE_FILE, CHUNK_SIZE, workflowId, kvAdapter);
357
-
358
- logger.info('File split complete', {
359
- totalChunks: chunks.length,
360
- totalRecords: chunks.reduce((sum, c) => sum + c.recordCount, 0),
361
- });
362
-
363
- // STEP 3: Create job for ingestion
364
- const job = await client.createJob({
365
- name: `chunked-inventory-ingestion-${Date.now()}`,
366
- retailerId: client.getRetailerId() || ctx.connections?.fluent_commerce?.retailerId,
367
- });
368
-
369
- logger.info('Job created', { jobId: job.id });
370
-
371
- // STEP 4: Process each chunk sequentially
372
- let successCount = 0;
373
- let failureCount = 0;
374
-
375
- for (const chunk of chunks) {
376
- try {
377
- // Skip if already processed
378
- const chunkState = await kvAdapter.get(['chunk', workflowId, chunk.chunkId, 'status']);
379
-
380
- if (chunkState?.value === 'completed') {
381
- logger.info('Chunk already processed, skipping', {
382
- chunkId: chunk.chunkId,
383
- });
384
- successCount++;
385
- continue;
386
- }
387
-
388
- // Mark chunk as processing
389
- await kvAdapter.set(['chunk', workflowId, chunk.chunkId, 'status'], 'processing');
390
-
391
- logger.info('Processing chunk', {
392
- chunkId: chunk.chunkId,
393
- recordCount: chunk.recordCount,
394
- progress: `${successCount + failureCount}/${chunks.length}`,
395
- });
396
-
397
- // Process chunk
398
- await processChunk(s3, client, job.id, chunk);
399
-
400
- // Mark chunk as completed
401
- await kvAdapter.set(['chunk', workflowId, chunk.chunkId], {
402
- ...chunk,
403
- status: 'completed',
404
- processedAt: new Date().toISOString(),
405
- } as ChunkMetadata);
406
-
407
- successCount++;
408
-
409
- logger.info('Chunk completed', {
410
- chunkId: chunk.chunkId,
411
- successCount,
412
- failureCount,
413
- percentComplete: (((successCount + failureCount) / chunks.length) * 100).toFixed(1),
414
- });
415
- } catch (error) {
416
- failureCount++;
417
- logger.error('Chunk processing failed', error as Error, {
418
- chunkId: chunk.chunkId,
419
- });
420
-
421
- // Mark chunk as failed
422
- await kvAdapter.set(['chunk', workflowId, chunk.chunkId], {
423
- ...chunk,
424
- status: 'failed',
425
- error: (error as Error).message,
426
- } as ChunkMetadata);
427
- }
428
- }
429
-
430
- // STEP 5: Update final state
431
- await stateService.updateSyncState(
432
- kvAdapter,
433
- [
434
- {
435
- fileName: SOURCE_FILE,
436
- lastModified: new Date().toISOString(),
437
- recordCount: chunks.reduce((sum, c) => sum + c.recordCount, 0),
438
- },
439
- ],
440
- workflowId
441
- );
442
-
443
- logger.info('Chunked ingestion complete', {
444
- totalChunks: chunks.length,
445
- successCount,
446
- failureCount,
447
- jobId: job.id,
448
- });
449
-
450
- return {
451
- success: failureCount === 0,
452
- jobId: job.id,
453
- chunksProcessed: successCount,
454
- chunksFailed: failureCount,
455
- totalChunks: chunks.length,
456
- };
457
- }
458
-
459
- /**
460
- * Split file into chunks and upload to S3
461
- */
462
- async function splitFileIntoChunks(
463
- s3: S3DataSource,
464
- sourceKey: string,
465
- chunkSize: number,
466
- workflowId: string,
467
- kv: VersoriKVAdapter
468
- ): Promise<ChunkMetadata[]> {
469
- const csvParser = new CSVParserService();
470
- const chunks: ChunkMetadata[] = [];
471
-
472
- // Download source file
473
- const fileContent = (await s3.downloadFile(sourceKey, {
474
- encoding: 'utf8',
475
- })) as string;
476
-
477
- let currentChunk: any[] = [];
478
- let chunkNumber = 0;
479
- let recordNumber = 0;
480
-
481
- // Stream through file and create chunks
482
- for await (const record of csvParser.parseStreaming(fileContent, {}, 1)) {
483
- currentChunk.push(record);
484
- recordNumber++;
485
-
486
- // Create chunk when size reached
487
- if (currentChunk.length >= chunkSize) {
488
- const chunkId = `chunk-${chunkNumber.toString().padStart(5, '0')}`;
489
- const chunkKey = `temp/${workflowId}/${chunkId}.csv`;
490
-
491
- // Convert chunk to CSV
492
- const chunkCSV = csvParser.stringify(currentChunk, { headers: true });
493
-
494
- // Upload chunk to S3
495
- await s3.uploadFile(chunkKey, chunkCSV, {
496
- contentType: 'text/csv',
497
- });
498
-
499
- // Create chunk metadata
500
- const metadata: ChunkMetadata = {
501
- chunkId,
502
- startRecord: recordNumber - currentChunk.length,
503
- endRecord: recordNumber - 1,
504
- s3Key: chunkKey,
505
- recordCount: currentChunk.length,
506
- status: 'pending',
507
- };
508
-
509
- chunks.push(metadata);
510
-
511
- // Store chunk metadata in KV
512
- await kv.set(['chunk', workflowId, chunkId], metadata);
513
-
514
- logger.info('Chunk created', {
515
- chunkId,
516
- recordCount: currentChunk.length,
517
- s3Key: chunkKey,
518
- });
519
-
520
- // Clear chunk (free memory)
521
- currentChunk = [];
522
- chunkNumber++;
523
- }
524
- }
525
-
526
- // Handle remaining records
527
- if (currentChunk.length > 0) {
528
- const chunkId = `chunk-${chunkNumber.toString().padStart(5, '0')}`;
529
- const chunkKey = `temp/${workflowId}/${chunkId}.csv`;
530
-
531
- const chunkCSV = csvParser.stringify(currentChunk, { headers: true });
532
- await s3.uploadFile(chunkKey, chunkCSV, { contentType: 'text/csv' });
533
-
534
- const metadata: ChunkMetadata = {
535
- chunkId,
536
- startRecord: recordNumber - currentChunk.length,
537
- endRecord: recordNumber - 1,
538
- s3Key: chunkKey,
539
- recordCount: currentChunk.length,
540
- status: 'pending',
541
- };
542
-
543
- chunks.push(metadata);
544
- await kv.set(['chunk', workflowId, chunkId], metadata);
545
- }
546
-
547
- return chunks;
548
- }
549
-
550
- /**
551
- * Process a single chunk
552
- */
553
- async function processChunk(
554
- s3: S3DataSource,
555
- client: any,
556
- jobId: string,
557
- chunk: ChunkMetadata
558
- ): Promise<void> {
559
- const csvParser = new CSVParserService();
560
- const mapper = new UniversalMapper({
561
- fields: {
562
- skuRef: { source: 'sku', required: true },
563
- locationRef: { source: 'location_code', required: true },
564
- qty: { source: 'quantity', resolver: 'sdk.parseInt' },
565
- expectedOn: { source: 'expected_date', resolver: 'sdk.formatDate' },
566
- },
567
- });
568
-
569
- // Download chunk
570
- const chunkContent = (await s3.downloadFile(chunk.s3Key, {
571
- encoding: 'utf8',
572
- })) as string;
573
-
574
- // Parse chunk
575
- const records = await csvParser.parse(chunkContent);
576
-
577
- // Map records
578
- const entities: any[] = [];
579
- for (const record of records) {
580
- const mapped = await mapper.map(record);
581
- if (mapped.success && mapped.data) {
582
- entities.push(mapped.data);
583
- }
584
- }
585
-
586
- // Send batch
587
- await client.sendBatch(jobId, { entities });
588
-
589
- logger.info('Chunk batch sent', {
590
- chunkId: chunk.chunkId,
591
- entityCount: entities.length,
592
- });
593
- }
594
- ```
595
-
596
- **VersoriKV Schema:**
597
-
598
- ```typescript
599
- // Chunk metadata
600
- ['chunk', workflowId, chunkId] => ChunkMetadata
601
-
602
- // Chunk status
603
- ['chunk', workflowId, chunkId, 'status'] => 'pending' | 'processing' | 'completed' | 'failed'
604
-
605
- // Workflow state
606
- ['state', workflowId, 'sync'] => SyncState
607
- ```
608
-
609
- **Performance:**
610
-
611
- ```
612
- File Size: 5GB (10M records)
613
- Chunk Size: 100K records
614
- Total Chunks: 100
615
- Processing Time: ~60 minutes (sequential)
616
- RAM Usage: ~100MB (processes one chunk at a time)
617
- ```
618
-
619
- ---
620
-
621
- ## Pattern 3: Parallel Processing (High Performance)
622
-
623
- **Best for:** 5M-10M records, time-sensitive ingestion, need speed with reliability
624
-
625
- **Strategy:**
626
-
627
- 1. Split file into chunks (same as Pattern 2)
628
- 2. Spawn 5 parallel Batch API jobs
629
- 3. Process chunks concurrently
630
- 4. Track progress in VersoriKV
631
- 5. Resume on failure
632
-
633
- ### Implementation
634
-
635
- ```typescript
636
- import {
637
- createClient,
638
- CSVParserService,
639
- S3DataSource,
640
- UniversalMapper,
641
- StateService,
642
- VersoriKVAdapter,
643
- createConsoleLogger,
644
- toStructuredLogger
645
- } from '@fluentcommerce/fc-connect-sdk';
646
-
647
- const logger = createConsoleLogger();
648
-
649
- interface ParallelJob {
650
- jobId: string;
651
- assignedChunks: string[];
652
- status: 'pending' | 'processing' | 'completed' | 'failed';
653
- recordsProcessed: number;
654
- startedAt?: string;
655
- completedAt?: string;
656
- }
657
-
658
- async function parallelIngestion(ctx: any) {
659
- logger.info('Starting parallel ingestion');
660
-
661
- // Initialize services
662
- const client = await createClient(ctx);
663
-
664
- const s3 = new S3DataSource(
665
- {
666
- type: 'S3_CSV',
667
- connectionId: 'my-s3-parallel',
668
- name: 'Inventory Files S3 Parallel',
669
- s3Config: {
670
- bucket: 'inventory-files',
671
- region: 'us-east-1',
672
- accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
673
- secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
674
- },
675
- },
676
- logger
677
- );
678
-
679
- const kv = ctx.openKv();
680
- const kvAdapter = new VersoriKVAdapter(kv);
681
- const stateService = new StateService(logger);
682
-
683
- const SOURCE_FILE = 'inventory/huge-inventory.csv';
684
- const CHUNK_SIZE = 100000; // 100K records per chunk
685
- const PARALLEL_JOBS = 5; // Process 5 chunks concurrently
686
- const workflowId = 'parallel-ingestion';
687
-
688
- // STEP 1: Split file into chunks (reuse from Pattern 2)
689
- const chunks = await splitFileIntoChunks(s3, SOURCE_FILE, CHUNK_SIZE, workflowId, kvAdapter);
690
-
691
- logger.info('File split complete', {
692
- totalChunks: chunks.length,
693
- totalRecords: chunks.reduce((sum, c) => sum + c.recordCount, 0),
694
- });
695
-
696
- // STEP 2: Create multiple jobs for parallel processing
697
- const jobs: ParallelJob[] = [];
698
-
699
- for (let i = 0; i < PARALLEL_JOBS; i++) {
700
- const job = await client.createJob({
701
- name: `parallel-inventory-ingestion-job-${i + 1}`,
702
- retailerId: client.getRetailerId() || ctx.connections?.fluent_commerce?.retailerId,
703
- });
704
-
705
- jobs.push({
706
- jobId: job.id,
707
- assignedChunks: [],
708
- status: 'pending',
709
- recordsProcessed: 0,
710
- });
711
-
712
- logger.info('Parallel job created', {
713
- jobNumber: i + 1,
714
- jobId: job.id,
715
- });
716
- }
717
-
718
- // STEP 3: Distribute chunks across jobs (round-robin)
719
- chunks.forEach((chunk, index) => {
720
- const jobIndex = index % PARALLEL_JOBS;
721
- jobs[jobIndex].assignedChunks.push(chunk.chunkId);
722
- });
723
-
724
- logger.info('Chunks distributed', {
725
- totalChunks: chunks.length,
726
- jobCount: PARALLEL_JOBS,
727
- chunksPerJob: jobs.map(j => j.assignedChunks.length),
728
- });
729
-
730
- // STEP 4: Process chunks in parallel
731
- const startTime = Date.now();
732
-
733
- const jobPromises = jobs.map((job, jobIndex) =>
734
- processJobChunks(
735
- s3,
736
- client,
737
- job,
738
- chunks.filter(c => job.assignedChunks.includes(c.chunkId)),
739
- workflowId,
740
- kvAdapter,
741
- jobIndex + 1
742
- )
743
- );
744
-
745
- // Wait for all jobs to complete
746
- const results = await Promise.allSettled(jobPromises);
747
- const duration = (Date.now() - startTime) / 1000;
748
-
749
- // STEP 5: Analyze results
750
- let successfulJobs = 0;
751
- let failedJobs = 0;
752
- let totalRecordsProcessed = 0;
753
-
754
- results.forEach((result, index) => {
755
- if (result.status === 'fulfilled') {
756
- successfulJobs++;
757
- totalRecordsProcessed += result.value.recordsProcessed;
758
-
759
- logger.info('Job completed', {
760
- jobNumber: index + 1,
761
- jobId: jobs[index].jobId,
762
- recordsProcessed: result.value.recordsProcessed,
763
- chunksProcessed: result.value.chunksProcessed,
764
- });
765
- } else {
766
- failedJobs++;
767
- logger.error('Job failed', result.reason, {
768
- jobNumber: index + 1,
769
- jobId: jobs[index].jobId,
770
- });
771
- }
772
- });
773
-
774
- // STEP 6: Update final state
775
- await stateService.updateSyncState(
776
- kvAdapter,
777
- [
778
- {
779
- fileName: SOURCE_FILE,
780
- lastModified: new Date().toISOString(),
781
- recordCount: totalRecordsProcessed,
782
- },
783
- ],
784
- workflowId
785
- );
786
-
787
- logger.info('Parallel ingestion complete', {
788
- totalChunks: chunks.length,
789
- parallelJobs: PARALLEL_JOBS,
790
- successfulJobs,
791
- failedJobs,
792
- totalRecordsProcessed,
793
- durationSeconds: duration,
794
- recordsPerSecond: Math.round(totalRecordsProcessed / duration),
795
- });
796
-
797
- return {
798
- success: failedJobs === 0,
799
- totalChunks: chunks.length,
800
- totalRecordsProcessed,
801
- successfulJobs,
802
- failedJobs,
803
- durationSeconds: duration,
804
- recordsPerSecond: Math.round(totalRecordsProcessed / duration),
805
- };
806
- }
807
-
808
- /**
809
- * Process all chunks assigned to a job
810
- */
811
- async function processJobChunks(
812
- s3: S3DataSource,
813
- client: any,
814
- job: ParallelJob,
815
- chunks: ChunkMetadata[],
816
- workflowId: string,
817
- kv: VersoriKVAdapter,
818
- jobNumber: number
819
- ): Promise<{ recordsProcessed: number; chunksProcessed: number }> {
820
- logger.info(`Job ${jobNumber} starting`, {
821
- jobId: job.jobId,
822
- assignedChunks: chunks.length,
823
- });
824
-
825
- let recordsProcessed = 0;
826
- let chunksProcessed = 0;
827
-
828
- for (const chunk of chunks) {
829
- try {
830
- // Check if chunk already processed
831
- const chunkState = await kv.get(['chunk', workflowId, chunk.chunkId, 'status']);
832
-
833
- if (chunkState?.value === 'completed') {
834
- logger.info(`Job ${jobNumber}: Chunk already processed`, {
835
- chunkId: chunk.chunkId,
836
- });
837
- chunksProcessed++;
838
- continue;
839
- }
840
-
841
- // Mark chunk as processing
842
- await kv.set(['chunk', workflowId, chunk.chunkId, 'status'], 'processing');
843
-
844
- logger.info(`Job ${jobNumber}: Processing chunk`, {
845
- chunkId: chunk.chunkId,
846
- recordCount: chunk.recordCount,
847
- progress: `${chunksProcessed}/${chunks.length}`,
848
- });
849
-
850
- // Process chunk
851
- await processChunk(s3, client, job.jobId, chunk);
852
-
853
- // Mark chunk as completed
854
- await kv.set(['chunk', workflowId, chunk.chunkId], {
855
- ...chunk,
856
- status: 'completed',
857
- processedAt: new Date().toISOString(),
858
- } as ChunkMetadata);
859
-
860
- recordsProcessed += chunk.recordCount;
861
- chunksProcessed++;
862
-
863
- logger.info(`Job ${jobNumber}: Chunk completed`, {
864
- chunkId: chunk.chunkId,
865
- recordsProcessed,
866
- chunksProcessed,
867
- percentComplete: ((chunksProcessed / chunks.length) * 100).toFixed(1),
868
- });
869
- } catch (error) {
870
- logger.error(`Job ${jobNumber}: Chunk failed`, error as Error, {
871
- chunkId: chunk.chunkId,
872
- });
873
-
874
- // Mark chunk as failed (don't throw - continue with remaining chunks)
875
- await kv.set(['chunk', workflowId, chunk.chunkId], {
876
- ...chunk,
877
- status: 'failed',
878
- error: (error as Error).message,
879
- } as ChunkMetadata);
880
- }
881
- }
882
-
883
- logger.info(`Job ${jobNumber} completed`, {
884
- jobId: job.jobId,
885
- recordsProcessed,
886
- chunksProcessed,
887
- });
888
-
889
- return { recordsProcessed, chunksProcessed };
890
- }
891
- ```
892
-
893
- **Progress Tracking:**
894
-
895
- ```typescript
896
- // Real-time progress query
897
- async function getIngestionProgress(
898
- workflowId: string,
899
- kv: VersoriKVAdapter
900
- ): Promise<{
901
- totalChunks: number;
902
- completedChunks: number;
903
- failedChunks: number;
904
- processingChunks: number;
905
- percentComplete: number;
906
- }> {
907
- // This would query all chunk statuses from KV
908
- // Simplified example:
909
- const chunks = await getAllChunkMetadata(workflowId, kv);
910
-
911
- const completed = chunks.filter(c => c.status === 'completed').length;
912
- const failed = chunks.filter(c => c.status === 'failed').length;
913
- const processing = chunks.filter(c => c.status === 'processing').length;
914
-
915
- return {
916
- totalChunks: chunks.length,
917
- completedChunks: completed,
918
- failedChunks: failed,
919
- processingChunks: processing,
920
- percentComplete: (completed / chunks.length) * 100,
921
- };
922
- }
923
- ```
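-
- `getAllChunkMetadata` is left undefined above. A minimal sketch, assuming the KV adapter exposes a prefix-scan `list({ prefix })` returning `{ key, value }` entries (the method name and shape are assumptions - substitute your adapter's actual enumeration API):
-
- ```typescript
- // Sketch only - `list({ prefix })` is an assumed adapter method
- async function getAllChunkMetadata(
-   workflowId: string,
-   kv: any // VersoriKVAdapter, typed loosely because `list` is assumed
- ): Promise<ChunkMetadata[]> {
-   const entries = await kv.list({ prefix: ['chunk', workflowId] });
-   // Keep the metadata entries (key length 3), not ['chunk', wf, id, 'status']
-   return entries
-     .filter((e: { key: unknown[] }) => e.key.length === 3)
-     .map((e: { value: unknown }) => e.value as ChunkMetadata);
- }
- ```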
924
-
925
- **Performance:**
926
-
927
- ```
928
- File Size: 5GB (10M records)
929
- Chunk Size: 100K records
930
- Total Chunks: 100
931
- Parallel Jobs: 5
932
- Processing Time: ~15 minutes (4x speedup)
933
- RAM Usage: ~500MB (5 chunks in parallel)
934
- Throughput: ~11,111 records/second
935
- ```
936
-
937
- ---
938
-
939
- ## Pattern 4: Distributed Processing (Versori Workflows)
940
-
941
- **Best for:** 10M+ records, enterprise scale, need maximum reliability and observability
942
-
943
- **Strategy:**
944
-
945
- 1. Coordinator workflow splits file and creates scheduled tasks
946
- 2. Each worker workflow processes one chunk
947
- 3. Coordinator tracks completion via VersoriKV
948
- 4. Automatic retry on worker failure
949
-
950
- ### Coordinator Workflow
951
-
952
- ```typescript
953
- import { fn, schedule } from '@versori/run';
954
- import {
955
- createClient,
956
- S3DataSource,
957
- VersoriKVAdapter,
958
- createConsoleLogger,
959
- toStructuredLogger
960
- } from '@fluentcommerce/fc-connect-sdk';
961
-
962
- const logger = createConsoleLogger();
963
-
964
- /**
965
- * Coordinator workflow - splits file and spawns workers
966
- */
967
- export const coordinatorWorkflow = schedule('coordinator')
968
- .cron('0 2 * * *') // Run daily at 2 AM
969
- .then(
970
- fn('split-and-schedule', async ({ activation, connections, kv }) => {
971
- logger.info('Coordinator: Starting distributed ingestion');
972
-
973
- const s3 = new S3DataSource(
974
- {
975
- type: 'S3_CSV',
976
- connectionId: 'my-s3-3',
977
- name: 'Inventory Files S3 3',
978
- s3Config: {
979
- bucket: 'inventory-files',
980
- region: 'us-east-1',
981
- accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
982
- secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
983
- },
984
- },
985
- logger
986
- );
987
-
988
- const kvAdapter = new VersoriKVAdapter(kv);
989
- const workflowId = `distributed-${Date.now()}`;
990
- const SOURCE_FILE = 'inventory/enterprise-inventory.csv';
991
- const CHUNK_SIZE = 100000;
992
-
993
- // Split file into chunks
994
- const chunks = await splitFileIntoChunks(s3, SOURCE_FILE, CHUNK_SIZE, workflowId, kvAdapter);
995
-
996
- logger.info('Coordinator: File split complete', {
997
- totalChunks: chunks.length,
998
- workflowId,
999
- });
1000
-
1001
- // Store coordinator state
1002
- await kvAdapter.set(['coordinator', workflowId], {
1003
- workflowId,
1004
- sourceFile: SOURCE_FILE,
1005
- totalChunks: chunks.length,
1006
- status: 'scheduled',
1007
- createdAt: new Date().toISOString(),
1008
- });
1009
-
1010
- // Schedule worker for each chunk
1011
- for (const chunk of chunks) {
1012
- // Trigger worker workflow (Versori will handle scheduling)
1013
- await activation.triggerWorkflow('chunk-worker', {
1014
- workflowId,
1015
- chunkId: chunk.chunkId,
1016
- chunkKey: chunk.s3Key,
1017
- recordCount: chunk.recordCount,
1018
- });
1019
-
1020
- logger.info('Coordinator: Worker scheduled', {
1021
- chunkId: chunk.chunkId,
1022
- workflowId,
1023
- });
1024
- }
1025
-
1026
- return {
1027
- workflowId,
1028
- totalChunks: chunks.length,
1029
- message: `Scheduled ${chunks.length} worker workflows`,
1030
- };
1031
- })
1032
- );
1033
-
1034
- /**
1035
- * Monitor workflow - checks completion status
1036
- */
1037
- export const monitorWorkflow = schedule('monitor')
1038
- .cron('*/5 * * * *') // Run every 5 minutes
1039
- .then(
1040
- fn('check-progress', async ({ kv }) => {
1041
- const kvAdapter = new VersoriKVAdapter(kv);
1042
-
1043
- // Get all active coordinators
1044
- const coordinators = await getActiveCoordinators(kvAdapter);
1045
-
1046
- for (const coordinator of coordinators) {
1047
- const progress = await getIngestionProgress(coordinator.workflowId, kvAdapter);
1048
-
1049
- logger.info('Monitor: Progress update', {
1050
- workflowId: coordinator.workflowId,
1051
- ...progress,
1052
- });
1053
-
1054
- // Check if complete
1055
- if (progress.completedChunks + progress.failedChunks === progress.totalChunks) {
1056
- // Mark coordinator as complete
1057
- await kvAdapter.set(['coordinator', coordinator.workflowId], {
1058
- ...coordinator,
1059
- status: 'completed',
1060
- completedAt: new Date().toISOString(),
1061
- progress,
1062
- });
1063
-
1064
- logger.info('Monitor: Ingestion complete', {
1065
- workflowId: coordinator.workflowId,
1066
- ...progress,
1067
- });
1068
- }
1069
- }
1070
-
1071
- return { coordinatorsChecked: coordinators.length };
1072
- })
1073
- );
1074
- ```
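-
- `getActiveCoordinators` above is likewise left undefined; a sketch under the same assumed prefix-scan `list` as in the progress-tracking helper earlier:
-
- ```typescript
- // Sketch only - assumes the same `list({ prefix })` scan as getAllChunkMetadata
- async function getActiveCoordinators(kv: any): Promise<any[]> {
-   const entries = await kv.list({ prefix: ['coordinator'] });
-   return entries
-     .map((e: { value: any }) => e.value)
-     .filter((c: any) => c && c.status !== 'completed');
- }
- ```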
1075
-
1076
- ### Worker Workflow
1077
-
1078
- ```typescript
1079
- import { fn, webhook } from '@versori/run';
1080
- import {
1081
- createClient,
1082
- S3DataSource,
1083
- CSVParserService,
1084
- UniversalMapper,
1085
- VersoriKVAdapter,
1086
- createConsoleLogger,
1087
- toStructuredLogger
1088
- } from '@fluentcommerce/fc-connect-sdk';
1089
-
1090
- const logger = createConsoleLogger();
1091
-
1092
- /**
1093
- * Worker workflow - processes a single chunk
1094
- */
1095
- export const chunkWorker = webhook('chunk-worker').then(
1096
- fn('process-chunk', async (ctx) => {
- // Capture the full handler context so createClient(ctx) below can auto-detect it
- const { data, activation, connections, kv } = ctx;
- const { workflowId, chunkId, chunkKey, recordCount } = data;
1098
-
1099
- logger.info('Worker: Starting chunk processing', {
1100
- workflowId,
1101
- chunkId,
1102
- recordCount,
1103
- });
1104
-
1105
- const kvAdapter = new VersoriKVAdapter(kv);
1106
-
1107
- // Check if already processed
1108
- const chunkState = await kvAdapter.get(['chunk', workflowId, chunkId, 'status']);
1109
-
1110
- if (chunkState?.value === 'completed') {
1111
- logger.info('Worker: Chunk already processed', { chunkId });
1112
- return { chunkId, status: 'skipped', message: 'Already processed' };
1113
- }
1114
-
1115
- // Mark as processing
1116
- await kvAdapter.set(['chunk', workflowId, chunkId, 'status'], 'processing');
1117
-
1118
- try {
1119
- // Initialize services
1120
- const client = await createClient(ctx);
1121
-
1122
- const s3 = new S3DataSource(
1123
- {
1124
- type: 'S3_CSV',
1125
- connectionId: 'my-s3-4',
1126
- name: 'Inventory Files S3 4',
1127
- s3Config: {
1128
- bucket: 'inventory-files',
1129
- region: 'us-east-1',
1130
- accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
1131
- secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
1132
- },
1133
- },
1134
- logger
1135
- );
1136
-
1137
- const csvParser = new CSVParserService();
1138
- const mapper = new UniversalMapper({
1139
- fields: {
1140
- skuRef: { source: 'sku', required: true },
1141
- locationRef: { source: 'location_code', required: true },
1142
- qty: { source: 'quantity', resolver: 'sdk.parseInt' },
1143
- expectedOn: { source: 'expected_date', resolver: 'sdk.formatDate' },
1144
- },
1145
- });
1146
-
1147
- // Get or create job for this workflow
1148
- let jobId = await kvAdapter.get(['job', workflowId, 'jobId']);
1149
-
1150
- if (!jobId?.value) {
1151
- const job = await client.createJob({
1152
- name: `distributed-ingestion-${workflowId}`,
1153
- retailerId: client.getRetailerId() || ctx.connections?.fluent_commerce?.retailerId,
1154
- });
1155
-
1156
- await kvAdapter.set(['job', workflowId, 'jobId'], job.id);
1157
- jobId = { value: job.id };
1158
- }
1159
-
1160
- // Download chunk
1161
- const chunkContent = (await s3.downloadFile(chunkKey, {
1162
- encoding: 'utf8',
1163
- })) as string;
1164
-
1165
- // Parse chunk
1166
- const records = await csvParser.parse(chunkContent);
1167
-
1168
- // Map records
1169
- const entities: any[] = [];
1170
- for (const record of records) {
1171
- const mapped = await mapper.map(record);
1172
- if (mapped.success && mapped.data) {
1173
- entities.push(mapped.data);
1174
- }
1175
- }
1176
-
1177
- // Send batch
1178
- await client.sendBatch(jobId.value as string, { entities });
1179
-
1180
- // Mark as completed
1181
- await kvAdapter.set(['chunk', workflowId, chunkId], {
1182
- chunkId,
1183
- s3Key: chunkKey,
1184
- recordCount: entities.length,
1185
- status: 'completed',
1186
- processedAt: new Date().toISOString(),
1187
- });
1188
-
1189
- logger.info('Worker: Chunk completed', {
1190
- workflowId,
1191
- chunkId,
1192
- recordCount: entities.length,
1193
- });
1194
-
1195
- return {
1196
- chunkId,
1197
- status: 'completed',
1198
- recordsProcessed: entities.length,
1199
- };
1200
- } catch (error) {
1201
- logger.error('Worker: Chunk failed', error as Error, {
1202
- workflowId,
1203
- chunkId,
1204
- });
1205
-
1206
- // Mark as failed
1207
- await kvAdapter.set(['chunk', workflowId, chunkId], {
1208
- chunkId,
1209
- s3Key: chunkKey,
1210
- recordCount,
1211
- status: 'failed',
1212
- error: (error as Error).message,
1213
- });
1214
-
1215
- throw error;
1216
- }
1217
- })
1218
- );
1219
- ```
1220
-
1221
- **Performance:**
1222
-
1223
- ```
1224
- File Size: 10GB (20M records)
1225
- Chunk Size: 100K records
1226
- Total Chunks: 200
1227
- Worker Workflows: 200 (parallel)
1228
- Processing Time: ~10 minutes (Versori handles parallelism)
1229
- RAM Usage: ~50MB per worker
1230
- Throughput: ~33,333 records/second
1231
- ```
1232
-
1233
- ---
1234
-
1235
- ## Memory Optimization Tips
1236
-
1237
- ### 1. Use Streaming APIs
1238
-
1239
- ```typescript
1240
- // ❌ WRONG - Loads entire file into memory
1241
- const fileContent = await fs.readFile('huge.csv', 'utf-8');
1242
- const records = await csvParser.parse(fileContent);
1243
-
1244
- // ✅ CORRECT - Parses records incrementally instead of materializing them all at once
1245
- for await (const record of csvParser.parseStreaming(fileContent, {}, 1)) {
1246
- await processRecord(record);
1247
- }
1248
- ```
1249
-
1250
- ### 2. Clear Batches After Processing
1251
-
1252
- ```typescript
1253
- let batch: any[] = [];
1254
- for await (const record of records) {
1255
- batch.push(record);
1256
-
1257
- if (batch.length >= 1000) {
1258
- await sendBatch(batch);
1259
- batch = []; // ✅ Clear batch to free memory
1260
- }
1261
- }
1262
- ```
1263
-
1264
- ### 3. Monitor Memory Usage
1265
-
1266
- ```typescript
1267
- function logMemoryUsage() {
1268
- const used = process.memoryUsage();
1269
- console.log({
1270
- heapUsed: Math.round(used.heapUsed / 1024 / 1024) + ' MB',
1271
- heapTotal: Math.round(used.heapTotal / 1024 / 1024) + ' MB',
1272
- rss: Math.round(used.rss / 1024 / 1024) + ' MB',
1273
- });
1274
- }
1275
-
1276
- // Log every 10K records
1277
- if (recordsProcessed % 10000 === 0) {
1278
- logMemoryUsage();
1279
- }
1280
- ```
1281
-
1282
- ### 4. Use Garbage Collection Hints
1283
-
1284
- ```typescript
1285
- // Force garbage collection (requires --expose-gc flag)
1286
- if (recordsProcessed % 100000 === 0 && global.gc) {
1287
- global.gc();
1288
- logger.info('Garbage collection triggered', { recordsProcessed });
1289
- }
1290
- ```
1291
-
1292
- ---
1293
-
1294
- ## Performance Benchmarks
1295
-
1296
- ### Pattern Comparison (10M records, 5GB file)
1297
-
1298
- | Pattern | Time | RAM | Throughput | Complexity |
1299
- | ------------------------- | ------ | ------ | -------------- | ---------- |
1300
- | 1. Basic Streaming | 90 min | 50MB | 1,852 rec/sec | Low |
1301
- | 2. File Chunking | 60 min | 100MB | 2,778 rec/sec | Medium |
1302
- | 3. Parallel Processing | 15 min | 500MB | 11,111 rec/sec | High |
1303
- | 4. Distributed Processing | 10 min | 50MB\* | 16,667 rec/sec | Very High |
1304
-
1305
- \*Per worker; total RAM = 50MB × worker count
1306
-
1307
- ### Optimization Impact
1308
-
1309
- | Optimization | Before | After | Improvement |
1310
- | ------------------------- | ------- | -------- | ----------- |
1311
- | Streaming vs Loading | 5GB RAM | 50MB RAM | 100x |
1312
- | Batching (1K vs 10K) | 90 min | 60 min | 1.5x |
1313
- | Parallel (1 vs 5 jobs) | 60 min | 15 min | 4x |
1314
- | Distributed (200 workers) | 15 min | 10 min | 1.5x |
1315
-
1316
- ---
1317
-
1318
- ## Common Issues & Solutions
1319
-
1320
- ### Issue 1: Out of Memory
1321
-
1322
- **Symptoms:**
1323
-
1324
- ```
1325
- FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
1326
- ```
1327
-
1328
- **Solutions:**
1329
-
1330
- 1. Switch to streaming pattern (Pattern 1)
1331
- 2. Reduce batch size (1000 => 500)
1332
- 3. Increase Node.js heap: `node --max-old-space-size=4096`
1333
- 4. Use file chunking (Pattern 2)
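-
- A defensive early flush can also cap heap growth between batches. A minimal sketch using Node's `process.memoryUsage()` (the threshold is an assumption - tune it to your runtime's limit), dropped into the streaming loop from Pattern 1:
-
- ```typescript
- // Sketch: flush the in-progress batch early when heap usage crosses a threshold
- const HEAP_LIMIT_MB = 400; // assumption - tune per environment
- const heapMB = process.memoryUsage().heapUsed / 1024 / 1024;
- if (heapMB > HEAP_LIMIT_MB && currentBatch.length > 0) {
-   await client.sendBatch(job.id, { entities: currentBatch });
-   currentBatch = []; // frees the mapped objects for garbage collection
- }
- ```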
1334
-
1335
- ### Issue 2: Timeout on Large Files
1336
-
1337
- **Symptoms:**
1338
-
1339
- ```
1340
- TimeoutError: Operation timed out after 300000ms
1341
- ```
1342
-
1343
- **Solutions:**
1344
-
1345
- 1. Increase timeout: `config.timeout = 600000` (10 min)
1346
- 2. Split file into chunks (Pattern 2)
1347
- 3. Use parallel processing (Pattern 3)
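-
- If the client timeout is not configurable in your context, a generic wrapper achieves the same effect. A self-contained sketch:
-
- ```typescript
- // Sketch: race an operation against a manual timeout
- async function withTimeout<T>(operation: Promise<T>, ms: number): Promise<T> {
-   let timer: ReturnType<typeof setTimeout> | undefined;
-   const timeout = new Promise<never>((_, reject) => {
-     timer = setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms);
-   });
-   try {
-     return await Promise.race([operation, timeout]);
-   } finally {
-     if (timer) clearTimeout(timer);
-   }
- }
-
- // e.g. await withTimeout(client.sendBatch(jobId, { entities }), 600000);
- ```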
1348
-
1349
- ### Issue 3: Chunks Not Resuming
1350
-
1351
- **Symptoms:**
1352
-
1353
- - Re-processing same chunks on failure
1354
-
1355
- **Solutions:**
1356
-
1357
- ```typescript
1358
- // Check chunk status before processing
1359
- const chunkState = await kv.get(['chunk', workflowId, chunkId, 'status']);
1360
- if (chunkState?.value === 'completed') {
1361
- logger.info('Chunk already processed, skipping', { chunkId });
1362
- continue;
1363
- }
1364
- ```
1365
-
1366
- ### Issue 4: Progress Tracking Inconsistent
1367
-
1368
- **Symptoms:**
1369
-
1370
- - Progress percentage doesn't match reality
1371
-
1372
- **Solutions:**
1373
-
1374
- ```typescript
1375
- // Always update chunk status atomically
1376
- const atomic = kv.atomic();
1377
- atomic.set(['chunk', workflowId, chunkId, 'status'], 'completed');
1378
- atomic.set(['chunk', workflowId, chunkId, 'processedAt'], new Date().toISOString());
1379
- await atomic.commit();
1380
- ```
1381
-
1382
- ### Issue 5: Duplicate Processing
1383
-
1384
- **Symptoms:**
1385
-
1386
- - Same records sent multiple times
1387
-
1388
- **Solutions:**
1389
-
1390
- ```typescript
1391
- // Use idempotency keys in Fluent batch payload
1392
- await client.sendBatch(jobId, {
1393
- entities,
1394
- meta: {
1395
- chunkId: chunk.chunkId,
1396
- workflowId,
1397
- idempotencyKey: `${workflowId}-${chunk.chunkId}`,
1398
- },
1399
- });
1400
- ```
1401
-
1402
- ---
1403
-
1404
- ## Related Guides
1405
-
1406
- - [Basic Ingestion Pattern](../standalone/s3-csv-batch-api.md) - For small files (<100K records)
1407
- - [Streaming Pattern](../../02-CORE-GUIDES/ingestion/ingestion-readme.md) - For medium files (100K-1M records)
1408
- - [Error Handling & Retry](./error-handling-retry.md) - Robust error handling strategies
1409
- - [Progress Tracking](../../02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-08-performance-optimization.md) - Real-time progress monitoring
1410
- - [State Management](../../02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-07-state-management.md) - VersoriKV patterns
1411
-
1412
- ---
1413
-
1414
- ## Summary
1415
-
1416
- **Choose Your Pattern:**
1417
-
1418
- - **Pattern 1 (Streaming)**: Simple, memory-efficient, suitable for 100K-1M records
1419
- - **Pattern 2 (Chunking)**: Checkpoint/resume, suitable for 1M-5M records
1420
- - **Pattern 3 (Parallel)**: High performance, suitable for 5M-10M records
1421
- - **Pattern 4 (Distributed)**: Enterprise scale, suitable for 10M+ records
1422
-
1423
- **Key Takeaways:**
1424
-
1425
- 1. Always use streaming APIs for large files
1426
- 2. Clear batches after processing to free memory
1427
- 3. Use chunks + VersoriKV for checkpoint/resume
1428
- 4. Parallel processing trades RAM for speed
1429
- 5. Monitor memory usage throughout processing
1430
- 6. Test with representative file sizes before production
1
+ # Pattern: Large File Processing & Chunking
2
+
3
+ **FC Connect SDK Use Case Guide**
4
+
5
+ > **SDK**: [@fluentcommerce/fc-connect-sdk](https://www.npmjs.com/package/@fluentcommerce/fc-connect-sdk)
6
+ > **Version**: Use latest - `npm install @fluentcommerce/fc-connect-sdk@latest`
7
+
8
+ **Context**: Enterprise-scale file ingestion with streaming, splitting, and parallel processing
9
+
10
+ **Type**: Advanced Pattern
11
+
12
+ **Complexity**: High
13
+
14
+ **Volume**: 500MB-5GB files, 1M-10M records
15
+
16
+ **Latency**: Batch processing (roughly 10-90 min for 10M records, depending on pattern)
17
+
18
+ **Pattern**: Streaming + chunking + parallel Batch API
19
+
20
+ ## When to Use This Pattern
21
+
22
+ Use this pattern when dealing with:
23
+
24
+ - **Large CSV files** (>500MB, >1M records)
25
+ - **Memory-constrained environments** (Lambda, containers with limited RAM)
26
+ - **Time-sensitive ingestion** (need parallel processing for speed)
27
+ - **Reliability requirements** (checkpoint/resume on failure)
28
+ - **Progress tracking** (real-time status updates)
29
+
30
+ **Volume Guidance:**
31
+
32
+ - **Small** (<100K records): Use basic ingestion pattern
+ - **Medium** (100K-1M records): Use streaming pattern (Pattern 1)
+ - **Large** (1M-5M records): Use file chunking pattern (Pattern 2)
+ - **Huge** (5M-10M records): Use parallel processing pattern (Pattern 3)
+ - **Enterprise** (10M+ records): Use distributed processing pattern (Pattern 4)
37
+
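+ The thresholds above can be encoded directly. An illustrative helper (not part of the SDK) for picking a pattern from an expected record count:
+
+ ```typescript
+ // Illustrative only - encodes the volume guidance above; not an SDK API
+ type IngestionPattern = 'basic' | 'streaming' | 'chunking' | 'parallel' | 'distributed';
+
+ function choosePattern(recordCount: number): IngestionPattern {
+   if (recordCount < 100_000) return 'basic';
+   if (recordCount < 1_000_000) return 'streaming'; // Pattern 1
+   if (recordCount < 5_000_000) return 'chunking'; // Pattern 2
+   if (recordCount < 10_000_000) return 'parallel'; // Pattern 3
+   return 'distributed'; // Pattern 4
+ }
+ ```
+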
38
+ ## Problem Statement
39
+
40
+ ### Why Splitting is Needed
41
+
42
+ **Memory Constraints:**
43
+
44
+ ```typescript
45
+ // ❌ WRONG - Loads entire 2GB file into memory
46
+ const csvContent = await fs.readFile('huge-inventory.csv', 'utf-8');
47
+ const records = await csvParser.parse(csvContent); // 💥 Out of memory
48
+ ```
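+
+ The incremental alternative (developed fully in Pattern 1 below); `handleRecord` is a hypothetical placeholder for per-record logic:
+
+ ```typescript
+ // ✅ Sketch - one record in memory at a time; see Pattern 1 for the full flow
+ for await (const record of csvParser.parseStreaming(csvContent, {}, 1)) {
+   await handleRecord(record); // hypothetical per-record handler
+ }
+ ```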
49
+
50
+ **Impact:**
51
+
52
+ - Lambda 512MB: Crashes on 500MB+ files
53
+ - Container 1GB: Struggles with 1GB+ files
54
+ - Node.js default heap (roughly 2-4GB depending on version): fails on 5GB+ files
55
+
56
+ **Time Constraints:**
57
+
58
+ ```typescript
59
+ // ❌ WRONG - Sequential processing takes 90+ minutes
60
+ for (const record of records) {
61
+ await processRecord(record); // Too slow for 10M records
62
+ }
63
+ ```
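+
+ A bounded-concurrency sketch of the alternative (Pattern 3 applies the same idea at chunk granularity):
+
+ ```typescript
+ // ✅ Sketch - process records in small concurrent groups instead of one-by-one
+ const CONCURRENCY = 5; // assumption - tune to API rate limits
+ for (let i = 0; i < records.length; i += CONCURRENCY) {
+   await Promise.all(records.slice(i, i + CONCURRENCY).map(r => processRecord(r)));
+ }
+ ```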
64
+
65
+ **Reliability Requirements:**
66
+
67
+ ```typescript
68
+ // ❌ WRONG - Network failure loses all progress
69
+ await processAllRecords(records); // If fails at record 5M, restart from 0
70
+ ```
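+
+ The checkpointed alternative, sketched with the chunk-status KV layout this guide develops in Pattern 2:
+
+ ```typescript
+ // ✅ Sketch - checkpoint per chunk so a retry resumes instead of restarting
+ for (const chunk of chunks) {
+   const state = await kv.get(['chunk', workflowId, chunk.chunkId, 'status']);
+   if (state?.value === 'completed') continue; // finished on a prior run
+   await processChunk(s3, client, jobId, chunk);
+   await kv.set(['chunk', workflowId, chunk.chunkId, 'status'], 'completed');
+ }
+ ```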
71
+
72
+ ### Solution Overview
73
+
74
+ This guide demonstrates 4 progressive patterns:
75
+
76
+ 1. **Basic Streaming** (~200 lines) - Process records as they arrive, memory-efficient
77
+ 2. **File Chunking** (~300 lines) - Split large files into manageable chunks
78
+ 3. **Parallel Processing** (~400 lines) - Process chunks concurrently with progress tracking
79
+ 4. **Distributed Processing** (~300 lines) - Use Versori scheduled workflows for enterprise scale
80
+
81
+ ## SDK Methods Used
82
+
83
+ ```typescript
84
+ import {
+ createClient, // Client factory (auto-detects context)
+ CSVParserService, // Streaming CSV parser
+ S3DataSource, // S3 file operations
+ UniversalMapper, // Field mapping
+ StateService, // Progress tracking
+ VersoriKVAdapter, // Versori state management
+ createConsoleLogger, // Structured logging
+ toStructuredLogger
+ } from '@fluentcommerce/fc-connect-sdk';
101
+ ```
102
+
103
+ ---
104
+
105
+ ## Pattern 1: Basic Streaming (Memory-Efficient)
106
+
107
+ **Best for:** 100K-1M records, single-threaded processing, memory-constrained environments
108
+
109
+ **Memory Usage:**
110
+
111
+ - ❌ Without streaming: 2GB file = 2GB+ RAM (file + parsed objects)
112
+ - ✅ With streaming: 2GB file = ~50MB RAM (processes records incrementally)
113
+
114
+ ### Implementation
115
+
116
+ ```typescript
117
+ import {
118
+ createClient,
119
+ CSVParserService,
120
+ S3DataSource,
121
+ UniversalMapper,
122
+ createConsoleLogger,
123
+ toStructuredLogger
124
+ } from '@fluentcommerce/fc-connect-sdk';
125
+
126
+ const logger = createConsoleLogger();
127
+
128
+ async function streamingIngestion(ctx: any) {
129
+ logger.info('Starting streaming ingestion');
130
+
131
+ // Create client (auto-detects Versori context)
132
+ const client = await createClient(ctx);
133
+
134
+ // Initialize S3 data source
135
+ const s3 = new S3DataSource(
136
+ {
137
+ type: 'S3_CSV',
138
+ connectionId: 'my-s3',
139
+ name: 'Inventory Files S3',
140
+ s3Config: {
141
+ bucket: 'inventory-files',
142
+ region: 'us-east-1',
143
+ accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
144
+ secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
145
+ },
146
+ },
147
+ logger
148
+ );
149
+
150
+ // Define field mapping
151
+ const mapper = new UniversalMapper({
152
+ fields: {
153
+ skuRef: { source: 'sku', required: true },
154
+ locationRef: { source: 'location_code', required: true },
155
+ qty: { source: 'quantity', resolver: 'sdk.parseInt' },
156
+ expectedOn: { source: 'expected_date', resolver: 'sdk.formatDate' },
157
+ },
158
+ });
159
+
160
+ // Create CSV parser with streaming enabled
161
+ const csvParser = new CSVParserService();
162
+
163
+ // Download file content (the parse below is incremental, so records are not all materialized at once)
164
+ logger.info('Downloading file from S3', {
165
+ key: 'inventory/large-file.csv',
166
+ });
167
+
168
+ const fileContent = (await s3.downloadFile('inventory/large-file.csv', {
169
+ encoding: 'utf8',
170
+ })) as string;
171
+
172
+ // Create job for batch ingestion
173
+ const job = await client.createJob({
174
+ name: 'streaming-inventory-ingestion',
175
+ retailerId: client.getRetailerId() || ctx.connections?.fluent_commerce?.retailerId,
176
+ });
177
+
178
+ logger.info('Job created', { jobId: job.id });
179
+
180
+ // Statistics tracking
181
+ let recordsProcessed = 0;
182
+ let batchCount = 0;
183
+ let errors = 0;
184
+ const BATCH_SIZE = 1000;
185
+ let currentBatch: any[] = [];
186
+
187
+ // Stream records with batching (memory-efficient)
188
+ // Records are parsed incrementally, not all at once
189
+ for await (const record of csvParser.parseStreaming(fileContent, {}, 1)) {
190
+ try {
191
+ // Map record
192
+ const mapped = await mapper.map(record);
193
+
194
+ if (mapped.success && mapped.data) {
195
+ currentBatch.push(mapped.data);
196
+ recordsProcessed++;
197
+
198
+ // Send batch when full
199
+ if (currentBatch.length >= BATCH_SIZE) {
200
+ await client.sendBatch(job.id, {
201
+ entities: currentBatch,
202
+ });
203
+
204
+ batchCount++;
205
+
206
+ logger.info('Batch sent', {
207
+ batchNumber: batchCount,
208
+ recordsProcessed,
209
+ currentBatchSize: currentBatch.length,
210
+ });
211
+
212
+ currentBatch = []; // Clear batch (frees memory)
213
+ }
214
+ } else {
215
+ errors++;
216
+ logger.warn('Record mapping failed', {
217
+ record,
218
+ errors: mapped.errors,
219
+ });
220
+ }
221
+ } catch (error) {
222
+ errors++;
223
+ logger.error('Record processing failed', error as Error, { record });
224
+ }
225
+
226
+ // Progress logging every 10K records
227
+ if (recordsProcessed > 0 && recordsProcessed % 10000 === 0) {
+ logger.info('Progress update', {
+ recordsProcessed,
+ batchesSent: batchCount,
+ errors,
+ memoryUsage: Math.round(process.memoryUsage().heapUsed / 1024 / 1024) + ' MB',
233
+ });
234
+ }
235
+ }
236
+
237
+ // Send remaining records
238
+ if (currentBatch.length > 0) {
239
+ await client.sendBatch(job.id, {
240
+ entities: currentBatch,
241
+ });
242
+ batchCount++;
243
+ }
244
+
245
+ logger.info('Streaming ingestion complete', {
246
+ totalRecords: recordsProcessed,
247
+ batchesSent: batchCount,
248
+ errors,
249
+ jobId: job.id,
250
+ });
251
+
252
+ return {
253
+ success: true,
254
+ jobId: job.id,
255
+ recordsProcessed,
256
+ batchesSent: batchCount,
257
+ errors,
258
+ };
259
+ }
260
+ ```
261
+
262
+ **Memory Profile:**
263
+
264
+ ```
265
+ File Size: 2GB (5M records)
266
+ RAM Usage: ~50MB peak (1000 record batches)
267
+ Processing Time: ~45 minutes (sequential)
268
+ ```
269
+
270
+ ---
271
+
272
+ ## Pattern 2: File Chunking (Split & Track)
273
+
274
+ **Best for:** 1M-5M records, need checkpoint/resume, want progress visibility
275
+
276
+ **Strategy:**
277
+
278
+ 1. Split large file into 100K record chunks
279
+ 2. Write chunks to temp S3 locations
280
+ 3. Track chunk metadata in VersoriKV
281
+ 4. Process chunks sequentially (can resume on failure)
282
+
283
+ ### Implementation
284
+
285
+ ```typescript
286
+ import {
287
+ createClient,
288
+ CSVParserService,
289
+ S3DataSource,
290
+ UniversalMapper,
291
+ StateService,
292
+ VersoriKVAdapter,
293
+ createConsoleLogger,
294
+ toStructuredLogger
295
+ } from '@fluentcommerce/fc-connect-sdk';
296
+
297
+ const logger = createConsoleLogger();
298
+
299
+ interface ChunkMetadata {
300
+ chunkId: string;
301
+ startRecord: number;
302
+ endRecord: number;
303
+ s3Key: string;
304
+ recordCount: number;
305
+ status: 'pending' | 'processing' | 'completed' | 'failed';
306
+ processedAt?: string;
307
+ error?: string;
308
+ }
309
+
310
+ async function chunkedIngestion(ctx: any) {
311
+ logger.info('Starting chunked ingestion');
312
+
313
+ // Initialize services
314
+ const client = await createClient(ctx);
315
+
316
+ const s3 = new S3DataSource(
317
+ {
318
+ type: 'S3_CSV',
319
+ connectionId: 'my-s3-chunked',
320
+ name: 'Inventory Files S3 Chunked',
321
+ s3Config: {
322
+ bucket: 'inventory-files',
323
+ region: 'us-east-1',
324
+ accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
325
+ secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
326
+ },
327
+ },
328
+ logger
329
+ );
330
+
331
+ // Initialize state management
332
+ const kv = ctx.openKv();
333
+ const kvAdapter = new VersoriKVAdapter(kv);
334
+ const stateService = new StateService(logger);
335
+
336
+ const SOURCE_FILE = 'inventory/huge-inventory.csv';
337
+ const CHUNK_SIZE = 100000; // 100K records per chunk
338
+ const workflowId = 'chunked-ingestion';
339
+
340
+ // STEP 1: Check if chunking is already in progress
341
+ const existingState = await stateService.getSyncState(kvAdapter, workflowId);
342
+
343
+ if (existingState.isInitialized && existingState.lastSyncResult === 'partial') {
344
+ logger.info('Resuming from previous run', {
345
+ lastProcessedFile: existingState.lastProcessedFile,
346
+ lastProcessedCount: existingState.lastProcessedCount,
347
+ });
348
+ }
349
+
350
+ // STEP 2: Split file into chunks
351
+ logger.info('Splitting file into chunks', {
352
+ sourceFile: SOURCE_FILE,
353
+ chunkSize: CHUNK_SIZE,
354
+ });
355
+
356
+ const chunks = await splitFileIntoChunks(s3, SOURCE_FILE, CHUNK_SIZE, workflowId, kvAdapter);
357
+
358
+ logger.info('File split complete', {
359
+ totalChunks: chunks.length,
360
+ totalRecords: chunks.reduce((sum, c) => sum + c.recordCount, 0),
361
+ });
362
+
363
+ // STEP 3: Create job for ingestion
364
+ const job = await client.createJob({
365
+ name: `chunked-inventory-ingestion-${Date.now()}`,
366
+ retailerId: client.getRetailerId() || ctx.connections?.fluent_commerce?.retailerId,
367
+ });
368
+
369
+ logger.info('Job created', { jobId: job.id });
370
+
371
+ // STEP 4: Process each chunk sequentially
372
+ let successCount = 0;
373
+ let failureCount = 0;
374
+
375
+ for (const chunk of chunks) {
376
+ try {
377
+ // Skip if already processed
378
+ const chunkState = await kvAdapter.get(['chunk', workflowId, chunk.chunkId, 'status']);
379
+
380
+ if (chunkState?.value === 'completed') {
381
+ logger.info('Chunk already processed, skipping', {
382
+ chunkId: chunk.chunkId,
383
+ });
384
+ successCount++;
385
+ continue;
386
+ }
387
+
388
+ // Mark chunk as processing
389
+ await kvAdapter.set(['chunk', workflowId, chunk.chunkId, 'status'], 'processing');
390
+
391
+ logger.info('Processing chunk', {
392
+ chunkId: chunk.chunkId,
393
+ recordCount: chunk.recordCount,
394
+ progress: `${successCount + failureCount}/${chunks.length}`,
395
+ });
396
+
397
+ // Process chunk
398
+ await processChunk(s3, client, job.id, chunk);
399
+
400
+ // Mark chunk as completed
401
+ await kvAdapter.set(['chunk', workflowId, chunk.chunkId], {
402
+ ...chunk,
403
+ status: 'completed',
404
+ processedAt: new Date().toISOString(),
405
+ } as ChunkMetadata);
406
+
407
+ successCount++;
408
+
409
+ logger.info('Chunk completed', {
410
+ chunkId: chunk.chunkId,
411
+ successCount,
412
+ failureCount,
413
+ percentComplete: (((successCount + failureCount) / chunks.length) * 100).toFixed(1),
414
+ });
415
+ } catch (error) {
416
+ failureCount++;
417
+ logger.error('Chunk processing failed', error as Error, {
418
+ chunkId: chunk.chunkId,
419
+ });
420
+
421
+ // Mark chunk as failed
422
+ await kvAdapter.set(['chunk', workflowId, chunk.chunkId], {
423
+ ...chunk,
424
+ status: 'failed',
425
+ error: (error as Error).message,
426
+ } as ChunkMetadata);
427
+ }
428
+ }
429
+
430
+ // STEP 5: Update final state
431
+ await stateService.updateSyncState(
432
+ kvAdapter,
433
+ [
434
+ {
435
+ fileName: SOURCE_FILE,
436
+ lastModified: new Date().toISOString(),
437
+ recordCount: chunks.reduce((sum, c) => sum + c.recordCount, 0),
438
+ },
439
+ ],
440
+ workflowId
441
+ );
442
+
443
+ logger.info('Chunked ingestion complete', {
444
+ totalChunks: chunks.length,
445
+ successCount,
446
+ failureCount,
447
+ jobId: job.id,
448
+ });
449
+
450
+ return {
451
+ success: failureCount === 0,
452
+ jobId: job.id,
453
+ chunksProcessed: successCount,
454
+ chunksFailed: failureCount,
455
+ totalChunks: chunks.length,
456
+ };
457
+ }
458
+
459
+ /**
460
+ * Split file into chunks and upload to S3
461
+ */
462
+ async function splitFileIntoChunks(
463
+ s3: S3DataSource,
464
+ sourceKey: string,
465
+ chunkSize: number,
466
+ workflowId: string,
467
+ kv: VersoriKVAdapter
468
+ ): Promise<ChunkMetadata[]> {
469
+ const csvParser = new CSVParserService();
470
+ const chunks: ChunkMetadata[] = [];
471
+
472
+ // Download source file
473
+ const fileContent = (await s3.downloadFile(sourceKey, {
474
+ encoding: 'utf8',
475
+ })) as string;
476
+
477
+ let currentChunk: any[] = [];
478
+ let chunkNumber = 0;
479
+ let recordNumber = 0;
480
+
481
+ // Stream through file and create chunks
482
+ for await (const record of csvParser.parseStreaming(fileContent, {}, 1)) {
483
+ currentChunk.push(record);
484
+ recordNumber++;
485
+
486
+ // Create chunk when size reached
487
+ if (currentChunk.length >= chunkSize) {
488
+ const chunkId = `chunk-${chunkNumber.toString().padStart(5, '0')}`;
489
+ const chunkKey = `temp/${workflowId}/${chunkId}.csv`;
490
+
491
+ // Convert chunk to CSV
492
+ const chunkCSV = csvParser.stringify(currentChunk, { headers: true });
493
+
494
+ // Upload chunk to S3
495
+ await s3.uploadFile(chunkKey, chunkCSV, {
496
+ contentType: 'text/csv',
497
+ });
498
+
499
+ // Create chunk metadata
500
+ const metadata: ChunkMetadata = {
501
+ chunkId,
502
+ startRecord: recordNumber - currentChunk.length,
503
+ endRecord: recordNumber - 1,
504
+ s3Key: chunkKey,
505
+ recordCount: currentChunk.length,
506
+ status: 'pending',
507
+ };
508
+
509
+ chunks.push(metadata);
510
+
511
+ // Store chunk metadata in KV
512
+ await kv.set(['chunk', workflowId, chunkId], metadata);
513
+
514
+ logger.info('Chunk created', {
515
+ chunkId,
516
+ recordCount: currentChunk.length,
517
+ s3Key: chunkKey,
518
+ });
519
+
520
+ // Clear chunk (free memory)
521
+ currentChunk = [];
522
+ chunkNumber++;
523
+ }
524
+ }
525
+
526
+ // Handle remaining records
527
+ if (currentChunk.length > 0) {
528
+ const chunkId = `chunk-${chunkNumber.toString().padStart(5, '0')}`;
529
+ const chunkKey = `temp/${workflowId}/${chunkId}.csv`;
530
+
531
+ const chunkCSV = csvParser.stringify(currentChunk, { headers: true });
532
+ await s3.uploadFile(chunkKey, chunkCSV, { contentType: 'text/csv' });
533
+
534
+ const metadata: ChunkMetadata = {
535
+ chunkId,
536
+ startRecord: recordNumber - currentChunk.length,
537
+ endRecord: recordNumber - 1,
538
+ s3Key: chunkKey,
539
+ recordCount: currentChunk.length,
540
+ status: 'pending',
541
+ };
542
+
543
+ chunks.push(metadata);
544
+ await kv.set(['chunk', workflowId, chunkId], metadata);
545
+ }
546
+
547
+ return chunks;
548
+ }
549
+
550
+ /**
551
+ * Process a single chunk
552
+ */
553
+ async function processChunk(
554
+ s3: S3DataSource,
555
+ client: any,
556
+ jobId: string,
557
+ chunk: ChunkMetadata
558
+ ): Promise<void> {
559
+ const csvParser = new CSVParserService();
560
+ const mapper = new UniversalMapper({
561
+ fields: {
562
+ skuRef: { source: 'sku', required: true },
563
+ locationRef: { source: 'location_code', required: true },
564
+ qty: { source: 'quantity', resolver: 'sdk.parseInt' },
565
+ expectedOn: { source: 'expected_date', resolver: 'sdk.formatDate' },
566
+ },
567
+ });
568
+
569
+ // Download chunk
570
+ const chunkContent = (await s3.downloadFile(chunk.s3Key, {
571
+ encoding: 'utf8',
572
+ })) as string;
573
+
574
+ // Parse chunk
575
+ const records = await csvParser.parse(chunkContent);
576
+
577
+ // Map records
578
+ const entities: any[] = [];
579
+ for (const record of records) {
580
+ const mapped = await mapper.map(record);
581
+ if (mapped.success && mapped.data) {
582
+ entities.push(mapped.data);
583
+ }
584
+ }
585
+
586
+ // Send batch
587
+ await client.sendBatch(jobId, { entities });
588
+
589
+ logger.info('Chunk batch sent', {
590
+ chunkId: chunk.chunkId,
591
+ entityCount: entities.length,
592
+ });
593
+ }
594
+ ```
595
+
596
+ **VersoriKV Schema:**
597
+
598
+ ```typescript
599
+ // Chunk metadata
600
+ ['chunk', workflowId, chunkId] => ChunkMetadata
601
+
602
+ // Chunk status
603
+ ['chunk', workflowId, chunkId, 'status'] => 'pending' | 'processing' | 'completed' | 'failed'
604
+
605
+ // Workflow state
606
+ ['state', workflowId, 'sync'] => SyncState
607
+ ```
608
+
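+ Small key-builder helpers can keep this layout consistent across coordinator, worker, and monitor code (illustrative; plain array keys as used throughout this guide):
+
+ ```typescript
+ // Illustrative key builders matching the schema above
+ const chunkKey = (workflowId: string, chunkId: string) =>
+   ['chunk', workflowId, chunkId];
+ const chunkStatusKey = (workflowId: string, chunkId: string) =>
+   ['chunk', workflowId, chunkId, 'status'];
+ const syncStateKey = (workflowId: string) => ['state', workflowId, 'sync'];
+ ```
+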
609
+ **Performance:**
610
+
611
+ ```
612
+ File Size: 5GB (10M records)
613
+ Chunk Size: 100K records
614
+ Total Chunks: 100
615
+ Processing Time: ~60 minutes (sequential)
616
+ RAM Usage: ~100MB (processes one chunk at a time)
617
+ ```
618
+
619
+ ---
620
+
621
+ ## Pattern 3: Parallel Processing (High Performance)
622
+
623
+ **Best for:** 5M-10M records, time-sensitive ingestion, need speed with reliability
624
+
625
+ **Strategy:**
626
+
627
+ 1. Split file into chunks (same as Pattern 2)
628
+ 2. Spawn 5 parallel Batch API jobs
629
+ 3. Process chunks concurrently
630
+ 4. Track progress in VersoriKV
631
+ 5. Resume on failure
632
+
633
+ ### Implementation
634
+
635
+ ```typescript
636
+ import {
637
+ createClient,
638
+ CSVParserService,
639
+ S3DataSource,
640
+ UniversalMapper,
641
+ StateService,
642
+ VersoriKVAdapter,
643
+ createConsoleLogger,
644
+ toStructuredLogger
645
+ } from '@fluentcommerce/fc-connect-sdk';
646
+
647
+ const logger = createConsoleLogger();
648
+
649
+ interface ParallelJob {
650
+ jobId: string;
651
+ assignedChunks: string[];
652
+ status: 'pending' | 'processing' | 'completed' | 'failed';
653
+ recordsProcessed: number;
654
+ startedAt?: string;
655
+ completedAt?: string;
656
+ }
657
+
658
+ async function parallelIngestion(ctx: any) {
659
+ logger.info('Starting parallel ingestion');
660
+
661
+ // Initialize services
662
+ const client = await createClient(ctx);
663
+
664
+ const s3 = new S3DataSource(
665
+ {
666
+ type: 'S3_CSV',
667
+ connectionId: 'my-s3-parallel',
668
+ name: 'Inventory Files S3 Parallel',
669
+ s3Config: {
670
+ bucket: 'inventory-files',
671
+ region: 'us-east-1',
672
+ accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
673
+ secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
674
+ },
675
+ },
676
+ logger
677
+ );
678
+
679
+ const kv = ctx.openKv();
680
+ const kvAdapter = new VersoriKVAdapter(kv);
681
+ const stateService = new StateService(logger);
682
+
683
+ const SOURCE_FILE = 'inventory/huge-inventory.csv';
684
+ const CHUNK_SIZE = 100000; // 100K records per chunk
685
+ const PARALLEL_JOBS = 5; // Process 5 chunks concurrently
686
+ const workflowId = 'parallel-ingestion';
687
+
688
+ // STEP 1: Split file into chunks (reuse from Pattern 2)
689
+ const chunks = await splitFileIntoChunks(s3, SOURCE_FILE, CHUNK_SIZE, workflowId, kvAdapter);
690
+
691
+ logger.info('File split complete', {
692
+ totalChunks: chunks.length,
693
+ totalRecords: chunks.reduce((sum, c) => sum + c.recordCount, 0),
694
+ });
695
+
696
+ // STEP 2: Create multiple jobs for parallel processing
697
+ const jobs: ParallelJob[] = [];
698
+
699
+ for (let i = 0; i < PARALLEL_JOBS; i++) {
700
+ const job = await client.createJob({
701
+ name: `parallel-inventory-ingestion-job-${i + 1}`,
702
+ retailerId: client.getRetailerId() || ctx.connections?.fluent_commerce?.retailerId,
703
+ });
704
+
705
+ jobs.push({
706
+ jobId: job.id,
707
+ assignedChunks: [],
708
+ status: 'pending',
709
+ recordsProcessed: 0,
710
+ });
711
+
712
+ logger.info('Parallel job created', {
713
+ jobNumber: i + 1,
714
+ jobId: job.id,
715
+ });
716
+ }
717
+
718
+ // STEP 3: Distribute chunks across jobs (round-robin)
719
+ chunks.forEach((chunk, index) => {
720
+ const jobIndex = index % PARALLEL_JOBS;
721
+ jobs[jobIndex].assignedChunks.push(chunk.chunkId);
722
+ });
723
+
724
+ logger.info('Chunks distributed', {
725
+ totalChunks: chunks.length,
726
+ jobCount: PARALLEL_JOBS,
727
+ chunksPerJob: jobs.map(j => j.assignedChunks.length),
728
+ });
729
+
730
+ // STEP 4: Process chunks in parallel
731
+ const startTime = Date.now();
732
+
733
+ const jobPromises = jobs.map((job, jobIndex) =>
734
+ processJobChunks(
735
+ s3,
736
+ client,
737
+ job,
738
+ chunks.filter(c => job.assignedChunks.includes(c.chunkId)),
739
+ workflowId,
740
+ kvAdapter,
741
+ jobIndex + 1
742
+ )
743
+ );
744
+
745
+ // Wait for all jobs to complete
746
+ const results = await Promise.allSettled(jobPromises);
747
+ const duration = (Date.now() - startTime) / 1000;
748
+
749
+ // STEP 5: Analyze results
750
+ let successfulJobs = 0;
751
+ let failedJobs = 0;
752
+ let totalRecordsProcessed = 0;
753
+
754
+ results.forEach((result, index) => {
755
+ if (result.status === 'fulfilled') {
756
+ successfulJobs++;
757
+ totalRecordsProcessed += result.value.recordsProcessed;
758
+
759
+ logger.info('Job completed', {
760
+ jobNumber: index + 1,
761
+ jobId: jobs[index].jobId,
762
+ recordsProcessed: result.value.recordsProcessed,
763
+ chunksProcessed: result.value.chunksProcessed,
764
+ });
765
+ } else {
766
+ failedJobs++;
767
+ logger.error('Job failed', result.reason, {
768
+ jobNumber: index + 1,
769
+ jobId: jobs[index].jobId,
770
+ });
771
+ }
772
+ });
773
+
774
+ // STEP 6: Update final state
775
+ await stateService.updateSyncState(
776
+ kvAdapter,
777
+ [
778
+ {
779
+ fileName: SOURCE_FILE,
780
+ lastModified: new Date().toISOString(),
781
+ recordCount: totalRecordsProcessed,
782
+ },
783
+ ],
784
+ workflowId
785
+ );
786
+
787
+ logger.info('Parallel ingestion complete', {
788
+ totalChunks: chunks.length,
789
+ parallelJobs: PARALLEL_JOBS,
790
+ successfulJobs,
791
+ failedJobs,
792
+ totalRecordsProcessed,
793
+ durationSeconds: duration,
794
+ recordsPerSecond: Math.round(totalRecordsProcessed / duration),
795
+ });
796
+
797
+ return {
798
+ success: failedJobs === 0,
799
+ totalChunks: chunks.length,
800
+ totalRecordsProcessed,
801
+ successfulJobs,
802
+ failedJobs,
803
+ durationSeconds: duration,
804
+ recordsPerSecond: Math.round(totalRecordsProcessed / duration),
805
+ };
806
+ }
807
+
808
/**
 * Process all chunks assigned to a job
 */
async function processJobChunks(
  s3: S3DataSource,
  client: any,
  job: ParallelJob,
  chunks: ChunkMetadata[],
  workflowId: string,
  kv: VersoriKVAdapter,
  jobNumber: number
): Promise<{ recordsProcessed: number; chunksProcessed: number }> {
  logger.info(`Job ${jobNumber} starting`, {
    jobId: job.jobId,
    assignedChunks: chunks.length,
  });

  let recordsProcessed = 0;
  let chunksProcessed = 0;

  for (const chunk of chunks) {
    try {
      // Check if chunk was already processed (enables resume after failure)
      const chunkState = await kv.get(['chunk', workflowId, chunk.chunkId, 'status']);

      if (chunkState?.value === 'completed') {
        logger.info(`Job ${jobNumber}: Chunk already processed`, {
          chunkId: chunk.chunkId,
        });
        chunksProcessed++;
        continue;
      }

      // Mark chunk as processing
      await kv.set(['chunk', workflowId, chunk.chunkId, 'status'], 'processing');

      logger.info(`Job ${jobNumber}: Processing chunk`, {
        chunkId: chunk.chunkId,
        recordCount: chunk.recordCount,
        progress: `${chunksProcessed}/${chunks.length}`,
      });

      // Process chunk
      await processChunk(s3, client, job.jobId, chunk);

      // Mark chunk as completed. Update the 'status' subkey (which the resume
      // check above reads) as well as the full metadata record.
      await kv.set(['chunk', workflowId, chunk.chunkId, 'status'], 'completed');
      await kv.set(['chunk', workflowId, chunk.chunkId], {
        ...chunk,
        status: 'completed',
        processedAt: new Date().toISOString(),
      } as ChunkMetadata);

      recordsProcessed += chunk.recordCount;
      chunksProcessed++;

      logger.info(`Job ${jobNumber}: Chunk completed`, {
        chunkId: chunk.chunkId,
        recordsProcessed,
        chunksProcessed,
        percentComplete: ((chunksProcessed / chunks.length) * 100).toFixed(1),
      });
    } catch (error) {
      logger.error(`Job ${jobNumber}: Chunk failed`, error as Error, {
        chunkId: chunk.chunkId,
      });

      // Mark chunk as failed (don't throw - continue with remaining chunks)
      await kv.set(['chunk', workflowId, chunk.chunkId, 'status'], 'failed');
      await kv.set(['chunk', workflowId, chunk.chunkId], {
        ...chunk,
        status: 'failed',
        error: (error as Error).message,
      } as ChunkMetadata);
    }
  }

  logger.info(`Job ${jobNumber} completed`, {
    jobId: job.jobId,
    recordsProcessed,
    chunksProcessed,
  });

  return { recordsProcessed, chunksProcessed };
}
```
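
The code above assumes a small bookkeeping record per job. Its shape can be read off the usage (this is a sketch for reference, not an SDK export; `ChunkMetadata` is the type introduced with the splitting helper in Pattern 2):

```typescript
// Inferred from usage above - adjust if an earlier pattern already defines it.
interface ParallelJob {
  jobId: string; // Fluent job this worker feeds
  assignedChunks: string[]; // chunkIds assigned round-robin in STEP 3
  status: 'pending' | 'processing' | 'completed' | 'failed';
  recordsProcessed: number;
}
```
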

**Progress Tracking:**

```typescript
// Real-time progress query
async function getIngestionProgress(
  workflowId: string,
  kv: VersoriKVAdapter
): Promise<{
  totalChunks: number;
  completedChunks: number;
  failedChunks: number;
  processingChunks: number;
  percentComplete: number;
}> {
  // Query all chunk statuses from KV (getAllChunkMetadata is sketched below)
  const chunks = await getAllChunkMetadata(workflowId, kv);

  const completed = chunks.filter(c => c.status === 'completed').length;
  const failed = chunks.filter(c => c.status === 'failed').length;
  const processing = chunks.filter(c => c.status === 'processing').length;

  return {
    totalChunks: chunks.length,
    completedChunks: completed,
    failedChunks: failed,
    processingChunks: processing,
    // Guard against division by zero when no chunks exist yet
    percentComplete: chunks.length > 0 ? (completed / chunks.length) * 100 : 0,
  };
}
```
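
`getAllChunkMetadata` is left undefined above. A minimal sketch, assuming the KV adapter exposes a prefix scan (the `list` method here is hypothetical; substitute whatever enumeration your adapter provides):

```typescript
// A sketch, not an SDK helper: collect the chunk metadata records for one workflow.
async function getAllChunkMetadata(
  workflowId: string,
  kv: VersoriKVAdapter
): Promise<ChunkMetadata[]> {
  const chunks: ChunkMetadata[] = [];

  // Hypothetical prefix scan over every entry under ['chunk', workflowId, ...]
  for await (const entry of kv.list({ prefix: ['chunk', workflowId] })) {
    // Keep only the full metadata objects, not the 'status' subkeys
    if (entry.key.length === 3) {
      chunks.push(entry.value as ChunkMetadata);
    }
  }

  return chunks;
}
```
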

**Performance:**

```
File Size: 5GB (10M records)
Chunk Size: 100K records
Total Chunks: 100
Parallel Jobs: 5
Processing Time: ~15 minutes (4x speedup over sequential chunking)
RAM Usage: ~500MB (5 chunks in memory at once)
Throughput: ~11,111 records/second
```

---

## Pattern 4: Distributed Processing (Versori Workflows)

**Best for:** 10M+ records at enterprise scale, where maximum reliability and observability are required

**Strategy:**

1. A coordinator workflow splits the file and triggers one worker workflow per chunk (the payload each worker receives is sketched below)
2. Each worker workflow processes one chunk
3. The coordinator tracks completion via VersoriKV
4. Workers are retried automatically on failure

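
The contract between coordinator and worker is just the trigger payload. Read off the code below, it looks like this (the interface name is ours, for illustration):

```typescript
// Hypothetical name; the fields match the triggerWorkflow call in the
// coordinator and the `data` destructuring in the worker below.
interface ChunkWorkPayload {
  workflowId: string; // groups all chunks of one ingestion run
  chunkId: string; // unique chunk identifier
  chunkKey: string; // S3 key of the chunk object
  recordCount: number; // records in the chunk, used for logging and metrics
}
```
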
### Coordinator Workflow

```typescript
import { fn, schedule } from '@versori/run';
import {
  createClient,
  S3DataSource,
  VersoriKVAdapter,
  createConsoleLogger
} from '@fluentcommerce/fc-connect-sdk';

const logger = createConsoleLogger();

/**
 * Coordinator workflow - splits file and spawns workers
 */
export const coordinatorWorkflow = schedule('coordinator')
  .cron('0 2 * * *') // Run daily at 2 AM
  .then(
    fn('split-and-schedule', async ({ activation, kv }) => {
      logger.info('Coordinator: Starting distributed ingestion');

      const s3 = new S3DataSource(
        {
          type: 'S3_CSV',
          connectionId: 'my-s3-3',
          name: 'Inventory Files S3 3',
          s3Config: {
            bucket: 'inventory-files',
            region: 'us-east-1',
            accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
            secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
          },
        },
        logger
      );

      const kvAdapter = new VersoriKVAdapter(kv);
      const workflowId = `distributed-${Date.now()}`;
      const SOURCE_FILE = 'inventory/enterprise-inventory.csv';
      const CHUNK_SIZE = 100000;

      // Split file into chunks (splitFileIntoChunks from Pattern 2)
      const chunks = await splitFileIntoChunks(s3, SOURCE_FILE, CHUNK_SIZE, workflowId, kvAdapter);

      logger.info('Coordinator: File split complete', {
        totalChunks: chunks.length,
        workflowId,
      });

      // Store coordinator state
      await kvAdapter.set(['coordinator', workflowId], {
        workflowId,
        sourceFile: SOURCE_FILE,
        totalChunks: chunks.length,
        status: 'scheduled',
        createdAt: new Date().toISOString(),
      });

      // Schedule one worker per chunk
      for (const chunk of chunks) {
        // Trigger worker workflow (Versori handles scheduling and retries)
        await activation.triggerWorkflow('chunk-worker', {
          workflowId,
          chunkId: chunk.chunkId,
          chunkKey: chunk.s3Key,
          recordCount: chunk.recordCount,
        });

        logger.info('Coordinator: Worker scheduled', {
          chunkId: chunk.chunkId,
          workflowId,
        });
      }

      return {
        workflowId,
        totalChunks: chunks.length,
        message: `Scheduled ${chunks.length} worker workflows`,
      };
    })
  );

/**
 * Monitor workflow - checks completion status
 */
export const monitorWorkflow = schedule('monitor')
  .cron('*/5 * * * *') // Run every 5 minutes
  .then(
    fn('check-progress', async ({ kv }) => {
      const kvAdapter = new VersoriKVAdapter(kv);

      // Get all active coordinators (getActiveCoordinators is sketched below)
      const coordinators = await getActiveCoordinators(kvAdapter);

      for (const coordinator of coordinators) {
        const progress = await getIngestionProgress(coordinator.workflowId, kvAdapter);

        logger.info('Monitor: Progress update', {
          workflowId: coordinator.workflowId,
          ...progress,
        });

        // Check if every chunk has finished, successfully or not
        if (progress.completedChunks + progress.failedChunks === progress.totalChunks) {
          // Mark coordinator as complete
          await kvAdapter.set(['coordinator', coordinator.workflowId], {
            ...coordinator,
            status: 'completed',
            completedAt: new Date().toISOString(),
            progress,
          });

          logger.info('Monitor: Ingestion complete', {
            workflowId: coordinator.workflowId,
            ...progress,
          });
        }
      }

      return { coordinatorsChecked: coordinators.length };
    })
  );
```
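
`getActiveCoordinators` is left undefined above. A minimal sketch, again assuming a hypothetical `list` prefix scan on the KV adapter, that returns the coordinator records not yet marked complete:

```typescript
// A sketch, not an SDK helper. The field shape mirrors the coordinator
// state written by the 'split-and-schedule' step above.
interface CoordinatorState {
  workflowId: string;
  sourceFile: string;
  totalChunks: number;
  status: 'scheduled' | 'completed';
  createdAt: string;
}

async function getActiveCoordinators(kv: VersoriKVAdapter): Promise<CoordinatorState[]> {
  const active: CoordinatorState[] = [];

  for await (const entry of kv.list({ prefix: ['coordinator'] })) {
    const state = entry.value as CoordinatorState;
    if (state.status !== 'completed') {
      active.push(state);
    }
  }

  return active;
}
```
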

### Worker Workflow

```typescript
import { fn, webhook } from '@versori/run';
import {
  createClient,
  S3DataSource,
  CSVParserService,
  UniversalMapper,
  VersoriKVAdapter,
  createConsoleLogger
} from '@fluentcommerce/fc-connect-sdk';

const logger = createConsoleLogger();

/**
 * Worker workflow - processes a single chunk
 */
export const chunkWorker = webhook('chunk-worker').then(
  fn('process-chunk', async (ctx) => {
    const { data, kv } = ctx;
    const { workflowId, chunkId, chunkKey, recordCount } = data;

    logger.info('Worker: Starting chunk processing', {
      workflowId,
      chunkId,
      recordCount,
    });

    const kvAdapter = new VersoriKVAdapter(kv);

    // Check if already processed (makes retries idempotent)
    const chunkState = await kvAdapter.get(['chunk', workflowId, chunkId, 'status']);

    if (chunkState?.value === 'completed') {
      logger.info('Worker: Chunk already processed', { chunkId });
      return { chunkId, status: 'skipped', message: 'Already processed' };
    }

    // Mark as processing
    await kvAdapter.set(['chunk', workflowId, chunkId, 'status'], 'processing');

    try {
      // Initialize services
      const client = await createClient(ctx);

      const s3 = new S3DataSource(
        {
          type: 'S3_CSV',
          connectionId: 'my-s3-4',
          name: 'Inventory Files S3 4',
          s3Config: {
            bucket: 'inventory-files',
            region: 'us-east-1',
            accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
            secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
          },
        },
        logger
      );

      const csvParser = new CSVParserService();
      const mapper = new UniversalMapper({
        fields: {
          skuRef: { source: 'sku', required: true },
          locationRef: { source: 'location_code', required: true },
          qty: { source: 'quantity', resolver: 'sdk.parseInt' },
          expectedOn: { source: 'expected_date', resolver: 'sdk.formatDate' },
        },
      });

      // Get or create the shared job for this workflow (a worker-startup race
      // is possible here - see the note after this code block)
      let jobId = await kvAdapter.get(['job', workflowId, 'jobId']);

      if (!jobId?.value) {
        const job = await client.createJob({
          name: `distributed-ingestion-${workflowId}`,
          retailerId: client.getRetailerId() || ctx.connections?.fluent_commerce?.retailerId,
        });

        await kvAdapter.set(['job', workflowId, 'jobId'], job.id);
        jobId = { value: job.id };
      }

      // Download chunk
      const chunkContent = (await s3.downloadFile(chunkKey, {
        encoding: 'utf8',
      })) as string;

      // Parse chunk
      const records = await csvParser.parse(chunkContent);

      // Map records
      const entities: any[] = [];
      for (const record of records) {
        const mapped = await mapper.map(record);
        if (mapped.success && mapped.data) {
          entities.push(mapped.data);
        }
      }

      // Send batch
      await client.sendBatch(jobId.value as string, { entities });

      // Mark as completed. Update the 'status' subkey (read by the resume
      // check above) as well as the full metadata record.
      await kvAdapter.set(['chunk', workflowId, chunkId, 'status'], 'completed');
      await kvAdapter.set(['chunk', workflowId, chunkId], {
        chunkId,
        s3Key: chunkKey,
        recordCount: entities.length,
        status: 'completed',
        processedAt: new Date().toISOString(),
      });

      logger.info('Worker: Chunk completed', {
        workflowId,
        chunkId,
        recordCount: entities.length,
      });

      return {
        chunkId,
        status: 'completed',
        recordsProcessed: entities.length,
      };
    } catch (error) {
      logger.error('Worker: Chunk failed', error as Error, {
        workflowId,
        chunkId,
      });

      // Mark as failed
      await kvAdapter.set(['chunk', workflowId, chunkId, 'status'], 'failed');
      await kvAdapter.set(['chunk', workflowId, chunkId], {
        chunkId,
        s3Key: chunkKey,
        recordCount,
        status: 'failed',
        error: (error as Error).message,
      });

      throw error;
    }
  })
);
```
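
One caveat in the worker above: if many workers start simultaneously, several can pass the `!jobId?.value` check before any of them has stored an id, and each will create its own job. A simple fix, sketched here under the assumption that the coordinator also creates a Fluent client via `createClient`, is to create the job once in the coordinator and hand the id to every worker in the trigger payload:

```typescript
// In the coordinator, after splitting the file: create the shared job once.
const job = await client.createJob({
  name: `distributed-ingestion-${workflowId}`,
  retailerId: client.getRetailerId(),
});
await kvAdapter.set(['job', workflowId, 'jobId'], job.id);

// ...and include the id in each worker trigger, so workers can skip the
// get-or-create block entirely:
await activation.triggerWorkflow('chunk-worker', {
  workflowId,
  jobId: job.id,
  chunkId: chunk.chunkId,
  chunkKey: chunk.s3Key,
  recordCount: chunk.recordCount,
});
```
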

**Performance:**

```
File Size: 10GB (20M records)
Chunk Size: 100K records
Total Chunks: 200
Worker Workflows: 200 (parallel)
Processing Time: ~10 minutes (Versori handles parallelism)
RAM Usage: ~50MB per worker
Throughput: ~33,333 records/second
```

---

## Memory Optimization Tips

### 1. Use Streaming APIs

```typescript
// ❌ WRONG - Loads the entire file and every parsed record into memory
const fileContent = await fs.readFile('huge.csv', 'utf-8');
const records = await csvParser.parse(fileContent);

// ✅ CORRECT - Yields one record at a time, so each can be released after use
for await (const record of csvParser.parseStreaming(fileContent, {}, 1)) {
  await processRecord(record);
}
```

### 2. Clear Batches After Processing

```typescript
let batch: any[] = [];
for await (const record of records) {
  batch.push(record);

  if (batch.length >= 1000) {
    await sendBatch(batch);
    batch = []; // ✅ Clear batch to free memory
  }
}

if (batch.length > 0) {
  await sendBatch(batch); // Flush the final partial batch
}
```

### 3. Monitor Memory Usage

```typescript
function logMemoryUsage() {
  const used = process.memoryUsage();
  console.log({
    heapUsed: Math.round(used.heapUsed / 1024 / 1024) + ' MB',
    heapTotal: Math.round(used.heapTotal / 1024 / 1024) + ' MB',
    rss: Math.round(used.rss / 1024 / 1024) + ' MB',
  });
}

// Log every 10K records
if (recordsProcessed % 10000 === 0) {
  logMemoryUsage();
}
```

### 4. Use Garbage Collection Hints

```typescript
// Force garbage collection (requires the --expose-gc Node.js flag)
if (recordsProcessed % 100000 === 0 && global.gc) {
  global.gc();
  logger.info('Garbage collection triggered', { recordsProcessed });
}
```

---

## Performance Benchmarks

### Pattern Comparison (10M records, 5GB file)

| Pattern                   | Time   | RAM    | Throughput     | Complexity |
| ------------------------- | ------ | ------ | -------------- | ---------- |
| 1. Basic Streaming        | 90 min | 50MB   | 1,852 rec/sec  | Low        |
| 2. File Chunking          | 60 min | 100MB  | 2,778 rec/sec  | Medium     |
| 3. Parallel Processing    | 15 min | 500MB  | 11,111 rec/sec | High       |
| 4. Distributed Processing | 10 min | 50MB\* | 16,667 rec/sec | Very High  |

\*Per worker; total RAM = 50MB × worker count

Throughput here is simply records ÷ elapsed time: for example, 10M records in 90 minutes ≈ 10,000,000 ÷ 5,400 s ≈ 1,852 rec/sec.

### Optimization Impact

| Optimization              | Before  | After    | Improvement |
| ------------------------- | ------- | -------- | ----------- |
| Streaming vs Loading      | 5GB RAM | 50MB RAM | 100x        |
| Batching (1K vs 10K)      | 90 min  | 60 min   | 1.5x        |
| Parallel (1 vs 5 jobs)    | 60 min  | 15 min   | 4x          |
| Distributed (200 workers) | 15 min  | 10 min   | 1.5x        |

---

## Common Issues & Solutions

### Issue 1: Out of Memory

**Symptoms:**

```
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
```

**Solutions:**

1. Switch to the streaming pattern (Pattern 1)
2. Reduce the batch size (e.g. 1000 → 500; see the sketch below)
3. Increase the Node.js heap: `node --max-old-space-size=4096`
4. Use file chunking (Pattern 2)
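
The first two solutions combine naturally. A minimal sketch, reusing the streaming parser and `sendBatch` calls from the patterns above (`csvParser`, `client`, `jobId`, and `fileContent` are assumed to be in scope):

```typescript
// Stream records and flush small batches, so the heap never holds more than
// BATCH_SIZE parsed records plus one in-flight request.
const BATCH_SIZE = 500; // reduced from 1000

let batch: any[] = [];
for await (const record of csvParser.parseStreaming(fileContent, {}, 1)) {
  batch.push(record);
  if (batch.length >= BATCH_SIZE) {
    await client.sendBatch(jobId, { entities: batch });
    batch = []; // drop references so the GC can reclaim the records
  }
}
if (batch.length > 0) {
  await client.sendBatch(jobId, { entities: batch }); // flush the remainder
}
```
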
### Issue 2: Timeout on Large Files

**Symptoms:**

```
TimeoutError: Operation timed out after 300000ms
```

**Solutions:**

1. Increase the timeout: `config.timeout = 600000` (10 min)
2. Split the file into chunks (Pattern 2)
3. Use parallel processing (Pattern 3)

### Issue 3: Chunks Not Resuming

**Symptoms:**

- The same chunks are re-processed after a failure

**Solutions:**

```typescript
// Inside the chunk loop: check chunk status before processing
const chunkState = await kv.get(['chunk', workflowId, chunkId, 'status']);
if (chunkState?.value === 'completed') {
  logger.info('Chunk already processed, skipping', { chunkId });
  continue; // skip to the next chunk
}
```

### Issue 4: Progress Tracking Inconsistent

**Symptoms:**

- The reported progress percentage doesn't match reality

**Solutions:**

```typescript
// Always update chunk status atomically, so status and timestamp never diverge
const atomic = kv.atomic();
atomic.set(['chunk', workflowId, chunkId, 'status'], 'completed');
atomic.set(['chunk', workflowId, chunkId, 'processedAt'], new Date().toISOString());
await atomic.commit();
```

### Issue 5: Duplicate Processing

**Symptoms:**

- The same records are sent multiple times

**Solutions:**

```typescript
// Use an idempotency key in the Fluent batch payload
await client.sendBatch(jobId, {
  entities,
  meta: {
    chunkId: chunk.chunkId,
    workflowId,
    idempotencyKey: `${workflowId}-${chunk.chunkId}`,
  },
});
```

---

## Related Guides

- [Basic Ingestion Pattern](../standalone/s3-csv-batch-api.md) - For small files (<100K records)
- [Streaming Pattern](../../02-CORE-GUIDES/ingestion/ingestion-readme.md) - For medium files (100K-1M records)
- [Error Handling & Retry](./error-handling-retry.md) - Robust error handling strategies
- [Progress Tracking](../../02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-08-performance-optimization.md) - Real-time progress monitoring
- [State Management](../../02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-07-state-management.md) - VersoriKV patterns

---

## Summary

**Choose Your Pattern:**

- **Pattern 1 (Streaming)**: Simple and memory-efficient; suited to 100K-1M records
- **Pattern 2 (Chunking)**: Adds checkpoint/resume; suited to 1M-5M records
- **Pattern 3 (Parallel)**: High throughput; suited to 5M-10M records
- **Pattern 4 (Distributed)**: Enterprise scale; suited to 10M+ records

**Key Takeaways:**

1. Always use streaming APIs for large files
2. Clear batches after processing to free memory
3. Use chunks + VersoriKV for checkpoint/resume
4. Parallel processing trades RAM for speed
5. Monitor memory usage throughout processing
6. Test with representative file sizes before production