@fluentcommerce/fc-connect-sdk 0.1.54 → 0.1.56

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (476)
  1. package/CHANGELOG.md +12 -0
  2. package/README.md +11 -0
  3. package/dist/cjs/clients/fluent-client.js +13 -6
  4. package/dist/cjs/utils/pagination-helpers.js +38 -2
  5. package/dist/cjs/versori/fluent-versori-client.js +11 -5
  6. package/dist/esm/clients/fluent-client.js +13 -6
  7. package/dist/esm/utils/pagination-helpers.js +38 -2
  8. package/dist/esm/versori/fluent-versori-client.js +11 -5
  9. package/dist/tsconfig.esm.tsbuildinfo +1 -1
  10. package/dist/tsconfig.tsbuildinfo +1 -1
  11. package/dist/tsconfig.types.tsbuildinfo +1 -1
  12. package/docs/00-START-HERE/EXPORT-VALIDATION.md +158 -158
  13. package/docs/00-START-HERE/cli-analyze-source-structure-guide.md +655 -655
  14. package/docs/00-START-HERE/cli-documentation-index.md +202 -202
  15. package/docs/00-START-HERE/cli-quick-reference.md +252 -252
  16. package/docs/00-START-HERE/decision-tree.md +552 -552
  17. package/docs/00-START-HERE/getting-started.md +1070 -1070
  18. package/docs/00-START-HERE/mapper-quick-decision-guide.md +235 -235
  19. package/docs/00-START-HERE/readme.md +237 -237
  20. package/docs/00-START-HERE/retailerid-configuration.md +404 -404
  21. package/docs/00-START-HERE/sdk-philosophy.md +794 -794
  22. package/docs/00-START-HERE/troubleshooting-quick-reference.md +1086 -1086
  23. package/docs/01-TEMPLATES/faq.md +686 -686
  24. package/docs/01-TEMPLATES/patterns/pattern-templates-guide.md +68 -68
  25. package/docs/01-TEMPLATES/patterns/patterns-csv-schema-validation-and-rejection-report.md +233 -233
  26. package/docs/01-TEMPLATES/patterns/patterns-custom-resolvers.md +407 -407
  27. package/docs/01-TEMPLATES/patterns/patterns-error-handling-retry.md +511 -511
  28. package/docs/01-TEMPLATES/patterns/patterns-field-mapping-universal.md +701 -701
  29. package/docs/01-TEMPLATES/patterns/patterns-large-file-splitting.md +1430 -1430
  30. package/docs/01-TEMPLATES/patterns/patterns-master-data-etl.md +2399 -2399
  31. package/docs/01-TEMPLATES/patterns/patterns-pagination-streaming.md +447 -447
  32. package/docs/01-TEMPLATES/patterns/patterns-state-duplicate-prevention.md +385 -385
  33. package/docs/01-TEMPLATES/readme.md +957 -957
  34. package/docs/01-TEMPLATES/standalone/standalone-asn-inbound-processing.md +1209 -1209
  35. package/docs/01-TEMPLATES/standalone/standalone-graphql-query-export.md +1140 -1140
  36. package/docs/01-TEMPLATES/standalone/standalone-graphql-to-parquet-partitioned-s3.md +432 -432
  37. package/docs/01-TEMPLATES/standalone/standalone-multi-channel-inventory-sync.md +1185 -1185
  38. package/docs/01-TEMPLATES/standalone/standalone-multi-source-aggregation.md +1462 -1462
  39. package/docs/01-TEMPLATES/standalone/standalone-s3-csv-batch-api.md +1390 -1390
  40. package/docs/01-TEMPLATES/standalone/standalone-s3-csv-inventory-to-batch.md +330 -330
  41. package/docs/01-TEMPLATES/standalone/standalone-scripts-guide.md +87 -87
  42. package/docs/01-TEMPLATES/standalone/standalone-sftp-xml-graphql.md +1444 -1444
  43. package/docs/01-TEMPLATES/standalone/standalone-webhook-payload-processing.md +688 -688
  44. package/docs/01-TEMPLATES/versori/business-examples/business-examples-dropship-order-routing.md +193 -193
  45. package/docs/01-TEMPLATES/versori/business-examples/business-examples-graphql-parquet-extraction.md +518 -518
  46. package/docs/01-TEMPLATES/versori/business-examples/business-examples-inter-location-transfers.md +2162 -2162
  47. package/docs/01-TEMPLATES/versori/business-examples/business-examples-pre-order-allocation.md +2226 -2226
  48. package/docs/01-TEMPLATES/versori/business-examples/business-scenarios-guide.md +87 -87
  49. package/docs/01-TEMPLATES/versori/patterns/versori-patterns-connection-validation-pattern.md +656 -656
  50. package/docs/01-TEMPLATES/versori/patterns/versori-patterns-dual-workflow-connector.md +835 -835
  51. package/docs/01-TEMPLATES/versori/patterns/versori-patterns-guide.md +108 -108
  52. package/docs/01-TEMPLATES/versori/patterns/versori-patterns-kv-state-management.md +1533 -1533
  53. package/docs/01-TEMPLATES/versori/patterns/versori-patterns-xml-response-patterns.md +1160 -1160
  54. package/docs/01-TEMPLATES/versori/versori-platform-guide.md +201 -201
  55. package/docs/01-TEMPLATES/versori/webhooks/template-webhook-asn-purchase-order.md +1906 -1906
  56. package/docs/01-TEMPLATES/versori/webhooks/template-webhook-dropship-routing.md +1074 -1074
  57. package/docs/01-TEMPLATES/versori/webhooks/template-webhook-flash-sale-reserve.md +1395 -1395
  58. package/docs/01-TEMPLATES/versori/webhooks/template-webhook-generic-xml-order.md +888 -888
  59. package/docs/01-TEMPLATES/versori/webhooks/template-webhook-payment-gateway-integration.md +2478 -2478
  60. package/docs/01-TEMPLATES/versori/webhooks/template-webhook-rma-returns-comprehensive.md +2240 -2240
  61. package/docs/01-TEMPLATES/versori/webhooks/template-webhook-xml-order-ingestion.md +2029 -2029
  62. package/docs/01-TEMPLATES/versori/webhooks/webhook-templates-guide.md +140 -140
  63. package/docs/01-TEMPLATES/versori/workflows/_examples/sample-data/inventory-mapping.json +20 -20
  64. package/docs/01-TEMPLATES/versori/workflows/_examples/sample-data/products_2025-01-22.csv +11 -11
  65. package/docs/01-TEMPLATES/versori/workflows/_examples/sample-data/sample-data-guide.md +34 -34
  66. package/docs/01-TEMPLATES/versori/workflows/_examples/workflow-examples-guide.md +36 -36
  67. package/docs/01-TEMPLATES/versori/workflows/extraction/extraction-modes-guide.md +1038 -1038
  68. package/docs/01-TEMPLATES/versori/workflows/extraction/extraction-workflows-guide.md +138 -138
  69. package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/graphql-extraction-guide.md +63 -63
  70. package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-fulfillments-to-sftp-csv.md +2062 -2062
  71. package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-fulfillments-to-sftp-xml.md +2294 -2294
  72. package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-inventory-positions-to-s3-csv.md +2461 -2461
  73. package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-inventory-positions-to-sftp-xml.md +2529 -2529
  74. package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-inventory-quantities-to-s3-csv.md +2464 -2464
  75. package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-inventory-quantities-to-s3-json.md +1959 -1959
  76. package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-orders-to-s3-csv.md +1953 -1953
  77. package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-orders-to-sftp-xml.md +2541 -2541
  78. package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-products-to-s3-json.md +2384 -2384
  79. package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-products-to-sftp-xml.md +2445 -2445
  80. package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-virtual-positions-to-s3-csv.md +2355 -2355
  81. package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-virtual-positions-to-s3-json.md +2042 -2042
  82. package/docs/01-TEMPLATES/versori/workflows/extraction/graphql-queries/template-extraction-virtual-positions-to-sftp-xml.md +2726 -2726
  83. package/docs/01-TEMPLATES/versori/workflows/ingestion/batch-api/batch-api-guide.md +206 -206
  84. package/docs/01-TEMPLATES/versori/workflows/ingestion/batch-api/template-ingestion-cycle-count-reconciliation.md +2030 -2030
  85. package/docs/01-TEMPLATES/versori/workflows/ingestion/batch-api/template-ingestion-multi-channel-inventory-sync.md +1882 -1882
  86. package/docs/01-TEMPLATES/versori/workflows/ingestion/batch-api/template-ingestion-s3-csv-inventory-batch.md +2827 -2827
  87. package/docs/01-TEMPLATES/versori/workflows/ingestion/batch-api/template-ingestion-s3-json-inventory-batch.md +1952 -1952
  88. package/docs/01-TEMPLATES/versori/workflows/ingestion/batch-api/template-ingestion-s3-xml-inventory-batch.md +3289 -3289
  89. package/docs/01-TEMPLATES/versori/workflows/ingestion/batch-api/template-ingestion-sftp-csv-inventory-batch.md +3064 -3064
  90. package/docs/01-TEMPLATES/versori/workflows/ingestion/batch-api/template-ingestion-sftp-json-inventory-batch.md +3238 -3238
  91. package/docs/01-TEMPLATES/versori/workflows/ingestion/batch-api/template-ingestion-sftp-xml-inventory-batch.md +2977 -2977
  92. package/docs/01-TEMPLATES/versori/workflows/ingestion/event-api/event-api-guide.md +321 -321
  93. package/docs/01-TEMPLATES/versori/workflows/ingestion/event-api/template-ingestion-payload-json-order-cancel-event.md +959 -959
  94. package/docs/01-TEMPLATES/versori/workflows/ingestion/event-api/template-ingestion-payload-xml-order-cancel-event.md +1170 -1170
  95. package/docs/01-TEMPLATES/versori/workflows/ingestion/event-api/template-ingestion-s3-csv-product-event.md +2312 -2312
  96. package/docs/01-TEMPLATES/versori/workflows/ingestion/event-api/template-ingestion-s3-json-product-event.md +2999 -2999
  97. package/docs/01-TEMPLATES/versori/workflows/ingestion/event-api/template-ingestion-s3-parquet-product-event.md +2836 -2836
  98. package/docs/01-TEMPLATES/versori/workflows/ingestion/event-api/template-ingestion-s3-xml-product-event.md +2395 -2395
  99. package/docs/01-TEMPLATES/versori/workflows/ingestion/event-api/template-ingestion-sftp-csv-product-event.md +2295 -2295
  100. package/docs/01-TEMPLATES/versori/workflows/ingestion/event-api/template-ingestion-sftp-json-product-event.md +2602 -2602
  101. package/docs/01-TEMPLATES/versori/workflows/ingestion/event-api/template-ingestion-sftp-parquet-product-event.md +2589 -2589
  102. package/docs/01-TEMPLATES/versori/workflows/ingestion/event-api/template-ingestion-sftp-xml-product-event.md +3578 -3578
  103. package/docs/01-TEMPLATES/versori/workflows/ingestion/graphql-mutations/graphql-mutations-guide.md +93 -93
  104. package/docs/01-TEMPLATES/versori/workflows/ingestion/graphql-mutations/template-ingestion-payload-json-order-update-graphql.md +1260 -1260
  105. package/docs/01-TEMPLATES/versori/workflows/ingestion/graphql-mutations/template-ingestion-payload-xml-order-update-graphql.md +1472 -1472
  106. package/docs/01-TEMPLATES/versori/workflows/ingestion/graphql-mutations/template-ingestion-s3-csv-control-graphql.md +2417 -2417
  107. package/docs/01-TEMPLATES/versori/workflows/ingestion/graphql-mutations/template-ingestion-s3-csv-location-graphql.md +2811 -2811
  108. package/docs/01-TEMPLATES/versori/workflows/ingestion/graphql-mutations/template-ingestion-s3-csv-price-graphql.md +2619 -2619
  109. package/docs/01-TEMPLATES/versori/workflows/ingestion/graphql-mutations/template-ingestion-s3-json-location-graphql.md +2807 -2807
  110. package/docs/01-TEMPLATES/versori/workflows/ingestion/graphql-mutations/template-ingestion-s3-xml-location-graphql.md +2373 -2373
  111. package/docs/01-TEMPLATES/versori/workflows/ingestion/graphql-mutations/template-ingestion-sftp-csv-control-graphql.md +2740 -2740
  112. package/docs/01-TEMPLATES/versori/workflows/ingestion/graphql-mutations/template-ingestion-sftp-csv-location-graphql.md +2760 -2760
  113. package/docs/01-TEMPLATES/versori/workflows/ingestion/graphql-mutations/template-ingestion-sftp-json-location-graphql.md +1710 -1710
  114. package/docs/01-TEMPLATES/versori/workflows/ingestion/ingestion-workflows-guide.md +136 -136
  115. package/docs/01-TEMPLATES/versori/workflows/rubix-webhooks/rubix-webhooks-guide.md +520 -520
  116. package/docs/01-TEMPLATES/versori/workflows/rubix-webhooks/template-webhook-rubix-fulfilment-to-sftp-xml-inline.md +1418 -1418
  117. package/docs/01-TEMPLATES/versori/workflows/rubix-webhooks/template-webhook-rubix-fulfilment-to-sftp-xml-universal-mapper.md +1785 -1785
  118. package/docs/01-TEMPLATES/versori/workflows/rubix-webhooks/template-webhook-rubix-order-attribute-update.md +824 -824
  119. package/docs/01-TEMPLATES/versori/workflows/workflows-overview-guide.md +646 -646
  120. package/docs/02-CORE-GUIDES/advanced-services/advanced-services-batch-archival.md +724 -724
  121. package/docs/02-CORE-GUIDES/advanced-services/advanced-services-job-tracker.md +627 -627
  122. package/docs/02-CORE-GUIDES/advanced-services/advanced-services-partial-batch-recovery.md +561 -561
  123. package/docs/02-CORE-GUIDES/advanced-services/advanced-services-quick-reference.md +367 -367
  124. package/docs/02-CORE-GUIDES/advanced-services/advanced-services-readme.md +407 -407
  125. package/docs/02-CORE-GUIDES/advanced-services/readme.md +49 -49
  126. package/docs/02-CORE-GUIDES/api-reference/api-reference-quick-reference.md +548 -548
  127. package/docs/02-CORE-GUIDES/api-reference/event-api-input-output-reference.md +702 -1171
  128. package/docs/02-CORE-GUIDES/api-reference/examples/client-initialization.ts +286 -286
  129. package/docs/02-CORE-GUIDES/api-reference/graphql-error-classification.md +337 -337
  130. package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-01-client-api.md +399 -520
  131. package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-03-authentication.md +199 -199
  132. package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-04-graphql-mapping.md +925 -925
  133. package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-05-services.md +1198 -1198
  134. package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-06-data-sources.md +1083 -1083
  135. package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-07-parsers.md +1097 -1097
  136. package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-08-pagination.md +513 -513
  137. package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-08-types.md +545 -597
  138. package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-09-error-handling.md +527 -527
  139. package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-09-webhook-validation.md +514 -514
  140. package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-10-extraction.md +557 -557
  141. package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-10-utilities.md +412 -412
  142. package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-11-cli-tools.md +423 -423
  143. package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-11-error-handling.md +716 -716
  144. package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-12-analyze-source-structure.md +518 -518
  145. package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-12-partial-responses.md +212 -212
  146. package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-12-testing.md +300 -300
  147. package/docs/02-CORE-GUIDES/api-reference/modules/api-reference-13-resolver-builder.md +322 -322
  148. package/docs/02-CORE-GUIDES/api-reference/readme.md +279 -279
  149. package/docs/02-CORE-GUIDES/auto-pagination/auto-pagination-quick-reference.md +351 -351
  150. package/docs/02-CORE-GUIDES/auto-pagination/auto-pagination-readme.md +277 -277
  151. package/docs/02-CORE-GUIDES/auto-pagination/examples/auto-pagination-readme.md +178 -178
  152. package/docs/02-CORE-GUIDES/auto-pagination/examples/common-patterns.ts +351 -351
  153. package/docs/02-CORE-GUIDES/auto-pagination/examples/paginate-products.ts +384 -384
  154. package/docs/02-CORE-GUIDES/auto-pagination/examples/paginate-virtual-positions.ts +308 -308
  155. package/docs/02-CORE-GUIDES/auto-pagination/modules/auto-pagination-01-foundations.md +470 -470
  156. package/docs/02-CORE-GUIDES/auto-pagination/modules/auto-pagination-02-quick-start.md +713 -713
  157. package/docs/02-CORE-GUIDES/auto-pagination/modules/auto-pagination-03-configuration.md +754 -754
  158. package/docs/02-CORE-GUIDES/auto-pagination/modules/auto-pagination-04-advanced-patterns.md +732 -732
  159. package/docs/02-CORE-GUIDES/auto-pagination/modules/auto-pagination-05-sdk-integration.md +847 -847
  160. package/docs/02-CORE-GUIDES/auto-pagination/modules/auto-pagination-06-troubleshooting.md +359 -359
  161. package/docs/02-CORE-GUIDES/auto-pagination/modules/auto-pagination-07-api-reference.md +462 -462
  162. package/docs/02-CORE-GUIDES/auto-pagination/readme.md +54 -54
  163. package/docs/02-CORE-GUIDES/data-sources/data-sources-file-operations-error-handling.md +1487 -1487
  164. package/docs/02-CORE-GUIDES/data-sources/data-sources-quick-reference.md +836 -836
  165. package/docs/02-CORE-GUIDES/data-sources/data-sources-readme.md +276 -276
  166. package/docs/02-CORE-GUIDES/data-sources/data-sources-sftp-credential-access-security.md +553 -553
  167. package/docs/02-CORE-GUIDES/data-sources/examples/common-patterns.ts +409 -409
  168. package/docs/02-CORE-GUIDES/data-sources/examples/data-sources-readme.md +178 -178
  169. package/docs/02-CORE-GUIDES/data-sources/examples/s3-operations.ts +308 -308
  170. package/docs/02-CORE-GUIDES/data-sources/examples/sftp-operations.ts +371 -371
  171. package/docs/02-CORE-GUIDES/data-sources/modules/data-sources-01-foundations.md +735 -735
  172. package/docs/02-CORE-GUIDES/data-sources/modules/data-sources-02-s3-operations.md +1302 -1302
  173. package/docs/02-CORE-GUIDES/data-sources/modules/data-sources-03-sftp-operations.md +1379 -1379
  174. package/docs/02-CORE-GUIDES/data-sources/modules/data-sources-04-file-patterns.md +941 -941
  175. package/docs/02-CORE-GUIDES/data-sources/modules/data-sources-05-advanced-topics.md +813 -813
  176. package/docs/02-CORE-GUIDES/data-sources/modules/data-sources-06-integration-patterns.md +486 -486
  177. package/docs/02-CORE-GUIDES/data-sources/modules/data-sources-07-troubleshooting.md +387 -387
  178. package/docs/02-CORE-GUIDES/data-sources/modules/data-sources-08-api-reference.md +417 -417
  179. package/docs/02-CORE-GUIDES/data-sources/readme.md +77 -77
  180. package/docs/02-CORE-GUIDES/error-handling-guide.md +936 -936
  181. package/docs/02-CORE-GUIDES/extraction/examples/02-core-guides-extraction-readme.md +116 -116
  182. package/docs/02-CORE-GUIDES/extraction/examples/common-patterns.ts +428 -428
  183. package/docs/02-CORE-GUIDES/extraction/examples/extract-inventory-basic.ts +187 -187
  184. package/docs/02-CORE-GUIDES/extraction/extraction-quick-reference.md +596 -596
  185. package/docs/02-CORE-GUIDES/extraction/modules/02-core-guides-extraction-01-foundations.md +514 -514
  186. package/docs/02-CORE-GUIDES/extraction/modules/02-core-guides-extraction-02-basic-extraction.md +823 -823
  187. package/docs/02-CORE-GUIDES/extraction/modules/02-core-guides-extraction-03-parquet-processing.md +507 -507
  188. package/docs/02-CORE-GUIDES/extraction/modules/02-core-guides-extraction-04-data-enrichment.md +546 -546
  189. package/docs/02-CORE-GUIDES/extraction/modules/02-core-guides-extraction-05-transformation.md +494 -494
  190. package/docs/02-CORE-GUIDES/extraction/modules/02-core-guides-extraction-06-export-formats.md +458 -458
  191. package/docs/02-CORE-GUIDES/extraction/modules/02-core-guides-extraction-06-performance.md +138 -138
  192. package/docs/02-CORE-GUIDES/extraction/modules/02-core-guides-extraction-07-api-reference.md +148 -148
  193. package/docs/02-CORE-GUIDES/extraction/modules/02-core-guides-extraction-07-optimization.md +692 -692
  194. package/docs/02-CORE-GUIDES/extraction/modules/02-core-guides-extraction-08-extraction-orchestrator.md +1008 -1008
  195. package/docs/02-CORE-GUIDES/extraction/readme.md +151 -151
  196. package/docs/02-CORE-GUIDES/ingestion/examples/_simple-kv-store.ts +40 -40
  197. package/docs/02-CORE-GUIDES/ingestion/examples/error-recovery.ts +728 -728
  198. package/docs/02-CORE-GUIDES/ingestion/examples/event-driven.ts +501 -501
  199. package/docs/02-CORE-GUIDES/ingestion/examples/local-file-ingestion.ts +88 -88
  200. package/docs/02-CORE-GUIDES/ingestion/examples/parquet-ingestion.ts +117 -117
  201. package/docs/02-CORE-GUIDES/ingestion/examples/performance-optimized.ts +647 -647
  202. package/docs/02-CORE-GUIDES/ingestion/examples/s3-csv-ingestion.ts +169 -169
  203. package/docs/02-CORE-GUIDES/ingestion/examples/sftp-csv-ingestion.ts +134 -134
  204. package/docs/02-CORE-GUIDES/ingestion/ingestion-quick-reference.md +546 -546
  205. package/docs/02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-01-introduction.md +626 -626
  206. package/docs/02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-02-quick-start.md +658 -658
  207. package/docs/02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-03-data-sources.md +1052 -1052
  208. package/docs/02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-04-field-mapping.md +763 -763
  209. package/docs/02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-05-advanced-parsers.md +676 -676
  210. package/docs/02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-06-batch-api.md +1295 -1295
  211. package/docs/02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-07-api-reference.md +138 -138
  212. package/docs/02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-07-state-management.md +1037 -1037
  213. package/docs/02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-08-performance-optimization.md +1349 -1349
  214. package/docs/02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-09-best-practices.md +1893 -1893
  215. package/docs/02-CORE-GUIDES/ingestion/readme.md +160 -160
  216. package/docs/02-CORE-GUIDES/logging-guide.md +585 -585
  217. package/docs/02-CORE-GUIDES/mapping/error-handling-patterns.md +401 -401
  218. package/docs/02-CORE-GUIDES/mapping/examples/02-core-guides-mapping-readme.md +128 -128
  219. package/docs/02-CORE-GUIDES/mapping/examples/common-patterns.ts +273 -273
  220. package/docs/02-CORE-GUIDES/mapping/examples/csv-location-ingestion.json +36 -36
  221. package/docs/02-CORE-GUIDES/mapping/examples/csv-mapping.ts +242 -242
  222. package/docs/02-CORE-GUIDES/mapping/examples/graphql-to-parquet-extraction.json +36 -36
  223. package/docs/02-CORE-GUIDES/mapping/examples/json-mapping.ts +213 -213
  224. package/docs/02-CORE-GUIDES/mapping/examples/json-product-to-mutation.json +48 -48
  225. package/docs/02-CORE-GUIDES/mapping/examples/xml-mapping.ts +291 -291
  226. package/docs/02-CORE-GUIDES/mapping/examples/xml-order-to-mutation.json +45 -45
  227. package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/graphql-mutation-mapping-quick-reference.md +463 -463
  228. package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/graphql-mutation-mapping-readme.md +227 -227
  229. package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-01-introduction.md +222 -222
  230. package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-02-quick-start.md +351 -351
  231. package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-03-schema-validation.md +569 -569
  232. package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-04-mapping-patterns.md +471 -471
  233. package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-05-configuration-reference.md +611 -611
  234. package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-06-advanced-xpath.md +148 -148
  235. package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-06-path-syntax.md +464 -464
  236. package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-07-api-reference.md +94 -94
  237. package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-07-array-handling.md +307 -307
  238. package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-08-custom-resolvers.md +544 -544
  239. package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-09-advanced-patterns.md +427 -427
  240. package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-10-hooks-and-variables.md +336 -336
  241. package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-11-error-handling.md +488 -488
  242. package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-12-arguments-vs-nodes.md +383 -383
  243. package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/modules/graphql-mutation-mapping-13-best-practices.md +477 -477
  244. package/docs/02-CORE-GUIDES/mapping/graphql-mutation-mapping/readme.md +62 -62
  245. package/docs/02-CORE-GUIDES/mapping/mapping-format-decision-tree.md +480 -480
  246. package/docs/02-CORE-GUIDES/mapping/mapping-graphql-alias-batching-guide.md +820 -820
  247. package/docs/02-CORE-GUIDES/mapping/mapping-javascript-objects.md +2369 -2369
  248. package/docs/02-CORE-GUIDES/mapping/mapping-mapper-comparison-guide.md +682 -682
  249. package/docs/02-CORE-GUIDES/mapping/modules/02-core-guides-mapping-07-api-reference.md +1327 -1327
  250. package/docs/02-CORE-GUIDES/mapping/modules/02-core-guides-mapping-08-error-handling.md +1142 -1142
  251. package/docs/02-CORE-GUIDES/mapping/modules/mapping-04-use-cases.md +891 -891
  252. package/docs/02-CORE-GUIDES/mapping/modules/mapping-06-helpers-resolvers.md +1126 -1126
  253. package/docs/02-CORE-GUIDES/mapping/modules/mapping-06-sdk-resolvers.md +199 -199
  254. package/docs/02-CORE-GUIDES/mapping/modules/mapping-07-api-reference.md +1319 -1319
  255. package/docs/02-CORE-GUIDES/mapping/readme.md +178 -178
  256. package/docs/02-CORE-GUIDES/mapping/resolver-registration.md +410 -410
  257. package/docs/02-CORE-GUIDES/mapping/resolvers/examples/common-patterns.ts +226 -226
  258. package/docs/02-CORE-GUIDES/mapping/resolvers/examples/custom-resolvers.ts +227 -227
  259. package/docs/02-CORE-GUIDES/mapping/resolvers/examples/sdk-resolvers-usage.ts +203 -203
  260. package/docs/02-CORE-GUIDES/mapping/resolvers/mapping-resolvers-readme.md +274 -274
  261. package/docs/02-CORE-GUIDES/mapping/resolvers/mapping-resolvers-resolver-api-reference.md +679 -679
  262. package/docs/02-CORE-GUIDES/mapping/resolvers/mapping-resolvers-resolver-cookbook.md +826 -826
  263. package/docs/02-CORE-GUIDES/mapping/resolvers/mapping-resolvers-resolver-guide.md +1330 -1330
  264. package/docs/02-CORE-GUIDES/mapping/resolvers/mapping-resolvers-resolver-helpers-reference.md +1437 -1437
  265. package/docs/02-CORE-GUIDES/mapping/resolvers/mapping-resolvers-resolver-parameters-reference.md +553 -553
  266. package/docs/02-CORE-GUIDES/mapping/resolvers/mapping-resolvers-resolver-troubleshooting.md +854 -854
  267. package/docs/02-CORE-GUIDES/mapping/resolvers/readme.md +75 -75
  268. package/docs/02-CORE-GUIDES/parsers/examples/02-core-guides-parsers-readme.md +161 -161
  269. package/docs/02-CORE-GUIDES/parsers/examples/csv-parser-examples.ts +110 -110
  270. package/docs/02-CORE-GUIDES/parsers/examples/json-parser-examples.ts +33 -33
  271. package/docs/02-CORE-GUIDES/parsers/examples/parquet-parser-examples.ts +47 -47
  272. package/docs/02-CORE-GUIDES/parsers/examples/xml-parser-examples.ts +38 -38
  273. package/docs/02-CORE-GUIDES/parsers/modules/02-core-guides-parsers-01-foundations.md +355 -355
  274. package/docs/02-CORE-GUIDES/parsers/modules/02-core-guides-parsers-02-csv-parser.md +772 -772
  275. package/docs/02-CORE-GUIDES/parsers/modules/02-core-guides-parsers-03-json-parser.md +789 -789
  276. package/docs/02-CORE-GUIDES/parsers/modules/02-core-guides-parsers-04-xml-parser.md +857 -857
  277. package/docs/02-CORE-GUIDES/parsers/modules/02-core-guides-parsers-05-parquet-parser.md +603 -603
  278. package/docs/02-CORE-GUIDES/parsers/modules/02-core-guides-parsers-06-integration-patterns.md +702 -702
  279. package/docs/02-CORE-GUIDES/parsers/modules/02-core-guides-parsers-06-streaming.md +121 -121
  280. package/docs/02-CORE-GUIDES/parsers/modules/02-core-guides-parsers-07-api-reference.md +89 -89
  281. package/docs/02-CORE-GUIDES/parsers/modules/02-core-guides-parsers-07-troubleshooting.md +727 -727
  282. package/docs/02-CORE-GUIDES/parsers/parsers-quick-reference.md +482 -482
  283. package/docs/02-CORE-GUIDES/parsers/parsers-readme.md +258 -258
  284. package/docs/02-CORE-GUIDES/parsers/readme.md +65 -65
  285. package/docs/02-CORE-GUIDES/readme.md +194 -194
  286. package/docs/02-CORE-GUIDES/webhook-validation/examples/basic-validation.ts +108 -108
  287. package/docs/02-CORE-GUIDES/webhook-validation/examples/common-patterns.ts +316 -316
  288. package/docs/02-CORE-GUIDES/webhook-validation/examples/webhook-validation-readme.md +61 -61
  289. package/docs/02-CORE-GUIDES/webhook-validation/modules/webhook-validation-01-foundations.md +440 -440
  290. package/docs/02-CORE-GUIDES/webhook-validation/modules/webhook-validation-02-quick-start.md +525 -525
  291. package/docs/02-CORE-GUIDES/webhook-validation/modules/webhook-validation-03-versori-integration.md +741 -741
  292. package/docs/02-CORE-GUIDES/webhook-validation/modules/webhook-validation-04-platform-integration.md +629 -629
  293. package/docs/02-CORE-GUIDES/webhook-validation/modules/webhook-validation-05-configuration.md +535 -535
  294. package/docs/02-CORE-GUIDES/webhook-validation/modules/webhook-validation-06-error-handling.md +611 -611
  295. package/docs/02-CORE-GUIDES/webhook-validation/modules/webhook-validation-06-troubleshooting.md +124 -124
  296. package/docs/02-CORE-GUIDES/webhook-validation/modules/webhook-validation-07-api-reference.md +511 -511
  297. package/docs/02-CORE-GUIDES/webhook-validation/modules/webhook-validation-08-rubix-webhooks.md +590 -590
  298. package/docs/02-CORE-GUIDES/webhook-validation/modules/webhook-validation-09-rubix-event-vs-http-call.md +432 -432
  299. package/docs/02-CORE-GUIDES/webhook-validation/readme.md +239 -239
  300. package/docs/02-CORE-GUIDES/webhook-validation/webhook-validation-quick-reference.md +392 -392
  301. package/docs/03-PATTERN-GUIDES/connector-scenarios/connector-scenarios-quick-reference.md +498 -498
  302. package/docs/03-PATTERN-GUIDES/connector-scenarios/connector-scenarios-readme.md +313 -313
  303. package/docs/03-PATTERN-GUIDES/connector-scenarios/examples/common-patterns.ts +612 -612
  304. package/docs/03-PATTERN-GUIDES/connector-scenarios/examples/connector-scenarios-readme.md +253 -253
  305. package/docs/03-PATTERN-GUIDES/connector-scenarios/modules/connector-scenarios-01-foundations.md +452 -452
  306. package/docs/03-PATTERN-GUIDES/connector-scenarios/modules/connector-scenarios-02-simple-scenarios.md +681 -681
  307. package/docs/03-PATTERN-GUIDES/connector-scenarios/modules/connector-scenarios-03-intermediate-scenarios.md +637 -637
  308. package/docs/03-PATTERN-GUIDES/connector-scenarios/modules/connector-scenarios-04-advanced-scenarios.md +650 -650
  309. package/docs/03-PATTERN-GUIDES/connector-scenarios/modules/connector-scenarios-05-bidirectional-sync.md +233 -233
  310. package/docs/03-PATTERN-GUIDES/connector-scenarios/modules/connector-scenarios-06-production-patterns.md +442 -442
  311. package/docs/03-PATTERN-GUIDES/connector-scenarios/modules/connector-scenarios-07-reference.md +445 -445
  312. package/docs/03-PATTERN-GUIDES/connector-scenarios/readme.md +31 -31
  313. package/docs/03-PATTERN-GUIDES/enterprise-integration-patterns.md +1528 -1528
  314. package/docs/03-PATTERN-GUIDES/error-handling/comprehensive-error-handling-guide.md +1437 -1437
  315. package/docs/03-PATTERN-GUIDES/error-handling/error-handling-quick-reference.md +390 -390
  316. package/docs/03-PATTERN-GUIDES/error-handling/examples/common-patterns.ts +438 -438
  317. package/docs/03-PATTERN-GUIDES/error-handling/modules/error-handling-01-foundations.md +362 -362
  318. package/docs/03-PATTERN-GUIDES/error-handling/modules/error-handling-02-error-types.md +850 -850
  319. package/docs/03-PATTERN-GUIDES/error-handling/modules/error-handling-03-utf8-handling.md +456 -456
  320. package/docs/03-PATTERN-GUIDES/error-handling/modules/error-handling-04-error-scenarios.md +658 -658
  321. package/docs/03-PATTERN-GUIDES/error-handling/modules/error-handling-05-calling-patterns.md +671 -671
  322. package/docs/03-PATTERN-GUIDES/error-handling/modules/error-handling-06-retry-strategies.md +1034 -1034
  323. package/docs/03-PATTERN-GUIDES/error-handling/modules/error-handling-07-monitoring.md +653 -653
  324. package/docs/03-PATTERN-GUIDES/error-handling/modules/error-handling-08-api-reference.md +847 -847
  325. package/docs/03-PATTERN-GUIDES/error-handling/readme.md +36 -36
  326. package/docs/03-PATTERN-GUIDES/examples/__tests__/readme.md +40 -40
  327. package/docs/03-PATTERN-GUIDES/examples/__tests__/resolver-examples.test.js +282 -282
  328. package/docs/03-PATTERN-GUIDES/examples/test-data/03-pattern-guides-readme.md +110 -110
  329. package/docs/03-PATTERN-GUIDES/examples/test-data/canonical-inventory.json +123 -123
  330. package/docs/03-PATTERN-GUIDES/examples/test-data/canonical-order.json +171 -171
  331. package/docs/03-PATTERN-GUIDES/examples/test-data/readme.md +28 -28
  332. package/docs/03-PATTERN-GUIDES/extraction/extraction-readme.md +15 -15
  333. package/docs/03-PATTERN-GUIDES/extraction/readme.md +25 -25
  334. package/docs/03-PATTERN-GUIDES/file-operations/examples/common-patterns.ts +407 -407
  335. package/docs/03-PATTERN-GUIDES/file-operations/examples/file-operations-readme.md +142 -142
  336. package/docs/03-PATTERN-GUIDES/file-operations/file-operations-quick-reference.md +462 -462
  337. package/docs/03-PATTERN-GUIDES/file-operations/file-operations-readme.md +379 -379
  338. package/docs/03-PATTERN-GUIDES/file-operations/modules/file-operations-01-foundations.md +430 -430
  339. package/docs/03-PATTERN-GUIDES/file-operations/modules/file-operations-02-quick-start.md +484 -484
  340. package/docs/03-PATTERN-GUIDES/file-operations/modules/file-operations-03-s3-operations.md +507 -507
  341. package/docs/03-PATTERN-GUIDES/file-operations/modules/file-operations-04-sftp-operations.md +963 -963
  342. package/docs/03-PATTERN-GUIDES/file-operations/modules/file-operations-05-streaming-performance.md +503 -503
  343. package/docs/03-PATTERN-GUIDES/file-operations/modules/file-operations-06-archive-patterns.md +386 -386
  344. package/docs/03-PATTERN-GUIDES/file-operations/modules/file-operations-06-error-handling.md +117 -117
  345. package/docs/03-PATTERN-GUIDES/file-operations/modules/file-operations-07-api-reference.md +78 -78
  346. package/docs/03-PATTERN-GUIDES/file-operations/modules/file-operations-07-testing-troubleshooting.md +567 -567
  347. package/docs/03-PATTERN-GUIDES/file-operations/modules/file-operations-08-api-reference.md +1055 -1055
  348. package/docs/03-PATTERN-GUIDES/file-operations/readme.md +32 -32
  349. package/docs/03-PATTERN-GUIDES/ingestion/ingestion-readme.md +15 -15
  350. package/docs/03-PATTERN-GUIDES/ingestion/readme.md +25 -25
  351. package/docs/03-PATTERN-GUIDES/integration-patterns/examples/batch-processing.ts +130 -130
  352. package/docs/03-PATTERN-GUIDES/integration-patterns/examples/common-patterns.ts +360 -360
  353. package/docs/03-PATTERN-GUIDES/integration-patterns/examples/delta-sync.ts +130 -130
  354. package/docs/03-PATTERN-GUIDES/integration-patterns/examples/integration-patterns-readme.md +100 -100
  355. package/docs/03-PATTERN-GUIDES/integration-patterns/examples/real-time-webhook.ts +398 -398
  356. package/docs/03-PATTERN-GUIDES/integration-patterns/integration-patterns-quick-reference.md +962 -962
  357. package/docs/03-PATTERN-GUIDES/integration-patterns/integration-patterns-readme.md +134 -134
  358. package/docs/03-PATTERN-GUIDES/integration-patterns/modules/integration-patterns-01-real-time-processing.md +991 -991
  359. package/docs/03-PATTERN-GUIDES/integration-patterns/modules/integration-patterns-02-batch-processing.md +1547 -1547
  360. package/docs/03-PATTERN-GUIDES/integration-patterns/modules/integration-patterns-03-delta-sync.md +1108 -1108
  361. package/docs/03-PATTERN-GUIDES/integration-patterns/modules/integration-patterns-04-webhook-patterns.md +1181 -1181
  362. package/docs/03-PATTERN-GUIDES/integration-patterns/modules/integration-patterns-05-error-handling.md +1061 -1061
  363. package/docs/03-PATTERN-GUIDES/integration-patterns/modules/integration-patterns-06-advanced-integration-services.md +1547 -1547
  364. package/docs/03-PATTERN-GUIDES/integration-patterns/modules/integration-patterns-06-performance.md +109 -109
  365. package/docs/03-PATTERN-GUIDES/integration-patterns/modules/integration-patterns-07-api-reference.md +34 -34
  366. package/docs/03-PATTERN-GUIDES/integration-patterns/readme.md +30 -30
  367. package/docs/03-PATTERN-GUIDES/logging-minimal-mode.md +128 -128
  368. package/docs/03-PATTERN-GUIDES/multiple-connections/examples/common-patterns.ts +380 -380
  369. package/docs/03-PATTERN-GUIDES/multiple-connections/examples/multiple-connections-readme.md +139 -139
  370. package/docs/03-PATTERN-GUIDES/multiple-connections/examples/parallel-root-connections.ts +149 -149
  371. package/docs/03-PATTERN-GUIDES/multiple-connections/examples/real-world-scenarios.ts +405 -405
  372. package/docs/03-PATTERN-GUIDES/multiple-connections/modules/multiple-connections-01-foundations.md +378 -378
  373. package/docs/03-PATTERN-GUIDES/multiple-connections/modules/multiple-connections-02-quick-start.md +566 -566
  374. package/docs/03-PATTERN-GUIDES/multiple-connections/modules/multiple-connections-03-targeting-connections.md +659 -659
  375. package/docs/03-PATTERN-GUIDES/multiple-connections/modules/multiple-connections-04-parallel-queries.md +656 -656
  376. package/docs/03-PATTERN-GUIDES/multiple-connections/modules/multiple-connections-05-best-practices.md +624 -624
  377. package/docs/03-PATTERN-GUIDES/multiple-connections/modules/multiple-connections-06-api-reference.md +824 -824
  378. package/docs/03-PATTERN-GUIDES/multiple-connections/modules/multiple-connections-06-versori.md +119 -119
  379. package/docs/03-PATTERN-GUIDES/multiple-connections/modules/multiple-connections-07-api-reference.md +87 -87
  380. package/docs/03-PATTERN-GUIDES/multiple-connections/multiple-connections-quick-reference.md +353 -353
  381. package/docs/03-PATTERN-GUIDES/multiple-connections/multiple-connections-readme.md +270 -270
  382. package/docs/03-PATTERN-GUIDES/multiple-connections/readme.md +30 -30
  383. package/docs/03-PATTERN-GUIDES/pagination/pagination-readme.md +14 -14
  384. package/docs/03-PATTERN-GUIDES/pagination/readme.md +24 -24
  385. package/docs/03-PATTERN-GUIDES/parquet/examples/common-patterns.ts +180 -180
  386. package/docs/03-PATTERN-GUIDES/parquet/examples/read-parquet.ts +48 -48
  387. package/docs/03-PATTERN-GUIDES/parquet/examples/write-parquet.ts +65 -65
  388. package/docs/03-PATTERN-GUIDES/parquet/modules/03-pattern-guides-parquet-01-introduction.md +393 -393
  389. package/docs/03-PATTERN-GUIDES/parquet/modules/03-pattern-guides-parquet-02-quick-start.md +572 -572
  390. package/docs/03-PATTERN-GUIDES/parquet/modules/03-pattern-guides-parquet-03-reading-parquet.md +525 -525
  391. package/docs/03-PATTERN-GUIDES/parquet/modules/03-pattern-guides-parquet-04-writing-parquet.md +554 -554
  392. package/docs/03-PATTERN-GUIDES/parquet/modules/03-pattern-guides-parquet-05-graphql-extraction.md +405 -405
  393. package/docs/03-PATTERN-GUIDES/parquet/modules/03-pattern-guides-parquet-06-performance.md +104 -104
  394. package/docs/03-PATTERN-GUIDES/parquet/modules/03-pattern-guides-parquet-06-s3-integration.md +511 -511
  395. package/docs/03-PATTERN-GUIDES/parquet/modules/03-pattern-guides-parquet-07-api-reference.md +90 -90
  396. package/docs/03-PATTERN-GUIDES/parquet/modules/03-pattern-guides-parquet-07-performance-optimization.md +525 -525
  397. package/docs/03-PATTERN-GUIDES/parquet/modules/03-pattern-guides-parquet-08-best-practices.md +712 -712
  398. package/docs/03-PATTERN-GUIDES/parquet/parquet-quick-reference.md +683 -683
  399. package/docs/03-PATTERN-GUIDES/parquet/parquet-readme.md +248 -248
  400. package/docs/03-PATTERN-GUIDES/parquet/readme.md +32 -32
  401. package/docs/03-PATTERN-GUIDES/parsers/parsers-readme.md +12 -12
  402. package/docs/03-PATTERN-GUIDES/parsers/readme.md +24 -24
  403. package/docs/03-PATTERN-GUIDES/readme.md +159 -159
  404. package/docs/03-PATTERN-GUIDES/webhooks/readme.md +24 -24
  405. package/docs/03-PATTERN-GUIDES/webhooks/webhooks-readme.md +8 -8
  406. package/docs/04-REFERENCE/architecture/architecture-01-overview.md +427 -427
  407. package/docs/04-REFERENCE/architecture/architecture-02-client-architecture.md +424 -424
  408. package/docs/04-REFERENCE/architecture/architecture-03-data-flow.md +690 -690
  409. package/docs/04-REFERENCE/architecture/architecture-04-service-layer.md +834 -834
  410. package/docs/04-REFERENCE/architecture/architecture-05-integration-architecture.md +655 -655
  411. package/docs/04-REFERENCE/architecture/architecture-06-state-management.md +653 -653
  412. package/docs/04-REFERENCE/architecture/architecture-adding-new-data-sources.md +686 -686
  413. package/docs/04-REFERENCE/architecture/readme.md +279 -279
  414. package/docs/04-REFERENCE/platforms/deno/readme.md +117 -117
  415. package/docs/04-REFERENCE/platforms/nodejs/readme.md +146 -146
  416. package/docs/04-REFERENCE/platforms/readme.md +135 -135
  417. package/docs/04-REFERENCE/platforms/versori/modules/platforms-versori-01-introduction.md +398 -398
  418. package/docs/04-REFERENCE/platforms/versori/modules/platforms-versori-02-quick-start.md +560 -560
  419. package/docs/04-REFERENCE/platforms/versori/modules/platforms-versori-03-authentication.md +757 -757
  420. package/docs/04-REFERENCE/platforms/versori/modules/platforms-versori-04-workflows.md +2476 -2476
  421. package/docs/04-REFERENCE/platforms/versori/modules/platforms-versori-05-connections.md +1167 -1167
  422. package/docs/04-REFERENCE/platforms/versori/modules/platforms-versori-06-kv-storage.md +990 -990
  423. package/docs/04-REFERENCE/platforms/versori/modules/platforms-versori-06-state-management.md +121 -121
  424. package/docs/04-REFERENCE/platforms/versori/modules/platforms-versori-07-api-reference.md +68 -68
  425. package/docs/04-REFERENCE/platforms/versori/modules/platforms-versori-07-deployment.md +731 -731
  426. package/docs/04-REFERENCE/platforms/versori/modules/platforms-versori-08-best-practices.md +1111 -1111
  427. package/docs/04-REFERENCE/platforms/versori/modules/platforms-versori-09-signature-reference.md +766 -766
  428. package/docs/04-REFERENCE/platforms/versori/platforms-versori-readme.md +299 -299
  429. package/docs/04-REFERENCE/platforms/versori/platforms-versori-s3-sftp-configuration-guide.md +1425 -1425
  430. package/docs/04-REFERENCE/platforms/versori/platforms-versori-webhook-api-key-security.md +816 -816
  431. package/docs/04-REFERENCE/platforms/versori/platforms-versori-webhook-connection-security.md +681 -681
  432. package/docs/04-REFERENCE/platforms/versori/platforms-versori-workflow-task-types.md +708 -708
  433. package/docs/04-REFERENCE/platforms/versori/readme.md +108 -108
  434. package/docs/04-REFERENCE/readme.md +148 -148
  435. package/docs/04-REFERENCE/resolver-signature/examples/advanced-resolvers.ts +482 -482
  436. package/docs/04-REFERENCE/resolver-signature/examples/async-resolvers.ts +496 -496
  437. package/docs/04-REFERENCE/resolver-signature/examples/basic-resolvers.ts +343 -343
  438. package/docs/04-REFERENCE/resolver-signature/examples/resolver-signature-readme.md +188 -188
  439. package/docs/04-REFERENCE/resolver-signature/examples/testing-resolvers.ts +463 -463
  440. package/docs/04-REFERENCE/resolver-signature/modules/resolver-signature-01-foundations.md +286 -286
  441. package/docs/04-REFERENCE/resolver-signature/modules/resolver-signature-02-parameter-reference.md +643 -643
  442. package/docs/04-REFERENCE/resolver-signature/modules/resolver-signature-03-basic-examples.md +521 -521
  443. package/docs/04-REFERENCE/resolver-signature/modules/resolver-signature-04-advanced-patterns.md +739 -739
  444. package/docs/04-REFERENCE/resolver-signature/modules/resolver-signature-05-sdk-resolvers.md +531 -531
  445. package/docs/04-REFERENCE/resolver-signature/modules/resolver-signature-06-migration-guide.md +650 -650
  446. package/docs/04-REFERENCE/resolver-signature/modules/resolver-signature-06-testing.md +125 -125
  447. package/docs/04-REFERENCE/resolver-signature/modules/resolver-signature-07-api-reference.md +794 -794
  448. package/docs/04-REFERENCE/resolver-signature/readme.md +64 -64
  449. package/docs/04-REFERENCE/resolver-signature/resolver-signature-quick-reference.md +270 -270
  450. package/docs/04-REFERENCE/resolver-signature/resolver-signature-readme.md +351 -351
  451. package/docs/04-REFERENCE/schema/fluent-commerce-schema.json +764 -764
  452. package/docs/04-REFERENCE/schema/readme.md +141 -141
  453. package/docs/04-REFERENCE/testing/examples/04-reference-testing-readme.md +158 -158
  454. package/docs/04-REFERENCE/testing/examples/fluent-testing.ts +62 -62
  455. package/docs/04-REFERENCE/testing/examples/health-check.ts +155 -155
  456. package/docs/04-REFERENCE/testing/examples/integration-test.ts +119 -119
  457. package/docs/04-REFERENCE/testing/examples/performance-test.ts +183 -183
  458. package/docs/04-REFERENCE/testing/examples/s3-testing.ts +127 -127
  459. package/docs/04-REFERENCE/testing/modules/04-reference-testing-01-foundations.md +267 -267
  460. package/docs/04-REFERENCE/testing/modules/04-reference-testing-02-s3-testing.md +599 -599
  461. package/docs/04-REFERENCE/testing/modules/04-reference-testing-03-fluent-testing.md +589 -589
  462. package/docs/04-REFERENCE/testing/modules/04-reference-testing-04-integration-testing.md +699 -699
  463. package/docs/04-REFERENCE/testing/modules/04-reference-testing-05-debugging.md +478 -478
  464. package/docs/04-REFERENCE/testing/modules/04-reference-testing-06-cicd-integration.md +463 -463
  465. package/docs/04-REFERENCE/testing/modules/04-reference-testing-06-preflight-validation.md +131 -131
  466. package/docs/04-REFERENCE/testing/modules/04-reference-testing-07-best-practices.md +499 -499
  467. package/docs/04-REFERENCE/testing/modules/04-reference-testing-07-coverage-ci.md +165 -165
  468. package/docs/04-REFERENCE/testing/modules/04-reference-testing-08-api-reference.md +634 -634
  469. package/docs/04-REFERENCE/testing/readme.md +86 -86
  470. package/docs/04-REFERENCE/testing/testing-quick-reference.md +667 -667
  471. package/docs/04-REFERENCE/testing/testing-readme.md +286 -286
  472. package/docs/04-REFERENCE/troubleshooting/readme.md +144 -144
  473. package/docs/04-REFERENCE/troubleshooting/troubleshooting-deno-sftp-compatibility.md +392 -392
  474. package/docs/template-loading-matrix.md +242 -242
  475. package/package.json +5 -3
  476. package/docs/02-CORE-GUIDES/api-reference/cli-profile-integration.md +0 -377
@@ -1,1430 +1,1430 @@
# Pattern: Large File Processing & Chunking

**FC Connect SDK Use Case Guide**

> **SDK**: [@fluentcommerce/fc-connect-sdk](https://www.npmjs.com/package/@fluentcommerce/fc-connect-sdk)
> **Version**: Use latest - `npm install @fluentcommerce/fc-connect-sdk@latest`

**Context**: Enterprise-scale file ingestion with streaming, splitting, and parallel processing

**Type**: Advanced Pattern

**Complexity**: High

**Volume**: 500MB-5GB files, 1M-10M records

**Latency**: Batch processing (< 30-60 min for 10M records)

**Pattern**: Streaming + chunking + parallel Batch API

## When to Use This Pattern

Use this pattern when dealing with:

- **Large CSV files** (>500MB, >1M records)
- **Memory-constrained environments** (Lambda, containers with limited RAM)
- **Time-sensitive ingestion** (need parallel processing for speed)
- **Reliability requirements** (checkpoint/resume on failure)
- **Progress tracking** (real-time status updates)

**Volume Guidance:**

- **Small** (<1K records): Use basic ingestion pattern
- **Medium** (1K-100K records): Use streaming pattern (Pattern 1)
- **Large** (100K-1M records): Use file chunking pattern (Pattern 2)
- **Huge** (1M-10M records): Use parallel processing pattern (Pattern 3)
- **Enterprise** (10M+ records): Use distributed processing pattern (Pattern 4)
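The volume tiers above can be encoded as a small lookup, useful when one workflow handles feeds of varying size. This is a sketch; `choosePattern` is a hypothetical helper, not part of the SDK.

```typescript
// Hypothetical helper mapping record counts to the pattern tiers above.
type IngestionPattern = 'basic' | 'streaming' | 'chunking' | 'parallel' | 'distributed';

function choosePattern(recordCount: number): IngestionPattern {
  if (recordCount < 1_000) return 'basic';           // Small: basic ingestion
  if (recordCount < 100_000) return 'streaming';     // Medium: Pattern 1
  if (recordCount < 1_000_000) return 'chunking';    // Large: Pattern 2
  if (recordCount < 10_000_000) return 'parallel';   // Huge: Pattern 3
  return 'distributed';                              // Enterprise: Pattern 4
}
```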
## Problem Statement

### Why Splitting is Needed

**Memory Constraints:**

```typescript
// ❌ WRONG - Loads entire 2GB file into memory
import { promises as fs } from 'node:fs';

const csvContent = await fs.readFile('huge-inventory.csv', 'utf-8');
const records = await csvParser.parse(csvContent); // 💥 Out of memory
```

**Impact:**

- Lambda 512MB: Crashes on 500MB+ files
- Container 1GB: Struggles with 1GB+ files
- Node.js default heap (4GB): Fails on 5GB+ files

**Time Constraints:**

```typescript
// ❌ WRONG - Sequential processing takes 90+ minutes
for (const record of records) {
  await processRecord(record); // Too slow for 10M records
}
```

**Reliability Requirements:**

```typescript
// ❌ WRONG - Network failure loses all progress
await processAllRecords(records); // If it fails at record 5M, restart from 0
```

### Solution Overview

This guide demonstrates 4 progressive patterns:

1. **Basic Streaming** (~200 lines) - Process records as they arrive, memory-efficient
2. **File Chunking** (~300 lines) - Split large files into manageable chunks
3. **Parallel Processing** (~400 lines) - Process chunks concurrently with progress tracking
4. **Distributed Processing** (~300 lines) - Use Versori scheduled workflows for enterprise scale
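The streaming alternative to the first anti-pattern reads the file line by line instead of loading it whole. A minimal sketch using only Node built-ins (the CSV file is simulated with an in-memory stream; in production you would pass `fs.createReadStream('huge-inventory.csv')`):

```typescript
import { Readable } from 'node:stream';
import { createInterface } from 'node:readline';

// Count data rows without ever holding the whole file in memory.
async function countDataRows(input: NodeJS.ReadableStream): Promise<number> {
  const rl = createInterface({ input, crlfDelay: Infinity });
  let rows = 0;
  let isHeader = true;
  for await (const line of rl) {
    if (isHeader) { isHeader = false; continue; } // skip header row
    if (line.trim().length > 0) rows++;           // count non-empty data rows
  }
  return rows;
}

// Demo with an in-memory stream standing in for the file
const demo = Readable.from('sku,qty\nA1,5\nB2,3\n');
countDataRows(demo).then((n) => console.log(`rows: ${n}`)); // prints "rows: 2"
```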
## SDK Methods Used

```typescript
import {
  createClient,         // Client factory (auto-detects context)
  CSVParserService,     // Streaming CSV parser
  S3DataSource,         // S3 file operations
  UniversalMapper,      // Field mapping
  StateService,         // Progress tracking
  VersoriKVAdapter,     // Versori state management
  createConsoleLogger,  // Structured logging
  toStructuredLogger
} from '@fluentcommerce/fc-connect-sdk';
```
---

## Pattern 1: Basic Streaming (Memory-Efficient)

**Best for:** 100K-1M records, single-threaded processing, memory-constrained environments

**Memory Usage:**

- ❌ Without streaming: 2GB file = 2GB+ RAM (file + parsed objects)
- ✅ With streaming: 2GB file = ~50MB RAM (processes records incrementally)

### Implementation
```typescript
import {
  createClient,
  CSVParserService,
  S3DataSource,
  UniversalMapper,
  createConsoleLogger,
  toStructuredLogger
} from '@fluentcommerce/fc-connect-sdk';

const logger = createConsoleLogger();

async function streamingIngestion(ctx: any) {
  logger.info('Starting streaming ingestion');

  // Create client (auto-detects Versori context)
  const client = await createClient(ctx);

  // Initialize S3 data source
  const s3 = new S3DataSource(
    {
      type: 'S3_CSV',
      connectionId: 'my-s3',
      name: 'Inventory Files S3',
      s3Config: {
        bucket: 'inventory-files',
        region: 'us-east-1',
        accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
        secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
      },
    },
    logger
  );

  // Define field mapping
  const mapper = new UniversalMapper({
    fields: {
      skuRef: { source: 'sku', required: true },
      locationRef: { source: 'location_code', required: true },
      qty: { source: 'quantity', resolver: 'sdk.parseInt' },
      expectedOn: { source: 'expected_date', resolver: 'sdk.formatDate' },
    },
  });

  // Create CSV parser with streaming enabled
  const csvParser = new CSVParserService();

  // Download file content (records are parsed incrementally below)
  logger.info('Downloading file from S3', {
    key: 'inventory/large-file.csv',
  });

  const fileContent = (await s3.downloadFile('inventory/large-file.csv', {
    encoding: 'utf8',
  })) as string;

  // Create job for batch ingestion
  const job = await client.createJob({
    name: 'streaming-inventory-ingestion',
    retailerId: client.getRetailerId() || ctx.connections?.fluent_commerce?.retailerId,
  });

  logger.info('Job created', { jobId: job.id });

  // Statistics tracking
  let recordsProcessed = 0;
  let batchCount = 0;
  let errors = 0;
  const BATCH_SIZE = 1000;
  let currentBatch: any[] = [];

  // Stream records with batching (memory-efficient)
  // Records are parsed incrementally, not all at once
  for await (const record of csvParser.parseStreaming(fileContent, {}, 1)) {
    try {
      // Map record
      const mapped = await mapper.map(record);

      if (mapped.success && mapped.data) {
        currentBatch.push(mapped.data);
        recordsProcessed++;

        // Send batch when full
        if (currentBatch.length >= BATCH_SIZE) {
          await client.sendBatch(job.id, {
            entities: currentBatch,
          });

          batchCount++;

          logger.info('Batch sent', {
            batchNumber: batchCount,
            recordsProcessed,
            currentBatchSize: currentBatch.length,
          });

          currentBatch = []; // Clear batch (frees memory)
        }
      } else {
        errors++;
        logger.warn('Record mapping failed', {
          record,
          errors: mapped.errors,
        });
      }
    } catch (error) {
      errors++;
      logger.error('Record processing failed', error as Error, { record });
    }

    // Progress logging every 10K records
    if (recordsProcessed % 10000 === 0) {
      logger.info('Progress update', {
        recordsProcessed,
        batchesSent: batchCount,
        errors,
        memoryUsage: process.memoryUsage().heapUsed / 1024 / 1024 + ' MB',
      });
    }
  }

  // Send remaining records
  if (currentBatch.length > 0) {
    await client.sendBatch(job.id, {
      entities: currentBatch,
    });
    batchCount++;
  }

  logger.info('Streaming ingestion complete', {
    totalRecords: recordsProcessed,
    batchesSent: batchCount,
    errors,
    jobId: job.id,
  });

  return {
    success: true,
    jobId: job.id,
    recordsProcessed,
    batchesSent: batchCount,
    errors,
  };
}
```
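The batch-and-flush logic in the loop above can be factored into a reusable accumulator so the same code backs any sender. This is a generic sketch, not an SDK class; the `send` callback stands in for `client.sendBatch` from the example:

```typescript
// Collects entities and flushes them in fixed-size batches.
class BatchAccumulator<T> {
  private batch: T[] = [];
  public batchesSent = 0;

  constructor(
    private readonly batchSize: number,
    private readonly send: (entities: T[]) => Promise<void>
  ) {}

  async add(entity: T): Promise<void> {
    this.batch.push(entity);
    if (this.batch.length >= this.batchSize) await this.flush();
  }

  // Always call once after the loop to send any remainder.
  async flush(): Promise<void> {
    if (this.batch.length === 0) return;
    await this.send(this.batch);
    this.batchesSent++;
    this.batch = []; // free memory between batches
  }
}
```

With this helper, the streaming loop reduces to one `add` call per mapped record plus a single trailing `flush`.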

**Memory Profile:**

```
File Size: 2GB (5M records)
RAM Usage: ~50MB peak (1000 record batches)
Processing Time: ~45 minutes (sequential)
```
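The peak-RAM figure above is easy to verify for your own files: sample heap usage at the same cadence as the progress log. A minimal sketch using only Node built-ins:

```typescript
// Heap usage in whole megabytes, as logged in the streaming loop above.
function heapUsedMB(): number {
  return Math.round(process.memoryUsage().heapUsed / 1024 / 1024);
}

// Mirrors the `recordsProcessed % 10000 === 0` progress check.
function logEvery(recordsProcessed: number, interval: number): boolean {
  return recordsProcessed > 0 && recordsProcessed % interval === 0;
}

console.log(`heap used: ${heapUsedMB()} MB`);
```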
---

## Pattern 2: File Chunking (Split & Track)

**Best for:** 1M-5M records, need checkpoint/resume, want progress visibility

**Strategy:**

1. Split large file into 100K record chunks
2. Write chunks to temp S3 locations
3. Track chunk metadata in VersoriKV
4. Process chunks sequentially (can resume on failure)
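The resume behavior in step 4 hinges on each chunk's status moving through a small lifecycle. Making the transitions explicit (a sketch, not SDK code) shows why only `pending` and `failed` chunks are eligible to (re)enter processing:

```typescript
type ChunkStatus = 'pending' | 'processing' | 'completed' | 'failed';

// Allowed status transitions for a chunk.
const transitions: Record<ChunkStatus, ChunkStatus[]> = {
  pending: ['processing'],
  processing: ['completed', 'failed'],
  failed: ['processing'],  // retried on the next run
  completed: [],           // terminal: skipped on resume
};

function canTransition(from: ChunkStatus, to: ChunkStatus): boolean {
  return transitions[from].includes(to);
}
```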
### Implementation
```typescript
import {
  createClient,
  CSVParserService,
  S3DataSource,
  UniversalMapper,
  StateService,
  VersoriKVAdapter,
  createConsoleLogger,
  toStructuredLogger
} from '@fluentcommerce/fc-connect-sdk';

const logger = createConsoleLogger();

interface ChunkMetadata {
  chunkId: string;
  startRecord: number;
  endRecord: number;
  s3Key: string;
  recordCount: number;
  status: 'pending' | 'processing' | 'completed' | 'failed';
  processedAt?: string;
  error?: string;
}

async function chunkedIngestion(ctx: any) {
  logger.info('Starting chunked ingestion');

  // Initialize services
  const client = await createClient(ctx);

  const s3 = new S3DataSource(
    {
      type: 'S3_CSV',
      connectionId: 'my-s3-chunked',
      name: 'Inventory Files S3 Chunked',
      s3Config: {
        bucket: 'inventory-files',
        region: 'us-east-1',
        accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
        secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
      },
    },
    logger
  );

  // Initialize state management
  const kv = ctx.openKv();
  const kvAdapter = new VersoriKVAdapter(kv);
  const stateService = new StateService(logger);

  const SOURCE_FILE = 'inventory/huge-inventory.csv';
  const CHUNK_SIZE = 100000; // 100K records per chunk
  const workflowId = 'chunked-ingestion';

  // STEP 1: Check if chunking is already in progress
  const existingState = await stateService.getSyncState(kvAdapter, workflowId);

  if (existingState.isInitialized && existingState.lastSyncResult === 'partial') {
    logger.info('Resuming from previous run', {
      lastProcessedFile: existingState.lastProcessedFile,
      lastProcessedCount: existingState.lastProcessedCount,
    });
  }

  // STEP 2: Split file into chunks
  logger.info('Splitting file into chunks', {
    sourceFile: SOURCE_FILE,
    chunkSize: CHUNK_SIZE,
  });

  const chunks = await splitFileIntoChunks(s3, SOURCE_FILE, CHUNK_SIZE, workflowId, kvAdapter);

  logger.info('File split complete', {
    totalChunks: chunks.length,
    totalRecords: chunks.reduce((sum, c) => sum + c.recordCount, 0),
  });

  // STEP 3: Create job for ingestion
  const job = await client.createJob({
    name: `chunked-inventory-ingestion-${Date.now()}`,
    retailerId: client.getRetailerId() || ctx.connections?.fluent_commerce?.retailerId,
  });

  logger.info('Job created', { jobId: job.id });

  // STEP 4: Process each chunk sequentially
  let successCount = 0;
  let failureCount = 0;

  for (const chunk of chunks) {
    try {
      // Skip if already processed
      const chunkState = await kvAdapter.get(['chunk', workflowId, chunk.chunkId, 'status']);

      if (chunkState?.value === 'completed') {
        logger.info('Chunk already processed, skipping', {
          chunkId: chunk.chunkId,
        });
        successCount++;
        continue;
      }

      // Mark chunk as processing
      await kvAdapter.set(['chunk', workflowId, chunk.chunkId, 'status'], 'processing');

      logger.info('Processing chunk', {
        chunkId: chunk.chunkId,
        recordCount: chunk.recordCount,
        progress: `${successCount + failureCount}/${chunks.length}`,
      });

      // Process chunk
      await processChunk(s3, client, job.id, chunk);

      // Mark chunk as completed
      await kvAdapter.set(['chunk', workflowId, chunk.chunkId], {
        ...chunk,
        status: 'completed',
        processedAt: new Date().toISOString(),
      } as ChunkMetadata);

      successCount++;

      logger.info('Chunk completed', {
        chunkId: chunk.chunkId,
        successCount,
        failureCount,
        percentComplete: (((successCount + failureCount) / chunks.length) * 100).toFixed(1),
      });
    } catch (error) {
      failureCount++;
      logger.error('Chunk processing failed', error as Error, {
        chunkId: chunk.chunkId,
      });

      // Mark chunk as failed
      await kvAdapter.set(['chunk', workflowId, chunk.chunkId], {
        ...chunk,
        status: 'failed',
        error: (error as Error).message,
      } as ChunkMetadata);
    }
  }

  // STEP 5: Update final state
  await stateService.updateSyncState(
    kvAdapter,
    [
      {
        fileName: SOURCE_FILE,
        lastModified: new Date().toISOString(),
        recordCount: chunks.reduce((sum, c) => sum + c.recordCount, 0),
      },
    ],
    workflowId
  );

  logger.info('Chunked ingestion complete', {
    totalChunks: chunks.length,
    successCount,
    failureCount,
    jobId: job.id,
  });

  return {
    success: failureCount === 0,
    jobId: job.id,
    chunksProcessed: successCount,
    chunksFailed: failureCount,
    totalChunks: chunks.length,
  };
}

/**
 * Split file into chunks and upload to S3
 */
async function splitFileIntoChunks(
  s3: S3DataSource,
  sourceKey: string,
  chunkSize: number,
  workflowId: string,
  kv: VersoriKVAdapter
): Promise<ChunkMetadata[]> {
  const csvParser = new CSVParserService();
  const chunks: ChunkMetadata[] = [];

  // Download source file
  const fileContent = (await s3.downloadFile(sourceKey, {
    encoding: 'utf8',
  })) as string;

  let currentChunk: any[] = [];
  let chunkNumber = 0;
  let recordNumber = 0;

  // Stream through file and create chunks
  for await (const record of csvParser.parseStreaming(fileContent, {}, 1)) {
    currentChunk.push(record);
    recordNumber++;

    // Create chunk when size reached
    if (currentChunk.length >= chunkSize) {
      const chunkId = `chunk-${chunkNumber.toString().padStart(5, '0')}`;
      const chunkKey = `temp/${workflowId}/${chunkId}.csv`;

      // Convert chunk to CSV
      const chunkCSV = csvParser.stringify(currentChunk, { headers: true });

      // Upload chunk to S3
      await s3.uploadFile(chunkKey, chunkCSV, {
        contentType: 'text/csv',
      });

      // Create chunk metadata
      const metadata: ChunkMetadata = {
        chunkId,
        startRecord: recordNumber - currentChunk.length,
        endRecord: recordNumber - 1,
        s3Key: chunkKey,
        recordCount: currentChunk.length,
        status: 'pending',
      };

      chunks.push(metadata);

      // Store chunk metadata in KV
      await kv.set(['chunk', workflowId, chunkId], metadata);

      logger.info('Chunk created', {
        chunkId,
        recordCount: currentChunk.length,
        s3Key: chunkKey,
      });

      // Clear chunk (free memory)
      currentChunk = [];
      chunkNumber++;
    }
  }

  // Handle remaining records
  if (currentChunk.length > 0) {
    const chunkId = `chunk-${chunkNumber.toString().padStart(5, '0')}`;
    const chunkKey = `temp/${workflowId}/${chunkId}.csv`;

    const chunkCSV = csvParser.stringify(currentChunk, { headers: true });
    await s3.uploadFile(chunkKey, chunkCSV, { contentType: 'text/csv' });

    const metadata: ChunkMetadata = {
      chunkId,
      startRecord: recordNumber - currentChunk.length,
      endRecord: recordNumber - 1,
      s3Key: chunkKey,
      recordCount: currentChunk.length,
      status: 'pending',
    };

    chunks.push(metadata);
    await kv.set(['chunk', workflowId, chunkId], metadata);
  }

  return chunks;
}

/**
 * Process a single chunk
 */
async function processChunk(
  s3: S3DataSource,
  client: any,
  jobId: string,
  chunk: ChunkMetadata
): Promise<void> {
  const csvParser = new CSVParserService();
  const mapper = new UniversalMapper({
    fields: {
      skuRef: { source: 'sku', required: true },
      locationRef: { source: 'location_code', required: true },
      qty: { source: 'quantity', resolver: 'sdk.parseInt' },
      expectedOn: { source: 'expected_date', resolver: 'sdk.formatDate' },
    },
  });

  // Download chunk
  const chunkContent = (await s3.downloadFile(chunk.s3Key, {
    encoding: 'utf8',
  })) as string;

  // Parse chunk
  const records = await csvParser.parse(chunkContent);

  // Map records
  const entities: any[] = [];
  for (const record of records) {
    const mapped = await mapper.map(record);
    if (mapped.success && mapped.data) {
      entities.push(mapped.data);
    }
  }

  // Send batch
  await client.sendBatch(jobId, { entities });

  logger.info('Chunk batch sent', {
    chunkId: chunk.chunkId,
    entityCount: entities.length,
  });
}
```
-
596
**VersoriKV Schema:**

```typescript
// Chunk metadata
['chunk', workflowId, chunkId] => ChunkMetadata

// Chunk status
['chunk', workflowId, chunkId, 'status'] => 'pending' | 'processing' | 'completed' | 'failed'

// Workflow state
['state', workflowId, 'sync'] => SyncState
```
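As a sketch of how this key layout is used in practice, the helpers below read and write a chunk's status. The `KVLike` interface and in-memory stub are illustrative assumptions for the example, not the SDK's actual `VersoriKVAdapter` API:

```typescript
// Minimal KV surface assumed for this sketch (not the SDK's actual interface)
type KVKey = (string | number)[];
interface KVLike {
  get(key: KVKey): Promise<{ value: unknown } | null>;
  set(key: KVKey, value: unknown): Promise<void>;
}

type ChunkStatus = 'pending' | 'processing' | 'completed' | 'failed';

// Write a chunk's status under ['chunk', workflowId, chunkId, 'status']
async function setChunkStatus(
  kv: KVLike,
  workflowId: string,
  chunkId: string,
  status: ChunkStatus
): Promise<void> {
  await kv.set(['chunk', workflowId, chunkId, 'status'], status);
}

// Read it back; undefined means the chunk has never been touched
async function getChunkStatus(
  kv: KVLike,
  workflowId: string,
  chunkId: string
): Promise<ChunkStatus | undefined> {
  const entry = await kv.get(['chunk', workflowId, chunkId, 'status']);
  return (entry?.value as ChunkStatus | undefined) ?? undefined;
}

// In-memory stub, handy for unit-testing workflow logic without Versori
function memoryKv(): KVLike {
  const store = new Map<string, unknown>();
  return {
    async get(key) {
      const k = JSON.stringify(key);
      return store.has(k) ? { value: store.get(k) } : null;
    },
    async set(key, value) {
      store.set(JSON.stringify(key), value);
    },
  };
}
```

The same stub can back progress-query logic in unit tests without any Versori infrastructure.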

**Performance:**

```
File Size: 5GB (10M records)
Chunk Size: 100K records
Total Chunks: 100
Processing Time: ~60 minutes (sequential)
RAM Usage: ~100MB (processes one chunk at a time)
```

---

## Pattern 3: Parallel Processing (High Performance)

**Best for:** 5M-10M records, time-sensitive ingestion, need speed with reliability

**Strategy:**

1. Split file into chunks (same as Pattern 2)
2. Spawn 5 parallel Batch API jobs
3. Process chunks concurrently
4. Track progress in VersoriKV
5. Resume on failure

### Implementation

```typescript
import {
  createClient,
  CSVParserService,
  S3DataSource,
  UniversalMapper,
  StateService,
  VersoriKVAdapter,
  createConsoleLogger,
  toStructuredLogger
} from '@fluentcommerce/fc-connect-sdk';

const logger = createConsoleLogger();

interface ParallelJob {
  jobId: string;
  assignedChunks: string[];
  status: 'pending' | 'processing' | 'completed' | 'failed';
  recordsProcessed: number;
  startedAt?: string;
  completedAt?: string;
}

async function parallelIngestion(ctx: any) {
  logger.info('Starting parallel ingestion');

  // Initialize services
  const client = await createClient(ctx);

  const s3 = new S3DataSource(
    {
      type: 'S3_CSV',
      connectionId: 'my-s3-parallel',
      name: 'Inventory Files S3 Parallel',
      s3Config: {
        bucket: 'inventory-files',
        region: 'us-east-1',
        accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
        secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
      },
    },
    logger
  );

  const kv = ctx.openKv();
  const kvAdapter = new VersoriKVAdapter(kv);
  const stateService = new StateService(logger);

  const SOURCE_FILE = 'inventory/huge-inventory.csv';
  const CHUNK_SIZE = 100000; // 100K records per chunk
  const PARALLEL_JOBS = 5; // Process 5 chunks concurrently
  const workflowId = 'parallel-ingestion';

  // STEP 1: Split file into chunks (reuse from Pattern 2)
  const chunks = await splitFileIntoChunks(s3, SOURCE_FILE, CHUNK_SIZE, workflowId, kvAdapter);

  logger.info('File split complete', {
    totalChunks: chunks.length,
    totalRecords: chunks.reduce((sum, c) => sum + c.recordCount, 0),
  });

  // STEP 2: Create multiple jobs for parallel processing
  const jobs: ParallelJob[] = [];

  for (let i = 0; i < PARALLEL_JOBS; i++) {
    const job = await client.createJob({
      name: `parallel-inventory-ingestion-job-${i + 1}`,
      retailerId: client.getRetailerId() || ctx.connections?.fluent_commerce?.retailerId,
    });

    jobs.push({
      jobId: job.id,
      assignedChunks: [],
      status: 'pending',
      recordsProcessed: 0,
    });

    logger.info('Parallel job created', {
      jobNumber: i + 1,
      jobId: job.id,
    });
  }

  // STEP 3: Distribute chunks across jobs (round-robin)
  chunks.forEach((chunk, index) => {
    const jobIndex = index % PARALLEL_JOBS;
    jobs[jobIndex].assignedChunks.push(chunk.chunkId);
  });

  logger.info('Chunks distributed', {
    totalChunks: chunks.length,
    jobCount: PARALLEL_JOBS,
    chunksPerJob: jobs.map(j => j.assignedChunks.length),
  });

  // STEP 4: Process chunks in parallel
  const startTime = Date.now();

  const jobPromises = jobs.map((job, jobIndex) =>
    processJobChunks(
      s3,
      client,
      job,
      chunks.filter(c => job.assignedChunks.includes(c.chunkId)),
      workflowId,
      kvAdapter,
      jobIndex + 1
    )
  );

  // Wait for all jobs to complete
  const results = await Promise.allSettled(jobPromises);
  const duration = (Date.now() - startTime) / 1000;

  // STEP 5: Analyze results
  let successfulJobs = 0;
  let failedJobs = 0;
  let totalRecordsProcessed = 0;

  results.forEach((result, index) => {
    if (result.status === 'fulfilled') {
      successfulJobs++;
      totalRecordsProcessed += result.value.recordsProcessed;

      logger.info('Job completed', {
        jobNumber: index + 1,
        jobId: jobs[index].jobId,
        recordsProcessed: result.value.recordsProcessed,
        chunksProcessed: result.value.chunksProcessed,
      });
    } else {
      failedJobs++;
      logger.error('Job failed', result.reason, {
        jobNumber: index + 1,
        jobId: jobs[index].jobId,
      });
    }
  });

  // STEP 6: Update final state
  await stateService.updateSyncState(
    kvAdapter,
    [
      {
        fileName: SOURCE_FILE,
        lastModified: new Date().toISOString(),
        recordCount: totalRecordsProcessed,
      },
    ],
    workflowId
  );

  logger.info('Parallel ingestion complete', {
    totalChunks: chunks.length,
    parallelJobs: PARALLEL_JOBS,
    successfulJobs,
    failedJobs,
    totalRecordsProcessed,
    durationSeconds: duration,
    recordsPerSecond: Math.round(totalRecordsProcessed / duration),
  });

  return {
    success: failedJobs === 0,
    totalChunks: chunks.length,
    totalRecordsProcessed,
    successfulJobs,
    failedJobs,
    durationSeconds: duration,
    recordsPerSecond: Math.round(totalRecordsProcessed / duration),
  };
}

-
808
- /**
809
- * Process all chunks assigned to a job
810
- */
811
- async function processJobChunks(
812
- s3: S3DataSource,
813
- client: any,
814
- job: ParallelJob,
815
- chunks: ChunkMetadata[],
816
- workflowId: string,
817
- kv: VersoriKVAdapter,
818
- jobNumber: number
819
- ): Promise<{ recordsProcessed: number; chunksProcessed: number }> {
820
- logger.info(`Job ${jobNumber} starting`, {
821
- jobId: job.jobId,
822
- assignedChunks: chunks.length,
823
- });
824
-
825
- let recordsProcessed = 0;
826
- let chunksProcessed = 0;
827
-
828
- for (const chunk of chunks) {
829
- try {
830
- // Check if chunk already processed
831
- const chunkState = await kv.get(['chunk', workflowId, chunk.chunkId, 'status']);
832
-
833
- if (chunkState?.value === 'completed') {
834
- logger.info(`Job ${jobNumber}: Chunk already processed`, {
835
- chunkId: chunk.chunkId,
836
- });
837
- chunksProcessed++;
838
- continue;
839
- }
840
-
841
- // Mark chunk as processing
842
- await kv.set(['chunk', workflowId, chunk.chunkId, 'status'], 'processing');
843
-
844
- logger.info(`Job ${jobNumber}: Processing chunk`, {
845
- chunkId: chunk.chunkId,
846
- recordCount: chunk.recordCount,
847
- progress: `${chunksProcessed}/${chunks.length}`,
848
- });
849
-
850
- // Process chunk
851
- await processChunk(s3, client, job.jobId, chunk);
852
-
853
- // Mark chunk as completed
854
- await kv.set(['chunk', workflowId, chunk.chunkId], {
855
- ...chunk,
856
- status: 'completed',
857
- processedAt: new Date().toISOString(),
858
- } as ChunkMetadata);
859
-
860
- recordsProcessed += chunk.recordCount;
861
- chunksProcessed++;
862
-
863
- logger.info(`Job ${jobNumber}: Chunk completed`, {
864
- chunkId: chunk.chunkId,
865
- recordsProcessed,
866
- chunksProcessed,
867
- percentComplete: ((chunksProcessed / chunks.length) * 100).toFixed(1),
868
- });
869
- } catch (error) {
870
- logger.error(`Job ${jobNumber}: Chunk failed`, error as Error, {
871
- chunkId: chunk.chunkId,
872
- });
873
-
874
- // Mark chunk as failed (don't throw - continue with remaining chunks)
875
- await kv.set(['chunk', workflowId, chunk.chunkId], {
876
- ...chunk,
877
- status: 'failed',
878
- error: (error as Error).message,
879
- } as ChunkMetadata);
880
- }
881
- }
882
-
883
- logger.info(`Job ${jobNumber} completed`, {
884
- jobId: job.jobId,
885
- recordsProcessed,
886
- chunksProcessed,
887
- });
888
-
889
- return { recordsProcessed, chunksProcessed };
890
- }
891
- ```
892

**Progress Tracking:**

```typescript
// Real-time progress query
async function getIngestionProgress(
  workflowId: string,
  kv: VersoriKVAdapter
): Promise<{
  totalChunks: number;
  completedChunks: number;
  failedChunks: number;
  processingChunks: number;
  percentComplete: number;
}> {
  // This would query all chunk statuses from KV
  // Simplified example:
  const chunks = await getAllChunkMetadata(workflowId, kv);

  const completed = chunks.filter(c => c.status === 'completed').length;
  const failed = chunks.filter(c => c.status === 'failed').length;
  const processing = chunks.filter(c => c.status === 'processing').length;

  return {
    totalChunks: chunks.length,
    completedChunks: completed,
    failedChunks: failed,
    processingChunks: processing,
    percentComplete: (completed / chunks.length) * 100,
  };
}
```
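`getAllChunkMetadata` is referenced above but never defined. A minimal sketch, assuming the KV store supports a prefix scan (here called `list` — the method name and return shape are assumptions, not confirmed SDK API):

```typescript
type KVKey = (string | number)[];

interface ChunkMetadata {
  chunkId: string;
  s3Key: string;
  recordCount: number;
  status: 'pending' | 'processing' | 'completed' | 'failed';
}

// Assumed prefix-scan capability; adapt to whatever your KV adapter exposes
interface ListableKV {
  list(prefix: KVKey): Promise<Array<{ key: KVKey; value: unknown }>>;
}

async function getAllChunkMetadata(workflowId: string, kv: ListableKV): Promise<ChunkMetadata[]> {
  // Scan everything stored under ['chunk', workflowId, ...]
  const entries = await kv.list(['chunk', workflowId]);
  return entries
    // Keep only the metadata records (key length 3); skip nested 'status' keys
    .filter(e => e.key.length === 3)
    .map(e => e.value as ChunkMetadata);
}
```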

**Performance:**

```
File Size: 5GB (10M records)
Chunk Size: 100K records
Total Chunks: 100
Parallel Jobs: 5
Processing Time: ~15 minutes (4x speedup)
RAM Usage: ~500MB (5 chunks in parallel)
Throughput: ~11,111 records/second
```

---

## Pattern 4: Distributed Processing (Versori Workflows)

**Best for:** 10M+ records, enterprise scale, need maximum reliability and observability

**Strategy:**

1. Coordinator workflow splits file and creates scheduled tasks
2. Each worker workflow processes one chunk
3. Coordinator tracks completion via VersoriKV
4. Automatic retry on worker failure

### Coordinator Workflow

```typescript
import { fn, schedule } from '@versori/run';
import {
  createClient,
  S3DataSource,
  VersoriKVAdapter,
  createConsoleLogger,
  toStructuredLogger
} from '@fluentcommerce/fc-connect-sdk';

const logger = createConsoleLogger();

/**
 * Coordinator workflow - splits file and spawns workers
 */
export const coordinatorWorkflow = schedule('coordinator')
  .cron('0 2 * * *') // Run daily at 2 AM
  .then(
    fn('split-and-schedule', async ({ activation, connections, kv }) => {
      logger.info('Coordinator: Starting distributed ingestion');

      const s3 = new S3DataSource(
        {
          type: 'S3_CSV',
          connectionId: 'my-s3-3',
          name: 'Inventory Files S3 3',
          s3Config: {
            bucket: 'inventory-files',
            region: 'us-east-1',
            accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
            secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
          },
        },
        logger
      );

      const kvAdapter = new VersoriKVAdapter(kv);
      const workflowId = `distributed-${Date.now()}`;
      const SOURCE_FILE = 'inventory/enterprise-inventory.csv';
      const CHUNK_SIZE = 100000;

      // Split file into chunks
      const chunks = await splitFileIntoChunks(s3, SOURCE_FILE, CHUNK_SIZE, workflowId, kvAdapter);

      logger.info('Coordinator: File split complete', {
        totalChunks: chunks.length,
        workflowId,
      });

      // Store coordinator state
      await kvAdapter.set(['coordinator', workflowId], {
        workflowId,
        sourceFile: SOURCE_FILE,
        totalChunks: chunks.length,
        status: 'scheduled',
        createdAt: new Date().toISOString(),
      });

      // Schedule worker for each chunk
      for (const chunk of chunks) {
        // Trigger worker workflow (Versori will handle scheduling)
        await activation.triggerWorkflow('chunk-worker', {
          workflowId,
          chunkId: chunk.chunkId,
          chunkKey: chunk.s3Key,
          recordCount: chunk.recordCount,
        });

        logger.info('Coordinator: Worker scheduled', {
          chunkId: chunk.chunkId,
          workflowId,
        });
      }

      return {
        workflowId,
        totalChunks: chunks.length,
        message: `Scheduled ${chunks.length} worker workflows`,
      };
    })
  );

/**
 * Monitor workflow - checks completion status
 */
export const monitorWorkflow = schedule('monitor')
  .cron('*/5 * * * *') // Run every 5 minutes
  .then(
    fn('check-progress', async ({ kv }) => {
      const kvAdapter = new VersoriKVAdapter(kv);

      // Get all active coordinators
      const coordinators = await getActiveCoordinators(kvAdapter);

      for (const coordinator of coordinators) {
        const progress = await getIngestionProgress(coordinator.workflowId, kvAdapter);

        logger.info('Monitor: Progress update', {
          workflowId: coordinator.workflowId,
          ...progress,
        });

        // Check if complete
        if (progress.completedChunks + progress.failedChunks === progress.totalChunks) {
          // Mark coordinator as complete
          await kvAdapter.set(['coordinator', coordinator.workflowId], {
            ...coordinator,
            status: 'completed',
            completedAt: new Date().toISOString(),
            progress,
          });

          logger.info('Monitor: Ingestion complete', {
            workflowId: coordinator.workflowId,
            ...progress,
          });
        }
      }

      return { coordinatorsChecked: coordinators.length };
    })
  );
```
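`getActiveCoordinators` is likewise used above without a definition. A minimal sketch under the same assumed prefix-scan (`list`) capability, treating any coordinator not yet marked `completed` as active:

```typescript
interface CoordinatorState {
  workflowId: string;
  sourceFile: string;
  totalChunks: number;
  status: string;
  createdAt: string;
}

// Assumed prefix-scan capability (not confirmed SDK API)
interface ListableKV {
  list(prefix: (string | number)[]): Promise<Array<{ key: (string | number)[]; value: unknown }>>;
}

async function getActiveCoordinators(kv: ListableKV): Promise<CoordinatorState[]> {
  // Scan everything stored under ['coordinator', ...]
  const entries = await kv.list(['coordinator']);
  return entries
    .map(e => e.value as CoordinatorState)
    .filter(c => c.status !== 'completed');
}
```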

### Worker Workflow

```typescript
import { fn, webhook } from '@versori/run';
import {
  createClient,
  S3DataSource,
  CSVParserService,
  UniversalMapper,
  VersoriKVAdapter,
  createConsoleLogger,
  toStructuredLogger
} from '@fluentcommerce/fc-connect-sdk';

const logger = createConsoleLogger();

/**
 * Worker workflow - processes a single chunk
 */
export const chunkWorker = webhook('chunk-worker').then(
  // Take the full context object so it can be passed to createClient below
  fn('process-chunk', async (ctx: any) => {
    const { data, kv } = ctx;
    const { workflowId, chunkId, chunkKey, recordCount } = data;

    logger.info('Worker: Starting chunk processing', {
      workflowId,
      chunkId,
      recordCount,
    });

    const kvAdapter = new VersoriKVAdapter(kv);

    // Check if already processed
    const chunkState = await kvAdapter.get(['chunk', workflowId, chunkId, 'status']);

    if (chunkState?.value === 'completed') {
      logger.info('Worker: Chunk already processed', { chunkId });
      return { chunkId, status: 'skipped', message: 'Already processed' };
    }

    // Mark as processing
    await kvAdapter.set(['chunk', workflowId, chunkId, 'status'], 'processing');

    try {
      // Initialize services
      const client = await createClient(ctx);

      const s3 = new S3DataSource(
        {
          type: 'S3_CSV',
          connectionId: 'my-s3-4',
          name: 'Inventory Files S3 4',
          s3Config: {
            bucket: 'inventory-files',
            region: 'us-east-1',
            accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
            secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
          },
        },
        logger
      );

      const csvParser = new CSVParserService();
      const mapper = new UniversalMapper({
        fields: {
          skuRef: { source: 'sku', required: true },
          locationRef: { source: 'location_code', required: true },
          qty: { source: 'quantity', resolver: 'sdk.parseInt' },
          expectedOn: { source: 'expected_date', resolver: 'sdk.formatDate' },
        },
      });

      // Get or create job for this workflow
      let jobId = await kvAdapter.get(['job', workflowId, 'jobId']);

      if (!jobId?.value) {
        const job = await client.createJob({
          name: `distributed-ingestion-${workflowId}`,
          retailerId: client.getRetailerId() || ctx.connections?.fluent_commerce?.retailerId,
        });

        await kvAdapter.set(['job', workflowId, 'jobId'], job.id);
        jobId = { value: job.id };
      }

      // Download chunk
      const chunkContent = (await s3.downloadFile(chunkKey, {
        encoding: 'utf8',
      })) as string;

      // Parse chunk
      const records = await csvParser.parse(chunkContent);

      // Map records
      const entities: any[] = [];
      for (const record of records) {
        const mapped = await mapper.map(record);
        if (mapped.success && mapped.data) {
          entities.push(mapped.data);
        }
      }

      // Send batch
      await client.sendBatch(jobId.value as string, { entities });

      // Mark as completed
      await kvAdapter.set(['chunk', workflowId, chunkId], {
        chunkId,
        s3Key: chunkKey,
        recordCount: entities.length,
        status: 'completed',
        processedAt: new Date().toISOString(),
      });

      logger.info('Worker: Chunk completed', {
        workflowId,
        chunkId,
        recordCount: entities.length,
      });

      return {
        chunkId,
        status: 'completed',
        recordsProcessed: entities.length,
      };
    } catch (error) {
      logger.error('Worker: Chunk failed', error as Error, {
        workflowId,
        chunkId,
      });

      // Mark as failed
      await kvAdapter.set(['chunk', workflowId, chunkId], {
        chunkId,
        s3Key: chunkKey,
        recordCount,
        status: 'failed',
        error: (error as Error).message,
      });

      throw error;
    }
  })
);
```

**Performance:**

```
File Size: 10GB (20M records)
Chunk Size: 100K records
Total Chunks: 200
Worker Workflows: 200 (parallel)
Processing Time: ~10 minutes (Versori handles parallelism)
RAM Usage: ~50MB per worker
Throughput: ~33,333 records/second
```

---

## Memory Optimization Tips

### 1. Use Streaming APIs

```typescript
// ❌ WRONG - Loads the entire file and parses it in one pass
const fileContent = await fs.readFile('huge.csv', 'utf-8');
const records = await csvParser.parse(fileContent);

// ✅ CORRECT - Parses and yields records incrementally
for await (const record of csvParser.parseStreaming(fileContent, {}, 1)) {
  await processRecord(record);
}
```

### 2. Clear Batches After Processing

```typescript
let batch: any[] = [];
for await (const record of records) {
  batch.push(record);

  if (batch.length >= 1000) {
    await sendBatch(batch);
    batch = []; // ✅ Clear batch to free memory
  }
}
```

### 3. Monitor Memory Usage

```typescript
function logMemoryUsage() {
  const used = process.memoryUsage();
  console.log({
    heapUsed: Math.round(used.heapUsed / 1024 / 1024) + ' MB',
    heapTotal: Math.round(used.heapTotal / 1024 / 1024) + ' MB',
    rss: Math.round(used.rss / 1024 / 1024) + ' MB',
  });
}

// Log every 10K records
if (recordsProcessed % 10000 === 0) {
  logMemoryUsage();
}
```

### 4. Use Garbage Collection Hints

```typescript
// Force garbage collection (requires --expose-gc flag)
if (recordsProcessed % 100000 === 0 && global.gc) {
  global.gc();
  logger.info('Garbage collection triggered', { recordsProcessed });
}
```

---

## Performance Benchmarks

### Pattern Comparison (10M records, 5GB file)

| Pattern                   | Time   | RAM    | Throughput     | Complexity |
| ------------------------- | ------ | ------ | -------------- | ---------- |
| 1. Basic Streaming        | 90 min | 50MB   | 1,852 rec/sec  | Low        |
| 2. File Chunking          | 60 min | 100MB  | 2,778 rec/sec  | Medium     |
| 3. Parallel Processing    | 15 min | 500MB  | 11,111 rec/sec | High       |
| 4. Distributed Processing | 10 min | 50MB\* | 16,667 rec/sec | Very High  |

\*Per worker; total RAM = 50MB × worker count

### Optimization Impact

| Optimization              | Before  | After    | Improvement |
| ------------------------- | ------- | -------- | ----------- |
| Streaming vs Loading      | 5GB RAM | 50MB RAM | 100x        |
| Batching (1K vs 10K)      | 90 min  | 60 min   | 1.5x        |
| Parallel (1 vs 5 jobs)    | 60 min  | 15 min   | 4x          |
| Distributed (200 workers) | 15 min  | 10 min   | 1.5x        |
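The throughput column is simply record count divided by wall-clock time; two small helpers (illustrative only) make the arithmetic explicit:

```typescript
// Records per second, rounded to the nearest integer
function throughput(records: number, seconds: number): number {
  return Math.round(records / seconds);
}

// Projected wall-clock minutes for a workload at a given throughput
function projectedMinutes(records: number, recordsPerSecond: number): number {
  return Math.round(records / recordsPerSecond / 60);
}
```

For example, `throughput(10_000_000, 15 * 60)` reproduces the 11,111 rec/sec figure for Pattern 3, and `throughput(10_000_000, 10 * 60)` the 16,667 rec/sec figure for Pattern 4.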
---

## Common Issues & Solutions

### Issue 1: Out of Memory

**Symptoms:**

```
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
```

**Solutions:**

1. Switch to streaming pattern (Pattern 1)
2. Reduce batch size (e.g., 1000 → 500)
3. Increase Node.js heap: `node --max-old-space-size=4096`
4. Use file chunking (Pattern 2)

### Issue 2: Timeout on Large Files

**Symptoms:**

```
TimeoutError: Operation timed out after 300000ms
```

**Solutions:**

1. Increase timeout: `config.timeout = 600000` (10 min)
2. Split file into chunks (Pattern 2)
3. Use parallel processing (Pattern 3)

### Issue 3: Chunks Not Resuming

**Symptoms:**

- Re-processing same chunks on failure

**Solutions:**

```typescript
// Check chunk status before processing
const chunkState = await kv.get(['chunk', workflowId, chunkId, 'status']);
if (chunkState?.value === 'completed') {
  logger.info('Chunk already processed, skipping', { chunkId });
  continue;
}
```

### Issue 4: Progress Tracking Inconsistent

**Symptoms:**

- Progress percentage doesn't match reality

**Solutions:**

```typescript
// Always update chunk status atomically
const atomic = kv.atomic();
atomic.set(['chunk', workflowId, chunkId, 'status'], 'completed');
atomic.set(['chunk', workflowId, chunkId, 'processedAt'], new Date().toISOString());
await atomic.commit();
```

### Issue 5: Duplicate Processing

**Symptoms:**

- Same records sent multiple times

**Solutions:**

```typescript
// Use idempotency keys in Fluent batch payload
await client.sendBatch(jobId, {
  entities,
  meta: {
    chunkId: chunk.chunkId,
    workflowId,
    idempotencyKey: `${workflowId}-${chunk.chunkId}`,
  },
});
```

---

## Related Guides

- [Basic Ingestion Pattern](../standalone/s3-csv-batch-api.md) - For small files (<100K records)
- [Streaming Pattern](../../02-CORE-GUIDES/ingestion/ingestion-readme.md) - For medium files (100K-1M records)
- [Error Handling & Retry](./error-handling-retry.md) - Robust error handling strategies
- [Progress Tracking](../../02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-08-performance-optimization.md) - Real-time progress monitoring
- [State Management](../../02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-07-state-management.md) - VersoriKV patterns

---

## Summary

**Choose Your Pattern:**

- **Pattern 1 (Streaming)**: Simple, memory-efficient, suitable for 100K-1M records
- **Pattern 2 (Chunking)**: Checkpoint/resume, suitable for 1M-5M records
- **Pattern 3 (Parallel)**: High performance, suitable for 5M-10M records
- **Pattern 4 (Distributed)**: Enterprise scale, suitable for 10M+ records

**Key Takeaways:**

1. Always use streaming APIs for large files
2. Clear batches after processing to free memory
3. Use chunks + VersoriKV for checkpoint/resume
4. Parallel processing trades RAM for speed
5. Monitor memory usage throughout processing
6. Test with representative file sizes before production
# Pattern: Large File Processing & Chunking

**FC Connect SDK Use Case Guide**

> **SDK**: [@fluentcommerce/fc-connect-sdk](https://www.npmjs.com/package/@fluentcommerce/fc-connect-sdk)
> **Version**: Use latest - `npm install @fluentcommerce/fc-connect-sdk@latest`

**Context**: Enterprise-scale file ingestion with streaming, splitting, and parallel processing

**Type**: Advanced Pattern

**Complexity**: High

**Volume**: 500MB-5GB files, 1M-10M records

**Latency**: Batch processing (30-60 min for 10M records)

**Pattern**: Streaming + chunking + parallel Batch API

## When to Use This Pattern

Use this pattern when dealing with:

- **Large CSV files** (>500MB, >1M records)
- **Memory-constrained environments** (Lambda, containers with limited RAM)
- **Time-sensitive ingestion** (need parallel processing for speed)
- **Reliability requirements** (checkpoint/resume on failure)
- **Progress tracking** (real-time status updates)

**Volume Guidance:**

- **Small** (<1K records): Use basic ingestion pattern
- **Medium** (1K-100K records): Use streaming pattern (Pattern 1)
- **Large** (100K-1M records): Use file chunking pattern (Pattern 2)
- **Huge** (1M-10M records): Use parallel processing pattern (Pattern 3)
- **Enterprise** (10M+ records): Use distributed processing pattern (Pattern 4)

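The bands above can be encoded as a small selector; the thresholds come straight from this guidance, while the function itself is only an illustration:

```typescript
type Pattern = 'basic-ingestion' | 'streaming' | 'chunking' | 'parallel' | 'distributed';

// Map a record count to the recommended pattern from the volume guidance above
function choosePattern(recordCount: number): Pattern {
  if (recordCount < 1_000) return 'basic-ingestion';
  if (recordCount <= 100_000) return 'streaming';   // Pattern 1
  if (recordCount <= 1_000_000) return 'chunking';  // Pattern 2
  if (recordCount <= 10_000_000) return 'parallel'; // Pattern 3
  return 'distributed';                             // Pattern 4
}
```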
## Problem Statement

### Why Splitting is Needed

**Memory Constraints:**

```typescript
// ❌ WRONG - Loads entire 2GB file into memory
const csvContent = await fs.readFile('huge-inventory.csv', 'utf-8');
const records = await csvParser.parse(csvContent); // 💥 Out of memory
```

**Impact:**

- Lambda 512MB: Crashes on 500MB+ files
- Container 1GB: Struggles with 1GB+ files
- Node.js default heap (4GB): Fails on 5GB+ files

**Time Constraints:**

```typescript
// ❌ WRONG - Sequential processing takes 90+ minutes
for (const record of records) {
  await processRecord(record); // Too slow for 10M records
}
```
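Between fully sequential processing and an unbounded `Promise.all`, a bounded-concurrency helper keeps a fixed number of operations in flight — the same idea Pattern 3 applies with 5 parallel jobs. A generic sketch (not SDK-specific):

```typescript
// Process items with at most `limit` promises in flight at once;
// results are returned in the original item order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unclaimed index and processes it
  async function run(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i]);
    }
  }

  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, run));
  return results;
}
```

Because JavaScript is single-threaded, the shared `next` counter needs no locking: each runner claims an index synchronously before awaiting.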

**Reliability Requirements:**

```typescript
// ❌ WRONG - Network failure loses all progress
await processAllRecords(records); // If fails at record 5M, restart from 0
```

### Solution Overview

This guide demonstrates 4 progressive patterns:

1. **Basic Streaming** (~200 lines) - Process records as they arrive, memory-efficient
2. **File Chunking** (~300 lines) - Split large files into manageable chunks
3. **Parallel Processing** (~400 lines) - Process chunks concurrently with progress tracking
4. **Distributed Processing** (~300 lines) - Use Versori scheduled workflows for enterprise scale

## SDK Methods Used

```typescript
import {
  createClient,        // Client factory (auto-detects context)
  CSVParserService,    // Streaming CSV parser
  S3DataSource,        // S3 file operations
  UniversalMapper,     // Field mapping
  StateService,        // Progress tracking
  VersoriKVAdapter,    // Versori state management
  createConsoleLogger, // Structured logging
  toStructuredLogger
} from '@fluentcommerce/fc-connect-sdk';
```

---

## Pattern 1: Basic Streaming (Memory-Efficient)

**Best for:** 100K-1M records, single-threaded processing, memory-constrained environments

**Memory Usage:**

- ❌ Without streaming: 2GB file = 2GB+ RAM (file + parsed objects)
- ✅ With streaming: 2GB file = ~50MB RAM (processes records incrementally)

### Implementation

```typescript
import {
  createClient,
  CSVParserService,
  S3DataSource,
  UniversalMapper,
  createConsoleLogger,
  toStructuredLogger
} from '@fluentcommerce/fc-connect-sdk';

const logger = createConsoleLogger();

async function streamingIngestion(ctx: any) {
  logger.info('Starting streaming ingestion');

  // Create client (auto-detects Versori context)
  const client = await createClient(ctx);

  // Initialize S3 data source
  const s3 = new S3DataSource(
    {
      type: 'S3_CSV',
      connectionId: 'my-s3',
      name: 'Inventory Files S3',
      s3Config: {
        bucket: 'inventory-files',
        region: 'us-east-1',
        accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
        secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
      },
    },
    logger
  );

  // Define field mapping
  const mapper = new UniversalMapper({
    fields: {
      skuRef: { source: 'sku', required: true },
      locationRef: { source: 'location_code', required: true },
      qty: { source: 'quantity', resolver: 'sdk.parseInt' },
      expectedOn: { source: 'expected_date', resolver: 'sdk.formatDate' },
    },
  });

  // Create CSV parser with streaming enabled
  const csvParser = new CSVParserService();

  // Download file content (the string is held in memory; parseStreaming
  // below keeps parsing memory low by yielding records incrementally)
  logger.info('Downloading file from S3', {
    key: 'inventory/large-file.csv',
  });

  const fileContent = (await s3.downloadFile('inventory/large-file.csv', {
    encoding: 'utf8',
  })) as string;

  // Create job for batch ingestion
  const job = await client.createJob({
    name: 'streaming-inventory-ingestion',
    retailerId: client.getRetailerId() || ctx.connections?.fluent_commerce?.retailerId,
  });

  logger.info('Job created', { jobId: job.id });

  // Statistics tracking
  let recordsProcessed = 0;
  let batchCount = 0;
  let errors = 0;
  const BATCH_SIZE = 1000;
  let currentBatch: any[] = [];

  // Stream records with batching (memory-efficient)
  // Records are parsed incrementally, not all at once
  for await (const record of csvParser.parseStreaming(fileContent, {}, 1)) {
    try {
      // Map record
      const mapped = await mapper.map(record);

      if (mapped.success && mapped.data) {
        currentBatch.push(mapped.data);
        recordsProcessed++;

        // Send batch when full
        if (currentBatch.length >= BATCH_SIZE) {
          await client.sendBatch(job.id, {
            entities: currentBatch,
          });

          batchCount++;

          logger.info('Batch sent', {
            batchNumber: batchCount,
            recordsProcessed,
            currentBatchSize: currentBatch.length,
          });

          currentBatch = []; // Clear batch (frees memory)
        }
      } else {
        errors++;
        logger.warn('Record mapping failed', {
          record,
          errors: mapped.errors,
        });
      }
    } catch (error) {
      errors++;
      logger.error('Record processing failed', error as Error, { record });
    }

    // Progress logging every 10K records
    if (recordsProcessed % 10000 === 0) {
      logger.info('Progress update', {
        recordsProcessed,
        batchesSent: batchCount,
        errors,
        memoryUsage: process.memoryUsage().heapUsed / 1024 / 1024 + ' MB',
      });
    }
  }

  // Send remaining records
  if (currentBatch.length > 0) {
    await client.sendBatch(job.id, {
      entities: currentBatch,
    });
    batchCount++;
  }

  logger.info('Streaming ingestion complete', {
    totalRecords: recordsProcessed,
    batchesSent: batchCount,
    errors,
    jobId: job.id,
  });

  return {
    success: true,
    jobId: job.id,
    recordsProcessed,
    batchesSent: batchCount,
    errors,
  };
}
```
+
262
+ **Memory Profile:**
263
+
264
+ ```
265
+ File Size: 2GB (5M records)
266
+ RAM Usage: ~50MB peak (1000 record batches)
267
+ Processing Time: ~45 minutes (sequential)
268
+ ```
269
+
270
+ ---
271
+
272
+ ## Pattern 2: File Chunking (Split & Track)
273
+
274
+ **Best for:** 1M-5M records, need checkpoint/resume, want progress visibility
275
+
276
+ **Strategy:**
277
+
278
+ 1. Split large file into 100K record chunks
279
+ 2. Write chunks to temp S3 locations
280
+ 3. Track chunk metadata in VersoriKV
281
+ 4. Process chunks sequentially (can resume on failure)
282
+
283
+ ### Implementation
284
+
285
+ ```typescript
286
+ import {
287
+ createClient,
288
+ CSVParserService,
289
+ S3DataSource,
290
+ UniversalMapper,
291
+ StateService,
292
+ VersoriKVAdapter,
293
+ createConsoleLogger,
294
+ toStructuredLogger
295
+ } from '@fluentcommerce/fc-connect-sdk';
296
+
297
+ const logger = createConsoleLogger();
298
+
299
+ interface ChunkMetadata {
300
+ chunkId: string;
301
+ startRecord: number;
302
+ endRecord: number;
303
+ s3Key: string;
304
+ recordCount: number;
305
+ status: 'pending' | 'processing' | 'completed' | 'failed';
306
+ processedAt?: string;
307
+ error?: string;
308
+ }
309
+
310
+ async function chunkedIngestion(ctx: any) {
311
+ logger.info('Starting chunked ingestion');
312
+
313
+ // Initialize services
314
+ const client = await createClient(ctx);
315
+
316
+ const s3 = new S3DataSource(
317
+ {
318
+ type: 'S3_CSV',
319
+ connectionId: 'my-s3-chunked',
320
+ name: 'Inventory Files S3 Chunked',
321
+ s3Config: {
322
+ bucket: 'inventory-files',
323
+ region: 'us-east-1',
324
+ accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
325
+ secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
326
+ },
327
+ },
328
+ logger
329
+ );
330
+
331
+ // Initialize state management
332
+ const kv = context.openKv();
333
+ const kvAdapter = new VersoriKVAdapter(kv);
334
+ const stateService = new StateService(logger);
335
+
336
+ const SOURCE_FILE = 'inventory/huge-inventory.csv';
337
+ const CHUNK_SIZE = 100000; // 100K records per chunk
338
+ const workflowId = 'chunked-ingestion';
339
+
340
+ // STEP 1: Check if chunking is already in progress
341
+ const existingState = await stateService.getSyncState(kvAdapter, workflowId);
342
+
343
+ if (existingState.isInitialized && existingState.lastSyncResult === 'partial') {
344
+ logger.info('Resuming from previous run', {
345
+ lastProcessedFile: existingState.lastProcessedFile,
346
+ lastProcessedCount: existingState.lastProcessedCount,
347
+ });
348
+ }
349
+
350
+ // STEP 2: Split file into chunks
351
+ logger.info('Splitting file into chunks', {
352
+ sourceFile: SOURCE_FILE,
353
+ chunkSize: CHUNK_SIZE,
354
+ });
355
+
356
+ const chunks = await splitFileIntoChunks(s3, SOURCE_FILE, CHUNK_SIZE, workflowId, kvAdapter);
357
+
358
+ logger.info('File split complete', {
359
+ totalChunks: chunks.length,
360
+ totalRecords: chunks.reduce((sum, c) => sum + c.recordCount, 0),
361
+ });
362
+
363
+ // STEP 3: Create job for ingestion
364
+ const job = await client.createJob({
365
+ name: `chunked-inventory-ingestion-${Date.now()}`,
366
+ retailerId: client.getRetailerId() || ctx.connections?.fluent_commerce?.retailerId,
367
+ });
368
+
369
+ logger.info('Job created', { jobId: job.id });
370
+
371
+ // STEP 4: Process each chunk sequentially
372
+ let successCount = 0;
373
+ let failureCount = 0;
374
+
375
+ for (const chunk of chunks) {
376
+ try {
377
+ // Skip if already processed
378
+ const chunkState = await kvAdapter.get(['chunk', workflowId, chunk.chunkId, 'status']);
379
+
380
+ if (chunkState?.value === 'completed') {
381
+ logger.info('Chunk already processed, skipping', {
382
+ chunkId: chunk.chunkId,
383
+ });
384
+ successCount++;
385
+ continue;
386
+ }
387
+
388
+ // Mark chunk as processing
389
+ await kvAdapter.set(['chunk', workflowId, chunk.chunkId, 'status'], 'processing');
390
+
391
+ logger.info('Processing chunk', {
392
+ chunkId: chunk.chunkId,
393
+ recordCount: chunk.recordCount,
394
+ progress: `${successCount + failureCount}/${chunks.length}`,
395
+ });
396
+
397
+ // Process chunk
398
+ await processChunk(s3, client, job.id, chunk);
399
+
400
+ // Mark chunk as completed
401
+ await kvAdapter.set(['chunk', workflowId, chunk.chunkId], {
402
+ ...chunk,
403
+ status: 'completed',
404
+ processedAt: new Date().toISOString(),
405
+ } as ChunkMetadata);
406
+
407
+ successCount++;
408
+
409
+ logger.info('Chunk completed', {
410
+ chunkId: chunk.chunkId,
411
+ successCount,
412
+ failureCount,
413
+ percentComplete: (((successCount + failureCount) / chunks.length) * 100).toFixed(1),
414
+ });
415
+ } catch (error) {
416
+ failureCount++;
417
+ logger.error('Chunk processing failed', error as Error, {
418
+ chunkId: chunk.chunkId,
419
+ });
420
+
421
+ // Mark chunk as failed
422
+ await kvAdapter.set(['chunk', workflowId, chunk.chunkId], {
423
+ ...chunk,
424
+ status: 'failed',
425
+ error: (error as Error).message,
426
+ } as ChunkMetadata);
427
+ }
428
+ }
429
+
430
+ // STEP 5: Update final state
431
+ await stateService.updateSyncState(
432
+ kvAdapter,
433
+ [
434
+ {
435
+ fileName: SOURCE_FILE,
436
+ lastModified: new Date().toISOString(),
437
+ recordCount: chunks.reduce((sum, c) => sum + c.recordCount, 0),
438
+ },
439
+ ],
440
+ workflowId
441
+ );
442
+
443
+ logger.info('Chunked ingestion complete', {
444
+ totalChunks: chunks.length,
445
+ successCount,
446
+ failureCount,
447
+ jobId: job.id,
448
+ });
449
+
450
+ return {
451
+ success: failureCount === 0,
452
+ jobId: job.id,
453
+ chunksProcessed: successCount,
454
+ chunksFailed: failureCount,
455
+ totalChunks: chunks.length,
456
+ };
457
+ }
458
+
459
+ /**
460
+ * Split file into chunks and upload to S3
461
+ */
462
+ async function splitFileIntoChunks(
463
+ s3: S3DataSource,
464
+ sourceKey: string,
465
+ chunkSize: number,
466
+ workflowId: string,
467
+ kv: VersoriKVAdapter
468
+ ): Promise<ChunkMetadata[]> {
469
+ const csvParser = new CSVParserService();
470
+ const chunks: ChunkMetadata[] = [];
471
+
472
+ // Download source file
473
+ const fileContent = (await s3.downloadFile(sourceKey, {
474
+ encoding: 'utf8',
475
+ })) as string;
476
+
477
+ let currentChunk: any[] = [];
478
+ let chunkNumber = 0;
479
+ let recordNumber = 0;
480
+
481
+ // Stream through file and create chunks
482
+ for await (const record of csvParser.parseStreaming(fileContent, {}, 1)) {
483
+ currentChunk.push(record);
484
+ recordNumber++;
485
+
486
+ // Create chunk when size reached
487
+ if (currentChunk.length >= chunkSize) {
488
+ const chunkId = `chunk-${chunkNumber.toString().padStart(5, '0')}`;
489
+ const chunkKey = `temp/${workflowId}/${chunkId}.csv`;
490
+
491
+ // Convert chunk to CSV
492
+ const chunkCSV = csvParser.stringify(currentChunk, { headers: true });
493
+
494
+ // Upload chunk to S3
495
+ await s3.uploadFile(chunkKey, chunkCSV, {
496
+ contentType: 'text/csv',
497
+ });
498
+
499
+ // Create chunk metadata
500
+ const metadata: ChunkMetadata = {
501
+ chunkId,
502
+ startRecord: recordNumber - currentChunk.length,
503
+ endRecord: recordNumber - 1,
504
+ s3Key: chunkKey,
505
+ recordCount: currentChunk.length,
506
+ status: 'pending',
507
+ };
508
+
509
+ chunks.push(metadata);
510
+
511
+ // Store chunk metadata in KV
512
+ await kv.set(['chunk', workflowId, chunkId], metadata);
513
+
514
+ logger.info('Chunk created', {
515
+ chunkId,
516
+ recordCount: currentChunk.length,
517
+ s3Key: chunkKey,
518
+ });
519
+
520
+ // Clear chunk (free memory)
521
+ currentChunk = [];
522
+ chunkNumber++;
523
+ }
524
+ }
525
+
526
+ // Handle remaining records
527
+ if (currentChunk.length > 0) {
528
+ const chunkId = `chunk-${chunkNumber.toString().padStart(5, '0')}`;
529
+ const chunkKey = `temp/${workflowId}/${chunkId}.csv`;
530
+
531
+ const chunkCSV = csvParser.stringify(currentChunk, { headers: true });
532
+ await s3.uploadFile(chunkKey, chunkCSV, { contentType: 'text/csv' });
533
+
534
+ const metadata: ChunkMetadata = {
535
+ chunkId,
536
+ startRecord: recordNumber - currentChunk.length,
537
+ endRecord: recordNumber - 1,
538
+ s3Key: chunkKey,
539
+ recordCount: currentChunk.length,
540
+ status: 'pending',
541
+ };
542
+
543
+ chunks.push(metadata);
544
+ await kv.set(['chunk', workflowId, chunkId], metadata);
545
+ }
546
+
547
+ return chunks;
548
+ }
549
+
550
+ /**
551
+ * Process a single chunk
552
+ */
553
+ async function processChunk(
554
+ s3: S3DataSource,
555
+ client: any,
556
+ jobId: string,
557
+ chunk: ChunkMetadata
558
+ ): Promise<void> {
559
+ const csvParser = new CSVParserService();
560
+ const mapper = new UniversalMapper({
561
+ fields: {
562
+ skuRef: { source: 'sku', required: true },
563
+ locationRef: { source: 'location_code', required: true },
564
+ qty: { source: 'quantity', resolver: 'sdk.parseInt' },
565
+ expectedOn: { source: 'expected_date', resolver: 'sdk.formatDate' },
566
+ },
567
+ });
568
+
569
+ // Download chunk
570
+ const chunkContent = (await s3.downloadFile(chunk.s3Key, {
571
+ encoding: 'utf8',
572
+ })) as string;
573
+
574
+ // Parse chunk
575
+ const records = await csvParser.parse(chunkContent);
576
+
577
+ // Map records
578
+ const entities: any[] = [];
579
+ for (const record of records) {
580
+ const mapped = await mapper.map(record);
581
+ if (mapped.success && mapped.data) {
582
+ entities.push(mapped.data);
583
+ }
584
+ }
585
+
586
+ // Send batch
587
+ await client.sendBatch(jobId, { entities });
588
+
589
+ logger.info('Chunk batch sent', {
590
+ chunkId: chunk.chunkId,
591
+ entityCount: entities.length,
592
+ });
593
+ }
594
+ ```
595
+
596
+ **VersoriKV Schema:**
597
+
598
+ ```typescript
599
+ // Chunk metadata
600
+ ['chunk', workflowId, chunkId] => ChunkMetadata
601
+
602
+ // Chunk status
603
+ ['chunk', workflowId, chunkId, 'status'] => 'pending' | 'processing' | 'completed' | 'failed'
604
+
605
+ // Workflow state
606
+ ['state', workflowId, 'sync'] => SyncState
607
+ ```
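
The key layout above can be captured in small helpers so call sites don't hand-build tuples. A hypothetical sketch (the tuple shapes match the schema above, but these helper functions are not part of the SDK):

```typescript
// Hypothetical helpers mirroring the VersoriKV key schema; not part of fc-connect-sdk.
type ChunkStatus = 'pending' | 'processing' | 'completed' | 'failed';

// ['chunk', workflowId, chunkId] => ChunkMetadata
const chunkKey = (workflowId: string, chunkId: string): string[] =>
  ['chunk', workflowId, chunkId];

// ['chunk', workflowId, chunkId, 'status'] => ChunkStatus
const chunkStatusKey = (workflowId: string, chunkId: string): string[] =>
  ['chunk', workflowId, chunkId, 'status'];

// ['state', workflowId, 'sync'] => SyncState
const syncStateKey = (workflowId: string): string[] =>
  ['state', workflowId, 'sync'];

const status: ChunkStatus = 'processing';
console.log(chunkKey('chunked-ingestion', 'chunk-00001'));
console.log(chunkStatusKey('chunked-ingestion', 'chunk-00001'), status);
console.log(syncStateKey('chunked-ingestion'));
```

With these in place, `await kvAdapter.set(chunkStatusKey(workflowId, chunk.chunkId), 'processing')` replaces the inline tuples and keeps the key shape in one place.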

**Performance:**

```
File Size: 5GB (10M records)
Chunk Size: 100K records
Total Chunks: 100
Processing Time: ~60 minutes (sequential)
RAM Usage: ~100MB (processes one chunk at a time)
```

---

## Pattern 3: Parallel Processing (High Performance)

**Best for:** 5M-10M records, time-sensitive ingestion, need speed with reliability

**Strategy:**

1. Split file into chunks (same as Pattern 2)
2. Spawn 5 parallel Batch API jobs
3. Process chunks concurrently
4. Track progress in VersoriKV
5. Resume on failure

### Implementation

```typescript
import {
  createClient,
  CSVParserService,
  S3DataSource,
  UniversalMapper,
  StateService,
  VersoriKVAdapter,
  createConsoleLogger,
} from '@fluentcommerce/fc-connect-sdk';

const logger = createConsoleLogger();

interface ParallelJob {
  jobId: string;
  assignedChunks: string[];
  status: 'pending' | 'processing' | 'completed' | 'failed';
  recordsProcessed: number;
  startedAt?: string;
  completedAt?: string;
}

async function parallelIngestion(ctx: any) {
  logger.info('Starting parallel ingestion');

  // Initialize services
  const client = await createClient(ctx);

  const s3 = new S3DataSource(
    {
      type: 'S3_CSV',
      connectionId: 'my-s3-parallel',
      name: 'Inventory Files S3 Parallel',
      s3Config: {
        bucket: 'inventory-files',
        region: 'us-east-1',
        accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
        secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
      },
    },
    logger
  );

  const kv = ctx.openKv();
  const kvAdapter = new VersoriKVAdapter(kv);
  const stateService = new StateService(logger);

  const SOURCE_FILE = 'inventory/huge-inventory.csv';
  const CHUNK_SIZE = 100000; // 100K records per chunk
  const PARALLEL_JOBS = 5; // Process 5 chunks concurrently
  const workflowId = 'parallel-ingestion';

  // STEP 1: Split file into chunks (reuse from Pattern 2)
  const chunks = await splitFileIntoChunks(s3, SOURCE_FILE, CHUNK_SIZE, workflowId, kvAdapter);

  logger.info('File split complete', {
    totalChunks: chunks.length,
    totalRecords: chunks.reduce((sum, c) => sum + c.recordCount, 0),
  });

  // STEP 2: Create multiple jobs for parallel processing
  const jobs: ParallelJob[] = [];

  for (let i = 0; i < PARALLEL_JOBS; i++) {
    const job = await client.createJob({
      name: `parallel-inventory-ingestion-job-${i + 1}`,
      retailerId: client.getRetailerId() || ctx.connections?.fluent_commerce?.retailerId,
    });

    jobs.push({
      jobId: job.id,
      assignedChunks: [],
      status: 'pending',
      recordsProcessed: 0,
    });

    logger.info('Parallel job created', {
      jobNumber: i + 1,
      jobId: job.id,
    });
  }

  // STEP 3: Distribute chunks across jobs (round-robin)
  chunks.forEach((chunk, index) => {
    const jobIndex = index % PARALLEL_JOBS;
    jobs[jobIndex].assignedChunks.push(chunk.chunkId);
  });

  logger.info('Chunks distributed', {
    totalChunks: chunks.length,
    jobCount: PARALLEL_JOBS,
    chunksPerJob: jobs.map(j => j.assignedChunks.length),
  });

  // STEP 4: Process chunks in parallel
  const startTime = Date.now();

  const jobPromises = jobs.map((job, jobIndex) =>
    processJobChunks(
      s3,
      client,
      job,
      chunks.filter(c => job.assignedChunks.includes(c.chunkId)),
      workflowId,
      kvAdapter,
      jobIndex + 1
    )
  );

  // Wait for all jobs to complete
  const results = await Promise.allSettled(jobPromises);
  const duration = (Date.now() - startTime) / 1000;

  // STEP 5: Analyze results
  let successfulJobs = 0;
  let failedJobs = 0;
  let totalRecordsProcessed = 0;

  results.forEach((result, index) => {
    if (result.status === 'fulfilled') {
      successfulJobs++;
      totalRecordsProcessed += result.value.recordsProcessed;

      logger.info('Job completed', {
        jobNumber: index + 1,
        jobId: jobs[index].jobId,
        recordsProcessed: result.value.recordsProcessed,
        chunksProcessed: result.value.chunksProcessed,
      });
    } else {
      failedJobs++;
      logger.error('Job failed', result.reason, {
        jobNumber: index + 1,
        jobId: jobs[index].jobId,
      });
    }
  });

  // STEP 6: Update final state
  await stateService.updateSyncState(
    kvAdapter,
    [
      {
        fileName: SOURCE_FILE,
        lastModified: new Date().toISOString(),
        recordCount: totalRecordsProcessed,
      },
    ],
    workflowId
  );

  logger.info('Parallel ingestion complete', {
    totalChunks: chunks.length,
    parallelJobs: PARALLEL_JOBS,
    successfulJobs,
    failedJobs,
    totalRecordsProcessed,
    durationSeconds: duration,
    recordsPerSecond: Math.round(totalRecordsProcessed / duration),
  });

  return {
    success: failedJobs === 0,
    totalChunks: chunks.length,
    totalRecordsProcessed,
    successfulJobs,
    failedJobs,
    durationSeconds: duration,
    recordsPerSecond: Math.round(totalRecordsProcessed / duration),
  };
}

/**
 * Process all chunks assigned to a job
 */
async function processJobChunks(
  s3: S3DataSource,
  client: any,
  job: ParallelJob,
  chunks: ChunkMetadata[],
  workflowId: string,
  kv: VersoriKVAdapter,
  jobNumber: number
): Promise<{ recordsProcessed: number; chunksProcessed: number }> {
  logger.info(`Job ${jobNumber} starting`, {
    jobId: job.jobId,
    assignedChunks: chunks.length,
  });

  let recordsProcessed = 0;
  let chunksProcessed = 0;

  for (const chunk of chunks) {
    try {
      // Check if chunk already processed
      const chunkState = await kv.get(['chunk', workflowId, chunk.chunkId, 'status']);

      if (chunkState?.value === 'completed') {
        logger.info(`Job ${jobNumber}: Chunk already processed`, {
          chunkId: chunk.chunkId,
        });
        chunksProcessed++;
        continue;
      }

      // Mark chunk as processing
      await kv.set(['chunk', workflowId, chunk.chunkId, 'status'], 'processing');

      logger.info(`Job ${jobNumber}: Processing chunk`, {
        chunkId: chunk.chunkId,
        recordCount: chunk.recordCount,
        progress: `${chunksProcessed}/${chunks.length}`,
      });

      // Process chunk
      await processChunk(s3, client, job.jobId, chunk);

      // Mark chunk as completed
      await kv.set(['chunk', workflowId, chunk.chunkId], {
        ...chunk,
        status: 'completed',
        processedAt: new Date().toISOString(),
      } as ChunkMetadata);

      recordsProcessed += chunk.recordCount;
      chunksProcessed++;

      logger.info(`Job ${jobNumber}: Chunk completed`, {
        chunkId: chunk.chunkId,
        recordsProcessed,
        chunksProcessed,
        percentComplete: ((chunksProcessed / chunks.length) * 100).toFixed(1),
      });
    } catch (error) {
      logger.error(`Job ${jobNumber}: Chunk failed`, error as Error, {
        chunkId: chunk.chunkId,
      });

      // Mark chunk as failed (don't throw - continue with remaining chunks)
      await kv.set(['chunk', workflowId, chunk.chunkId], {
        ...chunk,
        status: 'failed',
        error: (error as Error).message,
      } as ChunkMetadata);
    }
  }

  logger.info(`Job ${jobNumber} completed`, {
    jobId: job.jobId,
    recordsProcessed,
    chunksProcessed,
  });

  return { recordsProcessed, chunksProcessed };
}
```

**Progress Tracking:**

```typescript
// Real-time progress query
async function getIngestionProgress(
  workflowId: string,
  kv: VersoriKVAdapter
): Promise<{
  totalChunks: number;
  completedChunks: number;
  failedChunks: number;
  processingChunks: number;
  percentComplete: number;
}> {
  // This would query all chunk statuses from KV
  // Simplified example:
  const chunks = await getAllChunkMetadata(workflowId, kv);

  const completed = chunks.filter(c => c.status === 'completed').length;
  const failed = chunks.filter(c => c.status === 'failed').length;
  const processing = chunks.filter(c => c.status === 'processing').length;

  return {
    totalChunks: chunks.length,
    completedChunks: completed,
    failedChunks: failed,
    processingChunks: processing,
    percentComplete: (completed / chunks.length) * 100,
  };
}
```
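
`getAllChunkMetadata` is left undefined above. A minimal sketch of one way to implement it, assuming the KV store exposes a Deno-KV-style `list({ prefix })` async iterator (verify the actual method name and signature against your `VersoriKVAdapter` version):

```typescript
// Assumed KV surface: a prefix-listing iterator. This interface and the
// helper below are illustrative, not part of fc-connect-sdk.
type ChunkStatus = 'pending' | 'processing' | 'completed' | 'failed';

interface ChunkMetadata {
  chunkId: string;
  status: ChunkStatus;
  recordCount: number;
}

interface KvEntry {
  key: (string | number)[];
  value: unknown;
}

interface PrefixListableKv {
  list(options: { prefix: (string | number)[] }): AsyncIterable<KvEntry>;
}

async function getAllChunkMetadata(
  workflowId: string,
  kv: PrefixListableKv
): Promise<ChunkMetadata[]> {
  const chunks: ChunkMetadata[] = [];
  // Metadata lives under 3-element keys ['chunk', workflowId, chunkId];
  // skip the 4-element ['chunk', workflowId, chunkId, 'status'] entries.
  for await (const entry of kv.list({ prefix: ['chunk', workflowId] })) {
    if (entry.key.length === 3) {
      chunks.push(entry.value as ChunkMetadata);
    }
  }
  return chunks;
}
```

If the adapter has no prefix listing, an alternative is to store a single index entry (e.g. an array of chunk IDs under one key) when splitting, and fetch each chunk's metadata by ID.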

**Performance:**

```
File Size: 5GB (10M records)
Chunk Size: 100K records
Total Chunks: 100
Parallel Jobs: 5
Processing Time: ~15 minutes (4x speedup)
RAM Usage: ~500MB (5 chunks in parallel)
Throughput: ~11,111 records/second
```

---

## Pattern 4: Distributed Processing (Versori Workflows)

**Best for:** 10M+ records, enterprise scale, need maximum reliability and observability

**Strategy:**

1. Coordinator workflow splits file and creates scheduled tasks
2. Each worker workflow processes one chunk
3. Coordinator tracks completion via VersoriKV
4. Automatic retry on worker failure

### Coordinator Workflow

```typescript
import { fn, schedule } from '@versori/run';
import {
  createClient,
  S3DataSource,
  VersoriKVAdapter,
  createConsoleLogger,
} from '@fluentcommerce/fc-connect-sdk';

const logger = createConsoleLogger();

/**
 * Coordinator workflow - splits file and spawns workers
 */
export const coordinatorWorkflow = schedule('coordinator')
  .cron('0 2 * * *') // Run daily at 2 AM
  .then(
    fn('split-and-schedule', async ({ activation, connections, kv }) => {
      logger.info('Coordinator: Starting distributed ingestion');

      const s3 = new S3DataSource(
        {
          type: 'S3_CSV',
          connectionId: 'my-s3-3',
          name: 'Inventory Files S3 3',
          s3Config: {
            bucket: 'inventory-files',
            region: 'us-east-1',
            accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
            secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
          },
        },
        logger
      );

      const kvAdapter = new VersoriKVAdapter(kv);
      const workflowId = `distributed-${Date.now()}`;
      const SOURCE_FILE = 'inventory/enterprise-inventory.csv';
      const CHUNK_SIZE = 100000;

      // Split file into chunks
      const chunks = await splitFileIntoChunks(s3, SOURCE_FILE, CHUNK_SIZE, workflowId, kvAdapter);

      logger.info('Coordinator: File split complete', {
        totalChunks: chunks.length,
        workflowId,
      });

      // Store coordinator state
      await kvAdapter.set(['coordinator', workflowId], {
        workflowId,
        sourceFile: SOURCE_FILE,
        totalChunks: chunks.length,
        status: 'scheduled',
        createdAt: new Date().toISOString(),
      });

      // Schedule worker for each chunk
      for (const chunk of chunks) {
        // Trigger worker workflow (Versori will handle scheduling)
        await activation.triggerWorkflow('chunk-worker', {
          workflowId,
          chunkId: chunk.chunkId,
          chunkKey: chunk.s3Key,
          recordCount: chunk.recordCount,
        });

        logger.info('Coordinator: Worker scheduled', {
          chunkId: chunk.chunkId,
          workflowId,
        });
      }

      return {
        workflowId,
        totalChunks: chunks.length,
        message: `Scheduled ${chunks.length} worker workflows`,
      };
    })
  );

/**
 * Monitor workflow - checks completion status
 */
export const monitorWorkflow = schedule('monitor')
  .cron('*/5 * * * *') // Run every 5 minutes
  .then(
    fn('check-progress', async ({ kv }) => {
      const kvAdapter = new VersoriKVAdapter(kv);

      // Get all active coordinators
      const coordinators = await getActiveCoordinators(kvAdapter);

      for (const coordinator of coordinators) {
        const progress = await getIngestionProgress(coordinator.workflowId, kvAdapter);

        logger.info('Monitor: Progress update', {
          workflowId: coordinator.workflowId,
          ...progress,
        });

        // Check if complete
        if (progress.completedChunks + progress.failedChunks === progress.totalChunks) {
          // Mark coordinator as complete
          await kvAdapter.set(['coordinator', coordinator.workflowId], {
            ...coordinator,
            status: 'completed',
            completedAt: new Date().toISOString(),
            progress,
          });

          logger.info('Monitor: Ingestion complete', {
            workflowId: coordinator.workflowId,
            ...progress,
          });
        }
      }

      return { coordinatorsChecked: coordinators.length };
    })
  );
```

### Worker Workflow

```typescript
import { fn, webhook } from '@versori/run';
import {
  createClient,
  S3DataSource,
  CSVParserService,
  UniversalMapper,
  VersoriKVAdapter,
  createConsoleLogger,
} from '@fluentcommerce/fc-connect-sdk';

const logger = createConsoleLogger();

/**
 * Worker workflow - processes a single chunk
 */
export const chunkWorker = webhook('chunk-worker').then(
  fn('process-chunk', async (ctx) => {
    const { data, kv } = ctx;
    const { workflowId, chunkId, chunkKey, recordCount } = data;

    logger.info('Worker: Starting chunk processing', {
      workflowId,
      chunkId,
      recordCount,
    });

    const kvAdapter = new VersoriKVAdapter(kv);

    // Check if already processed
    const chunkState = await kvAdapter.get(['chunk', workflowId, chunkId, 'status']);

    if (chunkState?.value === 'completed') {
      logger.info('Worker: Chunk already processed', { chunkId });
      return { chunkId, status: 'skipped', message: 'Already processed' };
    }

    // Mark as processing
    await kvAdapter.set(['chunk', workflowId, chunkId, 'status'], 'processing');

    try {
      // Initialize services
      const client = await createClient(ctx);

      const s3 = new S3DataSource(
        {
          type: 'S3_CSV',
          connectionId: 'my-s3-4',
          name: 'Inventory Files S3 4',
          s3Config: {
            bucket: 'inventory-files',
            region: 'us-east-1',
            accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
            secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
          },
        },
        logger
      );

      const csvParser = new CSVParserService();
      const mapper = new UniversalMapper({
        fields: {
          skuRef: { source: 'sku', required: true },
          locationRef: { source: 'location_code', required: true },
          qty: { source: 'quantity', resolver: 'sdk.parseInt' },
          expectedOn: { source: 'expected_date', resolver: 'sdk.formatDate' },
        },
      });

      // Get or create job for this workflow
      let jobId = await kvAdapter.get(['job', workflowId, 'jobId']);

      if (!jobId?.value) {
        const job = await client.createJob({
          name: `distributed-ingestion-${workflowId}`,
          retailerId: client.getRetailerId() || ctx.connections?.fluent_commerce?.retailerId,
        });

        await kvAdapter.set(['job', workflowId, 'jobId'], job.id);
        jobId = { value: job.id };
      }

      // Download chunk
      const chunkContent = (await s3.downloadFile(chunkKey, {
        encoding: 'utf8',
      })) as string;

      // Parse chunk
      const records = await csvParser.parse(chunkContent);

      // Map records
      const entities: any[] = [];
      for (const record of records) {
        const mapped = await mapper.map(record);
        if (mapped.success && mapped.data) {
          entities.push(mapped.data);
        }
      }

      // Send batch
      await client.sendBatch(jobId.value as string, { entities });

      // Mark as completed
      await kvAdapter.set(['chunk', workflowId, chunkId], {
        chunkId,
        s3Key: chunkKey,
        recordCount: entities.length,
        status: 'completed',
        processedAt: new Date().toISOString(),
      });

      logger.info('Worker: Chunk completed', {
        workflowId,
        chunkId,
        recordCount: entities.length,
      });

      return {
        chunkId,
        status: 'completed',
        recordsProcessed: entities.length,
      };
    } catch (error) {
      logger.error('Worker: Chunk failed', error as Error, {
        workflowId,
        chunkId,
      });

      // Mark as failed
      await kvAdapter.set(['chunk', workflowId, chunkId], {
        chunkId,
        s3Key: chunkKey,
        recordCount,
        status: 'failed',
        error: (error as Error).message,
      });

      throw error;
    }
  })
);
```

**Performance:**

```
File Size: 10GB (20M records)
Chunk Size: 100K records
Total Chunks: 200
Worker Workflows: 200 (parallel)
Processing Time: ~10 minutes (Versori handles parallelism)
RAM Usage: ~50MB per worker
Throughput: ~33,333 records/second
```

---

## Memory Optimization Tips

### 1. Use Streaming APIs

```typescript
// ❌ WRONG - Loads entire file into memory
const fileContent = await fs.readFile('huge.csv', 'utf-8');
const records = await csvParser.parse(fileContent);

// ✅ CORRECT - Streams records incrementally
for await (const record of csvParser.parseStreaming(fileContent, {}, 1)) {
  await processRecord(record);
}
```

### 2. Clear Batches After Processing

```typescript
let batch: any[] = [];
for await (const record of records) {
  batch.push(record);

  if (batch.length >= 1000) {
    await sendBatch(batch);
    batch = []; // ✅ Clear batch to free memory
  }
}
```

### 3. Monitor Memory Usage

```typescript
function logMemoryUsage() {
  const used = process.memoryUsage();
  console.log({
    heapUsed: Math.round(used.heapUsed / 1024 / 1024) + ' MB',
    heapTotal: Math.round(used.heapTotal / 1024 / 1024) + ' MB',
    rss: Math.round(used.rss / 1024 / 1024) + ' MB',
  });
}

// Log every 10K records
if (recordsProcessed % 10000 === 0) {
  logMemoryUsage();
}
```

### 4. Use Garbage Collection Hints

```typescript
// Force garbage collection (requires --expose-gc flag)
if (recordsProcessed % 100000 === 0 && global.gc) {
  global.gc();
  logger.info('Garbage collection triggered', { recordsProcessed });
}
```

---

## Performance Benchmarks

### Pattern Comparison (10M records, 5GB file)

| Pattern                   | Time   | RAM    | Throughput     | Complexity |
| ------------------------- | ------ | ------ | -------------- | ---------- |
| 1. Basic Streaming        | 90 min | 50MB   | 1,852 rec/sec  | Low        |
| 2. File Chunking          | 60 min | 100MB  | 2,778 rec/sec  | Medium     |
| 3. Parallel Processing    | 15 min | 500MB  | 11,111 rec/sec | High       |
| 4. Distributed Processing | 10 min | 50MB\* | 16,667 rec/sec | Very High  |

\*Per worker; total RAM = 50MB × worker count
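
The throughput column is just total records divided by wall-clock seconds, the same arithmetic the patterns log as `recordsPerSecond`. A tiny helper (hypothetical, not part of the SDK) makes it explicit:

```typescript
// Hypothetical helper - not part of fc-connect-sdk.
// Throughput = total records / duration in seconds.
function recordsPerSecond(totalRecords: number, durationMinutes: number): number {
  return Math.round(totalRecords / (durationMinutes * 60));
}

// Reproduce the table rows: 10M records in 90 / 60 / 15 / 10 minutes.
console.log(recordsPerSecond(10_000_000, 90)); // 1852
console.log(recordsPerSecond(10_000_000, 60)); // 2778
console.log(recordsPerSecond(10_000_000, 15)); // 11111
console.log(recordsPerSecond(10_000_000, 10)); // 16667
```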
1306
+
1307
+ ### Optimization Impact
1308
+
1309
+ | Optimization | Before | After | Improvement |
1310
+ | ------------------------- | ------- | -------- | ----------- |
1311
+ | Streaming vs Loading | 5GB RAM | 50MB RAM | 100x |
1312
+ | Batching (1K vs 10K) | 90 min | 60 min | 1.5x |
1313
+ | Parallel (1 vs 5 jobs) | 60 min | 15 min | 4x |
1314
+ | Distributed (200 workers) | 15 min | 10 min | 1.5x |
1315
+
1316
+ ---
1317
+
1318
+ ## Common Issues & Solutions
1319
+
1320
+ ### Issue 1: Out of Memory
1321
+
1322
+ **Symptoms:**
1323
+
1324
+ ```
1325
+ FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
1326
+ ```
1327
+
1328
+ **Solutions:**
1329
+
1330
+ 1. Switch to streaming pattern (Pattern 1)
1331
+ 2. Reduce batch size (1000 => 500)
1332
+ 3. Increase Node.js heap: `node --max-old-space-size=4096`
1333
+ 4. Use file chunking (Pattern 2)
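
Solutions 1 and 2 can also be combined at runtime. The sketch below shrinks the batch size as heap pressure rises; the `adaptiveBatchSize` helper and its thresholds are illustrative, not part of the SDK:

```typescript
// Illustrative helper: reduce the batch size when the heap fills up,
// so batches are smallest exactly when memory is tightest.
function adaptiveBatchSize(base: number): number {
  const { heapUsed, heapTotal } = process.memoryUsage();
  const pressure = heapUsed / heapTotal; // fraction of the current heap in use
  if (pressure > 0.85) return Math.max(100, Math.floor(base / 4));
  if (pressure > 0.7) return Math.max(250, Math.floor(base / 2));
  return base;
}
```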

### Issue 2: Timeout on Large Files

**Symptoms:**

```
TimeoutError: Operation timed out after 300000ms
```

**Solutions:**

1. Increase the timeout: `config.timeout = 600000` (10 min)
2. Split the file into chunks (Pattern 2)
3. Use parallel processing (Pattern 3)
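
If the client-level timeout alone isn't enough, individual operations can also be raced against a deadline. A minimal sketch (`withTimeout` is a hypothetical helper, not an SDK export):

```typescript
// Reject a promise if it hasn't settled within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Operation timed out after ${ms}ms`)),
      ms,
    );
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}
```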

### Issue 3: Chunks Not Resuming

**Symptoms:**

- Re-processing the same chunks after a failure

**Solutions:**

```typescript
// Check chunk status before processing (inside the per-chunk loop)
const chunkState = await kv.get(['chunk', workflowId, chunkId, 'status']);
if (chunkState?.value === 'completed') {
  logger.info('Chunk already processed, skipping', { chunkId });
  continue;
}
```

### Issue 4: Progress Tracking Inconsistent

**Symptoms:**

- Progress percentage doesn't match reality

**Solutions:**

```typescript
// Always update chunk status atomically
const atomic = kv.atomic();
atomic.set(['chunk', workflowId, chunkId, 'status'], 'completed');
atomic.set(['chunk', workflowId, chunkId, 'processedAt'], new Date().toISOString());
await atomic.commit();
```

### Issue 5: Duplicate Processing

**Symptoms:**

- Same records sent multiple times

**Solutions:**

```typescript
// Use idempotency keys in the Fluent batch payload
await client.sendBatch(jobId, {
  entities,
  meta: {
    chunkId: chunk.chunkId,
    workflowId,
    idempotencyKey: `${workflowId}-${chunk.chunkId}`,
  },
});
```

---

## Related Guides

- [Basic Ingestion Pattern](../standalone/s3-csv-batch-api.md) - For small files (<100K records)
- [Streaming Pattern](../../02-CORE-GUIDES/ingestion/ingestion-readme.md) - For medium files (100K-1M records)
- [Error Handling & Retry](./error-handling-retry.md) - Robust error handling strategies
- [Progress Tracking](../../02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-08-performance-optimization.md) - Real-time progress monitoring
- [State Management](../../02-CORE-GUIDES/ingestion/modules/02-core-guides-ingestion-07-state-management.md) - VersoriKV patterns

---

## Summary

**Choose Your Pattern:**

- **Pattern 1 (Streaming)**: Simple, memory-efficient, suitable for 100K-1M records
- **Pattern 2 (Chunking)**: Checkpoint/resume, suitable for 1M-5M records
- **Pattern 3 (Parallel)**: High performance, suitable for 5M-10M records
- **Pattern 4 (Distributed)**: Enterprise scale, suitable for 10M+ records
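
Those size thresholds can be encoded as a simple selection helper. This is illustrative only; the pattern names and cutoffs are taken directly from this guide:

```typescript
type Pattern = 'streaming' | 'chunking' | 'parallel' | 'distributed';

// Map an estimated record count to the pattern recommended above.
function choosePattern(recordCount: number): Pattern {
  if (recordCount <= 1_000_000) return 'streaming';  // Pattern 1
  if (recordCount <= 5_000_000) return 'chunking';   // Pattern 2
  if (recordCount <= 10_000_000) return 'parallel';  // Pattern 3
  return 'distributed';                              // Pattern 4
}
```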

**Key Takeaways:**

1. Always use streaming APIs for large files
2. Clear batches after processing to free memory
3. Use chunks + VersoriKV for checkpoint/resume
4. Parallel processing trades RAM for speed
5. Monitor memory usage throughout processing
6. Test with representative file sizes before production
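
Takeaways 1 and 2 can be sketched as a single streaming loop, including the flush of the final partial batch. `fetchRecords` and `sendBatch` below are hypothetical stand-ins for your real source and sink:

```typescript
// Simulated record source; in practice this would be a CSV/S3 stream.
async function* fetchRecords(total: number): AsyncGenerator<{ id: number }> {
  for (let i = 0; i < total; i++) yield { id: i };
}

async function ingest(total: number, batchSize: number): Promise<number> {
  let batchesSent = 0;
  const sendBatch = async (_batch: { id: number }[]) => { batchesSent++; };

  let batch: { id: number }[] = [];
  for await (const record of fetchRecords(total)) {
    batch.push(record);
    if (batch.length >= batchSize) {
      await sendBatch(batch);
      batch = []; // clear the batch so processed records can be GC'd
    }
  }
  if (batch.length > 0) await sendBatch(batch); // flush the final partial batch
  return batchesSent;
}
```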