data-science-document-ai 1.42.3__tar.gz → 1.42.4__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (56)
  1. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/PKG-INFO +1 -1
  2. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/pyproject.toml +1 -1
  3. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/postprocessing/common.py +5 -5
  4. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/library/bundeskasse/other/placeholders.json +1 -1
  5. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/library/bundeskasse/other/prompt.txt +1 -1
  6. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/library/customsInvoice/other/placeholders.json +1 -1
  7. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/library/customsInvoice/other/prompt.txt +2 -2
  8. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/library/partnerInvoice/other/placeholders.json +39 -12
  9. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/library/partnerInvoice/other/prompt.txt +2 -15
  10. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/constants.py +0 -0
  11. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/constants_sandbox.py +0 -0
  12. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/docai.py +0 -0
  13. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/docai_processor_config.yaml +0 -0
  14. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/excel_processing.py +0 -0
  15. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/io.py +0 -0
  16. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/llm.py +0 -0
  17. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/log_setup.py +0 -0
  18. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/pdf_processing.py +0 -0
  19. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/postprocessing/postprocess_booking_confirmation.py +0 -0
  20. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/postprocessing/postprocess_commercial_invoice.py +0 -0
  21. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/postprocessing/postprocess_partner_invoice.py +0 -0
  22. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/library/bookingConfirmation/evergreen/placeholders.json +0 -0
  23. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/library/bookingConfirmation/evergreen/prompt.txt +0 -0
  24. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/library/bookingConfirmation/hapag-lloyd/placeholders.json +0 -0
  25. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/library/bookingConfirmation/hapag-lloyd/prompt.txt +0 -0
  26. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/library/bookingConfirmation/maersk/placeholders.json +0 -0
  27. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/library/bookingConfirmation/maersk/prompt.txt +0 -0
  28. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/library/bookingConfirmation/msc/placeholders.json +0 -0
  29. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/library/bookingConfirmation/msc/prompt.txt +0 -0
  30. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/library/bookingConfirmation/oocl/placeholders.json +0 -0
  31. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/library/bookingConfirmation/oocl/prompt.txt +0 -0
  32. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/library/bookingConfirmation/other/placeholders.json +0 -0
  33. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/library/bookingConfirmation/other/prompt.txt +0 -0
  34. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/library/bookingConfirmation/yangming/placeholders.json +0 -0
  35. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/library/bookingConfirmation/yangming/prompt.txt +0 -0
  36. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/library/commercialInvoice/other/prompt.txt +0 -0
  37. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/library/customsAssessment/other/prompt.txt +0 -0
  38. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/library/deliveryOrder/other/placeholders.json +0 -0
  39. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/library/deliveryOrder/other/prompt.txt +0 -0
  40. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/library/draftMbl/hapag-lloyd/prompt.txt +0 -0
  41. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/library/draftMbl/maersk/prompt.txt +0 -0
  42. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/library/draftMbl/other/placeholders.json +0 -0
  43. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/library/draftMbl/other/prompt.txt +0 -0
  44. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/library/finalMbL/hapag-lloyd/prompt.txt +0 -0
  45. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/library/finalMbL/maersk/prompt.txt +0 -0
  46. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/library/finalMbL/other/prompt.txt +0 -0
  47. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/library/packingList/other/prompt.txt +0 -0
  48. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/library/postprocessing/port_code/placeholders.json +0 -0
  49. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/library/postprocessing/port_code/prompt_port_code.txt +0 -0
  50. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/library/preprocessing/carrier/placeholders.json +0 -0
  51. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/library/preprocessing/carrier/prompt.txt +0 -0
  52. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/library/shippingInstruction/other/prompt.txt +0 -0
  53. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/prompts/prompt_library.py +0 -0
  54. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/setup.py +0 -0
  55. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/tms.py +0 -0
  56. {data_science_document_ai-1.42.3 → data_science_document_ai-1.42.4}/src/utils.py +0 -0
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: data-science-document-ai
3
- Version: 1.42.3
3
+ Version: 1.42.4
4
4
  Summary: "Document AI repo for data science"
5
5
  Author: Naomi Nguyen
6
6
  Author-email: naomi.nguyen@forto.com
@@ -1,6 +1,6 @@
1
1
  [tool.poetry]
2
2
  name = "data-science-document-ai"
3
- version = "1.42.3"
3
+ version = "1.42.4"
4
4
  description = "\"Document AI repo for data science\""
5
5
  authors = ["Naomi Nguyen <naomi.nguyen@forto.com>", "Kumar Rajendrababu <kumar.rajendrababu@forto.com>", "Igor Tonko <igor.tonko@forto.com>", "Osman Demirel <osman.demirel@forto.com>"]
6
6
  packages = [
@@ -84,16 +84,16 @@ def clean_shipment_id(shipment_id):
84
84
  """
85
85
  if not shipment_id:
86
86
  return
87
- # '#S123456@-1' -> 'S123456'
88
- # Find the pattern of a shipment ID that starts with 'S' followed by 5 to 7 digits
89
- match = re.findall(r"S\d{5,7}", shipment_id)
87
+ # '#S1234565@-1' -> 'S1234565'
88
+ # Find the pattern of a shipment ID that starts with 'S' followed by 7 to 8 digits
89
+ match = re.findall(r"S\d{6,8}", shipment_id)
90
90
  stripped_value = match[0] if match else None
91
91
 
92
92
  if not stripped_value:
93
93
  return None
94
94
 
95
95
  # Check if length is valid (should be either 7 or 8)
96
- if len(stripped_value) not in (6, 7, 8):
96
+ if len(stripped_value) not in (7, 8, 9):
97
97
  return None
98
98
 
99
99
  return stripped_value
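
Note on the hunk above: a minimal sketch of the 1.42.4 side of `clean_shipment_id`, reassembled from the added and context lines. The indentation is reconstructed (the diff view flattens it) and the bare `return` is written as `return None`; both are assumptions, not the package's verbatim source.

```python
import re


def clean_shipment_id(shipment_id):
    """Sketch of the post-change behaviour, rebuilt from the diff hunk."""
    if not shipment_id:
        return None

    # '#S1234565@-1' -> 'S1234565'
    # The pattern is unanchored: 'S' followed by 6 to 8 digits, anywhere in the string.
    match = re.findall(r"S\d{6,8}", shipment_id)
    stripped_value = match[0] if match else None

    if not stripped_value:
        return None

    # 'S' plus 6 to 8 digits gives a total length of 7, 8 or 9.
    if len(stripped_value) not in (7, 8, 9):
        return None

    return stripped_value


if __name__ == "__main__":
    for raw in ("#S1234565@-1", "FORTO-S136748-", "S12345", "S123456789"):
        print(repr(raw), "->", repr(clean_shipment_id(raw)))
```

Under this reading, the regex still accepts 6 digits even though the updated comment says "7 to 8 digits", and the length check cannot fail once the regex has matched. Because the pattern has no trailing boundary, a longer run of digits is cut to its first 8 digits rather than rejected.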
@@ -425,7 +425,7 @@ async def format_label(entity_k, entity_value, document_type_code, params):
425
425
  formatted_value = clean_invoice_number(entity_value)
426
426
 
427
427
  elif entity_key in ("shipmentid", "partnerreference"):
428
- # Clean the shipment ID to match Forto's standard (starts with 'S' followed by 5 to 7 digits)
428
+ # Clean the shipment ID to match Forto's standard (starts with 'S' followed by 7 or 8 digits)
429
429
  formatted_value = clean_shipment_id(entity_value)
430
430
 
431
431
  elif entity_key == "containernumber":
@@ -34,7 +34,7 @@
34
34
  "shipmentId": {
35
35
  "type": "STRING",
36
36
  "nullable": true,
37
- "description": "Starting with an \"S\" and followed by 6 or 7 digits. Example: S124321"
37
+ "description": "Starting with an \"S\" and followed by 6 or 8 digits. Example: S1243213 or S12876549"
38
38
  },
39
39
  "vendorName": {
40
40
  "type": "STRING",
@@ -19,7 +19,7 @@ Your role is to accurately extract specific entities from these Customs invoices
19
19
  - Few invoices contains multiple container numbers, in that case, all container numbers should be captured.
20
20
 
21
21
  - shipmentID:
22
- - Shipment ID is a unique identifier for the shipment, it starts with "S" followed by 6-7 digits (e.g., S123058).
22
+ - Shipment ID is a unique identifier for the shipment, it starts with "S" followed by 6-8 digits (e.g., S1230583 or S12305876).
23
23
  - It can be found in the top section of the invoice. Few times, it can be found in between a long string of numbers as well. (e.g., "FORTO-S136748-").
24
24
  - It can also be referred to as "Bezugsnummer" in the invoice.
25
25
 
@@ -116,7 +116,7 @@
116
116
  "description": "Bill of Lading number (B/L NO.), a document issued by the carrier."
117
117
  },
118
118
  "partnerReference": {"type": "STRING", "nullable": true,
119
- "description": "A partnerReference can be a shipment ID. It starts with 'S' followed by 6 or 7 digits (e.g., 'S1234567')."
119
+ "description": "A partnerReference can be a shipment ID. It starts with 'S' followed by 6 or 8 digits (e.g., 'S1234567')."
120
120
  },
121
121
  "paymentTerm": {"type": "STRING", "nullable": true,
122
122
  "description": "The payment term indicates the conditions under which the payment should be made. E.g., 'In 10 TAGEN', '14 TAGEN', '14 days', etc."},
@@ -49,7 +49,7 @@ Your role is to accurately extract specific entities from these invoices to supp
49
49
  - dueDate: The date by which the payment should be made. Do Not calculate dueDate based on issueDate or any other date. Extract it directly from the invoice.
50
50
 
51
51
  - lineItem: Details of each COGS and Customs line item on the invoice. Make sure to extract each amount and currency separately.
52
- - uniqueId: A unique id which associated with the lineItem as each line item can belong to a different shipment. Extract only if its available in the line item. Either a shipmentId starting with an S and followed by 6 or 7 numeric values or a mblNumber. If shipmentId or mblNumber does not exist, set it to containerNumber.
52
+ - uniqueId: A unique id which associated with the lineItem as each line item can belong to a different shipment. Extract only if its available in the line item. Either a shipmentId starting with an S and followed by 6 or 8 numeric values or a mblNumber. If shipmentId or mblNumber does not exist, set it to containerNumber.
53
53
  - lineItemDescription: The name or description of the item. Usually, it will be a one line sentence.
54
54
  - unitPrice: Even if the quantity is not mentioned, you can still extract the unit price. Check the naming of the columns in a different languages, it can be "Unit Price", "Prezzo unitario", "Prix Unitaire", "Unitario", etc. Refer to "Prezzo unitario" field in the italian invoice example.
55
55
  - totalAmount: The total amount for the item. It can be in different currencies, so ensure to capture the currency as well for the totalAmountCurrency.
@@ -60,7 +60,7 @@ Your role is to accurately extract specific entities from these invoices to supp
60
60
  - hblNumber and mblNumber:
61
61
  - The Master Bill of Lading number. Commonly known as "Bill of Lading Number", "BILL OF LADING NO.", "BL Number", "BL No.", "B/L No.", "BL-Nr.", "B/L", or "HBL No.".
62
62
  - Do not confuse with the containerNumber that always starts with 4 letters and is followed by 7 digits (e.g., SEGU3090389). This is not the mblNumber or hblNumber.
63
- - partnerReference: Shipment_ID can be a reference number for the partner. Shipment_ID always starts with "S" followed by 6 or 7 digits (e.g., S2654361).
63
+ - partnerReference: Shipment_ID can be a reference number for the partner. Shipment_ID always starts with "S" followed by 6 or 8 digits (e.g., S2654361).
64
64
 
65
65
  - vendorName and vendorAddress:
66
66
  - The name and address of the vendor providing the service and to whom the payment should be made.
@@ -110,18 +110,45 @@
110
110
  "containerSize"
111
111
  ]
112
112
  },
113
- "mblNumber": {"type": "STRING", "nullable": true},
114
- "partnerReference": {"type": "STRING", "nullable": true},
115
- "paymentTerm": {"type": "STRING", "nullable": true},
116
- "portOfDischarge": {"type": "STRING", "nullable": true},
117
- "portOfLoading": {"type": "STRING", "nullable": true},
118
- "recipientAddress": {"type": "STRING", "nullable": true},
119
- "recipientName": {"type": "STRING", "nullable": true},
120
- "serviceDate": {"type": "STRING", "nullable": true},
121
- "vatId": {"type": "STRING", "nullable": true},
122
- "vendorAddress": {"type": "STRING", "nullable": true},
123
- "vendorName": {"type": "STRING", "nullable": true},
124
- "reverseChargeSentence": {"type": "STRING", "nullable": true}
113
+ "mblNumber": {"type": "STRING", "nullable": true,
114
+ "description": "Bill of Lading number (B/L NO.), a document issued by the carrier."
115
+ },
116
+ "partnerReference": {"type": "STRING", "nullable": true,
117
+ "description": "A partnerReference can be a shipment ID. It starts with 'S' followed by 6 or 8 digits (e.g., 'S1234567')."
118
+ },
119
+ "paymentTerm": {"type": "STRING", "nullable": true,
120
+ "description": "The payment term indicates the conditions under which the payment should be made. E.g., 'In 10 TAGEN', '14 TAGEN', '14 days', etc."},
121
+ "portOfDischarge": {"type": "STRING", "nullable": true,
122
+ "description": "The port where the goods are discharged from the vessel. This is the destination port for the shipment."},
123
+ "portOfLoading": {"type": "STRING", "nullable": true,
124
+ "description": "The origin port where the goods are loaded onto the vessel. Find information like 'Ladehafen' or 'Port of Loading' in the invoice."},
125
+ "recipientAddress": {"type": "STRING", "nullable": true,
126
+ "description": "Majority of the times, it is 'Forto Logistics SE & Co KG' Address depends on the entity."},
127
+ "recipientName": {"type": "STRING", "nullable": true,
128
+ "description": "The name of the recipient who is responsible for making the payment. This is often the 'Forto Logistics SE & Co KG' entity or partner."},
129
+ "serviceDate": {"type": "STRING", "nullable": true,
130
+ "description": "The date when the service was provided. If Service date is not available in the invoice, Estimated Time of Arrival (ETA) can be used."},
131
+ "vatId": {"type": "STRING", "nullable": true,
132
+ "description": "The VAT ID of the vendor. This is used for tax purposes and to identify the vendor in financial transactions."},
133
+ "vendorAddress": {"type": "STRING", "nullable": true,
134
+ "description": "The address of the vendor to whom the payment should be made."},
135
+ "vendorName": {"type": "STRING", "nullable": true,
136
+ "description": "The name of the vendor to whom the payment should be made. Extract the main vendor details incase the invoice contains 'As Agent For'."},
137
+ "agentName": {
138
+ "type": "STRING",
139
+ "nullable": true,
140
+ "description": "The name of the agent or intermediary involved in the transaction, if applicable."},
141
+ "agentKeyWord": {
142
+ "type": "STRING",
143
+ "nullable": true,
144
+ "description": "A keyword or phrase that indicates the presence of an agent or intermediary in the transaction, such as 'As Agent For', 'Acting Agent', 'Issuing agent', 'Contact Agent', or similar words."},
145
+
146
+ "reverseChargeSentence": {
147
+ "type": "STRING",
148
+ "nullable": true,
149
+ "description": "A sentence which indicate that the reverse charge applies. Mostly found as VAT/Tax Clause."
150
+ }
151
+
125
152
  },
126
153
  "required": [
127
154
  "bankAccount",
@@ -47,7 +47,7 @@ Your role is to accurately extract specific entities from these invoices to supp
47
47
  - eta and etd: Few invoices contains same date for ARRIVED/DEPARTED or ETA/ETD. Extract it for both eta and etd.
48
48
 
49
49
  - lineItem: Details of each COGS and Customs line item on the invoice. Make sure to extract each amount and currency separately.
50
- - uniqueId: A unique id which associated with the lineItem as each line item can belong to a different shipment. Extract only if its available in the line item. Either a shipmentId starting with an S and followed by 6 or 7 numeric values or a mblNumber. If shipmentId or mblNumber does not exist, set it to containerNumber.
50
+ - uniqueId: A unique id which associated with the lineItem as each line item can belong to a different shipment. Extract only if its available in the line item. Either a shipmentId starting with an S and followed by 6 or 8 numeric values or a mblNumber. If shipmentId or mblNumber does not exist, set it to containerNumber.
51
51
  - lineItemDescription: The name or description of the item. Usually, it will be a one line sentence.
52
52
  - unitPrice: Even if the quantity is not mentioned, you can still extract the unit price. Check the naming of the columns in a different languages, it can be "Unit Price", "Prezzo unitario", "Prix Unitaire", "Unitario", etc. Refer to "Prezzo unitario" field in the italian invoice example.
53
53
  - totalAmount: The total amount for the item. It can be in different currencies, so ensure to capture the currency as well for the totalAmountCurrency.
@@ -59,7 +59,7 @@ Your role is to accurately extract specific entities from these invoices to supp
59
59
  - The Master Bill of Lading number. Commonly known as "Bill of Lading Number", "BILL OF LADING NO.", "BL Number", "BL No.", "B/L No.", "BL-Nr.", "B/L", or "HBL No.".
60
60
  - Do not confuse with the containerNumber that always starts with 4 letters and is followed by 7 digits (e.g., SEGU3090389). This is not the mblNumber or hblNumber.
61
61
  - partnerReference:
62
- - Shipment_ID can be a reference number for the partner. Shipment_ID always starts with "S" followed by 6 or 7 digits (e.g., S2654361).
62
+ - Shipment_ID can be a reference number for the partner. Shipment_ID always starts with "S" followed by 6 or 8 digits (e.g., S2654361).
63
63
  - If Shipment_ID is not available, extract any Booking Number as partnerReference.
64
64
 
65
65
  - vendorName and vendorAddress:
@@ -79,16 +79,6 @@ Your role is to accurately extract specific entities from these invoices to supp
79
79
  - serviceDate: The date of service provided. If the serviceDate is not specifically mentioned in the invoice, you can use the ETA of the shipment as a serviceDate.
80
80
  - reverseChargeSentence: A sentence which indicate that the reverse charge applies. Mostly fund as Tax Clause.
81
81
 
82
- - paymentInformation:
83
- - Some partners receive prepayment before providing the service. They later send a final invoice that includes both the amount already paid and the remaining amount due.
84
- - This applies when the invoice contains prepayment-related terms such as Vorschuss, BEREITS BEZAHLT, or similar at the invoice total section.
85
- - do not get confused with the paidAmount and remainingAmountToPay. Few invoices may not have a paidAmount or remainingAmountToPay in such cases pay attention to the sentence field alignment.
86
- - Extract the following fields, if applicable:
87
- - paidAmount: The amount that has already been paid. You can identify this in the invoice by looking for terms like "Vorschuss", "BEREITS BEZAHLT".
88
- - remainingAmountToPay: The amount still due. This can be negative if the paid amount exceeds the total invoice amount. Ensure the negative sign is captured if applicable. You can identify this by looking for terms like "Bitte Zahlen", "Zu zahlen", "Remaining Amount", "To Pay", "Due", or "Unpaid".
89
- - currency: The currency of both the paid and remaining amounts.
90
- - sentence: A sentence from the invoice indicating the payment status (e.g., "Vorschuss", "Prepayment", "Paid", "Partially Paid", "Unpaid"). This helps summarize the overall payment status of the invoice.
91
-
92
82
  IMPORTANT NOTE:
93
83
  - Ensure all extracted values are directly from the document. Do not make assumptions, modifications or calculations.
94
84
  - Do not normalize or modify any entity values.
@@ -96,8 +86,5 @@ IMPORTANT NOTE:
96
86
 
97
87
  PAY ATTENTION TO THE SGS MACO CUSTOMS SERVICE INVOICES:
98
88
  - invoices from SGS maco customs service,
99
- - Extract only "Vorschuss" as a paidAmount but not "Vorauszahlung".
100
- - Extract "Zu zahlen" or "Bitte Zahlen" as a remainingAmountToPay.
101
- - do not get confused with the paidAmount and remainingAmountToPay. Few invoices may not have a paidAmount or remainingAmountToPay. In such cases, pay attention to the sentence field alignment.
102
89
  - "Total Kosten excl. MwSt." is not the vatApplicableAmount
103
90
  <INSTRUCTIONS>