data-science-document-ai 1.58.0__py3-none-any.whl → 1.60.0__py3-none-any.whl

Files changed (22)
  1. {data_science_document_ai-1.58.0.dist-info → data_science_document_ai-1.60.0.dist-info}/METADATA +1 -1
  2. {data_science_document_ai-1.58.0.dist-info → data_science_document_ai-1.60.0.dist-info}/RECORD +22 -22
  3. src/postprocessing/common.py +0 -35
  4. src/prompts/library/bookingConfirmation/evergreen/placeholders.json +7 -7
  5. src/prompts/library/bookingConfirmation/evergreen/prompt.txt +45 -29
  6. src/prompts/library/bookingConfirmation/hapag-lloyd/prompt.txt +3 -3
  7. src/prompts/library/bookingConfirmation/maersk/placeholders.json +5 -5
  8. src/prompts/library/bookingConfirmation/maersk/prompt.txt +48 -56
  9. src/prompts/library/bookingConfirmation/msc/placeholders.json +9 -9
  10. src/prompts/library/bookingConfirmation/msc/prompt.txt +57 -60
  11. src/prompts/library/bookingConfirmation/oocl/placeholders.json +12 -12
  12. src/prompts/library/bookingConfirmation/oocl/prompt.txt +38 -13
  13. src/prompts/library/bookingConfirmation/other/placeholders.json +11 -11
  14. src/prompts/library/bookingConfirmation/other/prompt.txt +36 -12
  15. src/prompts/library/bookingConfirmation/yangming/placeholders.json +12 -12
  16. src/prompts/library/bookingConfirmation/yangming/prompt.txt +45 -57
  17. src/prompts/library/customsInvoice/other/placeholders.json +1 -1
  18. src/prompts/library/customsInvoice/other/prompt.txt +6 -2
  19. src/prompts/library/partnerInvoice/other/placeholders.json +1 -1
  20. src/prompts/library/partnerInvoice/other/prompt.txt +6 -2
  21. src/utils.py +2 -6
  22. {data_science_document_ai-1.58.0.dist-info → data_science_document_ai-1.60.0.dist-info}/WHEEL +0 -0
@@ -7,66 +7,54 @@ The Freight Forwarding company receives Booking Confirmation from Yangming Carri
  These Booking Confirmations contain various details related to booking, container pick up and drop off depot details, vessel details, as well as other transport Legs data.
  They may be written in different languages such as English, German, Vietnamese, Chinese, and other European languages, and can appear in a variety of formats and layouts.
  Your role is to accurately extract specific entities from these Booking Confirmations to support efficient processing and accurate record-keeping.
- <CONTEXT>
-
 
- bookingNumber: A unique identifier for the booking.
- cyCutOff: The deadline for cargo to be delivered to the Container Yard.
- gateInReference: A reference code for cargo entering the terminal.
- gateInTerminal: The specific terminal where cargo is gated in.
- mblNumber: The Master Bill of Lading number.
- pickUpReference: A reference code for cargo pickup.
- pickUpTerminal: The specific terminal for cargo pickup.
- siCutOff: The deadline for submitting shipping instructions.
- vgmCutOff: The deadline for submitting the Verified Gross Mass of the cargo.
- transportLegs:
- eta: The estimated time of arrival for a specific leg.
- etd: The estimated time of departure for a specific leg.
- imoNumber: The International Maritime Organization number for a specific leg.
- portOfDischarge: The port where cargo is unloaded for a specific leg.
- portOfLoading: The port where cargo is loaded for a specific leg.
- vesselName: The name of the vessel for a specific leg.
- voyage: The journey or route taken by the vessel for a specific leg.
+ To provide context on the journey of a containers for both Export and Import shipments,
+ For Export shipment: An empty container is picked up from a depot (pickupDepotCode) using a pickUpReference and goods loaded into it at a warehouse. Then the loaded container / cargo is transported back to a Container Yard or gateInTerminal before the cyCutOff date for further shipping processes. Then the POL of the First TransportLeg may start from the gateInTerminal or a different POL too.
+ For Import Shipment: The loaded container / cargo arrives at a port of discharge then picked up at pickUpTerminal using pickUpReference. After delivery, an empty container is returned to a depot (dropOffDepotCode).
+ <CONTEXT>
 
- your task is to extract the text value of the following entities and page numbers starting from 0 where the value was found in the document:
- SCHEMA_PLACEHOLDER
+ <INSTRUCTIONS>
+ - Populate fields as defined in the response schema.
+ - Use the data field description to understand the context of the data.
 
- Keywords for datapoints:
- - bookingNumber: Our Reference
- - cyCutOff: FCL delivery cut-off
- - gateInReference: Our Reference
- - gateInTerminal: Export terminal delivery address
- - mblNumber: BL/SWB No(s).
- - pickUpReference: Export door positioning address(es)
- - siCutOff: shipping instruction closing
- - vgmCutOff: VGM cut-off
- - eta: eta, ETA
- - etd: etd, ETD
- - imoNumber: IMO No, IMO number
- - portOfDischarge: to
- - portOfLoading: from
- - vesselName: vessel
- - voyage: Voy. no
+ - gateInTerminalCode: The specific terminal where cargo is gated in. It is mentioned as Delivery Terminal. This sometimes can be the same as portOfLoading of the First transportLeg.
+ - cyCutOff: The deadline for cargo to be delivered to the Container Yard. It can be found at Cargo Cut Off or FCL delivery cut-off.
 
- Table Structure to extract TransporLegs:
- - transportlegs table has following colum names: From, To, By, ETD, ETA
- - If the first leg does not have a ETD and ETA, Also By column includes following info: ["Truck", "Combined Waterway"]; skip the row, start from one row below.
+ - transportLegs: Multiple Transport Legs entries may exist, capture all instances under "transportLegs". Make sure the order of the legs are important.
+ - eta: The estimated time of arrival for a specific leg.
+ - etd: The estimated time of departure for a specific leg. It can also be referred as ETS (Estimated Time of Sailing).
+ - imoNumber: The International Maritime Organization number for a specific leg.
+ - portOfDischarge: The port where cargo is unloaded for a specific leg.
+ - portOfLoading: The port where cargo is loaded for a specific leg.
+ - vesselName: The name of the vessel for a specific leg.
+ - voyage: The journey or route taken by the vessel for a specific leg.
 
- Further explanation for the transportLegs part as follows:
- - you must differentiate the Voyage from DP Voyage. Never used the value with the key 'DP Voyage'.
- - Vessel name is in 'By' column and under the key Vessel
+ IMPORTANT explanation for the transportLegs part as follows:
+ - There is at least one leg in each document.
  - 'eta' must be equal or later than 'etd'!
- - portOfLoading and portOfDischarge are name of the Ports. You can rely on the port names from all over the world.
- - portOfLoading and portOfDischarge distinctly denotes the name of the ports. If you find abbreviation of the port use it, if not you can use the full name of the port
- - Abbrevations most likely to be in the paranthesis like follows (DEHAM).
-
- Further explanation for datapoints except transportLegs part as follows:
- - If gateInReference is null, assign it the same value as bookingNumber.
- - If pickUpReference is null, assign it the same value as bookingNumber.
- - If mblNumber is null, assign it the same value as bookingNumber.
-
- You must apply the following rules:
- - The JSON schema must be followed during the extraction.
- - The values must only include text found in the document
- - Do not normalize any entity value.
- - Validate the JSON make sure its a valid JSON ! No extra text, no missing comma!
+ - Multiple legs are possible. When there are multiple legs,
+ - Sequential Sorting: You must manually re-order legs based on etd then eta, regardless of their order in the source text.
+ - "Transhipment" indicates the presence of a multi-leg journey.
+ - Transhipment Date is usually empty therefore you may keep ETA of the first leg and ETD of the second leg as null if there is a transhipment.
+ - Transhipment Handling: Treat any mentioned "Transhipment" as the bridge between two legs (Discharge for Leg A and Loading for Leg B).
+ - The Connectivity Rule: For any sequence of legs, the Port of Discharge of the previous leg must match the Port of Loading of the following leg.
+ - First Transhipment is the Port of Discharge for the first transportLegs and Port of Loading for the second transportLegs.
+ - Second Transhipment is the Port of Discharge for the second transportLegs and Port of Loading for the third transportLegs.
+ - Timeline Integrity: Ensure a "No Time Travel" policy: The eta of a previous leg must be earlier than or equal to the etd of the following leg.
+
+
+ Structure of Multiple Leg Sequence & Mapping
+ Leg 1 (Initial):
+ - `portOfLoading`: Sailing.
+ - `portOfDischarge`: Transhipment (if exists), otherwise DISCHARGE.
+ - `etd`: Sailing Date ETS.
+ - `eta`: Transhipment Date (if exists), otherwise keep it null in case of multi-legs. Else, DISCHARGE Date for single leg.
+
+
+ Leg 2 (Intermediate): Trigger: Only if PORT OF TRANSHIPMENT exists.
+ - `portOfLoading`: portOfDischarge of Leg 1 (Transhipment).
+ - `portOfDischarge`: DISCHARGE.
+ - `etd`: Transhipment Date if exists else keep it null
+ - `eta`: DISCHARGE Date. Do not extract Destination Date here.
+
+ <INSTRUCTIONS>
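Note on the transportLegs rules added in the hunk above: the sorting, connectivity, and timeline rules are stated only as prompt text. The sketch below shows how the same three rules could be checked on extracted legs. It is an illustration, not code from the package: the `Leg` type, `sort_legs`, and `check_legs` are hypothetical names, dates are assumed to be already parsed, and placing missing dates last when sorting is an assumption.

```python
from datetime import date
from typing import List, Optional, TypedDict


class Leg(TypedDict):
    """Hypothetical shape of one extracted transport leg."""
    portOfLoading: str
    portOfDischarge: str
    etd: Optional[date]
    eta: Optional[date]


def sort_legs(legs: List[Leg]) -> List[Leg]:
    """Order legs by etd, then eta (assumption: missing dates sort last)."""
    return sorted(
        legs,
        key=lambda leg: (leg["etd"] or date.max, leg["eta"] or date.max),
    )


def check_legs(legs: List[Leg]) -> List[str]:
    """Return a list of rule violations for legs in their given order."""
    problems: List[str] = []
    for i, leg in enumerate(legs):
        # 'eta' must be equal or later than 'etd' within one leg.
        if leg["etd"] and leg["eta"] and leg["eta"] < leg["etd"]:
            problems.append(f"leg {i}: eta is before etd")
    for i in range(len(legs) - 1):
        prev, nxt = legs[i], legs[i + 1]
        # Connectivity rule: discharge port of one leg is the loading port of the next.
        if prev["portOfDischarge"] != nxt["portOfLoading"]:
            problems.append(f"legs {i}/{i + 1}: ports do not connect")
        # Timeline integrity: previous eta must not be after the following etd.
        if prev["eta"] and nxt["etd"] and prev["eta"] > nxt["etd"]:
            problems.append(f"legs {i}/{i + 1}: eta is after the following etd")
    return problems
```

Null transhipment dates, which the prompt explicitly allows, are skipped by the date checks rather than reported.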
@@ -31,7 +31,7 @@
  },
  "documentType": {"type": "STRING", "nullable": true},
  "dueDate": {"type": "STRING", "nullable": true,
- "description": "The date by which the payment should be made by Forto Logistics SE & Co KG. Do Not calculate dueDate based on issueDate or any other date. Extract it directly from the invoice."},
+ "description": "The date by which the payment should be made by Forto Logistics SE & Co KG. If dueDate is not available in the invoice, calculate dueDate based on issueDate and paymentTerm."},
  "eta": {"type": "STRING", "nullable": true,
  "description": "Estimated Time of Arrival (ETA) is the expected date when the shipment will arrive at its destination."},
  "etd": {"type": "STRING", "nullable": true,
@@ -46,7 +46,11 @@ Your role is to accurately extract specific entities from these invoices to supp
  - CUSTOMS SUPPORT invoices do not have a vatAmount and vatPercentage.
 
  - issueDate: The date the document was issued.
- - dueDate: The date by which the payment should be made. Do Not calculate dueDate based on issueDate or any other date. Extract it directly from the invoice.
+ - dueDate:
+ - The date by which the payment should be made. If dueDate is explicitly mentioned in the invoice, extract that value directly.
+ - If dueDate is not available in the invoice, calculate dueDate based on issueDate and paymentTerm.
+ - Example: If issueDate is 01.03.2023 and paymentTerm is 30 days, then dueDate will be 31.03.2023.
+ - If paymentTerm is not available, do not extract dueDate.
 
  - lineItem: Details of each COGS and Customs line item on the invoice from each page. Make sure to extract each amount and currency separately.
  - uniqueId: A unique id which associated with the lineItem as each line item can belong to a different shipment. Extract only if its available in the line item. Either a shipmentId starting with an S and followed by 6 or 8 numeric values or a mblNumber. If shipmentId or mblNumber does not exist, set it to containerNumber.
@@ -91,7 +95,7 @@ Your role is to accurately extract specific entities from these invoices to supp
  - sentence: A sentence from the invoice indicating the payment status (e.g., "Vorschuss", "Prepayment", "Paid", "Partially Paid", "Unpaid"). This helps summarize the overall payment status of the invoice.
 
  IMPORTANT NOTE:
- - Ensure all extracted values are directly from the document. Do not make assumptions or modifications.
+ - Ensure all extracted values are directly from the document. Do not make assumptions, modifications or calculations except dueDate.
  - CustomSized invoices contain line items in a table format in a attached page. Table with headings Shipment ID, Partner Line Item Description, Quantity, Amount, and VAT. Extract all the line items from each tables from each page.
  - Do not normalize or modify any entity values.
  - Pay attention to the line item details and paymentInformation, as they may vary significantly across different invoices.
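Note on the dueDate change above: the new prompt text asks the model to fall back to issueDate plus paymentTerm when no due date is printed on the invoice. The arithmetic can be sketched as follows. This is an illustration only, not code from the package: `resolve_due_date` is a hypothetical helper, the `DD.MM.YYYY` format is taken from the example in the prompt, and paymentTerm is assumed to be a number of days.

```python
from datetime import datetime, timedelta
from typing import Optional

DATE_FORMAT = "%d.%m.%Y"  # assumption: same format as the example in the prompt


def resolve_due_date(
    due_date: Optional[str],
    issue_date: Optional[str],
    payment_term_days: Optional[int],
) -> Optional[str]:
    """Mirror the three dueDate rules from the prompt (hypothetical helper)."""
    # Rule 1: an explicit due date on the invoice is used as-is.
    if due_date:
        return due_date
    # Rule 3: without a payment term (or issue date) nothing is extracted.
    if not issue_date or payment_term_days is None:
        return None
    # Rule 2: otherwise add the payment term to the issue date.
    issued = datetime.strptime(issue_date, DATE_FORMAT)
    return (issued + timedelta(days=payment_term_days)).strftime(DATE_FORMAT)
```

With the values from the prompt, `resolve_due_date(None, "01.03.2023", 30)` gives `"31.03.2023"`, since 1 March plus 30 days is 31 March.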
@@ -29,7 +29,7 @@
  },
  "documentType": {"type": "STRING", "nullable": true},
  "dueDate": {"type": "STRING", "nullable": true,
- "description": "The date by which the payment should be made by Forto Logistics SE & Co KG. Do Not calculate dueDate based on issueDate or any other date. Extract it directly from the invoice."},
+ "description": "The date by which the payment should be made by Forto Logistics SE & Co KG. If dueDate is not available in the invoice, calculate dueDate based on issueDate and paymentTerm."},
  "eta": {"type": "STRING", "nullable": true,
  "description": "Estimated Time of Arrival (ETA) is the expected date when the shipment will arrive at its destination."},
  "etd": {"type": "STRING", "nullable": true,
@@ -42,7 +42,11 @@ Your role is to accurately extract specific entities from these invoices to supp
  - Remove "TVA" characters from VAT ID in MSC invoices. For example, CHE-111.954.803 TVA should be extracted as CHE-111.954.803
 
  - issueDate: The date the document was issued.
- - dueDate: The date by which the payment should be made. Do Not calculate dueDate based on issueDate or any other date. Extract it directly from the invoice.
+ - dueDate:
+ - The date by which the payment should be made. If dueDate is explicitly mentioned in the invoice, extract that value directly.
+ - If dueDate is not available in the invoice, calculate dueDate based on issueDate and paymentTerm.
+ - Example: If issueDate is 01.03.2023 and paymentTerm is 30 days, then dueDate will be 31.03.2023.
+ - If paymentTerm is not available, do not extract dueDate.
 
  - eta and etd: Few invoices contains same date for ARRIVED/DEPARTED or ETA/ETD. Extract it for both eta and etd.
 
@@ -81,7 +85,7 @@ Your role is to accurately extract specific entities from these invoices to supp
  - reverseChargeSentence: A sentence which indicate that the reverse charge applies. Mostly fund as Tax Clause.
 
  IMPORTANT NOTE:
- - Ensure all extracted values are directly from the document. Do not make assumptions, modifications or calculations.
+ - Ensure all extracted values are directly from the document. Do not make assumptions, modifications or calculations except dueDate.
  - CustomSized invoices contain line items in a table format in the attached page. Table with headings Shipment ID, Partner Line Item Description, Quantity, Amount, and VAT. Extract all the line items from such tables from each page.
  - Do not split the quantity into different line items. e.g., if quantity is 2 or 2 CTR or 2 BIL, do not create 2 separate line items with quantity 1 each.
  - Do not normalize or modify any entity values.
src/utils.py CHANGED
@@ -361,10 +361,7 @@ def extract_top_pages(pdf_bytes, num_pages=4):
 
 
  async def get_tms_mappings(
- input_list: List[str],
- embedding_type: str,
- llm_ports: Optional[List[str]] = None,
- input_key: str = None,
+ input_list: List[str], embedding_type: str, llm_ports: Optional[List[str]] = None
  ) -> Dict[str, Any]:
  """Get TMS mappings for the given values.
 
@@ -373,7 +370,6 @@ async def get_tms_mappings(
  embedding_type (str): Type of embedding to use
  (e.g., "container_types", "ports", "depots", "lineitems", "terminals").
  llm_ports (list[str], optional): List of LLM ports to use. Defaults to None.
- input_key (str, optional): Key to use for input list in payload. Defaults to None.
 
  Returns:
  dict or string: A dictionary or a string with the mapping results.
@@ -393,7 +389,7 @@ async def get_tms_mappings(
  input_list = [input_list]
 
  # Always send a dict with named keys
- payload = {input_key or embedding_type: input_list}
+ payload = {embedding_type: input_list}
 
  if llm_ports:
  payload["llm_ports"] = llm_ports if isinstance(llm_ports, list) else [llm_ports]
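Note on the src/utils.py change above: with `input_key` removed, the payload key is always the `embedding_type`. The standalone sketch below reproduces only the payload-building lines visible in the hunk. `build_payload` is a hypothetical name used for illustration, and the `isinstance` guard before the list wrapping is an assumption, since that line lies outside the diff context.

```python
from typing import Any, Dict, List, Optional, Union


def build_payload(
    input_list: Union[str, List[str]],
    embedding_type: str,
    llm_ports: Optional[Union[str, List[str]]] = None,
) -> Dict[str, Any]:
    """Build the request payload the way the 1.60.0 hunk shows (sketch)."""
    # Assumption: a single value is wrapped in a list; the guard is not in the diff.
    if not isinstance(input_list, list):
        input_list = [input_list]

    # Always send a dict with named keys; the key is now the embedding type.
    payload: Dict[str, Any] = {embedding_type: input_list}

    if llm_ports:
        payload["llm_ports"] = llm_ports if isinstance(llm_ports, list) else [llm_ports]
    return payload
```

For example, `build_payload("DEHAM", "ports")` yields `{"ports": ["DEHAM"]}`. In 1.58.0 a caller could instead pass `input_key` to send the same list under a different key.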