data-science-document-ai 1.56.0__tar.gz → 1.57.0__tar.gz
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/PKG-INFO +1 -1
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/pyproject.toml +1 -1
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/constants.py +8 -26
- data_science_document_ai-1.57.0/src/docai_processor_config.yaml +9 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/pdf_processing.py +29 -27
- data_science_document_ai-1.57.0/src/prompts/library/bookingConfirmation/evergreen/placeholders.json +146 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/prompts/library/bookingConfirmation/evergreen/prompt.txt +21 -17
- data_science_document_ai-1.57.0/src/prompts/library/bookingConfirmation/hapag-lloyd/placeholders.json +146 -0
- data_science_document_ai-1.57.0/src/prompts/library/bookingConfirmation/hapag-lloyd/prompt.txt +59 -0
- data_science_document_ai-1.57.0/src/prompts/library/bookingConfirmation/maersk/placeholders.json +146 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/prompts/library/bookingConfirmation/maersk/prompt.txt +10 -1
- data_science_document_ai-1.57.0/src/prompts/library/bookingConfirmation/msc/placeholders.json +146 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/prompts/library/bookingConfirmation/msc/prompt.txt +10 -1
- data_science_document_ai-1.57.0/src/prompts/library/bookingConfirmation/oocl/placeholders.json +160 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/prompts/library/bookingConfirmation/oocl/prompt.txt +11 -3
- data_science_document_ai-1.57.0/src/prompts/library/bookingConfirmation/other/placeholders.json +160 -0
- data_science_document_ai-1.57.0/src/prompts/library/bookingConfirmation/other/prompt.txt +57 -0
- data_science_document_ai-1.57.0/src/prompts/library/bookingConfirmation/yangming/placeholders.json +160 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/prompts/library/bookingConfirmation/yangming/prompt.txt +11 -1
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/prompts/library/customsInvoice/other/prompt.txt +1 -1
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/prompts/library/partnerInvoice/other/prompt.txt +2 -4
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/setup.py +11 -9
- data_science_document_ai-1.56.0/src/docai_processor_config.yaml +0 -22
- data_science_document_ai-1.56.0/src/prompts/library/bookingConfirmation/evergreen/placeholders.json +0 -32
- data_science_document_ai-1.56.0/src/prompts/library/bookingConfirmation/hapag-lloyd/placeholders.json +0 -32
- data_science_document_ai-1.56.0/src/prompts/library/bookingConfirmation/hapag-lloyd/prompt.txt +0 -65
- data_science_document_ai-1.56.0/src/prompts/library/bookingConfirmation/maersk/placeholders.json +0 -32
- data_science_document_ai-1.56.0/src/prompts/library/bookingConfirmation/msc/placeholders.json +0 -32
- data_science_document_ai-1.56.0/src/prompts/library/bookingConfirmation/oocl/placeholders.json +0 -32
- data_science_document_ai-1.56.0/src/prompts/library/bookingConfirmation/other/placeholders.json +0 -32
- data_science_document_ai-1.56.0/src/prompts/library/bookingConfirmation/other/prompt.txt +0 -58
- data_science_document_ai-1.56.0/src/prompts/library/bookingConfirmation/yangming/placeholders.json +0 -32
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/constants_sandbox.py +0 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/docai.py +0 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/excel_processing.py +0 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/io.py +0 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/llm.py +0 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/log_setup.py +0 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/postprocessing/common.py +0 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/postprocessing/postprocess_booking_confirmation.py +0 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/postprocessing/postprocess_commercial_invoice.py +0 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/postprocessing/postprocess_partner_invoice.py +0 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/prompts/library/arrivalNotice/other/placeholders.json +0 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/prompts/library/arrivalNotice/other/prompt.txt +0 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/prompts/library/bundeskasse/other/placeholders.json +0 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/prompts/library/bundeskasse/other/prompt.txt +0 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/prompts/library/commercialInvoice/other/placeholders.json +0 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/prompts/library/commercialInvoice/other/prompt.txt +0 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/prompts/library/customsAssessment/other/placeholders.json +0 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/prompts/library/customsAssessment/other/prompt.txt +0 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/prompts/library/customsInvoice/other/placeholders.json +0 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/prompts/library/deliveryOrder/other/placeholders.json +0 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/prompts/library/deliveryOrder/other/prompt.txt +0 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/prompts/library/draftMbl/other/placeholders.json +0 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/prompts/library/draftMbl/other/prompt.txt +0 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/prompts/library/finalMbL/other/placeholders.json +0 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/prompts/library/finalMbL/other/prompt.txt +0 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/prompts/library/packingList/other/placeholders.json +0 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/prompts/library/packingList/other/prompt.txt +0 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/prompts/library/partnerInvoice/other/placeholders.json +0 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/prompts/library/postprocessing/port_code/placeholders.json +0 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/prompts/library/postprocessing/port_code/prompt_port_code.txt +0 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/prompts/library/preprocessing/carrier/placeholders.json +0 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/prompts/library/preprocessing/carrier/prompt.txt +0 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/prompts/library/shippingInstruction/other/placeholders.json +0 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/prompts/library/shippingInstruction/other/prompt.txt +0 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/prompts/prompt_library.py +0 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/tms.py +0 -0
- {data_science_document_ai-1.56.0 → data_science_document_ai-1.57.0}/src/utils.py +0 -0
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "data-science-document-ai"
-version = "1.
+version = "1.57.0"
 description = "\"Document AI repo for data science\""
 authors = ["Naomi Nguyen <naomi.nguyen@forto.com>", "Kumar Rajendrababu <kumar.rajendrababu@forto.com>", "Igor Tonko <igor.tonko@forto.com>", "Osman Demirel <osman.demirel@forto.com>"]
 packages = [
@@ -37,6 +37,8 @@ project_parameters = {
     # models metadata (confidence),
     "g_model_data_folder": "models",
     "local_model_data_folder": "data",
+    "if_use_docai": False,
+    "if_use_llm": True,  # Keep it always True
     "released_doc_types": {
         "bookingConfirmation",
         "packingList",
@@ -51,16 +53,6 @@ project_parameters = {
         "customsInvoice",
         "bundeskasse",
     },
-    "model_selector": {
-        "stable": {
-            "bookingConfirmation": 1,
-        },
-        "beta": {
-            "bookingConfirmation": 0,
-        },
-    },
-    # this is the model selector for the model to be used from the model_config.yaml
-    # file based on the environment, 0 mean the first model in the list
     # LLM model parameters
     "gemini_params": {
         "temperature": 0,
@@ -78,25 +70,15 @@ project_parameters = {
         "seed": 42,
         "model_id": "gemini-2.5-flash",
     },
-    # Key to combine the LLM results with the Doc Ai results
-    "key_to_combine": {
-        "bookingConfirmation": ["transportLegs"],
-        "arrivalNotice": ["containers"],
-        "finalMbL": ["containers"],
-        "draftMbl": ["containers"],
-        "deliveryOrder": ["Equipment", "TransportLeg"],
-        "customsAssessment": ["containers"],
-        "packingList": ["skuData"],
-        "commercialInvoice": ["skus"],
-        "shippingInstruction": ["containers"],
-        "partnerInvoice": ["lineItem"],
-        "customsInvoice": ["lineItem"],
-        "bundeskasse": ["lineItem"],
-    },
 }
 
 # Hardcoded rules for data points formatting that can't be based on label name alone
 formatting_rules = {
-    "bookingConfirmation": {
+    "bookingConfirmation": {
+        "pickUpDepotCode": "depot",
+        "dropOffDepotCode": "depot",
+        "gateInTerminalCode": "terminal",
+        "pickUpTerminalCode": "terminal",
+    },
     "deliveryOrder": {"pickUpTerminal": "terminal", "EmptyContainerDepot": "depot"},
 }
@@ -201,9 +201,6 @@ async def process_file_w_llm(params, file_content, input_doc_type, llm_client):
     number_of_pages = get_pdf_page_count(file_content)
     logger.info(f"processing {input_doc_type} with {number_of_pages} pages...")
 
-    # get the schema placeholder
-    response_schema = prompt_library.library[input_doc_type]["other"]["placeholders"]
-
     carrier = "other"
     carrier_schema = (
         prompt_library.library.get("preprocessing", {})
@@ -240,6 +237,9 @@ async def process_file_w_llm(params, file_content, input_doc_type, llm_client):
     # get the related prompt from predefined prompt library
     prompt = prompt_library.library[input_doc_type][carrier]["prompt"]
 
+    # get the schema placeholder
+    response_schema = prompt_library.library[input_doc_type][carrier]["placeholders"]
+
     # Add page-number extraction for moderately large docs
     use_chunking = number_of_pages >= params["chunk_after"]
 
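The two `process_file_w_llm` hunks move the `response_schema` lookup below the carrier detection, so the schema is read from the detected carrier's entry instead of always from `"other"`. A minimal sketch of that lookup follows. The `library` dictionary is a hypothetical stand-in for `prompt_library.library`, and the fallback to `"other"` for an unknown carrier is an assumption that the diff does not show.

```python
# Hypothetical stand-in for prompt_library.library: doc type -> carrier -> assets.
library = {
    "bookingConfirmation": {
        "other": {
            "prompt": "generic prompt",
            "placeholders": {"type": "OBJECT", "properties": {}},
        },
        "evergreen": {
            "prompt": "evergreen prompt",
            "placeholders": {
                "type": "OBJECT",
                "properties": {"bookingNumber": {"type": "STRING"}},
            },
        },
    }
}


def select_prompt_and_schema(library: dict, doc_type: str, carrier: str = "other"):
    """Return (prompt, response_schema) for one document type and carrier."""
    carriers = library[doc_type]
    # Assumption: an unknown carrier falls back to the generic "other" entry.
    entry = carriers.get(carrier, carriers["other"])
    return entry["prompt"], entry["placeholders"]


prompt, schema = select_prompt_and_schema(library, "bookingConfirmation", "evergreen")
```

Reading the prompt and the schema from the same entry keeps them in step when a carrier-specific `placeholders.json` differs from the generic one.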
@@ -353,8 +353,7 @@ async def extract_data_from_pdf_w_llm(params, input_doc_type, file_content, llm_
     # Add currency from the amount field
     if input_doc_type in ["commercialInvoice"]:
         result = postprocessing_commercial_invoice(result, params, input_doc_type)
-
-    result = postprocess_booking_confirmation(result)
+
     return result, llm_client.model_id
 
 
@@ -373,13 +372,14 @@ def combine_llm_results_w_doc_ai(
     Returns:
         combined result
     """
-    result =
-
-
+    result = remove_none_values(llm)
+
+    docAi = doc_ai.copy()
+    if not docAi:
         return result
 
     # Merge top-level keys
-    result.update({k: v for k, v in
+    result.update({k: v for k, v in docAi.items() if k not in result})
 
     if (
         input_doc_type
@@ -387,28 +387,28 @@ def combine_llm_results_w_doc_ai(
         and keys_to_combine
     ):
         result.update(
-            {key:
+            {key: docAi.get(key) for key in keys_to_combine if key in docAi.keys()}
         )
         return result
 
     # Handle specific key-based merging logic for multiple keys
     if keys_to_combine:
         for key in keys_to_combine:
-            if key in
+            if key in docAi.keys():
                 # Merge the list of dictionaries
-                # If the length of the
-                if len(
-                    result[key] =
+                # If the length of the docAi list is less than the LLM result, replace with the docAi list
+                if len(docAi[key]) < len(result[key]):
+                    result[key] = docAi[key]
                 else:
-                    # If the length of the
+                    # If the length of the docAi list is greater than or equal to the LLM result,
                     # add & merge the dictionaries
-                    if isinstance(
-                        for i in range(len(
+                    if isinstance(docAi[key], list):
+                        for i in range(len(docAi[key])):
                             if i == len(result[key]):
-                                result[key].append(
+                                result[key].append(docAi[key][i])
                             else:
-                                for sub_key in
-                                    result[key][i][sub_key] = 
+                                for sub_key in docAi[key][i].keys():
+                                    result[key][i][sub_key] = docAi[key][i][sub_key]
     return result
 
 
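The merge rules in `combine_llm_results_w_doc_ai` are easier to follow outside the diff. The sketch below restates them as a standalone function. It is not the package's code: `combine_results` is a hypothetical name, the `None` filter stands in for `remove_none_values` (whose real behaviour is not shown), and the early-return branch that depends on `input_doc_type` is left out.

```python
from copy import deepcopy


def combine_results(llm: dict, doc_ai: dict, keys_to_combine: list[str]) -> dict:
    """Merge Doc AI output into LLM output following the rules in the hunk above."""
    # Stand-in for remove_none_values(): drop top-level None values only.
    result = {k: deepcopy(v) for k, v in llm.items() if v is not None}
    if not doc_ai:
        return result

    # Top-level keys: the LLM value wins, Doc AI only fills what is missing.
    result.update({k: deepcopy(v) for k, v in doc_ai.items() if k not in result})

    for key in keys_to_combine:
        if key not in doc_ai:
            continue
        doc_items, llm_items = doc_ai[key], result[key]
        if len(doc_items) < len(llm_items):
            # A shorter Doc AI list replaces the LLM list outright.
            result[key] = deepcopy(doc_items)
        elif isinstance(doc_items, list):
            for i, item in enumerate(doc_items):
                if i == len(llm_items):
                    # Doc AI has an extra entry: append it.
                    llm_items.append(deepcopy(item))
                else:
                    # Same position: Doc AI sub-keys overwrite the LLM ones.
                    llm_items[i].update(item)
    return result


merged = combine_results(
    {"bookingNumber": "B1", "mblNumber": None,
     "transportLegs": [{"eta": "2024-01-01", "vesselName": "A"}]},
    {"bookingNumber": "X", "mblNumber": "M1",
     "transportLegs": [{"vesselName": "B"}, {"eta": "2024-02-02"}]},
    ["transportLegs"],
)
```

Note the asymmetry: for top-level keys the LLM value takes precedence, while inside a combined list the Doc AI sub-keys overwrite the LLM ones.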
@@ -502,13 +502,15 @@ async def data_extraction_manual_flow(
     page_count = None
     # Validate the file type
     if mime_type == "application/pdf":
+        if_use_docai = params["if_use_docai"]
+
         # Enable Doc Ai only for certain document types.
-        if_use_docai
-
-
-
-
-
+        if params["if_use_docai"]:
+            if_use_docai = (
+                True
+                if meta.documentTypeCode in params["model_config"]["stable"]
+                else False
+            )
 
         (
             extracted_data,
@@ -520,7 +522,7 @@ async def data_extraction_manual_flow(
             meta.documentTypeCode,
             processor_client,
             if_use_docai=if_use_docai,
-            if_use_llm=if_use_llm,
+            if_use_llm=params["if_use_llm"],
             llm_client=llm_client,
             isBetaTest=False,
         )
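The `data_extraction_manual_flow` hunks make Doc AI usage depend on two things: the new `if_use_docai` flag in `project_parameters` and whether the document type appears under `params["model_config"]["stable"]`. The sketch below condenses that decision into one function. `resolve_use_docai` is a hypothetical name, and the shape of `model_config` (a mapping keyed by document type) is an assumption.

```python
def resolve_use_docai(params: dict, document_type_code: str) -> bool:
    """Return True only if Doc AI is enabled globally and configured for this type."""
    if not params["if_use_docai"]:
        # Global switch off: Doc AI is skipped for every document type.
        return False
    # Global switch on: still require a stable processor entry for the type.
    return document_type_code in params["model_config"]["stable"]


# Hypothetical parameters; only the key names come from the diff.
params = {
    "if_use_docai": True,
    "if_use_llm": True,
    "model_config": {"stable": {"bookingConfirmation": {}}},
}
use_docai = resolve_use_docai(params, "bookingConfirmation")
```

With the default added in `constants.py` (`"if_use_docai": False`), this resolves to `False` for all document types, so only the LLM path runs.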
data_science_document_ai-1.57.0/src/prompts/library/bookingConfirmation/evergreen/placeholders.json
ADDED
@@ -0,0 +1,146 @@
+{
+  "type": "OBJECT",
+  "properties": {
+    "bookingNumber": {
+      "type": "STRING",
+      "nullable": true,
+      "description": "A unique identifier assigned to the shipment booking, used for tracking and reference. They are often referred to as 'Booking No.', 'Booking Reference', 'Our Reference', or 'Order Ref'."
+    },
+    "contractNumber": {
+      "type": "STRING",
+      "nullable": true,
+      "description": "It's a contract number between the carrier and Forto Logistics SE & Co KG."
+    },
+    "pickUpTerminalCode": {
+      "type": "STRING",
+      "nullable": true,
+      "description": "The specific terminal for cargo pickup during the import shipment."
+    },
+    "gateInTerminalCode": {
+      "type": "STRING",
+      "nullable": true,
+      "description": "The specific terminal where cargo is gated in especially Export terminal delivery address. E.g., Export terminal delivery address, Export terminal location, or Export terminal name."
+    },
+    "performaDate": {
+      "type": "STRING",
+      "nullable": true,
+      "description": "The date considered to apply the rates and charges specified in the booking confirmation"
+    },
+    "cyCutOff": {
+      "type": "STRING",
+      "nullable": true,
+      "description": "The datetime by which the cargo to be delivered to the Container Yard. It can be found with keys FCL delivery cut-off, FCL DG delivery cut-off, CY CUT OFF, CY Closing."
+    },
+    "gateInReference": {
+      "type": "STRING",
+      "nullable": true,
+      "description": "A reference code for cargo entering the terminal to drop the loaded cargo for Export. Sometimes it can be 'Our Reference'."
+    },
+    "mblNumber": {
+      "type": "STRING",
+      "nullable": true,
+      "description": "Bill of Lading number (B/L NO.), a document issued by the carrier."
+    },
+    "pickUpReference": {
+      "type": "STRING",
+      "nullable": true,
+      "description": "A reference code for cargo pickup during the import shipment. Sometimes it can be 'Our Reference'."
+    },
+    "siCutOff": {
+      "type": "STRING",
+      "nullable": true,
+      "description": "The deadline datetime for submitting the Shipping Instructions (SI) to the carrier. It can be found with keys Shipping Instruction Closing."
+    },
+    "vgmCutOff": {
+      "type": "STRING",
+      "nullable": true,
+      "description": "The deadline datetime for submitting the Verified Gross Mass (VGM) to the carrier. It can be found with keys VGM DEADLINE, VGM DUE, VGM CUT OFF."
+    },
+    "containers": {
+      "type": "ARRAY",
+      "items": {
+        "type": "OBJECT",
+        "properties": {
+          "containerType": {
+            "type": "STRING",
+            "nullable": true,
+            "description": "The size / type of the container, such as 20ft, 40ft, 40HC, 20DC etc under Type/Size column."
+          },
+          "pickUpDepotCode": {
+            "type": "STRING",
+            "nullable": true,
+            "description": "The depot code where the empty container will be picked up. It is identified as Empty Pick Up Depot or Export Empty Pick Up Depot(s)."
+          },
+          "dropOffDepotCode": {
+            "type": "STRING",
+            "nullable": true,
+            "description": "The depot code where the empty container will be dropped off."
+          }
+        }
+      },
+      "required": ["containerType", "pickupDepotCode", "dropoffDepotCode"]
+    },
+    "transportLegs": {
+      "type": "ARRAY",
+      "items": {
+        "type": "OBJECT",
+        "properties": {
+          "eta": {
+            "type": "STRING",
+            "nullable": true,
+            "description": "Estimated Time of Arrival (ETA) is the expected date when the shipment will arrive at its destination."
+          },
+          "etd": {
+            "type": "STRING",
+            "nullable": true,
+            "description": "Estimated Time of Departure (ETD) is the expected date when the shipment will leave the origin port."
+          },
+          "imoNumber": {
+            "type": "STRING",
+            "nullable": true,
+            "description": "The International Maritime Organization number for a specific leg. It can be found as IMO No, IMO number."
+          },
+          "portOfDischarge": {
+            "type": "STRING",
+            "nullable": true,
+            "description": "The port where the goods are discharged from the vessel. This is the destination port for the shipment. It can be found at POD, Port of Discharge, To, Discharge Port"
+          },
+          "portOfLoading": {
+            "type": "STRING",
+            "nullable": true,
+            "description": "The port where the goods are loaded onto the vessel. This is the origin port for the shipment. It can be found at POL, Port of Loading, From, Load Port"
+          },
+          "vesselName": {
+            "type": "STRING",
+            "nullable": true,
+            "description": "The name of the vessel carrying the shipment. It can be found at vessel, INTENDED VESSEL/VOYAGE"
+          },
+          "voyage": {
+            "type": "STRING",
+            "nullable": true,
+            "description": "The journey or route taken by the vessel for a specific leg. It can be found at Voy. no, INTENDED VESSEL/VOYAGE"
+          }
+        }
+      },
+      "required": [
+        "eta",
+        "etd",
+        "portOfDischarge",
+        "portOfLoading",
+        "vesselName",
+        "voyage"
+      ]
+    },
+    "carrierAddress": {
+      "type": "STRING",
+      "nullable": true,
+      "description": "The address of the carrier who provides service and issued the document."
+    },
+    "carrierName": {
+      "type": "STRING",
+      "nullable": true,
+      "description": "The name of the carrier who issued the document e,g, Hapag-Lloyd."
+    }
+  },
+  "required": ["bookingNumber", "transportLegs", "containers", "cyCutOff", "vgmCutOff", "siCutOff"]
+}
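In the schema above, the `containers` node lists `pickupDepotCode` and `dropoffDepotCode` under `required`, while the properties are spelled `pickUpDepotCode` and `dropOffDepotCode`. Whether the consuming API tolerates that mismatch is not stated in the diff. The sketch below is one hedged way to detect such names. `find_unknown_required` is a hypothetical helper, and it assumes the layout used in this file, where an ARRAY node keeps `required` next to `items`.

```python
def find_unknown_required(node: dict, path: str = "$") -> list[tuple[str, str]]:
    """Return (path, name) pairs for required names that match no property."""
    problems = []
    items = node.get("items", {})
    # OBJECT nodes carry "properties" directly; ARRAY nodes carry them under "items".
    props = node.get("properties") or items.get("properties", {})
    for name in node.get("required", []):
        if name not in props:
            problems.append((path, name))
    # Recurse into nested objects and array item definitions.
    for name, child in node.get("properties", {}).items():
        problems.extend(find_unknown_required(child, f"{path}.{name}"))
    if items:
        problems.extend(find_unknown_required(items, f"{path}[]"))
    return problems


# Excerpt of the schema above, reduced to the relevant keys.
schema_excerpt = {
    "type": "OBJECT",
    "properties": {
        "bookingNumber": {"type": "STRING"},
        "containers": {
            "type": "ARRAY",
            "items": {
                "type": "OBJECT",
                "properties": {
                    "containerType": {"type": "STRING"},
                    "pickUpDepotCode": {"type": "STRING"},
                    "dropOffDepotCode": {"type": "STRING"},
                },
            },
            "required": ["containerType", "pickupDepotCode", "dropoffDepotCode"],
        },
    },
    "required": ["bookingNumber", "containers"],
}
mismatches = find_unknown_required(schema_excerpt)
```

Running a check like this over each `placeholders.json` at load time would surface spelling drift between `required` and `properties` before a request is sent.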
@@ -1,6 +1,14 @@
-
-
-
+<PERSONA> You are an efficient document entity data extraction specialist working for a Freight Forwarding company. <PERSONA>
+
+<TASK> Your task is to extract data from Booking Confirmation documents as per the given response schema structure. <TASK>
+
+<CONTEXT>
+The Freight Forwarding company receives Booking Confirmation from EverGreen Carrier (Shipping Lines) partner.
+These Booking Confirmations contain various details related to booking, container pick up and drop off depot details, vessel details, as well as other transport Legs data.
+They may be written in different languages such as English, German, Vietnamese, Chinese, and other European languages, and can appear in a variety of formats and layouts.
+Your role is to accurately extract specific entities from these Booking Confirmations to support efficient processing and accurate record-keeping.
+<CONTEXT>
+
 "mblNumber": "Extract the value after the label 'BOOKING NO.'.",
 "gateInReference": "Extract the value after the label 'BOOKING NO.'.",
 "pickUpReference": "Extract the value after the label 'BOOKING NO.'.",
@@ -14,23 +22,19 @@ your task is to extract the text value of the following entities and page number
 "portOfDischarge": "Extract the text after the label 'PORT OF DISCHARGING:' and before 'FINAL DESTINATION'.",
 "pickUpTerminal": "Extract the text after the label 'EMPTY PICK UP AT:' removing any extra spaces or line breaks.",
 "gateInTerminal": "Extract the text after the label 'FULL RETURN TO:' removing any extra spaces or line breaks.",
-
-
-"portOfLoading": "For the first leg, use the extracted 'portOfLoading'.",
-"portOfDischarge": "Extract the text after the label 'T/S PORT OF LOADING:'.",
-"vesselName": "For the first leg, use the extracted 'vesselName'.",
-"voyage": "Voyage is a code of numbers and letters sometimes separated by '-'. For the first leg, use the extracted 'voyage'.",
-"eta": "Extract the date after the label 'ETA DATE' that appears within the section starting with 'FINAL DESTINATION:' and ending with 'T/S PORT OF LOADING:'.",
-"etd": "Extract the date after the label 'ETD DATE' that appears within the section starting with 'PORT OF LOADING:' and ending with 'FINAL DESTINATION:'.",
-
-
+
+"transportLegs":
+"portOfLoading": "For the first leg, use the extracted 'portOfLoading'.",
+"portOfDischarge": "Extract the text after the label 'T/S PORT OF LOADING:'.",
+"vesselName": "For the first leg, use the extracted 'vesselName'.",
+"voyage": "Voyage is a code of numbers and letters sometimes separated by '-'. For the first leg, use the extracted 'voyage'.",
+"eta": "Extract the date after the label 'ETA DATE' that appears within the section starting with 'FINAL DESTINATION:' and ending with 'T/S PORT OF LOADING:'.",
+"etd": "Extract the date after the label 'ETD DATE' that appears within the section starting with 'PORT OF LOADING:' and ending with 'FINAL DESTINATION:'.",
+
+
 "portOfLoading": "For the second leg, use the 'portOfDischarge' from the previous leg.",
 "portOfDischarge": "For the second leg, use the extracted 'portOfDischarge' from the main extraction.",
 "vesselName": "Extract the text after the label 'EST. CONNECT VSL/VOY:' and before the hyphen and numbers.",
 "voyage": "Voyage is a code of numbers and letters sometimes separated by '-'. Extract the code after the label 'EST. CONNECT VSL/VOY:' and after the vessel name.",
 "eta": "Extract the date after the label 'ETA DATE' that is after the line that contains 'T/S PORT OF LOADING'",
 "etd": "Extract the date after the label 'ETD DATE' that is related to the 'EST. CONNECT VSL/VOY:'. "
-}
-]
-}
-```
@@ -0,0 +1,146 @@
|
|
|
1
|
+
{
|
|
2
|
+
"type": "OBJECT",
|
|
3
|
+
"properties": {
|
|
4
|
+
"bookingNumber": {
|
|
5
|
+
"type": "STRING",
|
|
6
|
+
"nullable": true,
|
|
7
|
+
"description": "A unique identifier assigned to the shipment booking, used for tracking and reference. They are often referred to as 'Booking No.', 'Booking Reference', 'Our Reference', or 'Order Ref'."
|
|
8
|
+
},
|
|
9
|
+
"contractNumber": {
|
|
10
|
+
"type": "STRING",
|
|
11
|
+
"nullable": true,
|
|
12
|
+
"description": "It's a contract number between the carrier and Forto Logistics SE & Co KG."
|
|
13
|
+
},
|
|
14
|
+
"pickUpTerminalCode": {
|
|
15
|
+
"type": "STRING",
|
|
16
|
+
"nullable": true,
|
|
17
|
+
"description": "The specific terminal for cargo pickup during the import shipment."
|
|
18
|
+
},
|
|
19
|
+
"gateInTerminalCode": {
|
|
20
|
+
"type": "STRING",
|
|
21
|
+
"nullable": true,
|
|
22
|
+
"description": "The specific terminal where cargo is gated in especially Export terminal delivery address. E.g., Export terminal delivery address, Export terminal location, or Export terminal name."
|
|
23
|
+
},
|
|
24
|
+
"performaDate": {
|
|
25
|
+
"type": "STRING",
|
|
26
|
+
"nullable": true,
|
|
27
|
+
"description": "The date considered to apply the rates and charges specified in the booking confirmation"
|
|
28
|
+
},
|
|
29
|
+
"cyCutOff": {
|
|
30
|
+
"type": "STRING",
|
|
31
|
+
"nullable": true,
|
|
32
|
+
"description": "The datetime by which the cargo to be delivered to the Container Yard. It can be found with keys FCL delivery cut-off, FCL DG delivery cut-off, CY CUT OFF, CY Closing."
|
|
33
|
+
},
|
|
34
|
+
"gateInReference": {
|
|
35
|
+
"type": "STRING",
|
|
36
|
+
"nullable": true,
|
|
37
|
+
"description": "A reference code for cargo entering the terminal to drop the loaded cargo for Export. Sometimes it can be 'Our Reference'."
|
|
38
|
+
},
|
|
39
|
+
"mblNumber": {
|
|
40
|
+
"type": "STRING",
|
|
41
|
+
"nullable": true,
|
|
42
|
+
"description": "Bill of Lading number (B/L NO.), a document issued by the carrier."
|
|
43
|
+
},
|
|
44
|
+
"pickUpReference": {
|
|
45
|
+
"type": "STRING",
|
|
46
|
+
"nullable": true,
|
|
47
|
+
"description": "A reference code for cargo pickup during the import shipment. Sometimes it can be 'Our Reference'."
|
|
48
|
+
},
|
|
49
|
+
"siCutOff": {
|
|
50
|
+
"type": "STRING",
|
|
51
|
+
"nullable": true,
|
|
52
|
+
"description": "The deadline datetime for submitting the Shipping Instructions (SI) to the carrier. It can be found with keys Shipping Instruction Closing."
|
|
53
|
+
},
|
|
54
|
+
"vgmCutOff": {
|
|
55
|
+
"type": "STRING",
|
|
56
|
+
"nullable": true,
|
|
57
|
+
"description": "The deadline datetime for submitting the Verified Gross Mass (VGM) to the carrier. It can be found with keys VGM DEADLINE, VGM DUE, VGM CUT OFF."
|
|
58
|
+
},
|
|
59
|
+
"containers": {
|
|
60
|
+
"type": "ARRAY",
|
|
61
|
+
"items": {
|
|
62
|
+
"type": "OBJECT",
|
|
63
|
+
"properties": {
|
|
64
|
+
"containerType": {
|
|
65
|
+
"type": "STRING",
|
|
66
|
+
"nullable": true,
|
|
67
|
+
"description": "The size / type of the container, such as 20ft, 40ft, 40HC, 20DC etc under Type/Size column."
|
|
68
|
+
},
|
|
69
|
+
"pickUpDepotCode": {
|
|
70
|
+
"type": "STRING",
|
|
71
|
+
"nullable": true,
|
|
72
|
+
"description": "The depot code where the empty container will be picked up. It is identified as Empty Pick Up Depot or Export Empty Pick Up Depot(s)."
|
|
73
|
+
},
|
|
74
|
+
"dropOffDepotCode": {
|
|
75
|
+
"type": "STRING",
|
|
76
|
+
"nullable": true,
|
|
77
|
+
"description": "The depot code where the empty container will be dropped off."
|
|
78
|
+
}
|
|
79
|
+
}
|
|
80
|
+
},
|
|
81
|
+
"required": ["containerType", "pickupDepotCode", "dropoffDepotCode"]
|
|
82
|
+
},
|
|
83
|
+
"transportLegs": {
|
|
84
|
+
"type": "ARRAY",
|
|
85
|
+
"items": {
|
|
86
|
+
"type": "OBJECT",
|
|
87
|
+
"properties": {
|
|
88
|
+
"eta": {
|
|
89
|
+
"type": "STRING",
|
|
90
|
+
"nullable": true,
|
|
91
|
+
"description": "Estimated Time of Arrival (ETA) is the expected date when the shipment will arrive at its destination."
|
|
92
|
+
},
|
|
93
|
+
"etd": {
|
|
94
|
+
"type": "STRING",
|
|
95
|
+
"nullable": true,
|
|
96
|
+
"description": "Estimated Time of Departure (ETD) is the expected date when the shipment will leave the origin port."
|
|
97
|
+
},
|
|
98
|
+
"imoNumber": {
|
|
99
|
+
"type": "STRING",
|
|
100
|
+
"nullable": true,
|
|
101
|
+
"description": "The International Maritime Organization number for a specific leg. It can be found as IMO No, IMO number."
|
|
102
|
+
},
|
|
103
|
+
"portOfDischarge": {
|
|
104
|
+
"type": "STRING",
|
|
105
|
+
"nullable": true,
|
|
106
|
+
"description": "The port where the goods are discharged from the vessel. This is the destination port for the shipment. It can be found at POD, Port of Discharge, To, Discharge Port"
|
|
107
|
+
},
|
|
108
|
+
"portOfLoading": {
|
|
109
|
+
"type": "STRING",
|
|
110
|
+
"nullable": true,
|
|
111
|
+
"description": "The port where the goods are loaded onto the vessel. This is the origin port for the shipment. It can be found at POL, Port of Loading, From, Load Port"
|
|
112
|
+
},
|
|
113
|
+
"vesselName": {
|
|
114
|
+
"type": "STRING",
|
|
115
|
+
"nullable": true,
|
|
116
|
+
"description": "The name of the vessel carrying the shipment. It can be found at vessel, INTENDED VESSEL/VOYAGE"
|
|
117
|
+
},
|
|
118
|
+
"voyage": {
|
|
119
|
+
"type": "STRING",
|
|
120
|
+
"nullable": true,
|
|
121
|
+
"description": "The journey or route taken by the vessel for a specific leg. It can be found at Voy. no, INTENDED VESSEL/VOYAGE"
|
|
122
|
+
}
|
|
123
|
+
}
|
|
124
|
+
},
|
|
125
|
+
"required": [
|
|
126
|
+
"eta",
|
|
127
|
+
"etd",
|
|
128
|
+
"portOfDischarge",
|
|
129
|
+
"portOfLoading",
|
|
130
|
+
"vesselName",
|
|
131
|
+
"voyage"
|
|
132
|
+
]
|
|
133
|
+
},
|
|
134
|
+
"carrierAddress": {
|
|
135
|
+
"type": "STRING",
|
|
136
|
+
"nullable": true,
|
|
137
|
+
"description": "The address of the carrier who provides service and issued the document."
|
|
138
|
+
},
|
|
139
|
+
"carrierName": {
|
|
140
|
+
"type": "STRING",
|
|
141
|
+
"nullable": true,
|
|
142
|
+
"description": "The name of the carrier who issued the document, e.g., Hapag-Lloyd."
|
|
143
|
+
}
|
|
144
|
+
},
|
|
145
|
+
"required": ["bookingNumber", "transportLegs", "containers", "cyCutOff", "vgmCutOff", "siCutOff"]
|
|
146
|
+
}
|
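
The "required" lists in the schema above can be checked against a parsed model response before it is stored. The following is a minimal sketch, not code from the package: the helper name `missing_required` is hypothetical, the key lists are copied from the schema as shown, and the response is assumed to be already parsed into a Python dict. Because the fields are declared nullable, a key holding `None` is treated as present.

```python
# Required keys copied from the schema shown above.
TOP_LEVEL_REQUIRED = [
    "bookingNumber", "transportLegs", "containers",
    "cyCutOff", "vgmCutOff", "siCutOff",
]
LEG_REQUIRED = [
    "eta", "etd", "portOfDischarge", "portOfLoading", "vesselName", "voyage",
]


def missing_required(response: dict) -> list[str]:
    """Return paths of required keys that are absent from a parsed response.

    Hypothetical helper: a key whose value is None counts as present,
    since the schema marks these fields as nullable.
    """
    missing = [key for key in TOP_LEVEL_REQUIRED if key not in response]
    for index, leg in enumerate(response.get("transportLegs") or []):
        missing += [
            f"transportLegs[{index}].{key}"
            for key in LEG_REQUIRED
            if key not in leg
        ]
    return missing
```

An empty list means every required key is present; any other result names the exact path to follow up on.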
data_science_document_ai-1.57.0/src/prompts/library/bookingConfirmation/hapag-lloyd/prompt.txt
ADDED
|
@@ -0,0 +1,59 @@
|
|
|
1
|
+
<PERSONA> You are an efficient document entity data extraction specialist working for a Freight Forwarding company. <PERSONA>
|
|
2
|
+
|
|
3
|
+
<TASK> Your task is to extract data from Booking Confirmation documents as per the given response schema structure. <TASK>
|
|
4
|
+
|
|
5
|
+
<CONTEXT>
|
|
6
|
+
The Freight Forwarding company receives Booking Confirmation from Hapag-Lloyd Carrier (Shipping Lines) partners.
|
|
7
|
+
These Booking Confirmations contain various details related to booking, container pick up and drop off depot details, vessel details, as well as other transport Legs data.
|
|
8
|
+
They may be written in different languages such as English, German, Vietnamese, Chinese, and other European languages, and can appear in a variety of formats and layouts.
|
|
9
|
+
Your role is to accurately extract specific entities from these Booking Confirmations to support efficient processing and accurate record-keeping.
|
|
10
|
+
|
|
11
|
+
|
|
12
|
+
To provide context on the journey of a container for both Export and Import shipments:
|
|
13
|
+
For Export shipment: An empty container is picked up from a depot (pickupDepotCode) using a pickUpReference, and goods are loaded into it at a warehouse. The loaded container / cargo is then transported back to a Container Yard or gateInTerminal before the cyCutOff date for further shipping processes.
|
|
14
|
+
For Import shipment: The loaded container / cargo arrives at a port of discharge and is then picked up at pickUpTerminal using pickUpReference. After delivery, the empty container is returned to a depot (dropOffDepotCode).
|
|
15
|
+
<CONTEXT>
|
|
16
|
+
|
|
17
|
+
<INSTRUCTIONS>
|
|
18
|
+
- Populate fields as defined in the response schema.
|
|
19
|
+
- Use the data field description to understand the context of the data.
|
|
20
|
+
|
|
21
|
+
- gateInTerminal: The specific terminal where cargo is gated in. It can be found as Export terminal delivery address, PORT OF LOADING (after the slash '/').
|
|
22
|
+
- gateInReference: A reference code for cargo entering the terminal. If not mentioned explicitly and gateInTerminal is extracted, then use bookingNumber as gateInReference.
|
|
23
|
+
- pickUpTerminal: The specific terminal for cargo pickup. It can be found as Import pick up address(es), PORT OF DISCHARGE (after the slash '/').
|
|
24
|
+
- pickUpReference: A reference code for cargo pickup. If not mentioned explicitly and pickUpTerminal is extracted, then use bookingNumber as pickUpReference.
|
|
25
|
+
|
|
26
|
+
- cyCutOff: The deadline for cargo to be delivered to the Container Yard. It can be referred to as FCL delivery cut-off, CY CUT OFF, CY Closing - Latest Return Container Date, Cargo Cut-off deadline
|
|
27
|
+
- siCutOff: The deadline for submitting shipping instructions. It can be referred to as Shipping Instruction closing, SI Cut Off, Shipping Instruction deadline, INTENDED SI CUT-OFF
|
|
28
|
+
- vgmCutOff: The deadline for submitting the Verified Gross Mass of the cargo. It can be referred to as VGM cut-off, VGM Submission Deadline, Verified Gross Mass deadline
|
|
29
|
+
|
|
30
|
+
- carrierName and carrierAddress:
|
|
31
|
+
- Extract the name and address of the carrier who is the main parent company in the document.
|
|
32
|
+
- Example:
|
|
33
|
+
- "Hapag-Lloyd AG" or "Hapag-Lloyd Aktiengesellschaft" for carrierName.
|
|
34
|
+
|
|
35
|
+
- transportLegs: Multiple transport leg entries may exist; capture all instances under "transportLegs". The order of the legs is important.
|
|
36
|
+
- eta: The estimated time of arrival for a specific leg.
|
|
37
|
+
- etd: The estimated time of departure for a specific leg.
|
|
38
|
+
- imoNumber: The International Maritime Organization number for a specific leg.
|
|
39
|
+
- portOfDischarge: The port where cargo is unloaded for a specific leg.
|
|
40
|
+
- portOfLoading: The port where cargo is loaded for a specific leg.
|
|
41
|
+
- vesselName: The name of the vessel for a specific leg.
|
|
42
|
+
- voyage: The journey or route taken by the vessel for a specific leg.
|
|
43
|
+
|
|
44
|
+
- Containers: Extract depot details per container type. Multiple container entries may exist; capture all instances under "containers".
|
|
45
|
+
- containerType: The type of container (e.g., 20FT, 40FT, 20ft, 40ft, 40HC, 20DC, etc.).
|
|
46
|
+
- pickupDepotCode: The code of the depot where the empty container is picked up.
|
|
47
|
+
- dropOffDepotCode: The code of the depot where the empty container is dropped off.
|
|
48
|
+
|
|
49
|
+
IMPORTANT explanation for the transportLegs part as follows:
|
|
50
|
+
- There is at least one leg in each document.
|
|
51
|
+
- 'eta' must be equal to or later than 'etd'.
|
|
52
|
+
- Multiple legs are possible. When there are multiple legs:
|
|
53
|
+
- Sequential Sorting: You must manually re-order legs based on etd then eta, regardless of their order in the source text.
|
|
54
|
+
- The Connectivity Rule: For any sequence of legs, the Destination (Port of Discharge) of the previous leg must match the Origin (Port of Loading) of the following leg.
|
|
55
|
+
- Transhipment Handling: Treat any mentioned "Transhipment Port" as the bridge between two legs (Discharge for Leg A / Loading for Leg B).
|
|
56
|
+
- Timeline Integrity: Ensure a "No Time Travel" policy: The eta of a previous leg must be earlier than or equal to the etd of the following leg.
|
|
57
|
+
- Naming Convention: Look for Port Names followed by abbreviations in parentheses, e.g., "Port Name (ABCDE)".
|
|
58
|
+
|
|
59
|
+
<INSTRUCTIONS>
|
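
The transportLegs rules in the prompt above (sort by etd then eta, connect ports between consecutive legs, no "time travel") can also be verified after extraction. The sketch below is illustrative and not part of the package: the function name `order_and_check_legs` is hypothetical, and it assumes every leg carries non-null `eta` and `etd` values as ISO 8601 strings, which real documents may not guarantee.

```python
from datetime import datetime


def order_and_check_legs(legs: list[dict]) -> tuple[list[dict], list[str]]:
    """Sort legs by (etd, eta) and list violations of the leg rules.

    Hypothetical helper; assumes eta/etd are non-null ISO 8601 strings.
    """
    def sort_key(leg: dict) -> tuple[datetime, datetime]:
        return (
            datetime.fromisoformat(leg["etd"]),
            datetime.fromisoformat(leg["eta"]),
        )

    ordered = sorted(legs, key=sort_key)
    problems = []

    # Rule: within a leg, eta must be equal to or later than etd.
    for i, leg in enumerate(ordered):
        etd, eta = sort_key(leg)
        if eta < etd:
            problems.append(f"leg {i}: eta is earlier than etd")

    for i in range(1, len(ordered)):
        prev, nxt = ordered[i - 1], ordered[i]
        # Connectivity rule: discharge port of one leg is the next loading port.
        if prev["portOfDischarge"] != nxt["portOfLoading"]:
            problems.append(f"leg {i - 1} -> {i}: ports do not connect")
        # Timeline integrity: previous eta must not be after the next etd.
        if datetime.fromisoformat(prev["eta"]) > datetime.fromisoformat(nxt["etd"]):
            problems.append(f"leg {i - 1} -> {i}: eta is later than next etd")

    return ordered, problems
```

Returning the problems as a list rather than raising keeps the check usable as a soft quality signal, for example to flag a document for manual review.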