data-science-document-ai 1.47.0__tar.gz → 1.48.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/PKG-INFO +1 -1
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/pyproject.toml +1 -1
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/constants.py +1 -4
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/docai_processor_config.yaml +0 -21
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/excel_processing.py +7 -17
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/llm.py +0 -29
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/pdf_processing.py +2 -17
- data_science_document_ai-1.48.0/src/prompts/library/customsAssessment/other/placeholders.json +70 -0
- data_science_document_ai-1.48.0/src/prompts/library/customsAssessment/other/prompt.txt +29 -0
- data_science_document_ai-1.48.0/src/prompts/library/deliveryOrder/other/placeholders.json +82 -0
- data_science_document_ai-1.48.0/src/prompts/library/deliveryOrder/other/prompt.txt +36 -0
- data_science_document_ai-1.48.0/src/prompts/library/preprocessing/carrier/placeholders.json +14 -0
- data_science_document_ai-1.48.0/src/prompts/library/shippingInstruction/other/placeholders.json +115 -0
- data_science_document_ai-1.48.0/src/prompts/library/shippingInstruction/other/prompt.txt +28 -0
- data_science_document_ai-1.47.0/src/prompts/library/customsAssessment/other/prompt.txt +0 -42
- data_science_document_ai-1.47.0/src/prompts/library/deliveryOrder/other/placeholders.json +0 -29
- data_science_document_ai-1.47.0/src/prompts/library/deliveryOrder/other/prompt.txt +0 -50
- data_science_document_ai-1.47.0/src/prompts/library/preprocessing/carrier/placeholders.json +0 -30
- data_science_document_ai-1.47.0/src/prompts/library/shippingInstruction/other/prompt.txt +0 -16
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/constants_sandbox.py +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/docai.py +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/io.py +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/log_setup.py +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/postprocessing/common.py +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/postprocessing/postprocess_booking_confirmation.py +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/postprocessing/postprocess_commercial_invoice.py +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/postprocessing/postprocess_partner_invoice.py +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/prompts/library/arrivalNotice/other/placeholders.json +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/prompts/library/arrivalNotice/other/prompt.txt +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/prompts/library/bookingConfirmation/evergreen/placeholders.json +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/prompts/library/bookingConfirmation/evergreen/prompt.txt +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/prompts/library/bookingConfirmation/hapag-lloyd/placeholders.json +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/prompts/library/bookingConfirmation/hapag-lloyd/prompt.txt +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/prompts/library/bookingConfirmation/maersk/placeholders.json +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/prompts/library/bookingConfirmation/maersk/prompt.txt +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/prompts/library/bookingConfirmation/msc/placeholders.json +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/prompts/library/bookingConfirmation/msc/prompt.txt +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/prompts/library/bookingConfirmation/oocl/placeholders.json +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/prompts/library/bookingConfirmation/oocl/prompt.txt +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/prompts/library/bookingConfirmation/other/placeholders.json +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/prompts/library/bookingConfirmation/other/prompt.txt +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/prompts/library/bookingConfirmation/yangming/placeholders.json +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/prompts/library/bookingConfirmation/yangming/prompt.txt +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/prompts/library/bundeskasse/other/placeholders.json +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/prompts/library/bundeskasse/other/prompt.txt +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/prompts/library/commercialInvoice/other/placeholders.json +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/prompts/library/commercialInvoice/other/prompt.txt +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/prompts/library/customsInvoice/other/placeholders.json +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/prompts/library/customsInvoice/other/prompt.txt +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/prompts/library/draftMbl/other/placeholders.json +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/prompts/library/draftMbl/other/prompt.txt +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/prompts/library/finalMbL/other/placeholders.json +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/prompts/library/finalMbL/other/prompt.txt +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/prompts/library/packingList/other/placeholders.json +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/prompts/library/packingList/other/prompt.txt +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/prompts/library/partnerInvoice/other/placeholders.json +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/prompts/library/partnerInvoice/other/prompt.txt +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/prompts/library/postprocessing/port_code/placeholders.json +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/prompts/library/postprocessing/port_code/prompt_port_code.txt +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/prompts/library/preprocessing/carrier/prompt.txt +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/prompts/prompt_library.py +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/setup.py +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/tms.py +0 -0
- {data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/utils.py +0 -0
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "data-science-document-ai"
-version = "1.
+version = "1.48.0"
 description = "\"Document AI repo for data science\""
 authors = ["Naomi Nguyen <naomi.nguyen@forto.com>", "Kumar Rajendrababu <kumar.rajendrababu@forto.com>", "Igor Tonko <igor.tonko@forto.com>", "Osman Demirel <osman.demirel@forto.com>"]
 packages = [
@@ -53,10 +53,6 @@ project_parameters = {
     "model_selector": {
         "stable": {
             "bookingConfirmation": 1,
-            "shippingInstruction": 0,
-            "customsAssessment": 0,
-            "deliveryOrder": 0,
-            "partnerInvoice": 0,
         },
         "beta": {
             "bookingConfirmation": 0,
@@ -87,6 +83,7 @@ project_parameters = {
         "arrivalNotice": ["containers"],
         "finalMbL": ["containers"],
         "draftMbl": ["containers"],
+        "deliveryOrder": ["Equipment", "TransportLeg"],
         "customsAssessment": ["containers"],
         "packingList": ["skuData"],
         "commercialInvoice": ["skus"],
{data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/docai_processor_config.yaml
RENAMED
@@ -13,27 +13,6 @@ model_config:
           author: "igor.tonko@forto.com"
           created_date: ""
 
-    shippingInstruction:
-      - id: "c77a0a515d99a8ba"
-        details:
-          display_name: "doc_cap_shippingInstruction"
-          author: "kumar.rajendrababu@forto.com"
-          created_date: ""
-
-    customsAssessment:
-      - id: "c464a18d82fad9be"
-        details:
-          display_name: "doc_cap_customsAssessment"
-          author: "igor.tonko@forto.com"
-          created_date: ""
-
-    deliveryOrder:
-      - id: "2245a72c7a5dbf5f"
-        details:
-          display_name: "doc_cap_releaseNote"
-          author: "igor.tonko@forto.com"
-          created_date: ""
-
   beta:
     bookingConfirmation:
       - id: "3c280b11bdb3ed89"
@@ -11,9 +11,8 @@ import asyncio
 import numpy as np
 import pandas as pd
 
-from src.llm import prompt_excel_extraction
 from src.prompts.prompt_library import prompt_library
-from src.utils import estimate_page_count,
+from src.utils import estimate_page_count, get_excel_sheets
 
 
 async def extract_data_from_sheet(
@@ -29,11 +28,14 @@ async def extract_data_from_sheet(
     )
 
     # Prompt for the LLM JSON
-
+    prompt = prompt_library.library[doc_type]["other"]["prompt"]
+
+    # Join the worksheet content with the prompt
+    prompt = worksheet + "\n" + prompt
 
     try:
         result = await llm_client.get_unified_json_genai(
-
+            prompt,
             response_schema=response_schema,
             doc_type=doc_type,
         )
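The added lines in this hunk look up a per-document-type prompt and place the worksheet text in front of it. A minimal sketch of that composition, assuming a plain nested dict in place of `prompt_library.library`; the `PROMPTS` contents and the `build_excel_prompt` name are hypothetical and not part of the package:

```python
# Hypothetical stand-in for prompt_library.library: doc_type -> carrier -> assets.
PROMPTS = {
    "deliveryOrder": {"other": {"prompt": "<TASK> Extract the fields. <TASK>"}},
}


def build_excel_prompt(worksheet: str, doc_type: str, library: dict = PROMPTS) -> str:
    """Return the worksheet text followed by the prompt for doc_type."""
    prompt = library[doc_type]["other"]["prompt"]
    # Same join order as the diff: worksheet content first, then the prompt.
    return worksheet + "\n" + prompt
```

Putting the worksheet first keeps the instructions at the end of the request, which matches the order used by the removed `prompt_excel_extraction` helper.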
@@ -67,19 +69,7 @@ async def extract_data_from_excel(
 
     """
     # Generate the response structure
-    response_schema =
-        prompt_library.library[input_doc_type]["other"]["placeholders"]
-        if input_doc_type
-        in [
-            "partnerInvoice",
-            "customsInvoice",
-            "bundeskasse",
-            "commercialInvoice",
-            "packingList",
-            "bookingConfirmation",
-        ]
-        else generate_schema_structure(params, input_doc_type)
-    )
+    response_schema = prompt_library.library[input_doc_type]["other"]["placeholders"]
 
     # Load the Excel file and get ONLY the "visible" sheet names
     sheets, workbook = get_excel_sheets(file_content, mime_type)
@@ -201,33 +201,4 @@ class LlmClient:
         return response
 
 
-def prompt_excel_extraction(excel_structured_text):
-    """Write a prompt to extract data from Excel files.
-
-    Args:
-        excel_structured_text (str): The structured text of the Excel file.
-
-    Returns:
-        prompt str: The prompt for common json.
-    """
-    prompt = f"""{excel_structured_text}
-
-    Task: Fill in the following dictionary from the information in the given in the above excel data.
-
-    Instructions:
-    - Do not change the keys of the following dictionary.
-    - The values should be filled in as per the schema provided below.
-    - If an entity contains a 'display_name', consider its properties as child data points in the below format.
-    {{'data-field': {{
-        'child-data-field': 'type -occurrence_type- description',
-        }}
-    }}
-    - The entity with 'display_name' can be extracted multiple times. Please pay attention to the occurrence_type.
-    - Ensure the schema reflects the hierarchical relationship.
-    - Use the data field description to understand the context of the data.
-
-    """
-    return prompt
-
-
 # pylint: enable=all
@@ -202,23 +202,8 @@ async def process_file_w_llm(params, file_content, input_doc_type, llm_client):
     number_of_pages = get_pdf_page_count(file_content)
     logger.info(f"processing {input_doc_type} with {number_of_pages} pages...")
 
-    # get the schema placeholder
-    response_schema =
-        prompt_library.library[input_doc_type]["other"]["placeholders"]
-        if input_doc_type
-        in [
-            "partnerInvoice",
-            "customsInvoice",
-            "bundeskasse",
-            "commercialInvoice",
-            "packingList",
-            "bookingConfirmation",
-            "arrivalNotice",
-            "finalMbL",
-            "draftMbl",
-        ]  # Move this to constants or remove after complete migration to LLM
-        else generate_schema_structure(params, input_doc_type)
-    )
+    # get the schema placeholder
+    response_schema = prompt_library.library[input_doc_type]["other"]["placeholders"]
 
     carrier = "other"
     carrier_schema = (
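With the fallback to `generate_schema_structure` removed, the schema lookup in this hunk appears to rely on every document type having a `placeholders` entry in the prompt library. A minimal sketch of a guarded lookup under that assumption; `get_response_schema` and the dict layout are hypothetical illustrations, not code from the package:

```python
def get_response_schema(library: dict, doc_type: str, carrier: str = "other") -> dict:
    """Return the placeholders schema for doc_type, or raise a descriptive error."""
    try:
        return library[doc_type][carrier]["placeholders"]
    except KeyError as exc:
        # A bare KeyError would only name the missing key; add the lookup context.
        raise ValueError(
            f"no placeholders schema for {doc_type!r}/{carrier!r}"
        ) from exc
```

Wrapping the lookup this way only changes the error message; the happy path is the same single dictionary access shown in the diff.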
@@ -0,0 +1,70 @@
+{
+  "type": "OBJECT",
+  "properties": {
+    "consignee": {
+      "type": "STRING",
+      "nullable": true,
+      "description": "The receiver or buyer of the goods. It can be find with the keywords like Importeur, Anmelder, Empfanger, Consignee, Buyer, Receiver, etc.."
+    },
+    "countryOfOrigin": {
+      "type": "STRING",
+      "nullable": true,
+      "description": "The country where the goods were manufactured or produced. It can be identified as Land van oorsprong, Ursprungsland in the document."
+    },
+    "MRN": {
+      "type": "STRING",
+      "nullable": true,
+      "description": "Movement Reference Number (MRN) is a unique identifier assigned to each customs declaration for goods being imported or exported within the European Union (EU). It is used to track and monitor the movement of goods across EU member states. It can be found with MRN, Reg. Nr., Reg. Kennzeigechen, etc.."
+    },
+    "shipper": {
+      "type": "STRING",
+      "nullable": true,
+      "description": "The seller or shipper of the goods. It is often indicated by the term Shipper, Speditore, Esportatore, Exporteur, Versender."
+    },
+    "totalValueOfGoods": {
+      "type": "STRING",
+      "nullable": true,
+      "description": "The total monetary value of the goods being shipped, usually declared for customs purposes. It can be found with Waarde, Warenwert, Factuurwaarde, Invoice Value, etc.."
+    },
+    "containers": {
+      "type": "ARRAY",
+      "items": {
+        "type": "OBJECT",
+        "properties": {
+          "containerNumber": {
+            "type": "STRING",
+            "nullable": true,
+            "description": "The unique identifier for each container. It always starts with 4 capital letters and followed by 7 digits. Example: TEMU7972458."
+          },
+          "goodsDescription": {
+            "type": "STRING",
+            "nullable": true,
+            "description": "A brief description of the goods contained within the container. It can be found with goods description, Bezeichnung, goederenomschrijving."
+          },
+          "grossWeight": {
+            "type": "STRING",
+            "nullable": true,
+            "description": "The gross weight of the container. Usually mentioned as G.W or GW, Bruto, or Gross Weight, etc.."
+          },
+          "nettWeight": {
+            "type": "STRING",
+            "nullable": true,
+            "description": "The net weight of the goods inside the container. Usually mentioned as N.W or NW, Net Weight, or Netto, Eigenmasse, etc.."
+          },
+          "packagingNumber": {
+            "type": "STRING",
+            "nullable": true,
+            "description": "The quantity of the goods. Usually, the quantity is in pallets, PLT, cartons, CTNS, pieces, PCS, packages, boxes, etc. Please prioritize the packaging types based on their size, as follows: Pallets (PLT) >> Cartons (CTNS) >> Pieces (PCS). Extract the Larger packaging types that will have a lower count."
+          },
+          "packagingType": {
+            "type": "STRING",
+            "nullable": true,
+            "description": "The packaging type is the unit of packagingNumber. Example; pallets, PLT, cartons, CTNS, pieces, PCS, packages, etc. Sometimes, the packaging type is available in the column name of the packagingNumber."
+          }
+        },
+        "required": ["containerNumber", "goodsDescription", "grossWeight", "nettWeight", "packagingNumber", "packagingType"]
+      }
+    }
+  },
+  "required": ["countryOfOrigin", "MRN", "totalValueOfGoods", "containers"]
+}
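The `containerNumber` description in this schema fixes a shape: four capital letters followed by seven digits. A minimal sketch of a post-extraction shape check, assuming optional whitespace between the two parts (as in the `CAIU 7222892` example used elsewhere in this release); `normalize_container_number` is a hypothetical helper, and the sketch does not verify the ISO 6346 check digit:

```python
import re
from typing import Optional

# Four capital letters, optional whitespace, seven digits.
CONTAINER_RE = re.compile(r"^([A-Z]{4})\s*(\d{7})$")


def normalize_container_number(raw: str) -> Optional[str]:
    """Return the compact 11-character form, or None if the shape does not match."""
    match = CONTAINER_RE.match(raw.strip())
    if match is None:
        return None
    return match.group(1) + match.group(2)
```

For example, `normalize_container_number("CAIU 7222892")` gives `"CAIU7222892"`, while a lowercase or short value gives `None`.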
@@ -0,0 +1,29 @@
+<PERSONA> You are an efficient document entity data extraction specialist working for a Freight Forwarding company. <PERSONA>
+
+<TASK> Your task is to extract data from delivery order documents as per the given response schema structure. <TASK>
+
+<CONTEXT>
+The Freight Forwarding company receives Customs Assessment from customs partners.
+These documents contain various details related to shipper, buyer, MRN, and container data such as container number, goods details at container level.
+They may be written in different languages such as English, German, Vietnamese, Chinese, and other European languages, and can appear in a variety of formats and layouts.
+Your role is to accurately extract specific entities from these Customs Assessment to support efficient processing and accurate record-keeping.
+<CONTEXT>
+
+<INSTRUCTIONS>
+- Populate fields as defined in the response schema.
+- Multiple containers entries may exist — capture all instances under "containers".
+- Use the data field description to understand the context of the data.
+
+- MRN: Movement Reference Number (MRN) is a unique identifier assigned to each customs declaration for goods being imported or exported within the European Union (EU). It is used to track and monitor the movement of goods across EU member states. It can be found with MRN, Reg. Nr., Reg. Kennzeigechen, etc..
+
+- containers: Details of each container on the Customs Assessment. Make sure to extract each container information separately.
+  - containerNumber: Container Number consists of 4 capital letters followed by 7 digits (e.g., TEMU7972458, CAIU 7222892). It can be identified as container number, cntr. nos., containernummern, cont. nr.
+  - goodsDescription: Extract only the description of the goods for the "goodsDescription" but not other information like packing, marks, etc.
+  - packagingNumber:
+    - Prioritize the "Pallets/PLTS/Cartons/CTNS/Package" over "PCS" count to extract the data for the "packagingNumber".
+    - example: If the table has "17CTNS", "9PLTS", "850", "850PCS", prioritize "9PLTS"
+    - Do not extract the pack Quantity field such as "50PCS/CTN", "5PC/Box" (these represent quantity per carton, not total shipped quantity).
+  - packagingType:
+    - Extract the unit associated with the "packagingNumber" in the table to extract the "packagingType"
+    - Sometimes it can be found on the column name of the "packagingNumber" in the table to extract the "packagingType"
+<INSTRUCTIONS>
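The `packagingNumber` rules in this prompt describe a priority order that can be illustrated outside the LLM. A minimal sketch, assuming table cells arrive as strings such as `"17CTNS"`; the `UNIT_RANK` table and `pick_packaging` are hypothetical, cover only pallets, cartons and pieces, and are not part of the package:

```python
import re
from typing import Iterable, Optional, Tuple

# Lower rank wins: pallets before cartons before pieces (order taken from the prompt).
UNIT_RANK = {"PLT": 0, "PALLET": 0, "CTN": 1, "CARTON": 1, "PC": 2, "PIECE": 2}
TOKEN = re.compile(r"^(\d+)\s*([A-Za-z]+)$")


def pick_packaging(cells: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Return (count, unit) for the highest-priority packaging cell, if any."""
    best = None
    for cell in cells:
        text = cell.strip()
        if "/" in text:
            continue  # per-pack quantities such as "50PCS/CTN" are not totals
        match = TOKEN.match(text)
        if match is None:
            continue  # bare numbers such as "850" carry no unit
        count, unit = match.group(1), match.group(2).upper()
        rank = next((r for p, r in UNIT_RANK.items() if unit.startswith(p)), None)
        if rank is None:
            continue  # unit not covered by this simplified table
        if best is None or rank < best[0]:
            best = (rank, count, unit)
    return None if best is None else (best[1], best[2])
```

With the cells from the prompt's own example, `pick_packaging(["17CTNS", "9PLTS", "850", "850PCS"])` selects the pallet entry.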
@@ -0,0 +1,82 @@
+{
+  "type": "OBJECT",
+  "properties": {
+    "EmptyContainerDepot": {
+      "type": "STRING",
+      "nullable": true,
+      "description": "The depot where the empty container is returned."
+    },
+    "Equipment": {
+      "type": "ARRAY",
+      "items": {
+        "type": "OBJECT",
+        "properties": {
+          "CargoGrossWeight": {
+            "type": "STRING",
+            "nullable": true,
+            "description": "The gross weight of the Cargo. Usually mentioned as G.W or GW or Gross Weight, etc.."},
+          "ContainerNumber": {
+            "type": "STRING",
+            "nullable": true,
+            "description": "The container number associated with the document. They MUST consist of 4 letters followed by 7 digits (e.g., 'CMAU1234567', 'BMOU 575538/3', 'XLXU 1277652'). It can be found in the document as 'Container No.', 'Container Number', 'Cont. No.', 'Cont Nr.', 'Seefrachtcontainer-Nr.', or 'Containernummer'."},
+          "ContainerType": {
+            "type": "STRING",
+            "nullable": true,
+            "description": "The size or Type of the container associated with the containerNumber, such as 20ft, 40ft, 40HC, 20DC etc."},
+          "EmptyReturnReference": {
+            "type": "STRING",
+            "nullable": true,
+            "description": "The reference number or code for the return of the empty container."},
+          "Pin": {
+            "type": "STRING",
+            "nullable": true,
+            "description": "The PIN code associated with the container, often used for security or access purposes."},
+          "TareWeight": {
+            "type": "STRING",
+            "nullable": true,
+            "description": "The weight of the empty container itself, without any cargo inside. Usually mentioned as T.W or TW or Tare Weight, etc.."}
+        },
+        "required": ["CargoGrossWeight", "ContainerNumber", "EmptyReturnReference", "Pin", "TareWeight"]
+      }
+    },
+    "pickUpTerminal": {
+      "type": "STRING",
+      "nullable": true,
+      "description": "The terminal where the container or cargo is picked up."
+    },
+    "TransportLeg": {
+      "type": "ARRAY",
+      "items": {
+        "type": "OBJECT",
+        "properties": {
+          "eta": {
+            "type": "STRING",
+            "nullable": true,
+            "description": "Estimated Time of Arrival (ETA) is the expected date when the shipment will arrive at its destination."},
+          "etd": {
+            "type": "STRING",
+            "nullable": true,
+            "description": "Estimated Time of Departure (ETD) is the expected date when the shipment will leave the origin port."},
+          "portOfDischarge": {
+            "type": "STRING",
+            "nullable": true,
+            "description": "The port where the goods are discharged from the vessel. This is the destination port for the shipment."},
+          "portOfLoading": {
+            "type": "STRING",
+            "nullable": true,
+            "description": "The origin port where the goods are loaded onto the vessel. Find information like 'Ladehafen' or 'Port of Loading' in the invoice."},
+          "vesselName": {
+            "type": "STRING",
+            "nullable": true,
+            "description": "The name of the vessel carrying the container or shipment"},
+          "voyage": {
+            "type": "STRING",
+            "nullable": true,
+            "description": "The unique voyage number or identifier assigned to a vessel’s specific journey. This typically corresponds to the scheduled sailing associated with the shipment and can often be found near vessel information on shipping documents. such as voyage, voy. no, voyage-no."}
+        },
+        "required": ["eta", "etd", "portOfDischarge", "portOfLoading", "vesselName", "voyage"]
+      }
+    }
+  },
+  "required": ["Equipment", "TransportLeg"]
+}
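Schemas in this style nest `required` lists inside `OBJECT` and `ARRAY` nodes, so checking an extraction result means walking both together. A minimal sketch of such a walk, assuming the result is already parsed into Python dicts and lists; `missing_required` is a hypothetical helper and not part of the package:

```python
from typing import Any, List


def missing_required(schema: dict, value: Any, path: str = "") -> List[str]:
    """List dotted paths of required keys that are absent from value."""
    missing: List[str] = []
    kind = schema.get("type")
    if kind == "OBJECT" and isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                missing.append(f"{path}{key}")
        # Recurse only into properties that are present in the result.
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value:
                missing.extend(missing_required(sub_schema, value[key], f"{path}{key}."))
    elif kind == "ARRAY" and isinstance(value, list):
        for index, item in enumerate(value):
            missing.extend(missing_required(schema["items"], item, f"{path}{index}."))
    return missing
```

A key whose value is `null` still counts as present here, which fits fields declared with `"nullable": true`.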
@@ -0,0 +1,36 @@
+<PERSONA> You are an efficient document entity data extraction specialist working for a Freight Forwarding company. <PERSONA>
+
+<TASK> Your task is to extract data from Delivery Order documents as per the given response schema structure. <TASK>
+
+<CONTEXT>
+The Freight Forwarding company receives Delivery Order from Carrier (Shipping Lines) partners.
+These documents contain various details related to shipments, equipment details, transport leg details, delivery / pickup details, vessel details, pick up terminal data.
+They may be written in different languages such as English, German, Vietnamese, Chinese, and other European languages, and can appear in a variety of formats and layouts.
+Your role is to accurately extract specific entities from these Delivery Orders to support efficient processing and accurate record-keeping.
+<CONTEXT>
+
+<INSTRUCTIONS>
+- Populate fields as defined in the response schema.
+- Multiple Equipment and TransportLeg entries may exist — capture all instances under "Equipment" and "TransportLeg".
+- Use the data field description to understand the context of the data.
+
+EmptyContainerDepot: Empty container depot address.
+Equipment: Details of each Equipment on the Delivery Order. Make sure to extract each Equipment information separately.
+  CargoGrossWeight: Total weight of the cargo, including the tare weight of the container. Weight(incl. tare), Cargo Weight, Weight (KG)
+  ContainerNumber: Container Number consists of 4 capital letters followed by 7 digits (e.g., TEMU7972458, CAIU 7222892).
+  ContainerType: Type of the shipping container, usually related to it's size.
+  EmptyReturnReference: A reference code for empty container return. Find it as Drop off reference, turn-in reference in the document.
+  Pin: Container release reference or PIN code to pick up the container. Can be found as Release reference, pin
+  TareWeight: Weight of the empty container without cargo. It can be found as Tare weight, tare.
+
+pickUpTerminal: The specific terminal for cargo pickup. It can also be found as pick up at depot, empty container depot, pickup depot, empty pickup location in the doc.
+
+TransportLeg: Details of each TransportLeg on the Delivery Order. Make sure to extract each TransportLeg information separately.
+  eta: The estimated time of arrival for a specific leg.
+  etd: The estimated time of departure for a specific leg.
+  portOfDischarge: The port where cargo is unloaded.
+  portOfLoading: The port where cargo is loaded.
+  vesselName: The name of the vessel.
+  voyage: The journey or route code taken by the vessel. It is often identified as voyage, voy. no, voyage-no in the document.
+
+<INSTRUCTIONS>
data_science_document_ai-1.48.0/src/prompts/library/shippingInstruction/other/placeholders.json
ADDED
|
@@ -0,0 +1,115 @@
|
|
|
1
|
+
{
|
|
2
|
+
"type": "OBJECT",
|
|
3
|
+
"properties": {
|
|
4
|
+
"consignee": {
|
|
5
|
+
"type": "STRING",
|
|
6
|
+
"nullable": true,
|
|
7
|
+
"description": "The receiver or buyer of the goods. It can be find with the keywords like Importeur, Anmelder, Empfanger, Consignee, Buyer, Receiver, etc.."
|
|
8
|
+
},
|
|
9
|
+
"finalDestination": {
|
|
10
|
+
"type": "STRING",
|
|
11
|
+
"nullable": true,
|
|
12
|
+
"description": "The ultimate location where the goods are to be delivered, marking the end point of the shipment's journey."
|
|
13
|
+
},
|
|
14
|
+
"freight": {
|
|
15
|
+
"type": "STRING",
|
|
16
|
+
"nullable": true,
|
|
17
|
+
"description": "The cost type associated with transporting goods. Can be classified as 'prepaid' or 'collect'."
|
|
18
|
+
},
|
|
19
|
+
"hblType": {
|
|
20
|
+
"type": "STRING",
|
|
21
|
+
"nullable": true,
|
|
22
|
+
"description": "The type of House Bill of Lading such as Telex Released, ORIGINAL B/L, EXPRESS, Sur Bill, Sea WayBill, etc., indicating the document issued by a freight forwarder that outlines the terms and details of the shipment."
|
|
23
|
+
},
|
|
24
|
+
"notify": {
|
|
25
|
+
"type": "STRING",
|
|
26
|
+
"nullable": true,
|
|
27
|
+
"description": "The party to be informed upon the arrival of the shipment at the destination. often responsible for coordinating the delivery. Extract the notify details including the address."
|
|
28
|
+
},
|
|
29
|
+
"placeOfReceipt": {
|
|
30
|
+
"type": "STRING",
|
|
31
|
+
"nullable": true,
|
|
32
|
+
"description": "The location where the goods are initially handed over to the freight forwarder or carrier for transportation"
|
|
33
|
+
},
|
|
34
|
+
"portOfDischarge": {
|
|
35
|
+
"type": "STRING",
|
|
36
|
+
"nullable": true,
|
|
37
|
+
"description": "The port where the goods are discharged from the vessel. This is the destination port for the shipment."
|
|
38
|
+
},
|
|
39
|
+
"portOfLoading": {
|
|
40
|
+
"type": "STRING",
|
|
41
|
+
"nullable": true,
|
|
42
|
+
"description": "The origin port where the goods are loaded onto the vessel. Find information like 'Ladehafen' or 'Port of Loading' in the invoice."
+},
+"shipper": {
+"type": "STRING",
+"nullable": true,
+"description": "The sender or exporter of the goods. It can be find with the keywords like Absender, Versender, Shipper, Exporter, Supplier, Seller, etc.."
+},
+"containers": {
+"type": "ARRAY",
+"items": {
+"type": "OBJECT",
+"properties": {
+"cargoDescription": {
+"type": "STRING",
+"nullable": true,
+"description": "A brief description of the goods contained within the container. It can be found with goods description, Bezeichnung, goederenomschrijving."
+},
+"marksAndNumbers": {
+"type": "STRING",
+"nullable": true,
+"description": "Identification details printed or attached to packages for easy recognition during handling and customs procedures, ensuring accurate delivery. Extract the details including the numbers."
+},
+"hsCode": {
+"type": "STRING",
+"nullable": true,
+"description": "A numerical code from the Harmonized System used for classifying traded products. It helps in determining tariffs and regulations for the goods being shipped. Extract the full HS code including all digits."
+},
+"containerNumber": {
+"type": "STRING",
+"nullable": true,
+"description": "The unique identifier for each container. It always starts with 4 capital letters and followed by 7 digits. Example: TEMU7972458."
+},
+"containerType": {
+"type": "STRING",
+"nullable": true,
+"description": "The size of the container associated with the containerNumber, such as 20ft, 40ft, 40HC, 20DC etc."
+},
+"grossWeight": {
+"type": "STRING",
+"nullable": true,
+"description": "The gross weight of the container. Usually mentioned as G.W or GW, Bruto, or Gross Weight, etc.."
+},
+"nettWeight": {
+"type": "STRING",
+"nullable": true,
+"description": "The net weight of the container. Usually mentioned as N.W or NW, Net Weight, or Netto, Eigenmasse, etc.."
+},
+"measurements": {
+"type": "STRING",
+"nullable": true,
+"description": "The volume of the goods. Usually, it is measured in 'Cubic Meter (cbm)' or dimensions. But volume in 'Cubic Meter (cbm)' is preferred if it’s available in the skus"
+},
+"packageQuantity": {
+"type": "STRING",
+"nullable": true,
+"description": "The quantity of the goods. Usually, the quantity is in pallets, PLT, cartons, CTNS, pieces, PCS, packages, boxes, etc. Please prioritize the packaging types based on their size, as follows: Pallets (PLT) >> Cartons (CTNS) >> Pieces (PCS). Extract the Larger packaging types that will have a lower count."
+},
+"packagingType": {
+"type": "STRING",
+"nullable": true,
+"description": "The packaging type is the unit of packageQuantity. Example; pallets, PLT, cartons, CTNS, pieces, PCS, packages, etc. Sometimes, the packaging type is available in the column name of the packageQuantity."
+},
+"sealNumber": {
+"type": "STRING",
+"nullable": true,
+"description": "A unique number associated with the container number. But it is not a container number. Usually mentioned as Seal No., Seal Number, Siegelnummer, etc.."
+}
+},
+"required": ["cargoDescription", "containerNumber", "hsCode", "grossWeight", "nettWeight", "packageQuantity", "packagingType"]
+}
+}
+},
+"required": ["shipper", "consignee", "portOfLoading", "portOfDischarge", "placeOfReceipt", "finalDestination", "freight", "hblType", "notify", "containers"]
+}
|
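The two `required` lists in the schema above name the keys that must be present in a response, both at the top level and for each entry under `containers`; since the fields are `nullable`, a key may be present with a null value. A minimal sketch of a completeness check, assuming the extraction result has already been parsed into a Python dict. The helper name `missing_required` is hypothetical and not part of the package:

```python
# Hypothetical helper for illustration; not part of data_science_document_ai.
# The key lists are copied from the "required" arrays in the schema.
ROOT_REQUIRED = ["shipper", "consignee", "portOfLoading", "portOfDischarge",
                 "placeOfReceipt", "finalDestination", "freight", "hblType",
                 "notify", "containers"]
CONTAINER_REQUIRED = ["cargoDescription", "containerNumber", "hsCode",
                      "grossWeight", "nettWeight", "packageQuantity",
                      "packagingType"]


def missing_required(result: dict) -> list:
    """Return the paths of required keys that are absent from a result.

    A key that is present with a None value counts as present, because
    the schema marks the fields as nullable.
    """
    missing = [key for key in ROOT_REQUIRED if key not in result]
    for index, container in enumerate(result.get("containers") or []):
        missing += [f"containers[{index}].{key}"
                    for key in CONTAINER_REQUIRED if key not in container]
    return missing
```

An empty list means every required key was returned; it says nothing about whether the values themselves are correct.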
@@ -0,0 +1,28 @@
+<PERSONA> You are an efficient document entity data extraction specialist working for a Freight Forwarding company. <PERSONA>
+
+<TASK> Your task is to extract data from Shipping Instruction documents as per the given response schema structure. <TASK>
+
+<CONTEXT>
+The Freight Forwarding company receives Shipping Instruction from customers or shipper.
+These Shipping Instruction contain various details related to shipping information, as well as container data such as goods, HS code, container details and gross and net weight.
+They may be written in different languages such as English, German, Vietnamese, Chinese, and other European languages, and can appear in a variety of formats and layouts.
+Your role is to accurately extract specific entities from these Shipping Instruction to support efficient processing and accurate record-keeping.
+<CONTEXT>
+
+<INSTRUCTIONS>
+- Populate fields as defined in the response schema.
+- Multiple Container entries may exist, capture all instances under "containers".
+- Use the data field description to understand the context of the data.
+
+- "containers" Data Fields: Details of each container on the Shipping Instruction. Make sure to extract each container information separately.
+- containerNumber: Container Number always starts with 4 letters and is followed by 7 digits (e.g., ABCD1234567, XALU 8593678).
+- cargoDescription: Extract only the description of the goods for the "cargoDescription" but not other information like packing, marks, etc.
+- packageQuantity:
+- Prioritize the "Pallets/PLTS/Cartons/CTNS/Package" over "PCS" count to extract the data for the "packageQuantity".
+- example: If the table has "17CTNS", "9PLTS", "850", "850PCS", prioritize "9PLTS"
+- Do not extract the pack Quantity field such as "50PCS/CTN", "5PC/Box" (these represent quantity per carton, not total shipped quantity).
+- packagingType:
+- Extract the unit associated with the "packageQuantity" in the table to extract the "packagingType"
+- Sometimes it can be found on the column name of the "packageQuantity" in the table to extract the "packagingType"
+
+<INSTRUCTIONS>
|
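The containerNumber and packageQuantity rules in the prompt above can be sketched as checks applied to already extracted values. This is an illustration only: the function names, the regular expressions and the unit ranking are assumptions derived from the prompt wording and the schema description (Pallets >> Cartons >> Pieces), not code from the package.

```python
import re
from typing import List, Optional, Tuple

# Assumption: "4 letters followed by 7 digits", checked after removing spaces.
CONTAINER_RE = re.compile(r"^[A-Z]{4}\d{7}$")
# Assumption: a total quantity cell is digits followed by a unit, e.g. "9PLTS".
QUANTITY_RE = re.compile(r"^(\d+)\s*([A-Za-z]+)$")
# Lower rank = larger packaging type, which is preferred.
UNIT_RANK = {"PLT": 0, "PLTS": 0, "PALLETS": 0,
             "CTN": 1, "CTNS": 1, "CARTONS": 1, "PACKAGES": 1,
             "PC": 2, "PCS": 2, "PIECES": 2}


def normalize_container_number(raw: str) -> Optional[str]:
    """Remove whitespace, upper-case, and return None if the format is wrong."""
    candidate = re.sub(r"\s+", "", raw).upper()
    return candidate if CONTAINER_RE.match(candidate) else None


def pick_package_quantity(cells: List[str]) -> Optional[Tuple[str, str]]:
    """Return (quantity, unit) for the largest packaging type found."""
    best = None
    for cell in cells:
        match = QUANTITY_RE.match(cell.strip())
        if not match:
            # Bare numbers ("850") and per-carton values ("50PCS/CTN") are skipped.
            continue
        quantity, unit = match.group(1), match.group(2).upper()
        rank = UNIT_RANK.get(unit)
        if rank is None:
            continue
        if best is None or rank < best[0]:
            best = (rank, quantity, unit)
    return (best[1], best[2]) if best else None
```

With the example from the prompt, `pick_package_quantity(["17CTNS", "9PLTS", "850", "850PCS"])` returns `("9", "PLTS")`, and `normalize_container_number("XALU 8593678")` returns `"XALU8593678"`.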
@@ -1,42 +0,0 @@
-You are a document entity extraction specialist. Your task is to extract data from a customs assessment document.
-Customs assessment contain necessary information about imported / exported goods and containers.
-
-consignee: Legal Entity that is responsible for importing goods, name and address.
-shipper: Legal Entity that is responsible for sending goods, name and address.
-countryOfOrigin: Country of origin of goods.
-MRN: MRN code.
-totalValueOfGoods: Total value of goods.
-containers:
-containerNumber: Unique ID for tracking the shipping container.
-grossWeight: Total weight of the cargo, including the tare weight of the container.
-packagingNumber: Packaging number.
-nettWeight: Weight of the goods excluding packaging and containers.
-packagingType: Type of packaging used (e.g., cartons, pallets, barrels).
-goodsDescription: Goods description.
-
-
-Your task is to extract the text value of the following entities and page numbers starting from 0 where the value was found in the document:
-SCHEMA_PLACEHOLDER
-
-Keywords for datapoints:
-- consignee: Importeur, Anmelder, Empfanger.
-- shipper: Speditore, Esportatore, Exporteur, Versender.
-- countryOfOrigin: Land van oorsprong, Ursprungsland.
-- MRN: MRN, Reg. Nr., Reg. Kennzeigechen.
-- totalValueOfGoods: Waarde, Warenwert, Factuurwaarde.
-- containers:
-- containerNumber: container number, cntr. nos., containernummern, cont. nr.
-- grossWeight: gross weight, Bruto.
-- nettWeight: Weight of the goods excluding packaging and containers, Netto, Eigenmasse.
-- packagingNumber: package number, Anzahl.
-- packagingType: Type of packaging used (e.g., cartons, pallets, barrels), number and kind of packages, description of goods.
-- goodsDescription: goods description, Bezeichnung, goederenomschrijving.
-
-
-You must apply the following rules:
-- The JSON schema must be followed during the extraction.
-- The values must only include text found in the document
-- Do not normalize any entity value.
-- nettWeight can't be equal to grossWeight.
-- Validate the JSON make sure it is a valid JSON ! No extra text, no missing comma!
-- Add an escape character (backwards slash) in from of all quotes in values
@@ -1,29 +0,0 @@
-{
-"type": "OBJECT",
-"properties": {
-"EmptyContainerDepot": {"type": "STRING", "nullable": true},
-"Equipment": {"type": "ARRAY",
-"items": {
-"type": "OBJECT", "properties": {
-"CargoGrossWeight": {"type": "STRING", "nullable": true},
-"ContainerNumber": {"type": "STRING", "nullable": true},
-"ContainerType": {"type": "STRING", "nullable": true},
-"EmptyReturnReference": {"type": "STRING", "nullable": true},
-"Pin": {"type": "STRING", "nullable": true},
-"TareWeight": {"type": "STRING", "nullable": true}
-}, "required": []}
-},
-"pickUpTerminal": {"type": "STRING", "nullable": true},
-"TransportLeg": {"type": "ARRAY",
-"items": {
-"type": "OBJECT", "properties": {
-"eta": {"type": "STRING", "nullable": true},
-"etd": {"type": "STRING", "nullable": true},
-"portOfDischarge": {"type": "STRING", "nullable": true},
-"portOfLoading": {"type": "STRING", "nullable": true},
-"vesselName": {"type": "STRING", "nullable": true},
-"voyage": {"type": "STRING", "nullable": true}
-}, "required": []}
-},
-"required": []}
-}
@@ -1,50 +0,0 @@
-You are a document entity extraction specialist. Given a document, the explained datapoint need to extract.
-
-
-EmptyContainerDepot: Empty container depot address.
-Equipment:
-CargoGrossWeight: Total weight of the cargo, including the tare weight of the container.
-ContainerNumber: Unique ID for tracking the shipping container.
-ContainerType: Type of the shipping container, usually related to it's size.
-EmptyReturnReference: A reference code for empty container return.
-Pin: Container release reference.
-TareWeight: Tare weight.
-pickUpTerminal: The specific terminal for cargo pickup.
-TransportLeg:
-eta: The estimated time of arrival for a specific leg.
-etd: The estimated time of departure for a specific leg.
-portOfDischarge: The port where cargo is unloaded.
-portOfLoading: The port where cargo is loaded.
-vesselName: The name of the vessel.
-voyage: The journey or route code taken by the vessel.
-
-Your task is to extract the text value of the following entities and page numbers starting from 0 where the value was found in the document:
-SCHEMA_PLACEHOLDER
-
-Keywords for datapoints:
-- EmptyContainerDepot: Empty Container Depot
-- Equipment:
-- CargoGrossWeight: Weight(incl. tare), Cargo Weight, Weight (KG)
-- ContainerNumber: Container, Container Number, Container No.
-- ContainerType: Type, Size/type
-- EmptyReturnReference: Drop off reference, turn-in reference
-- Pin: Release reference, pin
-- TareWeight: Tare weight, tare
-- pickUpTerminal: pick up at depot, empty container depot, pickup depot, empty pickup location
-- TransportLeg:
-- eta: eta, ETA
-- etd: etd, ETD
-- portOfDischarge: to, PORT OF DISCHARGE
-- portOfLoading: from, PORT OF LOADING
-- vesselName: vessel
-- voyage: voyage, voy. no, voyage-no.
-
-
-You must apply the following rules:
-- The JSON schema must be followed during the extraction.
-- The values must only include text found in the document
-- Do not normalize any entity value.
-- portOfLoading and portOfDischarge are name of the Ports. You can rely on the port names from all over the world.
-- portOfLoading and portOfDischarge distinctly denotes the name of the ports. If you find abbreviation of the port use it, if not you can use the full name of the port
-- Validate the JSON make sure it is a valid JSON ! No extra text, no missing comma!
-- Add an escape character (backwards slash) in from of all quotes in values
@@ -1,30 +0,0 @@
-{
-"bookingConfirmation": {
-"type": "string",
-"enum": [
-"Hapag-Lloyd",
-"MsC",
-"Maersk",
-"YangMing",
-"Evergreen",
-"OOCL",
-"Other"
-]
-},
-"finalMbL": {
-"type": "string",
-"enum": [
-"Hapag-Lloyd",
-"Maersk",
-"Other"
-]
-},
-"draftMbl": {
-"type": "string",
-"enum": [
-"Hapag-Lloyd",
-"Maersk",
-"Other"
-]
-}
-}
@@ -1,16 +0,0 @@
-Task: Extract data from the provided shipping instruction PDF document and populate the following dictionary based on the given schema.
-Your task is to extract the text value of the following entities and page numbers starting from 0 where the value was found in the document:
-
-### Instructions:
-1. Extract all data points from the shipping instruction document.
-2. Each extracted data point must be part of a master field called "containers". There may be multiple "containers" entries in the document. Ensure you extract details for all instances.
-3. "Containers" Data Fields:
-- Fill in the data fields as per the response schema provided.
-- Always search for the Quantity mentioned as pallets, PLT, cartons, CTNS, pieces, PCS, packages, boxes, etc...
-- If a field such as `containerNumber`, `sealNumber`, 'hsCode' or any other fields are not found within the "containers" section, search for these fields elsewhere in the document. Once located, populate the respective fields in all relevant "containers" entities.
-- If the document contains only one container, use the total values for attributes like `grossWeight`, `netWeight`, `measurements`, and `packageQuantity` to populate the single container entry.
-- Avoid creating separate entries for these shared attributes; instead, merge the data into the existing "containers" entries.
-
-4. Output:
-- Return the extracted data in JSON format.
-- Exclude all other information from the response.
{data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/constants_sandbox.py
RENAMED
|
File without changes
|
{data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/postprocessing/common.py
RENAMED
|
File without changes
|
{data_science_document_ai-1.47.0 → data_science_document_ai-1.48.0}/src/prompts/prompt_library.py
RENAMED
|
File without changes
|