mindee 3.14.0 → 3.16.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +20 -0
- data/docs/business_card_v1.md +169 -0
- data/docs/code_samples/business_card_v1_async.txt +19 -0
- data/docs/code_samples/delivery_notes_v1_async.txt +19 -0
- data/docs/code_samples/expense_receipts_v5_async.txt +19 -0
- data/docs/code_samples/ind_passport_v1_async.txt +19 -0
- data/docs/delivery_notes_v1.md +143 -0
- data/docs/energy_bill_fra_v1.md +2 -2
- data/docs/expense_receipts_v5.md +27 -2
- data/docs/financial_document_v1.md +8 -4
- data/docs/ind_passport_v1.md +281 -0
- data/docs/invoices_v4.md +12 -8
- data/docs/resume_v1.md +17 -16
- data/lib/mindee/client.rb +9 -8
- data/lib/mindee/extraction/common/extracted_image.rb +0 -1
- data/lib/mindee/extraction/common/image_extractor.rb +7 -22
- data/lib/mindee/extraction/pdf_extractor/pdf_extractor.rb +2 -0
- data/lib/mindee/extraction/tax_extractor/tax_extractor.rb +1 -0
- data/lib/mindee/geometry/point.rb +2 -1
- data/lib/mindee/image/image_compressor.rb +29 -0
- data/lib/mindee/image/image_utils.rb +104 -0
- data/lib/mindee/image.rb +4 -0
- data/lib/mindee/input/sources.rb +36 -0
- data/lib/mindee/parsing/standard/position_field.rb +3 -0
- data/lib/mindee/pdf/pdf_compressor.rb +117 -0
- data/lib/mindee/pdf/{pdf_processing.rb → pdf_processor.rb} +17 -0
- data/lib/mindee/pdf/pdf_tools.rb +100 -0
- data/lib/mindee/pdf.rb +3 -1
- data/lib/mindee/product/business_card/business_card_v1.rb +39 -0
- data/lib/mindee/product/business_card/business_card_v1_document.rb +85 -0
- data/lib/mindee/product/business_card/business_card_v1_page.rb +32 -0
- data/lib/mindee/product/delivery_note/delivery_note_v1.rb +39 -0
- data/lib/mindee/product/delivery_note/delivery_note_v1_document.rb +61 -0
- data/lib/mindee/product/delivery_note/delivery_note_v1_page.rb +32 -0
- data/lib/mindee/product/financial_document/financial_document_v1_document.rb +1 -1
- data/lib/mindee/product/financial_document/financial_document_v1_page.rb +1 -1
- data/lib/mindee/product/ind/indian_passport/indian_passport_v1.rb +41 -0
- data/lib/mindee/product/ind/indian_passport/indian_passport_v1_document.rb +143 -0
- data/lib/mindee/product/ind/indian_passport/indian_passport_v1_page.rb +34 -0
- data/lib/mindee/product/invoice/invoice_v4_document.rb +1 -1
- data/lib/mindee/product/invoice/invoice_v4_page.rb +1 -1
- data/lib/mindee/product/resume/resume_v1_document.rb +3 -1
- data/lib/mindee/product/resume/resume_v1_page.rb +1 -1
- data/lib/mindee/product/resume/resume_v1_professional_experience.rb +8 -0
- data/lib/mindee/product.rb +10 -7
- data/lib/mindee/version.rb +1 -1
- data/lib/mindee.rb +10 -0
- data/mindee.gemspec +2 -1
- metadata +47 -7
@@ -0,0 +1,281 @@
+---
+title: IND Passport - India OCR Ruby
+category: 622b805aaec68102ea7fcbc2
+slug: ruby-ind-passport---india-ocr
+parentDoc: 6294d97ee723f1008d2ab28e
+---
+The Ruby OCR SDK supports the [Passport - India API](https://platform.mindee.com/mindee/ind_passport).
+
+Using the [sample below](https://github.com/mindee/client-lib-test-data/blob/main/products/ind_passport/default_sample.jpg), we are going to illustrate how to extract the data that we want using the OCR SDK.
+
+
+# Quick-Start
+```rb
+require 'mindee'
+
+# Init a new client
+mindee_client = Mindee::Client.new(api_key: 'my-api-key')
+
+# Load a file from disk
+input_source = mindee_client.source_from_path('/path/to/the/file.ext')
+
+# Parse the file
+result = mindee_client.enqueue_and_parse(
+  input_source,
+  Mindee::Product::IND::IndianPassport::IndianPassportV1
+)
+
+# Print a full summary of the parsed data in RST format
+puts result.document
+
+# Print the document-level parsed data
+# puts result.document.inference.prediction
+
+```
+
+**Output (RST):**
+```rst
+########
+Document
+########
+:Mindee ID: cf88fd43-eaa1-497a-ba29-a9569a4edaa7
+:Filename: default_sample.jpg
+
+Inference
+#########
+:Product: mindee/ind_passport v1.0
+:Rotation applied: Yes
+
+Prediction
+==========
+:Page Number: 1
+:Country: IND
+:ID Number: J8369854
+:Given Names: JOCELYN MICHELLE
+:Surname: DOE
+:Birth Date: 1959-09-23
+:Birth Place: GUNDUGOLANU
+:Issuance Place: HYDERABAD
+:Gender: F
+:Issuance Date: 2011-10-11
+:Expiry Date: 2021-10-10
+:MRZ Line 1: P<DOE<<JOCELYNMICHELLE<<<<<<<<<<<<<<<<<<<<<
+:MRZ Line 2: J8369854<4IND5909234F2110101<<<<<<<<<<<<<<<8
+:Legal Guardian:
+:Name of Spouse:
+:Name of Mother:
+:Old Passport Date of Issue:
+:Old Passport Number:
+:Address Line 1:
+:Address Line 2:
+:Address Line 3:
+:Old Passport Place of Issue:
+:File Number:
+```
+
+# Field Types
+## Standard Fields
+These fields are generic and used in several products.
+
+### Basic Field
+Each prediction object contains a set of fields that inherit from the generic `Field` class.
+A typical `Field` object will have the following attributes:
+
+* **value** (`String`, `Float`, `Integer`, `Boolean`): corresponds to the field value. Can be `nil` if no value was extracted.
+* **confidence** (`Float`, `nil`): the confidence score of the field prediction.
+* **bounding_box** (`Mindee::Geometry::Quadrilateral`, `nil`): contains the relative coordinates of exactly 4 vertices (points) of a right rectangle containing the field in the document.
+* **polygon** (`Mindee::Geometry::Polygon`, `nil`): contains the relative coordinates of the vertices (`Point`) of a polygon containing the field in the image.
+* **page_id** (`Integer`, `nil`): the ID of the page, always `nil` when at document-level.
+* **reconstructed** (`Boolean`): indicates whether an object was reconstructed (not extracted as the API gave it).
+
+
+Aside from the previous attributes, all basic fields have access to a `to_s` method that can be used to print their value as a string.
+
+
+### Classification Field
+The classification field `ClassificationField` does not implement all the basic `Field` attributes. It only implements **value**, **confidence** and **page_id**.
+
+> Note: a classification field's `value` is always a `String`.
+
+### Date Field
+Aside from the basic `Field` attributes, the date field `DateField` also implements the following:
+
+* **date_object** (`Date`): an accessible representation of the value as a Ruby `Date` object.
+
+### String Field
+The text field `StringField` only has one constraint: its **value** is a `String` (or `nil`).
+
+# Attributes
+The following fields are extracted for Passport - India V1:
+
+## Address Line 1
+**address1** ([StringField](#string-field)): The first line of the address of the passport holder.
+
+```rb
+puts result.document.inference.prediction.address1.value
+```
+
+## Address Line 2
+**address2** ([StringField](#string-field)): The second line of the address of the passport holder.
+
+```rb
+puts result.document.inference.prediction.address2.value
+```
+
+## Address Line 3
+**address3** ([StringField](#string-field)): The third line of the address of the passport holder.
+
+```rb
+puts result.document.inference.prediction.address3.value
+```
+
+## Birth Date
+**birth_date** ([DateField](#date-field)): The birth date of the passport holder, ISO format: YYYY-MM-DD.
+
+```rb
+puts result.document.inference.prediction.birth_date.value
+```
+
+## Birth Place
+**birth_place** ([StringField](#string-field)): The birth place of the passport holder.
+
+```rb
+puts result.document.inference.prediction.birth_place.value
+```
+
+## Country
+**country** ([StringField](#string-field)): ISO 3166-1 alpha-3 country code (3 letters format).
+
+```rb
+puts result.document.inference.prediction.country.value
+```
+
+## Expiry Date
+**expiry_date** ([DateField](#date-field)): The date when the passport will expire, ISO format: YYYY-MM-DD.
+
+```rb
+puts result.document.inference.prediction.expiry_date.value
+```
+
+## File Number
+**file_number** ([StringField](#string-field)): The file number of the passport document.
+
+```rb
+puts result.document.inference.prediction.file_number.value
+```
+
+## Gender
+**gender** ([ClassificationField](#classification-field)): The gender of the passport holder.
+
+#### Possible values include:
+- M
+- F
+
+```rb
+puts result.document.inference.prediction.gender.value
+```
+
+## Given Names
+**given_names** ([StringField](#string-field)): The given names of the passport holder.
+
+```rb
+puts result.document.inference.prediction.given_names.value
+```
+
+## ID Number
+**id_number** ([StringField](#string-field)): The identification number of the passport document.
+
+```rb
+puts result.document.inference.prediction.id_number.value
+```
+
+## Issuance Date
+**issuance_date** ([DateField](#date-field)): The date when the passport was issued, ISO format: YYYY-MM-DD.
+
+```rb
+puts result.document.inference.prediction.issuance_date.value
+```
+
+## Issuance Place
+**issuance_place** ([StringField](#string-field)): The place where the passport was issued.
+
+```rb
+puts result.document.inference.prediction.issuance_place.value
+```
+
+## Legal Guardian
+**legal_guardian** ([StringField](#string-field)): The name of the legal guardian of the passport holder (if applicable).
+
+```rb
+puts result.document.inference.prediction.legal_guardian.value
+```
+
+## MRZ Line 1
+**mrz1** ([StringField](#string-field)): The first line of the machine-readable zone (MRZ) of the passport document.
+
+```rb
+puts result.document.inference.prediction.mrz1.value
+```
+
+## MRZ Line 2
+**mrz2** ([StringField](#string-field)): The second line of the machine-readable zone (MRZ) of the passport document.
+
+```rb
+puts result.document.inference.prediction.mrz2.value
+```
+
+## Name of Mother
+**name_of_mother** ([StringField](#string-field)): The name of the mother of the passport holder.
+
+```rb
+puts result.document.inference.prediction.name_of_mother.value
+```
+
+## Name of Spouse
+**name_of_spouse** ([StringField](#string-field)): The name of the spouse of the passport holder (if applicable).
+
+```rb
+puts result.document.inference.prediction.name_of_spouse.value
+```
+
+## Old Passport Date of Issue
+**old_passport_date_of_issue** ([DateField](#date-field)): The date of issue of the old passport (if applicable), ISO format: YYYY-MM-DD.
+
+```rb
+puts result.document.inference.prediction.old_passport_date_of_issue.value
+```
+
+## Old Passport Number
+**old_passport_number** ([StringField](#string-field)): The number of the old passport (if applicable).
+
+```rb
+puts result.document.inference.prediction.old_passport_number.value
+```
+
+## Old Passport Place of Issue
+**old_passport_place_of_issue** ([StringField](#string-field)): The place of issue of the old passport (if applicable).
+
+```rb
+puts result.document.inference.prediction.old_passport_place_of_issue.value
+```
+
+## Page Number
+**page_number** ([ClassificationField](#classification-field)): The page number of the passport document.
+
+#### Possible values include:
+- 1
+- 2
+
+```rb
+puts result.document.inference.prediction.page_number.value
+```
+
+## Surname
+**surname** ([StringField](#string-field)): The surname of the passport holder.
+
+```rb
+puts result.document.inference.prediction.surname.value
+```
+
+# Questions?
+[Join our Slack](https://join.slack.com/t/mindee-community/shared_invite/zt-2d0ds7dtz-DPAF81ZqTy20chsYpQBW5g)
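The `mrz2` field returns the raw second line of the passport's machine-readable zone. As an illustration only, the sketch below re-checks the check digits of the sample's MRZ line 2 using the ICAO Doc 9303 scheme (weights 7, 3, 1; digits keep their value, letters A-Z map to 10-35, and the filler `<` counts as 0). It is not part of the Mindee SDK: the method names are hypothetical, the field positions assume the standard 44-character TD3 passport layout, and only the document number, birth date and expiry date check digits are verified (the personal-number and composite digits are left out).

```ruby
# Hypothetical helpers, not part of the mindee gem: ICAO 9303 check digits for a TD3 MRZ line 2.
WEIGHTS = [7, 3, 1].freeze

# Numeric value of one MRZ character: digits as-is, A-Z as 10-35, '<' as 0.
def mrz_char_value(char)
  case char
  when '0'..'9' then char.to_i
  when 'A'..'Z' then char.ord - 'A'.ord + 10
  when '<' then 0
  else raise ArgumentError, "invalid MRZ character: #{char.inspect}"
  end
end

# Weighted sum of the character values, cycling through 7, 3, 1, modulo 10.
def mrz_check_digit(field)
  total = 0
  field.each_char.with_index { |char, i| total += mrz_char_value(char) * WEIGHTS[i % 3] }
  total % 10
end

# Verifies the document number, birth date and expiry date fields of a TD3 line 2.
def td3_line2_fields_valid?(line)
  fields = {
    document_number: [line[0, 9], line[9]],
    birth_date: [line[13, 6], line[19]],
    expiry_date: [line[21, 6], line[27]]
  }
  fields.all? { |_name, (value, check)| mrz_check_digit(value).to_s == check }
end

mrz2 = 'J8369854<4IND5909234F2110101<<<<<<<<<<<<<<<8'
puts td3_line2_fields_valid?(mrz2) # => true
```

A mismatch here does not prove the extraction is wrong (the printed MRZ itself may be damaged), but it is a cheap consistency signal to combine with the field's `confidence`.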
data/docs/invoices_v4.md
CHANGED
@@ -63,21 +63,23 @@ puts result.document
 ########
 Document
 ########
-:Mindee ID:
+:Mindee ID: a67b70ea-4b1e-4eac-ae75-dda47a7064ae
 :Filename: default_sample.jpg

 Inference
 #########
-:Product: mindee/invoices v4.
+:Product: mindee/invoices v4.9
 :Rotation applied: Yes

 Prediction
 ==========
-:Locale: en; en; CAD;
+:Locale: en-CA; en; CA; CAD;
 :Invoice Number: 14
+:Purchase Order Number: AD29094
 :Reference Numbers: AD29094
 :Purchase Date: 2018-09-25
-:Due Date:
+:Due Date: 2011-12-01
+:Payment Date: 2011-12-01
 :Total Net: 2145.00
 :Total Amount: 2608.20
 :Total Tax: 193.20
@@ -93,7 +95,7 @@ Prediction
 :Supplier Address: 156 University Ave, Toronto ON, Canada, M5H 2H7
 :Supplier Phone Number: 4165551212
 :Supplier Website:
-:Supplier Email:
+:Supplier Email: j_coi@example.com
 :Customer Name: JIRO DOI
 :Customer Company Registrations:
 :Customer Address: 1954 Bloor Street West Toronto, ON, M6P 3K9 Canada
@@ -117,11 +119,13 @@ Page Predictions

 Page 0
 ------
-:Locale: en; en; CAD;
+:Locale: en-CA; en; CA; CAD;
 :Invoice Number: 14
+:Purchase Order Number: AD29094
 :Reference Numbers: AD29094
 :Purchase Date: 2018-09-25
-:Due Date:
+:Due Date: 2011-12-01
+:Payment Date: 2011-12-01
 :Total Net: 2145.00
 :Total Amount: 2608.20
 :Total Tax: 193.20
@@ -137,7 +141,7 @@ Page 0
 :Supplier Address: 156 University Ave, Toronto ON, Canada, M5H 2H7
 :Supplier Phone Number: 4165551212
 :Supplier Website:
-:Supplier Email:
+:Supplier Email: j_coi@example.com
 :Customer Name: JIRO DOI
 :Customer Company Registrations:
 :Customer Address: 1954 Bloor Street West Toronto, ON, M6P 3K9 Canada
data/docs/resume_v1.md
CHANGED
@@ -38,13 +38,13 @@ puts result.document
 ########
 Document
 ########
-:Mindee ID:
+:Mindee ID: 9daa3085-152c-454e-9245-636f13fc9dc3
 :Filename: default_sample.jpg

 Inference
 #########
-:Product: mindee/resume v1.
-:Rotation applied:
+:Product: mindee/resume v1.1
+:Rotation applied: Yes

 Prediction
 ==========
@@ -54,8 +54,8 @@ Prediction
 :Surnames: Morgan
 :Nationality:
 :Email Address: christoper.m@gmail.com
-:Phone Number: +44 (0)
-:Address: 177 Great Portland Street, London W5W 6PQ
+:Phone Number: +44 (0)20 7666 8555
+:Address: 177 Great Portland Street, London, W5W 6PQ
 :Social Networks:
 +----------------------+----------------------------------------------------+
 | Name | URL |
@@ -72,38 +72,37 @@ Prediction
 +----------+----------------------+
 | ZHO | Beginner |
 +----------+----------------------+
-| DEU |
+| DEU | Beginner |
 +----------+----------------------+
 :Hard Skills: HTML5
 PHP OOP
 JavaScript
 CSS
 MySQL
+SQL
 :Soft Skills: Project management
+Creative design
 Strong decision maker
 Innovative
 Complex problem solver
-Creative design
 Service-focused
 :Education:
 +-----------------+---------------------------+-----------+----------+---------------------------+-------------+------------+
 | Domain | Degree | End Month | End Year | School | Start Month | Start Year |
 +=================+===========================+===========+==========+===========================+=============+============+
-| Computer Inf... | Bachelor | |
+| Computer Inf... | Bachelor | | 2014 | Columbia University, NY | | |
 +-----------------+---------------------------+-----------+----------+---------------------------+-------------+------------+
 :Professional Experiences:
-
-| Contract Type | Department | Employer | End Month | End Year | Role | Start Month | Start Year |
-
-|
-
++-----------------+------------+--------------------------------------+---------------------------+-----------+----------+----------------------+-------------+------------+
+| Contract Type | Department | Description | Employer | End Month | End Year | Role | Start Month | Start Year |
++=================+============+======================================+===========================+===========+==========+======================+=============+============+
+| | | Cooperate with designers to creat... | Luna Web Design, New York | 05 | 2019 | Web Developer | 09 | 2015 |
++-----------------+------------+--------------------------------------+---------------------------+-----------+----------+----------------------+-------------+------------+
 :Certificates:
 +------------+--------------------------------+---------------------------+------+
 | Grade | Name | Provider | Year |
 +============+================================+===========================+======+
-| | PHP Framework (certificate)... | |
-+------------+--------------------------------+---------------------------+------+
-| | Programming Languages: Java... | | |
+| | PHP Framework (certificate)... | | |
 +------------+--------------------------------+---------------------------+------+
 ```

@@ -171,6 +170,7 @@ A `ResumeV1Language` implements the following attributes:
 * `level` (String): The candidate's level for the language.

 #### Possible values include:
+- Native
 - Fluent
 - Proficient
 - Intermediate
@@ -192,6 +192,7 @@ A `ResumeV1ProfessionalExperience` implements the following attributes:
 - Freelance

 * `department` (String): The specific department or division within the company.
+* `description` (String): The description of the professional experience as written in the document.
 * `employer` (String): The name of the company or organization.
 * `end_month` (String): The month when the professional experience ended.
 * `end_year` (String): The year when the professional experience ended.
data/lib/mindee/client.rb
CHANGED
@@ -128,6 +128,7 @@ module Mindee
 end

 # rubocop:disable Metrics/ParameterLists
+
 # Enqueue a document for async parsing and automatically try to retrieve it
 #
 # @param input_source [Mindee::Input::Source::LocalInputSource, Mindee::Input::Source::UrlInputSource]
@@ -148,8 +149,8 @@ module Mindee
 # * `:on_min_pages` Apply the operation only if document has at least this many pages.
 # @param cropper [Boolean, nil] Whether to include cropper results for each page.
 # This performs a cropping operation on the server and will increase response time.
-# @param initial_delay_sec [Integer, Float] initial delay before polling. Defaults to
-# @param delay_sec [Integer, Float] delay between polling attempts. Defaults to
+# @param initial_delay_sec [Integer, Float] initial delay before polling. Defaults to 2.
+# @param delay_sec [Integer, Float] delay between polling attempts. Defaults to 1.5.
 # @param max_retries [Integer] maximum amount of retries. Defaults to 60.
 # @return [Mindee::Parsing::Common::ApiResponse]
 def enqueue_and_parse(
@@ -161,8 +162,8 @@ module Mindee
 close_file: true,
 page_options: nil,
 cropper: false,
-initial_delay_sec:
-delay_sec:
+initial_delay_sec: 2,
+delay_sec: 1.5,
 max_retries: 60
 )
 enqueue_res = enqueue(
@@ -271,13 +272,13 @@ module Mindee
 # @param max_retries [Integer, nil] maximum amount of retries.
 def validate_async_params(initial_delay_sec, delay_sec, max_retries)
 min_delay_sec = 1
-min_initial_delay_sec =
+min_initial_delay_sec = 1
 min_retries = 2
-raise "Cannot set auto-poll delay to less than #{min_delay_sec}
+raise "Cannot set auto-poll delay to less than #{min_delay_sec} second(s)" if delay_sec < min_delay_sec
 if initial_delay_sec < min_initial_delay_sec
-raise "Cannot set initial parsing delay to less than #{min_initial_delay_sec}
+raise "Cannot set initial parsing delay to less than #{min_initial_delay_sec} second(s)"
 end
-raise "Cannot set auto-poll
+raise "Cannot set auto-poll retries to less than #{min_retries}" if max_retries < min_retries
 end

 # Creates an endpoint with the given values. Raises an error if the endpoint is invalid.
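The new defaults (`initial_delay_sec: 2`, `delay_sec: 1.5`, `max_retries: 60`) and the minimums enforced in `validate_async_params` together bound how long `enqueue_and_parse` can keep waiting. The sketch below is a standalone, hypothetical mirror of those checks; `check_async_params` and `worst_case_sleep_sec` are illustrative names, not SDK methods. The time estimate assumes the client sleeps `initial_delay_sec` once and then `delay_sec` before each retry; the polling loop itself is not part of this diff, so treat the figure as an approximation that ignores request latency.

```ruby
# Hypothetical mirror of the minimums enforced by Client#validate_async_params in this diff.
MIN_DELAY_SEC = 1
MIN_INITIAL_DELAY_SEC = 1
MIN_RETRIES = 2

# Raises RuntimeError (like the SDK's bare `raise "..."`) when a parameter is below its minimum.
def check_async_params(initial_delay_sec, delay_sec, max_retries)
  raise "Cannot set auto-poll delay to less than #{MIN_DELAY_SEC} second(s)" if delay_sec < MIN_DELAY_SEC
  if initial_delay_sec < MIN_INITIAL_DELAY_SEC
    raise "Cannot set initial parsing delay to less than #{MIN_INITIAL_DELAY_SEC} second(s)"
  end
  raise "Cannot set auto-poll retries to less than #{MIN_RETRIES}" if max_retries < MIN_RETRIES

  true
end

# Assumption: one initial sleep, then one delay_sec sleep per retry.
def worst_case_sleep_sec(initial_delay_sec: 2, delay_sec: 1.5, max_retries: 60)
  check_async_params(initial_delay_sec, delay_sec, max_retries)
  initial_delay_sec + (delay_sec * max_retries)
end

puts worst_case_sleep_sec # => 92.0

begin
  worst_case_sleep_sec(delay_sec: 0.5)
rescue RuntimeError => e
  puts e.message # => Cannot set auto-poll delay to less than 1 second(s)
end
```

Under that assumption, the defaults allow roughly a minute and a half of sleeping before the client gives up; raising `max_retries` is the knob for slower documents, since the delays cannot go below 1 second.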
@@ -54,7 +54,6 @@ module Mindee
 image = MiniMagick::Image.read(@buffer)
 image.format file_format.downcase
 image.write resolved_path.to_s
-logger.info("File saved successfully to '#{resolved_path}'.")
 rescue TypeError
 raise 'Invalid path/filename provided.'
 rescue StandardError
@@ -11,8 +11,8 @@ module Mindee
 # Image Extraction Module.
 module Extraction
 # Image Extraction wrapper class.
-
-def self.attach_image_as_new_file(input_buffer)
+module ImageExtractor
+def self.attach_image_as_new_file(input_buffer, format: 'jpg')
 # Attaches an image as a new page in a PdfDocument object.
 #
 # @param [StringIO] input_buffer Input buffer. Only supports JPEG.
@@ -21,9 +21,9 @@ module Mindee
 magick_image = MiniMagick::Image.read(input_buffer)
 # NOTE: some jpeg images get rendered as three different versions of themselves per output if the format isn't
 # converted.
-magick_image.format(
+magick_image.format(format)
 original_density = magick_image.resolution
-scale_factor = original_density[0].to_f / 4.166666 # No clue why
+scale_factor = original_density[0].to_f / 4.166666 # No clue why the resolution needs to be reduced for
 # the pdf otherwise the resulting image shrinks.
 magick_image.format('pdf', 0, { density: scale_factor.to_s })
 Origami::PDF.read(StringIO.new(magick_image.to_blob))
@@ -37,27 +37,12 @@ module Mindee
 # to extract.
 # @return [Array<Mindee::Extraction::ExtractedImage>] Extracted Images.
 def self.extract_multiple_images_from_source(input_source, page_id, polygons)
-new_stream =
+new_stream = load_input_source_pdf_page_as_image(input_source, page_id)
 new_stream.seek(0)

 extract_images_from_polygons(input_source, new_stream, page_id, polygons)
 end

-# Retrieves a PDF document's page.
-#
-# @param [Origami::PDF] pdf_doc Origami PDF handle.
-# @param [Integer] page_id Page ID.
-def self.get_page(pdf_doc, page_id)
-stream = StringIO.new
-pdf_doc.save(stream)
-
-options = {
-page_indexes: [page_id - 1],
-}
-
-Mindee::PDF::PdfProcessor.parse(stream, options)
-end
-
 # Extracts images from their positions on a file (as polygons).
 #
 # @param [Mindee::Input::Source::LocalInputSource] input_source Local input source.
@@ -179,10 +164,10 @@ module Mindee
 # @param input_file [LocalInputSource] Local input.
 # @param [Integer] page_id Page ID.
 # @return [MiniMagick::Image] A valid PdfDocument handle.
-def self.
+def self.load_input_source_pdf_page_as_image(input_file, page_id)
 input_file.io_stream.rewind
 if input_file.pdf?
-get_page(Origami::PDF.read(input_file.io_stream), page_id)
+Mindee::PDF::PdfProcessor.get_page(Origami::PDF.read(input_file.io_stream), page_id)
 else
 input_file.io_stream
 end
@@ -65,6 +65,7 @@ module Mindee

 # rubocop:disable Metrics/CyclomaticComplexity
 # rubocop:disable Metrics/PerceivedComplexity
+
 # Extracts invoices as complete PDFs from the document.
 # @param page_indexes [Array<Array<Integer>, InvoiceSplitterV1PageGroup>]
 # @param strict [Boolean]
@@ -99,6 +100,7 @@ module Mindee
 end
 extract_sub_documents(correct_page_indexes)
 end
+
 # rubocop:enable Metrics/CyclomaticComplexity
 # rubocop:enable Metrics/PerceivedComplexity

@@ -0,0 +1,29 @@
+# frozen_string_literal: true
+
+module Mindee
+  # Image processing module.
+  module Image
+    # Image compressor module to handle image compression.
+    module ImageCompressor
+      # Resize and/or compress an image. This assumes the ratio was provided beforehand.
+      # @param image [MiniMagick::Image, StringIO] Input image.
+      # @param quality [Integer, nil] Quality of the final file.
+      # @param max_width [Integer, nil] Maximum width. If not specified, the horizontal ratio will remain the same.
+      # @param max_height [Integer, nil] Maximum height. If not specified, the vertical ratio will remain the same.
+      # @return [StringIO]
+      def self.compress_image(image, quality: 85, max_width: nil, max_height: nil)
+        processed_image = ImageUtils.to_image(image)
+        processed_image.format 'jpg'
+        final_width, final_height = ImageUtils.calculate_new_dimensions(
+          processed_image,
+          max_width: max_width,
+          max_height: max_height
+        )
+        ImageUtils.resize_image(processed_image, final_width, final_height) if final_width || final_height
+        ImageUtils.compress_image_quality(processed_image, quality)
+
+        ImageUtils.image_to_stringio(processed_image)
+      end
+    end
+  end
+end
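`compress_image` delegates the choice of the target size to `ImageUtils.calculate_new_dimensions`, whose body is not shown on this page. For intuition only, here is a hypothetical, dependency-free sketch of aspect-ratio-preserving downscaling in the spirit of the `max_width`/`max_height` doc comments: an unset bound is ignored, both sides are scaled by the same factor, and (an assumption of this sketch, not a documented SDK behavior) the image is never enlarged. `fit_dimensions` is an illustrative name, not an SDK method.

```ruby
# Hypothetical helper, not the mindee gem's ImageUtils implementation.
# Returns [width, height] scaled down to fit within the given bounds while keeping the aspect ratio.
# A nil bound is ignored; with both bounds nil, the original size is returned.
def fit_dimensions(width, height, max_width: nil, max_height: nil)
  ratios = []
  ratios << max_width.to_f / width if max_width
  ratios << max_height.to_f / height if max_height
  scale = [ratios.min || 1.0, 1.0].min # assumption: never upscale
  [(width * scale).round, (height * scale).round]
end

p fit_dimensions(4000, 3000, max_width: 1000)                  # => [1000, 750]
p fit_dimensions(4000, 3000, max_width: 1000, max_height: 600) # => [800, 600]
p fit_dimensions(800, 600)                                     # => [800, 600]
```

Taking the smaller of the two ratios guarantees that both bounds are respected at once; the resulting width and height would then be what gets passed on to a resize step such as `ImageUtils.resize_image`.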