mindee 3.14.0 → 3.16.0

Files changed (50)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +20 -0
  3. data/docs/business_card_v1.md +169 -0
  4. data/docs/code_samples/business_card_v1_async.txt +19 -0
  5. data/docs/code_samples/delivery_notes_v1_async.txt +19 -0
  6. data/docs/code_samples/expense_receipts_v5_async.txt +19 -0
  7. data/docs/code_samples/ind_passport_v1_async.txt +19 -0
  8. data/docs/delivery_notes_v1.md +143 -0
  9. data/docs/energy_bill_fra_v1.md +2 -2
  10. data/docs/expense_receipts_v5.md +27 -2
  11. data/docs/financial_document_v1.md +8 -4
  12. data/docs/ind_passport_v1.md +281 -0
  13. data/docs/invoices_v4.md +12 -8
  14. data/docs/resume_v1.md +17 -16
  15. data/lib/mindee/client.rb +9 -8
  16. data/lib/mindee/extraction/common/extracted_image.rb +0 -1
  17. data/lib/mindee/extraction/common/image_extractor.rb +7 -22
  18. data/lib/mindee/extraction/pdf_extractor/pdf_extractor.rb +2 -0
  19. data/lib/mindee/extraction/tax_extractor/tax_extractor.rb +1 -0
  20. data/lib/mindee/geometry/point.rb +2 -1
  21. data/lib/mindee/image/image_compressor.rb +29 -0
  22. data/lib/mindee/image/image_utils.rb +104 -0
  23. data/lib/mindee/image.rb +4 -0
  24. data/lib/mindee/input/sources.rb +36 -0
  25. data/lib/mindee/parsing/standard/position_field.rb +3 -0
  26. data/lib/mindee/pdf/pdf_compressor.rb +117 -0
  27. data/lib/mindee/pdf/{pdf_processing.rb → pdf_processor.rb} +17 -0
  28. data/lib/mindee/pdf/pdf_tools.rb +100 -0
  29. data/lib/mindee/pdf.rb +3 -1
  30. data/lib/mindee/product/business_card/business_card_v1.rb +39 -0
  31. data/lib/mindee/product/business_card/business_card_v1_document.rb +85 -0
  32. data/lib/mindee/product/business_card/business_card_v1_page.rb +32 -0
  33. data/lib/mindee/product/delivery_note/delivery_note_v1.rb +39 -0
  34. data/lib/mindee/product/delivery_note/delivery_note_v1_document.rb +61 -0
  35. data/lib/mindee/product/delivery_note/delivery_note_v1_page.rb +32 -0
  36. data/lib/mindee/product/financial_document/financial_document_v1_document.rb +1 -1
  37. data/lib/mindee/product/financial_document/financial_document_v1_page.rb +1 -1
  38. data/lib/mindee/product/ind/indian_passport/indian_passport_v1.rb +41 -0
  39. data/lib/mindee/product/ind/indian_passport/indian_passport_v1_document.rb +143 -0
  40. data/lib/mindee/product/ind/indian_passport/indian_passport_v1_page.rb +34 -0
  41. data/lib/mindee/product/invoice/invoice_v4_document.rb +1 -1
  42. data/lib/mindee/product/invoice/invoice_v4_page.rb +1 -1
  43. data/lib/mindee/product/resume/resume_v1_document.rb +3 -1
  44. data/lib/mindee/product/resume/resume_v1_page.rb +1 -1
  45. data/lib/mindee/product/resume/resume_v1_professional_experience.rb +8 -0
  46. data/lib/mindee/product.rb +10 -7
  47. data/lib/mindee/version.rb +1 -1
  48. data/lib/mindee.rb +10 -0
  49. data/mindee.gemspec +2 -1
  50. metadata +47 -7
@@ -0,0 +1,281 @@
1
+ ---
2
+ title: IND Passport - India OCR Ruby
3
+ category: 622b805aaec68102ea7fcbc2
4
+ slug: ruby-ind-passport---india-ocr
5
+ parentDoc: 6294d97ee723f1008d2ab28e
6
+ ---
7
+ The Ruby OCR SDK supports the [Passport - India API](https://platform.mindee.com/mindee/ind_passport).
8
+
9
+ Using the [sample below](https://github.com/mindee/client-lib-test-data/blob/main/products/ind_passport/default_sample.jpg), we are going to illustrate how to extract the data that we want using the OCR SDK.
10
+ ![Passport - India sample](https://github.com/mindee/client-lib-test-data/blob/main/products/ind_passport/default_sample.jpg?raw=true)
11
+
12
+ # Quick-Start
13
+ ```rb
14
+ require 'mindee'
15
+
16
+ # Init a new client
17
+ mindee_client = Mindee::Client.new(api_key: 'my-api-key')
18
+
19
+ # Load a file from disk
20
+ input_source = mindee_client.source_from_path('/path/to/the/file.ext')
21
+
22
+ # Parse the file
23
+ result = mindee_client.enqueue_and_parse(
24
+ input_source,
25
+ Mindee::Product::IND::IndianPassport::IndianPassportV1
26
+ )
27
+
28
+ # Print a full summary of the parsed data in RST format
29
+ puts result.document
30
+
31
+ # Print the document-level parsed data
32
+ # puts result.document.inference.prediction
33
+
34
+ ```
35
+
36
+ **Output (RST):**
37
+ ```rst
38
+ ########
39
+ Document
40
+ ########
41
+ :Mindee ID: cf88fd43-eaa1-497a-ba29-a9569a4edaa7
42
+ :Filename: default_sample.jpg
43
+
44
+ Inference
45
+ #########
46
+ :Product: mindee/ind_passport v1.0
47
+ :Rotation applied: Yes
48
+
49
+ Prediction
50
+ ==========
51
+ :Page Number: 1
52
+ :Country: IND
53
+ :ID Number: J8369854
54
+ :Given Names: JOCELYN MICHELLE
55
+ :Surname: DOE
56
+ :Birth Date: 1959-09-23
57
+ :Birth Place: GUNDUGOLANU
58
+ :Issuance Place: HYDERABAD
59
+ :Gender: F
60
+ :Issuance Date: 2011-10-11
61
+ :Expiry Date: 2021-10-10
62
+ :MRZ Line 1: P<DOE<<JOCELYNMICHELLE<<<<<<<<<<<<<<<<<<<<<
63
+ :MRZ Line 2: J8369854<4IND5909234F2110101<<<<<<<<<<<<<<<8
64
+ :Legal Guardian:
65
+ :Name of Spouse:
66
+ :Name of Mother:
67
+ :Old Passport Date of Issue:
68
+ :Old Passport Number:
69
+ :Address Line 1:
70
+ :Address Line 2:
71
+ :Address Line 3:
72
+ :Old Passport Place of Issue:
73
+ :File Number:
74
+ ```
75
+
76
+ # Field Types
77
+ ## Standard Fields
78
+ These fields are generic and used in several products.
79
+
80
+ ### Basic Field
81
+ Each prediction object contains a set of fields that inherit from the generic `Field` class.
82
+ A typical `Field` object will have the following attributes:
83
+
84
+ * **value** (`String`, `Float`, `Integer`, `Boolean`): corresponds to the field value. Can be `nil` if no value was extracted.
85
+ * **confidence** (`Float`, `nil`): the confidence score of the field prediction.
86
+ * **bounding_box** (`Mindee::Geometry::Quadrilateral`, `nil`): contains exactly 4 relative vertices (points) coordinates of a right rectangle containing the field in the document.
87
+ * **polygon** (`Mindee::Geometry::Polygon`, `nil`): contains the relative vertices coordinates (`Point`) of a polygon containing the field in the image.
88
+ * **page_id** (`Integer`, `nil`): the ID of the page, always `nil` when at document-level.
89
+ * **reconstructed** (`Boolean`): indicates whether an object was reconstructed (not extracted as the API gave it).
90
+
91
+
92
+ Aside from the previous attributes, all basic fields have access to a `to_s` method that can be used to print their value as a string.
93
+
94
+
95
+ ### Classification Field
96
+ The classification field `ClassificationField` does not implement all the basic `Field` attributes. It only implements **value**, **confidence** and **page_id**.
97
+
98
+ > Note: a classification field's `value` is always a `String`.
99
+
100
+ ### Date Field
101
+ Aside from the basic `Field` attributes, the date field `DateField` also implements the following:
102
+
103
+ * **date_object** (`Date`): an accessible representation of the value as a Ruby `Date` object.
104
+
105
+ ### String Field
106
+ The text field `StringField` only has one constraint: its **value** is a `String` (or `nil`).
107
+
108
+ # Attributes
109
+ The following fields are extracted for Passport - India V1:
110
+
111
+ ## Address Line 1
112
+ **address1** ([StringField](#string-field)): The first line of the address of the passport holder.
113
+
114
+ ```rb
115
+ puts result.document.inference.prediction.address1.value
116
+ ```
117
+
118
+ ## Address Line 2
119
+ **address2** ([StringField](#string-field)): The second line of the address of the passport holder.
120
+
121
+ ```rb
122
+ puts result.document.inference.prediction.address2.value
123
+ ```
124
+
125
+ ## Address Line 3
126
+ **address3** ([StringField](#string-field)): The third line of the address of the passport holder.
127
+
128
+ ```rb
129
+ puts result.document.inference.prediction.address3.value
130
+ ```
131
+
132
+ ## Birth Date
133
+ **birth_date** ([DateField](#date-field)): The birth date of the passport holder, ISO format: YYYY-MM-DD.
134
+
135
+ ```rb
136
+ puts result.document.inference.prediction.birth_date.value
137
+ ```
138
+
139
+ ## Birth Place
140
+ **birth_place** ([StringField](#string-field)): The birth place of the passport holder.
141
+
142
+ ```rb
143
+ puts result.document.inference.prediction.birth_place.value
144
+ ```
145
+
146
+ ## Country
147
+ **country** ([StringField](#string-field)): ISO 3166-1 alpha-3 country code (3 letters format).
148
+
149
+ ```rb
150
+ puts result.document.inference.prediction.country.value
151
+ ```
152
+
153
+ ## Expiry Date
154
+ **expiry_date** ([DateField](#date-field)): The date when the passport will expire, ISO format: YYYY-MM-DD.
155
+
156
+ ```rb
157
+ puts result.document.inference.prediction.expiry_date.value
158
+ ```
159
+
160
+ ## File Number
161
+ **file_number** ([StringField](#string-field)): The file number of the passport document.
162
+
163
+ ```rb
164
+ puts result.document.inference.prediction.file_number.value
165
+ ```
166
+
167
+ ## Gender
168
+ **gender** ([ClassificationField](#classification-field)): The gender of the passport holder.
169
+
170
+ #### Possible values include:
171
+ - M
172
+ - F
173
+
174
+ ```rb
175
+ puts result.document.inference.prediction.gender.value
176
+ ```
177
+
178
+ ## Given Names
179
+ **given_names** ([StringField](#string-field)): The given names of the passport holder.
180
+
181
+ ```rb
182
+ puts result.document.inference.prediction.given_names.value
183
+ ```
184
+
185
+ ## ID Number
186
+ **id_number** ([StringField](#string-field)): The identification number of the passport document.
187
+
188
+ ```rb
189
+ puts result.document.inference.prediction.id_number.value
190
+ ```
191
+
192
+ ## Issuance Date
193
+ **issuance_date** ([DateField](#date-field)): The date when the passport was issued, ISO format: YYYY-MM-DD.
194
+
195
+ ```rb
196
+ puts result.document.inference.prediction.issuance_date.value
197
+ ```
198
+
199
+ ## Issuance Place
200
+ **issuance_place** ([StringField](#string-field)): The place where the passport was issued.
201
+
202
+ ```rb
203
+ puts result.document.inference.prediction.issuance_place.value
204
+ ```
205
+
206
+ ## Legal Guardian
207
+ **legal_guardian** ([StringField](#string-field)): The name of the legal guardian of the passport holder (if applicable).
208
+
209
+ ```rb
210
+ puts result.document.inference.prediction.legal_guardian.value
211
+ ```
212
+
213
+ ## MRZ Line 1
214
+ **mrz1** ([StringField](#string-field)): The first line of the machine-readable zone (MRZ) of the passport document.
215
+
216
+ ```rb
217
+ puts result.document.inference.prediction.mrz1.value
218
+ ```
219
+
220
+ ## MRZ Line 2
221
+ **mrz2** ([StringField](#string-field)): The second line of the machine-readable zone (MRZ) of the passport document.
222
+
223
+ ```rb
224
+ puts result.document.inference.prediction.mrz2.value
225
+ ```
226
+
227
+ ## Name of Mother
228
+ **name_of_mother** ([StringField](#string-field)): The name of the mother of the passport holder.
229
+
230
+ ```rb
231
+ puts result.document.inference.prediction.name_of_mother.value
232
+ ```
233
+
234
+ ## Name of Spouse
235
+ **name_of_spouse** ([StringField](#string-field)): The name of the spouse of the passport holder (if applicable).
236
+
237
+ ```rb
238
+ puts result.document.inference.prediction.name_of_spouse.value
239
+ ```
240
+
241
+ ## Old Passport Date of Issue
242
+ **old_passport_date_of_issue** ([DateField](#date-field)): The date of issue of the old passport (if applicable), ISO format: YYYY-MM-DD.
243
+
244
+ ```rb
245
+ puts result.document.inference.prediction.old_passport_date_of_issue.value
246
+ ```
247
+
248
+ ## Old Passport Number
249
+ **old_passport_number** ([StringField](#string-field)): The number of the old passport (if applicable).
250
+
251
+ ```rb
252
+ puts result.document.inference.prediction.old_passport_number.value
253
+ ```
254
+
255
+ ## Old Passport Place of Issue
256
+ **old_passport_place_of_issue** ([StringField](#string-field)): The place of issue of the old passport (if applicable).
257
+
258
+ ```rb
259
+ puts result.document.inference.prediction.old_passport_place_of_issue.value
260
+ ```
261
+
262
+ ## Page Number
263
+ **page_number** ([ClassificationField](#classification-field)): The page number of the passport document.
264
+
265
+ #### Possible values include:
266
+ - 1
267
+ - 2
268
+
269
+ ```rb
270
+ puts result.document.inference.prediction.page_number.value
271
+ ```
272
+
273
+ ## Surname
274
+ **surname** ([StringField](#string-field)): The surname of the passport holder.
275
+
276
+ ```rb
277
+ puts result.document.inference.prediction.surname.value
278
+ ```
279
+
280
+ # Questions?
281
+ [Join our Slack](https://join.slack.com/t/mindee-community/shared_invite/zt-2d0ds7dtz-DPAF81ZqTy20chsYpQBW5g)
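
The MRZ lines in the sample output above embed check digits, so extracted values can be cross-checked after parsing. The following is a minimal sketch and not part of the Mindee SDK: `mrz_char_value` and `mrz_check_digit` are hypothetical helpers, the weights (7, 3, 1) and character values follow the ICAO Doc 9303 convention, and the field offsets assume the standard two-line passport (TD3) layout. The input string is the `MRZ Line 2` value from the sample output.

```ruby
# Hypothetical helpers (not provided by the Mindee SDK).
# Weights used by ICAO Doc 9303 for MRZ check digits, repeated cyclically.
MRZ_WEIGHTS = [7, 3, 1].freeze

# Maps one MRZ character to its numeric value:
# digits keep their value, A-Z map to 10-35, the filler '<' counts as 0.
def mrz_char_value(char)
  case char
  when '0'..'9' then char.to_i
  when 'A'..'Z' then char.ord - 'A'.ord + 10
  when '<' then 0
  else raise ArgumentError, "Invalid MRZ character: #{char.inspect}"
  end
end

# Computes the check digit of an MRZ field: weighted sum modulo 10.
def mrz_check_digit(field)
  field.each_char.with_index.sum { |char, index| mrz_char_value(char) * MRZ_WEIGHTS[index % 3] } % 10
end

# Value copied from the sample output (in practice: prediction.mrz2.value).
mrz2 = 'J8369854<4IND5909234F2110101<<<<<<<<<<<<<<<8'

# Each entry pairs a field with the check digit that follows it (assumed TD3 offsets).
checks = {
  id_number: [mrz2[0, 9], mrz2[9]],
  birth_date: [mrz2[13, 6], mrz2[19]],
  expiry_date: [mrz2[21, 6], mrz2[27]],
}

checks.each do |name, (field, digit)|
  status = mrz_check_digit(field) == digit.to_i ? 'ok' : 'mismatch'
  puts "#{name}: #{status}"
end
```

For the sample document all three checks report `ok`, which agrees with the `ID Number`, `Birth Date` and `Expiry Date` values shown in the RST output.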
data/docs/invoices_v4.md CHANGED
@@ -63,21 +63,23 @@ puts result.document
63
63
  ########
64
64
  Document
65
65
  ########
66
- :Mindee ID: 372d9d08-59d8-4e1c-9622-06648c1c238b
66
+ :Mindee ID: a67b70ea-4b1e-4eac-ae75-dda47a7064ae
67
67
  :Filename: default_sample.jpg
68
68
 
69
69
  Inference
70
70
  #########
71
- :Product: mindee/invoices v4.7
71
+ :Product: mindee/invoices v4.9
72
72
  :Rotation applied: Yes
73
73
 
74
74
  Prediction
75
75
  ==========
76
- :Locale: en; en; CAD;
76
+ :Locale: en-CA; en; CA; CAD;
77
77
  :Invoice Number: 14
78
+ :Purchase Order Number: AD29094
78
79
  :Reference Numbers: AD29094
79
80
  :Purchase Date: 2018-09-25
80
- :Due Date:
81
+ :Due Date: 2011-12-01
82
+ :Payment Date: 2011-12-01
81
83
  :Total Net: 2145.00
82
84
  :Total Amount: 2608.20
83
85
  :Total Tax: 193.20
@@ -93,7 +95,7 @@ Prediction
93
95
  :Supplier Address: 156 University Ave, Toronto ON, Canada, M5H 2H7
94
96
  :Supplier Phone Number: 4165551212
95
97
  :Supplier Website:
96
- :Supplier Email: ldoi@example.com
98
+ :Supplier Email: j_coi@example.com
97
99
  :Customer Name: JIRO DOI
98
100
  :Customer Company Registrations:
99
101
  :Customer Address: 1954 Bloor Street West Toronto, ON, M6P 3K9 Canada
@@ -117,11 +119,13 @@ Page Predictions
117
119
 
118
120
  Page 0
119
121
  ------
120
- :Locale: en; en; CAD;
122
+ :Locale: en-CA; en; CA; CAD;
121
123
  :Invoice Number: 14
124
+ :Purchase Order Number: AD29094
122
125
  :Reference Numbers: AD29094
123
126
  :Purchase Date: 2018-09-25
124
- :Due Date:
127
+ :Due Date: 2011-12-01
128
+ :Payment Date: 2011-12-01
125
129
  :Total Net: 2145.00
126
130
  :Total Amount: 2608.20
127
131
  :Total Tax: 193.20
@@ -137,7 +141,7 @@ Page 0
137
141
  :Supplier Address: 156 University Ave, Toronto ON, Canada, M5H 2H7
138
142
  :Supplier Phone Number: 4165551212
139
143
  :Supplier Website:
140
- :Supplier Email: ldoi@example.com
144
+ :Supplier Email: j_coi@example.com
141
145
  :Customer Name: JIRO DOI
142
146
  :Customer Company Registrations:
143
147
  :Customer Address: 1954 Bloor Street West Toronto, ON, M6P 3K9 Canada
data/docs/resume_v1.md CHANGED
@@ -38,13 +38,13 @@ puts result.document
38
38
  ########
39
39
  Document
40
40
  ########
41
- :Mindee ID: bc80bae0-af75-4464-95a9-2419403c75bf
41
+ :Mindee ID: 9daa3085-152c-454e-9245-636f13fc9dc3
42
42
  :Filename: default_sample.jpg
43
43
 
44
44
  Inference
45
45
  #########
46
- :Product: mindee/resume v1.0
47
- :Rotation applied: No
46
+ :Product: mindee/resume v1.1
47
+ :Rotation applied: Yes
48
48
 
49
49
  Prediction
50
50
  ==========
@@ -54,8 +54,8 @@ Prediction
54
54
  :Surnames: Morgan
55
55
  :Nationality:
56
56
  :Email Address: christoper.m@gmail.com
57
- :Phone Number: +44 (0) 20 7666 8555
58
- :Address: 177 Great Portland Street, London W5W 6PQ
57
+ :Phone Number: +44 (0)20 7666 8555
58
+ :Address: 177 Great Portland Street, London, W5W 6PQ
59
59
  :Social Networks:
60
60
  +----------------------+----------------------------------------------------+
61
61
  | Name | URL |
@@ -72,38 +72,37 @@ Prediction
72
72
  +----------+----------------------+
73
73
  | ZHO | Beginner |
74
74
  +----------+----------------------+
75
- | DEU | Intermediate |
75
+ | DEU | Beginner |
76
76
  +----------+----------------------+
77
77
  :Hard Skills: HTML5
78
78
  PHP OOP
79
79
  JavaScript
80
80
  CSS
81
81
  MySQL
82
+ SQL
82
83
  :Soft Skills: Project management
84
+ Creative design
83
85
  Strong decision maker
84
86
  Innovative
85
87
  Complex problem solver
86
- Creative design
87
88
  Service-focused
88
89
  :Education:
89
90
  +-----------------+---------------------------+-----------+----------+---------------------------+-------------+------------+
90
91
  | Domain | Degree | End Month | End Year | School | Start Month | Start Year |
91
92
  +=================+===========================+===========+==========+===========================+=============+============+
92
- | Computer Inf... | Bachelor | | | Columbia University, NY | | 2014 |
93
+ | Computer Inf... | Bachelor | | 2014 | Columbia University, NY | | |
93
94
  +-----------------+---------------------------+-----------+----------+---------------------------+-------------+------------+
94
95
  :Professional Experiences:
95
- +-----------------+------------+---------------------------+-----------+----------+----------------------+-------------+------------+
96
- | Contract Type | Department | Employer | End Month | End Year | Role | Start Month | Start Year |
97
- +=================+============+===========================+===========+==========+======================+=============+============+
98
- | Full-Time | | Luna Web Design, New York | 05 | 2019 | Web Developer | 09 | 2015 |
99
- +-----------------+------------+---------------------------+-----------+----------+----------------------+-------------+------------+
96
+ +-----------------+------------+--------------------------------------+---------------------------+-----------+----------+----------------------+-------------+------------+
97
+ | Contract Type | Department | Description | Employer | End Month | End Year | Role | Start Month | Start Year |
98
+ +=================+============+======================================+===========================+===========+==========+======================+=============+============+
99
+ | | | Cooperate with designers to creat... | Luna Web Design, New York | 05 | 2019 | Web Developer | 09 | 2015 |
100
+ +-----------------+------------+--------------------------------------+---------------------------+-----------+----------+----------------------+-------------+------------+
100
101
  :Certificates:
101
102
  +------------+--------------------------------+---------------------------+------+
102
103
  | Grade | Name | Provider | Year |
103
104
  +============+================================+===========================+======+
104
- | | PHP Framework (certificate)... | | 2014 |
105
- +------------+--------------------------------+---------------------------+------+
106
- | | Programming Languages: Java... | | |
105
+ | | PHP Framework (certificate)... | | |
107
106
  +------------+--------------------------------+---------------------------+------+
108
107
  ```
109
108
 
@@ -171,6 +170,7 @@ A `ResumeV1Language` implements the following attributes:
171
170
  * `level` (String): The candidate's level for the language.
172
171
 
173
172
  #### Possible values include:
173
+ - Native
174
174
  - Fluent
175
175
  - Proficient
176
176
  - Intermediate
@@ -192,6 +192,7 @@ A `ResumeV1ProfessionalExperience` implements the following attributes:
192
192
  - Freelance
193
193
 
194
194
  * `department` (String): The specific department or division within the company.
195
+ * `description` (String): The description of the professional experience as written in the document.
195
196
  * `employer` (String): The name of the company or organization.
196
197
  * `end_month` (String): The month when the professional experience ended.
197
198
  * `end_year` (String): The year when the professional experience ended.
data/lib/mindee/client.rb CHANGED
@@ -128,6 +128,7 @@ module Mindee
128
128
  end
129
129
 
130
130
  # rubocop:disable Metrics/ParameterLists
131
+
131
132
  # Enqueue a document for async parsing and automatically try to retrieve it
132
133
  #
133
134
  # @param input_source [Mindee::Input::Source::LocalInputSource, Mindee::Input::Source::UrlInputSource]
@@ -148,8 +149,8 @@ module Mindee
148
149
  # * `:on_min_pages` Apply the operation only if document has at least this many pages.
149
150
  # @param cropper [Boolean, nil] Whether to include cropper results for each page.
150
151
  # This performs a cropping operation on the server and will increase response time.
151
- # @param initial_delay_sec [Integer, Float] initial delay before polling. Defaults to 4.
152
- # @param delay_sec [Integer, Float] delay between polling attempts. Defaults to 2.
152
+ # @param initial_delay_sec [Integer, Float] initial delay before polling. Defaults to 2.
153
+ # @param delay_sec [Integer, Float] delay between polling attempts. Defaults to 1.5.
153
154
  # @param max_retries [Integer] maximum amount of retries. Defaults to 60.
154
155
  # @return [Mindee::Parsing::Common::ApiResponse]
155
156
  def enqueue_and_parse(
@@ -161,8 +162,8 @@ module Mindee
161
162
  close_file: true,
162
163
  page_options: nil,
163
164
  cropper: false,
164
- initial_delay_sec: 4,
165
- delay_sec: 2,
165
+ initial_delay_sec: 2,
166
+ delay_sec: 1.5,
166
167
  max_retries: 60
167
168
  )
168
169
  enqueue_res = enqueue(
@@ -271,13 +272,13 @@ module Mindee
271
272
  # @param max_retries [Integer, nil] maximum amount of retries.
272
273
  def validate_async_params(initial_delay_sec, delay_sec, max_retries)
273
274
  min_delay_sec = 1
274
- min_initial_delay_sec = 2
275
+ min_initial_delay_sec = 1
275
276
  min_retries = 2
276
- raise "Cannot set auto-poll delay to less than #{min_delay_sec} seconds" if delay_sec < min_delay_sec
277
+ raise "Cannot set auto-poll delay to less than #{min_delay_sec} second(s)" if delay_sec < min_delay_sec
277
278
  if initial_delay_sec < min_initial_delay_sec
278
- raise "Cannot set initial parsing delay to less than #{min_initial_delay_sec} seconds"
279
+ raise "Cannot set initial parsing delay to less than #{min_initial_delay_sec} second(s)"
279
280
  end
280
- raise "Cannot set auto-poll delay to less than #{min_retries} seconds" if max_retries < min_retries
281
+ raise "Cannot set auto-poll retries to less than #{min_retries}" if max_retries < min_retries
281
282
  end
282
283
 
283
284
  # Creates an endpoint with the given values. Raises an error if the endpoint is invalid.
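
The effect of the relaxed polling limits in this hunk can be illustrated in isolation. The sketch below is a standalone function that mirrors the 3.16.0 checks shown above (minimum poll delay of 1 second, minimum initial delay of 1 second, at least 2 retries). It is not the SDK's private method, and `polling_params_valid?` is a hypothetical helper added only for the demonstration.

```ruby
# Standalone copy of the 3.16.0 validation logic, for illustration only.
def validate_async_params(initial_delay_sec, delay_sec, max_retries)
  min_delay_sec = 1
  min_initial_delay_sec = 1
  min_retries = 2
  raise "Cannot set auto-poll delay to less than #{min_delay_sec} second(s)" if delay_sec < min_delay_sec
  if initial_delay_sec < min_initial_delay_sec
    raise "Cannot set initial parsing delay to less than #{min_initial_delay_sec} second(s)"
  end
  raise "Cannot set auto-poll retries to less than #{min_retries}" if max_retries < min_retries
end

# Hypothetical helper: turns the raise-based check into a boolean.
def polling_params_valid?(initial_delay_sec, delay_sec, max_retries)
  validate_async_params(initial_delay_sec, delay_sec, max_retries)
  true
rescue RuntimeError
  false
end

puts polling_params_valid?(2, 1.5, 60) # new defaults of enqueue_and_parse
puts polling_params_valid?(1, 1, 2)    # the lowest values the checks accept
puts polling_params_valid?(1, 0.5, 60) # poll delay below the 1-second minimum
```

The first two calls print `true` and the last prints `false`. Under the 3.14.0 limits shown on the removed lines, an initial delay of 1 second would have been rejected because the minimum was 2.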
@@ -54,7 +54,6 @@ module Mindee
54
54
  image = MiniMagick::Image.read(@buffer)
55
55
  image.format file_format.downcase
56
56
  image.write resolved_path.to_s
57
- logger.info("File saved successfully to '#{resolved_path}'.")
58
57
  rescue TypeError
59
58
  raise 'Invalid path/filename provided.'
60
59
  rescue StandardError
@@ -11,8 +11,8 @@ module Mindee
11
11
  # Image Extraction Module.
12
12
  module Extraction
13
13
  # Image Extraction wrapper class.
14
- class ImageExtractor
15
- def self.attach_image_as_new_file(input_buffer)
14
+ module ImageExtractor
15
+ def self.attach_image_as_new_file(input_buffer, format: 'jpg')
16
16
  # Attaches an image as a new page in a PdfDocument object.
17
17
  #
18
18
  # @param [StringIO] input_buffer Input buffer. Only supports JPEG.
@@ -21,9 +21,9 @@ module Mindee
21
21
  magick_image = MiniMagick::Image.read(input_buffer)
22
22
  # NOTE: some jpeg images get rendered as three different versions of themselves per output if the format isn't
23
23
  # converted.
24
- magick_image.format('jpg')
24
+ magick_image.format(format)
25
25
  original_density = magick_image.resolution
26
- scale_factor = original_density[0].to_f / 4.166666 # No clue why bit the resolution needs to be reduced for
26
+ scale_factor = original_density[0].to_f / 4.166666 # No clue why the resolution needs to be reduced for
27
27
  # the pdf otherwise the resulting image shrinks.
28
28
  magick_image.format('pdf', 0, { density: scale_factor.to_s })
29
29
  Origami::PDF.read(StringIO.new(magick_image.to_blob))
@@ -37,27 +37,12 @@ module Mindee
37
37
  # to extract.
38
38
  # @return [Array<Mindee::Extraction::ExtractedImage>] Extracted Images.
39
39
  def self.extract_multiple_images_from_source(input_source, page_id, polygons)
40
- new_stream = load_doc(input_source, page_id)
40
+ new_stream = load_input_source_pdf_page_as_image(input_source, page_id)
41
41
  new_stream.seek(0)
42
42
 
43
43
  extract_images_from_polygons(input_source, new_stream, page_id, polygons)
44
44
  end
45
45
 
46
- # Retrieves a PDF document's page.
47
- #
48
- # @param [Origami::PDF] pdf_doc Origami PDF handle.
49
- # @param [Integer] page_id Page ID.
50
- def self.get_page(pdf_doc, page_id)
51
- stream = StringIO.new
52
- pdf_doc.save(stream)
53
-
54
- options = {
55
- page_indexes: [page_id - 1],
56
- }
57
-
58
- Mindee::PDF::PdfProcessor.parse(stream, options)
59
- end
60
-
61
46
  # Extracts images from their positions on a file (as polygons).
62
47
  #
63
48
  # @param [Mindee::Input::Source::LocalInputSource] input_source Local input source.
@@ -179,10 +164,10 @@ module Mindee
179
164
  # @param input_file [LocalInputSource] Local input.
180
165
  # @param [Integer] page_id Page ID.
181
166
  # @return [MiniMagick::Image] A valid PdfDocument handle.
182
- def self.load_doc(input_file, page_id)
167
+ def self.load_input_source_pdf_page_as_image(input_file, page_id)
183
168
  input_file.io_stream.rewind
184
169
  if input_file.pdf?
185
- get_page(Origami::PDF.read(input_file.io_stream), page_id)
170
+ Mindee::PDF::PdfProcessor.get_page(Origami::PDF.read(input_file.io_stream), page_id)
186
171
  else
187
172
  input_file.io_stream
188
173
  end
@@ -65,6 +65,7 @@ module Mindee
65
65
 
66
66
  # rubocop:disable Metrics/CyclomaticComplexity
67
67
  # rubocop:disable Metrics/PerceivedComplexity
68
+
68
69
  # Extracts invoices as complete PDFs from the document.
69
70
  # @param page_indexes [Array<Array<Integer>, InvoiceSplitterV1PageGroup>]
70
71
  # @param strict [Boolean]
@@ -99,6 +100,7 @@ module Mindee
99
100
  end
100
101
  extract_sub_documents(correct_page_indexes)
101
102
  end
103
+
102
104
  # rubocop:enable Metrics/CyclomaticComplexity
103
105
  # rubocop:enable Metrics/PerceivedComplexity
104
106
 
@@ -271,6 +271,7 @@ module Mindee
271
271
  end
272
272
  candidates
273
273
  end
274
+
274
275
  # rubocop:enable Metrics/CyclomaticComplexity
275
276
  # rubocop:enable Metrics/PerceivedComplexity
276
277
 
@@ -10,9 +10,10 @@ module Mindee
10
10
  # @return [Float]
11
11
  attr_accessor :y
12
12
 
13
+ # rubocop:disable Naming/MethodParameterName
14
+
13
15
  # @param x [Float]
14
16
  # @param y [Float]
15
- # rubocop:disable Naming/MethodParameterName
16
17
  def initialize(x, y)
17
18
  @x = x
18
19
  @y = y
@@ -0,0 +1,29 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Mindee
4
+ # Image processing module.
5
+ module Image
6
+ # Image compressor module to handle image compression.
7
+ module ImageCompressor
8
+ # Resize and/or compress an image. This assumes the ratio was provided beforehand.
9
+ # @param image [MiniMagick::Image, StringIO] Input image.
10
+ # @param quality [Integer, nil] Quality of the final file.
11
+ # @param max_width [Integer, nil] Maximum width. If not specified, the horizontal ratio will remain the same.
12
+ # @param max_height [Integer, nil] Maximum height. If not specified, the vertical ratio will remain the same.
13
+ # @return [StringIO]
14
+ def self.compress_image(image, quality: 85, max_width: nil, max_height: nil)
15
+ processed_image = ImageUtils.to_image(image)
16
+ processed_image.format 'jpg'
17
+ final_width, final_height = ImageUtils.calculate_new_dimensions(
18
+ processed_image,
19
+ max_width: max_width,
20
+ max_height: max_height
21
+ )
22
+ ImageUtils.resize_image(processed_image, final_width, final_height) if final_width || final_height
23
+ ImageUtils.compress_image_quality(processed_image, quality)
24
+
25
+ ImageUtils.image_to_stringio(processed_image)
26
+ end
27
+ end
28
+ end
29
+ end
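
`compress_image` delegates the size computation to `ImageUtils.calculate_new_dimensions`, whose body is not shown in this excerpt. As a rough illustration of what an aspect-ratio-preserving calculation could look like, here is a standalone sketch. `fit_within` is a hypothetical function, and both the rounding and the decision never to upscale are assumptions rather than the library's behavior.

```ruby
# Hypothetical sketch: compute dimensions that fit inside optional limits
# while keeping the aspect ratio. Not the Mindee implementation.
def fit_within(width, height, max_width: nil, max_height: nil)
  # Without any limit there is nothing to resize.
  return [nil, nil] if max_width.nil? && max_height.nil?

  # Start from 1.0 so the image is never scaled up (assumption).
  ratios = [1.0]
  ratios << (max_width.to_f / width) if max_width
  ratios << (max_height.to_f / height) if max_height

  # The smallest ratio satisfies every limit at once.
  ratio = ratios.min
  [(width * ratio).round, (height * ratio).round]
end

p fit_within(4000, 3000, max_width: 1000)
p fit_within(4000, 3000, max_width: 1000, max_height: 600)
p fit_within(500, 400, max_width: 1000)
p fit_within(500, 400)
```

Returning `[nil, nil]` when no limit is given lines up with the `if final_width || final_height` guard in `compress_image`, which skips the resize step in that case.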