geolocation_service 0.1.0 → 0.1.2
- checksums.yaml +4 -4
- data/README.md +88 -11
- data/lib/geolocation_service/services/import_bulk_data_service.rb +9 -9
- data/lib/geolocation_service/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 425afb60c0457918c20ce1dc77821fe51f72a5c3956baad224b2282628cf4ae7
|
4
|
+
data.tar.gz: 2b21fb3f283c1c9ae58d479c3ab1fa24ad8c989e1f741741f384f88f83c066f4
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: f4fdd1994c0f4ba8fa758da9440a3510e679ea737d0431e1e312fffc34bfb925020a29b10a6574f14f82f63f8681a1b7fce541ce6012f1ba1ccd008c67afba77
|
7
|
+
data.tar.gz: 40636b1b37251ed3b00eed41246fc9cf238e5b78db80b1a5d3175ef432017876b0041c2a5f295591279c9cd77690f4657bbe3c9c20e6ead478d588e2dd82b147
|
data/README.md
CHANGED
@@ -1,10 +1,20 @@
|
|
1
|
-
#
|
2
|
-
Short description and motivation.
|
1
|
+
# FindHotel Coding Challenge
|
3
2
|
|
4
|
-
|
5
|
-
|
3
|
+
The FindHotel coding challenge consists of two parts, a library and a REST API application:
|
4
|
+
|
5
|
+
1. A library with two main features:
|
6
|
+
* A service that parses the CSV file containing the raw data and persists it in a database;
|
7
|
+
* An interface to provide access to the geolocation data (model layer);
|
8
|
+
2. A REST API that uses the aforementioned library to expose the geolocation data.
|
9
|
+
|
10
|
+
This repository contains my solution to the library. You can find my solution to the REST API application here: https://github.com/jalerson/geolocation_api.
|
11
|
+
|
12
|
+
## Geolocation Service
|
13
|
+
|
14
|
+
The library was developed as a [Rails Engine gem](https://guides.rubyonrails.org/engines.html), which can be easily and seamlessly integrated into any Rails application.
|
15
|
+
|
16
|
+
### Installation
|
6
17
|
|
7
|
-
## Installation
|
8
18
|
Add this line to your application's Gemfile:
|
9
19
|
|
10
20
|
```ruby
|
@@ -12,17 +22,84 @@ gem 'geolocation_service'
|
|
12
22
|
```
|
13
23
|
|
14
24
|
And then execute:
|
25
|
+
|
15
26
|
```bash
|
16
27
|
$ bundle
|
17
28
|
```
|
18
29
|
|
19
|
-
|
30
|
+
Install the gem's migrations:
|
31
|
+
|
20
32
|
```bash
|
21
|
-
$
|
33
|
+
$ rails geolocation_service_engine:install:migrations
|
34
|
+
```
|
35
|
+
|
36
|
+
Execute pending migrations:
|
37
|
+
|
38
|
+
```bash
|
39
|
+
$ rake db:migrate
|
40
|
+
```
|
41
|
+
|
42
|
+
### Usage
|
43
|
+
|
44
|
+
The gem provides four new models: `Ip`, `City`, `Country` and `Location`.
|
45
|
+
|
46
|
+

|
47
|
+
|
48
|
+
In order to import data, you must use the `GeolocationService::Services::ImportBulkDataService`.
|
49
|
+
|
50
|
+
```ruby
|
51
|
+
GeolocationService::Services::ImportBulkDataService.call(file_path: 'path/to/data_file.csv')
|
52
|
+
```
|
53
|
+
|
54
|
+
The service returns a [Dry::Monads::Result](https://dry-rb.org/gems/dry-monads/result/) indicating whether the import succeeded or failed.
|
55
|
+
|
56
|
+
```ruby
|
57
|
+
result = GeolocationService::Services::ImportBulkDataService.call(file_path: 'path/to/data_file.csv')
|
58
|
+
|
59
|
+
if result.success?
|
60
|
+
# do something...
|
61
|
+
else
|
62
|
+
# do something else...
|
63
|
+
end
|
22
64
|
```
|
23
65
|
|
24
|
-
|
25
|
-
|
66
|
+
When the import succeeds, `result` wraps an instance of `GeolocationService::ImportResult`, which has:
|
67
|
+
|
68
|
+
- `imported_records`: number of imported records
|
69
|
+
- `invalid_records`: number of invalid records
|
70
|
+
- `time_consumed`: time consumed in seconds
|
71
|
+
|
72
|
+
When the import fails, `result` wraps an error/exception.
|
73
|
+
|
74
|
+
```ruby
|
75
|
+
result = GeolocationService::Services::ImportBulkDataService.call(file_path: 'path/to/data_file.csv')
|
76
|
+
|
77
|
+
if result.success?
|
78
|
+
import_results = result.value!
|
79
|
+
Rails.logger.info "Records imported in #{import_results.time_consumed} seconds"
|
80
|
+
else
|
81
|
+
error = result.failure
|
82
|
+
Rails.logger.error error.message
|
83
|
+
end
|
84
|
+
```
|
85
|
+
|
86
|
+
### Design decisions
|
87
|
+
|
88
|
+
The guiding principle for design decisions in this project was: **provide the best importing performance while keeping the database normalized**.
|
89
|
+
|
90
|
+
To achieve the best importing performance with a normalized database, experiments were conducted to find the fastest way of (a) converting the CSV data to a format/representation that could be validated and stored in the database, (b) validating the data and (c) actually storing the data in the database. In each case, the alternatives considered were:
|
91
|
+
|
92
|
+
**(a) CSV data conversion:** ActiveRecord instances, simple Ruby classes (no ActiveRecord), Arrays/Hashes and Structs.
|
93
|
+
|
94
|
+
**(b) data validation:** [ActiveRecord validations](https://guides.rubyonrails.org/active_record_validations.html) or [contracts/schemas](https://dry-rb.org/gems/dry-validation/1.0/).
|
95
|
+
|
96
|
+
**(c) store into the database**: to keep performance as high as possible, the alternatives considered were limited to those which persist a set of records in a single `INSERT` statement: the [activerecord-import gem](https://github.com/zdennis/activerecord-import) or writing the SQL statements by hand and sending them to the database.
|
97
|
+
|
98
|
+
After several experiments with different combinations, the chosen approach was to use Structs, contracts/schemas and SQL statements sent directly to the database. This combination imported one million records in approximately 4 minutes while avoiding duplicates and keeping the code clean.
|
99
|
+
|
100
|
+
### Trade-offs
|
101
|
+
|
102
|
+
To keep importing performance as high as possible, two design decisions have consequences that users need to be aware of.
|
26
103
|
|
27
|
-
|
28
|
-
|
104
|
+
- **Memory usage:** when importing a set of records, the service loads all existing records into memory and adds the new records to them. Holding both sets in memory is how the service avoids creating duplicate records.
|
105
|
+
- **id (primary key) needs to be manually set:** in order to guarantee the proper relationship constraints between records in the database when importing records, the `id` must be set manually in all tables, except `locations`.
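
The combination described in the README (Structs, an in-memory index to skip duplicates, manually assigned ids and a single multi-row `INSERT`) can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the gem's implementation: `CityRow`, `sql_literal`, `bulk_insert_sql` and the `cities` table name are hypothetical, and real code should rely on the database adapter's quoting instead of the simplified quoting shown here.

```ruby
# Hypothetical struct standing in for the gem's Structs::CityStruct,
# whose definition is not shown in this diff.
CityRow = Struct.new(:id, :name, :country_id)

# Simplified SQL literal quoting, for illustration only.
def sql_literal(value)
  case value
  when nil then 'NULL'
  when Integer then value.to_s
  else "'#{value.to_s.gsub("'", "''")}'"
  end
end

# Build one multi-row INSERT so the whole batch costs a single statement.
def bulk_insert_sql(table, columns, rows)
  tuples = rows.map do |row|
    "(#{columns.map { |column| sql_literal(row[column]) }.join(', ')})"
  end
  "INSERT INTO #{table} (#{columns.join(', ')}) VALUES #{tuples.join(', ')}"
end

# In-memory index keyed by the downcased name: duplicates are skipped
# and ids are assigned manually from a counter.
cities = {}
next_id = 0
['Lisbon', 'lisbon', "Coeur d'Alene"].each do |name|
  key = name.downcase
  next if cities.key?(key)

  cities[key] = CityRow.new(next_id, name, nil)
  next_id += 1
end

sql = bulk_insert_sql('cities', %i[id name country_id], cities.values)
puts sql
```

Because ids are assigned before the statement is built, related rows can reference each other without a round trip to the database, which is the reason the README gives for setting `id` manually.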
|
data/lib/geolocation_service/services/import_bulk_data_service.rb
CHANGED
@@ -43,10 +43,10 @@ module GeolocationService::Services
|
|
43
43
|
|
44
44
|
GeolocationService::ImportResult.new(
|
45
45
|
imported_records: {
|
46
|
-
ip: @
|
47
|
-
city: @
|
48
|
-
country: @
|
49
|
-
location: @new_records[:location].
|
46
|
+
ip: @new_records[:ip].values.size,
|
47
|
+
city: @new_records[:city].values.size,
|
48
|
+
country: @new_records[:country].values.size,
|
49
|
+
location: @new_records[:location].size
|
50
50
|
},
|
51
51
|
invalid_records: @invalid_records,
|
52
52
|
time_consumed: (Time.zone.now - start_time)
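
The hunk above reports one count per record type. A minimal sketch of that counting, assuming a `new_records` hash shaped like the one initialised in `load_new_records` (hashes for `ip`, `city` and `country`, an array for `location`); the sample keys and values are invented for illustration:

```ruby
# Same shape as @new_records in the service; the contents are made up.
new_records = {
  ip: { '1.1.1.1' => :ip_a, '2.2.2.2' => :ip_b },
  city: { 'lisbon' => :city_a },
  country: {},
  location: %i[loc_a loc_b loc_c]
}

# Hash#size and Array#size both return the number of entries, so a single
# transform covers all four collections.
imported = new_records.transform_values(&:size)
puts imported.inspect
```

`Hash#size` returns the same number as `values.size` without building an intermediate array, so the two spellings are interchangeable for this purpose.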
|
@@ -74,8 +74,8 @@ module GeolocationService::Services
|
|
74
74
|
def build_city(row, country)
|
75
75
|
return if row['city'].blank?
|
76
76
|
|
77
|
-
if validate(:city, name: row['city'],
|
78
|
-
new_city = Structs::CityStruct.new(@city_count, row['city'], country
|
77
|
+
if validate(:city, name: row['city'], country_id: country&.id).success?
|
78
|
+
new_city = Structs::CityStruct.new(@city_count, row['city'], country&.id)
|
79
79
|
@new_records[:city][row['city'].downcase] = new_city
|
80
80
|
@city_count += 1
|
81
81
|
return new_city
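
The added lines pass `country&.id` rather than calling `id` on `country` directly. A small sketch of what the safe navigation operator does when the country is missing; `CountryStub` and `country_id_for` are hypothetical names used only for this example:

```ruby
# Hypothetical stand-in for a country record with an id.
CountryStub = Struct.new(:id, :name)

# Returns the id, or nil when no country is given, instead of raising.
def country_id_for(country)
  country&.id
end

known = country_id_for(CountryStub.new(7, 'Portugal'))
missing = country_id_for(nil)

# Without the operator, calling id on nil raises NoMethodError.
raised = begin
  nil.id
  false
rescue NoMethodError
  true
end
```

With this change a row whose country could not be built still reaches validation with `country_id: nil`, leaving the decision to accept or reject it to the contract.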
|
@@ -117,9 +117,9 @@ module GeolocationService::Services
|
|
117
117
|
end
|
118
118
|
|
119
119
|
def load_new_records
|
120
|
-
@ip_count = 0
|
121
|
-
@city_count = 0
|
122
|
-
@country_count = 0
|
120
|
+
@ip_count = Ip.count == 0 ? 0 : Ip.last.id + 1
|
121
|
+
@city_count = City.count == 0 ? 0 : City.last.id + 1
|
122
|
+
@country_count = Country.count == 0 ? 0 : Country.last.id + 1
|
123
123
|
@invalid_records = 0
|
124
124
|
@new_records = {ip: {}, location: [], city: {}, country: {}}
|
125
125
|
end
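
The new counters start from the last existing id instead of zero, so a second import does not reuse primary keys already present in the table. The rule can be sketched without ActiveRecord as follows; `next_manual_id` is a hypothetical helper, and the array stands in for the ids already stored in a table:

```ruby
# First id to assign manually: 0 for an empty table, otherwise one past
# the highest existing id.
def next_manual_id(existing_ids)
  existing_ids.empty? ? 0 : existing_ids.max + 1
end

empty_table = next_manual_id([])
after_gap = next_manual_id([0, 1, 5])
```

This matches the diff under the assumption that `Ip.last` returns the row with the highest primary key, which is ActiveRecord's behaviour when no other ordering is applied. The approach also assumes that only one import runs at a time, since two concurrent imports would compute the same starting id.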
|
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: geolocation_service
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.1.
|
4
|
+
version: 0.1.2
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Jalerson Lima
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2019-09-
|
11
|
+
date: 2019-09-22 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: rails
|