geolocation_service 0.1.0 → 0.1.2
- checksums.yaml +4 -4
- data/README.md +88 -11
- data/lib/geolocation_service/services/import_bulk_data_service.rb +9 -9
- data/lib/geolocation_service/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 425afb60c0457918c20ce1dc77821fe51f72a5c3956baad224b2282628cf4ae7
+  data.tar.gz: 2b21fb3f283c1c9ae58d479c3ab1fa24ad8c989e1f741741f384f88f83c066f4
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: f4fdd1994c0f4ba8fa758da9440a3510e679ea737d0431e1e312fffc34bfb925020a29b10a6574f14f82f63f8681a1b7fce541ce6012f1ba1ccd008c67afba77
+  data.tar.gz: 40636b1b37251ed3b00eed41246fc9cf238e5b78db80b1a5d3175ef432017876b0041c2a5f295591279c9cd77690f4657bbe3c9c20e6ead478d588e2dd82b147
data/README.md
CHANGED
@@ -1,10 +1,20 @@
-#
-Short description and motivation.
+# FindHotel Coding Challenge
 
-
-
+The FindHotel coding challenge consists of two parts, a library and a REST API application:
+
+1. A library with two main features:
+   * A service that parses the CSV file containing the raw data and persists it in a database;
+   * An interface to provide access to the geolocation data (model layer);
+2. A REST API that uses the aforementioned library to expose the geolocation data.
+
+This repository contains my solution to the library. You can find my solution to the REST API application here: https://github.com/jalerson/geolocation_api.
+
+## Geolocation Service
+
+The library was developed as a [Rails Engine gem](https://guides.rubyonrails.org/engines.html), which can be easily and seamlessly integrated into any Rails application.
+
+### Installation
 
-## Installation
 Add this line to your application's Gemfile:
 
 ```ruby
@@ -12,17 +22,84 @@ gem 'geolocation_service'
 ```
 
 And then execute:
+
 ```bash
 $ bundle
 ```
 
-
+Install the gem's migrations:
+
 ```bash
-$
+$ rails geolocation_service_engine:install:migrations
+```
+
+Execute pending migrations:
+
+```bash
+$ rake db:migrate
+```
+
+### Usage
+
+The gem provides four new models: `Ip`, `City`, `Country` and `Location`.
+
+![Models and associations](https://i.ibb.co/Dbb2nTH/geocoding-service-erd.png)
+
+To import data, use `GeolocationService::Services::ImportBulkDataService`.
+
+```ruby
+GeolocationService::Services::ImportBulkDataService.call(file_path: 'path/to/data_file.csv')
+```
+
+The service returns a [Dry::Monads::Result](https://dry-rb.org/gems/dry-monads/result/) indicating the success or failure of the import.
+
+```ruby
+result = GeolocationService::Services::ImportBulkDataService.call(file_path: 'path/to/data_file.csv')
+
+if result.success?
+  # do something...
+else
+  # do something else...
+end
 ```
 
-
-
+On a successful import, the `result` wraps an instance of `GeolocationService::ImportResult`, which exposes:
+
+- `imported_records`: number of imported records, keyed by model (`ip`, `city`, `country`, `location`)
+- `invalid_records`: number of invalid records
+- `time_consumed`: time consumed in seconds
+
+On a failed import, the `result` wraps the error/exception.
+
+```ruby
+result = GeolocationService::Services::ImportBulkDataService.call(file_path: 'path/to/data_file.csv')
+
+if result.success?
+  import_results = result.value!
+  Rails.logger.info "Records imported in #{import_results.time_consumed} seconds"
+else
+  error = result.failure
+  Rails.logger.error error.message
+end
+```
+
+### Design decisions
+
+The guiding principle for the design decisions in this project was: **provide the best importing performance while keeping the database normalized**.
+
+To get the best import performance with a normalized database, experiments were conducted to find the fastest approach for (a) converting the CSV data into a representation that can be validated and stored in the database, (b) validating the data and (c) storing the data in the database. The alternatives considered in each case were:
+
+**(a) CSV data conversion:** ActiveRecord instances, simple Ruby classes (no ActiveRecord), Arrays/Hashes and Structs.
+
+**(b) data validation:** [ActiveRecord validations](https://guides.rubyonrails.org/active_record_validations.html) or [contracts/schemas](https://dry-rb.org/gems/dry-validation/1.0/).
+
+**(c) storing into the database:** to keep performance as high as possible, the alternatives considered were limited to those that persist a set of records in a single `INSERT` statement: the [activerecord-import gem](https://github.com/zdennis/activerecord-import) or writing the SQL statements by hand and sending them to the database.
+
+After several experiments with different combinations, the chosen approach uses Structs, contracts/schemas and SQL statements sent directly to the database. This combination performed well, importing one million records in approximately 4 minutes while avoiding duplicates and keeping the code clean.
+
+### Trade-offs
+
+To keep import performance as high as possible, two design decisions have consequences that users need to be aware of.
 
-
-
+- **Memory usage:** when importing a set of records, the service loads all existing records into memory and adds the new records to them. This is how the service avoids creating duplicate records.
+- **The id (primary key) must be set manually:** to guarantee the proper relationship constraints between records during an import, the `id` must be set manually in all tables except `locations`.
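The design decisions and trade-offs described in the README above (Structs, in-memory duplicate detection, manually assigned ids and a single `INSERT` per batch) can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the gem's code: `CitySketch`, `CityBuffer`, the `cities` table name and the `quote` helper are all hypothetical.

```ruby
# Hypothetical struct; the gem's real Structs::CityStruct may differ.
CitySketch = Struct.new(:id, :name, :country_id)

# Buffers records in a Hash keyed by downcased name, so duplicates are
# detected in memory, and assigns ids manually from a counter.
class CityBuffer
  attr_reader :records

  def initialize(next_id: 0)
    @next_id = next_id
    @records = {}
  end

  # Returns the already-buffered city for a known name, or buffers a new one.
  def add(name, country_id)
    key = name.downcase
    return @records[key] if @records.key?(key)

    city = CitySketch.new(@next_id, name, country_id)
    @next_id += 1
    @records[key] = city
  end

  # Builds one multi-row INSERT statement for all buffered records.
  # The table name `cities` is an assumption.
  def to_insert_sql
    rows = @records.values.map do |c|
      country = c.country_id.nil? ? 'NULL' : c.country_id
      "(#{c.id}, #{quote(c.name)}, #{country})"
    end
    "INSERT INTO cities (id, name, country_id) VALUES #{rows.join(', ')};"
  end

  private

  # Naive quoting for illustration only; real code should rely on the
  # database adapter's quoting.
  def quote(value)
    "'#{value.gsub("'", "''")}'"
  end
end
```

Because every buffered record stays in the Hash until the statement is built, memory grows with the number of distinct records, which is the memory trade-off the README mentions.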
data/lib/geolocation_service/services/import_bulk_data_service.rb
CHANGED
@@ -43,10 +43,10 @@ module GeolocationService::Services
 
       GeolocationService::ImportResult.new(
         imported_records: {
-          ip: @
-          city: @
-          country: @
-          location: @new_records[:location].
+          ip: @new_records[:ip].values.size,
+          city: @new_records[:city].values.size,
+          country: @new_records[:country].values.size,
+          location: @new_records[:location].size
         },
         invalid_records: @invalid_records,
         time_consumed: (Time.zone.now - start_time)
@@ -74,8 +74,8 @@ module GeolocationService::Services
     def build_city(row, country)
       return if row['city'].blank?
 
-      if validate(:city, name: row['city'],
-        new_city = Structs::CityStruct.new(@city_count, row['city'], country
+      if validate(:city, name: row['city'], country_id: country&.id).success?
+        new_city = Structs::CityStruct.new(@city_count, row['city'], country&.id)
         @new_records[:city][row['city'].downcase] = new_city
         @city_count += 1
         return new_city
@@ -117,9 +117,9 @@ module GeolocationService::Services
     end
 
     def load_new_records
-      @ip_count = 0
-      @city_count = 0
-      @country_count = 0
+      @ip_count = Ip.count == 0 ? 0 : Ip.last.id + 1
+      @city_count = City.count == 0 ? 0 : City.last.id + 1
+      @country_count = Country.count == 0 ? 0 : Country.last.id + 1
       @invalid_records = 0
       @new_records = {ip: {}, location: [], city: {}, country: {}}
     end
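The change to `load_new_records` seeds each counter from the last existing id, presumably so that a later import does not reuse primary keys that are already taken. A minimal sketch of that rule in plain Ruby, where `next_free_id` is a hypothetical helper and not part of the gem:

```ruby
# Returns the first id to assign, given the ids already stored in a table.
# An empty table starts at 0, matching the counters in the diff above.
def next_free_id(existing_ids)
  existing_ids.empty? ? 0 : existing_ids.max + 1
end
```

The sketch takes the maximum id explicitly, whereas the diff relies on `Ip.last` returning the record with the highest id.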
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: geolocation_service
 version: !ruby/object:Gem::Version
-  version: 0.1.
+  version: 0.1.2
 platform: ruby
 authors:
 - Jalerson Lima
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2019-09-
+date: 2019-09-22 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rails