estratto 1.0.0 → 1.0.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: f4425ecb691c561000266f1717d1bcab1aa9c198
4
- data.tar.gz: f3526413190af262cf089112ae34682f1465894b
3
+ metadata.gz: 768680440af76aff2f08d4a72dad782fbfa17c70
4
+ data.tar.gz: 68974ef5b81a15b4fd57814f21025b93a5cd2ac8
5
5
  SHA512:
6
- metadata.gz: fd3154c347e94acd858b7a92e88817d00f3e55618ac8191786a5b5606212fb519f9180e071dcffec9e776d15c8bb88dedbe709a3fcedd514098d74ae96718ce6
7
- data.tar.gz: be0cdd97b6a300d68bd30c6b4090c20c2274c2b4a160ca7d925c49c32a30704c144cf6de82e88aa8cd0f2a068cc22b83c71f4ab84e5282daffd11728f5f903c7
6
+ metadata.gz: 0bff7f6b4c314824bd827e8a61a8459f2b9af8e2c8511fb319b244beb5d7d1034102293fda7843a71ed2c5a990885be763ae3507d79b2264d52d3d0d3eca4597
7
+ data.tar.gz: 31f0f3e48656b4359972033f5b188525dc34bacf78128f9d73a4ecbe990c9c6925cbbba8535d9e80fb4373eb248f6c32a28cf913e8bdf654ddfaaf2535516386
data/Gemfile.lock CHANGED
@@ -1,11 +1,13 @@
1
1
  PATH
2
2
  remote: .
3
3
  specs:
4
- estratto (1.0.0)
4
+ estratto (1.0.1)
5
+ charlock_holmes
5
6
 
6
7
  GEM
7
8
  remote: https://rubygems.org/
8
9
  specs:
10
+ charlock_holmes (0.7.6)
9
11
  coderay (1.1.2)
10
12
  coveralls (0.8.22)
11
13
  json (>= 1.8, < 3)
data/README.md CHANGED
@@ -1,10 +1,18 @@
1
1
  # Estratto
2
2
 
3
+ [![Gem Version](https://badge.fury.io/rb/estratto.svg)](https://badge.fury.io/rb/estratto)
3
4
  [![Build Status](https://travis-ci.com/Rynaro/estratto.svg?branch=master)](https://travis-ci.com/Rynaro/estratto)
4
5
  [![Coverage Status](https://coveralls.io/repos/github/Rynaro/estratto/badge.svg?branch=master)](https://coveralls.io/github/Rynaro/estratto?branch=master)
5
6
  [![Maintainability](https://api.codeclimate.com/v1/badges/46532b90e850401fce72/maintainability)](https://codeclimate.com/github/Rynaro/estratto/maintainability)
6
7
 
7
- ### TODO: Write a super README
8
+ [![Waffle.io - Columns and their card count](https://badge.waffle.io/Rynaro/estratto.svg?columns=all)](https://waffle.io/Rynaro/estratto)
9
+
10
+ > Estratto is an easy-to-use parser driven by YAML templates. It gives developers and non-developers a simple interface for extracting data from fixed-width files.
11
+
12
+ ## Motivation
13
+
14
+ In many scenarios, data processing is a crucial step in integrating with partner systems or storing data. But writing code to parse and import these text formats is tedious, and it leads to duplicated code across projects.
15
+ This project was created to help developers reduce the time spent on this task, or to let them delegate it entirely to another team.
8
16
 
9
17
  ## Installation
10
18
 
@@ -24,7 +32,245 @@ Or install it yourself as:
24
32
 
25
33
  ## Usage
26
34
 
27
- TODO: Comming soon
35
+ **Estratto** takes two inputs: the _data file to parse_ and the _YAML layout that describes it_.
36
+
37
+ Example of a default call for parsing:
38
+
39
+ ```ruby
40
+ Estratto::Document.process(file: 'path/to/data.txt', layout: 'path/to/layout.yml')
41
+ ```
42
+
43
+ ### Layout specifications
44
+
45
+ Fixed-width files are sometimes ~always~ painful for humans to read, and the layout manual comes in a very useful PDF or spreadsheet format.
46
+
47
+ Here, we'll try to make things fun again, or at least less painful. :joy:
48
+
49
+ The base layout of the YAML file is:
50
+
51
+ ```yaml
52
+ layout:
53
+ name: 'jojo stand users'
54
+ multi-register: true
55
+ prefix: 0..1
56
+ registers:
57
+ - register: '01'
58
+ fields:
59
+ - name: name
60
+ range: 2..45
61
+ type: String
62
+ - name: stand
63
+ range: 46..75
64
+ type: String
65
+ ```
66
+
67
+ The output is an array of hashes, keyed by your field names:
68
+
69
+ ```ruby
70
+ [
71
+ {
72
+ name: 'Jotaro Kujo',
73
+ stand: 'Star Platinum'
74
+ },
75
+ {
76
+ name: 'Giorno Giovanna',
77
+ stand: 'Golden Experience Requiem'
78
+ },
79
+ {
80
+ name: 'Jobin Higashikata',
81
+ stand: 'Speed King'
82
+ }
83
+ ]
84
+ ```
85
+
86
+ The structure must follow this outline:
87
+ ```yaml
88
+ layout:
89
+ (base configuration)
90
+ registers:
91
+ (layouts)
92
+ ```
93
+
94
+ Currently **Estratto** supports these types of fixed-width layouts:
95
+
96
+ - Batch prefix based registers
97
+ - Mono layout based registers _(development)_
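As a rough illustration of how a prefix-based layout like the one above maps onto the lines of a file, here is a minimal Ruby sketch. It is not Estratto's implementation: `parse_line`, the `layout` hash, and the sample lines are hypothetical, and the sketch strips every field for brevity.

```ruby
# Hypothetical stand-in for the YAML layout; not Estratto's internal format.
layout = {
  prefix: 0..1,
  registers: {
    '01' => [
      { name: :name,  range: 2..45 },
      { name: :stand, range: 46..75 }
    ]
  }
}

# Slice one fixed-width line into a hash, or return nil when the
# line's prefix matches no register in the layout.
def parse_line(line, layout)
  fields = layout[:registers][line[layout[:prefix]]]
  return nil if fields.nil?

  fields.each_with_object({}) do |field, row|
    # Ranges are zero-based and inclusive; to_s guards against short lines.
    row[field[:name]] = line[field[:range]].to_s.strip
  end
end

# Sample lines padded to the widths the layout expects.
lines = [
  '01' + 'Jotaro Kujo'.ljust(44) + 'Star Platinum'.ljust(30),
  '99' + 'line with an unknown prefix',
  '01' + 'Giorno Giovanna'.ljust(44) + 'Golden Experience Requiem'.ljust(30)
]

# Lines whose prefix is not in the layout are skipped.
rows = lines.map { |line| parse_line(line, layout) }.compact
```

Here `rows` holds two hashes, one per `'01'` line; the `'99'` line is dropped because the layout has no register for that prefix.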
98
+
99
+
100
+ ### UTF-8 Conversion
101
+
102
+ Estratto makes use of [CharlockHolmes](https://github.com/brianmario/charlock_holmes) gem to detect the file content encoding and convert it to UTF-8.
103
+ This approach prevents invalid characters from being present in the output.
104
+
105
+ ### Type Coercion
106
+
107
+ **Estratto** supports type coercion in the layout file, with some extras called _formats_.
108
+
109
+ Data types supported by **Estratto**:
110
+
111
+ - String
112
+ - Integer
113
+ - Float
114
+ - DateTime
115
+ - Date
116
+
117
+ The default data type for a field is `String` when no type is set in the register's field list.
118
+
119
+ A register's field list always follows this base structure:
120
+
121
+ ```yaml
122
+ fields:
123
+ - name: name
124
+ range: 2..12
125
+ type: String
126
+ formats:
127
+ strip: true
128
+ ```
129
+
130
+ `name` identifies the field; this value becomes the symbol key in the parsed hash.
131
+
132
+ `range` is the position of the data within the line. (The first index is 0.)
133
+
134
+ `type` is the data type the value is coerced to.
135
+
136
+ `formats` receives configuration specific to the data type. Here we can format Strings and adjust the precision of unformatted Float data.
137
+
138
+ ### Formats
139
+
140
+ Formats are the tool for dealing with the "surprises" this type of file can throw at us: very long string fields padded with blank space, DateTime values with suspicious formatting, or Float values without any decimal point even though the manual describes them as _"Decimal(15, 2)"_.
141
+
142
+ #### String
143
+
144
+ ##### strip
145
+
146
+ Works like Ruby's `String#strip` method.
147
+
148
+ ```yaml
149
+ strip: true
150
+ ```
151
+
152
+ Output example:
153
+
154
+ ```ruby
155
+ #raw_data
156
+ 'Hierophant Green '
157
+ # with strip clause
158
+ 'Hierophant Green'
159
+ ```
160
+
161
+ #### Integer
162
+
163
+ A simple integer converter. Useful when you need to deal with ids.
164
+
165
+ Currently we don't have any formats for Integer. :)
166
+
167
+ ```ruby
168
+ #raw_data
169
+ '000123'
170
+ # coerced
171
+ 123
172
+ #raw_data
173
+ '123'
174
+ # coerced
175
+ 123
176
+ #raw_data
177
+ 'a'
178
+ # coerced
179
+ 0
180
+ ```
181
+
182
+ #### Float
183
+
184
+ Float is one of the most important types here. Fixed-width files always seem to deliver numbers in some _non-logical_ format.
185
+
186
+ ##### precision
187
+
188
+ ```yaml
189
+ precision: <integer>
190
+ ```
191
+
192
+ Examples:
193
+
194
+ ```yaml
195
+ precision: 2
196
+ ```
197
+
198
+ ```ruby
199
+ #raw data
200
+ '12345'
201
+ # with precision
202
+ 123.45
203
+ ```
204
+
205
+ ```yaml
206
+ precision: 3
207
+ ```
208
+
209
+ ```ruby
210
+ #raw data
211
+ '12345'
212
+ # with precision
213
+ 12.345
214
+ ```
215
+
216
+ ##### comma_format
217
+
218
+ ```yaml
219
+ comma_format: <boolean>
220
+ ```
221
+
222
+ Examples:
223
+
224
+ ```yaml
225
+ comma_format: true
226
+ ```
227
+
228
+ ```ruby
229
+ #raw data
230
+ '123,45'
231
+ # with comma formats
232
+ 123.45
233
+ ```
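The two Float formats above can be sketched in plain Ruby. This is only an illustration of the described behavior, not Estratto's coercer: `coerce_float` and its keyword arguments are hypothetical names.

```ruby
# Hypothetical helper: turn a raw fixed-width field into a Float
# using the precision and comma_format options described above.
def coerce_float(raw, precision: nil, comma_format: false)
  text = raw.strip
  # A decimal comma becomes a decimal point before conversion.
  text = text.tr(',', '.') if comma_format
  value = text.to_f
  # With a precision, the raw digits carry an implied decimal point.
  precision ? value / (10**precision) : value
end

coerce_float('12345', precision: 2)        # => 123.45
coerce_float('12345', precision: 3)        # => 12.345
coerce_float('123,45', comma_format: true) # => 123.45
```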
234
+
235
+
236
+ #### DateTime and Date
237
+
238
+ `DateTime` and `Date` share the same format attributes. The difference is that one produces a `DateTime` value and the other always produces a `Date`.
239
+
240
+
241
+ ##### format
242
+
243
+ ```yaml
244
+ format: <ruby strptime format pattern>
245
+ ```
246
+
247
+ Examples
248
+
249
+ ```yaml
250
+ format: '%Y%m%d'
251
+ ```
252
+
253
+ ```ruby
254
+ #raw data
255
+ '20180101'
256
+ # with format
257
+ #<DateTime: 2018-01-01T00:00:00+00:00 ...>
258
+ ```
259
+
260
+ ```yaml
261
+ format: '%d/%m/%Y'
262
+ ```
263
+
264
+ ```ruby
265
+ #raw data
266
+ '01/01/2018'
267
+ # with format
268
+ #<DateTime: 2018-01-01T00:00:00+00:00 ...>
269
+ ```
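The `format` patterns above are standard Ruby `strptime` patterns, so their effect can be checked with the standard library alone. The snippet below is a small sketch using `Date.strptime` and `DateTime.strptime` directly; it does not go through Estratto.

```ruby
require 'date'

# The same raw values and patterns as in the examples above.
date      = Date.strptime('20180101', '%Y%m%d')
date_time = DateTime.strptime('01/01/2018', '%d/%m/%Y')

date.iso8601      # => "2018-01-01"
date_time.iso8601 # => "2018-01-01T00:00:00+00:00"
```

A raw value that does not match the pattern makes `strptime` raise an error, so the pattern in the layout has to match the file exactly.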
270
+
271
+ ## Tests
272
+
273
+ Simple `rake spec`
28
274
 
29
275
  ## Development
30
276
 
data/estratto.gemspec CHANGED
@@ -20,6 +20,8 @@ Gem::Specification.new do |spec|
20
20
  spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
21
21
  spec.require_paths = ["lib"]
22
22
 
23
+ spec.add_dependency "charlock_holmes"
24
+
23
25
  spec.add_development_dependency "bundler", "~> 1.17"
24
26
  spec.add_development_dependency "rake", "~> 10.0"
25
27
  spec.add_development_dependency "rspec", "~> 3.0"
@@ -0,0 +1,11 @@
+ require_relative 'encoder'
+
+ module Estratto
+   class Content
+     def self.for(file_path)
+       content = File.read(file_path)
+       encoded_content = Encoder.new(content).encode
+       encoded_content.split("\n")
+     end
+   end
+ end
@@ -0,0 +1,21 @@
+ require 'charlock_holmes'
+
+ module Estratto
+   class Encoder
+     attr_reader :content
+
+     def initialize(content)
+       @content = content
+     end
+
+     def encode
+       CharlockHolmes::Converter.convert(content, encoding, 'UTF-8')
+     end
+
+     private
+
+     def encoding
+       CharlockHolmes::EncodingDetector.detect(content)[:encoding]
+     end
+   end
+ end
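To make the read, convert, split flow of the two new files concrete, here is a minimal sketch that runs with the standard library only. It swaps CharlockHolmes detection for a fixed, assumed source encoding (ISO-8859-1) and uses `String#encode` for the conversion, so it illustrates the idea rather than reproducing what `Encoder` does; the sample bytes are made up.

```ruby
# Bytes as they might arrive from a legacy system: "João" in ISO-8859-1.
raw = "01Jo\xE3o\n01Ana\n".b

# Read as UTF-8, these bytes are invalid, which is why conversion is needed.
valid_before = raw.dup.force_encoding('UTF-8').valid_encoding?

# Assumption: the source encoding is known up front. Estratto instead
# asks CharlockHolmes to detect it before converting.
utf8 = raw.encode('UTF-8', 'ISO-8859-1')

# Same line splitting as Content.for in the diff above.
lines = utf8.split("\n")
```

After the conversion, `utf8` is valid UTF-8 and `lines` holds one string per record, ready to be sliced by the layout ranges.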
@@ -1,4 +1,5 @@
1
1
  require_relative 'register'
2
+ require_relative 'content'
2
3
 
3
4
  module Estratto
4
5
  class Parser
@@ -10,16 +11,15 @@ module Estratto
10
11
  end
11
12
 
12
13
  def perform
13
- @data ||= raw_data.map do |line|
14
+ @data ||= raw_content.map do |line|
14
15
  register_layout = layout.register_fields_for(line[layout.prefix_range])
15
16
  next if register_layout.nil?
16
17
  Register.new(line, register_layout).refine
17
18
  end.compact
18
19
  end
19
20
 
20
- def raw_data
21
- @raw_data ||= File.open(file_path, 'r')
21
+ def raw_content
22
+ @raw_data = Content.for(file_path)
22
23
  end
23
-
24
24
  end
25
25
  end
@@ -1,3 +1,3 @@
1
1
  module Estratto
2
- VERSION = "1.0.0"
2
+ VERSION = "1.0.1"
3
3
  end
metadata CHANGED
@@ -1,15 +1,29 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: estratto
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.0.0
4
+ version: 1.0.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - Henrique A. Lavezzo
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2019-01-08 00:00:00.000000000 Z
11
+ date: 2019-01-21 00:00:00.000000000 Z
12
12
  dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: charlock_holmes
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - ">="
18
+ - !ruby/object:Gem::Version
19
+ version: '0'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - ">="
25
+ - !ruby/object:Gem::Version
26
+ version: '0'
13
27
  - !ruby/object:Gem::Dependency
14
28
  name: bundler
15
29
  requirement: !ruby/object:Gem::Requirement
@@ -85,6 +99,7 @@ files:
85
99
  - bin/setup
86
100
  - estratto.gemspec
87
101
  - lib/estratto.rb
102
+ - lib/estratto/content.rb
88
103
  - lib/estratto/data/base.rb
89
104
  - lib/estratto/data/coercer.rb
90
105
  - lib/estratto/data/date.rb
@@ -93,6 +108,7 @@ files:
93
108
  - lib/estratto/data/integer.rb
94
109
  - lib/estratto/data/string.rb
95
110
  - lib/estratto/document.rb
111
+ - lib/estratto/encoder.rb
96
112
  - lib/estratto/helpers/range.rb
97
113
  - lib/estratto/layout/base.rb
98
114
  - lib/estratto/layout/factory.rb