data_miner 2.0.1 → 2.0.2
- data/.gitignore +5 -7
- data/CHANGELOG +13 -0
- data/LICENSE +1 -1
- data/README.markdown +112 -0
- data/data_miner.gemspec +2 -2
- data/lib/data_miner.rb +26 -12
- data/lib/data_miner/active_record_class_methods.rb +108 -0
- data/lib/data_miner/attribute.rb +150 -76
- data/lib/data_miner/dictionary.rb +40 -18
- data/lib/data_miner/run.rb +35 -0
- data/lib/data_miner/script.rb +123 -2
- data/lib/data_miner/step.rb +11 -3
- data/lib/data_miner/step/import.rb +100 -64
- data/lib/data_miner/step/process.rb +46 -28
- data/lib/data_miner/step/tap.rb +156 -123
- data/lib/data_miner/version.rb +1 -1
- data/test/test_safety.rb +61 -25
- metadata +8 -6
- data/README.rdoc +0 -289
- data/lib/data_miner/active_record_extensions.rb +0 -38
data/.gitignore
CHANGED
data/CHANGELOG
CHANGED
@@ -1,3 +1,16 @@
+2.0.2 / 2012-05-04
+
+* Breaking changes
+
+  * Import descriptions are no longer optional
+  * Import options are no longer optional (but then, they never were)
+
+* Enhancements
+
+  * Real documentation!
+  * Replace class-level mutexes with simple Thread.exclusive calls
+  * Simplified DataMiner::Dictionary
+
 2.0.1 / 2012-04-18
 
 * Enhancements
|
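The changelog entry about replacing class-level mutexes refers to a check-then-lock style of lazy initialization. The following is a minimal sketch of that pattern, not code from the gem: `Registry` and `LOCK` are hypothetical names, and a plain `Mutex` stands in for `Thread.exclusive`, which is not available in recent Ruby versions.

```ruby
require 'set'

# Hypothetical stand-in for a class that lazily builds shared state.
class Registry
  LOCK = Mutex.new

  def model_names
    # Fast path: skip the lock once the set already exists.
    @model_names || LOCK.synchronize do
      # Re-check inside the lock so only one thread creates the set.
      @model_names ||= Set.new
    end
  end
end

registry = Registry.new
# Several threads asking at once should all receive the same Set object.
sets = 8.times.map { Thread.new { registry.model_names } }.map(&:value)
puts sets.map(&:object_id).uniq.size
```

The unguarded read on the left of `||` keeps the common case cheap; the `||=` inside the lock is what prevents two threads from each building their own set.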
data/LICENSE
CHANGED
data/README.markdown
ADDED
@@ -0,0 +1,112 @@
|
|
1
|
+
# data_miner
|
2
|
+
|
3
|
+
Download, pull out of a ZIP/TAR/GZ/BZ2 archive, parse, correct, and import XLS, ODS, XML, CSV, HTML, etc. into your ActiveRecord models.
|
4
|
+
|
5
|
+
Tested in MRI 1.8.7+, MRI 1.9.2+, and JRuby 1.6.7+. Thread safe.
|
6
|
+
|
7
|
+
## Real-world usage
|
8
|
+
|
9
|
+
<p><a href="http://brighterplanet.com"><img src="https://s3.amazonaws.com/static.brighterplanet.com/assets/logos/flush-left/inline/green/rasterized/brighter_planet-160-transparent.png" alt="Brighter Planet logo"/></a></p>
|
10
|
+
|
11
|
+
We use `data_miner` for [data science at Brighter Planet](http://brighterplanet.com/research) and in production at
|
12
|
+
|
13
|
+
* [Brighter Planet's reference data web service](http://data.brighterplanet.com)
|
14
|
+
* [Brighter Planet's impact estimate web service](http://impact.brighterplanet.com)
|
15
|
+
|
16
|
+
The killer combination for us is:
|
17
|
+
|
18
|
+
1. [`active_record_inline_schema`](https://github.com/seamusabshere/active_record_inline_schema) - define table structure
|
19
|
+
2. [`remote_table`](https://github.com/seamusabshere/remote_table) - download data and parse it
|
20
|
+
3. [`errata`](https://github.com/seamusabshere/errata) - apply corrections in a transparent way
|
21
|
+
4. [`data_miner`](https://github.com/seamusabshere/data_miner) (this library!) - import data idempotently
|
22
|
+
|
23
|
+
## Documentation
|
24
|
+
|
25
|
+
Check out the [extensive documentation](http://rdoc.info/github/seamusabshere/data_miner).
|
26
|
+
|
27
|
+
## Quick start
|
28
|
+
|
29
|
+
You define <code>data_miner</code> blocks in your ActiveRecord models. For example, in <code>app/models/country.rb</code>:
|
30
|
+
|
31
|
+
class Country < ActiveRecord::Base
|
32
|
+
self.primary_key = 'iso_3166_code'
|
33
|
+
|
34
|
+
data_miner do
|
35
|
+
import("OpenGeoCode.org's Country Codes to Country Names list",
|
36
|
+
:url => 'http://opengeocode.org/download/countrynames.txt',
|
37
|
+
:format => :delimited,
|
38
|
+
:delimiter => '; ',
|
39
|
+
:headers => false,
|
40
|
+
:skip => 22) do
|
41
|
+
key :iso_3166_code, :field_number => 0
|
42
|
+
store :iso_3166_alpha_3_code, :field_number => 1
|
43
|
+
store :iso_3166_numeric_code, :field_number => 2
|
44
|
+
store :name, :field_number => 5
|
45
|
+
end
|
46
|
+
end
|
47
|
+
end
|
48
|
+
|
49
|
+
Now you can run:
|
50
|
+
|
51
|
+
>> Country.run_data_miner!
|
52
|
+
=> nil
|
53
|
+
|
54
|
+
## More advanced usage
|
55
|
+
|
56
|
+
The [`earth` library](https://github.com/brighterplanet/earth) has dozens of real-life examples showing how to download, pull out of a ZIP/TAR/BZ2 archive, parse, correct, and import CSVs, fixed-width files, ODS, XLS, XLSX, even HTML and XML:
|
57
|
+
|
58
|
+
<table>
|
59
|
+
<tr>
|
60
|
+
<th>Model</th>
|
61
|
+
<th>Highlights</th>
|
62
|
+
<th>Reference</th>
|
63
|
+
</tr>
|
64
|
+
<tr>
|
65
|
+
<td><a href="http://data.brighterplanet.com/aircraft">Aircraft</a></td>
|
66
|
+
<td>parsing Microsoft Frontpage HTML (!)</td>
|
67
|
+
<td><a href="https://github.com/brighterplanet/earth/blob/master/lib/earth/air/aircraft/data_miner.rb">data_miner.rb</a></td>
|
68
|
+
</tr>
|
69
|
+
<tr>
|
70
|
+
<td><a href="http://data.brighterplanet.com/airports">Airports</a></td>
|
71
|
+
<td>forcing column names and use of <code>:select</code> block (<code>Proc</code>)</td>
|
72
|
+
<td><a href="https://github.com/brighterplanet/earth/blob/master/lib/earth/air/airport/data_miner.rb">data_miner.rb</a></td>
|
73
|
+
</tr>
|
74
|
+
<tr>
|
75
|
+
<td><a href="http://data.brighterplanet.com/automobile_make_model_year_variants">Automobile model variants</a></td>
|
76
|
+
<td>super advanced usage of "custom parser" and errata</td>
|
77
|
+
<td><a href="https://github.com/brighterplanet/earth/blob/master/lib/earth/automobile/automobile_make_model_year_variant/data_miner.rb">data_miner.rb</a></td>
|
78
|
+
</tr>
|
79
|
+
<tr>
|
80
|
+
<td><a href="http://data.brighterplanet.com/countries">Country</a></td>
|
81
|
+
<td>parsing CSV and a few other tricks</td>
|
82
|
+
<td><a href="https://github.com/brighterplanet/earth/blob/master/lib/earth/locality/country/data_miner.rb">data_miner.rb</a></td>
|
83
|
+
</tr>
|
84
|
+
<tr>
|
85
|
+
<td><a href="http://data.brighterplanet.com/egrid_regions">EGRID regions</a></td>
|
86
|
+
<td>parsing XLS</td>
|
87
|
+
<td><a href="https://github.com/brighterplanet/earth/blob/master/lib/earth/locality/egrid_region/data_miner.rb">data_miner.rb</a></td>
|
88
|
+
</tr>
|
89
|
+
<tr>
|
90
|
+
<td><a href="http://data.brighterplanet.com/flight_segments">Flight segment (stage)</a></td>
|
91
|
+
<td>super advanced usage of POSTing form data</td>
|
92
|
+
<td><a href="https://github.com/brighterplanet/earth/blob/master/lib/earth/air/flight_segment/data_miner.rb">data_miner.rb</a></td>
|
93
|
+
</tr>
|
94
|
+
<tr>
|
95
|
+
<td><a href="http://data.brighterplanet.com/zip_codes">Zip codes</a></td>
|
96
|
+
<td>downloading a ZIP file and pulling an XLSX out of it</td>
|
97
|
+
<td><a href="https://github.com/brighterplanet/earth/blob/master/lib/earth/locality/zip_code.rb">data_miner.rb</a></td>
|
98
|
+
</tr>
|
99
|
+
</table>
|
100
|
+
|
101
|
+
And many more - look for the `data_miner.rb` file that corresponds to each model. Note that you would normally put the `data_miner` declaration right inside the ActiveRecord model file... it's kept separate in `earth` so that loading it is optional.
|
102
|
+
|
103
|
+
## Authors
|
104
|
+
|
105
|
+
* Seamus Abshere <seamus@abshere.net>
|
106
|
+
* Andy Rossmeissl <andy@rossmeissl.net>
|
107
|
+
* Derek Kastner <dkastner@gmail.com>
|
108
|
+
* Ian Hough <ijhough@gmail.com>
|
109
|
+
|
110
|
+
## Copyright
|
111
|
+
|
112
|
+
Copyright (c) 2012 Brighter Planet. See LICENSE for details.
|
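To make the Quick start options in the README more concrete, here is a rough sketch of what `:skip`, `:delimiter`, `:headers => false`, and `:field_number` describe. It is plain Ruby and does not use `data_miner` or `remote_table`; `SAMPLE`, `FIELDS`, and `parse_rows` are hypothetical names, the sample rows are made up, and it assumes `:skip` means "ignore that many leading lines".

```ruby
# Hypothetical sample in the shape the import options describe:
# preamble lines to skip, no header row, fields separated by "; ".
SAMPLE = <<~TEXT
  # preamble line 1
  # preamble line 2
  AA; AAA; 001; x; y; Example Land
  BB; BBB; 002; x; y; Sample Republic
TEXT

# Zero-based field numbers mapped to column names, like key/store in the README.
FIELDS = {
  :iso_3166_code => 0,
  :iso_3166_alpha_3_code => 1,
  :iso_3166_numeric_code => 2,
  :name => 5
}

# Drop the preamble, split each remaining line, and pick fields by position.
def parse_rows(text, skip, delimiter)
  text.lines.drop(skip).map do |line|
    values = line.chomp.split(delimiter)
    FIELDS.each_with_object({}) { |(column, index), row| row[column] = values[index] }
  end
end

parse_rows(SAMPLE, 2, '; ').each { |row| puts row.inspect }
```

In the real library the download and parsing are handled by `remote_table`, and the resulting rows are written to the model's table; this sketch only shows how positional fields map to columns.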
data/data_miner.gemspec
CHANGED
@@ -7,8 +7,8 @@ Gem::Specification.new do |s|
   s.authors = ["Seamus Abshere", "Andy Rossmeissl", "Derek Kastner"]
   s.email = ["seamus@abshere.net"]
   s.homepage = "https://github.com/seamusabshere/data_miner"
-  s.summary = %{
-  s.description = %q{
+  s.summary = %{Download, pull out of a ZIP/TAR/GZ/BZ2 archive, parse, correct, and import XLS, ODS, XML, CSV, HTML, etc. into your ActiveRecord models.}
+  s.description = %q{Download, pull out of a ZIP/TAR/GZ/BZ2 archive, parse, correct, and import XLS, ODS, XML, CSV, HTML, etc. into your ActiveRecord models. You can also convert units.}
 
   s.rubyforge_project = "data_miner"
 
data/lib/data_miner.rb
CHANGED
@@ -14,7 +14,7 @@ if RUBY_VERSION >= '1.9'
   end
 end
 
-require 'data_miner/
+require 'data_miner/active_record_class_methods'
 require 'data_miner/attribute'
 require 'data_miner/script'
 require 'data_miner/dictionary'
@@ -24,14 +24,13 @@ require 'data_miner/step/tap'
 require 'data_miner/step/process'
 require 'data_miner/run'
 
+# A singleton class that holds global configuration for data mining.
+#
+# All of its instance methods are delegated to +DataMiner.instance+, so you can call +DataMiner.model_names+, for example.
+#
+# @see DataMiner::ActiveRecordClassMethods#data_miner Overview of how to define data miner scripts inside of ActiveRecord models.
 class DataMiner
   class << self
-    delegate :perform, :to => :instance
-    delegate :run, :to => :instance
-    delegate :logger, :to => :instance
-    delegate :logger=, :to => :instance
-    delegate :model_names, :to => :instance
-
     # @private
     def downcase(str)
       defined?(::UnicodeUtils) ? ::UnicodeUtils.downcase(str) : str.downcase
@@ -48,16 +47,20 @@ class DataMiner
     end
   end
 
-  MUTEX = ::Mutex.new
   INNER_SPACE = /[ ]+/
 
   include ::Singleton
 
   attr_writer :logger
 
+  # Run data miner scripts on models identified by their names. Defaults to all models.
+  #
+  # @param [optional, Array<String>] model_names Names of models to be run.
+  #
+  # @return [Array<DataMiner::Run>]
   def perform(model_names = DataMiner.model_names)
     Script.uniq do
-      model_names.
+      model_names.map do |model_name|
         model_name.constantize.run_data_miner!
       end
     end
@@ -66,8 +69,11 @@ class DataMiner
   # legacy
   alias :run :perform
 
+  # Where DataMiner logs to. Defaults to +Rails.logger+ or +ActiveRecord::Base.logger+ if either is available.
+  #
+  # @return [Logger]
   def logger
-    @logger ||
+    @logger || ::Thread.exclusive do
       @logger ||= if defined?(::Rails)
         ::Rails.logger
       elsif defined?(::ActiveRecord) and active_record_logger = ::ActiveRecord::Base.logger
@@ -79,12 +85,20 @@ class DataMiner
     end
   end
 
+  # Names of the models that have defined a data miner script.
+  #
+  # @note Models won't appear here until the files containing their data miner scripts have been +require+'d.
+  #
+  # @return [Set<String>]
   def model_names
-    @model_names ||
+    @model_names || ::Thread.exclusive do
       @model_names ||= ::Set.new
     end
   end
 
+  class << self
+    delegate(*DataMiner.instance_methods(false), :to => :instance)
+  end
 end
 
-::ActiveRecord::Base.extend ::DataMiner::
+::ActiveRecord::Base.extend ::DataMiner::ActiveRecordClassMethods
|
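The `data/lib/data_miner.rb` hunks above replace five hand-written `delegate` lines with a single call that forwards every instance method to the singleton. The sketch below shows the same idea with only the Ruby standard library. It assumes a hypothetical `Settings` class and swaps in `SingleForwardable#def_delegators` for the `delegate` helper used in the diff, so it is an illustration of the technique rather than the gem's implementation.

```ruby
require 'singleton'
require 'forwardable'
require 'set'

# Hypothetical class; the name and methods are illustrative only.
class Settings
  include Singleton
  extend SingleForwardable

  attr_writer :label

  def label
    @label ||= 'default'
  end

  def model_names
    @model_names ||= Set.new
  end

  # Forward every public instance method defined directly on this class
  # to the singleton instance, so Settings.model_names just works.
  def_delegators :instance, *instance_methods(false)
end

Settings.model_names.add 'Country'
Settings.label = 'custom'
puts Settings.instance.model_names.include?('Country')
puts Settings.instance.label
```

Because `instance_methods(false)` is evaluated when the class body runs, the delegation line has to come after the method definitions, which matches where the diff places its `class << self` block.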
data/lib/data_miner/active_record_class_methods.rb
ADDED
@@ -0,0 +1,108 @@
|
|
1
|
+
require 'active_record'
|
2
|
+
require 'lock_method'
|
3
|
+
|
4
|
+
class DataMiner
|
5
|
+
# Class methods that are mixed into models (i.e. ActiveRecord::Base)
|
6
|
+
module ActiveRecordClassMethods
|
7
|
+
# Access this model's script.
|
8
|
+
#
|
9
|
+
# @return [DataMiner::Script] This model's data miner script.
|
10
|
+
def data_miner_script
|
11
|
+
@data_miner_script || ::Thread.exclusive do
|
12
|
+
@data_miner_script ||= DataMiner::Script.new(self)
|
13
|
+
end
|
14
|
+
end
|
15
|
+
|
16
|
+
# Access to recordkeeping.
|
17
|
+
#
|
18
|
+
# @return [ActiveRecord::Relation] Records of running the data miner script.
|
19
|
+
def data_miner_runs
|
20
|
+
DataMiner::Run.scoped :conditions => { :model_name => name }
|
21
|
+
end
|
22
|
+
|
23
|
+
# Run this model's script.
|
24
|
+
#
|
25
|
+
# @return [DataMiner::Run]
|
26
|
+
def run_data_miner!
|
27
|
+
data_miner_script.perform
|
28
|
+
end
|
29
|
+
|
30
|
+
# Run the data miner scripts of parent associations. Useful for dependencies. Safe to call using +process+.
|
31
|
+
#
|
32
|
+
# @note Used extensively in https://github.com/brighterplanet/earth
|
33
|
+
#
|
34
|
+
# @example Since Provinces depend on Countries, make sure Countries are data mined first
|
35
|
+
# class Country < ActiveRecord::Base
|
36
|
+
# [...some data miner script...]
|
37
|
+
# end
|
38
|
+
# class Province < ActiveRecord::Base
|
39
|
+
# belongs_to :country
|
40
|
+
# data_miner do
|
41
|
+
# [...]
|
42
|
+
# process "make sure my dependencies have been loaded" do
|
43
|
+
# run_data_miner_on_parent_associations!
|
44
|
+
# end
|
45
|
+
# [...]
|
46
|
+
# end
|
47
|
+
# end
|
48
|
+
#
|
49
|
+
# @return [Array<DataMiner::Run>]
|
50
|
+
def run_data_miner_on_parent_associations!
|
51
|
+
reflect_on_all_associations(:belongs_to).reject do |assoc|
|
52
|
+
assoc.options[:polymorphic]
|
53
|
+
end.map do |non_polymorphic_belongs_to_assoc|
|
54
|
+
non_polymorphic_belongs_to_assoc.klass.run_data_miner!
|
55
|
+
end
|
56
|
+
end
|
57
|
+
|
58
|
+
# Define a data miner script.
|
59
|
+
#
|
60
|
+
# @param [optional, Hash] options
|
61
|
+
# @option options [TrueClass, FalseClass] :append (false) Add steps to existing data miner script instead of starting from scratch.
|
62
|
+
#
|
63
|
+
# @yield [] The block defining the steps.
|
64
|
+
#
|
65
|
+
# @see DataMiner::Script#import
|
66
|
+
# @see DataMiner::Script#process
|
67
|
+
# @see DataMiner::Script#tap
|
68
|
+
#
|
69
|
+
# @example Creating steps
|
70
|
+
# class MyModel < ActiveRecord::Base
|
71
|
+
# data_miner do
|
72
|
+
# process [...]
|
73
|
+
# import [...]
|
74
|
+
# import [...yes, it's ok to have more than one import step...]
|
75
|
+
# process [...]
|
76
|
+
# [...etc...]
|
77
|
+
# end
|
78
|
+
# end
|
79
|
+
#
|
80
|
+
# @example From the README
|
81
|
+
# class Country < ActiveRecord::Base
|
82
|
+
# self.primary_key = 'iso_3166_code'
|
83
|
+
# data_miner do
|
84
|
+
# import("OpenGeoCode.org's Country Codes to Country Names list",
|
85
|
+
# :url => 'http://opengeocode.org/download/countrynames.txt',
|
86
|
+
# :format => :delimited,
|
87
|
+
# :delimiter => '; ',
|
88
|
+
# :headers => false,
|
89
|
+
# :skip => 22) do
|
90
|
+
# key :iso_3166_code, :field_number => 0
|
91
|
+
# store :iso_3166_alpha_3_code, :field_number => 1
|
92
|
+
# store :iso_3166_numeric_code, :field_number => 2
|
93
|
+
# store :name, :field_number => 5
|
94
|
+
# end
|
95
|
+
# end
|
96
|
+
# end
|
97
|
+
#
|
98
|
+
# @return [nil]
|
99
|
+
def data_miner(options = {}, &blk)
|
100
|
+
DataMiner.model_names.add name
|
101
|
+
unless options[:append]
|
102
|
+
@data_miner_script = nil
|
103
|
+
end
|
104
|
+
data_miner_script.append_block blk
|
105
|
+
nil
|
106
|
+
end
|
107
|
+
end
|
108
|
+
end
|
data/lib/data_miner/attribute.rb
CHANGED
@@ -1,8 +1,14 @@
|
|
1
1
|
require 'conversions'
|
2
2
|
|
3
3
|
class DataMiner
|
4
|
+
# A mapping between a local model column and a remote data source column.
|
5
|
+
#
|
6
|
+
# @see DataMiner::ActiveRecordClassMethods#data_miner Overview of how to define data miner scripts inside of ActiveRecord models.
|
7
|
+
# @see DataMiner::Step::Import#store
|
8
|
+
# @see DataMiner::Step::Import#key
|
4
9
|
class Attribute
|
5
10
|
class << self
|
11
|
+
# @private
|
6
12
|
def check_options(options)
|
7
13
|
errors = []
|
8
14
|
if options[:dictionary].is_a?(Dictionary)
|
@@ -18,26 +24,26 @@ class DataMiner
|
|
18
24
|
end
|
19
25
|
end
|
20
26
|
|
21
|
-
VALID_OPTIONS =
|
22
|
-
from_units
|
23
|
-
to_units
|
24
|
-
static
|
25
|
-
dictionary
|
26
|
-
matcher
|
27
|
-
field_name
|
28
|
-
delimiter
|
29
|
-
split
|
30
|
-
units
|
31
|
-
sprintf
|
32
|
-
nullify
|
33
|
-
overwrite
|
34
|
-
upcase
|
35
|
-
units_field_name
|
36
|
-
units_field_number
|
37
|
-
field_number
|
38
|
-
chars
|
39
|
-
synthesize
|
40
|
-
|
27
|
+
VALID_OPTIONS = [
|
28
|
+
:from_units,
|
29
|
+
:to_units,
|
30
|
+
:static,
|
31
|
+
:dictionary,
|
32
|
+
:matcher,
|
33
|
+
:field_name,
|
34
|
+
:delimiter,
|
35
|
+
:split,
|
36
|
+
:units,
|
37
|
+
:sprintf,
|
38
|
+
:nullify,
|
39
|
+
:overwrite,
|
40
|
+
:upcase,
|
41
|
+
:units_field_name,
|
42
|
+
:units_field_number,
|
43
|
+
:field_number,
|
44
|
+
:chars,
|
45
|
+
:synthesize,
|
46
|
+
]
|
41
47
|
|
42
48
|
VALID_UNIT_DEFINITION_SETS = [
|
43
49
|
[:units],
|
@@ -48,30 +54,102 @@ class DataMiner
|
|
48
54
|
[:units_field_number, :to_units],
|
49
55
|
]
|
50
56
|
|
51
|
-
|
52
|
-
|
57
|
+
DEFAULT_SPLIT_PATTERN = /\s+/
|
58
|
+
DEFAULT_SPLIT_KEEP = 0
|
53
59
|
DEFAULT_DELIMITER = ', '
|
54
60
|
DEFAULT_NULLIFY = false
|
55
61
|
DEFAULT_UPCASE = false
|
56
62
|
DEFAULT_OVERWRITE = true
|
57
63
|
|
64
|
+
# @private
|
58
65
|
attr_reader :step
|
66
|
+
|
67
|
+
# Local column name.
|
68
|
+
# @return [Symbol]
|
59
69
|
attr_reader :name
|
70
|
+
|
71
|
+
# Synthesize a value by passing a proc that will receive +row+ and should return a final value.
|
72
|
+
#
|
73
|
+
# +row+ will be a +Hash+ with string keys or (less often) an +Array+
|
74
|
+
#
|
75
|
+
# @return [Proc]
|
60
76
|
attr_reader :synthesize
|
77
|
+
|
78
|
+
# An object that will be sent +#match(row)+ and should return a final value.
|
79
|
+
#
|
80
|
+
# Can be specified as a String, which will be constantized into a class; an object of that class is then instantiated with no arguments.
|
81
|
+
#
|
82
|
+
# +row+ will be a +Hash+ with string keys or (less often) an +Array+
|
83
|
+
# @return [Object]
|
61
84
|
attr_reader :matcher
|
85
|
+
|
86
|
+
# Index of where to find the data in the row, starting from zero.
|
87
|
+
#
|
88
|
+
# If you pass a +Range+, then multiple fields will be joined together.
|
89
|
+
#
|
90
|
+
# @return [Integer, Range]
|
62
91
|
attr_reader :field_number
|
92
|
+
|
93
|
+
# Where to find the data in the row.
|
94
|
+
# @return [Symbol]
|
63
95
|
attr_reader :field_name
|
64
|
-
|
96
|
+
|
97
|
+
# A delimiter to be used when joining fields together into a single final value. Used when +:field_number+ is a +Range+. Defaults to DEFAULT_DELIMITER.
|
98
|
+
# @return [String]
|
65
99
|
attr_reader :delimiter
|
100
|
+
|
101
|
+
# Which characters in a field to keep. Zero-based.
|
102
|
+
# @return [Range]
|
66
103
|
attr_reader :chars
|
104
|
+
|
105
|
+
# How to split a field. You specify two options:
|
106
|
+
#
|
107
|
+
# +:pattern+: what to split on. Defaults to DEFAULT_SPLIT_PATTERN.
|
108
|
+
# +:keep+: which of the elements resulting from the split to keep. Defaults to DEFAULT_SPLIT_KEEP.
|
109
|
+
#
|
110
|
+
# @return [Hash]
|
67
111
|
attr_reader :split
|
112
|
+
|
113
|
+
# Final units. May invoke a conversion using https://github.com/seamusabshere/conversions
|
114
|
+
#
|
115
|
+
# If a local column named +[name]_units+ exists, it will be populated with this value.
|
116
|
+
#
|
117
|
+
# @return [Symbol]
|
68
118
|
attr_reader :to_units
|
119
|
+
|
120
|
+
# Initial units. May invoke a conversion using https://github.com/seamusabshere/conversions
|
121
|
+
# @return [Symbol]
|
69
122
|
attr_reader :from_units
|
123
|
+
|
124
|
+
# If every row specifies its own units, index of where to find the units. Zero-based.
|
125
|
+
# @return [Integer]
|
70
126
|
attr_reader :units_field_number
|
127
|
+
|
128
|
+
# If every row specifies its own units, where to find the units.
|
129
|
+
# @return [Symbol]
|
71
130
|
attr_reader :units_field_name
|
131
|
+
|
132
|
+
# A +sprintf+-style format to apply.
|
133
|
+
# @return [String]
|
72
134
|
attr_reader :sprintf
|
135
|
+
|
136
|
+
# A static value to be used.
|
137
|
+
# @return [String,Numeric,TrueClass,FalseClass,Object]
|
73
138
|
attr_reader :static
|
74
139
|
|
140
|
+
# Whether to nullify the value in a local column if it was not previously null. Defaults to DEFAULT_NULLIFY.
|
141
|
+
# @return [TrueClass,FalseClass]
|
142
|
+
attr_reader :nullify
|
143
|
+
|
144
|
+
# Whether to upcase value. Defaults to DEFAULT_UPCASE.
|
145
|
+
# @return [TrueClass,FalseClass]
|
146
|
+
attr_reader :upcase
|
147
|
+
|
148
|
+
# Whether to overwrite the value in a local column if it is not null. Defaults to DEFAULT_OVERWRITE.
|
149
|
+
# @return [TrueClass,FalseClass]
|
150
|
+
attr_reader :overwrite
|
151
|
+
|
152
|
+
# @private
|
75
153
|
def initialize(step, name, options = {})
|
76
154
|
options = options.symbolize_keys
|
77
155
|
if (errors = Attribute.check_options(options)).any?
|
@@ -81,7 +159,7 @@ class DataMiner
|
|
81
159
|
@name = name
|
82
160
|
@synthesize = options[:synthesize]
|
83
161
|
if @dictionary_boolean = options.has_key?(:dictionary)
|
84
|
-
@
|
162
|
+
@dictionary_settings = options[:dictionary]
|
85
163
|
end
|
86
164
|
@matcher = options[:matcher].is_a?(::String) ? options[:matcher].constantize.new : options[:matcher]
|
87
165
|
if @static_boolean = options.has_key?(:static)
|
@@ -94,52 +172,42 @@ class DataMiner
|
|
94
172
|
if split = options[:split]
|
95
173
|
@split = split.symbolize_keys
|
96
174
|
end
|
97
|
-
@
|
98
|
-
@
|
175
|
+
@nullify = options.fetch :nullify, DEFAULT_NULLIFY
|
176
|
+
@upcase = options.fetch :upcase, DEFAULT_UPCASE
|
99
177
|
@from_units = options[:from_units]
|
100
178
|
@to_units = options[:to_units] || options[:units]
|
101
179
|
@sprintf = options[:sprintf]
|
102
|
-
@
|
180
|
+
@overwrite = options.fetch :overwrite, DEFAULT_OVERWRITE
|
103
181
|
@units_field_name = options[:units_field_name]
|
104
182
|
@units_field_number = options[:units_field_number]
|
105
183
|
@dictionary_mutex = ::Mutex.new
|
106
184
|
end
|
107
185
|
|
108
|
-
|
109
|
-
|
110
|
-
|
111
|
-
|
112
|
-
|
113
|
-
|
114
|
-
|
115
|
-
|
116
|
-
|
117
|
-
@nullify_boolean
|
118
|
-
end
|
119
|
-
|
120
|
-
def upcase?
|
121
|
-
@upcase_boolean
|
122
|
-
end
|
123
|
-
|
124
|
-
def dictionary?
|
125
|
-
@dictionary_boolean
|
126
|
-
end
|
127
|
-
|
128
|
-
def convert?
|
129
|
-
from_units.present? or units_field_name.present? or units_field_number.present?
|
186
|
+
# Dictionary for translating.
|
187
|
+
#
|
188
|
+
# You pass a +Hash+ of options which is used to initialize a +DataMiner::Dictionary+.
|
189
|
+
#
|
190
|
+
# @return [DataMiner::Dictionary]
|
191
|
+
def dictionary
|
192
|
+
@dictionary || @dictionary_mutex.synchronize do
|
193
|
+
@dictionary ||= Dictionary.new(@dictionary_settings)
|
194
|
+
end
|
130
195
|
end
|
131
196
|
|
132
|
-
|
133
|
-
|
197
|
+
# @private
|
198
|
+
def set_from_row(local_record, remote_row)
|
199
|
+
if overwrite or local_record.send(name).nil?
|
200
|
+
local_record.send "#{name}=", read(remote_row)
|
201
|
+
end
|
202
|
+
if units? and ((final_to_units = (to_units || read_units(remote_row))) or nullify)
|
203
|
+
local_record.send "#{name}_units=", final_to_units
|
204
|
+
end
|
134
205
|
end
|
135
206
|
|
136
|
-
|
137
|
-
@overwrite_boolean
|
138
|
-
end
|
139
|
-
|
207
|
+
# @private
|
140
208
|
def read(row)
|
141
|
-
if matcher and
|
142
|
-
return
|
209
|
+
if matcher and matcher_output = matcher.match(row)
|
210
|
+
return matcher_output
|
143
211
|
end
|
144
212
|
if synthesize
|
145
213
|
return synthesize.call(row)
|
@@ -168,15 +236,15 @@ class DataMiner
|
|
168
236
|
value = value[chars]
|
169
237
|
end
|
170
238
|
if split
|
171
|
-
pattern = split.fetch :pattern,
|
172
|
-
keep = split.fetch :keep,
|
239
|
+
pattern = split.fetch :pattern, DEFAULT_SPLIT_PATTERN
|
240
|
+
keep = split.fetch :keep, DEFAULT_SPLIT_KEEP
|
173
241
|
value = value.to_s.split(pattern)[keep].to_s
|
174
242
|
end
|
175
243
|
value = DataMiner.compress_whitespace value
|
176
|
-
if nullify
|
244
|
+
if nullify and value.blank?
|
177
245
|
return
|
178
246
|
end
|
179
|
-
if upcase
|
247
|
+
if upcase
|
180
248
|
value = DataMiner.upcase value
|
181
249
|
end
|
182
250
|
if convert?
|
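The hunk above shows the order in which `:chars`, `:split`, whitespace compression, `:nullify`, and `:upcase` are applied to a value. A standalone sketch of that order follows. `clean_value` is a hypothetical helper rather than a method of the gem, and the whitespace step is an assumption (collapse runs of spaces and trim), since `DataMiner.compress_whitespace` is not shown in this diff.

```ruby
DEFAULT_SPLIT_PATTERN = /\s+/
DEFAULT_SPLIT_KEEP = 0

# Hypothetical helper mirroring the order of operations visible in the hunk.
def clean_value(value, options = {})
  value = value.to_s
  # :chars keeps only a range of characters (zero-based).
  value = value[options[:chars]].to_s if options[:chars]
  if (split = options[:split])
    pattern = split.fetch(:pattern, DEFAULT_SPLIT_PATTERN)
    keep = split.fetch(:keep, DEFAULT_SPLIT_KEEP)
    value = value.split(pattern)[keep].to_s
  end
  # Assumption: whitespace compression collapses runs of spaces and trims.
  value = value.gsub(/[ ]+/, ' ').strip
  # :nullify turns a blank result into nil instead of an empty string.
  return nil if options[:nullify] && value.empty?
  value = value.upcase if options[:upcase]
  value
end

puts clean_value('  new   york ', :upcase => true)
puts clean_value('us-ca', :split => { :pattern => '-', :keep => 1 })
puts clean_value('   ', :nullify => true).inspect
```

Note that the diff changes the nullify check to `nullify and value.blank?`, so nullification only applies to values that end up blank after the earlier steps.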
@@ -201,27 +269,33 @@ class DataMiner
|
|
201
269
|
value
|
202
270
|
end
|
203
271
|
|
204
|
-
|
205
|
-
if overwrite? or target.send(name).nil?
|
206
|
-
target.send "#{name}=", read(row)
|
207
|
-
end
|
208
|
-
if units? and ((final_to_units = (to_units || read_units(row))) or nullify?)
|
209
|
-
target.send "#{name}_units=", final_to_units
|
210
|
-
end
|
211
|
-
end
|
212
|
-
|
213
|
-
def dictionary
|
214
|
-
@dictionary || @dictionary_mutex.synchronize do
|
215
|
-
@dictionary ||= Dictionary.new(@dictionary_options)
|
216
|
-
end
|
217
|
-
end
|
218
|
-
|
272
|
+
# @private
|
219
273
|
def refresh
|
220
274
|
@dictionary = nil
|
221
275
|
end
|
222
276
|
|
223
277
|
private
|
224
278
|
|
279
|
+
def model
|
280
|
+
step.model
|
281
|
+
end
|
282
|
+
|
283
|
+
def static?
|
284
|
+
@static_boolean
|
285
|
+
end
|
286
|
+
|
287
|
+
def dictionary?
|
288
|
+
@dictionary_boolean
|
289
|
+
end
|
290
|
+
|
291
|
+
def convert?
|
292
|
+
from_units.present? or units_field_name.present? or units_field_number.present?
|
293
|
+
end
|
294
|
+
|
295
|
+
def units?
|
296
|
+
to_units.present? or units_field_name.present? or units_field_number.present?
|
297
|
+
end
|
298
|
+
|
225
299
|
def read_units(row)
|
226
300
|
if units = row[units_field_name || units_field_number]
|
227
301
|
DataMiner.compress_whitespace(units).underscore.to_sym
|