data_miner 2.0.1 → 2.0.2
- data/.gitignore +5 -7
- data/CHANGELOG +13 -0
- data/LICENSE +1 -1
- data/README.markdown +112 -0
- data/data_miner.gemspec +2 -2
- data/lib/data_miner.rb +26 -12
- data/lib/data_miner/active_record_class_methods.rb +108 -0
- data/lib/data_miner/attribute.rb +150 -76
- data/lib/data_miner/dictionary.rb +40 -18
- data/lib/data_miner/run.rb +35 -0
- data/lib/data_miner/script.rb +123 -2
- data/lib/data_miner/step.rb +11 -3
- data/lib/data_miner/step/import.rb +100 -64
- data/lib/data_miner/step/process.rb +46 -28
- data/lib/data_miner/step/tap.rb +156 -123
- data/lib/data_miner/version.rb +1 -1
- data/test/test_safety.rb +61 -25
- metadata +8 -6
- data/README.rdoc +0 -289
- data/lib/data_miner/active_record_extensions.rb +0 -38
data/.gitignore
CHANGED
data/CHANGELOG
CHANGED
@@ -1,3 +1,16 @@
+2.0.2 / 2012-05-04
+
+* Breaking changes
+
+  * Import descriptions are no longer optional
+  * Import options are no longer optional (but then, they never were)
+
+* Enhancements
+
+  * Real documentation!
+  * Replace class-level mutexes with simple Thread.exclusive calls
+  * Simplified DataMiner::Dictionary
+
 2.0.1 / 2012-04-18
 
 * Enhancements
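The changelog entry about replacing class-level mutexes refers to a lazy-initialization pattern that recurs throughout this diff (`@value || lock { @value ||= ... }`). The following is a minimal sketch of that pattern, not the gem's code: `LazyRegistry` and `LOCK` are hypothetical names, and a plain `Mutex` stands in for `Thread.exclusive`, which newer Ruby versions no longer provide.

```ruby
require 'set'

# Hypothetical stand-in for the gem's singleton; only the locking pattern matters here.
class LazyRegistry
  # Assumption: a Mutex is used in place of Thread.exclusive.
  LOCK = Mutex.new

  def model_names
    # Fast path: skip the lock entirely once the value exists.
    @model_names || LOCK.synchronize do
      # Re-check inside the lock so that only one thread ever builds the Set.
      @model_names ||= Set.new
    end
  end
end

registry = LazyRegistry.new
# Several threads racing to initialize should all receive the same object.
sets = Array.new(8) { Thread.new { registry.model_names } }.map(&:value)
```

The check outside the lock keeps the common case cheap; the second `||=` inside the lock is what makes the initialization safe.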
data/LICENSE
CHANGED
data/README.markdown
ADDED
@@ -0,0 +1,112 @@
+# data_miner
+
+Download, pull out of a ZIP/TAR/GZ/BZ2 archive, parse, correct, and import XLS, ODS, XML, CSV, HTML, etc. into your ActiveRecord models.
+
+Tested in MRI 1.8.7+, MRI 1.9.2+, and JRuby 1.6.7+. Thread safe.
+
+## Real-world usage
+
+<p><a href="http://brighterplanet.com"><img src="https://s3.amazonaws.com/static.brighterplanet.com/assets/logos/flush-left/inline/green/rasterized/brighter_planet-160-transparent.png" alt="Brighter Planet logo"/></a></p>
+
+We use `data_miner` for [data science at Brighter Planet](http://brighterplanet.com/research) and in production at
+
+* [Brighter Planet's reference data web service](http://data.brighterplanet.com)
+* [Brighter Planet's impact estimate web service](http://impact.brighterplanet.com)
+
+The killer combination for us is:
+
+1. [`active_record_inline_schema`](https://github.com/seamusabshere/active_record_inline_schema) - define table structure
+2. [`remote_table`](https://github.com/seamusabshere/remote_table) - download data and parse it
+3. [`errata`](https://github.com/seamusabshere/errata) - apply corrections in a transparent way
+4. [`data_miner`](https://github.com/seamusabshere/remote_table) (this library!) - import data idempotently
+
+## Documentation
+
+Check out the [extensive documentation](http://rdoc.info/github/seamusabshere/data_miner).
+
+## Quick start
+
+You define <code>data_miner</code> blocks in your ActiveRecord models. For example, in <code>app/models/country.rb</code>:
+
+    class Country < ActiveRecord::Base
+      self.primary_key = 'iso_3166_code'
+
+      data_miner do
+        import("OpenGeoCode.org's Country Codes to Country Names list",
+               :url => 'http://opengeocode.org/download/countrynames.txt',
+               :format => :delimited,
+               :delimiter => '; ',
+               :headers => false,
+               :skip => 22) do
+          key   :iso_3166_code, :field_number => 0
+          store :iso_3166_alpha_3_code, :field_number => 1
+          store :iso_3166_numeric_code, :field_number => 2
+          store :name, :field_number => 5
+        end
+      end
+    end
+
+Now you can run:
+
+    >> Country.run_data_miner!
+    => nil
+
+## More advanced usage
+
+The [`earth` library](https://github.com/brighterplanet/earth) has dozens of real-life examples showing how to download, pull out of a ZIP/TAR/BZ2 archive, parse, correct, and import CSVs, fixed-width files, ODS, XLS, XLSX, even HTML and XML:
+
+<table>
+  <tr>
+    <th>Model</th>
+    <th>Highlights</th>
+    <th>Reference</th>
+  </tr>
+  <tr>
+    <td><a href="http://data.brighterplanet.com/aircraft">Aircraft</a></td>
+    <td>parsing Microsoft Frontpage HTML (!)</td>
+    <td><a href="https://github.com/brighterplanet/earth/blob/master/lib/earth/air/aircraft/data_miner.rb">data_miner.rb</a></td>
+  </tr>
+  <tr>
+    <td><a href="http://data.brighterplanet.com/airports">Airports</a></td>
+    <td>forcing column names and use of <code>:select</code> block (<code>Proc</code>)</td>
+    <td><a href="https://github.com/brighterplanet/earth/blob/master/lib/earth/air/airport/data_miner.rb">data_miner.rb</a></td>
+  </tr>
+  <tr>
+    <td><a href="http://data.brighterplanet.com/automobile_make_model_year_variants">Automobile model variants</a></td>
+    <td>super advanced usage of "custom parser" and errata</td>
+    <td><a href="https://github.com/brighterplanet/earth/blob/master/lib/earth/automobile/automobile_make_model_year_variant/data_miner.rb">data_miner.rb</a></td>
+  </tr>
+  <tr>
+    <td><a href="http://data.brighterplanet.com/countries">Country</a></td>
+    <td>parsing CSV and a few other tricks</td>
+    <td><a href="https://github.com/brighterplanet/earth/blob/master/lib/earth/locality/country/data_miner.rb">data_miner.rb</a></td>
+  </tr>
+  <tr>
+    <td><a href="http://data.brighterplanet.com/egrid_regions">EGRID regions</a></td>
+    <td>parsing XLS</td>
+    <td><a href="https://github.com/brighterplanet/earth/blob/master/lib/earth/locality/egrid_region/data_miner.rb">data_miner.rb</a></td>
+  </tr>
+  <tr>
+    <td><a href="http://data.brighterplanet.com/flight_segments">Flight segment (stage)</a></td>
+    <td>super advanced usage of POSTing form data</td>
+    <td><a href="https://github.com/brighterplanet/earth/blob/master/lib/earth/air/flight_segment/data_miner.rb">data_miner.rb</a></td>
+  </tr>
+  <tr>
+    <td><a href="http://data.brighterplanet.com/zip_codes">Zip codes</a></td>
+    <td>downloading a ZIP file and pulling an XLSX out of it</td>
+    <td><a href="https://github.com/brighterplanet/earth/blob/master/lib/earth/locality/zip_code.rb">data_miner.rb</a></td>
+  </tr>
+</table>
+
+And many more - look for the `data_miner.rb` file that corresponds to each model. Note that you would normally put the `data_miner` declaration right inside the ActiveRecord model file... it's kept separate in `earth` so that loading it is optional.
+
+## Authors
+
+* Seamus Abshere <seamus@abshere.net>
+* Andy Rossmeissl <andy@rossmeissl.net>
+* Derek Kastner <dkastner@gmail.com>
+* Ian Hough <ijhough@gmail.com>
+
+## Copyright
+
+Copyright (c) 2012 Brighter Planet. See LICENSE for details.
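The README's quick start maps zero-based column positions (`:field_number`) of a `'; '`-delimited file onto model attributes. As a rough illustration of what that mapping amounts to, here is a plain-Ruby sketch that does not use `data_miner` or `remote_table`. The sample row and the `FIELD_MAP` constant are invented for this example, and the values in columns 3 and 4 are placeholders.

```ruby
# Invented sample row in the same '; '-delimited shape as the README's source file.
line = 'US; USA; 840; x; y; United States'

# Column index => local attribute, mirroring the key/store declarations in the README.
FIELD_MAP = {
  0 => :iso_3166_code,
  1 => :iso_3166_alpha_3_code,
  2 => :iso_3166_numeric_code,
  5 => :name
}

fields = line.split('; ')

# Build an attribute hash by picking out only the mapped columns.
record = FIELD_MAP.each_with_object({}) do |(index, attribute), attrs|
  attrs[attribute] = fields[index]
end
```

In the gem itself, the `key` column additionally identifies which existing row to update, which is what the README means by importing idempotently.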
data/data_miner.gemspec
CHANGED
@@ -7,8 +7,8 @@ Gem::Specification.new do |s|
   s.authors = ["Seamus Abshere", "Andy Rossmeissl", "Derek Kastner"]
   s.email = ["seamus@abshere.net"]
   s.homepage = "https://github.com/seamusabshere/data_miner"
-  s.summary = %{
-  s.description = %q{
+  s.summary = %{Download, pull out of a ZIP/TAR/GZ/BZ2 archive, parse, correct, and import XLS, ODS, XML, CSV, HTML, etc. into your ActiveRecord models.}
+  s.description = %q{Download, pull out of a ZIP/TAR/GZ/BZ2 archive, parse, correct, and import XLS, ODS, XML, CSV, HTML, etc. into your ActiveRecord models. You can also convert units.}
 
   s.rubyforge_project = "data_miner"
 
data/lib/data_miner.rb
CHANGED
@@ -14,7 +14,7 @@ if RUBY_VERSION >= '1.9'
   end
 end
 
-require 'data_miner/
+require 'data_miner/active_record_class_methods'
 require 'data_miner/attribute'
 require 'data_miner/script'
 require 'data_miner/dictionary'
@@ -24,14 +24,13 @@ require 'data_miner/step/tap'
 require 'data_miner/step/process'
 require 'data_miner/run'
 
+# A singleton class that holds global configuration for data mining.
+#
+# All of its instance methods are delegated to +DataMiner.instance+, so you can call +DataMiner.model_names+, for example.
+#
+# @see DataMiner::ActiveRecordClassMethods#data_miner Overview of how to define data miner scripts inside of ActiveRecord models.
 class DataMiner
   class << self
-    delegate :perform, :to => :instance
-    delegate :run, :to => :instance
-    delegate :logger, :to => :instance
-    delegate :logger=, :to => :instance
-    delegate :model_names, :to => :instance
-
     # @private
     def downcase(str)
       defined?(::UnicodeUtils) ? ::UnicodeUtils.downcase(str) : str.downcase
@@ -48,16 +47,20 @@ class DataMiner
     end
   end
 
-  MUTEX = ::Mutex.new
   INNER_SPACE = /[ ]+/
 
   include ::Singleton
 
   attr_writer :logger
 
+  # Run data miner scripts on models identified by their names. Defaults to all models.
+  #
+  # @param [optional, Array<String>] model_names Names of models to be run.
+  #
+  # @return [Array<DataMiner::Run>]
   def perform(model_names = DataMiner.model_names)
     Script.uniq do
-      model_names.
+      model_names.map do |model_name|
         model_name.constantize.run_data_miner!
       end
     end
@@ -66,8 +69,11 @@ class DataMiner
   # legacy
   alias :run :perform
 
+  # Where DataMiner logs to. Defaults to +Rails.logger+ or +ActiveRecord::Base.logger+ if either is available.
+  #
+  # @return [Logger]
   def logger
-    @logger ||
+    @logger || ::Thread.exclusive do
       @logger ||= if defined?(::Rails)
         ::Rails.logger
       elsif defined?(::ActiveRecord) and active_record_logger = ::ActiveRecord::Base.logger
@@ -79,12 +85,20 @@ class DataMiner
     end
   end
 
+  # Names of the models that have defined a data miner script.
+  #
+  # @note Models won't appear here until the files containing their data miner scripts have been +require+'d.
+  #
+  # @return [Set<String>]
   def model_names
-    @model_names ||
+    @model_names || ::Thread.exclusive do
       @model_names ||= ::Set.new
     end
   end
 
+  class << self
+    delegate(*DataMiner.instance_methods(false), :to => :instance)
+  end
 end
 
-::ActiveRecord::Base.extend ::DataMiner::
+::ActiveRecord::Base.extend ::DataMiner::ActiveRecordClassMethods
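The new `class << self` block at the bottom of `DataMiner` forwards every instance method defined directly on the class to the singleton instance, replacing the five hand-written `delegate` lines. A minimal sketch of the same idea follows, under two assumptions: the standard library's `Forwardable` is swapped in for ActiveSupport's `delegate`, and `Registry` is a hypothetical class rather than the gem's.

```ruby
require 'singleton'
require 'forwardable'
require 'set'

# Hypothetical singleton; the real class in this diff is DataMiner.
class Registry
  include Singleton

  def model_names
    @model_names ||= Set.new
  end

  # This must come after the instance methods, because instance_methods(false)
  # only sees methods that have already been defined on Registry itself.
  class << self
    extend Forwardable
    # Forward each of those methods from the class to Registry.instance.
    def_delegators :instance, *Registry.instance_methods(false)
  end
end

# The class-level call reaches the one shared instance.
Registry.model_names << 'Country'
```

Placement matters in both versions: the delegation has to appear after the method definitions, which is why the diff moves it from the top of the class to the bottom.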
data/lib/data_miner/active_record_class_methods.rb
ADDED
@@ -0,0 +1,108 @@
+require 'active_record'
+require 'lock_method'
+
+class DataMiner
+  # Class methods that are mixed into models (i.e. ActiveRecord::Base)
+  module ActiveRecordClassMethods
+    # Access this model's script.
+    #
+    # @return [DataMiner::Script] This model's data miner script.
+    def data_miner_script
+      @data_miner_script || ::Thread.exclusive do
+        @data_miner_script ||= DataMiner::Script.new(self)
+      end
+    end
+
+    # Access to recordkeeping.
+    #
+    # @return [ActiveRecord::Relation] Records of running the data miner script.
+    def data_miner_runs
+      DataMiner::Run.scoped :conditions => { :model_name => name }
+    end
+
+    # Run this model's script.
+    #
+    # @return [DataMiner::Run]
+    def run_data_miner!
+      data_miner_script.perform
+    end
+
+    # Run the data miner scripts of parent associations. Useful for dependencies. Safe to call using +process+.
+    #
+    # @note Used extensively in https://github.com/brighterplanet/earth
+    #
+    # @example Since Provinces depend on Countries, make sure Countries are data mined first
+    #   class Country < ActiveRecord::Base
+    #     [...some data miner script...]
+    #   end
+    #   class Province < ActiveRecord::Base
+    #     belongs_to :country
+    #     data_miner do
+    #       [...]
+    #       process "make sure my dependencies have been loaded" do
+    #         run_data_miner_on_parent_associations!
+    #       end
+    #       [...]
+    #     end
+    #   end
+    #
+    # @return [Array<DataMiner::Run>]
+    def run_data_miner_on_parent_associations!
+      reflect_on_all_associations(:belongs_to).reject do |assoc|
+        assoc.options[:polymorphic]
+      end.map do |non_polymorphic_belongs_to_assoc|
+        non_polymorphic_belongs_to_assoc.klass.run_data_miner!
+      end
+    end
+
+    # Define a data miner script.
+    #
+    # @param [optional, Hash] options
+    # @option options [TrueClass, FalseClass] :append (false) Add steps to existing data miner script instead of starting from scratch.
+    #
+    # @yield [] The block defining the steps.
+    #
+    # @see DataMiner::Script#import
+    # @see DataMiner::Script#process
+    # @see DataMiner::Script#tap
+    #
+    # @example Creating steps
+    #   class MyModel < ActiveRecord::Base
+    #     data_miner do
+    #       process [...]
+    #       import [...]
+    #       import [...yes, it's ok to have more than one import step...]
+    #       process [...]
+    #       [...etc...]
+    #     end
+    #   end
+    #
+    # @example From the README
+    #   class Country < ActiveRecord::Base
+    #     self.primary_key = 'iso_3166_code'
+    #     data_miner do
+    #       import("OpenGeoCode.org's Country Codes to Country Names list",
+    #              :url => 'http://opengeocode.org/download/countrynames.txt',
+    #              :format => :delimited,
+    #              :delimiter => '; ',
+    #              :headers => false,
+    #              :skip => 22) do
+    #         key   :iso_3166_code, :field_number => 0
+    #         store :iso_3166_alpha_3_code, :field_number => 1
+    #         store :iso_3166_numeric_code, :field_number => 2
+    #         store :name, :field_number => 5
+    #       end
+    #     end
+    #   end
+    #
+    # @return [nil]
+    def data_miner(options = {}, &blk)
+      DataMiner.model_names.add name
+      unless options[:append]
+        @data_miner_script = nil
+      end
+      data_miner_script.append_block blk
+      nil
+    end
+  end
+end
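`run_data_miner_on_parent_associations!` skips polymorphic `belongs_to` associations (they have no single target class) and runs the script of every remaining parent model. The sketch below shows that filter-then-map shape without ActiveRecord. `Association` and `FakeModel` are hypothetical stand-ins for reflection objects and model classes, and the returned strings stand in for `DataMiner::Run` records.

```ruby
# Hypothetical stand-in for an ActiveRecord association reflection.
Association = Struct.new(:name, :options, :klass)

# Hypothetical stand-in for a model class that has a data miner script.
class FakeModel
  attr_reader :name, :runs

  def initialize(name)
    @name = name
    @runs = 0
  end

  # Counts invocations and returns a placeholder for a run record.
  def run_data_miner!
    @runs += 1
    "ran #{name}"
  end
end

country = FakeModel.new('Country')
associations = [
  Association.new(:country, {}, country),
  Association.new(:owner, { :polymorphic => true }, nil)
]

# Drop polymorphic associations first, then run each remaining parent's script.
results = associations
  .reject { |assoc| assoc.options[:polymorphic] }
  .map { |assoc| assoc.klass.run_data_miner! }
```

Filtering before mapping matters here: the polymorphic entry has no usable `klass`, so calling it would fail.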
data/lib/data_miner/attribute.rb
CHANGED
@@ -1,8 +1,14 @@
 require 'conversions'
 
 class DataMiner
+  # A mapping between a local model column and a remote data source column.
+  #
+  # @see DataMiner::ActiveRecordClassMethods#data_miner Overview of how to define data miner scripts inside of ActiveRecord models.
+  # @see DataMiner::Step::Import#store
+  # @see DataMiner::Step::Import#key
   class Attribute
     class << self
+      # @private
       def check_options(options)
         errors = []
         if options[:dictionary].is_a?(Dictionary)
@@ -18,26 +24,26 @@ class DataMiner
       end
     end
 
-    VALID_OPTIONS =
-      from_units
-      to_units
-      static
-      dictionary
-      matcher
-      field_name
-      delimiter
-      split
-      units
-      sprintf
-      nullify
-      overwrite
-      upcase
-      units_field_name
-      units_field_number
-      field_number
-      chars
-      synthesize
-
+    VALID_OPTIONS = [
+      :from_units,
+      :to_units,
+      :static,
+      :dictionary,
+      :matcher,
+      :field_name,
+      :delimiter,
+      :split,
+      :units,
+      :sprintf,
+      :nullify,
+      :overwrite,
+      :upcase,
+      :units_field_name,
+      :units_field_number,
+      :field_number,
+      :chars,
+      :synthesize,
+    ]
 
     VALID_UNIT_DEFINITION_SETS = [
       [:units],
@@ -48,30 +54,102 @@ class DataMiner
       [:units_field_number, :to_units],
     ]
 
-
-
+    DEFAULT_SPLIT_PATTERN = /\s+/
+    DEFAULT_SPLIT_KEEP = 0
     DEFAULT_DELIMITER = ', '
     DEFAULT_NULLIFY = false
     DEFAULT_UPCASE = false
     DEFAULT_OVERWRITE = true
 
+    # @private
     attr_reader :step
+
+    # Local column name.
+    # @return [Symbol]
     attr_reader :name
+
+    # Synthesize a value by passing a proc that will receive +row+ and should return a final value.
+    #
+    # +row+ will be a +Hash+ with string keys or (less often) an +Array+
+    #
+    # @return [Proc]
     attr_reader :synthesize
+
+    # An object that will be sent +#match(row)+ and should return a final value.
+    #
+    # Can be specified as a String which will be constantized into a class and an object of that class instantized with no arguments.
+    #
+    # +row+ will be a +Hash+ with string keys or (less often) an +Array+
+    # @return [Object]
     attr_reader :matcher
+
+    # Index of where to find the data in the row, starting from zero.
+    #
+    # If you pass a +Range+, then multiple fields will be joined together.
+    #
+    # @return [Integer, Range]
     attr_reader :field_number
+
+    # Where to find the data in the row.
+    # @return [Symbol]
     attr_reader :field_name
-
+
+    # A delimiter to be used when joining fields together into a single final value. Used when +:field_number+ is a +Range+. Defaults to DEFAULT_DELIMITER.
+    # @return [String]
     attr_reader :delimiter
+
+    # Which characters in a field to keep. Zero-based.
+    # @return [Range]
     attr_reader :chars
+
+    # How to split a field. You specify two options:
+    #
+    # +:pattern+: what to split on. Defaults to DEFAULT_SPLIT_PATTERN.
+    # +:keep+: which of elements resulting from the split to keep. Defaults to DEFAULT_SPLIT_KEEP.
+    #
+    # @return [Hash]
     attr_reader :split
+
+    # Final units. May invoke a conversion using https://github.com/seamusabshere/conversions
+    #
+    # If a local column named +[name]_units+ exists, it will be populated with this value.
+    #
+    # @return [Symbol]
     attr_reader :to_units
+
+    # Initial units. May invoke a conversion using https://github.com/seamusabshere/conversions
+    # @return [Symbol]
     attr_reader :from_units
+
+    # If every row specifies its own units, index of where to find the units. Zero-based.
+    # @return [Integer]
     attr_reader :units_field_number
+
+    # If every row specifies its own units, where to find the units.
+    # @return [Symbol]
     attr_reader :units_field_name
+
+    # A +sprintf+-style format to apply.
+    # @return [String]
     attr_reader :sprintf
+
+    # A static value to be used.
+    # @return [String,Numeric,TrueClass,FalseClass,Object]
     attr_reader :static
 
+    # Whether to nullify the value in a local column if it was not previously null. Defaults to DEFAULT_NULLIFY.
+    # @return [TrueClass,FalseClass]
+    attr_reader :nullify
+
+    # Whether to upcase value. Defaults to DEFAULT_UPCASE.
+    # @return [TrueClass,FalseClass]
+    attr_reader :upcase
+
+    # Whether to overwrite the value in a local column if it is not null. Defaults to DEFAULT_OVERWRITE.
+    # @return [TrueClass,FalseClass]
+    attr_reader :overwrite
+
+    # @private
     def initialize(step, name, options = {})
       options = options.symbolize_keys
       if (errors = Attribute.check_options(options)).any?
@@ -81,7 +159,7 @@ class DataMiner
       @name = name
       @synthesize = options[:synthesize]
       if @dictionary_boolean = options.has_key?(:dictionary)
-        @
+        @dictionary_settings = options[:dictionary]
       end
       @matcher = options[:matcher].is_a?(::String) ? options[:matcher].constantize.new : options[:matcher]
       if @static_boolean = options.has_key?(:static)
@@ -94,52 +172,42 @@ class DataMiner
       if split = options[:split]
         @split = split.symbolize_keys
       end
-      @
-      @
+      @nullify = options.fetch :nullify, DEFAULT_NULLIFY
+      @upcase = options.fetch :upcase, DEFAULT_UPCASE
       @from_units = options[:from_units]
       @to_units = options[:to_units] || options[:units]
       @sprintf = options[:sprintf]
-      @
+      @overwrite = options.fetch :overwrite, DEFAULT_OVERWRITE
       @units_field_name = options[:units_field_name]
       @units_field_number = options[:units_field_number]
       @dictionary_mutex = ::Mutex.new
     end
 
-
-
-
-
-
-
-
-
-
-      @nullify_boolean
-    end
-
-    def upcase?
-      @upcase_boolean
-    end
-
-    def dictionary?
-      @dictionary_boolean
-    end
-
-    def convert?
-      from_units.present? or units_field_name.present? or units_field_number.present?
+    # Dictionary for translating.
+    #
+    # You pass a +Hash+ of options which is used to initialize a +DataMiner::Dictionary+.
+    #
+    # @return [DataMiner::Dictionary]
+    def dictionary
+      @dictionary || @dictionary_mutex.synchronize do
+        @dictionary ||= Dictionary.new(@dictionary_settings)
+      end
     end
 
-
-
+    # @private
+    def set_from_row(local_record, remote_row)
+      if overwrite or local_record.send(name).nil?
+        local_record.send "#{name}=", read(remote_row)
+      end
+      if units? and ((final_to_units = (to_units || read_units(remote_row))) or nullify)
+        local_record.send "#{name}_units=", final_to_units
+      end
     end
 
-
-      @overwrite_boolean
-    end
-
+    # @private
     def read(row)
-      if matcher and
-        return
+      if matcher and matcher_output = matcher.match(row)
+        return matcher_output
       end
       if synthesize
         return synthesize.call(row)
@@ -168,15 +236,15 @@ class DataMiner
         value = value[chars]
       end
       if split
-        pattern = split.fetch :pattern,
-        keep = split.fetch :keep,
+        pattern = split.fetch :pattern, DEFAULT_SPLIT_PATTERN
+        keep = split.fetch :keep, DEFAULT_SPLIT_KEEP
         value = value.to_s.split(pattern)[keep].to_s
       end
       value = DataMiner.compress_whitespace value
-      if nullify
+      if nullify and value.blank?
         return
       end
-      if upcase
+      if upcase
         value = DataMiner.upcase value
       end
       if convert?
@@ -201,27 +269,33 @@ class DataMiner
       value
     end
 
-
-      if overwrite? or target.send(name).nil?
-        target.send "#{name}=", read(row)
-      end
-      if units? and ((final_to_units = (to_units || read_units(row))) or nullify?)
-        target.send "#{name}_units=", final_to_units
-      end
-    end
-
-    def dictionary
-      @dictionary || @dictionary_mutex.synchronize do
-        @dictionary ||= Dictionary.new(@dictionary_options)
-      end
-    end
-
+    # @private
     def refresh
       @dictionary = nil
     end
 
     private
 
+    def model
+      step.model
+    end
+
+    def static?
+      @static_boolean
+    end
+
+    def dictionary?
+      @dictionary_boolean
+    end
+
+    def convert?
+      from_units.present? or units_field_name.present? or units_field_number.present?
+    end
+
+    def units?
+      to_units.present? or units_field_name.present? or units_field_number.present?
+    end
+
     def read_units(row)
       if units = row[units_field_name || units_field_number]
         DataMiner.compress_whitespace(units).underscore.to_sym
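The `attribute.rb` hunks above show part of the order in which `Attribute#read` cleans a raw field: keep a character range, split and keep one piece, compress whitespace, optionally nullify blanks, optionally upcase. The following is a simplified sketch of just those steps, not the gem's method. `clean_value`, `SPLIT_PATTERN` and `SPLIT_KEEP` are hypothetical names, whitespace compression is assumed to collapse runs of spaces and strip the ends, and dictionaries, matchers, `synthesize` procs and unit conversion are left out.

```ruby
# Defaults mirroring DEFAULT_SPLIT_PATTERN and DEFAULT_SPLIT_KEEP from the diff.
SPLIT_PATTERN = /\s+/
SPLIT_KEEP = 0

# Hypothetical helper covering only the string-cleaning steps.
def clean_value(value, options = {})
  value = value.to_s
  # :chars keeps a zero-based character range.
  value = value[options[:chars]].to_s if options[:chars]
  # :split cuts the value on a pattern and keeps one element.
  if (split = options[:split])
    pattern = split.fetch(:pattern, SPLIT_PATTERN)
    keep = split.fetch(:keep, SPLIT_KEEP)
    value = value.split(pattern)[keep].to_s
  end
  # Assumed behavior of whitespace compression: collapse space runs, trim ends.
  value = value.gsub(/[ ]+/, ' ').strip
  # :nullify turns a blank result into nil instead of storing an empty string.
  return nil if options[:nullify] && value.empty?
  value = value.upcase if options[:upcase]
  value
end

quantity = clean_value('12 kg', :split => {})
unit = clean_value('12 kg', :split => { :keep => 1 }, :upcase => true)
```

The change from `if nullify` to `if nullify and value.blank?` in this release corresponds to the `value.empty?` check here: only blank values are nullified.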