iron-import 0.5.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 70c4748d780e9854cbd60622563b74d3b7ce2b5c
4
+ data.tar.gz: d6503f0f7a08b4c88da5813b3114446baf1fff1a
5
+ SHA512:
6
+ metadata.gz: 488a0e4b2d8ed83914bb2a6c907358ee584c0849f26bf9e64d6cc4bd8c2296997e4bc580f59b3bff4db6fa699a6abf94f5a85cd31c1585f03f728523025529a3
7
+ data.tar.gz: 00c6e27cf433423c9c1cc14828c11cd895459b0c12e86aa57ec65b35b049b0b7939dda98edd3359aa4dab8af945b0a17b9ef5b7fc486300edeb8b987b21d65dd
data/.rspec ADDED
@@ -0,0 +1 @@
1
+ --require <%= File.join(File.expand_path(File.dirname(__FILE__)), 'spec', 'spec_helper.rb') %>
data/History.txt ADDED
@@ -0,0 +1,12 @@
1
+ == 0.5.0 / 2015-02-XX
2
+
3
+ * Initial revision
4
+ * Support for CSV, XLS and XLSX importing
5
+ * Multiple sheet support
6
+ * Automatic header and start-of-data detection
7
+ * Value coercion to :string, :integer, :float, :date, and :cents
8
+ * Custom parsing of raw cell values
9
+ * Custom validation of cell values
10
+ * Conditional row filtering
11
+ * Error and warning aggregation, by sheet/row as appropriate
12
+ * Automatic stream-to-file conversion where needed by underlying libs
data/LICENSE ADDED
@@ -0,0 +1,20 @@
1
+ Copyright (c) 2015 Irongaze Consulting LLC
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining
4
+ a copy of this software and associated documentation files (the
5
+ 'Software'), to deal in the Software without restriction, including
6
+ without limitation the rights to use, copy, modify, merge, publish,
7
+ distribute, sublicense, and/or sell copies of the Software, and to
8
+ permit persons to whom the Software is furnished to do so, subject to
9
+ the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be
12
+ included in all copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
15
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
17
+ IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
18
+ CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
19
+ TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
20
+ SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.rdoc ADDED
@@ -0,0 +1,70 @@
1
+ = GEM: iron-import
2
+
3
+ Written by Rob Morris @ Irongaze Consulting LLC (http://irongaze.com)
4
+
5
+ == DESCRIPTION
6
+
7
+ Simple, reliable tabular data import.
8
+
9
+ This gem provides a set of classes to support automating import of tabular data from
10
+ CSV, XLS or XLSX files. It helps with defining columns, auto-detecting column
11
+ order, pre-parsing data, and tracking errors and warnings.
12
+
13
+ The Roo/Spreadsheet gems do a great job of providing general-purpose spreadsheet reading.
14
+ However, using them with unreliable, user-submitted data requires a lot of error checking,
15
+ monkeying with data coercion, etc. At Irongaze, we do a lot of work with growing
16
+ businesses, where Excel files are the lingua franca for all kinds of uses. This gem
17
+ attempts to extract years of experience building one-off importers into a simple library
18
+ for rapid import coding.
19
+
20
+ This is NOT a general-purpose tool for reading spreadsheets. If you want access to
21
+ cell styling, reading underlying formulas, etc., you will be better served building
22
+ a custom importer based on Roo. But if you're looking to take an uploaded CSV file,
23
+ validate and coerce values, then write each row to a database, all the while tracking
24
+ any warnings and errors encountered... well, this is the library for you!
25
+
26
+ IMPORTANT NOTE: this gem is in flux as we work to define the best possible abstraction
27
+ for the task. Breaking changes will be noted by increases in the second-level version,
28
+ i.e. 0.5.0 and 0.5.1 will be compatible, but 0.6.0 will not.
29
+
30
+ == SAMPLE USAGE
31
+
32
+ # Define our importer, with two columns. The importer will look for a row containing
33
+ # "name" and "description" (case-insensitively) and automatically determine column
34
+ # order and starting row of the data.
35
+ importer = Importer.build do
36
+ column :name
37
+ column :description
38
+ end
39
+
40
+ # Import the provided file, automatically choosing the proper library to read
41
+ # CSV data, then process it row by row if importing succeeded. This same code
42
+ # works unchanged with XLS or XLSX files.
43
+ if importer.import('/tmp/source.csv')
44
+ importer.process do |row|
45
+ puts "#{row[:name]} = #{row[:description]}"
46
+ end
47
+ end
48
+
49
+ == REQUIREMENTS
50
+
51
+ Depends on the iron-extensions and iron-dsl gems, and optionally requires the roo gem (~> 1.13.0) to support XLS and
52
+ XLSX file import and parsing. Without roo, all you get is CSV.
53
+
54
+ Requires RSpec and roo to build/test.
55
+
56
+ == INSTALLATION
57
+
58
+ To install, simply run:
59
+
60
+ sudo gem install iron-import
61
+
62
+ RVM users can skip the sudo:
63
+
64
+ gem install iron-import
65
+
66
+ Then use
67
+
68
+ require 'iron/import'
69
+
70
+ to require the library code.
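The versioning note in the README maps directly onto RubyGems' pessimistic operator. A short sketch, assuming you pin the gem with a Bundler-style `gem 'iron-import', '~> 0.5.0'` line (the README does not prescribe this), checks which releases such a constraint accepts:

```ruby
require 'rubygems'

# '~> 0.5.0' means >= 0.5.0 and < 0.6, i.e. patch releases only, which
# matches the README's promise that only 0.x bumps break compatibility.
requirement = Gem::Requirement.new('~> 0.5.0')

%w[0.5.0 0.5.1 0.6.0].each do |v|
  status = requirement.satisfied_by?(Gem::Version.new(v)) ? 'accepted' : 'rejected'
  puts "#{v}: #{status}"
end
```

Run as-is, this prints `0.5.0: accepted`, `0.5.1: accepted` and `0.6.0: rejected`.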
data/Version.txt ADDED
@@ -0,0 +1 @@
1
+ 0.5.0
@@ -0,0 +1,177 @@
1
+ class Importer
2
+
3
+ # Columns represent the settings for importing a given column within a Sheet. They do not
4
+ # hold data, rather they capture the settings needed for identifying the column in the header,
5
+ # how to parse and validate each of their cell's data, and so forth.
6
+ #
7
+ # Here's the complete list of column configuration options:
8
+ #
9
+ # Importer.build do
10
+ # column :key do
11
+ # # Set a fixed position - may be a column number or a letter-based
12
+ # # column description, i.e. 'A' == 1. In most cases, you can leave
13
+ # # this defaulted to nil, which will mean "look for the proper header"
14
+ # position 'C'
15
+ #
16
+ # # Specify a regex to locate the header for this column, defaults to
17
+ # # matching the whole key, ignoring case and treating underscores as spaces.
18
+ # header /(price|cost)/i
19
+ #
20
+ # # Tells the data parser what type of data this column contains, one
21
+ # # of :integer, :string, :date, :float, or :cents. Defaults to :string.
22
+ # type :cents
23
+ #
24
+ # # Instead of a type, you can set an explicit parse block. Be aware
25
+ # # that different source types may give you different raw values for what
26
+ # # seems like the "same" source value, for example an Excel source file
27
+ # # will give you a float value for all numeric types, even "integers"
28
+ # parse do |raw_value|
29
+ # raw_value.to_i + 1000
30
+ # end
31
+ #
32
+ # # You can also add a custom validator to check the value and add
33
+ # # an error if it's not within a given range, or whatever:
34
+ # validate do |parsed_value|
35
+ # raise "Out of range" unless (parsed_value > 0 && parsed_value < 5000)
36
+ # end
37
+ # end
38
+ # end
39
+ #
40
+ class Column
41
+
42
+ # Holds load-time data
43
+ class Data
44
+ attr_accessor :index
45
+
46
+ def pos
47
+ @index ? Column.index_to_pos(@index) : 'Unknown'
48
+ end
49
+ end
50
+
51
+ # Core info
52
+ attr_reader :key
53
+ attr_reader :data
54
+
55
+ # Configuration
56
+ dsl_flag :required
57
+ dsl_accessor :header, :position, :type
58
+ dsl_accessor :parse, :validate
59
+
60
+ def self.pos_to_index(pos)
61
+ raise 'Invalid column position: ' + pos.inspect unless pos.is_a?(String) && pos.match(/\A[a-z]{1,3}\z/i)
62
+ vals = pos.upcase.bytes.collect {|b| b - 64}
63
+ total = 0
64
+ multiplier = 1
65
+ vals.reverse.each do |val|
66
+ total += val * multiplier
67
+ multiplier *= 26
68
+ end
69
+ total - 1
70
+ end
71
+
72
+ def self.index_to_pos(index)
73
+ # Expects a zero-based Integer index, i.e. 0 => 'A', 26 => 'AA'
75
+ raise 'Invalid column index: ' + index.inspect unless index.is_a?(Integer) && index >= 0
75
+
76
+ chars = ('A'..'Z').to_a
77
+ str = ''
78
+ while index > 25
79
+ str = chars[index % 26] + str
80
+ index /= 26
81
+ index -= 1
82
+ end
83
+ str = chars[index] + str
84
+ str
85
+ end
86
+
87
+ def initialize(sheet, key)
88
+ # Save off our info
89
+ @key = key
90
+ @sheet = sheet
91
+ @importer = @sheet.importer
92
+
93
+ # Return it as a string, by default
94
+ @type = :string
95
+
96
+ # By default, we allow empty values
97
+ @required = false
98
+
99
+ # Position can be explicitly set
100
+ @position = nil
101
+
102
+ # By default, don't parse incoming data, just pass it through
103
+ @parse = nil
104
+
105
+ # Default matcher: the whole header cell must equal the column key, ignoring case
106
+ # and surrounding whitespace, with underscores read as spaces, i.e. :order_id
107
+ # matches ' Order ID ' but not 'Order ID Number'
108
+ @header = Regexp.new('\A\s*' + Regexp.escape(key.to_s.gsub('_', ' ')) + '\s*\z', Regexp::IGNORECASE)
109
+
110
+ # Reset our state to pre-load status
111
+ reset
112
+ end
113
+
114
+ def build(&block)
115
+ DslProxy.exec(self, &block)
116
+ end
117
+
118
+ def reset
119
+ @data = Data.new
120
+ end
121
+
122
+ # True when the header text matches our header regex, or the index matches an explicit position
123
+ def match_header?(text, index)
124
+ res = index == self.fixed_index || (@header && !@header.match(text).nil?)
125
+ # puts "#{@header.inspect} ~ #{text.inspect} => #{res.inspect}"
126
+ res
127
+ end
128
+
129
+ # Use any custom parser defined to process the given value, capturing
130
+ # errors as needed
131
+ def parse_value(row, val)
132
+ return val if @parse.nil?
133
+ begin
134
+ @parse.call(val)
135
+ rescue StandardError => e
136
+ @importer.add_error(row, "Error parsing #{self}: #{e}")
137
+ nil
138
+ end
139
+ end
140
+
141
+ def validate_value(row, val)
142
+ return unless @validate
143
+ begin
144
+ @validate.call(val)
145
+ true
146
+ rescue StandardError => e
147
+ @importer.add_error(row, "Validation error in #{self}: #{e}")
148
+ false
149
+ end
150
+ end
151
+
152
+ def fixed_index
153
+ return nil unless @position
154
+ if @position.is_a?(Integer)
155
+ @position - 1
156
+ elsif @position.is_a?(String)
157
+ Column.pos_to_index(@position)
158
+ end
159
+ end
160
+
161
+ def to_s
162
+ 'Column ' + @data.pos
163
+ end
164
+
165
+ def to_a
166
+ @sheet.data.rows.collect {|r| r[@key] }
167
+ end
168
+
169
+ def to_h
170
+ res = {}
171
+ @sheet.data.rows.each {|r| res[r.num] = r[@key] }
172
+ res
173
+ end
174
+
175
+ end
176
+
177
+ end
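`Column.pos_to_index` and `Column.index_to_pos` treat column letters as a bijective base-26 numeral: there is no zero digit, so 'Z' is followed by 'AA'. The sketch below restates that conversion outside the gem so it can be run and checked on its own; the `ColumnPos` module name is hypothetical, and it uses the same zero-based indexes as `Column`.

```ruby
# Hypothetical standalone module; Importer::Column implements the same idea
# in Column.pos_to_index and Column.index_to_pos.
module ColumnPos
  # 'A' => 0, 'Z' => 25, 'AA' => 26: each letter is a digit worth 1..26
  # (bijective base 26), and the final -1 makes the result zero-based.
  def self.to_index(pos)
    unless pos.is_a?(String) && pos.match?(/\A[a-z]{1,3}\z/i)
      raise ArgumentError, "Invalid column position: #{pos.inspect}"
    end
    pos.upcase.each_char.reduce(0) { |total, ch| total * 26 + (ch.ord - 64) } - 1
  end

  # Inverse of to_index: emit the lowest letter, then move to the next
  # "digit", subtracting 1 because there is no zero letter.
  def self.to_pos(index)
    unless index.is_a?(Integer) && index >= 0
      raise ArgumentError, "Invalid column index: #{index.inspect}"
    end
    str = ''
    loop do
      str = (65 + index % 26).chr + str
      index = index / 26 - 1
      break if index < 0
    end
    str
  end
end

puts ColumnPos.to_index('C') # prints 2
puts ColumnPos.to_pos(27)    # prints AB
```

The subtract-one step in `to_pos` is what distinguishes this from ordinary base conversion; without it, index 26 would come out as 'BA' instead of 'AA'.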
@@ -0,0 +1,26 @@
1
+ require 'csv'
2
+
3
+ class Importer
4
+
5
+ class CsvReader < DataReader
6
+
7
+ def initialize(importer)
8
+ super(importer, :csv)
9
+ end
10
+
11
+ def load_stream(stream)
12
+ text = stream.read
13
+ encoding = @importer.encoding || 'UTF-8'
14
+ raw_rows = CSV.parse(text, :encoding => "#{encoding}:UTF-8")
15
+ @importer.default_sheet.parse_raw_data(raw_rows)
16
+ end
17
+
18
+ def load_file(path)
19
+ encoding = @importer.encoding || 'UTF-8'
20
+ raw_rows = CSV.read(path, :encoding => "#{encoding}:UTF-8")
21
+ @importer.default_sheet.parse_raw_data(raw_rows)
22
+ end
23
+
24
+ end
25
+
26
+ end
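Both `load_stream` and `load_file` build an `"external:internal"` encoding string such as `"ISO-8859-1:UTF-8"`, so non-UTF-8 input is transcoded to UTF-8 as it is read. A minimal sketch of that behavior using only the standard `csv` and `tempfile` libraries, with a made-up Latin-1 sample file standing in for a legacy export:

```ruby
require 'csv'
require 'tempfile'

# Write a small Latin-1 encoded CSV to a temp file (sample data is made up).
latin1 = "name,description\nCafé,Espresso bar\n".encode('ISO-8859-1')
file = Tempfile.new(['sample', '.csv'])
file.binmode
file.write(latin1)
file.close

# "external:internal": read ISO-8859-1 bytes and transcode them to UTF-8,
# mirroring the "#{encoding}:UTF-8" string CsvReader passes to CSV.
rows = CSV.read(file.path, encoding: 'ISO-8859-1:UTF-8')
file.unlink

puts rows[1][0].encoding # prints UTF-8
```

Reading the same file without the encoding option would typically yield an invalid UTF-8 string, which is why the importer exposes an `encoding` setting at all.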
@@ -0,0 +1,176 @@
1
+ class Importer
2
+
3
+ # Base class for our input reading - dealing with the raw file/stream,
4
+ # and extracting raw values. In addition, we provide the base
5
+ # data coercion/parsing for our derived classes.
6
+ class DataReader
7
+
8
+ # Attributes
9
+ attr_reader :format
10
+
11
+ def self.verify_roo!
12
+ if Gem::Specification.find_all_by_name('roo', '~> 1.13.0').empty?
13
+ raise "You are attempting to use the iron-import gem to import an Excel file. Doing so requires installing the roo gem, version 1.13.x (~> 1.13.0)."
14
+ end
15
+ end
16
+
17
+ def self.for_format(importer, format)
18
+ case format
19
+ when :csv
20
+ CsvReader.new(importer)
21
+ when :xls
22
+ verify_roo!
23
+ XlsReader.new(importer)
24
+ when :xlsx
25
+ verify_roo!
26
+ XlsxReader.new(importer)
27
+ else
28
+ nil
29
+ end
30
+ end
31
+
32
+ def self.for_path(importer, path)
33
+ format = path.to_s.extract(/\.(csv|xlsx?)\z/i)
34
+ if format
35
+ format = format.downcase.to_sym
36
+ for_format(importer, format)
37
+ else
38
+ nil
39
+ end
40
+ end
41
+
42
+ def self.for_stream(importer, stream)
43
+ path = path_from_stream(stream)
44
+ for_path(importer, path)
45
+ end
46
+
47
+ # Try to find the original file name for the given stream,
48
+ # as in the case where a file is uploaded to Rails and we're dealing with an
49
+ # ActionDispatch::Http::UploadedFile.
50
+ def self.path_from_stream(stream)
51
+ if stream.respond_to?(:original_filename)
52
+ stream.original_filename
53
+ elsif stream.respond_to?(:path)
54
+ stream.path
55
+ else
56
+ nil
57
+ end
58
+ end
59
+
60
+ def initialize(importer, format)
61
+ @importer = importer
62
+ @format = format
63
+ @multisheet = true
64
+ end
65
+
66
+ def load(path_or_stream)
67
+ # Figure out what we've been passed, and handle it
68
+ if path_or_stream.respond_to?(:read)
69
+ # We have a stream (open file, upload, whatever)
70
+ if respond_to?(:load_stream)
71
+ # Stream loader defined, run it
72
+ load_stream(path_or_stream)
73
+ else
74
+ # Write to temp file, as some of our readers only read physical files, annoyingly
75
+ file = Tempfile.new(['importer', ".#{format}"])
76
+ file.binmode
77
+ begin
78
+ file.write path_or_stream.read
79
+ file.close
80
+ load_file(file.path)
81
+ ensure
82
+ file.close
83
+ file.unlink
84
+ end
85
+ end
86
+
87
+ elsif path_or_stream.is_a?(String)
88
+ # Assume it's a path
89
+ if respond_to?(:load_file)
90
+ # We're all set, load up the given path
91
+ load_file(path_or_stream)
92
+ else
93
+ # No file handler, so open the file and run the stream processor,
94
+ # using the block form so the file handle is closed afterwards
95
+ File.open(path_or_stream, 'rb') {|file| load_stream(file) }
96
+ end
97
+
98
+ else
99
+ raise "Unable to load data: #{path_or_stream.inspect}"
100
+ end
101
+
102
+ # Return our status
103
+ !@importer.has_errors?
104
+ end
105
+
106
+ # Provides default value parsing/coercion for all derived data readers. Attempts to be clever and
107
+ # handle edge cases like converting '5.00' to 5 when in integer mode, etc. If you find your inputs aren't
108
+ # being parsed correctly, add a custom #parse block on your Column definition.
109
+ def parse_value(val, type)
110
+ return nil if val.nil? || val.to_s == ''
111
+
112
+ case type
113
+ when :string then
114
+ val = val.to_s.strip
115
+ val.blank? ? nil : val
116
+
117
+ when :integer, :int then
118
+ if val.is_a?(Numeric)
119
+ # If numeric, verify that there's no decimal places to worry about
120
+ if (val.to_f % 1.0 == 0.0)
121
+ val.to_i
122
+ else
123
+ nil
124
+ end
125
+ else
126
+ # Convert to string, strip off trailing decimal zeros
127
+ val = val.to_s.strip.gsub(/\.0*$/, '')
128
+ if val.integer?
129
+ val.to_i
130
+ else
131
+ nil
132
+ end
133
+ end
134
+
135
+ when :float then
136
+ if val.is_a?(Numeric)
137
+ val.to_f
138
+ else
139
+ # Convert to string and verify it looks like a plain decimal number
140
+ val = val.to_s.strip
141
+ if val.match(/\A-?[0-9]+(?:\.[0-9]+)?\z/)
142
+ val.to_f
143
+ else
144
+ nil
145
+ end
146
+ end
147
+
148
+ when :cents then
149
+ if val.is_a?(String)
150
+ val = val.gsub(/\s*\$\s*/, '')
151
+ end
152
+ intval = parse_value(val, :integer)
153
+ if !val.is_a?(Float) && intval
154
+ intval * 100
155
+ else
156
+ floatval = parse_value(val, :float)
157
+ if floatval
158
+ (floatval * 100).round
159
+ else
160
+ nil
161
+ end
162
+ end
163
+
164
+ when :date then
165
+ # Pull out the date part of the string and convert
166
+ date_str = val.to_s.extract(/[0-9]+[\-\/][0-9]+[\-\/][0-9]+/)
167
+ date_str.to_date rescue nil
168
+
169
+ else
170
+ raise "Unknown column type #{type.inspect} - unimplemented?"
171
+ end
172
+ end
173
+
174
+ end
175
+
176
+ end
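Converting a float amount to cents with `to_i` truncates toward zero, so ordinary binary floating-point error can drop a cent: `1.15 * 100` evaluates to `114.99999999999999`. A minimal standalone sketch of the `:cents` coercion that rounds instead, written as a hypothetical `parse_cents` helper rather than the gem's actual API:

```ruby
# Hypothetical standalone version of the :cents coercion (not the gem's API).
def parse_cents(val)
  return nil if val.nil?

  if val.is_a?(Numeric)
    amount = val
  else
    # Drop a "$" and any spaces around it, as the :cents branch does for strings.
    str = val.to_s.gsub(/\s*\$\s*/, '').strip
    return nil unless str.match?(/\A-?[0-9]+(?:\.[0-9]+)?\z/)
    amount = str.include?('.') ? str.to_f : str.to_i
  end

  # Round instead of truncating, so 1.15 becomes 115 cents rather than 114.
  (amount * 100).round
end

puts((1.15 * 100).to_i)     # prints 114 (truncation loses a cent)
puts parse_cents(1.15)      # prints 115
puts parse_cents('$ 12.50') # prints 1250
p parse_cents('n/a')        # prints nil
```

Integer inputs stay on the integer path, so whole-dollar amounts never pass through floating point at all.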
@@ -0,0 +1,66 @@
1
+ class Importer
2
+
3
+ class Error
4
+
5
+ attr_reader :sheet, :row, :text
6
+
7
+ def initialize(context, text)
8
+ if context.is_a?(Importer::Sheet)
9
+ @sheet = context
10
+ elsif context.is_a?(Importer::Row)
11
+ @row = context
12
+ @sheet = context.sheet
13
+ end
14
+ @text = text.to_s
15
+ end
16
+
17
+ def summary
18
+ summary = ''
19
+ if @row
20
+ summary += "#{@sheet} #{@row}: "
21
+ elsif @sheet
22
+ summary += "#{@sheet}: "
23
+ end
24
+ summary + @text
25
+ end
26
+
27
+ def to_s
28
+ summary
29
+ end
30
+
31
+ # Returns the level at which this error occurred, one of
32
+ # :row, :sheet, :importer
33
+ def level
34
+ return :row if @row
35
+ return :sheet if @sheet
36
+ return :importer
37
+ end
38
+
39
+ def row_level?
40
+ level == :row
41
+ end
42
+
43
+ def sheet_level?
44
+ level == :sheet
45
+ end
46
+
47
+ def importer_level?
48
+ level == :importer
49
+ end
50
+
51
+ # Returns true if this error is for the given context, where
52
+ # context can be a Row, Sheet or Importer instance.
53
+ def for_context?(context)
54
+ case context
55
+ when Row
56
+ return @row == context
57
+ when Sheet
58
+ return @sheet == context
59
+ else
60
+ return true
61
+ end
62
+ end
63
+
64
+ end
65
+
66
+ end
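An `Error` carries an optional sheet and row, and `level` reports the most specific context available. The sketch below uses a hypothetical `ImportError` struct, with plain strings standing in for `Sheet` and `Row` objects (so their `to_s` output is assumed), to show how a caller might group an aggregated error list by level:

```ruby
# Hypothetical stand-in for Importer::Error; sheet and row are plain strings.
ImportError = Struct.new(:sheet, :row, :text) do
  # Mirrors Error#level: the most specific context wins.
  def level
    return :row if row
    return :sheet if sheet
    :importer
  end

  # Mirrors Error#summary: prefix the message with whatever context is known.
  def summary
    prefix =
      if row
        "#{sheet} #{row}: "
      elsif sheet
        "#{sheet}: "
      else
        ''
      end
    prefix + text
  end
end

errors = [
  ImportError.new('Sheet 1', 'Row 4', 'Invalid date'),
  ImportError.new('Sheet 1', nil, 'Missing header: price'),
  ImportError.new(nil, nil, 'Unsupported file type')
]

# group_by keeps first-seen order, so levels print as row, sheet, importer.
errors.group_by(&:level).each do |level, errs|
  puts "#{level}: #{errs.map(&:summary).join('; ')}"
end
```

With the sample data this prints `row: Sheet 1 Row 4: Invalid date`, then `sheet: Sheet 1: Missing header: price`, then `importer: Unsupported file type`.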