iron-import 0.5.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 70c4748d780e9854cbd60622563b74d3b7ce2b5c
4
+ data.tar.gz: d6503f0f7a08b4c88da5813b3114446baf1fff1a
5
+ SHA512:
6
+ metadata.gz: 488a0e4b2d8ed83914bb2a6c907358ee584c0849f26bf9e64d6cc4bd8c2296997e4bc580f59b3bff4db6fa699a6abf94f5a85cd31c1585f03f728523025529a3
7
+ data.tar.gz: 00c6e27cf433423c9c1cc14828c11cd895459b0c12e86aa57ec65b35b049b0b7939dda98edd3359aa4dab8af945b0a17b9ef5b7fc486300edeb8b987b21d65dd
data/.rspec ADDED
@@ -0,0 +1 @@
1
+ --require <%= File.join(File.expand_path(File.dirname(__FILE__)), 'spec', 'spec_helper.rb') %>
data/History.txt ADDED
@@ -0,0 +1,12 @@
1
+ == 0.5.0 / 2015-02-XX
2
+
3
+ * Initial revision
4
+ * Support for CSV, XLS and XLSX importing
5
+ * Multiple sheet support
6
+ * Automatic header and start-of-data detection
7
+ * Value coercion to :string, :integer, :float, :date, and :cents
8
+ * Custom parsing of raw cell values
9
+ * Custom validation of cell values
10
+ * Conditional row filtering
11
+ * Error and warning aggregation, by sheet/row as appropriate
12
+ * Automatic stream-to-file conversion where needed by underlying libs
data/LICENSE ADDED
@@ -0,0 +1,20 @@
1
+ Copyright (c) 2015 Irongaze Consulting LLC
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining
4
+ a copy of this software and associated documentation files (the
5
+ 'Software'), to deal in the Software without restriction, including
6
+ without limitation the rights to use, copy, modify, merge, publish,
7
+ distribute, sublicense, and/or sell copies of the Software, and to
8
+ permit persons to whom the Software is furnished to do so, subject to
9
+ the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be
12
+ included in all copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
15
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
17
+ IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
18
+ CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
19
+ TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
20
+ SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.rdoc ADDED
@@ -0,0 +1,70 @@
1
+ = GEM: iron-import
2
+
3
+ Written by Rob Morris @ Irongaze Consulting LLC (http://irongaze.com)
4
+
5
+ == DESCRIPTION
6
+
7
+ Simple, reliable tabular data import.
8
+
9
+ This gem provides a set of classes to support automating import of tabular data from
10
+ CSV, XLS or XLSX files. Provides help in defining columns, auto-detecting column
11
+ order, pre-parsing data, and error/warning tracking.
12
+
13
+ The Roo/Spreadsheet gems do a great job of providing general purpose spreadsheet reading.
14
+ However, using them with unreliable user-submitted data requires a lot of error checking,
15
+ monkeying with data coercion, etc. At Irongaze, we do a lot of work with growing
16
+ businesses, where Excel files are the lingua franca for all kinds of uses. This gem
17
+ attempts to extract years of experience building one-off importers into a simple library
18
+ for rapid import coding.
19
+
20
+ This is NOT a general-purpose tool for reading spreadsheets. If you want access to
21
+ cell styling, reading underlying formulas, etc., you will be better served building
22
+ a custom importer based on Roo. But if you're looking to take an uploaded CSV file,
23
+ validate and coerce values, then write each row to a database, all the while tracking
24
+ any warnings and errors encountered... well, this is the library for you!
25
+
26
+ IMPORTANT NOTE: this gem is in flux as we work to define the best possible abstraction
27
+ for the task. Breaking changes will be noted by increases in the second-level version,
28
+ i.e., 0.5.0 and 0.5.1 will be compatible, but 0.6.0 will not.
29
+
30
+ == SAMPLE USAGE
31
+
32
+ # Define our importer, with two columns. The importer will look for a row containing
33
+ # "name" and "description" (case insensitively) and automatically determine column
34
+ # order and starting row of the data.
35
+ importer = Importer.build do
36
+ column :name
37
+ column :description
38
+ end
39
+
40
+ # Import the provided file row-by-row if importing succeeds, automatically
41
+ # using the proper library to read CSV data. This same code would work
42
+ # with XLS or XLSX files with no changes to the code.
43
+ if importer.import('/tmp/source.csv')
44
+ importer.process do |row|
45
+ puts row[:name] + ' = ' + row[:description]
46
+ end
47
+ end
48
+
49
+ == REQUIREMENTS
50
+
51
+ Depends on the iron-extensions and iron-dsl gems, and optionally requires the roo gem to support XLS and
52
+ XLSX file import and parsing. Without roo, all you get is CSV.
53
+
54
+ Requires RSpec and roo to build/test.
55
+
56
+ == INSTALLATION
57
+
58
+ To install, simply run:
59
+
60
+ sudo gem install iron-import
61
+
62
+ RVM users can skip the sudo:
63
+
64
+ gem install iron-import
65
+
66
+ Then use
67
+
68
+ require 'iron/import'
69
+
70
+ to require the library code.
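
The automatic header and start-of-data detection described under SAMPLE USAGE can be pictured with a small standalone sketch. This is not the gem's implementation: `detect_header` and its return shape are hypothetical names introduced here, and the sketch assumes the raw data is an array of row arrays and that a header row is the first row in which every column key matches some cell.

```ruby
# Hypothetical helper, not part of iron-import: locate the header row and
# map each column key to the index of the cell whose text matches it.
def detect_header(rows, keys)
  # One anchored, case-insensitive matcher per key, treating underscores
  # as spaces (so :order_id would match "Order ID").
  matchers = keys.map do |key|
    [key, /\A\s*#{Regexp.escape(key.to_s.tr('_', ' '))}\s*\z/i]
  end

  rows.each_with_index do |row, row_index|
    mapping = {}
    matchers.each do |key, regex|
      col_index = row.index { |cell| regex.match(cell.to_s) }
      mapping[key] = col_index if col_index
    end
    # Accept the first row in which every key found a column
    if mapping.size == keys.size
      return { :columns => mapping, :data_start => row_index + 1 }
    end
  end
  nil
end

rows = [
  ['Acme Widgets export', nil],
  ['Description', ' NAME '],
  ['A small widget', 'Widget'],
]
result = detect_header(rows, [:name, :description])
```

Here `result[:columns]` records that the name column is at index 1 and the description column at index 0, regardless of the order in which the keys were declared, and `result[:data_start]` points at the first row after the header.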
data/Version.txt ADDED
@@ -0,0 +1 @@
1
+ 0.5.0
@@ -0,0 +1,177 @@
1
+ class Importer
2
+
3
+ # Columns represent the settings for importing a given column within a Sheet. They do not
4
+ # hold data, rather they capture the settings needed for identifying the column in the header,
5
+ # how to parse and validate each of their cell's data, and so forth.
6
+ #
7
+ # Here's the complete list of column configuration options:
8
+ #
9
+ # Importer.build do
10
+ # column :key do
11
+ # # Set a fixed position - may be a column number or a letter-based
12
+ # # column description, ie 'A' == 1. In most cases, you can leave
13
+ # # this defaulted to nil, which will mean "look for the proper header"
14
+ # position 'C'
15
+ #
16
+ # # Specify a regex to locate the header for this column, defaults to
17
+ # # matching a header whose text is the key (ignoring case and surrounding spaces).
18
+ # header /(price|cost)/i
19
+ #
20
+ # # Tells the data parser what type of data this column contains, one
21
+ # # of :integer, :string, :date, :float, or :cents. Defaults to :string.
22
+ # type :cents
23
+ #
24
+ # # Instead of a type, you can set an explicit parse block. Be aware
25
+ # # that different source types may give you different raw values for what
26
+ # # seems like the "same" source value, for example an Excel source file
27
+ # # will give you a float value for all numeric types, even "integers"
28
+ # parse do |raw_value|
29
+ # raw_value.to_i + 1000
30
+ # end
31
+ #
32
+ # # You can also add a custom validator to check the value and add
33
+ # # an error if it's not within a given range, or whatever:
34
+ # validate do |parsed_value|
35
+ # raise "Out of range" unless (parsed_value > 0 && parsed_value < 5000)
36
+ # end
37
+ # end
38
+ # end
39
+ #
40
+ class Column
41
+
42
+ # Holds load-time data
43
+ class Data
44
+ attr_accessor :index
45
+
46
+ def pos
47
+ @index ? Column::index_to_pos(@index) : 'Unknown'
48
+ end
49
+ end
50
+
51
+ # Core info
52
+ attr_reader :key
53
+ attr_reader :data
54
+
55
+ # Configuration
56
+ dsl_flag :required
57
+ dsl_accessor :header, :position, :type
58
+ dsl_accessor :parse, :validate
59
+
60
+ def self.pos_to_index(pos)
61
+ raise 'Invalid column position: ' + pos.inspect unless pos.is_a?(String) && pos.match(/\A[a-z]{1,3}\z/i)
62
+ vals = pos.upcase.bytes.collect {|b| b - 64}
63
+ total = 0
64
+ multiplier = 1
65
+ vals.reverse.each do |val|
66
+ total += val * multiplier
67
+ multiplier *= 26
68
+ end
69
+ total - 1
70
+ end
71
+
72
+ def self.index_to_pos(index)
73
+ val = index.to_i
74
+ raise 'Invalid column index: ' + index.inspect if (!index.is_a?(Integer) || index.to_i < 0)
75
+
76
+ chars = ('A'..'Z').to_a
77
+ str = ''
78
+ while index > 25
79
+ str = chars[index % 26] + str
80
+ index /= 26
81
+ index -= 1
82
+ end
83
+ str = chars[index] + str
84
+ str
85
+ end
86
+
87
+ def initialize(sheet, key)
88
+ # Save off our info
89
+ @key = key
90
+ @sheet = sheet
91
+ @importer = @sheet.importer
92
+
93
+ # Return it as a string, by default
94
+ @type = :string
95
+
96
+ # By default, we allow empty values
97
+ @required = false
98
+
99
+ # Position can be explicitly set
100
+ @position = nil
101
+
102
+ # By default, don't parse incoming data, just pass it through
103
+ @parse = nil
104
+
105
+ # Default matcher, requires the header text to equal the column key (apart from
106
+ # surrounding whitespace), ignoring case and using underscores as spaces, i.e.
107
+ # :order_id => /\A\s*order id\s*\z/i
108
+ @header = Regexp.new('\A\s*' + Regexp.escape(key.to_s.gsub('_', ' ')) + '\s*\z', Regexp::IGNORECASE)
109
+
110
+ # Reset our state to pre-load status
111
+ reset
112
+ end
113
+
114
+ def build(&block)
115
+ DslProxy.exec(self, &block)
116
+ end
117
+
118
+ def reset
119
+ @data = Data.new
120
+ end
121
+
122
+ # When true, matches either the passed value or the index (if position has been explicitly set)
123
+ def match_header?(text, index)
124
+ res = index == self.fixed_index || (@header && !@header.match(text).nil?)
125
+ # puts "#{@header.inspect} ~ #{text.inspect} => #{res.inspect}"
126
+ res
127
+ end
128
+
129
+ # Use any custom parser defined to process the given value, capturing
130
+ # errors as needed
131
+ def parse_value(row, val)
132
+ return val if @parse.nil?
133
+ begin
134
+ @parse.call(val)
135
+ rescue StandardError => e
136
+ @importer.add_error(row, "Error parsing #{self}: #{e}")
137
+ nil
138
+ end
139
+ end
140
+
141
+ def validate_value(row, val)
142
+ return unless @validate
143
+ begin
144
+ @validate.call(val)
145
+ true
146
+ rescue StandardError => e
147
+ @importer.add_error(row, "Validation error in #{self}: #{e}")
148
+ false
149
+ end
150
+ end
151
+
152
+ def fixed_index
153
+ return nil unless @position
154
+ if @position.is_a?(Integer)
155
+ @position - 1
156
+ elsif @position.is_a?(String)
157
+ Column.pos_to_index(@position)
158
+ end
159
+ end
160
+
161
+ def to_s
162
+ 'Column ' + @data.pos
163
+ end
164
+
165
+ def to_a
166
+ @sheet.data.rows.collect {|r| r[@key] }
167
+ end
168
+
169
+ def to_h
170
+ res = {}
171
+ @sheet.data.rows.each {|r| res[r.num] = r[@key] }
172
+ res
173
+ end
174
+
175
+ end
176
+
177
+ end
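
The letter/index conversion used for fixed column positions can be tried out in isolation. The sketch below restates the same arithmetic as free-standing methods (the names mirror `Column.pos_to_index` and `Column.index_to_pos`, but this is a rewrite for experimentation, not the gem's code). It treats the letters as bijective base-26 digits with A = 1 through Z = 26, and uses `Integer` rather than `Fixnum` so that it runs on current Ruby versions.

```ruby
# Standalone restatement of the conversion logic; zero-based indexes,
# so 'A' => 0, 'Z' => 25, 'AA' => 26.
def pos_to_index(pos)
  unless pos.is_a?(String) && pos.match(/\A[a-z]{1,3}\z/i)
    raise ArgumentError, "Invalid column position: #{pos.inspect}"
  end
  # Accumulate left to right: each letter is a digit in 1..26
  total = pos.upcase.bytes.inject(0) { |sum, byte| sum * 26 + (byte - 64) }
  total - 1
end

def index_to_pos(index)
  unless index.is_a?(Integer) && index >= 0
    raise ArgumentError, "Invalid column index: #{index.inspect}"
  end
  str = ''
  loop do
    # Emit the least significant letter, then shift to the next digit
    str = (65 + index % 26).chr + str
    index = index / 26 - 1
    break if index < 0
  end
  str
end

examples = ['A', 'C', 'Z', 'AA', 'ZZ'].map { |pos| [pos, pos_to_index(pos)] }
```

The `- 1` after each division is what distinguishes this from ordinary base 26: there is no zero digit, so 'AA' follows 'Z' directly.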
@@ -0,0 +1,26 @@
1
+ require 'csv'
2
+
3
+ class Importer
4
+
5
+ class CsvReader < DataReader
6
+
7
+ def initialize(importer)
8
+ super(importer, :csv)
9
+ end
10
+
11
+ def load_stream(stream)
12
+ text = stream.read
13
+ encoding = @importer.encoding || 'UTF-8'
14
+ raw_rows = CSV.parse(text, :encoding => "#{encoding}:UTF-8")
15
+ @importer.default_sheet.parse_raw_data(raw_rows)
16
+ end
17
+
18
+ def load_file(path)
19
+ encoding = @importer.encoding || 'UTF-8'
20
+ raw_rows = CSV.read(path, :encoding => "#{encoding}:UTF-8")
21
+ @importer.default_sheet.parse_raw_data(raw_rows)
22
+ end
23
+
24
+ end
25
+
26
+ end
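
For readers who want to see what the encoding handling amounts to, here is a minimal sketch that makes the transcoding step explicit before parsing. It is an alternative formulation, not what `CsvReader` does internally: `read_csv_rows` is a hypothetical helper, and it assumes the caller already knows the source encoding of the stream.

```ruby
require 'csv'
require 'stringio'

# Hypothetical helper: read a stream in a known source encoding and
# return rows whose cells are UTF-8 strings.
def read_csv_rows(stream, source_encoding = 'UTF-8')
  text = stream.read
  # Tag the raw bytes with their real encoding, then convert to UTF-8
  text = text.dup.force_encoding(source_encoding).encode('UTF-8')
  CSV.parse(text)
end

# "café" encoded as ISO-8859-1, where the e-acute is the single byte 0xE9
latin1 = "name,price\ncaf\xE9,1\n".b
rows = read_csv_rows(StringIO.new(latin1), 'ISO-8859-1')
```

Doing the conversion by hand like this surfaces encoding problems as an explicit exception from `String#encode` rather than as garbled cell values later in the import.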
@@ -0,0 +1,176 @@
1
+ class Importer
2
+
3
+ # Base class for our input reading - dealing with the raw file/stream,
4
+ # and extracting raw values. In addition, we provide the base
5
+ # data coercion/parsing for our derived classes.
6
+ class DataReader
7
+
8
+ # Attributes
9
+ attr_reader :format
10
+
11
+ def self.verify_roo!
12
+ if Gem::Specification.find_all_by_name('roo', '~> 1.13.0').empty?
13
+ raise "You are attempting to use the iron-import gem to import an Excel file. Doing so requires installing the roo gem, version 1.13.x (~> 1.13.0)."
14
+ end
15
+ end
16
+
17
+ def self.for_format(importer, format)
18
+ case format
19
+ when :csv
20
+ CsvReader.new(importer)
21
+ when :xls
22
+ verify_roo!
23
+ XlsReader.new(importer)
24
+ when :xlsx
25
+ verify_roo!
26
+ XlsxReader.new(importer)
27
+ else
28
+ nil
29
+ end
30
+ end
31
+
32
+ def self.for_path(importer, path)
33
+ format = path.to_s.extract(/\.(csv|xlsx?)\z/i)
34
+ if format
35
+ format = format.downcase.to_sym
36
+ for_format(importer, format)
37
+ else
38
+ nil
39
+ end
40
+ end
41
+
42
+ def self.for_stream(importer, stream)
43
+ path = path_from_stream(stream)
44
+ for_path(importer, path)
45
+ end
46
+
47
+ # Try to find the original file name for the given stream,
48
+ # as in the case where a file is uploaded to Rails and we're dealing with an
49
+ # ActionDispatch::Http::UploadedFile.
50
+ def self.path_from_stream(stream)
51
+ if stream.respond_to?(:original_filename)
52
+ stream.original_filename
53
+ elsif stream.respond_to?(:path)
54
+ stream.path
55
+ else
56
+ nil
57
+ end
58
+ end
59
+
60
+ def initialize(importer, format)
61
+ @importer = importer
62
+ @format = format
63
+ @multisheet = true
64
+ end
65
+
66
+ def load(path_or_stream)
67
+ # Figure out what we've been passed, and handle it
68
+ if path_or_stream.respond_to?(:read)
69
+ # We have a stream (open file, upload, whatever)
70
+ if respond_to?(:load_stream)
71
+ # Stream loader defined, run it
72
+ load_stream(path_or_stream)
73
+ else
74
+ # Write to temp file, as some of our readers only read physical files, annoyingly
75
+ file = Tempfile.new(['importer', ".#{format}"])
76
+ file.binmode
77
+ begin
78
+ file.write path_or_stream.read
79
+ file.close
80
+ load_file(file.path)
81
+ ensure
82
+ file.close
83
+ file.unlink
84
+ end
85
+ end
86
+
87
+ elsif path_or_stream.is_a?(String)
88
+ # Assume it's a path
89
+ if respond_to?(:load_file)
90
+ # We're all set, load up the given path
91
+ load_file(path_or_stream)
92
+ else
93
+ # No file handler, so open the file and run the stream processor
94
+ file = File.open(path_or_stream, 'rb')
95
+ load_stream(file)
96
+ end
97
+
98
+ else
99
+ raise "Unable to load data: #{path_or_stream.inspect}"
100
+ end
101
+
102
+ # Return our status
103
+ !@importer.has_errors?
104
+ end
105
+
106
+ # Provides default value parsing/coercion for all derived data readers. Attempts to be clever and
107
+ # handle edge cases like converting '5.00' to 5 when in integer mode, etc. If you find your inputs aren't
108
+ # being parsed correctly, add a custom #parse block on your Column definition.
109
+ def parse_value(val, type)
110
+ return nil if val.nil? || val.to_s == ''
111
+
112
+ case type
113
+ when :string then
114
+ val = val.to_s.strip
115
+ val.blank? ? nil : val
116
+
117
+ when :integer, :int then
118
+ if val.class < Numeric
119
+ # If numeric, verify that there are no decimal places to worry about
120
+ if (val.to_f % 1.0 == 0.0)
121
+ val.to_i
122
+ else
123
+ nil
124
+ end
125
+ else
126
+ # Convert to string, strip off trailing decimal zeros
127
+ val = val.to_s.strip.gsub(/\.0*$/, '')
128
+ if val.integer?
129
+ val.to_i
130
+ else
131
+ nil
132
+ end
133
+ end
134
+
135
+ when :float then
136
+ if val.class < Numeric
137
+ val.to_f
138
+ else
139
+ # Convert to string and strip surrounding whitespace
140
+ val = val.to_s.strip
141
+ if val.match(/\A-?[0-9]+(?:\.[0-9]+)?\z/)
142
+ val.to_f
143
+ else
144
+ nil
145
+ end
146
+ end
147
+
148
+ when :cents then
149
+ if val.is_a?(String)
150
+ val = val.gsub(/\s*\$\s*/, '')
151
+ end
152
+ intval = parse_value(val, :integer)
153
+ if !val.is_a?(Float) && intval
154
+ intval * 100
155
+ else
156
+ floatval = parse_value(val, :float)
157
+ if floatval
158
+ (floatval * 100).round
159
+ else
160
+ nil
161
+ end
162
+ end
163
+
164
+ when :date then
165
+ # Pull out the date part of the string and convert
166
+ date_str = val.to_s.extract(/[0-9]+[\-\/][0-9]+[\-\/][0-9]+/)
167
+ date_str.to_date rescue nil
168
+
169
+ else
170
+ raise "Unknown column type #{type.inspect} - unimplemented?"
171
+ end
172
+ end
173
+
174
+ end
175
+
176
+ end
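
The :cents coercion is the place where binary floating point matters most, because multiplying a decimal amount by 100 can land just below the intended integer. The sketch below is a simplified, standalone take on that branch (`to_cents` is a hypothetical name, and it does not depend on iron-extensions helpers such as `String#integer?`); it rounds instead of truncating when converting a float to cents.

```ruby
# Hypothetical standalone helper, loosely mirroring the :cents branch
def to_cents(val)
  # Drop a dollar sign and any whitespace around it
  val = val.gsub(/\s*\$\s*/, '').strip if val.is_a?(String)
  case val
  when Integer
    val * 100
  when Float
    (val * 100).round
  when /\A-?[0-9]+(?:\.[0-9]+)?\z/
    # Plain decimal string such as "19.99" or "-5"
    (Float(val) * 100).round
  else
    nil
  end
end

truncated = (0.29 * 100).to_i   # 0.29 * 100 is slightly less than 29
rounded   = to_cents(0.29)
```

With truncation the 29-cent amount silently becomes 28, while rounding recovers 29. Applications that need exact decimal arithmetic throughout may prefer `BigDecimal`, which is outside the scope of this sketch.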
@@ -0,0 +1,66 @@
1
+ class Importer
2
+
3
+ class Error
4
+
5
+ attr_reader :sheet, :row, :text
6
+
7
+ def initialize(context, text)
8
+ if context.is_a?(Importer::Sheet)
9
+ @sheet = context
10
+ elsif context.is_a?(Importer::Row)
11
+ @row = context
12
+ @sheet = context.sheet
13
+ end
14
+ @text = text.to_s
15
+ end
16
+
17
+ def summary
18
+ summary = ''
19
+ if @row
20
+ summary += "#{@sheet} #{@row}: "
21
+ elsif @sheet
22
+ summary += "#{@sheet}: "
23
+ end
24
+ summary + @text
25
+ end
26
+
27
+ def to_s
28
+ summary
29
+ end
30
+
31
+ # Returns the level at which this error occurred, one of
32
+ # :row, :sheet, :importer
33
+ def level
34
+ return :row if @row
35
+ return :sheet if @sheet
36
+ return :importer
37
+ end
38
+
39
+ def row_level?
40
+ level == :row
41
+ end
42
+
43
+ def sheet_level?
44
+ level == :sheet
45
+ end
46
+
47
+ def importer_level?
48
+ level == :importer
49
+ end
50
+
51
+ # Returns true if this error is for the given context, where
52
+ # context can be a Row, Sheet or Importer instance.
53
+ def for_context?(context)
54
+ case context
55
+ when Row
56
+ return @row == context
57
+ when Sheet
58
+ return @sheet == context
59
+ else
60
+ return true
61
+ end
62
+ end
63
+
64
+ end
65
+
66
+ end
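
Because every error knows the level at which it occurred, a caller can aggregate errors by level for reporting. The following sketch shows the idea with a stand-in class: `SketchError` is hypothetical and only copies the `level` logic from `Importer::Error`, and plain strings stand in for the Sheet and Row objects.

```ruby
# Hypothetical stand-in for Importer::Error, holding only what is
# needed to classify an error by level.
SketchError = Struct.new(:sheet, :row, :text) do
  def level
    return :row if row
    return :sheet if sheet
    :importer
  end
end

errors = [
  SketchError.new(nil, nil, 'Unable to read file'),
  SketchError.new('Sheet 1', nil, 'Missing header: price'),
  SketchError.new('Sheet 1', 'Row 4', 'Validation error: Out of range'),
  SketchError.new('Sheet 1', 'Row 9', 'Error parsing price'),
]

# Group the errors by level, then count how many fall into each group
by_level = errors.group_by(&:level)
counts = by_level.map { |level, list| [level, list.size] }.to_h
```

A report could then list importer-level problems first, followed by per-sheet and per-row details.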