oxcelix 0.3.0 → 0.3.1
- data/CHANGES +12 -21
- data/README.md +49 -9
- data/README.rdoc +36 -1
- data/lib/oxcelix/numformats.rb +17 -8
- data/lib/oxcelix/sheet.rb +41 -5
- data/oxcelix.gemspec +2 -2
- metadata +2 -2
data/CHANGES
CHANGED
@@ -1,21 +1,12 @@
-
-*
-*
-
-*
-
-
-
-
-*
-
-*
-* Merge branch '0.3.0' of https://github.com/gbiczo/oxcelix into 0.3.0 Sat Nov 23 09:43:59 2013 +0100
-* Added numformats.rb. Cleaned styles.rb from unnecessary comments. Sat Nov 23 09:41:10 2013 +0100
-* Added badge Fri Nov 22 12:20:48 2013 +0100
-* Some speed optimizations. matrixto does not accept fmt parameter any more. fixed typos Wed Nov 13 00:03:44 2013 +0100
-* Cell now has a numformat attribute. A new module called Numformats will contain methods related to numeric formatting. to_r and to_d are now obsolete. Xlsheet init parameter (styles). Stylefile is opened in the workbook. Matrixto gets a :values option (this will be obsoleted shortly). Slight Gemspec and .md description change Mon Nov 4 17:47:17 2013 +0100
-* Initial version of Sheet::to_m method Sat Oct 19 11:06:42 2013 +0200
-* Format array is now a constant (FARY) Fri Oct 18 21:58:39 2013 +0200
-* Fixed typo in README Fri Oct 18 21:51:09 2013 +0200
-* Started working on cell value formats Fri Oct 18 21:50:24 2013 +0200
+0.3.1
+* Sheet now has its very own to_ru and to_fmt methods. Also a to_m method has been added, which returns a Matrix of raw data.
+* Documentation changes.
+0.3.0
+* Number formats edition that includes:
+Cell#to_ru and Cell#to_fmt.
+Numberformats module
+Numberhelper module
+0.2.4
+* Bugfix release
+0.2.2
+* Sheet < Matrix
data/README.md
CHANGED
@@ -1,5 +1,6 @@
 Oxcelix
 =======
+<a href="http://badge.fury.io/rb/oxcelix"><img src="https://badge.fury.io/rb/oxcelix@2x.png" alt="Gem Version" height="18"></a>
 
 Oxcelix - A fast and simple .xlsx file parser
 
@@ -16,20 +17,59 @@ Oxcelix uses the great Ox gem (http://rubygems.org/gems/ox) for fast SAX-parsing
 Synopsis
 --------
 
-To process an xlsx file:
+To process an xlsx file:
 
-`require 'oxcelix'`
+`require 'oxcelix'`
+`w = Oxcelix::Workbook.new('whatever.xlsx')`
 
-
+To omit certain sheets:
 
-
+`w = Oxcelix::Workbook.new('whatever.xlsx', :exclude => ['sheet1', 'sheet2'])`
 
-
+Include only some of the sheets:
 
-
+`w = Oxcelix::Workbook.new('whatever.xlsx', :include => ['sheet1', 'sheet2', 'sheet3'])`
 
-
+To have the values of the merged cells copied over the mergegroup:
 
-
+`w = Oxcelix::Workbook.new('whatever.xlsx', :copymerge => true)`
+
+Convert a Sheet object into a collection of ruby values or formatted ruby strings:
+`require 'oxcelix'`
+`w = Oxcelix::Workbook.new('whatever.xlsx', :copymerge => true)`
+`w.sheets[0].to_ru # returns a Matrix of DateTime, Integer, etc objects`
+`w.sheets[0].to_fmt # returns a Matrix of formatted Strings based on the above.`
 
-
+Installation
+------------
+
+`gem install oxcelix`
+
+
+Advantages over other Excel parsers
+-----------------------------------
+
+Excel file processing involves XML document parsing. Usually, this is achieved by some XML library such as Nokogiri[http://nokogiri.org].
+
+
+The main drawbacks of this approach are memory usage and speed. The resulting object tree will be roughly as big
+as the original file, and during the parsing, they will both be stored in the memory, which can
+cause quite some complications when processing huge files. Also, interpreting every bit of an excel spreadsheet
+will slow down unnecessarily the process, if we only need the data stored in that file.
+
+
+The solution for the memory-consumption problem is SAX stream parsing.
+
+
+Oxcelix uses the SAX parser offered by Peter Ohler's Ox gem. Ox is fast and powerful enough to solve the speed issue.
+
+
+For a comparison of XML parsers, please consult the Ox homepage[http://www.ohler.com/dev/xml_with_ruby/xml_with_ruby.html].
+
+TODO
+----
+* Implement RawWorkbook, ValueWorkbook, FormattedWorkbook
+* include/exclude mechanism should extend to cell areas inside Sheet objects
+* Possible speedups
+* Further improvement to the formatting algorithms. Theoretically, to_fmt should be able to
+split conditional-formatting strings and to display e.g. thousands separated number strings
data/README.rdoc
CHANGED
@@ -23,10 +23,45 @@ To omit certain sheets to be processed:
 
 w = Oxcelix::Workbook.new('whatever.xlsx', :exclude => ['sheet1', 'sheet2'])
 
-
+Include only some of the sheets:
 
 w = Oxcelix::Workbook.new('whatever.xlsx', :include => ['sheet1', 'sheet2', 'sheet3'])
 
 To have the values of the merged cells copied over the mergegroup:
 
 w = Oxcelix::Workbook.new('whatever.xlsx', :copymerge => true)
+
+Convert a Sheet object into a collection of ruby values or formatted ruby strings:
+require 'oxcelix'
+w = Oxcelix::Workbook.new('whatever.xlsx', :copymerge => true)
+w.sheets[0].to_ru # returns a Matrix of DateTime, Integer, etc objects
+w.sheets[0].to_fmt # returns a Matrix of formatted Strings based on the above.
+
+== Installation
+
+gem install oxcelix
+
+== Advantages over other Excel parsers
+Excel file processing involves XML document parsing. Usually, this is achieved by some XML library such as Nokogiri[http://nokogiri.org].
+
+
+The main drawbacks of this approach are memory usage and speed. The resulting object tree will be roughly as big
+as the original file, and during the parsing, they will both be stored in the memory, which can
+cause quite some complications when processing huge files. Also, interpreting every bit of an excel spreadsheet
+will slow down unnecessarily the process, if we only need the data stored in that file.
+
+
+The solution for the memory-consumption problem is SAX stream parsing.
+
+
+Oxcelix uses the SAX parser offered by Peter Ohler's Ox gem. Ox is fast and powerful enough to solve the speed issue.
+
+
+For a comparison of XML parsers, please consult the Ox homepage[http://www.ohler.com/dev/xml_with_ruby/xml_with_ruby.html].
+
+== TODO
+* Implement RawWorkbook, ValueWorkbook, FormattedWorkbook
+* include/exclude mechanism should extend to cell areas inside Sheet objects
+* Possible speedups
+* Further improvement to the formatting algorithms. Theoretically, to_fmt should be able to
+split conditional-formatting strings and to display e.g. thousands separated number strings
data/lib/oxcelix/numformats.rb
CHANGED
@@ -76,30 +76,39 @@ module Oxcelix
   module Numberhelper
     include Numformats
     # Get the cell's value and excel format string and return a string, a ruby Numeric or a DateTime object accordingly
+    # @return [Object] A ruby object that holds and represents the value stored in the cell. Conversion is based on cell formatting.
+    # @example Get the value of a cell:
+    # c = w.sheets[0]["B3"] # => <Oxcelix::Cell:0x00000002a5b368 @xlcoords="A3", @style="84", @type="n", @value="41155", @numformat=14>
+    # c.to_ru # => <DateTime: 2012-09-03T00:00:00+00:00 ((2456174j,0s,0n),+0s,2299161j)>
+    #
     def to_ru
       if !@value.numeric? || Numformats::Formatarray[@numformat.to_i][:xl] == nil || Numformats::Formatarray[@numformat.to_i][:xl].downcase == "general"
         return @value
       end
-      if Numformats::Formatarray[@numformat.to_i][:cls] == '
-        return eval @value
-      elsif Numformats::Formatarray[@numformat.to_i][:cls] == 'date'
+      if Numformats::Formatarray[@numformat.to_i][:cls] == 'date'
         return DateTime.new(1899, 12, 30) + (eval @value)
-      else
-
+      else Numformats::Formatarray[@numformat.to_i][:cls] == 'numeric' || Numformats::Formatarray[@numformat.to_i][:cls] == 'rational'
+        return eval @value rescue @value
       end
     end
 
     # Get the cell's value, convert it with to_ru and finally, format it based on the value's type.
+    # @return [String] Value gets formatted depending on its class. If it is a DateTime, the #DateTime.strftime method is used,
+    # if it holds a number, the #Kernel::sprintf is run.
+    # @example Get the formatted value of a cell:
+    # c = w.sheets[0]["B3"] # => <Oxcelix::Cell:0x00000002a5b368 @xlcoords="A3", @style="84", @type="n", @value="41155", @numformat=14>
+    # c.to_fmt # => "3/9/2012"
+    #
     def to_fmt
       begin
         if Numformats::Formatarray[@numformat][:cls] == 'date'
-
+          self.to_ru.strftime(Numformats::Formatarray[@numformat][:ostring]) rescue @value
         elsif Numformats::Formatarray[@numformat.to_i][:cls] == 'numeric' || Numformats::Formatarray[@numformat.to_i][:cls] == 'rational'
-
+          sprintf(Numformats::Formatarray[@numformat][:ostring], self.to_ru) rescue @value
         else
           return @value
         end
       end
     end
   end
-end
+end
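The date branch of `to_ru` above adds the cell's serial number to a 1899-12-30 epoch, and the new `@example` shows `"41155"` becoming 2012-09-03. As a minimal sketch of the same conversion without `eval`, assuming the raw cell value is a decimal string: the helper names `serial_to_datetime` and `numeric_value` are hypothetical and are not part of oxcelix.

```ruby
require 'date'

# Epoch taken from the diff: DateTime.new(1899, 12, 30).
EXCEL_EPOCH = DateTime.new(1899, 12, 30)

# Hypothetical helper (not oxcelix API): turn an Excel serial number,
# given as a String such as "41155" or "41155.5", into a DateTime.
# Kernel#Rational parses the string, so no eval is needed.
def serial_to_datetime(raw)
  EXCEL_EPOCH + Rational(raw)
end

# Hypothetical helper (not oxcelix API): parse a numeric String into an
# Integer or Float, falling back to the untouched String.
def numeric_value(raw)
  Integer(raw, 10)
rescue ArgumentError
  begin
    Float(raw)
  rescue ArgumentError
    raw
  end
end

date = serial_to_datetime('41155')
puts date.strftime('%-d/%-m/%Y')   # day/month/year without zero padding
puts numeric_value('3.5').inspect
```

Parsing with `Rational`, `Integer` and `Float` only accepts numeric text, whereas `eval` would execute whatever string the spreadsheet happens to contain.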
data/lib/oxcelix/sheet.rb
CHANGED
@@ -3,6 +3,7 @@ module Oxcelix
   # The Sheet class represents an excel sheet.
   class Sheet < Matrix
     include Cellhelper
+    include Numberhelper
     # @!attribute [rw] name
     # @return [String] Sheet name
     # @!attribute [rw] sheetId
@@ -25,15 +26,50 @@ module Oxcelix
         super(i,j[0])
       end
     end
-
+
+    #The to_m method returns a simple Matrix object filled with the raw values of the original Sheet object.
+    # @return [Matrix] a collection of string values (the former #Cell::value)
     def to_m(*attrs)
-      m=Matrix.build(self.
-      self.
+      m=Matrix.build(self.row_size, self.column_size){nil}
+      self.each_with_index do |x, row, col|
         if attrs.size == 0 || attrs.nil?
-          m[row, col]=x.value
+          m[row, col] = x.value
+        end
+      end
+      return m
+    end
+
+    # The to_ru method returns a Matrix of "rubified" values. It basically builds a new Matrix
+    # and puts the result of the #Cell::to_ru method of every cell of the original sheet in
+    # the corresponding Matrix cell.
+    # @return [Matrix] a collection of ruby objects (#Integers, #Floats, #DateTimes, #Rationals, #Strings)
+    def to_ru
+      m=Matrix.build(self.row_size, self.column_size){nil}
+      self.each_with_index do |x, row, col|
+        if x.nil? || x.value.nil?
+          m[row, col] = nil
+        else
+          m[row, col] = x.to_ru
         end
       end
       return m
     end
-
+
+    # The to_fmt method returns a Matrix of "formatted" values. It basically builds a new Matrix
+    # and puts the result of the #Cell::to_fmt method of every cell of the original sheet in
+    # the corresponding Matrix cell. The #Cell::to_fmt will pass the original values to to_ru, and then
+    # depending on the value, will run strftime on DateTime objects and sprintf on numeric types.
+    # @return [Matrix] a collection of Strings
+    def to_fmt
+      m=Matrix.build(self.row_size, self.column_size){nil}
+      self.each_with_index do |x, row, col|
+        if x.nil? || x.value.nil?
+          m[row, col] = nil
+        else
+          m[row, col] = x.to_fmt
+        end
+      end
+      return m
+    end
+  end
 end
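The three new Sheet methods share one pattern: build a Matrix with the sheet's dimensions and fill each position from the matching cell, leaving empty cells as nil. A minimal sketch of that pattern with the standard `matrix` library follows. `DemoCell` and `map_cells` are hypothetical names used only for illustration, and the block form of `Matrix.build` stands in for the element assignment used in the diff. It uses `row_count` and `column_count`; the diff calls `row_size` and `column_size`.

```ruby
require 'matrix'

# Hypothetical stand-in for Oxcelix::Cell; only the value reader matters here.
DemoCell = Struct.new(:value)

# Hypothetical helper (not oxcelix API): return a new Matrix with the same
# shape as sheet. Empty cells stay nil; every other cell is passed to the block.
def map_cells(sheet)
  Matrix.build(sheet.row_count, sheet.column_count) do |row, col|
    cell = sheet[row, col]
    cell.nil? || cell.value.nil? ? nil : yield(cell)
  end
end

sheet = Matrix[
  [DemoCell.new('1'), nil],
  [DemoCell.new(nil), DemoCell.new('4')]
]

raw  = map_cells(sheet) { |cell| cell.value }               # like to_m
ints = map_cells(sheet) { |cell| Integer(cell.value, 10) }  # like to_ru, integers only

p raw.to_a
p ints.to_a
```

Passing the per-cell conversion as a block keeps the nil handling in one place instead of repeating the loop once per output format.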
data/oxcelix.gemspec
CHANGED
@@ -3,8 +3,8 @@
 require 'rake'
 Gem::Specification.new do |s|
   s.name = 'oxcelix'
-  s.version = '0.3.
-  s.date = '2013-
+  s.version = '0.3.1'
+  s.date = '2013-12-07'
   s.summary = 'A fast Excel 2007/2010 file parser'
   s.description = 'A fast Excel 2007/2010 (.xlsx) file parser that returns a collection of Matrix objects'
   s.authors = 'Giovanni Biczo'
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: oxcelix
 version: !ruby/object:Gem::Version
-  version: 0.3.
+  version: 0.3.1
 prerelease:
 platform: ruby
 authors:
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2013-
+date: 2013-12-07 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: ox