oxcelix 0.3.0 → 0.3.1
- data/CHANGES +12 -21
- data/README.md +49 -9
- data/README.rdoc +36 -1
- data/lib/oxcelix/numformats.rb +17 -8
- data/lib/oxcelix/sheet.rb +41 -5
- data/oxcelix.gemspec +2 -2
- metadata +2 -2
data/CHANGES
CHANGED
@@ -1,21 +1,12 @@
|
|
1
|
-
|
2
|
-
*
|
3
|
-
*
|
4
|
-
|
5
|
-
*
|
6
|
-
|
7
|
-
|
8
|
-
|
9
|
-
|
10
|
-
*
|
11
|
-
|
12
|
-
*
|
13
|
-
* Merge branch '0.3.0' of https://github.com/gbiczo/oxcelix into 0.3.0 Sat Nov 23 09:43:59 2013 +0100
|
14
|
-
* Added numformats.rb. Cleaned styles.rb from unnecessary comments. Sat Nov 23 09:41:10 2013 +0100
|
15
|
-
* Added badge Fri Nov 22 12:20:48 2013 +0100
|
16
|
-
* Some speed optimizations. matrixto does not accept fmt parameter any more. fixed typos Wed Nov 13 00:03:44 2013 +0100
|
17
|
-
* Cell now has a numformat attribute. A new module called Numformats will contain methods related to numeric formatting. to_r and to_d are now obsolete. Xlsheet init parameter (styles). Stylefile is opened in the workbook. Matrixto gets a :values option (this will be obsoleted shortly). Slight Gemspec and .md description change Mon Nov 4 17:47:17 2013 +0100
|
18
|
-
* Initial version of Sheet::to_m method Sat Oct 19 11:06:42 2013 +0200
|
19
|
-
* Format array is now a constant (FARY) Fri Oct 18 21:58:39 2013 +0200
|
20
|
-
* Fixed typo in README Fri Oct 18 21:51:09 2013 +0200
|
21
|
-
* Started working on cell value formats Fri Oct 18 21:50:24 2013 +0200
|
1
|
+
0.3.1
|
2
|
+
* Sheet now has its very own to_ru and to_fmt methods. Also a to_m method has been added, which returns a Matrix of raw data.
|
3
|
+
* Documentation changes.
|
4
|
+
0.3.0
|
5
|
+
* Number formats edition that includes:
|
6
|
+
Cell#to_ru and Cell#to_fmt.
|
7
|
+
Numberformats module
|
8
|
+
Numberhelper module
|
9
|
+
0.2.4
|
10
|
+
* Bugfix release
|
11
|
+
0.2.2
|
12
|
+
* Sheet < Matrix
|
data/README.md
CHANGED
@@ -1,5 +1,6 @@
|
|
1
1
|
Oxcelix
|
2
2
|
=======
|
3
|
+
<a href="http://badge.fury.io/rb/oxcelix"><img src="https://badge.fury.io/rb/oxcelix@2x.png" alt="Gem Version" height="18"></a>
|
3
4
|
|
4
5
|
Oxcelix - A fast and simple .xlsx file parser
|
5
6
|
|
@@ -16,20 +17,59 @@ Oxcelix uses the great Ox gem (http://rubygems.org/gems/ox) for fast SAX-parsing
|
|
16
17
|
Synopsis
|
17
18
|
--------
|
18
19
|
|
19
|
-
To process an xlsx file:
|
20
|
+
To process an xlsx file:
|
20
21
|
|
21
|
-
`require 'oxcelix'`
|
22
|
+
`require 'oxcelix'`
|
23
|
+
`w = Oxcelix::Workbook.new('whatever.xlsx')`
|
22
24
|
|
23
|
-
|
25
|
+
To omit certain sheets:
|
24
26
|
|
25
|
-
|
27
|
+
`w = Oxcelix::Workbook.new('whatever.xlsx', :exclude => ['sheet1', 'sheet2'])`
|
26
28
|
|
27
|
-
|
29
|
+
Include only some of the sheets:
|
28
30
|
|
29
|
-
|
31
|
+
`w = Oxcelix::Workbook.new('whatever.xlsx', :include => ['sheet1', 'sheet2', 'sheet3'])`
|
30
32
|
|
31
|
-
|
33
|
+
To have the values of the merged cells copied over the mergegroup:
|
32
34
|
|
33
|
-
|
35
|
+
`w = Oxcelix::Workbook.new('whatever.xlsx', :copymerge => true)`
|
36
|
+
|
37
|
+
Convert a Sheet object into a collection of ruby values or formatted ruby strings:
|
38
|
+
`require 'oxcelix'`
|
39
|
+
`w = Oxcelix::Workbook.new('whatever.xlsx', :copymerge => true)`
|
40
|
+
`w.sheets[0].to_ru # returns a Matrix of DateTime, Integer, etc objects`
|
41
|
+
`w.sheets[0].to_fmt # returns a Matrix of formatted Strings based on the above.`
|
34
42
|
|
35
|
-
|
43
|
+
Installation
|
44
|
+
------------
|
45
|
+
|
46
|
+
`gem install oxcelix`
|
47
|
+
|
48
|
+
|
49
|
+
Advantages over other Excel parsers
|
50
|
+
-----------------------------------
|
51
|
+
|
52
|
+
Excel file processing involves XML document parsing. Usually, this is achieved by some XML library such as Nokogiri[http://nokogiri.org].
|
53
|
+
|
54
|
+
|
55
|
+
The main drawbacks of this approach are memory usage and speed. The resulting object tree will be roughly as big
|
56
|
+
as the original file, and during the parsing, they will both be stored in the memory, which can
|
57
|
+
cause quite some complications when processing huge files. Also, interpreting every bit of an excel spreadsheet
|
58
|
+
will unnecessarily slow down the process if we only need the data stored in that file.
|
59
|
+
|
60
|
+
|
61
|
+
The solution for the memory-consumption problem is SAX stream parsing.
|
62
|
+
|
63
|
+
|
64
|
+
Oxcelix uses the SAX parser offered by Peter Ohler's Ox gem. Ox is fast and powerful enough to solve the speed issue.
|
65
|
+
|
66
|
+
|
67
|
+
For a comparison of XML parsers, please consult the Ox homepage[http://www.ohler.com/dev/xml_with_ruby/xml_with_ruby.html].
|
68
|
+
|
69
|
+
TODO
|
70
|
+
----
|
71
|
+
* Implement RawWorkbook, ValueWorkbook, FormattedWorkbook
|
72
|
+
* include/exclude mechanism should extend to cell areas inside Sheet objects
|
73
|
+
* Possible speedups
|
74
|
+
* Further improvement to the formatting algorithms. Theoretically, to_fmt should be able to
|
75
|
+
split conditional-formatting strings and to display e.g. thousands separated number strings
|
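The README's argument for SAX stream parsing can be illustrated without the gem itself. The sketch below is not Oxcelix code and does not use Ox: it assumes Ruby's bundled REXML stream parser as a stand-in, and the `ValueCollector` class and the `SAMPLE_XML` fragment are hypothetical names invented for illustration. It shows the general idea only: the parser fires callbacks as it reads, so just the values of interest are kept and no full document tree is built.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener: keeps the text of every <v> (cell value) element
# and ignores everything else the parser streams past.
class ValueCollector
  include REXML::StreamListener
  attr_reader :values

  def initialize
    @values = []
    @in_value = false
  end

  def tag_start(name, _attrs)
    @in_value = true if name == 'v'
  end

  def text(data)
    @values << data if @in_value
  end

  def tag_end(name)
    @in_value = false if name == 'v'
  end
end

# Made-up fragment loosely shaped like worksheet XML.
SAMPLE_XML = '<sheetData><row r="1"><c r="A1"><v>41155</v></c>' \
             '<c r="B1"><v>3.5</v></c></row></sheetData>'

collector = ValueCollector.new
REXML::Document.parse_stream(SAMPLE_XML, collector)
p collector.values # prints ["41155", "3.5"]
```

An Ox-based handler would follow the same callback pattern; REXML is used here only because it ships with Ruby.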
data/README.rdoc
CHANGED
@@ -23,10 +23,45 @@ To omit certain sheets to be processed:
|
|
23
23
|
|
24
24
|
w = Oxcelix::Workbook.new('whatever.xlsx', :exclude => ['sheet1', 'sheet2'])
|
25
25
|
|
26
|
-
|
26
|
+
Include only some of the sheets:
|
27
27
|
|
28
28
|
w = Oxcelix::Workbook.new('whatever.xlsx', :include => ['sheet1', 'sheet2', 'sheet3'])
|
29
29
|
|
30
30
|
To have the values of the merged cells copied over the mergegroup:
|
31
31
|
|
32
32
|
w = Oxcelix::Workbook.new('whatever.xlsx', :copymerge => true)
|
33
|
+
|
34
|
+
Convert a Sheet object into a collection of ruby values or formatted ruby strings:
|
35
|
+
require 'oxcelix'
|
36
|
+
w = Oxcelix::Workbook.new('whatever.xlsx', :copymerge => true)
|
37
|
+
w.sheets[0].to_ru # returns a Matrix of DateTime, Integer, etc objects
|
38
|
+
w.sheets[0].to_fmt # returns a Matrix of formatted Strings based on the above.
|
39
|
+
|
40
|
+
== Installation
|
41
|
+
|
42
|
+
gem install oxcelix
|
43
|
+
|
44
|
+
== Advantages over other Excel parsers
|
45
|
+
Excel file processing involves XML document parsing. Usually, this is achieved by some XML library such as Nokogiri[http://nokogiri.org].
|
46
|
+
|
47
|
+
|
48
|
+
The main drawbacks of this approach are memory usage and speed. The resulting object tree will be roughly as big
|
49
|
+
as the original file, and during the parsing, they will both be stored in the memory, which can
|
50
|
+
cause quite some complications when processing huge files. Also, interpreting every bit of an excel spreadsheet
|
51
|
+
will unnecessarily slow down the process if we only need the data stored in that file.
|
52
|
+
|
53
|
+
|
54
|
+
The solution for the memory-consumption problem is SAX stream parsing.
|
55
|
+
|
56
|
+
|
57
|
+
Oxcelix uses the SAX parser offered by Peter Ohler's Ox gem. Ox is fast and powerful enough to solve the speed issue.
|
58
|
+
|
59
|
+
|
60
|
+
For a comparison of XML parsers, please consult the Ox homepage[http://www.ohler.com/dev/xml_with_ruby/xml_with_ruby.html].
|
61
|
+
|
62
|
+
== TODO
|
63
|
+
* Implement RawWorkbook, ValueWorkbook, FormattedWorkbook
|
64
|
+
* include/exclude mechanism should extend to cell areas inside Sheet objects
|
65
|
+
* Possible speedups
|
66
|
+
* Further improvement to the formatting algorithms. Theoretically, to_fmt should be able to
|
67
|
+
split conditional-formatting strings and to display e.g. thousands separated number strings
|
data/lib/oxcelix/numformats.rb
CHANGED
@@ -76,30 +76,39 @@ module Oxcelix
|
|
76
76
|
module Numberhelper
|
77
77
|
include Numformats
|
78
78
|
# Get the cell's value and excel format string and return a string, a ruby Numeric or a DateTime object accordingly
|
79
|
+
# @return [Object] A ruby object that holds and represents the value stored in the cell. Conversion is based on cell formatting.
|
80
|
+
# @example Get the value of a cell:
|
81
|
+
# c = w.sheets[0]["B3"] # => <Oxcelix::Cell:0x00000002a5b368 @xlcoords="A3", @style="84", @type="n", @value="41155", @numformat=14>
|
82
|
+
# c.to_ru # => <DateTime: 2012-09-03T00:00:00+00:00 ((2456174j,0s,0n),+0s,2299161j)>
|
83
|
+
#
|
79
84
|
def to_ru
|
80
85
|
if !@value.numeric? || Numformats::Formatarray[@numformat.to_i][:xl] == nil || Numformats::Formatarray[@numformat.to_i][:xl].downcase == "general"
|
81
86
|
return @value
|
82
87
|
end
|
83
|
-
if Numformats::Formatarray[@numformat.to_i][:cls] == '
|
84
|
-
return eval @value
|
85
|
-
elsif Numformats::Formatarray[@numformat.to_i][:cls] == 'date'
|
88
|
+
if Numformats::Formatarray[@numformat.to_i][:cls] == 'date'
|
86
89
|
return DateTime.new(1899, 12, 30) + (eval @value)
|
87
|
-
else
|
88
|
-
|
90
|
+
else Numformats::Formatarray[@numformat.to_i][:cls] == 'numeric' || Numformats::Formatarray[@numformat.to_i][:cls] == 'rational'
|
91
|
+
return eval @value rescue @value
|
89
92
|
end
|
90
93
|
end
|
91
94
|
|
92
95
|
# Get the cell's value, convert it with to_ru and finally, format it based on the value's type.
|
96
|
+
# @return [String] Value gets formatted depending on its class. If it is a DateTime, the #DateTime.strftime method is used,
|
97
|
+
# if it holds a number, the #Kernel::sprintf is run.
|
98
|
+
# @example Get the formatted value of a cell:
|
99
|
+
# c = w.sheets[0]["B3"] # => <Oxcelix::Cell:0x00000002a5b368 @xlcoords="A3", @style="84", @type="n", @value="41155", @numformat=14>
|
100
|
+
# c.to_fmt # => "3/9/2012"
|
101
|
+
#
|
93
102
|
def to_fmt
|
94
103
|
begin
|
95
104
|
if Numformats::Formatarray[@numformat][:cls] == 'date'
|
96
|
-
|
105
|
+
self.to_ru.strftime(Numformats::Formatarray[@numformat][:ostring]) rescue @value
|
97
106
|
elsif Numformats::Formatarray[@numformat.to_i][:cls] == 'numeric' || Numformats::Formatarray[@numformat.to_i][:cls] == 'rational'
|
98
|
-
|
107
|
+
sprintf(Numformats::Formatarray[@numformat][:ostring], self.to_ru) rescue @value
|
99
108
|
else
|
100
109
|
return @value
|
101
110
|
end
|
102
111
|
end
|
103
112
|
end
|
104
113
|
end
|
105
|
-
end
|
114
|
+
end
|
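The date branch of `to_ru` above adds the cell's numeric value to an epoch of 1899-12-30, and `to_fmt` then hands the result to `strftime` or `sprintf`. A minimal sketch of that arithmetic follows, assuming the raw value is a plain numeric string. `EXCEL_EPOCH`, `parse_number`, and `serial_to_datetime` are hypothetical helpers, not part of the gem, and the two format strings are illustrative rather than entries from `Formatarray`. The sketch parses with `Integer`/`Float` instead of `eval`; that is a substitution made here to avoid evaluating arbitrary cell content, not what the diff does.

```ruby
require 'date'

# Same epoch the diff uses in to_ru.
EXCEL_EPOCH = DateTime.new(1899, 12, 30)

# Hypothetical helper: turn a numeric string into an Integer or Float
# without eval. Raises ArgumentError for non-numeric input.
def parse_number(raw)
  Integer(raw, 10)
rescue ArgumentError
  Float(raw)
end

# Hypothetical helper: Excel serial number (as a string) to DateTime.
def serial_to_datetime(raw)
  EXCEL_EPOCH + parse_number(raw)
end

dt = serial_to_datetime('41155')
puts dt.strftime('%Y-%m-%d')                  # prints 2012-09-03
puts sprintf('%.2f', parse_number('3.14159')) # prints 3.14
```

The serial `41155` is the value shown in the `@example` comments above, and it lands on the same date as the `DateTime` shown there.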
data/lib/oxcelix/sheet.rb
CHANGED
@@ -3,6 +3,7 @@ module Oxcelix
|
|
3
3
|
# The Sheet class represents an excel sheet.
|
4
4
|
class Sheet < Matrix
|
5
5
|
include Cellhelper
|
6
|
+
include Numberhelper
|
6
7
|
# @!attribute [rw] name
|
7
8
|
# @return [String] Sheet name
|
8
9
|
# @!attribute [rw] sheetId
|
@@ -25,15 +26,50 @@ module Oxcelix
|
|
25
26
|
super(i,j[0])
|
26
27
|
end
|
27
28
|
end
|
28
|
-
|
29
|
+
|
30
|
+
#The to_m method returns a simple Matrix object filled with the raw values of the original Sheet object.
|
31
|
+
# @return [Matrix] a collection of string values (the former #Cell::value)
|
29
32
|
def to_m(*attrs)
|
30
|
-
m=Matrix.build(self.
|
31
|
-
self.
|
33
|
+
m=Matrix.build(self.row_size, self.column_size){nil}
|
34
|
+
self.each_with_index do |x, row, col|
|
32
35
|
if attrs.size == 0 || attrs.nil?
|
33
|
-
m[row, col]=x.value
|
36
|
+
m[row, col] = x.value
|
37
|
+
end
|
38
|
+
end
|
39
|
+
return m
|
40
|
+
end
|
41
|
+
|
42
|
+
# The to_ru method returns a Matrix of "rubified" values. It basically builds a new Matrix
|
43
|
+
# and puts the result of the #Cell::to_ru method of every cell of the original sheet in
|
44
|
+
# the corresponding Matrix cell.
|
45
|
+
# @return [Matrix] a collection of ruby objects (#Integers, #Floats, #DateTimes, #Rationals, #Strings)
|
46
|
+
def to_ru
|
47
|
+
m=Matrix.build(self.row_size, self.column_size){nil}
|
48
|
+
self.each_with_index do |x, row, col|
|
49
|
+
if x.nil? || x.value.nil?
|
50
|
+
m[row, col] = nil
|
51
|
+
else
|
52
|
+
m[row, col] = x.to_ru
|
34
53
|
end
|
35
54
|
end
|
36
55
|
return m
|
37
56
|
end
|
38
|
-
|
57
|
+
|
58
|
+
# The to_fmt method returns a Matrix of "formatted" values. It basically builds a new Matrix
|
59
|
+
# and puts the result of the #Cell::to_fmt method of every cell of the original sheet in
|
60
|
+
# the corresponding Matrix cell. The #Cell::to_fmt will pass the original values to to_ru, and then
|
61
|
+
# depending on the value, will run strftime on DateTime objects and sprintf on numeric types.
|
62
|
+
# @return [Matrix] a collection of Strings
|
63
|
+
def to_fmt
|
64
|
+
m=Matrix.build(self.row_size, self.column_size){nil}
|
65
|
+
self.each_with_index do |x, row, col|
|
66
|
+
if x.nil? || x.value.nil?
|
67
|
+
m[row, col] = nil
|
68
|
+
else
|
69
|
+
m[row, col] = x.to_fmt
|
70
|
+
end
|
71
|
+
end
|
72
|
+
return m
|
73
|
+
end
|
74
|
+
end
|
39
75
|
end
|
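The new `Sheet#to_ru` and `Sheet#to_fmt` methods share one pattern: walk every cell, keep `nil` for empty cells, and otherwise store the converted value in a matrix of the same shape. The sketch below shows that pattern on a plain `Matrix`, under stated assumptions: `FakeCell` is a hypothetical stand-in for `Oxcelix::Cell` with a deliberately simplified `to_ru`, and `rubify` is an invented name. It uses `Matrix#map` rather than element assignment so that it does not depend on `Matrix#[]=` being public, which varies between Ruby versions.

```ruby
require 'matrix'

# Hypothetical stand-in for Oxcelix::Cell: only a raw string value and a
# simplified to_ru that converts integer strings and leaves the rest alone.
FakeCell = Struct.new(:value) do
  def to_ru
    Integer(value, 10)
  rescue ArgumentError
    value
  end
end

# A 2x2 "sheet" holding a number, an empty slot, text, and a valueless cell.
SHEET = Matrix[
  [FakeCell.new('1'),    nil],
  [FakeCell.new('text'), FakeCell.new(nil)]
]

# Same nil handling as the diff: missing cells and cells without a value
# stay nil, everything else is converted.
def rubify(sheet)
  sheet.map { |cell| cell.nil? || cell.value.nil? ? nil : cell.to_ru }
end

p rubify(SHEET).to_a # prints [[1, nil], ["text", nil]]
```

A `to_fmt` counterpart would differ only in the method called on each cell.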
data/oxcelix.gemspec
CHANGED
@@ -3,8 +3,8 @@
|
|
3
3
|
require 'rake'
|
4
4
|
Gem::Specification.new do |s|
|
5
5
|
s.name = 'oxcelix'
|
6
|
-
s.version = '0.3.
|
7
|
-
s.date = '2013-
|
6
|
+
s.version = '0.3.1'
|
7
|
+
s.date = '2013-12-07'
|
8
8
|
s.summary = 'A fast Excel 2007/2010 file parser'
|
9
9
|
s.description = 'A fast Excel 2007/2010 (.xlsx) file parser that returns a collection of Matrix objects'
|
10
10
|
s.authors = 'Giovanni Biczo'
|
metadata
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: oxcelix
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.3.
|
4
|
+
version: 0.3.1
|
5
5
|
prerelease:
|
6
6
|
platform: ruby
|
7
7
|
authors:
|
@@ -9,7 +9,7 @@ authors:
|
|
9
9
|
autorequire:
|
10
10
|
bindir: bin
|
11
11
|
cert_chain: []
|
12
|
-
date: 2013-
|
12
|
+
date: 2013-12-07 00:00:00.000000000 Z
|
13
13
|
dependencies:
|
14
14
|
- !ruby/object:Gem::Dependency
|
15
15
|
name: ox
|