oxcelix 0.3.0 → 0.3.1

data/CHANGES CHANGED
@@ -1,21 +1,12 @@
1
- * Further code cleanup in numformats.rb Tue Dec 3 19:49:03 2013 +0100
2
- * Renamed add to add_custom_formats Tue Dec 3 19:46:15 2013 +0100
3
- * Path cleanup Tue Dec 3 19:07:47 2013 +0100
4
- * Added documentation to Formatarray. Fixed bug which prevented cells containing e.g. String values AND datetime/numeric formatting code to be properly returned when to_ru or to_fmt was invoked. Code cleanup. Tue Dec 3 19:06:39 2013 +0100
5
- * Numberhelper included in Cell class. Various typos corrected. Cell#to_ru and Cell#to_fmt now reflect the inclusion of the Numberformats module. Mon Dec 2 23:27:52 2013 +0100
6
- * to_ru and to_fmt methods are now encapsulated in the Numhelper method. This is not only cleaner, ensures them to be included in Cell and only there. Mon Dec 2 23:04:13 2013 +0100
7
- * Cleared @numformats from Workbook. Formatarray is now a Numformats module constant available to any class including it. Mon Dec 2 22:47:40 2013 +0100
8
- * Numformat is now included in Workbook. Dtmap became a constant. New method Numformats::add converts the temparray to a series of numformat hashes and adds it to the main numformat array. to_ru, to_fmt, datetime, numeric are now Workbook methods. Slight code cleanup. Sun Dec 1 19:28:41 2013 +0100
9
- * Numformats.rb cleanup. Documented new modules/methods. Restructured to_ru and to_fmt. Fri Nov 29 09:23:41 2013 +0100
10
- * Code cleanup. TODO: split format string (;)? cellhelper.rb to be cleaned up Mon Nov 25 00:06:38 2013 +0100
11
- * finalized numeric method with a regex composed by 6 members: prefix, decimals, separator, floats, expo, postfix. This obsoletes the vgrp method as well as the other ones (still in the numformats.rb file.) TODO: delete all unnecessary code and provide a meaningful return value. Mon Nov 25 00:00:52 2013 +0100
12
- * Added Numformats Sat Nov 23 09:48:50 2013 +0100
13
- * Merge branch '0.3.0' of https://github.com/gbiczo/oxcelix into 0.3.0 Sat Nov 23 09:43:59 2013 +0100
14
- * Added numformats.rb. Cleaned styles.rb from unnecessary comments. Sat Nov 23 09:41:10 2013 +0100
15
- * Added badge Fri Nov 22 12:20:48 2013 +0100
16
- * Some speed optimizations. matrixto does not accept fmt parameter any more. fixed typos Wed Nov 13 00:03:44 2013 +0100
17
- * Cell now has a numformat attribute. A new module called Numformats will contain methods related to numeric formatting. to_r and to_d are now obsolete. Xlsheet init parameter (styles). Stylefile is opened in the workbook. Matrixto gets a :values option (this will be obsoleted shortly). Slight Gemspec and .md description change Mon Nov 4 17:47:17 2013 +0100
18
- * Initial version of Sheet::to_m method Sat Oct 19 11:06:42 2013 +0200
19
- * Format array is now a constant (FARY) Fri Oct 18 21:58:39 2013 +0200
20
- * Fixed typo in README Fri Oct 18 21:51:09 2013 +0200
21
- * Started working on cell value formats Fri Oct 18 21:50:24 2013 +0200
1
+ 0.3.1
2
+ * Sheet now has its very own to_ru and to_fmt methods. Also a to_m method has been added, which returns a Matrix of raw data.
3
+ * Documentation changes.
4
+ 0.3.0
5
+ * Number formats edition that includes:
6
+ Cell#to_ru and Cell#to_fmt.
7
+ Numberformats module
8
+ Numberhelper module
9
+ 0.2.4
10
+ * Bugfix release
11
+ 0.2.2
12
+ * Sheet < Matrix
data/README.md CHANGED
@@ -1,5 +1,6 @@
1
1
  Oxcelix
2
2
  =======
3
+ <a href="http://badge.fury.io/rb/oxcelix"><img src="https://badge.fury.io/rb/oxcelix@2x.png" alt="Gem Version" height="18"></a>
3
4
 
4
5
  Oxcelix - A fast and simple .xlsx file parser
5
6
 
@@ -16,20 +17,59 @@ Oxcelix uses the great Ox gem (http://rubygems.org/gems/ox) for fast SAX-parsing
16
17
  Synopsis
17
18
  --------
18
19
 
19
- To process an xlsx file:
20
+ To process an xlsx file:
20
21
 
21
- `require 'oxcelix'`
22
+ `require 'oxcelix'`
23
+ `w = Oxcelix::Workbook.new('whatever.xlsx')`
22
24
 
23
- `w = Oxcelix::Workbook.new('whatever.xlsx')`
25
+ To omit certain sheets:
24
26
 
25
- To omit certain sheets:
27
+ `w = Oxcelix::Workbook.new('whatever.xlsx', :exclude => ['sheet1', 'sheet2'])`
26
28
 
27
- `w = Oxcelix::Workbook.new('whatever.xlsx', :exclude => ['sheet1', 'sheet2'])`
29
+ Include only some of the sheets:
28
30
 
29
- To include only some of the sheets:
31
+ `w = Oxcelix::Workbook.new('whatever.xlsx', :include => ['sheet1', 'sheet2', 'sheet3'])`
30
32
 
31
- `w = Oxcelix::Workbook.new('whatever.xlsx', :include => ['sheet1', 'sheet2', 'sheet3'])`
33
+ To have the values of the merged cells copied over the mergegroup:
32
34
 
33
- To have the values of the merged cells copied over the mergegroup:
35
+ `w = Oxcelix::Workbook.new('whatever.xlsx', :copymerge => true)`
36
+
37
+ Convert a Sheet object into a collection of ruby values or formatted ruby strings:
38
+ `require 'oxcelix'`
39
+ `w = Oxcelix::Workbook.new('whatever.xlsx', :copymerge => true)`
40
+ `w.sheets[0].to_ru # returns a Matrix of DateTime, Integer, etc objects`
41
+ `w.sheets[0].to_fmt # returns a Matrix of formatted Strings based on the above.`
34
42
 
35
- `w = Oxcelix::Workbook.new('whatever.xlsx', :mergecells => true)`
43
+ Installation
44
+ ------------
45
+
46
+ `gem install oxcelix`
47
+
48
+
49
+ Advantages over other Excel parsers
50
+ -----------------------------------
51
+
52
+ Excel file processing involves XML document parsing. Usually, this is achieved by some XML library such as Nokogiri[http://nokogiri.org].
53
+
54
+
55
+ The main drawbacks of this approach are memory usage and speed. The resulting object tree will be roughly as big
56
+ as the original file, and during the parsing, they will both be stored in the memory, which can
57
+ cause quite some complications when processing huge files. Also, interpreting every bit of an excel spreadsheet
58
+ will slow down unnecessarily the process, if we only need the data stored in that file.
59
+
60
+
61
+ The solution for the memory-consumption problem is SAX stream parsing.
62
+
63
+
64
+ Oxcelix uses the SAX parser offered by Peter Ohler's Ox gem. Ox is fast and powerful enough to solve the speed issue.
65
+
66
+
67
+ For a comparison of XML parsers, please consult the Ox homepage[http://www.ohler.com/dev/xml_with_ruby/xml_with_ruby.html].
68
+
69
+ TODO
70
+ ----
71
+ * Implement RawWorkbook, ValueWorkbook, FormattedWorkbook
72
+ * include/exclude mechanism should extend to cell areas inside Sheet objects
73
+ * Possible speedups
74
+ * Further improvement to the formatting algorithms. Theoretically, to_fmt should be able to
75
+ split conditional-formatting strings and to display e.g. thousands separated number strings
data/README.rdoc CHANGED
@@ -23,10 +23,45 @@ To omit certain sheets to be processed:
23
23
 
24
24
  w = Oxcelix::Workbook.new('whatever.xlsx', :exclude => ['sheet1', 'sheet2'])
25
25
 
26
- To include only some of the sheets:
26
+ Include only some of the sheets:
27
27
 
28
28
  w = Oxcelix::Workbook.new('whatever.xlsx', :include => ['sheet1', 'sheet2', 'sheet3'])
29
29
 
30
30
  To have the values of the merged cells copied over the mergegroup:
31
31
 
32
32
  w = Oxcelix::Workbook.new('whatever.xlsx', :copymerge => true)
33
+
34
+ Convert a Sheet object into a collection of ruby values or formatted ruby strings:
35
+ require 'oxcelix'
36
+ w = Oxcelix::Workbook.new('whatever.xlsx', :copymerge => true)
37
+ w.sheets[0].to_ru # returns a Matrix of DateTime, Integer, etc objects
38
+ w.sheets[0].to_fmt # returns a Matrix of formatted Strings based on the above.
39
+
40
+ == Installation
41
+
42
+ gem install oxcelix
43
+
44
+ == Advantages over other Excel parsers
45
+ Excel file processing involves XML document parsing. Usually, this is achieved by some XML library such as Nokogiri[http://nokogiri.org].
46
+
47
+
48
+ The main drawbacks of this approach are memory usage and speed. The resulting object tree will be roughly as big
49
+ as the original file, and during the parsing, they will both be stored in the memory, which can
50
+ cause quite some complications when processing huge files. Also, interpreting every bit of an excel spreadsheet
51
+ will slow down unnecessarily the process, if we only need the data stored in that file.
52
+
53
+
54
+ The solution for the memory-consumption problem is SAX stream parsing.
55
+
56
+
57
+ Oxcelix uses the SAX parser offered by Peter Ohler's Ox gem. Ox is fast and powerful enough to solve the speed issue.
58
+
59
+
60
+ For a comparison of XML parsers, please consult the Ox homepage[http://www.ohler.com/dev/xml_with_ruby/xml_with_ruby.html].
61
+
62
+ == TODO
63
+ * Implement RawWorkbook, ValueWorkbook, FormattedWorkbook
64
+ * include/exclude mechanism should extend to cell areas inside Sheet objects
65
+ * Possible speedups
66
+ * Further improvement to the formatting algorithms. Theoretically, to_fmt should be able to
67
+ split conditional-formatting strings and to display e.g. thousands separated number strings
@@ -76,30 +76,39 @@ module Oxcelix
76
76
  module Numberhelper
77
77
  include Numformats
78
78
  # Get the cell's value and excel format string and return a string, a ruby Numeric or a DateTime object accordingly
79
+ # @return [Object] A ruby object that holds and represents the value stored in the cell. Conversion is based on cell formatting.
80
+ # @example Get the value of a cell:
81
+ # c = w.sheets[0]["B3"] # => <Oxcelix::Cell:0x00000002a5b368 @xlcoords="A3", @style="84", @type="n", @value="41155", @numformat=14>
82
+ # c.to_ru # => <DateTime: 2012-09-03T00:00:00+00:00 ((2456174j,0s,0n),+0s,2299161j)>
83
+ #
79
84
  def to_ru
80
85
  if !@value.numeric? || Numformats::Formatarray[@numformat.to_i][:xl] == nil || Numformats::Formatarray[@numformat.to_i][:xl].downcase == "general"
81
86
  return @value
82
87
  end
83
- if Numformats::Formatarray[@numformat.to_i][:cls] == 'numeric' || Numformats::Formatarray[@numformat.to_i][:cls] == 'rational'
84
- return eval @value
85
- elsif Numformats::Formatarray[@numformat.to_i][:cls] == 'date'
88
+ if Numformats::Formatarray[@numformat.to_i][:cls] == 'date'
86
89
  return DateTime.new(1899, 12, 30) + (eval @value)
87
- else
88
- eval @value rescue @value
90
+ else Numformats::Formatarray[@numformat.to_i][:cls] == 'numeric' || Numformats::Formatarray[@numformat.to_i][:cls] == 'rational'
91
+ return eval @value rescue @value
89
92
  end
90
93
  end
91
94
 
92
95
  # Get the cell's value, convert it with to_ru and finally, format it based on the value's type.
96
+ # @return [String] Value gets formatted depending on its class. If it is a DateTime, the #DateTime.strftime method is used,
97
+ # if it holds a number, the #Kernel::sprintf is run.
98
+ # @example Get the formatted value of a cell:
99
+ # c = w.sheets[0]["B3"] # => <Oxcelix::Cell:0x00000002a5b368 @xlcoords="A3", @style="84", @type="n", @value="41155", @numformat=14>
100
+ # c.to_fmt # => "3/9/2012"
101
+ #
93
102
  def to_fmt
94
103
  begin
95
104
  if Numformats::Formatarray[@numformat][:cls] == 'date'
96
- self.to_ru.strftime(datetime(Numformats::Formatarray[@numformat][:xl])) rescue @value
105
+ self.to_ru.strftime(Numformats::Formatarray[@numformat][:ostring]) rescue @value
97
106
  elsif Numformats::Formatarray[@numformat.to_i][:cls] == 'numeric' || Numformats::Formatarray[@numformat.to_i][:cls] == 'rational'
98
- sprintf(numeric(Numformats::Formatarray[@numformat][:xl]), self.to_ru) rescue @value
107
+ sprintf(Numformats::Formatarray[@numformat][:ostring], self.to_ru) rescue @value
99
108
  else
100
109
  return @value
101
110
  end
102
111
  end
103
112
  end
104
113
  end
105
- end
114
+ end
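
The date branch of `to_ru` in the hunk above adds the cell's raw serial number to `DateTime.new(1899, 12, 30)`, and `to_fmt` then passes the result to `strftime` or `sprintf`. The following is a minimal standalone sketch of that arithmetic, not the gem's code: `EXCEL_EPOCH` and `serial_to_datetime` are hypothetical names, `Rational()` is swapped in for the `eval` call used in the diff, and the format strings are assumptions, since the actual `:ostring` values are not shown here.

```ruby
require 'date'

# Day zero of the date arithmetic, the same constant used in the to_ru hunk.
EXCEL_EPOCH = DateTime.new(1899, 12, 30)

# Hypothetical helper (not part of oxcelix): convert a raw serial string,
# as stored in a cell's @value, into a DateTime. Rational() parses both
# whole days ("41155") and fractional days ("41155.5") exactly.
def serial_to_datetime(raw)
  EXCEL_EPOCH + Rational(raw)
end

date = serial_to_datetime('41155')
puts date.strftime('%Y-%m-%d')       # => 2012-09-03

# Numeric cells go through sprintf-style formatting instead;
# '%.2f' is an assumed format string used only for illustration.
puts format('%.2f', Float('1234.5')) # => 1234.50
```

The serial `41155` is the value from the `@example` comments in the diff, and it resolves to the same date shown there.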
data/lib/oxcelix/sheet.rb CHANGED
@@ -3,6 +3,7 @@ module Oxcelix
3
3
  # The Sheet class represents an excel sheet.
4
4
  class Sheet < Matrix
5
5
  include Cellhelper
6
+ include Numberhelper
6
7
  # @!attribute [rw] name
7
8
  # @return [String] Sheet name
8
9
  # @!attribute [rw] sheetId
@@ -25,15 +26,50 @@ module Oxcelix
25
26
  super(i,j[0])
26
27
  end
27
28
  end
28
-
29
+
30
+ #The to_m method returns a simple Matrix object filled with the raw values of the original Sheet object.
31
+ # @return [Matrix] a collection of string values (the former #Cell::value)
29
32
  def to_m(*attrs)
30
- m=Matrix.build(self.col(0).length, self.row(0).length){nil}
31
- self.each do |x, row, col|
33
+ m=Matrix.build(self.row_size, self.column_size){nil}
34
+ self.each_with_index do |x, row, col|
32
35
  if attrs.size == 0 || attrs.nil?
33
- m[row, col]=x.value
36
+ m[row, col] = x.value
37
+ end
38
+ end
39
+ return m
40
+ end
41
+
42
+ # The to_ru method returns a Matrix of "rubified" values. It basically builds a new Matrix
43
+ # and puts the result of the #Cell::to_ru method of every cell of the original sheet in
44
+ # the corresponding Matrix cell.
45
+ # @return [Matrix] a collection of ruby objects (#Integers, #Floats, #DateTimes, #Rationals, #Strings)
46
+ def to_ru
47
+ m=Matrix.build(self.row_size, self.column_size){nil}
48
+ self.each_with_index do |x, row, col|
49
+ if x.nil? || x.value.nil?
50
+ m[row, col] = nil
51
+ else
52
+ m[row, col] = x.to_ru
34
53
  end
35
54
  end
36
55
  return m
37
56
  end
38
- end
57
+
58
+ # The to_fmt method returns a Matrix of "formatted" values. It basically builds a new Matrix
59
+ # and puts the result of the #Cell::to_fmt method of every cell of the original sheet in
60
+ # the corresponding Matrix cell. The #Cell::to_fmt will pass the original values to to_ru, and then
61
+ # depending on the value, will run strftime on DateTime objects and sprintf on numeric types.
62
+ # @return [Matrix] a collection of Strings
63
+ def to_fmt
64
+ m=Matrix.build(self.row_size, self.column_size){nil}
65
+ self.each_with_index do |x, row, col|
66
+ if x.nil? || x.value.nil?
67
+ m[row, col] = nil
68
+ else
69
+ m[row, col] = x.to_fmt
70
+ end
71
+ end
72
+ return m
73
+ end
74
+ end
39
75
  end
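
The new `Sheet#to_m`, `Sheet#to_ru` and `Sheet#to_fmt` methods all follow one pattern: build a Matrix of the same dimensions and fill each position with a value derived from the matching cell, leaving empty cells as `nil`. Below is a rough sketch of that pattern using only Ruby's `matrix` library. The `raw` matrix and the `rubify` helper are hypothetical stand-ins for a parsed sheet and for `Cell#to_ru`, and the sketch passes a block to `Matrix.build` instead of assigning elements one by one as the diff does.

```ruby
require 'matrix'

# Hypothetical stand-in for a parsed sheet: raw cell strings, nil for empty cells.
raw = Matrix[['1', '2.5'], [nil, 'total']]

# Hypothetical conversion (not the gem's Cell#to_ru): numeric strings become
# Integer or Float, anything else is returned unchanged, nil stays nil.
def rubify(value)
  return nil if value.nil?
  Integer(value, 10, exception: false) || Float(value, exception: false) || value
end

# Build a new Matrix of the same shape, converting cell by cell.
converted = Matrix.build(raw.row_count, raw.column_count) do |row, col|
  rubify(raw[row, col])
end

p converted.to_a # => [[1, 2.5], [nil, "total"]]
```

Computing each element inside the `Matrix.build` block avoids mutating the Matrix after construction, which matters on Ruby versions where `Matrix#[]=` is not public.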
data/oxcelix.gemspec CHANGED
@@ -3,8 +3,8 @@
3
3
  require 'rake'
4
4
  Gem::Specification.new do |s|
5
5
  s.name = 'oxcelix'
6
- s.version = '0.3.0'
7
- s.date = '2013-11-12'
6
+ s.version = '0.3.1'
7
+ s.date = '2013-12-07'
8
8
  s.summary = 'A fast Excel 2007/2010 file parser'
9
9
  s.description = 'A fast Excel 2007/2010 (.xlsx) file parser that returns a collection of Matrix objects'
10
10
  s.authors = 'Giovanni Biczo'
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: oxcelix
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.3.0
4
+ version: 0.3.1
5
5
  prerelease:
6
6
  platform: ruby
7
7
  authors:
@@ -9,7 +9,7 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2013-11-12 00:00:00.000000000 Z
12
+ date: 2013-12-07 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: ox