oxcelix 0.3.0 → 0.3.1

data/CHANGES CHANGED
@@ -1,21 +1,12 @@
1
- * Further code cleanup in numformats.rb Tue Dec 3 19:49:03 2013 +0100
2
- * Renamed add to add_custom_formats Tue Dec 3 19:46:15 2013 +0100
3
- * Path cleanup Tue Dec 3 19:07:47 2013 +0100
4
- * Added documentation to Formatarray. Fixed bug which prevented cells containing e.g. String values AND datetime/numeric formatting code from being properly returned when to_ru or to_fmt was invoked. Code cleanup. Tue Dec 3 19:06:39 2013 +0100
5
- * Numberhelper included in Cell class. Various typos corrected. Cell#to_ru and Cell#to_fmt now reflect the inclusion of the Numberformats module. Mon Dec 2 23:27:52 2013 +0100
6
- * to_ru and to_fmt methods are now encapsulated in the Numhelper method. This is not only cleaner but also ensures they are included in Cell and only there. Mon Dec 2 23:04:13 2013 +0100
7
- * Cleared @numformats from Workbook. Formatarray is now a Numformats module constant available to any class including it. Mon Dec 2 22:47:40 2013 +0100
8
- * Numformat is now included in Workbook. Dtmap became a constant. New method Numformats::add converts the temparray to a series of numformat hashes and adds it to the main numformat array. to_ru, to_fmt, datetime, numeric are now Workbook methods. Slight code cleanup. Sun Dec 1 19:28:41 2013 +0100
9
- * Numformats.rb cleanup. Documented new modules/methods. Restructured to_ru and to_fmt. Fri Nov 29 09:23:41 2013 +0100
10
- * Code cleanup. TODO: split format string (;)? cellhelper.rb to be cleaned up Mon Nov 25 00:06:38 2013 +0100
11
- * finalized numeric method with a regex composed by 6 members: prefix, decimals, separator, floats, expo, postfix. This obsoletes the vgrp method as well as the other ones (still in the numformats.rb file.) TODO: delete all unnecessary code and provide a meaningful return value. Mon Nov 25 00:00:52 2013 +0100
12
- * Added Numformats Sat Nov 23 09:48:50 2013 +0100
13
- * Merge branch '0.3.0' of https://github.com/gbiczo/oxcelix into 0.3.0 Sat Nov 23 09:43:59 2013 +0100
14
- * Added numformats.rb. Cleaned styles.rb from unnecessary comments. Sat Nov 23 09:41:10 2013 +0100
15
- * Added badge Fri Nov 22 12:20:48 2013 +0100
16
- * Some speed optimizations. matrixto does not accept fmt parameter any more. fixed typos Wed Nov 13 00:03:44 2013 +0100
17
- * Cell now has a numformat attribute. A new module called Numformats will contain methods related to numeric formatting. to_r and to_d are now obsolete. Xlsheet init parameter (styles). Stylefile is opened in the workbook. Matrixto gets a :values option (this will be obsoleted shortly). Slight Gemspec and .md description change Mon Nov 4 17:47:17 2013 +0100
18
- * Initial version of Sheet::to_m method Sat Oct 19 11:06:42 2013 +0200
19
- * Format array is now a constant (FARY) Fri Oct 18 21:58:39 2013 +0200
20
- * Fixed typo in README Fri Oct 18 21:51:09 2013 +0200
21
- * Started working on cell value formats Fri Oct 18 21:50:24 2013 +0200
1
+ 0.3.1
2
+ * Sheet now has its very own to_ru and to_fmt methods. Also a to_m method has been added, which returns a Matrix of raw data.
3
+ * Documentation changes.
4
+ 0.3.0
5
+ * Number formats release, which includes:
6
+ Cell#to_ru and Cell#to_fmt.
7
+ Numformats module
8
+ Numberhelper module
9
+ 0.2.4
10
+ * Bugfix release
11
+ 0.2.2
12
+ * Sheet < Matrix
data/README.md CHANGED
@@ -1,5 +1,6 @@
1
1
  Oxcelix
2
2
  =======
3
+ <a href="http://badge.fury.io/rb/oxcelix"><img src="https://badge.fury.io/rb/oxcelix@2x.png" alt="Gem Version" height="18"></a>
3
4
 
4
5
  Oxcelix - A fast and simple .xlsx file parser
5
6
 
@@ -16,20 +17,59 @@ Oxcelix uses the great Ox gem (http://rubygems.org/gems/ox) for fast SAX-parsing
16
17
  Synopsis
17
18
  --------
18
19
 
19
- To process an xlsx file:
20
+ To process an xlsx file:
20
21
 
21
- `require 'oxcelix'`
22
+ `require 'oxcelix'`
23
+ `w = Oxcelix::Workbook.new('whatever.xlsx')`
22
24
 
23
- `w = Oxcelix::Workbook.new('whatever.xlsx')`
25
+ To omit certain sheets:
24
26
 
25
- To omit certain sheets:
27
+ `w = Oxcelix::Workbook.new('whatever.xlsx', :exclude => ['sheet1', 'sheet2'])`
26
28
 
27
- `w = Oxcelix::Workbook.new('whatever.xlsx', :exclude => ['sheet1', 'sheet2'])`
29
+ Include only some of the sheets:
28
30
 
29
- To include only some of the sheets:
31
+ `w = Oxcelix::Workbook.new('whatever.xlsx', :include => ['sheet1', 'sheet2', 'sheet3'])`
30
32
 
31
- `w = Oxcelix::Workbook.new('whatever.xlsx', :include => ['sheet1', 'sheet2', 'sheet3'])`
33
+ To have the values of the merged cells copied over the mergegroup:
32
34
 
33
- To have the values of the merged cells copied over the mergegroup:
35
+ `w = Oxcelix::Workbook.new('whatever.xlsx', :copymerge => true)`
36
+
37
+ To convert a Sheet object into a collection of Ruby values or formatted Ruby strings:
38
+ `require 'oxcelix'`
39
+ `w = Oxcelix::Workbook.new('whatever.xlsx', :copymerge => true)`
40
+ `w.sheets[0].to_ru # returns a Matrix of DateTime, Integer, etc. objects`
41
+ `w.sheets[0].to_fmt # returns a Matrix of formatted Strings based on the above.`
34
42
 
35
- `w = Oxcelix::Workbook.new('whatever.xlsx', :mergecells => true)`
43
+ Installation
44
+ ------------
45
+
46
+ `gem install oxcelix`
47
+
48
+
49
+ Advantages over other Excel parsers
50
+ -----------------------------------
51
+
52
+ Excel file processing involves XML document parsing. Usually, this is achieved by some XML library such as [Nokogiri](http://nokogiri.org).
53
+
54
+
55
+ The main drawbacks of this approach are memory usage and speed. The resulting object tree will be roughly as big
56
+ as the original file, and during parsing both are held in memory, which can
57
+ cause considerable complications when processing huge files. Also, interpreting every bit of an Excel spreadsheet
58
+ unnecessarily slows down the process if we only need the data stored in that file.
59
+
60
+
61
+ The solution for the memory-consumption problem is SAX stream parsing.
62
+
63
+
64
+ Oxcelix uses the SAX parser offered by Peter Ohler's Ox gem. Ox is fast and powerful enough to solve the speed issue.
65
+
66
+
67
+ For a comparison of XML parsers, please consult the [Ox homepage](http://www.ohler.com/dev/xml_with_ruby/xml_with_ruby.html).
68
+
69
+ TODO
70
+ ----
71
+ * Implement RawWorkbook, ValueWorkbook, FormattedWorkbook
72
+ * include/exclude mechanism should extend to cell areas inside Sheet objects
73
+ * Possible speedups
74
+ * Further improvements to the formatting algorithms. In principle, to_fmt should be able to
75
+ split conditional-formatting strings and display, e.g., thousands-separated number strings
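The SAX approach described above can be sketched in a few lines. Oxcelix itself uses Ox's SAX parser; to keep this sketch free of third-party dependencies it substitutes Ruby's REXML stream parser (`REXML::Document.parse_stream` with a `REXML::StreamListener`). The worksheet fragment is a simplified, hypothetical example (real sheets carry namespaces and shared-string references), and `CellListener` is an illustrative name, not part of Oxcelix.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener: collects cell references and raw values from
# worksheet XML without building a DOM tree. Callbacks fire as the
# parser streams through the input.
class CellListener
  include REXML::StreamListener

  attr_reader :cells

  def initialize
    @cells = {}          # e.g. "A1" => "41155"
    @current_ref = nil   # reference of the <c> element being read
    @in_value = false    # true while inside a <v> element
  end

  def tag_start(name, attrs)
    case name
    when 'c' then @current_ref = attrs.to_h['r'] # cell reference lives in r="..."
    when 'v' then @in_value = true               # <v> holds the stored raw value
    end
  end

  def text(data)
    # Whitespace between elements also arrives here; keep only <v> content.
    @cells[@current_ref] = data if @in_value && @current_ref
  end

  def tag_end(name)
    case name
    when 'v' then @in_value = false
    when 'c' then @current_ref = nil
    end
  end
end

SHEET_XML = <<~XML
  <worksheet><sheetData>
    <row r="1"><c r="A1" s="84"><v>41155</v></c><c r="B1"><v>3.5</v></c></row>
  </sheetData></worksheet>
XML

listener = CellListener.new
REXML::Document.parse_stream(SHEET_XML, listener)
p listener.cells
```

Only the current cell reference and a flag are kept between callbacks, so memory use does not grow with a document tree. Shared-string lookups (cells with `t="s"`) and styles are deliberately left out of this sketch.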
data/README.rdoc CHANGED
@@ -23,10 +23,45 @@ To omit certain sheets to be processed:
23
23
 
24
24
  w = Oxcelix::Workbook.new('whatever.xlsx', :exclude => ['sheet1', 'sheet2'])
25
25
 
26
- To include only some of the sheets:
26
+ Include only some of the sheets:
27
27
 
28
28
  w = Oxcelix::Workbook.new('whatever.xlsx', :include => ['sheet1', 'sheet2', 'sheet3'])
29
29
 
30
30
  To have the values of the merged cells copied over the mergegroup:
31
31
 
32
32
  w = Oxcelix::Workbook.new('whatever.xlsx', :copymerge => true)
33
+
34
+ To convert a Sheet object into a collection of Ruby values or formatted Ruby strings:
35
+ require 'oxcelix'
36
+ w = Oxcelix::Workbook.new('whatever.xlsx', :copymerge => true)
37
+ w.sheets[0].to_ru # returns a Matrix of DateTime, Integer, etc. objects
38
+ w.sheets[0].to_fmt # returns a Matrix of formatted Strings based on the above.
39
+
40
+ == Installation
41
+
42
+ gem install oxcelix
43
+
44
+ == Advantages over other Excel parsers
45
+ Excel file processing involves XML document parsing. Usually, this is achieved by some XML library such as Nokogiri[http://nokogiri.org].
46
+
47
+
48
+ The main drawbacks of this approach are memory usage and speed. The resulting object tree will be roughly as big
49
+ as the original file, and during parsing both are held in memory, which can
50
+ cause considerable complications when processing huge files. Also, interpreting every bit of an Excel spreadsheet
51
+ unnecessarily slows down the process if we only need the data stored in that file.
52
+
53
+
54
+ The solution for the memory-consumption problem is SAX stream parsing.
55
+
56
+
57
+ Oxcelix uses the SAX parser offered by Peter Ohler's Ox gem. Ox is fast and powerful enough to solve the speed issue.
58
+
59
+
60
+ For a comparison of XML parsers, please consult the Ox homepage[http://www.ohler.com/dev/xml_with_ruby/xml_with_ruby.html].
61
+
62
+ == TODO
63
+ * Implement RawWorkbook, ValueWorkbook, FormattedWorkbook
64
+ * include/exclude mechanism should extend to cell areas inside Sheet objects
65
+ * Possible speedups
66
+ * Further improvements to the formatting algorithms. In principle, to_fmt should be able to
67
+ split conditional-formatting strings and display, e.g., thousands-separated number strings
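The `:copymerge` option in the synopsis can be illustrated with a small sketch. In an .xlsx file a merged range keeps its value only in the top-left cell, so copying that value over the merge group fills the remaining cells. The helpers `ref_to_index` and `copy_merged` are hypothetical, operate on a plain nested Array rather than on a Sheet, and are not Oxcelix's implementation.

```ruby
# Hypothetical helper: convert an Excel reference such as "B3" into a
# zero-based [row, col] pair ("A" => 0, "Z" => 25, "AA" => 26, ...).
def ref_to_index(ref)
  letters, digits = ref.match(/\A([A-Z]+)(\d+)\z/).captures
  col = letters.each_char.reduce(0) { |acc, ch| acc * 26 + (ch.ord - 'A'.ord + 1) } - 1
  [digits.to_i - 1, col]
end

# Hypothetical helper: copy the top-left value of each merged range
# (e.g. "A1:B2") into every cell of that range. Mutates and returns +grid+.
def copy_merged(grid, merged_ranges)
  merged_ranges.each do |range|
    first, last = range.split(':').map { |r| ref_to_index(r) }
    value = grid[first[0]][first[1]]
    (first[0]..last[0]).each do |r|
      (first[1]..last[1]).each { |c| grid[r][c] = value }
    end
  end
  grid
end

grid = [
  ['Total', nil, 'x'],
  [nil,     nil, 'y']
]
p copy_merged(grid, ['A1:B2'])
# => [["Total", "Total", "x"], ["Total", "Total", "y"]]
```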
@@ -76,30 +76,39 @@ module Oxcelix
76
76
  module Numberhelper
77
77
  include Numformats
78
78
  # Get the cell's value and excel format string and return a string, a ruby Numeric or a DateTime object accordingly
79
+ # @return [Object] A ruby object that holds and represents the value stored in the cell. Conversion is based on cell formatting.
80
+ # @example Get the value of a cell:
81
+ # c = w.sheets[0]["A3"] # => <Oxcelix::Cell:0x00000002a5b368 @xlcoords="A3", @style="84", @type="n", @value="41155", @numformat=14>
82
+ # c.to_ru # => <DateTime: 2012-09-03T00:00:00+00:00 ((2456174j,0s,0n),+0s,2299161j)>
83
+ #
79
84
  def to_ru
80
85
  if !@value.numeric? || Numformats::Formatarray[@numformat.to_i][:xl] == nil || Numformats::Formatarray[@numformat.to_i][:xl].downcase == "general"
81
86
  return @value
82
87
  end
83
- if Numformats::Formatarray[@numformat.to_i][:cls] == 'numeric' || Numformats::Formatarray[@numformat.to_i][:cls] == 'rational'
84
- return eval @value
85
- elsif Numformats::Formatarray[@numformat.to_i][:cls] == 'date'
88
+ if Numformats::Formatarray[@numformat.to_i][:cls] == 'date'
86
89
  return DateTime.new(1899, 12, 30) + (eval @value)
87
- else
88
- eval @value rescue @value
90
+ else
91
+ return eval @value rescue @value
89
92
  end
90
93
  end
91
94
 
92
95
  # Get the cell's value, convert it with to_ru and finally, format it based on the value's type.
96
+ # @return [String] Value gets formatted depending on its class. If it is a DateTime, DateTime#strftime is used;
97
+ # if it holds a number, Kernel#sprintf is used.
98
+ # @example Get the formatted value of a cell:
99
+ # c = w.sheets[0]["A3"] # => <Oxcelix::Cell:0x00000002a5b368 @xlcoords="A3", @style="84", @type="n", @value="41155", @numformat=14>
100
+ # c.to_fmt # => "3/9/2012"
101
+ #
93
102
  def to_fmt
94
103
  begin
95
104
  if Numformats::Formatarray[@numformat][:cls] == 'date'
96
- self.to_ru.strftime(datetime(Numformats::Formatarray[@numformat][:xl])) rescue @value
105
+ self.to_ru.strftime(Numformats::Formatarray[@numformat][:ostring]) rescue @value
97
106
  elsif Numformats::Formatarray[@numformat.to_i][:cls] == 'numeric' || Numformats::Formatarray[@numformat.to_i][:cls] == 'rational'
98
- sprintf(numeric(Numformats::Formatarray[@numformat][:xl]), self.to_ru) rescue @value
107
+ sprintf(Numformats::Formatarray[@numformat][:ostring], self.to_ru) rescue @value
99
108
  else
100
109
  return @value
101
110
  end
102
111
  end
103
112
  end
104
113
  end
105
- end
114
+ end
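The conversion performed by `to_ru` and `to_fmt` can be sketched without the gem. This sketch assumes a small, hypothetical format table standing in for `Numformats::Formatarray` (the ids, classes and `:ostring` values below are illustrative assumptions, not the gem's actual entries) and parses numbers with `Float()`/`Integer()` instead of `eval`. The date branch uses the same 1899-12-30 epoch as the code above.

```ruby
require 'date'

# Hypothetical stand-in for Numformats::Formatarray: format id => class and
# output format string. Entries are assumptions for illustration only.
FORMATS = {
  0  => { cls: 'general', ostring: nil },
  2  => { cls: 'numeric', ostring: '%.2f' },
  14 => { cls: 'date',    ostring: '%-m/%-d/%Y' }
}.freeze

# Excel's 1900 date system: with 1899-12-30 as day 0, serials from 61
# (1900-03-01) onward map to the correct date, because Excel counts a
# nonexistent 1900-02-29 as serial 60.
EXCEL_EPOCH = DateTime.new(1899, 12, 30)

# Convert a raw cell string into a Ruby object based on its format id.
def to_ru(raw, numformat)
  fmt = FORMATS[numformat]
  return raw if fmt.nil? || fmt[:cls] == 'general'

  number = Float(raw, exception: false)
  return raw if number.nil? # non-numeric text stays a String

  case fmt[:cls]
  when 'date'    then EXCEL_EPOCH + number
  when 'numeric' then Integer(raw, 10, exception: false) || number # keep integers as Integer
  else raw
  end
end

# Convert with to_ru, then format DateTimes with strftime and numbers with format/sprintf.
def to_fmt(raw, numformat)
  value = to_ru(raw, numformat)
  case value
  when DateTime then value.strftime(FORMATS[numformat][:ostring])
  when Numeric  then format(FORMATS[numformat][:ostring], value)
  else value
  end
end

p to_ru('41155', 14)  # a DateTime for 2012-09-03
p to_fmt('41155', 14) # => "9/3/2012"
p to_fmt('3.5', 2)    # => "3.50"
```

Parsing with `Float()`/`Integer()` rather than `eval` keeps arbitrary cell text from being executed as Ruby code; the trade-off is that only strings Ruby recognizes as numbers are converted.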
data/lib/oxcelix/sheet.rb CHANGED
@@ -3,6 +3,7 @@ module Oxcelix
3
3
  # The Sheet class represents an excel sheet.
4
4
  class Sheet < Matrix
5
5
  include Cellhelper
6
+ include Numberhelper
6
7
  # @!attribute [rw] name
7
8
  # @return [String] Sheet name
8
9
  # @!attribute [rw] sheetId
@@ -25,15 +26,50 @@ module Oxcelix
25
26
  super(i,j[0])
26
27
  end
27
28
  end
28
-
29
+
30
+ # The to_m method returns a simple Matrix object filled with the raw values of the original Sheet object.
31
+ # @return [Matrix] a collection of string values (the #Cell::value of each original cell)
29
32
  def to_m(*attrs)
30
- m=Matrix.build(self.col(0).length, self.row(0).length){nil}
31
- self.each do |x, row, col|
33
+ m=Matrix.build(self.row_size, self.column_size){nil}
34
+ self.each_with_index do |x, row, col|
32
35
  if attrs.size == 0 || attrs.nil?
33
- m[row, col]=x.value
36
+ m[row, col] = x.value
37
+ end
38
+ end
39
+ return m
40
+ end
41
+
42
+ # The to_ru method returns a Matrix of "rubified" values. It basically builds a new Matrix
43
+ # and puts the result of the #Cell::to_ru method of every cell of the original sheet in
44
+ # the corresponding Matrix cell.
45
+ # @return [Matrix] a collection of ruby objects (#Integers, #Floats, #DateTimes, #Rationals, #Strings)
46
+ def to_ru
47
+ m=Matrix.build(self.row_size, self.column_size){nil}
48
+ self.each_with_index do |x, row, col|
49
+ if x.nil? || x.value.nil?
50
+ m[row, col] = nil
51
+ else
52
+ m[row, col] = x.to_ru
34
53
  end
35
54
  end
36
55
  return m
37
56
  end
38
- end
57
+
58
+ # The to_fmt method returns a Matrix of "formatted" values. It basically builds a new Matrix
59
+ # and puts the result of the #Cell::to_fmt method of every cell of the original sheet in
60
+ # the corresponding Matrix cell. #Cell::to_fmt passes the original value to to_ru and then,
61
+ # depending on the resulting type, runs strftime on DateTime objects and sprintf on numeric types.
62
+ # @return [Matrix] a collection of Strings
63
+ def to_fmt
64
+ m=Matrix.build(self.row_size, self.column_size){nil}
65
+ self.each_with_index do |x, row, col|
66
+ if x.nil? || x.value.nil?
67
+ m[row, col] = nil
68
+ else
69
+ m[row, col] = x.to_fmt
70
+ end
71
+ end
72
+ return m
73
+ end
74
+ end
39
75
  end
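The pattern used by `Sheet#to_ru` and `Sheet#to_fmt` (build a same-shaped Matrix and fill it cell by cell) can be sketched with the standard `matrix` library alone. `FakeCell` and `rubify` are hypothetical stand-ins for `Oxcelix::Cell` and the Sheet method; this variant passes a block to `Matrix.build`, which yields each `row, col` pair, so the result is computed directly instead of being filled in element by element.

```ruby
require 'matrix'

# Hypothetical stand-in for Oxcelix::Cell: a raw string value plus a
# simple conversion (Integer, then Float, else the original String).
FakeCell = Struct.new(:value) do
  def to_ru
    Integer(value, 10, exception: false) || Float(value, exception: false) || value
  end
end

# Build a new Matrix with the same shape as +sheet+, converting each cell.
# Empty positions (nil, or a cell without a value) become nil, mirroring
# the nil checks in Sheet#to_ru.
def rubify(sheet)
  Matrix.build(sheet.row_count, sheet.column_count) do |row, col|
    cell = sheet[row, col]
    (cell.nil? || cell.value.nil?) ? nil : cell.to_ru
  end
end

sheet = Matrix[
  [FakeCell.new('41155'), FakeCell.new('3.5')],
  [nil,                   FakeCell.new('text')]
]
p rubify(sheet)
# => Matrix[[41155, 3.5], [nil, "text"]]
```

Because the block is invoked through `Matrix.build`, the result is always a plain `Matrix`, independent of how a subclass such as `Sheet` defines its constructor.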
data/oxcelix.gemspec CHANGED
@@ -3,8 +3,8 @@
3
3
  require 'rake'
4
4
  Gem::Specification.new do |s|
5
5
  s.name = 'oxcelix'
6
- s.version = '0.3.0'
7
- s.date = '2013-11-12'
6
+ s.version = '0.3.1'
7
+ s.date = '2013-12-07'
8
8
  s.summary = 'A fast Excel 2007/2010 file parser'
9
9
  s.description = 'A fast Excel 2007/2010 (.xlsx) file parser that returns a collection of Matrix objects'
10
10
  s.authors = 'Giovanni Biczo'
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: oxcelix
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.3.0
4
+ version: 0.3.1
5
5
  prerelease:
6
6
  platform: ruby
7
7
  authors:
@@ -9,7 +9,7 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2013-11-12 00:00:00.000000000 Z
12
+ date: 2013-12-07 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: ox