rubyexcel 0.0.6 → 0.0.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (2)
  1. data/lib/README.md +572 -0
  2. metadata +2 -1
data/lib/README.md ADDED
@@ -0,0 +1,572 @@
1
+ RubyExcel
2
+ =========
3
+
4
+ Designed for Windows with MS Excel
5
+
6
+ **Still under construction! Bugs are inevitable.**
7
+
8
+ Introduction
9
+ ------------
10
+
11
+ A Data-analysis tool for Ruby, with an Excel-style API.
12
+
13
+ Details
14
+ -----
15
+
16
+ Key design features taken from Excel:
17
+
18
+ * 1-based indexing.
19
+ * Referencing objects like Excel's API ( Workbook, Sheet, Row, Column, Cell, Range ).
20
+ * Useful data-handling functions ( e.g. Filter, Match, Sumif, Vlookup ).
21
+
22
+ Typical usage:
23
+
24
+ 1. Extract an HTML Table or CSV File into a 2D Array ( normally with Nokogiri / Mechanize )
25
+ 2. Organise and interpret data with RubyExcel
26
+ 3. Output results into a file.
27
+
28
+ About
29
+ -----
30
+
31
+ This gem is designed as a way to conveniently edit table data before outputting it to Excel (XLSX) or TSV format (which Excel can interpret).
32
+ It attempts to take as much as possible from Excel's API while providing some of the best bits of Ruby ( e.g. Enumerators, Blocks, Regexp ).
33
+ An important feature is allowing reference to Columns via their Headers for convenience and enhanced code readability.
34
+ As this works directly on the data in memory, processing is usually faster than automating Excel itself.
35
+
36
+ This was written out of frustration with editing tabular data in Ruby's multidimensional arrays,
37
+ where it is hard to leave the headers untouched while keeping the code readable.
38
+ Its API is designed to make it easy to port spreadsheet-processing code from VBA to Ruby.
39
+ The combination of Ruby, WIN32OLE Excel, and HTML table extraction is probably quite rare, but I thought I'd share what I came up with.
40
+
41
+ Examples
42
+ ========
43
+
44
+ Expected Data Layout (2D Array)
45
+ --------
46
+
47
+ ```ruby
48
+ data = [
49
+ [ 'Part', 'Ref1', 'Ref2', 'Qty', 'Cost' ],
50
+ [ 'Type1', 'QT1', '231', 1, 35.15 ],
51
+ [ 'Type2', 'QT3', '123', 1, 40 ],
52
+ [ 'Type3', 'XT1', '321', 3, 0.1 ],
53
+ [ 'Type1', 'XY2', '132', 1, 30.00 ],
54
+ [ 'Type4', 'XT3', '312', 2, 3 ],
55
+ [ 'Type2', 'QY2', '213', 1, 99.99 ],
56
+ [ 'Type1', 'QT4', '123', 2, 104 ]
57
+ ]
58
+ ```
59
+ The number of header rows defaults to 1.
60
+
61
+ Loading the data into a Sheet
62
+ --------
63
+
64
+ ```ruby
65
+ require 'rubyexcel'
66
+
67
+ wb = RubyExcel::Workbook.new
68
+ s = wb.add( 'Sheet1' )
69
+ s.load( data )
70
+
71
+ #Or:
72
+
73
+ wb = RubyExcel::Workbook.new
74
+ s = wb.add( 'Sheet1' )
75
+ s.load( RubyExcel.sample_data )
76
+
77
+ #Or:
78
+
79
+ wb = RubyExcel::Workbook.new
80
+ s = wb.load( RubyExcel.sample_data )
81
+
82
+ #Or:
83
+
84
+ s = RubyExcel.sample_sheet
85
+ wb = s.parent
86
+ ```
87
+
88
+ Using the Mechanize gem to get data
89
+ --------
90
+
91
+ ```ruby
92
+ require 'csv'
+ require 'mechanize'
+ s = RubyExcel::Workbook.new.load( CSV.parse( Mechanize.new.get('http://example.com/myfile.csv').content ) )
93
+ ```
94
+
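Mechanize only fetches the file here; the step that actually produces the 2D Array is `CSV.parse` from Ruby's standard library. A minimal stdlib-only sketch, assuming the CSV text is already in a string (the sample text below is made up for illustration), shows the shape of the result and that every field arrives as a String unless a converter is given:

```ruby
require 'csv'

# Hypothetical CSV text standing in for a downloaded file's content
text = "Part,Ref1,Ref2,Qty,Cost\nType1,QT1,231,1,35.15\nType2,QT3,123,1,40\n"

# Plain parse: a 2D Array in which every field is a String
raw = CSV.parse(text)

# The built-in :numeric converter turns numeric-looking fields into Integer/Float
typed = CSV.parse(text, converters: :numeric)

p raw[1]   # ["Type1", "QT1", "231", "1", "35.15"]
p typed[1] # ["Type1", "QT1", 231, 1, 35.15]
```

Note that `:numeric` applies to every field, so ID-like columns such as `Ref2` become Integers too; the README's sample data keeps `Ref2` as Strings, which the plain parse preserves.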
95
+ Reference a cell's value
96
+ --------
97
+
98
+ ```ruby
99
+ s['A7']
100
+ s.cell(7,1).value
101
+ s.range('A7').value
102
+ s.row(7)['A']
103
+ s.row(7)[1]
104
+ s.column('A')[7]
105
+ s.column('A')['7']
106
+ ```
107
+
108
+ Reference a group of cells
109
+ --------
110
+
111
+ ```ruby
112
+ s['A1:B3'] #=> Array
113
+ s.range( 'A1:B3' ) #=> Element
114
+ s.range( 'A1', 'B3' ) #=> Element
115
+ s.range( s.cell( 1, 1 ), s.cell( 3, 2 ) ) #=> Element
116
+ s.row( 1 ) #=> Row
117
+ s.column( 'A' ) #=> Column
118
+ s.column( 1 ) #=> Column
119
+ ```
120
+
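Addresses like `'A7'` and `'A1:B3'` have to be mapped from Excel's 1-based, letter-column notation onto Ruby's 0-based arrays. A minimal plain-Ruby sketch of that mapping follows; the helper names `col_to_index`, `parse_cell` and `range_values` are hypothetical and are not part of RubyExcel:

```ruby
# Hypothetical helpers (not RubyExcel's API) showing how Excel-style
# addresses map onto a plain 0-based Ruby 2D Array.

data = [
  ['Part', 'Ref1', 'Ref2', 'Qty', 'Cost'],
  ['Type1', 'QT1', '231', 1, 35.15],
  ['Type2', 'QT3', '123', 1, 40],
  ['Type3', 'XT1', '321', 3, 0.1],
  ['Type1', 'XY2', '132', 1, 30.00],
  ['Type4', 'XT3', '312', 2, 3],
  ['Type2', 'QY2', '213', 1, 99.99],
  ['Type1', 'QT4', '123', 2, 104]
]

# "A" => 1, "Z" => 26, "AA" => 27
def col_to_index(letters)
  letters.upcase.each_char.reduce(0) { |acc, ch| acc * 26 + (ch.ord - 'A'.ord + 1) }
end

# "B3" => [3, 2] (row, column), both 1-based
def parse_cell(address)
  m = address.match(/\A([A-Za-z]+)(\d+)\z/)
  raise ArgumentError, "bad address: #{address}" unless m
  [m[2].to_i, col_to_index(m[1])]
end

# Values covered by "A1:B3", or by a single cell such as "A7", as a 2D Array
def range_values(data, address)
  first, last = address.split(':')
  r1, c1 = parse_cell(first)
  r2, c2 = parse_cell(last || first)
  # Subtract 1 to turn Excel's 1-based indices into Ruby's 0-based ones
  data[(r1 - 1)..(r2 - 1)].map { |row| row[(c1 - 1)..(c2 - 1)] }
end

p range_values(data, 'A1:B3') # [["Part", "Ref1"], ["Type1", "QT1"], ["Type2", "QT3"]]
p range_values(data, 'A7')    # [["Type2"]]
```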
121
+ Detailed Interactions
122
+ ========
123
+
124
+ Workbook
125
+ --------
126
+
127
+ ```ruby
128
+ #Create a workbook
129
+ wb = RubyExcel::Workbook.new
130
+
131
+ #Add sheets to the workbook
132
+ sheet1, sheet2 = wb.add('Sheet1'), wb.add
133
+
134
+ #Delete all sheets from a workbook
135
+ wb.clear_all
136
+
137
+ #Delete a specific sheet
138
+ wb.delete( 1 )
139
+ wb.delete( 'Sheet1' )
140
+ wb.delete( sheet1 )
141
+ wb.delete( /sheet1/i )
142
+
143
+ #Shortcut to create a sheet with a default name and fill it with data
144
+ wb.load( data )
145
+
146
+ #Select a sheet
147
+ wb.sheets(1) #=> RubyExcel::Sheet
148
+ wb.sheets('Sheet1') #=> RubyExcel::Sheet
149
+
150
+ #Iterate through all sheets
151
+ wb.sheets #=> Enumerator
152
+ wb.each #=> Enumerator
153
+
154
+ #Sort the sheets
155
+ wb.sort! { |x,y| x.name <=> y.name }
156
+ wb.sort_by! &:name
157
+ ```
158
+
159
+ Sheet
160
+ --------
161
+
162
+ ```ruby
163
+ #Create a sheet
164
+ s = wb.add #Name defaults to 'Sheet' + total number of sheets
165
+ s = wb.add( 'Sheet1' )
166
+
167
+ #Access the sheet name
168
+ s.name #=> 'Sheet1'
169
+ s.name = 'Sheet1'
170
+
171
+ #Access the parent workbook
172
+ s.workbook
173
+ s.parent
174
+
175
+ #Access the headers
176
+ s.header_rows #=> 1
177
+ s.headers #=> 1
178
+ s.headers = 1
179
+ s.header_rows = 1
180
+
181
+ #Specify the number of header rows when loading data
182
+ s.load( data, 1 )
183
+
184
+ #Append data (at the bottom of the sheet)
185
+ s << data
186
+ s << s
187
+ s += data
188
+ s += s
189
+
190
+ #Remove rows that also appear in another data set (skipping any headers)
191
+ s -= data
192
+ s -= s
193
+
194
+ #Select a column by its header
195
+ s.column_by_header( 'Part' )
196
+ s.ch( 'Part' )
197
+ #=> Column
198
+
199
+ #Iterate through rows or columns
200
+ s.rows { |r| puts r } #All rows
201
+ s.rows( 2 ) { |r| puts r } #From the 2nd to the last row
202
+ s.rows( 1, 3 ) { |r| puts r } #Rows 1 to 3
203
+ s.columns { |c| puts c } #All columns
204
+ s.columns( 'B' ) { |c| puts c } #From the 2nd to the last column
205
+ s.columns( 2 ) { |c| puts c } #From the 2nd to the last column
206
+ s.columns( 'B', 'D' ) { |c| puts c } #Columns 2 to 4
207
+ s.columns( 2, 4 ) { |c| puts c } #Columns 2 to 4
208
+
209
+ #Remove all empty rows & columns
210
+ s.compact!
211
+
212
+ #Delete the current sheet from the workbook
213
+ s.delete
214
+
215
+ #Delete rows or columns "if( condition )" (iterates in reverse to preserve references during loop)
216
+ s.delete_rows_if { |r| r.empty? }
217
+ s.delete_columns_if { |c| c.empty? }
218
+
219
+ #Filter the data given a column and a block to test values against.
220
+ #Note: Returns a copy of the sheet when used without "!".
221
+ #Note: This gem carries a Regexp to_proc method for Regex shorthand (shown below).
222
+ s.filter!( 'Part' ) { |value| value =~ /Type[13]/ }
223
+ s.filter!( 'Part', &/Type[13]/ )
224
+
225
+ #Filter the data to a specific set of columns by their headers.
226
+ #Note: Returns a copy of the sheet when used without "!".
227
+ s.get_columns!( 'Cost', 'Part', 'Qty' )
228
+ s.gc!( 'Cost', 'Part', 'Qty' )
229
+
230
+ #Insert blank rows or columns ( before, number to insert )
231
+ s.insert_rows( 2, 2 ) #Inserts 2 empty rows before row 2
232
+ s.insert_columns( 'B', 1 ) #Inserts 1 empty column before column 2
233
+ s.insert_columns( 2, 1 ) #Inserts 1 empty column before column 2
234
+
235
+ #Find the first row which matches a value within a column (selected by header)
236
+ #Note: Can now accept a Column object in place of a header.
237
+ s.match( 'Qty' ) { |value| value == 1 } #=> 2
238
+ s.match( 'Part', &/Type2/ ) #=> 3
239
+
240
+ #Find the current end of the data range
241
+ s.maxrow #=> 8
242
+ s.rows.count #=> 8
243
+ s.maxcol #=> 5
244
+ s.columns.count #=> 5
245
+
246
+ #Partition the sheet into two, given a header and a block (like Filter)
247
+ #Note: this keeps the headers intact in both output sheets
248
+ type_1_and_3, other = s.partition( 'Part' ) { |value| value =~ /Type[13]/ }
249
+ type_1_and_3, other = s.partition( 'Part', &/Type[13]/ )
250
+
251
+ #Reverse the data by rows or columns (ignores headers)
252
+ s.reverse_rows!
253
+ s.reverse_columns!
254
+
255
+ #Sort the rows by criteria (ignores headers)
256
+ s.sort! { |r1,r2| r1['A'] <=> r2['A'] }
257
+ s.sort_by! { |r| r['A'] }
258
+
259
+ #Sum all elements in a column by criteria in another column (selected by header)
260
+ #Parameters: Header to pass to the block, Header to sum, Block.
261
+ #Note: Now also accepts Column objects in place of headers.
262
+ s.sumif( 'Part', 'Cost' ) { |part| part == 'Type1' } #=> 169.15
263
+ s.sumif( 'Part', 'Cost', &/Type1/ ) #=> 169.15
264
+
265
+ #Convert the data into various formats:
266
+ s.to_a #=> 2D Array
267
+ s.to_excel #=> WIN32OLE Excel Workbook (Contains only the current sheet)
268
+ s.to_html #=> String (HTML table)
269
+ s.to_s #=> String (TSV)
270
+
271
+ #Remove all rows with duplicate values in the given column (selected by header or Column object)
272
+ s.uniq! 'Part'
273
+
274
+ #Find a value in one column by searching another one (selected by headers or Column objects)
275
+ s.vlookup( 'Part', 'Ref1', &/Type4/ ) #=> "XT3"
276
+ ```
277
+
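Several Sheet methods above take a header name plus a block (or the `&/regexp/` shorthand). A minimal plain-Ruby sketch of the idea, assuming Ruby 2.4+ for `Regexp#match?`, is shown below. The `Regexp#to_proc` patch stands in for the one the gem says it carries, but its exact behaviour is an assumption, and `filter_rows`, `sum_if`, `lookup` and `header_index` are hypothetical names, not RubyExcel's implementation:

```ruby
# Sketch only: a global Regexp#to_proc patch so that &/pattern/ can stand in
# for a block. RubyExcel ships its own version; this one is an assumption.
class Regexp
  def to_proc
    regexp = self
    proc { |value| regexp.match?(value.to_s) }
  end
end

data = [
  ['Part', 'Ref1', 'Ref2', 'Qty', 'Cost'],
  ['Type1', 'QT1', '231', 1, 35.15],
  ['Type2', 'QT3', '123', 1, 40],
  ['Type3', 'XT1', '321', 3, 0.1],
  ['Type1', 'XY2', '132', 1, 30.00],
  ['Type4', 'XT3', '312', 2, 3],
  ['Type2', 'QY2', '213', 1, 99.99],
  ['Type1', 'QT4', '123', 2, 104]
]

# Hypothetical helper: 0-based index of a header, failing loudly if it is missing
def header_index(data, header)
  data[0].index(header) or raise ArgumentError, "unknown header: #{header}"
end

# Hypothetical helper: keep the header row, filter the body on one column
def filter_rows(data, header, &test)
  idx = header_index(data, header)
  [data[0]] + data.drop(1).select { |row| test.call(row[idx]) }
end

# Hypothetical helper: total of sum_header where the block accepts find_header
def sum_if(data, find_header, sum_header, &test)
  find_idx = header_index(data, find_header)
  sum_idx = header_index(data, sum_header)
  data.drop(1).inject(0) { |total, row| test.call(row[find_idx]) ? total + row[sum_idx] : total }
end

# Hypothetical helper: first return_header value whose find_header passes the block
def lookup(data, find_header, return_header, &test)
  find_idx = header_index(data, find_header)
  return_idx = header_index(data, return_header)
  row = data.drop(1).find { |r| test.call(r[find_idx]) }
  row && row[return_idx]
end

p filter_rows(data, 'Part', &/Type[13]/).length   # 5 (header + 4 matching rows)
p sum_if(data, 'Part', 'Cost', &/Type1/).round(2) # 169.15
p lookup(data, 'Part', 'Ref1', &/Type4/)          # "XT3"
```

Raising on an unknown header is a design choice for the sketch: it turns a silently dropped column into an immediate error rather than a `nil` index later on.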
278
+ Row / Column (Section)
279
+ --------
280
+
281
+ ```ruby
282
+ #Reference a Row or Column
283
+ row = s.row(2)
284
+ col = s.column('B')
285
+
286
+ =begin
287
+ Append a value
288
+ Note: Only extends the data boundaries when at the first row or column.
289
+ This allows looping through an entire row or column to append single values,
290
+ without worrying about using the correct index.
291
+ =end
292
+ s.row(1) << 'New'
293
+ s.rows(2) { |r| r << 'Column' }
294
+ s.column(1) << 'New'
295
+ s.columns(2) { |c| c << 'Row' }
296
+
297
+ #Delete the data referenced by self.
298
+ row.delete
299
+ col.delete
300
+
301
+ #Find the address of a cell matching a block
302
+ row.find { |value| value == 'QT1' }
303
+ row.find &/QT1/
304
+ col.find { |value| value == 'QT1' }
305
+ col.find &/QT1/
306
+
307
+ #Summarise the current row or column into a Hash.
308
+ s.column(1).summarise
309
+ #=> {"Type1"=>3, "Type2"=>2, "Type3"=>1, "Type4"=>1}
310
+
311
+ #Loop through all values
312
+ row.each { |val| puts val }
313
+ col.each { |val| puts val }
314
+
315
+ #Loop through all values without including headers
316
+ col.each_without_headers { |val| puts val }
317
+ col.each_wh { |val| puts val }
318
+
319
+ #Loop through each cell
320
+ row.each_cell { |ce| puts "#{ ce.address }: #{ ce.value }" }
321
+ col.each_cell { |ce| puts "#{ ce.address }: #{ ce.value }" }
322
+
323
+ #Loop through each cell without including headers
324
+ col.each_cell_without_headers { |ce| puts "#{ ce.address }: #{ ce.value }" }
325
+ col.each_cell_wh { |ce| puts "#{ ce.address }: #{ ce.value }" }
326
+
327
+ #Overwrite each value based on its current value
328
+ row.map! { |val| val.to_s + 'a' }
329
+ col.map! { |val| val.to_s + 'a' }
330
+
331
+ #Get the value of a cell in the current row by its header
332
+ row.value_by_header( 'Part' ) #=> 'Type1'
333
+ row.val( 'Part' ) #=> 'Type1'
334
+ ```
335
+
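`summarise` returns a count of each value in a row or column. A minimal plain-Ruby sketch of that kind of count over one column of a 2D Array, assuming a single header row is skipped (`summarise` here is a hypothetical stand-in, not the gem's implementation):

```ruby
# Hypothetical helper in the spirit of Section#summarise: count each value in
# one (1-based) column, skipping the header row(s).
def summarise(data, col, header_rows = 1)
  data.drop(header_rows).each_with_object(Hash.new(0)) do |row, counts|
    counts[row[col - 1]] += 1
  end
end

data = [
  ['Part', 'Ref1', 'Ref2', 'Qty', 'Cost'],
  ['Type1', 'QT1', '231', 1, 35.15],
  ['Type2', 'QT3', '123', 1, 40],
  ['Type3', 'XT1', '321', 3, 0.1],
  ['Type1', 'XY2', '132', 1, 30.00],
  ['Type4', 'XT3', '312', 2, 3],
  ['Type2', 'QY2', '213', 1, 99.99],
  ['Type1', 'QT4', '123', 2, 104]
]

counts = summarise(data, 1)
puts counts['Type1'] # 3
puts counts['Type4'] # 1
```

On Ruby 2.7+, `data.drop(1).map { |row| row[0] }.tally` produces the same counts.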
336
+ Cell / Range (Element)
337
+ --------
338
+
339
+ ```ruby
340
+ #Reference a Cell or Range
341
+ cell = s.cell( 2, 2 )
342
+ range = s.range('B2:C3')
343
+
344
+ #Get the address and indices of the Element (Indices return that of the first cell for multi-cell Ranges)
345
+ cell.address
346
+ cell.row
347
+ cell.column
348
+ range.address
349
+ range.row
350
+ range.column
351
+
352
+ #Get and set the value(s)
353
+ cell.value #=> "QT1"
354
+ cell.value = 'QT1'
355
+ range.value #=> [["QT1", "231"], ["QT3", "123"]]
356
+ range.value = "a"
357
+ range.value #=> [["a", "a"], ["a", "a"]]
358
+ range.value = [["QT1", "231"], ["QT3", "123"]]
359
+ range.value #=> [["QT1", "231"], ["QT3", "123"]]
360
+
361
+ #Loop through a range
362
+ range.each { |val| puts val }
363
+
364
+ #Loop through each cell within a range
365
+ range.each_cell { |ce| puts "#{ ce.address }: #{ ce.value }" }
366
+
367
+ ```
368
+
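The two assignment forms shown for `range.value=` (a single value filling every cell, or a 2D Array copied cell by cell) can be sketched on a plain 2D Array. This assumes the target rectangle already exists within the data; `get_range` and `set_range` are hypothetical names, not RubyExcel's API:

```ruby
# Hypothetical helpers (not RubyExcel's API) mirroring the two assignment
# forms above: a scalar fills every cell, a 2D Array is copied cell by cell.
# Assumes the target rectangle already exists within the data.

def get_range(data, r1, c1, r2, c2)
  (r1..r2).map { |r| data[r - 1][(c1 - 1)..(c2 - 1)] }
end

def set_range(data, r1, c1, r2, c2, value)
  (r1..r2).each_with_index do |r, i|
    (c1..c2).each_with_index do |c, j|
      data[r - 1][c - 1] = value.is_a?(Array) ? value[i][j] : value
    end
  end
  data
end

data = [
  ['Part', 'Ref1', 'Ref2', 'Qty', 'Cost'],
  ['Type1', 'QT1', '231', 1, 35.15],
  ['Type2', 'QT3', '123', 1, 40]
]

# B2:C3 is rows 2..3, columns 2..3
p get_range(data, 2, 2, 3, 3) # [["QT1", "231"], ["QT3", "123"]]
set_range(data, 2, 2, 3, 3, 'a')
p get_range(data, 2, 2, 3, 3) # [["a", "a"], ["a", "a"]]
```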
369
+ Address Tools (Included in Sheet, Section, and Element)
370
+ --------
371
+
372
+ ```ruby
373
+ #Get the column index from an address string
374
+ s.address_to_col_index( 'A2' ) #=> 1
375
+
376
+ #Translate an address to indices
377
+ s.address_to_indices( 'A2' ) #=> [ 2, 1 ]
378
+
379
+ #Translate letter(s) to a column index
380
+ s.col_index( 'A' ) #=> 1
381
+
382
+ #Translate a number to column letter(s)
383
+ s.col_letter( 1 ) #=> "A"
384
+
385
+ #Extract the column letter(s) or row number from an address
386
+ s.column_id( 'A2' ) #=> "A"
387
+ s.row_id( 'A2' ) #=> 2
388
+
389
+ #Expand a Range address
390
+ s.expand( 'A1:B2' ) #=> [["A1", "B1"], ["A2","B2"]]
391
+ s.expand( 'A1' ) #=> [["A1"]]
392
+
393
+ #Translate indices to an address
394
+ s.indices_to_address( 2, 1 ) #=> "A2"
395
+
396
+ #Offset an address by rows and columns
397
+ s.offset( 'A2', 1, 2 ) #=> "C3"
398
+ s.offset( 'A2', 2, 0 ) #=> "A4"
399
+ s.offset( 'A2', -1, 0 ) #=> "A1"
400
+
401
+ ```
402
+
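Most of the address tools reduce to converting between column letters and numbers, which is bijective base 26: there is no zero digit, so "Z" is 26 and "AA" is 27. A minimal plain-Ruby sketch, whose method names mirror the README but which is not RubyExcel's own code:

```ruby
# Hypothetical plain-Ruby equivalents of the address tools above.

# "A" => 1, "Z" => 26, "AA" => 27
def col_index(letters)
  letters.upcase.each_char.reduce(0) { |acc, ch| acc * 26 + (ch.ord - 64) }
end

# 1 => "A", 26 => "Z", 27 => "AA"; subtracting 1 before each divmod
# accounts for the missing zero digit
def col_letter(index)
  chars = []
  while index > 0
    index, rem = (index - 1).divmod(26)
    chars.unshift((65 + rem).chr)
  end
  chars.join
end

# "A2" => [2, 1] (row, column)
def address_to_indices(address)
  m = address.match(/\A([A-Za-z]+)(\d+)\z/)
  raise ArgumentError, "bad address: #{address}" unless m
  [m[2].to_i, col_index(m[1])]
end

# (2, 1) => "A2"
def indices_to_address(row, col)
  raise ArgumentError, "out of range: #{row}, #{col}" if row < 1 || col < 1
  "#{col_letter(col)}#{row}"
end

# Shift an address by a number of rows and columns
def offset(address, rows, cols)
  row, col = address_to_indices(address)
  indices_to_address(row + rows, col + cols)
end

p col_letter(27)     # "AA"
p offset('A2', 1, 2) # "C3"
```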
403
+ Importing a Hash
404
+ --------
405
+
406
+ ```ruby
407
+ #Import a nested Hash (useful if you're summarising data before handing it to RubyExcel)
408
+
409
+ #Here's an example Hash
410
+ h = {
411
+ Part1: {
412
+ Type1: {
413
+ SubType1: 1, SubType2: 2, SubType3: 3
414
+ },
415
+ Type2: {
416
+ SubType1: 4, SubType2: 5, SubType3: 6
417
+ }
418
+ },
419
+ Part2: {
420
+ Type1: {
421
+ SubType1: 1, SubType2: 2, SubType3: 3
422
+ },
423
+ Type2: {
424
+ SubType1: 4, SubType2: 5, SubType3: 6
425
+ }
426
+ }
427
+ }
428
+
429
+ #Import the Hash to a Sheet
430
+ s.load( h )
431
+ #Or append the Hash to a Sheet
432
+ s << h
433
+
434
+ #Convert the symbols to strings (Not essential, but Excel can't handle Symbols in output)
435
+ s.rows { |r| r.map! { |v| v.is_a?(Symbol) ? v.to_s : v } }
436
+
437
+ #Have a look at the results
438
+ require 'pp'
439
+ pp s.to_a
440
+ #=> [["Part1", "Type1", "SubType1", 1],
441
+ #    ["Part1", "Type1", "SubType2", 2],
442
+ #    ["Part1", "Type1", "SubType3", 3],
443
+ #    ["Part1", "Type2", "SubType1", 4],
444
+ #    ["Part1", "Type2", "SubType2", 5],
445
+ #    ["Part1", "Type2", "SubType3", 6],
446
+ #    ["Part2", "Type1", "SubType1", 1],
447
+ #    ["Part2", "Type1", "SubType2", 2],
448
+ #    ["Part2", "Type1", "SubType3", 3],
449
+ #    ["Part2", "Type2", "SubType1", 4],
450
+ #    ["Part2", "Type2", "SubType2", 5],
451
+ #    ["Part2", "Type2", "SubType3", 6]]
452
+
453
+ ```
454
+
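The result above is what you get by walking the Hash recursively and emitting one row per path from the top-level key down to a leaf value. A minimal sketch of that flattening, where `hash_to_rows` is a hypothetical helper and not the gem's `load` implementation:

```ruby
# Hypothetical helper (not RubyExcel's code): flatten a nested Hash into rows.
# Each path of keys down to a non-Hash value becomes one row, with that value last.
def hash_to_rows(hash, path = [])
  hash.flat_map do |key, value|
    if value.is_a?(Hash)
      hash_to_rows(value, path + [key])
    else
      [path + [key, value]]
    end
  end
end

h = {
  Part1: { Type1: { SubType1: 1, SubType2: 2, SubType3: 3 },
           Type2: { SubType1: 4, SubType2: 5, SubType3: 6 } },
  Part2: { Type1: { SubType1: 1, SubType2: 2, SubType3: 3 },
           Type2: { SubType1: 4, SubType2: 5, SubType3: 6 } }
}

# Convert Symbols to Strings, as the README does before sending data to Excel
rows = hash_to_rows(h).map { |row| row.map { |v| v.is_a?(Symbol) ? v.to_s : v } }

p rows.length # 12
p rows.first  # ["Part1", "Type1", "SubType1", 1]
```

If the Hash is nested to uneven depths, the rows come out with different lengths, so some columns will be ragged.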
455
+ Excel Tools ( requires win32ole and Excel )
456
+ --------
457
+
458
+ Make sure all your data types are compatible with Excel first!
459
+
460
+ ```ruby
461
+ #Sample RubyExcel::Workbook to work with
462
+ rubywb = RubyExcel.sample_sheet.parent
463
+
464
+ #Get a new Excel instance
465
+ excel = rubywb.get_excel
466
+
467
+ #Get a new Excel Workbook
468
+ excelwb = rubywb.get_workbook( excel )
469
+ excelwb = rubywb.get_workbook
470
+
471
+ #Drop data into an Excel Sheet
472
+ rubywb.dump_to_sheet( rubywb.sheets(1).to_a )
473
+ rubywb.dump_to_sheet( rubywb.sheets(1).to_a, excelwb.sheets(1) )
474
+
475
+ #Autofit and left-align a WIN32OLE Excel Sheet
476
+ rubywb.make_sheet_pretty( excelwb.sheets(1) )
477
+
478
+ #Output the RubyExcel::Workbook into a new Excel Workbook
479
+ rubywb.to_excel
480
+
481
+ #Output the RubyExcel::Sheet into a new Excel Workbook
482
+ rubywb.sheets(1).to_excel
483
+
484
+ #Output the RubyExcel::Workbook into an Excel Workbook and save the file
485
+ #Note: The default directory is "Documents" or "My Documents" to support Ocra + InnoSetup installs.
486
+ #Note: An optional second argument, if set to true, keeps Excel invisible.
487
+ # This is a useful accelerator when running as an automated process.
488
+ # If you set the process to be invisible, don't forget to close Excel after you're finished with it!
489
+ rubywb.save_excel
490
+ rubywb.save_excel( 'Output.xlsx' )
491
+ rubywb.save_excel( 'c:/example/Output.xlsx' )
492
+
493
+ #Add borders to a given Excel Range
494
+ #1st Argument: WIN32OLE Range
495
+ #2nd Argument (default 1), weight of borders (0 to 4)
496
+ #3rd Argument (default false), include inner borders
497
+ RubyExcel.borders( excelwb.sheets(1).usedrange ) #Give used range outer borders
498
+ RubyExcel.borders( excelwb.sheets(1).usedrange, 2, true ) #Give used range inner and outer borders, medium weight
499
+ RubyExcel.borders( excelwb.sheets(1).usedrange, 0, false ) #Clear outer borders from used range
500
+
501
+ #You can even enter formula strings and Excel will evaluate them in the output.
502
+ s = rubywb.sheets(1)
503
+ s.row(1) << 'Formula'
504
+ s.rows(2) { |row| row << "=SUM(D#{ row.idx }:E#{ row.idx })" }
505
+ s.to_excel
506
+
507
+ ```
508
+
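In the formula example, `row.idx` supplies each row's Excel row number. With a plain 2D Array you track that offset yourself: Excel rows are 1-based and row 1 holds the header. A minimal sketch under those assumptions (Ruby only stores the formulas as text; Excel evaluates them once they land in a worksheet):

```ruby
data = [
  ['Part', 'Ref1', 'Ref2', 'Qty', 'Cost'],
  ['Type1', 'QT1', '231', 1, 35.15],
  ['Type2', 'QT3', '123', 1, 40]
]

# Header for the new column
data[0] << 'Formula'

# drop(1) returns a new Array holding the same row objects, so << still
# edits the original rows; body position i (0-based) is Excel row i + 2
data.drop(1).each_with_index do |row, i|
  excel_row = i + 2
  row << "=SUM(D#{excel_row}:E#{excel_row})"
end

p data[1].last # "=SUM(D2:E2)"
```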
509
+ Comparison of operations with and without RubyExcel gem
510
+ --------
511
+
512
+ Without RubyExcel (one way to do it):
513
+
514
+ ```ruby
515
+ #Filter to only 'Part' of 'Type1' and 'Type3' while keeping the header row
516
+ idx = data[0].index( 'Part' )
517
+ data = [ data[0] ] + data[1..-1].select { |row| row[ idx ] =~ /Type[13]/ }
518
+
519
+ #Get the combined 'Cost' of every 'Part' of 'Type1' and 'Type3' (before the 'Part' column is dropped)
520
+ find_idx, sum_idx = data[0].index('Part'), data[0].index('Cost')
521
+ data[1..-1].inject(0) { |sum, row| row[find_idx] =~ /Type[13]/ ? sum + row[sum_idx] : sum }
522
+
523
+ #Keep only the columns 'Cost' and 'Ref2' in that order
524
+ max_size = data.max_by(&:length).length #Standardise the row size to transpose into columns
525
+ data.map! { |row| row.length == max_size ? row : row + Array.new( max_size - row.length, nil) }
526
+ headers = [ 'Cost', 'Ref2' ]
527
+ data = data.transpose.select { |header,_| headers.index(header) }.sort_by { |header,_| headers.index(header) }.transpose
528
+
529
+ #Write the data to a TSV file
530
+ output = data.map { |row| row.map { |el| "#{el}".strip.gsub( /\s/, ' ' ) }.join "\t" }.join $/
531
+ File.write( 'output.txt', output )
532
+
533
+ #Drop the data into an Excel sheet ( using Excel and win32ole )
534
+ require 'win32ole'
+ excel = WIN32OLE::new( 'excel.application' )
535
+ excel.visible = true
536
+ wb = excel.workbooks.add
537
+ sheet = wb.sheets(1)
538
+ sheet.range( sheet.cells( 1, 1 ), sheet.cells( data.length, data[0].length ) ).value = data
539
+ wb.saveas( Dir.pwd.gsub('/','\\') + '\\Output.xlsx' )
540
+ ```
541
+
542
+ With RubyExcel:
543
+
544
+ ```ruby
545
+ #Filter to only 'Part' of 'Type1' and 'Type3' while keeping the header row
546
+ s.filter!( 'Part', &/Type[13]/ )
547
+
548
+ #Get the combined 'Cost' of every 'Part' of 'Type1' and 'Type3' (before the 'Part' column is dropped)
549
+ s.sumif( 'Part', 'Cost', &/Type[13]/ )
550
+
551
+ #Keep only the columns 'Cost' and 'Ref2' in that order
552
+ s.get_columns!( 'Cost', 'Ref2' )
553
+
554
+ #Write the data to a TSV file
555
+ File.write( 'output.txt', s.to_s )
556
+
557
+ #Write the data to an XLSX file ( requires Excel and win32ole )
558
+ s.parent.save_excel( 'Output.xlsx' )
559
+ ```
560
+
561
+ Todo List
562
+ =========
563
+
564
+ - Allow argument overloading for methods like filter to avoid repetition and increase efficiency.
565
+
566
+ - Add support for Range notations like "A:A" and "A:B"
567
+
568
+ - Write test cases (after learning how to do it)
569
+
570
+ - Find bugs and extirpate them.
571
+
572
+ - Optimise slow operations
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: rubyexcel
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.6
4
+ version: 0.0.7
5
5
  prerelease:
6
6
  platform: ruby
7
7
  authors:
@@ -24,6 +24,7 @@ files:
24
24
  - lib/rubyexcel/rubyexcel_components.rb
25
25
  - lib/rubyexcel/section.rb
26
26
  - lib/rubyexcel.rb
27
+ - lib/README.md
27
28
  homepage: https://github.com/VirtuosoJoel
28
29
  licenses: []
29
30
  post_install_message: