rubyexcel 0.0.6 → 0.0.7

Files changed (2)
  1. data/lib/README.md +572 -0
  2. metadata +2 -1
data/lib/README.md ADDED
@@ -0,0 +1,572 @@
1
+ RubyExcel
2
+ =========
3
+
4
+ Designed for Windows with MS Excel
5
+
6
+ **Still under construction! Bugs are inevitable.**
7
+
8
+ Introduction
9
+ ------------
10
+
11
+ A data-analysis tool for Ruby, with an Excel-style API.
12
+
13
+ Details
14
+ -----
15
+
16
+ Key design features taken from Excel:
17
+
18
+ * 1-based indexing.
19
+ * Referencing objects like Excel's API ( Workbook, Sheet, Row, Column, Cell, Range ).
20
+ * Useful data-handling functions ( e.g. Filter, Match, Sumif, Vlookup ).
21
+
22
+ Typical usage:
23
+
24
+ 1. Extract an HTML table or CSV file into a 2D Array ( normally with Nokogiri / Mechanize )
25
+ 2. Organise and interpret data with RubyExcel
26
+ 3. Output results into a file.
27
+
28
+ About
29
+ -----
30
+
31
+ This gem is designed as a way to conveniently edit table data before outputting it to Excel (XLSX) or TSV format (which Excel can interpret).
32
+ It attempts to take as much as possible from Excel's API while providing some of the best bits of Ruby ( e.g. Enumerators, Blocks, Regexp ).
33
+ An important feature is allowing reference to Columns via their Headers for convenience and enhanced code readability.
34
+ As this works directly on the data, processing is faster than using Excel itself.
35
+
36
+ This was written out of frustration with editing tabular data held in Ruby's multidimensional arrays
37
+ without disturbing the headers, while still keeping the code readable.
38
+ Its API is designed to simplify moving code across from VBA into Ruby format when processing spreadsheet data.
39
+ The combination of Ruby, WIN32OLE Excel, and extracting HTML table data is probably quite rare; but I thought I'd share what I came up with.
40
+
41
+ Examples
42
+ ========
43
+
44
+ Expected Data Layout (2D Array)
45
+ --------
46
+
47
+ ```ruby
48
+ data = [
49
+ [ 'Part', 'Ref1', 'Ref2', 'Qty', 'Cost' ],
50
+ [ 'Type1', 'QT1', '231', 1, 35.15 ],
51
+ [ 'Type2', 'QT3', '123', 1, 40 ],
52
+ [ 'Type3', 'XT1', '321', 3, 0.1 ],
53
+ [ 'Type1', 'XY2', '132', 1, 30.00 ],
54
+ [ 'Type4', 'XT3', '312', 2, 3 ],
55
+ [ 'Type2', 'QY2', '213', 1, 99.99 ],
56
+ [ 'Type1', 'QT4', '123', 2, 104 ]
57
+ ]
58
+ ```
59
+ The number of header rows defaults to 1
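
To make the header-row count concrete, here is a minimal plain-Ruby sketch of splitting a 2D Array like the one above into header rows and body rows. It does not use RubyExcel; `split_headers` is a hypothetical helper named only for this illustration, and how the gem stores headers internally is not shown in this README.

```ruby
# Plain-Ruby sketch (not RubyExcel itself): separate header rows from the body.
# `split_headers` is a hypothetical helper introduced for illustration only.
def split_headers(table, header_rows = 1)
  [table.first(header_rows), table.drop(header_rows)]
end

table = [
  ['Part',  'Ref1', 'Ref2', 'Qty', 'Cost'],
  ['Type1', 'QT1',  '231',  1,     35.15],
  ['Type2', 'QT3',  '123',  1,     40]
]

headers, body = split_headers(table)
puts headers.inspect # the single header row, wrapped in an Array
puts body.length     # number of data rows left after the header
```

With `header_rows` set to 0 the whole table is treated as body, which is roughly the situation the nested-Hash import further down ends up in.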
60
+
61
+ Loading the data into a Sheet
62
+ --------
63
+
64
+ ```ruby
65
+ require 'rubyexcel'
66
+
67
+ wb = RubyExcel::Workbook.new
68
+ s = wb.add( 'Sheet1' )
69
+ s.load( data )
70
+
71
+ #Or:
72
+
73
+ wb = RubyExcel::Workbook.new
74
+ s = wb.add( 'Sheet1' )
75
+ s.load( RubyExcel.sample_data )
76
+
77
+ #Or:
78
+
79
+ wb = RubyExcel::Workbook.new
80
+ s = wb.load( RubyExcel.sample_data )
81
+
82
+ #Or:
83
+
84
+ s = RubyExcel.sample_sheet
85
+ wb = s.parent
86
+ ```
87
+
88
+ Using the Mechanize gem to get data
89
+ --------
90
+
91
+ ```ruby
92
+ s = RubyExcel::Workbook.new.load( CSV.parse( Mechanize.new.get('http://example.com/myfile.csv').content ) )
93
+ ```
94
+
95
+ Reference a cell's value
96
+ --------
97
+
98
+ ```ruby
99
+ s['A7']
100
+ s.cell(7,1).value
101
+ s.range('A7').value
102
+ s.row(7)['A']
103
+ s.row(7)[1]
104
+ s.column('A')[7]
105
+ s.column('A')['7']
106
+ ```
107
+
108
+ Reference a group of cells
109
+ --------
110
+
111
+ ```ruby
112
+ s['A1:B3'] #=> Array
113
+ s.range( 'A1:B3' ) #=> Element
114
+ s.range( 'A1', 'B3' ) #=> Element
115
+ s.range( s.cell( 1, 1 ), s.cell( 3, 2 ) ) #=> Element
116
+ s.row( 1 ) #=> Row
117
+ s.column( 'A' ) #=> Column
118
+ s.column( 1 ) #=> Column
119
+ ```
120
+
121
+ Detailed Interactions
122
+ ========
123
+
124
+ Workbook
125
+ --------
126
+
127
+ ```ruby
128
+ #Create a workbook
129
+ wb = RubyExcel::Workbook.new
130
+
131
+ #Add sheets to the workbook
132
+ sheet1, sheet2 = wb.add('Sheet1'), wb.add
133
+
134
+ #Delete all sheets from a workbook
135
+ wb.clear_all
136
+
137
+ #Delete a specific sheet
138
+ wb.delete( 1 )
139
+ wb.delete( 'Sheet1' )
140
+ wb.delete( sheet1 )
141
+ wb.delete( /sheet1/i )
142
+
143
+ #Shortcut to create a sheet with a default name and fill it with data
144
+ wb.load( data )
145
+
146
+ #Select a sheet
147
+ wb.sheets(1) #=> RubyExcel::Sheet
148
+ wb.sheets('Sheet1') #=> RubyExcel::Sheet
149
+
150
+ #Iterate through all sheets
151
+ wb.sheets #=> Enumerator
152
+ wb.each #=> Enumerator
153
+
154
+ #Sort the sheets
155
+ wb.sort! { |x,y| x.name <=> y.name }
156
+ wb.sort_by! &:name
157
+ ```
158
+
159
+ Sheet
160
+ --------
161
+
162
+ ```ruby
163
+ #Create a sheet
164
+ s = wb.add #Name defaults to 'Sheet' + total number of sheets
165
+ s = wb.add( 'Sheet1' )
166
+
167
+ #Access the sheet name
168
+ s.name #=> 'Sheet1'
169
+ s.name = 'Sheet1'
170
+
171
+ #Access the parent workbook
172
+ s.workbook
173
+ s.parent
174
+
175
+ #Access the headers
176
+ s.header_rows #=> 1
177
+ s.headers #=> 1
178
+ s.headers = 1
179
+ s.header_rows = 1
180
+
181
+ #Specify the number of header rows when loading data
182
+ s.load( data, 1 )
183
+
184
+ #Append data (at the bottom of the sheet)
185
+ s << data
186
+ s << s
187
+ s += data
188
+ s += s
189
+
190
+ #Remove identical rows in another data set (skipping any headers)
191
+ s -= data
192
+ s -= s
193
+
194
+ #Select a column by its header
195
+ s.column_by_header( 'Part' )
196
+ s.ch( 'Part' )
197
+ #=> Column
198
+
199
+ #Iterate through rows or columns
200
+ s.rows { |r| puts r } #All rows
201
+ s.rows( 2 ) { |r| puts r } #From the 2nd to the last row
202
+ s.rows( 1, 3 ) { |r| puts r } #Rows 1 to 3
203
+ s.columns { |c| puts c } #All columns
204
+ s.columns( 'B' ) { |c| puts c } #From the 2nd to the last column
205
+ s.columns( 2 ) { |c| puts c } #From the 2nd to the last column
206
+ s.columns( 'B', 'D' ) { |c| puts c } #Columns 2 to 4
207
+ s.columns( 2, 4 ) { |c| puts c } #Columns 2 to 4
208
+
209
+ #Remove all empty rows & columns
210
+ s.compact!
211
+
212
+ #Delete the current sheet from the workbook
213
+ s.delete
214
+
215
+ #Delete rows or columns "if( condition )" (iterates in reverse to preserve references during loop)
216
+ s.delete_rows_if { |r| r.empty? }
217
+ s.delete_columns_if { |c| c.empty? }
218
+
219
+ #Filter the data given a column and a block to test values against.
220
+ #Note: Returns a copy of the sheet when used without "!".
221
+ #Note: This gem carries a Regexp to_proc method for Regex shorthand (shown below).
222
+ s.filter!( 'Part' ) { |value| value =~ /Type[13]/ }
223
+ s.filter!( 'Part', &/Type[13]/ )
224
+
225
+ #Filter the data to a specific set of columns by their headers.
226
+ #Note: Returns a copy of the sheet when used without "!".
227
+ s.get_columns!( 'Cost', 'Part', 'Qty' )
228
+ s.gc!( 'Cost', 'Part', 'Qty' )
229
+
230
+ #Insert blank rows or columns ( before, number to insert )
231
+ s.insert_rows( 2, 2 ) #Inserts 2 empty rows before row 2
232
+ s.insert_columns( 'B', 1 ) #Inserts 1 empty column before column 2
233
+ s.insert_columns( 2, 1 ) #Inserts 1 empty column before column 2
234
+
235
+ #Find the first row which matches a value within a column (selected by header)
236
+ #Note: Can now accept a Column object in place of a header.
237
+ s.match( 'Qty' ) { |value| value == 1 } #=> 2
238
+ s.match( 'Part', &/Type2/ ) #=> 3
239
+
240
+ #Find the current end of the data range
241
+ s.maxrow #=> 8
242
+ s.rows.count #=> 8
243
+ s.maxcol #=> 5
244
+ s.columns.count #=> 5
245
+
246
+ #Partition the sheet into two, given a header and a block (like Filter)
247
+ #Note: this keeps the headers intact in both output sheets
248
+ type_1_and_3, other = s.partition( 'Part' ) { |value| value =~ /Type[13]/ }
249
+ type_1_and_3, other = s.partition( 'Part', &/Type[13]/ )
250
+
251
+ #Reverse the data by rows or columns (ignores headers)
252
+ s.reverse_rows!
253
+ s.reverse_columns!
254
+
255
+ #Sort the rows by criteria (ignores headers)
256
+ s.sort! { |r1,r2| r1['A'] <=> r2['A'] }
257
+ s.sort_by! { |r| r['A'] }
258
+
259
+ #Sum all elements in a column by criteria in another column (selected by header)
260
+ #Parameters: Header to pass to the block, Header to sum, Block.
261
+ #Note: Now also accepts Column objects in place of headers.
262
+ s.sumif( 'Part', 'Cost' ) { |part| part == 'Type1' } #=> 169.15
263
+ s.sumif( 'Part', 'Cost', &/Type1/ ) #=> 169.15
264
+
265
+ #Convert the data into various formats:
266
+ s.to_a #=> 2D Array
267
+ s.to_excel #=> WIN32OLE Excel Workbook (Contains only the current sheet)
268
+ s.to_html #=> String (HTML table)
269
+ s.to_s #=> String (TSV)
270
+
271
+ #Remove all rows with duplicate values in the given column (selected by header or Column object)
272
+ s.uniq! 'Part'
273
+
274
+ #Find a value in one column by searching another one (selected by headers or Column objects)
275
+ s.vlookup( 'Part', 'Ref1', &/Type4/ ) #=> "XT3"
276
+ ```
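
For readers who want to see what header-based `filter`, `sumif`, and `vlookup` amount to, the following plain-Ruby sketch reproduces the behaviour described in the comments above on the sample 2D Array. It is not the gem's implementation: `filter_rows`, `sumif`, and `vlookup` here are hypothetical free-standing helpers, and instead of the gem's Regexp `to_proc` shorthand the sketch passes `Regexp#match?` as a Method object.

```ruby
# Plain-Ruby sketch of header-based filter / sumif / vlookup on a 2D Array.
# These helpers are hypothetical and only mimic the behaviour described above.

# Keep the header row plus every body row whose value under `header` passes the block.
def filter_rows(table, header)
  idx = table.first.index(header)
  [table.first] + table.drop(1).select { |row| yield row[idx] }
end

# Sum the `sum_header` values of rows whose `find_header` value passes the block.
def sumif(table, find_header, sum_header)
  find_idx = table.first.index(find_header)
  sum_idx  = table.first.index(sum_header)
  table.drop(1).sum { |row| yield(row[find_idx]) ? row[sum_idx] : 0 }
end

# Return the `return_header` value of the first row whose `find_header` value passes the block.
def vlookup(table, find_header, return_header)
  find_idx   = table.first.index(find_header)
  return_idx = table.first.index(return_header)
  row = table.drop(1).find { |r| yield r[find_idx] }
  row && row[return_idx]
end

table = [
  ['Part',  'Ref1', 'Ref2', 'Qty', 'Cost'],
  ['Type1', 'QT1',  '231',  1,     35.15],
  ['Type2', 'QT3',  '123',  1,     40],
  ['Type3', 'XT1',  '321',  3,     0.1],
  ['Type1', 'XY2',  '132',  1,     30.00],
  ['Type4', 'XT3',  '312',  2,     3],
  ['Type2', 'QY2',  '213',  1,     99.99],
  ['Type1', 'QT4',  '123',  2,     104]
]

filtered = filter_rows(table, 'Part') { |value| value =~ /Type[13]/ }
total    = sumif(table, 'Part', 'Cost') { |part| part == 'Type1' }
matcher  = /Type4/.method(:match?) # a Method object converts to a block via `&`
ref      = vlookup(table, 'Part', 'Ref1', &matcher)

puts filtered.length # header row plus the Type1 and Type3 rows
puts total.round(2)
puts ref
```

Looking the column index up from the header row on every call keeps the helpers independent of column order, which is the readability benefit the header-based API is aiming for.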
277
+
278
+ Row / Column (Section)
279
+ --------
280
+
281
+ ```ruby
282
+ #Reference a Row or Column
283
+ row = s.row(2)
284
+ col = s.column('B')
285
+
286
+ =begin
287
+ Append a value
288
+ Note: Only extends the data boundaries when at the first row or column.
289
+ This allows looping through an entire row or column to append single values,
290
+ without worrying about using the correct index.
291
+ =end
292
+ s.row(1) << 'New'
293
+ s.rows(2) { |r| r << 'Column' }
294
+ s.column(1) << 'New'
295
+ s.columns(2) { |c| c << 'Row' }
296
+
297
+ #Delete the data referenced by self.
298
+ row.delete
299
+ col.delete
300
+
301
+ #Find the address of a cell matching a block
302
+ row.find { |value| value == 'QT1' }
303
+ row.find &/QT1/
304
+ col.find { |value| value == 'QT1' }
305
+ col.find &/QT1/
306
+
307
+ #Summarise the current row or column into a Hash.
308
+ s.column(1).summarise
309
+ #=> {"Type1"=>3, "Type2"=>2, "Type3"=>1, "Type4"=>1}
310
+
311
+ #Loop through all values
312
+ row.each { |val| puts val }
313
+ col.each { |val| puts val }
314
+
315
+ #Loop through all values without including headers
316
+ col.each_without_headers { |val| puts val }
317
+ col.each_wh { |val| puts val }
318
+
319
+ #Loop through each cell
320
+ row.each_cell { |ce| puts "#{ ce.address }: #{ ce.value }" }
321
+ col.each_cell { |ce| puts "#{ ce.address }: #{ ce.value }" }
322
+
323
+ #Loop through each cell without including headers
324
+ col.each_cell_without_headers { |ce| puts "#{ ce.address }: #{ ce.value }" }
325
+ col.each_cell_wh { |ce| puts "#{ ce.address }: #{ ce.value }" }
326
+
327
+ #Overwrite each value based on its current value
328
+ row.map! { |val| val.to_s + 'a' }
329
+ col.map! { |val| val.to_s + 'a' }
330
+
331
+ #Get the value of a cell in the current row by its header
332
+ row.value_by_header( 'Part' ) #=> 'Type1'
333
+ row.val( 'Part' ) #=> 'Type1'
334
+ ```
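
The `summarise` call above returns a count per distinct value. A rough plain-Ruby equivalent is sketched below. `summarise_column` is a hypothetical helper, and skipping the header row is an assumption based on the sample output above, which contains no "Part" key.

```ruby
# Plain-Ruby sketch: count how often each value occurs in one column.
# Assumption: header rows are skipped, as the sample output above suggests.
def summarise_column(table, col_index, header_rows = 1)
  table.drop(header_rows).each_with_object(Hash.new(0)) do |row, counts|
    counts[row[col_index - 1]] += 1 # col_index is 1-based, matching the README
  end
end

table = [
  ['Part',  'Qty'],
  ['Type1', 1],
  ['Type2', 1],
  ['Type3', 3],
  ['Type1', 1],
  ['Type4', 2],
  ['Type2', 1],
  ['Type1', 2]
]

puts summarise_column(table, 1).inspect
```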
335
+
336
+ Cell / Range (Element)
337
+ --------
338
+
339
+ ```ruby
340
+ #Reference a Cell or Range
341
+ cell = s.cell( 2, 2 )
342
+ range = s.range('B2:C3')
343
+
344
+ #Get the address and indices of the Element (Indices return that of the first cell for multi-cell Ranges)
345
+ cell.address
346
+ cell.row
347
+ cell.column
348
+ range.address
349
+ range.row
350
+ range.column
351
+
352
+ #Get and set the value(s)
353
+ cell.value #=> "QT1"
354
+ cell.value = 'QT1'
355
+ range.value #=> [["QT1", "231"], ["QT3", "123"]]
356
+ range.value = "a"
357
+ range.value #=> [["a", "a"], ["a", "a"]]
358
+ range.value = [["QT1", "231"], ["QT3", "123"]]
359
+ range.value #=> [["QT1", "231"], ["QT3", "123"]]
360
+
361
+ #Loop through a range
362
+ range.each { |val| puts val }
363
+
364
+ #Loop through each cell within a range
365
+ range.each_cell { |ce| puts "#{ ce.address }: #{ ce.value }" }
366
+
367
+ ```
368
+
369
+ Address Tools (Included in Sheet, Section, and Element)
370
+ --------
371
+
372
+ ```ruby
373
+ #Get the column index from an address string
374
+ s.address_to_col_index( 'A2' ) #=> 1
375
+
376
+ #Translate an address to indices
377
+ s.address_to_indices( 'A2' ) #=> [ 2, 1 ]
378
+
379
+ #Translate letter(s) to a column index
380
+ s.col_index( 'A' ) #=> 1
381
+
382
+ #Translate a number to column letter(s)
383
+ s.col_letter( 1 ) #=> "A"
384
+
385
+ #Extract the column letter(s) or row number from an address
386
+ s.column_id( 'A2' ) #=> "A"
387
+ s.row_id( 'A2' ) #=> 2
388
+
389
+ #Expand a Range address
390
+ s.expand( 'A1:B2' ) #=> [["A1", "B1"], ["A2","B2"]]
391
+ s.expand( 'A1' ) #=> [["A1"]]
392
+
393
+ #Translate indices to an address
394
+ s.indices_to_address( 2, 1 ) #=> "A2"
395
+
396
+ #Offset an address by rows and columns
397
+ s.offset( 'A2', 1, 2 ) #=> "C3"
398
+ s.offset( 'A2', 2, 0 ) #=> "A4"
399
+ s.offset( 'A2', -1, 0 ) #=> "A1"
400
+
401
+ ```
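
The address tools above boil down to converting between column letters and 1-based indices. The sketch below shows one way to do that in plain Ruby. The method names mirror the README for readability, but the bodies are an independent illustration and should not be read as the gem's source.

```ruby
# Plain-Ruby sketch of Excel-style address arithmetic (illustrative helpers only).

# 'A' => 1, 'Z' => 26, 'AA' => 27 (letters behave like bijective base-26 digits)
def col_index(letters)
  letters.upcase.each_char.reduce(0) { |sum, ch| sum * 26 + (ch.ord - 'A'.ord + 1) }
end

# 1 => 'A', 26 => 'Z', 27 => 'AA'
def col_letter(index)
  letters = +''
  while index > 0
    index, rem = (index - 1).divmod(26)
    letters.prepend(('A'.ord + rem).chr)
  end
  letters
end

# 'A2' => [2, 1], i.e. [row, column]
def address_to_indices(address)
  match = address.match(/\A([A-Za-z]+)(\d+)\z/)
  raise ArgumentError, "Invalid address: #{address}" unless match
  [match[2].to_i, col_index(match[1])]
end

# (2, 1) => 'A2'
def indices_to_address(row, col)
  "#{col_letter(col)}#{row}"
end

# Shift an address by a number of rows and columns.
def offset(address, rows, cols)
  row, col = address_to_indices(address)
  indices_to_address(row + rows, col + cols)
end

# 'A1:B2' => [['A1', 'B1'], ['A2', 'B2']]; a single address expands to one cell.
def expand(address)
  first, last = address.split(':')
  last ||= first
  r1, c1 = address_to_indices(first)
  r2, c2 = address_to_indices(last)
  (r1..r2).map { |r| (c1..c2).map { |c| indices_to_address(r, c) } }
end

puts col_index('AA')
puts col_letter(28)
puts offset('A2', 1, 2)
puts expand('A1:B2').inspect
```

Subtracting 1 before `divmod` is what makes the scheme bijective: there is no zero digit, so 26 maps to "Z" rather than to "A0".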
402
+
403
+ Importing a Hash
404
+ --------
405
+
406
+ ```ruby
407
+ #Import a nested Hash (useful if you're summarising data before handing it to RubyExcel)
408
+
409
+ #Here's an example Hash
410
+ h = {
411
+ Part1: {
412
+ Type1: {
413
+ SubType1: 1, SubType2: 2, SubType3: 3
414
+ },
415
+ Type2: {
416
+ SubType1: 4, SubType2: 5, SubType3: 6
417
+ }
418
+ },
419
+ Part2: {
420
+ Type1: {
421
+ SubType1: 1, SubType2: 2, SubType3: 3
422
+ },
423
+ Type2: {
424
+ SubType1: 4, SubType2: 5, SubType3: 6
425
+ }
426
+ }
427
+ }
428
+
429
+ #Import the Hash to a Sheet
430
+ s.load( h )
431
+ #Or append the Hash to a Sheet
432
+ s << h
433
+
434
+ #Convert the symbols to strings (Not essential, but Excel can't handle Symbols in output)
435
+ s.rows { |r| r.map! { |v| v.is_a?(Symbol) ? v.to_s : v } }
436
+
437
+ #Have a look at the results
438
+ require 'pp'
439
+ pp s.to_a
440
+ [["Part1", "Type1", "SubType1", 1],
441
+ ["Part1", "Type1", "SubType2", 2],
442
+ ["Part1", "Type1", "SubType3", 3],
443
+ ["Part1", "Type2", "SubType1", 4],
444
+ ["Part1", "Type2", "SubType2", 5],
445
+ ["Part1", "Type2", "SubType3", 6],
446
+ ["Part2", "Type1", "SubType1", 1],
447
+ ["Part2", "Type1", "SubType2", 2],
448
+ ["Part2", "Type1", "SubType3", 3],
449
+ ["Part2", "Type2", "SubType1", 4],
450
+ ["Part2", "Type2", "SubType2", 5],
451
+ ["Part2", "Type2", "SubType3", 6]]
452
+
453
+ ```
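
The output above has one row per leaf value, with the chain of keys leading to it in the preceding columns. A recursive plain-Ruby sketch of that flattening is shown below. `flatten_hash` is a hypothetical helper, and the gem may well build its rows differently.

```ruby
# Plain-Ruby sketch: flatten a nested Hash into rows of [key, key, ..., value].
# `flatten_hash` is a hypothetical helper, not part of RubyExcel.
def flatten_hash(hash, prefix = [])
  hash.flat_map do |key, value|
    if value.is_a?(Hash)
      flatten_hash(value, prefix + [key]) # descend, remembering the path so far
    else
      [prefix + [key, value]]             # leaf: emit one complete row
    end
  end
end

h = {
  Part1: {
    Type1: { SubType1: 1, SubType2: 2 },
    Type2: { SubType1: 4, SubType2: 5 }
  }
}

rows = flatten_hash(h)
# Convert Symbols to Strings, as recommended above for Excel output
rows = rows.map { |row| row.map { |v| v.is_a?(Symbol) ? v.to_s : v } }
rows.each { |row| puts row.inspect }
```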
454
+
455
+ Excel Tools ( requires win32ole and Excel )
456
+ --------
457
+
458
+ Make sure all your data types are compatible with Excel first!
459
+
460
+ ```ruby
461
+ #Sample RubyExcel::Workbook to work with
462
+ rubywb = RubyExcel.sample_sheet.parent
463
+
464
+ #Get a new Excel instance
465
+ excel = rubywb.get_excel
466
+
467
+ #Get a new Excel Workbook
468
+ excelwb = rubywb.get_workbook( excel )
469
+ excelwb = rubywb.get_workbook
470
+
471
+ #Drop data into an Excel Sheet
472
+ rubywb.dump_to_sheet( rubywb.sheets(1).to_a )
473
+ rubywb.dump_to_sheet( rubywb.sheets(1).to_a, excelwb.sheets(1) )
474
+
475
+ #Autofit and left-align a WIN32OLE Excel Sheet
476
+ rubywb.make_sheet_pretty( excelwb.sheets(1) )
477
+
478
+ #Output the RubyExcel::Workbook into a new Excel Workbook
479
+ rubywb.to_excel
480
+
481
+ #Output the RubyExcel::Sheet into a new Excel Workbook
482
+ rubywb.sheets(1).to_excel
483
+
484
+ #Output the RubyExcel::Workbook into an Excel Workbook and save the file
485
+ #Note: The default directory is "Documents" or "My Documents" to support Ocra + InnoSetup installs.
486
+ #Note: There is an optional second argument which if set to true doesn't make Excel visible.
487
+ # This is a useful accelerator when running as an automated process.
488
+ # If you set the process to be invisible, don't forget to close Excel after you're finished with it!
489
+ rubywb.save_excel
490
+ rubywb.save_excel( 'Output.xlsx' )
491
+ rubywb.save_excel( 'c:/example/Output.xlsx' )
492
+
493
+ #Add borders to a given Excel Range
494
+ #1st Argument: WIN32OLE Range
495
+ #2nd Argument (default 1), weight of borders (0 to 4)
496
+ #3rd Argument (default false), include inner borders
497
+ RubyExcel.borders( excelwb.sheets(1).usedrange ) #Give used range outer borders
498
+ RubyExcel.borders( excelwb.sheets(1).usedrange, 2, true ) #Give used range inner and outer borders, medium weight
499
+ RubyExcel.borders( excelwb.sheets(1).usedrange, 0, false ) #Clear outer borders from used range
500
+
501
+ #You can even enter formula strings and Excel will evaluate them in the output.
502
+ s = rubywb.sheets(1)
503
+ s.row(1) << 'Formula'
504
+ s.rows(2) { |row| row << "=SUM(D#{ row.idx }:E#{ row.idx })" }
505
+ s.to_excel
506
+
507
+ ```
508
+
509
+ Comparison of operations with and without RubyExcel gem
510
+ --------
511
+
512
+ Without RubyExcel (one way to do it):
513
+
514
+ ```ruby
515
+ #Filter to only 'Part' of 'Type1' and 'Type3' while keeping the header row
516
+ idx = data[0].index( 'Part' )
517
+ data = [ data[0] ] + data[1..-1].select { |row| row[ idx ] =~ /Type[13]/ }
518
+
519
+ #Keep only the columns 'Cost' and 'Ref2' in that order
520
+ max_size = data.max_by(&:length).length #Standardise the row size to transpose into columns
521
+ data.map! { |row| row.length == max_size ? row : row + Array.new( max_size - row.length, nil) }
522
+ headers = [ 'Cost', 'Ref2' ]
523
+ data = data.transpose.select { |header,_| headers.index(header) }.sort_by { |header,_| headers.index(header) }.transpose
524
+
525
+ #Get the combined 'Cost' of every 'Part' of 'Type1' and 'Type3'
526
+ find_idx, sum_idx = data[0].index('Part'), data[0].index('Cost')
527
+ data[1..-1].inject(0) { |sum, row| row[find_idx] =~ /Type[13]/ ? sum + row[sum_idx] : sum }
528
+
529
+ #Write the data to a TSV file
530
+ output = data.map { |row| row.map { |el| "#{el}".strip.gsub( /\s/, ' ' ) }.join "\t" }.join $/
531
+ File.write( 'output.txt', output )
532
+
533
+ #Drop the data into an Excel sheet ( using Excel and win32ole )
534
+ excel = WIN32OLE::new( 'excel.application' )
535
+ excel.visible = true
536
+ wb = excel.workbooks.add
537
+ sheet = wb.sheets(1)
538
+ sheet.range( sheet.cells( 1, 1 ), sheet.cells( data.length, data[0].length ) ).value = data
539
+ wb.saveas( Dir.pwd.gsub('/','\\') + '\\Output.xlsx' )
540
+ ```
541
+
542
+ With RubyExcel:
543
+
544
+ ```ruby
545
+ #Filter to only 'Part' of 'Type1' and 'Type3' while keeping the header row
546
+ s.filter!( 'Part', &/Type[13]/ )
547
+
548
+ #Keep only the columns 'Cost' and 'Ref2' in that order
549
+ s.get_columns!( 'Cost', 'Ref2' )
550
+
551
+ #Get the combined 'Cost' of every 'Part' of 'Type1' and 'Type3'
552
+ s.sumif( 'Part', 'Cost', &/Type[13]/ )
553
+
554
+ #Write the data to a TSV file
555
+ File.write( 'output.txt', s.to_s )
556
+
557
+ #Write the data to an XLSX file ( requires Excel and win32ole )
558
+ s.parent.save_excel( 'Output.xlsx' )
559
+ ```
560
+
561
+ Todo List
562
+ =========
563
+
564
+ - Allow argument overloading for methods like filter to avoid repetition and increase efficiency.
565
+
566
+ - Add support for Range notations like "A:A" and "A:B"
567
+
568
+ - Write TestCases (after learning how to do it)
569
+
570
+ - Find bugs and extirpate them.
571
+
572
+ - Optimise slow operations
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: rubyexcel
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.6
4
+ version: 0.0.7
5
5
  prerelease:
6
6
  platform: ruby
7
7
  authors:
@@ -24,6 +24,7 @@ files:
24
24
  - lib/rubyexcel/rubyexcel_components.rb
25
25
  - lib/rubyexcel/section.rb
26
26
  - lib/rubyexcel.rb
27
+ - lib/README.md
27
28
  homepage: https://github.com/VirtuosoJoel
28
29
  licenses: []
29
30
  post_install_message: