sycsvpro 0.1.13 → 0.2.0

data/.gitignore CHANGED
@@ -19,3 +19,4 @@ doc/
19
19
 
20
20
  # Test files
21
21
  *.csv
22
+ test-files/
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
1
1
  PATH
2
2
  remote: .
3
3
  specs:
4
- sycsvpro (0.1.13)
4
+ sycsvpro (0.2.0)
5
5
  gli (= 2.9.0)
6
6
  timeleap (~> 0.0.1)
7
7
 
data/README.md CHANGED
@@ -24,6 +24,8 @@ Processing of csv files. *sycsvpro* offers following functions
24
24
  * join two files based on a joint column value (since version 0.1.7)
25
25
  * merge files based on common headline columns (since version 0.1.10)
26
26
  * transpose (swapping) rows and columns (since version 0.1.13)
27
+ * arithmetic operations between multiple files that have a table like
28
+ structure (since version 0.2.0)
27
29
 
28
30
  To get help type
29
31
 
@@ -255,6 +257,158 @@ Write only columns 0, 6 and 7 by specifying write columns
255
257
  chiro;2;20
256
258
  0;10;100
257
259
 
260
+ Spread Sheet
261
+ ------------
262
+ A spread sheet is a table with rows and columns. Operations can be conducted on
263
+ a single spread sheet or between spread sheets. All rows of a spread sheet must
264
+ have the same number of columns, and rows and columns may have labels.
265
+
266
+ Use cases are
267
+
268
+ * arithmetic operations on spread sheets
269
+ * information about table like data
270
+
271
+ ###Example for Arithmetic Operation
272
+ Assume we want to calculate the market for computer services. We have the count
273
+ of computers in each country, we are offering different services with service
274
+ specific prices. We know the market for each service in percent. With this data
275
+ we can calculate the market value.
276
+
277
+ Count of computers in target countries
278
+
279
+ [Tablet] [Laptop] [Desktop]
280
+ [CA] 1000 2000 500
281
+ [DE] 2000 3000 400
282
+ [MX] 500 4000 800
283
+ [RU] 1500 1500 1000
284
+ [TR] 1000 2500 3000
285
+ [US] 3000 3500 1200
286
+
287
+ Prices of the services offered for each computer type
288
+
289
+ [Clean] [Maintain] [Repair]
290
+ [Tablet] 10 50 100
291
+ [Laptop] 20 60 150
292
+ [Desktop] 50 100 200
293
+
294
+ Market for the different services
295
+
296
+ [Clean] [Maintain] [Repair]
297
+ [Tablet] 0.10 0.05 0.03
298
+ [Laptop] 0.05 0.10 0.02
299
+ [Desktop] 0.20 0.30 0.04
300
+
301
+ To calculate the market value we have to multiply each row of the country file
302
+ with the columns of the service prices and service market file (for readability
303
+ the command has been split across multiple lines)
304
+
305
+ $ sycsvpro -o market_value.csv spreadsheet \
306
+ -f country.csv,prices.csv,market.csv \
307
+ -a country,price,market \
308
+ -o "SpreadSheet.bind_columns( \
309
+ country.transpose.column_collect { |value| value * price * market } \
310
+ ).transpose"
311
+
312
+ Note: If you get obscure errors then check whether your aliases (-a flag)
313
+ conflict with a method of your classes. It is therefore advised to
314
+ always use specific names like country, price and market in the example
315
+
316
+ The result of the operation is written to market\_value.csv (labels have been
317
+ optimized for better readability)
318
+
319
+ [Tablet] [Laptop] [Desktop]
320
+ [CA-Clean] 1000.0 2000.0 5000.0
321
+ [CA-Maintain] 2500.0 12000.0 15000.0
322
+ [CA-Repair] 3000.0 6000.0 4000.0
323
+ [DE-Clean] 2000.0 3000.0 4000.0
324
+ [DE-Maintain] 5000.0 18000.0 12000.0
325
+ [DE-Repair] 6000.0 9000.0 3200.0
326
+ [MX-Clean] 500.0 4000.0 8000.0
327
+ [MX-Maintain] 1250.0 24000.0 24000.0
328
+ [MX-Repair] 1500.0 12000.0 6400.0
329
+ [RU-Clean] 1500.0 1500.0 10000.0
330
+ [RU-Maintain] 3750.0 9000.0 30000.0
331
+ [RU-Repair] 4500.0 4500.0 8000.0
332
+ [TR-Clean] 1000.0 2500.0 30000.0
333
+ [TR-Maintain] 2500.0 15000.0 90000.0
334
+ [TR-Repair] 3000.0 7500.0 24000.0
335
+ [US-Clean] 3000.0 3500.0 12000.0
336
+ [US-Maintain] 7500.0 21000.0 36000.0
337
+ [US-Repair] 9000.0 10500.0 9600.0
338
+
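The multiplication behind the result table can be checked with a minimal plain-Ruby sketch. It does not use the sycsvpro API; the variable names and the nested-array layout are assumptions made for illustration, and only the CA and DE rows from the tables above are included.

```ruby
# Plain-Ruby sketch of the market value calculation (not the sycsvpro API).
# Computer counts per country in the order Tablet, Laptop, Desktop.
countries = {
  "CA" => [1000, 2000, 500],
  "DE" => [2000, 3000, 400]
}

# Rows: Tablet, Laptop, Desktop; columns: Clean, Maintain, Repair.
prices = [[10, 50, 100], [20, 60, 150], [50, 100, 200]]
market = [[0.10, 0.05, 0.03], [0.05, 0.10, 0.02], [0.20, 0.30, 0.04]]
services = %w[Clean Maintain Repair]

market_value = {}
countries.each do |country, counts|
  services.each_with_index do |service, s|
    # count * price * market share for each device type;
    # rounding removes floating point noise
    market_value["#{country}-#{service}"] =
      counts.each_with_index.map do |count, d|
        (count * prices[d][s] * market[d][s]).round(1)
      end
  end
end

market_value.each { |label, values| puts "[#{label}] #{values.join(' ')}" }
```

Each output row holds the values for Tablet, Laptop and Desktop, which matches the layout of market\_value.csv shown above.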
339
+ ###Example for Information on Spread Sheets
340
+ With the analyze command we get information about the general structure and some
341
+ sample data of a csv file. If we want to look at the csv file in more detail we
342
+ can use the spreadsheet command. In this case we don't want to write the result
343
+ to a file as it is not a spread sheet, so we can omit the global -o option.
344
+
345
+ sycsvpro spreadsheet -f country.csv -r true -c true -a a \
346
+ -o "puts;puts a;puts a.ncol;puts a.nrow;puts a.size"
347
+
348
+ This will give us the information about the data, the number of columns and rows
349
+ and the number of values in the csv file. But for this case there is a standard
350
+ method that provides this information
351
+
352
+ sycsvpro spreadsheet -f country.csv -r true -c true -a a -o "a.summary"
353
+
354
+ Summary
355
+ -------
356
+ rows: 6, columns: 3, dimension: [6, 3], size: 18
357
+
358
+ row labels:
359
+ ["CA","DE","MX","RU","TR","US"]
360
+ column labels:
361
+ ["Tablet","Laptop","Desktop"]
362
+
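How the figures in the summary relate to each other can be sketched in plain Ruby. The helper `table_summary` below is hypothetical and is not the gem's `summary` method; it only assumes that the data part of country.csv is available as an array of rows.

```ruby
# Hypothetical helper, not part of sycsvpro: derives the summary figures
# from a table given as an array of equally sized rows.
def table_summary(rows)
  nrow = rows.size
  ncol = rows.first.size
  unless rows.all? { |row| row.size == ncol }
    raise ArgumentError, "rows differ in size"
  end
  { rows: nrow, columns: ncol, dimension: [nrow, ncol], size: nrow * ncol }
end

# Data part of country.csv (without row and column labels)
country = [
  [1000, 2000, 500],
  [2000, 3000, 400],
  [500, 4000, 800],
  [1500, 1500, 1000],
  [1000, 2500, 3000],
  [3000, 3500, 1200]
]

summary = table_summary(country)
puts "rows: #{summary[:rows]}, columns: #{summary[:columns]}, " \
     "dimension: #{summary[:dimension]}, size: #{summary[:size]}"
```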
363
+ If the result is not a spread sheet it won't be written to the outfile (-o) but we
364
+ can print the result to the console with the -p flag
365
+
366
+ sycsvpro spreadsheet -f country.csv,prices.csv,market.csv \
367
+ -r true,true,true -c true,true,true \
368
+ -a country,price,market \
369
+ -o "result = []; \
370
+ country.transpose.each_column { \
371
+ |column| result << column * price * market \
372
+ }; \
373
+ result" \
374
+ -p
375
+
376
+ The last evaluation, in this case result, will be returned as the result. The
377
+ -p flag will print the result to the console
378
+
379
+ Operation
380
+ ---------
381
+ result = []
382
+ country.transpose.each_column { |column| result << column * price * market }
383
+ result
384
+
385
+ Result
386
+ ------
387
+ [CA*Clean*Clean] [CA*Maintain*Maintain] [CA*Repair*Repair]
388
+ [Tablet*Tablet*Tablet] 1000.0 2500.0 3000.0
389
+ [Laptop*Laptop*Laptop] 2000.0 12000.0 6000.0
390
+ [Desktop*Desktop*Desktop] 5000.0 15000.0 4000.0
391
+ [DE*Clean*Clean] [DE*Maintain*Maintain] [DE*Repair*Repair]
392
+ [Tablet*Tablet*Tablet] 2000.0 5000.0 6000.0
393
+ [Laptop*Laptop*Laptop] 3000.0 18000.0 9000.0
394
+ [Desktop*Desktop*Desktop] 4000.0 12000.0 3200.0
395
+ [MX*Clean*Clean] [MX*Maintain*Maintain] [MX*Repair*Repair]
396
+ [Tablet*Tablet*Tablet] 500.0 1250.0 1500.0
397
+ [Laptop*Laptop*Laptop] 4000.0 24000.0 12000.0
398
+ [Desktop*Desktop*Desktop] 8000.0 24000.0 6400.0
399
+ [RU*Clean*Clean] [RU*Maintain*Maintain] [RU*Repair*Repair]
400
+ [Tablet*Tablet*Tablet] 1500.0 3750.0 4500.0
401
+ [Laptop*Laptop*Laptop] 1500.0 9000.0 4500.0
402
+ [Desktop*Desktop*Desktop] 10000.0 30000.0 8000.0
403
+ [TR*Clean*Clean] [TR*Maintain*Maintain] [TR*Repair*Repair]
404
+ [Tablet*Tablet*Tablet] 1000.0 2500.0 3000.0
405
+ [Laptop*Laptop*Laptop] 2500.0 15000.0 7500.0
406
+ [Desktop*Desktop*Desktop] 30000.0 90000.0 24000.0
407
+ [US*Clean*Clean] [US*Maintain*Maintain] [US*Repair*Repair]
408
+ [Tablet*Tablet*Tablet] 3000.0 7500.0 9000.0
409
+ [Laptop*Laptop*Laptop] 3500.0 21000.0 10500.0
410
+ [Desktop*Desktop*Desktop] 12000.0 36000.0 9600.0
411
+
258
412
  Join
259
413
  ----
260
414
  Join the machine and contract file with columns from the customer address file
@@ -412,16 +566,17 @@ want to dig deeper I would recommend [R](http://www.r-project.org/).
412
566
 
413
567
  A work flow could be as follows
414
568
 
415
- * Analyze the file `analyze`
569
+ * Analyze the file `analyze` or `spreadsheet`
416
570
  * Clean the data `map`
417
571
  * Extract rows and columns of interest `extract`
418
572
  * Count values `count`
419
- * Do arithmetic operations on the values `calc`
420
- * Sort the rows based on column values
573
+ * Do arithmetic operations on the values `calc` or `spreadsheet`
574
+ * Sort the rows based on column values `sort`
421
575
 
422
576
  When I have analyzed the data I use _Microsoft Excel_ or _LibreOffice Calc_ to
423
577
  create nice graphs. To create more sophisticated analysis *R* is the right tool
424
- to use.
578
+ to use. I also use sycsvpro to clean and prepare data and then do the analysis
579
+ with *R*.
425
580
 
426
581
  Release notes
427
582
  =============
@@ -557,6 +712,20 @@ Version 0.1.13
557
712
  * Merger now doesn't require a key column, that is, files can be merged without
558
713
  key columns.
559
714
 
715
+ Version 0.2.0
716
+ -------------
717
+ * SpreadSheet is used to conduct operations like multiplication, division,
718
+ addition and subtraction between multiple files that have a table like
719
+ structure. SpreadSheet can also be used to retrieve information about csv
720
+ files
721
+
722
+ Documentation
723
+ =============
724
+ The class documentation can be found at
725
+ [rubygems](https://rubygems.org/gems/sycsvpro) and the source code at
726
+ [github](https://github.com/sugaryourcoffee/syc-svpro). This might be valuable
727
+ when writing scripts.
728
+
560
729
  Installation
561
730
  ============
562
731
  [![Gem Version](https://badge.fury.io/rb/sycsvpro.png)](http://badge.fury.io/rb/sycsvpro)
data/README.rdoc CHANGED
@@ -42,5 +42,6 @@ Test files are in
42
42
 
43
43
  spec/sycsvpro/files
44
44
 
45
- :include: sycsvpro.rdoc
45
+ == Help contents
46
+ :include:sycsvpro.rdoc
46
47
 
data/bin/sycsvpro CHANGED
@@ -405,6 +405,47 @@ command :table do |c|
405
405
 
406
406
  end
407
407
 
408
+ desc 'Do arithmetic operation with table like data. The table has to have '+
409
+ 'rows with same size. Arithmetic operations are *, /, + and - where the '+
410
+ 'results can be concatenated. Complete functions can be looked up at '+
411
+ 'https://rubygems.org/gems/sycsvpro'
412
+ command :spreadsheet do |c|
413
+ c.desc 'Files that contain the table data'
414
+ c.arg_name 'FILE_1,FILE_2,...,FILE_N'
415
+ c.flag [:f, :file]
416
+
417
+ c.desc 'Indicates for each file whether it has row labels'
418
+ c.arg_name 'true,false,...,true'
419
+ c.flag [:r, :rlabel]
420
+
421
+ c.desc 'Indicates for each file whether it has column labels'
422
+ c.arg_name 'true,false,...,false'
423
+ c.flag [:c, :clabel]
424
+
425
+ c.desc 'The alias for each file that is used in the arithmetic operation'
426
+ c.arg_name 'ALIAS_1,ALIAS_2,...,ALIAS_N'
427
+ c.flag [:a, :alias]
428
+
429
+ c.desc 'The arithmetic operation with the table data'
430
+ c.arg_name 'ARITHMETIC_OPERATION'
431
+ c.flag [:o, :operation]
432
+
433
+ c.desc 'Print the result of the operation'
434
+ c.switch [:p, :print], :default_value => false
435
+
436
+ c.action do |global_options,options,args|
437
+ print 'Operating...'
438
+ Sycsvpro::SpreadSheetBuilder.new(outfile: global_options[:o],
439
+ files: options[:f],
440
+ rlabels: options[:r],
441
+ clabels: options[:c],
442
+ aliases: options[:a],
443
+ operation: options[:o],
444
+ print: options[:p]).execute
445
+ print 'done'
446
+ end
447
+ end
448
+
408
449
  desc 'Join two files based on a joint column value'
409
450
  arg_name 'SOURCE_FILE'
410
451
  command :join do |c|
@@ -688,7 +729,8 @@ pre do |global,command,options,args|
688
729
  unless command.name == :edit or
689
730
  command.name == :execute or
690
731
  command.name == :list or
691
- command.name == :merge
732
+ command.name == :merge or
733
+ command.name == :spreadsheet
692
734
  analyzer = Sycsvpro::Analyzer.new(global[:f])
693
735
  result = analyzer.result
694
736
  count = result.row_count
@@ -10,16 +10,16 @@ module Sycsvpro
10
10
  #
11
11
  # in.csv
12
12
  #
13
- # | Customer | 2013 | 2014 |
14
- # | A | A1 | |
15
- # | B | B1 | B16 |
16
- # | A | A3 | A7 |
13
+ # | Customer | 2013 | 2014 |
14
+ # | A | A1 | |
15
+ # | B | B1 | B16 |
16
+ # | A | A3 | A7 |
17
17
  #
18
18
  # out.csv
19
19
  #
20
- # | Customer | 2013 | 2014 | Sum |
21
- # | A | 2 | 1 | 3 |
22
- # | B | 1 | 1 | 2 |
20
+ # | Customer | 2013 | 2014 | Sum |
21
+ # | A | 2 | 1 | 3 |
22
+ # | B | 1 | 1 | 2 |
23
23
  class Aggregator
24
24
 
25
25
  include Dsl
@@ -5,15 +5,15 @@ module Sycsvpro
5
5
  #
6
6
  # infile.csv
7
7
  #
8
- # | Name | Product |
9
- # | A | X1 |
10
- # | B | Y2 |
11
- # | A | W10 |
8
+ # | Name | Product |
9
+ # | A | X1 |
10
+ # | B | Y2 |
11
+ # | A | W10 |
12
12
  #
13
13
  # outfile.csv
14
14
  #
15
- # | A | X1 | W10 |
16
- # | B | Y2 | |
15
+ # | A | X1 | W10 |
16
+ # | B | Y2 | |
17
17
  class Allocator
18
18
 
19
19
  # File from which values are read
@@ -6,19 +6,19 @@ module Sycsvpro
6
6
 
7
7
  # Analyzes the file structure
8
8
  #
9
- # | Name | C1 | C2 |
10
- # | A | a | b |
9
+ # | Name | C1 | C2 |
10
+ # | A | a | b |
11
11
  #
12
- # 3 columns: ["Name", "C1", "C2"]
13
- # 2 rows
12
+ # 3 columns: ["Name", "C1", "C2"]
13
+ # 2 rows
14
14
  #
15
- # Row sample data:
16
- # A;b;c
15
+ # Row sample data:
16
+ # A;b;c
17
17
  #
18
- # Column index: Column name | Column sample value
19
- # 0: Name | A
20
- # 1: C1 | a
21
- # 2: C2 | b
18
+ # Column index: Column name | Column sample value
19
+ # 0: Name | A
20
+ # 1: C1 | a
21
+ # 2: C2 | b
22
22
  class Analyzer
23
23
 
24
24
  # File that is analyzed
@@ -5,26 +5,26 @@ module Sycsvpro
5
5
  #
6
6
  # in.csv
7
7
  #
8
- # | ID | Name |
9
- # | --- | ---- |
10
- # | 1 | Hank |
11
- # | 2 | Jane |
8
+ # | ID | Name |
9
+ # | --- | ---- |
10
+ # | 1 | Hank |
11
+ # | 2 | Jane |
12
12
  #
13
13
  # mapping
14
14
  #
15
- # 1:01
16
- # 2:02
15
+ # 1:01
16
+ # 2:02
17
17
  #
18
- # Sycsvpro::Mapping.new(infile: "in.csv",
19
- # outfile: "out.csv",
20
- # mapping: "mapping",
21
- # cols: "0").execute
18
+ # Sycsvpro::Mapping.new(infile: "in.csv",
19
+ # outfile: "out.csv",
20
+ # mapping: "mapping",
21
+ # cols: "0").execute
22
22
  # out.csv
23
23
  #
24
- # | ID | Name |
25
- # | --- | ---- |
26
- # | 01 | Hank |
27
- # | 02 | Jane |
24
+ # | ID | Name |
25
+ # | --- | ---- |
26
+ # | 01 | Hank |
27
+ # | 02 | Jane |
28
28
  class Mapper
29
29
 
30
30
  include Dsl
@@ -5,28 +5,28 @@ module Sycsvpro
5
5
  #
6
6
  # file1.csv
7
7
  #
8
- # | | 2010 | 2011 | 2012 | 2013 |
9
- # | --- | ---- | ---- | ---- | ---- |
10
- # | SP | 20 | 30 | 40 | 50 |
11
- # | RP | 30 | 40 | 50 | 60 |
8
+ # | | 2010 | 2011 | 2012 | 2013 |
9
+ # | --- | ---- | ---- | ---- | ---- |
10
+ # | SP | 20 | 30 | 40 | 50 |
11
+ # | RP | 30 | 40 | 50 | 60 |
12
12
  #
13
13
  # file2.csv
14
14
  #
15
- # | | 2010 | 2011 | 2012 |
16
- # | --- | ---- | ---- | ---- |
17
- # | M | m1 | m2 | m3 |
18
- # | N | n1 | n2 | n3 |
15
+ # | | 2010 | 2011 | 2012 |
16
+ # | --- | ---- | ---- | ---- |
17
+ # | M | m1 | m2 | m3 |
18
+ # | N | n1 | n2 | n3 |
19
19
  #
20
20
  # merging results in
21
21
  #
22
22
  # merge.csv
23
23
  #
24
- # | | 2010 | 2011 | 2012 | 2013 |
25
- # | --- | ---- | ---- | ---- | ---- |
26
- # | SP | 20 | 30 | 40 | 50 |
27
- # | RP | 30 | 40 | 50 | 60 |
28
- # | M | m1 | m2 | m3 | |
29
- # | N | n1 | n2 | n3 | |
24
+ # | | 2010 | 2011 | 2012 | 2013 |
25
+ # | --- | ---- | ---- | ---- | ---- |
26
+ # | SP | 20 | 30 | 40 | 50 |
27
+ # | RP | 30 | 40 | 50 | 60 |
28
+ # | M | m1 | m2 | m3 | |
29
+ # | N | n1 | n2 | n3 | |
30
30
  #
31
31
  class Merger
32
32
 
@@ -0,0 +1,36 @@
1
+ # Operating csv files
2
+ module Sycsvpro
3
+
4
+ # The NotAvailable class defines its behavior on its eigenclass and is used to
5
+ # represent a missing value. Any expression it is used in evaluates to not available.
6
+ #
7
+ # na = NotAvailable
8
+ #
9
+ # na + 1 -> na
10
+ # 1 + na -> na
11
+ class NotAvailable
12
+
13
+ class << self
14
+
15
+ # Catches all expressions where na is the first argument
16
+ def method_missing(name, *args, &block)
17
+ super if name == :to_ary
18
+ super if name == :to_str
19
+ self
20
+ end
21
+
22
+ # Catches all expressions where na is not the first argument and swaps
23
+ # value and na, so na is first argument
24
+ def coerce(value)
25
+ [self,value]
26
+ end
27
+
28
+ # Returns NA as the string representation
29
+ def to_s
30
+ "NA"
31
+ end
32
+
33
+ end
34
+ end
35
+
36
+ end
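The interplay of `method_missing` and `coerce` can be tried out in isolation with the following sketch. It uses a stand-in class named `MissingValue` (a hypothetical name, chosen so it does not clash with the gem) that follows the same approach as the class above. When the missing value is the right-hand operand, Ruby's numeric operators call `coerce` on it, and the returned pair makes the missing value the receiver.

```ruby
# Stand-in for illustration only; mirrors the approach shown above.
class MissingValue
  class << self
    # Any unknown message (+, -, *, / ...) returns the missing value itself.
    # to_ary and to_str are passed on so implicit conversions still fail.
    def method_missing(name, *args, &block)
      return super if %i[to_ary to_str].include?(name)
      self
    end

    # Called by numeric operators when the missing value is the right-hand
    # operand; swaps the operands so the missing value becomes the receiver.
    def coerce(value)
      [self, value]
    end

    # String representation of the missing value
    def to_s
      "NA"
    end
  end
end

na = MissingValue
puts na + 1        # missing value as receiver, handled by method_missing
puts 1 + na        # Integer#+ calls na.coerce(1), then na + 1
puts 2.5 * na - 3  # stays not available through chained operations
```

All three `puts` calls print the string representation defined by `to_s`.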