sycsvpro 0.1.13 → 0.2.0

data/.gitignore CHANGED
@@ -19,3 +19,4 @@ doc/
19
19
 
20
20
  # Test files
21
21
  *.csv
22
+ test-files/
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
1
1
  PATH
2
2
  remote: .
3
3
  specs:
4
- sycsvpro (0.1.13)
4
+ sycsvpro (0.2.0)
5
5
  gli (= 2.9.0)
6
6
  timeleap (~> 0.0.1)
7
7
 
data/README.md CHANGED
@@ -24,6 +24,8 @@ Processing of csv files. *sycsvpro* offers following functions
24
24
  * join two files based on a joint column value (since version 0.1.7)
25
25
  * merge files based on common headline columns (since version 0.1.10)
26
26
  * transpose (swapping) rows and columns (since version 0.1.13)
27
+ * arithmetic operations between multiple files that have a table like
28
+ structure (since version 0.2.0)
27
29
 
28
30
  To get help type
29
31
 
@@ -255,6 +257,158 @@ Write only columns 0, 6 and 7 by specifying write columns
255
257
  chiro;2;20
256
258
  0;10;100
257
259
 
260
+ Spread Sheet
261
+ ------------
262
+ A spread sheet is a table with rows and columns. Operations can be conducted
263
+ on a single spread sheet or between several spread sheets. All rows of a
264
+ spread sheet must have the same number of columns. Row and column labels are optional.
265
+
266
+ Use cases are
267
+
268
+ * arithmetic operations on spread sheets
269
+ * information about table like data
270
+
271
+ ### Example for Arithmetic Operation
272
+ Assume we want to calculate the market for computer services. We have the count
273
+ of computers in each country, and we offer different services with
274
+ service-specific prices. We know the market for each service as a share (0.10
275
+ means 10 percent). With this data we can calculate the market value.
276
+
277
+ Count of computers in target countries
278
+
279
+ [Tablet] [Laptop] [Desktop]
280
+ [CA] 1000 2000 500
281
+ [DE] 2000 3000 400
282
+ [MX] 500 4000 800
283
+ [RU] 1500 1500 1000
284
+ [TR] 1000 2500 3000
285
+ [US] 3000 3500 1200
286
+
287
+ Prices of the services offered for each computer type
288
+
289
+ [Clean] [Maintain] [Repair]
290
+ [Tablet] 10 50 100
291
+ [Laptop] 20 60 150
292
+ [Desktop] 50 100 200
293
+
294
+ Market for the different services
295
+
296
+ [Clean] [Maintain] [Repair]
297
+ [Tablet] 0.10 0.05 0.03
298
+ [Laptop] 0.05 0.10 0.02
299
+ [Desktop] 0.20 0.30 0.04
300
+
301
+ To calculate the market value we have to multiply each row of the country file
302
+ with the columns of the service prices and service market files (for readability
303
+ the command has been split over multiple lines)
304
+
305
+ $ sycsvpro -o market_value.csv spreadsheet \
306
+ -f country.csv,prices.csv,market.csv \
307
+ -a country,price,market \
308
+ -o "SpreadSheet.bind_columns( \
309
+ country.transpose.column_collect { |value| value * price * market } \
310
+ ).transpose"
311
+
312
+ Note: If you get obscure errors, check whether your aliases (-a flag)
313
+ conflict with a method of your classes. It is therefore advisable to
314
+ always use specific names such as country, price and market in the example.
315
+
316
+ The result of the operation is written to market\_value.csv (labels have been
317
+ optimized for better readability)
318
+
319
+ [Tablet] [Laptop] [Desktop]
320
+ [CA-Clean] 1000.0 2000.0 5000.0
321
+ [CA-Maintain] 2500.0 12000.0 15000.0
322
+ [CA-Repair] 3000.0 6000.0 4000.0
323
+ [DE-Clean] 2000.0 3000.0 4000.0
324
+ [DE-Maintain] 5000.0 18000.0 12000.0
325
+ [DE-Repair] 6000.0 9000.0 3200.0
326
+ [MX-Clean] 500.0 4000.0 8000.0
327
+ [MX-Maintain] 1250.0 24000.0 24000.0
328
+ [MX-Repair] 1500.0 12000.0 6400.0
329
+ [RU-Clean] 1500.0 1500.0 10000.0
330
+ [RU-Maintain] 3750.0 9000.0 30000.0
331
+ [RU-Repair] 4500.0 4500.0 8000.0
332
+ [TR-Clean] 1000.0 2500.0 30000.0
333
+ [TR-Maintain] 2500.0 15000.0 90000.0
334
+ [TR-Repair] 3000.0 7500.0 24000.0
335
+ [US-Clean] 3000.0 3500.0 12000.0
336
+ [US-Maintain] 7500.0 21000.0 36000.0
337
+ [US-Repair] 9000.0 10500.0 9600.0
338
+
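The arithmetic behind the result table can be checked without the gem. The following is a minimal plain-Ruby sketch, not sycsvpro's implementation: it assumes each cell is computer count × service price × market share, and `market_value` is a hypothetical helper name. Only the CA and DE rows are included to keep it short.

```ruby
# Plain-Ruby sketch of the market value calculation (does not use sycsvpro).
# The data is copied from the three tables above; only CA and DE are included.
COMPUTERS = {
  "CA" => { "Tablet" => 1000, "Laptop" => 2000, "Desktop" => 500 },
  "DE" => { "Tablet" => 2000, "Laptop" => 3000, "Desktop" => 400 }
}

PRICES = {
  "Tablet"  => { "Clean" => 10, "Maintain" => 50,  "Repair" => 100 },
  "Laptop"  => { "Clean" => 20, "Maintain" => 60,  "Repair" => 150 },
  "Desktop" => { "Clean" => 50, "Maintain" => 100, "Repair" => 200 }
}

MARKET = {
  "Tablet"  => { "Clean" => 0.10, "Maintain" => 0.05, "Repair" => 0.03 },
  "Laptop"  => { "Clean" => 0.05, "Maintain" => 0.10, "Repair" => 0.02 },
  "Desktop" => { "Clean" => 0.20, "Maintain" => 0.30, "Repair" => 0.04 }
}

# Hypothetical helper: returns one row per country/service combination,
# each row holding count * price * share for every computer type.
def market_value(computers, prices, market)
  services = prices.values.first.keys
  result = {}
  computers.each do |country, counts|
    services.each do |service|
      result["#{country}-#{service}"] = counts.map do |type, count|
        # round(1) hides floating point noise from the multiplication
        [type, (count * prices[type][service] * market[type][service]).round(1)]
      end.to_h
    end
  end
  result
end

market_value(COMPUTERS, PRICES, MARKET).each do |label, row|
  puts "#{label}: #{row.values.join(' ')}"
end
```

For example, the Tablet value in the CA-Clean row is 1000 × 10 × 0.10 = 1000.0, which matches the first row of the result table.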
339
+ ### Example for Information on Spread Sheets
340
+ With the analyze command we get information about the general structure and some
341
+ sample data of a csv file. If we want to look at the csv file in more detail we
342
+ can use the spreadsheet command. In this case we don't want to write the result
343
+ to a file because it is not a spread sheet, so we can omit the global -o option.
344
+
345
+ sycsvpro spreadsheet -f country.csv -r true -c true -a a \
346
+ -o "puts;puts a;puts a.ncol;puts a.nrow;puts a.size"
347
+
348
+ This will give us the information about the data, the number of columns and rows
349
+ and the number of values in the csv file. But for this case there is a standard
350
+ method that provides this information
351
+
352
+ sycsvpro spreadsheet -f country.csv -r true -c true -a a -o "a.summary"
353
+
354
+ Summary
355
+ -------
356
+ rows: 6, columns: 3, dimension: [6, 3], size: 18
357
+
358
+ row labels:
359
+ ["CA","DE","MX","RU","TR","US"]
360
+ column labels:
361
+ ["Tablet","Laptop","Desktop"]
362
+
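The figures in the summary follow directly from the shape of the table. As a rough illustration in plain Ruby (this is not the gem's `summary` method; `table_summary` is a hypothetical helper), the numbers for country.csv can be derived like this:

```ruby
# Values of country.csv without row and column labels
ROWS = [
  [1000, 2000, 500],
  [2000, 3000, 400],
  [500, 4000, 800],
  [1500, 1500, 1000],
  [1000, 2500, 3000],
  [3000, 3500, 1200]
]

# Hypothetical helper: derives the figures shown in the summary output.
# Raises if the rows differ in size, since a spread sheet requires equal rows.
def table_summary(rows)
  ncol = rows.first.size
  raise ArgumentError, "rows differ in size" unless rows.all? { |row| row.size == ncol }
  { rows: rows.size, columns: ncol, dimension: [rows.size, ncol], size: rows.size * ncol }
end

puts table_summary(ROWS)
```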
363
+ If the result is not a spread sheet it won't be written to the outfile (-o), but we
364
+ can print the result to the console with the -p flag
365
+
366
+ sycsvpro spreadsheet -f country.csv,prices.csv,market.csv \
367
+ -r true,true,true -c true,true,true \
368
+ -a country,price,market \
369
+ -o "result = []; \
370
+ country.transpose.each_column { \
371
+ |column| result << column * price * market \
372
+ }; \
373
+ result" \
374
+ -p
375
+
376
+ The last evaluation, in this case result, will be returned as the result. The
377
+ -p flag will print the result to the console
378
+
379
+ Operation
380
+ ---------
381
+ result = []
382
+ country.transpose.each_column { |column| result << column * price * market }
383
+ result
384
+
385
+ Result
386
+ ------
387
+ [CA*Clean*Clean] [CA*Maintain*Maintain] [CA*Repair*Repair]
388
+ [Tablet*Tablet*Tablet] 1000.0 2500.0 3000.0
389
+ [Laptop*Laptop*Laptop] 2000.0 12000.0 6000.0
390
+ [Desktop*Desktop*Desktop] 5000.0 15000.0 4000.0
391
+ [DE*Clean*Clean] [DE*Maintain*Maintain] [DE*Repair*Repair]
392
+ [Tablet*Tablet*Tablet] 2000.0 5000.0 6000.0
393
+ [Laptop*Laptop*Laptop] 3000.0 18000.0 9000.0
394
+ [Desktop*Desktop*Desktop] 4000.0 12000.0 3200.0
395
+ [MX*Clean*Clean] [MX*Maintain*Maintain] [MX*Repair*Repair]
396
+ [Tablet*Tablet*Tablet] 500.0 1250.0 1500.0
397
+ [Laptop*Laptop*Laptop] 4000.0 24000.0 12000.0
398
+ [Desktop*Desktop*Desktop] 8000.0 24000.0 6400.0
399
+ [RU*Clean*Clean] [RU*Maintain*Maintain] [RU*Repair*Repair]
400
+ [Tablet*Tablet*Tablet] 1500.0 3750.0 4500.0
401
+ [Laptop*Laptop*Laptop] 1500.0 9000.0 4500.0
402
+ [Desktop*Desktop*Desktop] 10000.0 30000.0 8000.0
403
+ [TR*Clean*Clean] [TR*Maintain*Maintain] [TR*Repair*Repair]
404
+ [Tablet*Tablet*Tablet] 1000.0 2500.0 3000.0
405
+ [Laptop*Laptop*Laptop] 2500.0 15000.0 7500.0
406
+ [Desktop*Desktop*Desktop] 30000.0 90000.0 24000.0
407
+ [US*Clean*Clean] [US*Maintain*Maintain] [US*Repair*Repair]
408
+ [Tablet*Tablet*Tablet] 3000.0 7500.0 9000.0
409
+ [Laptop*Laptop*Laptop] 3500.0 21000.0 10500.0
410
+ [Desktop*Desktop*Desktop] 12000.0 36000.0 9600.0
411
+
258
412
  Join
259
413
  ----
260
414
  Join the machine and contract file with columns from the customer address file
@@ -412,16 +566,17 @@ want to dig deeper I would recommend [R](http://www.r-project.org/).
412
566
 
413
567
  A work flow could be as follows
414
568
 
415
- * Analyze the file `analyze`
569
+ * Analyze the file `analyze` or `spreadsheet`
416
570
  * Clean the data `map`
417
571
  * Extract rows and columns of interest `extract`
418
572
  * Count values `count`
419
- * Do arithmetic operations on the values `calc`
420
- * Sort the rows based on column values
573
+ * Do arithmetic operations on the values `calc` or `spreadsheet`
574
+ * Sort the rows based on column values `sort`
421
575
 
422
576
  When I have analyzed the data I use _Microsoft Excel_ or _LibreOffice Calc_ to
423
577
  create nice graphs. To create more sophisticated analysis *R* is the right tool
424
- to use.
578
+ to use. I also use sycsvpro to clean and prepare data and then do the analysis
579
+ with *R*.
425
580
 
426
581
  Release notes
427
582
  =============
@@ -557,6 +712,20 @@ Version 0.1.13
557
712
  * Merger no longer requires a key column, that is, files can be merged without
558
713
  key columns.
559
714
 
715
+ Version 0.2.0
716
+ -------------
717
+ * SpreadSheet is used to conduct operations like multiplication, division,
718
+ addition and subtraction between multiple files that have a table like
719
+ structure. SpreadSheet can also be used to retrieve information about csv
720
+ files
721
+
722
+ Documentation
723
+ =============
724
+ The class documentation can be found at
725
+ [rubygems](https://rubygems.org/gems/sycsvpro) and the source code at
726
+ [github](https://github.com/sugaryourcoffee/syc-svpro). This might be valuable
727
+ when writing scripts.
728
+
560
729
  Installation
561
730
  ============
562
731
  [![Gem Version](https://badge.fury.io/rb/sycsvpro.png)](http://badge.fury.io/rb/sycsvpro)
data/README.rdoc CHANGED
@@ -42,5 +42,6 @@ Test files are in
42
42
 
43
43
  spec/sycsvpro/files
44
44
 
45
- :include: sycsvpro.rdoc
45
+ == Help contents
46
+ :include:sycsvpro.rdoc
46
47
 
data/bin/sycsvpro CHANGED
@@ -405,6 +405,47 @@ command :table do |c|
405
405
 
406
406
  end
407
407
 
408
+ desc 'Do arithmetic operation with table like data. The table has to have '+
409
+ 'rows with same size. Arithmetic operations are *, /, + and - where the '+
410
+ 'results can be concatenated. Complete functions can be looked up at '+
411
+ 'https://rubygems.org/gems/sycsvpro'
412
+ command :spreadsheet do |c|
413
+ c.desc 'Files that contain the table data'
414
+ c.arg_name 'FILE_1,FILE_2,...,FILE_N'
415
+ c.flag [:f, :file]
416
+
417
+ c.desc 'Indicates for each file whether it has row labels'
418
+ c.arg_name 'true,false,...,true'
419
+ c.flag [:r, :rlabel]
420
+
421
+ c.desc 'Indicates for each file whether it has column labels'
422
+ c.arg_name 'true,false,...,false'
423
+ c.flag [:c, :clabel]
424
+
425
+ c.desc 'The alias for each file that is used in the arithmetic operation'
426
+ c.arg_name 'ALIAS_1,ALIAS_2,...,ALIAS_N'
427
+ c.flag [:a, :alias]
428
+
429
+ c.desc 'The arithmetic operation with the table data'
430
+ c.arg_name 'ARITHMETIC_OPERATION'
431
+ c.flag [:o, :operation]
432
+
433
+ c.desc 'Print the result of the operation'
434
+ c.switch [:p, :print], :default_value => false
435
+
436
+ c.action do |global_options,options,args|
437
+ print 'Operating...'
438
+ Sycsvpro::SpreadSheetBuilder.new(outfile: global_options[:o],
439
+ files: options[:f],
440
+ rlabels: options[:r],
441
+ clabels: options[:c],
442
+ aliases: options[:a],
443
+ operation: options[:o],
444
+ print: options[:p]).execute
445
+ print 'done'
446
+ end
447
+ end
448
+
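The `-f`, `-r`, `-c` and `-a` flags each take one comma-separated value per file. How `SpreadSheetBuilder` processes them is not part of this diff, so the following is only a sketch under assumptions: `pair_inputs` is a hypothetical helper, and treating a missing label flag as `false` is a guess rather than documented behavior.

```ruby
# Hypothetical helper, not part of sycsvpro: pairs each file with its alias
# and label flags as they would arrive from the command line.
def pair_inputs(files:, aliases:, rlabels: "", clabels: "")
  names = files.split(",")
  tags  = aliases.split(",")
  raise ArgumentError, "need one alias per file" unless names.size == tags.size

  rflags = rlabels.to_s.split(",")
  cflags = clabels.to_s.split(",")

  names.each_with_index.map do |name, index|
    { file:       name,
      alias_name: tags[index],
      # Assumption: a missing flag is treated as false
      rlabel:     rflags[index] == "true",
      clabel:     cflags[index] == "true" }
  end
end

pair_inputs(files: "country.csv,prices.csv,market.csv",
            aliases: "country,price,market",
            rlabels: "true,true,true",
            clabels: "true,true,true").each { |input| puts input }
```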
408
449
  desc 'Join two files based on a joint column value'
409
450
  arg_name 'SOURCE_FILE'
410
451
  command :join do |c|
@@ -688,7 +729,8 @@ pre do |global,command,options,args|
688
729
  unless command.name == :edit or
689
730
  command.name == :execute or
690
731
  command.name == :list or
691
- command.name == :merge
732
+ command.name == :merge or
733
+ command.name == :spreadsheet
692
734
  analyzer = Sycsvpro::Analyzer.new(global[:f])
693
735
  result = analyzer.result
694
736
  count = result.row_count
@@ -10,16 +10,16 @@ module Sycsvpro
10
10
  #
11
11
  # in.csv
12
12
  #
13
- # | Customer | 2013 | 2014 |
14
- # | A | A1 | |
15
- # | B | B1 | B16 |
16
- # | A | A3 | A7 |
13
+ # | Customer | 2013 | 2014 |
14
+ # | A | A1 | |
15
+ # | B | B1 | B16 |
16
+ # | A | A3 | A7 |
17
17
  #
18
18
  # out.csv
19
19
  #
20
- # | Customer | 2013 | 2014 | Sum |
21
- # | A | 2 | 1 | 3 |
22
- # | B | 1 | 1 | 2 |
20
+ # | Customer | 2013 | 2014 | Sum |
21
+ # | A | 2 | 1 | 3 |
22
+ # | B | 1 | 1 | 2 |
23
23
  class Aggregator
24
24
 
25
25
  include Dsl
@@ -5,15 +5,15 @@ module Sycsvpro
5
5
  #
6
6
  # infile.csv
7
7
  #
8
- # | Name | Product |
9
- # | A | X1 |
10
- # | B | Y2 |
11
- # | A | W10 |
8
+ # | Name | Product |
9
+ # | A | X1 |
10
+ # | B | Y2 |
11
+ # | A | W10 |
12
12
  #
13
13
  # outfile.csv
14
14
  #
15
- # | A | X1 | W10 |
16
- # | B | Y2 | |
15
+ # | A | X1 | W10 |
16
+ # | B | Y2 | |
17
17
  class Allocator
18
18
 
19
19
  # File from that values are read
@@ -6,19 +6,19 @@ module Sycsvpro
6
6
 
7
7
  # Analyzes the file structure
8
8
  #
9
- # | Name | C1 | C2 |
10
- # | A | a | b |
9
+ # | Name | C1 | C2 |
10
+ # | A | a | b |
11
11
  #
12
- # 3 columns: ["Name", "C1", "C2"]
13
- # 2 rows
12
+ # 3 columns: ["Name", "C1", "C2"]
13
+ # 2 rows
14
14
  #
15
- # Row sample data:
16
- # A;b;c
15
+ # Row sample data:
16
+ # A;b;c
17
17
  #
18
- # Column index: Column name | Column sample value
19
- # 0: Name | A
20
- # 1: C1 | a
21
- # 2: C2 | b
18
+ # Column index: Column name | Column sample value
19
+ # 0: Name | A
20
+ # 1: C1 | a
21
+ # 2: C2 | b
22
22
  class Analyzer
23
23
 
24
24
  # File that is analyzed
@@ -5,26 +5,26 @@ module Sycsvpro
5
5
  #
6
6
  # in.csv
7
7
  #
8
- # | ID | Name |
9
- # | --- | ---- |
10
- # | 1 | Hank |
11
- # | 2 | Jane |
8
+ # | ID | Name |
9
+ # | --- | ---- |
10
+ # | 1 | Hank |
11
+ # | 2 | Jane |
12
12
  #
13
13
  # mapping
14
14
  #
15
- # 1:01
16
- # 2:02
15
+ # 1:01
16
+ # 2:02
17
17
  #
18
- # Sycsvpro::Mapping.new(infile: "in.csv",
19
- # outfile: "out.csv",
20
- # mapping: "mapping",
21
- # cols: "0").execute
18
+ # Sycsvpro::Mapping.new(infile: "in.csv",
19
+ # outfile: "out.csv",
20
+ # mapping: "mapping",
21
+ # cols: "0").execute
22
22
  # out.csv
23
23
  #
24
- # | ID | Name |
25
- # | --- | ---- |
26
- # | 01 | Hank |
27
- # | 02 | Jane |
24
+ # | ID | Name |
25
+ # | --- | ---- |
26
+ # | 01 | Hank |
27
+ # | 02 | Jane |
28
28
  class Mapper
29
29
 
30
30
  include Dsl
@@ -5,28 +5,28 @@ module Sycsvpro
5
5
  #
6
6
  # file1.csv
7
7
  #
8
- # | | 2010 | 2011 | 2012 | 2013 |
9
- # | --- | ---- | ---- | ---- | ---- |
10
- # | SP | 20 | 30 | 40 | 50 |
11
- # | RP | 30 | 40 | 50 | 60 |
8
+ # | | 2010 | 2011 | 2012 | 2013 |
9
+ # | --- | ---- | ---- | ---- | ---- |
10
+ # | SP | 20 | 30 | 40 | 50 |
11
+ # | RP | 30 | 40 | 50 | 60 |
12
12
  #
13
13
  # file2.csv
14
14
  #
15
- # | | 2010 | 2011 | 2012 |
16
- # | --- | ---- | ---- | ---- |
17
- # | M | m1 | m2 | m3 |
18
- # | N | n1 | n2 | n3 |
15
+ # | | 2010 | 2011 | 2012 |
16
+ # | --- | ---- | ---- | ---- |
17
+ # | M | m1 | m2 | m3 |
18
+ # | N | n1 | n2 | n3 |
19
19
  #
20
20
  # merging results in
21
21
  #
22
22
  # merge.csv
23
23
  #
24
- # | | 2010 | 2011 | 2012 | 2013 |
25
- # | --- | ---- | ---- | ---- | ---- |
26
- # | SP | 20 | 30 | 40 | 50 |
27
- # | RP | 30 | 40 | 50 | 60 |
28
- # | M | m1 | m2 | m3 | |
29
- # | N | n1 | n2 | n3 | |
24
+ # | | 2010 | 2011 | 2012 | 2013 |
25
+ # | --- | ---- | ---- | ---- | ---- |
26
+ # | SP | 20 | 30 | 40 | 50 |
27
+ # | RP | 30 | 40 | 50 | 60 |
28
+ # | M | m1 | m2 | m3 | |
29
+ # | N | n1 | n2 | n3 | |
30
30
  #
31
31
  class Merger
32
32
 
@@ -0,0 +1,36 @@
1
+ # Operating csv files
2
+ module Sycsvpro
3
+
4
+ # The NotAvailable class defines its behavior on its eigenclass and is used to
5
+ # represent a missing value. Any expression it is used in evaluates to not available.
6
+ #
7
+ # na = NotAvailable
8
+ #
9
+ # na + 1 -> na
10
+ # 1 + na -> na
11
+ class NotAvailable
12
+
13
+ class << self
14
+
15
+ # Catches all expressions where na is the first argument
16
+ def method_missing(name, *args, &block)
17
+ super if name == :to_ary
18
+ super if name == :to_str
19
+ self
20
+ end
21
+
22
+ # Catches all expressions where na is not the first argument and swaps
23
+ # value and na, so na is first argument
24
+ def coerce(value)
25
+ [self,value]
26
+ end
27
+
28
+ # Returns NA as the string representation
29
+ def to_s
30
+ "NA"
31
+ end
32
+
33
+ end
34
+ end
35
+
36
+ end
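To see how `method_missing` and `coerce` work together, the class can be tried in isolation. The sketch below copies the class from the diff above, without the surrounding `Sycsvpro` module, so it runs without the gem installed. The constant names `LEFT`, `RIGHT` and `CHAINED` are only illustrative.

```ruby
# Copy of the class from the diff, without the surrounding module
class NotAvailable
  class << self
    # Catches all expressions where na is the receiver
    def method_missing(name, *args, &block)
      super if name == :to_ary
      super if name == :to_str
      self
    end

    # Called by numeric operators when na is the right-hand operand;
    # swaps the operands so na becomes the receiver
    def coerce(value)
      [self, value]
    end

    # Returns NA as the string representation
    def to_s
      "NA"
    end
  end
end

na = NotAvailable

LEFT    = na + 1               # na is the receiver, handled by method_missing
RIGHT   = 1 + na               # Integer#+ calls na.coerce(1), then na + 1
CHAINED = (2.5 * na - 3) / 4   # every step stays not available

puts "#{LEFT} #{RIGHT} #{CHAINED}"  # prints "NA NA NA"
```

Because every operation returns the class itself, a missing value propagates through a whole calculation instead of raising an error halfway.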