sycsvpro 0.1.13 → 0.2.0
- data/.gitignore +1 -0
- data/Gemfile.lock +1 -1
- data/README.md +173 -4
- data/README.rdoc +2 -1
- data/bin/sycsvpro +43 -1
- data/lib/sycsvpro/aggregator.rb +7 -7
- data/lib/sycsvpro/allocator.rb +6 -6
- data/lib/sycsvpro/analyzer.rb +10 -10
- data/lib/sycsvpro/mapper.rb +14 -14
- data/lib/sycsvpro/merger.rb +14 -14
- data/lib/sycsvpro/not_available.rb +36 -0
- data/lib/sycsvpro/spread_sheet.rb +523 -0
- data/lib/sycsvpro/spread_sheet_builder.rb +104 -0
- data/lib/sycsvpro/transposer.rb +14 -15
- data/lib/sycsvpro/unique.rb +11 -12
- data/lib/sycsvpro/version.rb +1 -1
- data/lib/sycsvpro.rb +2 -0
- data/spec/sycsvpro/not_available_spec.rb +34 -0
- data/spec/sycsvpro/spread_sheet_builder_spec.rb +35 -0
- data/spec/sycsvpro/spread_sheet_spec.rb +415 -0
- data/sycsvpro.rdoc +25 -24
- metadata +8 -2
data/.gitignore
CHANGED
data/Gemfile.lock
CHANGED
data/README.md
CHANGED
@@ -24,6 +24,8 @@ Processing of csv files. *sycsvpro* offers following functions
 * join two files based on a joint column value (since version 0.1.7)
 * merge files based on common headline columns (since version 0.1.10)
 * transpose (swapping) rows and columns (since version 0.1.13)
+* arithmetic operations between multiple files that have a table like
+  structure (since version 0.2.0)
 
 To get help type
 
@@ -255,6 +257,158 @@ Write only columns 0, 6 and 7 by specifying write columns
 chiro;2;20
 0;10;100
 
+Spread Sheet
+------------
+A spread sheet is a table with rows and columns. Operations can be conducted on
+or between spread sheets. All rows of a spread sheet must have the same number
+of columns, and a spread sheet may have row and column labels.
+
+Use cases are
+
+* arithmetic operations on spread sheets
+* information about table like data
+
+### Example for Arithmetic Operation
+Assume we want to calculate the market for computer services. We have the count
+of computers in each country, and we offer different services with service
+specific prices. We know the market for each service in percent. With this data
+we can calculate the market value.
+
+Count of computers in target countries
+
+     [Tablet] [Laptop] [Desktop]
+[CA]     1000     2000       500
+[DE]     2000     3000       400
+[MX]      500     4000       800
+[RU]     1500     1500      1000
+[TR]     1000     2500      3000
+[US]     3000     3500      1200
+
+Computer specific prices for the different services offered
+
+          [Clean] [Maintain] [Repair]
+[Tablet]       10         50      100
+[Laptop]       20         60      150
+[Desktop]      50        100      200
+
+Market for the different services
+
+          [Clean] [Maintain] [Repair]
+[Tablet]     0.10       0.05     0.03
+[Laptop]     0.05       0.10     0.02
+[Desktop]    0.20       0.30     0.04
+
+To calculate the market value we have to multiply each row of the country file
+with the columns of the service prices and service market files (for readability
+the command has been split across multiple lines)
+
+$ sycsvpro -o market_value.csv spreadsheet \
+  -f country.csv,prices.csv,market.csv \
+  -a country,price,market \
+  -o "SpreadSheet.bind_columns( \
+  country.transpose.column_collect { |value| value * price * market } \
+  ).transpose"
+
+Note: If you get obscure errors, check whether your aliases (-a flag)
+conflict with a method of your classes. It is therefore advised to
+always use specific names, like country, price and market in the example.
+
+The result of the operation is written to market\_value.csv (labels have been
+optimized for better readability)
+
+                [Tablet]  [Laptop] [Desktop]
+[CA-Clean]        1000.0    2000.0    5000.0
+[CA-Maintain]     2500.0   12000.0   15000.0
+[CA-Repair]       3000.0    6000.0    4000.0
+[DE-Clean]        2000.0    3000.0    4000.0
+[DE-Maintain]     5000.0   18000.0   12000.0
+[DE-Repair]       6000.0    9000.0    3200.0
+[MX-Clean]         500.0    4000.0    8000.0
+[MX-Maintain]     1250.0   24000.0   24000.0
+[MX-Repair]       1500.0   12000.0    6400.0
+[RU-Clean]        1500.0    1500.0   10000.0
+[RU-Maintain]     3750.0    9000.0   30000.0
+[RU-Repair]       4500.0    4500.0    8000.0
+[TR-Clean]        1000.0    2500.0   30000.0
+[TR-Maintain]     2500.0   15000.0   90000.0
+[TR-Repair]       3000.0    7500.0   24000.0
+[US-Clean]        3000.0    3500.0   12000.0
+[US-Maintain]     7500.0   21000.0   36000.0
+[US-Repair]       9000.0   10500.0    9600.0
+
+### Example for Information on Spread Sheets
+With the analyze command we get information about the general structure and some
+sample data of a csv file. If we want to look at the csv file in more detail we
+can use the spreadsheet command. In this case we don't want to write the result
+to a file as it is not a spread sheet, so we can omit the global -o option.
+
+sycsvpro spreadsheet -f country.csv -r true -c true -a a \
+  -o "puts;puts a;puts a.ncol;puts a.nrow;puts a.size"
+
+This will give us the data, the number of columns and rows, and the number of
+values in the csv file. For this case, though, there is a standard method that
+provides this information
+
+sycsvpro spreadsheet -f country.csv -r true -c true -a a -o "a.summary"
+
+Summary
+-------
+rows: 6, columns: 3, dimension: [6, 3], size: 18
+
+row labels:
+["CA","DE","MX","RU","TR","US"]
+column labels:
+["Clean","Maintain","Repair"]
+
+If the result is not a spread sheet it won't be written to the outfile (-o), but
+we can print it to the console with the -p flag
+
+sycsvpro spreadsheet -f country.csv,prices.csv,market.csv \
+  -r true,true,true -c true,true,true \
+  -a country,price,market \
+  -o "result = []; \
+  country.transpose.each_column { \
+  |column| result << column * price * market \
+  }; \
+  result" \
+  -p
+
+The last evaluation, in this case result, is returned as the result. The
+-p flag prints the result to the console
+
+Operation
+---------
+result = []
+country.transpose.each_column { |column| result << column * price * market }
+result
+
+Result
+------
+                          [CA*Clean*Clean] [CA*Maintain*Maintain] [CA*Repair*Repair]
+[Tablet*Tablet*Tablet]              1000.0                 2500.0             3000.0
+[Laptop*Laptop*Laptop]              2000.0                12000.0             6000.0
+[Desktop*Desktop*Desktop]           5000.0                15000.0             4000.0
+                          [DE*Clean*Clean] [DE*Maintain*Maintain] [DE*Repair*Repair]
+[Tablet*Tablet*Tablet]              2000.0                 5000.0             6000.0
+[Laptop*Laptop*Laptop]              3000.0                18000.0             9000.0
+[Desktop*Desktop*Desktop]           4000.0                12000.0             3200.0
+                          [MX*Clean*Clean] [MX*Maintain*Maintain] [MX*Repair*Repair]
+[Tablet*Tablet*Tablet]               500.0                 1250.0             1500.0
+[Laptop*Laptop*Laptop]              4000.0                24000.0            12000.0
+[Desktop*Desktop*Desktop]           8000.0                24000.0             6400.0
+                          [RU*Clean*Clean] [RU*Maintain*Maintain] [RU*Repair*Repair]
+[Tablet*Tablet*Tablet]              1500.0                 3750.0             4500.0
+[Laptop*Laptop*Laptop]              1500.0                 9000.0             4500.0
+[Desktop*Desktop*Desktop]          10000.0                30000.0             8000.0
+                          [TR*Clean*Clean] [TR*Maintain*Maintain] [TR*Repair*Repair]
+[Tablet*Tablet*Tablet]              1000.0                 2500.0             3000.0
+[Laptop*Laptop*Laptop]              2500.0                15000.0             7500.0
+[Desktop*Desktop*Desktop]          30000.0                90000.0            24000.0
+                          [US*Clean*Clean] [US*Maintain*Maintain] [US*Repair*Repair]
+[Tablet*Tablet*Tablet]              3000.0                 7500.0             9000.0
+[Laptop*Laptop*Laptop]              3500.0                21000.0            10500.0
+[Desktop*Desktop*Desktop]          12000.0                36000.0             9600.0
+
 Join
 ----
 Join the machine and contract file with columns from the customer address file
@@ -412,16 +566,17 @@ want to dig deeper I would recommend [R](http://www.r-project.org/).
 
 A work flow could be as follows
 
-* Analyze the file `analyze`
+* Analyze the file `analyze` or `spreadsheet`
 * Clean the data `map`
 * Extract rows and columns of interest `extract`
 * Count values `count`
-* Do arithmetic operations on the values `calc`
-* Sort the rows based on column values
+* Do arithmetic operations on the values `calc` or `spreadsheet`
+* Sort the rows based on column values `sort`
 
 When I have analyzed the data I use _Microsoft Excel_ or _LibreOffice Calc_ to
 create nice graphs. To create more sophisticated analysis *R* is the right tool
-to use.
+to use. I also use sycsvpro to clean and prepare data and then do the analysis
+with *R*.
 
 Release notes
 =============
@@ -557,6 +712,20 @@ Version 0.1.13
 * Merger now doesn't require a key column, that is, files can be merged without
   key columns.
 
+Version 0.2.0
+-------------
+* SpreadSheet is used to conduct operations like multiplication, division,
+  addition and subtraction between multiple files that have a table like
+  structure. SpreadSheet can also be used to retrieve information about csv
+  files.
+
+Documentation
+=============
+The class documentation can be found at
+[rubygems](https://rubygems.org/gems/sycsvpro) and the source code at
+[github](https://github.com/sugaryourcoffee/syc-svpro). This might be valuable
+when writing scripts.
+
 Installation
 ============
 [![Gem Version](https://badge.fury.io/rb/sycsvpro.png)](http://badge.fury.io/rb/sycsvpro)
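The Spread Sheet example in the README hunk above multiplies each country's computer count by the price and the market share of every service. As a cross-check of the published market_value.csv numbers, here is a plain-Ruby sketch of that arithmetic. It hard-codes the three README tables and does not use sycsvpro's SpreadSheet class, whose implementation is not part of this excerpt; the constant and method names are illustrative.

```ruby
# Plain-Ruby re-computation of the README market-value example.
# Assumption: the three README tables are hard-coded below; sycsvpro's
# SpreadSheet class is not used.
SERVICES = %w[Clean Maintain Repair].freeze

# Computer counts per country; columns are Tablet, Laptop, Desktop
COUNTRY = {
  "CA" => [1000, 2000, 500],
  "DE" => [2000, 3000, 400],
  "MX" => [500, 4000, 800],
  "RU" => [1500, 1500, 1000],
  "TR" => [1000, 2500, 3000],
  "US" => [3000, 3500, 1200]
}.freeze

# Rows are Tablet, Laptop, Desktop; columns are Clean, Maintain, Repair
PRICE  = [[10, 50, 100], [20, 60, 150], [50, 100, 200]].freeze
MARKET = [[0.10, 0.05, 0.03], [0.05, 0.10, 0.02], [0.20, 0.30, 0.04]].freeze

# One row per country and service, one column per computer type:
# count * price * market share, rounded to hide floating-point noise
def market_value
  COUNTRY.each_with_object({}) do |(country, counts), result|
    SERVICES.each_with_index do |service, s|
      result["#{country}-#{service}"] = counts.each_with_index.map { |count, c|
        (count * PRICE[c][s] * MARKET[c][s]).round(2)
      }
    end
  end
end

market_value.each { |label, row| puts "[#{label}] #{row.join(' ')}" }
```

Each key corresponds to one [Country-Service] row label of market_value.csv; rounding to two decimals only hides binary floating-point noise and is not part of the README command.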
data/README.rdoc
CHANGED
data/bin/sycsvpro
CHANGED
@@ -405,6 +405,47 @@ command :table do |c|
 
 end
 
+desc 'Do arithmetic operation with table like data. The table has to have '+
+     'rows with same size. Arithmetic operations are *, /, + and - where the '+
+     'results can be concatenated. Complete functions can be looked up at '+
+     'https://rubygems.org/gem/sycsvpro'
+command :spreadsheet do |c|
+  c.desc 'Files that contain the table data'
+  c.arg_name 'FILE_1,FILE_2,...,FILE_N'
+  c.flag [:f, :file]
+
+  c.desc 'Indicates for each file whether it has row labels'
+  c.arg_name 'true,false,...,true'
+  c.flag [:r, :rlabel]
+
+  c.desc 'Indicates for each file whether it has column labels'
+  c.arg_name 'true,false,...,false'
+  c.flag [:c, :clabel]
+
+  c.desc 'The alias for each file that is used in the arithmetic operation'
+  c.arg_name 'ALIAS_1,ALIAS_2,...,ALIAS_N'
+  c.flag [:a, :alias]
+
+  c.desc 'The arithmetic operation with the table data'
+  c.arg_name 'ARITHMETIC_OPERATION'
+  c.flag [:o, :operation]
+
+  c.desc 'Print the result of the operation'
+  c.switch [:p, :print], :default_value => false
+
+  c.action do |global_options,options,args|
+    print 'Operating...'
+    Sycsvpro::SpreadSheetBuilder.new(outfile: global_options[:o],
+                                     files: options[:f],
+                                     rlabels: options[:r],
+                                     clabels: options[:c],
+                                     aliases: options[:a],
+                                     operation: options[:o],
+                                     print: options[:p]).execute
+    print 'done'
+  end
+end
+
 desc 'Join two files based on a joint column value'
 arg_name 'SOURCE_FILE'
 command :join do |c|
@@ -688,7 +729,8 @@ pre do |global,command,options,args|
   unless command.name == :edit or
          command.name == :execute or
          command.name == :list or
-         command.name == :merge
+         command.name == :merge or
+         command.name == :spreadsheet
     analyzer = Sycsvpro::Analyzer.new(global[:f])
     result = analyzer.result
     count = result.row_count
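The new spreadsheet command takes one value per file in its -f, -r, -c and -a flags, and the README states that the last evaluation of the -o operation is returned as the result. A minimal sketch of that pairing and evaluation in plain Ruby follows, assuming comma-separated flag values and aliases bound as local variables of an evaluation scope. SpreadSheetBuilder's actual internals are not shown in this diff, so split_names, split_flags, clean_binding and evaluate_operation are hypothetical helpers.

```ruby
# Hypothetical helpers: split_names, split_flags, clean_binding and
# evaluate_operation are illustrations, not sycsvpro's API.

# "country, price" -> ["country", "price"]; nil -> []
def split_names(value)
  value.to_s.split(",").map(&:strip)
end

# "true,false" -> [true, false]
def split_flags(value)
  split_names(value).map { |flag| flag == "true" }
end

# A binding without local variables of its own
def clean_binding
  binding
end

# Binds each alias to the value at the same position, evaluates the
# operation string and returns the value of its last expression
def evaluate_operation(aliases, values, operation)
  unless aliases.size == values.size
    raise ArgumentError,
          "expected one alias per file (#{aliases.size} aliases, #{values.size} files)"
  end

  scope = clean_binding
  aliases.zip(values) { |name, value| scope.local_variable_set(name.to_sym, value) }
  scope.eval(operation)
end

files   = split_names("country.csv,prices.csv,market.csv")
rlabels = split_flags("true,true,true")
aliases = split_names("country,price,market")
p files.zip(rlabels, aliases)

# Small integers stand in for the spread sheets that would be read from files
p evaluate_operation(aliases, [2, 3, 4],
                     "result = []; result << country * price * market; result")
# prints [24]
```

Evaluating in a binding of its own keeps the operation string from seeing or overwriting the helper's local variables.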
data/lib/sycsvpro/aggregator.rb
CHANGED
@@ -10,16 +10,16 @@ module Sycsvpro
   #
   # in.csv
   #
-  #
-  #
-  #
-  #
+  # | Customer | 2013 | 2014 |
+  # | A        | A1   |      |
+  # | B        | B1   | B16  |
+  # | A        | A3   | A7   |
   #
   # out.csv
   #
-  #
-  #
-  #
+  # | Customer | 2013 | 2014 | Sum |
+  # | A        | 2    | 1    | 3   |
+  # | B        | 1    | 1    | 2   |
   class Aggregator
 
     include Dsl
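The Aggregator comment shows customer A counted twice in 2013 and once in 2014. A hypothetical sketch that reproduces those numbers by counting the non-empty cells per key and column follows; parse_rows and aggregate are illustrative names, and semicolon-separated input is assumed.

```ruby
# Hypothetical re-creation of the Aggregator comment example; parse_rows and
# aggregate are illustrative names, not sycsvpro's API.

# Splits semicolon separated text into rows and keeps empty trailing cells
def parse_rows(text, col_sep = ";")
  text.lines.map(&:chomp).reject(&:empty?).map { |line| line.split(col_sep, -1) }
end

# Counts the non-empty cells per key (first column) and value column and
# appends a Sum column
def aggregate(text)
  header, *data = parse_rows(text)
  width = header.size - 1
  counts = Hash.new { |hash, key| hash[key] = Array.new(width, 0) }
  data.each do |key, *cells|
    row = counts[key]
    cells.first(width).each_with_index do |cell, i|
      row[i] += 1 unless cell.strip.empty?
    end
  end
  [header + ["Sum"]] + counts.map { |key, row| [key, *row, row.sum] }
end

input = <<~CSV
  Customer;2013;2014
  A;A1;
  B;B1;B16
  A;A3;A7
CSV

aggregate(input).each { |row| puts row.join(";") }
# Customer;2013;2014;Sum
# A;2;1;3
# B;1;1;2
```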
data/lib/sycsvpro/allocator.rb
CHANGED
@@ -5,15 +5,15 @@ module Sycsvpro
   #
   # infile.csv
   #
-  #
-  #
-  #
-  #
+  # | Name | Product |
+  # | A    | X1      |
+  # | B    | Y2      |
+  # | A    | W10     |
   #
   # outfile.csv
   #
-  #
-  #
+  # | A | X1 | W10 |
+  # | B | Y2 |     |
   class Allocator
 
     # File from that values are read
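The Allocator comment collects all products of a name into one row. One way to reproduce that output in plain Ruby, assuming the key is in column 0 and the value in column 1 (allocate and parse_rows are hypothetical helpers, not the gem's API):

```ruby
# Hypothetical re-creation of the Allocator comment example.

# Splits semicolon separated text into rows and keeps empty trailing cells
def parse_rows(text, col_sep = ";")
  text.lines.map(&:chomp).reject(&:empty?).map { |line| line.split(col_sep, -1) }
end

# Collects the values of value_col per key in key_col, keeping the order in
# which keys first appear, and pads short rows with empty cells
def allocate(text, key_col: 0, value_col: 1)
  _header, *data = parse_rows(text)
  rows = data.group_by { |row| row[key_col] }
             .map { |key, group| [key, *group.map { |row| row[value_col] }] }
  width = rows.map(&:size).max || 0
  rows.map { |row| row + [""] * (width - row.size) }
end

input = <<~CSV
  Name;Product
  A;X1
  B;Y2
  A;W10
CSV

allocate(input).each { |row| puts row.join(";") }
# A;X1;W10
# B;Y2;
```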
data/lib/sycsvpro/analyzer.rb
CHANGED
@@ -6,19 +6,19 @@ module Sycsvpro
 
   # Analyzes the file structure
   #
-  #
-  #
+  # | Name | C1 | C2 |
+  # | A    | a  | b  |
   #
-  #
-  #
+  # 3 columns: ["Name", "C1", "C2"]
+  # 2 rows
   #
-  #
-  #
+  # Row sample data:
+  # A;b;c
   #
-  #
-  #
-  #
-  #
+  # Column index: Column name | Column sample value
+  # 0: Name | A
+  # 1: C1 | a
+  # 2: C2 | b
   class Analyzer
 
     # File that is analyzed
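The Analyzer comment lists the column count and names, the row count and a sample value per column. The following sketch derives the same figures from semicolon-separated text; analyze and parse_rows are illustrative names, and counting the header line as a row is an assumption taken from the comment's "2 rows".

```ruby
# Hypothetical re-creation of the Analyzer comment example.

# Splits semicolon separated text into rows and keeps empty trailing cells
def parse_rows(text, col_sep = ";")
  text.lines.map(&:chomp).reject(&:empty?).map { |line| line.split(col_sep, -1) }
end

# Column count and names from the header, the row count (header included)
# and the first data row as sample values
def analyze(text)
  header, *data = parse_rows(text)
  sample = data.first || []
  {
    columns: header.size,
    names: header,
    rows: data.size + 1,
    sample: header.each_with_index.map { |name, i| [i, name, sample[i]] }
  }
end

info = analyze("Name;C1;C2\nA;a;b\n")
puts "#{info[:columns]} columns: #{info[:names].inspect}"
puts "#{info[:rows]} rows"
puts "Column index: Column name | Column sample value"
info[:sample].each { |i, name, value| puts "#{i}: #{name} | #{value}" }
```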
data/lib/sycsvpro/mapper.rb
CHANGED
@@ -5,26 +5,26 @@ module Sycsvpro
   #
   # in.csv
   #
-  #
-  #
-  #
-  #
+  # | ID  | Name |
+  # | --- | ---- |
+  # | 1   | Hank |
+  # | 2   | Jane |
   #
   # mapping
   #
-  #
-  #
+  # 1:01
+  # 2:02
   #
-  #
-  #
-  #
-  #
+  # Sycsvpro::Mapping.new(infile: "in.csv",
+  #                       outfile: "out.csv",
+  #                       mapping: "mapping",
+  #                       cols: "0").execute
   # out.csv
   #
-  #
-  #
-  #
-  #
+  # | ID  | Name |
+  # | --- | ---- |
+  # | 01  | Hank |
+  # | 02  | Jane |
   class Mapper
 
     include Dsl
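The Mapper comment replaces the IDs in column 0 with the values from a mapping file of from:to lines. A hypothetical sketch of that substitution follows (parse_mapping, map_columns and parse_rows are illustrative names); keeping values that have no mapping entry unchanged is an assumption. The comment's usage example calls Sycsvpro::Mapping.new, while the class defined in this file is Mapper.

```ruby
# Hypothetical re-creation of the Mapper comment example.

# Splits semicolon separated text into rows and keeps empty trailing cells
def parse_rows(text, col_sep = ";")
  text.lines.map(&:chomp).reject(&:empty?).map { |line| line.split(col_sep, -1) }
end

# Reads "from:to" lines such as "1:01" into a lookup hash
def parse_mapping(text)
  text.lines.map(&:chomp).reject(&:empty?).map { |line| line.split(":", 2) }.to_h
end

# Replaces the values of the given column indexes; values without a mapping
# are kept unchanged
def map_columns(text, mapping, cols)
  parse_rows(text).map do |row|
    row.each_with_index.map { |cell, i| cols.include?(i) ? mapping.fetch(cell, cell) : cell }
  end
end

input   = "ID;Name\n1;Hank\n2;Jane\n"
mapping = parse_mapping("1:01\n2:02\n")
map_columns(input, mapping, [0]).each { |row| puts row.join(";") }
# ID;Name
# 01;Hank
# 02;Jane
```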
data/lib/sycsvpro/merger.rb
CHANGED
@@ -5,28 +5,28 @@ module Sycsvpro
   #
   # file1.csv
   #
-  #
-  #
-  #
-  #
+  # |     | 2010 | 2011 | 2012 | 2013 |
+  # | --- | ---- | ---- | ---- | ---- |
+  # | SP  | 20   | 30   | 40   | 50   |
+  # | RP  | 30   | 40   | 50   | 60   |
   #
   # file2.csv
   #
-  #
-  #
-  #
-  #
+  # |     | 2010 | 2011 | 2012 |
+  # | --- | ---- | ---- | ---- |
+  # | M   | m1   | m2   | m3   |
+  # | N   | n1   | n2   | n3   |
   #
   # merging restults in
   #
   # merge.csv
   #
-  #
-  #
-  #
-  #
-  #
-  #
+  # |     | 2010 | 2011 | 2012 | 2013 |
+  # | --- | ---- | ---- | ---- | ---- |
+  # | SP  | 20   | 30   | 40   | 50   |
+  # | RP  | 30   | 40   | 50   | 60   |
+  # | M   | m1   | m2   | m3   |      |
+  # | N   | n1   | n2   | n3   |      |
   #
   class Merger
 
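The Merger comment stacks the rows of file1.csv and file2.csv under their common year columns and leaves 2013 empty for M and N. A plain-Ruby sketch of that merge follows, assuming the first column holds row labels and the union of the headers defines the result columns; merge_tables and parse_rows are hypothetical and not the gem's implementation.

```ruby
# Hypothetical re-creation of the Merger comment example.

# Splits semicolon separated text into rows and keeps empty trailing cells
def parse_rows(text, col_sep = ";")
  text.lines.map(&:chomp).reject(&:empty?).map { |line| line.split(col_sep, -1) }
end

# Stacks the rows of several tables under the union of their header columns
# (the first header cell is the row-label column); missing cells stay empty
def merge_tables(*texts)
  tables = texts.map { |text| parse_rows(text) }
  columns = tables.flat_map { |table| table.first.drop(1) }.uniq
  rows = tables.flat_map do |head, *data|
    data.map do |row|
      cells = head.drop(1).zip(row.drop(1)).to_h
      [row.first, *columns.map { |col| cells[col] || "" }]
    end
  end
  [[tables.first.first.first, *columns], *rows]
end

file1 = ";2010;2011;2012;2013\nSP;20;30;40;50\nRP;30;40;50;60\n"
file2 = ";2010;2011;2012\nM;m1;m2;m3\nN;n1;n2;n3\n"
merge_tables(file1, file2).each { |row| puts row.join(";") }
```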
data/lib/sycsvpro/not_available.rb
@@ -0,0 +1,36 @@
+# Operating csv files
+module Sycsvpro
+
+  # The NotAvailable class is an Eigenclass and used to represent a missing
+  # value. It will return if used in any expression always not available.
+  #
+  # na = NotAvailable
+  #
+  # na + 1 -> na
+  # 1 + na -> na
+  class NotAvailable
+
+    class << self
+
+      # Catches all expressions where na is the first argument
+      def method_missing(name, *args, &block)
+        super if name == :to_ary
+        super if name == :to_str
+        self
+      end
+
+      # Catches all expressions where na is not the first argument and swaps
+      # value and na, so na is first argument
+      def coerce(value)
+        [self,value]
+      end
+
+      # Returns NA as the string representation
+      def to_s
+        "NA"
+      end
+
+    end
+  end
+
+end
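NotAvailable relies on two Ruby hooks: method_missing answers any unknown message with the class itself, and coerce swaps the operands when NotAvailable appears on the right of a numeric operator. The sketch below re-creates the pattern under the hypothetical name NotAvailableSketch so it runs without the gem.

```ruby
# Minimal re-creation of the NotAvailable pattern under a hypothetical name.
class NotAvailableSketch
  class << self
    # Any unknown message sent to the class (na + 1, na * 2, ...) returns the
    # class itself, so a missing value propagates through an expression.
    # Implicit conversions go to super and fail as usual, which lets puts,
    # Array#join and string interpolation fall back to to_s.
    def method_missing(name, *args, &block)
      return super if name == :to_ary || name == :to_str

      self
    end

    # 1 + na calls na.coerce(1); returning [na, 1] makes Ruby evaluate
    # na + 1 instead, which method_missing turns into na again
    def coerce(value)
      [self, value]
    end

    def to_s
      "NA"
    end
  end
end

na = NotAvailableSketch
puts na + 1                 # prints NA
puts 1 + na                 # prints NA
puts [1, na, 3].join(";")   # prints 1;NA;3
```

Only messages the class does not already understand reach method_missing: na == 1, for example, is answered by BasicObject#== and returns false rather than NA.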