sycsvpro 0.1.13 → 0.2.0
- data/.gitignore +1 -0
- data/Gemfile.lock +1 -1
- data/README.md +173 -4
- data/README.rdoc +2 -1
- data/bin/sycsvpro +43 -1
- data/lib/sycsvpro/aggregator.rb +7 -7
- data/lib/sycsvpro/allocator.rb +6 -6
- data/lib/sycsvpro/analyzer.rb +10 -10
- data/lib/sycsvpro/mapper.rb +14 -14
- data/lib/sycsvpro/merger.rb +14 -14
- data/lib/sycsvpro/not_available.rb +36 -0
- data/lib/sycsvpro/spread_sheet.rb +523 -0
- data/lib/sycsvpro/spread_sheet_builder.rb +104 -0
- data/lib/sycsvpro/transposer.rb +14 -15
- data/lib/sycsvpro/unique.rb +11 -12
- data/lib/sycsvpro/version.rb +1 -1
- data/lib/sycsvpro.rb +2 -0
- data/spec/sycsvpro/not_available_spec.rb +34 -0
- data/spec/sycsvpro/spread_sheet_builder_spec.rb +35 -0
- data/spec/sycsvpro/spread_sheet_spec.rb +415 -0
- data/sycsvpro.rdoc +25 -24
- metadata +8 -2
data/.gitignore
CHANGED
data/Gemfile.lock
CHANGED
data/README.md
CHANGED
@@ -24,6 +24,8 @@ Processing of csv files. *sycsvpro* offers following functions
|
|
24
24
|
* join two files based on a joint column value (since version 0.1.7)
|
25
25
|
* merge files based on common headline columns (since version 0.1.10)
|
26
26
|
* transpose (swapping) rows and columns (since version 0.1.13)
|
27
|
+
* arithmetic operations between multiple files that have a table like
|
28
|
+
structure (since version 0.2.0)
|
27
29
|
|
28
30
|
To get help type
|
29
31
|
|
@@ -255,6 +257,158 @@ Write only columns 0, 6 and 7 by specifying write columns
|
|
255
257
|
chiro;2;20
|
256
258
|
0;10;100
|
257
259
|
|
260
|
+
Spread Sheet
|
261
|
+
------------
|
262
|
+
A spread sheet is a table with rows and columns. Operations can be conducted
|
263
|
+
on or between spread sheets. All rows of a spread sheet must have the same
|
264
|
+
number of columns; a spread sheet may have row and column labels.
|
265
|
+
|
266
|
+
Use cases are
|
267
|
+
|
268
|
+
* arithmetic operations on spread sheets
|
269
|
+
* information about table like data
|
270
|
+
|
271
|
+
###Example for Arithmetic Operation
|
272
|
+
Assume we want to calculate the market for computer services. We have the count
|
273
|
+
of computers in each country, we are offering different services with service
|
274
|
+
specific prices. We know the market for each service in percent. With this data
|
275
|
+
we can calculate the market value.
|
276
|
+
|
277
|
+
Count of computers in target countries
|
278
|
+
|
279
|
+
[Tablet] [Laptop] [Desktop]
|
280
|
+
[CA] 1000 2000 500
|
281
|
+
[DE] 2000 3000 400
|
282
|
+
[MX] 500 4000 800
|
283
|
+
[RU] 1500 1500 1000
|
284
|
+
[TR] 1000 2500 3000
|
285
|
+
[US] 3000 3500 1200
|
286
|
+
|
287
|
+
Prices of the offered services per computer type
|
288
|
+
|
289
|
+
[Clean] [Maintain] [Repair]
|
290
|
+
[Tablet] 10 50 100
|
291
|
+
[Laptop] 20 60 150
|
292
|
+
[Desktop] 50 100 200
|
293
|
+
|
294
|
+
Market for the different services
|
295
|
+
|
296
|
+
[Clean] [Maintain] [Repair]
|
297
|
+
[Tablet] 0.10 0.05 0.03
|
298
|
+
[Laptop] 0.05 0.10 0.02
|
299
|
+
[Desktop] 0.20 0.30 0.04
|
300
|
+
|
301
|
+
To calculate the market value we have to multiply each row of the country file
|
302
|
+
with the columns of the service prices and service market file (for readability
|
303
|
+
it has been split up to multiple rows)
|
304
|
+
|
305
|
+
$ sycsvpro -o market_value.csv spreadsheet \
|
306
|
+
-f country.csv,prices.csv,market.csv \
|
307
|
+
-a country,price,market \
|
308
|
+
-o "SpreadSheet.bind_columns( \
|
309
|
+
country.transpose.column_collect { |value| value * price * market } \
|
310
|
+
).transpose"
|
311
|
+
|
312
|
+
Note: If you get obscure errors then check whether your aliases (-a flag)
|
313
|
+
conflict with a method of your classes. Therefore it is advised to
|
314
|
+
always use specific names like in the example country, price, market
|
315
|
+
|
316
|
+
The result of the operation is written to market\_value.csv (labels have been
|
317
|
+
optimized for better readability)
|
318
|
+
|
319
|
+
[Tablet] [Laptop] [Desktop]
|
320
|
+
[CA-Clean] 1000.0 2000.0 5000.0
|
321
|
+
[CA-Maintain] 2500.0 12000.0 15000.0
|
322
|
+
[CA-Repair] 3000.0 6000.0 4000.0
|
323
|
+
[DE-Clean] 2000.0 3000.0 4000.0
|
324
|
+
[DE-Maintain] 5000.0 18000.0 12000.0
|
325
|
+
[DE-Repair] 6000.0 9000.0 3200.0
|
326
|
+
[MX-Clean] 500.0 4000.0 8000.0
|
327
|
+
[MX-Maintain] 1250.0 24000.0 24000.0
|
328
|
+
[MX-Repair] 1500.0 12000.0 6400.0
|
329
|
+
[RU-Clean] 1500.0 1500.0 10000.0
|
330
|
+
[RU-Maintain] 3750.0 9000.0 30000.0
|
331
|
+
[RU-Repair] 4500.0 4500.0 8000.0
|
332
|
+
[TR-Clean] 1000.0 2500.0 30000.0
|
333
|
+
[TR-Maintain] 2500.0 15000.0 90000.0
|
334
|
+
[TR-Repair] 3000.0 7500.0 24000.0
|
335
|
+
[US-Clean] 3000.0 3500.0 12000.0
|
336
|
+
[US-Maintain] 7500.0 21000.0 36000.0
|
337
|
+
[US-Repair] 9000.0 10500.0 9600.0
|
338
|
+
|
339
|
+
###Example for Information on Spread Sheets
|
340
|
+
With the analyze command we get information about the general structure and some
|
341
|
+
sample data of a csv file. If we want to look at the csv file in more detail we
|
342
|
+
can use the spreadsheet command. In this case we don't want to write the result
|
343
|
+
to the file as it is no spread sheet, so we can omit the global -o option.
|
344
|
+
|
345
|
+
sycsvpro spreadsheet -f country.csv -r true -c true -a a \
|
346
|
+
-o "puts;puts a;puts a.ncol;puts a.nrow;puts a.size"
|
347
|
+
|
348
|
+
This will give us the information about the data, the number of columns and rows
|
349
|
+
and the number of values in the csv file. But for this case there is a standard
|
350
|
+
method that provides this information
|
351
|
+
|
352
|
+
sycsvpro spreadsheet -f country.csv -r true -c true -a a -o "a.summary"
|
353
|
+
|
354
|
+
Summary
|
355
|
+
-------
|
356
|
+
rows: 6, columns: 3, dimension: [6, 3], size: 18
|
357
|
+
|
358
|
+
row labels:
|
359
|
+
["CA","DE","MX","RU","TR","US"]
|
360
|
+
column labels:
|
361
|
+
["Tablet","Laptop","Desktop"]
|
362
|
+
|
363
|
+
If the result is no spread sheet it won't be written to the outfile (-o) but we
|
364
|
+
can print the result to the console with the -p flag
|
365
|
+
|
366
|
+
sycsvpro spreadsheet -f country.csv,prices.csv,market.csv \
|
367
|
+
-r true,true,true -c true,true,true \
|
368
|
+
-a country,price,market \
|
369
|
+
-o "result = []; \
|
370
|
+
country.transpose.each_column { \
|
371
|
+
|column| result << column * price * market \
|
372
|
+
}; \
|
373
|
+
result" \
|
374
|
+
-p
|
375
|
+
|
376
|
+
The last evaluation, in this case result, will be returned as the result. The
|
377
|
+
-p flag will print the result to the console
|
378
|
+
|
379
|
+
Operation
|
380
|
+
---------
|
381
|
+
result = []
|
382
|
+
country.transpose.each_column { |column| result << column * price * market }
|
383
|
+
result
|
384
|
+
|
385
|
+
Result
|
386
|
+
------
|
387
|
+
[CA*Clean*Clean] [CA*Maintain*Maintain] [CA*Repair*Repair]
|
388
|
+
[Tablet*Tablet*Tablet] 1000.0 2500.0 3000.0
|
389
|
+
[Laptop*Laptop*Laptop] 2000.0 12000.0 6000.0
|
390
|
+
[Desktop*Desktop*Desktop] 5000.0 15000.0 4000.0
|
391
|
+
[DE*Clean*Clean] [DE*Maintain*Maintain] [DE*Repair*Repair]
|
392
|
+
[Tablet*Tablet*Tablet] 2000.0 5000.0 6000.0
|
393
|
+
[Laptop*Laptop*Laptop] 3000.0 18000.0 9000.0
|
394
|
+
[Desktop*Desktop*Desktop] 4000.0 12000.0 3200.0
|
395
|
+
[MX*Clean*Clean] [MX*Maintain*Maintain] [MX*Repair*Repair]
|
396
|
+
[Tablet*Tablet*Tablet] 500.0 1250.0 1500.0
|
397
|
+
[Laptop*Laptop*Laptop] 4000.0 24000.0 12000.0
|
398
|
+
[Desktop*Desktop*Desktop] 8000.0 24000.0 6400.0
|
399
|
+
[RU*Clean*Clean] [RU*Maintain*Maintain] [RU*Repair*Repair]
|
400
|
+
[Tablet*Tablet*Tablet] 1500.0 3750.0 4500.0
|
401
|
+
[Laptop*Laptop*Laptop] 1500.0 9000.0 4500.0
|
402
|
+
[Desktop*Desktop*Desktop] 10000.0 30000.0 8000.0
|
403
|
+
[TR*Clean*Clean] [TR*Maintain*Maintain] [TR*Repair*Repair]
|
404
|
+
[Tablet*Tablet*Tablet] 1000.0 2500.0 3000.0
|
405
|
+
[Laptop*Laptop*Laptop] 2500.0 15000.0 7500.0
|
406
|
+
[Desktop*Desktop*Desktop] 30000.0 90000.0 24000.0
|
407
|
+
[US*Clean*Clean] [US*Maintain*Maintain] [US*Repair*Repair]
|
408
|
+
[Tablet*Tablet*Tablet] 3000.0 7500.0 9000.0
|
409
|
+
[Laptop*Laptop*Laptop] 3500.0 21000.0 10500.0
|
410
|
+
[Desktop*Desktop*Desktop] 12000.0 36000.0 9600.0
|
411
|
+
|
258
412
|
Join
|
259
413
|
----
|
260
414
|
Join the machine and contract file with columns from the customer address file
|
@@ -412,16 +566,17 @@ want to dig deeper I would recommend [R](http://www.r-project.org/).
|
|
412
566
|
|
413
567
|
A work flow could be as follows
|
414
568
|
|
415
|
-
* Analyze the file `analyze`
|
569
|
+
* Analyze the file `analyze` or `spreadsheet`
|
416
570
|
* Clean the data `map`
|
417
571
|
* Extract rows and columns of interest `extract`
|
418
572
|
* Count values `count`
|
419
|
-
* Do arithmetic operations on the values `calc`
|
420
|
-
* Sort the rows based on column values
|
573
|
+
* Do arithmetic operations on the values `calc` or `spreadsheet`
|
574
|
+
* Sort the rows based on column values `sort`
|
421
575
|
|
422
576
|
When I have analyzed the data I use _Microsoft Excel_ or _LibreOffice Calc_ to
|
423
577
|
create nice graphs. To create more sophisticated analysis *R* is the right tool
|
424
|
-
to use.
|
578
|
+
to use. I also use sycsvpro to clean and prepare data and then do the analysis
|
579
|
+
with *R*.
|
425
580
|
|
426
581
|
Release notes
|
427
582
|
=============
|
@@ -557,6 +712,20 @@ Version 0.1.13
|
|
557
712
|
* Merger now doesn't require a key column, that is, files can be merged without
|
558
713
|
key columns.
|
559
714
|
|
715
|
+
Version 0.2.0
|
716
|
+
-------------
|
717
|
+
* SpreadSheet is used to conduct operations like multiplication, division,
|
718
|
+
addition and subtraction between multiple files that have a table like
|
719
|
+
structure. SpreadSheet can also be used to retrieve information about csv
|
720
|
+
files
|
721
|
+
|
722
|
+
Documentation
|
723
|
+
=============
|
724
|
+
The class documentation can be found at
|
725
|
+
[rubygems](https://rubygems.org/gems/sycsvpro) and the source code at
|
726
|
+
[github](https://github.com/sugaryourcoffee/syc-svpro). This might be valuable
|
727
|
+
when writing scripts.
|
728
|
+
|
560
729
|
Installation
|
561
730
|
============
|
562
731
|
[](http://badge.fury.io/rb/sycsvpro)
|
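The arithmetic behind the market value example in the README diff above can be checked with a short plain-Ruby sketch. It does not use the gem's SpreadSheet class; the constants `COUNTS`, `PRICES`, `MARKET` and the helper `market_value` are hypothetical names introduced only for this illustration, and only the CA and DE rows of the country table are included.

```ruby
# Plain-Ruby sketch of count * price * market share; not the gem's SpreadSheet API.
# All names below are hypothetical and exist only for this illustration.
COUNTS = {
  "CA" => { "Tablet" => 1000, "Laptop" => 2000, "Desktop" => 500 },
  "DE" => { "Tablet" => 2000, "Laptop" => 3000, "Desktop" => 400 }
}.freeze

PRICES = {
  "Tablet"  => { "Clean" => 10, "Maintain" => 50,  "Repair" => 100 },
  "Laptop"  => { "Clean" => 20, "Maintain" => 60,  "Repair" => 150 },
  "Desktop" => { "Clean" => 50, "Maintain" => 100, "Repair" => 200 }
}.freeze

MARKET = {
  "Tablet"  => { "Clean" => 0.10, "Maintain" => 0.05, "Repair" => 0.03 },
  "Laptop"  => { "Clean" => 0.05, "Maintain" => 0.10, "Repair" => 0.02 },
  "Desktop" => { "Clean" => 0.20, "Maintain" => 0.30, "Repair" => 0.04 }
}.freeze

# Builds one result row per country and service, keyed like "CA-Clean".
# Each row maps a device to count * price * market share, rounded to one
# decimal to match the README's display.
def market_value(counts, prices, market)
  services = prices.values.first.keys
  counts.each_with_object({}) do |(country, devices), result|
    services.each do |service|
      result["#{country}-#{service}"] = devices.to_h do |device, count|
        value = count * prices[device][service] * market[device][service]
        [device, value.round(1)]
      end
    end
  end
end

VALUES = market_value(COUNTS, PRICES, MARKET)
VALUES.each { |label, row| puts "#{label}: #{row.values.join(' ')}" }
```

The row labels follow the "optimized" labels of the README result table; the gem itself produces different labels, as the `-p` example shows.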
data/README.rdoc
CHANGED
data/bin/sycsvpro
CHANGED
@@ -405,6 +405,47 @@ command :table do |c|
|
|
405
405
|
|
406
406
|
end
|
407
407
|
|
408
|
+
desc 'Do arithmetic operation with table like data. The table has to have '+
|
409
|
+
'rows with same size. Arithmetic operations are *, /, + and - where the '+
|
410
|
+
'results can be concatenated. Complete functions can be looked up at '+
|
411
|
+
'https://rubygems.org/gems/sycsvpro'
|
412
|
+
command :spreadsheet do |c|
|
413
|
+
c.desc 'Files that contain the table data'
|
414
|
+
c.arg_name 'FILE_1,FILE_2,...,FILE_N'
|
415
|
+
c.flag [:f, :file]
|
416
|
+
|
417
|
+
c.desc 'Indicates for each file whether it has row labels'
|
418
|
+
c.arg_name 'true,false,...,true'
|
419
|
+
c.flag [:r, :rlabel]
|
420
|
+
|
421
|
+
c.desc 'Indicates for each file whether it has column labels'
|
422
|
+
c.arg_name 'true,false,...,false'
|
423
|
+
c.flag [:c, :clabel]
|
424
|
+
|
425
|
+
c.desc 'The alias for each file that is used in the arithmetic operation'
|
426
|
+
c.arg_name 'ALIAS_1,ALIAS_2,...,ALIAS_N'
|
427
|
+
c.flag [:a, :alias]
|
428
|
+
|
429
|
+
c.desc 'The arithmetic operation with the table data'
|
430
|
+
c.arg_name 'ARITHMETIC_OPERATION'
|
431
|
+
c.flag [:o, :operation]
|
432
|
+
|
433
|
+
c.desc 'Print the result of the operation'
|
434
|
+
c.switch [:p, :print], :default_value => false
|
435
|
+
|
436
|
+
c.action do |global_options,options,args|
|
437
|
+
print 'Operating...'
|
438
|
+
Sycsvpro::SpreadSheetBuilder.new(outfile: global_options[:o],
|
439
|
+
files: options[:f],
|
440
|
+
rlabels: options[:r],
|
441
|
+
clabels: options[:c],
|
442
|
+
aliases: options[:a],
|
443
|
+
operation: options[:o],
|
444
|
+
print: options[:p]).execute
|
445
|
+
print 'done'
|
446
|
+
end
|
447
|
+
end
|
448
|
+
|
408
449
|
desc 'Join two files based on a joint column value'
|
409
450
|
arg_name 'SOURCE_FILE'
|
410
451
|
command :join do |c|
|
@@ -688,7 +729,8 @@ pre do |global,command,options,args|
|
|
688
729
|
unless command.name == :edit or
|
689
730
|
command.name == :execute or
|
690
731
|
command.name == :list or
|
691
|
-
command.name == :merge
|
732
|
+
command.name == :merge or
|
733
|
+
command.name == :spreadsheet
|
692
734
|
analyzer = Sycsvpro::Analyzer.new(global[:f])
|
693
735
|
result = analyzer.result
|
694
736
|
count = result.row_count
|
data/lib/sycsvpro/aggregator.rb
CHANGED
@@ -10,16 +10,16 @@ module Sycsvpro
|
|
10
10
|
#
|
11
11
|
# in.csv
|
12
12
|
#
|
13
|
-
#
|
14
|
-
#
|
15
|
-
#
|
16
|
-
#
|
13
|
+
# | Customer | 2013 | 2014 |
|
14
|
+
# | A | A1 | |
|
15
|
+
# | B | B1 | B16 |
|
16
|
+
# | A | A3 | A7 |
|
17
17
|
#
|
18
18
|
# out.csv
|
19
19
|
#
|
20
|
-
#
|
21
|
-
#
|
22
|
-
#
|
20
|
+
# | Customer | 2013 | 2014 | Sum |
|
21
|
+
# | A | 2 | 1 | 3 |
|
22
|
+
# | B | 1 | 1 | 2 |
|
23
23
|
class Aggregator
|
24
24
|
|
25
25
|
include Dsl
|
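The in.csv and out.csv tables in the Aggregator comment above suggest that non-empty values are counted per customer and year and then summed. The following plain-Ruby sketch reproduces that reading; it is an inference from the comment, not the gem's implementation, and `INPUT`, `aggregate_counts` and the semicolon separator are assumptions made for the illustration.

```ruby
# Hypothetical input mirroring the in.csv table from the Aggregator comment.
INPUT = <<~CSV
  Customer;2013;2014
  A;A1;
  B;B1;B16
  A;A3;A7
CSV

# Counts the non-empty values per customer and year column and appends the
# row sum. This is an assumed reading of the comment, not Sycsvpro::Aggregator.
def aggregate_counts(text)
  header, *lines = text.lines.map(&:chomp)
  years = header.split(";").drop(1)
  counts = Hash.new { |hash, key| hash[key] = Array.new(years.size, 0) }

  lines.each do |line|
    # The -1 limit keeps trailing empty fields such as the one in "A;A1;".
    customer, *values = line.split(";", -1)
    tally = counts[customer]
    values.each_with_index do |value, index|
      tally[index] += 1 unless value.strip.empty?
    end
  end

  counts.map { |customer, tally| [customer, *tally, tally.sum] }
end

AGGREGATED = aggregate_counts(INPUT)
AGGREGATED.each { |row| puts row.join(";") }
```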
data/lib/sycsvpro/allocator.rb
CHANGED
@@ -5,15 +5,15 @@ module Sycsvpro
|
|
5
5
|
#
|
6
6
|
# infile.csv
|
7
7
|
#
|
8
|
-
#
|
9
|
-
#
|
10
|
-
#
|
11
|
-
#
|
8
|
+
# | Name | Product |
|
9
|
+
# | A | X1 |
|
10
|
+
# | B | Y2 |
|
11
|
+
# | A | W10 |
|
12
12
|
#
|
13
13
|
# outfile.csv
|
14
14
|
#
|
15
|
-
#
|
16
|
-
#
|
15
|
+
# | A | X1 | W10 |
|
16
|
+
# | B | Y2 | |
|
17
17
|
class Allocator
|
18
18
|
|
19
19
|
# File from that values are read
|
data/lib/sycsvpro/analyzer.rb
CHANGED
@@ -6,19 +6,19 @@ module Sycsvpro
|
|
6
6
|
|
7
7
|
# Analyzes the file structure
|
8
8
|
#
|
9
|
-
#
|
10
|
-
#
|
9
|
+
# | Name | C1 | C2 |
|
10
|
+
# | A | a | b |
|
11
11
|
#
|
12
|
-
#
|
13
|
-
#
|
12
|
+
# 3 columns: ["Name", "C1", "C2"]
|
13
|
+
# 2 rows
|
14
14
|
#
|
15
|
-
#
|
16
|
-
#
|
15
|
+
# Row sample data:
|
16
|
+
# A;a;b
|
17
17
|
#
|
18
|
-
#
|
19
|
-
#
|
20
|
-
#
|
21
|
-
#
|
18
|
+
# Column index: Column name | Column sample value
|
19
|
+
# 0: Name | A
|
20
|
+
# 1: C1 | a
|
21
|
+
# 2: C2 | b
|
22
22
|
class Analyzer
|
23
23
|
|
24
24
|
# File that is analyzed
|
data/lib/sycsvpro/mapper.rb
CHANGED
@@ -5,26 +5,26 @@ module Sycsvpro
|
|
5
5
|
#
|
6
6
|
# in.csv
|
7
7
|
#
|
8
|
-
#
|
9
|
-
#
|
10
|
-
#
|
11
|
-
#
|
8
|
+
# | ID | Name |
|
9
|
+
# | --- | ---- |
|
10
|
+
# | 1 | Hank |
|
11
|
+
# | 2 | Jane |
|
12
12
|
#
|
13
13
|
# mapping
|
14
14
|
#
|
15
|
-
#
|
16
|
-
#
|
15
|
+
# 1:01
|
16
|
+
# 2:02
|
17
17
|
#
|
18
|
-
#
|
19
|
-
#
|
20
|
-
#
|
21
|
-
#
|
18
|
+
# Sycsvpro::Mapper.new(infile: "in.csv",
|
19
|
+
# outfile: "out.csv",
|
20
|
+
# mapping: "mapping",
|
21
|
+
# cols: "0").execute
|
22
22
|
# out.csv
|
23
23
|
#
|
24
|
-
#
|
25
|
-
#
|
26
|
-
#
|
27
|
-
#
|
24
|
+
# | ID | Name |
|
25
|
+
# | --- | ---- |
|
26
|
+
# | 01 | Hank |
|
27
|
+
# | 02 | Jane |
|
28
28
|
class Mapper
|
29
29
|
|
30
30
|
include Dsl
|
data/lib/sycsvpro/merger.rb
CHANGED
@@ -5,28 +5,28 @@ module Sycsvpro
|
|
5
5
|
#
|
6
6
|
# file1.csv
|
7
7
|
#
|
8
|
-
#
|
9
|
-
#
|
10
|
-
#
|
11
|
-
#
|
8
|
+
# | | 2010 | 2011 | 2012 | 2013 |
|
9
|
+
# | --- | ---- | ---- | ---- | ---- |
|
10
|
+
# | SP | 20 | 30 | 40 | 50 |
|
11
|
+
# | RP | 30 | 40 | 50 | 60 |
|
12
12
|
#
|
13
13
|
# file2.csv
|
14
14
|
#
|
15
|
-
#
|
16
|
-
#
|
17
|
-
#
|
18
|
-
#
|
15
|
+
# | | 2010 | 2011 | 2012 |
|
16
|
+
# | --- | ---- | ---- | ---- |
|
17
|
+
# | M | m1 | m2 | m3 |
|
18
|
+
# | N | n1 | n2 | n3 |
|
19
19
|
#
|
20
20
|
# merging results in
|
21
21
|
#
|
22
22
|
# merge.csv
|
23
23
|
#
|
24
|
-
#
|
25
|
-
#
|
26
|
-
#
|
27
|
-
#
|
28
|
-
#
|
29
|
-
#
|
24
|
+
# | | 2010 | 2011 | 2012 | 2013 |
|
25
|
+
# | --- | ---- | ---- | ---- | ---- |
|
26
|
+
# | SP | 20 | 30 | 40 | 50 |
|
27
|
+
# | RP | 30 | 40 | 50 | 60 |
|
28
|
+
# | M | m1 | m2 | m3 | |
|
29
|
+
# | N | n1 | n2 | n3 | |
|
30
30
|
#
|
31
31
|
class Merger
|
32
32
|
|
@@ -0,0 +1,36 @@
|
|
1
|
+
# Operating csv files
|
2
|
+
module Sycsvpro
|
3
|
+
|
4
|
+
# The NotAvailable class uses its eigenclass (singleton class) to represent a
|
5
|
+
# missing value. Used in any expression, it always returns not available.
|
6
|
+
#
|
7
|
+
# na = NotAvailable
|
8
|
+
#
|
9
|
+
# na + 1 -> na
|
10
|
+
# 1 + na -> na
|
11
|
+
class NotAvailable
|
12
|
+
|
13
|
+
class << self
|
14
|
+
|
15
|
+
# Catches all expressions where na is the first argument
|
16
|
+
def method_missing(name, *args, &block)
|
17
|
+
super if name == :to_ary
|
18
|
+
super if name == :to_str
|
19
|
+
self
|
20
|
+
end
|
21
|
+
|
22
|
+
# Catches all expressions where na is not the first argument and swaps
|
23
|
+
# value and na, so na is first argument
|
24
|
+
def coerce(value)
|
25
|
+
[self,value]
|
26
|
+
end
|
27
|
+
|
28
|
+
# Returns NA as the string representation
|
29
|
+
def to_s
|
30
|
+
"NA"
|
31
|
+
end
|
32
|
+
|
33
|
+
end
|
34
|
+
end
|
35
|
+
|
36
|
+
end
|
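The behavior described in the NotAvailable comments can be tried out in isolation. The sketch below copies the class as shown in the diff so that it runs without the gem installed; the constants `LEFT`, `RIGHT`, `CHAIN` and `LABEL` are hypothetical names used only for this demonstration.

```ruby
# Self-contained copy of the class shown in the diff, for experimentation only.
module Sycsvpro
  class NotAvailable
    class << self
      # Any unknown message (for example +, -, *, /) returns NotAvailable itself.
      def method_missing(name, *args, &block)
        super if name == :to_ary
        super if name == :to_str
        self
      end

      # Numeric operators call coerce when NotAvailable is the right operand.
      def coerce(value)
        [self, value]
      end

      # Returns NA as the string representation.
      def to_s
        "NA"
      end
    end
  end
end

na = Sycsvpro::NotAvailable

LEFT  = na + 1          # na is the receiver, handled by method_missing
RIGHT = 1 + na          # Integer#+ calls na.coerce(1), then evaluates na + 1
CHAIN = (2.5 * na) / 4  # the missing value propagates through the expression
LABEL = "value: #{na}"  # interpolation uses to_s
```

One caveat worth keeping in mind: operators that Ruby already defines on classes, such as `<` or `==`, are not routed through `method_missing`, so only messages unknown to `Class` propagate the missing value this way.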