sycsvpro 0.1.13 → 0.2.0 (RubyGems)

Files changed (22)

data/.gitignore +1 -0
data/Gemfile.lock +1 -1
data/README.md +173 -4
data/README.rdoc +2 -1
data/bin/sycsvpro +43 -1
data/lib/sycsvpro/aggregator.rb +7 -7
data/lib/sycsvpro/allocator.rb +6 -6
data/lib/sycsvpro/analyzer.rb +10 -10
data/lib/sycsvpro/mapper.rb +14 -14
data/lib/sycsvpro/merger.rb +14 -14
data/lib/sycsvpro/not_available.rb +36 -0
data/lib/sycsvpro/spread_sheet.rb +523 -0
data/lib/sycsvpro/spread_sheet_builder.rb +104 -0
data/lib/sycsvpro/transposer.rb +14 -15
data/lib/sycsvpro/unique.rb +11 -12
data/lib/sycsvpro/version.rb +1 -1
data/lib/sycsvpro.rb +2 -0
data/spec/sycsvpro/not_available_spec.rb +34 -0
data/spec/sycsvpro/spread_sheet_builder_spec.rb +35 -0
data/spec/sycsvpro/spread_sheet_spec.rb +415 -0
data/sycsvpro.rdoc +25 -24
metadata +8 -2

data/.gitignore CHANGED

@@ -19,3 +19,4 @@ doc/
 # Test files
 *.csv
+test-files/

data/Gemfile.lock CHANGED

@@ -1,7 +1,7 @@
 PATH
   remote: .
   specs:
-    sycsvpro (0.1.13)
+    sycsvpro (0.2.0)
       gli (= 2.9.0)
       timeleap (~> 0.0.1)

data/README.md CHANGED

@@ -24,6 +24,8 @@ Processing of csv files. *sycsvpro* offers following functions
 * join two files based on a joint column value (since version 0.1.7)
 * merge files based on common headline columns (since version 0.1.10)
 * transpose (swapping) rows and columns (since version 0.1.13)
+* arithmetic operations between multiple files that have a table like
+  structure (since version 0.2.0)
 To get help type
@@ -255,6 +257,158 @@ Write only columns 0, 6 and 7 by specifying write columns
     chiro;2;20
     0;10;100
+Spread Sheet
+------------
+A spread sheet is a table with rows and columns. Operations can be conducted on
+or between spread sheets. All rows of a spread sheet must have the same number
+of columns, and the spread sheet may have row and column labels.
+Use cases are
+* arithmetic operations on spread sheets
+* information about table like data
+###Example for Arithmetic Operation
+Assume we want to calculate the market for computer services. We have the count
+of computers in each country, and we offer different services with
+service-specific prices. We know the market for each service as a fraction (0.10
+means 10 percent). With this data we can calculate the market value.
+Count of computers in target countries
+            [Tablet] [Laptop] [Desktop]
+    [CA]        1000     2000       500
+    [DE]        2000     3000       400
+    [MX]         500     4000       800
+    [RU]        1500     1500      1000
+    [TR]        1000     2500      3000
+    [US]        3000     3500      1200
+Prices of the offered services per computer type
+              [Clean] [Maintain] [Repair]
+    [Tablet]       10         50      100
+    [Laptop]       20         60      150
+    [Desktop]      50        100      200
+Market for the different services
+              [Clean] [Maintain] [Repair]
+    [Tablet]     0.10       0.05     0.03
+    [Laptop]     0.05       0.10     0.02
+    [Desktop]    0.20       0.30     0.04
+To calculate the market value we multiply each row of the country file with
+the columns of the service prices file and the service market file (for
+readability the command has been split across multiple lines)
+    $ sycsvpro -o market_value.csv spreadsheet \
+      -f country.csv,prices.csv,market.csv \
+      -a country,price,market \
+      -o "SpreadSheet.bind_columns( \
+          country.transpose.column_collect { |value| value * price * market } \
+        ).transpose"
+    Note: If you get obscure errors then check whether your aliases (-a flag)
+          conflict with a method of your classes. It is therefore advised to
+          always use specific names, like country, price and market in the example
+The result of the operation is written to market\_value.csv (labels have been
+optimized for better readability)
+                  [Tablet] [Laptop] [Desktop]
+    [CA-Clean]      1000.0   2000.0    5000.0
+    [CA-Maintain]   2500.0  12000.0   15000.0
+    [CA-Repair]     3000.0   6000.0    4000.0
+    [DE-Clean]      2000.0   3000.0    4000.0
+    [DE-Maintain]   5000.0  18000.0   12000.0
+    [DE-Repair]     6000.0   9000.0    3200.0
+    [MX-Clean]       500.0   4000.0    8000.0
+    [MX-Maintain]   1250.0  24000.0   24000.0
+    [MX-Repair]     1500.0  12000.0    6400.0
+    [RU-Clean]      1500.0   1500.0   10000.0
+    [RU-Maintain]   3750.0   9000.0   30000.0
+    [RU-Repair]     4500.0   4500.0    8000.0
+    [TR-Clean]      1000.0   2500.0   30000.0
+    [TR-Maintain]   2500.0  15000.0   90000.0
+    [TR-Repair]     3000.0   7500.0   24000.0
+    [US-Clean]      3000.0   3500.0   12000.0
+    [US-Maintain]   7500.0  21000.0   36000.0
+    [US-Repair]     9000.0  10500.0    9600.0
+###Example for Information on Spread Sheets
+With the analyze command we get information about the general structure and some
+sample data of a csv file. If we want to look at the csv file in more detail we
+can use the spreadsheet command. In this case we don't want to write the result
+to a file as it is not a spread sheet, so we can omit the global -o option.
+    sycsvpro spreadsheet -f country.csv -r true -c true -a a \
+                         -o "puts;puts a;puts a.ncol;puts a.nrow;puts a.size"
+This prints the data, the number of columns and rows, and the number of values
+in the csv file. For this case there is also a standard method that provides
+the same information
+    sycsvpro spreadsheet -f country.csv -r true -c true -a a -o "a.summary"
+    Summary
+    -------
+    rows: 6, columns: 3, dimension: [6, 3], size: 18
+    row labels:
+     ["CA","DE","MX","RU","TR","US"]
+    column labels:
+     ["Tablet","Laptop","Desktop"]
+If the result is not a spread sheet it won't be written to the outfile (-o), but
+we can print the result to the console with the -p flag
+    sycsvpro spreadsheet -f country.csv,prices.csv,market.csv \
+                         -r true,true,true -c true,true,true \
+                         -a country,price,market \
+                         -o "result = []; \
+                             country.transpose.each_column { \
+                               |column| result << column * price * market \
+                             }; \
+                             result" \
+                         -p
+The last evaluation, in this case result, will be returned as the result. The
+-p flag will print the result to the console
+    Operation
+    ---------
+    result = []
+    country.transpose.each_column { |column| result << column * price * market }
+    result
+    Result
+    ------
+                              [CA*Clean*Clean] [CA*Maintain*Maintain] [CA*Repair*Repair]
+       [Tablet*Tablet*Tablet]           1000.0                 2500.0             3000.0
+       [Laptop*Laptop*Laptop]           2000.0                12000.0             6000.0
+    [Desktop*Desktop*Desktop]           5000.0                15000.0             4000.0
+                              [DE*Clean*Clean] [DE*Maintain*Maintain] [DE*Repair*Repair]
+       [Tablet*Tablet*Tablet]           2000.0                 5000.0             6000.0
+       [Laptop*Laptop*Laptop]           3000.0                18000.0             9000.0
+    [Desktop*Desktop*Desktop]           4000.0                12000.0             3200.0
+                              [MX*Clean*Clean] [MX*Maintain*Maintain] [MX*Repair*Repair]
+       [Tablet*Tablet*Tablet]            500.0                 1250.0             1500.0
+       [Laptop*Laptop*Laptop]           4000.0                24000.0            12000.0
+    [Desktop*Desktop*Desktop]           8000.0                24000.0             6400.0
+                              [RU*Clean*Clean] [RU*Maintain*Maintain] [RU*Repair*Repair]
+       [Tablet*Tablet*Tablet]           1500.0                 3750.0             4500.0
+       [Laptop*Laptop*Laptop]           1500.0                 9000.0             4500.0
+    [Desktop*Desktop*Desktop]          10000.0                30000.0             8000.0
+                              [TR*Clean*Clean] [TR*Maintain*Maintain] [TR*Repair*Repair]
+       [Tablet*Tablet*Tablet]           1000.0                 2500.0             3000.0
+       [Laptop*Laptop*Laptop]           2500.0                15000.0             7500.0
+    [Desktop*Desktop*Desktop]          30000.0                90000.0            24000.0
+                              [US*Clean*Clean] [US*Maintain*Maintain] [US*Repair*Repair]
+       [Tablet*Tablet*Tablet]           3000.0                 7500.0             9000.0
+       [Laptop*Laptop*Laptop]           3500.0                21000.0            10500.0
+    [Desktop*Desktop*Desktop]          12000.0                36000.0             9600.0
 Join
 ----
 Join the machine and contract file with columns from the customer address file
@@ -412,16 +566,17 @@ want to dig deeper I would recommend [R](http://www.r-project.org/).
 A work flow could be as follows
-* Analyze the file `analyze`
+* Analyze the file `analyze` or `spreadsheet`
 * Clean the data `map`
 * Extract rows and columns of interest `extract`
 * Count values `count`
-* Do arithmetic operations on the values `calc`
-* Sort the rows based on column values
+* Do arithmetic operations on the values `calc` or `spreadsheet`
+* Sort the rows based on column values `sort`
 When I have analyzed the data I use _Microsoft Excel_ or _LibreOffice Calc_ to
 create nice graphs. To create more sophisticated analysis *R* is the right tool
-to use.
+to use. I also use sycsvpro to clean and prepare data and then do the analysis
+with *R*.
 Release notes
 =============
@@ -557,6 +712,20 @@ Version 0.1.13
 * Merger no longer requires a key column, that is, files can be merged without
   key columns.
+Version 0.2.0
+-------------
+* SpreadSheet is used to conduct operations like multiplication, division,
+  addition and subtraction between multiple files that have a table like
+  structure. SpreadSheet can also be used to retrieve information about csv
+  files
+Documentation
+=============
+The class documentation can be found at
+[rubygems](https://rubygems.org/gems/sycsvpro) and the source code at
+[github](https://github.com/sugaryourcoffee/syc-svpro). This might be valuable
+when writing scripts.
 Installation
 ============
 [![Gem Version](https://badge.fury.io/rb/sycsvpro.png)](http://badge.fury.io/rb/sycsvpro)
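
Note on the README's market value example: the arithmetic behind the result table can be checked without the gem. The following is a minimal plain-Ruby sketch, not sycsvpro's `SpreadSheet` implementation. The helper name `market_value` and the constants are assumptions made for illustration, and only two of the six countries are included. Each cell is the computer count times the service price times the market fraction.

```ruby
# Plain-Ruby check of the README arithmetic; does not use sycsvpro.
# The hashes mirror country.csv, prices.csv and market.csv from the README.
COUNTS = {
  "CA" => { "Tablet" => 1000, "Laptop" => 2000, "Desktop" => 500 },
  "DE" => { "Tablet" => 2000, "Laptop" => 3000, "Desktop" => 400 }
}

PRICES = {
  "Tablet"  => { "Clean" => 10, "Maintain" => 50,  "Repair" => 100 },
  "Laptop"  => { "Clean" => 20, "Maintain" => 60,  "Repair" => 150 },
  "Desktop" => { "Clean" => 50, "Maintain" => 100, "Repair" => 200 }
}

MARKET = {
  "Tablet"  => { "Clean" => 0.10, "Maintain" => 0.05, "Repair" => 0.03 },
  "Laptop"  => { "Clean" => 0.05, "Maintain" => 0.10, "Repair" => 0.02 },
  "Desktop" => { "Clean" => 0.20, "Maintain" => 0.30, "Repair" => 0.04 }
}

# market_value is a hypothetical helper, not part of the gem.
# It returns one row per "COUNTRY-SERVICE" label with a value per device.
def market_value(counts, prices, market)
  services = prices.values.first.keys
  counts.each_with_object({}) do |(country, devices), result|
    services.each do |service|
      row = devices.map do |device, count|
        value = count * prices[device][service] * market[device][service]
        [device, value.round(1)] # round away floating point noise
      end
      result["#{country}-#{service}"] = row.to_h
    end
  end
end

VALUES = market_value(COUNTS, PRICES, MARKET)
VALUES.each { |label, row| puts "#{label}: #{row}" }
```

The rows for CA and DE agree with the corresponding rows of the result table in the README.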

data/README.rdoc CHANGED

@@ -42,5 +42,6 @@ Test files are in
     spec/sycsvpro/files
-:include: sycsvpro.rdoc
+== Help contents
+:include:sycsvpro.rdoc

data/bin/sycsvpro CHANGED

@@ -405,6 +405,47 @@ command :table do |c|
 end
+desc 'Do arithmetic operation with table like data. The table has to have '+
+     'rows with same size. Arithmetic operations are *, /, + and - where the '+
+     'results can be concatenated. Complete functions can be looked up at '+
+     'https://rubygems.org/gems/sycsvpro'
+command :spreadsheet do |c|
+  c.desc 'Files that contain the table data'
+  c.arg_name 'FILE_1,FILE_2,...,FILE_N'
+  c.flag [:f, :file]
+  c.desc 'Indicates for each file whether it has row labels'
+  c.arg_name 'true,false,...,true'
+  c.flag [:r, :rlabel]
+  c.desc 'Indicates for each file whether it has column labels'
+  c.arg_name 'true,false,...,false'
+  c.flag [:c, :clabel]
+  c.desc 'The alias for each file that is used in the arithmetic operation'
+  c.arg_name 'ALIAS_1,ALIAS_2,...,ALIAS_N'
+  c.flag [:a, :alias]
+  c.desc 'The arithmetic operation with the table data'
+  c.arg_name 'ARITHMETIC_OPERATION'
+  c.flag [:o, :operation]
+  c.desc 'Print the result of the operation'
+  c.switch [:p, :print], :default_value => false
+  c.action do |global_options,options,args|
+    print 'Operating...'
+    Sycsvpro::SpreadSheetBuilder.new(outfile:   global_options[:o],
+                                     files:     options[:f],
+                                     rlabels:   options[:r],
+                                     clabels:   options[:c],
+                                     aliases:   options[:a],
+                                     operation: options[:o],
+                                     print:     options[:p]).execute
+    print 'done'
+  end
+end
 desc 'Join two files based on a joint column value'
 arg_name 'SOURCE_FILE'
 command :join do |c|
@@ -688,7 +729,8 @@ pre do |global,command,options,args|
   unless command.name == :edit or
          command.name == :execute or
          command.name == :list or
-         command.name == :merge
+         command.name == :merge or
+         command.name == :spreadsheet
     analyzer = Sycsvpro::Analyzer.new(global[:f])
     result = analyzer.result
     count = result.row_count
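
Note on the `spreadsheet` command: the `-f`, `-r`, `-c` and `-a` flags each take a comma-separated list with one entry per file. How `SpreadSheetBuilder` interprets these lists is not visible in this part of the diff, so the following is only a sketch of one plausible pairing. The helper `table_options`, its hash keys, and the default of `true` for missing label lists are all assumptions, not the gem's behavior.

```ruby
# Hypothetical helper, not taken from sycsvpro: pairs every file with its
# alias and its row/column label switches.
def table_options(files:, aliases:, rlabels: nil, clabels: nil)
  names   = files.split(",").map(&:strip)
  handles = aliases.split(",").map(&:strip)
  raise ArgumentError, "need one alias per file" unless names.size == handles.size

  # Assumption of this sketch: a missing list means "true" for every file
  to_flags = lambda do |list|
    if list.nil?
      Array.new(names.size, true)
    else
      list.split(",").map { |value| value.strip == "true" }
    end
  end
  rows = to_flags.call(rlabels)
  cols = to_flags.call(clabels)

  names.each_with_index.map do |name, i|
    { file: name, handle: handles[i], row_labels: rows[i], column_labels: cols[i] }
  end
end

OPTIONS = table_options(files:   "country.csv,prices.csv,market.csv",
                        aliases: "country,price,market",
                        rlabels: "true,true,true",
                        clabels: "true,true,true")
OPTIONS.each { |option| puts option.inspect }
```

Rejecting an alias list whose length differs from the file list surfaces a mistyped command line early instead of producing an operation that refers to an undefined alias.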

data/lib/sycsvpro/aggregator.rb CHANGED

@@ -10,16 +10,16 @@ module Sycsvpro
   #
   # in.csv
   #
-  # | Customer | 2013 | 2014 |
-  # | A        | A1   |      |
-  # | B        | B1   | B16  |
-  # | A        | A3   | A7   |
+  #   | Customer | 2013 | 2014 |
+  #   | A        | A1   |      |
+  #   | B        | B1   | B16  |
+  #   | A        | A3   | A7   |
   #
   # out.csv
   #
-  # | Customer | 2013 | 2014 | Sum |
-  # | A        | 2    | 1    | 3   |
-  # | B        | 1    | 1    | 2   |
+  #   | Customer | 2013 | 2014 | Sum |
+  #   | A        | 2    | 1    | 3   |
+  #   | B        | 1    | 1    | 2   |
   class Aggregator
     include Dsl

data/lib/sycsvpro/allocator.rb CHANGED

@@ -5,15 +5,15 @@ module Sycsvpro
   #
   # infile.csv
   #
-  # | Name | Product |
-  # | A    | X1      |
-  # | B    | Y2      |
-  # | A    | W10     |
+  #   | Name | Product |
+  #   | A    | X1      |
+  #   | B    | Y2      |
+  #   | A    | W10     |
   #
   # outfile.csv
   #
-  # | A    | X1 | W10 |
-  # | B    | Y2 |     |
+  #   | A    | X1 | W10 |
+  #   | B    | Y2 |     |
   class Allocator
     # File from that values are read

data/lib/sycsvpro/analyzer.rb CHANGED

@@ -6,19 +6,19 @@ module Sycsvpro
   # Analyzes the file structure
   #
-  # | Name | C1 | C2 |
-  # | A    | a  | b  |
+  #   | Name | C1 | C2 |
+  #   | A    | a  | b  |
   #
-  # 3 columns: ["Name", "C1", "C2"]
-  # 2 rows
+  #   3 columns: ["Name", "C1", "C2"]
+  #   2 rows
   #
-  # Row sample data:
-  # A;b;c
+  #   Row sample data:
+  #   A;b;c
   #
-  # Column index: Column name | Column sample value
-  # 0: Name | A
-  # 1: C1 | a
-  # 2: C2 | b
+  #   Column index: Column name | Column sample value
+  #   0: Name | A
+  #   1: C1 | a
+  #   2: C2 | b
   class Analyzer
     # File that is analyzed

data/lib/sycsvpro/mapper.rb CHANGED

@@ -5,26 +5,26 @@ module Sycsvpro
   #
   # in.csv
   #
-  # | ID  | Name |
-  # | --- | ---- |
-  # | 1   | Hank |
-  # | 2   | Jane |
+  #     | ID  | Name |
+  #     | --- | ---- |
+  #     | 1   | Hank |
+  #     | 2   | Jane |
   #
   # mapping
   #
-  # 1:01
-  # 2:02
+  #     1:01
+  #     2:02
   #
-  # Sycsvpro::Mapping.new(infile:  "in.csv",
-  #                       outfile: "out.csv",
-  #                       mapping: "mapping",
-  #                       cols:    "0").execute
+  #     Sycsvpro::Mapping.new(infile:  "in.csv",
+  #                           outfile: "out.csv",
+  #                           mapping: "mapping",
+  #                           cols:    "0").execute
   # out.csv
   #
-  # | ID  | Name |
-  # | --- | ---- |
-  # | 01  | Hank |
-  # | 02  | Jane |
+  #     | ID  | Name |
+  #     | --- | ---- |
+  #     | 01  | Hank |
+  #     | 02  | Jane |
   class Mapper
     include Dsl

data/lib/sycsvpro/merger.rb CHANGED

@@ -5,28 +5,28 @@ module Sycsvpro
   #
   # file1.csv
   #
-  # |     | 2010 | 2011 | 2012 | 2013 |
-  # | --- | ---- | ---- | ---- | ---- |
-  # | SP  | 20   | 30   | 40   | 50   |
-  # | RP  | 30   | 40   | 50   | 60   |
+  #     |     | 2010 | 2011 | 2012 | 2013 |
+  #     | --- | ---- | ---- | ---- | ---- |
+  #     | SP  | 20   | 30   | 40   | 50   |
+  #     | RP  | 30   | 40   | 50   | 60   |
   #
   # file2.csv
   #
-  # |     | 2010 | 2011 | 2012 |
-  # | --- | ---- | ---- | ---- |
-  # | M   | m1   | m2   | m3   |
-  # | N   | n1   | n2   | n3   |
+  #     |     | 2010 | 2011 | 2012 |
+  #     | --- | ---- | ---- | ---- |
+  #     | M   | m1   | m2   | m3   |
+  #     | N   | n1   | n2   | n3   |
   #
   # merging results in
   #
   # merge.csv
   #
-  # |     | 2010 | 2011 | 2012 | 2013 |
-  # | --- | ---- | ---- | ---- | ---- |
-  # | SP  | 20   | 30   | 40   | 50   |
-  # | RP  | 30   | 40   | 50   | 60   |
-  # | M   | m1   | m2   | m3   |      |
-  # | N   | n1   | n2   | n3   |      |
+  #     |     | 2010 | 2011 | 2012 | 2013 |
+  #     | --- | ---- | ---- | ---- | ---- |
+  #     | SP  | 20   | 30   | 40   | 50   |
+  #     | RP  | 30   | 40   | 50   | 60   |
+  #     | M   | m1   | m2   | m3   |      |
+  #     | N   | n1   | n2   | n3   |      |
   #
   class Merger

data/lib/sycsvpro/not_available.rb ADDED

@@ -0,0 +1,36 @@
+# Operating csv files
+module Sycsvpro
+  # The NotAvailable class is an Eigenclass and used to represent a missing
+  # value. Used in any expression, it always returns not available.
+  #
+  #    na = NotAvailable
+  #
+  #    na + 1 -> na
+  #    1 + na -> na
+  class NotAvailable
+    class << self
+      # Catches all expressions where na is the first argument
+      def method_missing(name, *args, &block)
+        super if name == :to_ary
+        super if name == :to_str
+        self
+      end
+      # Catches all expressions where na is not the first argument and swaps
+      # value and na, so na is first argument
+      def coerce(value)
+        [self,value]
+      end
+      # Returns NA as the string representation
+      def to_s
+        "NA"
+      end
+    end
+  end
+end
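
Note on `NotAvailable`: the class relies on two standard Ruby mechanisms. `method_missing` answers any operator sent to it directly, and `coerce` is what `Integer#+` and the other numeric operators call when their right-hand operand is not a number. The sketch below restates the class from the diff under a `Sketch` namespace (an assumption, so the snippet runs without the gem) and shows both directions.

```ruby
# Standalone copy of the class from the diff; the Sketch namespace is an
# assumption so that the snippet runs without sycsvpro installed.
module Sketch
  class NotAvailable
    class << self
      # Handles expressions where NA is the receiver, e.g. NA + 1
      def method_missing(name, *args, &block)
        super if name == :to_ary
        super if name == :to_str
        self
      end

      # Called by numeric operators when NA is the right-hand operand,
      # e.g. 1 + NA; the returned pair makes Ruby evaluate NA + 1 instead
      def coerce(value)
        [self, value]
      end

      def to_s
        "NA"
      end
    end
  end
end

NA = Sketch::NotAvailable

left   = NA + 1                              # answered by method_missing
right  = 1 + NA                              # Integer#+ calls NA.coerce(1)
chain  = (2.5 * NA) / 4 - 1                  # NA propagates through the chain
folded = [1, NA, 3].reduce { |a, b| a + b }  # one NA makes the whole sum NA

[left, right, chain, folded].each { |value| puts value.to_s }
```

Because every expression returns the class itself, one missing value propagates through a whole calculation instead of raising an error partway through.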