RubyGems - csvutils - Versions diffs - 0.2.1 → 0.2.2 - Mend

csvutils 0.2.1 → 0.2.2


Files changed (6)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: beeb6af86232b078b710e1bd744a6fe7cc0b40b0
-  data.tar.gz: 6def57af46d09b47c9669aecf1937c49b77b3eff
+  metadata.gz: c639c6a13e149e81b5255262884abd02fae3c22e
+  data.tar.gz: 73bec69652d262ebf616c00e8351f8788b0d022a
 SHA512:
-  metadata.gz: a4641b42b5b613d859fe89ee41d6afc13d3a748eb917ba1a368a2d6fb8e9a8884dd2b9c9d533af4188405378a7d6678fb4e312ba3e6b27cd2de6564dd163f5c1
-  data.tar.gz: 98cf752445c23b9ec80c8ba4327e83df39d09cae008e7f246f18c99226827929f58cf84ff497069cbbeec61e69fe6b9a8810910b2c02c14613ac7fc26a4a9e18
+  metadata.gz: 45d8ba36d58577f3362cea9ead2ff9b952992f5038cb453dad6d666948a498269b60ef2c7d15b1c260b291a69c047ab3616ba62d93c92ac6af61cd5e8ae703fc
+  data.tar.gz: 56ebd4f88a54411e97312a832de6d2d830e1af15e287923535dd51edd4ac11c09ff525a7a27363d9c4255defbae7dc6040f6bb2202439fc06d4bf15adfe01b54

data/README.md CHANGED

@@ -8,10 +8,268 @@
 * forum :: [wwwmake](http://groups.google.com/group/wwwmake)
 ## Usage
+### Command Line Tools
+`csvhead`  •  `csvheader`  •  `csvstat`  •  `csvsplit`  •  `csvcut`
+Try:
+```
+$ csvhead -h          # or
+$ csvhead --help
+```
+resulting in:
+```
+Usage: csvhead [OPTS] datafile ...
+    -n, --num=NUM                    Number of rows
+    -h, --help                       Prints this help
+```
+The other tools work the same way. Try `csvheader -h`, `csvstat -h`, `csvsplit -h`
+and `csvcut -h`.
+#### Working with Comma-Separated Values (CSV) Datafile Examples
+Let's use a sample datafile e.g. `E0.csv` from the [football.csv project]() with
+matches from the English Premier League. Try
+```
+$ csvhead E0.csv
+```
+to pretty print (pp) the first four rows (use the `-n` option for more or fewer rows).
+Resulting in:
+```
+== E0.csv (.) ==
+#<CSV::Row "Date":nil "HomeTeam":"Arsenal" "AwayTeam":"Leicester" "FTHG":"4" "FTAG":"3" "HTHG":"2" "HTAG":"2">
+#<CSV::Row "Date":nil "HomeTeam":"Brighton" "AwayTeam":"Man City" "FTHG":"0" "FTAG":"2" "HTHG":"?" "HTAG":"?">
+#<CSV::Row "Date":"12/08/17" "HomeTeam":"Chelsea" "AwayTeam":"Burnley" "FTHG":"2" "FTAG":"3" "HTHG":"?" "HTAG":"?">
+#<CSV::Row "Date":"-" "HomeTeam":"Crystal Palace" "AwayTeam":"Huddersfield" "FTHG":"0" "FTAG":"3" "HTHG":"0" "HTAG":"2">
+ 4 rows
+```
+Next try
+```
+$ csvheader E0.csv
+```
+to print all header columns (the first row). Resulting in:
+```
+== E0.csv (.) ==
+7 columns:
+  1: Date
+  2: HomeTeam
+  3: AwayTeam
+  4: FTHG
+  5: FTAG
+  6: HTHG
+  7: HTAG
+```
+Next try:
+```
+$ csvstat -c HomeTeam,AwayTeam E0.csv
+```
+to show all unique values for the columns `HomeTeam` and `AwayTeam`.
+Resulting in:
+```
+== E0.csv (.) ==
+... 380 rows
+7 columns:
+  1: Date
+  2: HomeTeam
+  3: AwayTeam
+  4: FTHG
+  5: FTAG
+  6: HTHG
+  7: HTAG
+ column >HomeTeam< 21 unique values:
+   1 x  <nil>
+   19 x  Arsenal
+   18 x  Bournemouth
+   19 x  Brighton
+   19 x  Burnley
+   19 x  Chelsea
+   19 x  Crystal Palace
+   19 x  Everton
+   19 x  Huddersfield
+   19 x  Leicester
+   19 x  Liverpool
+   19 x  Man City
+   19 x  Man United
+   19 x  Newcastle
+   19 x  Southampton
+   19 x  Stoke
+   19 x  Swansea
+   19 x  Tottenham
+   19 x  Watford
+   19 x  West Brom
+   19 x  West Ham
+ column >AwayTeam< 21 unique values:
+   1 x  ?
+   19 x  Arsenal
+   19 x  Bournemouth
+   19 x  Brighton
+   19 x  Burnley
+   19 x  Chelsea
+   19 x  Crystal Palace
+   19 x  Everton
+   19 x  Huddersfield
+   19 x  Leicester
+   19 x  Liverpool
+   19 x  Man City
+   19 x  Man United
+   19 x  Newcastle
+   19 x  Southampton
+   19 x  Stoke
+   19 x  Swansea
+   19 x  Tottenham
+   18 x  Watford
+   19 x  West Brom
+   19 x  West Ham
+```
+#### Split & Cut - Split One Datafile into Many or Cut / Reorder Columns
+Let's use another sample datafile e.g. `AUT.csv` that holds many seasons
+from the Austrian (AUT) Bundesliga. First let's see how many seasons it holds:
+```
+$ csvstat -c Season AUT.csv
+```
+Resulting in:
+```
+== AUT.csv (.) ==
+... 362 rows
+7 columns:
+  1: Season
+  2: Date
+  3: Time
+  4: Home
+  5: Away
+  6: HG
+  7: AG
+ column >Season< 2 unique values:
+   180 x  2016/2017
+   182 x  2017/2018
+```
+Now let's split the `AUT.csv` datafile by the `Season` column
+resulting in two new datafiles named `AUT_2016-2017.csv`
+and `AUT_2017-2018.csv`. Try:
+```
+$ csvsplit -c Season AUT.csv
+```
+Resulting in:
+```
+new chunk: ["2016/2017"]
+  saving >AUT_2016-2017.csv<...
+new chunk: ["2017/2018"]
+  saving >AUT_2017-2018.csv<...
+```
+Let's cut out (remove) the `Season` and `Time` columns from the new `AUT_2016-2017.csv`
+datafile. Try:
+```
+$ csvcut -c Date,Home,Away,HG,AG AUT_2016-2017.csv
+```
+Double check the overwritten cleaned-up datafile:
+```
+$ csvhead AUT_2016-2017.csv
+```
+resulting in:
+```
+```
+And so on and so forth.
+### Code, Code, Code - Script Your Data Work Flow with Ruby
+You can use all tools in your script using the `CsvUtils`
+class methods:
+| Shell       | Ruby                              |
+|-------------|-----------------------------------|
+| `csvhead`   |  `CsvUtils.head( path, n: 4 )`    |
+| `csvheader` |  `CsvUtils.header( path )`        |
+| `csvstat`   |  `CsvUtils.stat( path, *columns )`  |
+| `csvsplit`  |  `CsvUtils.split( path, *columns )` |
+| `csvcut`    |  `CsvUtils.cut( path, *columns, output: path)`   |
+Let's retry the sample above in a script:
+``` ruby
+require 'csvutils'
+CsvUtils.head( 'E0.csv' )
+# same as
+#  $ csvhead E0.csv
+CsvUtils.header( 'E0.csv' )
+# => see above :-)
+CsvUtils.stat( 'E0.csv', 'HomeTeam', 'AwayTeam' )
+# same as:
+#  $ csvstat -c HomeTeam,AwayTeam E0.csv
+CsvUtils.stat( 'AUT.csv', 'Season' )
+# => same as
+#  $ csvstat -c Season AUT.csv
+CsvUtils.split( 'AUT.csv', 'Season' )
+# => see above :-)
+CsvUtils.cut( 'AUT_2016-2017.csv', 'Date', 'Home', 'Away', 'HG', 'AG' )
+# => see above :-)
+```
+That's it.
 ## License
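
The split step described in the README hunk above (one datafile in, one datafile per unique column value out) can be sketched with Ruby's standard `csv` library. This is not the gem's implementation: `split_by_column` is a hypothetical helper, the sample rows are made up for illustration, and the output naming (slashes replaced by dashes) is only inferred from the `AUT_2016-2017.csv` filename shown in the README.

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical helper (not part of csvutils): write one datafile per
# unique value found in the given column and return the new filepaths.
def split_by_column( path, column )
  table = CSV.read( path, headers: true )
  dir   = File.dirname( path )
  base  = File.basename( path, '.csv' )

  table.group_by { |row| row[column] }.map do |value, chunk|
    # assumption: "2016/2017" becomes "2016-2017" in the filename
    outpath = File.join( dir, "#{base}_#{value.to_s.tr( '/', '-' )}.csv" )
    CSV.open( outpath, 'w' ) do |csv|
      csv << table.headers
      chunk.each { |row| csv << row.fields }
    end
    outpath
  end
end

# Usage with a throwaway datafile (made-up rows, for illustration only)
names = Dir.mktmpdir do |dir|
  path = File.join( dir, 'AUT.csv' )
  File.write( path, <<~TXT )
    Season,Home,Away
    2016/2017,Team A,Team B
    2017/2018,Team A,Team C
  TXT
  split_by_column( path, 'Season' ).map { |outpath| File.basename( outpath ) }
end
puts names
```

Grouping the whole table in memory keeps the sketch short; a streaming variant would be a better fit for very large datafiles.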

data/lib/csvutils/commands/cut.rb CHANGED

@@ -9,7 +9,7 @@ def self.cut( args )
   config = { columns: [] }
   parser = OptionParser.new do |opts|
-     opts.banner = "Usage: csvcut [OPTS] source dest"
+     opts.banner = "Usage: csvcut [OPTS] source [dest]"
      opts.on("-c", "--columns=COLUMNS", "Name of header columns" ) do |columns|
       config[:columns] = columns.split(/[,|;]/)   ## allow different separators
@@ -26,17 +26,17 @@ def self.cut( args )
   ## pp config
   ## pp args
-  source = arg[0]
-  dest   = arg[1]
+  source = args[0]
+  dest   = args[1] || source   ## default to same as source (note: overwrites datafile in place!!!)
-  unless arg[0] && arg[1]
-    puts "** error: arg(s) missing - source and dest filepath required!!! - #{args.inspect}"
+  unless args[0]
+    puts "** error: arg missing - source filepath required - #{args.inspect}"
     exit 1
   end
   columns = config[:columns]
-  CsvUtils.cut( source, dest, *columns )
+  CsvUtils.cut( source, *columns, output: dest )
 end
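
The argument handling in this hunk (optional `dest`, falling back to `source`) can be tried in isolation with a minimal sketch. `parse_cut_args` is a hypothetical helper written for illustration: it mirrors the option parsing shown above but returns the resolved values instead of calling `CsvUtils.cut`, and it raises instead of calling `exit 1`.

```ruby
require 'optparse'

# Hypothetical helper mirroring the csvcut argument handling in the diff.
def parse_cut_args( args )
  config = { columns: [] }
  parser = OptionParser.new do |opts|
    opts.banner = "Usage: csvcut [OPTS] source [dest]"
    opts.on( "-c", "--columns=COLUMNS", "Name of header columns" ) do |columns|
      config[:columns] = columns.split( /[,|;]/ )   # comma, pipe or semicolon
    end
  end

  rest   = parser.parse( args )   # non-destructive; returns the non-option args
  source = rest[0]
  raise ArgumentError, "source filepath required" unless source
  dest   = rest[1] || source      # default: overwrite the source in place

  { source: source, dest: dest, columns: config[:columns] }
end

one_arg  = parse_cut_args( ['-c', 'Date,Home,Away', 'AUT_2016-2017.csv'] )
two_args = parse_cut_args( ['-c', 'Date;Home', 'in.csv', 'out.csv'] )
p one_arg
p two_args
```

With a single filepath the destination equals the source, so the datafile is overwritten in place; passing a second filepath keeps the original untouched.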

data/lib/csvutils/cut.rb CHANGED

@@ -5,7 +5,10 @@
 class CsvUtils
-  def self.cut( inpath, outpath, *columns, sep: ',' )
+  def self.cut( path, *columns, output: path, sep: ',' )
+    inpath  = path
+    outpath = output   # note: output defaults to inpath (overwrites datafile in-place!!!)
     puts "cvscut in: >#{inpath}<  out: >#{outpath}<"
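
The new signature relies on a Ruby feature worth spelling out: a keyword argument's default may refer to an earlier parameter, so `output: path` resolves to whatever was passed as `path`. A minimal sketch, where `resolve_paths` is a hypothetical stand-in that only reports the resolved paths and performs no file I/O:

```ruby
# Hypothetical stand-in for CsvUtils.cut: same parameter list shape,
# but it only returns what the paths and columns resolve to.
def resolve_paths( path, *columns, output: path )
  { inpath: path, outpath: output, columns: columns }
end

in_place = resolve_paths( 'AUT_2016-2017.csv', 'Date', 'Home' )
copy     = resolve_paths( 'AUT_2016-2017.csv', 'Date', 'Home', output: 'AUT_cut.csv' )

p in_place
p copy
```

Because the output path is now a keyword, every positional argument after `path` is treated as a column name; a call written for the 0.2.1 signature `cut( inpath, outpath, *columns )` would therefore have its second filepath read as a column.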

data/lib/csvutils/version.rb CHANGED

@@ -6,7 +6,7 @@ class CsvUtils
   MAJOR = 0    ## todo: namespace inside version or something - why? why not??
   MINOR = 2
-  PATCH = 1
+  PATCH = 2
   VERSION = [MAJOR,MINOR,PATCH].join('.')
   def self.version

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: csvutils
 version: !ruby/object:Gem::Version
-  version: 0.2.1
+  version: 0.2.2
 platform: ruby
 authors:
 - Gerald Bauer