csvutils 0.2.1 → 0.2.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: beeb6af86232b078b710e1bd744a6fe7cc0b40b0
- data.tar.gz: 6def57af46d09b47c9669aecf1937c49b77b3eff
+ metadata.gz: c639c6a13e149e81b5255262884abd02fae3c22e
+ data.tar.gz: 73bec69652d262ebf616c00e8351f8788b0d022a
  SHA512:
- metadata.gz: a4641b42b5b613d859fe89ee41d6afc13d3a748eb917ba1a368a2d6fb8e9a8884dd2b9c9d533af4188405378a7d6678fb4e312ba3e6b27cd2de6564dd163f5c1
- data.tar.gz: 98cf752445c23b9ec80c8ba4327e83df39d09cae008e7f246f18c99226827929f58cf84ff497069cbbeec61e69fe6b9a8810910b2c02c14613ac7fc26a4a9e18
+ metadata.gz: 45d8ba36d58577f3362cea9ead2ff9b952992f5038cb453dad6d666948a498269b60ef2c7d15b1c260b291a69c047ab3616ba62d93c92ac6af61cd5e8ae703fc
+ data.tar.gz: 56ebd4f88a54411e97312a832de6d2d830e1af15e287923535dd51edd4ac11c09ff525a7a27363d9c4255defbae7dc6040f6bb2202439fc06d4bf15adfe01b54
data/README.md CHANGED
@@ -8,10 +8,268 @@
  * forum :: [wwwmake](http://groups.google.com/group/wwwmake)
 
 
+
  ## Usage
 
+ ### Command Line Tools
+
+ `csvhead` • `csvheader` • `csvstat` • `csvsplit` • `csvcut`
+
+
+ Try:
+
+ ```
+ $ csvhead -h # or
+ $ csvhead --help
+ ```
+
+ resulting in:
+
+ ```
+ Usage: csvhead [OPTS] datafile ...
+ -n, --num=NUM Number of rows
+ -h, --help Prints this help
+ ```
+
+ and so on. Now try `csvheader -h`, `csvstat -h`, `csvsplit -h`,
+ and `csvcut -h`.
+
+
+
+ #### Working with Comma-Separated Values (CSV) Datafile Examples
+
+ Let's use a sample datafile, e.g. `E0.csv` from the [football.csv project](), with
+ matches from the English Premier League. Try:
+
+ ```
+ $ csvhead E0.csv
+ ```
+ to pretty print (pp) the first four rows (use the `-n` option for more or fewer rows).
+ Resulting in:
+
+ ```
+ == E0.csv (.) ==
+
+ #<CSV::Row "Date":nil "HomeTeam":"Arsenal" "AwayTeam":"Leicester" "FTHG":"4" "FTAG":"3" "HTHG":"2" "HTAG":"2">
+ #<CSV::Row "Date":nil "HomeTeam":"Brighton" "AwayTeam":"Man City" "FTHG":"0" "FTAG":"2" "HTHG":"?" "HTAG":"?">
+ #<CSV::Row "Date":"12/08/17" "HomeTeam":"Chelsea" "AwayTeam":"Burnley" "FTHG":"2" "FTAG":"3" "HTHG":"?" "HTAG":"?">
+ #<CSV::Row "Date":"-" "HomeTeam":"Crystal Palace" "AwayTeam":"Huddersfield" "FTHG":"0" "FTAG":"3" "HTHG":"0" "HTAG":"2">
+
+ 4 rows
+ ```
+
+ Next try:
+
+ ```
+ $ csvheader E0.csv
+ ```
+
+ to print all header columns (the first row). Resulting in:
+
+ ```
+ == E0.csv (.) ==
+
+ 7 columns:
+ 1: Date
+ 2: HomeTeam
+ 3: AwayTeam
+ 4: FTHG
+ 5: FTAG
+ 6: HTHG
+ 7: HTAG
+ ```
+
+ Next try:
+
+ ```
+ $ csvstat -c HomeTeam,AwayTeam E0.csv
+ ```
+
+ to show all unique values for the columns `HomeTeam` and `AwayTeam`.
+ Resulting in:
+
+ ```
+ == E0.csv (.) ==
+
+ ... 380 rows
+
+ 7 columns:
+ 1: Date
+ 2: HomeTeam
+ 3: AwayTeam
+ 4: FTHG
+ 5: FTAG
+ 6: HTHG
+ 7: HTAG
+
+ column >HomeTeam< 21 unique values:
+ 1 x <nil>
+ 19 x Arsenal
+ 18 x Bournemouth
+ 19 x Brighton
+ 19 x Burnley
+ 19 x Chelsea
+ 19 x Crystal Palace
+ 19 x Everton
+ 19 x Huddersfield
+ 19 x Leicester
+ 19 x Liverpool
+ 19 x Man City
+ 19 x Man United
+ 19 x Newcastle
+ 19 x Southampton
+ 19 x Stoke
+ 19 x Swansea
+ 19 x Tottenham
+ 19 x Watford
+ 19 x West Brom
+ 19 x West Ham
+ column >AwayTeam< 21 unique values:
+ 1 x ?
+ 19 x Arsenal
+ 19 x Bournemouth
+ 19 x Brighton
+ 19 x Burnley
+ 19 x Chelsea
+ 19 x Crystal Palace
+ 19 x Everton
+ 19 x Huddersfield
+ 19 x Leicester
+ 19 x Liverpool
+ 19 x Man City
+ 19 x Man United
+ 19 x Newcastle
+ 19 x Southampton
+ 19 x Stoke
+ 19 x Swansea
+ 19 x Tottenham
+ 18 x Watford
+ 19 x West Brom
+ 19 x West Ham
+ ```
+
+
+ #### Split & Cut - Split One Datafile into Many or Cut / Reorder Columns
+
+ Let's use another sample datafile, e.g. `AUT.csv`, that holds many seasons
+ from the Austrian (AUT) Bundesliga. First let's see how many seasons:
+
+ ```
+ $ csvstat -c Season AUT.csv
+ ```
+
+ Resulting in:
+
+ ```
+ == AUT.csv (.) ==
+
+ ... 362 rows
+
+ 7 columns:
+ 1: Season
+ 2: Date
+ 3: Time
+ 4: Home
+ 5: Away
+ 6: HG
+ 7: AG
+
+ column >Season< 2 unique values:
+ 180 x 2016/2017
+ 182 x 2017/2018
+ ```
+
+ Now let's split the `AUT.csv` datafile by the `Season` column,
+ resulting in two new datafiles named `AUT_2016-2017.csv`
+ and `AUT_2017-2018.csv`. Try:
+
+ ```
+ $ csvsplit -c Season AUT.csv
+ ```
+
+ Resulting in:
+
+ ```
+ new chunk: ["2016/2017"]
+ saving >AUT_2016-2017.csv<...
+ new chunk: ["2017/2018"]
+ saving >AUT_2017-2018.csv<...
+ ```
+
+ Let's cut out (remove) the `Season` and `Time` columns from the new `AUT_2016-2017.csv`
+ datafile. Try:
+
+ ```
+ $ csvcut -c Date,Home,Away,HG,AG AUT_2016-2017.csv
+ ```
+
+ Double check the overwritten cleaned-up datafile:
+
+ ```
+ $ csvhead AUT_2016-2017.csv
+ ```
+
+ resulting in:
+
+ ```
+
+ ```
+
+ And so on and so forth.
+
+
+
+
+ ### Code, Code, Code - Script Your Data Work Flow with Ruby
+
+ You can use all tools in your script using the `CsvUtils`
+ class methods:
+
+ | Shell       | Ruby                                           |
+ |-------------|------------------------------------------------|
+ | `csvhead`   | `CsvUtils.head( path, n: 4 )`                  |
+ | `csvheader` | `CsvUtils.header( path )`                      |
+ | `csvstat`   | `CsvUtils.stat( path, *columns )`              |
+ | `csvsplit`  | `CsvUtils.split( path, *columns )`             |
+ | `csvcut`    | `CsvUtils.cut( path, *columns, output: path )` |
+
+
+
+ Let's retry the sample above in a script:
+
+
+ ``` ruby
+ require 'csvutils'
+
+
+ CsvUtils.head( 'E0.csv' )
+ # same as
+ # $ csvhead E0.csv
+
+ CsvUtils.header( 'E0.csv' )
+ # => see above :-)
+
+ CsvUtils.stat( 'E0.csv', 'HomeTeam', 'AwayTeam' )
+ # same as:
+ # $ csvstat -c HomeTeam,AwayTeam E0.csv
+
+
+ CsvUtils.stat( 'AUT.csv', 'Season' )
+ # => same as
+ # $ csvstat -c Season AUT.csv
+
+
+ CsvUtils.split( 'AUT.csv', 'Season' )
+ # => see above :-)
+
+ CsvUtils.cut( 'AUT_2016-2017.csv', 'Date', 'Home', 'Away', 'HG', 'AG' )
+ # => see above :-)
+
+
+ ```
 
 
+ That's it.
 
 
  ## License
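
The `csvstat -c` output shown in the README hunk above is essentially a per-column count of unique values. A minimal sketch of that idea using only Ruby's standard `csv` library follows; `unique_value_counts` and the sample rows are made up for illustration and are not the gem's implementation.

```ruby
require 'csv'

# Hypothetical helper (not part of csvutils): count how often each value
# occurs in the given header columns.
def unique_value_counts( text, *columns )
  rows = CSV.parse( text, headers: true )
  columns.to_h do |column|
    [column, rows.map { |row| row[column] }.tally]
  end
end

# Invented sample data, only shaped like the AUT.csv example.
sample = <<~TXT
  Season,Home,Away
  2016/2017,A,B
  2016/2017,C,A
  2017/2018,B,C
TXT

counts = unique_value_counts( sample, 'Season' )
counts['Season'].each { |value, count| puts "#{count} x #{value}" }
```

The sketch needs Ruby 2.7 or later for `Enumerable#tally`.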
@@ -9,7 +9,7 @@ def self.cut( args )
  config = { columns: [] }
 
  parser = OptionParser.new do |opts|
- opts.banner = "Usage: csvcut [OPTS] source dest"
+ opts.banner = "Usage: csvcut [OPTS] source [dest]"
 
  opts.on("-c", "--columns=COLUMNS", "Name of header columns" ) do |columns|
  config[:columns] = columns.split(/[,|;]/) ## allow different separators
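
The `-c` option splits its value on `/[,|;]/`, so commas, pipes, and semicolons should all work as column separators. A small standalone sketch of that split (the constant name `SEPARATORS` and the sample strings are illustrative only):

```ruby
# Same character class as in the option handler above.
SEPARATORS = /[,|;]/

samples = ['Date,Home,Away', 'Date|Home|Away', 'Date;Home;Away']
results = samples.map { |value| value.split( SEPARATORS ) }

# Each input yields the same list of column names.
results.each { |columns| p columns }
```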
@@ -26,17 +26,17 @@ def self.cut( args )
  ## pp config
  ## pp args
 
- source = arg[0]
- dest = arg[1]
+ source = args[0]
+ dest = args[1] || source ## default to same as source (note: overwrites datafile in place!!!)
 
- unless arg[0] && arg[1]
- puts "** error: arg(s) missing - source and dest filepath required!!! - #{args.inspect}"
+ unless args[0]
+ puts "** error: arg missing - source filepath required - #{args.inspect}"
  exit 1
  end
 
  columns = config[:columns]
 
- CsvUtils.cut( source, dest, *columns )
+ CsvUtils.cut( source, *columns, output: dest )
  end
 
 
@@ -5,7 +5,10 @@
 
  class CsvUtils
 
- def self.cut( inpath, outpath, *columns, sep: ',' )
+ def self.cut( path, *columns, output: path, sep: ',' )
+
+ inpath = path
+ outpath = output # note: output defaults to inpath (overwrites datafile in-place!!!)
 
  puts "cvscut in: >#{inpath}< out: >#{outpath}<"
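
The new signature relies on Ruby allowing a keyword default (`output: path`) to refer to an earlier positional parameter, and the command-line wrapper gets the same effect with `args[1] || source`. A minimal sketch of both defaults, using hypothetical stand-ins (`cut_paths`, `cut_command`, and the file name `AUT_cut.csv`) that only report which paths would be used and never touch any file:

```ruby
# Hypothetical stand-in for CsvUtils.cut: returns the paths instead of cutting.
def cut_paths( path, *columns, output: path )
  { input: path, output: output, columns: columns }
end

# Hypothetical stand-in for the command-line wrapper.
def cut_command( args, columns )
  source = args[0]
  dest   = args[1] || source   # no dest given: fall back to the source path
  cut_paths( source, *columns, output: dest )
end

in_place = cut_command( ['AUT_2016-2017.csv'], ['Date', 'Home'] )
copy     = cut_command( ['AUT.csv', 'AUT_cut.csv'], ['Date', 'Home'] )

puts in_place[:output]   # same as the input path
puts copy[:output]       # the explicit destination
```

With the in-place default, the call in the first case would overwrite the source datafile, which is what the comments in the hunks above warn about.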
 
@@ -6,7 +6,7 @@ class CsvUtils
 
  MAJOR = 0 ## todo: namespace inside version or something - why? why not??
  MINOR = 2
- PATCH = 1
+ PATCH = 2
  VERSION = [MAJOR,MINOR,PATCH].join('.')
 
  def self.version
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: csvutils
  version: !ruby/object:Gem::Version
- version: 0.2.1
+ version: 0.2.2
  platform: ruby
  authors:
  - Gerald Bauer