sycsvpro 0.1.1 → 0.1.2

data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
1
1
  PATH
2
2
  remote: .
3
3
  specs:
4
- sycsvpro (0.1.1)
4
+ sycsvpro (0.1.2)
5
5
  gli (= 2.9.0)
6
6
 
7
7
  GEM
data/README.md CHANGED
@@ -109,7 +109,9 @@ Allocate all the machine types to the customer
109
109
 
110
110
  Count
111
111
  -----
112
- Count all customers (key column) in rows 2 to 20 that have machines that start with *h* and have a contract valid beginning after 1.1.2000. Add a sum row with title Total at column 1
112
+ Count all customers (key column) in rows 2 to 20 that have machines that start
113
+ with *h* and have a contract valid beginning after 1.1.2000. Add a sum row with
114
+ title Total at column 1
113
115
 
114
116
  $ sycsvpro -f in.csv -o out.csv count -r 2-20 -k 0:customer -c 1:/^h/,5:">1.1.2000" --df "%d.%m.%Y" -s "Total:1"
115
117
 
@@ -126,7 +128,8 @@ It is possible to use multiple key columns `-k 0:customer,1:machines`
126
128
 
127
129
  Aggregate
128
130
  ---------
129
- Aggregate row values and add the sum to the end of the row. In the example we aggregate the customer names.
131
+ Aggregate row values and add the sum to the end of the row. In the example we
132
+ aggregate the customer names.
130
133
 
131
134
  $ sycsvpro -f in.csv -o out.csv aggregate -c 0 -s Total:1,Sum
132
135
 
@@ -141,7 +144,8 @@ The aggregation result in out.csv is
141
144
 
142
145
  Calc
143
146
  ----
144
- Process arithmetic operations on the contract count and create a target column and a sum which is added at the end of the result file
147
+ Process arithmetic operations on the contract count and create a target column
148
+ and a sum which is added at the end of the result file
145
149
 
146
150
  $ sycsvpro -f in.csv -o out.csv calc -r 2-20 -h *,target -c 6:*2,7:target=c6*10
147
151
 
@@ -154,11 +158,13 @@ Process arithmetic operations on the contract count and create a target column a
154
158
  chiro;c2;con331;dri100;mot130;3.05.3010;2;20
155
159
  0;0;0;0;0;0;10;100
156
160
 
157
- In the sum row non-numbers in the columns are converted to 0. Therefore column 0 is summed up to 0 as all strings are converted to 0.
161
+ In the sum row non-numbers in the columns are converted to 0. Therefore column 0
162
+ is summed up to 0 as all strings are converted to 0.
158
163
 
159
164
  Sort
160
165
  ----
161
- Sort rows on specified columns as an example sort rows based on customer (string s) and contract date (date d)
166
+ Sort rows on specified columns as an example sort rows based on customer
167
+ (string s) and contract date (date d)
162
168
 
163
169
  $ sycsvpro -f in.csv -o out.csv sort -r 2-20 -c s:0,d:5
164
170
 
@@ -169,35 +175,44 @@ Sort rows on specified columns as an example sort rows based on customer (string
169
175
  chiro;c2;con331;dri100;mot130;3.05.3010;1
170
176
  chiro;c1;con333;dri110;mot100;1.10.3011;1
171
177
 
172
- Sort expects the first non-empty row as the header row. If --headerless switch is set then sort assumes no header being available.
178
+ Sort expects the first non-empty row as the header row. If --headerless switch
179
+ is set then sort assumes no header being available.
173
180
 
174
181
  Insert
175
182
  ------
176
- Add rows at the bottom or on top of a file. The command below adds the content of the file file-with-rows-to-insert.txt on top of the file in.csv and saves it to out.csv
183
+ Add rows at the bottom or on top of a file. The command below adds the content
184
+ of the file file-with-rows-to-insert.txt on top of the file in.csv and saves
185
+ it to out.csv
177
186
 
178
187
  $ sycsvpro -f in.csv -o out.csv insert file-with-rows-to-insert.txt -p top
179
188
 
180
189
  Edit
181
190
  ----
182
- Creates or if it exists opens a file for editing. The file is created in the directory ~/.syc/sycsvpro/scripts. Following command creates a Ruby script with the name script.rb and a method call_me
191
+ Creates or if it exists opens a file for editing. The file is created in the
192
+ directory ~/.syc/sycsvpro/scripts. Following command creates a Ruby script with
193
+ the name script.rb and a method call_me
183
194
 
184
195
  $ sycsvpro edit -s script.rb -m call_me
185
196
 
186
197
  List
187
198
  ----
188
- List the scripts or insert-file available in the scripts directory
199
+ List the scripts, insert-file or all scripts available in the scripts directory
200
+ which is also displayed
189
201
 
202
+ script directory: ~/.syc/sycsvpro/scripts
190
203
  $ sycsvpro list -m
191
204
  script.rb
192
205
  call_me
193
206
 
194
207
  Execute
195
208
  -------
196
- Execute takes a Ruby script file as an argument and processes the script. The following command executes the script *script.rb* and invokes the method *calc*
209
+ Execute takes a Ruby script file as an argument and processes the script. The
210
+ following command executes the script *script.rb* and invokes the method *calc*
197
211
 
198
212
  $ sycsvpro execute ./script.rb calc
199
213
 
200
- Below is an example script file that is ultimately doing the same as the count command
214
+ Below is an example script file that is ultimately doing the same as the count
215
+ command
201
216
 
202
217
  $ sycsvpro -f in.csv -o out.csv count -r 1-20 -k 0 -c 4,5
203
218
 
@@ -232,15 +247,20 @@ def calc
232
247
  end
233
248
  ```
234
249
 
235
- *rows* and *write_to* are convenience methods provided by sycsvpro that can be used in script files to operate on files.
250
+ *rows* and *write_to* are convenience methods provided by sycsvpro that can be
251
+ used in script files to operate on files.
236
252
 
237
- *rows* will return values at the specified columns in the order they are provided in the call to
238
- rows. The columns to be returned in the block have to end with _column_ or _columns_ dependent if a value or an array should be returned. You can find the *rows* and *write_to* methods at _lib/sycsvpro/dsl.rb_.
253
+ *rows* will return values at the specified columns in the order they are
254
+ provided in the call to rows. The columns to be returned in the block have to
255
+ end with _column_ or _columns_ dependent if a value or an array should be
256
+ returned. You can find the *rows* and *write_to* methods at
257
+ _lib/sycsvpro/dsl.rb_.
239
258
 
240
259
  Working with sycsvpro
241
260
  =====================
242
261
 
243
- sycsvpro emerged from my daily work when cleaning and analyzing data. If you want to dig deeper I would recommend [R](http://www.r-project.org/).
262
+ sycsvpro emerged from my daily work when cleaning and analyzing data. If you
263
+ want to dig deeper I would recommend [R](http://www.r-project.org/).
244
264
 
245
265
  A work flow could be as follows
246
266
 
@@ -251,7 +271,29 @@ A work flow could be as follows
251
271
  * Do arithmetic operations on the values `calc`
252
272
  * Sort the rows based on column values
253
273
 
254
- When I have analyzed the data I use _Microsoft Excel_ or _LibreOffice Calc_ to create nice graphs. To create more sophisticated analysis *R* is the right tool to use.
274
+ When I have analyzed the data I use _Microsoft Excel_ or _LibreOffice Calc_ to
275
+ create nice graphs. To create more sophisticated analysis *R* is the right tool
276
+ to use.
277
+
278
+ Release notes
279
+ =============
280
+
281
+ Version 0.1.2
282
+ -------------
283
+ * Now it is possible to have , in the filter as non separating values. You can
284
+ now define filter like 1-2,4,/[56789]{2,}/,10
285
+ * Filtering rows on boolean expression based on values contained in columns.
286
+ The boolean expression has to be enclosed between BEGIN and END
287
+ Example:
288
+ -r BEGINs0=='Ruby'&&n1<1||d2==Date.new(2014,6,17)END
289
+ s0 - string in column 0
290
+ n1 - number in column 1
291
+ d2 - date in column 2
292
+ * ``list`` shows the directory of the script file and has the flag *all* to
293
+ show all scripts, that is _insert files_ and _Ruby files_
294
+ * When counting columns with *count* the column headers are sorted
295
+ alphabetically. Now it is possible to set ``sort: false`` to keep the column
296
+ headers in the sequence they are specified
255
297
 
256
298
  Installation
257
299
  ============
data/bin/sycsvpro CHANGED
@@ -11,6 +11,12 @@ end
11
11
 
12
12
  include GLI::App
13
13
 
14
+ row_regex = %r{
15
+ \d+(?:,\d+|-\d+|-eof|,\/.*\/)*|
16
+ \/.*\/(?:,\/.*\/|\d+)*|
17
+ BEGIN.*?END
18
+ }xi
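A minimal sketch of what the shared `row_regex` accepts. The pattern is copied from the hunk above; the sample strings and the `samples` name are made up for illustration. The pattern is not anchored, so it only requires that some part of the argument looks like a row filter.

```ruby
# Pattern copied from the diff; /x ignores the whitespace and newlines inside it.
row_regex = %r{
  \d+(?:,\d+|-\d+|-eof|,\/.*\/)*|
  \/.*\/(?:,\/.*\/|\d+)*|
  BEGIN.*?END
}xi

# Hypothetical sample arguments for the -r flag.
samples = [
  "1,2,10-30,45-EOF",          # row numbers and ranges
  "/^chiro/",                  # a regular expression
  "BEGINs0=='Ruby'&&n1<1END",  # a boolean expression
  "abc"                        # none of the above
]

samples.each do |s|
  puts "#{s.inspect}: #{row_regex.match?(s)}"
end
```

Only the last sample is rejected. An argument such as `x1` would still pass, because the unanchored `\d+` alternative finds the `1`.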
19
+
14
20
  # Directory holding configuration files
15
21
  sycsvpro_directory = File.expand_path("~/.syc/sycsvpro")
16
22
 
@@ -58,17 +64,24 @@ end
58
64
  desc 'Extract specified rows and columns from the file'
59
65
  command :extract do |c|
60
66
  c.desc 'Rows to extract'
61
- c.arg_name '1,2,10-30,45-EOF,REGEXP'
62
- c.flag [:r, :row], :must_match => /\d+(?:,\d+|-\d+|-eof|,\/.*\/)*|\/.*\/(?:,\/.*\/|\d+)*/i
67
+ c.arg_name '1,2,10-30,45-EOF,REGEXP,BEGINlogical_expressionEND'
68
+ c.flag [:r, :row], :must_match => row_regex
63
69
 
64
70
  c.desc 'Columns to extract'
65
71
  c.arg_name '1,2,10-30'
66
72
  c.flag [:c, :col], :must_match => /\d+(?:,\d+|-\d+)*/
67
73
 
74
+ c.desc 'Format of date values'
75
+ c.arg_name '%d.%m.%Y|%m/%d/%Y|...'
76
+ c.flag [:df]
77
+
68
78
  c.action do |global_options,options,args|
69
79
  print "Extracting ..."
70
- extractor = Sycsvpro::Extractor.new(infile: global_options[:f], outfile: global_options[:o],
71
- rows: options[:r], cols: options[:c])
80
+ extractor = Sycsvpro::Extractor.new(infile: global_options[:f],
81
+ outfile: global_options[:o],
82
+ rows: options[:r],
83
+ cols: options[:c],
84
+ df: options[:df])
72
85
  extractor.execute
73
86
  puts "done"
74
87
  end
@@ -79,16 +92,23 @@ command :collect do |c|
79
92
 
80
93
  c.desc 'Rows to consider for collection'
81
94
  c.arg_name 'ROW1,ROW2,ROW10-ROW30,45-EOF,REGEXP'
82
- c.flag [:r, :row], :must_match => /\d+(?:,\d+|-\d+|-eof|,\/.*\/)*|\/.*\/(?:,\/.*\/|\d+)*/i
95
+ c.flag [:r, :row], :must_match => row_regex #/\d+(?:,\d+|-\d+|-eof|,\/.*\/)*|\/.*\/(?:,\/.*\/|\d+)*/i
83
96
 
84
97
  c.desc 'Columns to collect values from'
85
98
  c.arg_name 'CATEGORY1:COL1,COL2,COL10-COL30+CATEGORY2:COL3-COL9'
86
99
  c.flag [:c, :col], :must_match => /^\w*:\d+(?:,\d+|-\d+|\+\w*:\d+(?:,\d+|-\d+)*)*/
87
100
 
101
+ c.desc 'Format of date values'
102
+ c.arg_name '%d.%m.%Y|%m/%d/%Y|...'
103
+ c.flag [:df]
104
+
88
105
  c.action do |global_options,options,args|
89
106
  print "Collecting ..."
90
- collector = Sycsvpro::Collector.new(infile: global_options[:f], outfile: global_options[:o],
91
- rows: options[:r], cols: options[:c])
107
+ collector = Sycsvpro::Collector.new(infile: global_options[:f],
108
+ outfile: global_options[:o],
109
+ rows: options[:r],
110
+ cols: options[:c],
111
+ df: options[:df])
92
112
  collector.execute
93
113
  puts "done"
94
114
  end
@@ -98,7 +118,7 @@ desc 'Allocate specified columns from the file to a key value'
98
118
  command :allocate do |c|
99
119
  c.desc 'Rows to consider'
100
120
  c.arg_name '1,2,10-30,45-EOF,REGEXP'
101
- c.flag [:r, :row], :must_match => /\d+(?:,\d+|-\d+|-eof|,\/.*\/)*|\/.*\/(?:,\/.*\/|\d+)*/i
121
+ c.flag [:r, :row], :must_match => row_regex #/\d+(?:,\d+|-\d+|-eof|,\/.*\/)*|\/.*\/(?:,\/.*\/|\d+)*/i
102
122
 
103
123
  c.desc 'Key to allocate columns to'
104
124
  c.arg_name '0'
@@ -108,10 +128,18 @@ command :allocate do |c|
108
128
  c.arg_name '1,2,10-30'
109
129
  c.flag [:c, :col], :must_match => /\d+(?:,\d+|-\d+)*/
110
130
 
131
+ c.desc 'Format of date values'
132
+ c.arg_name '%d.%m.%Y|%m/%d/%Y|...'
133
+ c.flag [:df]
134
+
111
135
  c.action do |global_options,options,args|
112
136
  print "Allocating ..."
113
- allocator = Sycsvpro::Allocator.new(infile: global_options[:f], outfile: global_options[:o],
114
- key: options[:k], rows: options[:r], cols: options[:c])
137
+ allocator = Sycsvpro::Allocator.new(infile: global_options[:f],
138
+ outfile: global_options[:o],
139
+ key: options[:k],
140
+ rows: options[:r],
141
+ cols: options[:c],
142
+ df: options[:df])
115
143
  allocator.execute
116
144
  puts "done"
117
145
  end
@@ -136,10 +164,10 @@ end
136
164
 
137
165
  desc 'Lists script or insert files in the scripts directory with optionally listing methods of script files'
138
166
  command :list do |c|
139
- c.desc 'Type of script (Ruby or insert file)'
167
+ c.desc 'Type of script (Ruby, insert or all files)'
140
168
  c.default_value 'script'
141
- c.arg_name 'SCRIPT|INSERT'
142
- c.flag [:t, :type], :must_match => /script|insert/i
169
+ c.arg_name 'SCRIPT|INSERT|ALL'
170
+ c.flag [:t, :type], :must_match => /script|insert|all/i
143
171
 
144
172
  c.desc 'Name of the script file'
145
173
  c.arg_name 'SCRIPT_NAME.rb|INSERT_NAME.ins'
@@ -148,12 +176,19 @@ command :list do |c|
148
176
  c.desc 'Show methods'
149
177
  c.switch [:m, :method]
150
178
 
179
+ c.desc 'Show script directory'
180
+ c.switch [:d, :dir]
181
+
151
182
  c.action do |global_options,options,args|
152
- script_list = Sycsvpro::ScriptList.new(dir: script_directory, type: options[:t],
153
- script: options[:s], show_methods: options[:m])
183
+ script_list = Sycsvpro::ScriptList.new(dir: script_directory,
184
+ type: options[:t],
185
+ script: options[:s],
186
+ show_methods: options[:m])
154
187
 
155
188
  scripts = script_list.execute
156
189
 
190
+ puts "script directory: #{script_directory}" if options[:d]; puts
191
+
157
192
  if scripts.empty?
158
193
  help_now! "No scripts available. You can create scripts with the edit command"
159
194
  else
@@ -194,11 +229,11 @@ command :count do |c|
194
229
 
195
230
  c.desc 'Key columns that are assigned the count of column values'
196
231
  c.arg_name 'COLUMN:TITLE,COLUMN:TITLE'
197
- c.flag [:k, :key], :must_match => /^\d+:\w+(?:,\d+:\w+)*/
232
+ c.flag [:k, :key], :required => true, :must_match => /^\d+:\w+(?:,\d+:\w+)*/
198
233
 
199
234
  c.desc 'Rows to consider'
200
235
  c.arg_name '1,2,10-30,45-EOF,REGEXP'
201
- c.flag [:r, :row], :must_match => /\d+(?:,\d+|-\d+|-eof|,\/.*\/)*|\/.*\/(?:,\/.*\/|\d+)*/i
236
+ c.flag [:r, :row], :must_match => row_regex #/\d+(?:,\d+|-\d+|-eof|,\/.*\/)*|\/.*\/(?:,\/.*\/|\d+)*/i
202
237
 
203
238
  c.desc 'Columns to count where columns 2 and 3 are counted conditionally'
204
239
  c.arg_name '1,2:<14.2.2014,10-30,3:>10'
@@ -212,11 +247,19 @@ command :count do |c|
212
247
  c.arg_name '%d.%m.%Y|%m/%d/%Y|...'
213
248
  c.flag [:df]
214
249
 
250
+ c.desc 'Sort headline values'
251
+ c.switch [:sort], :default_value => true
252
+
215
253
  c.action do |global_options,options,args|
216
254
  print "Counting..."
217
- counter = Sycsvpro::Counter.new(infile: global_options[:f], outfile: global_options[:o],
218
- key: options[:k], rows: options[:r], cols: options[:c],
219
- df: options[:df], sum: options[:s])
255
+ counter = Sycsvpro::Counter.new(infile: global_options[:f],
256
+ outfile: global_options[:o],
257
+ key: options[:k],
258
+ rows: options[:r],
259
+ cols: options[:c],
260
+ df: options[:df],
261
+ sum: options[:s],
262
+ sort: options[:sort])
220
263
  counter.execute
221
264
  puts "done"
222
265
  end
@@ -229,7 +272,7 @@ command :aggregate do |c|
229
272
 
230
273
  c.desc 'Rows to consider'
231
274
  c.arg_name '1,2,10-30,45-EOF,REGEXP'
232
- c.flag [:r, :row], :must_match => /\d+(?:,\d+|-\d+|-eof|,\/.*\/)*|\/.*\/(?:,\/.*\/|\d+)*/i
275
+ c.flag [:r, :row], :must_match => row_regex #/\d+(?:,\d+|-\d+|-eof|,\/.*\/)*|\/.*\/(?:,\/.*\/|\d+)*/i
233
276
 
234
277
  c.desc 'Columns to count'
235
278
  c.arg_name '1,2-4'
@@ -240,10 +283,18 @@ command :aggregate do |c|
240
283
  c.arg_name 'SUM_ROW_TITLE:ROW,SUM_COL_TITLE'
241
284
  c.flag [:s, :sum], :must_match => /^\w+:\d+(?:,\w+)?|^\w+/
242
285
 
286
+ c.desc 'Format of date values'
287
+ c.arg_name '%d.%m.%Y|%m/%d/%Y|...'
288
+ c.flag [:df]
289
+
243
290
  c.action do |global_options,options,args|
244
291
  print "Aggregating..."
245
- aggregator = Sycsvpro::Aggregator.new(infile: global_options[:f], outfile: global_options[:o],
246
- rows: options[:r], cols: options[:c], sum: options[:s])
292
+ aggregator = Sycsvpro::Aggregator.new(infile: global_options[:f],
293
+ outfile: global_options[:o],
294
+ rows: options[:r],
295
+ cols: options[:c],
296
+ sum: options[:s],
297
+ df: options[:df])
247
298
  aggregator.execute
248
299
  puts "done"
249
300
  end
@@ -254,7 +305,7 @@ desc 'Sort rows based on column values'
254
305
  command :sort do |c|
255
306
  c.desc 'Rows to consider'
256
307
  c.arg_name '1,2,10-30,45-EOF,REGEXP'
257
- c.flag [:r, :row], :must_match => /\d+(?:,\d+|-\d+|-eof|,\/.*\/)*|\/.*\/(?:,\/.*\/|\d+)*/i
308
+ c.flag [:r, :row], :must_match => row_regex #/\d+(?:,\d+|-\d+|-eof|,\/.*\/)*|\/.*\/(?:,\/.*\/|\d+)*/i
258
309
 
259
310
  c.desc 'Columns to sort based on a type (n = number, s = string, d = date) and its value'
260
311
  c.arg_name 'n:1,s:2-5,d:7'
@@ -310,18 +361,27 @@ arg_name 'MAPPINGS-FILE'
310
361
  command :map do |c|
311
362
  c.desc 'Rows to consider'
312
363
  c.arg_name 'ROW1,ROW2,ROW10-ROW30,45-EOF,REGEXP'
313
- c.flag [:r, :row], :must_match => /\d+(?:,\d+|-\d+|-eof|,\/.*\/)*|\/.*\/(?:,\/.*\/|\d+)*/i
364
+ c.flag [:r, :row], :must_match => row_regex #/\d+(?:,\d+|-\d+|-eof|,\/.*\/)*|\/.*\/(?:,\/.*\/|\d+)*/i
314
365
 
315
366
  c.desc 'Columns to consider for mapping'
316
367
  c.arg_name 'COL1,COL2,COL10-COL30'
317
368
  c.flag [:c, :col], :must_match => /\d+(?:,\d+|-\d+)*/
369
+
370
+ c.desc 'Format of date values'
371
+ c.arg_name '%d.%m.%Y|%m/%d/%Y|...'
372
+ c.default_value '%Y-%m-%d'
373
+ c.flag [:df]
318
374
 
319
375
  c.action do |global_options,options,args|
320
376
  help_now! "You need to provide a mapping file" if args.size == 0
321
377
 
322
378
  print "Mapping..."
323
- mapper = Sycsvpro::Mapper.new(infile: global_options[:f], outfile: global_options[:o],
324
- mapping: args[0], rows: options[:r], cols: options[:c])
379
+ mapper = Sycsvpro::Mapper.new(infile: global_options[:f],
380
+ outfile: global_options[:o],
381
+ mapping: args[0],
382
+ rows: options[:r],
383
+ cols: options[:c],
384
+ df: options[:df])
325
385
  mapper.execute
326
386
  puts "done"
327
387
  end
@@ -336,9 +396,9 @@ command :calc do |c|
336
396
  default_value '*'
337
397
  c.flag [:h, :header], :must_match => /\*(?:,\w+)*/
338
398
 
339
- c.desc 'Columns to consider for calculations'
399
+ c.desc 'Rows to consider for calculations'
340
400
  c.arg_name 'ROW1,ROW2-ROW10,45-EOF,REGEXP'
341
- c.flag [:r, :row], :must_match => /\d+(?:,\d+|-\d+|-eof|,\/.*\/)*|\/.*\/(?:,\/.*\/|\d+)*/i
401
+ c.flag [:r, :row], :must_match => row_regex #/\d+(?:,\d+|-\d+|-eof|,\/.*\/)*|\/.*\/(?:,\/.*\/|\d+)*/i
342
402
 
343
403
  c.desc 'Column to do calculations on'
344
404
  c.arg_name 'COL1:*2,COL2:-C3,COL3:*2+(4+C5),COL6:NEW_COL=C1+5'
@@ -356,9 +416,13 @@ command :calc do |c|
356
416
  help_now! "You need to provide the column flag" if options[:c].nil?
357
417
 
358
418
  print "Calculating..."
359
- calculator = Sycsvpro::Calculator.new(infile: global_options[:f], outfile: global_options[:o],
360
- header: options[:h], rows: options[:r], cols: options[:c],
361
- sum: options[:s], df: options[:df])
419
+ calculator = Sycsvpro::Calculator.new(infile: global_options[:f],
420
+ outfile: global_options[:o],
421
+ header: options[:h],
422
+ rows: options[:r],
423
+ cols: options[:c],
424
+ sum: options[:s],
425
+ df: options[:df])
362
426
  calculator.execute
363
427
  puts "done"
364
428
  end
@@ -41,7 +41,7 @@ module Sycsvpro
41
41
  @infile = options[:infile]
42
42
  @outfile = options[:outfile]
43
43
  @headerless = options[:headerless] || false
44
- @row_filter = RowFilter.new(options[:rows])
44
+ @row_filter = RowFilter.new(options[:rows], df: options[:df])
45
45
  @col_filter = ColumnFilter.new(options[:cols], df: options[:df])
46
46
  @key_values = Hash.new(0)
47
47
  @heading = []
@@ -19,8 +19,8 @@ module Sycsvpro
19
19
  def initialize(options={})
20
20
  @infile = options[:infile]
21
21
  @outfile = options[:outfile]
22
- @key_filter = ColumnFilter.new(options[:key])
23
- @row_filter = RowFilter.new(options[:rows])
22
+ @key_filter = ColumnFilter.new(options[:key], df: options[:df])
23
+ @row_filter = RowFilter.new(options[:rows], df: options[:df])
24
24
  @col_filter = ColumnFilter.new(options[:cols])
25
25
  end
26
26
 
@@ -37,7 +37,7 @@ module Sycsvpro
37
37
  @infile = options[:infile]
38
38
  @outfile = options[:outfile]
39
39
  @date_format = options[:df] || "%Y-%m-%d"
40
- @row_filter = RowFilter.new(options[:rows])
40
+ @row_filter = RowFilter.new(options[:rows], df: options[:df])
41
41
  @header = Header.new(options[:header])
42
42
  @sum_row = []
43
43
  @add_sum_row = options[:sum] || false
@@ -20,7 +20,7 @@ module Sycsvpro
20
20
  def initialize(options={})
21
21
  @infile = options[:infile]
22
22
  @outfile = options[:outfile]
23
- @row_filter = RowFilter.new(options[:rows])
23
+ @row_filter = RowFilter.new(options[:rows], df: options[:df])
24
24
  @collection = {}
25
25
  init_collection(options[:cols])
26
26
  end
@@ -27,6 +27,8 @@ module Sycsvpro
27
27
  attr_reader :key_values
28
28
  # header of the out file
29
29
  attr_reader :heading
30
+ # indicates whether the headline values should be sorted
31
+ attr_reader :heading_sort
30
32
  # Title of the sum row
31
33
  attr_reader :sum_row_title
32
34
  # row where to add the sums of the columns
@@ -39,15 +41,16 @@ module Sycsvpro
39
41
  # Creates a new counter. Takes as attributes infile, outfile, key, rows, cols, date-format and
40
42
  # indicator whether to add a sum row
41
43
  def initialize(options={})
42
- @infile = options[:infile]
43
- @outfile = options[:outfile]
44
+ @infile = options[:infile]
45
+ @outfile = options[:outfile]
44
46
  init_key_columns(options[:key])
45
- @row_filter = RowFilter.new(options[:rows])
46
- @col_filter = ColumnFilter.new(options[:cols], df: options[:df])
47
- @key_values = {}
48
- @heading = []
47
+ @row_filter = RowFilter.new(options[:rows], df: options[:df])
48
+ @col_filter = ColumnFilter.new(options[:cols], df: options[:df])
49
+ @key_values = {}
50
+ @heading = []
51
+ @heading_sort = options[:sort].nil? ? true : options[:sort]
49
52
  init_sum_scheme(options[:sum])
50
- @sums = Hash.new(0)
53
+ @sums = Hash.new(0)
51
54
  end
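A short sketch of why the `sort` option is defaulted with a `nil?` check rather than `||`. The two helper names are hypothetical and exist only for this comparison; the first one mirrors the expression used in `initialize`.

```ruby
# Mirrors the expression from the diff: only a missing option falls back to true.
def heading_sort_for(options)
  options[:sort].nil? ? true : options[:sort]
end

# Hypothetical alternative for comparison: `||` cannot keep an explicit false.
def naive_default(options)
  options[:sort] || true
end

p heading_sort_for({})             # option missing
p heading_sort_for(sort: false)    # explicit false is preserved
p naive_default(sort: false)       # explicit false is lost
```

With `||`, passing `sort: false` would silently re-enable sorting, which is the case the new option is meant to support.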
52
55
 
53
56
  # Executes the counter
@@ -82,17 +85,18 @@ module Sycsvpro
82
85
  # Writes the count results
83
86
  def write_result
84
87
  sum_line = [sum_row_title] + [''] * (key_titles.size - 1)
85
- heading.sort.each do |h|
88
+ headline = heading_sort ? heading.sort : col_filter.pivot.keys
89
+ headline.each do |h|
86
90
  sum_line << sums[h]
87
91
  end
88
92
  row = 0;
89
93
  File.open(outfile, 'w') do |out|
90
94
  out.puts sum_line.join(';') if row == sum_row ; row += 1
91
- out.puts (key_titles + heading.sort).join(';')
95
+ out.puts (key_titles + headline).join(';')
92
96
  key_values.each do |k,v|
93
97
  out.puts sum_line.join(';') if row == sum_row ; row += 1
94
98
  line = [k]
95
- heading.sort.each do |h|
99
+ headline.each do |h|
96
100
  line << v[:elements][h] unless h == sum_col_title
97
101
  end
98
102
  line << v[:sum] unless sum_col_title.nil?
@@ -20,8 +20,8 @@ module Sycsvpro
20
20
  def initialize(options={})
21
21
  @in_file = options[:infile]
22
22
  @out_file = options[:outfile]
23
- @row_filter = RowFilter.new(options[:rows])
24
- @col_filter = ColumnFilter.new(options[:cols])
23
+ @row_filter = RowFilter.new(options[:rows], df: options[:df])
24
+ @col_filter = ColumnFilter.new(options[:cols], df: options[:df])
25
25
  end
26
26
 
27
27
  # Executes the extractor
@@ -11,6 +11,8 @@ module Sycsvpro
11
11
  attr_reader :date_format
12
12
  # Filter for rows and columns
13
13
  attr_reader :filter
14
+ # Boolean for rows
15
+ attr_reader :boolean_filter
14
16
  # Type of column (n = number, s = string)
15
17
  attr_reader :types
16
18
  # Pattern that is used as a filter
@@ -25,21 +27,32 @@ module Sycsvpro
25
27
  @types = []
26
28
  @pattern = []
27
29
  @pivot = {}
30
+ @boolean_filter = ""
28
31
  create_filter(values)
29
32
  end
30
33
 
31
34
  # Creates the filters based on the given patterns
32
35
  def method_missing(id, *args, &block)
36
+ boolean_row_regex = %r{
37
+ BEGIN(\(*[nsd]\d+[<!=~>]{1,2}
38
+ (?:[A-Z][A-Za-z]*\.new\(.*?\)|\d+|['"].*?['"])
39
+ (?:\)*(?:&&|\|\||$)
40
+ \(*[nsd]\d+[<!=~>]{1,2}
41
+ (?:[A-Z][A-Za-z]*\.new\(.*?\)|\d+|['"].*?['"])\)*)*)END
42
+ }xi
43
+
44
+ return boolean_row($1, args, block) if id =~ boolean_row_regex
33
45
  return equal($1, args, block) if id =~ /^(\d+)$/
34
46
  return equal_type($1, $2, args, block) if id =~ /^(s|n|d):(\d+)$/
35
47
  return range($1, $2, args, block) if id =~ /^(\d+)-(\d+)$/
36
48
  return range_type($1, $2, $3, args, block) if id =~ /^(s|n|d):(\d+)-(\d+)$/
37
49
  return regex($1, args, block) if id =~ /^\/(.*)\/$/
38
50
  return col_regex($1, $2, args, block) if id =~ /^(\d+):\/(.*)\/$/
39
- return date($1, $2, $3, args, block) if id =~ /^(\d+):(<|=|>)(\d+.\d+.\d+)/
51
+ return date($1, $2, $3, args, block) if id =~ /^(\d+):(<|=|>)(\d+.\d+.\d+)$/
40
52
  return date_range($1, $2, $3, args, block) if id =~ /^(\d+):(\d+.\d+.\d+.)-(\d+.\d+.\d+)$/
41
- return number($1, $2, $3, args, block) if id =~ /^(\d+):(<|=|>)(\d+)/
42
- return number_range($1, $2, $3, args, block) if id =~ /^(\d):(\d+)-(\d+)/
53
+ return number($1, $2, $3, args, block) if id =~ /^(\d+):(<|=|>)(\d+)$/
54
+ return number_range($1, $2, $3, args, block) if id =~ /^(\d):(\d+)-(\d+)$/
55
+
43
56
  super
44
57
  end
45
58
 
@@ -48,6 +61,44 @@ module Sycsvpro
48
61
  raise 'Needs to be overridden by sub class'
49
62
  end
50
63
 
64
+ # Checks whether the values match the boolean filter
65
+ def match_boolean_filter?(values=[])
66
+ return false if boolean_filter.empty? or values.empty?
67
+ expression = boolean_filter
68
+ columns = expression.scan(/(([nsd])(\d+))([<!=~>]{1,2})(.*?)(?:[\|&]{2}|$)/)
69
+ # STDERR.puts "expr = #{expression.inspect}"
70
+ # STDERR.puts "vals = #{values.inspect}"
71
+ # STDERR.puts "cols = #{columns.inspect}"
72
+ columns.each do |c|
73
+ # STDERR.puts "val = #{values[c[2].to_i].inspect}"
74
+ value = case c[1]
75
+ when 'n'
76
+ values[c[2].to_i].empty? ? '0' : values[c[2].to_i]
77
+ when 's'
78
+ "'#{values[c[2].to_i]}'"
79
+ when 'd'
80
+ begin
81
+ Date.strptime(values[c[2].to_i], date_format)
82
+ rescue Exception => e
83
+ case c[3]
84
+ when '<', '<=', '=='
85
+ "#{c[4]}+1"
86
+ when '>', '>='
87
+ '0'
88
+ when '!='
89
+ c[4]
90
+ end
91
+ else
92
+ "Date.strptime('#{values[c[2].to_i]}', '#{date_format}')"
93
+ end
94
+ end
95
+ expression = expression.gsub(c[0], value)
96
+ # STDERR.puts "val2 = #{value}"
97
+ end
98
+ # STDERR.puts "exp = #{expression.inspect}"
99
+ eval(expression)
100
+ end
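The substitution step in `match_boolean_filter?` can be sketched in isolation. This is a simplified illustration, not the gem's code: `substitute_columns` is a hypothetical helper that handles only number (`n`) and string (`s`) references and leaves out the date branch. The expression and rows are taken from the row filter spec in this diff.

```ruby
# Hypothetical, simplified helper: replaces column references such as n1 or s2
# with the cell values of one row and returns the resulting Ruby source.
def substitute_columns(expression, values)
  refs = expression.scan(/(([ns])(\d+))[<!=~>]{1,2}/)
  refs.each do |ref, type, index|
    cell = values[index.to_i]
    value = if type == 'n'
              cell.empty? ? '0' : cell   # empty numeric cells count as 0
            else
              "'#{cell}'"                # string cells are quoted
            end
    expression = expression.gsub(ref, value)
  end
  expression
end

row = "a;49;Rub;9".split(';')
ruby_source = substitute_columns("n1>50&&s2=='Ruby'||n3<10", row)
puts ruby_source        # the expression with cell values filled in
puts eval(ruby_source)  # the row is kept when this is true
```

Because the substituted text is handed to `eval`, a filter expression should only come from a trusted source.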
101
+
51
102
  # Yields the column value and whether the filter matches the column
52
103
  def pivot_each_column(values=[])
53
104
  pivot.each do |column, parameters|
@@ -65,14 +116,17 @@ module Sycsvpro
65
116
 
66
117
  # Checks whether a filter has been set. Returns true if filter has been set otherwise false
67
118
  def has_filter?
68
- return !(filter.empty? and pattern.empty?)
119
+ return !(filter.empty? and pattern.empty? and boolean_filter.empty?)
69
120
  end
70
121
 
71
122
  private
72
123
 
73
- # Creates a filter based on the provided rows and columns
124
+ # Creates a filter based on the provided rows and columns select criteria
74
125
  def create_filter(values)
75
- values.split(',').each { |f| send(f) } unless values.nil?
126
+ values.scan(/(?<=,|^)(BEGIN.*?END|\/.*?\/|.*?)(?=,|$)/i).flatten.each do |value|
127
+ # STDERR.puts "value = #{value}"
128
+ send(value)
129
+ end unless values.nil?
76
130
  end
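A small sketch of how the new `scan` call differs from the old `split(',')`. The pattern is copied from `create_filter`; the constant `SPLIT` and the helper `split_filter` are hypothetical names used only here. The filter string is the one from the release notes.

```ruby
# Splitting pattern copied from create_filter in the diff.
SPLIT = /(?<=,|^)(BEGIN.*?END|\/.*?\/|.*?)(?=,|$)/i

# Hypothetical helper name; sycsvpro does this inline in create_filter.
def split_filter(values)
  values.scan(SPLIT).flatten
end

filter = "1-2,4,/[56789]{2,}/,10"

p split_filter(filter)  # the regex stays in one piece
p filter.split(',')     # the old approach cuts the regex at its comma
```

The same applies to `BEGIN...END` expressions, which may contain commas inside string literals or `Date.new(...)` calls.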
77
131
 
78
132
  # Adds a single value to the filter
@@ -110,6 +164,11 @@ module Sycsvpro
110
164
  pivot[r] = { col: col, operation: operation }
111
165
  end
112
166
 
167
+ # Adds a boolean row filter
168
+ def boolean_row(operation, args, block)
169
+ boolean_filter.clear << operation
170
+ end
171
+
113
172
  # Adds a date filter
114
173
  def date(col, comparator, date, args, block)
115
174
  comparator = '==' if comparator == '='
@@ -19,8 +19,8 @@ module Sycsvpro
19
19
  def initialize(options={})
20
20
  @infile = options[:infile]
21
21
  @outfile = options[:outfile]
22
- @row_filter = RowFilter.new(options[:row_filter])
23
- @col_filter = ColumnFilter.new(options[:col_filter])
22
+ @row_filter = RowFilter.new(options[:row_filter], df: options[:df])
23
+ @col_filter = ColumnFilter.new(options[:col_filter], df: options[:df])
24
24
  @mapper = {}
25
25
  init_mapper(options[:mapping])
26
26
  end
@@ -17,9 +17,10 @@ module Sycsvpro
17
17
  pattern.each do |p|
18
18
  filtered = (filtered or !(object =~ Regexp.new(p)).nil?)
19
19
  end
20
+ filtered = (filtered or match_boolean_filter?(object.split(';')))
20
21
  filtered ? object : nil
21
22
  end
22
23
 
23
24
  end
24
-
25
+
25
26
  end
@@ -21,6 +21,7 @@ module Sycsvpro
21
21
  @script_type.downcase!
22
22
  @script_file = options[:script] || '*.rb' if @script_type == 'script'
23
23
  @script_file = options[:script] || '*.ins' if @script_type == 'insert'
24
+ @script_file = options[:script] || '*.{rb,ins}' if @script_type == 'all'
24
25
  @show_methods = options[:show_methods] if @script_type == 'script'
25
26
  @show_methods = false if @script_type == 'insert'
26
27
  @list = {}
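A brief sketch of what the `'*.{rb,ins}'` pattern matches. How `ScriptList` applies `@script_file` is not visible in this hunk, so the use of `File.fnmatch` below is an assumption made for illustration, and the file names are made up.

```ruby
# The glob pattern from the diff; the file names below are hypothetical.
pattern = '*.{rb,ins}'

%w[script.rb rows.ins notes.txt].each do |name|
  # FNM_EXTGLOB enables {a,b} alternatives in File.fnmatch.
  puts "#{name}: #{File.fnmatch(pattern, name, File::FNM_EXTGLOB)}"
end
```

Without `File::FNM_EXTGLOB`, `File.fnmatch` treats the braces literally, whereas `Dir.glob` expands them by default.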
@@ -32,7 +32,7 @@ module Sycsvpro
32
32
  @outfile = options[:outfile]
33
33
  @headerless = options[:headerless] || false
34
34
  @desc = options[:desc] || false
35
- @row_filter = RowFilter.new(options[:rows])
35
+ @row_filter = RowFilter.new(options[:rows], df: options[:df])
36
36
  @col_type_filter = ColumnTypeFilter.new(options[:cols], df: options[:df])
37
37
  @sorted_rows = []
38
38
  end
@@ -1,5 +1,5 @@
1
1
  # Operating csv files
2
2
  module Sycsvpro
3
3
  # Version number of sycsvpro
4
- VERSION = '0.1.1'
4
+ VERSION = '0.1.2'
5
5
  end
@@ -33,15 +33,15 @@ module Sycsvpro
33
33
  it "should count date columns" do
34
34
  counter = Counter.new(infile: @in_file, outfile: @out_file, rows: "1-10",
35
35
  cols: "2:<1.1.2013,2:1.1.2013-31.12.2014,2:>31.12.2014",
36
- key: "0:customer", df: "%d.%m.%Y")
36
+ key: "0:customer", df: "%d.%m.%Y", sort: false)
37
37
 
38
38
  counter.execute
39
39
 
40
- result = [ "customer;1.1.2013-31.12.2014;<1.1.2013;>31.12.2014",
40
+ result = [ "customer;<1.1.2013;1.1.2013-31.12.2014;>31.12.2014",
41
41
  "Fink;0;0;2",
42
- "Haas;0;1;0",
43
- "Gent;1;0;0",
44
- "Rank;1;0;0" ]
42
+ "Haas;1;0;0",
43
+ "Gent;0;1;0",
44
+ "Rank;0;1;0" ]
45
45
 
46
46
  File.open(@out_file).each_with_index do |line, index|
47
47
  line.chomp.should eq result[index]
@@ -5,7 +5,8 @@ module Sycsvpro
5
5
  describe Extractor do
6
6
 
7
7
  before do
8
- @in_file = File.join(File.dirname(__FILE__), "files/in.csv")
8
+ @in_file = File.join(File.dirname(__FILE__), "files/in.csv")
9
+ @in_file2 = File.join(File.dirname(__FILE__), "files/in4.csv")
9
10
  @out_file = File.join(File.dirname(__FILE__), "files/out.csv")
10
11
  end
11
12
 
@@ -37,6 +38,20 @@ module Sycsvpro
37
38
 
38
39
  end
39
40
 
41
+ it "should extract rows base on regex including commas" do
42
+ extractor = Extractor.new(infile: @in_file2, outfile: @out_file, rows: "/[56789]\\d+|\\d{3,}/")
43
+
44
+ extractor.execute
45
+
46
+ result = [ "Gent;50",
47
+ "Haas;100",
48
+ "Klig;80" ]
49
+
50
+ File.open(@out_file).each_with_index do |line, index|
51
+ line.chomp.should eq result[index]
52
+ end
53
+ end
54
+
40
55
  end
41
56
 
42
57
  end
@@ -0,0 +1,76 @@
1
+ require 'sycsvpro/row_filter'
2
+
3
+ module Sycsvpro
4
+
5
+ describe RowFilter do
6
+
7
+ before do
8
+ @in_file = File.join(File.dirname(__FILE__), "files/in.csv")
9
+ @out_file = File.join(File.dirname(__FILE__), "files/out.csv")
10
+ end
11
+
12
+ it "should return row string when no filter is set" do
13
+ row_filter = Sycsvpro::RowFilter.new(nil)
14
+ row_filter.process("abc", row: 1).should eq "abc"
15
+ end
16
+
17
+ it "should filter rows on index" do
18
+ rows = "1-5"
19
+ row_filter = Sycsvpro::RowFilter.new(rows)
20
+ row_filter.process("abc", row: 1).should eq "abc"
21
+ row_filter.process("abc", row: 6).should be_nil
22
+ end
23
+
24
+ it "should filter rows on regex" do
25
+ rows = "1,\/\\d{2,}\/"
26
+ row_filter = Sycsvpro::RowFilter.new(rows)
27
+ row_filter.process("5;50;500", row: 1).should eq "5;50;500"
28
+ row_filter.process("5;50;500", row: 2).should eq "5;50;500"
29
+ end
30
+
31
+ it "should filter rows on logical expression" do
32
+ rows = "BEGINn1>50&&s2=='Ruby'||n3<10END"
33
+ row_filter = Sycsvpro::RowFilter.new(rows)
34
+ row_filter.process("a;49;Rub;9").should eq "a;49;Rub;9"
35
+ row_filter.process("a;51;Ruby;11").should eq "a;51;Ruby;11"
36
+ row_filter.process("a;49;Ruby;11").should be_nil
37
+ end
38
+
39
+ it "should filter rows on Ruby classes" do
40
+ rows = "BEGINn1==50&&d2==Date.new(2014,6,16)||s3=~Regexp.new('[56789]\\d{2,}')END"
41
+ row_filter = Sycsvpro::RowFilter.new(rows)
42
+ row_filter.process("x;50;2014-06-16;99").should eq "x;50;2014-06-16;99"
43
+ end
44
+
45
+ it "should filter rows on row number filter and boolean filter" do
46
+ rows = "1,3-4,BEGINn1==50&&d2<Date.new(2014,6,16)||s3=='Works?'END"
47
+ row_filter = Sycsvpro::RowFilter.new(rows)
48
+ row_filter.process("x;50;2014-06-15;Works?").should eq "x;50;2014-06-15;Works?"
49
+ row_filter.process("x;50;2014-06-15;Works?", row: 1).should eq "x;50;2014-06-15;Works?"
50
+ end
51
+
52
+ it "should filter rows on boolean filter with brackets" do
53
+ rows = "BEGINn1==50&&(d2<Date.new(2014,6,16)||s3=='Works?')END"
54
+ row_filter = Sycsvpro::RowFilter.new(rows)
55
+ row_filter.process("x;50;2014-6-15;Works?").should eq "x;50;2014-6-15;Works?"
56
+ row_filter.process("x;49;2014-6-15;Works?").should be_nil
57
+ row_filter.process("x;50;2014-6-17;Worx?").should be_nil
58
+ end
59
+
60
+ it "should filter rows with ' in value" do
61
+ rows = "BEGINn1!=50||n2=~'/\\d+/'||n2==\"Doesn't work\"END"
62
+ row_filter = Sycsvpro::RowFilter.new(rows)
63
+ row_filter.process("x;50;2;we").should be_nil
64
+ row_filter.process("x;49;/\\d+/;\"Doesn't work\"").should eq "x;49;/\\d+/;Doesn't work"
65
+ end
66
+
67
+ it "should not filter rows with invalid syntax" do
68
+ rows = "BEGINn1!=50||n2=~regex('\\d+')END"
69
+ expect { Sycsvpro::RowFilter.new(rows) }.to raise_error
70
+ end
71
+
72
+ end
73
+
74
+ end
75
+
76
+
metadata CHANGED
@@ -1,74 +1,84 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: sycsvpro
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.1
4
+ version: 0.1.2
5
+ prerelease:
5
6
  platform: ruby
6
7
  authors:
7
8
  - Pierre Sugar
8
9
  autorequire:
9
10
  bindir: bin
10
11
  cert_chain: []
11
- date: 2014-03-12 00:00:00.000000000 Z
12
+ date: 2014-06-17 00:00:00.000000000 Z
12
13
  dependencies:
13
14
  - !ruby/object:Gem::Dependency
14
15
  name: rake
15
16
  requirement: !ruby/object:Gem::Requirement
17
+ none: false
16
18
  requirements:
17
- - - ">="
19
+ - - ! '>='
18
20
  - !ruby/object:Gem::Version
19
21
  version: '0'
20
22
  type: :development
21
23
  prerelease: false
22
24
  version_requirements: !ruby/object:Gem::Requirement
25
+ none: false
23
26
  requirements:
24
- - - ">="
27
+ - - ! '>='
25
28
  - !ruby/object:Gem::Version
26
29
  version: '0'
27
30
  - !ruby/object:Gem::Dependency
28
31
  name: rdoc
29
32
  requirement: !ruby/object:Gem::Requirement
33
+ none: false
30
34
  requirements:
31
- - - ">="
35
+ - - ! '>='
32
36
  - !ruby/object:Gem::Version
33
37
  version: '0'
34
38
  type: :development
35
39
  prerelease: false
36
40
  version_requirements: !ruby/object:Gem::Requirement
41
+ none: false
37
42
  requirements:
38
- - - ">="
43
+ - - ! '>='
39
44
  - !ruby/object:Gem::Version
40
45
  version: '0'
41
46
  - !ruby/object:Gem::Dependency
42
47
  name: aruba
43
48
  requirement: !ruby/object:Gem::Requirement
49
+ none: false
44
50
  requirements:
45
- - - ">="
51
+ - - ! '>='
46
52
  - !ruby/object:Gem::Version
47
53
  version: '0'
48
54
  type: :development
49
55
  prerelease: false
50
56
  version_requirements: !ruby/object:Gem::Requirement
57
+ none: false
51
58
  requirements:
52
- - - ">="
59
+ - - ! '>='
53
60
  - !ruby/object:Gem::Version
54
61
  version: '0'
55
62
  - !ruby/object:Gem::Dependency
56
63
  name: rspec
57
64
  requirement: !ruby/object:Gem::Requirement
65
+ none: false
58
66
  requirements:
59
- - - ">="
67
+ - - ! '>='
60
68
  - !ruby/object:Gem::Version
61
69
  version: '0'
62
70
  type: :development
63
71
  prerelease: false
64
72
  version_requirements: !ruby/object:Gem::Requirement
73
+ none: false
65
74
  requirements:
66
- - - ">="
75
+ - - ! '>='
67
76
  - !ruby/object:Gem::Version
68
77
  version: '0'
69
78
  - !ruby/object:Gem::Dependency
70
79
  name: gli
71
80
  requirement: !ruby/object:Gem::Requirement
81
+ none: false
72
82
  requirements:
73
83
  - - '='
74
84
  - !ruby/object:Gem::Version
@@ -76,6 +86,7 @@ dependencies:
76
86
  type: :runtime
77
87
  prerelease: false
78
88
  version_requirements: !ruby/object:Gem::Requirement
89
+ none: false
79
90
  requirements:
80
91
  - - '='
81
92
  - !ruby/object:Gem::Version
@@ -89,8 +100,8 @@ extra_rdoc_files:
89
100
  - README.rdoc
90
101
  - sycsvpro.rdoc
91
102
  files:
92
- - ".gitignore"
93
- - ".rspec"
103
+ - .gitignore
104
+ - .rspec
94
105
  - Gemfile
95
106
  - Gemfile.lock
96
107
  - LICENSE
@@ -201,37 +212,39 @@ files:
201
212
  - spec/sycsvpro/inserter_spec.rb
202
213
  - spec/sycsvpro/mapper_spec.rb
203
214
  - spec/sycsvpro/profiler_spec.rb
215
+ - spec/sycsvpro/row_filter_spec.rb
204
216
  - spec/sycsvpro/script_list_spec.rb
205
217
  - spec/sycsvpro/sorter_spec.rb
206
218
  - sycsvpro.gemspec
207
219
  - sycsvpro.rdoc
208
220
  homepage: https://github.com/sugaryourcoffee/syc-svpro
209
221
  licenses: []
210
- metadata: {}
211
222
  post_install_message:
212
223
  rdoc_options:
213
- - "--title"
224
+ - --title
214
225
  - sycsvpro
215
- - "--main"
226
+ - --main
216
227
  - README.rdoc
217
- - "-ri"
228
+ - -ri
218
229
  require_paths:
219
230
  - lib
220
231
  - lib
221
232
  required_ruby_version: !ruby/object:Gem::Requirement
233
+ none: false
222
234
  requirements:
223
- - - ">="
235
+ - - ! '>='
224
236
  - !ruby/object:Gem::Version
225
237
  version: '0'
226
238
  required_rubygems_version: !ruby/object:Gem::Requirement
239
+ none: false
227
240
  requirements:
228
- - - ">="
241
+ - - ! '>='
229
242
  - !ruby/object:Gem::Version
230
243
  version: '0'
231
244
  requirements: []
232
245
  rubyforge_project:
233
- rubygems_version: 2.2.0
246
+ rubygems_version: 1.8.23
234
247
  signing_key:
235
- specification_version: 4
248
+ specification_version: 3
236
249
  summary: Processing of csv files
237
250
  test_files: []
checksums.yaml DELETED
@@ -1,7 +0,0 @@
1
- ---
2
- SHA1:
3
- metadata.gz: 7823aeea07dda43deb692fdf13a8f57559bbb917
4
- data.tar.gz: 4e3da29ada80c6c4a5eb282facce537e4750f0e3
5
- SHA512:
6
- metadata.gz: 203400b5c9187269c3fe336f940f25427eecfd5fe25a56fbc3fc982f7c47d059a6aefc5eb74b5ab5769d5a0b318ebf2ee72a231db64127cb24ca767ee8f5915f
7
- data.tar.gz: deb1cfff428e6e83f1770a2f67a7b4e6fcf5cc642ce6c96cd50ba748ecfb8e44aaa39b53b88bc08083e434857d2ab778a7d086f1e61d9de3b3d7c1568a9ce343