rubish 0.0.1

data/README.textile ADDED
@@ -0,0 +1,651 @@
+ # Rubish
+
+ Rubish is a shell in Ruby. It is *object
+ oriented*, and it only uses Ruby's own syntax (*no
+ metasyntax* of its own). Rubish is pronounced
+ Roobish, as opposed to Rubbish, unlike Bash.
+
+ # Getting Started
+
+ Fire up irb and start Rubish:
+
+     $ irb -rrubish
+     irb> Rubish.repl
+     rbh> date
+     Tue Mar 17 17:06:04 PDT 2009
+     rbh> uname :svrm
+     Linux 2.6.24-16-generic #1 SMP Thu Apr 10 13:23:42 UTC 2008 i686
+
+ A few tips up front. Rubish can sometimes mess up
+ your terminal. If that happens, try hitting `C-c`
+ and `C-d` to get back to Bash, then run
+
+     $ reset
+
+ to reset your terminal. Also, Rubish doesn't have
+ its own shell history, but it uses the readline
+ library, so you can use readline's history
+ mechanism: `C-r <string>` searches for a previously
+ entered line matching the string.
+
+ # Overview
+
+ Rubish's Executable class provides a common API
+ for IO redirection and output processing. Its
+ subclasses are:
+
+     Command
+       A Unix command.
+     Pipe
+       A pipeline of Unix commands.
+     Awk
+     Sed
+     Batch
+       An arbitrary block of code. Like a subshell.
+
+ # Command
+
+ The Rubish REPL takes a line and `instance_eval`s it
+ against the shell object (a `Rubish::Context`). If
+ the method is undefined, the call is translated
+ into an Executable (`Rubish::Command`) object by
+ `method_missing`.
+
+     rbh> ls
+     awk.rb command_builder.rb command.rb executable.rb LICENSE pipe.rb README.textile rubish.rb sed.rb session.rb streamer.rb
+
+     # ls evaluates to a Rubish::Command object, which
+     # is a subclass of Rubish::Executable
+     rbh> ls.inspect
+     "#<Rubish::Command::ShellCommand:0xb7ac297c @args=\"\", @status=nil, @cmd=\"ls \">"
+
+     # you can store a command in an instance variable
+     rbh> @cmd = ls; nil
+     nil
+     # if a line evaluates to a command, the shell
+     # calls the exec method on it.
+     rbh> @cmd # same as @cmd.exec
+     awk.rb command_builder.rb command.rb executable.rb LICENSE pipe.rb README.textile rubish.rb sed.rb session.rb streamer.rb
+     rbh> @cmd
+     awk.rb command_builder.rb command.rb executable.rb LICENSE pipe.rb README.textile rubish.rb sed.rb session.rb streamer.rb
+
+ You can invoke a command with arguments of type
+ `String`, `Symbol`, or `Array` (of `String`,
+ `Symbol`, or `Array`, recursively). A `String`
+ argument is taken as is. A `Symbol` is
+ translated to a flag (`:flag` => `-flag`). Arguments
+ in an `Array` are treated likewise. Finally, all
+ the arguments are flattened and joined together.
+
+ The following are equivalent:
+
+     rbh> ls :l, "awk.rb", "sed.rb"
+     rbh> ls "-l awk.rb sed.rb"
+     rbh> ls :l, %w(awk.rb sed.rb)
+
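The rules above (undefined methods become commands, symbols become flags, arrays are flattened) can be sketched in plain Ruby. This is only an illustration and not Rubish's actual code: `MiniShell` and its `Command` struct are made-up names, and the real `Rubish::Command` does much more.

```ruby
# Hypothetical sketch: an object whose undefined methods turn into commands.
class MiniShell
  # A command is just a name plus a normalized argument list here.
  Command = Struct.new(:name, :args) do
    def to_s
      [name, *args].join(" ")
    end
  end

  # Any undefined method becomes a Command instead of raising NoMethodError.
  def method_missing(name, *args)
    Command.new(name.to_s, normalize(args))
  end

  def respond_to_missing?(_name, _include_private = false)
    true
  end

  private

  # Flatten nested arrays, turn symbols into flags, keep strings as they are.
  def normalize(args)
    args.flatten.map { |arg| arg.is_a?(Symbol) ? "-#{arg}" : arg.to_s }
  end
end

shell = MiniShell.new
puts shell.instance_eval { ls :l, "awk.rb", "sed.rb" }
puts shell.instance_eval { ls "-l awk.rb sed.rb" }
puts shell.instance_eval { ls :l, %w(awk.rb sed.rb) }
```

All three lines print the same command string, which is the sense in which the three invocations are equivalent.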
+ # Pipe
+
+     rbh> p { ls ; tr "a-z A-Z" }
+     AWK.RB
+     COMMAND_BUILDER.RB
+     COMMAND.RB
+     EXECUTABLE.RB
+     LICENSE
+     PIPE.RB
+     README.TEXTILE
+     RUBISH.RB
+     SED.RB
+     SESSION.RB
+     STREAMER.RB
+
+ Pipes are first-class values:
+
+     rbh> @pipe = p { ls ; tr "a-z A-Z" }; nil
+     # again, we return nil so @pipe doesn't get executed.
+     rbh> @pipe
+     # execute @pipe once
+     rbh> @pipe
+     # execute @pipe again
+
+ # IO redirections
+
+ IO redirections are done by methods defined in
+ `Rubish::Executable`.
+
+     Rubish::Executable#i(io=nil)
+       Set the $stdin of the executable when
+       it is executed. If called without an argument,
+       return the executable's IO object.
+     Rubish::Executable#o(io=nil)
+       Ditto for $stdout
+     Rubish::Executable#err(io=nil)
+       Ditto for $stderr
+
+
+     rbh> ls.o("ls-result")
+     rbh> cat.i("ls-result")
+     awk.rb
+     command_builder.rb
+     command.rb
+     executable.rb
+     LICENSE
+     ls-result
+     pipe.rb
+     README.textile
+     rubish.rb
+     sed.rb
+     session.rb
+     streamer.rb
+
+ Rubish can take four kinds of objects for
+ IO: a `String` (used as a file path), an `Integer`
+ (used as a file descriptor), an `IO` object, or a Ruby
+ block. When a block is used for IO, it
+ receives a pipe connecting it to the command, for
+ reading or writing.
+
+     # pump numbers into cat
+     rbh> cat.i { |p| p.puts((1..5).to_a) }
+     1
+     2
+     3
+     4
+     5
+
+     # upcase all filenames
+     rbh> ls.o { |p| p.each_line {|l| puts l.upcase} }
+     AWK.RB
+     COMMAND_BUILDER.RB
+     COMMAND.RB
+     EXECUTABLE.RB
+     LICENSE
+     LS-RESULT
+     PIPE.RB
+     README.TEXTILE
+     RUBISH.RB
+     SED.RB
+     SESSION.RB
+     STREAMER.RB
+
+     # kinda funny, pump numbers into cat, then pull
+     # them out again.
+     rbh> cat.i { |p| p.puts((1..10).to_a) }.o {|p| p.each_line {|l| puts l.to_i+100 }}
+     101
+     102
+     103
+     104
+     105
+     106
+     107
+     108
+     109
+     110
+
+ The input and output blocks are executed in their
+ own threads, so be careful.
+
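To make the threading remark concrete, here is roughly what "pump numbers into cat, then pull them out again" looks like with Ruby's standard `open3` library. It is a sketch, not Rubish's implementation: `through_cat` is a hypothetical helper, and it assumes a `cat` binary is on the PATH.

```ruby
require "open3"

# Feed numbers to `cat` from one thread while a second thread reads
# cat's output, mirroring the separate input and output blocks.
def through_cat(numbers)
  results = []
  Open3.popen2("cat") do |stdin, stdout, _wait_thr|
    writer = Thread.new do
      stdin.puts(numbers) # one number per line
      stdin.close         # signal EOF so cat can finish
    end
    reader = Thread.new do
      stdout.each_line { |line| results << line.to_i + 100 }
    end
    [writer, reader].each(&:join)
  end
  results
end

puts through_cat((1..10).to_a)
```

Only the reader thread touches `results` here. Once both blocks write to the same object, you need a `Mutex` or a `Queue`.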
+ # Rubish with Ruby
+
+ Rubish is designed to make it easy to interface Unix
+ commands with Ruby.
+
+     Rubish::Executable#each(&block)
+       yield each line of output to a block.
+     Rubish::Executable#map(&block)
+       Like #each, but collect the values returned by
+       the block. If no block given, collect each
+       line of the output.
+     Rubish::Executable#head(n=1,&block)
+       Process the first n lines of output
+       with a block.
+     Rubish::Executable#tail(n=1,&block)
+       Process the last n lines of output
+       with a block.
+     Rubish::Executable#first
+       Returns first line of output.
+     Rubish::Executable#last
+       Returns last line of output.
+
+ Since this is Ruby, there are no crazy metasyntactic
+ issues when you want to process the output lines.
+
+     # print filename and its extension side by side.
+     rbh> ls.each { |f| puts "#{f}\t#{File.extname(f)}" }
+     address.rb .rb
+     awk.output .output
+     awk.rb .rb
+     command_builder.rb .rb
+     command.rb .rb
+     executable.rb .rb
+     foo
+     foobar
+     foo.bar .bar
+     foo.rb .rb
+     LICENSE
+     my.rb .rb
+     pipe.rb .rb
+     #README.textile# .textile#
+     README.textile .textile
+     rubish.rb .rb
+     ruby-termios-0.9.5 .5
+     ruby-termios-0.9.5.tar.gz .gz
+     #sed.rb# .rb#
+     sed.rb .rb
+     session.rb .rb
+     streamer.rb .rb
+     todo
+     util
+
+ You can execute a command within the each block.
+
+     rbh> ls.each { |f| wc(f).exec }
+     64 131 1013 awk.rb
+     116 202 1914 command_builder.rb
+     56 113 1034 command.rb
+     196 563 4388 executable.rb
+     24 217 1469 LICENSE
+     12 12 132 ls-result
+     78 245 1917 pipe.rb
+     142 544 3388 README.textile
+     107 278 2340 rubish.rb
+     46 54 546 sed.rb
+     95 206 1870 session.rb
+     264 708 5906 streamer.rb
+
+ One nifty thing to do is to collect the outputs of nested commands.
+
+     rbh> ls.map {|f| stat(f).map }
+     [[" File: `awk.rb'",
+       " Size: 1013 \tBlocks: 8 IO Block: 4096 regular file",
+       "Device: 801h/2049d\tInode: 984369 Links: 1",
+       "Access: (0644/-rw-r--r--) Uid: ( 1000/ howard) Gid: ( 1000/ howard)",
+       "Access: 2009-03-17 21:02:25.000000000 -0700",
+       "Modify: 2009-03-17 21:02:13.000000000 -0700",
+       "Change: 2009-03-17 21:02:13.000000000 -0700"],
+      [" File: `command_builder.rb'",
+       " Size: 1914 \tBlocks: 8 IO Block: 4096 regular file",
+       "Device: 801h/2049d\tInode: 984371 Links: 1",
+       "Access: (0644/-rw-r--r--) Uid: ( 1000/ howard) Gid: ( 1000/ howard)",
+       "Access: 2009-03-17 21:02:25.000000000 -0700",
+       "Modify: 2009-03-17 21:02:13.000000000 -0700",
+       "Change: 2009-03-17 21:02:13.000000000 -0700"],
+      ...
+     ]
+
+ All the above apply to pipes as well. We can find
+ out how many files are in a directory as a Ruby
+ Integer.
+
+     rbh> p { ls; wc}
+     23 23 248
+     rbh> p { ls; wc}.map
+     [" 23 23 248\n"]
+     rbh> p { ls; wc}.map.first.split
+     ["23", "23", "248"]
+     rbh> p { ls; wc}.map.first.split.first.to_i
+     23
+
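The line-oriented methods in this section behave much like Ruby's `Enumerable` applied to a command's output. A small sketch of that idea using only the standard library follows. `LineCommand` is a made-up class rather than part of Rubish, and the example assumes an `ls` binary is available.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical sketch: a command whose output lines can be enumerated.
class LineCommand
  include Enumerable # provides map, first, count, select, ...

  def initialize(*argv)
    @argv = argv
  end

  # Run the command (no shell involved) and yield each output line.
  def each(&block)
    IO.popen(@argv) { |io| io.each_line(chomp: true, &block) }
  end

  def last
    to_a.last
  end
end

Dir.mktmpdir do |dir|
  FileUtils.touch(%w[a.rb b.rb c.txt].map { |f| File.join(dir, f) })
  ls = LineCommand.new("ls", dir)
  puts ls.count                            # number of files, as an Integer
  puts ls.map { |f| File.extname(f) }.uniq # extensions seen
end
```

Because the result is an ordinary Ruby collection, counting files is just `count`, with no string splitting required.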
+ A big problem with Bash is processing
+ output that contains weird characters. Ideally, you
+ might want to say,
+
+     wc `ls`
+
+ But that breaks as soon as a file name contains
+ whitespace. You have to say,
+
+     find . -maxdepth 1 -print0 | xargs -0 wc
+
+ And then again, that only works if you are working
+ with files, and if the command (e.g. wc) accepts
+ multiple arguments. In Rubish, you can use the
+ `Executable#q` method to tell a command to quote its
+ arguments. Like so,
+
+     wc(ls.map).q
+
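How `Executable#q` quotes internally isn't shown here. As a point of comparison, Ruby's standard `shellwords` library does the same kind of job, and the sketch below shows why quoting matters. `quoted_command` is a hypothetical helper, not a Rubish method.

```ruby
require "shellwords"

# Build a shell-safe command line from a command and its arguments.
def quoted_command(cmd, args)
  Shellwords.join([cmd, *args])
end

names = ["plain.rb", "my file.txt"]

naive = "wc #{names.join(' ')}"
safe  = quoted_command("wc", names)

# The naive string splits "my file.txt" into two words;
# the quoted one round-trips to the original arguments.
p Shellwords.split(naive)
p Shellwords.split(safe)
```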
+ # Sed and Awk
+
+ Rubish has sedish and awkish things that are not
+ quite like sed and awk, but not entirely unlike
+ sed and awk.
+
+ `Rubish::Sed` doesn't implicitly print (unlike
+ real sed). There's actually no option to turn on
+ implicit printing.
+
+     Rubish::Sed#line
+       the current line sed is processing
+     Rubish::Sed#p(*args)
+       print current line if no argument is given.
+     Rubish::Sed#s(regexp,str)
+       String#sub! on the current line
+     Rubish::Sed#gs(regexp,str)
+       String#gsub! on the current line
+     Rubish::Sed#q
+       quit from sed processing.
+
+
+     rbh> ls.sed { gs /b/, "bee"; p if line =~ /.rbee$/ }
+     awk.rbee
+     command_beeuilder.rbee
+     command.rbee
+     executabeele.rbee
+     pipe.rbee
+     rubeeish.rbee
+     sed.rbee
+     session.rbee
+     streamer.rbee
+
+     # output to a file
+     rbh> ls.sed { p }.o "sed.result"
+
+ `Rubish::Sed` doesn't have the concepts of swapping,
+ appending, modifying, or any interaction between
+ pattern space and hold space. Good riddance. The
+ block is `instance_eval`ed by the Sed object, so you
+ can keep track of state using instance variables.
+
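The `instance_eval` trick is what lets the block call `line`, `p`, `s`, `gs`, and `q` without a receiver. A stripped-down sketch of the idea is below. `MiniSed` is invented for illustration and is not how `Rubish::Sed` is actually written.

```ruby
require "stringio"

# Hypothetical sketch: evaluate a block once per line, with the
# sed-like helpers available as methods on self.
class MiniSed
  attr_reader :line

  def initialize(out = $stdout)
    @out = out
  end

  # Print the current line, or the given arguments.
  def p(*args)
    @out.puts(args.empty? ? line : args)
  end

  def s(regexp, str)
    line.sub!(regexp, str)
  end

  def gs(regexp, str)
    line.gsub!(regexp, str)
  end

  # Stop processing any further lines.
  def q
    throw :quit
  end

  def run(lines, &action)
    catch(:quit) do
      lines.each do |l|
        @line = l.chomp
        instance_eval(&action) # self is the MiniSed inside the block
      end
    end
    self
  end
end

MiniSed.new.run(%w[awk.rb LICENSE sed.rb]) do
  gs(/b/, "bee")
  p if line =~ /\.rbee$/
end
```

Instance variables set inside the block belong to the `MiniSed` object, so they persist from one line to the next.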
+ Awk is a lot like sed, but you can associate
+ actions to be done before or after awk processing.
+
+     Rubish::Awk#begin(&block)
+       block is instance_eval'ed by the Awk object
+       before processing.
+     Rubish::Awk#act(&block)
+       block is instance_eval'ed by the Awk object for
+       each line.
+     Rubish::Awk#end(&block)
+       block is instance_eval'ed at the end of
+       processing. Its value is returned as the
+       result.
+
+     rbh> ls.awk { puts do_something(line)}
+     # you can have begin and end blocks for awk.
+     rbh> ls.awk.begin { ...init }.act { ...body}.end { ...final}
+
+ You can associate multiple blocks with either awk
+ or sed. Each block is an "action" that's processed
+ in left-to-right order.
+
+     rbh> cmd.sed.act { ... }.act { ... }
+     rbh> cmd.awk.act { ... }.act { ... }
+
+ Rubish supports awk/sed-style pattern matching.
+
+     .sed(/a/) # triggers on lines that match
+     .sed(/a/,/b/) # triggers on lines between the two matches (inclusive)
+     .sed(1) # matches line one
+     .sed(3,:eof) # matches line 3 to end of stream
+     # ditto with awk
+     .awk(/a/)
+
+
+     > cat.i {|p| p.puts((1..10).to_a)}.sed(2,4) { p }
+     2
+     3
+     4
+
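One way to think about these address patterns is as a small state machine over numbered lines. The sketch below is a guess at the semantics, not Rubish's code: `line_matcher` and `select_lines` are hypothetical names, and it assumes a range may start and end on the same line.

```ruby
# Turn a pattern (Regexp, Integer line number, or :eof) into a predicate.
def line_matcher(pattern)
  case pattern
  when Regexp  then ->(line, _n) { line.match?(pattern) }
  when Integer then ->(_line, n) { n == pattern }
  when :eof    then ->(_line, _n) { false } # never matches, so the range runs to the end
  else raise ArgumentError, "unsupported pattern: #{pattern.inspect}"
  end
end

# Select the lines addressed by one pattern, or by an inclusive range.
def select_lines(lines, first, last = nil)
  start = line_matcher(first)
  stop = last && line_matcher(last)
  in_range = false
  lines.each_with_index.select do |line, index|
    n = index + 1 # line numbers are 1-based
    if stop.nil?
      start.call(line, n)
    elsif in_range
      in_range = !stop.call(line, n) # the closing line is still included
      true
    elsif start.call(line, n)
      in_range = !stop.call(line, n)
      true
    else
      false
    end
  end.map(&:first)
end

numbers = (1..10).map(&:to_s)
puts select_lines(numbers, 2, 4)
```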
+ # Streamer
+
+ `Rubish::{Sed,Awk}` actually share the
+ `Rubish::Streamer` mixin. Most of their mechanisms
+ are implemented by this mixin. It has two
+ interesting features:
+
+ * *Line buffering* allows arbitrary peek ahead (of
+   lines). This lets you do what sed can with hold
+   space, but in a much cleaner way.
+ * *Aggregation* is what awk is all about. But
+   Rubish::Streamer implements special aggregators
+   inspired by Common Lisp's Loop facilities.
+
+ Let's see line buffering first.
+
+     Rubish::Streamer#peek(n=1)
+       Return the next n lines (as Array of Strings),
+       and put these lines in the stream buffer.
+     Rubish::Streamer#skip(n=1)
+       Skip the next n lines.
+     Rubish::Streamer#stop(n=1)
+       Skip other actions in the streamer, and
+       process next line.
+     Rubish::Streamer#quit(n=1)
+       Quit the streaming process.
+
+ By the way, isn't it nice that these methods all
+ have four chars?
+
+     # print files in groups of 3, separated by blank lines.
+     rbh> ls.sed { p; puts peek(2); puts ""; skip(3) }
+
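Line buffering with lookahead can be pictured as a queue sitting in front of the input. The following is a minimal sketch of `peek` and `skip` over any enumerable of lines. `LineStream` and `next_line` are hypothetical names, and the real `Rubish::Streamer` may count skipped lines differently.

```ruby
# Hypothetical sketch: a line source with a lookahead buffer.
class LineStream
  def initialize(lines)
    @source = lines.each # external enumerator over the input
    @buffer = []
  end

  # Consume and return the next line, or nil at the end of the stream.
  def next_line
    fill(1)
    @buffer.shift
  end

  # Return up to n upcoming lines without consuming them.
  def peek(n = 1)
    fill(n)
    @buffer.first(n)
  end

  # Drop up to n upcoming lines.
  def skip(n = 1)
    fill(n)
    @buffer.shift(n)
    nil
  end

  private

  # Pull lines from the source until the buffer holds n of them.
  def fill(n)
    @buffer << @source.next while @buffer.size < n
  rescue StopIteration
    nil # the source ran dry; keep whatever was buffered
  end
end

stream = LineStream.new(%w[a b c d e])
while (line = stream.next_line)
  puts line, stream.peek(2), ""
  stream.skip(2)
end
```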
+ In general, the aggregating methods take a name, a
+ value, and an optional key. The aggregated result
+ is accumulated in an instance variable named by
+ the given name. Each aggregator type basically
+ does a left fold (foldl) over an initial value. The
+ optional key is used to partition an aggregation.
+
+     Rubish::Streamer#count(name,key=nil)
+       count number of times it's called.
+     Rubish::Streamer#max(name,val,key=nil)
+     Rubish::Streamer#min(name,val,key=nil)
+     Rubish::Streamer#sum(name,val,key=nil)
+     Rubish::Streamer#collect(name,val,key=nil)
+       collect vals into an array.
+     Rubish::Streamer#hold(name,size,val,key=nil)
+       collect vals into a fixed-size FIFO queue.
+     Rubish::Streamer#pick(name,val,key=nil,&block)
+       pass the block old_val and new_val, and
+       the value returned by block is saved in
+       "name".
+
+ Each aggregator's name is used to create a
+ bucket. A reader method named by name can be used
+ to access that bucket. A bucket is a hash of
+ partitioned accumulations keyed by key. The special
+ key nil aggregates over the entire domain (like
+ MySQL's rollup).
+
+
+     # find the length of the longest file name, and
+     # collect the file names.
+     ls.awk { f=a[0]; max(:fl,f.length,File.extname(f)); collect(:fn,f)}.end { pp buckets; [fl,fl(""),fn] }
+     {:fl=>{""=>10, nil=>18, ".textile"=>14, ".rb"=>18},
+      :fn=>
+       {nil=>
+         ["awk.rb",
+          "command_builder.rb",
+          "command.rb",
+          "executable.rb",
+          "LICENSE",
+          "ls-result",
+          "pipe.rb",
+          "README.textile",
+          "rubish.rb",
+          "sed.rb",
+          "sed-result",
+          "session.rb",
+          "streamer.rb"]}}
+     [18,
+      10,
+      ["awk.rb",
+       "command_builder.rb",
+       "command.rb",
+       "executable.rb",
+       "LICENSE",
+       "ls-result",
+       "pipe.rb",
+       "README.textile",
+       "rubish.rb",
+       "sed.rb",
+       "sed-result",
+       "session.rb",
+       "streamer.rb"]]
+
+
+ The first printout, a hash, is from `pp buckets`.
+ You can see the aggregation partitioned
+ by file extension (in the case of `fl`). Note
+ that `fl(nil)` holds the max length over all the
+ files (the entire domain).
+
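The bucket idea (a fold per name, partitioned by key, with `nil` as the roll-up over everything) fits in a few lines of Ruby. This is a sketch under assumptions: `Buckets` and `fold` are made-up names, and only `max` and `collect` are shown.

```ruby
# Hypothetical sketch of partitioned aggregation with a nil roll-up key.
class Buckets
  attr_reader :buckets

  def initialize
    # name => { key => accumulated value }
    @buckets = Hash.new { |hash, name| hash[name] = {} }
  end

  def max(name, val, key = nil)
    fold(name, key) { |old| old.nil? || val > old ? val : old }
  end

  def collect(name, val, key = nil)
    fold(name, key) { |old| (old || []) + [val] }
  end

  private

  # Update both the roll-up entry (nil) and the entry for the given key.
  def fold(name, key)
    [nil, key].uniq.each do |k|
      @buckets[name][k] = yield(@buckets[name][k])
    end
  end
end

agg = Buckets.new
%w[awk.rb command_builder.rb LICENSE README.textile].each do |f|
  agg.max(:fl, f.length, File.extname(f))
  agg.collect(:fn, f)
end
p agg.buckets[:fl] # longest name per extension, plus the nil roll-up
```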
+ # Job Control
+
+ Executable and all its subclasses can execute in the background.
+
+     Executable#exec!
+       Execute, immediately return a Job
+     Executable#each!(&block)
+       Iterate the output in the background.
+     Executable#map!(acc,&block)
+       Accumulate output into a thread-safe
+       data structure with <<
+
+ A Job has the following methods:
+
+     Job#wait
+       Wait for the job to finish. Blocks
+       the current thread. Raises if the computation
+       ends abnormally.
+     Job#stop
+       Signal the job to terminate, then wait
+       for it.
+
+ When executing a Unix command (or pipe),
+ `Job#wait` waits for the child process to
+ finish. `Job#stop` sends SIGTERM to the
+ process, then waits.
+
+     # slowcat takes 3 seconds to complete
+     > @j = slowcat(3).exec!
+     # return immediately
+     > @j
+     #<Job>
+     > @j.wait # blocks for three seconds
+
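For a Unix command, the wait and stop behaviour described above maps onto Ruby's `Process` API fairly directly. A rough sketch follows, with the caveats that `SketchJob` is a hypothetical class rather than Rubish's `Job`, and that it assumes `sleep` and `true` exist on the system.

```ruby
# Hypothetical sketch of a background job wrapping a child process.
class SketchJob
  def initialize(*argv)
    @pid = Process.spawn(*argv) # returns immediately
  end

  # Block until the child exits; raise if it ended abnormally.
  def wait
    _pid, status = Process.wait2(@pid)
    raise "job ended abnormally: #{status.inspect}" unless status.success?
    status
  end

  # Send SIGTERM, then wait for the child to go away.
  def stop
    Process.kill("TERM", @pid)
    _pid, status = Process.wait2(@pid)
    status
  end
end

job = SketchJob.new("sleep", "30")
status = job.stop            # returns promptly instead of after 30 seconds
puts status.signaled?        # the child was ended by a signal
puts SketchJob.new("true").wait.success?
```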
+ Jobs are registered in a JobControl object:
+
+     JobControl#wait(*jobs)
+       wait for jobs to complete, then unregister them.
+     JobControl#waitall
+       wait for all jobs to complete
+     JobControl#jobs
+       all the registered (and active) jobs.
+
+     > job_control
+     #<Rubish::JobControl>
+     > wait(@job)
+     # == job_control.wait(@job)
+     > waitall # == job_control.waitall
+
+ # Context and Workspace
+
+ Rubish gives you fine control over the execution context of Executables.
+
+ * Contextual IOs are dynamically scoped.
+ * Contextual bindings (visible methods) are lexically scoped.
+
+ First, contextual IOs
+
+     with {
+       cmd1.exec
+       cmd2.exec
+       with { cmd3.exec }.o("output-3")
+     }.o("output-1-and-2")
+
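"Dynamically scoped" means that a command writes to whichever output is current when it runs, and that the previous output is restored when the block exits. The toy model below captures that with a stack. `IOContext`, `with_output`, and `emit` are invented names, not Rubish API.

```ruby
require "stringio"

# Hypothetical sketch: a stack of current outputs, innermost last.
module IOContext
  @stack = [$stdout]

  class << self
    def out
      @stack.last
    end

    # Run the block with io as the current output, then restore the old one.
    def with_output(io)
      @stack.push(io)
      yield
    ensure
      @stack.pop
    end
  end
end

# Stand-in for a command: writes to whatever output is current.
def emit(text)
  IOContext.out.puts(text)
end

outer = StringIO.new
inner = StringIO.new
IOContext.with_output(outer) do
  emit "cmd1"
  emit "cmd2"
  IOContext.with_output(inner) { emit "cmd3" }
end
puts outer.string # cmd1 and cmd2
puts inner.string # cmd3
```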
+ At the shell, a `Workspace` object contains all
+ the visible method bindings you can use (as well
+ as methods from Kernel). Everything else
+ is translated into a `Rubish::Command` instance by
+ `Workspace#method_missing`. To extend a workspace,
+ just mix in modules.
+
+ However, it's usually not a good idea to include a
+ module into the Workspace class, since the
+ extension would be visible in all Workspace
+ instances, risking incompatibilities among
+ different extensions. It's better to extend
+ workspace singletons. The philosophy is that a
+ workspace is your own. You have the freedom to
+ mess it up however you like for your personal
+ convenience, but the mess should stay
+ localized.
+
+     Workspace#derive(*modules)
+       clone the current workspace, then extend the clone with modules.
+
+ Ruby doesn't have lexical scoping for methods, but
+ you can fake it by creating modules and deriving
+ workspaces on the fly.
+
+     with(derive { def foo; ...; end }) {
+       ... # outer foo
+       with(derive { def foo; ...; end }) {
+         ... # inner foo
+       }
+       ... # outer foo
+     }
+
+ The definition block for `derive` is used by
+ `Module.new(&block)` to create a dynamic module
+ that's mixed into a derived Workspace.
+
+
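The clone-then-extend idea can be tried out in isolation. In this sketch, `SketchWorkspace` is a hypothetical stand-in for the real Workspace, and `with` simply `instance_eval`s the block against the given workspace, which is an assumption about how Rubish does it.

```ruby
# Hypothetical sketch of derive: clone the workspace, extend the clone.
class SketchWorkspace
  def foo
    "base foo"
  end

  # Accepts modules and/or a definition block turned into an anonymous module.
  def derive(*modules, &definitions)
    modules << Module.new(&definitions) if definitions
    copy = clone # keeps extensions already applied to self
    modules.each { |mod| copy.extend(mod) }
    copy
  end

  # Evaluate the block with the given workspace as self.
  def with(workspace = self, &block)
    workspace.instance_eval(&block)
  end
end

ws = SketchWorkspace.new
seen = []
ws.with(ws.derive { def foo; "outer foo"; end }) do
  seen << foo
  with(derive { def foo; "inner foo"; end }) { seen << foo }
  seen << foo
end
p seen
p ws.foo # the original workspace is untouched
```

Each derived workspace is a separate object, so the inner definition of `foo` disappears as soon as the inner block ends.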
+ # Batch Executable
+
+ A batch executable is a block of code executed
+ within a context in a thread. This gives you
+ coarse-grained structured concurrency. It's like a
+ subshell, but within the same process, and offers
+ finer control over IO and namespace (i.e. visible
+ bindings).
+
+ Schematically, a batch job is like,
+
+     @job = Thread.new { context.eval { work }}
+     @job.wait
+
+ An example,
+
+     @b = batch {
+       exec! cmd1, cmd2
+       batch {
+         exec! cmd3, cmd4
+         batch { exec! cmd5 }
+       }.exec
+     }
+     @b.exec! # => a job
+
+ Batches are nestable, such that each batch has its
+ own job control. A batch finishes when all its
+ jobs have terminated, as well as the jobs of all
+ nested job_controls.
+
+ Using batches, concurrent jobs can be organized
+ structurally into a tree.
+
+ A batch is just a wrapper over a context, so you can
+ specify the execution context of a batch:
+
+     # extend the batch context
+     batch(derive(mod1,mod2)) { ... }
+
+ And a batch is an Executable! So all the
+ Executable methods are applicable:
+
+     batch { ... }.map { |l| ... }
+     batch { ... }.tail
+
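The tree structure of nested batches can be mimicked with plain threads, with each batch joining its own jobs before it is considered finished. The sketch makes several assumptions: `MiniBatch` and its `job` method are made up, and real batches also carry IO and workspace context, which is left out here.

```ruby
# Hypothetical sketch: each batch owns its jobs and finishes only
# after all of them (and any nested batches it ran) are done.
class MiniBatch
  def initialize(&work)
    @work = work
    @jobs = []
  end

  # Start a background job registered with this batch.
  def job(&block)
    @jobs << Thread.new(&block)
  end

  # Create a nested batch; it only runs when exec or exec! is called on it.
  def batch(&work)
    MiniBatch.new(&work)
  end

  # Run the batch body, then wait for every job it registered.
  def exec
    instance_eval(&@work)
    @jobs.each(&:join)
    self
  end

  # Run the whole batch in the background and return its thread.
  def exec!
    Thread.new { exec }
  end
end

log = Queue.new # thread-safe collector
b = MiniBatch.new do
  job { log << :cmd1 }
  batch do
    job { log << :cmd3 }
    batch { job { log << :cmd5 } }.exec
  end.exec
end
b.exec!.join
puts log.size
```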
+ # Remote Scripting
+
+ It's fun to think about.
+
+ Happy Hacking!
+
+
+ # Credit
+
+ Created by Howard Yeh.
+
+ Gem made available by [Gabriel Horner](http://tagaholic.me/).
+