rubish 0.0.1

data/README.textile ADDED
@@ -0,0 +1,651 @@
1
+ # Rubish
2
+
3
+ Rubish is a shell in Ruby. It is *object
4
+ oriented*, and it only uses Ruby's own syntax (*no
5
+ metasyntax* of its own). Rubish is pronounced
6
+ Roobish, as opposed to Rubbish, unlike Bash.
7
+
8
+ # Getting Started
9
+
10
+ Fire up irb and start Rubish:
11
+
12
+ $ irb -rrubish
13
+ irb> Rubish.repl
14
+ rbh> date
15
+ Tue Mar 17 17:06:04 PDT 2009
16
+ rbh> uname :svrm
17
+ Linux 2.6.24-16-generic #1 SMP Thu Apr 10 13:23:42 UTC 2008 i686
18
+
19
+ A few tips upfront. Sometimes Rubish could mess up
20
+ your terminal. If that happens, try hitting `C-c`
21
+ and `C-d` to get back to Bash, then,
22
+
23
+ $ reset
24
+
25
+ to reset your terminal. Also, Rubish doesn't have
26
+ shell history. But it uses the readline library,
27
+ so you can use its history mechanism: press `C-r <string>`
27
+ to search for a previously entered line that matches
28
+ the string.
30
+
31
+ # Overview
32
+
33
+ Rubish's Executable class provides a common API
34
+ for IO redirection and output processing. The
35
+ subclasses are,
36
+
37
+ Command
38
+ A unix command.
39
+ Pipe
40
+ A pipeline of Unix commands.
41
+ Awk
42
+ Sed
43
+ Batch
44
+ An arbitrary block of code, like a subshell.
45
+
46
+ # Command
47
+
48
+ The Rubish REPL takes a line and `instance_eval`s it
48
+ with the shell object (a `Rubish::Context`). If
49
+ the method is undefined, the call is translated
51
+ into an Executable (`Rubish::Command`) object with
52
+ `method_missing`.
53
+
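The dispatch described above can be sketched in a few lines of plain Ruby. This is only an illustration of the `method_missing` technique; `TinyContext` and `TinyCommand` are hypothetical names and do not reflect Rubish's actual classes.

```ruby
# Hypothetical stand-in for a command object; it only records what it would run.
class TinyCommand
  attr_reader :cmd

  def initialize(name, args)
    @cmd = ([name] + args).join(" ")
  end
end

# Hypothetical stand-in for the shell object a REPL line is evaluated against.
class TinyContext
  # Any undefined method name becomes a command object instead of raising.
  def method_missing(name, *args)
    TinyCommand.new(name.to_s, args.map(&:to_s))
  end

  def respond_to_missing?(name, include_private = false)
    true
  end

  # Evaluate one line of input with this object as the receiver.
  def eval_line(line)
    instance_eval(line)
  end
end

ctx = TinyContext.new
command = ctx.eval_line('ls "-l"')
puts command.cmd
```

Because `ls` is not defined on the context, the call falls through to `method_missing`, which returns a value that can be stored or executed later.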
54
+ rbh> ls
55
+ awk.rb command_builder.rb command.rb executable.rb LICENSE pipe.rb README.textile rubish.rb sed.rb session.rb streamer.rb
56
+
57
+ # ls evaluates to a Rubish::Command object, which
58
+ # is a subclass of Rubish::Executable
59
+ rbh> ls.inspect
60
+ "#<Rubish::Command::ShellCommand:0xb7ac297c @args=\"\", @status=nil, @cmd=\"ls \">"
61
+
62
+ # you can store a command in an instance variable
63
+ rbh> @cmd = ls; nil
64
+ nil
65
+ # if the shell evaluates to a command, the shell
66
+ # calls the exec method on it.
67
+ rbh> @cmd # same as @cmd.exec
68
+ awk.rb command_builder.rb command.rb executable.rb LICENSE pipe.rb README.textile rubish.rb sed.rb session.rb streamer.rb
69
+ rbh> @cmd
70
+ awk.rb command_builder.rb command.rb executable.rb LICENSE pipe.rb README.textile rubish.rb sed.rb session.rb streamer.rb
71
+
72
+ You can invoke a command with arguments of
73
+ `String`, `Symbol`, or `Array` (of `String`,
74
+ `Symbol`, or `Array` (recursively)). A `String`
75
+ argument is taken as it is. A `Symbol` is
76
+ translated to a flag (:flag => -flag). Arguments
77
+ in an `Array` are treated likewise. Finally, all
78
+ the arguments are flattened and joined together.
79
+
80
+ The following are equivalent,
81
+
82
+ rbh> ls :l, "awk.rb", "sed.rb"
83
+ rbh> ls "-l awk.rb sed.rb"
84
+ rbh> ls :l, %w(awk.rb sed.rb)
85
+
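A minimal sketch of the argument rules above, assuming nothing about Rubish's internals. `build_args` is a hypothetical helper written for this illustration.

```ruby
# Hypothetical helper: turn mixed arguments into one argument string.
def build_args(*args)
  args.flatten.map do |arg|
    case arg
    when Symbol then "-#{arg}" # :l becomes -l
    when String then arg       # strings are passed through unchanged
    else raise ArgumentError, "unsupported argument: #{arg.inspect}"
    end
  end.join(" ")
end

a = build_args(:l, "awk.rb", "sed.rb")
b = build_args("-l awk.rb sed.rb")
c = build_args(:l, %w(awk.rb sed.rb))
puts a
```

All three calls produce the same argument string, which mirrors the equivalence of the three `ls` invocations.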
86
+ # Pipe
87
+
88
+ rbh> p { ls ; tr "a-z A-Z" }
89
+ AWK.RB
90
+ COMMAND_BUILDER.RB
91
+ COMMAND.RB
92
+ EXECUTABLE.RB
93
+ LICENSE
94
+ PIPE.RB
95
+ README.TEXTILE
96
+ RUBISH.RB
97
+ SED.RB
98
+ SESSION.RB
99
+ STREAMER.RB
100
+
101
+ Pipes are first class values:
102
+
103
+ rbh> @pipe = p { ls ; tr "a-z A-Z" }; nil
104
+ # again, we return nil so @pipe doesn't get executed.
105
+ rbh> @pipe
106
+ # execute @pipe once
107
+ rbh> @pipe
108
+ # execute @pipe again
109
+
110
+ # IO redirections
111
+
112
+ IO redirections are done by methods defined in
113
+ `Rubish::Executable`.
114
+
115
+ Rubish::Executable#i(io=nil)
116
+ Set the $stdin of the executable when
117
+ it is executed. If called without an argument,
118
+ return the executable's IO object.
119
+ Rubish::Executable#o(io=nil)
120
+ Ditto for $stdout
121
+ Rubish::Executable#err(io=nil)
122
+ Ditto for $stderr
123
+
124
+
125
+ rbh> ls.o("ls-result")
126
+ rbh> cat.i("ls-result")
127
+ awk.rb
128
+ command_builder.rb
129
+ command.rb
130
+ executable.rb
131
+ LICENSE
132
+ ls-result
133
+ pipe.rb
134
+ README.textile
135
+ rubish.rb
136
+ sed.rb
137
+ session.rb
138
+ streamer.rb
139
+
140
+ Rubish can take 4 kinds of objects for
141
+ IO. `String` (used as a file path), `Integer`
142
+ (used as file descriptor), `IO` object, or a ruby
143
+ block. When a block is used for IO, the block
144
+ receives a pipe connecting it to the command, for
145
+ reading or writing.
146
+
147
+ # pump numbers into cat
148
+ rbh> cat.i { |p| p.puts((1..5).to_a) }
149
+ 1
150
+ 2
151
+ 3
152
+ 4
153
+ 5
154
+
155
+ # upcase all filenames
156
+ rbh> ls.o { |p| p.each_line {|l| puts l.upcase} }
157
+ AWK.RB
158
+ COMMAND_BUILDER.RB
159
+ COMMAND.RB
160
+ EXECUTABLE.RB
161
+ LICENSE
162
+ LS-RESULT
163
+ PIPE.RB
164
+ README.TEXTILE
165
+ RUBISH.RB
166
+ SED.RB
167
+ SESSION.RB
168
+ STREAMER.RB
169
+
170
+ # kinda funny, pump numbers into cat, then pull
171
+ # them out again.
172
+ rbh> cat.i { |p| p.puts((1..10).to_a) }.o {|p| p.each_line {|l| puts l.to_i+100 }}
173
+ 101
174
+ 102
175
+ 103
176
+ 104
177
+ 105
178
+ 106
179
+ 107
180
+ 108
181
+ 109
182
+ 110
183
+
184
+ The input and output blocks are executed in their
185
+ own threads, so be careful.
186
+
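To make the threading concrete, here is a rough sketch using only Ruby's standard library (`Open3`), not Rubish itself. The child process is a small Ruby one-liner standing in for `cat`; that substitution, and every variable name, is an assumption made for this example.

```ruby
require "open3"
require "rbconfig"

# A child process that copies stdin to stdout, standing in for `cat`.
child = [RbConfig.ruby, "-e", "STDOUT.write(STDIN.read)"]

collected = []
Open3.popen2(*child) do |stdin, stdout, wait_thread|
  # The "input block": runs in its own thread and writes into the pipe.
  writer = Thread.new do
    (1..5).each { |n| stdin.puts(n) }
    stdin.close # signal end of input so the child can finish
  end

  # The "output block": runs in its own thread and reads from the pipe.
  reader = Thread.new do
    stdout.each_line { |line| collected << line.to_i + 100 }
  end

  writer.join
  reader.join
  wait_thread.value # wait for the child to exit
end

p collected
```

The writer and reader run concurrently, so any state they share (here only `collected`, touched by a single thread) needs the usual care.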
187
+ # Rubish with Ruby
188
+
189
+ Rubish is designed so it's easy to interface Unix
190
+ commands with Ruby.
191
+
192
+ Rubish::Executable#each(&block)
193
+ yield each line of output to a block.
194
+ Rubish::Executable#map(&block)
195
+ Like #each, but collect the values returned by
196
+ the block. If no block given, collect each
197
+ line of the output.
198
+ Rubish::Executable#head(n=1,&block)
199
+ Process the first n lines of output
200
+ with a block.
201
+ Rubish::Executable#tail(n=1,&block)
202
+ Process the last n lines of output
203
+ with a block.
204
+ Rubish::Executable#first
205
+ Returns first line of output.
206
+ Rubish::Executable#last
207
+ Returns last line of output.
208
+
209
+ Since this is Ruby, there's no crazy metasyntactic
210
+ issues when you want to process the output lines.
211
+
212
+ # print filename and its extension side by side.
213
+ rbh> ls.each { |f| puts "#{f}\t#{File.extname(f)}" }
214
+ address.rb .rb
215
+ awk.output .output
216
+ awk.rb .rb
217
+ command_builder.rb .rb
218
+ command.rb .rb
219
+ executable.rb .rb
220
+ foo
221
+ foobar
222
+ foo.bar .bar
223
+ foo.rb .rb
224
+ LICENSE
225
+ my.rb .rb
226
+ pipe.rb .rb
227
+ #README.textile# .textile#
228
+ README.textile .textile
229
+ rubish.rb .rb
230
+ ruby-termios-0.9.5 .5
231
+ ruby-termios-0.9.5.tar.gz .gz
232
+ #sed.rb# .rb#
233
+ sed.rb .rb
234
+ session.rb .rb
235
+ streamer.rb .rb
236
+ todo
237
+ util
238
+
239
+ You can execute a command within the each block.
240
+
241
+ rbh> ls.each { |f| wc(f).exec }
242
+ 64 131 1013 awk.rb
243
+ 116 202 1914 command_builder.rb
244
+ 56 113 1034 command.rb
245
+ 196 563 4388 executable.rb
246
+ 24 217 1469 LICENSE
247
+ 12 12 132 ls-result
248
+ 78 245 1917 pipe.rb
249
+ 142 544 3388 README.textile
250
+ 107 278 2340 rubish.rb
251
+ 46 54 546 sed.rb
252
+ 95 206 1870 session.rb
253
+ 264 708 5906 streamer.rb
254
+
255
+ One nifty thing to do is to collect the outputs of nested commands.
256
+
257
+ rbh> ls.map {|f| stat(f).map }
258
+ [[" File: `awk.rb'",
259
+ " Size: 1013 \tBlocks: 8 IO Block: 4096 regular file",
260
+ "Device: 801h/2049d\tInode: 984369 Links: 1",
261
+ "Access: (0644/-rw-r--r--) Uid: ( 1000/ howard) Gid: ( 1000/ howard)",
262
+ "Access: 2009-03-17 21:02:25.000000000 -0700",
263
+ "Modify: 2009-03-17 21:02:13.000000000 -0700",
264
+ "Change: 2009-03-17 21:02:13.000000000 -0700"],
265
+ [" File: `command_builder.rb'",
266
+ " Size: 1914 \tBlocks: 8 IO Block: 4096 regular file",
267
+ "Device: 801h/2049d\tInode: 984371 Links: 1",
268
+ "Access: (0644/-rw-r--r--) Uid: ( 1000/ howard) Gid: ( 1000/ howard)",
269
+ "Access: 2009-03-17 21:02:25.000000000 -0700",
270
+ "Modify: 2009-03-17 21:02:13.000000000 -0700",
271
+ "Change: 2009-03-17 21:02:13.000000000 -0700"],
272
+ ...
273
+ ]
274
+
275
+ All the above apply to pipes as well. We can find
276
+ out how many files are in a directory as a Ruby
277
+ Integer.
278
+
279
+ rbh> p { ls; wc}
280
+ 23 23 248
281
+ rbh> p { ls; wc}.map
282
+ [" 23 23 248\n"]
283
+ rbh> p { ls; wc}.map.first.split
284
+ ["23", "23", "248"]
285
+ rbh> p { ls; wc}.map.first.split.first.to_i
286
+ 23
287
+
288
+ A big problem with Bash is when you have to
289
+ process output with weird characters. Ideally, you
290
+ might want to say,
291
+
292
+ wc `ls`
293
+
294
+ But that breaks. You have to say,
295
+
296
+ find . -maxdepth 1 -print0 | xargs -0 wc
297
+
298
+ And then again, that only works if you are working
299
+ with files, and if the command (e.g. wc) accepts
300
+ multiple arguments. In Rubish, you can use the
301
+ Executable#q method to tell a command to quote its
302
+ arguments. Like so,
303
+
304
+ wc(ls.map).q
305
+
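How `q` quotes its arguments is not shown here. As a general illustration of the idea, Ruby's standard `Shellwords` module escapes each argument so that a shell sees it as a single word. The file names below are made up.

```ruby
require "shellwords"

# File names that would break naive word splitting in a shell.
names = ["plain.rb", "with space.txt", "quote's.txt"]

# Shellwords.join escapes each argument, then joins them with spaces.
command_line = "wc " + Shellwords.join(names)
puts command_line

# Splitting the escaped string again recovers the original arguments.
round_trip = Shellwords.split(command_line)
```

The round trip shows that no argument was split or merged by the escaping.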
306
+ # Sed and Awk
307
+
308
+ Rubish has sedish and awkish things that are not
309
+ quite like sed and awk, but not entirely unlike
310
+ sed and awk.
311
+
312
+ `Rubish::Sed` doesn't implicitly print (unlike
313
+ real sed). There's actually no option to turn on
314
+ implicit printing.
315
+
316
+ Rubish::Sed#line
317
+ the current line sed is processing
318
+ Rubish::Sed#p(*args)
319
+ print current line if no argument is given.
320
+ Rubish::Sed#s(regexp,str)
321
+ String#sub! on the current line
322
+ Rubish::Sed#gs(regexp,str)
323
+ String#gsub! on the current line
324
+ Rubish::Sed#q
325
+ quit from sed processing.
326
+
327
+
328
+ rbh> ls.sed { gs /b/, "bee"; p if line =~ /.rbee$/ }
329
+ awk.rbee
330
+ command_beeuilder.rbee
331
+ command.rbee
332
+ executabeele.rbee
333
+ pipe.rbee
334
+ rubeeish.rbee
335
+ sed.rbee
336
+ session.rbee
337
+ streamer.rbee
338
+
339
+ # output to a file
340
+ rbh> ls.sed { p }.o "sed.result"
341
+
342
+ Rubish::Sed doesn't have the concepts of swapping,
343
+ appending, modifying, or any interaction between
344
+ pattern space and hold space. Good riddance. The
345
+ block is `instance_eval` by the Sed object, so you
346
+ can keep track of state using instance variables.
347
+
348
+ Awk is a lot like sed. But you can associate
349
+ actions to be done before or after awk processing.
350
+
351
+ Rubish::Awk#begin(&block)
352
+ block is instance_eval by the Awk object
353
+ before processing.
354
+ Rubish::Awk#act(&block)
355
+ block is instance_eval by the Awk object for
356
+ each line.
357
+ Rubish::Awk#end(&block)
358
+ block is instance_eval at the end of
359
+ processing. Its value is returned as the
360
+ result.
361
+
362
+ rbh> ls.awk { puts do_something(line)}
363
+ # you can have begin and end blocks for awk.
364
+ rbh> ls.awk.begin { ...init }.act { ...body}.end { ...final}
365
+
366
+ You can associate multiple blocks with either awk
367
+ or sed. Each block is an "action" that's processed
368
+ in left-to-right order.
369
+
370
+ rbh> cmd.sed.act { ... }.act { ... }
371
+ rbh> cmd.awk.act { ... }.act { ... }
372
+
373
+ Rubish supports awk/sed-style pattern matching.
374
+
375
+ .sed(/a/) # triggers for lines that match
376
+ .sed(/a/,/b/) # triggers for lines between (inclusive)
377
+ .sed(1) # matches line one
378
+ .sed(3,:eof) # matches line 3 to end of stream
379
+ # ditto with awk
380
+ .awk(/a/)
381
+
382
+
383
+ > cat.i {|p| p.puts((1..10).to_a)}.sed(2,4) { p }
384
+ 2
385
+ 3
386
+ 4
387
+
388
+ # Streamer
389
+
390
+ `Rubish::{Sed,Awk}` actually share the
391
+ `Rubish::Streamer` mixin. Most of their mechanisms
392
+ are implemented by this mixin. It has two
393
+ interesting features:
394
+
395
+ * *Line buffering* allows arbitrary peek ahead (of
396
+ lines). This lets you do what sed can with hold
397
+ space, but in a much cleaner way.
398
+ * *Aggregation* is what awk is all about. But
399
+ Rubish::Streamer implements special aggregators
400
+ inspired by Common Lisp's Loop facilities.
401
+
402
+ Let's see line buffering first.
403
+
404
+ Rubish::Streamer#peek(n=1)
405
+ Return the next n lines (as Array of Strings),
406
+ and put these lines in the stream buffer.
407
+ Rubish::Streamer#skip(n=1)
408
+ Skip the next n lines.
409
+ Rubish::Streamer#stop(n=1)
410
+ Skip other actions in the streamer, and
411
+ process next line.
412
+ Rubish::Streamer#quit(n=1)
413
+ Quit the streaming process.
414
+
415
+ By the way, isn't it nice that these methods all
416
+ have four chars?
417
+
418
+ # print files in groups of 3, separated by blank lines.
419
+ rbh> ls.sed { p; puts peek(2); puts ""; skip(3) }
420
+
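The read-ahead idea can be sketched independently of Rubish. `LineBuffer` below is a hypothetical class written for this illustration; its `skip` counts only lines after the one just consumed, which may differ from how `Rubish::Streamer#skip` counts.

```ruby
# Hypothetical line buffer; not Rubish's actual Streamer implementation.
class LineBuffer
  def initialize(lines)
    @source = lines.each # external enumerator over the input lines
    @buffer = []         # lines read ahead but not yet consumed
  end

  # Return up to n upcoming lines without consuming them.
  def peek(n = 1)
    fill(n)
    @buffer.first(n)
  end

  # Discard up to n upcoming lines.
  def skip(n = 1)
    fill(n)
    @buffer.shift(n)
    nil
  end

  # Consume and return the next line, or nil at end of stream.
  def next_line
    fill(1)
    @buffer.shift
  end

  private

  # Read ahead until the buffer holds n lines or the input runs out.
  def fill(n)
    @buffer << @source.next while @buffer.size < n
  rescue StopIteration
    # End of input: keep whatever was buffered.
  end
end

buf = LineBuffer.new(%w(a b c d e f g))
groups = []
while (line = buf.next_line)
  groups << [line, *buf.peek(2)]
  buf.skip(2)
end
p groups
```

Peeked lines stay in the buffer until they are consumed or skipped, which is what lets an action look ahead without losing input.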
421
+ In general, the aggregating methods take a name, a
422
+ value, and an optional key. The aggregated result
423
+ is accumulated in an instance variable named by
424
+ the given name. Each aggregator type basically
425
+ does foldl on an initial value. The optional key
426
+ is used to partition an aggregation.
427
+
428
+ Rubish::Streamer#count(name,key=nil)
429
+ count number of times it's called.
430
+ Rubish::Streamer#max(name,val,key=nil)
431
+ Rubish::Streamer#min(name,val,key=nil)
432
+ Rubish::Streamer#sum(name,val,key=nil)
433
+ Rubish::Streamer#collect(name,val,key=nil)
434
+ collect vals into an array.
435
+ Rubish::Streamer#hold(name,size,val,key=nil)
436
+ collect vals into a fixed-size FIFO queue.
437
+ Rubish::Streamer#pick(name,val,key=nil,&block)
438
+ pass the block old_val and new_val, and
439
+ the value returned by block is saved in
440
+ "name".
441
+
442
+ Each aggregator's name is used to create a
443
+ bucket. A reader method named by name can be used
444
+ to access that bucket. A bucket is a hash of
445
+ partitioned accumulation keyed by key. The special
446
+ key nil aggregates over the entire domain (like
447
+ MySQL's rollup).
448
+
449
+
450
+ # find the length of the longest file name, and
451
+ # collect the file names.
452
+ ls.awk { f=a[0]; max(:fl,f.length,File.extname(f)); collect(:fn,f)}.end { pp buckets; [fl,fl(""),fn] }
453
+ {:fl=>{""=>10, nil=>18, ".textile"=>14, ".rb"=>18},
454
+ :fn=>
455
+ {nil=>
456
+ ["awk.rb",
457
+ "command_builder.rb",
458
+ "command.rb",
459
+ "executable.rb",
460
+ "LICENSE",
461
+ "ls-result",
462
+ "pipe.rb",
463
+ "README.textile",
464
+ "rubish.rb",
465
+ "sed.rb",
466
+ "sed-result",
467
+ "session.rb",
468
+ "streamer.rb"]}}
469
+ [18,
470
+ 10,
471
+ ["awk.rb",
472
+ "command_builder.rb",
473
+ "command.rb",
474
+ "executable.rb",
475
+ "LICENSE",
476
+ "ls-result",
477
+ "pipe.rb",
478
+ "README.textile",
479
+ "rubish.rb",
480
+ "sed.rb",
481
+ "sed-result",
482
+ "session.rb",
483
+ "streamer.rb"]]
484
+
485
+
486
+ The first printout of hash is from `pp buckets`.
487
+ You can see the aggregation partitioned
488
+ by file extensions (in the case of `fl`). Note
489
+ that `fl(nil)` holds the max length over all the
490
+ files (the entire domain).
491
+
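The bucket structure described above can be approximated in plain Ruby. `Buckets` is a hypothetical class for illustration only; it implements just `max` and `collect`, and it assumes (as the printed hash suggests) that every update also feeds the `nil` key.

```ruby
# Hypothetical aggregator; not Rubish's actual Streamer implementation.
class Buckets
  attr_reader :buckets

  def initialize
    # name => { key => accumulated value }
    @buckets = Hash.new { |hash, name| hash[name] = {} }
  end

  # Keep the largest value seen, per key and overall (key nil).
  def max(name, val, key = nil)
    fold(name, key) { |old| old.nil? || val > old ? val : old }
  end

  # Append every value to an array, per key and overall (key nil).
  def collect(name, val, key = nil)
    fold(name, key) { |old| (old || []) + [val] }
  end

  private

  # Update the bucket for key, and also the nil bucket that spans all keys.
  def fold(name, key)
    [key, nil].uniq.each do |k|
      @buckets[name][k] = yield(@buckets[name][k])
    end
  end
end

agg = Buckets.new
%w(awk.rb command_builder.rb LICENSE README.textile).each do |f|
  agg.max(:fl, f.length, File.extname(f))
  agg.collect(:fn, f)
end
p agg.buckets[:fl]
```

Each aggregator is a fold over the previous value, so adding another one (a sum or a count, say) only needs a different block.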
492
+ # Job Control
493
+
494
+ Executable and all its subclasses can execute in the background.
495
+
496
+ Executable#exec!
497
+ Execute, immediately return a Job
498
+ Executable#each!(&block)
499
+ Iterate the output in the background.
500
+ Executable#map!(acc,&block)
501
+ Accumulate output into a thread-safe
502
+ data structure with <<
503
+
504
+ A Job has the following methods,
505
+
506
+ Job#wait
507
+ Wait for the job to finish. Would block
508
+ the current thread. Raises if computation
509
+ ends abnormally.
510
+ Job#stop
511
+ Signal for the job to terminate, then wait
512
+ for it.
513
+
514
+ In the case of executing a unix command (or pipe),
515
+ `Job#wait` would wait for the child process to
516
+ finish. `Job#stop` would send SIGTERM to the
517
+ process, then wait.
518
+
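For a child process, the wait and stop behaviour described above maps onto standard `Process` calls. The sketch below is an assumption about the general shape, not Rubish's `Job` class; `TinyJob` is a hypothetical name, and unlike `Job#wait` it does not raise on abnormal exit.

```ruby
require "rbconfig"

# Hypothetical job wrapper around a child process; not Rubish's actual Job.
class TinyJob
  def initialize(*command)
    @pid = Process.spawn(*command)
  end

  # Block until the child exits and return its Process::Status.
  def wait
    @status ||= Process.wait2(@pid).last
  end

  # Ask the child to terminate with SIGTERM, then wait for it.
  def stop
    Process.kill("TERM", @pid)
    wait
  end
end

# A child that would sleep for a minute if left alone.
job = TinyJob.new(RbConfig.ruby, "-e", "sleep 60")
status = job.stop
puts status.signaled?
```

Caching the status in `wait` makes it safe to call `wait` again after `stop`, since a child can only be reaped once.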
519
+ # slowcat takes 3 seconds to complete
520
+ > @j = slowcat(3).exec!
521
+ # return immediately
522
+ > @j
523
+ #<Job>
524
+ > @j.wait # blocks for three seconds
525
+
526
+ Jobs are registered in a JobControl object,
527
+
528
+ JobControl#wait(*jobs)
529
+ wait for jobs to complete, then unregister them.
530
+ JobControl#waitall
531
+ wait for all jobs to complete
532
+ JobControl#jobs
533
+ all the registered (and active) jobs.
534
+
535
+ > job_control
536
+ #<Rubish::JobControl>
537
+ > wait(@job)
538
+ # == job_control.wait(@job)
539
+ > waitall # == job_control.waitall
540
+
541
+ # Context and Workspace
542
+
543
+ Rubish gives you fine control over the execution context of Executables.
544
+
545
+ * Contextual IOs are dynamically scoped.
546
+ * Contextual bindings (visible methods) are lexically scoped.
547
+
548
+ First, contextual IOs
549
+
550
+ with {
551
+ cmd1.exec
552
+ cmd2.exec
553
+ with { cmd3.exec }.o("output-3")
554
+ }.o("output-1-and-2")
555
+
556
+ At the shell, a `Workspace` object contains all
557
+ the visible method bindings you can use (as well
558
+ as methods from Kernel). Everything else
559
+ translates to a `Rubish::Command` instance by
560
+ `Workspace#method_missing`. To extend a workspace,
561
+ just mix in modules.
562
+
563
+ However, it's usually not a good idea to include a
564
+ module into the Workspace class, since this
565
+ extension would be visible in all the Workspace
566
+ instances, thus risking incompatibilities among
567
+ different extensions. It's better to extend
568
+ workspace singletons. The philosophy is, a
569
+ workspace is your own. You have the freedom to
570
+ mess it up however you like for your personal
571
+ conveniences. But the messing-it-up should be
572
+ localized.
573
+
574
+ Workspace#derive(*modules)
575
+ clone the current workspace, then extend the clone with modules.
576
+
577
+ Ruby doesn't have lexical scoping for methods, but
578
+ you can fake it by creating modules and deriving
579
+ workspaces on the fly.
580
+
581
+ with(derive({def foo; ...; end})) {
582
+ ... # outer foo
583
+ with(derive({def foo; ...; end})) {
584
+ ... # inner foo
585
+ }
586
+ ... # outer foo
587
+ }
588
+
589
+ The definition block for `derive` is used by
590
+ `Module.new(&block)` to create a dynamic module
591
+ that's mixed into a derived Workspace.
592
+
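The clone-then-extend approach can be demonstrated with core Ruby alone. `TinyWorkspace` is a hypothetical class for this sketch and takes the definition block directly; Rubish's real `derive` may accept its arguments differently.

```ruby
# Hypothetical workspace; not Rubish's actual Workspace class.
class TinyWorkspace
  # Clone this workspace and extend only the clone, either with the given
  # modules or with an anonymous module built from the block.
  def derive(*modules, &block)
    modules << Module.new(&block) if block
    copy = clone
    modules.each { |mod| copy.extend(mod) }
    copy
  end
end

base = TinyWorkspace.new
outer = base.derive { def foo; "outer"; end }
inner = outer.derive { def foo; "inner"; end }

puts outer.foo
puts inner.foo
puts base.respond_to?(:foo)
```

Each derived workspace sees its own `foo`, while the base workspace and the `TinyWorkspace` class stay untouched, which is the localized "messing-it-up" the text argues for.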
593
+
594
+ # Batch Executable
595
+
596
+ A batch executable is a block of code executed
597
+ within a context in a thread. This gives you
598
+ coarse-grained structured concurrency. It's like
599
+ a subshell, but within the same process, and offers
600
+ finer control over IO and namespace (i.e. visible
601
+ bindings).
602
+
603
+ Schematically, a batch job is like,
604
+
605
+ @job = Thread.new { context.eval { work }}
606
+ @job.wait
607
+
608
+ An example,
609
+
610
+ @b = batch {
611
+ exec! cmd1, cmd2
612
+ batch {
613
+ exec! cmd3, cmd4
614
+ batch { exec! cmd5 }
615
+ }.exec
616
+ }
617
+ @b.exec! # => a job
618
+
619
+ Batches are nestable, such that each batch has its
620
+ own job control. A batch finishes when all its
621
+ jobs are terminated, as well as the jobs of all
622
+ nested job_controls.
623
+
624
+ Using batches, concurrent jobs can be organized
625
+ structurally into a tree.
626
+
627
+ A batch is just a wrapper over a context, so you can
628
+ specify the execution context of a batch,
629
+
630
+ # extend the batch context
631
+ batch(derive(mod1,mod2)) { ... }
632
+
633
+ And a batch is an Executable! So all the
634
+ Executable methods are applicable:
635
+
636
+ batch { ... }.map { |l| ... }
637
+ batch { ... }.tail
638
+
639
+ # Remote Scripting
640
+
641
+ It's fun to think about.
642
+
643
+ Happy Hacking!
644
+
645
+
646
+ # Credit
647
+
648
+ Created by Howard Yeh.
649
+
650
+ Gem made available by [Gabriel Horner](http://tagaholic.me/).
651
+