wukong 0.1.4 → 1.4.0
- data/INSTALL.textile +89 -0
- data/README.textile +41 -74
- data/docpages/INSTALL.textile +94 -0
- data/{doc → docpages}/LICENSE.textile +0 -0
- data/{doc → docpages}/README-wulign.textile +6 -0
- data/docpages/UsingWukong-part1-get_ready.textile +17 -0
- data/{doc/overview.textile → docpages/UsingWukong-part2-ThinkingBigData.textile} +8 -24
- data/{doc → docpages}/UsingWukong-part3-parsing.textile +8 -2
- data/docpages/_config.yml +39 -0
- data/{doc/tips.textile → docpages/bigdata-tips.textile} +71 -44
- data/{doc → docpages}/code/api_response_example.txt +0 -0
- data/{doc → docpages}/code/parser_skeleton.rb +0 -0
- data/{doc/intro_to_map_reduce → docpages/diagrams}/MapReduceDiagram.graffle +0 -0
- data/docpages/favicon.ico +0 -0
- data/docpages/gem.css +16 -0
- data/docpages/hadoop-tips.textile +83 -0
- data/docpages/index.textile +90 -0
- data/docpages/intro.textile +8 -0
- data/docpages/moreinfo.textile +174 -0
- data/docpages/news.html +24 -0
- data/{doc → docpages}/pig/PigLatinExpressionsList.txt +0 -0
- data/{doc → docpages}/pig/PigLatinReferenceManual.html +0 -0
- data/{doc → docpages}/pig/PigLatinReferenceManual.txt +0 -0
- data/docpages/tutorial.textile +283 -0
- data/docpages/usage.textile +195 -0
- data/docpages/wutils.textile +263 -0
- data/wukong.gemspec +80 -50
- metadata +87 -54
- data/doc/INSTALL.textile +0 -41
- data/doc/README-tutorial.textile +0 -163
- data/doc/README-wutils.textile +0 -128
- data/doc/TODO.textile +0 -61
- data/doc/UsingWukong-part1-setup.textile +0 -2
- data/doc/UsingWukong-part2-scraping.textile +0 -2
- data/doc/hadoop-nfs.textile +0 -51
- data/doc/hadoop-setup.textile +0 -29
- data/doc/index.textile +0 -124
- data/doc/links.textile +0 -42
- data/doc/usage.textile +0 -102
- data/doc/utils.textile +0 -48
- data/examples/and_pig/sample_queries.rb +0 -128
- data/lib/wukong/and_pig.rb +0 -62
- data/lib/wukong/and_pig/README.textile +0 -12
- data/lib/wukong/and_pig/as.rb +0 -37
- data/lib/wukong/and_pig/data_types.rb +0 -30
- data/lib/wukong/and_pig/functions.rb +0 -50
- data/lib/wukong/and_pig/generate.rb +0 -85
- data/lib/wukong/and_pig/generate/variable_inflections.rb +0 -82
- data/lib/wukong/and_pig/junk.rb +0 -51
- data/lib/wukong/and_pig/operators.rb +0 -8
- data/lib/wukong/and_pig/operators/compound.rb +0 -29
- data/lib/wukong/and_pig/operators/evaluators.rb +0 -7
- data/lib/wukong/and_pig/operators/execution.rb +0 -15
- data/lib/wukong/and_pig/operators/file_methods.rb +0 -29
- data/lib/wukong/and_pig/operators/foreach.rb +0 -98
- data/lib/wukong/and_pig/operators/groupies.rb +0 -212
- data/lib/wukong/and_pig/operators/load_store.rb +0 -65
- data/lib/wukong/and_pig/operators/meta.rb +0 -42
- data/lib/wukong/and_pig/operators/relational.rb +0 -129
- data/lib/wukong/and_pig/pig_struct.rb +0 -48
- data/lib/wukong/and_pig/pig_var.rb +0 -95
- data/lib/wukong/and_pig/symbol.rb +0 -29
- data/lib/wukong/and_pig/utils.rb +0 -0
data/doc/INSTALL.textile
DELETED
@@ -1,41 +0,0 @@
---
layout: default
title: Install Notes
---

h1(gemheader). {{ site.gemname }} %(small):: install%

<notextile><div class="toggle"></notextile>

h2. Get the code

This code is available as a gem:

pre. $ sudo gem install mrflip-{{ site.gemname }}

You can instead download this project in either "zip":http://github.com/mrflip/{{ site.gemname }}/zipball/master or "tar":http://github.com/mrflip/{{ site.gemname }}/tarball/master formats.

Better yet, you can also clone the project with "Git":http://git-scm.com by running:

pre. $ git clone git://github.com/mrflip/{{ site.gemname }}

<notextile></div><div class="toggle"></notextile>

h2. Get the Dependencies

* Hadoop, pig
* extlib, YAML, JSON
* Optional gems: trollop, addressable/uri, htmlentities

<notextile></div><div class="toggle"></notextile>

h2. Setup

1. Allow Wukong to discover where his elephant friend lives: either
** set a $HADOOP_HOME environment variable,
** or create a file 'config/wukong-site.yaml' with a line that points to the top-level directory of your hadoop install: @:hadoop_home: /usr/local/share/hadoop@
2. Add wukong's @bin/@ directory to your $PATH, so that you may use its filesystem shortcuts.

<notextile></div></notextile>
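The lookup order described under "Setup" can be pictured with a short Ruby sketch. This is illustrative only -- the @hadoop_home@ helper below is hypothetical, not the gem's actual code:

<pre>
# Illustrative only: mirrors the lookup order in the install notes above
# ($HADOOP_HOME first, then :hadoop_home in config/wukong-site.yaml).
require 'yaml'

def hadoop_home(config_path = 'config/wukong-site.yaml')
  return ENV['HADOOP_HOME'] if ENV['HADOOP_HOME']
  settings = File.exist?(config_path) ? YAML.load_file(config_path) : {}
  settings[:hadoop_home]
end

puts hadoop_home   # e.g. /usr/local/share/hadoop
</pre>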
data/doc/README-tutorial.textile
DELETED
@@ -1,163 +0,0 @@
Here's a script to count words in a text stream:

    require 'wukong'
    module WordCount
      class Mapper < Wukong::Streamer::LineStreamer
        # Emit each word in the line.
        def process line
          words = line.strip.split(/\W+/).reject(&:blank?)
          words.each{|word| yield [word, 1] }
        end
      end

      class Reducer < Wukong::Streamer::ListReducer
        def finalize
          yield [ key, values.map(&:last).map(&:to_i).sum ]
        end
      end
    end

    Wukong::Script.new(
      WordCount::Mapper,
      WordCount::Reducer
      ).run # Execute the script

The first class, the Mapper, eats lines and craps @[word, count]@ records. Here
the /key/ is the word, and the /value/ is its count.

The second class is an example of an accumulated list reducer. The values for
each key are stacked up into a list; then the record(s) yielded by @#finalize@
are emitted.

Here's another way to write the Reducer: accumulate the count for each key, then
yield the sum in @#finalize@:

    class Reducer2 < Wukong::Streamer::AccumulatingReducer
      attr_accessor :key_count
      def start! *args
        self.key_count = 0
      end
      def accumulate(word, count)
        self.key_count += count.to_i
      end
      def finalize
        yield [ key, key_count ]
      end
    end

Of course you can be really lazy (that is, smart) and write your script instead as

    class Script < Wukong::Script
      def reducer_command
        'uniq -c'
      end
    end

h2. Structured data

All of these deal with unstructured data. Wukong also lets you view your data
as a stream of structured objects.

Let's say you have a blog; its records look like

    Post    = Struct.new( :id, :created_at, :user_id, :title, :body, :link )
    Comment = Struct.new( :id, :created_at, :post_id, :user_id, :body )
    User    = Struct.new( :id, :username, :fullname, :homepage, :description )
    UserLoc = Struct.new( :user_id, :text, :lat, :lng )

You've been using "twitter":http://twitter.com for a long time, and you've
written something that from now on will inject all your tweets as Posts, and all
replies to them as Comments (by a common 'twitter_bot' account on your blog).
What about the past two years' worth of tweets? Let's assume you're so chatty that
a Map/Reduce script is warranted to handle the volume.

Cook up something that scrapes your tweets and all replies to your tweets:

    Tweet = Struct.new( :id, :created_at, :twitter_user_id,
      :in_reply_to_user_id, :in_reply_to_status_id, :text )
    TwitterUser = Struct.new( :id, :username, :fullname,
      :homepage, :location, :description )

Now we'll just process all those in a big pile, converting to Posts, Comments
and Users as appropriate. Serialize your scrape results so that each Tweet and
each TwitterUser is a single line containing first the class name ('tweet' or
'twitter_user') followed by its constituent fields, in order, separated by tabs.

The RecordStreamer takes each such line, constructs its corresponding class, and
instantiates it with the remaining fields:

    require 'wukong'
    require 'my_blog' # defines the blog models
    module TwitBlog
      class Mapper < Wukong::Streamer::RecordStreamer
        # Watch for tweets by me
        MY_USER_ID = 24601
        # structs for our input objects
        Tweet = Struct.new( :id, :created_at, :twitter_user_id,
          :in_reply_to_user_id, :in_reply_to_status_id, :text )
        TwitterUser = Struct.new( :id, :username, :fullname,
          :homepage, :location, :description )
        #
        # If the tweet is by me, convert it to a Post.
        #
        # If it is a tweet not by me, convert it to a Comment that
        # will be paired with the correct Post.
        #
        # If it is a TwitterUser, convert it to a User record and
        # a user_location record.
        #
        def process record
          case record
          when TwitterUser
            user     = MyBlog::User.new.merge(record) # grab the fields in common
            user_loc = MyBlog::UserLoc.new(record.id, record.location, nil, nil)
            yield user
            yield user_loc
          when Tweet
            if record.twitter_user_id == MY_USER_ID
              post = MyBlog::Post.new.merge record
              post.link  = "http://twitter.com/statuses/show/#{record.id}"
              post.body  = record.text
              post.title = record.text[0..65] + "..."
              yield post
            else
              comment = MyBlog::Comment.new.merge record
              comment.body    = record.text
              comment.post_id = record.in_reply_to_status_id
              yield comment
            end
          end
        end
      end
    end
    Wukong::Script.new( TwitBlog::Mapper, nil ).run # identity reducer

h2. Uniqifying

The script above uses the identity reducer: every record from the mapper is sent
to the output. But what if you had grabbed the replying user's record every time
you saw a reply?

Fine, so pass it through @uniq@. But what if a user updated their location or
description during this time? You'll probably want to use a UniqByLastReducer.

For the location you might instead want the most /frequent/ value, and perhaps
also to geolocate the location text. Use a ListReducer: find the most frequent
element, then finally call the expensive geolocation method.

h2. A note about keys

Now we're going to write this using the synthetic keys already extant in the
twitter records, making the unwarranted assumption that they won't collide with
the keys in your database.

The Map/Reduce paradigm does badly with synthetic keys. Synthetic keys demand
locality, and map/reduce's remarkable scaling comes from not assuming
locality. In general, write your map/reduce scripts to use natural keys (the
screen name, say).

h1. More info

There are many useful examples (including an actually-useful version of this
WordCount script) in the examples/ directory.
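The tutorial above mentions UniqByLastReducer without showing one. Here is a minimal sketch, built on the @AccumulatingReducer@ hooks (@start!@, @accumulate@, @finalize@) demonstrated above; it is an illustration only and not necessarily the class that ships with the gem:

<pre>
require 'wukong'

# Keep only the last record seen for each key, then emit it.
# Illustrative sketch; not necessarily the gem's own UniqByLastReducer.
class UniqByLastReducer < Wukong::Streamer::AccumulatingReducer
  attr_accessor :final_record

  def start! *args
    self.final_record = nil
  end

  # Later records for the same key overwrite earlier ones.
  def accumulate *record
    self.final_record = record
  end

  def finalize
    yield final_record if final_record
  end
end
</pre>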
data/doc/README-wutils.textile
DELETED
@@ -1,128 +0,0 @@
h1. Wukong Utility Scripts

h2. Stupid command-line tricks

h3. Histogram

Given data with a date column:

    message     235623  20090423012345  Now is the winter of our discontent Made glorious summer by this son of York
    message     235623  20080101230900  These pretzels are making me THIRSTY!
    ...

You can calculate the number of messages sent per day with

    cat messages | cuttab 3 | cutc 8 | sort | uniq -c

(See the wuhist command, below.)

h3. Simple intersection, union, etc.

For two datasets (batch_1 and batch_2) with unique entries (no repeated lines):

* Their union is simple:

    cat batch_1 batch_2 | sort -u

* Their intersection:

    cat batch_1 batch_2 | sort | uniq -c | egrep -v '^ *1 '

This concatenates the two sets and filters out everything that only occurred once.

* For the complement of the intersection, use "... | egrep '^ *1 '"

* In both cases, if the files are each internally sorted, the command-line sort takes a --merge flag:

    sort --merge -u batch_1 batch_2

h2. Command Listing

h3. cutc

@cutc [colnum]@

Ex.

    echo -e 'foo\tbar\tbaz' | cutc 6
    foo	ba

Cuts from the beginning of the line to the given column (default 200). A tab counts as one character, so the right margin can still be ragged.

h3. cuttab

@cuttab [colspec]@

Cuts the given tab-separated columns. You can give a comma-separated list of numbers or ranges such as 1-4. Columns are numbered from 1.

Ex.

    echo -e 'foo\tbar\tbaz' | cuttab 1,3
    foo	baz

h3. hdp-*

These perform the corresponding commands on the HDFS filesystem. In general,
where they accept command-line flags, they go with the GNU-style ones, not the
hadoop-style: so, @hdp-du -s dir@ or @hdp-rm -r foo/@

* @hdp-cat@
* @hdp-catd@ -- cats the files that don't start with '_' in a directory. Use this for a pile of @.../part-00000@ files
* @hdp-du@
* @hdp-get@
* @hdp-kill@
* @hdp-ls@
* @hdp-mkdir@
* @hdp-mv@
* @hdp-ps@
* @hdp-put@
* @hdp-rm@
* @hdp-sync@

h3. hdp-sort, hdp-stream, hdp-stream-flat

* @hdp-sort@
* @hdp-stream@
* @hdp-stream-flat@

<code><pre>
hdp-stream input_filespec output_file map_cmd reduce_cmd num_key_fields
</pre></code>

h3. tabchar

Outputs a single tab character.

h3. wuhist

Occasionally useful to gather a lexical histogram of a single column:

Ex.

<code><pre>
$ echo -e 'foo\nbar\nbar\nfoo\nfoo\nfoo\n7' | ./wuhist
     4	foo
     2	bar
     1	7
</pre></code>

(The output has a tab between the first and second columns, for further processing.)

h3. wulign

Intelligently format a tab-separated file into aligned columns (while remaining tab-separated for further processing). See README-wulign.textile.

h3. hdp-parts_to_keys.rb

A *very* clumsy script to rename reduced hadoop output files by their initial key.

If your output file has an initial key in the first column and you pass it
through hdp-sort, the keys will be distributed across reducers and thus output
files. (Because of the way hadoop hashes the keys, there's no guarantee that
each file will get a distinct key. You could have 2 keys with a million entries
and they could land sequentially on the same reducer, always fun.)

If you're willing to roll the dice, this script will rename files according to
the first key in the first line.
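For reference, the behavior described for @wuhist@ above (count how often each value occurs, then emit the count, a tab, and the value, most frequent first) amounts to only a few lines of Ruby. This is a rough illustrative equivalent, not the gem's actual script:

<pre>
#!/usr/bin/env ruby
# Rough, illustrative stand-in for wuhist: count identical lines on STDIN and
# print "count<TAB>value", most frequent first. Not the gem's actual script.
counts = Hash.new(0)
STDIN.each_line{|line| counts[line.chomp] += 1 }
counts.sort_by{|value, count| -count }.each do |value, count|
  printf("%9d\t%s\n", count, value)
end
</pre>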
data/doc/TODO.textile
DELETED
@@ -1,61 +0,0 @@
Utility

* columnizing / reconstituting

* Set up with JRuby
* Allow for direct HDFS operations
* Make the dfs commands slightly less stupid
* Add more standard options
* Allow for combiners
* JobStarter / JobSteps
* Might as well take dumbo's command-line args

BUGS:

* Can't do multiple input files in local mode

Patterns to implement:

* Stats reducer (takes sum, avg, max, min, std.dev of a numeric field)
* Make StructRecordizer work generically with other reducers (spec. AccumulatingReducer)

Example graph scripts:

* Multigraph
* Pagerank (done)
* Breadth-first search
* Triangle enumeration
* Clustering

Example scripts (from http://www.cloudera.com/resources/learning-mapreduce):

1. Find the [number of] hits by 5-minute timeslot for a website given its access logs.

2. Find the pages with over 1 million hits in a day for a website given its access logs.

3. Find the pages that link to each page in a collection of webpages.

4. Calculate the proportion of lines that match a given regular expression for a collection of documents.

5. Sort tabular data by a primary and secondary column.

6. Find the most popular pages for a website given its access logs.

/can use

---------------------------------------------------------------------------

Add statistics helpers

* including the "running standard deviation":http://www.johndcook.com/standard_deviation.html

---------------------------------------------------------------------------

Make wutils: tsv-oriented implementations of the coreutils (e.g. uniq, sort, cut, nl, wc, split, ls, df and du) that intrinsically accept and emit tab-separated records.

More example hadoop algorithms:

* Bigram counts: http://www.umiacs.umd.edu/~jimmylin/cloud9/docs/exercises/bigrams.html
* Inverted index construction: http://www.umiacs.umd.edu/~jimmylin/cloud9/docs/exercises/indexer.html
* Pagerank: http://www.umiacs.umd.edu/~jimmylin/cloud9/docs/exercises/pagerank.html
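The "running standard deviation" item above points at John D. Cook's write-up of Welford's online algorithm. A generic Ruby sketch of such a statistics helper (illustrative only; this class is not part of the gem) would be:

<pre>
# Welford's online mean / standard deviation, per the johndcook.com link above.
# Generic illustrative sketch; not code from the gem.
class RunningStats
  attr_reader :count

  def initialize
    @count = 0
    @mean  = 0.0
    @m2    = 0.0
  end

  def add x
    @count += 1
    delta   = x.to_f - @mean
    @mean  += delta / @count
    @m2    += delta * (x.to_f - @mean)
    self
  end

  def mean
    @mean
  end

  def variance
    @count > 1 ? @m2 / (@count - 1) : 0.0
  end

  def stddev
    Math.sqrt(variance)
  end
end

stats = RunningStats.new
[3, 5, 7, 7, 38].each{|x| stats.add(x) }
puts "mean=#{stats.mean} stddev=#{stats.stddev}"
</pre>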
data/doc/hadoop-nfs.textile
DELETED
@@ -1,51 +0,0 @@
The "Cloudera Hadoop AMI Instances":http://www.cloudera.com/hadoop-ec2 for Amazon's EC2 compute cloud are the fastest, easiest way to get up and running with hadoop. Unfortunately, running streaming scripts can be a pain, especially if you're doing iterative development.

Installing NFS to share files across the cluster gives the following conveniences:

* You don't have to bundle everything up with each run: any path in ~coder/ will refer back via NFS to the filesystem on the master.

* The user can now ssh among the nodes without a password, since there's only one shared home directory and since we included the user's own public key in the authorized_keys2 file. This lets you easily rsync files among the nodes.

First, you need to take note of the _internal_ name for your master, perhaps something like @domU-xx-xx-xx-xx-xx-xx.compute-1.internal@.

As root, on the master (change @compute-1.internal@ to match your setup):

<pre>
apt-get install nfs-kernel-server
echo "/home *.compute-1.internal(rw)" >> /etc/exports ;
/etc/init.d/nfs-kernel-server stop ;
</pre>

(The @*.compute-1.internal@ part limits host access, but you should take a look at the security settings of both EC2 and the built-in portmapper as well.)

Next, set up a regular user account on the *master only*. In this case our user will be named 'chimpy':

<pre>
visudo                    # uncomment the last line, to allow group sudo to sudo
groupadd admin
adduser chimpy
usermod -a -G sudo,admin chimpy
su chimpy                 # now you are the new user
ssh-keygen -t rsa         # accept all the defaults
cat ~/.ssh/id_rsa.pub     # can paste this public key into your github, etc
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys2
</pre>

Then on each slave (replacing domU-xx-... with the internal name of the master node):

<pre>
apt-get install nfs-common ;
echo "domU-xx-xx-xx-xx-xx-xx.compute-1.internal:/home /mnt/home nfs rw 0 0" >> /etc/fstab
/etc/init.d/nfs-common restart
mkdir /mnt/home
mount /mnt/home
ln -s /mnt/home/chimpy /home/chimpy
</pre>

You should now be in business.

Performance tradeoffs should be small as long as you're just sending code files and gems around. *Don't* write out log entries or data to NFS partitions, or you'll effectively perform a denial-of-service attack on the master node.

------------------------------

The "Setting up an NFS Server HOWTO":http://nfs.sourceforge.net/nfs-howto/index.html was an immense help, and I recommend reading it carefully.