ironfan 4.3.4 → 4.4.0

Files changed (66)
  1. data/CHANGELOG.md +7 -0
  2. data/ELB.md +121 -0
  3. data/Gemfile +1 -0
  4. data/Rakefile +4 -0
  5. data/VERSION +1 -1
  6. data/ironfan.gemspec +48 -3
  7. data/lib/chef/knife/cluster_launch.rb +5 -0
  8. data/lib/chef/knife/cluster_proxy.rb +3 -3
  9. data/lib/chef/knife/cluster_sync.rb +4 -0
  10. data/lib/chef/knife/ironfan_knife_common.rb +17 -6
  11. data/lib/chef/knife/ironfan_script.rb +29 -11
  12. data/lib/ironfan.rb +2 -2
  13. data/lib/ironfan/broker/computer.rb +8 -3
  14. data/lib/ironfan/dsl/ec2.rb +133 -2
  15. data/lib/ironfan/headers.rb +4 -0
  16. data/lib/ironfan/provider.rb +48 -3
  17. data/lib/ironfan/provider/ec2.rb +23 -8
  18. data/lib/ironfan/provider/ec2/elastic_load_balancer.rb +239 -0
  19. data/lib/ironfan/provider/ec2/iam_server_certificate.rb +101 -0
  20. data/lib/ironfan/provider/ec2/machine.rb +8 -0
  21. data/lib/ironfan/provider/ec2/security_group.rb +3 -5
  22. data/lib/ironfan/requirements.rb +2 -0
  23. data/notes/Home.md +45 -0
  24. data/notes/INSTALL-cloud_setup.md +103 -0
  25. data/notes/INSTALL.md +134 -0
  26. data/notes/Ironfan-Roadmap.md +70 -0
  27. data/notes/advanced-superpowers.md +16 -0
  28. data/notes/aws_servers.jpg +0 -0
  29. data/notes/aws_user_key.png +0 -0
  30. data/notes/cookbook-versioning.md +11 -0
  31. data/notes/core_concepts.md +200 -0
  32. data/notes/declaring_volumes.md +3 -0
  33. data/notes/design_notes-aspect_oriented_devops.md +36 -0
  34. data/notes/design_notes-ci_testing.md +169 -0
  35. data/notes/design_notes-cookbook_event_ordering.md +249 -0
  36. data/notes/design_notes-meta_discovery.md +59 -0
  37. data/notes/ec2-pricing_and_capacity.md +69 -0
  38. data/notes/ec2-pricing_and_capacity.numbers +0 -0
  39. data/notes/homebase-layout.txt +102 -0
  40. data/notes/knife-cluster-commands.md +18 -0
  41. data/notes/named-cloud-objects.md +11 -0
  42. data/notes/opscode_org_key.png +0 -0
  43. data/notes/opscode_user_key.png +0 -0
  44. data/notes/philosophy.md +13 -0
  45. data/notes/rake_tasks.md +24 -0
  46. data/notes/renamed-recipes.txt +142 -0
  47. data/notes/silverware.md +85 -0
  48. data/notes/style_guide.md +300 -0
  49. data/notes/tips_and_troubleshooting.md +92 -0
  50. data/notes/version-3_2.md +273 -0
  51. data/notes/walkthrough-hadoop.md +168 -0
  52. data/notes/walkthrough-web.md +166 -0
  53. data/spec/fixtures/ec2/elb/snakeoil.crt +35 -0
  54. data/spec/fixtures/ec2/elb/snakeoil.key +51 -0
  55. data/spec/integration/minimal-chef-repo/chefignore +41 -0
  56. data/spec/integration/minimal-chef-repo/environments/_default.json +12 -0
  57. data/spec/integration/minimal-chef-repo/knife/credentials/knife-org.rb +19 -0
  58. data/spec/integration/minimal-chef-repo/knife/credentials/knife-user-ironfantester.rb +9 -0
  59. data/spec/integration/minimal-chef-repo/knife/knife.rb +66 -0
  60. data/spec/integration/minimal-chef-repo/roles/systemwide.rb +10 -0
  61. data/spec/integration/spec/elb_build_spec.rb +95 -0
  62. data/spec/integration/spec_helper.rb +16 -0
  63. data/spec/integration/spec_helper/launch_cluster.rb +55 -0
  64. data/spec/ironfan/ec2/elb_spec.rb +95 -0
  65. data/spec/ironfan/ec2/security_group_spec.rb +0 -6
  66. metadata +60 -3
@@ -0,0 +1,85 @@
1
+ # Silverware Chef Cookbook
2
+
3
+ ## Overview
4
+
5
+ Cookbooks repeatably express these and other aspects:
6
+
7
+ * "I launch these daemons: ..."
8
+ * "I have a collection of logs at '/var/log/lol'"
9
+ * "I have a dashboard at 'http://....:...'"
10
+ * ... and much more.
11
+
12
+ Wouldn't it be nice if announcing a log directory caused...
13
+
14
+ - my log rotation system to start rotating my logs?
15
+ - a 'disk free space' gauge to be added to the monitoring dashboard for that service?
16
+ - Flume (or whatever) to begin picking up my logs and archiving them to a predictable location?
17
+ - in the case of standard apache logs, a listener to start counting the rate of requests, 200s, 404s and so forth?
18
+ Similarly, announcing ports should mean
19
+ - the firewall and security groups configure themselves correspondingly
20
+ - the monitor system starts regularly pinging the port for uptime and latency
21
+ - and pings the interfaces that it should *not* appear on to ensure the firewall is in place?
22
+
23
+ Ironfan makes those aspects standardized and predictable, and provides integration and discovery hooks. The key is to make integration *inevitable*: No more forgetting to rotate or monitor a service, or having a config change over here screw up a dependent system over there.
24
+ ________________________________________________________________________
25
+
26
+ Attributes are scoped by *cookbook* and then by *component*.
27
+
28
+ * If I declare `announce(:redis)`, it will look in `node[:redis]`.
29
+ * If I declare `announce(:hadoop, :namenode)`, it will look in `node[:hadoop]` for cookbook-wide concerns and `node[:hadoop][:namenode]` for component-specific concerns.
30
+ * The cookbook scope is always named for its cookbook. Its attributes live in `node[:cookbook_name]`. If everything in the cookbook shares a concern, it sits at cookbook level. So the Hadoop log directory (shared by all its components) is at `(scratch_root)/hadoop/log`.
31
+ * If there is only one component, it can be implicitly named for its cookbook. In this case, it is omitted: the component attributes live in `node[:cookbook_name]` (which is the same as the component name).
32
+ * If there are multiple components, they will live in `node[:cookbook_name][:component_name]` (eg `[:hadoop][:namenode]` or `[:flume][:master]`).
33
+
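As a rough illustration of the scoping rules above, here is a plain-Ruby sketch that needs no Chef. The `scoped_attributes` helper and the sample hash are hypothetical and not part of silverware. Merging the cookbook-wide values into the component's view is an assumption made for the example.

```ruby
# Hypothetical helper (not part of silverware): returns the attributes a
# component would see, i.e. cookbook-wide concerns plus its own.
def scoped_attributes(node, cookbook, component = nil)
  cookbook_attrs = node.fetch(cookbook, {})
  # Assumption: nested hashes at cookbook level are components,
  # everything else is a cookbook-wide concern.
  shared = cookbook_attrs.reject { |_key, value| value.is_a?(Hash) }
  return shared if component.nil?

  # Component-specific values win over cookbook-wide ones.
  shared.merge(cookbook_attrs.fetch(component, {}))
end

node = {
  hadoop: {
    log_dir: '/mnt/hadoop/log',                 # shared by all components
    namenode: { service_state: [:enable, :start] }
  },
  redis: { port: 6379 }                         # single component, named for its cookbook
}

p scoped_attributes(node, :redis)
p scoped_attributes(node, :hadoop, :namenode)
```

The single-component case (`:redis`) reads straight from the cookbook scope, which mirrors the "implicitly named for its cookbook" rule.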
34
+ ### Discovery
35
+
36
+ Allow nodes to discover the location for a given service at runtime, adapting when new services register.
37
+
38
+ #### Operations:
39
+
40
+ * register for a service. A timestamp records the last registry.
41
+ * discover all chef nodes that have registered for the given service.
42
+ * discover the most recent chef node for that service.
43
+ * get the 'public_ip' for a service -- the address that nodes in the larger world should use
44
+ * get the 'private_ip' for a service -- the address that nodes on the local subnet / private cloud should use
45
+
46
+ #### Implementation
47
+
48
+ Nodes register a service by calling `announce(<service>[,<component>])`, which adds a hash to `node[:announces][<service>][<component>]`, containing 'timestamp' (the time of registry) and other metadata passed in. Nodes discover services by calling `discover(<service>[,<component>[,<realm>]])`, where realm is the scope of the discovery (the current cluster, by default).
49
+
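The register/discover flow described above can be modelled in a few lines of plain Ruby. This is a toy in-memory sketch, not silverware's implementation: the `Node` struct, the `now:` parameter and the `discover_all` name are invented for illustration. It also simplifies the realm default to "all nodes", whereas the text says the default is the current cluster.

```ruby
# Toy stand-in for a Chef node: a name, its cluster, and an attribute hash.
Node = Struct.new(:name, :cluster, :attrs)

# Records the announcement under attrs[:announces][service][component].
# Assumption: with no component given, the component is named for the service.
def announce(node, service, component = service, metadata = {}, now: Time.now)
  announces = (node.attrs[:announces] ||= {})
  (announces[service] ||= {})[component] = metadata.merge(timestamp: now)
end

# All nodes that announced the service, oldest registration first.
# A nil realm means "do not filter by cluster" (a simplification).
def discover_all(nodes, service, component = service, realm = nil)
  matches = nodes.select do |n|
    (realm.nil? || n.cluster == realm) &&
      n.attrs.dig(:announces, service, component)
  end
  matches.sort_by { |n| n.attrs[:announces][service][component][:timestamp] }
end

# The most recently registered node for the service, or nil if none.
def discover(nodes, service, component = service, realm = nil)
  discover_all(nodes, service, component, realm).last
end

a = Node.new('gibbon-namenode-0', 'gibbon', {})
b = Node.new('gibbon-namenode-1', 'gibbon', {})
announce(a, :hadoop, :namenode, { port: 8020 }, now: Time.at(100))
announce(b, :hadoop, :namenode, { port: 8020 }, now: Time.at(200))
puts discover([a, b], :hadoop, :namenode, 'gibbon').name
```

Keeping the timestamp inside the announced hash is what makes "most recent node" a simple sort. The port value is only sample metadata.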
50
+ ## Recipes
51
+
52
+ * `default` - Base configuration for silverware
53
+
54
+ ## Integration
55
+
56
+ Supports platforms: Debian and Ubuntu
57
+
58
+
59
+
60
+ ## Attributes
61
+
62
+ * `[:silverware][:conf_dir]` - (default: "/etc/silverware")
63
+ * `[:silverware][:log_dir]` - (default: "/var/log/silverware")
64
+ * `[:silverware][:home_dir]` - (default: "/etc/silverware")
65
+ * `[:silverware][:user]` - (default: "root")
66
+ * `[:users][:root][:primary_group]` - (default: "root")
67
+
68
+ ## License and Author
69
+
70
+ Author:: Philip (flip) Kromer - Infochimps, Inc (<coders@infochimps.com>)
71
+ Copyright:: 2011, Philip (flip) Kromer - Infochimps, Inc
72
+
73
+ Licensed under the Apache License, Version 2.0 (the "License");
74
+ you may not use this file except in compliance with the License.
75
+ You may obtain a copy of the License at
76
+
77
+ http://www.apache.org/licenses/LICENSE-2.0
78
+
79
+ Unless required by applicable law or agreed to in writing, software
80
+ distributed under the License is distributed on an "AS IS" BASIS,
81
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
82
+ See the License for the specific language governing permissions and
83
+ limitations under the License.
84
+
85
+ > readme generated by [ironfan](http://github.com/infochimps-labs/ironfan)'s cookbook_munger
@@ -0,0 +1,300 @@
1
+ # Ironfan + Chef Style Guide
2
+
3
+ ------------------------------------------------------------------------
4
+
5
+ ### System+Component define Names
6
+
7
+ Name things uniformly for their system and component. For the ganglia master,
8
+
9
+ * attributes: `node[:ganglia][:master]`
10
+ * recipe: `ganglia::master`
11
+ * role: `ganglia_master`
12
+ * directories: `ganglia/master` (if specific to component), `ganglia` (if not).
13
+ - for example: `/var/log/ganglia/master`
14
+
15
+ ### Component names
16
+
17
+ * `agent.rb`
18
+ * `worker.rb`
19
+ * `datanode.rb`
20
+ * `webnode.rb`
21
+
22
+
23
+ ### Recipes
24
+
25
+ Recipes partition these things:
26
+
27
+ * shared functionality between components
28
+ * proper event order
29
+ * optional or platform-specific functionality
30
+
31
+ * Within the foo cookbook, name your recipes like this:
32
+ - `default.rb` -- information shared by anyone using foo, including support packages, users and directories.
33
+ - `user.rb` -- define daemon users. Called 'user' even if there is more than one. It's OK to move this into `default.rb`.
34
+ - `install_from_X.rb` -- install packages (`install_from_package`), versioned tarballs (`install_from_release`). It's OK to move this into `default.rb`.
35
+ - `deploy.rb` -- use this when doing sha-versioned deploys.
36
+ - `plugins.rb` -- install additional plugins or support code. If you have separate plugins, name them `git_plugin`, `rspec_plugin`, etc.
37
+ - `server.rb` -- define the foo server process. Similarly, `agent`, `worker`, etc -- see component naming above.
38
+ - `client.rb` -- install libraries to *use* the foo service.
39
+ - `config_files.rb` -- discover other components, write final configuration to disk
40
+ - `finalize.rb` -- final cleanup
41
+
42
+ * Do not repeat the cookbook name in a recipe title: `ganglia::master`, not `ganglia::ganglia_master`.
43
+ * Use only `[a-z0-9_]` for cookbook and component names. Do not use capital letters or hyphens.
44
+ * Keep names short and descriptive (preferably 15 characters or less, or it jacks with the Chef webui).
45
+
46
+ * Always include a `default.rb` recipe, even if it is blank.
47
+ * *DO NOT* use the default recipe to install daemons or do anything interesting at all, even if that's currently the only thing the cookbook does. I want to be able to refer to the attributes in the apache cookbook without launching the apache service. Think of it like a C header file.
48
+
49
+ A `client` is also passive -- it lets me *use* the system without requiring that I run it. This means the client recipe should *never* launch a process (the `chef_client` and `nfs_client` components are allowed exceptions).
50
+
51
+ ### Cookbook Dependencies
52
+
53
+ * Dependencies should be announced in metadata.rb, of course.
54
+ * Explicitly `include_recipe` for system resources -- `runit`, `java`, `silverware`, `thrift` and `apt`.
55
+ - never
56
+ * *DO NOT* use `include_recipe` unless putting it in the role would be utterly un-interesting. You *want* the run to break unless it's explicitly included in the role.
57
+ - *yes*: `java`, `ruby`, `announces`, etc.
58
+ - *no*: `zookeeper::client`, `nfs::server`, or anything that will start a daemon
59
+ Remember: ordinary cookbooks describe systems; roles and integration cookbooks coordinate them.
60
+ * `include_recipe` statements should only appear in recipes that are entry points. Recipes that are not meant to be called directly should assume their dependencies have been met.
61
+ * If a recipe is meant to be the primary entrypoint, it *should* include default, and it should do so explicitly: `include_recipe 'foo::default'` (not just 'foo').
62
+
63
+ Crisply separate cookbook-wide concerns from component concerns.
64
+
65
+ Separate system configuration from multi-system integration. Cookbooks should provide hooks that are neighborly but not exhibitionist, and otherwise mind their own business.
66
+
67
+ ### Templates
68
+
69
+ *DO NOT* refer to attributes directly on the node (`node[:foo]`). This prevents people from using those templates outside the cookbook. Instead:
70
+
71
+ ```ruby
72
+ # in recipe
73
+ template 'fooconf.yml' do
74
+ variables :foo => node[:foo]
75
+ end
76
+
77
+ # in template
78
+ @foo[:log_dir]
79
+ ```
80
+
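To see why passing values through `variables` keeps a template portable, the following sketch imitates the mechanism with Ruby's standard `ERB` library. The `TemplateContext` class is a hypothetical stand-in for Chef's template rendering, not Chef's own code. It assumes only that each entry in `variables` becomes an instance variable inside the template.

```ruby
require 'erb'

# Hypothetical stand-in for Chef's template context: every entry in
# `variables` becomes an instance variable visible to the ERB source.
class TemplateContext
  def initialize(variables)
    variables.each { |name, value| instance_variable_set("@#{name}", value) }
  end

  # Evaluates the ERB source with this object's instance variables in scope.
  def render(source)
    ERB.new(source).result(binding)
  end
end

node   = { foo: { log_dir: '/var/log/foo' } }
source = "log_dir: <%= @foo[:log_dir] %>\n"

# The template only knows about @foo, never about the node itself.
puts TemplateContext.new(foo: node[:foo]).render(source)
```

Because the template reads `@foo` rather than the node, another cookbook can reuse it by handing in any hash with a `:log_dir` key.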
81
+ ### Attributes
82
+
83
+ * Scope concerns by *cookbook* or *cookbook and component*. `node[:hadoop]` holds cookbook-wide concerns, `node[:hadoop][:namenode]` holds component-specific concerns.
84
+ * Attributes shared by all components sit at cookbook level, and are always named for the cookbook: `node[:hadoop][:log_dir]` (since it is shared by all its components).
85
+ * Component-specific attributes sit at component level (`node[:cookbook_name][:component_name]`): eg `node[:hadoop][:namenode][:service_state]`. Do not use a prefix (NO: `node[:hadoop][:namenode_handler_count]`)
86
+
87
+ * Refer to node attributes by symbol, never by method:
88
+ - `node[:ganglia][:log_dir]`, not `node.ganglia.log_dir` or `node['ganglia']['log_dir']`
89
+
90
+ #### Attribute Files
91
+
92
+ * The main attribute file should be named `attributes/default.rb`. Do not name the file after the cookbook, or anything else.
93
+ * If there are a sizeable number of tunable attributes (hadoop, cassandra), place them in `attributes/tuneables.rb`.
94
+
95
+ ## Name Attributes for their aspects
96
+
97
+ Attributes should be named for their aspect: `port`, `log`, etc. Use generic names if there is only one attribute for an aspect, prefixed names if there are many:
98
+ - For a component that only opens one port: `node[:foo][:server][:port]`
99
+ - More than one port, use a prefix: `node[:foo][:server][:dash_port]` and `node[:foo][:server][:rpc_port]`.
100
+
101
+ Sometimes the conventions below are inappropriate. All we ask is in those cases that you *not* use the special magic name. For example, don't use `:port` and give it a comma-separated string; name it something else, like `:port_list`.
102
+
103
+ Here are specific conventions:
104
+
105
+ ### File and Dir Aspects
106
+
107
+ A *file* is the full directory and basename for a file. A *dir* is a directory whose contents correspond to a single concern. A *prefix* is not intended to be used directly -- it will be decorated with suffixes to form dirs and files. A *basename* is only the leaf part of a file reference. Don't use the terms 'path' or 'filename'.
108
+
109
+ Ignore the temptation to make a one-true-home-for-my-system, or to fight the package maintainer's choices. (FIXME: Rewrite to encourage OS-correct naming schemas.)
110
+ - a sandbox holding dir, pid, log, ...
111
+
112
+ #### Application
113
+
114
+ * **prefix**: A container with directories bin, lib, share, src, to use according to convention
115
+ - default: `/usr/local`.
116
+ * **home_dir**: Logical location for the cookbook's system code.
117
+ - default: typically, leave it up to the package maintainer. Otherwise, `:prefix/share/:cookbook` should be a symlink to the `install_dir` (see below).
118
+ - instead of: `xx_home` / `dir` alone / `install_dir`
119
+ * **install_dir**: The cookbook's system code, in case the home dir is a pointer to potential alternates.
120
+ - default: `:prefix/share/:cookbook-:version` (if you don't need the directory after the cookbook runs, use `:prefix/src/:cookbook-:version` instead, eg `/usr/local/src/tokyo_tyrant-xx.xx`)
121
+ - Make `home_dir` a symlink to this directory (eg home_dir `/usr/local/share/elasticsearch` links to install_dir `/usr/local/share/elasticsearch-0.17.8`).
122
+ * **src_dir**: holds the compressed tarball, its expanded contents, and the compiled files when installing from source. Use this when you will run `make install` or equivalent and use the files elsewhere.
123
+ - default: `:prefix/src/:system_name-:version`, eg `/usr/local/src/pig-0.9.tar.gz`
124
+ - do not: expand the tarball to `:prefix/src/(whatever)` if it will actually be used from there; instead, use the `install_dir` convention described above. (As a guideline, I should be able to blow away `/usr/local/src` and everything still works).
125
+ * **deploy_dir**: deployed code that follows the capistrano convention. See more about deploy variables below.
126
+ - the `:deploy_dir/shared` directory holds common files
127
+ - releases are checked out to `:deploy_dir/releases/{sha}`
128
+ - the operational release is a symlink to the right release: `:deploy_dir/current -> :deploy_dir/releases/xxx`.
129
+ - do not: use this when you mean `home_dir`.
130
+
131
+ * **scratch_roots**, **persistent_roots**: an array of directories spread across volumes, with expectations on persistence
132
+ - `scratch_root`s have no guarantee of persistence -- for example, stop/start'ing a machine on EC2 destroys the contents of its local (ephemeral) drives. `persistent_root`s have the *best available* promise of persistence: if permanent (eg EBS) volumes are available, they will exclusively populate the `persistent_root`s; but if not, the ephemeral drives are used instead.
133
+ - these attributes are provided by the `mountable_volume` meta-cookbook and its appropriate integration recipe. Ordinary cookbooks should always trust the integration cookbook's choices (or visit the integration cookbook to correct them).
134
+ - each element in `persistent_roots` is by contract on a separate volume, and similarly each of the `scratch_roots` is on a separate volume. A volume *may* be in both scratch and persistent (for example, there may be only one volume!).
135
+ - the singular forms **scratch_root** and **persistent_root** are provided for your convenience and always correspond to `scratch_roots.first` and `persistent_roots.first`. This means the first named volume gets picked on the heaviest -- if you don't like that, choose explicitly (but not randomly, or you won't be idempotent).
136
+
137
+
138
+ * **log_file**, **log_dir**, **xx_log_file**, **xx_log_dir**:
139
+ - default:
140
+ - if the log files will always be trivial in size, put them in `/var/log/:cookbook.log` or `/var/log/:cookbook/(whatever)`.
141
+ - if it's a runit-managed service, leave them in `/etc/sv/:cookbook-:component/log/main/current`, and make a symlink from `/var/log/:cookbook-:component` to `/etc/sv/:cookbook-:component/log/main/`.
142
+ - If the log files are non-trivial in size, set log dir `/:scratch_root/:cookbook/log/`, and symlink `/var/log/:cookbook/` to it.
143
+ - If the log files should be persisted, place them in `/:persistent_root/:cookbook/log`, and symlink `/var/log/:cookbook/` to it.
144
+ - in all cases, the directory is named `.../log`, not `.../logs`. Never put things in `/tmp`.
145
+ - Use the physical location for the `log_dir` attribute, not the /var/log symlink.
146
+ * **tmp_dir**:
147
+ - default: `/:scratch_root/:cookbook/tmp/`
148
+ - Do not put a symlink or directory in `/tmp` -- something else blows it away, the app recreates it as a physical directory, `/tmp` overflows, pagers go off, sadness spreads throughout the land.
149
+ * **conf_dir**:
150
+ - default: `/etc/:cookbook`
151
+ * **bin_dir**:
152
+ - default: `/:home_dir/bin`
153
+ * **pid_file**, **pid_dir**:
154
+ - default: pid_file: `/var/run/:cookbook.pid` or `/var/run/:cookbook/:component.pid`; pid_dir: `/var/run/:cookbook/`
155
+ - instead of: `job_dir`, `job_file`, `pidfile`, `run_dir`.
156
+ * **cache_dir**:
157
+ - default: `/var/cache/:cookbook`.
158
+
159
+ * **data_dir**:
160
+ - default: `:persistent_root/:cookbook/:component/data`
161
+ - instead of: `datadir`, `dbfile`, `dbdir`
162
+ * **journal_dir**: high-speed local storage for commitlogs and so forth. Can be deleted, though you may rather it wasn't.
163
+ - default: `:scratch_root/:cookbook/:component/scratch`
164
+ - instead of: `commitlog_dir`
165
+
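The `scratch_roots` / `persistent_roots` contract described earlier in this section can be sketched as follows. This is not the `mountable_volume` cookbook's code: the `Volume` struct and both helper functions are invented for illustration. The fallback for scratch space on a machine with only persistent volumes is an assumption.

```ruby
# Hypothetical description of a mounted volume.
Volume = Struct.new(:mount_point, :persistent)

# Persistent (eg EBS) volumes exclusively populate persistent_roots when any
# exist; otherwise the ephemeral drives are used instead.
def persistent_roots(volumes)
  durable = volumes.select(&:persistent)
  (durable.empty? ? volumes : durable).map(&:mount_point)
end

# Assumption: scratch space prefers ephemeral drives and falls back to
# whatever exists, so a single-volume machine appears in both lists.
def scratch_roots(volumes)
  ephemeral = volumes.reject(&:persistent)
  (ephemeral.empty? ? volumes : ephemeral).map(&:mount_point)
end

volumes = [Volume.new('/mnt', false), Volume.new('/data/ebs1', true)]

# The singular forms are simply the first element of each list.
persistent_root = persistent_roots(volumes).first
scratch_root    = scratch_roots(volumes).first

puts File.join(persistent_root, 'hadoop', 'namenode', 'data') # data_dir convention
puts File.join(scratch_root, 'hadoop', 'log')                 # log_dir convention
```

Choosing `.first` deterministically, rather than picking a root at random, is what keeps repeated runs idempotent.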
166
+ ### Daemon Aspects
167
+
168
+ * **daemon_name**: daemon's actual service name, if it differs from the component. For example, the `hadoop-namenode` component's daemon is `hadoop-0.20-namenode` as installed by apt.
169
+ * **daemon_states**: an array of the verbs acceptable to the Chef `service` resource: `:enable`, `:start`, etc.
170
+ * **num_xx_processes**, **num_xx_threads** the number of separate top-level processes (distinct PIDs) or internal threads to run
171
+ - instead of `num_workers`, `num_servers`, `worker_processes`, `foo_threads`.
172
+ * **log_level**
173
+ - application-specific; often takes values info, debug, warn
174
+ - instead of `verbose`, `verbosity`, `loglevel`
175
+ * **user**, **group**, **uid**, **gid** -- `user` is the user name. The `user` and `group` should be strings; the `uid` and `gid` should be integers.
176
+ - instead of username, group_name, using uid for user name or vice versa.
177
+ - if there are multiple users, use a prefix: `launcher_user` and `observer_user`.
178
+
179
+ ### Install / Deploy Aspects
180
+
181
+ * **release_url**: URL for the release.
182
+ - instead of: install_url, package_url, being careless about partial vs whole URLs
183
+ * **release_file**: Where to put the release.
184
+ - default: `:prefix/src/system_name-version.ext`, eg `/usr/local/src/elasticsearch-0.17.8.tar.bz2`.
185
+ - do not use `/tmp` -- let me decide when to blow it away (and make it easy to be idempotent).
186
+ - do not use a non-versioned URL or file name.
187
+ * **release_file_sha** or **release_file_md5** fingerprint
188
+ - instead of: `whatever_checksum`, `whatever_fingerprint`
189
+ * **version**: if it's a simply-versioned resource that uses the `major.minor.patch-cruft` convention. Do not use unless this is true, and do not use the source control revision ID.
190
+
191
+ * **plugins**: array of system-specific plugins
192
+
193
+ use `deploy_{}` for anything that would be true whatever SCM you're using; use `git_{}` (and so forth) where specific to that repo.
194
+
195
+ * **deploy_env** production / staging / etc
196
+ * **deploy_strategy**
197
+ * **deploy_user** user to run as
198
+ * **deploy_dir**: Only use `deploy_dir` if you are following the capistrano convention: see above.
199
+
200
+ * **git_repo**: url for the repo, eg `git@github.com:infochimps-labs/ironfan.git` or `http://github.com/infochimps-labs/ironfan.git`
201
+ - instead of: `deploy_repo`, `git_url`
202
+ * **git_revision**: SHA or branch
203
+ - instead of: `deploy_revision`
204
+
205
+ * **apt/(repo_name)** Options for adding a cookbook's apt repo.
206
+ - Note that this is filed under *apt*, not the cookbook.
207
+ - Use the best name for the repo, which is not necessarily the cookbook's name: eg `apt/cloudera/{...}`, which is shared by hadoop, flume, pig, and so on.
208
+ - `apt/{repo_name}/url` -- eg `http://archive.cloudera.com/debian`
209
+ - `apt/{repo_name}/key` -- GPG key
210
+ - `apt/{repo_name}/force_distro` -- forces the distro (eg, you are on natty but the apt repo only has maverick)
211
+
212
+ ### Ports
213
+
214
+ * **xx_port**:
215
+ - *do not* use 'port' on its own.
216
+ - examples: `thrift_port`, `webui_port`, `zookeeper_port`, `carbon_port` and `whisper_port`.
217
+ - xx_port: `default[:foo][:server][:port] = 5000`
218
+ - xx_ports, if an array: `default[:foo][:server][:ports] = [5000, 5001, 5002]`
219
+
220
+ * **addr**, **xx_addr**
221
+ - if all ports bind to the same interface, use `addr`. Otherwise, do *not* use `addr`, and use a unique `foo_addr` for each `foo_port`.
222
+ - instead of: `hostname`, `binding`, `address`
223
+
224
+ * Want some way to announce my port is http or https.
225
+ * Need to distinguish client ports from service ports. You should be using cluster service discovery anyway though.
226
+
227
+ ### Application Integration
228
+
229
+ * **jmx_port**
230
+
231
+ ### Tunables
232
+
233
+ * **xx_heap_max**, **xx_heap_min**, **java_heap_eden**
234
+ * **java_home**
235
+ * AVOID batch declaration of options (e.g. **java_opts**) if possible: assemble it in your recipe from intelligible attribute names.
236
+
237
+ ### Nitpicks
238
+
239
+ * Always put file modes in quote marks: `mode "0664"` not `mode 0664`.
240
+
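The guide does not give its reason for quoting file modes. One plausible reading is that a bare integer mode is easy to get wrong: Ruby treats a leading zero as an octal literal, so dropping the zero silently changes the permission bits. The `mode_bits` helper below is hypothetical and only demonstrates the arithmetic.

```ruby
# Hypothetical helper: normalizes a mode to its integer permission bits.
# Strings are parsed as octal; integers are taken as already-parsed values.
def mode_bits(mode)
  mode.is_a?(String) ? mode.to_i(8) : mode
end

puts mode_bits('0664') == mode_bits(0664) # quoted and octal-literal forms agree
puts mode_bits(0664)                      # octal 664 is decimal 436
puts mode_bits(664).to_s(8)               # decimal 664 is octal 1230, not rw-rw-r--
```

A quoted string reads the same way to a person and to the parser, which removes the leading-zero trap.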
241
+ ## Announcing Aspects
242
+
243
+ If your app does any of the following,
244
+
245
+ * **services** -- Any interesting long-running process.
246
+ * **ports** -- Any reserved open application port
247
+ - *http*: HTTP application port
248
+ - *https*: HTTPS application port
249
+ - *internal*: port is on private IP, should *not* be visible through public IP
250
+ - *external*: port *is* available through public IP
251
+ * metric_ports:
252
+ - **jmx_ports** -- JMX diagnostic port (announced by many Java apps)
253
+ * **dashboards** -- Web interface to look inside a system; typically internal-facing only, and probably not performance-monitored by default.
254
+ * **logs** -- um, logs. You can also announce the logs' flavor: `:apache`, `log4j`, etc.
255
+ * **scheduleds** -- regularly-occurring events that leave a trace
256
+ * **exports** -- jars or libs that other programs may wish to incorporate
257
+ * **consumes** -- placed there by any call to `discover`.
258
+
259
+ ## Clusters
260
+
261
+ * Describe physical configuration:
262
+ - machine size, number of instances per facet, etc
263
+ - external assets (elastic IP, ebs volumes)
264
+ * Describe high-level assembly of systems via roles: `hadoop_namenode`, `nfs_client`, `ganglia_agent`, etc.
265
+ * Describe important modifications, such as `ironfan::system_internals`, mounts ebs volumes, etc
266
+ * Describe override attributes:
267
+ - `heap size`, rvm versions, etc.
268
+
269
+ * roles and recipes
270
+ - remove `cluster_role` and `facet_role` if empty
271
+ - are not in `run_list`, but populated by the `role` and `recipe` directives
272
+ * remove big_package unless it's a dev machine (sandbox, etc)
273
+
274
+ ## Roles
275
+
276
+ Roles define the high-level assembly of recipes into systems
277
+
278
+ * override attributes go into the cluster.
279
+ currently, those files are typically empty and are badly cluttering the roles/ directory.
280
+ the cluster and facet override attributes should be together, not scattered in different files.
281
+ roles shouldn't assemble systems. The contents of the infochimps_chef/roles/plato_truth.rb file belong in a facet.
282
+
283
+ * Deprecated:
284
+ - Cluster and facet roles (`roles/gibbon_cluster.rb`, `roles/gibbon_namenode.rb`, etc) go away
285
+ - Roles should be service-oriented: `hadoop_master` considered harmful, you should explicitly enumerate the services
286
+
287
+
288
+ ### Facets should be (nearly) identical
289
+
290
+ Within a facet, keep your servers almost entirely identical. For example, servers in a MySQL facet would use their index to set shard order and to claim the right attached volumes. However, it would be a mistake to have one server within a facet be a master process and the rest be worker processes -- just define different facets for each.
291
+
292
+ ### Pedantic Distinctions:
293
+
294
+ Separate the following terms:
295
+
296
+ * A *machine* is a concrete thing that runs your code -- it might be a VM or raw metal, but it has CPUs and fans and a finite lifetime. It has a unique name tied to its physical presence -- something like 'i-123abcd' or 'rack 4 server 7'.
297
+ * A *chef node* is the code object that, together with the chef-client process, configures a machine. In ironfan, the chef node is strictly slave to the server description and the measured attributes of the machine.
298
+ * A *server description* gives the high-level specification the machine should achieve. This includes the roles, recipes and attributes given to the chef node; the physical characteristics of the machine ('8 cores, 7GB ram, AWS cloud'); and its relation to the rest of the system (george cluster, webnode facet, index 3).
299
+
300
+ In particular, we try to be careful to always call a Chef node a 'chef node' (never just 'node'). Try processing graph nodes in a flume node feeding a node.js decorator on a cloud node defined by a chef node. No(de) way.
@@ -0,0 +1,92 @@
1
+ ## Tips and Notes
2
+
3
+ ### Gems
4
+
5
+ knife cluster ssh bonobo-worker-2 'sudo gem update --system'
6
+ knife cluster ssh bonobo-worker-2 'sudo true ; for foo in /usr/lib/ruby/gems/1.9.2-p290/specifications/* ; do sudo sed -i.bak "s!000000000Z!!" $foo ; done'
7
+ knife cluster ssh bonobo-worker-2 'sudo true ; for foo in /usr/lib/ruby/site_ruby/*/rubygems/deprecate.rb ; do sudo sed -i.bak "s!@skip ||= false!true!" $foo ; done'
8
+
9
+
10
+ ### EC2 Notes: Instance attributes `disable_api_termination` and `delete_on_termination`
11
+
12
+ To set `delete_on_termination` to 'true' after the fact, run the following (modify the instance and volume to suit):
13
+
14
+ ```
15
+ ec2-modify-instance-attribute -v i-0704be6c --block-device-mapping /dev/sda1=vol-XX8d2c80::true
16
+ ```
17
+
18
+ If you set `disable_api_termination` to true, in order to terminate the node run
19
+ ```
20
+ ec2-modify-instance-attribute -v i-0704be6c --disable-api-termination false
21
+ ```
22
+
23
+ To view whether an attached volume is deleted when the machine is terminated:
24
+
25
+ ```
26
+ # show volumes that will be deleted
27
+ ec2-describe-volumes --filter "attachment.delete-on-termination=true"
28
+ ```
29
+
30
+ You can't (as far as I know) alter the delete-on-termination flag of a running volume. Crazy, huh?
31
+
32
+ ### EC2: See your userdata
33
+
34
+ curl http://169.254.169.254/latest/user-data
35
+
36
+ ### EBS Volumes for a persistent HDFS
37
+
38
+ * Make one volume and format for XFS:
39
+ `$ sudo mkfs.xfs -f /dev/sdh1`
40
+ * options "defaults,nouuid,noatime" give good results. The 'nouuid' part
41
+ prevents errors when mounting multiple volumes from the same snapshot.
42
+ * poke a file onto the drive :
43
+ datename=`date +%Y%m%d`
44
+ sudo bash -c "(echo $datename ; df /data/ebs1 ) > /data/ebs1/xfs-created-at-$datename.txt"
45
+
46
+
47
+ If you want to grow the drive:
48
+ * take a snapshot.
49
+ * make a new volume from it
50
+ * mount that, and run `sudo xfs_growfs`. You *should* have the volume mounted, and should stop anything that would be working the volume hard.
51
+
52
+ ### Hadoop: On-the-fly backup of your namenode metadata
53
+
54
+ bkupdir=/ebs2/hadoop-nn-backup/`date +"%Y%m%d"`
55
+
56
+ for srcdir in /ebs*/hadoop/hdfs/ /home/hadoop/gibbon/hdfs/ ; do
57
+ destdir=$bkupdir/$srcdir ; echo $destdir ;
58
+ sudo mkdir -p $destdir ;
59
+ done
60
+
61
+
62
+ ### NFS: Halp I am using an NFS-mounted /home and now I can't log in as ubuntu
63
+
64
+ Say you set up an NFS server 'core-homebase-0' (in the 'core' cluster) to host and serve out `/home` directory; and a machine 'awesome-webserver-0' (in the 'awesome' cluster), that is an NFS client.
65
+
66
+ In each case, when the machine was born EC2 created a `/home/ubuntu/.ssh/authorized_keys` file listing only the single approved machine keypair -- 'core' for the core cluster, 'awesome' for the awesome cluster.
67
+
68
+ When chef client runs, however, it mounts the NFS share at /home. This then masks the actual /home directory -- nothing that's on the base directory tree shows up. Which means that after chef runs, the /home/ubuntu/.ssh/authorized_keys file on awesome-webserver-0 is the one for the *'core'* cluster, not the *'awesome'* cluster.
69
+
70
+ The solution is to use the cookbook ironfan provides -- it moves the 'ubuntu' user's home directory to an alternative path not masked by the NFS.
71
+
72
+
73
+ ### NFS: Problems starting NFS server on ubuntu maverick
74
+
75
+ For problems starting NFS server on ubuntu maverick systems, read, understand and then run /tmp/fix_nfs_on_maverick_amis.sh -- See [this thread for more](http://fossplanet.com/f10/[ec2ubuntu]-not-starting-nfs-kernel-daemon-no-support-current-kernel-90948/)
76
+
77
+
78
+ ### Git deploys: My git deploy recipe has gone limp
79
+
80
+ Suppose you are using the `git` resource to deploy a recipe (`george` for sake of example). If `/var/chef/cache/revision_deploys/var/www/george` exists then *nothing* will get deployed, even if `/var/www/george/{release_sha}` is empty or screwy. If git deploy is acting up in any way, nuke that cache from orbit -- it's the only way to be sure.
81
+
82
+ $ sudo rm -rf /var/www/george/{release_sha} /var/chef/cache/revision_deploys/var/www/george
83
+
84
+ ### Runit services : 'fail: XXX: unable to change to service directory: file does not exist'
85
+
86
+ Your service is probably installed but removed from runit's purview; check the `/etc/service` symlink. All of the following should be true:
87
+
88
+ * directory `/etc/sv/foo`, containing file `run` and dirs `log` and `supervise`
89
+ * `/etc/init.d/foo` is symlinked to `/usr/bin/sv`
90
+ * `/etc/service/foo` is symlinked to `/etc/sv/foo`
91
+
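The checklist above is easy to automate. The sketch below is a hypothetical helper, not something ironfan ships. It only verifies that the expected files, directories and symlinks exist, and does not check where the symlinks point. The `root:` parameter is an addition so the checks can run against a scratch tree.

```ruby
require 'fileutils'
require 'tmpdir'

# Returns a list of human-readable problems; an empty list means every
# check passed. `root` defaults to the real filesystem root.
def runit_problems(service, root: '/')
  sv_dir   = File.join(root, 'etc/sv', service)
  init_d   = File.join(root, 'etc/init.d', service)
  svc_link = File.join(root, 'etc/service', service)
  problems = []

  problems << "#{sv_dir} is not a directory" unless File.directory?(sv_dir)
  problems << "#{sv_dir}/run is missing" unless File.file?(File.join(sv_dir, 'run'))
  %w[log supervise].each do |dir|
    problems << "#{sv_dir}/#{dir} is missing" unless File.directory?(File.join(sv_dir, dir))
  end
  problems << "#{init_d} is not a symlink" unless File.symlink?(init_d)
  problems << "#{svc_link} is not a symlink" unless File.symlink?(svc_link)
  problems
end

# Demonstration against a throwaway tree that lacks the /etc/service link.
Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, 'etc/sv/foo/log'))
  FileUtils.mkdir_p(File.join(root, 'etc/sv/foo/supervise'))
  FileUtils.touch(File.join(root, 'etc/sv/foo/run'))
  FileUtils.mkdir_p(File.join(root, 'etc/init.d'))
  File.symlink('/usr/bin/sv', File.join(root, 'etc/init.d/foo'))
  puts runit_problems('foo', root: root)
end
```

On a real machine, `runit_problems('foo')` with the default root reports which of the three conditions is not met.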
92
+