ironfan 4.3.4 → 4.4.0

Files changed (66)
  1. data/CHANGELOG.md +7 -0
  2. data/ELB.md +121 -0
  3. data/Gemfile +1 -0
  4. data/Rakefile +4 -0
  5. data/VERSION +1 -1
  6. data/ironfan.gemspec +48 -3
  7. data/lib/chef/knife/cluster_launch.rb +5 -0
  8. data/lib/chef/knife/cluster_proxy.rb +3 -3
  9. data/lib/chef/knife/cluster_sync.rb +4 -0
  10. data/lib/chef/knife/ironfan_knife_common.rb +17 -6
  11. data/lib/chef/knife/ironfan_script.rb +29 -11
  12. data/lib/ironfan.rb +2 -2
  13. data/lib/ironfan/broker/computer.rb +8 -3
  14. data/lib/ironfan/dsl/ec2.rb +133 -2
  15. data/lib/ironfan/headers.rb +4 -0
  16. data/lib/ironfan/provider.rb +48 -3
  17. data/lib/ironfan/provider/ec2.rb +23 -8
  18. data/lib/ironfan/provider/ec2/elastic_load_balancer.rb +239 -0
  19. data/lib/ironfan/provider/ec2/iam_server_certificate.rb +101 -0
  20. data/lib/ironfan/provider/ec2/machine.rb +8 -0
  21. data/lib/ironfan/provider/ec2/security_group.rb +3 -5
  22. data/lib/ironfan/requirements.rb +2 -0
  23. data/notes/Home.md +45 -0
  24. data/notes/INSTALL-cloud_setup.md +103 -0
  25. data/notes/INSTALL.md +134 -0
  26. data/notes/Ironfan-Roadmap.md +70 -0
  27. data/notes/advanced-superpowers.md +16 -0
  28. data/notes/aws_servers.jpg +0 -0
  29. data/notes/aws_user_key.png +0 -0
  30. data/notes/cookbook-versioning.md +11 -0
  31. data/notes/core_concepts.md +200 -0
  32. data/notes/declaring_volumes.md +3 -0
  33. data/notes/design_notes-aspect_oriented_devops.md +36 -0
  34. data/notes/design_notes-ci_testing.md +169 -0
  35. data/notes/design_notes-cookbook_event_ordering.md +249 -0
  36. data/notes/design_notes-meta_discovery.md +59 -0
  37. data/notes/ec2-pricing_and_capacity.md +69 -0
  38. data/notes/ec2-pricing_and_capacity.numbers +0 -0
  39. data/notes/homebase-layout.txt +102 -0
  40. data/notes/knife-cluster-commands.md +18 -0
  41. data/notes/named-cloud-objects.md +11 -0
  42. data/notes/opscode_org_key.png +0 -0
  43. data/notes/opscode_user_key.png +0 -0
  44. data/notes/philosophy.md +13 -0
  45. data/notes/rake_tasks.md +24 -0
  46. data/notes/renamed-recipes.txt +142 -0
  47. data/notes/silverware.md +85 -0
  48. data/notes/style_guide.md +300 -0
  49. data/notes/tips_and_troubleshooting.md +92 -0
  50. data/notes/version-3_2.md +273 -0
  51. data/notes/walkthrough-hadoop.md +168 -0
  52. data/notes/walkthrough-web.md +166 -0
  53. data/spec/fixtures/ec2/elb/snakeoil.crt +35 -0
  54. data/spec/fixtures/ec2/elb/snakeoil.key +51 -0
  55. data/spec/integration/minimal-chef-repo/chefignore +41 -0
  56. data/spec/integration/minimal-chef-repo/environments/_default.json +12 -0
  57. data/spec/integration/minimal-chef-repo/knife/credentials/knife-org.rb +19 -0
  58. data/spec/integration/minimal-chef-repo/knife/credentials/knife-user-ironfantester.rb +9 -0
  59. data/spec/integration/minimal-chef-repo/knife/knife.rb +66 -0
  60. data/spec/integration/minimal-chef-repo/roles/systemwide.rb +10 -0
  61. data/spec/integration/spec/elb_build_spec.rb +95 -0
  62. data/spec/integration/spec_helper.rb +16 -0
  63. data/spec/integration/spec_helper/launch_cluster.rb +55 -0
  64. data/spec/ironfan/ec2/elb_spec.rb +95 -0
  65. data/spec/ironfan/ec2/security_group_spec.rb +0 -6
  66. metadata +60 -3
@@ -0,0 +1,200 @@
# Ironfan Core Concepts

<a name="TOC"></a>

* [Build your architecture from clusters of cooperating machines](#clusters)
* [Decoupled *Components* connect](#components)
* [Components *Announce* their capabilities](#announcements)
* [Announcements enable *Service Discovery*](#discovery)
* [Components announce cross-cutting *Aspects*](#aspects)
* [Aspects enable zero-conf *Amenities*](#amenities)
* [Announcements effectively define a component's *Contract*](#contracts)
* [Contracts enable zero-conf *specification testing*](#specs)
* [Specs + monitoring enable zero-conf *integration testing*](#ci)
* [Systems *Bind* to provisioned resources](#binding)
* [Binding declarations enable *Resource Sharing*](#resource-sharing)

<a name="overview"></a>
### Overview

Ironfan is your system diagram come to life. In Ironfan, you use Chef to assemble and configure components on each machine. Ironfan assembles those machines into clusters: groups of machines united to provide an important service. For example, at Infochimps one cluster of machines serves the webpages for infochimps.com; another consists only of Elasticsearch machines to power our API; and another runs the lightweight Goliath proxies that implement our API. Our data scientists can spin up and shut down terabyte-scale Hadoop clusters in minutes. All this is supported by an Ops team of one, who spends most of his time hacking on Ironfan.

The abstractions provided by Chef and Ironfan enable an autowiring system diagram, inevitable best practices in the form of "amenities", and a readable, testable contract for each component in the stack.

<a name="clusters"></a><a name="facets"></a>
### Clusters and Facets

A `cluster`, as mentioned, groups a set of machines around a common purpose. Within that cluster, you define `facet`s: sets of servers with identical components (and nearly identical configuration).

For example, a typical web stack cluster might have these facets:

* `webnode`s: nginx reverse-proxies requests to a pool of unicorns running Rails
* `mysql`: one or many MySQL servers, with attached persistent storage
* `qmaster`s: a Redis DB and Resque front end to distribute batch-processing tasks
* `qworker`s: Resque worker processes

<a name="components"></a>
### Components

As you can see, the details of a machine largely follow from the list of its `component`s: `mysql_server`, `resque_dashboard`, and so forth. What's a component? If you would draw it as a box on your system diagram, want to discover it from elsewhere, or it forms part of the contract for your machine, it's a component.

Some systems have more than one component: the `ganglia` monitoring system has a component named `agent` to gather operating metrics, and a component named `master` to aggregate those metrics.

Those examples all describe daemon processes that listen on ports, but a component is more general than that: it's any isolatable piece of functionality that is interesting to an outside consumer. Here is a set of example systems we'll refer to repeatedly:

* *Ganglia*, a distributed system monitoring tool. The `agent` components gather and exchange system metrics, and the `master` component aggregates them. A basic setup would run the `master` component on a single machine, and the `agent` component on many machines (including the master). In order to work, the master must discover all agents, and each agent must discover the master.

* *Elasticsearch* is a powerful distributed document database. A basic setup runs a single `server` component on each machine. Elasticsearch handles discovery itself, but needs a stable subset of its servers to declare as discovery `seed`s.

* *Nginx* is a fast, lightweight webserver (similar to Apache). Its `server` component can proxy web requests for one or many web apps. Those apps register a `site` component, which defines the receiving address (public/private/local) and how the app connects to nginx (socket, port, files).

* *Pig* is a Big Data analysis tool that works with Hadoop, Elasticsearch and more. It provides an executable, and imports jars from Hadoop, Elasticsearch and others.

<a name="announcements"></a>
### Components *Announce* their capabilities

Notice the recurring patterns: *capabilities* (serve webpages, execute scripts, send metrics, answer queries), *handles* (ip+port, jars, swarm), and *aspects* (ports, daemons, logs, files, dashboards).

The Silverware cookbook lets your services `announce` their capabilities and `discover` other resources.

Chef cookbooks describe the related components that form a system. You should always have a recipe, separate from the `default` recipe, that clearly corresponds to the component: the `ganglia` cookbook has `master` and `agent` recipes; the `pig` cookbook has `install_from_package` and `install_from_release` recipes. Those recipes are grouped together into Chef roles that encapsulate the component: the `elasticsearch_server` role calls the recipes to install the software, start the daemon process, and write the config files, each in the correct order.

Cookbooks do *not* bake in assumptions about their scale or about the machine they're on. The same Elasticsearch cookbook can deploy a tiny search box that sits next to a web app, or one server in a distributed terabyte-scale database.

<a name="discovery"></a>
### Announcements enable *Service Discovery*

The `discover` and `discover_all` helpers connect decoupled components. Your systems:

* Don't care whether the discovered components are on the same machine, on different machines, or in a remote data center.
* Don't care about the number of underlying machines. The whole thing might run on your laptop while developing, across a handful of nodes in staging, and on dozens of nodes in production.
* Don't necessarily care about the actual system. Your load balancer doesn't care whether it's nginx or Apache or anything else; it just wants to discover the correct set of `webnode`s.

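To make the announce/discover relationship concrete, here is a minimal in-memory sketch in plain Ruby rather than Chef. The `Registry` class, its method names and the announcement layout are hypothetical illustrations and are *not* the Silverware cookbook's actual API. A plain array stands in for whatever store real discovery would query. The Ganglia master announces once, the agents announce on every machine, and each side discovers the other without knowing where it runs.

```ruby
# Hypothetical in-memory stand-in for Silverware-style announce/discover.
# Not the real Silverware API; an Array plays the role of the shared store.
class Registry
  Announcement = Struct.new(:system, :component, :node_name, :info)

  def initialize
    @announcements = []
  end

  # A node promises that it provides system:component.
  def announce(system, component, node_name, info = {})
    @announcements << Announcement.new(system, component, node_name, info)
  end

  # Every provider of system:component, wherever it runs.
  def discover_all(system, component)
    @announcements.select { |a| a.system == system && a.component == component }
  end

  # A single provider, or nil if nobody has announced it yet.
  def discover(system, component)
    discover_all(system, component).first
  end
end

registry = Registry.new
registry.announce(:ganglia, :master, 'mon-0', { port: 8649 })
registry.announce(:ganglia, :agent,  'web-0')
registry.announce(:ganglia, :agent,  'web-1')

master = registry.discover(:ganglia, :master)
agents = registry.discover_all(:ganglia, :agent).map(&:node_name)
puts "agents report to #{master.node_name}:#{master.info[:port]}"
puts "master polls #{agents.join(', ')}"
```

Moving the agents to another machine or adding a third agent changes only the announcements. Neither side's lookup code changes.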
<a name="aspects"></a>
### Components announce cross-cutting *Aspects*

Besides the component's capabilities, the announcement also describes its aspects: cross-cutting attributes common to many components.

* **log**: writes data to a log file.
* **daemon**: a long-running process. Can specify run state, resource bounds, etc.
* **port**: serves data over a port. Can specify the protocol, performance expectations, etc.
* **dashboard**: internal component metrics and control (HTML, JMX, etc.)
* **executable**: executes scripts
* **export**: libraries, `jar`s, `conf` files, etc.
* **consumes**: registered whenever you `discover` another component

<a name="amenities"></a>
### Aspects enable zero-conf *Amenities*

Typically, consumers discover their provider, and the provider is unconcerned with which consumers it attends to. Ironfan lets you invert this pattern: decoupled `amenities` find components they can cater to.

* A log aspect would enable the following amenities:
    - `logrotated` to intelligently manage its logs
    - `flume` to archive logs to a predictable location
    - if the log is known to be an Apache web log, a flume decorator can track the rate and duration of requests and errors
* A port aspect would enable:
    - zeroconf configuration of firewall and security groups
    - remote monitors that regularly ping the port for uptime and latency
    - possibly, pinging the interfaces it should *not* appear on, to ensure the firewall is in place

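The inversion described above can be sketched in a few lines. In this sketch the logrotate amenity is never told which components exist. It scans every announcement and caters to each one that declares a log aspect. The `Component` struct, the aspect hash layout and `logrotate_stanzas` are hypothetical. Only the emitted `daily`/`rotate`/`compress`/`missingok` lines are standard logrotate directives.

```ruby
# Hypothetical announcement shape: a component name plus its declared aspects.
Component = Struct.new(:name, :aspects)

# A logrotate "amenity": it finds the components it can serve by looking for
# a :log aspect, instead of each component configuring logrotate itself.
def logrotate_stanzas(components, keep_days: 7)
  components.flat_map do |comp|
    comp.aspects.fetch(:log, []).map do |path|
      <<~CONF
        #{path} {
          daily
          rotate #{keep_days}
          compress
          missingok
        }
      CONF
    end
  end
end

components = [
  Component.new('nginx-server', { log: ['/var/log/nginx/access.log'], port: [80] }),
  Component.new('redis-server', { log: ['/var/log/redis/redis.log'] }),
  Component.new('pig',          {}) # executable only: nothing to rotate
]
puts logrotate_stanzas(components)
```

A port-aspect amenity (a firewall rule generator, say) would follow the same scan-and-cater shape over `aspects[:port]`.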
<a name="contracts"></a>
### Announcements effectively define a component's *Contract*

The announcements that components make don't just facilitate discovery. In a larger sense, they describe the external contract for the component.

When `nginx` announces that it listens on `node[:nginx][:http_port] = 80`, it is promising a capability (namely, that HTTP requests to that port return certain results). When Elasticsearch announces that it runs the `elasticsearch` daemon, it promises that the daemon will be running, with the right privileges, and not consuming more than its fair share of resources.

<a name="specs"></a>
### Contracts enable zero-conf *Specification Testing*

Each aspect implies concrete expectations that can be checked automatically:

* A daemon aspect
    - implies a process should be running,
    - owned by the right user,
    - with a stable PID,
    - and living within defined memory bounds.
* A log aspect
    - should be open and receiving content from the process
    - should contain lines showing successful startup (and not contain lines matching an error).
* A dashboard/JMX/metrics aspect:
    - actual configuration settings as read out of the running app should match those drawn from the node attributes. No more finding out a setting was overridden by some hidden config file.
    - should have a healthy heartbeat and status

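As an illustration of how a contract falls out of the announcement, this sketch turns one announcement into the list of expectations it implies, phrased like spec descriptions. The announcement layout (`daemon`, `port`, `log` keys, `max_rss_mb`) and `expectations_for` are hypothetical. Ironfan-CI's real input format and its generated Cucumber features are not reproduced here.

```ruby
# Hypothetical announcement for one component: aspect name => details.
ANNOUNCEMENT = {
  component: 'elasticsearch-server',
  daemon:    { name: 'elasticsearch', user: 'elasticsearch', max_rss_mb: 2048 },
  port:      [9200, 9300],
  log:       ['/var/log/elasticsearch/elasticsearch.log'],
}.freeze

# Each declared aspect contributes the expectations it implies.
# Aspects with no rule here are simply skipped.
def expectations_for(ann)
  specs = []
  if (d = ann[:daemon])
    specs << "process #{d[:name]} should be running"
    specs << "process #{d[:name]} should be owned by #{d[:user]}"
    specs << "process #{d[:name]} should use less than #{d[:max_rss_mb]} MB RSS"
  end
  Array(ann[:port]).each { |p| specs << "should accept connections on port #{p}" }
  Array(ann[:log]).each do |path|
    specs << "#{path} should be receiving content"
    specs << "#{path} should show a successful startup and no errors"
  end
  specs
end

puts ANNOUNCEMENT[:component]
expectations_for(ANNOUNCEMENT).each { |s| puts "  #{s}" }
```

Because the list is derived rather than hand-written, adding a port to the announcement adds its check too.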
[Ironfan-CI](http://github.com/infochimps-labs/ironfan-ci) uses the announcement to create a suite of detailed [Cucumber](http://cukes.info) (via [Cuken](https://github.com/hedgehog/cuken)) feature tests that document and enforce the machine's contract. You're not limited to just the zeroconf tests: it's easy to drop in additional Cucumber specs.

Ironfan-CI is young (it's for the tenacious zealot only), but it is the subject of current work and developing fast.

<a name="ci"></a>
### Specs + Monitoring enable zero-conf *Full-stack Testing*

You can now look at monitoring as the equivalent of a full-stack continuous integration test suite. The same announcement that Ironfan-CI maps into Cucumber statements can just as well drive your favorite monitoring suite (or, more likely, the monitoring suite you hate the least).

The Ironfan Enterprise product ships with Zabbix, which is actually pretty lovable, even more so when you don't have to perform fiddly, repeated template definitions.

<a name="binding"></a>
### Systems *Bind* to provisioned resources

Components should adapt to their machine, but be largely unaware of its default arrangement. One common anti-pattern we see in many cookbooks is to place data at some application-specific absolute path, or to assume a certain layout of volumes.

When my grandmother comes to visit, she quite reasonably asks for a room with a comfortable bed and a short climb. This means that at my apartment, she stays in the main bedroom and I use the couch. At my brother's house, she stays in the downstairs guest room, while my brother and sister-in-law stay in their bedroom.

Suppose Grandmom instead always chose 'the master bedroom on the first floor' no matter how the house was set up. At my apartment, she'd find herself in the parking garage. At my brother's house, she'd find herself in a crowded bed and uninvited from returning to visit.

Similarly, the well-mannered cookbook does not hard-code a large data directory onto the root partition. The root drive is the private domain of the operating system; typically, there's a large and comfortably appointed volume just for the data to use. On the other hand, hard-coding a location of `/mnt/external2` will end in tears if I'm testing the cookbook on my laptop, where no such drive exists.

The solution is to request volumes by their characteristics, and defer to the machine's best effort in meeting that request.

```ruby
# Data striped across all persistent dirs
volume_dirs('foo.datanode.data') do
  type    :persistent, :bulk, :fallback
  selects :all
  mode    "0700"
end

# Scratch space for indexing, on the first matching fast local dir
volume_dirs('foo.indexer.journal') do
  type    :fast, :local, :bulk, :fallback
  selects :first
  mode    "0755"
end
```

Another example of this is binding to a network interface. Unfortunately, most cookbooks choose the primary address; most of ours choose the 'private' interface if there is one and fall back to the primary.

The right pattern here is:

* provisioners tag resources
* cookbooks request the best match to their purpose
* at the cookbook's option, if no good match is found, use a fallback or raise an exception

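The tag, request, fallback pattern can be sketched without Chef. In this sketch a provisioner has tagged each mount point. The cookbook passes an ordered list of preferred characteristics, and the first characteristic any volume carries wins. `:fallback` accepts the root drive, and with no fallback the request raises. Treating the `type` list as an *ordered preference* is an assumption made for this sketch. `Volume` and `select_dirs` are hypothetical names, not the volumes cookbook's implementation.

```ruby
# Hypothetical: what the provisioner mounted, and how it tagged each mount.
Volume = Struct.new(:mount_point, :tags)

# Pick dirs for the most-preferred tag that any volume carries.
# prefs is ordered best-first; :fallback means "the root drive will do".
# selects: :all stripes across every match, :first takes just one.
def select_dirs(volumes, prefs, selects: :all, root: '/')
  prefs.each do |tag|
    return [root] if tag == :fallback
    matches = volumes.select { |v| v.tags.include?(tag) }.map(&:mount_point)
    next if matches.empty?
    return selects == :first ? matches.first(1) : matches
  end
  raise ArgumentError, "no volume matches #{prefs.inspect} and no :fallback allowed"
end

ec2_box = [Volume.new('/data1', [:persistent, :bulk]),
           Volume.new('/data2', [:persistent, :bulk]),
           Volume.new('/mnt',   [:fast, :local, :bulk])]
laptop  = [] # nothing but the root drive

p select_dirs(ec2_box, [:persistent, :bulk, :fallback])                     # ["/data1", "/data2"]
p select_dirs(ec2_box, [:fast, :local, :bulk, :fallback], selects: :first)  # ["/mnt"]
p select_dirs(laptop,  [:persistent, :bulk, :fallback])                     # ["/"]
```

The same two requests work unchanged on the EC2 box and the laptop. Only the machine's answer differs, which is the grandmother rule in code.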
<a name="resource-sharing"></a>
### Binding declarations enable *Resource Sharing*

Resource sharing is yet another place where an assertive announcement can enable best practices.

Right now, most Java-based components hard-code a default JVM heap size. This can lead to a situation where a component shows up on a 16GB machine with a 1GB heap allocated, or where five components show up on a 0.7GB machine, each with 1GB allocated.

We deserve instead a deft but highly predictable way to apportion resources (disks, RAM, etc.): nothing that gets in the way of explicit tuning, but one that gives a reasonable result in the default case.

The Hadoop cookbook has an initial stab at this, but for the most part Resource Sharing is on the roadmap and not yet in place.

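One way to apportion resources "predictably, without getting in the way of explicit tuning" is sketched below, assuming each JVM component on a machine announces an optional explicit heap and an optional weight. The function reserves some RAM for the OS, honors explicit settings first, and splits the remainder by weight. The 256 MB reserve, the weights and `apportion_heap` itself are hypothetical choices, not what the Hadoop cookbook does.

```ruby
# Hypothetical: split a machine's RAM among the JVM components placed on it.
# Explicit :heap_mb settings always win; everyone else shares what's left by :weight.
def apportion_heap(total_mb, components, reserve_mb: 256)
  available = total_mb - reserve_mb
  fixed     = components.select { |c| c[:heap_mb] }
  shared    = components.reject { |c| c[:heap_mb] }
  remaining = available - fixed.sum { |c| c[:heap_mb] }
  raise ArgumentError, "explicit heaps exceed #{available} MB" if remaining < 0

  total_weight = shared.sum { |c| c.fetch(:weight, 1) }
  components.to_h do |c|
    heap = c[:heap_mb] || (remaining * c.fetch(:weight, 1) / total_weight)
    [c[:name], heap]
  end
end

# A 16 GB box running three Hadoop daemons, with the namenode weighted double.
hadoop = [{ name: 'namenode', weight: 2 }, { name: 'datanode' }, { name: 'tasktracker' }]
apportion_heap(16_384, hadoop).each { |name, mb| puts "#{name}: #{mb} MB" }
# namenode: 8064 MB, datanode: 4032 MB, tasktracker: 4032 MB

# A 0.7 GB box: elasticsearch keeps its explicit 256 MB; the other two get 94 MB each.
small = [{ name: 'elasticsearch', heap_mb: 256 }, { name: 'zookeeper' }, { name: 'flume' }]
apportion_heap(700, small).each { |name, mb| puts "#{name}: #{mb} MB" }
```

The same component list now gets large heaps on the big box and small, non-overcommitted heaps on the tiny one.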
__________________________________________________________________________

### Learn More

[Aspect-Oriented Programming](http://en.wikipedia.org/wiki/Aspect-oriented_programming): The Ironfan concept of `aspects` as cross-cutting concerns is taken from AOP. Amenities don't correspond precisely to join points, pointcuts, etc., so don't take the analogy too far. (Or perhaps instead help us understand how to take the analogy the rest of the way.)

Ironfan's primary models form a component-based approach to building a [Service-Oriented Architecture](http://msdn.microsoft.com/en-us/library/aa480021.aspx). Model examples of a modern SOA include the [Netflix API](http://www.slideshare.net/danieljacobson/the-futureofnetflixapi) (see [also](http://techblog.netflix.com/2011/12/making-netflix-api-more-resilient.html)) and [Postrank](http://www.igvita.com/2011/03/08/goliath-non-blocking-ruby-19-web-server/) (see [also](http://www.igvita.com/2010/01/28/cluster-monitoring-with-ganglia-ruby/)).

@@ -0,0 +1,3 @@
Please see the [README from the volumes cookbook](https://github.com/infochimps-labs/ironfan-pantry/blob/master/cookbooks/volumes/README.md) for more information.

@@ -0,0 +1,36 @@
Examples of concerns that tend to be crosscutting include:

* Synchronization (declare an action dependency, trigger, event)
* Real-time constraints
* Feature interaction
* Memory management
    - data checks
    - feature checks
* Security
    - firewall rules
    - access control
* Logging
* Monitoring
* Business rules
* Tuning
* Refactor pivot

AOP:

- Scattered (1:n) / Tangled (n:1)
- join point: hook
- pointcut: matches join points
- advice: behavior evoked at a pointcut

* Interception
    - Interjection of advice, at least around methods.
* Introduction
    - Enhancing with new (orthogonal!) state and behavior.
* Inspection
    - Access to meta-information that may be exploited by pointcuts or advice.
* Modularization
    - Encapsulate as aspects.

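For readers new to the AOP vocabulary above, here is a small Ruby illustration of *interception*. The join point is a method call. The pointcut is the list of method names to match. The advice is the before/after behavior, woven in with `Module#prepend`. This illustrates the terms using a standard Ruby technique; it is not an Ironfan mechanism. `logging_advice` and `Installer` are made-up names.

```ruby
# Build a module that wraps ("advises") the named methods.
# The list of names acts as a tiny pointcut; each call is a join point.
def logging_advice(*method_names, log:)
  Module.new do
    method_names.each do |name|
      define_method(name) do |*args, &blk|
        log << "before #{name}"
        result = super(*args, &blk) # run the original method
        log << "after #{name}"
        result
      end
    end
  end
end

class Installer
  def install(pkg)
    "installed #{pkg}"
  end
end

log = []
Installer.prepend(logging_advice(:install, log: log))
puts Installer.new.install('nginx') # installed nginx
p log                               # ["before install", "after install"]
```

`Installer` itself never mentions logging. The concern lives entirely in the advice module, which is the modularization point in the last bullet.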
@@ -0,0 +1,169 @@
pre-testing: converge machine

* https://github.com/acrmp/chefspec
* http://wiki.opscode.com/display/chef/Knife#Knife-test

benchmarks

* `bonnie++`
* `hdparm -t`
* `iozone`

in-machine

* x ports on x interfaces open
* daemon is running
* file exists and has string
* log file is accumulating lines at rate X
* script x runs successfully

in-chef

* runlist is X
* chef attribute X should be Y

meta

* chef run was idempotent

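Two of the in-machine checks above ("x ports on x interfaces open" and "file exists and has string") are simple enough to sketch directly with Ruby's standard library. The helper names `port_open?` and `file_has?` are hypothetical. The demo uses a throwaway local listener and a temp file as stand-ins for a real daemon and its log.

```ruby
require 'socket'
require 'tempfile'

# "x ports on x interfaces open": can we complete a TCP connect to host:port?
# Refused, unreachable, unresolvable, or timed-out connects all count as closed.
def port_open?(host, port, timeout: 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end

# "file exists and has string": true if the file exists and any line matches.
def file_has?(path, pattern)
  File.exist?(path) && File.foreach(path).any? { |line| line.match?(pattern) }
end

# Demo against stand-ins: a throwaway listener and a fake log file.
server = TCPServer.new('127.0.0.1', 0) # port 0 lets the OS pick a free port
port = server.addr[1]
puts "port #{port} open: #{port_open?('127.0.0.1', port)}"
server.close

log = Tempfile.new('daemon.log')
log.write("starting up\nlistening on #{port}\n")
log.flush
puts "startup logged: #{file_has?(log.path, /listening on \d+/)}"
puts "errors logged:  #{file_has?(log.path, /ERROR/)}"
log.close!
```

To cover "x interfaces", call `port_open?` once per address the daemon should (or should not) answer on.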
__________________________________________________________________________

## Notes from around the web

* ...

> I'm thinking that the useful thing to test is NOT did chef install
> some package or setup a user, but rather after chef has run can I
> interact with the system as I would expect from an external
> perspective. For example:
>
> * Is the website accessible?
> * Are unused ports blocked?
> * When I send an email thorough the website does it end up in my inbox?
>
> Capybara (http://github.com/jnicklas/capybara) enforces this external
> perspective for webapp testing:
>
> "Access to session, request and response from the test is not
> possible. Maybe we’ll do response headers at some point in the future,
> but the others really shouldn’t be touched in an integration test
> anyway."
>
> They only let you interact with screen elements that a user could
> interact with. It makes sense because the things that users interact
> with are what provides the business value

* Andrew Shafer <andrew@cloudscaling.com>

> Here's my thinking at this point... which could be wrong on every level.
> There is really no good way to TDD/BDD configuration management for several
> reasons:
>
> * The recipes are already relatively declarative
> * Mocking is useless because it may not reflect 'ground truth'
> * The cycle times to really test convergence are relatively long
> * Trying to test if a package is installed or not is testing the framework,
>   not the recipe IMHO.
>
> I agree with the general sentiment that the functional service is the true
> test.
> I'm leaning towards 'testing' at that level, ideally with (a superset of?)
> what should be used for the production monitoring system.
> So the CI builds services, runs all the checks in test, green can go live
> and that's that.

* Jeremy Deininger <jeremy@rightscale.com>

> Thought I'd chime in with my experience testing system configuration code @ RightScale so far. What we've been building are integration style cucumber tests to run a cookbook through it's paces on all platforms and OSs that we support.
>
> First we use our API to spin up 'fresh' server clusters in EC2, one for every platform/OS (variation) that the cookbook will be supporting. The same could be done using other cloud APIs (anyone else doing this with VMware or etc?) Starting from scratch is important because of chef's idempotent nature.
>
> Then a cucumber test is run against every variation in parallel. The cucumber test runs a series of recipes on the cluster then uses what we call 'spot checks' to ensure the cluster is configured and functional. The spot checks are updated when we find a bug, to cover the bug. An example spot check would be, sshing to every server and checking the mysql.err file for bad strings.
>
> These high level integration tests are long running but have been very useful flushing out bugs.
>
> ...
>
> If you stop by the #rightscale channel on Freenode I'd be happy to embarrass myself by giving you a sneak peak at the features etc.. Would love to bounce ideas around and collaborate if you're interested. jeremydei on Freenode IRC

* Ranjib Dey <ranjibd@th...s.com>

> So far, what we've done for testing is to use rspec for implementing tests. Here's an example:
>
>     it "should respond on port 80" do
>       lambda {
>         TCPSocket.open(@server, 'http')
>       }.should_not raise_error
>     end
>
> Before running the tests, I have to manually bootstrap a node using knife. If my instance is the only one in its environment, the spec can find it using knife's search feature. The bootstrap takes a few minutes, and the 20 or so tests take about half a minute to run.
>
> While I'm iteratively developing a recipe, my work cycle is to edit source, upload a cookbook, and rerun chef-client (usually by rerunning knife boostrap, because the execution environment is different from invoking chef-client directly). This feels a bit slower than the cycle I'm used to when coding in Ruby because of the upload and bootstrap steps.
>
> I like rspec over other testing tools because of how it generates handy reports, such as this one, which displays an English list of covered test cases:
>
>     $ rspec spec/ -f doc
>
>     Foo role
>       should respond on port 80
>       should run memcached
>       should accept memcached connections
>       should have mysql account
>       should allow passwordless sudo to user foo as user bar
>       should allow passwordless sudo to root as a member of sysadmin
>       should allow key login as user bar
>       should mount homedirs on ext4, not NFS
>       should rotate production.log
>       should have baz as default vhost
>       ...
>
> That sample report also gives a feel for sort of things we check. So far, nearly all of our checks are non-intrusive enough to run on a production system. (The exception is testing of local email delivery configurations.)
>
> Areas I'd love to see improvement:
>
> * Shortening the edit-upload-bootstrap-test cycle
> * Automating the bootstrap in the context of testing
> * Adding rspec primitives for Chef-related testing, which might
>   include support for multiple platforms
>
> As an example of rspec primitives, instead of:
>
>     it "should respond on port 80" do
>       lambda {
>         TCPSocket.open(@server, 'http')
>       }.should_not raise_error
>     end
>
> I'd like to write:
>
>     it { should respond_on_port(80) }
>
> Rspec supports the the syntactic sugar; it's just a matter of adding some "matcher" plugins.
>
> How do other chef users verify that recipes work as expected?
>
> I'm not sure how applicable my approach is to opscode/cookbooks because it relies on having a specific server configuration and can only test a cookbook in the context of that single server. If we automated the boostrap step so it could be embedded into the rspec setup blocks, it would be possible to test a cookbook in several sample contexts, but the time required to setup each server instance might be prohibitive.

* Andrew Crump <acrump@gmail.com>

> Integration tests that exercise the service you are building definitely give you the most bang for buck.
>
> We found the feedback cycle slow as well so I wrote chefspec which builds on RSpec to support unit testing cookbooks:
>
> https://github.com/acrmp/chefspec
>
> This basically fakes a convergence and allows you to make assertions about the created resources. At first glance Chef's declarative nature makes this less useful, but once you start introducing conditional execution I've found this to be a time saver.
>
> If you're looking to do CI (which you should be) converging twice goes some way to verifying that your recipes are idempotent.
>
> knife cookbook test is a useful first gate for CI:
>
> http://wiki.opscode.com/display/chef/Knife#Knife-test
@@ -0,0 +1,249 @@
# Cookbook event ordering

Most cookbooks have some set of the following steps:

* base configuration
* announce component
    - before discovery so it can be found
    - currently done in converge stage; some aspects might be incompletely defined?
* register apt repository, if any
* create daemon user
    - before directories so we can set permissions
    - before package install so uid is stable
* install, as package, git deploy or install from release
    - often have to halt legacy services (config files don't exist yet)
* create any remaining directories
    - after package install so it has final say
* install plugins
    - after directories, before config files
* define service
    - before config file creation so we can notify
    - can't start it yet because there are no config files
* discover component (own or other, same or other machines)
* write config files (notify service of changes)
    - must follow everything so info is current
* register a minidash dashboard
* trigger start (or restart) of service

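The "before/after" notes in the list above form a dependency graph, so a valid ordering can be computed rather than hand-maintained. This sketch encodes those constraints and sorts them with Ruby's standard-library `TSort`. The step names are shorthand labels invented here. Edges for `apt_repo`, `dashboard` and `start` follow the list order and are assumptions, since the notes don't state them explicitly.

```ruby
require 'tsort'

# Each step maps to the steps that must run before it.
class StepGraph
  include TSort

  def initialize(deps)
    @deps = deps
  end

  def tsort_each_node(&block)
    @deps.each_key(&block)
  end

  def tsort_each_child(node, &block)
    @deps.fetch(node, []).each(&block)
  end
end

DEPS = {
  base_config:  [],
  announce:     [:base_config],
  apt_repo:     [:base_config],                         # assumed from list order
  daemon_user:  [:base_config],
  install:      [:daemon_user, :apt_repo],              # user first so uid is stable
  directories:  [:daemon_user, :install],               # after install: final say
  plugins:      [:directories],
  service:      [:install],
  discover:     [:announce],
  config_files: [:plugins, :service, :discover, :directories],
  dashboard:    [:config_files],                        # assumed from list order
  start:        [:config_files],
}.freeze

# tsort lists every step after all of its prerequisites;
# it raises TSort::Cyclic if the constraints contradict each other.
order = StepGraph.new(DEPS).tsort
puts order.join(' -> ')
```

A contradictory pair of notes (say, plugins both before and after config files) would surface immediately as `TSort::Cyclic` instead of as a mysterious converge failure.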
## Proposal:

Kill `role`s, in favor of `stack`s.

A runlist is assembled in the following phases:

* `initial` configuration
* `before_install`
    - `account`
* `install`
    - `plugin` install
    - `directories`
* `announcement`
    - `service` definition
    - `discovery`
* `commit`
    - `config_files`: write config files to disk
* `finalize`
* `launch`

As you can see, most layers have semantic names (`plugin`, `user`, etc.); if you name your recipe correctly, it will be assigned to the correct phase. Otherwise, you can attach it explicitly to a semantic or non-semantic phase.

elasticsearch_datanode component:

```
elasticsearch::default                # initial
elasticsearch::install_from_release   # install phase
elasticsearch::plugins                # plugin phase

elasticsearch::server                 # service definition; includes announcement
elasticsearch::config_files           # config_files; includes discovery
```

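The proposal's "name your recipe correctly and it lands in the right phase" rule can be sketched as a name-to-phase table plus a stable sort. Explicit attachments override the table, and unknown names must be attached by hand. `PHASES` flattens the proposal's phase list in order. `NAME_TO_PHASE`, `phase_for` and `assemble` are hypothetical; no such mechanism exists in Chef today.

```ruby
# Phases from the proposal, in run order (sub-phases flattened in place).
PHASES = %i[initial before_install account install plugin directories
            announcement service discovery commit config_files finalize launch].freeze

# Hypothetical naming conventions: recipe name => phase.
NAME_TO_PHASE = {
  'default' => :initial, 'install_from_release' => :install,
  'install_from_package' => :install, 'plugins' => :plugin,
  'server' => :service, 'config_files' => :config_files,
}.freeze

# Explicit attachment wins; otherwise infer the phase from the recipe name.
def phase_for(recipe, explicit = {})
  explicit.fetch(recipe) do
    NAME_TO_PHASE.fetch(recipe.split('::').last) do
      raise ArgumentError, "cannot infer a phase for #{recipe}; attach it explicitly"
    end
  end
end

# Sort by phase; ties keep their original relative order.
def assemble(recipes, explicit = {})
  recipes.each_with_index
         .sort_by { |r, i| [PHASES.index(phase_for(r, explicit)), i] }
         .map(&:first)
end

shuffled = %w[elasticsearch::config_files elasticsearch::server
              elasticsearch::plugins elasticsearch::install_from_release
              elasticsearch::default]
puts assemble(shuffled)
```

Run on the shuffled list, this recovers the order shown for the elasticsearch_datanode component: default, install_from_release, plugins, server, config_files.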
I'm not clear on how strictly the phases should be broken out into one recipe per phase.

It's also possible to instead define a Chef resource that lets you defer a block to a given phase from within a callback. It would be similar to a ruby block, but have more explicit timing. This may involve jacking around inside Chef, something we've avoided until now.

__________________________________________________________________________

Run List is

```
[role[systemwide], role[chef_client], role[ssh], role[nfs_client],
role[volumes], role[package_set], role[org_base], role[org_users],

role[hadoop],
role[hadoop_s3_keys],
role[cassandra_server], role[zookeeper_server],
role[flume_master], role[flume_agent],
role[ganglia_master],
role[ganglia_agent], role[hadoop_namenode], role[hadoop_datanode],
role[hadoop_jobtracker], role[hadoop_secondarynn], role[hadoop_tasktracker],
role[hbase_master], role[hbase_regionserver], role[hbase_stargate],
role[redis_server], role[mysql_client], role[redis_client],
role[cassandra_client], role[elasticsearch_client], role[jruby], role[pig],
recipe[ant], recipe[bluepill], recipe[boost], recipe[build-essential],
recipe[cron], recipe[git], recipe[hive], recipe[java::sun], recipe[jpackage],
recipe[jruby], recipe[nodejs], recipe[ntp], recipe[openssh], recipe[openssl],
recipe[rstats], recipe[runit], recipe[thrift], recipe[xfs], recipe[xml],
recipe[zabbix], recipe[zlib], recipe[apache2], recipe[nginx],
role[el_ridiculoso_cluster], role[el_ridiculoso_gordo], role[minidash],
role[org_final], recipe[hadoop_cluster::config_files], role[tuning]]
```

Run List expands to

```
build-essential, motd, zsh, emacs, ntp, nfs, nfs::client, xfs,
volumes::mount, volumes::resize, package_set,

hadoop_cluster,
hadoop_cluster::minidash,

cassandra, cassandra::install_from_release,
cassandra::autoconf, cassandra::server, cassandra::jna_support,
cassandra::config_files, zookeeper::default, zookeeper::server,
zookeeper::config_files, flume, flume::master, flume::agent,
flume::jruby_plugin, flume::hbase_sink_plugin, ganglia, ganglia::server,
ganglia::monitor,

hadoop_cluster::namenode,
hadoop_cluster::datanode,
hadoop_cluster::jobtracker,
hadoop_cluster::secondarynn,
hadoop_cluster::tasktracker,

zookeeper::client,
hbase::master,
hbase::minidash,

minidash::server,
hbase::regionserver,
hbase::stargate,
redis, redis::install_from_release, redis::server,
mysql, mysql::client,
cassandra::client, elasticsearch::default,

elasticsearch::install_from_release,
elasticsearch::plugins,
elasticsearch::client,

jruby, jruby::gems,

pig,
pig::install_from_package,
pig::piggybank,
pig::integration,

zookeeper,
ant, bluepill, boost, cron,
git, hive,
java::sun, jpackage, nodejs, openssh, openssl, rstats, runit,
thrift, xml, zabbix, zlib, apache2, nginx, hadoop_cluster::config_files,
tuning::default
```

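Comparing the two lists above shows that expansion flattens nested roles and keeps each recipe only once. For example, `recipe[ntp]` is listed explicitly late in the run list but appears in the expansion only once, near the top. A minimal sketch of that behavior follows, assuming depth-first expansion that keeps each recipe at its first position. The `ROLES` table is a small hypothetical stand-in for real role files, `expand` is not Chef's implementation, and role cycles are not detected here.

```ruby
# Hypothetical role definitions: role name => that role's own run list.
ROLES = {
  'systemwide'      => %w(recipe[build-essential] recipe[motd] recipe[ntp]),
  'hadoop'          => %w(recipe[hadoop_cluster] recipe[hadoop_cluster::minidash]),
  'hadoop_namenode' => %w(role[hadoop] recipe[hadoop_cluster::namenode]),
}.freeze

# Depth-first: a role is replaced by its run list; a recipe seen earlier is
# skipped, so every recipe appears exactly once, at its first position.
def expand(run_list, roles, seen = [])
  run_list.each do |item|
    case item
    when /\Arole\[(.+)\]\z/
      expand(roles.fetch($1), roles, seen)
    when /\Arecipe\[(.+)\]\z/
      seen << $1 unless seen.include?($1)
    else
      raise ArgumentError, "bad run list item: #{item}"
    end
  end
  seen
end

p expand(%w(role[systemwide] role[hadoop_namenode] recipe[ntp] role[hadoop]), ROLES)
# ["build-essential", "motd", "ntp", "hadoop_cluster", "hadoop_cluster::minidash", "hadoop_cluster::namenode"]
```

This is why phase placement matters: a recipe's effective position is set by its *first* mention, which may be buried inside an early role.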
From an actual run of el_ridiculoso-gordo:

```
 2 nfs::client
 1 java::sun
 1 aws::default
 5 build-essential::default
 3 motd::default
 2 zsh::default
 1 emacs::default
 8 ntp::default
 1 nfs::default
 2 nfs::client
 3 xfs::default
46 package_set::default
 4 java::sun
 8 tuning::ubuntu
 6 apt::default
 1 hadoop_cluster::add_cloudera_repo
44 hadoop_cluster::default
 4 minidash::default
 2 /srv/chef/file_store/cookbooks/minidash/providers/dashboard.rb
 1 hadoop_cluster::minidash
 2 /srv/chef/file_store/cookbooks/minidash/providers/dashboard.rb
 1 boost::default
 2 python::package
 2 python::pip
 1 python::virtualenv
 2 install_from::default
 7 thrift::default
 9 cassandra::default
 1 cassandra::install_from_release
 6 /srv/chef/file_store/cookbooks/install_from/providers/release.rb
 3 cassandra::install_from_release
 1 cassandra::bintools
 3 runit::default
11 cassandra::server
 2 cassandra::jna_support
 2 cassandra::config_files
 6 zookeeper::default
15 zookeeper::server
 3 zookeeper::config_files
18 flume::default
 2 flume::master
 3 flume::agent
 2 flume::jruby_plugin
 1 flume::hbase_sink_plugin
21 ganglia::server
20 ganglia::monitor
13 hadoop_cluster::namenode
11 hadoop_cluster::datanode
11 hadoop_cluster::jobtracker
11 hadoop_cluster::secondarynn
11 hadoop_cluster::tasktracker
14 hbase::default
11 hbase::master
 1 hbase::minidash
 2 /srv/chef/file_store/cookbooks/minidash/providers/dashboard.rb
13 minidash::server
11 hbase::regionserver
10 hbase::stargate
 2 redis::default
 2 redis::install_from_release
 2 redis::default
16 redis::server
 3 mysql::client
 1 aws::default
 7 elasticsearch::default
 1 elasticsearch::install_from_release
 6 /srv/chef/file_store/cookbooks/install_from/providers/release.rb
 2 elasticsearch::plugins
 3 elasticsearch::client
 3 elasticsearch::config
 1 jruby::default
 9 /srv/chef/file_store/cookbooks/install_from/providers/release.rb
 7 jruby::default
18 jruby::gems
 2 pig::install_from_package
 5 pig::piggybank
 8 pig::integration
 6 zookeeper::default
 3 ant::default
 5 bluepill::default
 2 cron::default
 1 git::default
 1 hive::default
 2 nodejs::default
 4 openssh::default
 7 rstats::default
 1 xml::default
11 zabbix::default
 1 zlib::default
10 apache2::default
 2 apache2::mod_status
 2 apache2::mod_alias
 1 apache2::mod_auth_basic
 1 apache2::mod_authn_file
 1 apache2::mod_authz_default
 1 apache2::mod_authz_groupfile
 1 apache2::mod_authz_host
 1 apache2::mod_authz_user
 2 apache2::mod_autoindex
 2 apache2::mod_dir
 1 apache2::mod_env
 2 apache2::mod_mime
 2 apache2::mod_negotiation
 2 apache2::mod_setenvif
 1 apache2::default
 8 nginx::default
 9 hadoop_cluster::config_files
```