bigrecord 0.1.0 → 0.1.1

@@ -1,6 +1,7 @@
1
1
  = Big Record
2
2
 
3
- A Ruby Object/Data Mapper for distributed column-oriented data stores (inspired by BigTable) such as HBase. Intended to work as a drop-in for Rails applications.
3
+ A Ruby Object/Data Mapper for distributed column-oriented data stores (inspired by BigTable) such as HBase. Intended
4
+ to work as a drop-in for Rails applications.
4
5
 
5
6
  == Features
6
7
  * Dynamic schemas (due to the schema-less design of BigTable).
@@ -12,22 +13,33 @@ A Ruby Object/Data Mapper for distributed column-oriented data stores (inspired
12
13
 
13
14
  == Motivations
14
15
 
15
- BigTable, and by extension, Bigrecord isn't right for everyone. A great introductory article discussing this topic can be found at http://blog.rapleaf.com/dev/?p=26 explaining why you would or wouldn't use BigTable. The rule of thumb, however, is that if your data model is simple or can fit into a standard RDBMS, then you probably don't need it.
16
+ BigTable, and by extension Bigrecord, isn't right for everyone. A great introductory article discussing this topic can
17
+ be found at http://blog.rapleaf.com/dev/?p=26 explaining why you would or wouldn't use BigTable. The rule of thumb,
18
+ however, is that if your data model is simple or can fit into a standard RDBMS, then you probably don't need it.
16
19
 
17
20
  Beyond this though, there are two basic motivations that almost immediately demand a BigTable model database:
18
- 1. Your data is highly dynamic in nature and would not fit in a schema bound model, or you cannot define a schema ahead of time.
19
- 2. You know that your database will grow to tens or hundreds of gigabytes, and can't afford big iron servers. Instead, you'd like to scale horizontally across many commodity servers.
21
+ 1. Your data is highly dynamic in nature and would not fit in a schema bound model, or you cannot define a schema ahead
22
+ of time.
23
+ 2. You know that your database will grow to tens or hundreds of gigabytes, and can't afford big iron servers. Instead,
24
+ you'd like to scale horizontally across many commodity servers.
20
25
 
21
- == Requirements
26
+ == Components
22
27
 
23
- * Big Record: Ruby Object/Data Mapper. Inspired and architected similarly to Active Record.
24
- * Big Record Driver: JRuby application that bridges Ruby and Java (through JRuby's Drb protocol) to interact with Java-based data stores and their native APIs. Required for HBase and Cassandra. This application can be run from a separate server than your Rails application.
25
- * JRuby 1.1.6+ is needed to run Big Record Driver.
26
- * Any other requirements needed to run Hadoop, HBase or your data store of choice.
28
+ * Bigrecord: Ruby Object/Data Mapper. Inspired and architected similarly to Active Record.
27
29
 
28
- == Optional Requirements
30
+ == Optional Component
29
31
 
30
- * Big Index (highly recommended): Due to the nature of Big Table data stores, some limitations occur while using Big Record standalone when compared to Active Record. Some major limitations include the inability to query for data other than with the row ID, indexing, searching, and dynamic finders (find_by_attribute_name). Since these data access patterns are vital for most Rails applications to function, Big Index was created to address these issues, and bring the feature set more up to par with Active Record. Please refer to the <tt>Big Index</tt> package for more information and its requirements.
32
+ * Bigrecord Driver: Consists of a JRuby server component that bridges Ruby and Java (through the DRb protocol) to
33
+ interact with Java-based data stores and their native APIs. Clients that connect to the DRb server can be of any Ruby
34
+ type (JRuby, MRI, etc). Currently, this is used only for HBase to serve as a connection alternative to Thrift or
35
+ Stargate. This application can be run on a separate server from your Rails application.
36
+
37
+ * Bigindex [http://github.com/openplaces/bigindex]: Due to the nature of BigTable databases, some limitations are
38
+ present while using Bigrecord standalone when compared to Active Record. Some major limitations include the inability
39
+ to query for data by anything other than the row ID, and the lack of indexing, searching, and dynamic finders (find_by_attribute_name). Since
40
+ these data access patterns are vital for most Rails applications to function, Bigindex was created to address these
41
+ issues, and bring the feature set more up to par with Active Record. Please refer to the <tt>Bigindex</tt> package for
42
+ more information and its requirements.
31
43
 
32
44
  == Getting Started
33
45
 
data/VERSION CHANGED
@@ -1 +1 @@
1
- 0.1.0
1
+ 0.1.1
@@ -2,11 +2,15 @@
2
2
 
3
3
  == Data store information
4
4
 
5
- The default settings for the Bigrecord specs can be found at spec/connections/bigrecord.yml with each environment broken down by the data store type (Hbase and Cassandra at the time of writing). These are the minimal settings required to connect to each data store, and should be modified freely to reflect your own system configurations.
5
+ The default settings for the Bigrecord specs can be found at spec/connections/bigrecord.yml with each environment
6
+ broken down by the data store type (Hbase and Cassandra at the time of writing). These are the minimal settings
7
+ required to connect to each data store, and should be modified freely to reflect your own system configurations.
6
8
 
7
9
  == Data store migration
8
10
 
9
- There are also migrations to create the necessary tables for the specs to run. To ensure migrations are functioning properly before actually running the migrations, you can run: spec spec/unit/migration_spec.rb. Alternatively, you can manually create the tables according to the migration files under: spec/lib/migrations
11
+ There are also migrations to create the necessary tables for the specs to run. To ensure migrations are functioning
12
+ properly before actually running the migrations, you can run: spec spec/unit/migration_spec.rb. Alternatively, you
13
+ can manually create the tables according to the migration files under: spec/lib/migrations
10
14
 
11
15
  Migrations have their own log file for debugging purposes. It's created under: bigrecord/migrate.log
12
16
 
@@ -31,6 +35,8 @@ To run a specific spec, you can run the following command from the bigrecord roo
31
35
 
32
36
  == Debugging
33
37
 
34
- If any problems or failures arise during the unit tests, please refer to the log files before submitting it as an issue. Often, it's a simple matter of forgetting to turn on BigrecordDriver, the tables weren't created, or configurations weren't set properly.
38
+ If any problems or failures arise during the unit tests, please refer to the log files before submitting an
39
+ issue. Often, the cause is simple: BigrecordDriver wasn't started, the tables weren't created, or the
40
+ configurations weren't set properly.
35
41
 
36
42
  The log file for specs is created under: <bigrecord root>/spec/debug.log
@@ -0,0 +1,65 @@
1
+ == Setting up Cassandra
2
+
3
+ To quickly get started with development, you can set up Cassandra to run as a single node cluster on your local system.
4
+
5
+ (1) Download and unpack the most recent release of Cassandra from http://cassandra.apache.org/download/
6
+
7
+ (2) Add a <Keyspace></Keyspace> entry into your (cassandra-dir)/conf/storage-conf.xml configuration file named after
8
+ your application, and create <ColumnFamily> entries corresponding to each model you wish to add. The following is an
9
+ example of the Bigrecord keyspace that the spec suite runs against:
10
+
11
+ <Keyspace Name="Bigrecord">
12
+ <ColumnFamily Name="animals" CompareWith="UTF8Type" />
13
+ <ColumnFamily Name="books" CompareWith="UTF8Type" />
14
+ <ColumnFamily Name="companies" CompareWith="UTF8Type" />
15
+ <ColumnFamily Name="employees" CompareWith="UTF8Type" />
16
+ <ColumnFamily Name="novels" CompareWith="UTF8Type" />
17
+ <ColumnFamily Name="zoos" CompareWith="UTF8Type" />
18
+
19
+ <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
20
+
21
+ <ReplicationFactor>1</ReplicationFactor>
22
+
23
+ <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
24
+ </Keyspace>
25
+
26
+ You can also see {file:guides/storage-conf.rdoc guides/storage-conf.rdoc} for an example of a full configuration. More
27
+ documentation on setting up Cassandra can be found at http://wiki.apache.org/cassandra/GettingStarted
28
+
29
+ (3) Install the Cassandra Rubygem:
30
+
31
+ $ [sudo] gem install cassandra
32
+
33
+ (4) Start up Cassandra:
34
+ $ (cassandra-dir)/bin/cassandra -f
35
+
36
+
37
+ == Setting up Bigrecord
38
+
39
+ (1) Add the following line into the Rails::Initializer.run do |config| block:
40
+
41
+ config.gem "bigrecord", :source => "http://gemcutter.org"
42
+
43
+ and run the following command to install all the gems listed for your Rails app:
44
+
45
+ [sudo] rake gems:install
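
For context, on Rails 2.x the config.gem declaration from step (1) sits inside the existing Rails::Initializer.run block in config/environment.rb. A minimal sketch of the placement (only the config.gem line itself comes from this guide):

  # config/environment.rb (Rails 2.x)
  Rails::Initializer.run do |config|
    # ... other application settings ...
    config.gem "bigrecord", :source => "http://gemcutter.org"
  end
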
46
+
47
+ (2) Bootstrap Bigrecord into your project:
48
+
49
+ script/generate bigrecord
50
+
51
+ (3) Edit the config/bigrecord.yml[.sample] file in your Rails root with the information corresponding to your Cassandra
52
+ install (keyspace should correspond to the one you defined in step 2 of "Setting up Cassandra" above):
53
+
54
+ development:
55
+ adapter: cassandra
56
+ keyspace: Bigrecord
57
+ servers: localhost:9160
58
+ production:
59
+ adapter: cassandra
60
+ keyspace: Bigrecord
61
+ servers:
62
+ - server1:9160
63
+ - server2:9160
64
+
65
+ Note: 9160 is the default port for Cassandra's Thrift server.
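
As an optional sanity check that the single-node cluster is reachable, the cassandra gem installed in step (3) can be used directly. A sketch, assuming the Bigrecord keyspace and the books column family from the example above:

  require 'rubygems'
  require 'cassandra'

  # Connect to the local Thrift server (default port 9160) using the Bigrecord keyspace.
  client = Cassandra.new('Bigrecord', 'localhost:9160')

  # Simple round trip against one of the column families defined in storage-conf.xml.
  client.insert(:books, 'test-row', { 'attribute:title' => 'Hello Thar' })
  client.get(:books, 'test-row')    # => { "attribute:title" => "Hello Thar" }
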
@@ -1,15 +1,22 @@
1
- = Deploying Big Record
1
+ = Deploying Big Record with HBase
2
2
 
3
- Stargate is a new implementation for HBase's web service front-end, and as such, is not currently recommended for deployment.
3
+ Stargate is a new implementation of HBase's web service front-end and, as such, is not currently recommended for
4
+ deployment.
4
5
 
5
- We here at Openplaces have developed Bigrecord Driver, which uses JRuby to interact with HBase via the native Java API and connect to Bigrecord through the DRb protocol. This method is slightly more complicated to setup, but preliminary benchmarks show that it runs faster (especially for scanner functionality).
6
+ We here at Openplaces have developed Bigrecord Driver, which uses JRuby to interact with HBase via the native
7
+ Java API and connect to Bigrecord through the DRb protocol. This method is slightly more complicated to set up,
8
+ but preliminary benchmarks show that it runs faster (especially for scanner functionality).
6
9
 
7
10
  == Instructions
8
- * Your database should already be set up (please refer to the database's own documentation) with the required information known such as the zookeeper quorum/port, etc. in order for Bigrecord to connect to it.
11
+ * Your database should already be set up (please refer to the database's own documentation) with the required
12
+ connection information (such as the zookeeper quorum/port) known so that Bigrecord can connect to it.
13
+
9
14
  * Bigrecord Driver (if your database requires it for connecting)
10
15
  * JRuby 1.1.6+ is needed to run Bigrecord Driver.
11
16
 
12
- Install the Bigrecord Driver gem and its dependencies, then start up a DRb server. Please refer the Bigrecord Driver documentation for more detailed instructions. (http://github.com/openplaces/bigrecord/blob/master/bigrecord-driver/README.rdoc)
17
+ Install the Bigrecord Driver gem and its dependencies, then start up a DRb server. Please refer to the Bigrecord Driver
18
+ documentation for more detailed instructions.
19
+ (http://github.com/openplaces/bigrecord/blob/master/bigrecord-driver/README.rdoc)
13
20
 
14
21
  Edit your bigrecord.yml config file as follows:
15
22
 
@@ -1,50 +1,10 @@
1
1
  = Getting Started
2
2
 
3
+ == Install HBase or Cassandra
3
4
 
4
- == Setting up HBase and Stargate
5
-
6
- To quickly get started with development, you can set up HBase to run as a single server on your local computer, along with Stargate, its RESTful web service front-end.
7
-
8
- (1) Download and unpack the most recent release of HBase from http://hadoop.apache.org/hbase/releases.html#Download
9
-
10
- (2) Edit (hbase-dir)/conf/hbase-env.sh and uncomment/modify the following line to correspond to your Java home path:
11
- export JAVA_HOME=/usr/lib/jvm/java-6-sun
12
-
13
- (3) Copy (hbase-dir)/contrib/stargate/hbase-<version>-stargate.jar into <hbase-dir>/lib
14
-
15
- (4) Copy all the files in the (hbase-dir)/contrib/stargate/lib folder into <hbase-dir>/lib
16
-
17
- (5) Start up HBase:
18
- $ (hbase-dir)/bin/start-hbase.sh
19
-
20
- (6)Start up Stargate (append "-p 1234" at the end if you want to change the port):
21
- $ (hbase-dir)/bin/hbase org.apache.hadoop.hbase.stargate.Main
22
-
23
-
24
- == Setting up Bigrecord
25
-
26
- (1) Install the Bigrecord Driver gem and its dependencies, then start up a DRb server. Please see the Bigrecord Driver documentation for more detailed instructions. (http://github.com/openplaces/bigrecord/blob/master/bigrecord-driver/README.rdoc)
27
-
28
- (2) Add the following line into the Rails::Initializer.run do |config| block:
29
-
30
- config.gem "bigrecord", :source => "http://gemcutter.org"
31
-
32
- and run the following command to install all the gems listed for your Rails app:
33
-
34
- [sudo] rake gems:install
35
-
36
- (3) Bootstrap Bigrecord into your project:
37
-
38
- script/generate bigrecord
39
-
40
- (4) Edit the config/bigrecord.yml[.sample] file in your Rails root to the information corresponding to the Stargate server.
41
-
42
- development:
43
- adapter: hbase_rest
44
- api_address: http://localhost:8080
45
-
46
- Note: 8080 is the default port that Stargate starts up on. Make sure you modify this if you changed the port from the default.
5
+ * HBase: {file:guides/hbase_install.rdoc guides/hbase_install.rdoc}
47
6
 
7
+ * Cassandra: {file:guides/cassandra_install.rdoc guides/cassandra_install.rdoc}
48
8
 
49
9
  == Usage
50
10
 
@@ -54,7 +14,8 @@ Once Bigrecord is working in your Rails project, you can use the following gener
54
14
 
55
15
  script/generate bigrecord_model ModelName
56
16
 
57
- This will add a model in app/models and a migration file in db/bigrecord_migrate. Note: This generator does not accept attributes.
17
+ This will add a model in app/models and a migration file in db/bigrecord_migrate. Note: This generator does not
18
+ accept attributes.
58
19
 
59
20
  script/generate bigrecord_migration MigrationName
60
21
 
@@ -62,11 +23,19 @@ Creates a Bigrecord specific migration and adds it into db/bigrecord_migrate
62
23
 
63
24
  === {BigRecord::Migration Migration File}
64
25
 
65
- Although column-oriented databases are generally schema-less, certain ones (like Hbase) require the creation of tables and column families ahead of time. The individual columns, however, are defined in the model itself and can be modified dynamically without the need for migrations.
26
+ Note: Cassandra cannot change the ColumnFamily schema while it is running; the schema can only be edited in the
27
+ storage-conf.xml configuration while the cluster is down. Future versions of Cassandra plan to support this.
28
+
29
+ Although column-oriented databases are generally schema-less, certain ones (like Hbase) require the creation of
30
+ tables and column families ahead of time. The individual columns, however, are defined in the model itself and can
31
+ be modified dynamically without the need for migrations.
66
32
 
67
- Unless you're familiar with column families, the majority of use cases work perfectly fine within one column family. When you generate a bigrecord_model, it will default to creating the :attribute column family.
33
+ Unless you're familiar with column families, the majority of use cases work perfectly fine within one column family.
34
+ When you generate a bigrecord_model, it will default to creating the :attribute column family.
68
35
 
69
- The following is a standard migration file that creates a table called "Books" with the default column family :attribute that has the following option of 100 versions and uses the 'lzo' compression scheme. Leave any options blank for the default value.
36
+ The following is a standard migration file that creates a table called "Books" with the default column family
37
+ :attribute that has the following option of 100 versions and uses the 'lzo' compression scheme. Leave any options
38
+ blank for the default value.
70
39
 
71
40
  class CreateBooks < BigRecord::Migration
72
41
  def self.up
@@ -80,12 +49,15 @@ The following is a standard migration file that creates a table called "Books" w
80
49
  end
81
50
  end
82
51
 
83
- === HBase column family options (HBase specific)
52
+ ==== HBase column family options (HBase specific)
84
53
 
85
- * versions: integer. By default, Hbase will store 3 versions of changes for any column family. Changing this value on the creation will change this behavior.
86
- * compression: 'none', 'gz', 'lzo'. Defaults to 'none'. Since Hbase 0.20, column families can be stored using compression. The compression scheme you define here must be installed on the Hbase servers!
54
+ * versions: integer. By default, Hbase will store 3 versions of changes for any column family. Changing this value
55
+ at creation time will change this behavior.
87
56
 
88
- === Migrating
57
+ * compression: 'none', 'gz', 'lzo'. Defaults to 'none'. Since Hbase 0.20, column families can be stored using
58
+ compression. The compression scheme you define here must be installed on the Hbase servers!
59
+
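
Putting the two options together, the CreateBooks migration sketched earlier would declare its column family with both settings. This is a hypothetical sketch: the family declaration helper and its exact option names are assumed here, since the body of the example migration is not shown in this excerpt.

  class CreateBooks < BigRecord::Migration
    def self.up
      create_table :books, :force => true do |t|
        # Assumed helper: default column family keeping 100 versions, stored with lzo compression.
        t.family :attribute, :versions => 100, :compression => 'lzo'
      end
    end

    def self.down
      drop_table :books
    end
  end
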
60
+ ==== Migrating
89
61
 
90
62
  Run the following rake task to migrate your tables and column families up to the latest version:
91
63
 
@@ -93,7 +65,8 @@ Run the following rake task to migrate your tables and column families up to the
93
65
 
94
66
  === {BigRecord::ConnectionAdapters::Column Column and Attribute Definition}
95
67
 
96
- Now that you have your tables and column families all set up, you can begin adding columns to your model. The following is an example of a model named book.rb
68
+ Now that you have your tables and column families all set up, you can begin adding columns to your model. The
69
+ following is an example of a model named book.rb
97
70
 
98
71
  class Book < BigRecord::Base
99
72
  column 'attribute:title', :string
@@ -102,11 +75,16 @@ Now that you have your tables and column families all set up, you can begin addi
102
75
  column :links, :string, :collection => true
103
76
  end
104
77
 
105
- This simple model defines 4 columns of type string. An important thing to notice here is that the first column 'attribute:title' had the column family prepended to it. This is identical to just passing the symbol :title to the column method, and the default behaviour is to prepend the column family (attribute) automatically if one is not defined. Furthermore, in Hbase, there's the option of storing collections for a given column. This will return an array for the links attribute on a Book record.
78
+ This simple model defines 4 columns of type string. An important thing to notice here is that the first column
79
+ 'attribute:title' has the column family prepended to it. This is identical to just passing the symbol :title to
80
+ the column method, and the default behaviour is to prepend the column family (attribute) automatically if one is
81
+ not defined. Furthermore, in Hbase, there's the option of storing collections for a given column. This will return
82
+ an array for the links attribute on a Book record.
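
Given the Active Record-style interface described here, working with such a record might look like the following sketch (attribute accessors and save are assumed to mirror Active Record behaviour):

  book = Book.new
  book.title  = 'Hello Thar'
  book.author = 'Greg'
  book.links  = ['link1', 'link2']   # a collection column holds an array
  book.save

  book.links                         # => ["link1", "link2"]
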
106
83
 
107
84
  === {BigRecord::BrAssociations Associations}
108
85
 
109
- There are also associations available in Bigrecord, as well as the ability to associate to Activerecord models. The following are a few models demonstrating this:
86
+ There are also associations available in Bigrecord, as well as the ability to associate with Activerecord models. The
87
+ following are a few models demonstrating this:
110
88
 
111
89
  animal.rb
112
90
  class Animal < BigRecord::Base
@@ -124,13 +102,18 @@ animal.rb
124
102
  belongs_to :trainer, :foreign_key => :trainer_id
125
103
  end
126
104
 
127
- In this example, an Animal is related to Zoo and Trainer. Both Animal and Zoo are Bigrecord models, and Trainer is an Activerecord model. Notice here that we need to define both the association field for storing the information and the association itself. It's also important to remember that Bigrecord models have their IDs stored as string, and Activerecord models use integers.
105
+ In this example, an Animal is related to Zoo and Trainer. Both Animal and Zoo are Bigrecord models, and Trainer is
106
+ an Activerecord model. Notice here that we need to define both the association field for storing the information and
107
+ the association itself. It's also important to remember that Bigrecord models have their IDs stored as strings, while
108
+ Activerecord models use integers.
128
109
 
129
- Once the association columns are defined, you define the associations themselves with either belongs_to_bigrecord or belongs_to_many and defining the :foreign_key (this is required for all associations).
110
+ Once the association columns are defined, you define the associations themselves with either belongs_to_bigrecord or
111
+ belongs_to_many, specifying the :foreign_key option (this is required for all associations).
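
A condensed sketch of that pattern, combining the column declarations with the associations (the belongs_to_bigrecord line is assumed from the description above rather than copied from the excerpt):

  class Animal < BigRecord::Base
    column :name,       :string
    column :zoo_id,     :string    # association field for a Bigrecord model (string ID)
    column :trainer_id, :integer   # association field for an Activerecord model (integer ID)

    belongs_to_bigrecord :zoo,     :foreign_key => :zoo_id
    belongs_to           :trainer, :foreign_key => :trainer_id
  end
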
130
112
 
131
113
  === {BigRecord::ConnectionAdapters::View Specifying return columns}
132
114
 
133
- There are two ways to define specific columns to be returned with your models: 1) at the model level and 2) during the query.
115
+ There are two ways to define specific columns to be returned with your models: 1) at the model level and 2) during
116
+ the query.
134
117
 
135
118
  (1) At the model level, a collection of columns are called named views, and are defined like the following:
136
119
 
@@ -147,7 +130,8 @@ There are two ways to define specific columns to be returned with your models: 1
147
130
  view :default, :title, :author, :description
148
131
  end
149
132
 
150
- Now, whenever you work with a Book record, it will only returned the columns you specify according to the view option you pass. i.e.
133
+ Now, whenever you work with a Book record, it will only return the columns you specify according to the :view option
134
+ you pass. i.e.
151
135
 
152
136
  >> Book.find(:first, :view => :front_page)
153
137
  => #<Book id: "2e13f182-1085-495e-9841-fe5c84ae9992", attribute:title: "Hello Thar", attribute:author: "Greg">
@@ -158,10 +142,11 @@ Now, whenever you work with a Book record, it will only returned the columns you
158
142
  >> Book.find(:first, :view => :default)
159
143
  => #<Book id: "2e13f182-1085-495e-9841-fe5c84ae9992", attribute:description: "Masterpiece!", attribute:title: "Hello Thar", attribute:links: ["link1", "link2", "link3", "link4"], attribute:author: "Greg">
160
144
 
161
- Note: A Bigrecord model will return all the columns within the default column family (when :view option is left blank, for example). You can override the :default name view to change this behaviour.
162
-
145
+ Note: A Bigrecord model will return all the columns within the default column family (when the :view option is left
146
+ blank, for example). You can override the :default named view to change this behaviour.
163
147
 
164
- (2) If you don't want to define named views ahead of time, you can just pass an array of columns to the :columns option and it will return only those attributes:
148
+ (2) If you don't want to define named views ahead of time, you can just pass an array of columns to the :columns
149
+ option and it will return only those attributes:
165
150
 
166
151
  >> Book.find(:first, :columns => [:author, :description])
167
152
  => #<Book id: "2e13f182-1085-495e-9841-fe5c84ae9992", attribute:description: "Masterpiece!", attribute:author: "Greg">
@@ -170,4 +155,5 @@ As you may have noticed, this functionality is synonymous with the :select optio
170
155
 
171
156
  === {BigRecord::Embedded Embedded Records}
172
157
 
173
- === At this point, usage patterns for a Bigrecord model are similar to that of an Activerecord model, and much of that documentation applies as well. Please refer to those and see if they work!
158
+ === At this point, usage patterns for a Bigrecord model are similar to those of an Activerecord model, and much of that
159
+ documentation applies as well. Please refer to those and see if they work!
@@ -0,0 +1,48 @@
1
+ == Setting up HBase and Stargate
2
+
3
+ To quickly get started with development, you can set up HBase to run as a single server on your local computer,
4
+ along with Stargate, its RESTful web service front-end.
5
+
6
+ (1) Download and unpack the most recent release of HBase from http://hadoop.apache.org/hbase/releases.html#Download
7
+
8
+ (2) Edit (hbase-dir)/conf/hbase-env.sh and uncomment/modify the following line to correspond to your Java home path:
9
+ export JAVA_HOME=/usr/lib/jvm/java-6-sun
10
+
11
+ (3) Copy (hbase-dir)/contrib/stargate/hbase-<version>-stargate.jar into <hbase-dir>/lib
12
+
13
+ (4) Copy all the files in the (hbase-dir)/contrib/stargate/lib folder into <hbase-dir>/lib
14
+
15
+ (5) Start up HBase:
16
+ $ (hbase-dir)/bin/start-hbase.sh
17
+
18
+ (6) Start up Stargate (append "-p 1234" at the end if you want to change the port):
19
+ $ (hbase-dir)/bin/hbase org.apache.hadoop.hbase.stargate.Main
20
+
21
+
22
+ == Setting up Bigrecord
23
+
24
+ (1) Install the Bigrecord Driver gem and its dependencies, then start up a DRb server. Please see the Bigrecord Driver
25
+ documentation for more detailed instructions.
26
+ (http://github.com/openplaces/bigrecord/blob/master/bigrecord-driver/README.rdoc)
27
+
28
+ (2) Add the following line into the Rails::Initializer.run do |config| block:
29
+
30
+ config.gem "bigrecord", :source => "http://gemcutter.org"
31
+
32
+ and run the following command to install all the gems listed for your Rails app:
33
+
34
+ [sudo] rake gems:install
35
+
36
+ (3) Bootstrap Bigrecord into your project:
37
+
38
+ script/generate bigrecord
39
+
40
+ (4) Edit the config/bigrecord.yml[.sample] file in your Rails root with the information corresponding to the Stargate
41
+ server.
42
+
43
+ development:
44
+ adapter: hbase_rest
45
+ api_address: http://localhost:8080
46
+
47
+ Note: 8080 is the default port that Stargate starts up on. Make sure you modify this if you changed the port from
48
+ the default.
@@ -0,0 +1,310 @@
1
+ Example storage-conf.xml file:
2
+
3
+ <!--
4
+ ~ Licensed to the Apache Software Foundation (ASF) under one
5
+ ~ or more contributor license agreements. See the NOTICE file
6
+ ~ distributed with this work for additional information
7
+ ~ regarding copyright ownership. The ASF licenses this file
8
+ ~ to you under the Apache License, Version 2.0 (the
9
+ ~ "License"); you may not use this file except in compliance
10
+ ~ with the License. You may obtain a copy of the License at
11
+ ~
12
+ ~ http://www.apache.org/licenses/LICENSE-2.0
13
+ ~
14
+ ~ Unless required by applicable law or agreed to in writing,
15
+ ~ software distributed under the License is distributed on an
16
+ ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
17
+ ~ KIND, either express or implied. See the License for the
18
+ ~ specific language governing permissions and limitations
19
+ ~ under the License.
20
+ -->
21
+ <Storage>
22
+ <!--======================================================================-->
23
+ <!-- Basic Configuration -->
24
+ <!--======================================================================-->
25
+
26
+ <!--
27
+ ~ The name of this cluster. This is mainly used to prevent machines in
28
+ ~ one logical cluster from joining another.
29
+ -->
30
+ <ClusterName>Local Testing</ClusterName>
31
+
32
+ <!--
33
+ ~ Turn on to make new [non-seed] nodes automatically migrate the right data
34
+ ~ to themselves. (If no InitialToken is specified, they will pick one
35
+ ~ such that they will get half the range of the most-loaded node.)
36
+ ~ If a node starts up without bootstrapping, it will mark itself bootstrapped
37
+ ~ so that you can't subsequently accidently bootstrap a node with
38
+ ~ data on it. (You can reset this by wiping your data and commitlog
39
+ ~ directories.)
40
+ ~
41
+ ~ Off by default so that new clusters and upgraders from 0.4 don't
42
+ ~ bootstrap immediately. You should turn this on when you start adding
43
+ ~ new nodes to a cluster that already has data on it. (If you are upgrading
44
+ ~ from 0.4, start your cluster with it off once before changing it to true.
45
+ ~ Otherwise, no data will be lost but you will incur a lot of unnecessary
46
+ ~ I/O before your cluster starts up.)
47
+ -->
48
+ <AutoBootstrap>false</AutoBootstrap>
49
+
50
+ <!--
51
+ ~ Keyspaces and ColumnFamilies:
52
+ ~ A ColumnFamily is the Cassandra concept closest to a relational
53
+ ~ table. Keyspaces are separate groups of ColumnFamilies. Except in
54
+ ~ very unusual circumstances you will have one Keyspace per application.
55
+
56
+ ~ There is an implicit keyspace named 'system' for Cassandra internals.
57
+ -->
58
+ <Keyspaces>
59
+ <Keyspace Name="Bigrecord">
60
+ <ColumnFamily Name="animals" CompareWith="UTF8Type" />
61
+ <ColumnFamily Name="books" CompareWith="UTF8Type" />
62
+ <ColumnFamily Name="companies" CompareWith="UTF8Type" />
63
+ <ColumnFamily Name="employees" CompareWith="UTF8Type" />
64
+ <ColumnFamily Name="novels" CompareWith="UTF8Type" />
65
+ <ColumnFamily Name="zoos" CompareWith="UTF8Type" />
66
+
67
+ <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
68
+
69
+ <ReplicationFactor>1</ReplicationFactor>
70
+
71
+ <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
72
+ </Keyspace>
73
+ </Keyspaces>
74
+
75
+ <!--
76
+ ~ Authenticator: any IAuthenticator may be used, including your own as long
77
+ ~ as it is on the classpath. Out of the box, Cassandra provides
78
+ ~ org.apache.cassandra.auth.AllowAllAuthenticator and,
79
+ ~ org.apache.cassandra.auth.SimpleAuthenticator
80
+ ~ (SimpleAuthenticator uses access.properties and passwd.properties by
81
+ ~ default).
82
+ ~
83
+ ~ If you don't specify an authenticator, AllowAllAuthenticator is used.
84
+ -->
85
+ <Authenticator>org.apache.cassandra.auth.AllowAllAuthenticator</Authenticator>
86
+
87
+ <!--
88
+ ~ Partitioner: any IPartitioner may be used, including your own as long
89
+ ~ as it is on the classpath. Out of the box, Cassandra provides
90
+ ~ org.apache.cassandra.dht.RandomPartitioner,
91
+ ~ org.apache.cassandra.dht.OrderPreservingPartitioner, and
92
+ ~ org.apache.cassandra.dht.CollatingOrderPreservingPartitioner.
93
+ ~ (CollatingOPP colates according to EN,US rules, not naive byte
94
+ ~ ordering. Use this as an example if you need locale-aware collation.)
95
+ ~ Range queries require using an order-preserving partitioner.
96
+ ~
97
+ ~ Achtung! Changing this parameter requires wiping your data
98
+ ~ directories, since the partitioner can modify the sstable on-disk
99
+ ~ format.
100
+ -->
101
+ <Partitioner>org.apache.cassandra.dht.RandomPartitioner</Partitioner>
102
+
103
+ <!--
104
+ ~ If you are using an order-preserving partitioner and you know your key
105
+ ~ distribution, you can specify the token for this node to use. (Keys
106
+ ~ are sent to the node with the "closest" token, so distributing your
107
+ ~ tokens equally along the key distribution space will spread keys
108
+ ~ evenly across your cluster.) This setting is only checked the first
109
+ ~ time a node is started.
110
+
111
+ ~ This can also be useful with RandomPartitioner to force equal spacing
112
+ ~ of tokens around the hash space, especially for clusters with a small
113
+ ~ number of nodes.
114
+ -->
115
+ <InitialToken></InitialToken>
116
+
117
+ <!--
118
+ ~ Directories: Specify where Cassandra should store different data on
119
+ ~ disk. Keep the data disks and the CommitLog disks separate for best
120
+ ~ performance
121
+ -->
122
+ <CommitLogDirectory>/var/lib/cassandra/commitlog</CommitLogDirectory>
123
+ <DataFileDirectories>
124
+ <DataFileDirectory>/var/lib/cassandra/data</DataFileDirectory>
125
+ </DataFileDirectories>
126
+
127
+
128
+ <!--
129
+ ~ Addresses of hosts that are deemed contact points. Cassandra nodes
130
+ ~ use this list of hosts to find each other and learn the topology of
131
+ ~ the ring. You must change this if you are running multiple nodes!
132
+ -->
133
+ <Seeds>
134
+ <Seed>127.0.0.1</Seed>
135
+ </Seeds>
136
+
137
+
138
+ <!-- Miscellaneous -->
139
+
140
+ <!-- Time to wait for a reply from other nodes before failing the command -->
141
+ <RpcTimeoutInMillis>10000</RpcTimeoutInMillis>
142
+ <!-- Size to allow commitlog to grow to before creating a new segment -->
143
+ <CommitLogRotationThresholdInMB>128</CommitLogRotationThresholdInMB>
144
+
145
+
146
+ <!-- Local hosts and ports -->
147
+
148
+ <!--
149
+ ~ Address to bind to and tell other nodes to connect to. You _must_
150
+ ~ change this if you want multiple nodes to be able to communicate!
151
+ ~
152
+ ~ Leaving it blank leaves it up to InetAddress.getLocalHost(). This
153
+ ~ will always do the Right Thing *if* the node is properly configured
154
+ ~ (hostname, name resolution, etc), and the Right Thing is to use the
155
+ ~ address associated with the hostname (it might not be).
156
+ -->
157
+ <ListenAddress>localhost</ListenAddress>
158
+ <!-- internal communications port -->
159
+ <StoragePort>7000</StoragePort>
160
+
161
+ <!--
162
+ ~ The address to bind the Thrift RPC service to. Unlike ListenAddress
163
+ ~ above, you *can* specify 0.0.0.0 here if you want Thrift to listen on
164
+ ~ all interfaces.
165
+ ~
166
+ ~ Leaving this blank has the same effect it does for ListenAddress,
167
+ ~ (i.e. it will be based on the configured hostname of the node).
168
+ -->
169
+ <ThriftAddress>localhost</ThriftAddress>
170
+ <!-- Thrift RPC port (the port clients connect to). -->
171
+ <ThriftPort>9160</ThriftPort>
172
+ <!--
173
+ ~ Whether or not to use a framed transport for Thrift. If this option
174
+ ~ is set to true then you must also use a framed transport on the
175
+ ~ client-side, (framed and non-framed transports are not compatible).
176
+ -->
177
+ <ThriftFramedTransport>false</ThriftFramedTransport>
178
+
179
+
180
+ <!--======================================================================-->
181
+ <!-- Memory, Disk, and Performance -->
182
+ <!--======================================================================-->
183
+
184
+ <!--
185
+ ~ Access mode. mmapped i/o is substantially faster, but only practical on
186
+ ~ a 64bit machine (which notably does not include EC2 "small" instances)
187
+ ~ or relatively small datasets. "auto", the safe choice, will enable
188
+ ~ mmapping on a 64bit JVM. Other values are "mmap", "mmap_index_only"
189
+ ~ (which may allow you to get part of the benefits of mmap on a 32bit
190
+ ~ machine by mmapping only index files) and "standard".
191
+ ~ (The buffer size settings that follow only apply to standard,
192
+ ~ non-mmapped i/o.)
193
+ -->
194
+ <DiskAccessMode>auto</DiskAccessMode>
195
+
196
+ <!--
197
+ ~ Size of compacted row above which to log a warning. (If compacted
198
+ ~ rows do not fit in memory, Cassandra will crash. This is explained
199
+ ~ in http://wiki.apache.org/cassandra/CassandraLimitations and is
200
+ ~ scheduled to be fixed in 0.7.)
201
+ -->
202
+ <RowWarningThresholdInMB>512</RowWarningThresholdInMB>
203
+
204
+ <!--
205
+ ~ Buffer size to use when performing contiguous column slices. Increase
206
+ ~ this to the size of the column slices you typically perform.
207
+ ~ (Name-based queries are performed with a buffer size of
208
+ ~ ColumnIndexSizeInKB.)
209
+ -->
210
+ <SlicedBufferSizeInKB>64</SlicedBufferSizeInKB>
211
+
212
+ <!--
213
+ ~ Buffer size to use when flushing memtables to disk. (Only one
214
+ ~ memtable is ever flushed at a time.) Increase (decrease) the index
215
+ ~ buffer size relative to the data buffer if you have few (many)
216
+ ~ columns per key. Bigger is only better _if_ your memtables get large
217
+ ~ enough to use the space. (Check in your data directory after your
218
+ ~ app has been running long enough.) -->
219
+ <FlushDataBufferSizeInMB>32</FlushDataBufferSizeInMB>
220
+ <FlushIndexBufferSizeInMB>8</FlushIndexBufferSizeInMB>
221
+
222
+ <!--
223
+ ~ Add column indexes to a row after its contents reach this size.
224
+ ~ Increase if your column values are large, or if you have a very large
225
+ ~ number of columns. The competing causes are, Cassandra has to
226
+ ~ deserialize this much of the row to read a single column, so you want
227
+ ~ it to be small - at least if you do many partial-row reads - but all
228
+ ~ the index data is read for each access, so you don't want to generate
229
+ ~ that wastefully either.
230
+ -->
231
+ <ColumnIndexSizeInKB>64</ColumnIndexSizeInKB>
232
+
233
+ <!--
234
+ ~ Flush memtable after this much data has been inserted, including
235
+ ~ overwritten data. There is one memtable per column family, and
236
+ ~ this threshold is based solely on the amount of data stored, not
237
+ ~ actual heap memory usage (there is some overhead in indexing the
238
+ ~ columns).
239
+ -->
240
+ <MemtableThroughputInMB>64</MemtableThroughputInMB>
241
+ <!--
242
+ ~ Throughput setting for Binary Memtables. Typically these are
243
+ ~ used for bulk load so you want them to be larger.
244
+ -->
245
+ <BinaryMemtableThroughputInMB>256</BinaryMemtableThroughputInMB>
246
+ <!--
247
+ ~ The maximum number of columns in millions to store in memory per
248
+ ~ ColumnFamily before flushing to disk. This is also a per-memtable
249
+ ~ setting. Use with MemtableThroughputInMB to tune memory usage.
250
+ -->
251
+ <MemtableOperationsInMillions>0.3</MemtableOperationsInMillions>
252
+ <!--
253
+ ~ The maximum time to leave a dirty memtable unflushed.
254
+ ~ (While any affected columnfamilies have unflushed data from a
255
+ ~ commit log segment, that segment cannot be deleted.)
256
+ ~ This needs to be large enough that it won't cause a flush storm
257
+ ~ of all your memtables flushing at once because none has hit
258
+ ~ the size or count thresholds yet. For production, a larger
259
+ ~ value such as 1440 is recommended.
260
+ -->
261
+ <MemtableFlushAfterMinutes>60</MemtableFlushAfterMinutes>
262
+
263
+ <!--
264
+ ~ Unlike most systems, in Cassandra writes are faster than reads, so
265
+ ~ you can afford more of those in parallel. A good rule of thumb is 2
266
+ ~ concurrent reads per processor core. Increase ConcurrentWrites to
267
+ ~ the number of clients writing at once if you enable CommitLogSync +
268
+ ~ CommitLogSyncDelay. -->
269
+ <ConcurrentReads>8</ConcurrentReads>
270
+ <ConcurrentWrites>32</ConcurrentWrites>
271
+
272
+ <!--
273
+ ~ CommitLogSync may be either "periodic" or "batch." When in batch
274
+ ~ mode, Cassandra won't ack writes until the commit log has been
275
+ ~ fsynced to disk. It will wait up to CommitLogSyncBatchWindowInMS
276
+ ~ milliseconds for other writes, before performing the sync.
277
+
278
+ ~ This is less necessary in Cassandra than in traditional databases
279
+ ~ since replication reduces the odds of losing data from a failure
280
+ ~ after writing the log entry but before it actually reaches the disk.
281
+ ~ So the other option is "periodic," where writes may be acked immediately
282
+ ~ and the CommitLog is simply synced every CommitLogSyncPeriodInMS
283
+ ~ milliseconds.
284
+ -->
285
+ <CommitLogSync>periodic</CommitLogSync>
286
+ <!--
287
+ ~ Interval at which to perform syncs of the CommitLog in periodic mode.
288
+ ~ Usually the default of 10000ms is fine; increase it if your i/o
289
+ ~ load is such that syncs are taking excessively long times.
290
+ -->
291
+ <CommitLogSyncPeriodInMS>10000</CommitLogSyncPeriodInMS>
292
+ <!--
293
+ ~ Delay (in milliseconds) during which additional commit log entries
294
+ ~ may be written before fsync in batch mode. This will increase
295
+ ~ latency slightly, but can vastly improve throughput where there are
296
+ ~ many writers. Set to zero to disable (each entry will be synced
297
+ ~ individually). Reasonable values range from a minimal 0.1 to 10 or
298
+ ~ even more if throughput matters more than latency.
299
+ -->
300
+ <!-- <CommitLogSyncBatchWindowInMS>1</CommitLogSyncBatchWindowInMS> -->
301
+
302
+ <!--
303
+ ~ Time to wait before garbage-collection deletion markers. Set this to
304
+ ~ a large enough value that you are confident that the deletion marker
305
+ ~ will be propagated to all replicas by the time this many seconds has
306
+ ~ elapsed, even in the face of hardware failures. The default value is
307
+ ~ ten days.
308
+ -->
309
+ <GCGraceSeconds>10</GCGraceSeconds>
310
+ </Storage>
@@ -68,7 +68,7 @@ module BigRecord
68
68
  def update_raw(table_name, row, values, timestamp)
69
69
  result = nil
70
70
  log "UPDATE #{table_name} SET #{values.inspect if values} WHERE ROW=#{row};" do
71
- result = @connection.insert(table_name, row, data_to_cassandra_format(values), {:consistency => Cassandra::Consistency::QUORUM})
71
+ result = @connection.insert(table_name, row, values, {:consistency => Cassandra::Consistency::QUORUM})
72
72
  end
73
73
  result
74
74
  end
@@ -84,8 +84,7 @@ module BigRecord
84
84
  def get_raw(table_name, row, column, options={})
85
85
  result = nil
86
86
  log "SELECT (#{column}) FROM #{table_name} WHERE ROW=#{row};" do
87
- super_column, name = column.split(":")
88
- result = @connection.get(table_name, row, super_column, name)
87
+ result = @connection.get(table_name, row, column)
89
88
  end
90
89
  result
91
90
  end
@@ -103,33 +102,33 @@ module BigRecord
103
102
 
104
103
  def get_columns_raw(table_name, row, columns, options={})
105
104
  result = {}
106
-
105
+
107
106
  log "SELECT (#{columns.join(", ")}) FROM #{table_name} WHERE ROW=#{row};" do
108
- requested_columns = columns_to_cassandra_format(columns)
109
- super_columns = requested_columns.keys
107
+ prefix_mode = false
108
+ prefixes = []
110
109
 
111
- if super_columns.size == 1 && requested_columns[super_columns.first].size > 0
112
- column_names = requested_columns[super_columns.first]
110
+ columns.each do |name|
111
+ prefix, name = name.split(":")
112
+ prefixes << prefix+":" unless prefixes.include?(prefix+":")
113
+ prefix_mode = name.blank?
114
+ end
113
115
 
114
- values = @connection.get_columns(table_name, row, super_columns.first, column_names)
116
+ if prefix_mode
117
+ prefixes.sort!
118
+ values = @connection.get(table_name, row, {:start => prefixes.first, :finish => prefixes.last + "~"})
115
119
 
116
- result["id"] = row if values && values.compact.size > 0
117
- column_names.each_index do |id|
118
- full_key = super_columns.first + ":" + column_names[id].to_s
119
- result[full_key] = values[id] unless values[id].nil?
120
+ result["id"] = row if values && values.size > 0
121
+
122
+ values.each do |key,value|
123
+ result[key] = value unless value.blank?
120
124
  end
121
125
  else
122
- values = @connection.get_columns(table_name, row, super_columns)
126
+ values = @connection.get_columns(table_name, row, columns)
127
+
123
128
  result["id"] = row if values && values.compact.size > 0
124
- super_columns.each_index do |id|
125
- next if values[id].nil?
126
-
127
- values[id].each do |column_name, value|
128
- next if value.nil?
129
-
130
- full_key = super_columns[id] + ":" + column_name
131
- result[full_key] = value
132
- end
129
+
130
+ columns.each_index do |id|
131
+ result[columns[id].to_s] = values[id] unless values[id].blank?
133
132
  end
134
133
  end
135
134
  end
@@ -144,11 +143,11 @@ module BigRecord
144
143
  row_cols.each do |key,value|
145
144
  begin
146
145
  result[key] =
147
- if key == 'id'
148
- value
149
- else
150
- deserialize(value)
151
- end
146
+ if key == 'id'
147
+ value
148
+ else
149
+ deserialize(value)
150
+ end
152
151
  rescue Exception => e
153
152
  puts "Could not load column value #{key} for row=#{row.name}"
154
153
  end
@@ -160,9 +159,9 @@ module BigRecord
160
159
  result = []
161
160
  log "SCAN (#{columns.join(", ")}) FROM #{table_name} WHERE START_ROW=#{start_row} AND STOP_ROW=#{stop_row} LIMIT=#{limit};" do
162
161
  options = {}
163
- options[:start] = start_row if start_row
164
- options[:finish] = stop_row if stop_row
165
- options[:count] = limit if limit
162
+ options[:start] = start_row unless start_row.blank?
163
+ options[:finish] = stop_row unless stop_row.blank?
164
+ options[:count] = limit unless limit.blank?
166
165
 
167
166
  keys = @connection.get_range(table_name, options)
168
167
 
@@ -172,14 +171,9 @@ module BigRecord
172
171
  row = {}
173
172
  row["id"] = key.key
174
173
 
175
- key.columns.each do |s_col|
176
- super_column = s_col.super_column
177
- super_column_name = super_column.name
178
-
179
- super_column.columns.each do |column|
180
- full_key = super_column_name + ":" + column.name
181
- row[full_key] = column.value
182
- end
174
+ key.columns.each do |col|
175
+ column = col.column
176
+ row[column.name] = column.value
183
177
  end
184
178
 
185
179
  result << row if row.keys.size > 1
@@ -266,31 +260,6 @@ module BigRecord
266
260
 
267
261
  protected
268
262
 
269
- def data_to_cassandra_format(data = {})
270
- super_columns = {}
271
-
272
- data.each do |name, value|
273
- super_column, column = name.split(":")
274
- super_columns[super_column.to_s] = {} unless super_columns.has_key?(super_column.to_s)
275
- super_columns[super_column.to_s][column.to_s] = value
276
- end
277
-
278
- return super_columns
279
- end
280
-
281
- def columns_to_cassandra_format(column_names = [])
282
- super_columns = {}
283
-
284
- column_names.each do |name|
285
- super_column, sub_column = name.split(":")
286
-
287
- super_columns[super_column.to_s] = [] unless super_columns.has_key?(super_column.to_s)
288
- super_columns[super_column.to_s] << sub_column
289
- end
290
-
291
- return super_columns
292
- end
293
-
294
263
  def log(str, name = nil)
295
264
  if block_given?
296
265
  if @logger and @logger.level <= Logger::INFO
@@ -346,4 +315,4 @@ module BigRecord
346
315
  end
347
316
  end
348
317
  end
349
- end
318
+ end
@@ -1,7 +1,7 @@
1
- hbase_rest:
1
+ hbase:
2
2
  adapter: hbase_rest
3
3
  api_address: http://localhost:8080
4
- hbase:
4
+ hbase_brd:
5
5
  adapter: hbase
6
6
  zookeeper_quorum: localhost
7
7
  zookeeper_client_port: 2181
metadata CHANGED
@@ -5,8 +5,8 @@ version: !ruby/object:Gem::Version
5
5
  segments:
6
6
  - 0
7
7
  - 1
8
- - 0
9
- version: 0.1.0
8
+ - 1
9
+ version: 0.1.1
10
10
  platform: ruby
11
11
  authors:
12
12
  - openplaces.org
@@ -14,7 +14,7 @@ autorequire:
14
14
  bindir: bin
15
15
  cert_chain: []
16
16
 
17
- date: 2010-04-27 00:00:00 -04:00
17
+ date: 2010-05-05 00:00:00 -04:00
18
18
  default_executable:
19
19
  dependencies:
20
20
  - !ruby/object:Gem::Dependency
@@ -77,8 +77,11 @@ extra_rdoc_files:
77
77
  - LICENSE
78
78
  - README.rdoc
79
79
  - guides/bigrecord_specs.rdoc
80
+ - guides/cassandra_install.rdoc
80
81
  - guides/deployment.rdoc
81
82
  - guides/getting_started.rdoc
83
+ - guides/hbase_install.rdoc
84
+ - guides/storage-conf.rdoc
82
85
  files:
83
86
  - Rakefile
84
87
  - VERSION
@@ -92,8 +95,11 @@ files:
92
95
  - generators/bigrecord_model/templates/model.rb
93
96
  - generators/bigrecord_model/templates/model_spec.rb
94
97
  - guides/bigrecord_specs.rdoc
98
+ - guides/cassandra_install.rdoc
95
99
  - guides/deployment.rdoc
96
100
  - guides/getting_started.rdoc
101
+ - guides/hbase_install.rdoc
102
+ - guides/storage-conf.rdoc
97
103
  - init.rb
98
104
  - install.rb
99
105
  - lib/big_record.rb