bigrecord 0.1.0 → 0.1.1
- data/README.rdoc +23 -11
- data/VERSION +1 -1
- data/guides/bigrecord_specs.rdoc +9 -3
- data/guides/cassandra_install.rdoc +65 -0
- data/guides/deployment.rdoc +12 -5
- data/guides/getting_started.rdoc +48 -62
- data/guides/hbase_install.rdoc +48 -0
- data/guides/storage-conf.rdoc +310 -0
- data/lib/big_record/connection_adapters/cassandra_adapter.rb +34 -65
- data/spec/connections/bigrecord.yml +2 -2
- metadata +9 -3
data/README.rdoc
CHANGED
@@ -1,6 +1,7 @@
 = Big Record
 
-A Ruby Object/Data Mapper for distributed column-oriented data stores (inspired by BigTable) such as HBase. Intended
+A Ruby Object/Data Mapper for distributed column-oriented data stores (inspired by BigTable) such as HBase. Intended
+to work as a drop-in for Rails applications.
 
 == Features
 * Dynamic schemas (due to the schema-less design of BigTable).
@@ -12,22 +13,33 @@ A Ruby Object/Data Mapper for distributed column-oriented data stores (inspired
 
 == Motivations
 
-BigTable, and by extension, Bigrecord isn't right for everyone. A great introductory article discussing this topic can
+BigTable, and by extension, Bigrecord isn't right for everyone. A great introductory article discussing this topic can
+be found at http://blog.rapleaf.com/dev/?p=26 explaining why you would or wouldn't use BigTable. The rule of thumb,
+however, is that if your data model is simple or can fit into a standard RDBMS, then you probably don't need it.
 
 Beyond this though, there are two basic motivations that almost immediately demand a BigTable model database:
-1. Your data is highly dynamic in nature and would not fit in a schema bound model, or you cannot define a schema ahead
-
+1. Your data is highly dynamic in nature and would not fit in a schema bound model, or you cannot define a schema ahead
+of time.
+2. You know that your database will grow to tens or hundreds of gigabytes, and can't afford big iron servers. Instead,
+you'd like to scale horizontally across many commodity servers.
 
-==
+== Components
 
-*
-* Big Record Driver: JRuby application that bridges Ruby and Java (through JRuby's Drb protocol) to interact with Java-based data stores and their native APIs. Required for HBase and Cassandra. This application can be run from a separate server than your Rails application.
-* JRuby 1.1.6+ is needed to run Big Record Driver.
-* Any other requirements needed to run Hadoop, HBase or your data store of choice.
+* Bigrecord: Ruby Object/Data Mapper. Inspired and architected similarly to Active Record.
 
-== Optional
+== Optional Component
 
-*
+* Bigrecord Driver: Consists of a JRuby server component that bridges Ruby and Java (through the DRb protocol) to
+interact with Java-based data stores and their native APIs. Clients that connect to the DRb server can be of any Ruby
+type (JRuby, MRI, etc). Currently, this is used only for HBase to serve as a connection alternative to Thrift or
+Stargate. This application can be run from a separate server than your Rails application.
+
+* Bigindex [http://github.com/openplaces/bigindex]: Due to the nature of BigTable databases, some limitations are
+present while using Bigrecord standalone when compared to Active Record. Some major limitations include the inability
+to query for data other than with the row ID, indexing, searching, and dynamic finders (find_by_attribute_name). Since
+these data access patterns are vital for most Rails applications to function, Bigindex was created to address these
+issues, and bring the feature set more up to par with Active Record. Please refer to the <tt>Bigindex</tt> package for
+more information and its requirements.
 
 == Getting Started
 
data/VERSION
CHANGED
@@ -1 +1 @@
-0.1.0
+0.1.1
data/guides/bigrecord_specs.rdoc
CHANGED
@@ -2,11 +2,15 @@
 
 == Data store information
 
-The default settings for the Bigrecord specs can be found at spec/connections/bigrecord.yml with each environment
+The default settings for the Bigrecord specs can be found at spec/connections/bigrecord.yml with each environment
+broken down by the data store type (Hbase and Cassandra at the time of writing). These are the minimal settings
+required to connect to each data store, and should be modified freely to reflect your own system configurations.
 
 == Data store migration
 
-There are also migrations to create the necessary tables for the specs to run. To ensure migrations are functioning
+There are also migrations to create the necessary tables for the specs to run. To ensure migrations are functioning
+properly before actually running the migrations, you can run: spec spec/unit/migration_spec.rb. Alternatively, you
+can manually create the tables according to the migration files under: spec/lib/migrations
 
 Migrations have their own log file for debugging purposes. It's created under: bigrecord/migrate.log
 
@@ -31,6 +35,8 @@ To run a specific spec, you can run the following command from the bigrecord roo
 
 == Debugging
 
-If any problems or failures arise during the unit tests, please refer to the log files before submitting it as an
+If any problems or failures arise during the unit tests, please refer to the log files before submitting it as an
+issue. Often, it's a simple matter of forgetting to turn on BigrecordDriver, the tables weren't created, or
+configurations weren't set properly.
 
 The log file for specs is created under: <bigrecord root>/spec/debug.log
data/guides/cassandra_install.rdoc
ADDED

@@ -0,0 +1,65 @@
+== Setting up Cassandra
+
+To quickly get started with development, you can set up Cassandra to run as a single node cluster on your local system.
+
+(1) Download and unpack the most recent release of Cassandra from http://cassandra.apache.org/download/
+
+(2) Add a <Keyspace></Keyspace> entry into your (cassandra-dir)/conf/storage-conf.xml configuration file named after
+your application, and create <ColumnFamily> entries corresponding to each model you wish to add. The following is an
+example of the Bigrecord keyspace used to run the spec suite against:
+
+  <Keyspace Name="Bigrecord">
+    <ColumnFamily Name="animals" CompareWith="UTF8Type" />
+    <ColumnFamily Name="books" CompareWith="UTF8Type" />
+    <ColumnFamily Name="companies" CompareWith="UTF8Type" />
+    <ColumnFamily Name="employees" CompareWith="UTF8Type" />
+    <ColumnFamily Name="novels" CompareWith="UTF8Type" />
+    <ColumnFamily Name="zoos" CompareWith="UTF8Type" />
+
+    <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
+
+    <ReplicationFactor>1</ReplicationFactor>
+
+    <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
+  </Keyspace>
+
+You can also see {file:guides/storage-conf.rdoc guides/storage-conf.rdoc} for an example of a full configuration. More
+documentation on setting up Cassandra can be found at http://wiki.apache.org/cassandra/GettingStarted
+
+(3) Install the Cassandra Rubygem:
+
+  $ [sudo] gem install cassandra
+
+(4) Start up Cassandra:
+  $ (cassandra-dir)/bin/cassandra -f
+
+
+== Setting up Bigrecord
+
+(1) Add the following line into the Rails::Initializer.run do |config| block:
+
+  config.gem "bigrecord", :source => "http://gemcutter.org"
+
+and run the following command to install all the gems listed for your Rails app:
+
+  [sudo] rake gems:install
+
+(2) Bootstrap Bigrecord into your project:
+
+  script/generate bigrecord
+
+(3) Edit the config/bigrecord.yml[.sample] file in your Rails root to the information corresponding to your Cassandra
+install (keyspace should correspond to the one you defined in step 2 of "Setting up Cassandra" above):
+
+  development:
+    adapter: cassandra
+    keyspace: Bigrecord
+    servers: localhost:9160
+  production:
+    adapter: cassandra
+    keyspace: Bigrecord
+    servers:
+      - server1:9160
+      - server2:9160
+
+Note: 9160 is the default port for Cassandra's Thrift server.
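With the Bigrecord keyspace from the new guide above defined and the cassandra gem from step (3) installed, it is worth confirming that the Thrift endpoint referenced in bigrecord.yml is actually reachable before involving Bigrecord itself. The following is a minimal sketch, not part of the gem; it assumes the 0.x-era cassandra gem client API (Cassandra.new, insert, get, remove) plus the keyspace and column families shown above:

  # sanity_check.rb -- hypothetical helper script, not shipped with bigrecord.
  require 'rubygems'
  require 'cassandra'

  # Keyspace and server match the development block of bigrecord.yml above.
  client = Cassandra.new('Bigrecord', 'localhost:9160')

  # Write one column into the "books" column family, read it back, clean up.
  client.insert(:books, 'sanity-row', 'attribute:title' => 'Hello Thar')
  puts client.get(:books, 'sanity-row').inspect
  client.remove(:books, 'sanity-row')

If this round-trips without a Thrift exception, the adapter settings above should work as-is.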
data/guides/deployment.rdoc
CHANGED
@@ -1,15 +1,22 @@
-= Deploying Big Record
+= Deploying Big Record with HBase
 
-Stargate is a new implementation for HBase's web service front-end, and as such, is not currently recommended for
+Stargate is a new implementation for HBase's web service front-end, and as such, is not currently recommended for
+deployment.
 
-We here at Openplaces have developed Bigrecord Driver, which uses JRuby to interact with HBase via the native
+We here at Openplaces have developed Bigrecord Driver, which uses JRuby to interact with HBase via the native
+Java API and connect to Bigrecord through the DRb protocol. This method is slightly more complicated to setup,
+but preliminary benchmarks show that it runs faster (especially for scanner functionality).
 
 == Instructions
-* Your database should already be set up (please refer to the database's own documentation) with the required
+* Your database should already be set up (please refer to the database's own documentation) with the required
+information known such as the zookeeper quorum/port, etc. in order for Bigrecord to connect to it.
+
 * Bigrecord Driver (if your database requires it for connecting)
 * JRuby 1.1.6+ is needed to run Bigrecord Driver.
 
-Install the Bigrecord Driver gem and its dependencies, then start up a DRb server. Please refer the Bigrecord Driver
+Install the Bigrecord Driver gem and its dependencies, then start up a DRb server. Please refer the Bigrecord Driver
+documentation for more detailed instructions.
+(http://github.com/openplaces/bigrecord/blob/master/bigrecord-driver/README.rdoc)
 
 Edit your bigrecord.yml config file as follows:
 
data/guides/getting_started.rdoc
CHANGED
@@ -1,50 +1,10 @@
 = Getting Started
 
+== Install HBase or Cassandra
 
-
-
-To quickly get started with development, you can set up HBase to run as a single server on your local computer, along with Stargate, its RESTful web service front-end.
-
-(1) Download and unpack the most recent release of HBase from http://hadoop.apache.org/hbase/releases.html#Download
-
-(2) Edit (hbase-dir)/conf/hbase-env.sh and uncomment/modify the following line to correspond to your Java home path:
-export JAVA_HOME=/usr/lib/jvm/java-6-sun
-
-(3) Copy (hbase-dir)/contrib/stargate/hbase-<version>-stargate.jar into <hbase-dir>/lib
-
-(4) Copy all the files in the (hbase-dir)/contrib/stargate/lib folder into <hbase-dir>/lib
-
-(5) Start up HBase:
-$ (hbase-dir)/bin/start-hbase.sh
-
-(6)Start up Stargate (append "-p 1234" at the end if you want to change the port):
-$ (hbase-dir)/bin/hbase org.apache.hadoop.hbase.stargate.Main
-
-
-== Setting up Bigrecord
-
-(1) Install the Bigrecord Driver gem and its dependencies, then start up a DRb server. Please see the Bigrecord Driver documentation for more detailed instructions. (http://github.com/openplaces/bigrecord/blob/master/bigrecord-driver/README.rdoc)
-
-(2) Add the following line into the Rails::Initializer.run do |config| block:
-
-config.gem "bigrecord", :source => "http://gemcutter.org"
-
-and run the following command to install all the gems listed for your Rails app:
-
-[sudo] rake gems:install
-
-(3) Bootstrap Bigrecord into your project:
-
-script/generate bigrecord
-
-(4) Edit the config/bigrecord.yml[.sample] file in your Rails root to the information corresponding to the Stargate server.
-
-development:
-adapter: hbase_rest
-api_address: http://localhost:8080
-
-Note: 8080 is the default port that Stargate starts up on. Make sure you modify this if you changed the port from the default.
+* HBase: {file:guides/hbase_install.rdoc guides/hbase_install.rdoc}
 
+* Cassandra: {file:guides/cassandra_install.rdoc guides/cassandra_install.rdoc}
 
 == Usage
 
@@ -54,7 +14,8 @@ Once Bigrecord is working in your Rails project, you can use the following gener
 
 script/generate bigrecord_model ModelName
 
-This will add a model in app/models and a migration file in db/bigrecord_migrate. Note: This generator does not
+This will add a model in app/models and a migration file in db/bigrecord_migrate. Note: This generator does not
+accept attributes.
 
 script/generate bigrecord_migration MigrationName
 
@@ -62,11 +23,19 @@ Creates a Bigrecord specific migration and adds it into db/bigrecord_migrate
 
 === {BigRecord::Migration Migration File}
 
-
+Note: Cassandra doesn't have the capability to modify the ColumnFamily schema while running, and can only be edited
+from the storage-conf.xml configuration while the cluster is down. Future versions of Cassandra have this planned.
+
+Although column-oriented databases are generally schema-less, certain ones (like Hbase) require the creation of
+tables and column families ahead of time. The individual columns, however, are defined in the model itself and can
+be modified dynamically without the need for migrations.
 
-Unless you're familiar with column families, the majority of use cases work perfectly fine within one column family.
+Unless you're familiar with column families, the majority of use cases work perfectly fine within one column family.
+When you generate a bigrecord_model, it will default to creating the :attribute column family.
 
-The following is a standard migration file that creates a table called "Books" with the default column family
+The following is a standard migration file that creates a table called "Books" with the default column family
+:attribute that has the following option of 100 versions and uses the 'lzo' compression scheme. Leave any options
+blank for the default value.
 
 class CreateBooks < BigRecord::Migration
 def self.up
@@ -80,12 +49,15 @@ The following is a standard migration file that creates a table called "Books" w
 end
 end
 
-
+==== HBase column family options (HBase specific)
 
-* versions: integer. By default, Hbase will store 3 versions of changes for any column family. Changing this value on
-
+* versions: integer. By default, Hbase will store 3 versions of changes for any column family. Changing this value on
+the creation will change this behavior.
 
-
+* compression: 'none', 'gz', 'lzo'. Defaults to 'none'. Since Hbase 0.20, column families can be stored using
+compression. The compression scheme you define here must be installed on the Hbase servers!
+
+==== Migrating
 
 Run the following rake task to migrate your tables and column families up to the latest version:
 
@@ -93,7 +65,8 @@ Run the following rake task to migrate your tables and column families up to the
 
 === {BigRecord::ConnectionAdapters::Column Column and Attribute Definition}
 
-Now that you have your tables and column families all set up, you can begin adding columns to your model. The
+Now that you have your tables and column families all set up, you can begin adding columns to your model. The
+following is an example of a model named book.rb
 
 class Book < BigRecord::Base
 column 'attribute:title', :string
@@ -102,11 +75,16 @@ Now that you have your tables and column families all set up, you can begin addi
 column :links, :string, :collection => true
 end
 
-This simple model defines 4 columns of type string. An important thing to notice here is that the first column
+This simple model defines 4 columns of type string. An important thing to notice here is that the first column
+'attribute:title' had the column family prepended to it. This is identical to just passing the symbol :title to
+the column method, and the default behaviour is to prepend the column family (attribute) automatically if one is
+not defined. Furthermore, in Hbase, there's the option of storing collections for a given column. This will return
+an array for the links attribute on a Book record.
 
 === {BigRecord::BrAssociations Associations}
 
-There are also associations available in Bigrecord, as well as the ability to associate to Activerecord models. The
+There are also associations available in Bigrecord, as well as the ability to associate to Activerecord models. The
+following are a few models demonstrating this:
 
 animal.rb
 class Animal < BigRecord::Base
@@ -124,13 +102,18 @@ animal.rb
 belongs_to :trainer, :foreign_key => :trainer_id
 end
 
-In this example, an Animal is related to Zoo and Trainer. Both Animal and Zoo are Bigrecord models, and Trainer is
+In this example, an Animal is related to Zoo and Trainer. Both Animal and Zoo are Bigrecord models, and Trainer is
+an Activerecord model. Notice here that we need to define both the association field for storing the information and
+the association itself. It's also important to remember that Bigrecord models have their IDs stored as string, and
+Activerecord models use integers.
 
-Once the association columns are defined, you define the associations themselves with either belongs_to_bigrecord or
+Once the association columns are defined, you define the associations themselves with either belongs_to_bigrecord or
+belongs_to_many and defining the :foreign_key (this is required for all associations).
 
 === {BigRecord::ConnectionAdapters::View Specifying return columns}
 
-There are two ways to define specific columns to be returned with your models: 1) at the model level and 2) during
+There are two ways to define specific columns to be returned with your models: 1) at the model level and 2) during
+the query.
 
 (1) At the model level, a collection of columns are called named views, and are defined like the following:
 
@@ -147,7 +130,8 @@ There are two ways to define specific columns to be returned with your models: 1
 view :default, :title, :author, :description
 end
 
-Now, whenever you work with a Book record, it will only returned the columns you specify according to the view option
+Now, whenever you work with a Book record, it will only returned the columns you specify according to the view option
+you pass. i.e.
 
 >> Book.find(:first, :view => :front_page)
 => #<Book id: "2e13f182-1085-495e-9841-fe5c84ae9992", attribute:title: "Hello Thar", attribute:author: "Greg">
@@ -158,10 +142,11 @@ Now, whenever you work with a Book record, it will only returned the columns you
 >> Book.find(:first, :view => :default)
 => #<Book id: "2e13f182-1085-495e-9841-fe5c84ae9992", attribute:description: "Masterpiece!", attribute:title: "Hello Thar", attribute:links: ["link1", "link2", "link3", "link4"], attribute:author: "Greg">
 
-Note: A Bigrecord model will return all the columns within the default column family (when :view option is left blank,
-
+Note: A Bigrecord model will return all the columns within the default column family (when :view option is left blank,
+for example). You can override the :default name view to change this behaviour.
 
-(2) If you don't want to define named views ahead of time, you can just pass an array of columns to the :columns
+(2) If you don't want to define named views ahead of time, you can just pass an array of columns to the :columns
+option and it will return only those attributes:
 
 >> Book.find(:first, :columns => [:author, :description])
 => #<Book id: "2e13f182-1085-495e-9841-fe5c84ae9992", attribute:description: "Masterpiece!", attribute:author: "Greg">
@@ -170,4 +155,5 @@ As you may have noticed, this functionality is synonymous with the :select optio
 
 === {BigRecord::Embedded Embedded Records}
 
-=== At this point, usage patterns for a Bigrecord model are similar to that of an Activerecord model, and much of that
+=== At this point, usage patterns for a Bigrecord model are similar to that of an Activerecord model, and much of that
+documentation applies as well. Please refer to those and see if they work!
data/guides/hbase_install.rdoc
ADDED

@@ -0,0 +1,48 @@
+== Setting up HBase and Stargate
+
+To quickly get started with development, you can set up HBase to run as a single server on your local computer,
+along with Stargate, its RESTful web service front-end.
+
+(1) Download and unpack the most recent release of HBase from http://hadoop.apache.org/hbase/releases.html#Download
+
+(2) Edit (hbase-dir)/conf/hbase-env.sh and uncomment/modify the following line to correspond to your Java home path:
+  export JAVA_HOME=/usr/lib/jvm/java-6-sun
+
+(3) Copy (hbase-dir)/contrib/stargate/hbase-<version>-stargate.jar into <hbase-dir>/lib
+
+(4) Copy all the files in the (hbase-dir)/contrib/stargate/lib folder into <hbase-dir>/lib
+
+(5) Start up HBase:
+  $ (hbase-dir)/bin/start-hbase.sh
+
+(6)Start up Stargate (append "-p 1234" at the end if you want to change the port):
+  $ (hbase-dir)/bin/hbase org.apache.hadoop.hbase.stargate.Main
+
+
+== Setting up Bigrecord
+
+(1) Install the Bigrecord Driver gem and its dependencies, then start up a DRb server. Please see the Bigrecord Driver
+documentation for more detailed instructions.
+(http://github.com/openplaces/bigrecord/blob/master/bigrecord-driver/README.rdoc)
+
+(2) Add the following line into the Rails::Initializer.run do |config| block:
+
+  config.gem "bigrecord", :source => "http://gemcutter.org"
+
+and run the following command to install all the gems listed for your Rails app:
+
+  [sudo] rake gems:install
+
+(3) Bootstrap Bigrecord into your project:
+
+  script/generate bigrecord
+
+(4) Edit the config/bigrecord.yml[.sample] file in your Rails root to the information corresponding to the Stargate
+server.
+
+  development:
+    adapter: hbase_rest
+    api_address: http://localhost:8080
+
+Note: 8080 is the default port that Stargate starts up on. Make sure you modify this if you changed the port from
+the default.
data/guides/storage-conf.rdoc
ADDED

@@ -0,0 +1,310 @@
+Example storage-conf.xml file:
+
+<!--
+~ Licensed to the Apache Software Foundation (ASF) under one
+~ or more contributor license agreements. See the NOTICE file
+~ distributed with this work for additional information
+~ regarding copyright ownership. The ASF licenses this file
+~ to you under the Apache License, Version 2.0 (the
+~ "License"); you may not use this file except in compliance
+~ with the License. You may obtain a copy of the License at
+~
+~ http://www.apache.org/licenses/LICENSE-2.0
+~
+~ Unless required by applicable law or agreed to in writing,
+~ software distributed under the License is distributed on an
+~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+~ KIND, either express or implied. See the License for the
+~ specific language governing permissions and limitations
+~ under the License.
+-->
+<Storage>
+<!--======================================================================-->
+<!-- Basic Configuration -->
+<!--======================================================================-->
+
+<!--
+~ The name of this cluster. This is mainly used to prevent machines in
+~ one logical cluster from joining another.
+-->
+<ClusterName>Local Testing</ClusterName>
+
+<!--
+~ Turn on to make new [non-seed] nodes automatically migrate the right data
+~ to themselves. (If no InitialToken is specified, they will pick one
+~ such that they will get half the range of the most-loaded node.)
+~ If a node starts up without bootstrapping, it will mark itself bootstrapped
+~ so that you can't subsequently accidently bootstrap a node with
+~ data on it. (You can reset this by wiping your data and commitlog
+~ directories.)
+~
+~ Off by default so that new clusters and upgraders from 0.4 don't
+~ bootstrap immediately. You should turn this on when you start adding
+~ new nodes to a cluster that already has data on it. (If you are upgrading
+~ from 0.4, start your cluster with it off once before changing it to true.
+~ Otherwise, no data will be lost but you will incur a lot of unnecessary
+~ I/O before your cluster starts up.)
+-->
+<AutoBootstrap>false</AutoBootstrap>
+
+<!--
+~ Keyspaces and ColumnFamilies:
+~ A ColumnFamily is the Cassandra concept closest to a relational
+~ table. Keyspaces are separate groups of ColumnFamilies. Except in
+~ very unusual circumstances you will have one Keyspace per application.
+
+~ There is an implicit keyspace named 'system' for Cassandra internals.
+-->
+<Keyspaces>
+<Keyspace Name="Bigrecord">
+<ColumnFamily Name="animals" CompareWith="UTF8Type" />
+<ColumnFamily Name="books" CompareWith="UTF8Type" />
+<ColumnFamily Name="companies" CompareWith="UTF8Type" />
+<ColumnFamily Name="employees" CompareWith="UTF8Type" />
+<ColumnFamily Name="novels" CompareWith="UTF8Type" />
+<ColumnFamily Name="zoos" CompareWith="UTF8Type" />
+
+<ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
+
+<ReplicationFactor>1</ReplicationFactor>
+
+<EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
+</Keyspace>
+</Keyspaces>
+
+<!--
+~ Authenticator: any IAuthenticator may be used, including your own as long
+~ as it is on the classpath. Out of the box, Cassandra provides
+~ org.apache.cassandra.auth.AllowAllAuthenticator and,
+~ org.apache.cassandra.auth.SimpleAuthenticator
+~ (SimpleAuthenticator uses access.properties and passwd.properties by
+~ default).
+~
+~ If you don't specify an authenticator, AllowAllAuthenticator is used.
+-->
+<Authenticator>org.apache.cassandra.auth.AllowAllAuthenticator</Authenticator>
+
+<!--
+~ Partitioner: any IPartitioner may be used, including your own as long
+~ as it is on the classpath. Out of the box, Cassandra provides
+~ org.apache.cassandra.dht.RandomPartitioner,
+~ org.apache.cassandra.dht.OrderPreservingPartitioner, and
+~ org.apache.cassandra.dht.CollatingOrderPreservingPartitioner.
+~ (CollatingOPP colates according to EN,US rules, not naive byte
+~ ordering. Use this as an example if you need locale-aware collation.)
+~ Range queries require using an order-preserving partitioner.
+~
+~ Achtung! Changing this parameter requires wiping your data
+~ directories, since the partitioner can modify the sstable on-disk
+~ format.
+-->
+<Partitioner>org.apache.cassandra.dht.RandomPartitioner</Partitioner>
+
+<!--
+~ If you are using an order-preserving partitioner and you know your key
+~ distribution, you can specify the token for this node to use. (Keys
+~ are sent to the node with the "closest" token, so distributing your
+~ tokens equally along the key distribution space will spread keys
+~ evenly across your cluster.) This setting is only checked the first
+~ time a node is started.
+
+~ This can also be useful with RandomPartitioner to force equal spacing
+~ of tokens around the hash space, especially for clusters with a small
+~ number of nodes.
+-->
+<InitialToken></InitialToken>
+
+<!--
+~ Directories: Specify where Cassandra should store different data on
+~ disk. Keep the data disks and the CommitLog disks separate for best
+~ performance
+-->
+<CommitLogDirectory>/var/lib/cassandra/commitlog</CommitLogDirectory>
+<DataFileDirectories>
+<DataFileDirectory>/var/lib/cassandra/data</DataFileDirectory>
+</DataFileDirectories>
+
+
+<!--
+~ Addresses of hosts that are deemed contact points. Cassandra nodes
+~ use this list of hosts to find each other and learn the topology of
+~ the ring. You must change this if you are running multiple nodes!
+-->
+<Seeds>
+<Seed>127.0.0.1</Seed>
+</Seeds>
+
+
+<!-- Miscellaneous -->
+
+<!-- Time to wait for a reply from other nodes before failing the command -->
+<RpcTimeoutInMillis>10000</RpcTimeoutInMillis>
+<!-- Size to allow commitlog to grow to before creating a new segment -->
+<CommitLogRotationThresholdInMB>128</CommitLogRotationThresholdInMB>
+
+
+<!-- Local hosts and ports -->
+
+<!--
+~ Address to bind to and tell other nodes to connect to. You _must_
+~ change this if you want multiple nodes to be able to communicate!
+~
+~ Leaving it blank leaves it up to InetAddress.getLocalHost(). This
+~ will always do the Right Thing *if* the node is properly configured
+~ (hostname, name resolution, etc), and the Right Thing is to use the
+~ address associated with the hostname (it might not be).
+-->
+<ListenAddress>localhost</ListenAddress>
+<!-- internal communications port -->
+<StoragePort>7000</StoragePort>
+
+<!--
+~ The address to bind the Thrift RPC service to. Unlike ListenAddress
+~ above, you *can* specify 0.0.0.0 here if you want Thrift to listen on
+~ all interfaces.
+~
+~ Leaving this blank has the same effect it does for ListenAddress,
+~ (i.e. it will be based on the configured hostname of the node).
+-->
+<ThriftAddress>localhost</ThriftAddress>
+<!-- Thrift RPC port (the port clients connect to). -->
+<ThriftPort>9160</ThriftPort>
+<!--
+~ Whether or not to use a framed transport for Thrift. If this option
+~ is set to true then you must also use a framed transport on the
+~ client-side, (framed and non-framed transports are not compatible).
+-->
+<ThriftFramedTransport>false</ThriftFramedTransport>
+
+
+<!--======================================================================-->
+<!-- Memory, Disk, and Performance -->
+<!--======================================================================-->
+
+<!--
+~ Access mode. mmapped i/o is substantially faster, but only practical on
+~ a 64bit machine (which notably does not include EC2 "small" instances)
+~ or relatively small datasets. "auto", the safe choice, will enable
+~ mmapping on a 64bit JVM. Other values are "mmap", "mmap_index_only"
+~ (which may allow you to get part of the benefits of mmap on a 32bit
+~ machine by mmapping only index files) and "standard".
+~ (The buffer size settings that follow only apply to standard,
+~ non-mmapped i/o.)
+-->
+<DiskAccessMode>auto</DiskAccessMode>
+
+<!--
+~ Size of compacted row above which to log a warning. (If compacted
+~ rows do not fit in memory, Cassandra will crash. This is explained
+~ in http://wiki.apache.org/cassandra/CassandraLimitations and is
+~ scheduled to be fixed in 0.7.)
+-->
+<RowWarningThresholdInMB>512</RowWarningThresholdInMB>
+
+<!--
+~ Buffer size to use when performing contiguous column slices. Increase
+~ this to the size of the column slices you typically perform.
+~ (Name-based queries are performed with a buffer size of
+~ ColumnIndexSizeInKB.)
+-->
+<SlicedBufferSizeInKB>64</SlicedBufferSizeInKB>
+
+<!--
+~ Buffer size to use when flushing memtables to disk. (Only one
+~ memtable is ever flushed at a time.) Increase (decrease) the index
+~ buffer size relative to the data buffer if you have few (many)
+~ columns per key. Bigger is only better _if_ your memtables get large
+~ enough to use the space. (Check in your data directory after your
+~ app has been running long enough.) -->
+<FlushDataBufferSizeInMB>32</FlushDataBufferSizeInMB>
+<FlushIndexBufferSizeInMB>8</FlushIndexBufferSizeInMB>
+
+<!--
+~ Add column indexes to a row after its contents reach this size.
+~ Increase if your column values are large, or if you have a very large
+~ number of columns. The competing causes are, Cassandra has to
+~ deserialize this much of the row to read a single column, so you want
+~ it to be small - at least if you do many partial-row reads - but all
+~ the index data is read for each access, so you don't want to generate
+~ that wastefully either.
+-->
+<ColumnIndexSizeInKB>64</ColumnIndexSizeInKB>
+
+<!--
+~ Flush memtable after this much data has been inserted, including
+~ overwritten data. There is one memtable per column family, and
+~ this threshold is based solely on the amount of data stored, not
+~ actual heap memory usage (there is some overhead in indexing the
+~ columns).
+-->
+<MemtableThroughputInMB>64</MemtableThroughputInMB>
+<!--
+~ Throughput setting for Binary Memtables. Typically these are
+~ used for bulk load so you want them to be larger.
+-->
+<BinaryMemtableThroughputInMB>256</BinaryMemtableThroughputInMB>
+<!--
+~ The maximum number of columns in millions to store in memory per
+~ ColumnFamily before flushing to disk. This is also a per-memtable
+~ setting. Use with MemtableThroughputInMB to tune memory usage.
+-->
+<MemtableOperationsInMillions>0.3</MemtableOperationsInMillions>
+<!--
+~ The maximum time to leave a dirty memtable unflushed.
+~ (While any affected columnfamilies have unflushed data from a
+~ commit log segment, that segment cannot be deleted.)
+~ This needs to be large enough that it won't cause a flush storm
+~ of all your memtables flushing at once because none has hit
+~ the size or count thresholds yet. For production, a larger
+~ value such as 1440 is recommended.
+-->
+<MemtableFlushAfterMinutes>60</MemtableFlushAfterMinutes>
+
+<!--
+~ Unlike most systems, in Cassandra writes are faster than reads, so
+~ you can afford more of those in parallel. A good rule of thumb is 2
+~ concurrent reads per processor core. Increase ConcurrentWrites to
+~ the number of clients writing at once if you enable CommitLogSync +
+~ CommitLogSyncDelay. -->
+<ConcurrentReads>8</ConcurrentReads>
+<ConcurrentWrites>32</ConcurrentWrites>
+
+<!--
+~ CommitLogSync may be either "periodic" or "batch." When in batch
+~ mode, Cassandra won't ack writes until the commit log has been
+~ fsynced to disk. It will wait up to CommitLogSyncBatchWindowInMS
+~ milliseconds for other writes, before performing the sync.
+
+~ This is less necessary in Cassandra than in traditional databases
+~ since replication reduces the odds of losing data from a failure
+~ after writing the log entry but before it actually reaches the disk.
+~ So the other option is "periodic," where writes may be acked immediately
+~ and the CommitLog is simply synced every CommitLogSyncPeriodInMS
+~ milliseconds.
+-->
+<CommitLogSync>periodic</CommitLogSync>
+<!--
+~ Interval at which to perform syncs of the CommitLog in periodic mode.
+~ Usually the default of 10000ms is fine; increase it if your i/o
+~ load is such that syncs are taking excessively long times.
+-->
+<CommitLogSyncPeriodInMS>10000</CommitLogSyncPeriodInMS>
+<!--
+~ Delay (in milliseconds) during which additional commit log entries
+~ may be written before fsync in batch mode. This will increase
+~ latency slightly, but can vastly improve throughput where there are
+~ many writers. Set to zero to disable (each entry will be synced
+~ individually). Reasonable values range from a minimal 0.1 to 10 or
+~ even more if throughput matters more than latency.
+-->
+<!-- <CommitLogSyncBatchWindowInMS>1</CommitLogSyncBatchWindowInMS> -->
+
+<!--
+~ Time to wait before garbage-collection deletion markers. Set this to
+~ a large enough value that you are confident that the deletion marker
+~ will be propagated to all replicas by the time this many seconds has
+~ elapsed, even in the face of hardware failures. The default value is
+~ ten days.
+-->
+<GCGraceSeconds>10</GCGraceSeconds>
+</Storage>
data/lib/big_record/connection_adapters/cassandra_adapter.rb
CHANGED

@@ -68,7 +68,7 @@ module BigRecord
       def update_raw(table_name, row, values, timestamp)
         result = nil
         log "UPDATE #{table_name} SET #{values.inspect if values} WHERE ROW=#{row};" do
-          result = @connection.insert(table_name, row,
+          result = @connection.insert(table_name, row, values, {:consistency => Cassandra::Consistency::QUORUM})
         end
         result
       end
@@ -84,8 +84,7 @@ module BigRecord
       def get_raw(table_name, row, column, options={})
         result = nil
         log "SELECT (#{column}) FROM #{table_name} WHERE ROW=#{row};" do
-
-          result = @connection.get(table_name, row, super_column, name)
+          result = @connection.get(table_name, row, column)
         end
         result
       end
@@ -103,33 +102,33 @@ module BigRecord
 
       def get_columns_raw(table_name, row, columns, options={})
         result = {}
-
+
         log "SELECT (#{columns.join(", ")}) FROM #{table_name} WHERE ROW=#{row};" do
-
-
+          prefix_mode = false
+          prefixes = []
 
-
-
+          columns.each do |name|
+            prefix, name = name.split(":")
+            prefixes << prefix+":" unless prefixes.include?(prefix+":")
+            prefix_mode = name.blank?
+          end
 
-
+          if prefix_mode
+            prefixes.sort!
+            values = @connection.get(table_name, row, {:start => prefixes.first, :finish => prefixes.last + "~"})
 
-            result["id"] = row if values && values.
-
-
-            result[
+            result["id"] = row if values && values.size > 0
+
+            values.each do |key,value|
+              result[key] = value unless value.blank?
             end
           else
-            values = @connection.get_columns(table_name, row,
+            values = @connection.get_columns(table_name, row, columns)
+
             result["id"] = row if values && values.compact.size > 0
-
-
-
-            values[id].each do |column_name, value|
-              next if value.nil?
-
-              full_key = super_columns[id] + ":" + column_name
-              result[full_key] = value
-            end
+
+            columns.each_index do |id|
+              result[columns[id].to_s] = values[id] unless values[id].blank?
            end
          end
        end
@@ -144,11 +143,11 @@ module BigRecord
           row_cols.each do |key,value|
             begin
               result[key] =
-
-
-
-
-
+                if key == 'id'
+                  value
+                else
+                  deserialize(value)
+                end
             rescue Exception => e
               puts "Could not load column value #{key} for row=#{row.name}"
             end
@@ -160,9 +159,9 @@ module BigRecord
         result = []
         log "SCAN (#{columns.join(", ")}) FROM #{table_name} WHERE START_ROW=#{start_row} AND STOP_ROW=#{stop_row} LIMIT=#{limit};" do
           options = {}
-          options[:start] = start_row
-          options[:finish] = stop_row
-          options[:count] = limit
+          options[:start] = start_row unless start_row.blank?
+          options[:finish] = stop_row unless stop_row.blank?
+          options[:count] = limit unless limit.blank?
 
           keys = @connection.get_range(table_name, options)
 
@@ -172,14 +171,9 @@ module BigRecord
             row = {}
             row["id"] = key.key
 
-            key.columns.each do |
-
-
-
-              super_column.columns.each do |column|
-                full_key = super_column_name + ":" + column.name
-                row[full_key] = column.value
-              end
+            key.columns.each do |col|
+              column = col.column
+              row[column.name] = column.value
             end
 
             result << row if row.keys.size > 1
@@ -266,31 +260,6 @@ module BigRecord
 
     protected
 
-      def data_to_cassandra_format(data = {})
-        super_columns = {}
-
-        data.each do |name, value|
-          super_column, column = name.split(":")
-          super_columns[super_column.to_s] = {} unless super_columns.has_key?(super_column.to_s)
-          super_columns[super_column.to_s][column.to_s] = value
-        end
-
-        return super_columns
-      end
-
-      def columns_to_cassandra_format(column_names = [])
-        super_columns = {}
-
-        column_names.each do |name|
-          super_column, sub_column = name.split(":")
-
-          super_columns[super_column.to_s] = [] unless super_columns.has_key?(super_column.to_s)
-          super_columns[super_column.to_s] << sub_column
-        end
-
-        return super_columns
-      end
-
       def log(str, name = nil)
         if block_given?
           if @logger and @logger.level <= Logger::INFO
@@ -346,4 +315,4 @@ module BigRecord
       end
     end
   end
-end
+end
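The adapter changes above drop the old super-column layout in favor of plain columns whose names carry the family prefix ("attribute:title"), and fetch a whole family in one call by slicing column names from "attribute:" to "attribute:~". The sketch below, not part of the gem, illustrates that prefix-range trick directly against the cassandra gem client; it relies on the keyspace's UTF8Type comparator ordering column names so that "attribute:~" sorts after the family's other column names in practice:

  require 'rubygems'
  require 'cassandra'

  client = Cassandra.new('Bigrecord', 'localhost:9160')   # as in bigrecord.yml

  client.insert(:books, 'row-1',
                'attribute:title'  => 'Hello Thar',
                'attribute:author' => 'Greg',
                'log:updated_at'   => Time.now.to_s)       # second, hypothetical family prefix

  # Only the attribute:* columns come back; log:* falls outside the range,
  # which is how the updated get_columns_raw isolates a single family.
  attrs = client.get(:books, 'row-1', :start => 'attribute:', :finish => 'attribute:~')
  puts attrs.inspect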
metadata
CHANGED
@@ -5,8 +5,8 @@ version: !ruby/object:Gem::Version
 segments:
 - 0
 - 1
-- 0
-version: 0.1.0
+- 1
+version: 0.1.1
 platform: ruby
 authors:
 - openplaces.org
@@ -14,7 +14,7 @@ autorequire:
 bindir: bin
 cert_chain: []
 
-date: 2010-
+date: 2010-05-05 00:00:00 -04:00
 default_executable:
 dependencies:
 - !ruby/object:Gem::Dependency
@@ -77,8 +77,11 @@ extra_rdoc_files:
 - LICENSE
 - README.rdoc
 - guides/bigrecord_specs.rdoc
+- guides/cassandra_install.rdoc
 - guides/deployment.rdoc
 - guides/getting_started.rdoc
+- guides/hbase_install.rdoc
+- guides/storage-conf.rdoc
 files:
 - Rakefile
 - VERSION
@@ -92,8 +95,11 @@ files:
 - generators/bigrecord_model/templates/model.rb
 - generators/bigrecord_model/templates/model_spec.rb
 - guides/bigrecord_specs.rdoc
+- guides/cassandra_install.rdoc
 - guides/deployment.rdoc
 - guides/getting_started.rdoc
+- guides/hbase_install.rdoc
+- guides/storage-conf.rdoc
 - init.rb
 - install.rb
 - lib/big_record.rb