rbhive-vidma 1.0.2.pre1.pre.thrift0.9.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+ metadata.gz: 57a36703b21d26a211961d87447b1a79daf93150
+ data.tar.gz: ad474d97732964366d0039d303f6c0d9b1bec260
+ SHA512:
+ metadata.gz: 5fe36f16504e4e22eaa7a2d0ee0770ede74c58430ec8eba3af8fbdb853a7b0bad49ca9d2bb5791c68a5d5f0f618ec5084387b59763595b74598f93ae6c8e17c9
+ data.tar.gz: 4b144de4b232c4592a143b5d9fa849ad9710e09272a4ec5a24f01a4003e7c9d5833b52d4174a18e4cd7698392d2e3aefaff2f3673e8a34b22b192008807b1f79
data/.gitignore ADDED
@@ -0,0 +1,18 @@
+ .DS_Store
+ *.gem
+ *.rbc
+ .bundle
+ .config
+ .yardoc
+ Gemfile.lock
+ InstalledFiles
+ _yardoc
+ coverage
+ doc/
+ lib/bundler/man
+ pkg
+ rdoc
+ spec/reports
+ test/tmp
+ test/version_tmp
+ tmp
data/CHANGELOG.md ADDED
@@ -0,0 +1,16 @@
+ # RBHive changelog
+
+ Versioning prior to 0.5.3 was not tracked, so this changelog only lists changes introduced after 0.5.3.
+
+ ## 0.6.0
+
+ 0.6.0 introduces one backwards-incompatible change:
+
+ * Behaviour change: RBHive will no longer coerce the strings "NULL" or "null" to the Ruby `nil`; the rationale
+   for this change is that it introduces hard to trace bugs and does not seem to make sense from a logical
+   perspective (Hive's "NULL" is a very different thing to Ruby's `nil`).
+
+ 0.6.0 introduces support for Hive 0.13, and for the Hive 0.11 version shipped with CDH5 Beta 1 and Beta 2:
+
+ * Thrift protocol bindings updated to include all the protocols shipped with the Hive 0.13 release.
+ * Allow the user to choose a protocol explicitly; provided helper symbols / lookups for common protocols (e.g. CDH4, CDH5)
data/Gemfile ADDED
@@ -0,0 +1,3 @@
+ source "https://rubygems.org"
+
+ gemspec
data/LICENSE ADDED
@@ -0,0 +1,20 @@
+ The MIT License (MIT)
+
+ Copyright (c) [2013] [Forward3D]
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy of
+ this software and associated documentation files (the "Software"), to deal in
+ the Software without restriction, including without limitation the rights to
+ use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
+ the Software, and to permit persons to whom the Software is furnished to do so,
+ subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
+ FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
+ COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
+ IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,344 @@
+ # RBHive - A Ruby Thrift client for Apache Hive
+
+ [![Code Climate](https://codeclimate.com/github/forward3d/rbhive/badges/gpa.svg)](https://codeclimate.com/github/forward3d/rbhive)
+
+ RBHive is a simple Ruby gem to communicate with the [Apache Hive](http://hive.apache.org)
+ Thrift servers.
+
+ It supports:
+ * Hiveserver (the original Thrift service shipped with Hive since early releases)
+ * Hiveserver2 (the new, concurrent Thrift service shipped with Hive releases since 0.10)
+ * Any other 100% Hive-compatible Thrift service (e.g. [Sharkserver](https://github.com/amplab/shark))
+
+ It is capable of using the following Thrift transports:
+ * BufferedTransport (the default)
+ * SaslClientTransport ([SASL-enabled](http://en.wikipedia.org/wiki/Simple_Authentication_and_Security_Layer) transport)
+ * HTTPClientTransport (tunnels Thrift over HTTP)
+
+ As of version 1.0, it supports asynchronous execution of queries. This allows you to submit
+ a query, disconnect, then reconnect later to check the status and retrieve the results.
+ This frees systems of the need to keep a persistent TCP connection.
+
+ ## About Thrift services and transports
+
+ ### Hiveserver
+
+ Hiveserver (the original Thrift interface) only supports a single client at a time. RBHive
+ implements this with the `RBHive::Connection` class. It only supports a single transport,
+ BufferedTransport.
+
+ ### Hiveserver2
+
+ [Hiveserver2](https://cwiki.apache.org/confluence/display/Hive/Setting+up+HiveServer2)
+ (the new Thrift interface) can support many concurrent client connections. It is shipped
+ with Hive 0.10 and later. In Hive 0.10, only BufferedTransport and SaslClientTransport are
+ supported; starting with Hive 0.12, HTTPClientTransport is also supported.
+
+ Each of the versions after Hive 0.10 has a slightly different Thrift interface; when
+ connecting, you must specify the Hive version or you may get an exception.
+
+ Hiveserver2 supports (in versions later than 0.12) asynchronous query execution. This
+ works by submitting a query and retrieving a handle to the execution process; you can
+ then reconnect at a later time and retrieve the results using this handle.
+ Using the asynchronous methods has some caveats - please read the Asynchronous Execution
+ section of the documentation thoroughly before using them.
+
+ RBHive implements this client with the `RBHive::TCLIConnection` class.
+
+ #### Warning!
+
+ We had to set the following in hive-site.xml to get the BufferedTransport Thrift service
+ to work with RBHive:
+
+     <property>
+       <name>hive.server2.enable.doAs</name>
+       <value>false</value>
+     </property>
+
+ Otherwise you'll get this nasty-looking exception in the logs:
+
+     ERROR server.TThreadPoolServer: Error occurred during processing of message.
+     java.lang.ClassCastException: org.apache.thrift.transport.TSocket cannot be cast to org.apache.thrift.transport.TSaslServerTransport
+       at org.apache.hive.service.auth.TUGIContainingProcessor.process(TUGIContainingProcessor.java:35)
+       at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
+       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
+       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
+       at java.lang.Thread.run(Thread.java:662)
+
+ ### Other Hive-compatible services
+
+ Consult the documentation for the service, as this will vary depending on the service you're using.
+
+ ## Connecting to Hiveserver and Hiveserver2
+
+ ### Hiveserver
+
+ Since Hiveserver has no options, connection code is very simple:
+
+     RBHive.connect('hive.server.address', 10_000) do |connection|
+       connection.fetch 'SELECT city, country FROM cities'
+     end
+     ➔ [{:city => "London", :country => "UK"}, {:city => "Mumbai", :country => "India"}, {:city => "New York", :country => "USA"}]
+
+ ### Hiveserver2
+
+ Hiveserver2 has several options with how it is run. The connection code takes
+ a hash with these possible parameters:
+ * `:transport` - one of `:buffered` (BufferedTransport), `:http` (HTTPClientTransport), or `:sasl` (SaslClientTransport)
+ * `:hive_version` - the number after the period in the Hive version; e.g. `10`, `11`, `12`, `13` or one of
+   a set of symbols; see [Hiveserver2 protocol versions](#hiveserver2-protocol-versions) below for details
+ * `:timeout` - if using BufferedTransport or SaslClientTransport, the socket timeout in seconds
+ * `:sasl_params` - if using SaslClientTransport, this is a hash of parameters to set up the SASL connection
+
+ If you pass either an empty hash or nil in place of the options (or do not supply them), the connection
+ is attempted with the Hive version set to 0.10, using `:buffered` as the transport, and a timeout of 1800 seconds.
+
+ Connecting with the defaults:
+
+     RBHive.tcli_connect('hive.server.address', 10_000) do |connection|
+       connection.fetch('SHOW TABLES')
+     end
+
+ Connecting with a Logger:
+
+     RBHive.tcli_connect('hive.server.address', 10_000, { logger: Logger.new(STDOUT) }) do |connection|
+       connection.fetch('SHOW TABLES')
+     end
+
+ Connecting with a specific Hive version (0.12 in this case):
+
+     RBHive.tcli_connect('hive.server.address', 10_000, { hive_version: 12 }) do |connection|
+       connection.fetch('SHOW TABLES')
+     end
+
+ Connecting with a specific Hive version (0.12) and using the `:http` transport:
+
+     RBHive.tcli_connect('hive.server.address', 10_000, { hive_version: 12, transport: :http }) do |connection|
+       connection.fetch('SHOW TABLES')
+     end
+
+ We have not tested the SASL connection, as we don't run SASL; pull requests and testing are welcomed.
+
+ #### Hiveserver2 protocol versions
+
+ Since the introduction of Hiveserver2 in Hive 0.10, there have been a number of revisions to the Thrift protocol it uses.
+
+ The following table lists the available values you can supply to the `:hive_version` parameter when making a connection
+ to Hiveserver2.
+
+ | value   | Thrift protocol version | notes
+ | ------- | ----------------------- | -----
+ | `10`    | V1 | First version of the Thrift protocol used only by Hive 0.10
+ | `11`    | V2 | Used by the Hive 0.11 release (*but not CDH5 which ships with Hive 0.11!*) - adds asynchronous execution
+ | `12`    | V3 | Used by the Hive 0.12 release, adds varchar type and primitive type qualifiers
+ | `13`    | V7 | Used by the Hive 0.13 release, adds features from V4, V5 and V6, plus token-based delegation connections
+ | `:cdh4` | V1 | CDH4 uses the V1 protocol as it ships with the upstream Hive 0.10
+ | `:cdh5` | V5 | CDH5 ships with upstream Hive 0.11, but adds patches to bring the Thrift protocol up to V5
+
+ In addition, you can explicitly set the Thrift protocol version according to this table:
+
+ | value          | Thrift protocol version | notes
+ | -------------- | ----------------------- | -----
+ | `:PROTOCOL_V1` | V1 | Used by Hive 0.10 release
+ | `:PROTOCOL_V2` | V2 | Used by Hive 0.11 release
+ | `:PROTOCOL_V3` | V3 | Used by Hive 0.12 release
+ | `:PROTOCOL_V4` | V4 | Updated during Hive 0.13 development, adds decimal precision/scale, char type
+ | `:PROTOCOL_V5` | V5 | Updated during Hive 0.13 development, adds error details when GetOperationStatus returns in error state
+ | `:PROTOCOL_V6` | V6 | Updated during Hive 0.13 development, adds binary type for binary payload, uses columnar result set
+ | `:PROTOCOL_V7` | V7 | Used by Hive 0.13 release, support for token-based delegation connections
+
+ ## Asynchronous execution with Hiveserver2
+
+ In versions of Hive later than 0.12, the Thrift server supports asynchronous execution.
+
+ The high-level view of using this feature is as follows:
+
+ 1. Submit your query using `async_execute(query)`. This function returns a hash
+    with the following keys: `:guid`, `:secret`, and `:session`. You don't need to
+    care about the internals of this hash - all methods that interact with an async
+    query require this hash, and you can just store it and hand it to the methods.
+ 2. To check the state of the query, call `async_state(handles)`, where `handles`
+    is the handles hash given to you when you called `async_execute(query)`.
+ 3. To retrieve results, call either `async_fetch(handles)` or `async_fetch_in_batch(handles)`,
+    which work like the non-async methods.
+ 4. When you're done with the query, call `async_close_session(handles)`.
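The four steps above can be sketched as a submit/poll/fetch/close loop. `StubConnection` below is a hypothetical stand-in (the real connection comes from `RBHive.tcli_connect` and needs a live Hiveserver2); only the method names and the shape of the handles hash follow this README.

```ruby
# Sketch of the async workflow described above. StubConnection is a stand-in
# for RBHive::TCLIConnection; it fakes a query that completes on the third poll.
class StubConnection
  def async_execute(query)
    { guid: "g".b, secret: "s".b, session: "n".b } # opaque binary handles
  end

  def async_is_complete?(handles)
    @polls = (@polls || 0) + 1
    @polls >= 3 # pretend the query finishes on the third poll
  end

  def async_fetch(handles)
    [{ city: "London", country: "UK" }]
  end

  def async_close_session(handles)
    :closed
  end
end

connection = StubConnection.new

# 1. Submit the query and keep the handles hash.
handles = connection.async_execute("SELECT city, country FROM cities")

# 2. Poll until the query completes (a real caller might sleep between polls,
#    or store the handles and reconnect later).
sleep(0) until connection.async_is_complete?(handles)

# 3. Retrieve the results.
rows = connection.async_fetch(handles)

# 4. Always close the session to free server-side resources (see "Memory leaks" below).
connection.async_close_session(handles)
```

In real use the handles hash could be persisted between steps 1 and 2, since the whole point of async execution is that no persistent connection is required.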
+
+ ### Memory leaks
+
+ When you call `async_close_session(handles)`, *all async handles created during this
+ session are closed*.
+
+ If you do not close the sessions you create, *you will leak memory in the Hiveserver2 process*.
+ Be very careful to close your sessions!
+
+ ### Method documentation
+
+ #### `async_execute(query)`
+
+ This method submits a query for async execution. The hash you get back is used in the other
+ async methods, and will look like this:
+
+     {
+       :guid => (binary string),
+       :secret => (binary string),
+       :session => (binary string)
+     }
+
+ The Thrift protocol specifies the strings as "binary" - which means they have no encoding.
+ Be *extremely* careful when manipulating or storing these values, as they can quite easily
+ get converted to UTF-8 strings, which will make them invalid when trying to retrieve async data.
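One safe way to store or transmit these handles is to Base64-encode each binary value and force the encoding back to binary on the way in. The helpers below are purely illustrative (they are not part of RBHive's API):

```ruby
require 'base64'

# Hypothetical helpers (not part of RBHive) showing one safe way to round-trip
# the binary handle values, e.g. for storing them in a database between polls.
def encode_handles(handles)
  handles.transform_values { |v| Base64.strict_encode64(v) }
end

def decode_handles(encoded)
  # force_encoding makes explicit that the bytes come back as BINARY
  # (ASCII-8BIT), not UTF-8 -- UTF-8 handles are invalid for retrieval.
  encoded.transform_values { |v| Base64.strict_decode64(v).force_encoding(Encoding::BINARY) }
end

handles  = { guid: "\x01\xFF".b, secret: "\x02\xFE".b, session: "\x03\xFD".b }
restored = decode_handles(encode_handles(handles))
restored == handles      # => true
restored[:guid].encoding # => #<Encoding:ASCII-8BIT>
```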
+
+ #### `async_state(handles)`
+
+ `handles` is the hash returned by `async_execute(query)`. The state will be a symbol with
+ one of the following values and meanings:
+
+ | symbol       | meaning
+ | ------------ | -------
+ | :initialized | The query is initialized in Hive and ready to run
+ | :running     | The query is running (either as a MapReduce job or within process)
+ | :finished    | The query is completed and results can be retrieved
+ | :cancelled   | The query was cancelled by a user
+ | :closed      | Unknown at present
+ | :error       | The query is invalid semantically or broken in another way
+ | :unknown     | The query is in an unknown state
+ | :pending     | The query is ready to run but is not running
+
+ There are also the utility methods `async_is_complete?(handles)`, `async_is_running?(handles)`,
+ `async_is_failed?(handles)` and `async_is_cancelled?(handles)`.
+
+ #### `async_cancel(handles)`
+
+ Calling this method will cancel the query while it is executing.
+
+ #### `async_fetch(handles)`, `async_fetch_in_batch(handles)`
+
+ These methods let you fetch the results of the async query, if they are complete. If you call
+ these methods on an incomplete query, they will raise an exception. They work in exactly the
+ same way as the normal synchronous methods.
+
+ ## Examples
+
+ ### Fetching results
+
+ #### Hiveserver
+
+     RBHive.connect('hive.server.address', 10_000) do |connection|
+       connection.fetch 'SELECT city, country FROM cities'
+     end
+     ➔ [{:city => "London", :country => "UK"}, {:city => "Mumbai", :country => "India"}, {:city => "New York", :country => "USA"}]
+
+ #### Hiveserver2
+
+     RBHive.tcli_connect('hive.server.address', 10_000) do |connection|
+       connection.fetch 'SELECT city, country FROM cities'
+     end
+     ➔ [{:city => "London", :country => "UK"}, {:city => "Mumbai", :country => "India"}, {:city => "New York", :country => "USA"}]
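Large result sets can also be streamed in chunks with `fetch_in_batch` (defined in `connection.rb` further down this diff), which yields one `ResultSet` per batch instead of loading everything at once. A sketch, using a hypothetical stand-in connection since the real one needs a live server:

```ruby
# BatchStubConnection is a stand-in for a real RBHive connection; the batching
# behaviour mirrors fetch_in_batch(query, batch_size) from connection.rb, which
# yields each batch to the caller's block until the server returns no more rows.
class BatchStubConnection
  ROWS = (1..5).map { |i| { id: i } }

  def fetch_in_batch(query, batch_size = 1_000)
    ROWS.each_slice(batch_size) { |batch| yield batch }
  end
end

batch_sizes = []
BatchStubConnection.new.fetch_in_batch('SELECT id FROM big_table', 2) do |batch|
  batch_sizes << batch.size # with the real gem, each batch is a ResultSet
end
batch_sizes # => [2, 2, 1]
```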
+
+ ### Executing a query
+
+ #### Hiveserver
+
+     RBHive.connect('hive.server.address') do |connection|
+       connection.execute 'DROP TABLE cities'
+     end
+     ➔ nil
+
+ #### Hiveserver2
+
+     RBHive.tcli_connect('hive.server.address') do |connection|
+       connection.execute 'DROP TABLE cities'
+     end
+     ➔ nil
+
+ ### Creating tables
+
+     table = TableSchema.new('person', 'List of people that owe me money') do
+       column 'name', :string, 'Full name of debtor'
+       column 'address', :string, 'Address of debtor'
+       column 'amount', :float, 'The amount of money borrowed'
+
+       partition 'dated', :string, 'The date money was given'
+       partition 'country', :string, 'The country the person resides in'
+     end
+
+ Then for Hiveserver:
+
+     RBHive.connect('hive.server.address', 10_000) do |connection|
+       connection.create_table(table)
+     end
+
+ Or Hiveserver2:
+
+     RBHive.tcli_connect('hive.server.address', 10_000) do |connection|
+       connection.create_table(table)
+     end
+
+ ### Modifying table schema
+
+     table = TableSchema.new('person', 'List of people that owe me money') do
+       column 'name', :string, 'Full name of debtor'
+       column 'address', :string, 'Address of debtor'
+       column 'amount', :float, 'The amount of money borrowed'
+       column 'new_amount', :float, 'The new amount this person somehow convinced me to give them'
+
+       partition 'dated', :string, 'The date money was given'
+       partition 'country', :string, 'The country the person resides in'
+     end
+
+ Then for Hiveserver:
+
+     RBHive.connect('hive.server.address') do |connection|
+       connection.replace_columns(table)
+     end
+
+ Or Hiveserver2:
+
+     RBHive.tcli_connect('hive.server.address') do |connection|
+       connection.replace_columns(table)
+     end
+
+ ### Setting properties
+
+ You can set various properties for Hive tasks, some of which change how they run. Consult the Apache
+ Hive documentation and Hadoop's documentation for the various properties that can be set.
+ For example, you can set the map-reduce job's priority with the following:
+
+     connection.set("mapred.job.priority", "VERY_HIGH")
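`RBHive::Connection` also defines two convenience setters, `priority=` and `queue=` (visible in `connection.rb` further down this diff), which simply delegate to `set`. A minimal stand-in illustrating the delegation; `FakeConnection` is hypothetical:

```ruby
# FakeConnection demonstrates the priority=/queue= setters that
# RBHive::Connection defines -- they just delegate to set, as in connection.rb.
class FakeConnection
  attr_reader :settings

  def initialize
    @settings = {}
  end

  def set(name, value)
    @settings[name] = value # the real connection issues "SET #{name}=#{value}"
  end

  def priority=(priority)
    set("mapred.job.priority", priority)
  end

  def queue=(queue)
    set("mapred.job.queue.name", queue)
  end
end

c = FakeConnection.new
c.priority = "VERY_HIGH"
c.queue    = "etl"
c.settings # => {"mapred.job.priority"=>"VERY_HIGH", "mapred.job.queue.name"=>"etl"}
```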
+
+ ### Inspecting tables
+
+ #### Hiveserver
+
+     RBHive.connect('hive.hadoop.forward.co.uk', 10_000) { |connection|
+       result = connection.fetch("describe some_table")
+       puts result.column_names.inspect
+       puts result.first.inspect
+     }
+
+ #### Hiveserver2
+
+     RBHive.tcli_connect('hive.hadoop.forward.co.uk', 10_000) { |connection|
+       result = connection.fetch("describe some_table")
+       puts result.column_names.inspect
+       puts result.first.inspect
+     }
+
+ ## Testing
+
+ We use RBHive against Hive 0.10, 0.11 and 0.12, and have tested the BufferedTransport and
+ HTTPClientTransport. We use it against both Hiveserver and Hiveserver2 with success.
+
+ We have _not_ tested the SaslClientTransport, and would welcome reports
+ on whether it works correctly.
+
+ ## Contributing
+
+ We welcome contributions, issues and pull requests. If there's a feature missing in RBHive that you need, or you
+ think you've found a bug, please do not hesitate to create an issue.
+
+ 1. Fork it
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
+ 3. Commit your changes (`git commit -am 'Add some feature'`)
+ 4. Push to the branch (`git push origin my-new-feature`)
+ 5. Create new Pull Request
data/Rakefile ADDED
@@ -0,0 +1 @@
+ require "bundler/gem_tasks"
data/lib/rbhive.rb ADDED
@@ -0,0 +1,8 @@
+ require File.join(File.dirname(__FILE__), 'rbhive', 'connection')
+ require File.join(File.dirname(__FILE__), 'rbhive', 'table_schema')
+ require File.join(File.dirname(__FILE__), 'rbhive', 'result_set')
+ require File.join(File.dirname(__FILE__), 'rbhive', 'explain_result')
+ require File.join(File.dirname(__FILE__), 'rbhive', 'schema_definition')
+ require File.join(File.dirname(__FILE__), *%w[rbhive t_c_l_i_result_set])
+ require File.join(File.dirname(__FILE__), *%w[rbhive t_c_l_i_schema_definition])
+ require File.join(File.dirname(__FILE__), *%w[rbhive t_c_l_i_connection])
@@ -0,0 +1,150 @@
+ # suppress warnings
+ old_verbose, $VERBOSE = $VERBOSE, nil
+ # require thrift autogenerated files
+ require File.join(File.split(File.dirname(__FILE__)).first, *%w[thrift thrift_hive])
+ # require 'thrift'
+ # restore warnings
+ $VERBOSE = old_verbose
+
+ module RBHive
+   def connect(server, port=10_000)
+     connection = RBHive::Connection.new(server, port)
+     ret = nil
+     begin
+       connection.open
+       ret = yield(connection)
+     ensure
+       connection.close
+       ret
+     end
+   end
+   module_function :connect
+
+   class StdOutLogger
+     %w(fatal error warn info debug).each do |level|
+       define_method level.to_sym do |message|
+         STDOUT.puts(message)
+       end
+     end
+   end
+
+   class Connection
+     attr_reader :client
+
+     def initialize(server, port=10_000, logger=StdOutLogger.new)
+       @socket = Thrift::Socket.new(server, port)
+       @transport = Thrift::BufferedTransport.new(@socket)
+       @protocol = Thrift::BinaryProtocol.new(@transport)
+       @client = Hive::Thrift::ThriftHive::Client.new(@protocol)
+       @logger = logger
+       @logger.info("Connecting to #{server} on port #{port}")
+       @mutex = Mutex.new
+     end
+
+     def open
+       @transport.open
+     end
+
+     def close
+       @transport.close
+     end
+
+     def client
+       @client
+     end
+
+     def execute(query)
+       execute_safe(query)
+     end
+
+     def explain(query)
+       safe do
+         execute_unsafe("EXPLAIN " + query)
+         ExplainResult.new(client.fetchAll)
+       end
+     end
+
+     def priority=(priority)
+       set("mapred.job.priority", priority)
+     end
+
+     def queue=(queue)
+       set("mapred.job.queue.name", queue)
+     end
+
+     def set(name, value)
+       @logger.info("Setting #{name}=#{value}")
+       client.execute("SET #{name}=#{value}")
+     end
+
+     def fetch(query)
+       safe do
+         execute_unsafe(query)
+         rows = client.fetchAll
+         the_schema = SchemaDefinition.new(client.getSchema, rows.first)
+         ResultSet.new(rows, the_schema)
+       end
+     end
+
+     def fetch_in_batch(query, batch_size=1_000)
+       safe do
+         execute_unsafe(query)
+         until (next_batch = client.fetchN(batch_size)).empty?
+           the_schema ||= SchemaDefinition.new(client.getSchema, next_batch.first)
+           yield ResultSet.new(next_batch, the_schema)
+         end
+       end
+     end
+
+     def first(query)
+       safe do
+         execute_unsafe(query)
+         row = client.fetchOne
+         the_schema = SchemaDefinition.new(client.getSchema, row)
+         ResultSet.new([row], the_schema).first
+       end
+     end
+
+     def schema(example_row=[])
+       safe { SchemaDefinition.new(client.getSchema, example_row) }
+     end
+
+     def create_table(schema)
+       execute(schema.create_table_statement)
+     end
+
+     def drop_table(name)
+       name = name.name if name.is_a?(TableSchema)
+       execute("DROP TABLE `#{name}`")
+     end
+
+     def replace_columns(schema)
+       execute(schema.replace_columns_statement)
+     end
+
+     def add_columns(schema)
+       execute(schema.add_columns_statement)
+     end
+
+     def method_missing(meth, *args)
+       client.send(meth, *args)
+     end
+
+     private
+
+     def execute_safe(query)
+       safe { execute_unsafe(query) }
+     end
+
+     def execute_unsafe(query)
+       @logger.info("Executing Hive Query: #{query}")
+       client.execute(query)
+     end
+
+     def safe
+       ret = nil
+       @mutex.synchronize { ret = yield }
+       ret
+     end
+   end
+ end