rbhive 0.2.95 → 0.5.0

data/.gitignore ADDED
@@ -0,0 +1,2 @@
1
+ *.gem
2
+ .DS_Store
data/LICENSE ADDED
@@ -0,0 +1,20 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) [2013] [Forward3D]
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy of
6
+ this software and associated documentation files (the "Software"), to deal in
7
+ the Software without restriction, including without limitation the rights to
8
+ use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
9
+ the Software, and to permit persons to whom the Software is furnished to do so,
10
+ subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
17
+ FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
18
+ COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
19
+ IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
20
+ CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,204 @@
1
+ # RBHive -- Ruby thrift lib for executing Hive queries
2
+
3
+ RBHive is a simple Ruby gem to communicate with the [Apache Hive](http://hive.apache.org)
4
+ Thrift server.
5
+
6
+ It supports:
7
+ * Hiveserver (the original Thrift service shipped with Hive since early releases)
8
+ * Hiveserver2 (the new, concurrent Thrift service shipped with Hive releases since 0.10)
9
+ * Any other 100% Hive-compatible Thrift service (e.g. [Sharkserver](https://github.com/amplab/shark))
10
+
11
+ It is capable of using the following Thrift transports:
12
+ * BufferedTransport (the default)
13
+ * SaslClientTransport ([SASL-enabled](http://en.wikipedia.org/wiki/Simple_Authentication_and_Security_Layer) transport)
14
+ * HTTPClientTransport (tunnels Thrift over HTTP)
15
+
16
+ ## About Thrift services and transports
17
+
18
+ ### Hiveserver
19
+
20
+ Hiveserver (the original Thrift interface) only supports a single client at a time. RBHive
21
+ implements this with the `RBHive::Connection` class. It only supports a single transport,
22
+ BufferedTransport.
23
+
24
+ ### Hiveserver2
25
+
26
+ [Hiveserver2](https://cwiki.apache.org/confluence/display/Hive/Setting+up+HiveServer2)
27
+ (the new Thrift interface) can support many concurrent client connections. It is shipped
28
+ with Hive 0.10 and later. In Hive 0.10, only BufferedTransport and SaslClientTransport are
29
+ supported; starting with Hive 0.12, HTTPClientTransport is also supported.
30
+
31
+ Each of the versions after Hive 0.10 has a slightly different Thrift interface; when
32
+ connecting, you must specify the Hive version or you may get an exception.
33
+
34
+ RBHive implements this client with the `RBHive::TCLIConnection` class.
35
+
36
+ ### Other Hive-compatible services
37
+
38
+ Consult the documentation for the service, as this will vary depending on the service you're using.
39
+
40
+ ## Connecting to Hiveserver and Hiveserver2
41
+
42
+ ### Hiveserver
43
+
44
+ Since Hiveserver has no options, connection code is very simple:
45
+
46
+ RBHive.connect('hive.server.address', 10_000) do |connection|
47
+ connection.fetch 'SELECT city, country FROM cities'
48
+ end
49
+ ➔ [{:city => "London", :country => "UK"}, {:city => "Mumbai", :country => "India"}, {:city => "New York", :country => "USA"}]
50
+
51
+ ### Hiveserver2
52
+
53
+ Hiveserver2 can be run in several configurations. The connection code takes
54
+ a hash with these possible parameters:
55
+ * `:transport` - one of `:buffered` (BufferedTransport), `:http` (HTTPClientTransport), or `:sasl` (SaslClientTransport)
56
+ * `:hive_version` - the number after the period in the Hive version; e.g. `10`, `11`, `12`
57
+ * `:timeout` - if using BufferedTransport or SaslClientTransport, this is how long the timeout on the socket will be
58
+ * `:sasl_params` - if using SaslClientTransport, this is a hash of parameters to set up the SASL connection
59
+
60
+ If you pass either an empty hash or nil in place of the options (or do not supply them), the connection
61
+ is attempted with the Hive version set to 0.10, using `:buffered` as the transport, and a timeout of 1800 seconds.
62
+
63
+ Connecting with the defaults:
64
+
65
+ RBHive.tcli_connect('hive.server.address', 10_000) do |connection|
66
+ connection.fetch('SHOW TABLES')
67
+ end
68
+
69
+ Connecting with a specific Hive version (0.12 in this case):
70
+
71
+ RBHive.tcli_connect('hive.server.address', 10_000, {:hive_version => 12}) do |connection|
72
+ connection.fetch('SHOW TABLES')
73
+ end
74
+
75
+ Connecting with a specific Hive version (0.12) and using the `:http` transport:
76
+
77
+ RBHive.tcli_connect('hive.server.address', 10_000, {:hive_version => 12, :transport => :http}) do |connection|
78
+ connection.fetch('SHOW TABLES')
79
+ end
80
+
81
+ We have not tested the SASL connection, as we don't run SASL; pull requests and testing are welcome.
82
+
83
+ ## Examples
84
+
85
+ ### Fetching results
86
+
87
+ #### Hiveserver
88
+
89
+ RBHive.connect('hive.server.address', 10_000) do |connection|
90
+ connection.fetch 'SELECT city, country FROM cities'
91
+ end
92
+ ➔ [{:city => "London", :country => "UK"}, {:city => "Mumbai", :country => "India"}, {:city => "New York", :country => "USA"}]
93
+
94
+ #### Hiveserver2
95
+
96
+ RBHive.tcli_connect('hive.server.address', 10_000) do |connection|
97
+ connection.fetch 'SELECT city, country FROM cities'
98
+ end
99
+ ➔ [{:city => "London", :country => "UK"}, {:city => "Mumbai", :country => "India"}, {:city => "New York", :country => "USA"}]
100
+
101
+ ### Executing a query
102
+
103
+ #### Hiveserver
104
+
105
+ RBHive.connect('hive.server.address') do |connection|
106
+ connection.execute 'DROP TABLE cities'
107
+ end
108
+ ➔ nil
109
+
110
+ #### Hiveserver2
111
+
112
+ RBHive.tcli_connect('hive.server.address') do |connection|
113
+ connection.execute 'DROP TABLE cities'
114
+ end
115
+ ➔ nil
116
+
117
+ ### Creating tables
118
+
119
+ table = TableSchema.new('person', 'List of people that owe me money') do
120
+ column 'name', :string, 'Full name of debtor'
121
+ column 'address', :string, 'Address of debtor'
122
+ column 'amount', :float, 'The amount of money borrowed'
123
+
124
+ partition 'dated', :string, 'The date money was given'
125
+ partition 'country', :string, 'The country the person resides in'
126
+ end
127
+
128
+ Then for Hiveserver:
129
+
130
+ RBHive.connect('hive.server.address', 10_000) do |connection|
131
+ connection.create_table(table)
132
+ end
133
+
134
+ Or Hiveserver2:
135
+
136
+ RBHive.tcli_connect('hive.server.address', 10_000) do |connection|
137
+ connection.create_table(table)
138
+ end
139
+
140
+ ### Modifying table schema
141
+
142
+ table = TableSchema.new('person', 'List of people that owe me money') do
143
+ column 'name', :string, 'Full name of debtor'
144
+ column 'address', :string, 'Address of debtor'
145
+ column 'amount', :float, 'The amount of money borrowed'
146
+ column 'new_amount', :float, 'The new amount this person somehow convinced me to give them'
147
+
148
+ partition 'dated', :string, 'The date money was given'
149
+ partition 'country', :string, 'The country the person resides in'
150
+ end
151
+
152
+ Then for Hiveserver:
153
+
154
+ RBHive.connect('hive.server.address') do |connection|
155
+ connection.replace_columns(table)
156
+ end
157
+
158
+ Or Hiveserver2:
159
+
160
+ RBHive.tcli_connect('hive.server.address') do |connection|
161
+ connection.replace_columns(table)
162
+ end
163
+
164
+ ### Setting properties
165
+
166
+ You can set various properties for Hive tasks, some of which change how they run. Consult the Apache
167
+ Hive documentation and Hadoop's documentation for the various properties that can be set.
168
+ For example, you can set the map-reduce job's priority with the following:
169
+
170
+ connection.set("mapred.job.priority", "VERY_HIGH")
171
+
172
+ ### Inspecting tables
173
+
174
+ #### Hiveserver
175
+
176
+ RBHive.connect('hive.hadoop.forward.co.uk', 10_000) {|connection|
177
+ result = connection.fetch("describe some_table")
178
+ puts result.column_names.inspect
179
+ puts result.first.inspect
180
+ }
181
+
182
+ #### Hiveserver2
183
+
184
+ RBHive.tcli_connect('hive.hadoop.forward.co.uk', 10_000) {|connection|
185
+ result = connection.fetch("describe some_table")
186
+ puts result.column_names.inspect
187
+ puts result.first.inspect
188
+ }
189
+
190
+ ## Testing
191
+
192
+ We use RBHive against Hive 0.10, 0.11 and 0.12, and have tested the BufferedTransport and
193
+ HTTPClientTransport. We use it against both Hiveserver and Hiveserver2 with success.
194
+
195
+ We have _not_ tested the SaslClientTransport, and would welcome reports
196
+ on whether it works correctly.
197
+
198
+ ## Contributing
199
+
200
+ 1. Fork it
201
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
202
+ 3. Commit your changes (`git commit -am 'Add some feature'`)
203
+ 4. Push to the branch (`git push origin my-new-feature`)
204
+ 5. Create new Pull Request
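The Hiveserver2 defaults described in the README above (Hive 0.10, the `:buffered` transport, an 1800-second timeout) can be sketched in plain Ruby. `normalize_options` is a hypothetical helper written for illustration, not part of RBHive; it only mirrors the documented behaviour and opens no connection.

```ruby
# Hypothetical helper (not RBHive API) mirroring the documented defaults.
def normalize_options(options = nil)
  options ||= {}
  raise ArgumentError, "'options' must be a Hash" unless options.is_a?(Hash)
  if options[:transport] == :sasl && options[:sasl_params].nil?
    raise ArgumentError, ':sasl transport requires :sasl_params'
  end

  defaults = { :transport => :buffered, :hive_version => 10, :timeout => 1800 }
  # Keep the caller's value unless it is nil, in which case use the default.
  defaults.merge(options) { |_key, default, given| given.nil? ? default : given }
end

p normalize_options(nil)
p normalize_options({ :hive_version => 12, :transport => :http })
```

Passing `nil` or an empty hash yields the three defaults; any key the caller supplies overrides only that entry.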
@@ -2,6 +2,7 @@
2
2
  old_verbose, $VERBOSE = $VERBOSE, nil
3
3
  # require thrift autogenerated files
4
4
  require File.join(File.dirname(__FILE__), *%w[.. thrift thrift_hive])
5
+ # require 'thrift'
5
6
  # restore warnings
6
7
  $VERBOSE = old_verbose
7
8
 
@@ -34,7 +35,7 @@ module RBHive
34
35
  @socket = Thrift::Socket.new(server, port)
35
36
  @transport = Thrift::BufferedTransport.new(@socket)
36
37
  @protocol = Thrift::BinaryProtocol.new(@transport)
37
- @client = ThriftHive::Client.new(@protocol)
38
+ @client = Hive::Thrift::ThriftHive::Client.new(@protocol)
38
39
  @logger = logger
39
40
  @logger.info("Connecting to #{server} on port #{port}")
40
41
  @mutex = Mutex.new
@@ -0,0 +1,315 @@
1
+ # suppress warnings
2
+ old_verbose, $VERBOSE = $VERBOSE, nil
3
+
4
+ raise 'Thrift is not loaded' unless defined?(Thrift)
5
+ raise 'RBHive is not loaded' unless defined?(RBHive)
6
+
7
+ # require thrift autogenerated files
8
+ require File.join(File.dirname(__FILE__), *%w[.. thrift t_c_l_i_service_constants])
9
+ require File.join(File.dirname(__FILE__), *%w[.. thrift t_c_l_i_service])
10
+ require File.join(File.dirname(__FILE__), *%w[.. thrift sasl_client_transport])
11
+
12
+ # restore warnings
13
+ $VERBOSE = old_verbose
14
+
15
+ # Monkey patch thrift to set an infinite read timeout
16
+ module Thrift
17
+ class HTTPClientTransport < BaseTransport
18
+ def flush
19
+ http = Net::HTTP.new @url.host, @url.port
20
+ http.use_ssl = @url.scheme == 'https'
21
+ http.read_timeout = nil
22
+ http.verify_mode = @ssl_verify_mode if @url.scheme == 'https'
23
+ resp = http.post(@url.request_uri, @outbuf, @headers)
24
+ data = resp.body
25
+ data = Bytes.force_binary_encoding(data)
26
+ @inbuf = StringIO.new data
27
+ @outbuf = Bytes.empty_byte_buffer
28
+ end
29
+ end
30
+ end
31
+
32
+ module RBHive
33
+
34
+ HIVE_THRIFT_MAPPING = {
35
+ 10 => 0,
36
+ 11 => 1,
37
+ 12 => 2
38
+ }
39
+
40
+ def tcli_connect(server, port=10_000, options)
41
+ connection = RBHive::TCLIConnection.new(server, port, options)
42
+ ret = nil
43
+ begin
44
+ connection.open
45
+ connection.open_session
46
+ ret = yield(connection)
47
+
48
+ ensure
49
+ # Try to close the session and our connection if those are still open, ignore io errors
50
+ begin
51
+ connection.close_session if connection.session
52
+ connection.close
53
+ rescue IOError => e
54
+ # noop
55
+ end
56
+ end
57
+
58
+ return ret
59
+ end
60
+ module_function :tcli_connect
61
+
62
+ class StdOutLogger
63
+ %w(fatal error warn info debug).each do |level|
64
+ define_method level.to_sym do |message|
65
+ STDOUT.puts(message)
66
+ end
67
+ end
68
+ end
69
+
70
+ class TCLIConnection
71
+ attr_reader :client
72
+
73
+ def initialize(server, port=10_000, options={}, logger=StdOutLogger.new)
74
+ options ||= {} # backwards compatibility
75
+ raise "'options' parameter must be a hash" unless options.is_a?(Hash)
76
+
77
+ if options[:transport] == :sasl and options[:sasl_params].nil?
78
+ raise ":transport is set to :sasl, but no :sasl_params option was supplied"
79
+ end
80
+
81
+ # Defaults to buffered transport, Hive 0.10, 1800 second timeout
82
+ options[:transport] ||= :buffered
83
+ options[:hive_version] ||= 10
84
+ options[:timeout] ||= 1800
85
+ @options = options
86
+
87
+ # Look up the appropriate Thrift protocol version for the supplied Hive version
88
+ @thrift_protocol_version = thrift_hive_protocol(options[:hive_version])
89
+
90
+ @logger = logger
91
+ @transport = thrift_transport(server, port)
92
+ @protocol = Thrift::BinaryProtocol.new(@transport)
93
+ @client = Hive2::Thrift::TCLIService::Client.new(@protocol)
94
+ @session = nil
95
+ @logger.info("Connecting to HiveServer2 #{server} on port #{port}")
96
+ @mutex = Mutex.new
97
+ end
98
+
99
+ def thrift_hive_protocol(version)
100
+ HIVE_THRIFT_MAPPING[version] || raise("Invalid Hive version")
101
+ end
102
+
103
+ def thrift_transport(server, port)
104
+ @logger.info("Initializing transport #{@options[:transport]}")
105
+ case @options[:transport]
106
+ when :buffered
107
+ return Thrift::BufferedTransport.new(thrift_socket(server, port, @options[:timeout]))
108
+ when :sasl
109
+ return Thrift::SaslClientTransport.new(thrift_socket(server, port, @options[:timeout]),
110
+ parse_sasl_params(@options[:sasl_params]))
111
+ when :http
112
+ return Thrift::HTTPClientTransport.new("http://#{server}:#{port}/cliservice")
113
+ else
114
+ raise "Unrecognised transport type '#{@options[:transport]}'"
115
+ end
116
+ end
117
+
118
+ def thrift_socket(server, port, timeout)
119
+ socket = Thrift::Socket.new(server, port)
120
+ socket.timeout = timeout
121
+ socket
122
+ end
123
+
124
+ # Processes SASL connection params and returns a hash with symbol keys or a nil
125
+ def parse_sasl_params(sasl_params)
126
+ # Symbolize keys in a hash
127
+ if sasl_params.kind_of?(Hash)
128
+ return sasl_params.inject({}) do |memo,(k,v)|
129
+ memo[k.to_sym] = v;
130
+ memo
131
+ end
132
+ end
133
+ return nil
134
+ end
135
+
136
+ def open
137
+ @transport.open
138
+ end
139
+
140
+ def close
141
+ @transport.close
142
+ end
143
+
144
+ def open_session
145
+ @session = @client.OpenSession(prepare_open_session(@thrift_protocol_version))
146
+ end
147
+
148
+ def close_session
149
+ @client.CloseSession prepare_close_session
150
+ @session = nil
151
+ end
152
+
153
+ def session
154
+ @session && @session.sessionHandle
155
+ end
156
+
157
+ def client
158
+ @client
159
+ end
160
+
161
+ def execute(query)
162
+ execute_safe(query)
163
+ end
164
+
165
+ def priority=(priority)
166
+ set("mapred.job.priority", priority)
167
+ end
168
+
169
+ def queue=(queue)
170
+ set("mapred.job.queue.name", queue)
171
+ end
172
+
173
+ def set(name,value)
174
+ @logger.info("Setting #{name}=#{value}")
175
+ self.execute("SET #{name}=#{value}")
176
+ end
177
+
178
+ # Performs a query on the server, fetches up to *max_rows* rows and returns them as an array.
179
+ def fetch(query, max_rows = 100)
180
+ safe do
181
+ # Execute the query and check the result
182
+ exec_result = execute_unsafe(query)
183
+ raise_error_if_failed!(exec_result)
184
+
185
+ # Get search operation handle to fetch the results
186
+ op_handle = exec_result.operationHandle
187
+
188
+ # Prepare and execute fetch results request
189
+ fetch_req = prepare_fetch_results(op_handle, :first, max_rows)
190
+ fetch_results = client.FetchResults(fetch_req)
191
+ raise_error_if_failed!(fetch_results)
192
+
193
+ # Get data rows and format the result
194
+ rows = fetch_results.results.rows
195
+ the_schema = TCLISchemaDefinition.new(get_schema_for( op_handle ), rows.first)
196
+ TCLIResultSet.new(rows, the_schema)
197
+ end
198
+ end
199
+
200
+ # Performs a query on the server, fetches the results in batches of *batch_size* rows
201
+ # and yields the result batches to a given block as arrays of rows.
202
+ def fetch_in_batch(query, batch_size = 1000, &block)
203
+ raise "No block given for the batch fetch request!" unless block_given?
204
+ safe do
205
+ # Execute the query and check the result
206
+ exec_result = execute_unsafe(query)
207
+ raise_error_if_failed!(exec_result)
208
+
209
+ # Get search operation handle to fetch the results
210
+ op_handle = exec_result.operationHandle
211
+
212
+ # Prepare fetch results request
213
+ fetch_req = prepare_fetch_results(op_handle, :next, batch_size)
214
+
215
+ # Now let's iterate over the results
216
+ loop do
217
+ # Fetch next batch and raise an exception if it failed
218
+ fetch_results = client.FetchResults(fetch_req)
219
+ raise_error_if_failed!(fetch_results)
220
+
221
+ # Get data rows from the result
222
+ rows = fetch_results.results.rows
223
+ break if rows.empty?
224
+
225
+ # Prepare schema definition for the row
226
+ schema_for_req ||= get_schema_for(op_handle)
227
+ the_schema ||= TCLISchemaDefinition.new(schema_for_req, rows.first)
228
+
229
+ # Format the results and yield them to the given block
230
+ yield TCLIResultSet.new(rows, the_schema)
231
+ end
232
+ end
233
+ end
234
+
235
+ def create_table(schema)
236
+ execute(schema.create_table_statement)
237
+ end
238
+
239
+ def drop_table(name)
240
+ name = name.name if name.is_a?(TableSchema)
241
+ execute("DROP TABLE `#{name}`")
242
+ end
243
+
244
+ def replace_columns(schema)
245
+ execute(schema.replace_columns_statement)
246
+ end
247
+
248
+ def add_columns(schema)
249
+ execute(schema.add_columns_statement)
250
+ end
251
+
252
+ def method_missing(meth, *args)
253
+ client.send(meth, *args)
254
+ end
255
+
256
+ private
257
+
258
+ def execute_safe(query)
259
+ safe { execute_unsafe(query) }
260
+ end
261
+
262
+ def execute_unsafe(query)
263
+ @logger.info("Executing Hive Query: #{query}")
264
+ req = prepare_execute_statement(query)
265
+ client.ExecuteStatement(req)
266
+ end
267
+
268
+ def safe
269
+ ret = nil
270
+ @mutex.synchronize { ret = yield }
271
+ ret
272
+ end
273
+
274
+ def prepare_open_session(client_protocol)
275
+ req = ::Hive2::Thrift::TOpenSessionReq.new( @options[:sasl_params].nil? ? [] : @options[:sasl_params] )
276
+ req.client_protocol = client_protocol
277
+ req
278
+ end
279
+
280
+ def prepare_close_session
281
+ ::Hive2::Thrift::TCloseSessionReq.new( sessionHandle: self.session )
282
+ end
283
+
284
+ def prepare_execute_statement(query)
285
+ ::Hive2::Thrift::TExecuteStatementReq.new( sessionHandle: self.session, statement: query.to_s, confOverlay: {} )
286
+ end
287
+
288
+ def prepare_fetch_results(handle, orientation=:first, rows=100)
289
+ orientation_value = "FETCH_#{orientation.to_s.upcase}"
290
+ valid_orientations = ::Hive2::Thrift::TFetchOrientation::VALUE_MAP.values
291
+ unless valid_orientations.include?(orientation_value)
292
+ raise ArgumentError, "Invalid orientation: #{orientation.inspect}"
293
+ end
294
+ orientation_const = eval("::Hive2::Thrift::TFetchOrientation::#{orientation_value}")
295
+ ::Hive2::Thrift::TFetchResultsReq.new(
296
+ operationHandle: handle,
297
+ orientation: orientation_const,
298
+ maxRows: rows
299
+ )
300
+ end
301
+
302
+ def get_schema_for(handle)
303
+ req = ::Hive2::Thrift::TGetResultSetMetadataReq.new( operationHandle: handle )
304
+ metadata = client.GetResultSetMetadata( req )
305
+ metadata.schema
306
+ end
307
+
308
+ # Raises an exception if given operation result is a failure
309
+ def raise_error_if_failed!(result)
310
+ return if result.status.statusCode == 0
311
+ error_message = result.status.errorMessage || 'Execution failed!'
312
+ raise error_message
313
+ end
314
+ end
315
+ end
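One detail of the hunk above is worth a sketch: `tcli_connect` declares an optional `port` between two required parameters. Ruby fills required parameters first, so a two-argument call binds the second argument to `options`, not to `port`. The stand-in below copies only that parameter list; `bind_like_tcli_connect` is a hypothetical name and nothing here talks to Hive.

```ruby
# Stand-in with the same parameter list as tcli_connect in the hunk above.
# It only reports how Ruby binds the arguments.
def bind_like_tcli_connect(server, port = 10_000, options)
  { :server => server, :port => port, :options => options }
end

two_args   = bind_like_tcli_connect('hive.server.address', 10_000)
three_args = bind_like_tcli_connect('hive.server.address', 10_000, { :hive_version => 12 })

p two_args[:options]    # the port number, not a Hash
p three_args[:options]  # the options Hash, as intended
```

As far as this hunk shows, a two-argument call would therefore reach the `'options' parameter must be a hash` check in `TCLIConnection#initialize` with an Integer, and a one-argument call raises `ArgumentError`. Passing `nil` or `{}` explicitly as the third argument avoids both.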
@@ -0,0 +1,3 @@
1
+ module RBHive
2
+ class TCLIResultSet < ResultSet; end
3
+ end
@@ -0,0 +1,87 @@
1
+ require 'json'
2
+
3
+ module RBHive
4
+ class TCLISchemaDefinition
5
+ attr_reader :schema
6
+
7
+ TYPES = {
8
+ :boolean => :to_s,
9
+ :string => :to_s,
10
+ :bigint => :to_i,
11
+ :float => :to_f,
12
+ :double => :to_f,
13
+ :int => :to_i,
14
+ :bigint => :to_i,
15
+ :smallint => :to_i,
16
+ :tinyint => :to_i,
17
+ }
18
+
19
+ def initialize(schema, example_row)
20
+ @schema = schema
21
+ @example_row = example_row ? example_row.colVals : []
22
+ end
23
+
24
+ def column_names
25
+ @column_names ||= begin
26
+ schema_names = @schema.columns.map {|c| c.columnName }
27
+
28
+ # In rare cases Hive can return two identical column names
29
+ # consider SELECT a.foo, b.foo...
30
+ # in this case you get two columns called foo with no disambiguation.
31
+ # as a (far from ideal) solution we detect this edge case and rename them
32
+ # a.foo => foo_1, b.foo => foo_2
33
+ # otherwise we will trample one of the columns during Hash mapping.
34
+ s = Hash.new(0)
35
+ schema_names.map! { |c| s[c] += 1; s[c] > 1 ? "#{c}---|---#{s[c]}" : c }
36
+ schema_names.map! { |c| s[c] > 1 ? "#{c}---|---1" : c }
37
+ schema_names.map! { |c| c.gsub('---|---', '_').to_sym }
38
+
39
+ # Let's fix the fact that Hive doesn't return schema data for partitions on SELECT * queries
40
+ # For now we will call them :_p1, :_p2, etc. to avoid collisions.
41
+ offset = 0
42
+ while schema_names.length < @example_row.length
43
+ schema_names.push(:"_p#{offset+=1}")
44
+ end
45
+ schema_names
46
+ end
47
+ end
48
+
49
+ def column_type_map
50
+ @column_type_map ||= column_names.inject({}) do |hsh, c|
51
+ definition = @schema.columns.find {|s| s.columnName.to_sym == c }
52
+ # If the column isn't in the schema (eg partitions in SELECT * queries) assume they are strings
53
+ type = TYPE_NAMES[definition.typeDesc.types.first.primitiveEntry.type].downcase rescue nil
54
+ hsh[c] = definition && type ? type.to_sym : :string
55
+ hsh
56
+ end
57
+ end
58
+
59
+ def coerce_row(row)
60
+ column_names.zip(row.colVals.map(&:get_value).map(&:value)).inject({}) do |hsh, (column_name, value)|
61
+ hsh[column_name] = coerce_column(column_name, value)
62
+ hsh
63
+ end
64
+ end
65
+
66
+ def coerce_column(column_name, value)
67
+ type = column_type_map[column_name]
68
+ return 1.0/0.0 if(type != :string && value == "Infinity")
69
+ return 0.0/0.0 if(type != :string && value == "NaN")
70
+ return nil if value.nil? || value == 'NULL' || value == 'null'
71
+ return coerce_complex_value(value) if type.to_s =~ /^array/
72
+ conversion_method = TYPES[type]
73
+ conversion_method ? value.send(conversion_method) : value
74
+ end
75
+
76
+ def coerce_row_to_array(row)
77
+ column_names.map { |n| row[n] }
78
+ end
79
+
80
+ def coerce_complex_value(value)
81
+ return nil if value.nil?
82
+ return nil if value.length == 0
83
+ return nil if value == 'null'
84
+ JSON.parse(value)
85
+ end
86
+ end
87
+ end
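The renaming done in `column_names` above is easier to follow outside the class. The sketch below restates the same three passes plus the partition padding as a standalone function; `dedupe_column_names` and its `row_width` parameter are hypothetical names, and the input is a plain array of strings in place of a Thrift schema.

```ruby
# Standalone restatement of the renaming steps in column_names.
# dedupe_column_names is a hypothetical name, not an RBHive method.
def dedupe_column_names(names, row_width = names.length)
  seen = Hash.new(0)
  # First pass: tag the second and later occurrences with their ordinal.
  tagged = names.map { |n| seen[n] += 1; seen[n] > 1 ? "#{n}---|---#{seen[n]}" : n }
  # Second pass: tag the first occurrence of any name that turned out to repeat.
  tagged = tagged.map { |n| seen[n] > 1 ? "#{n}---|---1" : n }
  symbols = tagged.map { |n| n.gsub('---|---', '_').to_sym }
  # Pad with :_p1, :_p2, ... when a row carries more values than the schema names.
  extra = 0
  symbols.push(:"_p#{extra += 1}") while symbols.length < row_width
  symbols
end

p dedupe_column_names(%w[foo bar foo])
p dedupe_column_names(%w[name amount], 4)
```

The temporary `---|---` marker keeps the second pass from matching names that were already tagged, since a tagged name is never a key in `seen`.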
@@ -0,0 +1,3 @@
1
+ module RBHive
2
+ VERSION = '0.5.0'
3
+ end
data/lib/rbhive.rb CHANGED
@@ -2,4 +2,7 @@ require File.join(File.dirname(__FILE__), 'rbhive', 'connection')
2
2
  require File.join(File.dirname(__FILE__), 'rbhive', 'table_schema')
3
3
  require File.join(File.dirname(__FILE__), 'rbhive', 'result_set')
4
4
  require File.join(File.dirname(__FILE__), 'rbhive', 'explain_result')
5
- require File.join(File.dirname(__FILE__), 'rbhive', 'schema_definition')
5
+ require File.join(File.dirname(__FILE__), 'rbhive', 'schema_definition')
6
+ require File.join(File.dirname(__FILE__), *%w[rbhive t_c_l_i_result_set])
7
+ require File.join(File.dirname(__FILE__), *%w[rbhive t_c_l_i_schema_definition])
8
+ require File.join(File.dirname(__FILE__), *%w[rbhive t_c_l_i_connection])
@@ -1,12 +1,11 @@
1
1
  #
2
- # Autogenerated by Thrift
2
+ # Autogenerated by Thrift Compiler (0.9.0)
3
3
  #
4
4
  # DO NOT EDIT UNLESS YOU ARE SURE THAT YOU KNOW WHAT YOU ARE DOING
5
5
  #
6
6
 
7
7
  require 'thrift'
8
- require File.join(File.dirname(__FILE__), *%w[fb303_types])
9
-
8
+ require_relative 'fb303_types'
10
9
 
11
10
  module FacebookService
12
11
  class Client
@@ -370,13 +369,13 @@ module FacebookService
370
369
  SUCCESS = 0
371
370
 
372
371
  FIELDS = {
373
- SUCCESS => {:type => ::Thrift::Types::I32, :name => 'success', :enum_class => Fb_status}
372
+ SUCCESS => {:type => ::Thrift::Types::I32, :name => 'success', :enum_class => ::Fb_status}
374
373
  }
375
374
 
376
375
  def struct_fields; FIELDS; end
377
376
 
378
377
  def validate
379
- unless @success.nil? || Fb_status::VALID_VALUES.include?(@success)
378
+ unless @success.nil? || ::Fb_status::VALID_VALUES.include?(@success)
380
379
  raise ::Thrift::ProtocolException.new(::Thrift::ProtocolException::UNKNOWN, 'Invalid value of field success!')
381
380
  end
382
381
  end