hivemeta 0.0.2 → 0.0.3 (RubyGems)

Files changed (5)

data/README CHANGED

@@ -1,15 +1,47 @@
 hivemeta
-A ruby API for access to the Hive metastore.  Useful for querying columns
-in Hadoop map/reduce applications.  Includes a demo application to spit out
-table information from the command-line via table name search or
-by the table's location in HDFS.
+A ruby API for access to a Hive metastore running under MySQL.
+Useful for querying columns in Hadoop map/reduce applications.  Normally,
+a developer needs to handle both the splitting of incoming data and the
+assignment of numerically indexed fields to friendly variables like so:
+fields = line.chomp.split /\t/
+item_id = fields[0]
+inv_cnt = fields[7].to_i
+puts "#{item_id}\t#{inv_cnt}"
+This is not overly traumatic, however it's susceptible to errors creeping
+in from file format changes.  Ongoing maintenance can easily become a burden
+if there are many map/reduce programs reading the same changed data files.
+Code size increases as the column count increases.
+With hivemeta, the process is streamlined.  That same task is now:
+row = inv_table.process_row line
+puts "#{row.item_id}\t#{row.inv_cnt.to_i}"
+The row object automagically knows its column names and they can be
+referenced in one of the following ways (in order of best to worst
+performance and coolness):
+row.col_name
+row[:col_name]
+row['col_name']
+Also included is a demo application, hivemeta_query.rb, to spit out table
+information from the command-line via table name search or by the table's
+location in HDFS.
+---
 Installation
 gem install hivemeta
-Usage
+---
+API Usage
 streaming map/reduce code snippet:
@@ -30,12 +62,15 @@ STDIN.each_line do |line|
   puts "#{item_id}\t#{count}" if count >= 1000
 end
-sample usage for the demo app:
+---
+hivemeta_query.rb Usage
 # query by table names
 $ hivemeta_query.rb join_test_name
 join_test_name
 hdfs://namenode/tmp/join_test_name
+delimiter: "\t" (ASCII 9)
 0   userid             # userid
 1   name               # username
@@ -43,6 +78,7 @@ hdfs://namenode/tmp/join_test_name
 $ hivemeta_query.rb join_test%
 join_test_address
 hdfs://namenode/tmp/join_test_address
+delimiter: "," (ASCII 44)
 0   userid             # uid
 1   address
 2   city
@@ -50,6 +86,7 @@ hdfs://namenode/tmp/join_test_address
 join_test_name
 hdfs://namenode/tmp/join_test_name
+delimiter: "\t" (ASCII 9)
 0   userid             # userid
 1   name               # username
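
Note on the README change: the new text contrasts hard-coded field positions with name-based access. The following is a minimal sketch of that contrast only; it does not use the gem, and the `COLUMNS` list and the sample line are hypothetical stand-ins for what hivemeta would read from the metastore.

```ruby
# Hypothetical column layout; hivemeta would obtain this from the metastore.
COLUMNS = %w[item_id descr inv_cnt].freeze

line = "A100\twidget\t42\n"
fields = line.chomp.split("\t", -1)

# Manual approach: every script hard-codes the field positions.
manual = [fields[0], fields[2].to_i]

# Name-based approach: positions are resolved once from the column list,
# so a reordered file only requires an updated column list.
row = COLUMNS.zip(fields).to_h
named = [row['item_id'], row['inv_cnt'].to_i]
```

Both approaches yield the same values here; the difference is where the knowledge of the file layout lives.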

data/lib/hivemeta/connection.rb CHANGED

@@ -80,7 +80,6 @@ module HiveMeta
           table.columns[col_idx]  = col_name
           table.comments[col_idx] = col_cmt
           table.path      = tbl_loc
         end
         sql = "select sp.PARAM_VALUE

data/lib/hivemeta/record.rb CHANGED

@@ -4,22 +4,30 @@ module HiveMeta
   class Record
     def initialize(line, table)
-      fields = line.chomp.split(table.delimiter, -1)
-      if fields.size != table.columns.size
+      @fields = line.chomp.split(table.delimiter, -1)
+      if @fields.size != table.columns.size
         raise FieldCountError
       end
       @columns = {}
       table.each_col_with_index do |col_name, i|
-        @columns[col_name] = fields[i]
-        @columns[col_name.to_sym] = fields[i]
+        @columns[col_name] = @fields[i]
+        @columns[col_name.to_sym] = @fields[i]
       end
     end
+    # allow for column access via column name as an index
+    # example: rec[:col_name]
+    #      or: rec['col_name']
+    # can also use the numeric index as stored in the file
+    # example: rec[7]
     def [] index
+      return "#{@fields[index]}" if index.is_a? Integer
       "#{@columns[index.to_sym]}"
     end
+    # allow for column access via column name as a method
+    # example: rec.col_name
     def method_missing(id, *args)
       return @columns[id] if @columns[id]
       raise NoMethodError
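
Note on the record.rb change: 0.0.3 keeps the split fields in `@fields` so that `rec[7]` works alongside name-based access. The sketch below approximates that behavior in a standalone class. `SketchRecord` is a hypothetical name, it takes a plain column array instead of a `HiveMeta::Table`, and it adds `respond_to_missing?` and a `key?` check, neither of which appears in the diff.

```ruby
# Standalone approximation of the 0.0.3 Record behavior (not the gem's class).
class SketchRecord
  class FieldCountError < StandardError; end

  # columns: array of column names (assumption: replaces the gem's table object)
  def initialize(line, columns, delimiter = "\t")
    # limit -1 keeps trailing empty fields so the count check stays accurate
    @fields = line.chomp.split(delimiter, -1)
    raise FieldCountError if @fields.size != columns.size

    @columns = {}
    columns.each_with_index { |name, i| @columns[name.to_sym] = @fields[i] }
  end

  # rec[7] uses the file position; rec[:col] and rec['col'] use the name
  def [](index)
    return @fields[index].to_s if index.is_a?(Integer)

    @columns[index.to_sym].to_s
  end

  # rec.col_name style access
  def method_missing(id, *args)
    return @columns[id] if @columns.key?(id)

    super
  end

  def respond_to_missing?(id, include_private = false)
    @columns.key?(id) || super
  end
end

rec = SketchRecord.new("42\tfrank\n", %w[userid name])
ids = [rec.userid, rec[:userid], rec['userid'], rec[0]]
```

All four access styles return the same string for the first column; an unknown method name still raises `NoMethodError` through `super`.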

data/lib/hivemeta/table.rb CHANGED

@@ -11,7 +11,7 @@ module HiveMeta
       @path = nil
       @columns   = []
       @comments  = []
-      @delimiter = "\t"
+      @delimiter = "\001"
     end
     def to_s
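
Note on the table.rb change: the default delimiter moves from tab to `"\001"` (Ctrl-A), which is the field delimiter Hive uses for text tables when none is specified. The sketch below shows the effect on splitting, assuming a hypothetical three-field line whose last field is empty. It also shows why record.rb passes `-1` as the split limit.

```ruby
HIVE_DEFAULT_DELIM = "\001" # Ctrl-A, ASCII 1

# Hypothetical row: three fields, the last one empty.
line = ["42", "frank", ""].join(HIVE_DEFAULT_DELIM) + "\n"

# Old default: a tab never matches, so the whole line is one field.
with_tab = line.chomp.split("\t", -1)

# New default: the line splits into its three fields.
with_ctrl_a = line.chomp.split(HIVE_DEFAULT_DELIM, -1)

# Without the -1 limit, Ruby drops trailing empty fields, which would
# make the field count check in Record fail for this row.
without_limit = line.chomp.split(HIVE_DEFAULT_DELIM)
```

Judging by the `delimiter:` lines in the README sample output, tables that declare their own delimiter appear to override this default.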

metadata CHANGED

@@ -5,8 +5,8 @@ version: !ruby/object:Gem::Version
   segments:
   - 0
   - 0
-  - 2
-  version: 0.0.2
+  - 3
+  version: 0.0.3
 platform: ruby
 authors:
 - Frank Fejes
@@ -14,7 +14,7 @@ autorequire:
 bindir: bin
 cert_chain: []
-date: 2011-05-04 00:00:00 -05:00
+date: 2011-05-17 00:00:00 -05:00
 default_executable:
 dependencies: []