mobilize-hive 1.36 → 1.291
- checksums.yaml +7 -0
- data/README.md +242 -3
- data/lib/mobilize-hive.rb +0 -3
- data/lib/mobilize-hive/handlers/hive.rb +105 -100
- data/lib/mobilize-hive/tasks.rb +0 -1
- data/lib/mobilize-hive/version.rb +1 -1
- data/lib/samples/hive.yml +0 -6
- data/mobilize-hive.gemspec +1 -1
- data/test/hive_job_rows.yml +26 -0
- data/test/{fixtures/hive1.hql → hive_test_1.hql} +0 -0
- data/test/{fixtures/hive1.in.yml → hive_test_1_in.yml} +0 -0
- data/test/{fixtures/hive1.schema.yml → hive_test_1_schema.yml} +0 -0
- data/test/mobilize-hive_test.rb +96 -0
- data/test/test_helper.rb +0 -1
- metadata +19 -38
- data/lib/mobilize-hive/helpers/hive_helper.rb +0 -67
- data/test/fixtures/hive1.sql +0 -1
- data/test/fixtures/hive4_stage1.in +0 -1
- data/test/fixtures/hive4_stage2.in.yml +0 -4
- data/test/fixtures/integration_expected.yml +0 -69
- data/test/fixtures/integration_jobs.yml +0 -34
- data/test/integration/mobilize-hive_test.rb +0 -43
checksums.yaml ADDED

@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: 1eb5a243ff499f31c06f0c1e09abba3fafee0b31
+  data.tar.gz: d44471ad7d8fcc72ac8562ebfb529d31d839d195
+SHA512:
+  metadata.gz: a27bde80d634f949cbf7f82b0ceaad4e087e3fe1bee6fe4631aa8205b68a7897e9d0f8e8ec49700a37cc3bd54e266df06a30e2187e13b6f9a1357d1d270af54c
+  data.tar.gz: 3763262ee0ac27778cc2abb6342b20d9fb1a1c6c546ddd87f4ff73969a8054314480387c230aa232e89fad7e2df16f597f2fa50344430343a7cdde9aa03d0d79
data/README.md CHANGED

Mobilize-Hive
===============

Mobilize-Hive adds the power of hive to [mobilize-hdfs][mobilize-hdfs].
* read, write, and copy hive files through Google Spreadsheets.

Table Of Contents
-----------------
* [Overview](#section_Overview)
* [Install](#section_Install)
  * [Mobilize-Hive](#section_Install_Mobilize-Hive)
  * [Install Dirs and Files](#section_Install_Dirs_and_Files)
* [Configure](#section_Configure)
  * [Hive](#section_Configure_Hive)
* [Start](#section_Start)
  * [Create Job](#section_Start_Create_Job)
  * [Run Test](#section_Start_Run_Test)
* [Meta](#section_Meta)
* [Special Thanks](#section_Special_Thanks)
* [Author](#section_Author)

<a name='section_Overview'></a>
Overview
-----------

* Mobilize-hive adds Hive methods to mobilize-hdfs.

<a name='section_Install'></a>
Install
------------

Make sure you go through all the steps in the
[mobilize-base][mobilize-base],
[mobilize-ssh][mobilize-ssh], and
[mobilize-hdfs][mobilize-hdfs]
install sections first.

<a name='section_Install_Mobilize-Hive'></a>
### Mobilize-Hive

Add this to your Gemfile:

``` ruby
gem "mobilize-hive"
```

or do

    $ gem install mobilize-hive

for a ruby-wide install.

<a name='section_Install_Dirs_and_Files'></a>
### Dirs and Files

### Rakefile

Inside the Rakefile in your project's root dir, make sure you have:

``` ruby
require 'mobilize-base/tasks'
require 'mobilize-ssh/tasks'
require 'mobilize-hdfs/tasks'
require 'mobilize-hive/tasks'
```

This defines the rake tasks essential to running the environment.

### Config Dir

Run

    $ rake mobilize_hive:setup

This will copy a sample hive.yml into your config dir.

<a name='section_Configure'></a>
Configure
------------

<a name='section_Configure_Hive'></a>
### Configure Hive

* Hive is big data. That means we need to be careful when reading from
the cluster, as it could easily fill up our mongodb instance, RAM, local disk
space, etc.
* To achieve this, all hive operations, stage outputs, etc. are
executed and stored on the cluster only.
* The exceptions are:
  * writing to the cluster from an external source, such as a google
sheet. Here there is no risk, as the external source has much stricter size
limits than hive.
  * reading from the cluster, such as for posting to a google sheet. In
this case, the read_limit parameter dictates the maximum amount that can
be read. If the data is bigger than the read limit, an exception will be
raised.

The Hive configuration consists of:
* clusters - this defines aliases for clusters, which are used as
parameters for Hive stages. They should have the same names as those
in hadoop.yml. Each cluster has:
  * max_slots - defines the total number of simultaneous slots to be
used for hive jobs on this cluster
  * output_db - defines the db which should be used to hold stage outputs.
    * This db must have open permissions (777) so any user on the system can
write to it -- the tables inside will be owned by the users themselves.
  * exec_path - defines the path to the hive executable

Sample hive.yml:

``` yml
---
development:
  clusters:
    dev_cluster:
      max_slots: 5
      output_db: mobilize
      exec_path: /path/to/hive
test:
  clusters:
    test_cluster:
      max_slots: 5
      output_db: mobilize
      exec_path: /path/to/hive
production:
  clusters:
    prod_cluster:
      max_slots: 5
      output_db: mobilize
      exec_path: /path/to/hive
```
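As a sketch of how these values are consumed (one concurrency slot id per max_slots, in the `<cluster>_<n>` form used by the gem's Hive.slot_ids helper; the yml content mirrors the sample above):

``` ruby
require 'yaml'

# Parse a hive.yml-style cluster entry and derive its hive job slot ids.
# The "#{cluster}_#{n}" slot-id form follows the gem's slot_ids scheme.
config = YAML.load(<<~YML)
  development:
    clusters:
      dev_cluster:
        max_slots: 5
        output_db: mobilize
        exec_path: /path/to/hive
YML

cluster   = 'dev_cluster'
max_slots = config['development']['clusters'][cluster]['max_slots']
slot_ids  = (1..max_slots).to_a.map { |n| "#{cluster}_#{n}" }
puts slot_ids.join(",")
```

A hive stage grabs the first of these ids not already held by a Resque worker, so at most max_slots hive jobs run per cluster at once.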
<a name='section_Start'></a>
Start
-----

<a name='section_Start_Create_Job'></a>
### Create Job

* For mobilize-hive, the following stages are available.
  * cluster and user are optional for all of the below.
    * cluster defaults to the first cluster listed;
    * user is treated the same way as in [mobilize-ssh][mobilize-ssh].
  * hive.run `hql:<hql> || source:<gsheet_path>, user:<user>, cluster:<cluster>`, which executes the
script in the hql or source sheet and returns any output specified at the
end. If the cmd or last query in source is a select statement, column headers will be
returned as well.
  * hive.write `hql:<hql> || source:<source_path>, target:<hive_path>, partitions:<partition_path>, user:<user>, cluster:<cluster>, schema:<gsheet_path>, drop:<true/false>`,
which writes the source or query result to the selected hive table.
    * hive_path
      * should be of the form `<hive_db>/<table_name>` or `<hive_db>.<table_name>`.
    * source:
      * can be a gsheet_path, hdfs_path, or hive_path (no partitions)
      * for gsheet and hdfs paths,
        * if the file ends in .*ql, it's treated the same as passing hql
        * otherwise it is treated as a tsv with the first row as column headers
    * target:
      * should be a hive_path, as in `<hive_db>/<table_name>` or `<hive_db>.<table_name>`.
    * partitions:
      * Due to a Hive limitation, partition names CANNOT be reserved keywords when writing from tsv (gsheet or hdfs source)
      * Partitions should be specified as a path, as in partitions:`<partition1>/<partition2>`.
    * schema:
      * optional. gsheet_path to a column schema.
      * two columns: name, datatype
      * Any columns not defined here will receive "string" as the datatype
      * partitions can have their datatypes overridden here as well
      * columns named here that are not in the dataset will be ignored
    * drop:
      * optional. drops the target table before performing the write
      * defaults to false
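As a concrete example, a runner job combining these stages, in the row format the gem's test fixtures use (hive_sample and its sheet names are hypothetical; cluster and user are omitted so the defaults above apply):

``` yml
- name: hive_sample
  active: true
  trigger: once
  status: ""
  stage1: hive.write target:"mobilize/hive_sample", partitions:"act_date", drop:true,
    source:"Runner_mobilize(test)/hive_sample.in", schema:"hive_sample.schema"
  stage2: hive.run hql:"select count(*) from mobilize.hive_sample;"
  stage3: gsheet.write source:"stage2", target:"hive_sample.out"
```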
<a name='section_Start_Run_Test'></a>
### Run Test

To run tests, you will need to

1) go through the [mobilize-base][mobilize-base], [mobilize-ssh][mobilize-ssh], and [mobilize-hdfs][mobilize-hdfs] tests first

2) clone the mobilize-hive repository

From the project folder, run

3) $ rake mobilize_hive:setup

Copy over the config files from the mobilize-base, mobilize-ssh, and
mobilize-hdfs projects into the config dir, and populate the values in the hive.yml file.

Make sure you use the same names for your hive clusters as you do in
hadoop.yml.

4) $ rake test

* The test runs these jobs:
  * hive_test_1:
    * `hive.write target:"mobilize/hive_test_1/act_date",source:"Runner_mobilize(test)/hive_test_1.in", schema:"hive_test_1.schema", drop:true`
    * `hive.run source:"hive_test_1.hql"`
    * `hive.run cmd:"show databases"`
    * `gsheet.write source:"stage2", target:"hive_test_1_stage_2.out"`
    * `gsheet.write source:"stage3", target:"hive_test_1_stage_3.out"`
    * hive_test_1.hql runs a select statement on the table created in the
write command.
    * at the end of the test, there should be two sheets: one with a
sum of the data as in your write query, one with the results of the show
databases command.
  * hive_test_2:
    * `hive.write source:"hdfs://user/mobilize/test/test_hdfs_1.out", target:"mobilize.hive_test_2", drop:true`
    * `hive.run cmd:"select * from mobilize.hive_test_2"`
    * `gsheet.write source:"stage2", target:"hive_test_2.out"`
    * this test uses the output from the first hdfs test as an input, so make sure you've run that first.
  * hive_test_3:
    * `hive.write source:"hive://mobilize.hive_test_1",target:"mobilize/hive_test_3/date/product",drop:true`
    * ``hive.run hql:"select act_date as `date`,product,category,value from mobilize.hive_test_1;"``
    * `hive.write source:"stage2",target:"mobilize/hive_test_3/date/product", drop:false`
    * `gsheet.write source:"hive://mobilize/hive_test_3", target:"hive_test_3.out"`

<a name='section_Meta'></a>
Meta
----

* Code: `git clone git://github.com/dena/mobilize-hive.git`
* Home: <https://github.com/dena/mobilize-hive>
* Bugs: <https://github.com/dena/mobilize-hive/issues>
* Gems: <http://rubygems.org/gems/mobilize-hive>

<a name='section_Special_Thanks'></a>
Special Thanks
--------------
* This release goes to Toby Negrin, who championed this project within
DeNA and gave me the support to get it properly architected, tested, and documented.
* Also many thanks to the Analytics team at DeNA, who build and maintain
our Big Data infrastructure.

<a name='section_Author'></a>
Author
------

Cassio Paes-Leme :: cpaesleme@dena.com :: @cpaesleme

[mobilize-base]: https://github.com/dena/mobilize-base
[mobilize-ssh]: https://github.com/dena/mobilize-ssh
[mobilize-hdfs]: https://github.com/dena/mobilize-hdfs
data/lib/mobilize-hive.rb CHANGED

@@ -1,7 +1,56 @@
 module Mobilize
   module Hive
-
-
+    def Hive.config
+      Base.config('hive')
+    end
+
+    def Hive.exec_path(cluster)
+      Hive.clusters[cluster]['exec_path']
+    end
+
+    def Hive.output_db(cluster)
+      Hive.clusters[cluster]['output_db']
+    end
+
+    def Hive.output_db_user(cluster)
+      output_db_node = Hadoop.gateway_node(cluster)
+      output_db_user = Ssh.host(output_db_node)['user']
+      output_db_user
+    end
+
+    def Hive.clusters
+      Hive.config['clusters']
+    end
+
+    def Hive.slot_ids(cluster)
+      (1..Hive.clusters[cluster]['max_slots']).to_a.map{|s| "#{cluster}_#{s.to_s}"}
+    end
+
+    def Hive.slot_worker_by_cluster_and_path(cluster,path)
+      working_slots = Mobilize::Resque.jobs.map{|j| begin j['args'][1]['hive_slot'];rescue;nil;end}.compact.uniq
+      Hive.slot_ids(cluster).each do |slot_id|
+        unless working_slots.include?(slot_id)
+          Mobilize::Resque.set_worker_args_by_path(path,{'hive_slot'=>slot_id})
+          return slot_id
+        end
+      end
+      #return false if none are available
+      return false
+    end
+
+    def Hive.unslot_worker_by_path(path)
+      begin
+        Mobilize::Resque.set_worker_args_by_path(path,{'hive_slot'=>nil})
+        return true
+      rescue
+        return false
+      end
+    end
+
+    def Hive.databases(cluster,user_name)
+      Hive.run(cluster,"show databases",user_name)['stdout'].split("\n")
+    end
+
     # converts a source path or target path to a dst in the context of handler and stage
     def Hive.path_to_dst(path,stage_path,gdrive_slot)
       has_handler = true if path.index("://")
@@ -93,32 +142,12 @@ module Mobilize
     end

     #run a generic hive command, with the option of passing a file hash to be locally available
-    def Hive.run(cluster,hql,user_name,
-
-
-      suffix = ";"
-      prep_out = p
-      prep_out = "#{prefix}#{prep_out}" unless prep_out.starts_with?(prefix)
-      prep_out = "#{prep_out}#{suffix}" unless prep_out.ends_with?(suffix)
-      prep_out
-      end.join
-      hql = "#{preps}#{hql}"
+    def Hive.run(cluster,hql,user_name,file_hash=nil)
+      # no TempStatsStore
+      hql = "set hive.stats.autogather=false;#{hql}"
       filename = hql.to_md5
       file_hash||= {}
       file_hash[filename] = hql
-      params ||= {}
-      #replace any params in the file_hash and command
-      params.each do |k,v|
-        file_hash.each do |name,data|
-          data.gsub!("@#{k}",v)
-        end
-      end
-      #add in default params
-      Hive.default_params.each do |k,v|
-        file_hash.each do |name,data|
-          data.gsub!(k,v)
-        end
-      end
       #silent mode so we don't have logs in stderr; clip output
       #at hadoop read limit
       command = "#{Hive.exec_path(cluster)} -S -f #{filename} | head -c #{Hadoop.read_limit}"
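The clipping in that command can be sketched with a stand-in for the hive binary (the limit here is illustrative, not the gem's default):

``` shell
# Sketch of Hive.run's pipeline: -S keeps hive quiet on stderr, and
# `head -c` clips stdout at the hadoop read limit.
# printf plays the role of the hive binary here.
read_limit=9
printf 'row1\nrow2\nrow3\n' | head -c "$read_limit"
```

Anything past the byte limit is simply cut off, which is why reads larger than read_limit surface as truncated output rather than exhausting local resources.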
@@ -162,9 +191,8 @@ module Mobilize
       Gdrive.unslot_worker_by_path(stage_path)

       #check for select at end
-      hql_array = hql.split("
-
-      if last_statement.to_s.downcase.starts_with?("select")
+      hql_array = hql.split(";").map{|hc| hc.strip}.reject{|hc| hc.length==0}
+      if hql_array.last.downcase.starts_with?("select")
         #nil if no prior commands
         prior_hql = hql_array[0..-2].join(";") if hql_array.length > 1
         select_hql = hql_array.last

@@ -172,10 +200,10 @@ module Mobilize
         "drop table if exists #{output_path}",
         "create table #{output_path} as #{select_hql};"].join(";")
       full_hql = [prior_hql, output_table_hql].compact.join(";")
-      result = Hive.run(cluster,full_hql, user_name
+      result = Hive.run(cluster,full_hql, user_name)
       Dataset.find_or_create_by_url(out_url)
     else
-      result = Hive.run(cluster, hql, user_name
+      result = Hive.run(cluster, hql, user_name)
       Dataset.find_or_create_by_url(out_url)
       Dataset.write_by_url(out_url,result['stdout'],user_name) if result['stdout'].to_s.length>0
     end
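In plain Ruby (core `start_with?` standing in for the gem's `starts_with?` extension), the statement split and trailing-select check work like this:

``` ruby
# Split an hql script into statements, dropping blanks, then check whether
# the final statement is a select (which gets captured into an output table).
hql = "use db1;\nset hive.exec.compress.output=true;\nselect * from t;\n"
hql_array = hql.split(";").map { |hc| hc.strip }.reject { |hc| hc.length == 0 }
# everything before the last statement runs first, unchanged
prior_hql = hql_array[0..-2].join(";") if hql_array.length > 1
puts hql_array.last.downcase.start_with?("select")
```

Because empty fragments are rejected, a trailing `;` or blank lines at the end of the script do not break the check.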
@@ -188,37 +216,40 @@ module Mobilize
       response
     end

-    def Hive.schema_hash(schema_path,
-
-
-
-      "gsheet"
-      end
-      dst = "Mobilize::#{handler.downcase.capitalize}".constantize.path_to_dst(schema_path,stage_path,gdrive_slot)
-      out_raw = dst.read(user_name,gdrive_slot)
-      #determine the datatype for schema; accept json, yaml, tsv
-      if schema_path.ends_with?(".yml")
-        out_ha = begin;YAML.load(out_raw);rescue ScriptError, StandardError;nil;end if out_ha.nil?
+    def Hive.schema_hash(schema_path,user_name,gdrive_slot)
+      if schema_path.index("/")
+        #slashes mean sheets
+        out_tsv = Gsheet.find_by_path(schema_path,gdrive_slot).read(user_name)
       else
-
-
+        u = User.where(:name=>user_name).first
+        #check sheets in runner
+        r = u.runner
+        runner_sheet = r.gbook(gdrive_slot).worksheet_by_title(schema_path)
+        out_tsv = if runner_sheet
+                    runner_sheet.read(user_name)
+                  else
+                    #check for gfile. will fail if there isn't one.
+                    Gfile.find_by_path(schema_path).read(user_name)
+                  end
       end
+      #use Gridfs to cache gdrive results
+      file_name = schema_path.split("/").last
+      out_url = "gridfs://#{schema_path}/#{file_name}"
+      Dataset.write_by_url(out_url,out_tsv,user_name)
+      schema_tsv = Dataset.find_by_url(out_url).read(user_name,gdrive_slot)
       schema_hash = {}
-
-      schema_hash[
+      schema_tsv.tsv_to_hash_array.each do |ha|
+        schema_hash[ha['name']] = ha['datatype']
       end
       schema_hash
     end

-    def Hive.hql_to_table(cluster, db, table, part_array, source_hql, user_name, job_name, drop=false, schema_hash=nil
+    def Hive.hql_to_table(cluster, db, table, part_array, source_hql, user_name, job_name, drop=false, schema_hash=nil)
       table_path = [db,table].join(".")
       table_stats = Hive.table_stats(cluster, db, table, user_name)
-      url = "hive://" + [cluster,db,table,part_array.compact.join("/")].join("/")

-
-
-      source_hql_array = source_hql.split("\n").reject{|l| l.starts_with?("--") or l.strip.length==0}.join("\n").split(";").map{|h| h.strip}
-      last_select_i = source_hql_array.rindex{|s| s.downcase.starts_with?("select")}
+      source_hql_array = source_hql.split(";")
+      last_select_i = source_hql_array.rindex{|hql| hql.downcase.strip.starts_with?("select")}
       #find the last select query -- it should be used for the temp table creation
       last_select_hql = (source_hql_array[last_select_i..-1].join(";")+";")
       #if there is anything prior to the last select, add it in prior to table creation
@@ -231,8 +262,7 @@ module Mobilize
       temp_set_hql = "set mapred.job.name=#{job_name} (temp table);"
       temp_drop_hql = "drop table if exists #{temp_table_path};"
       temp_create_hql = "#{temp_set_hql}#{prior_hql}#{temp_drop_hql}create table #{temp_table_path} as #{last_select_hql}"
-
-      raise response['stderr'] if response['stderr'].to_s.ie{|s| s.index("FAILED") or s.index("KILLED")}
+      Hive.run(cluster,temp_create_hql,user_name)

       source_table_stats = Hive.table_stats(cluster,temp_db,temp_table_name,user_name)
       source_fields = source_table_stats['field_defs']

@@ -270,12 +300,10 @@ module Mobilize
                          target_insert_hql,
                          temp_drop_hql].join

-
-
-      raise response['stderr'] if response['stderr'].to_s.ie{|s| s.index("FAILED") or s.index("KILLED")}
+      Hive.run(cluster, target_full_hql, user_name)

     elsif part_array.length > 0 and
-        table_stats.ie{|tts| tts.nil? || drop || tts['partitions'].to_a.map{|p| p['name']}
+        table_stats.ie{|tts| tts.nil? || drop || tts['partitions'].to_a.map{|p| p['name']} == part_array}
       #partitions and no target table or same partitions in both target table and user params

       target_headers = source_fields.map{|f| f['name']}.reject{|h| part_array.include?(h)}

@@ -306,7 +334,7 @@ module Mobilize

       target_set_hql = ["set mapred.job.name=#{job_name};",
                         "set hive.exec.dynamic.partition.mode=nonstrict;",
-                        "set hive.exec.max.dynamic.partitions.pernode=
+                        "set hive.exec.max.dynamic.partitions.pernode=1000;",
                         "set hive.exec.dynamic.partition=true;",
                         "set hive.exec.max.created.files = 200000;",
                         "set hive.max.created.files = 200000;"].join

@@ -322,20 +350,12 @@ module Mobilize
       part_set_hql = "set hive.cli.print.header=true;set mapred.job.name=#{job_name} (permutations);"
       part_select_hql = "select distinct #{target_part_stmt} from #{temp_table_path};"
       part_perm_hql = part_set_hql + part_select_hql
-
-      raise response['stderr'] if response['stderr'].to_s.ie{|s| s.index("FAILED") or s.index("KILLED")}
-      part_perm_tsv = response['stdout']
+      part_perm_tsv = Hive.run(cluster, part_perm_hql, user_name)['stdout']
       #having gotten the permutations, ensure they are dropped
       part_hash_array = part_perm_tsv.tsv_to_hash_array
-      #make sure there is data
-      if part_hash_array.first.nil? or part_hash_array.first.values.include?(nil)
-        #blank result set, return url
-        return url
-      end

       part_drop_hql = part_hash_array.map do |h|
         part_drop_stmt = h.map do |name,value|
-          part_defs[name[1..-2]]
+          part_defs[name[1..-2]]=="string" ? "#{name}='#{value}'" : "#{name}=#{value}"
         end.join(",")
         "use #{db};alter table #{table} drop if exists partition (#{part_drop_stmt});"
       end.join
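The datatype-aware ternary above quotes only string-typed partition values; a sketch (the quoted header form in `row`, and `name[1..-2]` stripping those quote characters, are assumptions for illustration):

``` ruby
# String-typed partitions get quoted values in the drop statement,
# numeric ones don't. part_defs maps partition name => datatype;
# row simulates one permutation from the distinct-partitions query,
# with header names carrying surrounding quote characters.
part_defs = { 'act_date' => 'string', 'bucket' => 'int' }
row = { "'act_date'" => '2013-01-01', "'bucket'" => '7' }
part_drop_stmt = row.map do |name, value|
  part_defs[name[1..-2]] == 'string' ? "#{name}='#{value}'" : "#{name}=#{value}"
end.join(",")
puts part_drop_stmt
```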
@@ -348,12 +368,12 @@ module Mobilize

       target_full_hql = [target_set_hql, target_create_hql, target_insert_hql, temp_drop_hql].join

-
-      raise response['stderr'] if response['stderr'].to_s.ie{|s| s.index("FAILED") or s.index("KILLED")}
+      Hive.run(cluster, target_full_hql, user_name)
     else
       error_msg = "Incompatible partition specs"
       raise error_msg
     end
+      url = "hive://" + [cluster,db,table,part_array.compact.join("/")].join("/")
     return url
   end
@@ -361,12 +381,6 @@ module Mobilize
     #Accepts options to drop existing target if any
     #also schema with column datatype overrides
     def Hive.tsv_to_table(cluster, db, table, part_array, source_tsv, user_name, drop=false, schema_hash=nil)
-      return nil if source_tsv.strip.length==0
-      if source_tsv.index("\r\n")
-        source_tsv = source_tsv.gsub("\r\n","\n")
-      elsif source_tsv.index("\r")
-        source_tsv = source_tsv.gsub("\r","\n")
-      end
       source_headers = source_tsv.tsv_header_array

       table_path = [db,table].join(".")

@@ -374,8 +388,6 @@ module Mobilize

       schema_hash ||= {}

-      url = "hive://" + [cluster,db,table,part_array.compact.join("/")].join("/")
-
       if part_array.length == 0 and
         table_stats.ie{|tts| tts.nil? || drop || tts['partitions'].nil?}
         #no partitions in either user params or the target table

@@ -402,11 +414,10 @@ module Mobilize

       target_full_hql = [target_drop_hql,target_create_hql,target_insert_hql].join(";")

-
-      raise response['stderr'] if response['stderr'].to_s.ie{|s| s.index("FAILED") or s.index("KILLED")}
+      Hive.run(cluster, target_full_hql, user_name, file_hash)

     elsif part_array.length > 0 and
-        table_stats.ie{|tts| tts.nil? || drop || tts['partitions'].to_a.map{|p| p['name']}
+        table_stats.ie{|tts| tts.nil? || drop || tts['partitions'].to_a.map{|p| p['name']} == part_array}
       #partitions and no target table
       #or same partitions in both target table and user params
       #or drop and start fresh

@@ -430,17 +441,13 @@ module Mobilize
                        "partitioned by #{partition_defs}"

       #create target table early if not here
-
-      raise response['stderr'] if response['stderr'].to_s.ie{|s| s.index("FAILED") or s.index("KILLED")}
-
-      #return url (operation complete) if there's no data
-      source_hash_array = source_tsv.tsv_to_hash_array
-      return url if source_hash_array.length==1 and source_hash_array.first.values.compact.length==0
+      Hive.run(cluster, target_create_hql, user_name)

       table_stats = Hive.table_stats(cluster, db, table, user_name)

       #create data hash from source hash array
       data_hash = {}
+      source_hash_array = source_tsv.tsv_to_hash_array
       source_hash_array.each do |ha|
         tpmk = part_array.map{|pn| "#{pn}=#{ha[pn]}"}.join("/")
         tpmv = ha.reject{|k,v| part_array.include?(k)}.values.join("\001")

@@ -473,8 +480,7 @@ module Mobilize
       #run actual partition adds all at once
       if target_part_hql.length>0
         puts "Adding partitions to #{cluster}/#{db}/#{table} for #{user_name} at #{Time.now.utc}"
-
-        raise response['stderr'] if response['stderr'].to_s.ie{|s| s.index("FAILED") or s.index("KILLED")}
+        Hive.run(cluster, target_part_hql, user_name)
       end
     else
       error_msg = "Incompatible partition specs: " +

@@ -482,7 +488,7 @@ module Mobilize
                   "user_params:#{part_array.to_s}"
       raise error_msg
     end
-
+      url = "hive://" + [cluster,db,table,part_array.compact.join("/")].join("/")
     return url
   end
@@ -503,7 +509,7 @@ module Mobilize
       job_name = s.path.sub("Runner_","")

       schema_hash = if params['schema']
-                      Hive.schema_hash(params['schema'],
+                      Hive.schema_hash(params['schema'],user_name,gdrive_slot)
                     else
                       {}
                     end

@@ -519,11 +525,11 @@ module Mobilize
       #source table
       cluster,source_path = source.path.split("/").ie{|sp| [sp.first, sp[1..-1].join(".")]}
       source_hql = "select * from #{source_path};"
-
+    elsif ['gsheet','gridfs','hdfs'].include?(source.handler)
       if source.path.ie{|sdp| sdp.index(/\.[A-Za-z]ql$/) or sdp.ends_with?(".ql")}
         source_hql = source.read(user_name,gdrive_slot)
       else
-          #tsv from sheet
+        #tsv from sheet
         source_tsv = source.read(user_name,gdrive_slot)
       end
     end

@@ -545,13 +551,9 @@ module Mobilize

     result = begin
       url = if source_hql
-
-        run_params = params['params']
-        Hive.hql_to_table(cluster, db, table, part_array, source_hql, user_name, job_name, drop, schema_hash,run_params)
+        Hive.hql_to_table(cluster, db, table, part_array, source_hql, user_name, job_name, drop, schema_hash)
       elsif source_tsv
         Hive.tsv_to_table(cluster, db, table, part_array, source_tsv, user_name, drop, schema_hash)
-      elsif source
-        #null sheet
       else
         raise "Unable to determine source tsv or source hql"
       end

@@ -578,8 +580,11 @@ module Mobilize
       select_hql = "select * from #{source_path};"
       hql = [set_hql,select_hql].join
       response = Hive.run(cluster, hql,user_name)
-
-
+      if response['exit_code']==0
+        return response['stdout']
+      else
+        raise "Unable to read hive://#{dst_path} with error: #{response['stderr']}"
+      end
     end

     def Hive.write_by_dataset_path(dst_path,source_tsv,user_name,*args)
data/lib/mobilize-hive/tasks.rb
CHANGED
data/lib/samples/hive.yml CHANGED

@@ -1,23 +1,17 @@
 ---
 development:
-  prepends:
-  - "hive.stats.autogather=false"
   clusters:
     dev_cluster:
       max_slots: 5
       temp_table_db: mobilize
       exec_path: /path/to/hive
 test:
-  prepends:
-  - "hive.stats.autogather=false"
   clusters:
     test_cluster:
       max_slots: 5
       temp_table_db: mobilize
       exec_path: /path/to/hive
 production:
-  prepends:
-  - "hive.stats.autogather=false"
   clusters:
     prod_cluster:
       max_slots: 5
data/mobilize-hive.gemspec CHANGED

@@ -16,5 +16,5 @@ Gem::Specification.new do |gem|
   gem.executables = gem.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
   gem.test_files = gem.files.grep(%r{^(test|spec|features)/})
   gem.require_paths = ["lib"]
-  gem.add_runtime_dependency "mobilize-hdfs","1.
+  gem.add_runtime_dependency "mobilize-hdfs","1.291"
 end
data/test/hive_job_rows.yml ADDED

@@ -0,0 +1,26 @@
+---
+- name: hive_test_1
+  active: true
+  trigger: once
+  status: ""
+  stage1: hive.write target:"mobilize/hive_test_1", partitions:"act_date", drop:true,
+    source:"Runner_mobilize(test)/hive_test_1.in", schema:"hive_test_1.schema"
+  stage2: hive.run source:"hive_test_1.hql"
+  stage3: hive.run hql:"show databases;"
+  stage4: gsheet.write source:"stage2", target:"hive_test_1_stage_2.out"
+  stage5: gsheet.write source:"stage3", target:"hive_test_1_stage_3.out"
+- name: hive_test_2
+  active: true
+  trigger: after hive_test_1
+  status: ""
+  stage1: hive.write source:"hdfs://user/mobilize/test/test_hdfs_1.out", target:"mobilize.hive_test_2", drop:true
+  stage2: hive.run hql:"select * from mobilize.hive_test_2;"
+  stage3: gsheet.write source:"stage2", target:"hive_test_2.out"
+- name: hive_test_3
+  active: true
+  trigger: after hive_test_2
+  status: ""
+  stage1: hive.run hql:"select act_date as `date`,product,category,value from mobilize.hive_test_1;"
+  stage2: hive.write source:"stage1",target:"mobilize/hive_test_3", partitions:"date/product", drop:true
+  stage3: hive.write hql:"select * from mobilize.hive_test_3;",target:"mobilize/hive_test_3", partitions:"date/product", drop:false
+  stage4: gsheet.write source:"hive://mobilize/hive_test_3", target:"hive_test_3.out"
File without changes
File without changes
File without changes
data/test/mobilize-hive_test.rb ADDED
@@ -0,0 +1,96 @@
+require 'test_helper'
+
+describe "Mobilize" do
+
+  def before
+    puts 'nothing before'
+  end
+
+  # enqueues 4 workers on Resque
+  it "runs integration test" do
+
+    puts "restart workers"
+    Mobilize::Jobtracker.restart_workers!
+
+    gdrive_slot = Mobilize::Gdrive.owner_email
+    puts "create user 'mobilize'"
+    user_name = gdrive_slot.split("@").first
+    u = Mobilize::User.where(:name=>user_name).first
+    r = u.runner
+
+    puts "add test_source data"
+    hive_1_in_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_1.in",gdrive_slot)
+    [hive_1_in_sheet].each {|s| s.delete if s}
+    hive_1_in_sheet = Mobilize::Gsheet.find_or_create_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_1.in",gdrive_slot)
+    hive_1_in_tsv = YAML.load_file("#{Mobilize::Base.root}/test/hive_test_1_in.yml").hash_array_to_tsv
+    hive_1_in_sheet.write(hive_1_in_tsv,Mobilize::Gdrive.owner_name)
+
+    hive_1_schema_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_1.schema",gdrive_slot)
+    [hive_1_schema_sheet].each {|s| s.delete if s}
+    hive_1_schema_sheet = Mobilize::Gsheet.find_or_create_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_1.schema",gdrive_slot)
+    hive_1_schema_tsv = YAML.load_file("#{Mobilize::Base.root}/test/hive_test_1_schema.yml").hash_array_to_tsv
+    hive_1_schema_sheet.write(hive_1_schema_tsv,Mobilize::Gdrive.owner_name)
+
+    hive_1_hql_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_1.hql",gdrive_slot)
+    [hive_1_hql_sheet].each {|s| s.delete if s}
+    hive_1_hql_sheet = Mobilize::Gsheet.find_or_create_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_1.hql",gdrive_slot)
+    hive_1_hql_tsv = File.open("#{Mobilize::Base.root}/test/hive_test_1.hql").read
+    hive_1_hql_sheet.write(hive_1_hql_tsv,Mobilize::Gdrive.owner_name)
+
+    jobs_sheet = r.gsheet(gdrive_slot)
+
+    test_job_rows = ::YAML.load_file("#{Mobilize::Base.root}/test/hive_job_rows.yml")
+    test_job_rows.map{|j| r.jobs(j['name'])}.each{|j| j.delete if j}
+    jobs_sheet.add_or_update_rows(test_job_rows)
+
+    hive_1_stage_2_target_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_1_stage_2.out",gdrive_slot)
+    [hive_1_stage_2_target_sheet].each{|s| s.delete if s}
+    hive_1_stage_3_target_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_1_stage_3.out",gdrive_slot)
+    [hive_1_stage_3_target_sheet].each{|s| s.delete if s}
+    hive_2_target_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_2.out",gdrive_slot)
+    [hive_2_target_sheet].each{|s| s.delete if s}
+    hive_3_target_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_3.out",gdrive_slot)
+    [hive_3_target_sheet].each{|s| s.delete if s}
+
+    puts "job row added, force enqueued requestor, wait for stages"
+    r.enqueue!
+    wait_for_stages(1200)
+
+    puts "jobtracker posted data to test sheet"
+    hive_1_stage_2_target_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_1_stage_2.out",gdrive_slot)
+    hive_1_stage_3_target_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_1_stage_3.out",gdrive_slot)
+    hive_2_target_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_2.out",gdrive_slot)
+    hive_3_target_sheet = Mobilize::Gsheet.find_by_path("#{r.path.split("/")[0..-2].join("/")}/hive_test_3.out",gdrive_slot)
+
+    assert hive_1_stage_2_target_sheet.read(u.name).length == 219
+    assert hive_1_stage_3_target_sheet.read(u.name).length > 3
+    assert hive_2_target_sheet.read(u.name).length == 599
+    assert hive_3_target_sheet.read(u.name).length == 347
+  end
+
+  def wait_for_stages(time_limit=600,stage_limit=120,wait_length=10)
+    time = 0
+    time_since_stage = 0
+    #check for 10 min
+    while time < time_limit and time_since_stage < stage_limit
+      sleep wait_length
+      job_classes = Mobilize::Resque.jobs.map{|j| j['class']}
+      if job_classes.include?("Mobilize::Stage")
+        time_since_stage = 0
+        puts "saw stage at #{time.to_s} seconds"
+      else
+        time_since_stage += wait_length
+        puts "#{time_since_stage.to_s} seconds since stage seen"
+      end
+      time += wait_length
+      puts "total wait time #{time.to_s} seconds"
+    end
+
+    if time >= time_limit
+      raise "Timed out before stage completion"
+    end
+  end
+
+end
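The `wait_for_stages` helper above polls Resque with two cut-offs: a hard total time limit, and an inactivity limit since the last `Mobilize::Stage` job was seen on the queue. A condensed sketch of that polling pattern (the `wait_for` name and block interface are illustrative; sleeping is omitted here so the loop is easy to exercise):

```ruby
# Condensed sketch of wait_for_stages' two-limit polling loop.
# The block reports whether a stage job is still visible on the queue;
# the loop stops when total time exceeds time_limit, or when no stage
# has been seen for stage_limit seconds. sleep is omitted in the sketch.
def wait_for(time_limit: 600, stage_limit: 120, wait_length: 10)
  time = 0
  time_since_stage = 0
  while time < time_limit && time_since_stage < stage_limit
    stage_seen = yield
    time_since_stage = stage_seen ? 0 : time_since_stage + wait_length
    time += time_limit.zero? ? 0 : wait_length
  end
  raise "Timed out before stage completion" if time >= time_limit
  time
end
```

The inactivity limit lets the test return shortly after the queue drains, while the hard limit guards against a stuck stage that keeps re-enqueueing.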
data/test/test_helper.rb CHANGED

metadata CHANGED
@@ -1,32 +1,29 @@
 --- !ruby/object:Gem::Specification
 name: mobilize-hive
 version: !ruby/object:Gem::Version
-  version: '1.
-  prerelease:
+  version: '1.291'
 platform: ruby
 authors:
 - Cassio Paes-Leme
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2013-
+date: 2013-03-27 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: mobilize-hdfs
   requirement: !ruby/object:Gem::Requirement
-    none: false
     requirements:
     - - '='
       - !ruby/object:Gem::Version
-        version: '1.
+        version: '1.291'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
-    none: false
     requirements:
     - - '='
       - !ruby/object:Gem::Version
-        version: '1.
+        version: '1.291'
 description: Adds hive read, write, and run support to mobilize-hdfs
 email:
 - cpaesleme@dena.com
@@ -41,61 +38,45 @@ files:
 - Rakefile
 - lib/mobilize-hive.rb
 - lib/mobilize-hive/handlers/hive.rb
-- lib/mobilize-hive/helpers/hive_helper.rb
 - lib/mobilize-hive/tasks.rb
 - lib/mobilize-hive/version.rb
 - lib/samples/hive.yml
 - mobilize-hive.gemspec
-- test/
-- test/
-- test/
-- test/
-- test/
-- test/fixtures/hive4_stage2.in.yml
-- test/fixtures/integration_expected.yml
-- test/fixtures/integration_jobs.yml
-- test/integration/mobilize-hive_test.rb
+- test/hive_job_rows.yml
+- test/hive_test_1.hql
+- test/hive_test_1_in.yml
+- test/hive_test_1_schema.yml
+- test/mobilize-hive_test.rb
 - test/redis-test.conf
 - test/test_helper.rb
 homepage: http://github.com/dena/mobilize-hive
 licenses: []
+metadata: {}
 post_install_message:
 rdoc_options: []
 require_paths:
 - lib
 required_ruby_version: !ruby/object:Gem::Requirement
-  none: false
   requirements:
-  - -
+  - - '>='
     - !ruby/object:Gem::Version
       version: '0'
-  segments:
-  - 0
-  hash: 837156919845089008
 required_rubygems_version: !ruby/object:Gem::Requirement
-  none: false
   requirements:
-  - -
+  - - '>='
    - !ruby/object:Gem::Version
      version: '0'
-  segments:
-  - 0
-  hash: 837156919845089008
 requirements: []
 rubyforge_project:
-rubygems_version:
+rubygems_version: 2.0.3
 signing_key:
-specification_version:
+specification_version: 4
 summary: Adds hive read, write, and run support to mobilize-hdfs
 test_files:
-- test/
-- test/
-- test/
-- test/
-- test/
-- test/fixtures/hive4_stage2.in.yml
-- test/fixtures/integration_expected.yml
-- test/fixtures/integration_jobs.yml
-- test/integration/mobilize-hive_test.rb
+- test/hive_job_rows.yml
+- test/hive_test_1.hql
+- test/hive_test_1_in.yml
+- test/hive_test_1_schema.yml
+- test/mobilize-hive_test.rb
 - test/redis-test.conf
 - test/test_helper.rb
data/lib/mobilize-hive/helpers/hive_helper.rb DELETED
@@ -1,67 +0,0 @@
-module Mobilize
-  module Hive
-    def self.config
-      Base.config('hive')
-    end
-
-    def self.exec_path(cluster)
-      self.clusters[cluster]['exec_path']
-    end
-
-    def self.output_db(cluster)
-      self.clusters[cluster]['output_db']
-    end
-
-    def self.output_db_user(cluster)
-      output_db_node = Hadoop.gateway_node(cluster)
-      output_db_user = Ssh.host(output_db_node)['user']
-      output_db_user
-    end
-
-    def self.clusters
-      self.config['clusters']
-    end
-
-    def self.slot_ids(cluster)
-      (1..self.clusters[cluster]['max_slots']).to_a.map{|s| "#{cluster}_#{s.to_s}"}
-    end
-
-    def self.prepends
-      self.config['prepends']
-    end
-
-    def self.slot_worker_by_cluster_and_path(cluster,path)
-      working_slots = Mobilize::Resque.jobs.map{|j| begin j['args'][1]['hive_slot'];rescue;nil;end}.compact.uniq
-      self.slot_ids(cluster).each do |slot_id|
-        unless working_slots.include?(slot_id)
-          Mobilize::Resque.set_worker_args_by_path(path,{'hive_slot'=>slot_id})
-          return slot_id
-        end
-      end
-      #return false if none are available
-      return false
-    end
-
-    def self.unslot_worker_by_path(path)
-      begin
-        Mobilize::Resque.set_worker_args_by_path(path,{'hive_slot'=>nil})
-        return true
-      rescue
-        return false
-      end
-    end
-
-    def self.databases(cluster,user_name)
-      self.run(cluster,"show databases",user_name)['stdout'].split("\n")
-    end
-
-    def self.default_params
-      time = Time.now.utc
-      {
-        '$utc_date'=>time.strftime("%Y-%m-%d"),
-        '$utc_time'=>time.strftime("%H:%M"),
-      }
-    end
-  end
-end
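The removed `default_params` hash above supplies `$utc_date` and `$utc_time` tokens, which jobs such as `hive4` reference in their HQL and `params`. A minimal sketch of the token substitution it implies (the `substitute` helper is illustrative, not Mobilize's actual interpolation code):

```ruby
# Illustrative only: the substitution implied by default_params.
# Mobilize's real handler may interpolate params differently.
def default_params(now = Time.now.utc)
  {
    '$utc_date' => now.strftime("%Y-%m-%d"),
    '$utc_time' => now.strftime("%H:%M")
  }
end

def substitute(hql, now = Time.now.utc)
  # String patterns passed to gsub are literal, so '$' needs no escaping
  default_params(now).reduce(hql) { |s, (token, value)| s.gsub(token, value) }
end
```

With this shape, a stage like `hive4`'s stage3 would see `$utc_date $utc_time` replaced by the current UTC date and time at run time.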
data/test/fixtures/hive1.sql DELETED
@@ -1 +0,0 @@
-select act_date,product, sum(value) as sum from mobilize.hive_test_1 group by act_date,product;

data/test/fixtures/hive4_stage1.in DELETED
@@ -1 +0,0 @@
-
data/test/fixtures/integration_expected.yml DELETED
@@ -1,69 +0,0 @@
----
-- path: "Runner_mobilize(test)/jobs"
-  state: working
-  count: 1
-  confirmed_ats: []
-- path: "Runner_mobilize(test)/jobs/hive1/stage1"
-  state: working
-  count: 1
-  confirmed_ats: []
-- path: "Runner_mobilize(test)/jobs/hive1/stage2"
-  state: working
-  count: 1
-  confirmed_ats: []
-- path: "Runner_mobilize(test)/jobs/hive1/stage3"
-  state: working
-  count: 1
-  confirmed_ats: []
-- path: "Runner_mobilize(test)/jobs/hive1/stage4"
-  state: working
-  count: 1
-  confirmed_ats: []
-- path: "Runner_mobilize(test)/jobs/hive1/stage5"
-  state: working
-  count: 1
-  confirmed_ats: []
-- path: "Runner_mobilize(test)/jobs/hive2/stage1"
-  state: working
-  count: 1
-  confirmed_ats: []
-- path: "Runner_mobilize(test)/jobs/hive2/stage2"
-  state: working
-  count: 1
-  confirmed_ats: []
-- path: "Runner_mobilize(test)/jobs/hive2/stage3"
-  state: working
-  count: 1
-  confirmed_ats: []
-- path: "Runner_mobilize(test)/jobs/hive3/stage1"
-  state: working
-  count: 1
-  confirmed_ats: []
-- path: "Runner_mobilize(test)/jobs/hive3/stage2"
-  state: working
-  count: 1
-  confirmed_ats: []
-- path: "Runner_mobilize(test)/jobs/hive3/stage3"
-  state: working
-  count: 1
-  confirmed_ats: []
-- path: "Runner_mobilize(test)/jobs/hive3/stage4"
-  state: working
-  count: 1
-  confirmed_ats: []
-- path: "Runner_mobilize(test)/jobs/hive4/stage1"
-  state: working
-  count: 1
-  confirmed_ats: []
-- path: "Runner_mobilize(test)/jobs/hive4/stage2"
-  state: working
-  count: 1
-  confirmed_ats: []
-- path: "Runner_mobilize(test)/jobs/hive4/stage3"
-  state: working
-  count: 1
-  confirmed_ats: []
-- path: "Runner_mobilize(test)/jobs/hive4/stage4"
-  state: working
-  count: 1
-  confirmed_ats: []
data/test/fixtures/integration_jobs.yml DELETED
@@ -1,34 +0,0 @@
----
-- name: hive1
-  active: true
-  trigger: once
-  status: ""
-  stage1: hive.write target:"mobilize/hive1", partitions:"act_date", drop:true,
-    source:"Runner_mobilize(test)/hive1.in", schema:"hive1.schema"
-  stage2: hive.run source:"hive1.sql"
-  stage3: hive.run hql:"show databases;"
-  stage4: gsheet.write source:"stage2", target:"hive1_stage2.out"
-  stage5: gsheet.write source:"stage3", target:"hive1_stage3.out"
-- name: hive2
-  active: true
-  trigger: after hive1
-  status: ""
-  stage1: hive.write source:"hdfs://user/mobilize/test/hdfs1.out", target:"mobilize.hive2", drop:true
-  stage2: hive.run hql:"select * from mobilize.hive2;"
-  stage3: gsheet.write source:"stage2", target:"hive2.out"
-- name: hive3
-  active: true
-  trigger: after hive2
-  status: ""
-  stage1: hive.run hql:"select '@date' as `date`,product,category,value from mobilize.hive1;", params:{'date':'2013-01-01'}
-  stage2: hive.write source:"stage1",target:"mobilize/hive3", partitions:"date/product", drop:true
-  stage3: hive.write hql:"select * from mobilize.hive3;",target:"mobilize/hive3", partitions:"date/product", drop:false
-  stage4: gsheet.write source:"hive://mobilize/hive3", target:"hive3.out"
-- name: hive4
-  active: true
-  trigger: after hive3
-  status: ""
-  stage1: hive.write source:"hive4_stage1.in", target:"mobilize/hive1", partitions:"act_date"
-  stage2: hive.write source:"hive4_stage2.in", target:"mobilize/hive1", partitions:"act_date"
-  stage3: hive.run hql:"select '@date $utc_time' as `date_time`,product,category,value from mobilize.hive1;", params:{'date':'$utc_date'}
-  stage4: gsheet.write source:stage3, target:"hive4.out"
data/test/integration/mobilize-hive_test.rb DELETED
@@ -1,43 +0,0 @@
-require 'test_helper'
-describe "Mobilize" do
-  # enqueues 4 workers on Resque
-  it "runs integration test" do
-
-    puts "restart workers"
-    Mobilize::Jobtracker.restart_workers!
-
-    u = TestHelper.owner_user
-    r = u.runner
-    user_name = u.name
-    gdrive_slot = u.email
-
-    puts "add test data"
-    ["hive1.in","hive4_stage1.in","hive4_stage2.in","hive1.schema","hive1.sql"].each do |fixture_name|
-      target_url = "gsheet://#{r.title}/#{fixture_name}"
-      TestHelper.write_fixture(fixture_name, target_url, 'replace')
-    end
-
-    puts "add/update jobs"
-    u.jobs.each{|j| j.delete}
-    jobs_fixture_name = "integration_jobs"
-    jobs_target_url = "gsheet://#{r.title}/jobs"
-    TestHelper.write_fixture(jobs_fixture_name, jobs_target_url, 'update')
-
-    puts "job rows added, force enqueue runner, wait for stages"
-    #wait for stages to complete
-    expected_fixture_name = "integration_expected"
-    Mobilize::Jobtracker.stop!
-    r.enqueue!
-    TestHelper.confirm_expected_jobs(expected_fixture_name,2100)
-
-    puts "update job status and activity"
-    r.update_gsheet(gdrive_slot)
-
-    puts "check posted data"
-    assert TestHelper.check_output("gsheet://#{r.title}/hive1_stage2.out", 'min_length' => 219) == true
-    assert TestHelper.check_output("gsheet://#{r.title}/hive1_stage3.out", 'min_length' => 3) == true
-    assert TestHelper.check_output("gsheet://#{r.title}/hive2.out", 'min_length' => 599) == true
-    assert TestHelper.check_output("gsheet://#{r.title}/hive3.out", 'min_length' => 347) == true
-    assert TestHelper.check_output("gsheet://#{r.title}/hive4.out", 'min_length' => 432) == true
-  end
-end