gd_bam 0.0.7 → 0.0.8
- data/README.md +92 -63
- data/bin/bam +26 -7
- data/lib/bam/version.rb +1 -1
- data/lib/dsl/project_dsl.rb +11 -11
- data/lib/nodes/clover_gen.rb +2 -4
- data/lib/nodes/dependency.rb +1 -1
- data/lib/runtime.rb +18 -6
- data/templates/join_template.grf.erb +57 -0
- metadata +10 -4
- data/lib/taps/tap.rb +0 -52
data/README.md
CHANGED
@@ -11,29 +11,13 @@ make sure you have ruby (1.9 and 1.8.7 is currently supported) and that you have
 
 Done.
 
-<!-- ##Spin up a new project
-
-Here project refers to the GoodData project.
-
-`bam project`
-
-This spins up a completely new empty project. You can further specify to spin up several predefined templates that bam knows about. Currently it is just goodsales. You do not need to worry about versions etc. Bam does this for you and spins up the latest available.
-
-###Scaffolding
-Scaffolding helps you create the files and provides you initial structure. Project here reffers to BAM project.
-
-create a project `bam scaffold project test`
-
-this will create a project. You can also scaffold a project from a know template. Currently it is again just goodsales. The goal is that after spinning you gooddata and BAM project you are good to run. If you want the defaults there is nothing you need to do besides filling credentials.
-
--->
 ##Sample project -- GoodSales
 
 ###Prerequisites
-You need a working project with API access. You should also have username pass and token ready to fill in to params.json.
+You need a working Salesforce project with API access. You should also have your username, password, and token ready to fill in to params.json.
 
 ### Warnings
-The project is currently cloned out of an existing project. That means that you need to have access to it. If you do not (the project PID is
+The project is currently cloned out of an existing project. That means that you need to have access to it. If you do not (the project PID is nt935rwzls50zfqwy6dh62tabu8h0ocy) ask svarovsky@gooddata.com. Eventually this will be covered by a template so you will not need to do anything special. The template creation is tracked here: https://jira.intgdc.com/browse/GD-34641 .
 
 ###Let's get to it
 We will spin up a goodsales project and load it with data. Prerequisite for this is a functioning Salesforce project that you can grab at force.com.
@@ -41,7 +25,7 @@ We will spin a goodsales project and load it with data. Prerequisite for this is
 `bam scaffold project test --blueprint goodsales`
 
 now you can go inside `cd test`. You will notice several directories and files. We will get to `flows`, `taps` and `sinks` later. Currently focus just on `params.json`.
-If you open it you will see several parameters which you will have to fill. Common one should be predefined and empty. For starters you will need `gd_login` and `
+If you open it you will see several parameters which you will have to fill. Common ones should be predefined and empty. For starters you will need the `gd_login`, `gd_pass`, `sf_password`, `sf_token` and `sf_login` parameters filled in. You can check that the Salesforce connection is working by issuing `bam sf_jack_in`. If it is, you should have a REPL opened up. If not, you should get an error message.
 
 One of the parameters is project_pid. To get that you need a project.
 
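
For orientation, a filled-in params.json could look like the minimal sketch below. The key names are the ones mentioned above (`gd_login`, `gd_pass`, `sf_login`, `sf_password`, `sf_token`, plus `project_pid`); the values and the exact shape of the scaffolded file are illustrative assumptions only.

    {
      "gd_login"    : "john.doe@example.com",
      "gd_pass"     : "secret",
      "sf_login"    : "john.doe@example.com",
      "sf_password" : "secret",
      "sf_token"    : "sf-security-token",
      "project_pid" : ""
    }

Leave `project_pid` empty for now; the next step of the walkthrough shows how to obtain it.
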
@@ -49,7 +33,10 @@ One of the parameters is project_pid. To get that you need a project.
 
 This should spin for a while and eventually should give you a project ID. Fill it in your params.json.
 
-
+
+Now we are going to generate downloaders, and before we do so it is good practice to make sure that you have everything you need. You can issue `bam taps_validate`, which will go to Salesforce, check every field you defined and make sure it is available. If not, it will warn you. We tried to pick fields that will be in your Salesforce, but it is possible that they were deleted or the user does not have access to them.
+
+If everything is ok, go ahead and generate the downloaders.
 
 `bam generate_downloaders`
 
@@ -69,16 +56,19 @@ This works the same as with downloaders but its default target is clover_project
 
 After it is finished log in to gooddata, go into your project and celebrate. You just did a project using BAM.
 
-##
-
+##When Things go wrong
+We tried our best to make this experience a smooth one but sometimes things go bad. Here are some typical problems that can occur.
 
-###
-
+###Field is inaccessible in SF
+In the log there should be something like
 
-
+    Worker task failed: Missing mandatory fields
+
+This means that some of your fields are either not accessible or not in your SF project. Use `bam taps_validate` to identify those and remap them.
+
+##Next steps
+Ok, so by now you hopefully have your project up and running. Before we dive into modifications you have to understand the key concepts that BAM builds on. Once you are comfortable with those we will get back to it.
 
-###Types or not?
-Clover engine is built on Java and it shows. It is statically typed and CTL Clover transformation language resembles Java a lot. While it helps speed and many people claim it prevents errors it also causes more work and helps metadata explosion. Sometimes you need to translate an field into another field becuase you need to do something specific or the component needs it. It is not problem per se but it is important to see the tradeoffs and push the functionality into the components that should work for you and not against you. It is also important to do certain tasks at certain phases. If you do this you found out that certain parts are easier to automate or you can easily reuse work that you did somewhere else.
 
 ##Taps
 Taps are sources of data. Right now you can use just salesforce tap.
@@ -131,7 +121,7 @@ Sometimes it is useful to limit number of grabbed values for example for testing
 }
 
 ####Acts as
-Sometime it is needed to use one field several times in a source of data or you want to "call" certain field differently because the ETL relies on
+Sometimes it is needed to use one field several times in a source of data, or you want to "call" a certain field differently because the ETL relies on a particular name. Both cases are handled using `acts_as`
 
 {
   "source" : "salesforce"
@@ -149,6 +139,8 @@ Sometime it is needed to use one field several times in a source of data or you
 
 Id will be routed to both Id and Name. Custom_Amount__c will be called RenamedAmount.
 
+Caution: This is a double edged sword so be careful. The idea here is that it should make your life easier, not harder. You should map a field to a different one in exactly 2 cases. One is that you want the same field twice. The second is that a (predefined) ETL requires a certain field under a certain name. If you are not careful it is easy to introduce data from 2 columns into a single one.
+
 ####Condition
 You can also specify a condition during download. I recommend using it only if it drastically lowers the amount of data that goes over the wire. Otherwise implement it elsewhere.
 
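
Putting the tap pieces together, a tap that combines `acts_as` with a `condition` and incremental download could look like the sketch below. The overall shape follows the example tap in lib/taps/tap.rb (deleted at the bottom of this diff); the Custom_Amount__c field and the condition value are made up for illustration.

    {
      "source" : "salesforce",
      "object" : "User",
      "id" : "user",
      "incremental" : true,
      "fields" : [
        { "name" : "Id", "acts_as" : ["Id", "Name"] },
        { "name" : "Custom_Amount__c", "acts_as" : ["RenamedAmount"] },
        { "name" : "IsActive" }
      ],
      "condition" : "IsActive = true"
    }
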
@@ -162,15 +154,15 @@ It is wasteful to download everything on and on again. If you specify the increm
 The reason for this is simple. When you download only incrementally you do not stress the wires that much and that means you can run it pretty often. By running it often it means that even if something horrible happens once it will probably run successfully next time. And as we mentioned this is cheap. On the other hand running the main ETL is often very expensive and recovering from failure is usually different, so splitting them simplifies development of each. Since they are independent they can even be developed by different people, which is sometimes useful.
 
 ####Taps validation
-Fail early. There is nothing more frustrating than when the ETL fails during
+Fail early. There is nothing more frustrating than when the ETL fails during execution. When you develop the taps you can ask BAM to connect to SF and validate that the fields are present. This is not bulletproof since some fields can go away at any time, but it gives you a good idea whether you misspelled any fields.
 
 ####Mandatory fields
-Sometimes it is necessary to move fields around in SF. In such case the tap will. If you know this upfront you can tell BAM that this field is not mandatory and it will silently go along filling the missing field with ''
+Sometimes it is necessary to move fields around in SF. In such a case the tap will fail. If you know this upfront you can tell BAM that this field is not mandatory and it will silently go along, filling the missing field with ''. If it is marked as mandatory, which all fields are by default, it will fail if it cannot access the field.
 
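
As a hedged sketch of the non-mandatory case: the README only says you can tell BAM a field is not mandatory, so the exact key below is an assumption for illustration.

    { "name" : "Custom_Field__c", "mandatory" : false }

With something like this, a missing Custom_Field__c would be silently filled with '' instead of failing the tap.
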
 ##Flows
-Flow is an abstraction
+Flow is an abstraction that should connect a tap with a sink, creating a .. well, a flow.
 
-This flow will download users from sf (you have to provide credentials in params.json). It then runs graph called "process user" (this is part of the distribution). This graph concatenates first name and last name together. It then feeds data to the sink.
+Probably better to show you a simple example. This flow will download users from sf (you have to provide credentials in params.json). It then runs a graph called "process user" (this is part of the distribution, but we can treat it as an arbitrary graph). This graph concatenates first name and last name together. It then feeds data to the sink.
 
 GoodData::CloverGenerator::DSL::flow("user") do |f|
   tap(:id => "user")
@@ -185,18 +177,19 @@ This flow will download users from sf (you have to provide credentials in params
   sink(:id => "user")
 end
 
-
+Note a couple of things.
 
-
-
+* The flow is defined using a DSL in Ruby. If you like Ruby, great; if you do not, I recommend http://rubymonk.com/ to get you up to speed. This might change and we might introduce our own DSL.
+* Flow has its own id. The name of the file does not actually matter. Again, something that we are thinking about.
+* With a tap you include a tap into the flow. You can specify the id of a tap with the id param. If you omit it, it will try to include the tap with the same id as the flow.
+* With graph you run a graph named process_owner. When BAM creates the graphs for you there are two places it looks for them. First it looks into your project's `local_graphs`, then it tries to look into the library that comes with BAM. Again, especially the second part is going to change.
+* There might be one or more metadata statements after the graph definition. Each graph might expect numerous inputs, so the order of these `metadata` statements tells you which input goes where. The second purpose is telling what is going to change in those metadata. Here we are saying "*Ok, the user is going in as input number one (there is no number two in this case). At the output the user will have one more field and that is Name. On top of that we are removing two fields, FirstName and LastName*".
+* The last thing we specify is the sink. Again, as with a tap you can specify an id so you tell BAM which sink it should look for. If you do not fill it in, by default it looks for the same id as your flow.
 
-graph('my_graph')
-
-It goes to 2 places (this will change) and tries to find the graph. First place is your `local_graphs` directory in your project the second place is central reporsitory that is currently inside bam library and this part will probably change.
 
 ##Sinks
 
-Sink is a definition of where data goes
+Sink is a definition of where data goes. Currently there is only one sink type and that is a gooddata dataset.
 
 ###GoodData
 
@@ -233,11 +226,9 @@ Sink is a definition of where data goes to. Currently there is only one sink typ
 GoodData sink is currently just mimicking the CL tool definition + some shortcuts on top of that. If you are familiar with the CL tool you should be right at home if I tell you that the only additional thing you have to provide is telling BAM which metadata field is pulled in to a given field.
 
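
To make the CL tool analogy concrete, a GoodData sink might look like the hypothetical sketch below. The field types (connection_point, attribute, fact) are CL tool vocabulary and `meta` stands for the metadata-field-to-dataset-field mapping described above; none of these key names are confirmed by this diff, so treat it purely as an illustration.

    {
      "type" : "dataset",
      "id" : "user",
      "gd_name" : "user",
      "fields" : [
        { "type" : "connection_point", "name" : "Id", "meta" : "Id" },
        { "type" : "attribute", "name" : "Name", "meta" : "Name" }
      ]
    }
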
 
+##Adding a field
+Ok let's say you have a basic GoodSales
 
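
The section above is only a stub, but based on the tap and model_sync descriptions in this README the workflow would plausibly be: add the field to the tap, regenerate, and sync the model. The Industry field below is an illustrative assumption.

    { "name" : "Industry" }

Add a line like this to the `fields` array of the relevant tap, then run `bam generate` and `bam model_sync` so the new field flows through the ETL and into the GoodData model.
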
-<!--For this example to work you need to provide SF and gd credentials. Provide them in params.json. You would need to provide also a project with appropriate project but this is out of scope of this "example" (I am working on tools that would make it easier).
-
-Now run `bam generate` and there will be a folder with the clover project generated. Open it in CC find main.grf and run it. After crunching for a while you should see data in the project.
--->
 
 ##Runtime commands
 Part of the distribution is the bam executable which lets you do several neat things on the commandline
@@ -245,39 +236,77 @@ Part of the distribution is the bam executable which lets you do several neat th
 Run `bam` to get the list of commands
 Run `bam help command` to get help about the command
 
-### deploy directory
-deploys the directory to the server. You can provide the param of the process as a parameter
-
 ### generate
-Generates the ETL. The default target directory is clover_project (currently cannot be changed).
+Generates the ETL. The default target directory is clover_project (currently cannot be changed).
+**--only flow_id** generates only one flow. Useful for debugging
+
+    bam generate --only owner
 
 ### generate_downloaders
-
+Generates downloaders into downloaders_project (currently cannot be changed).
+
+### deploy directory
+deploys the directory to the server.
+
+    bam deploy clover_project
 
-
-
+**--process process_id** You can specify a process ID so you can redeploy to the same process. This just updates the deployed project. All the schedules are still in effect.
+
+    bam deploy clover_project --process 1231jkadjk123k
 
 ### model_sync
-
+This will go through the sinks and update the model. It relies on the CL tool to do this, so the CL tool's limitations apply. It is very useful for adding additional fields, not for changing the model altogether.
 
 ### run
-Runs the project and
-
+Runs the project on the server. This is achieved by deploying it there and deleting it after the run finishes.
+
+    bam run clover-project
+
+**--email someone@example.com** This will create a temporary email channel that hooks the events on success and failure. The channel is torn down once the ETL is done.
 
 ### scaffold
-
+Creates file templates so you do not need to start from scratch.
+
+    bam scaffold project my_new_project
+
+    bam scaffold tap new_tap
+
+    bam scaffold flow new_flow
+
+    bam scaffold dataset new_dataset
+
+To further ease your typical tasks in ETL, BAM comes with a couple of templates with prefilled ETL constructs
+
+    bam scaffold graph_template reformat local_process_my_stuff
+    bam scaffold graph_template join local_process_my_other_stuff
+
 
 ### taps_generate_docs
-In your project there should be a README.md.erb file. By running this command it will be transformed into README.md and put into the project so it can be committed to git.
-taps
-sinks
+In your project there should be a README.md.erb file. By running this command it will be transformed into README.md and put into the project so it can be committed to git. Since it is an erb template there are several expressions that you can use.
 
-
+    <%= taps %> - list of taps
+    <%= sinks %> - list of sinks
+
+You can run arbitrary ruby code inside so you can write something like
+
+    Last generated at <%= Date.today %>
+
+### taps_validate
 Currently works only for SF. Validates that the target SF instance has all the fields in the objects that are specified in the taps definitions.
 
-###
-
+### sinks_validate
+TBD
+
+##The why
+For those that are interested in reading why we actually bothered developing this. Read on.
+
+###Metadata management
+The key pain I had with CloudConnect is that I did not like the management of metadata. Every project I saw was just a pile of metadata definitions that had to be constantly changed and tweaked. This is caused by a couple of choices that the creators of the underlying Clover engine made in the beginning and that probably will not be changed easily. While I am trying to make it better I am still bound by these choices and sometimes the wiring sticks out - sorry for that.
 
-
-
-
+###Incremental metadata
+BAM works with something called incremental metadata. Metadata is not defined in each step; you just say what you want to change. A picture is probably better than a thousand words.
+
+You have a conceptual picture of a simple transformation. You get a Tap that downloads FirstName and LastName from somewhere. Obviously you would like to join them together to form a name. Exactly this happens in the second box, the transformer. You would like to sink the only field and that is Name. So on the next edge what you say is "I am adding Name and removing FirstName and LastName". So far so good. What is elegant about this approach is how it copes with change. Imagine that the tap gets not only FirstName and LastName but also Age. Now what do you need to change? If you did it the old way you would have to change metadata on both edges, the tap, the transformer and the sink. With incremental metadata you need to change the tap and the sink, nothing else. Since I claim that dealing with metadata was the biggest pain, this is a lot of work (and errors) that you just saved.
+
+###Types or not?
+Clover engine is built on Java and it shows. It is statically typed and CTL, the Clover transformation language, resembles Java a lot. While it helps speed, and many people claim it prevents errors, it also causes more work and helps metadata explosion. Sometimes you need to translate a field into another field because you need to do something specific or the component needs it. It is not a problem per se, but it is important to see the tradeoffs and push the functionality into components that work for you and not against you. It is also important to do certain tasks at certain phases. If you do this you will find that certain parts are easier to automate or you can easily reuse work that you did somewhere else.
data/bin/bam
CHANGED
@@ -16,6 +16,11 @@ default_value false
 arg_name 'verbose'
 switch [:v,:verbose]
 
+desc 'Http logger'
+default_value false
+arg_name 'logger'
+switch [:l,:logger]
+
 
 desc 'Generates clover project based on information in current directory. The default ouptut is the directory ./clover_project'
 # arg_name 'Describe arguments to new here'
@@ -58,7 +63,19 @@ desc 'Validates that the tap has the fields it is claimed it should have. This i
 # arg_name 'Describe arguments to new here'
 command :taps_validate do |c|
   c.action do |global_options,options,args|
-
+    verbose = global_options[:v]
+    result = GoodData::CloverGenerator.validate_taps
+
+    error = false
+    result.each_pair do |obj, fields|
+      if fields.empty?
+        puts HighLine::color("GOOD", :green) + " #{obj}" if verbose
+      else
+        puts HighLine::color("BAD", :red) + " #{obj} [" + fields.join(', ') + "]" if verbose
+        error = true
+      end
+    end
+    exit_now!("Errors found",exit_code=1) if error
   end
 end
 
@@ -111,12 +128,13 @@ command :project do |c|
 
     pid = case options[:blueprint]
     when "goodsales"
-      "
+      "nt935rwzls50zfqwy6dh62tabu8h0ocy"
     when nil
       fail "Empty project not supported now"
     end
 
-
+    logger = Logger.new(STDOUT) if global_options[:l]
+    GoodData::CloverGenerator.connect_to_gd(:logger => logger)
     with_users = options[:with_users]
 
     export = {
@@ -257,7 +275,8 @@ command :deploy do |c|
     dir = args.first
     fail "You have to specify directory to deploy as an argument" if dir.nil?
     fail "Specified directory does not exist" unless File.exist?(dir)
-
+    logger = Logger.new(STDOUT) if global_options[:l]
+    GoodData::CloverGenerator.connect_to_gd(:logger => logger)
     options = global_options.merge({:name => "temporary"}).merge(options)
     response = GoodData::CloverGenerator.deploy(dir, options)
   end
@@ -279,15 +298,14 @@ command :run do |c|
     verbose = global_options[:v]
 
     logger = Logger.new(STDOUT) if global_options[:l]
-
-
     GoodData::CloverGenerator.connect_to_gd(:logger => logger)
     options = global_options.merge({:name => "temporary"})
     GoodData::CloverGenerator.deploy(dir, options) do |deploy_response|
       puts HighLine::color("Executing", HighLine::BOLD) if verbose
       GoodData::CloverGenerator.create_email_channel(options) do |channel_response|
         GoodData::CloverGenerator.subscribe_on_finish(:success, channel_response["channelConfiguration"]["meta"]["uri"], deploy_response["process"]["links"]["self"].split('/').last)
-        GoodData::CloverGenerator.execute_process(deploy_response["process"]["links"]["executions"], dir)
+        result = GoodData::CloverGenerator.execute_process(deploy_response["process"]["links"]["executions"], dir)
+
       end
     end
   end
@@ -300,6 +318,7 @@ pre do |global,command,options,args|
   # chosen command
   # Use skips_pre before a command to skip this block
   # on that command only
+
   true
 end
 
data/lib/bam/version.rb
CHANGED
data/lib/dsl/project_dsl.rb
CHANGED
@@ -64,7 +64,7 @@ module GoodData
     attr_accessor :steps, :name
 
     def self.define(name="", &script)
-      puts "Reading flow #{name}"
+      # puts "Reading flow #{name}"
       x = self.new
       x.flow_name(name)
       x.instance_eval(&script)
@@ -100,7 +100,7 @@ module GoodData
       type = options[:type]
 
       steps.push(options)
-      puts "Running step #{graph}"
+      # puts "Running step #{graph}"
     end
 
     def metadata(name=nil,options={}, &bl)
@@ -121,7 +121,7 @@ module GoodData
     attr_accessor :usecases, :name, :dims
 
     def self.define(&script)
-      print self
+      # print self
       x = self.new
       x.instance_eval(&script)
       x
@@ -209,23 +209,23 @@ module GoodData
     end
 
     def run(repo)
-      puts "Running"
+      # puts "Running"
 
-      puts "looking for dimension definitions"
-      dims.each do |dim|
-        puts "found #{dim}"
-      end
+      # puts "looking for dimension definitions"
+      # dims.each do |dim|
+      #   puts "found #{dim}"
+      # end
 
       sources = get_sources
       fail "You have no sources defined" if sources.empty?
-      puts "Found #{sources.count} sources"
+      # puts "Found #{sources.count} sources"
 
 
       datasets = get_datasets
       fail "You have no datasets defined" if datasets.empty?
-      puts "Found #{datasets.count} sources"
+      # puts "Found #{datasets.count} sources"
 
-      puts "Composing the tree"
+      # puts "Composing the tree"
       you = GoodData::CloverGenerator::Dependency::N.new({
         :name => name,
         :type => "project",
data/lib/nodes/clover_gen.rb
CHANGED
@@ -262,13 +262,11 @@ module GoodData
 
     def self.validate_sf_metadata(sf_client, sources)
       sources.reduce({}) do |memo, source|
-        puts "Checking #{source[:object]}"
         sf_object = source[:object]
         u = sf_client.describe(sf_object)
         sf_fields = u[:describeSObjectResponse][:result][:fields].map {|field| field[:name]}
         fields_to_validate = source[:fields].map {|field| field[:name]}
         memo[sf_object] = (fields_to_validate - sf_fields)
-        pp fields_to_validate - sf_fields
         memo
       end
     end
@@ -393,7 +391,7 @@ module GoodData
       if spec[:condition].nil? || spec[:condition].empty?
         spec[:condition] = "SystemModstamp > ${#{spec[:id]}_START} AND SystemModstamp <= ${#{spec[:id]}_END}"
       else
-        spec[:condition] += "AND SystemModstamp > ${#{spec[:id]}_START} AND SystemModstamp <= ${#{spec[:id]}_END}"
+        spec[:condition] += " AND SystemModstamp > ${#{spec[:id]}_START} AND SystemModstamp <= ${#{spec[:id]}_END}"
       end
       generate_select(spec)
     end
@@ -1229,7 +1227,7 @@ HEREDOC
       build_node2(builder, GoodData::CloverGenerator::Nodes.edge2({:toNode => "#{file}_es_reformat:0", :fromNode => "#{file}_copy:1", :metadata => "#{file}_clover_metadata", :id => get_id()}))
 
       if s3_backup then
-        build_node2(builder, GoodData::CloverGenerator::Nodes.writer2({:enabled => "
+        build_node2(builder, GoodData::CloverGenerator::Nodes.writer2({:enabled => "enabled", :name => "#{file} s3 Writer", :id => "#{file}_s3", :fileURL => "https://${S3_ACCESS_KEY_ID}:\`replace(\"${S3_SECRET_ACCESS_KEY}\",\"/\",\"%2F\")\`@${S3_BUCKETNAME}.s3.amazonaws.com/${GDC_PROJECT_ID}/#{file}/#{file}_\`date2long(today())\`", :outputFieldNames => true, :quotedStrings => false}))
       end
       build_node2(builder, GoodData::CloverGenerator::Nodes.edge2({:toNode => "#{file}_csv:0", :fromNode => "#{file}_copy:0", :metadata => "#{file}_clover_metadata", :id => get_id()}))
       if s3_backup then
data/lib/nodes/dependency.rb
CHANGED
data/lib/runtime.rb
CHANGED
@@ -137,8 +137,7 @@ module GoodData
       sources = project.get_sources
       client = get_sf_client(PARAMS)
       sf_sources = sources.find_all {|tap| tap[:source] == "salesforce"}
-
-      pp report
+      GoodData::CloverGenerator::validate_sf_metadata(client, sf_sources)
     end
 
     def self.sf_jack_in
@@ -222,12 +221,14 @@ module GoodData
       sources = project.get_sources
       sf_sources = sources.find_all {|tap| tap[:source] == "salesforce" && tap[:incremental] == true}
       create_incremental_downloader_run_graph(CLOVER_DOWNLOADERS_ROOT + PROJECT_GRAPHS_ROOT + "main.grf", sf_sources)
+      s3_backup = PARAMS[:S3_SECRET_ACCESS_KEY] && PARAMS[:S3_ACCESS_KEY_ID] && PARAMS[:S3_BUCKETNAME]
+
       GoodData::CloverGenerator::create_incremental_downloading_graph(CLOVER_DOWNLOADERS_ROOT + PROJECT_GRAPHS_ROOT + "incremental.grf", sf_sources, {
         :password => PARAMS[:sf_password],
         :token => PARAMS[:sf_token],
         :login => PARAMS[:sf_login],
         :sf_server => PARAMS[:sf_server],
-        :s3_backup =>
+        :s3_backup => s3_backup
       })
     end
 
@@ -239,7 +240,17 @@ module GoodData
           :params => {}
         }
       })
-
+      begin
+        GoodData.poll(result, "executionTask")
+      rescue RestClient::RequestFailed => e
+
+      ensure
+        result = GoodData.get(result["executionTask"]["links"]["detail"])
+        if result["executionDetail"]["status"] == "ERROR"
+          fail "Running process failed. You can look at a log here #{result["executionDetail"]["logFileName"]}"
+        end
+      end
+      result
     end
 
     def self.connect_to_gd(options={})
@@ -384,6 +395,7 @@ module GoodData
       p = build_project
       sources = p.get_sources
       datasets = p.get_datasets
+      s3_backup = PARAMS[:S3_SECRET_ACCESS_KEY] && PARAMS[:S3_ACCESS_KEY_ID] && PARAMS[:S3_BUCKETNAME]
 
       flows = []
       FileUtils::cd FLOWS_ROOT do
@@ -441,7 +453,7 @@ module GoodData
 
         GoodData::CloverGenerator::create_es_downloading_graph(graph_name, [source], {
           :metadata => current_metadata[source_name],
-          :s3_backup =>
+          :s3_backup => s3_backup
         })
       else
         GoodData::CloverGenerator::create_sf_downloading_graph(graph_name, [source], {
@@ -450,7 +462,7 @@ module GoodData
           :login => PARAMS[:sf_login],
           :sf_server => PARAMS[:sf_server],
           :metadata => current_metadata[source_name],
-          :s3_backup =>
+          :s3_backup => s3_backup
         })
       end
 
data/templates/join_template.grf.erb
ADDED
@@ -0,0 +1,57 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<Graph author="fluke" created="Tue Feb 05 15:38:24 PST 2013" guiVersion="3.3.2" id="1360179808937" licenseCode="CLP1DGOODD71636137BY" licenseType="Commercial" modified="Mon May 06 10:12:35 PDT 2013" modifiedBy="gdc-defectivedisplay" name="process_name" revision="1.15" showComponentDetails="true">
+<Global>
+<Metadata fileURL="${PROJECT}/metadata/${FLOW}/${NAME}/1_in.xml" id="Metadata0"/>
+<Metadata fileURL="${PROJECT}/metadata/${FLOW}/${NAME}/1_out.xml" id="Metadata1"/>
+<Metadata fileURL="${PROJECT}/metadata/${FLOW}/${NAME}/2_in.xml" id="Metadata2"/>
+<Metadata fileURL="${PROJECT}/metadata/${FLOW}/${NAME}/2_out.xml" id="Metadata3"/>
+<MetadataGroup id="ComponentGroup0" name="metadata"/>
+<Property fileURL="params.txt" id="GraphParameter14"/>
+<Property fileURL="workspace.prm" id="GraphParameter0"/>
+<Dictionary/>
+</Global>
+<Phase number="0">
+<Node enabled="enabled" fileURL="data/1_in.csv" guiHeight="77" guiName="CSV Reader" guiWidth="128" guiX="124" guiY="169" id="DATA_READER0" quoteCharacter="&quot;" quotedStrings="true" skipRows="1" type="DATA_READER"/>
+<Node enabled="enabled" fileURL="data/2_in.csv" guiHeight="77" guiName="CSV Reader" guiWidth="128" guiX="124" guiY="269" id="DATA_READER1" quoteCharacter="&quot;" quotedStrings="true" skipRows="1" type="DATA_READER"/>
+<Node enabled="enabled" fileURL="data/out.csv" guiHeight="77" guiName="CSV Writer" guiWidth="128" guiX="776" guiY="196" id="DATA_WRITER0" outputFieldNames="true" quoteCharacter="&quot;" quotedStrings="true" type="DATA_WRITER"/>
+<Node enabled="enabled" guiHeight="89" guiName="ExtMergeJoin" guiWidth="128" guiX="570" guiY="199" id="EXT_MERGE_JOIN0" joinKey="$UserRoleId(a)#$Id(a);" type="EXT_MERGE_JOIN">
+<attr name="transform"><![CDATA[//#CTL2
+
+// Transforms input record into output record.
+function integer transform() {
+
+    $out.0.* = $in.0.*;
+    $out.0.Id = nvl2($out.0.Id, $in.1.Name, "");
+
+    return ALL;
+
+}
+
+// Called during component initialization.
+// function boolean init() {}
+
+// Called during each graph run before the transform is executed. May be used to allocate and initialize resources
+// required by the transform. All resources allocated within this method should be released
+// by the postExecute() method.
+// function void preExecute() {}
+
+// Called only if transform() throws an exception.
+// function integer transformOnError(string errorMessage, string stackTrace) {}
+
+// Called during each graph run after the entire transform was executed. Should be used to free any resources
+// allocated within the preExecute() method.
+// function void postExecute() {}
+
+// Called to return a user-defined error message when an error occurs.
+// function string getMessage() {}
+]]></attr>
+</Node>
+<Node enabled="enabled" guiHeight="77" guiName="ExtSort" guiWidth="128" guiX="362" guiY="161" id="EXT_SORT0" sortKey="UserRoleId(a)" type="EXT_SORT"/>
+<Node enabled="enabled" guiHeight="77" guiName="ExtSort" guiWidth="128" guiX="362" guiY="275" id="EXT_SORT1" sortKey="Id(a)" type="EXT_SORT"/>
+<Edge fromNode="DATA_READER0:0" guiBendpoints="" guiRouter="Manhattan" id="Edge0" inPort="Port 0 (in)" metadata="Metadata0" outPort="Port 0 (output)" toNode="EXT_SORT0:0"/>
+<Edge fromNode="DATA_READER1:0" guiBendpoints="" guiRouter="Manhattan" id="Edge2" inPort="Port 0 (in)" metadata="Metadata2" outPort="Port 0 (output)" toNode="EXT_SORT1:0"/>
+<Edge fromNode="EXT_MERGE_JOIN0:0" guiBendpoints="" guiRouter="Manhattan" id="Edge4" inPort="Port 0 (in)" metadata="Metadata1" outPort="Port 0 (out)" toNode="DATA_WRITER0:0"/>
+<Edge fromNode="EXT_SORT0:0" guiBendpoints="" guiRouter="Manhattan" id="Edge1" inPort="Port 0 (driver)" metadata="Metadata0" outPort="Port 0 (out)" toNode="EXT_MERGE_JOIN0:0"/>
+<Edge fromNode="EXT_SORT1:0" guiBendpoints="" guiRouter="Manhattan" id="Edge3" inPort="Port 1 (slave)" metadata="Metadata2" outPort="Port 0 (out)" toNode="EXT_MERGE_JOIN0:1"/>
+</Phase>
+</Graph>
metadata
CHANGED
@@ -1,15 +1,15 @@
 --- !ruby/object:Gem::Specification
 name: gd_bam
 version: !ruby/object:Gem::Version
-  version: 0.0.
+  version: 0.0.8
 prerelease:
 platform: ruby
 authors:
--
+- Tomas Svarovsky
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2013-05-
+date: 2013-05-09 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rake
@@ -365,9 +365,9 @@ files:
 - lib/repo/1_config.json
 - lib/repository/repo.rb
 - lib/runtime.rb
-- lib/taps/tap.rb
 - templates/dataset.json.erb
 - templates/flow.rb.erb
+- templates/join_template.grf.erb
 - templates/params.json.erb
 - templates/project.erb
 - templates/reformat_template.grf.erb
@@ -395,12 +395,18 @@ required_ruby_version: !ruby/object:Gem::Requirement
   - - ! '>='
     - !ruby/object:Gem::Version
       version: '0'
+      segments:
+      - 0
+      hash: 3774242164736806626
 required_rubygems_version: !ruby/object:Gem::Requirement
   none: false
   requirements:
   - - ! '>='
     - !ruby/object:Gem::Version
       version: '0'
+      segments:
+      - 0
+      hash: 3774242164736806626
 requirements: []
 rubyforge_project:
 rubygems_version: 1.8.25
data/lib/taps/tap.rb
DELETED
@@ -1,52 +0,0 @@
-require 'hasb'
-
-class HashValidator::Validator::TapFieldValidtor < HashValidator::Validator::Base
-  def initialize
-    super('tap_field_validator') # The name of the validator
-  end
-
-  def validate(key, value, validations, errors)
-    binding.pry
-    unless value.is_a?(Integer) && value.odd?
-      errors[key] = presence_error_message
-    end
-  end
-end
-
-module GoodData
-  module BAM
-    module Taps
-
-      {
-        "type" : "tap"
-        ,"source" : "salesforce"
-        ,"object" : "Account"
-        ,"id" : "account"
-        ,"incremental" : true
-        ,"fields" : [
-          {
-            "name" : "Id"
-          },
-          {
-            "name" : "Name"
-          },
-          {
-            "name" : "SystemModstamp", "acts_as": ["timestamp"]
-          }
-        ]
-        // ,"limit": "10"
-      }
-
-
-      VALIDATOR = {
-        :type => :string,
-        :source =>
-
-      }
-
-      def self.parse_tap(tap_spec)
-
-      end
-    end
-  end
-end