RubyGems - gd_bam - Versions diffs - 0.0.1 - Mend

gd_bam 0.0.1


Files changed (35)

data/README.md +169 -0
data/bin/bam +218 -0
data/lib/bam/version.rb +3 -0
data/lib/bam.rb +8 -0
data/lib/dsl/project_dsl.rb +259 -0
data/lib/graphs/docentize.grf +47 -0
data/lib/graphs/dummy.grf +46 -0
data/lib/graphs/load_history.grf +579 -0
data/lib/graphs/process_account.grf +47 -0
data/lib/graphs/process_activity.grf +222 -0
data/lib/graphs/process_activity_dim.grf +88 -0
data/lib/graphs/process_activity_owner.grf +48 -0
data/lib/graphs/process_opportunity.grf +46 -0
data/lib/graphs/process_opportunity_line_item.grf +179 -0
data/lib/graphs/process_opportunity_snapshot.grf +94 -0
data/lib/graphs/process_owner.grf +48 -0
data/lib/graphs/process_stage.grf +51 -0
data/lib/graphs/process_stage_history.grf +184 -0
data/lib/graphs/process_velocity_duration.grf +140 -0
data/lib/nodes/clover_gen.rb +1283 -0
data/lib/nodes/dependency.rb +96 -0
data/lib/nodes/nodes.rb +371 -0
data/lib/repo/1_config.json +8 -0
data/lib/repository/repo.rb +21 -0
data/lib/runtime.rb +517 -0
data/templates/dataset.json.erb +13 -0
data/templates/flow.rb.erb +12 -0
data/templates/params.json.erb +7 -0
data/templates/project.erb +18 -0
data/templates/source.json.erb +22 -0
data/templates/tap.json.erb +16 -0
data/templates/update_dataset.script.erb +4 -0
data/templates/update_dataset_dry.script.erb +3 -0
data/templates/workspace.prm.erb +25 -0
metadata +412 -0

data/README.md ADDED

@@ -0,0 +1,169 @@
+#BAsh Machinery = BAM
+This thing is fresh from the oven. It is 0.0.1 so there are lots of rough edges. On the other hand there is enough to make you dangerous. Play with it, break it, let me know.
+###What are the goals of BAM
+* Be able to spin up a predefined project in hours not days
+* Everything implemented using CloudConnect so we can still leverage the infrastructure and deploy to secure
+* Make some specific modifications easy (new fields from sources)
+###What are not goals of BAM
+* Supersede Clover/CC
+* Provide templates for development. The generated graphs are not meant to be tampered with.
+* Define a whole processing language. This might be the next extension, but right now the goal is to be able to spin up predefined projects as quickly and easily as possible. I am not defining primitives for joins, reformats etc.
+##Overview
+BAM consists of two layers. The underlying layer lets you build ETLs from prebuilt constructs. The second layer should make it possible to express different configurations in a user-comprehensible way and to configure the first layer for specific projects, so you do not need to deal with low-level details when you decide, for example, to use Amount instead of Total Price in the GoodSales project.
+###1st layer
+There are 4 basic pieces that you will be playing around with. Let's have a look at them.
+1) Tap
+This is a fancy name for a source of data. It can be a downloader from SF or a CSV file. Tap configurations are source specific. Currently only SF is implemented.
+2) Sink
+This is the target of your data. The only sink we have currently is GD.
+3) Graph
+This is a Clover graph. To play well with Ultra, it needs to be created in a specific way. You can use graphs from the library or those that you provide locally (N/A yet).
+4) Flow
+This describes how the data flows. The previous three pieces are the things that you can use in the flow.
+###2nd layer
+TBD
+##Installation
+create a directory `mkdir bam`
+cd into it `cd bam`
+	clone salesforce
+	clone bam
+	create a Gemfile `touch Gemfile`
+and put this inside
+	source "https://rubygems.org"
+	gem "bam", :path => "./bam"
+	gem "salesforce", :path => "./salesforce"
+make sure you are running ruby 1.9.x
+install bundler `gem install bundler`
+	cd bam
+	bundle install
+	cd ..
+	bundle install
+create a project `bundle exec bam scaffold project test`
+this will create a project
+##Sample project
+now you can go inside `cd test` and generate it
+`bundle exec bam generate`
+The sample flow first runs a tap that downloads users from SF (you have to provide credentials in params.json). It then runs the graph called "process_owner" (part of the distribution), which concatenates the first name and last name.
+	GoodData::CloverGenerator::DSL::flow("user") do |f|
+	  tap(:id => "user")
+	  graph("process_owner")
+	  metadata("user") do |m|
+	    m.remove("FirstName")
+	    m.remove("LastName")
+	    m.add(:name => "Name")
+	  end
+	  sink(:id => "user")
+	end
+Now you have to provide the definition of the tap, which you can do like this.
+	{
+	   "source" : "salesforce"
+	  ,"object" : "User"
+	  ,"id"     : "user"
+	  ,"fields" : [
+	    {
+	      "name" : "Id"
+	    },
+	    {
+	      "name" : "FirstName"
+	    },
+	    {
+	      "name" : "LastName"
+	    },
+	    {
+	      "name" : "Region"
+	    },
+	    {
+	      "name" : "Department"
+	    }
+	  ]
+	}
+You also need to provide a definition for the sink, which can look something like this.
+	{
+	   "type"     : "dataset"
+	  ,"id"       : "user"
+	  ,"gd_name"  : "user"
+	  ,"fields"   : [
+	    {
+	      "name" : "Id"
+	    },
+	    {
+	      "name" : "Name"
+	    }
+	  ]
+	}
+For this example to work you need to provide SF and GD credentials. Provide them in params.json. You also need to provide a project with an appropriate model, but this is out of scope of this "example" (I am working on tools that would make it easier).
+Now run `bundle exec bam generate` and a folder with the generated Clover project will be created. Open it in CC, find main.grf and run it. After crunching for a while you should see data in the project.
+### Runtime commands
+Part of the distribution is the bam executable which lets you do several neat things.
+Run `bam` to get the list of commands
+Run `bam help command` to get help about the command
+### deploy directory
+Deploys the directory to the server. You can provide the id of an existing process with the --process parameter to redeploy under it, and a process name with --name.
+### generate
+Generates the ETL. The default target directory is clover_project (currently cannot be changed). You can provide the --only parameter to specify the name of the flow to be processed if you do not need to generate all flows. Currently you can specify only one flow in the --only parameter.
+### generate_downloaders
+If you have incremental downloaders in your project, it is good to deploy them as a separate process. This command generates only the downloaders and is meant for exactly this purpose. If you are interested in why it is a good idea, take a look here (TBD). The target directory is downloaders_project (currently cannot be changed).
+### generate_xmls
+Investigates what is changed and performs the changes in the target project. Uses CL tool behind the scenes. Needs more work
+### model_sync
+Syncs the model with the definition in sinks. {Todo} Add interactive addition. Sometimes the new field can actually be a typo or something like that. Possible to uncover with validate_datasets
+### run
+TBD
+### scaffold
+Takes an argument and creates a scaffold for you. It can scaffold project, flow, sink and tap.
+### taps_generate_docs
+In your project there should be a README.md.erb file. Running this command transforms it into README.md and puts it into the project so it can be committed to git. The interpolated params are
+* taps
+* sinks
+### taps_validate
+Currently works only for SF. Validates that the target SF instance has all the fields in the objects that are specified in the taps definitions.
+### validate_datasets
+Validates the sinks (currently only GD) against the definitions in the project. It looks for fields that are defined inside sinks but are not in the project, missing references etc. More description needed.
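
Note on the sample flow in the README: the `flow` block reads like a small script, and the sketch below illustrates one way such a block can be turned into an ordered list of steps with `instance_eval`. It is a minimal stand-in, not the gem's code: `SketchFlow` and the `:metadata_blocks` key are hypothetical names chosen for illustration, and only the step names (`tap`, `graph`, `metadata`, `sink`) come from the README.

```ruby
# Hypothetical stand-in for the flow DSL; not the gem's actual classes.
class SketchFlow
  attr_reader :name, :steps

  def initialize(name)
    @name = name
    @steps = []
  end

  # Evaluate the block with the flow as `self`, so bare calls such as
  # `graph("process_owner")` resolve to the methods defined below.
  def self.define(name, &script)
    flow = new(name)
    flow.instance_eval(&script)
    flow
  end

  def tap(options = {})
    @steps << { :type => :tap, :id => options[:id] }
  end

  def graph(graph_name)
    @steps << { :type => :graph, :graph => graph_name }
  end

  def sink(options = {})
    @steps << { :type => :sink, :id => options[:id] }
  end

  # Metadata changes attach to the most recently declared step.
  def metadata(name, &block)
    (@steps.last[:metadata_blocks] ||= []) << { :name => name, :block => block }
  end
end

flow = SketchFlow.define("user") do
  tap(:id => "user")
  graph("process_owner")
  metadata("user") { |m| m.delete("FirstName") }
  sink(:id => "user")
end

step_types = flow.steps.map { |s| s[:type] }
puts step_types.join(" -> ")
```

The steps are only recorded here, in declaration order; nothing is executed. A generator can then walk `flow.steps` and decide what to emit for each one.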

data/bin/bam ADDED

@@ -0,0 +1,218 @@
+#!/usr/bin/env ruby
+require 'gli'
+# begin # XXX: Remove this begin/rescue before distributing your app
+require 'bam'
+# rescue LoadError
+#   STDERR.puts "In development, you need to use `bundle exec bin/bam` to run your app"
+#   STDERR.puts "At install-time, RubyGems will make sure lib, etc. are in the load path"
+#   STDERR.puts "Feel free to remove this message from bin/bam now"
+#   exit 64
+# end
+include GLI::App
+program_desc 'Describe your application here'
+version Bam::VERSION
+# desc 'Describe some switch here'
+# switch [:s,:switch]
+#
+desc 'Verbose'
+default_value false
+arg_name 'verbose'
+switch [:v,:verbose]
+desc 'Generates clover project based on information in current directory. The default output is the directory ./clover_project'
+# arg_name 'Describe arguments to new here'
+command :generate do |c|
+  c.desc 'generate only specified flow'
+  c.arg_name 'only'
+  c.flag :only
+  c.action do |global_options,options,args|
+    GoodData::CloverGenerator.clobber_clover_project
+    GoodData::CloverGenerator.run(options)
+  end
+end
+desc 'Generates clover project for downloaders.'
+# arg_name 'Describe arguments to new here'
+command :generate_downloaders do |c|
+  c.desc 's3 backup'
+  c.arg_name 'backup'
+  c.flag :backup
+  c.action do |global_options,options,args|
+    GoodData::CloverGenerator.clobber_downloader_project
+    GoodData::CloverGenerator.generate_downloaders(options)
+  end
+end
+desc 'Validates that the tap has the fields it is claimed it should have. This is supposed to mitigate errors during deploy.'
+# arg_name 'Describe arguments to new here'
+command :taps_validate do |c|
+  c.action do |global_options,options,args|
+    GoodData::CloverGenerator.validate_taps
+  end
+end
+desc 'Generates README.md from README.md.erb, interpolating the taps and sinks definitions.'
+# arg_name 'Describe arguments to new here'
+command :taps_generate_docs do |c|
+  c.action do |global_options,options,args|
+    GoodData::CloverGenerator.taps_generate_docs
+  end
+end
+desc 'Lists processes for the project.'
+# arg_name 'Describe arguments to new here'
+command :procs do |c|
+  c.desc 'procs for all projects'
+  c.arg_name 'all'
+  c.switch :all
+  c.action do |global_options,options,args|
+    out = GoodData::CloverGenerator.procs_list(options)
+    out.each do |proc|
+      puts proc.join(',')
+    end
+  end
+end
+desc 'Validates the sinks (currently only GD) against the definitions in the project.'
+# arg_name 'Describe arguments to new here'
+command :sinks_validate do |c|
+  c.action do |global_options,options,args|
+    GoodData::CloverGenerator.validate_datasets
+  end
+end
+desc 'Generates structures'
+arg_name 'what you want to generate: project, tap, flow, sink'
+command :scaffold do |c|
+  c.action do |global_options,options,args|
+    command = args.first
+    fail "You did not provide what I should scaffold. I can generate project, tap, flow, sink nothing else" unless ["project", "tap", "flow", "sink"].include?(command)
+    case command
+    when "project"
+      puts "project"
+      directory = args[1]
+      fail "Directory has to be provided as an argument. See help" if directory.nil?
+      GoodData::CloverGenerator.setup_bash_structure(directory)
+    when "flow"
+      name = args[1]
+      fail "Name of the flow has to be provided as an argument. See help" if name.nil?
+      GoodData::CloverGenerator.setup_flow(name)
+    when "tap"
+      name = args[1]
+      fail "Name of the tap has to be provided as an argument. See help" if name.nil?
+      GoodData::CloverGenerator.setup_tap(name)
+    when "sink"
+      name = args[1]
+      fail "Name of the sink has to be provided as an argument. See help" if name.nil?
+      GoodData::CloverGenerator.setup_sink(name)
+    end
+  end
+end
+desc 'Runs the project on server'
+command :run do |c|
+  c.action do |global_options,options,args|
+    puts "This would run the project. But it is not yet implemented"
+  end
+end
+desc 'Syncs the model with the definitions in sinks'
+command :model_sync do |c|
+  c.desc 'do not execute'
+  c.arg_name 'dry'
+  c.switch :dry
+  c.action do |global_options,options,args|
+    GoodData::CloverGenerator.model_sync(options)
+  end
+end
+desc 'Deploys the project on server and schedules it'
+command :deploy do |c|
+  c.desc 'existing process id under which it is going to be redeployed'
+  c.arg_name 'process'
+  c.flag :process
+  c.desc 'name of the process'
+  c.arg_name 'name'
+  c.flag :name
+  c.action do |global_options,options,args|
+    dir = args.first
+    fail "You have to specify directory to deploy as an argument" if dir.nil?
+    fail "Specified directory does not exist" unless File.exist?(dir)
+    GoodData::CloverGenerator.connect_to_gd
+    response = GoodData::CloverGenerator.deploy(dir, options)
+  end
+end
+desc 'Runs the project on server'
+command :run do |c|
+  # c.desc 'existing process id under which it is going to be redeployed'
+  # c.arg_name 'process'
+  # c.flag :process
+  c.action do |global_options,options,args|
+    dir = args.first
+    fail "You have to specify directory to deploy as an argument" if dir.nil?
+    fail "Specified directory does not exist" unless File.exist?(dir)
+    verbose = global_options[:v]
+    GoodData::CloverGenerator.connect_to_gd
+    GoodData::CloverGenerator.create_email_channel
+    GoodData::CloverGenerator.deploy(args.first, global_options.merge({:name => "temporary"})) do |deploy_response|
+      puts HighLine::color("Executing", HighLine::BOLD) if verbose
+      GoodData::CloverGenerator.create_email_channel do
+        GoodData::CloverGenerator.execute_process(deploy_response["cloverTransformation"]["links"]["executions"], dir)
+      end
+    end
+  end
+end
+pre do |global,command,options,args|
+  # Pre logic here
+  # Return true to proceed; false to abort and not call the
+  # chosen command
+  # Use skips_pre before a command to skip this block
+  # on that command only
+  true
+end
+post do |global_options,command,options,args|
+  # Post logic here
+  # Use skips_post before a command to skip this
+  # block on that command only
+  verbose = global_options[:v]
+  puts HighLine::color("DONE", :green) if verbose
+end
+on_error do |exception|
+  pp exception.backtrace
+  # Error logic here
+  # return false to skip default error handling
+  true
+end
+exit run(ARGV)
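
Note on the `scaffold` action above: its four `when` branches repeat the same "name is missing" check. The sketch below shows one way that validation could be factored into a single helper. `SCAFFOLD_TARGETS` and `parse_scaffold_args` are hypothetical names introduced here, and the sketch raises `ArgumentError` rather than calling `fail` with a plain string, which is a choice made for the example and not taken from the gem.

```ruby
# Hypothetical helper; the target list matches the one checked in bin/bam.
SCAFFOLD_TARGETS = %w[project tap flow sink].freeze

# Returns [target, name] when both are usable, raises ArgumentError otherwise.
def parse_scaffold_args(args)
  target, name = args
  unless SCAFFOLD_TARGETS.include?(target)
    raise ArgumentError,
          "Cannot scaffold #{target.inspect}; expected one of #{SCAFFOLD_TARGETS.join(', ')}"
  end
  raise ArgumentError, "A name has to be provided for #{target}" if name.nil?
  [target, name]
end

target, name = parse_scaffold_args(%w[project test])
puts "would scaffold #{target} named #{name}"

begin
  parse_scaffold_args(%w[dataset user])
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```

With the arguments validated in one place, the `case` statement only has to pick which setup method to call.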

data/lib/bam/version.rb ADDED

@@ -0,0 +1,3 @@
+module Bam
+  VERSION = '0.0.1'
+end

data/lib/bam.rb ADDED

@@ -0,0 +1,8 @@
+require 'bam/version.rb'
+require 'pry'
+require 'zip/zip'
+require 'fileutils'
+$:.unshift(File.dirname(__FILE__))
+require 'runtime'
+# Add requires for other files you add to your project here, so
+# you just need to require this one file in your bin file

data/lib/dsl/project_dsl.rb ADDED

@@ -0,0 +1,259 @@
+require 'terminal-table'
+module GoodData
+  module CloverGenerator
+    module DSL
+      class RemoveMetadataFieldError < RuntimeError
+        attr_reader :options, :metadata, :field
+        def initialize(message, options={})
+          super(message)
+          @options = options
+          @metadata = options[:metadata]
+          @field = options[:field]
+        end
+      end
+      class Metadata
+        attr_accessor :metadata
+        def name
+          metadata[:name]
+        end
+        def to_hash
+          metadata
+        end
+        def initialize(metadata)
+          @metadata = metadata
+        end
+        def add(options={})
+          fail "You have to specify a name for the metadata change. You specified #{options.inspect}" unless options.has_key?(:name)
+          position = options[:position] || 0
+          what = {
+            :name => options[:name],
+            :type => options[:type] || "string"
+          }
+          @metadata[:fields].insert(position - 1, what)
+          @metadata
+        end
+        def remove(what)
+          fields = metadata[:fields]
+          fail RemoveMetadataFieldError.new("Specified column #{what} was not found", :field => what, :metadata => self) unless fields.detect {|f| f[:name] == what}
+          @metadata[:fields] = fields.find_all {|f| f[:name] != what}
+          @metadata
+        end
+        def change
+          yield(self)
+          self
+        end
+      end
+      class Flow
+        attr_accessor :steps, :name
+        def self.define(name="", &script)
+          puts "Reading flow #{name}"
+          x = self.new
+          x.flow_name(name)
+          x.instance_eval(&script)
+          x
+        end
+        def initialize
+          @steps = []
+        end
+        def flow_name(name)
+          @name = name
+        end
+        def tap(options={}, &bl)
+          step({:type => :tap, :source_name => options[:id]})
+        end
+        def sink(options={}, &bl)
+          step(:type => :upload, :id => options[:id], &bl)
+        end
+        def graph(graph, &bl)
+          step(:graph => graph, :type => :user_provided, &bl)
+        end
+        def parallel(&bl)
+        end
+        def step(options={}, &bl)
+          graph = options[:graph]
+          type = options[:type]
+          steps.push(options)
+          puts "Running step #{graph}"
+        end
+        def metadata(name=nil,options={}, &bl)
+          steps.last[:metadata_block] = [] if steps.last[:metadata_block].nil?
+          steps.last[:metadata_block] << {:name => name, :block => bl, :out_as => options[:out_as]}
+        end
+      end
+      def self.flow(name="", &bl)
+        Flow.define(name, &bl)
+      end
+      class Project
+        attr_accessor :usecases, :name, :dims
+        def self.define(&script)
+          print self
+          x = self.new
+          x.instance_eval(&script)
+          x
+        end
+        def initialize
+          @usecases = []
+        end
+        def project_name(name)
+          @name = name
+        end
+        def use_dims(dims)
+          @dims = dims
+        end
+        def use_usecase(usecase)
+          @usecases << usecase
+        end
+        def get_sources
+          configs = []
+          FileUtils.cd('./taps') do
+            Dir.glob('*.json').each do|f|
+              configs << JSON.parse(File.read(f), :symbolize_names => true)
+            end
+          end
+          configs
+        end
+        def print_sources(taps)
+          puts
+          puts "Printing sources"
+          puts "================"
+          puts
+          taps.each do |tap|
+            fail "Provided tap #{tap[:object]} does not seem to be tap" if tap[:type] != "tap"
+            if tap[:source] == "salesforce"
+              table = Terminal::Table.new(:title => "#{tap[:source]} => #{tap[:object]}", :style => {:width => 30}) do |t|
+                tap[:fields].each do |f|
+                  t << [f[:name], f[:name]]
+                end
+              end
+              puts table
+              puts
+            end
+          end
+        end
+        def get_datasets
+          configs = []
+          FileUtils.cd('./sinks') do
+            Dir.glob('*.json').each do|f|
+              configs << JSON.parse(File.read(f), :symbolize_names => true)
+            end
+          end
+          configs
+        end
+        def compare_fields(sources, datasets)
+          a = sources.reduce([]) do |memo, source|
+            x = source[:object]
+            memo.concat(source[:fields].map {|f| [x, f[:name]]})
+            memo
+          end
+          b = datasets.reduce([]) do |memo, source|
+            x = source[:name]
+            memo.concat(source[:fields].map {|f| [x, f[:name]]})
+            memo
+          end
+          result = (a | b) - (a & b)
+          if result.count > 0
+            puts "------------------"
+            puts "All fields not in"
+            puts "------------------"
+            result.each {|x| pp x}
+            fail "Some fields from source are not used"
+          end
+        end
+        def run(repo)
+          puts "Running"
+          puts "looking for dimension definitions"
+          dims.each do |dim|
+            puts "found #{dim}"
+          end
+          sources = get_sources
+          fail "You have no sources defined" if sources.empty?
+          puts "Found #{sources.count} sources"
+          datasets = get_datasets
+          fail "You have no datasets defined" if datasets.empty?
+          puts "Found #{datasets.count} datasets"
+          puts "Composing the tree"
+          you = GoodData::CloverGenerator::Dependency::N.new({
+            :name => name,
+            :type => "project",
+            :provides => [],
+            :requires => @usecases
+          })
+          provided_dims = @dims.map do |dim_to_provide|
+            GoodData::CloverGenerator::Dependency::N.new({
+              :package => dim_to_provide.split("/").first,
+              :name => dim_to_provide.split("/").last,
+              :provides => [dim_to_provide.split("/").last],
+              :type => "dim",
+              :requires => []
+            })
+          end
+          provided_dims.each {|x| repo << x}
+          # graph = resolve(repo, you)
+          # to_dot(graph)
+          v = GoodData::CloverGenerator::Dependency::Visitor.new
+        end
+      end
+      def self.project(&bl)
+        Project.define(&bl)
+      end
+    end
+  end
+end
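
Note on `compare_fields` above: the expression `(a | b) - (a & b)` is a symmetric difference, so it keeps every `[owner, field]` pair that appears on exactly one side. The sketch below runs the same expression on made-up sample data. The field lists are illustrative only, and the pairs only cancel out when both the owner name and the field name are equal on both sides.

```ruby
# Illustrative [owner, field] pairs; not taken from a real project.
source_fields = [
  ["User", "Id"],
  ["User", "Name"],
  ["User", "Region"]
]
dataset_fields = [
  ["User", "Id"],
  ["User", "Name"],
  ["User", "Email"]
]

# Union minus intersection leaves the pairs present on only one side.
unmatched = (source_fields | dataset_fields) - (source_fields & dataset_fields)

unmatched.each { |owner, field| puts "#{owner}.#{field} has no counterpart" }
```

Here `User.Region` exists only in the source and `User.Email` only in the dataset, so both are reported. The result does not say which side a pair came from; `source_fields - dataset_fields` and `dataset_fields - source_fields` would give the two directions separately.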

data/lib/graphs/docentize.grf ADDED

@@ -0,0 +1,47 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<Graph author="fluke" created="Tue Feb 05 15:38:24 PST 2013" guiVersion="3.3.1" id="1360179808937" licenseType="Commercial" modified="Fri Feb 22 12:18:42 PST 2013" modifiedBy="fluke" name="process_name" revision="1.13" showComponentDetails="true">
+<Global>
+<Metadata fileURL="${PROJECT}/metadata/${FLOW}/${NAME}/1_in.xml" id="Metadata0"/>
+<Metadata fileURL="${PROJECT}/metadata/${FLOW}/${NAME}/1_out.xml" id="Metadata1"/>
+<MetadataGroup id="ComponentGroup0" name="metadata"/>
+<Property fileURL="params.txt" id="GraphParameter0"/>
+<Property fileURL="workspace.prm" id="GraphParameter0"/>
+<Dictionary/>
+</Global>
+<Phase number="0">
+<Node enabled="enabled" fileURL="data/1_in.csv" guiHeight="77" guiName="CSV Reader" guiWidth="128" guiX="124" guiY="169" id="DATA_READER0" quoteCharacter="&quot;" quotedStrings="true" skipRows="1" type="DATA_READER"/>
+<Node enabled="enabled" fileURL="data/out.csv" guiHeight="89" guiName="CSV Writer" guiWidth="128" guiX="609" guiY="169" id="DATA_WRITER0" outputFieldNames="true" quoteCharacter="&quot;" quotedStrings="true" type="DATA_WRITER"/>
+<Node enabled="enabled" guiHeight="65" guiName="Reformat" guiWidth="128" guiX="365" guiY="175" id="REFORMAT0" type="REFORMAT">
+<attr name="transform"><![CDATA[//#CTL2
+// Transforms input record into output record.
+function integer transform() {
+	$out.0.* = $in.0.*;
+	$out.0.Name = "Docent " + $in.0.Name;
+	return OK;
+}
+// Called during component initialization.
+// function boolean init() {}
+// Called during each graph run before the transform is executed. May be used to allocate and initialize resources
+// required by the transform. All resources allocated within this method should be released
+// by the postExecute() method.
+// function void preExecute() {}
+// Called only if transform() throws an exception.
+// function integer transformOnError(string errorMessage, string stackTrace) {}
+// Called during each graph run after the entire transform was executed. Should be used to free any resources
+// allocated within the preExecute() method.
+// function void postExecute() {}
+// Called to return a user-defined error message when an error occurs.
+// function string getMessage() {}
+]]></attr>
+</Node>
+<Edge fromNode="DATA_READER0:0" guiBendpoints="" guiRouter="Manhattan" id="Edge0" inPort="Port 0 (in)" metadata="Metadata0" outPort="Port 0 (output)" toNode="REFORMAT0:0"/>
+<Edge fromNode="REFORMAT0:0" guiBendpoints="" guiRouter="Manhattan" id="Edge1" inPort="Port 0 (in)" metadata="Metadata1" outPort="Port 0 (out)" toNode="DATA_WRITER0:0"/>
+</Phase>
+</Graph>
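
Note on the Reformat node in docentize.grf: the CTL2 `transform()` copies every input field to the output and then overwrites `Name` with `"Docent "` followed by the original value. The Ruby sketch below mirrors that per-record logic on plain hashes so the effect is easy to see outside Clover. The `docentize` method name and the sample rows are assumptions made for illustration; CSV reading and writing are left out.

```ruby
# Mirrors the CTL2 transform: copy all fields, then prefix the Name field.
def docentize(record)
  record.merge("Name" => "Docent " + record["Name"])
end

# Made-up sample rows standing in for the records read from data/1_in.csv.
rows = [
  { "Id" => "1", "Name" => "Ada" },
  { "Id" => "2", "Name" => "Grace" }
]

out_rows = rows.map { |row| docentize(row) }
out_rows.each { |row| puts "#{row['Id']}: #{row['Name']}" }
```

`Hash#merge` returns a new hash, so the input rows stay untouched, much as the graph keeps its input and output records separate.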