gd_bam 0.0.7 → 0.0.8

data/README.md CHANGED
@@ -11,29 +11,13 @@ make sure you have ruby (1.9 and 1.8.7 is currently supported) and that you have
 
 Done.
 
- <!-- ##Spin up a new project
-
- Here project refers to the GoodData project.
-
- `bam project`
-
- This spins up a completely new empty project. You can further specify to spin up several predefined templates that bam knows about. Currently it is just goodsales. You do not need to worry about versions etc. Bam does this for you and spins up the latest available.
-
- ###Scaffolding
- Scaffolding helps you create the files and provides you initial structure. Project here reffers to BAM project.
-
- create a project `bam scaffold project test`
-
- this will create a project. You can also scaffold a project from a know template. Currently it is again just goodsales. The goal is that after spinning you gooddata and BAM project you are good to run. If you want the defaults there is nothing you need to do besides filling credentials.
-
- -->
 ##Sample project -- GoodSales
 
 ###Prerequisites
- You need a working project with API access. You should also have username pass and token ready to fill in to params.json.
+ You need a working Salesforce project with API access. You should also have your username, password and token ready to fill into params.json.
 
 ### Warnings
- The project is currently cloned out of an existing project. That means that you need to have access to it. If you do not (the project PID is i49w4c73c2mh75iiehte3fv3fbos8h2k) ask svarovsky@gooddata.com. Eventually this will be covered by a template so you will not need to do anything special. The template creation is tracked here https://jira.intgdc.com/browse/GD-34641 .
+ The project is currently cloned out of an existing project. That means that you need to have access to it. If you do not (the project PID is nt935rwzls50zfqwy6dh62tabu8h0ocy), ask svarovsky@gooddata.com. Eventually this will be covered by a template so you will not need to do anything special. The template creation is tracked here: https://jira.intgdc.com/browse/GD-34641 .
 
 ###Let's get to it
 We will spin up a goodsales project and load it with data. The prerequisite for this is a functioning Salesforce project that you can grab at force.com.
@@ -41,7 +25,7 @@ We will spin a goodsales project and load it with data. Prerequisite for this is
 `bam scaffold project test --blueprint goodsales`
 
 now you can go inside `cd test`. You will notice several directories and files. We will get to `flows`, `taps` and `sinks` later. Currently focus just on `params.json`.
- If you open it you will see several parameters which you will have to fill. Common one should be predefined and empty. For starters you will need `gd_login` and `gd_pass` parameters filled in.
+ If you open it you will see several parameters which you will have to fill in. Common ones should be predefined and empty. For starters you will need the `gd_login`, `gd_pass`, `sf_password`, `sf_token` and `sf_login` parameters filled in. You can check that the Salesforce connection is working by issuing `bam sf_jack_in`. If it is, you should get a REPL opened up; if not, you should get an error message.
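+
+ For illustration, a filled-in params.json might look something like this (a sketch only; the exact set of predefined keys in your scaffold may differ):
+
+     {
+       "gd_login" : "john.doe@example.com",
+       "gd_pass" : "secret",
+       "sf_login" : "john.doe@example.com",
+       "sf_password" : "secret",
+       "sf_token" : "sf-security-token",
+       "project_pid" : "fill-in-below"
+     }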
 
 One of the parameters is project_pid. To get that you need a project.
 
@@ -49,7 +33,10 @@
 
 This should spin for a while and eventually should give you a project ID. Fill it into your params.json.
 
- Now go ahead and generate the downloaders.
+
+ Now we are going to generate downloaders, and before we do so it is good practice to make sure you have everything you need. You can issue `bam taps_validate`, which will go to Salesforce, check every field you defined and make sure it is available. If not, it will warn you. We tried to pick fields that should be in your Salesforce, but it is possible that they were deleted or that your user does not have access to them.
+
+ If everything is OK, go ahead and generate the downloaders.
 
 `bam generate_downloaders`
 
@@ -69,16 +56,19 @@ This works the same as with downloaders but its default target is clover_project
 
 After it is finished, log in to GoodData, go into your project and celebrate. You just did a project using BAM.
 
- ##Painful metadata management
- Key pain that I had with CloudConnect is that I hated the management of metadata. Every project I saw was just pile of metadata definition that has to be constantly changed and tweaked. This is caused by couple of chioces that creators of underlying Clover engine made in the beginning and probably will not be changed easily. While I am trying to make it better I am still bound by these choices and sometimes the wiring stick out - sorry for that.
+ ##When things go wrong
+ We tried our best to make this experience a smooth one, but sometimes things go bad. Here are some typical problems that can occur.
 
- ###Incremental metadata
- Bam is working with something that is called Incremental metadata. Metadata is not defined in each step you just say what you want to change. Picture is probably better than thousand words.
+ ###Field is inaccessible in SF
+ In the log there should be something like
 
- You have a conceptual picture of a simple transformation. You get a Tap that downloads FirstName and LastName somewhere. Obviously you would like to join them together to form a name. Exactly this happens in the second box the transformer. You would like to sink the only field and that is name. So on the next edge what you say is "I am adding Name and removing FirstName and LastName". So far so good. What is elegant about this approach is that how it copes with change. Imagine that the tap gets not only FirstName and LastName but also Age. Now what you need to change? If you would do it the old way You would have to change metadata on both edges, tap transformer and sink. With incremental metadata you need to change tap and sink nothing else. Since I claim that dealing with metadata was the biggest pain this is a lot of work (and errors) that you just saved.
+ Worker task failed: Missing mandatory fields
+
+ This means that some of your fields are either not accessible or not in your SF project. Use `bam taps_validate` to identify those and remap them.
+
+ ##Next steps
+ Ok, so by now you hopefully have your project up and running. Before we dive into modifications, you have to understand the key concepts that BAM builds on. Once you are comfortable with those, we will get back to making changes.
 
- ###Types or not?
- Clover engine is built on Java and it shows. It is statically typed and CTL Clover transformation language resembles Java a lot. While it helps speed and many people claim it prevents errors it also causes more work and helps metadata explosion. Sometimes you need to translate an field into another field becuase you need to do something specific or the component needs it. It is not problem per se but it is important to see the tradeoffs and push the functionality into the components that should work for you and not against you. It is also important to do certain tasks at certain phases. If you do this you found out that certain parts are easier to automate or you can easily reuse work that you did somewhere else.
 
 ##Taps
 Taps are sources of data. Right now you can use just the salesforce tap.
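+
+ To give you a taste, a minimal salesforce tap looks something like this (a sketch; the object and field names are illustrative):
+
+     {
+       "type" : "tap",
+       "source" : "salesforce",
+       "object" : "Account",
+       "id" : "account",
+       "fields" : [
+         {"name" : "Id"},
+         {"name" : "Name"}
+       ]
+     }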
@@ -131,7 +121,7 @@ Sometimes it is useful to limit number of grabbed values for example for testing
 }
 
 ####Acts as
- Sometime it is needed to use one field several times in a source of data or you want to "call" certain field differently because the ETL relies on partiular name. Both cases are handled using `acts_as`
+ Sometimes you need to use one field several times in a source of data, or you want to "call" a certain field differently because the ETL relies on a particular name. Both cases are handled using `acts_as`.
 
 {
 "source" : "salesforce"
@@ -149,6 +139,8 @@ Sometime it is needed to use one field several times in a source of data or you
 
 Id will be routed to both Id and Name. Custom_Amount__c will be called RenamedAmount.
 
+ Caution: This is a double-edged sword, so be careful. The idea is that it should make your life easier, not harder. You should map a field to a different one in exactly two cases: you want the same field twice, or a (predefined) ETL requires a certain field under a certain name. If you are not careful, it is easy to route data from two columns into a single one.
+
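+ To illustrate the trap: in the following sketch both fields act as Name, so data from two columns would end up routed into a single one:
+
+     {"name" : "FirstName", "acts_as" : ["Name"]},
+     {"name" : "LastName", "acts_as" : ["Name"]}
+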
 ####Condition
 You can also specify a condition during download. I recommend using it only if it drastically lowers the amount of data that goes over the wire. Otherwise implement it elsewhere.
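+
+ As a sketch (the field and value are illustrative), the condition is a SOQL WHERE fragment inside the tap definition:
+
+     {
+       "source" : "salesforce",
+       "object" : "Opportunity",
+       "id" : "opportunity",
+       "condition" : "StageName = 'Closed Won'",
+       "fields" : [ ... ]
+     }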
 
@@ -162,15 +154,15 @@ It is wasteful to download everything on and on again. If you specify the increm
 The reason for this is simple. When you download only incrementally you do not stress the wires that much, and that means you can run the download pretty often. By running it often, even if something horrible happens once, it will probably run successfully next time. And as we mentioned, this is cheap. On the other hand, running the main ETL is often very expensive, and recovering from failure is usually different, so splitting them simplifies development of each. Since they are independent they can even be developed by different people, which is sometimes useful.
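+
+ Switching a tap to incremental download is a single flag in its definition (sketch):
+
+     { "source" : "salesforce", "object" : "Account", "id" : "account", "incremental" : true, "fields" : [ ... ] }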
 
 ####Taps validation
- Fail early. There is nothing more frustrating than when the ETL fails during exwcution. When you develop the taps you can ask BAM to connect to SF and validate that the fields are present. This is not bulletproof since some fields can go away at any time but it gives you good idea if you did not misspelled any fields.
+ Fail early. There is nothing more frustrating than when the ETL fails during execution. When you develop the taps you can ask BAM to connect to SF and validate that the fields are present. This is not bulletproof, since some fields can go away at any time, but it gives you a good idea whether you misspelled any fields.
 
 ####Mandatory fields
- Sometimes it is necessary to move fields around in SF. In such case the tap will. If you know this upfront you can tell BAM that this field is not mandatory and it will silently go along filling the missing field with ''
+ Sometimes it is necessary to move fields around in SF, and a field the tap expects may no longer be accessible. If you know this upfront you can tell BAM that the field is not mandatory, and it will silently go along, filling the missing field with ''. If the field is marked as mandatory, which all fields are by default, the tap will fail when it cannot access it.
 
 ##Flows
- Flow is an abstraction
+ A flow is an abstraction that should connect a tap with a sink, creating a... well, a flow.
 
- This flow will download users from sf (you have to provide credentials in params.json). It then runs graph called "process user" (this is part of the distribution). This graph concatenates first name and last name together. It then feeds data to the sink.
+ It is probably better to show you a simple example. This flow will download users from sf (you have to provide credentials in params.json). It then runs a graph called "process user" (this is part of the distribution, but we can pretend it is an arbitrary graph). This graph concatenates first name and last name together. It then feeds data to the sink.
 
 GoodData::CloverGenerator::DSL::flow("user") do |f|
 tap(:id => "user")
@@ -185,18 +177,19 @@ This flow will download users from sf (you have to provide credentials in params
 sink(:id => "user")
 end
 
- Now you have to provide it the definition of tap which you can do like this.
+ Note a couple of things.
 
- ###When I call external graph? How does it work?
- In the flow you can call external graph by using
+ * The flow is defined using a DSL in Ruby. If you like Ruby, great; if you do not, I recommend http://rubymonk.com/ to get you up to speed. This might change and we might introduce our own DSL.
+ * A flow has its own id. The name of the file does not actually matter. Again, something that we are thinking about.
+ * with tap you include a tap in the flow. You can specify the id of a tap with the id param. If you omit it, BAM will try to include the tap with the same id as the flow.
+ * with graph you run a graph of a given name, here process_owner. When BAM creates the graphs for you, there are two places it looks for them. First it looks into `local_graphs` in your project, then into the library that comes with BAM. Again, especially the second part is going to change.
+ * there might be one or more `metadata` statements after a graph definition. Each graph might expect numerous inputs, so the order of these `metadata` statements tells you which input goes where. Their second purpose is saying what is going to change in those metadata. Here we are saying "*Ok, the user is going in as input number one (there is no number two in this case). At the output the user will have one more field, Name. On top of that we are removing two fields, FirstName and LastName*".
+ * The last thing we specify is the sink. Again, as with tap, you can specify an id to tell BAM which sink it should look for. If you do not fill it in, by default it looks for the same id as your flow.
 
- graph('my_graph')
-
- It goes to 2 places (this will change) and tries to find the graph. First place is your `local_graphs` directory in your project the second place is central reporsitory that is currently inside bam library and this part will probably change.
 
 ##Sinks
 
- Sink is a definition of where data goes to. Currently there is only one sink type and that is gooddata dataset.
+ A sink is a definition of where data goes. Currently there is only one sink type and that is the gooddata dataset.
 
 ###GoodData
 
@@ -233,11 +226,9 @@ Sink is a definition of where data goes to. Currently there is only one sink typ
 The GoodData sink currently just mimics the CL tool definition + some shortcuts on top of that. If you are familiar with the CL tool, you should be right at home if I tell you that the only additional thing you have to provide is telling BAM which metadata field is pulled into a given field.
 
 
+ ##Adding a field
+ Ok, let's say you have a basic GoodSales project.
 
- <!--For this example to work you need to provide SF and gd credentials. Provide them in params.json. You would need to provide also a project with appropriate project but this is out of scope of this "example" (I am working on tools that would make it easier).
-
- Now run `bam generate` and there will be a folder with the clover project generated. Open it in CC find main.grf and run it. After crunching for a while you should see data in the project.
- -->
 
 ##Runtime commands
 Part of the distribution is the bam executable which lets you do several neat things on the command line.
@@ -245,39 +236,77 @@ Part of the distribution is the bam executable which lets you do several neat th
 Run `bam` to get the list of commands
 Run `bam help command` to get help about the command
 
- ### deploy directory
- deploys the directory to the server. You can provide the param of the process as a parameter
-
 ### generate
- Generates the ETL. The default target directory is clover_project (currently cannot be changed). You can provide --only parameter to specify the name of the flow to be processed if you do not need to generate all flows. Currently you can specify only one flow
+ Generates the ETL. The default target directory is clover_project (currently cannot be changed).
+ **--only flow_id** generates only one flow. Useful for debugging.
+
+ bam generate --only owner
 
 ### generate_downloaders
- If you have incremental downloaders in your project it good to deploy them as a separate process. This generates only the downloaders and is meant for exacltly this purpose. If you are interested about why it is a good idea. Take a look here (TBD). The target directory is downloaders_project (currently cannot be changed).
+ Generates downloaders into downloaders_project (currently cannot be changed).
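+
+ bam generate_downloaders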
+
+ ### deploy directory
+ Deploys the directory to the server.
+
+ bam deploy clover_project
 
- ### generate_xmls
- Investigates what is changed and performs the changes in the target project. Uses CL tool behind the scenes. Needs more work
+ **--process process_id** You can specify a process ID so you can redeploy to the same process. This just updates the deployed project. All the schedules are still in effect.
+
+ bam deploy clover_project --process 1231jkadjk123k
 
 ### model_sync
- Syncs the model with the definition in sinks. Sometimes the new field can actually be a typo or something like that. Possible to uncover with validate_datasets
+ This will go through the sinks and update the model. It relies on the CL tool to do this, which also describes its limitations. It is very useful for adding additional fields, not for changing the model altogether.
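+
+ bam model_sync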
 
 ### run
- Runs the project and
- `bam run clover-project --email me@gooddata.com`
+ Runs the project on the server. This is achieved by deploying it there and deleting it after the run finishes.
+
+ bam run clover-project
+
+ **--email someone@example.com** This will create a temporary email channel that hooks up events on success and failure. The channel is torn down once the ETL is done.
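+
+ bam run clover-project --email me@gooddata.com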
 
 ### scaffold
- Takes an argument and creates a scaffold for you. It can scaffold project, flow, sink and tap.
+ Creates file templates so you do not need to start from scratch.
+
+ bam scaffold project my_new_project
+
+ bam scaffold tap new_tap
+
+ bam scaffold flow new_flow
+
+ bam scaffold dataset new_dataset
+
+ To further ease typical ETL tasks, BAM comes with a couple of templates with prefilled ETL constructs:
+
+ bam scaffold graph_template reformat local_process_my_stuff
+ bam scaffold graph_template join local_process_my_other_stuff
+
 
 ### taps_generate_docs
- In your project there should be a README.md.erb file. By running this command it will be transformed into README.md and put into the project so it can be committed to git. The interpolated params are
- taps
- sinks
+ In your project there should be a README.md.erb file. By running this command it will be transformed into README.md and put into the project so it can be committed to git. Since it is an erb template, there are several expressions that you can use.
 
- ### sinks_validate
+ <%= taps %> - list of taps
+ <%= sinks %> - list of sinks
+
+ You can run arbitrary Ruby code inside, so you can write something like
+
+ Last generated at <%= Date.today %>
+
+ ### taps_validate
 Currently works only for SF. Validates that the target SF instance has all the fields in the objects that are specified in the taps definitions.
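+
+ bam taps_validate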
 
- ### validate_datasets
- Vallidates the sinks (currently only GD) with the definitions in the proeject. It looks for fields that are defined inside sinks and are not in the projects missing references etc. More description needed.
+ ### sinks_validate
+ TBD
+
+ ##The why
+ For those who are interested in why we actually bothered developing this, read on.
+
+ ###Metadata management
+ The key pain I had with CloudConnect is that I did not like the management of metadata. Every project I saw was just a pile of metadata definitions that had to be constantly changed and tweaked. This is caused by a couple of choices that the creators of the underlying Clover engine made in the beginning and that probably will not be changed easily. While I am trying to make it better, I am still bound by these choices and sometimes the wiring sticks out - sorry for that.
 
- ##Roadmap
- * Allow different storage then ES (Vertica)
- * Contract checkers
+ ###Incremental metadata
+ BAM works with something called incremental metadata. Metadata is not defined in each step; you just say what you want to change. A picture is probably better than a thousand words.
+
+ You have a conceptual picture of a simple transformation. You get a tap that downloads FirstName and LastName from somewhere. Obviously you would like to join them together to form a name. Exactly this happens in the second box, the transformer. You would like to sink the only field you need, and that is Name. So on the next edge what you say is "I am adding Name and removing FirstName and LastName". So far so good. What is elegant about this approach is how it copes with change. Imagine that the tap gets not only FirstName and LastName but also Age. Now what do you need to change? If you did it the old way, you would have to change metadata on both edges, tap, transformer and sink. With incremental metadata you need to change the tap and the sink, nothing else. Since I claim that dealing with metadata was the biggest pain, this is a lot of work (and errors) that you just saved.
+
+ ###Types or not?
+ The Clover engine is built on Java and it shows. It is statically typed, and CTL, the Clover transformation language, resembles Java a lot. While it helps speed, and many people claim it prevents errors, it also causes more work and helps metadata explosion. Sometimes you need to translate a field into another field because you need to do something specific or the component needs it. It is not a problem per se, but it is important to see the tradeoffs and push the functionality into the components so that they work for you and not against you. It is also important to do certain tasks at certain phases. If you do this, you will find that certain parts are easier to automate or that you can easily reuse work that you did somewhere else.
data/bin/bam CHANGED
@@ -16,6 +16,11 @@ default_value false
 arg_name 'verbose'
 switch [:v,:verbose]
 
+ desc 'Http logger'
+ default_value false
+ arg_name 'logger'
+ switch [:l,:logger]
+
 
 desc 'Generates clover project based on information in current directory. The default output is the directory ./clover_project'
 # arg_name 'Describe arguments to new here'
@@ -58,7 +63,19 @@ desc 'Validates that the tap has the fields it is claimed it should have. This i
 # arg_name 'Describe arguments to new here'
 command :taps_validate do |c|
 c.action do |global_options,options,args|
- GoodData::CloverGenerator.validate_taps
+ verbose = global_options[:v]
+ result = GoodData::CloverGenerator.validate_taps
+
+ error = false
+ result.each_pair do |obj, fields|
+   if fields.empty?
+     puts HighLine::color("GOOD", :green) + " #{obj}" if verbose
+   else
+     puts HighLine::color("BAD", :red) + " #{obj} [" + fields.join(', ') + "]" if verbose
+     error = true
+   end
+ end
+ exit_now!("Errors found",exit_code=1) if error
 end
 end
 
@@ -111,12 +128,13 @@ command :project do |c|
 
 pid = case options[:blueprint]
 when "goodsales"
- "i49w4c73c2mh75iiehte3fv3fbos8h2k"
+ "nt935rwzls50zfqwy6dh62tabu8h0ocy"
 when nil
 fail "Empty project not supported now"
 end
 
- GoodData::CloverGenerator.connect_to_gd()
+ logger = Logger.new(STDOUT) if global_options[:l]
+ GoodData::CloverGenerator.connect_to_gd(:logger => logger)
 with_users = options[:with_users]
 
 export = {
@@ -257,7 +275,8 @@ command :deploy do |c|
 dir = args.first
 fail "You have to specify directory to deploy as an argument" if dir.nil?
 fail "Specified directory does not exist" unless File.exist?(dir)
- GoodData::CloverGenerator.connect_to_gd
+ logger = Logger.new(STDOUT) if global_options[:l]
+ GoodData::CloverGenerator.connect_to_gd(:logger => logger)
 options = global_options.merge({:name => "temporary"}).merge(options)
 response = GoodData::CloverGenerator.deploy(dir, options)
 end
@@ -279,15 +298,14 @@ command :run do |c|
 verbose = global_options[:v]
 
 logger = Logger.new(STDOUT) if global_options[:l]
-
-
 GoodData::CloverGenerator.connect_to_gd(:logger => logger)
 options = global_options.merge({:name => "temporary"})
 GoodData::CloverGenerator.deploy(dir, options) do |deploy_response|
 puts HighLine::color("Executing", HighLine::BOLD) if verbose
 GoodData::CloverGenerator.create_email_channel(options) do |channel_response|
 GoodData::CloverGenerator.subscribe_on_finish(:success, channel_response["channelConfiguration"]["meta"]["uri"], deploy_response["process"]["links"]["self"].split('/').last)
- GoodData::CloverGenerator.execute_process(deploy_response["process"]["links"]["executions"], dir)
+ result = GoodData::CloverGenerator.execute_process(deploy_response["process"]["links"]["executions"], dir)
+
 end
 end
 end
@@ -300,6 +318,7 @@ pre do |global,command,options,args|
 # chosen command
 # Use skips_pre before a command to skip this block
 # on that command only
+
 true
 end
 
data/lib/bam/version.rb CHANGED
@@ -1,3 +1,3 @@
 module Bam
- VERSION = '0.0.7'
+ VERSION = '0.0.8'
 end
@@ -64,7 +64,7 @@ module GoodData
 attr_accessor :steps, :name
 
 def self.define(name="", &script)
- puts "Reading flow #{name}"
+ # puts "Reading flow #{name}"
 x = self.new
 x.flow_name(name)
 x.instance_eval(&script)
@@ -100,7 +100,7 @@ module GoodData
 type = options[:type]
 
 steps.push(options)
- puts "Running step #{graph}"
+ # puts "Running step #{graph}"
 end
 
 def metadata(name=nil,options={}, &bl)
@@ -121,7 +121,7 @@ module GoodData
 attr_accessor :usecases, :name, :dims
 
 def self.define(&script)
- print self
+ # print self
 x = self.new
 x.instance_eval(&script)
 x
@@ -209,23 +209,23 @@ module GoodData
 end
 
 def run(repo)
- puts "Running"
+ # puts "Running"
 
- puts "looking for dimension definitions"
- dims.each do |dim|
- puts "found #{dim}"
- end
+ # puts "looking for dimension definitions"
+ # dims.each do |dim|
+ #   puts "found #{dim}"
+ # end
 
 sources = get_sources
 fail "You have no sources defined" if sources.empty?
- puts "Found #{sources.count} sources"
+ # puts "Found #{sources.count} sources"
 
 
 datasets = get_datasets
 fail "You have no datasets defined" if datasets.empty?
- puts "Found #{datasets.count} sources"
+ # puts "Found #{datasets.count} sources"
 
- puts "Composing the tree"
+ # puts "Composing the tree"
 you = GoodData::CloverGenerator::Dependency::N.new({
 :name => name,
 :type => "project",
@@ -262,13 +262,11 @@ module GoodData
 
 def self.validate_sf_metadata(sf_client, sources)
 sources.reduce({}) do |memo, source|
- puts "Checking #{source[:object]}"
 sf_object = source[:object]
 u = sf_client.describe(sf_object)
 sf_fields = u[:describeSObjectResponse][:result][:fields].map {|field| field[:name]}
 fields_to_validate = source[:fields].map {|field| field[:name]}
 memo[sf_object] = (fields_to_validate - sf_fields)
- pp fields_to_validate - sf_fields
 memo
 end
 end
@@ -393,7 +391,7 @@ module GoodData
 if spec[:condition].nil? || spec[:condition].empty?
 spec[:condition] = "SystemModstamp > ${#{spec[:id]}_START} AND SystemModstamp <= ${#{spec[:id]}_END}"
 else
- spec[:condition] += "AND SystemModstamp > ${#{spec[:id]}_START} AND SystemModstamp <= ${#{spec[:id]}_END}"
+ spec[:condition] += " AND SystemModstamp > ${#{spec[:id]}_START} AND SystemModstamp <= ${#{spec[:id]}_END}"
 end
 generate_select(spec)
 end
@@ -1229,7 +1227,7 @@ HEREDOC
 build_node2(builder, GoodData::CloverGenerator::Nodes.edge2({:toNode => "#{file}_es_reformat:0", :fromNode => "#{file}_copy:1", :metadata => "#{file}_clover_metadata", :id => get_id()}))
 
 if s3_backup then
- build_node2(builder, GoodData::CloverGenerator::Nodes.writer2({:enabled => "disabled", :name => "#{file} s3 Writer", :id => "#{file}_s3", :fileURL => "https://${S3_ACCESS_KEY_ID}:\`replace(\"${S3_SECRET_ACCESS_KEY}\",\"/\",\"%2F\")\`@${S3_BUCKETNAME}.s3.amazonaws.com/${GDC_PROJECT_ID}/#{file}", :outputFieldNames => true, :quotedStrings => false}))
+ build_node2(builder, GoodData::CloverGenerator::Nodes.writer2({:enabled => "enabled", :name => "#{file} s3 Writer", :id => "#{file}_s3", :fileURL => "https://${S3_ACCESS_KEY_ID}:\`replace(\"${S3_SECRET_ACCESS_KEY}\",\"/\",\"%2F\")\`@${S3_BUCKETNAME}.s3.amazonaws.com/${GDC_PROJECT_ID}/#{file}/#{file}_\`date2long(today())\`", :outputFieldNames => true, :quotedStrings => false}))
 end
 build_node2(builder, GoodData::CloverGenerator::Nodes.edge2({:toNode => "#{file}_csv:0", :fromNode => "#{file}_copy:0", :metadata => "#{file}_clover_metadata", :id => get_id()}))
 if s3_backup then
@@ -12,7 +12,7 @@ module GoodData
 class Visitor
 
 def accept(node)
- puts node
+ # puts node
 if node.type == "ldm"
 puts "LDM #{node.to_s}"
 else
data/lib/runtime.rb CHANGED
@@ -137,8 +137,7 @@ module GoodData
 sources = project.get_sources
 client = get_sf_client(PARAMS)
 sf_sources = sources.find_all {|tap| tap[:source] == "salesforce"}
- report = GoodData::CloverGenerator::validate_sf_metadata(client, sf_sources)
- pp report
+ GoodData::CloverGenerator::validate_sf_metadata(client, sf_sources)
 end
 
 def self.sf_jack_in
@@ -222,12 +221,14 @@ module GoodData
 sources = project.get_sources
 sf_sources = sources.find_all {|tap| tap[:source] == "salesforce" && tap[:incremental] == true}
 create_incremental_downloader_run_graph(CLOVER_DOWNLOADERS_ROOT + PROJECT_GRAPHS_ROOT + "main.grf", sf_sources)
+ s3_backup = PARAMS[:S3_SECRET_ACCESS_KEY] && PARAMS[:S3_ACCESS_KEY_ID] && PARAMS[:S3_BUCKETNAME]
+
 GoodData::CloverGenerator::create_incremental_downloading_graph(CLOVER_DOWNLOADERS_ROOT + PROJECT_GRAPHS_ROOT + "incremental.grf", sf_sources, {
 :password => PARAMS[:sf_password],
 :token => PARAMS[:sf_token],
 :login => PARAMS[:sf_login],
 :sf_server => PARAMS[:sf_server],
- :s3_backup => false
+ :s3_backup => s3_backup
 })
 end
 
@@ -239,7 +240,17 @@ module GoodData
 :params => {}
 }
 })
- GoodData.poll(result, "executionTask")
+ begin
+   GoodData.poll(result, "executionTask")
+ rescue RestClient::RequestFailed => e
+
+ ensure
+   result = GoodData.get(result["executionTask"]["links"]["detail"])
+   if result["executionDetail"]["status"] == "ERROR"
+     fail "Running process failed. You can look at a log here #{result["executionDetail"]["logFileName"]}"
+   end
+ end
+ result
 end
 
 def self.connect_to_gd(options={})
@@ -384,6 +395,7 @@ module GoodData
 p = build_project
 sources = p.get_sources
 datasets = p.get_datasets
+ s3_backup = PARAMS[:S3_SECRET_ACCESS_KEY] && PARAMS[:S3_ACCESS_KEY_ID] && PARAMS[:S3_BUCKETNAME]
 
 flows = []
 FileUtils::cd FLOWS_ROOT do
@@ -441,7 +453,7 @@
 
 GoodData::CloverGenerator::create_es_downloading_graph(graph_name, [source], {
 :metadata => current_metadata[source_name],
- :s3_backup => false
+ :s3_backup => s3_backup
 })
 else
 GoodData::CloverGenerator::create_sf_downloading_graph(graph_name, [source], {
@@ -450,7 +462,7 @@
 :login => PARAMS[:sf_login],
 :sf_server => PARAMS[:sf_server],
 :metadata => current_metadata[source_name],
- :s3_backup => false
+ :s3_backup => s3_backup
 })
 end
 
data/templates/join_template.grf.erb ADDED
@@ -0,0 +1,57 @@
+ <?xml version="1.0" encoding="UTF-8"?>
+ <Graph author="fluke" created="Tue Feb 05 15:38:24 PST 2013" guiVersion="3.3.2" id="1360179808937" licenseCode="CLP1DGOODD71636137BY" licenseType="Commercial" modified="Mon May 06 10:12:35 PDT 2013" modifiedBy="gdc-defectivedisplay" name="process_name" revision="1.15" showComponentDetails="true">
+ <Global>
+ <Metadata fileURL="${PROJECT}/metadata/${FLOW}/${NAME}/1_in.xml" id="Metadata0"/>
+ <Metadata fileURL="${PROJECT}/metadata/${FLOW}/${NAME}/1_out.xml" id="Metadata1"/>
+ <Metadata fileURL="${PROJECT}/metadata/${FLOW}/${NAME}/2_in.xml" id="Metadata2"/>
+ <Metadata fileURL="${PROJECT}/metadata/${FLOW}/${NAME}/2_out.xml" id="Metadata3"/>
+ <MetadataGroup id="ComponentGroup0" name="metadata"/>
+ <Property fileURL="params.txt" id="GraphParameter14"/>
+ <Property fileURL="workspace.prm" id="GraphParameter0"/>
+ <Dictionary/>
+ </Global>
+ <Phase number="0">
+ <Node enabled="enabled" fileURL="data/1_in.csv" guiHeight="77" guiName="CSV Reader" guiWidth="128" guiX="124" guiY="169" id="DATA_READER0" quoteCharacter="&quot;" quotedStrings="true" skipRows="1" type="DATA_READER"/>
+ <Node enabled="enabled" fileURL="data/2_in.csv" guiHeight="77" guiName="CSV Reader" guiWidth="128" guiX="124" guiY="269" id="DATA_READER1" quoteCharacter="&quot;" quotedStrings="true" skipRows="1" type="DATA_READER"/>
+ <Node enabled="enabled" fileURL="data/out.csv" guiHeight="77" guiName="CSV Writer" guiWidth="128" guiX="776" guiY="196" id="DATA_WRITER0" outputFieldNames="true" quoteCharacter="&quot;" quotedStrings="true" type="DATA_WRITER"/>
+ <Node enabled="enabled" guiHeight="89" guiName="ExtMergeJoin" guiWidth="128" guiX="570" guiY="199" id="EXT_MERGE_JOIN0" joinKey="$UserRoleId(a)#$Id(a);" type="EXT_MERGE_JOIN">
+ <attr name="transform"><![CDATA[//#CTL2
+
+ // Transforms input record into output record.
+ function integer transform() {
+
+ $out.0.* = $in.0.*;
+ $out.0.Id = nvl2($out.0.Id, $in.1.Name, "");
+
+ return ALL;
+
+ }
+
+ // Called during component initialization.
+ // function boolean init() {}
+
+ // Called during each graph run before the transform is executed. May be used to allocate and initialize resources
+ // required by the transform. All resources allocated within this method should be released
+ // by the postExecute() method.
+ // function void preExecute() {}
+
+ // Called only if transform() throws an exception.
+ // function integer transformOnError(string errorMessage, string stackTrace) {}
+
+ // Called during each graph run after the entire transform was executed. Should be used to free any resources
+ // allocated within the preExecute() method.
+ // function void postExecute() {}
+
+ // Called to return a user-defined error message when an error occurs.
+ // function string getMessage() {}
+ ]]></attr>
+ </Node>
+ <Node enabled="enabled" guiHeight="77" guiName="ExtSort" guiWidth="128" guiX="362" guiY="161" id="EXT_SORT0" sortKey="UserRoleId(a)" type="EXT_SORT"/>
+ <Node enabled="enabled" guiHeight="77" guiName="ExtSort" guiWidth="128" guiX="362" guiY="275" id="EXT_SORT1" sortKey="Id(a)" type="EXT_SORT"/>
+ <Edge fromNode="DATA_READER0:0" guiBendpoints="" guiRouter="Manhattan" id="Edge0" inPort="Port 0 (in)" metadata="Metadata0" outPort="Port 0 (output)" toNode="EXT_SORT0:0"/>
+ <Edge fromNode="DATA_READER1:0" guiBendpoints="" guiRouter="Manhattan" id="Edge2" inPort="Port 0 (in)" metadata="Metadata2" outPort="Port 0 (output)" toNode="EXT_SORT1:0"/>
+ <Edge fromNode="EXT_MERGE_JOIN0:0" guiBendpoints="" guiRouter="Manhattan" id="Edge4" inPort="Port 0 (in)" metadata="Metadata1" outPort="Port 0 (out)" toNode="DATA_WRITER0:0"/>
+ <Edge fromNode="EXT_SORT0:0" guiBendpoints="" guiRouter="Manhattan" id="Edge1" inPort="Port 0 (driver)" metadata="Metadata0" outPort="Port 0 (out)" toNode="EXT_MERGE_JOIN0:0"/>
+ <Edge fromNode="EXT_SORT1:0" guiBendpoints="" guiRouter="Manhattan" id="Edge3" inPort="Port 1 (slave)" metadata="Metadata2" outPort="Port 0 (out)" toNode="EXT_MERGE_JOIN0:1"/>
+ </Phase>
+ </Graph>
metadata CHANGED
@@ -1,15 +1,15 @@
 --- !ruby/object:Gem::Specification
 name: gd_bam
 version: !ruby/object:Gem::Version
- version: 0.0.7
+ version: 0.0.8
 prerelease:
 platform: ruby
 authors:
- - Your Name Here
+ - Tomas Svarovsky
 autorequire:
 bindir: bin
 cert_chain: []
- date: 2013-05-03 00:00:00.000000000 Z
+ date: 2013-05-09 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: rake
@@ -365,9 +365,9 @@ files:
 - lib/repo/1_config.json
 - lib/repository/repo.rb
 - lib/runtime.rb
- - lib/taps/tap.rb
 - templates/dataset.json.erb
 - templates/flow.rb.erb
+ - templates/join_template.grf.erb
 - templates/params.json.erb
 - templates/project.erb
 - templates/reformat_template.grf.erb
@@ -395,12 +395,18 @@ required_ruby_version: !ruby/object:Gem::Requirement
 - - ! '>='
 - !ruby/object:Gem::Version
 version: '0'
+ segments:
+ - 0
+ hash: 3774242164736806626
 required_rubygems_version: !ruby/object:Gem::Requirement
 none: false
 requirements:
 - - ! '>='
 - !ruby/object:Gem::Version
 version: '0'
+ segments:
+ - 0
+ hash: 3774242164736806626
 requirements: []
 rubyforge_project:
 rubygems_version: 1.8.25
data/lib/taps/tap.rb DELETED
@@ -1,52 +0,0 @@
- require 'hasb'
-
- class HashValidator::Validator::TapFieldValidtor < HashValidator::Validator::Base
- def initialize
- super('tap_field_validator') # The name of the validator
- end
-
- def validate(key, value, validations, errors)
- binding.pry
- unless value.is_a?(Integer) && value.odd?
- errors[key] = presence_error_message
- end
- end
- end
-
- module GoodData
- module BAM
- module Taps
-
- {
- "type" : "tap"
- ,"source" : "salesforce"
- ,"object" : "Account"
- ,"id" : "account"
- ,"incremental" : true
- ,"fields" : [
- {
- "name" : "Id"
- },
- {
- "name" : "Name"
- },
- {
- "name" : "SystemModstamp", "acts_as": ["timestamp"]
- }
- ]
- // ,"limit": "10"
- }
-
-
- VALIDATOR = {
- :type => :string,
- :source =>
-
- }
-
- def self.parse_tap(tap_spec)
-
- end
- end
- end
- end