gd_bam 0.0.7 → 0.0.8
- data/README.md +92 -63
- data/bin/bam +26 -7
- data/lib/bam/version.rb +1 -1
- data/lib/dsl/project_dsl.rb +11 -11
- data/lib/nodes/clover_gen.rb +2 -4
- data/lib/nodes/dependency.rb +1 -1
- data/lib/runtime.rb +18 -6
- data/templates/join_template.grf.erb +57 -0
- metadata +10 -4
- data/lib/taps/tap.rb +0 -52
data/README.md
CHANGED
@@ -11,29 +11,13 @@ make sure you have ruby (1.9 and 1.8.7 is currently supported) and that you have
 
 Done.
 
-<!-- ##Spin up a new project
-
-Here project refers to the GoodData project.
-
-`bam project`
-
-This spins up a completely new empty project. You can further specify to spin up several predefined templates that bam knows about. Currently it is just goodsales. You do not need to worry about versions etc. Bam does this for you and spins up the latest available.
-
-###Scaffolding
-Scaffolding helps you create the files and provides you initial structure. Project here reffers to BAM project.
-
-create a project `bam scaffold project test`
-
-this will create a project. You can also scaffold a project from a know template. Currently it is again just goodsales. The goal is that after spinning you gooddata and BAM project you are good to run. If you want the defaults there is nothing you need to do besides filling credentials.
-
--->
 ##Sample project -- GoodSales
 
 ###Prerequisites
-You need a working project with API access. You should also have username pass and token ready to fill in to params.json.
+You need a working Salesforce project with API access. You should also have your username, password, and token ready to fill in to params.json.
 
 ### Warnings
-The project is currently cloned out of an existing project. That means that you need to have access to it. If you do not (the project PID is
+The project is currently cloned out of an existing project. That means that you need to have access to it. If you do not (the project PID is nt935rwzls50zfqwy6dh62tabu8h0ocy) ask svarovsky@gooddata.com. Eventually this will be covered by a template so you will not need to do anything special. The template creation is tracked here: https://jira.intgdc.com/browse/GD-34641 .
 
 ###Let's get to it
 We will spin up a goodsales project and load it with data. Prerequisite for this is a functioning Salesforce project that you can grab at force.com.
@@ -41,7 +25,7 @@ We will spin a goodsales project and load it with data. Prerequisite for this is
 `bam scaffold project test --blueprint goodsales`
 
 now you can go inside `cd test`. You will notice several directories and files. We will get to `flows`, `taps` and `sinks` later. Currently focus just on `params.json`.
-If you open it you will see several parameters which you will have to fill. Common one should be predefined and empty. For starters you will need `gd_login` and `
+If you open it you will see several parameters which you will have to fill. Common ones should be predefined and empty. For starters you will need the `gd_login`, `gd_pass`, `sf_password`, `sf_token` and `sf_login` parameters filled in. You can check that the Salesforce connection is working by issuing `bam sf_jack_in`. If it is, you should have a REPL opened up. If not, you should get an error message.
 
 One of the parameters is project_pid. To get that you need a project.
 
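
For orientation, a filled-in params.json could look like the minimal sketch below. The key names are the ones mentioned above (`gd_login`, `gd_pass`, `sf_login`, `sf_password`, `sf_token`, plus `project_pid`); the values and the exact shape of the scaffolded file are illustrative assumptions only.

    {
      "gd_login"    : "john.doe@example.com",
      "gd_pass"     : "secret",
      "sf_login"    : "john.doe@example.com",
      "sf_password" : "secret",
      "sf_token"    : "sf-security-token",
      "project_pid" : ""
    }

Leave `project_pid` empty for now; the next step of the walkthrough shows how to obtain it.
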
@@ -49,7 +33,10 @@ One of the parameters is project_pid. To get that you need a project.
 
 This should spin for a while and eventually should give you a project ID. Fill it in your params.json.
 
-
+
+Now we are going to generate downloaders, and before we do so it is good practice to make sure that you have everything you need. You can issue `bam taps_validate`, which will go to Salesforce, check every field you defined and make sure it is available. If not, it will warn you. We tried to pick fields that will be in your Salesforce, but it is possible that they were deleted or the user does not have access to them.
+
+If everything is ok, go ahead and generate the downloaders.
 
 `bam generate_downloaders`
 
@@ -69,16 +56,19 @@ This works the same as with downloaders but its default target is clover_project
 
 After it is finished log in to gooddata, go into your project and celebrate. You just did a project using BAM.
 
-##
-
+##When Things go wrong
+We tried our best to make this experience a smooth one but sometimes things go bad. Here are some typical problems that can occur.
 
-###
-
+###Field is inaccessible in SF
+In the log there should be something like
 
-
+    Worker task failed: Missing mandatory fields
+
+This means that some of your fields are either not accessible or not in your SF project. Use `bam taps_validate` to identify those and remap them.
+
+##Next steps
+Ok, so by now you hopefully have your project up and running. Before we dive into modifications you have to understand the key concepts that BAM builds on. Once you are comfortable with those we will get back to it.
 
-###Types or not?
-Clover engine is built on Java and it shows. It is statically typed and CTL Clover transformation language resembles Java a lot. While it helps speed and many people claim it prevents errors it also causes more work and helps metadata explosion. Sometimes you need to translate an field into another field becuase you need to do something specific or the component needs it. It is not problem per se but it is important to see the tradeoffs and push the functionality into the components that should work for you and not against you. It is also important to do certain tasks at certain phases. If you do this you found out that certain parts are easier to automate or you can easily reuse work that you did somewhere else.
 
 ##Taps
 Taps are sources of data. Right now you can use just salesforce tap.
@@ -131,7 +121,7 @@ Sometimes it is useful to limit number of grabbed values for example for testing
 }
 
 ####Acts as
-Sometime it is needed to use one field several times in a source of data or you want to "call" certain field differently because the ETL relies on
+Sometimes it is needed to use one field several times in a source of data, or you want to "call" a certain field differently because the ETL relies on a particular name. Both cases are handled using `acts_as`
 
 {
   "source" : "salesforce"
@@ -149,6 +139,8 @@ Sometime it is needed to use one field several times in a source of data or you
 
 Id will be routed to both Id and Name. Custom_Amount__c will be called RenamedAmount.
 
+Caution: This is a double edged sword so be careful. The idea here is that it should make your life easier, not harder. You should map a field to a different one in exactly 2 cases. One is that you want the same field twice. The second is that a (predefined) ETL requires a certain field under a certain name. If you are not careful it is easy to introduce data from 2 columns into a single one.
+
 ####Condition
 You can also specify a condition during download. I recommend using it only if it drastically lowers the amount of data that goes over the wire. Otherwise implement it elsewhere.
 
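
Putting the tap pieces together, a tap that combines `acts_as` with a `condition` and incremental download could look like the sketch below. The overall shape follows the example tap in lib/taps/tap.rb (deleted at the bottom of this diff); the Custom_Amount__c field and the condition value are made up for illustration.

    {
      "source" : "salesforce",
      "object" : "User",
      "id" : "user",
      "incremental" : true,
      "fields" : [
        { "name" : "Id", "acts_as" : ["Id", "Name"] },
        { "name" : "Custom_Amount__c", "acts_as" : ["RenamedAmount"] },
        { "name" : "IsActive" }
      ],
      "condition" : "IsActive = true"
    }
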
@@ -162,15 +154,15 @@ It is wasteful to download everything on and on again. If you specify the increm
 The reason for this is simple. When you download only incrementally you do not stress the wires that much and that means you can run it pretty often. By running it often it means that even if something horrible happens once it will probably run successfully next time. And as we mentioned this is cheap. On the other hand running the main ETL is often very expensive and recovering from failure is usually different, so splitting them simplifies development of each. Since they are independent they can even be developed by different people, which is sometimes useful.
 
 ####Taps validation
-Fail early. There is nothing more frustrating than when the ETL fails during
+Fail early. There is nothing more frustrating than when the ETL fails during execution. When you develop the taps you can ask BAM to connect to SF and validate that the fields are present. This is not bulletproof since some fields can go away at any time, but it gives you a good idea whether you misspelled any fields.
 
 ####Mandatory fields
-Sometimes it is necessary to move fields around in SF. In such case the tap will. If you know this upfront you can tell BAM that this field is not mandatory and it will silently go along filling the missing field with ''
+Sometimes it is necessary to move fields around in SF. In such a case the tap will fail. If you know this upfront you can tell BAM that this field is not mandatory and it will silently go along, filling the missing field with ''. If it is marked as mandatory, which all fields are by default, it will fail if it cannot access the field.
 
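
As a hedged sketch of the non-mandatory case: the README only says you can tell BAM a field is not mandatory, so the exact key below is an assumption for illustration.

    { "name" : "Custom_Field__c", "mandatory" : false }

With something like this, a missing Custom_Field__c would be silently filled with '' instead of failing the tap.
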
 ##Flows
-Flow is an abstraction
+Flow is an abstraction that should connect a tap with a sink, creating a .. well, a flow.
 
-This flow will download users from sf (you have to provide credentials in params.json). It then runs graph called "process user" (this is part of the distribution). This graph concatenates first name and last name together. It then feeds data to the sink.
+Probably better to show you a simple example. This flow will download users from sf (you have to provide credentials in params.json). It then runs a graph called "process user" (this is part of the distribution, but we can treat it as an arbitrary graph). This graph concatenates first name and last name together. It then feeds data to the sink.
 
 GoodData::CloverGenerator::DSL::flow("user") do |f|
   tap(:id => "user")
@@ -185,18 +177,19 @@ This flow will download users from sf (you have to provide credentials in params
   sink(:id => "user")
 end
 
-
+Note a couple of things.
 
-
-
+* The flow is defined using a DSL in Ruby. If you like Ruby, great; if you do not, I recommend http://rubymonk.com/ to get you up to speed. This might change and we might introduce our own DSL.
+* Flow has its own id. The name of the file does not actually matter. Again, something that we are thinking about.
+* With a tap you include a tap into the flow. You can specify the id of a tap with the id param. If you omit it, it will try to include the tap with the same id as the flow.
+* With graph you run a graph named process_owner. When BAM creates the graphs for you there are two places it looks for them. First it looks into your project's `local_graphs`, then it tries to look into the library that comes with BAM. Again, especially the second part is going to change.
+* There might be one or more metadata statements after the graph definition. Each graph might expect numerous inputs, so the order of these `metadata` statements tells you which input goes where. The second purpose is telling what is going to change in those metadata. Here we are saying "*Ok, the user is going in as input number one (there is no number two in this case). At the output the user will have one more field and that is Name. On top of that we are removing two fields, FirstName and LastName*".
+* The last thing we specify is the sink. Again, as with a tap you can specify an id so you tell BAM which sink it should look for. If you do not fill it in, by default it looks for the same id as your flow.
 
-graph('my_graph')
-
-It goes to 2 places (this will change) and tries to find the graph. First place is your `local_graphs` directory in your project the second place is central reporsitory that is currently inside bam library and this part will probably change.
 
 ##Sinks
 
-Sink is a definition of where data goes
+Sink is a definition of where data goes. Currently there is only one sink type and that is a gooddata dataset.
 
 ###GoodData
 
@@ -233,11 +226,9 @@ Sink is a definition of where data goes to. Currently there is only one sink typ
 GoodData sink is currently just mimicking the CL tool definition + some shortcuts on top of that. If you are familiar with the CL tool you should be right at home if I tell you that the only additional thing you have to provide is telling BAM which metadata field is pulled in to a given field.
 
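
To make the CL tool analogy concrete, a GoodData sink might look like the hypothetical sketch below. The field types (connection_point, attribute, fact) are CL tool vocabulary and `meta` stands for the metadata-field-to-dataset-field mapping described above; none of these key names are confirmed by this diff, so treat it purely as an illustration.

    {
      "type" : "dataset",
      "id" : "user",
      "gd_name" : "user",
      "fields" : [
        { "type" : "connection_point", "name" : "Id", "meta" : "Id" },
        { "type" : "attribute", "name" : "Name", "meta" : "Name" }
      ]
    }
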
 
+##Adding a field
+Ok let's say you have a basic GoodSales
 
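
The section above is only a stub, but based on the tap and model_sync descriptions in this README the workflow would plausibly be: add the field to the tap, regenerate, and sync the model. The Industry field below is an illustrative assumption.

    { "name" : "Industry" }

Add a line like this to the `fields` array of the relevant tap, then run `bam generate` and `bam model_sync` so the new field flows through the ETL and into the GoodData model.
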
-<!--For this example to work you need to provide SF and gd credentials. Provide them in params.json. You would need to provide also a project with appropriate project but this is out of scope of this "example" (I am working on tools that would make it easier).
-
-Now run `bam generate` and there will be a folder with the clover project generated. Open it in CC find main.grf and run it. After crunching for a while you should see data in the project.
--->
 
 ##Runtime commands
 Part of the distribution is the bam executable which lets you do several neat things on the commandline
@@ -245,39 +236,77 @@ Part of the distribution is the bam executable which lets you do several neat th
 Run `bam` to get the list of commands
 Run `bam help command` to get help about the command
 
-### deploy directory
-deploys the directory to the server. You can provide the param of the process as a parameter
-
 ### generate
-Generates the ETL. The default target directory is clover_project (currently cannot be changed).
+Generates the ETL. The default target directory is clover_project (currently cannot be changed).
+**--only flow_id** generates only one flow. Useful for debugging
+
+    bam generate --only owner
 
 ### generate_downloaders
-
+Generates downloaders into downloaders_project (currently cannot be changed).
+
+### deploy directory
+deploys the directory to the server.
+
+    bam deploy clover_project
 
-
-
+**--process process_id** You can specify a process ID so you can redeploy to the same process. This just updates the deployed project. All the schedules are still in effect.
+
+    bam deploy clover_project --process 1231jkadjk123k
 
 ### model_sync
-
+This will go through the sinks and update the model. It relies on the CL tool to do this, so the CL tool's limitations apply. It is very useful for adding additional fields, not for changing the model altogether.
 
 ### run
-Runs the project and
-
+Runs the project on the server. This is achieved by deploying it there and deleting it after the run finishes.
+
+    bam run clover-project
+
+**--email someone@example.com** This will create a temporary email channel that hooks the events on success and failure. The channel is torn down once the ETL is done.
 
 ### scaffold
-
+Creates file templates so you do not need to start from scratch.
+
+    bam scaffold project my_new_project
+
+    bam scaffold tap new_tap
+
+    bam scaffold flow new_flow
+
+    bam scaffold dataset new_dataset
+
+To further ease your typical tasks in ETL, BAM comes with a couple of templates with prefilled ETL constructs
+
+    bam scaffold graph_template reformat local_process_my_stuff
+    bam scaffold graph_template join local_process_my_other_stuff
+
 
 ### taps_generate_docs
-In your project there should be a README.md.erb file. By running this command it will be transformed into README.md and put into the project so it can be committed to git.
-taps
-sinks
+In your project there should be a README.md.erb file. By running this command it will be transformed into README.md and put into the project so it can be committed to git. Since it is an erb template there are several expressions that you can use.
 
-
+    <%= taps %> - list of taps
+    <%= sinks %> - list of sinks
+
+You can run arbitrary ruby code inside so you can write something like
+
+    Last generated at <%= Date.today %>
+
+### taps_validate
 Currently works only for SF. Validates that the target SF instance has all the fields in the objects that are specified in the taps definitions.
 
-###
-
+### sinks_validate
+TBD
+
+##The why
+For those that are interested in reading why we actually bothered developing this. Read on.
+
+###Metadata management
+The key pain I had with CloudConnect is that I did not like the management of metadata. Every project I saw was just a pile of metadata definitions that had to be constantly changed and tweaked. This is caused by a couple of choices that the creators of the underlying Clover engine made in the beginning and that probably will not be changed easily. While I am trying to make it better I am still bound by these choices and sometimes the wiring sticks out - sorry for that.
 
-
-
-
+###Incremental metadata
+BAM works with something called incremental metadata. Metadata is not defined in each step; you just say what you want to change. A picture is probably better than a thousand words.
+
+You have a conceptual picture of a simple transformation. You get a Tap that downloads FirstName and LastName from somewhere. Obviously you would like to join them together to form a name. Exactly this happens in the second box, the transformer. You would like to sink the only field and that is Name. So on the next edge what you say is "I am adding Name and removing FirstName and LastName". So far so good. What is elegant about this approach is how it copes with change. Imagine that the tap gets not only FirstName and LastName but also Age. Now what do you need to change? If you did it the old way you would have to change metadata on both edges, the tap, the transformer and the sink. With incremental metadata you need to change the tap and the sink, nothing else. Since I claim that dealing with metadata was the biggest pain, this is a lot of work (and errors) that you just saved.
+
+###Types or not?
+Clover engine is built on Java and it shows. It is statically typed and CTL, the Clover transformation language, resembles Java a lot. While it helps speed, and many people claim it prevents errors, it also causes more work and helps metadata explosion. Sometimes you need to translate a field into another field because you need to do something specific or the component needs it. It is not a problem per se, but it is important to see the tradeoffs and push the functionality into components that work for you and not against you. It is also important to do certain tasks at certain phases. If you do this you will find that certain parts are easier to automate or you can easily reuse work that you did somewhere else.
data/bin/bam
CHANGED
@@ -16,6 +16,11 @@ default_value false
 arg_name 'verbose'
 switch [:v,:verbose]
 
+desc 'Http logger'
+default_value false
+arg_name 'logger'
+switch [:l,:logger]
+
 
 desc 'Generates clover project based on information in current directory. The default ouptut is the directory ./clover_project'
 # arg_name 'Describe arguments to new here'
@@ -58,7 +63,19 @@ desc 'Validates that the tap has the fields it is claimed it should have. This i
 # arg_name 'Describe arguments to new here'
 command :taps_validate do |c|
   c.action do |global_options,options,args|
-
+    verbose = global_options[:v]
+    result = GoodData::CloverGenerator.validate_taps
+
+    error = false
+    result.each_pair do |obj, fields|
+      if fields.empty?
+        puts HighLine::color("GOOD", :green) + " #{obj}" if verbose
+      else
+        puts HighLine::color("BAD", :red) + " #{obj} [" + fields.join(', ') + "]" if verbose
+        error = true
+      end
+    end
+    exit_now!("Errors found",exit_code=1) if error
   end
 end
 
@@ -111,12 +128,13 @@ command :project do |c|
 
     pid = case options[:blueprint]
     when "goodsales"
-      "
+      "nt935rwzls50zfqwy6dh62tabu8h0ocy"
     when nil
       fail "Empty project not supported now"
     end
 
-
+    logger = Logger.new(STDOUT) if global_options[:l]
+    GoodData::CloverGenerator.connect_to_gd(:logger => logger)
     with_users = options[:with_users]
 
     export = {
@@ -257,7 +275,8 @@ command :deploy do |c|
     dir = args.first
     fail "You have to specify directory to deploy as an argument" if dir.nil?
     fail "Specified directory does not exist" unless File.exist?(dir)
-
+    logger = Logger.new(STDOUT) if global_options[:l]
+    GoodData::CloverGenerator.connect_to_gd(:logger => logger)
     options = global_options.merge({:name => "temporary"}).merge(options)
     response = GoodData::CloverGenerator.deploy(dir, options)
   end
@@ -279,15 +298,14 @@ command :run do |c|
     verbose = global_options[:v]
 
     logger = Logger.new(STDOUT) if global_options[:l]
-
-
     GoodData::CloverGenerator.connect_to_gd(:logger => logger)
     options = global_options.merge({:name => "temporary"})
     GoodData::CloverGenerator.deploy(dir, options) do |deploy_response|
       puts HighLine::color("Executing", HighLine::BOLD) if verbose
       GoodData::CloverGenerator.create_email_channel(options) do |channel_response|
         GoodData::CloverGenerator.subscribe_on_finish(:success, channel_response["channelConfiguration"]["meta"]["uri"], deploy_response["process"]["links"]["self"].split('/').last)
-        GoodData::CloverGenerator.execute_process(deploy_response["process"]["links"]["executions"], dir)
+        result = GoodData::CloverGenerator.execute_process(deploy_response["process"]["links"]["executions"], dir)
+
       end
     end
   end
@@ -300,6 +318,7 @@ pre do |global,command,options,args|
   # chosen command
   # Use skips_pre before a command to skip this block
   # on that command only
+
   true
 end
 
data/lib/bam/version.rb
CHANGED
data/lib/dsl/project_dsl.rb
CHANGED
@@ -64,7 +64,7 @@ module GoodData
     attr_accessor :steps, :name
 
     def self.define(name="", &script)
-      puts "Reading flow #{name}"
+      # puts "Reading flow #{name}"
       x = self.new
       x.flow_name(name)
       x.instance_eval(&script)
@@ -100,7 +100,7 @@ module GoodData
       type = options[:type]
 
       steps.push(options)
-      puts "Running step #{graph}"
+      # puts "Running step #{graph}"
     end
 
     def metadata(name=nil,options={}, &bl)
@@ -121,7 +121,7 @@ module GoodData
     attr_accessor :usecases, :name, :dims
 
     def self.define(&script)
-      print self
+      # print self
       x = self.new
       x.instance_eval(&script)
       x
@@ -209,23 +209,23 @@ module GoodData
     end
 
     def run(repo)
-      puts "Running"
+      # puts "Running"
 
-      puts "looking for dimension definitions"
-      dims.each do |dim|
-        puts "found #{dim}"
-      end
+      # puts "looking for dimension definitions"
+      # dims.each do |dim|
+      #   puts "found #{dim}"
+      # end
 
       sources = get_sources
       fail "You have no sources defined" if sources.empty?
-      puts "Found #{sources.count} sources"
+      # puts "Found #{sources.count} sources"
 
 
       datasets = get_datasets
       fail "You have no datasets defined" if datasets.empty?
-      puts "Found #{datasets.count} sources"
+      # puts "Found #{datasets.count} sources"
 
-      puts "Composing the tree"
+      # puts "Composing the tree"
       you = GoodData::CloverGenerator::Dependency::N.new({
         :name => name,
         :type => "project",
data/lib/nodes/clover_gen.rb
CHANGED
@@ -262,13 +262,11 @@ module GoodData
 
     def self.validate_sf_metadata(sf_client, sources)
       sources.reduce({}) do |memo, source|
-        puts "Checking #{source[:object]}"
         sf_object = source[:object]
         u = sf_client.describe(sf_object)
         sf_fields = u[:describeSObjectResponse][:result][:fields].map {|field| field[:name]}
         fields_to_validate = source[:fields].map {|field| field[:name]}
         memo[sf_object] = (fields_to_validate - sf_fields)
-        pp fields_to_validate - sf_fields
         memo
       end
     end
@@ -393,7 +391,7 @@ module GoodData
       if spec[:condition].nil? || spec[:condition].empty?
         spec[:condition] = "SystemModstamp > ${#{spec[:id]}_START} AND SystemModstamp <= ${#{spec[:id]}_END}"
       else
-        spec[:condition] += "AND SystemModstamp > ${#{spec[:id]}_START} AND SystemModstamp <= ${#{spec[:id]}_END}"
+        spec[:condition] += " AND SystemModstamp > ${#{spec[:id]}_START} AND SystemModstamp <= ${#{spec[:id]}_END}"
       end
       generate_select(spec)
     end
@@ -1229,7 +1227,7 @@ HEREDOC
       build_node2(builder, GoodData::CloverGenerator::Nodes.edge2({:toNode => "#{file}_es_reformat:0", :fromNode => "#{file}_copy:1", :metadata => "#{file}_clover_metadata", :id => get_id()}))
 
       if s3_backup then
-        build_node2(builder, GoodData::CloverGenerator::Nodes.writer2({:enabled => "
+        build_node2(builder, GoodData::CloverGenerator::Nodes.writer2({:enabled => "enabled", :name => "#{file} s3 Writer", :id => "#{file}_s3", :fileURL => "https://${S3_ACCESS_KEY_ID}:\`replace(\"${S3_SECRET_ACCESS_KEY}\",\"/\",\"%2F\")\`@${S3_BUCKETNAME}.s3.amazonaws.com/${GDC_PROJECT_ID}/#{file}/#{file}_\`date2long(today())\`", :outputFieldNames => true, :quotedStrings => false}))
       end
       build_node2(builder, GoodData::CloverGenerator::Nodes.edge2({:toNode => "#{file}_csv:0", :fromNode => "#{file}_copy:0", :metadata => "#{file}_clover_metadata", :id => get_id()}))
       if s3_backup then
data/lib/nodes/dependency.rb
CHANGED
data/lib/runtime.rb
CHANGED
@@ -137,8 +137,7 @@ module GoodData
       sources = project.get_sources
       client = get_sf_client(PARAMS)
       sf_sources = sources.find_all {|tap| tap[:source] == "salesforce"}
-
-      pp report
+      GoodData::CloverGenerator::validate_sf_metadata(client, sf_sources)
     end
 
     def self.sf_jack_in
@@ -222,12 +221,14 @@ module GoodData
       sources = project.get_sources
       sf_sources = sources.find_all {|tap| tap[:source] == "salesforce" && tap[:incremental] == true}
       create_incremental_downloader_run_graph(CLOVER_DOWNLOADERS_ROOT + PROJECT_GRAPHS_ROOT + "main.grf", sf_sources)
+      s3_backup = PARAMS[:S3_SECRET_ACCESS_KEY] && PARAMS[:S3_ACCESS_KEY_ID] && PARAMS[:S3_BUCKETNAME]
+
       GoodData::CloverGenerator::create_incremental_downloading_graph(CLOVER_DOWNLOADERS_ROOT + PROJECT_GRAPHS_ROOT + "incremental.grf", sf_sources, {
         :password => PARAMS[:sf_password],
         :token => PARAMS[:sf_token],
         :login => PARAMS[:sf_login],
         :sf_server => PARAMS[:sf_server],
-        :s3_backup =>
+        :s3_backup => s3_backup
       })
     end
 
@@ -239,7 +240,17 @@ module GoodData
           :params => {}
         }
       })
-
+      begin
+        GoodData.poll(result, "executionTask")
+      rescue RestClient::RequestFailed => e
+
+      ensure
+        result = GoodData.get(result["executionTask"]["links"]["detail"])
+        if result["executionDetail"]["status"] == "ERROR"
+          fail "Running process failed. You can look at a log here #{result["executionDetail"]["logFileName"]}"
+        end
+      end
+      result
     end
 
     def self.connect_to_gd(options={})
@@ -384,6 +395,7 @@ module GoodData
       p = build_project
       sources = p.get_sources
       datasets = p.get_datasets
+      s3_backup = PARAMS[:S3_SECRET_ACCESS_KEY] && PARAMS[:S3_ACCESS_KEY_ID] && PARAMS[:S3_BUCKETNAME]
 
       flows = []
       FileUtils::cd FLOWS_ROOT do
@@ -441,7 +453,7 @@ module GoodData
 
         GoodData::CloverGenerator::create_es_downloading_graph(graph_name, [source], {
           :metadata => current_metadata[source_name],
-          :s3_backup =>
+          :s3_backup => s3_backup
         })
       else
         GoodData::CloverGenerator::create_sf_downloading_graph(graph_name, [source], {
@@ -450,7 +462,7 @@ module GoodData
           :login => PARAMS[:sf_login],
           :sf_server => PARAMS[:sf_server],
           :metadata => current_metadata[source_name],
-          :s3_backup =>
+          :s3_backup => s3_backup
         })
       end
 
data/templates/join_template.grf.erb
ADDED
@@ -0,0 +1,57 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<Graph author="fluke" created="Tue Feb 05 15:38:24 PST 2013" guiVersion="3.3.2" id="1360179808937" licenseCode="CLP1DGOODD71636137BY" licenseType="Commercial" modified="Mon May 06 10:12:35 PDT 2013" modifiedBy="gdc-defectivedisplay" name="process_name" revision="1.15" showComponentDetails="true">
+<Global>
+<Metadata fileURL="${PROJECT}/metadata/${FLOW}/${NAME}/1_in.xml" id="Metadata0"/>
+<Metadata fileURL="${PROJECT}/metadata/${FLOW}/${NAME}/1_out.xml" id="Metadata1"/>
+<Metadata fileURL="${PROJECT}/metadata/${FLOW}/${NAME}/2_in.xml" id="Metadata2"/>
+<Metadata fileURL="${PROJECT}/metadata/${FLOW}/${NAME}/2_out.xml" id="Metadata3"/>
+<MetadataGroup id="ComponentGroup0" name="metadata"/>
+<Property fileURL="params.txt" id="GraphParameter14"/>
+<Property fileURL="workspace.prm" id="GraphParameter0"/>
+<Dictionary/>
+</Global>
+<Phase number="0">
+<Node enabled="enabled" fileURL="data/1_in.csv" guiHeight="77" guiName="CSV Reader" guiWidth="128" guiX="124" guiY="169" id="DATA_READER0" quoteCharacter="&quot;" quotedStrings="true" skipRows="1" type="DATA_READER"/>
+<Node enabled="enabled" fileURL="data/2_in.csv" guiHeight="77" guiName="CSV Reader" guiWidth="128" guiX="124" guiY="269" id="DATA_READER1" quoteCharacter="&quot;" quotedStrings="true" skipRows="1" type="DATA_READER"/>
+<Node enabled="enabled" fileURL="data/out.csv" guiHeight="77" guiName="CSV Writer" guiWidth="128" guiX="776" guiY="196" id="DATA_WRITER0" outputFieldNames="true" quoteCharacter="&quot;" quotedStrings="true" type="DATA_WRITER"/>
+<Node enabled="enabled" guiHeight="89" guiName="ExtMergeJoin" guiWidth="128" guiX="570" guiY="199" id="EXT_MERGE_JOIN0" joinKey="$UserRoleId(a)#$Id(a);" type="EXT_MERGE_JOIN">
+<attr name="transform"><![CDATA[//#CTL2
+
+// Transforms input record into output record.
+function integer transform() {
+
+    $out.0.* = $in.0.*;
+    $out.0.Id = nvl2($out.0.Id, $in.1.Name, "");
+
+    return ALL;
+
+}
+
+// Called during component initialization.
+// function boolean init() {}
+
+// Called during each graph run before the transform is executed. May be used to allocate and initialize resources
+// required by the transform. All resources allocated within this method should be released
+// by the postExecute() method.
+// function void preExecute() {}
+
+// Called only if transform() throws an exception.
+// function integer transformOnError(string errorMessage, string stackTrace) {}
+
+// Called during each graph run after the entire transform was executed. Should be used to free any resources
+// allocated within the preExecute() method.
+// function void postExecute() {}
+
+// Called to return a user-defined error message when an error occurs.
+// function string getMessage() {}
+]]></attr>
+</Node>
+<Node enabled="enabled" guiHeight="77" guiName="ExtSort" guiWidth="128" guiX="362" guiY="161" id="EXT_SORT0" sortKey="UserRoleId(a)" type="EXT_SORT"/>
+<Node enabled="enabled" guiHeight="77" guiName="ExtSort" guiWidth="128" guiX="362" guiY="275" id="EXT_SORT1" sortKey="Id(a)" type="EXT_SORT"/>
+<Edge fromNode="DATA_READER0:0" guiBendpoints="" guiRouter="Manhattan" id="Edge0" inPort="Port 0 (in)" metadata="Metadata0" outPort="Port 0 (output)" toNode="EXT_SORT0:0"/>
+<Edge fromNode="DATA_READER1:0" guiBendpoints="" guiRouter="Manhattan" id="Edge2" inPort="Port 0 (in)" metadata="Metadata2" outPort="Port 0 (output)" toNode="EXT_SORT1:0"/>
+<Edge fromNode="EXT_MERGE_JOIN0:0" guiBendpoints="" guiRouter="Manhattan" id="Edge4" inPort="Port 0 (in)" metadata="Metadata1" outPort="Port 0 (out)" toNode="DATA_WRITER0:0"/>
+<Edge fromNode="EXT_SORT0:0" guiBendpoints="" guiRouter="Manhattan" id="Edge1" inPort="Port 0 (driver)" metadata="Metadata0" outPort="Port 0 (out)" toNode="EXT_MERGE_JOIN0:0"/>
+<Edge fromNode="EXT_SORT1:0" guiBendpoints="" guiRouter="Manhattan" id="Edge3" inPort="Port 1 (slave)" metadata="Metadata2" outPort="Port 0 (out)" toNode="EXT_MERGE_JOIN0:1"/>
+</Phase>
+</Graph>
metadata
CHANGED
@@ -1,15 +1,15 @@
 --- !ruby/object:Gem::Specification
 name: gd_bam
 version: !ruby/object:Gem::Version
-  version: 0.0.
+  version: 0.0.8
 prerelease:
 platform: ruby
 authors:
--
+- Tomas Svarovsky
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2013-05-
+date: 2013-05-09 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rake
@@ -365,9 +365,9 @@ files:
 - lib/repo/1_config.json
 - lib/repository/repo.rb
 - lib/runtime.rb
-- lib/taps/tap.rb
 - templates/dataset.json.erb
 - templates/flow.rb.erb
+- templates/join_template.grf.erb
 - templates/params.json.erb
 - templates/project.erb
 - templates/reformat_template.grf.erb
@@ -395,12 +395,18 @@ required_ruby_version: !ruby/object:Gem::Requirement
   - - ! '>='
     - !ruby/object:Gem::Version
       version: '0'
+      segments:
+      - 0
+      hash: 3774242164736806626
 required_rubygems_version: !ruby/object:Gem::Requirement
   none: false
   requirements:
   - - ! '>='
     - !ruby/object:Gem::Version
       version: '0'
+      segments:
+      - 0
+      hash: 3774242164736806626
 requirements: []
 rubyforge_project:
 rubygems_version: 1.8.25
data/lib/taps/tap.rb
DELETED
@@ -1,52 +0,0 @@
-require 'hasb'
-
-class HashValidator::Validator::TapFieldValidtor < HashValidator::Validator::Base
-  def initialize
-    super('tap_field_validator') # The name of the validator
-  end
-
-  def validate(key, value, validations, errors)
-    binding.pry
-    unless value.is_a?(Integer) && value.odd?
-      errors[key] = presence_error_message
-    end
-  end
-end
-
-module GoodData
-  module BAM
-    module Taps
-
-      {
-        "type" : "tap"
-        ,"source" : "salesforce"
-        ,"object" : "Account"
-        ,"id" : "account"
-        ,"incremental" : true
-        ,"fields" : [
-          {
-            "name" : "Id"
-          },
-          {
-            "name" : "Name"
-          },
-          {
-            "name" : "SystemModstamp", "acts_as": ["timestamp"]
-          }
-        ]
-        // ,"limit": "10"
-      }
-
-
-      VALIDATOR = {
-        :type => :string,
-        :source =>
-
-      }
-
-      def self.parse_tap(tap_spec)
-
-      end
-    end
-  end
-end