salesforce_bulk_query 0.1.6 → 0.2.0
- checksums.yaml +4 -4
- data/README.md +14 -9
- data/lib/salesforce_bulk_query/batch.rb +3 -2
- data/lib/salesforce_bulk_query/connection.rb +3 -3
- data/lib/salesforce_bulk_query/job.rb +4 -2
- data/lib/salesforce_bulk_query/query.rb +41 -21
- data/lib/salesforce_bulk_query/version.rb +1 -1
- data/new-version.sh +22 -0
- data/spec/salesforce_bulk_query_spec.rb +23 -3
- metadata +2 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 3c1805666d306731c5d9118b5e231ae9940fc92f
+  data.tar.gz: 347fc0ada770fd4ae8b7bcaaf32d6b3b14516a9e
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 38d588dea39c3f5ce3d7bb230d8af27729c236ebc9a597d58566af64f8f8dee3f2cc459cf1f893e4e344bf8a53a9e2e0f4bd89d3f9e79a28e0d34719d81bc0e7
+  data.tar.gz: 1fd675272be999224f69cef8a6dbc593e00d0bf9d9879d9e600228b42687969279a2960dc29380365877454d93bea831e5662ecba3172f5df78291b5c65c04dd
data/README.md
CHANGED
@@ -77,7 +77,7 @@ end
 
 ## How it works
 
-The library uses the [Salesforce Bulk API](https://www.salesforce.com/us/developer/docs/api_asynch/index_Left.htm#CSHID=asynch_api_bulk_query.htm|StartTopic=Content%2Fasynch_api_bulk_query.htm|SkinName=webhelp). The given query is divided into 15 subqueries, according to the [limits](http://www.salesforce.com/us/developer/docs/api_asynchpre/Content/asynch_api_concepts_limits.htm#batch_proc_time_title). Each subquery is an interval based on the CreatedDate Salesforce field. The limits are passed to the API in SOQL queries. Subqueries are sent to the API as batches and added to a job.
+The library uses the [Salesforce Bulk API](https://www.salesforce.com/us/developer/docs/api_asynch/index_Left.htm#CSHID=asynch_api_bulk_query.htm|StartTopic=Content%2Fasynch_api_bulk_query.htm|SkinName=webhelp). The given query is divided into 15 subqueries, according to the [limits](http://www.salesforce.com/us/developer/docs/api_asynchpre/Content/asynch_api_concepts_limits.htm#batch_proc_time_title). Each subquery is an interval based on the CreatedDate Salesforce field (the date field can be customized). The limits are passed to the API in SOQL queries. Subqueries are sent to the API as batches and added to a job.
 
 The first interval starts with the date the first Salesforce object was created; we query the Salesforce REST API for that. If this query times out, we use a constant. The last interval ends a few minutes before now to avoid consistency issues. Custom start and end can be passed - see Options.
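To make the splitting concrete, here is a standalone Ruby sketch of the interval arithmetic described above. The function name and values are made up for illustration; this is not the gem's internal code:

```ruby
require 'date'

# Divide [start, stop) into `count` equal intervals, one per subquery.
# DateTime subtraction yields a Rational number of days, so the steps stay exact.
def split_into_intervals(start, stop, count=15)
  step = (stop - start) / count
  (0...count).map do |i|
    from = start + step * i
    to = i == count - 1 ? stop : start + step * (i + 1)
    [from, to]
  end
end

intervals = split_into_intervals(
  DateTime.parse('2013-01-01T00:00:00.000Z'),
  DateTime.parse('2014-01-01T00:00:00.000Z')
)
# Each [from, to) pair then becomes one batch:
#   "... WHERE CreatedDate >= #{from} AND CreatedDate < #{to}"
```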
@@ -87,13 +87,14 @@ CSV results are downloaded by chunks, so that we don't run into memory related i
 
 ## Options
 There are a few optional settings you can pass to the `Api` methods:
-* `api_version`:
-* `logger`:
-* `filename_prefix`:
-* `directory_path`:
-* `check_interval`:
-* `time_limit`:
-* `
+* `api_version`: Which Salesforce API version should be used.
+* `logger`: Where logs should go, see the example.
+* `filename_prefix`: Prefix applied to CSV files.
+* `directory_path`: Custom directory path for CSVs; if omitted, a new temp directory is created.
+* `check_interval`: How often the results should be checked, in seconds.
+* `time_limit`: Maximum time the query can take, in seconds. The limit should be understood as a limit on waiting: when it is reached, the function downloads whatever data is ready, which can take some additional time, and returns the list of subqueries that didn't finish. If no limit is given, the query runs until it finishes.
+* `date_field`: Salesforce date field that will be used for splitting the query, and optionally limiting the results. Must be a date field on the queried sobject. Default is `CreatedDate`.
+* `date_from`, `date_to`: Limits for the `date_field` (`CreatedDate` by default). Note that queries can't contain any WHERE statements, since we do some manipulation to create the subqueries and don't want things to get too complicated; this is the way to limit the query yourself. The format is like `"1999-01-01T00:00:00.000Z"`.
 * `single_batch`: If true, the queries are not divided into subqueries as described above. Instead one batch job is created with the given query. This is faster for small amounts of data, but will fail with a timeout if you have a lot of data.
 
 See specs for exact usage.
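Since the readme defers to the specs for exact usage, here is a minimal usage sketch assembled from the options above and the spec changes shown further down. The Restforce credentials are placeholders and the option values are only examples:

```ruby
require 'logger'
require 'restforce'
require 'salesforce_bulk_query'

# Placeholder credentials: use your own Salesforce org's values.
restforce = Restforce.new(
  :username => 'me@example.com',
  :password => 'password',
  :security_token => 'token',
  :client_id => 'client-id',
  :client_secret => 'client-secret'
)

api = SalesforceBulkQuery::Api.new(restforce, :logger => Logger.new(STDOUT))

result = api.query(
  "Account",
  "SELECT Id, Name, Industry, Type FROM Account",
  :check_interval => 30,
  :date_field => 'SystemModstamp',
  :date_from => "2000-01-01T00:00:00.000Z",
  :date_to => "2020-01-01T00:00:00.000Z"
)

result[:filenames]  # paths to the downloaded CSV files
```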
@@ -119,10 +120,14 @@ If you're using Restforce as a client (which you probably are) and you want to d
 
 ## Notes
 
-Query (user given) -> Job (Salesforce construct that encapsulates 15 batches) -> Batch (1 SOQL with
+Query (user given) -> Job (Salesforce construct that encapsulates 15 batches) -> Batch (1 SOQL with date constraints)
 
 At the beginning the query is divided into 15 subqueries and put into a single job. When one of the subqueries fails, a new job with 15 subqueries is created; the range of the failed query is divided into 15 sub-subqueries.
 
+## Change policy
+
+The gem tries to follow [semantic versioning](http://semver.org/). All methods and options described in this readme are considered public API. If any of these change, at least the minor version is bumped. Methods and their params not described in this document can change even in patches.
+
 ## Running tests locally
 Travis CI is set up for this repository to make sure all the tests are passing with each commit.
data/lib/salesforce_bulk_query/batch.rb
CHANGED
@@ -12,6 +12,7 @@ module SalesforceBulkQuery
       @soql = options[:soql]
       @job_id = options[:job_id]
       @connection = options[:connection]
+      @date_field = options[:date_field] or fail "date_field must be given when creating a batch"
       @start = options[:start]
       @stop = options[:stop]
       @logger = options[:logger]
@@ -77,7 +78,7 @@ module SalesforceBulkQuery
     end
 
     def get_filename
-      return "#{@sobject}_#{@start}_#{@stop}_#{@batch_id}.csv"
+      return "#{@sobject}_#{@date_field}_#{@start}_#{@stop}_#{@batch_id}.csv"
     end
 
     def get_result(options={})
@@ -116,7 +117,7 @@ module SalesforceBulkQuery
     end
 
     def verify
-      api_count = @connection.query_count(@sobject, @start, @stop)
+      api_count = @connection.query_count(@sobject, @date_field, @start, @stop)
       # if we weren't able to get the count, fail.
       if api_count.nil?
         return @verification = false
data/lib/salesforce_bulk_query/connection.rb
CHANGED
@@ -112,9 +112,9 @@ module SalesforceBulkQuery
         end
       end
 
-    def query_count(sobject, from, to)
+    def query_count(sobject, date_field, from, to)
       # do it with retries, if it doesn't succeed, return nil, don't fail.
-      soql = "SELECT COUNT() FROM #{sobject} WHERE
+      soql = "SELECT COUNT() FROM #{sobject} WHERE #{date_field} >= #{from} AND #{date_field} < #{to}"
       begin
         with_retries do
           q = @client.query(soql)
@@ -135,4 +135,4 @@ module SalesforceBulkQuery
       }
     end
   end
-end
+end
data/lib/salesforce_bulk_query/job.rb
CHANGED
@@ -16,6 +16,7 @@ module SalesforceBulkQuery
       @connection = connection
       @logger = options[:logger]
       @job_time_limit = options[:job_time_limit] || JOB_TIME_LIMIT
+      @date_field = options[:date_field] or fail "date_field must be given when creating a batch"
 
       # all batches (static)
       @batches = []
@@ -43,7 +44,7 @@ module SalesforceBulkQuery
     end
 
     def get_extended_soql(soql, from, to)
-      return "#{soql} WHERE
+      return "#{soql} WHERE #{@date_field} >= #{from} AND #{@date_field} < #{to}"
     end
 
     def generate_batches(soql, start, stop, single_batch=false)
@@ -88,7 +89,8 @@ module SalesforceBulkQuery
         :connection => @connection,
         :start => options[:start],
         :stop => options[:stop],
-        :logger => @logger
+        :logger => @logger,
+        :date_field => @date_field
       )
       batch.create
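For a concrete sense of what the new `get_extended_soql` produces, here is the subquery SOQL for one interval, with illustrative values. Note that SOQL datetime literals are unquoted, which is why the plain string interpolation works:

```ruby
# Illustrative values only.
soql       = "SELECT Id, Name FROM Account"
date_field = 'CreatedDate'
from       = '2014-01-01T00:00:00.000Z'
to         = '2014-02-01T00:00:00.000Z'

puts "#{soql} WHERE #{date_field} >= #{from} AND #{date_field} < #{to}"
# SELECT Id, Name FROM Account WHERE CreatedDate >= 2014-01-01T00:00:00.000Z AND CreatedDate < 2014-02-01T00:00:00.000Z
```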
data/lib/salesforce_bulk_query/query.rb
CHANGED
@@ -6,18 +6,21 @@ module SalesforceBulkQuery
   # Abstraction of a single user-given query. It contains multiple jobs, is tied to a specific connection
   class Query
 
-    # if no
+    # if no date_to is given we use the current time with this offset
     # subtracted (to make sure the freshest changes that can be inconsistent
     # aren't there) It's in minutes
     OFFSET_FROM_NOW = 10
 
+    DEFAULT_DATE_FIELD = 'CreatedDate'
+
     def initialize(sobject, soql, connection, options={})
       @sobject = sobject
       @soql = soql
       @connection = connection
       @logger = options[:logger]
-      @
-      @
+      @date_field = options[:date_field] || DEFAULT_DATE_FIELD
+      @date_from = options[:date_from]
+      @date_to = options[:date_to]
       @single_batch = options[:single_batch]
 
       # jobs currently running
@@ -41,31 +44,23 @@ module SalesforceBulkQuery
     def start(options={})
       # order by and where not allowed
       if (!@single_batch) && (@soql =~ /WHERE/i || @soql =~ /ORDER BY/i)
-        raise "You can't have WHERE or ORDER BY in your soql. If you want to download just specific date range use
+        raise "You can't have WHERE or ORDER BY in your soql. If you want to download just specific date range use date_from / date_to"
      end
 
       # create the first job
-      job = SalesforceBulkQuery::Job.new(
+      job = SalesforceBulkQuery::Job.new(
+        @sobject,
+        @connection,
+        {:logger => @logger, :date_field => @date_field}.merge(options)
+      )
       job.create_job
 
       # get the date when it should start
-
-        min_created = @created_from
-      else
-        # get the date when the first was created
-        min_created = nil
-        begin
-          min_created_resp = @connection.client.query("SELECT CreatedDate FROM #{@sobject} ORDER BY CreatedDate LIMIT 1")
-          min_created_resp.each {|s| min_created = s[:CreatedDate]}
-        rescue Faraday::Error::TimeoutError => e
-          @logger.warn "Timeout getting the oldest object for #{@sobject}. Error: #{e}. Using the default value" if @logger
-          min_created = DEFAULT_MIN_CREATED
-        end
-      end
+      min_date = get_min_date
 
       # generate intervals
-      start = DateTime.parse(
-      stop = @
+      start = DateTime.parse(min_date)
+      stop = @date_to ? DateTime.parse(@date_to) : DateTime.now - Rational(options[:offset_from_now] || OFFSET_FROM_NOW, 1440)
       job.generate_batches(@soql, start, stop, @single_batch)
 
       job.close_job
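One non-obvious detail in the new `stop` line above: Ruby's `DateTime` arithmetic works in days, and 1440 is the number of minutes in a day, so subtracting `Rational(OFFSET_FROM_NOW, 1440)` moves the timestamp back exactly ten minutes. A quick check:

```ruby
require 'date'

now  = DateTime.now
stop = now - Rational(10, 1440)  # 10 minutes before now

((now - stop) * 24 * 60).to_i    # => 10
```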
@@ -112,7 +107,11 @@ module SalesforceBulkQuery
       to_split.each do |batch|
         # for each unfinished batch create a new job and add it to new jobs
         @logger.info "The following subquery didn't end in time / failed verification: #{batch.soql}. Dividing into multiple and running again" if @logger
-        new_job = SalesforceBulkQuery::Job.new(
+        new_job = SalesforceBulkQuery::Job.new(
+          @sobject,
+          @connection,
+          {:logger => @logger, :date_field => @date_field}.merge(options)
+        )
         new_job.create_job
         new_job.generate_batches(@soql, batch.start, batch.stop)
         new_job.close_job
@@ -153,5 +152,26 @@ module SalesforceBulkQuery
         :jobs_done => @jobs_done.map { |j| j.job_id }
       }
     end
+
+    private
+
+    def get_min_date
+      if @date_from
+        return @date_from
+      end
+
+      # get the date when the first was created
+      min_created = nil
+      begin
+        min_created_resp = @connection.client.query("SELECT #{@date_field} FROM #{@sobject} ORDER BY #{@date_field} LIMIT 1")
+        min_created_resp.each {|s| min_created = s[@date_field.to_sym]}
+      rescue Faraday::Error::TimeoutError => e
+        @logger.warn "Timeout getting the oldest object for #{@sobject}. Error: #{e}. Using the default value" if @logger
+        min_created = DEFAULT_MIN_CREATED
+      rescue Faraday::Error::ClientError => e
+        fail ArgumentError, "Error when trying to get the oldest record according to #{@date_field}, looks like #{@date_field} is not on #{@sobject}. Original error: #{e}\n #{e.message} \n #{e.backtrace} "
+      end
+      min_created
+    end
   end
 end
data/new-version.sh
ADDED
@@ -0,0 +1,22 @@
+#!/bin/sh
+
+# get the new version
+VERSION=`bundle exec ruby <<-EORUBY
+
+require 'salesforce_bulk_query'
+puts SalesforceBulkQuery::VERSION
+
+EORUBY`
+
+# create tag and push it
+TAG="v$VERSION"
+git tag $TAG
+git push origin $TAG
+
+# build and push the gem
+gem build salesforce_bulk_query.gemspec
+gem push "salesforce_bulk_query-$VERSION.gem"
+
+# update the gem after a few secs
+wait 30
+gem update salesforce_bulk_query
data/spec/salesforce_bulk_query_spec.rb
CHANGED
@@ -104,15 +104,17 @@ describe SalesforceBulkQuery do
       from = "#{frm}T00:00:00.000Z"
       t = "2020-01-01"
       to = "#{t}T00:00:00.000Z"
+      field = 'SystemModstamp'
       result = @api.query(
         "Account",
         "SELECT Id, Name, Industry, Type FROM Account",
         :check_interval => 30,
         :directory_path => tmp,
-        :
-        :
+        :date_from => from,
+        :date_to => to,
         :single_batch => true,
-        :count_lines => true
+        :count_lines => true,
+        :date_field => field
       )
 
       result[:filenames].should have(1).items
@@ -132,7 +134,25 @@ describe SalesforceBulkQuery do
       filename.should match(tmp)
       filename.should match(frm)
      filename.should match(t)
+      filename.should match(field)
+    end
+  end
+  context "when you give it a bad date_field" do
+    it "fails with argument error with no from date" do
+      expect{@api.query(@entity, "SELECT Id, CreatedDate FROM #{@entity}", :date_field => 'SomethingInvalid')}.to raise_error(ArgumentError)
     end
+    it "fails with argument error with given from date" do
+      from = "2000-01-01T00:00:00.000Z"
+      expect{
+        @api.query(
+          @entity,
+          "SELECT Id, CreatedDate FROM #{@entity}",
+          :date_field => 'SomethingInvalid',
+          :date_from => from
+        )
+      }.to raise_error(ArgumentError)
+    end
+
   end
   context "when you give it a short time limit" do
     it "downloads some stuff is unfinished" do
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: salesforce_bulk_query
 version: !ruby/object:Gem::Version
-  version: 0.1.6
+  version: 0.2.0
 platform: ruby
 authors:
 - Petr Cvengros
@@ -174,6 +174,7 @@ files:
 - lib/salesforce_bulk_query/query.rb
 - lib/salesforce_bulk_query/utils.rb
 - lib/salesforce_bulk_query/version.rb
+- new-version.sh
 - salesforce_bulk_query.gemspec
 - spec/salesforce_bulk_query_spec.rb
 - spec/spec_helper.rb