salesforce_bulk_query 0.1.6 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 93410e3eaddd8559f8127d9cad6ed1223093da53
-  data.tar.gz: f24d761b7d5e39c67efc54eb99c9e9006a7b3330
+  metadata.gz: 3c1805666d306731c5d9118b5e231ae9940fc92f
+  data.tar.gz: 347fc0ada770fd4ae8b7bcaaf32d6b3b14516a9e
 SHA512:
-  metadata.gz: d849ee9f3525ecd0a92c57d4c997c7158d7b830892b0aaa221b9b38d7e55d0042d08dd19a020f1729d7deea37df9790d1e81bf4dab6bf15042cf2197d5501033
-  data.tar.gz: 11c53de51a4c6cbe7541e63cf2f236900b559d12b464e287c79c7b800db1d13ff1ff6ae097a21030c59cb6cb95d24c2512844c459728b73a51aed6b8aba380dd
+  metadata.gz: 38d588dea39c3f5ce3d7bb230d8af27729c236ebc9a597d58566af64f8f8dee3f2cc459cf1f893e4e344bf8a53a9e2e0f4bd89d3f9e79a28e0d34719d81bc0e7
+  data.tar.gz: 1fd675272be999224f69cef8a6dbc593e00d0bf9d9879d9e600228b42687969279a2960dc29380365877454d93bea831e5662ecba3172f5df78291b5c65c04dd
data/README.md CHANGED
@@ -77,7 +77,7 @@ end
 
 ## How it works
 
-The library uses the [Salesforce Bulk API](https://www.salesforce.com/us/developer/docs/api_asynch/index_Left.htm#CSHID=asynch_api_bulk_query.htm|StartTopic=Content%2Fasynch_api_bulk_query.htm|SkinName=webhelp). The given query is divided into 15 subqueries, according to the [limits](http://www.salesforce.com/us/developer/docs/api_asynchpre/Content/asynch_api_concepts_limits.htm#batch_proc_time_title). Each subquery is an interval based on the CreatedDate Salesforce field. The limits are passed to the API in SOQL queries. Subqueries are sent to the API as batches and added to a job.
+The library uses the [Salesforce Bulk API](https://www.salesforce.com/us/developer/docs/api_asynch/index_Left.htm#CSHID=asynch_api_bulk_query.htm|StartTopic=Content%2Fasynch_api_bulk_query.htm|SkinName=webhelp). The given query is divided into 15 subqueries, according to the [limits](http://www.salesforce.com/us/developer/docs/api_asynchpre/Content/asynch_api_concepts_limits.htm#batch_proc_time_title). Each subquery is an interval based on the CreatedDate Salesforce field (date field can be customized). The limits are passed to the API in SOQL queries. Subqueries are sent to the API as batches and added to a job.
 
 The first interval starts with the date the first Salesforce object was created, we query Salesforce REST API for that. If this query times out, we use a constant. The last interval ends a few minutes before now to avoid consistency issues. Custom start and end can be passed - see Options.
 
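The interval arithmetic described here is simple enough to sketch. The following is an illustration of the idea only, not the gem's implementation; the `split_into_intervals` helper and the hard-coded count of 15 are ours:

```ruby
require 'date'

# Divide [start, stop) into n contiguous, equally long intervals.
# Each interval becomes one subquery (one Bulk API batch) in the job.
def split_into_intervals(start, stop, n = 15)
  step = (stop - start) / n  # DateTime subtraction yields a Rational number of days
  (0...n).map { |i| [start + step * i, start + step * (i + 1)] }
end

intervals = split_into_intervals(
  DateTime.parse('2010-01-01T00:00:00.000Z'),
  DateTime.parse('2015-01-01T00:00:00.000Z')
)
# => 15 [from, to) pairs covering the whole range
```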
@@ -87,13 +87,14 @@ CSV results are downloaded by chunks, so that we don't run into memory related i
 
 ## Options
 There are a few optional settings you can pass to the `Api` methods:
-* `api_version`: which Salesforce api version should be used
-* `logger`: where logs should go
-* `filename_prefix`: prefix applied to csv files
-* `directory_path`: custom direcotory path for CSVs, if omitted, a new temp directory is created
-* `check_interval`: how often the results should be checked in secs.
-* `time_limit`: maximum time the query can take. If this time limit is exceeded, available results are downloaded and the list of subqueries that didn't finished is returned. In seconds. The limti should be understood as limit for waiting. When the limit is reached the function downloads data that is ready which can take some additonal time. If no limit is given the query runs until it finishes
-* `created_from`, `created_to`: limits for the CreatedDate field. Note that queries can't contain any WHERE statements as we're doing some manipulations to create subqueries and we don't want things to get too difficult. So this is the way to limit the query yourself. The format is like `"1999-01-01T00:00:00.000Z"`
+* `api_version`: Which Salesforce API version should be used.
+* `logger`: Where logs should go; see the example.
+* `filename_prefix`: Prefix applied to CSV files.
+* `directory_path`: Custom directory path for CSVs; if omitted, a new temp directory is created.
+* `check_interval`: How often the results should be checked, in seconds.
+* `time_limit`: Maximum time the query can take, in seconds. If this limit is exceeded, available results are downloaded and the list of subqueries that didn't finish is returned. The limit should be understood as a limit on waiting: once it's reached, the function downloads whatever data is ready, which can take some additional time. If no limit is given, the query runs until it finishes.
+* `date_field`: Salesforce date field that will be used for splitting the query and, optionally, limiting the results. Must be a date field on the queried sobject. Default is `CreatedDate`.
+* `date_from`, `date_to`: Limits for the `date_field` (`CreatedDate` by default). Note that queries can't contain any WHERE statements, as we manipulate the query to create the subqueries, so this is the way to limit the results yourself. The format is like `"1999-01-01T00:00:00.000Z"`.
 * `single_batch`: If true, the queries are not divided into subqueries as described above. Instead one batch job is created with the given query. This is faster for small amount of data, but will fail with a timeout if you have a lot of data.
 
 See specs for exact usage.
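To make the options above concrete, here is a minimal sketch of a call that uses the new date options (the Restforce credentials are placeholders, and `SystemModstamp` is just an example of a date field other than the default `CreatedDate`):

```ruby
require 'logger'
require 'restforce'
require 'salesforce_bulk_query'

# placeholder credentials -- substitute your own
restforce = Restforce.new(
  :username => 'me@example.com',
  :password => 'password',
  :security_token => 'token',
  :client_id => 'client id',
  :client_secret => 'client secret'
)
api = SalesforceBulkQuery::Api.new(restforce, :logger => Logger.new(STDOUT))

# split and limit on SystemModstamp instead of the default CreatedDate
result = api.query(
  'Account',
  'SELECT Id, Name FROM Account',
  :date_field => 'SystemModstamp',
  :date_from => '2014-01-01T00:00:00.000Z',
  :date_to => '2014-06-01T00:00:00.000Z',
  :check_interval => 30
)
result[:filenames] # paths to the downloaded CSV chunks
```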
@@ -119,10 +120,14 @@ If you're using Restforce as a client (which you probably are) and you want to d
 
 ## Notes
 
-Query (user given) -> Job (Salesforce construct that encapsulates 15 batches) -> Batch (1 SOQL with CreatedDate constraints)
+Query (user given) -> Job (Salesforce construct that encapsulates 15 batches) -> Batch (1 SOQL with date constraints)
 
 At the beginning the query is divided into 15 subqueries and put into a single job. When one of the subqueries fails, a new job with 15 subqueries is created, the range of the failed query is divided into 15 sub-subqueries.
 
+## Change policy
+
+The gem tries to follow [semantic versioning](http://semver.org/). All methods and options described in this README are considered public API. If any of these change, at least the minor version is bumped. Methods and their params not described in this document can change even in patches.
+
 ## Running tests locally
 Travis CI is set up for this repository to make sure all the tests are passing with each commit.
 
lib/salesforce_bulk_query/batch.rb CHANGED
@@ -12,6 +12,7 @@ module SalesforceBulkQuery
       @soql = options[:soql]
       @job_id = options[:job_id]
       @connection = options[:connection]
+      @date_field = options[:date_field] or fail "date_field must be given when creating a batch"
       @start = options[:start]
       @stop = options[:stop]
       @logger = options[:logger]
@@ -77,7 +78,7 @@ module SalesforceBulkQuery
     end
 
     def get_filename
-      return "#{@sobject}_#{@start}_#{@stop}_#{@batch_id}.csv"
+      return "#{@sobject}_#{@date_field}_#{@start}_#{@stop}_#{@batch_id}.csv"
     end
 
     def get_result(options={})
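With the date field in the name, a batch CSV now comes out as, e.g., `Account_SystemModstamp_<start>_<stop>_<batch_id>.csv` (values hypothetical); the spec change further down asserts that the date field appears in the filename.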
@@ -116,7 +117,7 @@ module SalesforceBulkQuery
     end
 
     def verify
-      api_count = @connection.query_count(@sobject, @start, @stop)
+      api_count = @connection.query_count(@sobject, @date_field, @start, @stop)
       # if we weren't able to get the count, fail.
       if api_count.nil?
         return @verification = false
lib/salesforce_bulk_query/connection.rb CHANGED
@@ -112,9 +112,9 @@ module SalesforceBulkQuery
       end
     end
 
-    def query_count(sobject, from, to)
+    def query_count(sobject, date_field, from, to)
       # do it with retries, if it doesn't succeed, return nil, don't fail.
-      soql = "SELECT COUNT() FROM #{sobject} WHERE CreatedDate >= #{from} AND CreatedDate < #{to}"
+      soql = "SELECT COUNT() FROM #{sobject} WHERE #{date_field} >= #{from} AND #{date_field} < #{to}"
       begin
         with_retries do
           q = @client.query(soql)
@@ -135,4 +135,4 @@ module SalesforceBulkQuery
       }
     end
   end
-end
+end
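For a concrete picture of what the reworked `query_count` sends to Salesforce, this is the SOQL a hypothetical call would build (values made up; SOQL datetime literals are unquoted, which is why the dates are interpolated without quotes):

```ruby
# query_count('Opportunity', 'SystemModstamp',
#             '2014-01-01T00:00:00.000Z', '2014-06-01T00:00:00.000Z')
# builds:
#   SELECT COUNT() FROM Opportunity
#   WHERE SystemModstamp >= 2014-01-01T00:00:00.000Z
#     AND SystemModstamp < 2014-06-01T00:00:00.000Z
```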
lib/salesforce_bulk_query/job.rb CHANGED
@@ -16,6 +16,7 @@ module SalesforceBulkQuery
       @connection = connection
       @logger = options[:logger]
       @job_time_limit = options[:job_time_limit] || JOB_TIME_LIMIT
+      @date_field = options[:date_field] or fail "date_field must be given when creating a job"
 
       # all batches (static)
       @batches = []
@@ -43,7 +44,7 @@ module SalesforceBulkQuery
     end
 
     def get_extended_soql(soql, from, to)
-      return "#{soql} WHERE CreatedDate >= #{from} AND CreatedDate < #{to}"
+      return "#{soql} WHERE #{@date_field} >= #{from} AND #{@date_field} < #{to}"
     end
 
     def generate_batches(soql, start, stop, single_batch=false)
@@ -88,7 +89,8 @@ module SalesforceBulkQuery
         :connection => @connection,
         :start => options[:start],
         :stop => options[:stop],
-        :logger => @logger
+        :logger => @logger,
+        :date_field => @date_field
       )
       batch.create
 
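`get_extended_soql` now stamps the job's configured date field into each subquery the same way. Assuming a job created with `:date_field => 'SystemModstamp'` (values hypothetical again):

```ruby
# get_extended_soql('SELECT Id, Name FROM Account',
#                   '2014-01-01T00:00:00.000Z', '2014-02-01T00:00:00.000Z')
# => "SELECT Id, Name FROM Account WHERE SystemModstamp >= 2014-01-01T00:00:00.000Z AND SystemModstamp < 2014-02-01T00:00:00.000Z"
```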
lib/salesforce_bulk_query/query.rb CHANGED
@@ -6,18 +6,21 @@ module SalesforceBulkQuery
   # Abstraction of a single user-given query. It contains multiple jobs, is tied to a specific connection
   class Query
 
-    # if no created_to is given we use the current time with this offset
+    # if no date_to is given we use the current time with this offset
     # subtracted (to make sure the freshest changes that can be inconsistent
     # aren't there) It's in minutes
     OFFSET_FROM_NOW = 10
 
+    DEFAULT_DATE_FIELD = 'CreatedDate'
+
     def initialize(sobject, soql, connection, options={})
       @sobject = sobject
       @soql = soql
       @connection = connection
       @logger = options[:logger]
-      @created_from = options[:created_from]
-      @created_to = options[:created_to]
+      @date_field = options[:date_field] || DEFAULT_DATE_FIELD
+      @date_from = options[:date_from]
+      @date_to = options[:date_to]
       @single_batch = options[:single_batch]
 
       # jobs currently running
@@ -41,31 +44,23 @@ module SalesforceBulkQuery
     def start(options={})
       # order by and where not allowed
       if (!@single_batch) && (@soql =~ /WHERE/i || @soql =~ /ORDER BY/i)
-        raise "You can't have WHERE or ORDER BY in your soql. If you want to download just specific date range use created_from / created_to"
+        raise "You can't have WHERE or ORDER BY in your soql. If you want to download just a specific date range, use date_from / date_to"
       end
 
       # create the first job
-      job = SalesforceBulkQuery::Job.new(@sobject, @connection, {:logger => @logger}.merge(options))
+      job = SalesforceBulkQuery::Job.new(
+        @sobject,
+        @connection,
+        {:logger => @logger, :date_field => @date_field}.merge(options)
+      )
       job.create_job
 
       # get the date when it should start
-      if @created_from
-        min_created = @created_from
-      else
-        # get the date when the first was created
-        min_created = nil
-        begin
-          min_created_resp = @connection.client.query("SELECT CreatedDate FROM #{@sobject} ORDER BY CreatedDate LIMIT 1")
-          min_created_resp.each {|s| min_created = s[:CreatedDate]}
-        rescue Faraday::Error::TimeoutError => e
-          @logger.warn "Timeout getting the oldest object for #{@sobject}. Error: #{e}. Using the default value" if @logger
-          min_created = DEFAULT_MIN_CREATED
-        end
-      end
+      min_date = get_min_date
 
       # generate intervals
-      start = DateTime.parse(min_created)
-      stop = @created_to ? DateTime.parse(@created_to) : DateTime.now - Rational(options[:offset_from_now] || OFFSET_FROM_NOW, 1440)
+      start = DateTime.parse(min_date)
+      stop = @date_to ? DateTime.parse(@date_to) : DateTime.now - Rational(options[:offset_from_now] || OFFSET_FROM_NOW, 1440)
       job.generate_batches(@soql, start, stop, @single_batch)
 
       job.close_job
@@ -112,7 +107,11 @@ module SalesforceBulkQuery
       to_split.each do |batch|
         # for each unfinished batch create a new job and add it to new jobs
         @logger.info "The following subquery didn't end in time / failed verification: #{batch.soql}. Dividing into multiple and running again" if @logger
-        new_job = SalesforceBulkQuery::Job.new(@sobject, @connection, {:logger => @logger}.merge(options))
+        new_job = SalesforceBulkQuery::Job.new(
+          @sobject,
+          @connection,
+          {:logger => @logger, :date_field => @date_field}.merge(options)
+        )
         new_job.create_job
         new_job.generate_batches(@soql, batch.start, batch.stop)
         new_job.close_job
@@ -153,5 +152,26 @@ module SalesforceBulkQuery
         :jobs_done => @jobs_done.map { |j| j.job_id }
       }
     end
+
+    private
+
+    def get_min_date
+      if @date_from
+        return @date_from
+      end
+
+      # get the date when the first was created
+      min_created = nil
+      begin
+        min_created_resp = @connection.client.query("SELECT #{@date_field} FROM #{@sobject} ORDER BY #{@date_field} LIMIT 1")
+        min_created_resp.each {|s| min_created = s[@date_field.to_sym]}
+      rescue Faraday::Error::TimeoutError => e
+        @logger.warn "Timeout getting the oldest object for #{@sobject}. Error: #{e}. Using the default value" if @logger
+        min_created = DEFAULT_MIN_CREATED
+      rescue Faraday::Error::ClientError => e
+        fail ArgumentError, "Error when trying to get the oldest record according to #{@date_field}, looks like #{@date_field} is not on #{@sobject}. Original error: #{e}\n#{e.message}\n#{e.backtrace}"
+      end
+      min_created
+    end
   end
 end
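The extracted `get_min_date` separates the two ways the starting date can be determined. A sketch of what each path means for the caller, reusing the hypothetical `api` object from the Options example above:

```ruby
# 1. date_from given -- used as-is, no extra REST call is made
api.query('Account', 'SELECT Id FROM Account',
          :date_field => 'SystemModstamp',
          :date_from  => '2014-01-01T00:00:00.000Z')

# 2. date_from omitted -- the gem first asks the REST API for the oldest record:
#      SELECT SystemModstamp FROM Account ORDER BY SystemModstamp LIMIT 1
#    On timeout it falls back to DEFAULT_MIN_CREATED; if the field doesn't exist
#    on the sobject, it now raises ArgumentError instead of failing deep in the job.
api.query('Account', 'SELECT Id FROM Account', :date_field => 'SystemModstamp')
```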
lib/salesforce_bulk_query/version.rb CHANGED
@@ -1,3 +1,3 @@
 module SalesforceBulkQuery
-  VERSION = '0.1.6'
+  VERSION = '0.2.0'
 end
new-version.sh ADDED
@@ -0,0 +1,22 @@
+#!/bin/sh
+
+# get the new version
+VERSION=`bundle exec ruby <<-EORUBY
+
+require 'salesforce_bulk_query'
+puts SalesforceBulkQuery::VERSION
+
+EORUBY`
+
+# create tag and push it
+TAG="v$VERSION"
+git tag $TAG
+git push origin $TAG
+
+# build and push the gem
+gem build salesforce_bulk_query.gemspec
+gem push "salesforce_bulk_query-$VERSION.gem"
+
+# update the gem after a few secs
+sleep 30
+gem update salesforce_bulk_query
spec/salesforce_bulk_query_spec.rb CHANGED
@@ -104,15 +104,17 @@ describe SalesforceBulkQuery do
       from = "#{frm}T00:00:00.000Z"
       t = "2020-01-01"
       to = "#{t}T00:00:00.000Z"
+      field = 'SystemModstamp'
       result = @api.query(
        "Account",
        "SELECT Id, Name, Industry, Type FROM Account",
        :check_interval => 30,
        :directory_path => tmp,
-       :created_from => from,
-       :created_to => to,
+       :date_from => from,
+       :date_to => to,
        :single_batch => true,
-       :count_lines => true
+       :count_lines => true,
+       :date_field => field
      )
 
      result[:filenames].should have(1).items
@@ -132,7 +134,25 @@ describe SalesforceBulkQuery do
       filename.should match(tmp)
       filename.should match(frm)
       filename.should match(t)
+      filename.should match(field)
+    end
+  end
+  context "when you give it a bad date_field" do
+    it "fails with argument error with no from date" do
+      expect{@api.query(@entity, "SELECT Id, CreatedDate FROM #{@entity}", :date_field => 'SomethingInvalid')}.to raise_error(ArgumentError)
     end
+    it "fails with argument error with given from date" do
+      from = "2000-01-01T00:00:00.000Z"
+      expect{
+        @api.query(
+          @entity,
+          "SELECT Id, CreatedDate FROM #{@entity}",
+          :date_field => 'SomethingInvalid',
+          :date_from => from
+        )
+      }.to raise_error(ArgumentError)
+    end
+
   end
   context "when you give it a short time limit" do
     it "downloads some stuff is unfinished" do
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: salesforce_bulk_query
 version: !ruby/object:Gem::Version
-  version: 0.1.6
+  version: 0.2.0
 platform: ruby
 authors:
 - Petr Cvengros
@@ -174,6 +174,7 @@ files:
 - lib/salesforce_bulk_query/query.rb
 - lib/salesforce_bulk_query/utils.rb
 - lib/salesforce_bulk_query/version.rb
+- new-version.sh
 - salesforce_bulk_query.gemspec
 - spec/salesforce_bulk_query_spec.rb
 - spec/spec_helper.rb