embulk-input-google_analytics 0.1.0 → 0.1.1
- checksums.yaml +4 -4
- data/CHANGELOG.md +5 -0
- data/README.md +39 -2
- data/Rakefile +6 -0
- data/embulk-input-google_analytics.gemspec +4 -2
- data/lib/embulk/input/google_analytics/client.rb +61 -15
- data/lib/embulk/input/google_analytics/plugin.rb +64 -3
- data/service_account.png +0 -0
- data/test/embulk/input/google_analytics/test_client.rb +91 -4
- data/test/embulk/input/google_analytics/test_plugin.rb +190 -0
- data/test/fixtures/valid.yml +2 -0
- metadata +31 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 122431f951c569688cc1df231944216e766f9356
|
4
|
+
data.tar.gz: 687477c6f8e0c08853b07e681bd3f3e39e81f876
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 226f36d2128a30c5f30da1083c6f9bb63db86bb7cd08a9221c3f55f891fee816bc2e154fcf0e48eadbb9f71c959547ca6c644e3cbebe3fdce5b20fcb5234b9b7
|
7
|
+
data.tar.gz: 2c1f6b067797788227f46210ba99270d80f6720da43cb97349fa439e7463fadc6b150c05f665ca60e15388cdf57ac4b67d409a5d2da4dc8e9b73e1b55e9f0854
|
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,8 @@
|
|
1
|
+
## 0.1.1 - 2016-07-13
|
2
|
+
* Enable scheduled execution [#4](https://github.com/treasure-data/embulk-input-google_analytics/pull/4)
|
3
|
+
* Error handling [#6](https://github.com/treasure-data/embulk-input-google_analytics/pull/6)
|
4
|
+
* Ignore too early accessing data due to it is not fixed value [#5](https://github.com/treasure-data/embulk-input-google_analytics/pull/5)
|
5
|
+
|
1
6
|
## 0.1.0 - 2016-07-07
|
2
7
|
|
3
8
|
The first release!!
|
data/README.md
CHANGED
@@ -15,8 +15,45 @@ Embulk input plugin for Google Analytics reports.
|
|
15
15
|
- **time_series**: Only `ga:dateHour` or `ga:date` (string, required)
|
16
16
|
- **dimensions**: Target dimensions (array, default: `[]` )
|
17
17
|
- **metrics**: Target metrics (array, default: `[]` )
|
18
|
-
- **start_date**: Target report start date (string, default: [7 days ago](https://developers.google.com/analytics/devguides/reporting/core/v4/rest/v4/reports/batchGet#reportrequest))
|
19
|
-
- **end_date**: Target report end date (string, default: [1 day ago](https://developers.google.com/analytics/devguides/reporting/core/v4/rest/v4/reports/batchGet#reportrequest))
|
18
|
+
- **start_date**: Target report start date. Valid format is "YYYY-MM-DD". (string, default: [7 days ago](https://developers.google.com/analytics/devguides/reporting/core/v4/rest/v4/reports/batchGet#reportrequest))
|
19
|
+
- **end_date**: Target report end date. Valid format is "YYYY-MM-DD". (string, default: [1 day ago](https://developers.google.com/analytics/devguides/reporting/core/v4/rest/v4/reports/batchGet#reportrequest))
|
20
|
+
- **incremental**: `true` for generate "config_diff" with `embulk run -c config.diff` (bool, default: true)
|
21
|
+
- **last_record_time**: Ignore fetched records until this time. Mainly for incremental:true. (string, default: nil)
|
22
|
+
- **retry_limit**: Try to retry this times (integer, default: 5)
|
23
|
+
- **retry_initial_wait_sec**: Wait seconds for exponential backoff initial value (integer, default: 2)
|
24
|
+
|
25
|
+
### About `json_key_content` option.
|
26
|
+
|
27
|
+
You need a service account on Google.
|
28
|
+
|
29
|
+
<ol>
|
30
|
+
<li>Open the <a href="https://console.developers.google.com/permissions/serviceaccounts"><b>Service accounts</b> page</a>. If prompted,
|
31
|
+
select a project.</li>
|
32
|
+
<li>Click <b>Create service account</b>.</li>
|
33
|
+
<li>
|
34
|
+
|
35
|
+
In the <b>Create service account</b> window, type a name for the service
|
36
|
+
account, and select <b>Furnish a new private key</b>. If you want to
|
37
|
+
<a href="https://developers.google.com/identity/protocols/OAuth2ServiceAccount#delegatingauthority">grant
|
38
|
+
Google Apps domain-wide authority</a> to the service account, also select
|
39
|
+
<b>Enable Google Apps Domain-wide Delegation</b>.
|
40
|
+
|
41
|
+
Then click <b>Create</b>.</li>
|
42
|
+
</ol>
|
43
|
+
From: <https://developers.google.com/identity/protocols/OAuth2ServiceAccount>
|
44
|
+
|
45
|
+
Screenshot: 
|
46
|
+
|
47
|
+
## Why the result doesn't match with web interface?
|
48
|
+
|
49
|
+
Google Reporting API uses "sampling" data.
|
50
|
+
|
51
|
+
- https://developers.google.com/analytics/devguides/reporting/core/v4/basics#sampling
|
52
|
+
- https://support.google.com/analytics/answer/2637192
|
53
|
+
|
54
|
+
That means sometimes result will be unmatched with Google Analytics web interface, and the result is based on sampled data, not all of raw data. This is a Google API's limitation.
|
55
|
+
|
56
|
+
Currently a sampling level supported by this plugin is DEFAULT only. Let us know if you want to use other sampling level (SMALL or LARGE).
|
20
57
|
|
21
58
|
## Example
|
22
59
|
|
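The `retry_limit` and `retry_initial_wait_sec` options documented above combine into an exponential backoff schedule. The following is a minimal sketch of that schedule using the same formula as the `config.sleep` lambda in client.rb. The helper name `backoff_waits` is hypothetical and not part of the plugin, and the sketch assumes the retry counter `n` starts at 1.

```ruby
# Hypothetical helper (not part of the plugin): lists the wait in seconds
# before each retry, assuming the retry counter n starts at 1.
# Formula mirrors client.rb: retry_initial_wait_sec * (2 ** (n - 1))
def backoff_waits(retry_limit: 5, initial_wait_sec: 2)
  (1..retry_limit).map { |n| initial_wait_sec * (2**(n - 1)) }
end

# With the README defaults (retry_limit: 5, retry_initial_wait_sec: 2)
p backoff_waits                                       # prints [2, 4, 8, 16, 32]
p backoff_waits(retry_limit: 3, initial_wait_sec: 1)  # prints [1, 2, 4]
```

Under these assumptions the default settings would wait up to 62 seconds in total before giving up.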
data/Rakefile
CHANGED
@@ -1,4 +1,5 @@
|
|
1
1
|
require "bundler/gem_tasks"
|
2
|
+
require "gem_release_helper/tasks"
|
2
3
|
|
3
4
|
task default: :test
|
4
5
|
|
@@ -13,3 +14,8 @@ task :cov do
|
|
13
14
|
ruby("--debug", "test/run-test.rb", "--use-color=yes", "--collector=dir")
|
14
15
|
end
|
15
16
|
|
17
|
+
GemReleaseHelper::Tasks.install({
|
18
|
+
gemspec: "./embulk-input-google_analytics.gemspec",
|
19
|
+
github_name: "treasure-data/embulk-input-google_analytics",
|
20
|
+
})
|
21
|
+
|
data/embulk-input-google_analytics.gemspec
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
|
2
2
|
Gem::Specification.new do |spec|
|
3
3
|
spec.name = "embulk-input-google_analytics"
|
4
|
-
spec.version = "0.1.
|
4
|
+
spec.version = "0.1.1"
|
5
5
|
spec.authors = ["uu59"]
|
6
6
|
spec.summary = "Google Analytics input plugin for Embulk"
|
7
7
|
spec.description = "Loads records from Google Analytics."
|
@@ -16,7 +16,8 @@ Gem::Specification.new do |spec|
|
|
16
16
|
spec.add_dependency "httpclient"
|
17
17
|
spec.add_dependency "google-api-client", "~> 0.9"
|
18
18
|
spec.add_dependency "signet"
|
19
|
-
spec.add_dependency "activesupport" # for Time.zone.parse
|
19
|
+
spec.add_dependency "activesupport" # for Time.zone.parse, Time.zone.now
|
20
|
+
spec.add_dependency "perfect_retry", "~> 0.5"
|
20
21
|
|
21
22
|
spec.add_development_dependency 'embulk', ['>= 0.8.9']
|
22
23
|
spec.add_development_dependency 'bundler', ['>= 1.10.6']
|
@@ -26,4 +27,5 @@ Gem::Specification.new do |spec|
|
|
26
27
|
spec.add_development_dependency 'simplecov'
|
27
28
|
spec.add_development_dependency "codeclimate-test-reporter"
|
28
29
|
spec.add_development_dependency "pry"
|
30
|
+
spec.add_development_dependency "gem_release_helper", "~> 1.0"
|
29
31
|
end
|
data/lib/embulk/input/google_analytics/client.rb
CHANGED
@@ -1,3 +1,4 @@
|
|
1
|
+
require "perfect_retry"
|
1
2
|
require "active_support/core_ext/time"
|
2
3
|
require "google/apis/analyticsreporting_v4"
|
3
4
|
require "google/apis/analytics_v3"
|
@@ -46,8 +47,9 @@ module Embulk
|
|
46
47
|
dim = dimensions.zip(row[:dimensions]).to_h
|
47
48
|
met = metrics.zip(row[:metrics].first[:values]).to_h
|
48
49
|
format_row = dim.merge(met)
|
49
|
-
|
50
|
-
|
50
|
+
raw_time = format_row[task["time_series"]]
|
51
|
+
next if too_early_data?(raw_time)
|
52
|
+
format_row[task["time_series"]] = time_parse_with_profile_timezone(raw_time)
|
51
53
|
block.call format_row
|
52
54
|
end
|
53
55
|
|
@@ -80,7 +82,9 @@ module Embulk
|
|
80
82
|
service.authorization = auth
|
81
83
|
|
82
84
|
Embulk.logger.debug "Fetching profile from API"
|
83
|
-
|
85
|
+
retryer.with_retry do
|
86
|
+
service.list_profiles("~all", "~all")
|
87
|
+
end
|
84
88
|
end
|
85
89
|
|
86
90
|
def time_parse_with_profile_timezone(time_string)
|
@@ -93,11 +97,9 @@ module Embulk
|
|
93
97
|
end
|
94
98
|
parts = Date._strptime(time_string, date_format)
|
95
99
|
|
96
|
-
|
97
|
-
|
98
|
-
|
99
|
-
ensure
|
100
|
-
Time.zone = orig_timezone
|
100
|
+
swap_time_zone do
|
101
|
+
Time.zone.local(*parts.values_at(:year, :mon, :mday, :hour)).to_time
|
102
|
+
end
|
101
103
|
end
|
102
104
|
|
103
105
|
def get_reports(page_token = nil)
|
@@ -109,14 +111,18 @@ module Embulk
|
|
109
111
|
request.report_requests = build_report_request(page_token)
|
110
112
|
|
111
113
|
Embulk.logger.info "Query to Core Report API: #{request.to_json}"
|
112
|
-
|
114
|
+
retryer.with_retry do
|
115
|
+
service.batch_get_reports request
|
116
|
+
end
|
113
117
|
end
|
114
118
|
|
115
119
|
def get_columns_list
|
116
120
|
# https://developers.google.com/analytics/devguides/reporting/metadata/v3/reference/metadata/columns/list
|
117
121
|
service = Google::Apis::AnalyticsV3::AnalyticsService.new
|
118
122
|
service.authorization = auth
|
119
|
-
|
123
|
+
retryer.with_retry do
|
124
|
+
service.list_metadata_columns("ga").to_h[:items]
|
125
|
+
end
|
120
126
|
end
|
121
127
|
|
122
128
|
def build_report_request(page_token = nil)
|
@@ -147,13 +153,53 @@ module Embulk
|
|
147
153
|
end
|
148
154
|
|
149
155
|
def auth
|
150
|
-
|
151
|
-
|
152
|
-
|
153
|
-
|
154
|
-
|
156
|
+
retryer.with_retry do
|
157
|
+
Google::Auth::ServiceAccountCredentials.make_creds(
|
158
|
+
json_key_io: StringIO.new(task["json_key_content"]),
|
159
|
+
scope: "https://www.googleapis.com/auth/analytics.readonly"
|
160
|
+
)
|
161
|
+
end
|
162
|
+
rescue Google::Apis::AuthorizationError => e
|
155
163
|
raise ConfigError.new(e.message)
|
156
164
|
end
|
165
|
+
|
166
|
+
def swap_time_zone(&block)
|
167
|
+
orig_timezone = Time.zone
|
168
|
+
Time.zone = get_profile[:timezone]
|
169
|
+
yield
|
170
|
+
ensure
|
171
|
+
Time.zone = orig_timezone
|
172
|
+
end
|
173
|
+
|
174
|
+
def too_early_data?(time_str)
|
175
|
+
# fetching 20160720 data on 2016-07-20, it is too early fetching
|
176
|
+
swap_time_zone do
|
177
|
+
now = Time.zone.now
|
178
|
+
case task["time_series"]
|
179
|
+
when "ga:dateHour"
|
180
|
+
time_str.to_i >= now.strftime("%Y%m%d%H").to_i
|
181
|
+
when "ga:date"
|
182
|
+
time_str.to_i >= now.strftime("%Y%m%d").to_i
|
183
|
+
end
|
184
|
+
end
|
185
|
+
end
|
186
|
+
|
187
|
+
def retryer
|
188
|
+
PerfectRetry.new do |config|
|
189
|
+
config.limit = task["retry_limit"]
|
190
|
+
config.logger = Embulk.logger
|
191
|
+
config.log_level = nil
|
192
|
+
|
193
|
+
# https://developers.google.com/analytics/devguides/reporting/core/v4/errors
|
194
|
+
# https://developers.google.com/analytics/devguides/reporting/core/v4/limits-quotas#additional_quota
|
195
|
+
# https://github.com/google/google-api-ruby-client/blob/master/lib/google/apis/errors.rb
|
196
|
+
# https://github.com/google/google-api-ruby-client/blob/0.9.11/lib/google/apis/core/http_command.rb#L33
|
197
|
+
config.rescues = Google::Apis::Core::HttpCommand::RETRIABLE_ERRORS
|
198
|
+
config.dont_rescues = [Embulk::DataError, Embulk::ConfigError]
|
199
|
+
config.sleep = lambda{|n| task["retry_initial_wait_sec"]* (2 ** (n-1)) }
|
200
|
+
config.raise_original_error = true
|
201
|
+
end
|
202
|
+
end
|
157
203
|
end
|
158
204
|
end
|
159
205
|
end
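The `too_early_data?` check added to client.rb compares the raw `ga:dateHour` or `ga:date` string, read as an integer, against the current time formatted the same way. The sketch below restates that comparison in plain Ruby. The method name `too_early?` and the explicit `now` argument are assumptions made so the example runs without ActiveSupport; the plugin itself reads `Time.zone.now` inside `swap_time_zone`. The sketch also returns `false` for an unknown time series, where the original `case` would return `nil`.

```ruby
# Hypothetical standalone version of the check (not the plugin's method).
# `now` is passed in explicitly instead of being read from Time.zone.
def too_early?(time_str, time_series, now)
  pattern = case time_series
            when "ga:dateHour" then "%Y%m%d%H"
            when "ga:date"     then "%Y%m%d"
            else return false # assumption: the original returns nil here
            end
  # The current hour (or day) is still accumulating data, so skip it.
  time_str.to_i >= now.strftime(pattern).to_i
end

now = Time.new(2016, 6, 1, 5, 0, 0)
p too_early?("2016060104", "ga:dateHour", now) # prints false
p too_early?("2016060105", "ga:dateHour", now) # prints true
p too_early?("20160601",   "ga:date",     now) # prints true
```

The integer comparison works because both formats are zero-padded and ordered from most to least significant unit.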
|
data/lib/embulk/input/google_analytics/plugin.rb
CHANGED
@@ -42,7 +42,7 @@ module Embulk
|
|
42
42
|
def self.resume(task, columns, count, &control)
|
43
43
|
task_reports = yield(task, columns, count)
|
44
44
|
|
45
|
-
next_config_diff =
|
45
|
+
next_config_diff = task_reports.first
|
46
46
|
return next_config_diff
|
47
47
|
end
|
48
48
|
|
@@ -56,6 +56,10 @@ module Embulk
|
|
56
56
|
"time_series" => config.param("time_series", :string),
|
57
57
|
"start_date" => config.param("start_date", :string, default: nil),
|
58
58
|
"end_date" => config.param("end_date", :string, default: nil),
|
59
|
+
"incremental" => config.param("incremental", :bool, default: true),
|
60
|
+
"last_record_time" => config.param("last_record_time", :string, default: nil),
|
61
|
+
"retry_limit" => config.param("retry_limit", :integer, default: 5),
|
62
|
+
"retry_initial_wait_sec" => config.param("retry_initial_wait_sec", :integer, default: 2),
|
59
63
|
}
|
60
64
|
end
|
61
65
|
|
@@ -79,14 +83,28 @@ module Embulk
|
|
79
83
|
client = Client.new(task, preview?)
|
80
84
|
columns = self.class.columns_from_task(task)
|
81
85
|
|
86
|
+
last_record_time = task["last_record_time"] ? Time.parse(task["last_record_time"]) : nil
|
87
|
+
|
88
|
+
latest_time_series = nil
|
82
89
|
client.each_report_row do |row|
|
90
|
+
time = row[task["time_series"]]
|
91
|
+
next if last_record_time && time <= last_record_time
|
92
|
+
|
83
93
|
values = row.values_at(*columns)
|
84
94
|
page_builder.add values
|
95
|
+
|
96
|
+
latest_time_series = [
|
97
|
+
latest_time_series,
|
98
|
+
time,
|
99
|
+
].compact.max
|
85
100
|
end
|
86
101
|
page_builder.finish
|
87
102
|
|
88
|
-
|
89
|
-
|
103
|
+
if task["incremental"]
|
104
|
+
calculate_next_times(latest_time_series)
|
105
|
+
else
|
106
|
+
{}
|
107
|
+
end
|
90
108
|
end
|
91
109
|
|
92
110
|
def preview?
|
@@ -95,6 +113,49 @@ module Embulk
|
|
95
113
|
false
|
96
114
|
end
|
97
115
|
|
116
|
+
def calculate_next_times(fetched_latest_time)
|
117
|
+
task_report = {}
|
118
|
+
|
119
|
+
if fetched_latest_time
|
120
|
+
task_report[:start_date] = fetched_latest_time.strftime("%Y-%m-%d")
|
121
|
+
|
122
|
+
# if end_date specified as statically YYYY-MM-DD, it will be conflict with start_date (end_date < start_date)
|
123
|
+
# Modify it as "today" to be safe
|
124
|
+
if task["end_date"].match(/[0-9]{4}-[0-9]{2}-[0-9]{2}/)
|
125
|
+
task_report[:end_date] = "today" # "today" means now. running at 03:30 AM, will got 3 o'clock data.
|
126
|
+
end
|
127
|
+
|
128
|
+
# "start_date" format is YYYY-MM-DD, but ga:dateHour will return records by hourly.
|
129
|
+
# If run at 2016-07-03 05:00:00, start_date will set "2016-07-03" and got records until 2016-07-03 05:00:00.
|
130
|
+
# Then next run at 2016-07-04 05:00, will got records between 2016-07-03 00:00:00 and 2016-07-04 05:00:00.
|
131
|
+
# It will evantually duplicated between 2016-07-03 00:00:00 and 2016-07-03 05:00:00
|
132
|
+
#
|
133
|
+
# Date| 2016-07-03 | 2016-07-04
|
134
|
+
# Hour| 5 | 5
|
135
|
+
# 1st run ------|----| |
|
136
|
+
# 2nd run |------------------------|-----
|
137
|
+
# ^^^^^ duplicated
|
138
|
+
#
|
139
|
+
# "last_record_time" option solves that problem
|
140
|
+
#
|
141
|
+
# Date| 2016-07-03 | 2016-07-04
|
142
|
+
# Hour| 5 | 5
|
143
|
+
# 1st run ------|----| |
|
144
|
+
# 2nd run #####|-------------------|-----
|
145
|
+
# ^^^^^ ignored (skipped)
|
146
|
+
#
|
147
|
+
task_report[:last_record_time] = fetched_latest_time.strftime("%Y-%m-%d %H:%M:%S %z")
|
148
|
+
else
|
149
|
+
# no records fetched, don't modify config_diff
|
150
|
+
task_report = {
|
151
|
+
start_date: task["start_date"],
|
152
|
+
end_date: task["end_date"],
|
153
|
+
last_record_time: task["last_record_time"],
|
154
|
+
}
|
155
|
+
end
|
156
|
+
|
157
|
+
task_report
|
158
|
+
end
|
98
159
|
end
|
99
160
|
end
|
100
161
|
end
|
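The incremental logic added to plugin.rb does two things: it skips rows at or before `last_record_time`, and it derives the next `start_date`, `end_date` and `last_record_time` from the newest row it saw. The following is a rough standalone sketch of that flow. The names `filter_rows` and `next_times` are hypothetical, `task` is a plain Hash rather than an Embulk task, and the `to_s` guard on `end_date` is an assumption added here (the original calls `task["end_date"].match` directly).

```ruby
require "time"

# Hypothetical helper: drop rows already loaded by a previous run.
def filter_rows(rows, time_series, last_record_time)
  rows.reject { |row| last_record_time && row[time_series] <= last_record_time }
end

# Hypothetical standalone version of calculate_next_times.
def next_times(task, fetched_latest_time)
  unless fetched_latest_time
    # No records fetched: carry the previous values forward unchanged.
    return {
      start_date: task["start_date"],
      end_date: task["end_date"],
      last_record_time: task["last_record_time"],
    }
  end

  report = { start_date: fetched_latest_time.strftime("%Y-%m-%d") }
  # A fixed YYYY-MM-DD end_date could fall before the new start_date,
  # so it is replaced with "today". Relative values such as "10DaysAgo" stay.
  report[:end_date] = "today" if task["end_date"].to_s.match(/[0-9]{4}-[0-9]{2}-[0-9]{2}/)
  report[:last_record_time] = fetched_latest_time.strftime("%Y-%m-%d %H:%M:%S %z")
  report
end

last = Time.parse("2016-07-03 05:00:00 +0000")
rows = [
  { "ga:dateHour" => Time.parse("2016-07-03 04:00:00 +0000") }, # skipped
  { "ga:dateHour" => Time.parse("2016-07-03 06:00:00 +0000") }, # kept
]
kept = filter_rows(rows, "ga:dateHour", last)
latest = kept.map { |row| row["ga:dateHour"] }.max
p next_times({ "end_date" => "2016-07-03" }, latest)
```

Keeping `last_record_time` at second precision while `start_date` is only a date is what lets the second run re-request the whole day and still skip the hours it already loaded.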
data/service_account.png
ADDED
Binary file
|
data/test/embulk/input/google_analytics/test_client.rb
CHANGED
@@ -179,17 +179,95 @@ module Embulk
|
|
179
179
|
sub_test_case "auth" do
|
180
180
|
setup do
|
181
181
|
conf = valid_config["in"]
|
182
|
+
mute_logger
|
182
183
|
@client = Client.new(task(embulk_config(conf)))
|
183
184
|
end
|
184
185
|
|
185
|
-
|
186
|
-
|
187
|
-
|
188
|
-
|
186
|
+
sub_test_case "retry" do
|
187
|
+
def should_retry
|
188
|
+
mock(Google::Auth::ServiceAccountCredentials).make_creds(anything).times(retryer.config.limit + 1) { raise error }
|
189
|
+
assert_raise do
|
190
|
+
@client.auth
|
191
|
+
end
|
192
|
+
end
|
193
|
+
|
194
|
+
def should_not_retry
|
195
|
+
mock(Google::Auth::ServiceAccountCredentials).make_creds(anything).times(1) { raise error }
|
196
|
+
assert_raise do
|
197
|
+
@client.auth
|
198
|
+
end
|
199
|
+
end
|
200
|
+
|
201
|
+
setup do
|
202
|
+
# stub(Google::Auth::ServiceAccountCredentials).make_creds { raise error }
|
203
|
+
end
|
204
|
+
|
205
|
+
sub_test_case "Server error (5xx)" do
|
206
|
+
def error
|
207
|
+
Google::Apis::ServerError.new("error")
|
208
|
+
end
|
209
|
+
|
210
|
+
test "should retry" do
|
211
|
+
should_retry
|
212
|
+
end
|
213
|
+
end
|
214
|
+
|
215
|
+
sub_test_case "Rate Limit" do
|
216
|
+
def error
|
217
|
+
Google::Apis::RateLimitError.new("error")
|
218
|
+
end
|
219
|
+
|
220
|
+
test "should retry" do
|
221
|
+
should_retry
|
222
|
+
end
|
223
|
+
end
|
224
|
+
|
225
|
+
sub_test_case "Auth Error" do
|
226
|
+
def error
|
227
|
+
Google::Apis::AuthorizationError.new("error")
|
228
|
+
end
|
229
|
+
|
230
|
+
test "should not retry" do
|
231
|
+
should_not_retry
|
232
|
+
end
|
189
233
|
end
|
190
234
|
end
|
191
235
|
end
|
192
236
|
|
237
|
+
sub_test_case "too_early_data?" do
|
238
|
+
def stub_timezone(client)
|
239
|
+
stub(client).get_profile { {timezone: "America/Los_Angeles" } }
|
240
|
+
stub(client).swap_time_zone do |block|
|
241
|
+
stub(Time.zone).now { @now }
|
242
|
+
block.call
|
243
|
+
end
|
244
|
+
end
|
245
|
+
|
246
|
+
test "ga:dateHour" do
|
247
|
+
conf = valid_config["in"]
|
248
|
+
conf["time_series"] = "ga:dateHour"
|
249
|
+
client = Client.new(task(embulk_config(conf)))
|
250
|
+
@now = Time.parse("2016-06-01 05:00:00 PDT")
|
251
|
+
stub_timezone(client)
|
252
|
+
|
253
|
+
assert_equal false, client.too_early_data?("2016060104")
|
254
|
+
assert_equal true , client.too_early_data?("2016060105")
|
255
|
+
assert_equal true , client.too_early_data?("2016060106")
|
256
|
+
end
|
257
|
+
|
258
|
+
test "ga:date" do
|
259
|
+
conf = valid_config["in"]
|
260
|
+
conf["time_series"] = "ga:date"
|
261
|
+
client = Client.new(task(embulk_config(conf)))
|
262
|
+
@now = Time.parse("2016-06-03 05:00:00 PDT")
|
263
|
+
stub_timezone(client)
|
264
|
+
|
265
|
+
assert_equal false, client.too_early_data?("20160601")
|
266
|
+
assert_equal false, client.too_early_data?("20160602")
|
267
|
+
assert_equal true , client.too_early_data?("20160603")
|
268
|
+
end
|
269
|
+
end
|
270
|
+
|
193
271
|
sub_test_case "each_report_row" do
|
194
272
|
setup do
|
195
273
|
conf = valid_config["in"]
|
@@ -284,6 +362,15 @@ module Embulk
|
|
284
362
|
def embulk_config(hash)
|
285
363
|
Embulk::DataSource.new(hash)
|
286
364
|
end
|
365
|
+
|
366
|
+
def mute_logger
|
367
|
+
@logger = Logger.new(File::NULL)
|
368
|
+
stub(Embulk).logger { @logger }
|
369
|
+
end
|
370
|
+
|
371
|
+
def retryer
|
372
|
+
@client.retryer
|
373
|
+
end
|
287
374
|
end
|
288
375
|
end
|
289
376
|
end
|
data/test/embulk/input/google_analytics/test_plugin.rb
CHANGED
@@ -156,6 +156,37 @@ module Embulk
|
|
156
156
|
mock(@page_builder).finish
|
157
157
|
@plugin.run
|
158
158
|
end
|
159
|
+
|
160
|
+
sub_test_case "last_record_time option" do
|
161
|
+
setup do
|
162
|
+
Time.zone = "America/Los_Angeles"
|
163
|
+
@last_record_time = Time.zone.parse("2016-06-01 12:00:00").to_time
|
164
|
+
|
165
|
+
conf = valid_config["in"]
|
166
|
+
conf["time_series"] = time_series
|
167
|
+
conf["last_record_time"] = @last_record_time.strftime("%Y-%m-%d %H:%M:%S %z")
|
168
|
+
@plugin = Plugin.new(embulk_config(conf), nil, nil, @page_builder)
|
169
|
+
end
|
170
|
+
|
171
|
+
test "ignore records when old" do
|
172
|
+
any_instance_of(Client) do |klass|
|
173
|
+
stub(klass).each_report_row do |block|
|
174
|
+
row = {
|
175
|
+
"ga:dateHour" => @last_record_time,
|
176
|
+
"ga:browser" => "wget",
|
177
|
+
"ga:visits" => 3,
|
178
|
+
"ga:pageviews" => 4,
|
179
|
+
}
|
180
|
+
block.call row
|
181
|
+
end
|
182
|
+
end
|
183
|
+
|
184
|
+
mock(@page_builder).add.never
|
185
|
+
mock(@page_builder).finish
|
186
|
+
@plugin.run
|
187
|
+
end
|
188
|
+
end
|
189
|
+
|
159
190
|
end
|
160
191
|
|
161
192
|
sub_test_case "time_series: 'ga:date'" do
|
@@ -182,6 +213,36 @@ module Embulk
|
|
182
213
|
mock(@page_builder).finish
|
183
214
|
@plugin.run
|
184
215
|
end
|
216
|
+
|
217
|
+
sub_test_case "last_record_time option" do
|
218
|
+
setup do
|
219
|
+
Time.zone = "America/Los_Angeles"
|
220
|
+
@last_record_time = Time.zone.parse("2016-06-01 12:00:00").to_time
|
221
|
+
|
222
|
+
conf = valid_config["in"]
|
223
|
+
conf["time_series"] = time_series
|
224
|
+
conf["last_record_time"] = @last_record_time.strftime("%Y-%m-%d %H:%M:%S %z")
|
225
|
+
@plugin = Plugin.new(embulk_config(conf), nil, nil, @page_builder)
|
226
|
+
end
|
227
|
+
|
228
|
+
test "ignore records when old" do
|
229
|
+
any_instance_of(Client) do |klass|
|
230
|
+
stub(klass).each_report_row do |block|
|
231
|
+
row = {
|
232
|
+
"ga:date" => @last_record_time,
|
233
|
+
"ga:browser" => "wget",
|
234
|
+
"ga:visits" => 3,
|
235
|
+
"ga:pageviews" => 4,
|
236
|
+
}
|
237
|
+
block.call row
|
238
|
+
end
|
239
|
+
end
|
240
|
+
|
241
|
+
mock(@page_builder).add.never
|
242
|
+
mock(@page_builder).finish
|
243
|
+
@plugin.run
|
244
|
+
end
|
245
|
+
end
|
185
246
|
end
|
186
247
|
end
|
187
248
|
end
|
@@ -201,6 +262,135 @@ module Embulk
|
|
201
262
|
end
|
202
263
|
end
|
203
264
|
|
265
|
+
sub_test_case "calculate_next_times" do
|
266
|
+
setup do
|
267
|
+
@page_builder = Object.new
|
268
|
+
@config = embulk_config(valid_config["in"])
|
269
|
+
end
|
270
|
+
|
271
|
+
sub_test_case "ga:dateHour" do
|
272
|
+
setup do
|
273
|
+
conf = valid_config["in"]
|
274
|
+
conf["time_series"] = "ga:dateHour"
|
275
|
+
@config = embulk_config(conf)
|
276
|
+
end
|
277
|
+
|
278
|
+
sub_test_case "no records fetched" do
|
279
|
+
test "config_diff won't modify" do
|
280
|
+
plugin = Plugin.new(config, nil, nil, @page_builder)
|
281
|
+
expected = {
|
282
|
+
start_date: task["start_date"],
|
283
|
+
end_date: task["end_date"],
|
284
|
+
last_record_time: task["last_record_time"],
|
285
|
+
}
|
286
|
+
assert_equal expected, plugin.calculate_next_times(nil)
|
287
|
+
end
|
288
|
+
end
|
289
|
+
|
290
|
+
sub_test_case "updated" do
|
291
|
+
sub_test_case "end_date is given as YYYY-MM-DD" do
|
292
|
+
setup do
|
293
|
+
@config[:start_date] = "2000-01-01"
|
294
|
+
@config[:end_date] = "2000-01-05"
|
295
|
+
end
|
296
|
+
|
297
|
+
test "config_diff will modify" do
|
298
|
+
latest_time = Time.parse("2000-01-07")
|
299
|
+
plugin = Plugin.new(config, nil, nil, @page_builder)
|
300
|
+
expected = {
|
301
|
+
start_date: latest_time.strftime("%Y-%m-%d"),
|
302
|
+
end_date: "today",
|
303
|
+
last_record_time: latest_time.strftime("%Y-%m-%d %H:%M:%S %z"),
|
304
|
+
}
|
305
|
+
assert_equal expected, plugin.calculate_next_times(latest_time)
|
306
|
+
end
|
307
|
+
end
|
308
|
+
|
309
|
+
sub_test_case "end_date is given as nDaysAgo" do
|
310
|
+
setup do
|
311
|
+
@config[:start_date] = "2000-01-01"
|
312
|
+
@config[:end_date] = "10DaysAgo"
|
313
|
+
end
|
314
|
+
|
315
|
+
test "config_diff end_date won't modify" do
|
316
|
+
latest_time = Time.parse("2000-01-07")
|
317
|
+
plugin = Plugin.new(config, nil, nil, @page_builder)
|
318
|
+
expected = {
|
319
|
+
start_date: latest_time.strftime("%Y-%m-%d"),
|
320
|
+
last_record_time: latest_time.strftime("%Y-%m-%d %H:%M:%S %z"),
|
321
|
+
}
|
322
|
+
assert_equal expected, plugin.calculate_next_times(latest_time)
|
323
|
+
end
|
324
|
+
end
|
325
|
+
end
|
326
|
+
end
|
327
|
+
|
328
|
+
sub_test_case "ga:date" do
|
329
|
+
setup do
|
330
|
+
conf = valid_config["in"]
|
331
|
+
conf["time_series"] = "ga:date"
|
332
|
+
@config = embulk_config(conf)
|
333
|
+
end
|
334
|
+
|
335
|
+
sub_test_case "no records fetched" do
|
336
|
+
test "config_diff will keep previous" do
|
337
|
+
plugin = Plugin.new(config, nil, nil, @page_builder)
|
338
|
+
expected = {
|
339
|
+
start_date: task["start_date"],
|
340
|
+
end_date: task["end_date"],
|
341
|
+
last_record_time: task["last_record_time"],
|
342
|
+
}
|
343
|
+
assert_equal expected, plugin.calculate_next_times(nil)
|
344
|
+
end
|
345
|
+
end
|
346
|
+
|
347
|
+
sub_test_case "updated" do
|
348
|
+
sub_test_case "end_date is given as YYYY-MM-DD" do
|
349
|
+
setup do
|
350
|
+
@config[:start_date] = "2000-01-01"
|
351
|
+
@config[:end_date] = "2000-01-05"
|
352
|
+
end
|
353
|
+
|
354
|
+
test "config_diff will modify" do
|
355
|
+
latest_time = Time.parse("2000-01-07")
|
356
|
+
plugin = Plugin.new(config, nil, nil, @page_builder)
|
357
|
+
expected = {
|
358
|
+
start_date: latest_time.strftime("%Y-%m-%d"),
|
359
|
+
end_date: "today",
|
360
|
+
last_record_time: latest_time.strftime("%Y-%m-%d %H:%M:%S %z"),
|
361
|
+
}
|
362
|
+
assert_equal expected, plugin.calculate_next_times(latest_time)
|
363
|
+
end
|
364
|
+
end
|
365
|
+
|
366
|
+
sub_test_case "end_date is given as nDaysAgo" do
|
367
|
+
setup do
|
368
|
+
@config[:start_date] = "2000-01-01"
|
369
|
+
@config[:end_date] = "10DaysAgo"
|
370
|
+
end
|
371
|
+
|
372
|
+
test "config_diff end_date won't modify" do
|
373
|
+
latest_time = Time.parse("2000-01-07")
|
374
|
+
plugin = Plugin.new(config, nil, nil, @page_builder)
|
375
|
+
expected = {
|
376
|
+
start_date: latest_time.strftime("%Y-%m-%d"),
|
377
|
+
last_record_time: latest_time.strftime("%Y-%m-%d %H:%M:%S %z"),
|
378
|
+
}
|
379
|
+
assert_equal expected, plugin.calculate_next_times(latest_time)
|
380
|
+
end
|
381
|
+
end
|
382
|
+
end
|
383
|
+
end
|
384
|
+
|
385
|
+
def task
|
386
|
+
Plugin.task_from_config(@config)
|
387
|
+
end
|
388
|
+
|
389
|
+
def config
|
390
|
+
@config
|
391
|
+
end
|
392
|
+
end
|
393
|
+
|
204
394
|
def valid_config
|
205
395
|
fixture_load("valid.yml")
|
206
396
|
end
|
data/test/fixtures/valid.yml
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: embulk-input-google_analytics
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.1.
|
4
|
+
version: 0.1.1
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- uu59
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2016-07-
|
11
|
+
date: 2016-07-13 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
requirement: !ruby/object:Gem::Requirement
|
@@ -66,6 +66,20 @@ dependencies:
|
|
66
66
|
- - ">="
|
67
67
|
- !ruby/object:Gem::Version
|
68
68
|
version: '0'
|
69
|
+
- !ruby/object:Gem::Dependency
|
70
|
+
requirement: !ruby/object:Gem::Requirement
|
71
|
+
requirements:
|
72
|
+
- - "~>"
|
73
|
+
- !ruby/object:Gem::Version
|
74
|
+
version: '0.5'
|
75
|
+
name: perfect_retry
|
76
|
+
prerelease: false
|
77
|
+
type: :runtime
|
78
|
+
version_requirements: !ruby/object:Gem::Requirement
|
79
|
+
requirements:
|
80
|
+
- - "~>"
|
81
|
+
- !ruby/object:Gem::Version
|
82
|
+
version: '0.5'
|
69
83
|
- !ruby/object:Gem::Dependency
|
70
84
|
requirement: !ruby/object:Gem::Requirement
|
71
85
|
requirements:
|
@@ -178,6 +192,20 @@ dependencies:
|
|
178
192
|
- - ">="
|
179
193
|
- !ruby/object:Gem::Version
|
180
194
|
version: '0'
|
195
|
+
- !ruby/object:Gem::Dependency
|
196
|
+
requirement: !ruby/object:Gem::Requirement
|
197
|
+
requirements:
|
198
|
+
- - "~>"
|
199
|
+
- !ruby/object:Gem::Version
|
200
|
+
version: '1.0'
|
201
|
+
name: gem_release_helper
|
202
|
+
prerelease: false
|
203
|
+
type: :development
|
204
|
+
version_requirements: !ruby/object:Gem::Requirement
|
205
|
+
requirements:
|
206
|
+
- - "~>"
|
207
|
+
- !ruby/object:Gem::Version
|
208
|
+
version: '1.0'
|
181
209
|
description: Loads records from Google Analytics.
|
182
210
|
email:
|
183
211
|
- k@uu59.org
|
@@ -197,6 +225,7 @@ files:
|
|
197
225
|
- lib/embulk/input/google_analytics.rb
|
198
226
|
- lib/embulk/input/google_analytics/client.rb
|
199
227
|
- lib/embulk/input/google_analytics/plugin.rb
|
228
|
+
- service_account.png
|
200
229
|
- test/embulk/input/google_analytics/test_client.rb
|
201
230
|
- test/embulk/input/google_analytics/test_plugin.rb
|
202
231
|
- test/fixture_helper.rb
|