embulk-input-google_analytics 0.1.0 → 0.1.1
- checksums.yaml +4 -4
- data/CHANGELOG.md +5 -0
- data/README.md +39 -2
- data/Rakefile +6 -0
- data/embulk-input-google_analytics.gemspec +4 -2
- data/lib/embulk/input/google_analytics/client.rb +61 -15
- data/lib/embulk/input/google_analytics/plugin.rb +64 -3
- data/service_account.png +0 -0
- data/test/embulk/input/google_analytics/test_client.rb +91 -4
- data/test/embulk/input/google_analytics/test_plugin.rb +190 -0
- data/test/fixtures/valid.yml +2 -0
- metadata +31 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 122431f951c569688cc1df231944216e766f9356
+  data.tar.gz: 687477c6f8e0c08853b07e681bd3f3e39e81f876
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 226f36d2128a30c5f30da1083c6f9bb63db86bb7cd08a9221c3f55f891fee816bc2e154fcf0e48eadbb9f71c959547ca6c644e3cbebe3fdce5b20fcb5234b9b7
+  data.tar.gz: 2c1f6b067797788227f46210ba99270d80f6720da43cb97349fa439e7463fadc6b150c05f665ca60e15388cdf57ac4b67d409a5d2da4dc8e9b73e1b55e9f0854
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,8 @@
+## 0.1.1 - 2016-07-13
+* Enable scheduled execution [#4](https://github.com/treasure-data/embulk-input-google_analytics/pull/4)
+* Error handling [#6](https://github.com/treasure-data/embulk-input-google_analytics/pull/6)
+* Ignore too early accessing data due to it is not fixed value [#5](https://github.com/treasure-data/embulk-input-google_analytics/pull/5)
+
 ## 0.1.0 - 2016-07-07
 
 The first release!!
data/README.md
CHANGED
@@ -15,8 +15,45 @@ Embulk input plugin for Google Analytics reports.
 - **time_series**: Only `ga:dateHour` or `ga:date` (string, required)
 - **dimensions**: Target dimensions (array, default: `[]` )
 - **metrics**: Target metrics (array, default: `[]` )
-- **start_date**: Target report start date (string, default: [7 days ago](https://developers.google.com/analytics/devguides/reporting/core/v4/rest/v4/reports/batchGet#reportrequest))
-- **end_date**: Target report end date (string, default: [1 day ago](https://developers.google.com/analytics/devguides/reporting/core/v4/rest/v4/reports/batchGet#reportrequest))
+- **start_date**: Target report start date. Valid format is "YYYY-MM-DD". (string, default: [7 days ago](https://developers.google.com/analytics/devguides/reporting/core/v4/rest/v4/reports/batchGet#reportrequest))
+- **end_date**: Target report end date. Valid format is "YYYY-MM-DD". (string, default: [1 day ago](https://developers.google.com/analytics/devguides/reporting/core/v4/rest/v4/reports/batchGet#reportrequest))
+- **incremental**: `true` for generate "config_diff" with `embulk run -c config.diff` (bool, default: true)
+- **last_record_time**: Ignore fetched records until this time. Mainly for incremental:true. (string, default: nil)
+- **retry_limit**: Try to retry this times (integer, default: 5)
+- **retry_initial_wait_sec**: Wait seconds for exponential backoff initial value (integer, default: 2)
+
+### About `json_key_content` option.
+
+You need a service account on Google.
+
+<ol>
+<li>Open the <a href="https://console.developers.google.com/permissions/serviceaccounts"><b>Service accounts</b> page</a>. If prompted,
+select a project.</li>
+<li>Click <b>Create service account</b>.</li>
+<li>
+
+In the <b>Create service account</b> window, type a name for the service
+account, and select <b>Furnish a new private key</b>. If you want to
+<a href="https://developers.google.com/identity/protocols/OAuth2ServiceAccount#delegatingauthority">grant
+Google Apps domain-wide authority</a> to the service account, also select
+<b>Enable Google Apps Domain-wide Delegation</b>.
+
+Then click <b>Create</b>.</li>
+</ol>
+From: <https://developers.google.com/identity/protocols/OAuth2ServiceAccount>
+
+Screenshot: ![Service Account](./service_account.png)
+
+## Why the result doesn't match with web interface?
+
+Google Reporting API uses "sampling" data.
+
+- https://developers.google.com/analytics/devguides/reporting/core/v4/basics#sampling
+- https://support.google.com/analytics/answer/2637192
+
+That means sometimes result will be unmatched with Google Analytics web interface, and the result is based on sampled data, not all of raw data. This is a Google API's limitation.
+
+Currently a sampling level supported by this plugin is DEFAULT only. Let us know if you want to use other sampling level (SMALL or LARGE).
 
 ## Example
 
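The `retry_limit` and `retry_initial_wait_sec` options added to the README describe an exponential backoff. The sketch below shows how the wait times would grow with the documented defaults, using the same formula as the `config.sleep` lambda in `client.rb` (`retry_initial_wait_sec * 2 ** (n - 1)`). The helper name `backoff_wait_sec` is hypothetical, and it is an assumption here that the retry counter `n` starts at 1.

```ruby
# Hypothetical helper (not part of the plugin): seconds to wait before
# the n-th retry, assuming n starts at 1.
def backoff_wait_sec(initial_wait_sec, n)
  initial_wait_sec * (2**(n - 1))
end

retry_limit = 5            # README default for retry_limit
retry_initial_wait_sec = 2 # README default for retry_initial_wait_sec

# Wait time before each retry, from the first to the last allowed one.
waits = (1..retry_limit).map { |n| backoff_wait_sec(retry_initial_wait_sec, n) }
puts waits.inspect # prints [2, 4, 8, 16, 32]
```

Under these assumptions a run that exhausts every retry would sleep for about 62 seconds in total before the original error is raised.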
data/Rakefile
CHANGED
@@ -1,4 +1,5 @@
 require "bundler/gem_tasks"
+require "gem_release_helper/tasks"
 
 task default: :test
 
@@ -13,3 +14,8 @@ task :cov do
   ruby("--debug", "test/run-test.rb", "--use-color=yes", "--collector=dir")
 end
 
+GemReleaseHelper::Tasks.install({
+  gemspec: "./embulk-input-google_analytics.gemspec",
+  github_name: "treasure-data/embulk-input-google_analytics",
+})
+
data/embulk-input-google_analytics.gemspec
CHANGED
@@ -1,7 +1,7 @@
 
 Gem::Specification.new do |spec|
   spec.name = "embulk-input-google_analytics"
-  spec.version = "0.1.
+  spec.version = "0.1.1"
   spec.authors = ["uu59"]
   spec.summary = "Google Analytics input plugin for Embulk"
   spec.description = "Loads records from Google Analytics."
@@ -16,7 +16,8 @@ Gem::Specification.new do |spec|
   spec.add_dependency "httpclient"
   spec.add_dependency "google-api-client", "~> 0.9"
   spec.add_dependency "signet"
-  spec.add_dependency "activesupport" # for Time.zone.parse
+  spec.add_dependency "activesupport" # for Time.zone.parse, Time.zone.now
+  spec.add_dependency "perfect_retry", "~> 0.5"
 
   spec.add_development_dependency 'embulk', ['>= 0.8.9']
   spec.add_development_dependency 'bundler', ['>= 1.10.6']
@@ -26,4 +27,5 @@ Gem::Specification.new do |spec|
   spec.add_development_dependency 'simplecov'
   spec.add_development_dependency "codeclimate-test-reporter"
   spec.add_development_dependency "pry"
+  spec.add_development_dependency "gem_release_helper", "~> 1.0"
 end
data/lib/embulk/input/google_analytics/client.rb
CHANGED
@@ -1,3 +1,4 @@
+require "perfect_retry"
 require "active_support/core_ext/time"
 require "google/apis/analyticsreporting_v4"
 require "google/apis/analytics_v3"
@@ -46,8 +47,9 @@ module Embulk
               dim = dimensions.zip(row[:dimensions]).to_h
               met = metrics.zip(row[:metrics].first[:values]).to_h
               format_row = dim.merge(met)
-
-
+              raw_time = format_row[task["time_series"]]
+              next if too_early_data?(raw_time)
+              format_row[task["time_series"]] = time_parse_with_profile_timezone(raw_time)
               block.call format_row
             end
 
@@ -80,7 +82,9 @@ module Embulk
           service.authorization = auth
 
           Embulk.logger.debug "Fetching profile from API"
-
+          retryer.with_retry do
+            service.list_profiles("~all", "~all")
+          end
         end
 
         def time_parse_with_profile_timezone(time_string)
@@ -93,11 +97,9 @@ module Embulk
           end
           parts = Date._strptime(time_string, date_format)
 
-
-
-
-        ensure
-          Time.zone = orig_timezone
+          swap_time_zone do
+            Time.zone.local(*parts.values_at(:year, :mon, :mday, :hour)).to_time
+          end
         end
 
         def get_reports(page_token = nil)
@@ -109,14 +111,18 @@ module Embulk
           request.report_requests = build_report_request(page_token)
 
           Embulk.logger.info "Query to Core Report API: #{request.to_json}"
-
+          retryer.with_retry do
+            service.batch_get_reports request
+          end
         end
 
         def get_columns_list
           # https://developers.google.com/analytics/devguides/reporting/metadata/v3/reference/metadata/columns/list
           service = Google::Apis::AnalyticsV3::AnalyticsService.new
           service.authorization = auth
-
+          retryer.with_retry do
+            service.list_metadata_columns("ga").to_h[:items]
+          end
         end
 
         def build_report_request(page_token = nil)
@@ -147,13 +153,53 @@ module Embulk
         end
 
         def auth
-
-
-
-
-
+          retryer.with_retry do
+            Google::Auth::ServiceAccountCredentials.make_creds(
+              json_key_io: StringIO.new(task["json_key_content"]),
+              scope: "https://www.googleapis.com/auth/analytics.readonly"
+            )
+          end
+        rescue Google::Apis::AuthorizationError => e
           raise ConfigError.new(e.message)
         end
+
+        def swap_time_zone(&block)
+          orig_timezone = Time.zone
+          Time.zone = get_profile[:timezone]
+          yield
+        ensure
+          Time.zone = orig_timezone
+        end
+
+        def too_early_data?(time_str)
+          # fetching 20160720 data on 2016-07-20, it is too early fetching
+          swap_time_zone do
+            now = Time.zone.now
+            case task["time_series"]
+            when "ga:dateHour"
+              time_str.to_i >= now.strftime("%Y%m%d%H").to_i
+            when "ga:date"
+              time_str.to_i >= now.strftime("%Y%m%d").to_i
+            end
+          end
+        end
+
+        def retryer
+          PerfectRetry.new do |config|
+            config.limit = task["retry_limit"]
+            config.logger = Embulk.logger
+            config.log_level = nil
+
+            # https://developers.google.com/analytics/devguides/reporting/core/v4/errors
+            # https://developers.google.com/analytics/devguides/reporting/core/v4/limits-quotas#additional_quota
+            # https://github.com/google/google-api-ruby-client/blob/master/lib/google/apis/errors.rb
+            # https://github.com/google/google-api-ruby-client/blob/0.9.11/lib/google/apis/core/http_command.rb#L33
+            config.rescues = Google::Apis::Core::HttpCommand::RETRIABLE_ERRORS
+            config.dont_rescues = [Embulk::DataError, Embulk::ConfigError]
+            config.sleep = lambda{|n| task["retry_initial_wait_sec"]* (2 ** (n-1)) }
+            config.raise_original_error = true
+          end
+        end
       end
     end
   end
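The `too_early_data?` check added above skips any row whose date (or date-hour) is the current one or later, because Google Analytics may still be updating those values. The following stand-alone sketch reproduces the comparison without ActiveSupport or the profile time zone: the current time is passed in explicitly, and the function name `too_early?` is hypothetical. Returning `false` for an unknown `time_series` is a simplification made here; the method in the diff would return `nil` in that case.

```ruby
# Hypothetical stand-alone version of the check. `now` is a plain Time
# passed in by the caller instead of being read from Time.zone.
def too_early?(time_series, time_str, now)
  case time_series
  when "ga:dateHour"
    # "2016060105" >= current hour means the hour is not finished yet
    time_str.to_i >= now.strftime("%Y%m%d%H").to_i
  when "ga:date"
    # "20160603" >= current day means the day is not finished yet
    time_str.to_i >= now.strftime("%Y%m%d").to_i
  else
    false # simplification for this sketch
  end
end

now = Time.new(2016, 6, 1, 5, 0, 0)
puts too_early?("ga:dateHour", "2016060104", now) # prints false
puts too_early?("ga:dateHour", "2016060105", now) # prints true
puts too_early?("ga:date", "20160531", now)       # prints false
puts too_early?("ga:date", "20160601", now)       # prints true
```

Comparing the values as integers works because both formats are fixed-width and ordered from the most significant field to the least significant one.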
data/lib/embulk/input/google_analytics/plugin.rb
CHANGED
@@ -42,7 +42,7 @@ module Embulk
         def self.resume(task, columns, count, &control)
           task_reports = yield(task, columns, count)
 
-          next_config_diff =
+          next_config_diff = task_reports.first
           return next_config_diff
         end
 
@@ -56,6 +56,10 @@ module Embulk
             "time_series" => config.param("time_series", :string),
             "start_date" => config.param("start_date", :string, default: nil),
             "end_date" => config.param("end_date", :string, default: nil),
+            "incremental" => config.param("incremental", :bool, default: true),
+            "last_record_time" => config.param("last_record_time", :string, default: nil),
+            "retry_limit" => config.param("retry_limit", :integer, default: 5),
+            "retry_initial_wait_sec" => config.param("retry_initial_wait_sec", :integer, default: 2),
           }
         end
 
@@ -79,14 +83,28 @@ module Embulk
           client = Client.new(task, preview?)
           columns = self.class.columns_from_task(task)
 
+          last_record_time = task["last_record_time"] ? Time.parse(task["last_record_time"]) : nil
+
+          latest_time_series = nil
           client.each_report_row do |row|
+            time = row[task["time_series"]]
+            next if last_record_time && time <= last_record_time
+
             values = row.values_at(*columns)
             page_builder.add values
+
+            latest_time_series = [
+              latest_time_series,
+              time,
+            ].compact.max
           end
           page_builder.finish
 
-
-
+          if task["incremental"]
+            calculate_next_times(latest_time_series)
+          else
+            {}
+          end
         end
 
         def preview?
@@ -95,6 +113,49 @@ module Embulk
           false
         end
 
+        def calculate_next_times(fetched_latest_time)
+          task_report = {}
+
+          if fetched_latest_time
+            task_report[:start_date] = fetched_latest_time.strftime("%Y-%m-%d")
+
+            # if end_date specified as statically YYYY-MM-DD, it will be conflict with start_date (end_date < start_date)
+            # Modify it as "today" to be safe
+            if task["end_date"].match(/[0-9]{4}-[0-9]{2}-[0-9]{2}/)
+              task_report[:end_date] = "today" # "today" means now. running at 03:30 AM, will got 3 o'clock data.
+            end
+
+            # "start_date" format is YYYY-MM-DD, but ga:dateHour will return records by hourly.
+            # If run at 2016-07-03 05:00:00, start_date will set "2016-07-03" and got records until 2016-07-03 05:00:00.
+            # Then next run at 2016-07-04 05:00, will got records between 2016-07-03 00:00:00 and 2016-07-04 05:00:00.
+            # It will evantually duplicated between 2016-07-03 00:00:00 and 2016-07-03 05:00:00
+            #
+            # Date| 2016-07-03 | 2016-07-04
+            # Hour| 5 | 5
+            # 1st run ------|----| |
+            # 2nd run |------------------------|-----
+            # ^^^^^ duplicated
+            #
+            # "last_record_time" option solves that problem
+            #
+            # Date| 2016-07-03 | 2016-07-04
+            # Hour| 5 | 5
+            # 1st run ------|----| |
+            # 2nd run #####|-------------------|-----
+            # ^^^^^ ignored (skipped)
+            #
+            task_report[:last_record_time] = fetched_latest_time.strftime("%Y-%m-%d %H:%M:%S %z")
+          else
+            # no records fetched, don't modify config_diff
+            task_report = {
+              start_date: task["start_date"],
+              end_date: task["end_date"],
+              last_record_time: task["last_record_time"],
+            }
+          end
+
+          task_report
+        end
       end
     end
   end
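The `calculate_next_times` method above decides what goes into the next `config_diff`. A stand-alone sketch of the same decision follows, so the three cases (nothing fetched, static `end_date`, relative `end_date`) can be tried outside Embulk. The name `next_config_diff` and the constant `DATE_PATTERN` are hypothetical. Two deliberate deviations from the diff are assumptions of this sketch: `end_date` is converted with `to_s` so that a `nil` value does not raise, and the pattern is anchored to the whole string.

```ruby
# Hypothetical constant: a static YYYY-MM-DD date, anchored for this sketch.
DATE_PATTERN = /\A[0-9]{4}-[0-9]{2}-[0-9]{2}\z/

# Hypothetical stand-alone version of the config_diff calculation.
# `task` is a plain Hash with string keys, as in the plugin.
def next_config_diff(task, fetched_latest_time)
  if fetched_latest_time.nil?
    # No records fetched: keep the previous values unchanged.
    return {
      start_date: task["start_date"],
      end_date: task["end_date"],
      last_record_time: task["last_record_time"],
    }
  end

  diff = { start_date: fetched_latest_time.strftime("%Y-%m-%d") }
  # A static end_date could end up earlier than the new start_date,
  # so it is replaced by "today"; relative values are left out of the diff.
  diff[:end_date] = "today" if task["end_date"].to_s.match?(DATE_PATTERN)
  diff[:last_record_time] = fetched_latest_time.strftime("%Y-%m-%d %H:%M:%S %z")
  diff
end

latest = Time.new(2000, 1, 7, 0, 0, 0, "+00:00")
static_task = { "start_date" => "2000-01-01", "end_date" => "2000-01-05", "last_record_time" => nil }
relative_task = static_task.merge("end_date" => "10DaysAgo")

p next_config_diff(static_task, latest)
p next_config_diff(relative_task, latest)
p next_config_diff(static_task, nil)
```

With a static `end_date` the result carries `end_date: "today"`; with `10DaysAgo` the `end_date` key is absent, which matches the expectations in the `calculate_next_times` tests of this release.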
data/service_account.png
ADDED
Binary file
data/test/embulk/input/google_analytics/test_client.rb
CHANGED
@@ -179,17 +179,95 @@ module Embulk
         sub_test_case "auth" do
           setup do
             conf = valid_config["in"]
+            mute_logger
             @client = Client.new(task(embulk_config(conf)))
           end
 
-
-
-
-
+          sub_test_case "retry" do
+            def should_retry
+              mock(Google::Auth::ServiceAccountCredentials).make_creds(anything).times(retryer.config.limit + 1) { raise error }
+              assert_raise do
+                @client.auth
+              end
+            end
+
+            def should_not_retry
+              mock(Google::Auth::ServiceAccountCredentials).make_creds(anything).times(1) { raise error }
+              assert_raise do
+                @client.auth
+              end
+            end
+
+            setup do
+              # stub(Google::Auth::ServiceAccountCredentials).make_creds { raise error }
+            end
+
+            sub_test_case "Server error (5xx)" do
+              def error
+                Google::Apis::ServerError.new("error")
+              end
+
+              test "should retry" do
+                should_retry
+              end
+            end
+
+            sub_test_case "Rate Limit" do
+              def error
+                Google::Apis::RateLimitError.new("error")
+              end
+
+              test "should retry" do
+                should_retry
+              end
+            end
+
+            sub_test_case "Auth Error" do
+              def error
+                Google::Apis::AuthorizationError.new("error")
+              end
+
+              test "should not retry" do
+                should_not_retry
+              end
             end
           end
         end
 
+        sub_test_case "too_early_data?" do
+          def stub_timezone(client)
+            stub(client).get_profile { {timezone: "America/Los_Angeles" } }
+            stub(client).swap_time_zone do |block|
+              stub(Time.zone).now { @now }
+              block.call
+            end
+          end
+
+          test "ga:dateHour" do
+            conf = valid_config["in"]
+            conf["time_series"] = "ga:dateHour"
+            client = Client.new(task(embulk_config(conf)))
+            @now = Time.parse("2016-06-01 05:00:00 PDT")
+            stub_timezone(client)
+
+            assert_equal false, client.too_early_data?("2016060104")
+            assert_equal true , client.too_early_data?("2016060105")
+            assert_equal true , client.too_early_data?("2016060106")
+          end
+
+          test "ga:date" do
+            conf = valid_config["in"]
+            conf["time_series"] = "ga:date"
+            client = Client.new(task(embulk_config(conf)))
+            @now = Time.parse("2016-06-03 05:00:00 PDT")
+            stub_timezone(client)
+
+            assert_equal false, client.too_early_data?("20160601")
+            assert_equal false, client.too_early_data?("20160602")
+            assert_equal true , client.too_early_data?("20160603")
+          end
+        end
+
         sub_test_case "each_report_row" do
           setup do
             conf = valid_config["in"]
@@ -284,6 +362,15 @@ module Embulk
         def embulk_config(hash)
           Embulk::DataSource.new(hash)
         end
+
+        def mute_logger
+          @logger = Logger.new(File::NULL)
+          stub(Embulk).logger { @logger }
+        end
+
+        def retryer
+          @client.retryer
+        end
       end
     end
   end
data/test/embulk/input/google_analytics/test_plugin.rb
CHANGED
@@ -156,6 +156,37 @@ module Embulk
                 mock(@page_builder).finish
                 @plugin.run
               end
+
+              sub_test_case "last_record_time option" do
+                setup do
+                  Time.zone = "America/Los_Angeles"
+                  @last_record_time = Time.zone.parse("2016-06-01 12:00:00").to_time
+
+                  conf = valid_config["in"]
+                  conf["time_series"] = time_series
+                  conf["last_record_time"] = @last_record_time.strftime("%Y-%m-%d %H:%M:%S %z")
+                  @plugin = Plugin.new(embulk_config(conf), nil, nil, @page_builder)
+                end
+
+                test "ignore records when old" do
+                  any_instance_of(Client) do |klass|
+                    stub(klass).each_report_row do |block|
+                      row = {
+                        "ga:dateHour" => @last_record_time,
+                        "ga:browser" => "wget",
+                        "ga:visits" => 3,
+                        "ga:pageviews" => 4,
+                      }
+                      block.call row
+                    end
+                  end
+
+                  mock(@page_builder).add.never
+                  mock(@page_builder).finish
+                  @plugin.run
+                end
+              end
+
             end
 
             sub_test_case "time_series: 'ga:date'" do
@@ -182,6 +213,36 @@ module Embulk
                 mock(@page_builder).finish
                 @plugin.run
               end
+
+              sub_test_case "last_record_time option" do
+                setup do
+                  Time.zone = "America/Los_Angeles"
+                  @last_record_time = Time.zone.parse("2016-06-01 12:00:00").to_time
+
+                  conf = valid_config["in"]
+                  conf["time_series"] = time_series
+                  conf["last_record_time"] = @last_record_time.strftime("%Y-%m-%d %H:%M:%S %z")
+                  @plugin = Plugin.new(embulk_config(conf), nil, nil, @page_builder)
+                end
+
+                test "ignore records when old" do
+                  any_instance_of(Client) do |klass|
+                    stub(klass).each_report_row do |block|
+                      row = {
+                        "ga:date" => @last_record_time,
+                        "ga:browser" => "wget",
+                        "ga:visits" => 3,
+                        "ga:pageviews" => 4,
+                      }
+                      block.call row
+                    end
+                  end
+
+                  mock(@page_builder).add.never
+                  mock(@page_builder).finish
+                  @plugin.run
+                end
+              end
             end
           end
         end
@@ -201,6 +262,135 @@ module Embulk
           end
         end
 
+        sub_test_case "calculate_next_times" do
+          setup do
+            @page_builder = Object.new
+            @config = embulk_config(valid_config["in"])
+          end
+
+          sub_test_case "ga:dateHour" do
+            setup do
+              conf = valid_config["in"]
+              conf["time_series"] = "ga:dateHour"
+              @config = embulk_config(conf)
+            end
+
+            sub_test_case "no records fetched" do
+              test "config_diff won't modify" do
+                plugin = Plugin.new(config, nil, nil, @page_builder)
+                expected = {
+                  start_date: task["start_date"],
+                  end_date: task["end_date"],
+                  last_record_time: task["last_record_time"],
+                }
+                assert_equal expected, plugin.calculate_next_times(nil)
+              end
+            end
+
+            sub_test_case "updated" do
+              sub_test_case "end_date is given as YYYY-MM-DD" do
+                setup do
+                  @config[:start_date] = "2000-01-01"
+                  @config[:end_date] = "2000-01-05"
+                end
+
+                test "config_diff will modify" do
+                  latest_time = Time.parse("2000-01-07")
+                  plugin = Plugin.new(config, nil, nil, @page_builder)
+                  expected = {
+                    start_date: latest_time.strftime("%Y-%m-%d"),
+                    end_date: "today",
+                    last_record_time: latest_time.strftime("%Y-%m-%d %H:%M:%S %z"),
+                  }
+                  assert_equal expected, plugin.calculate_next_times(latest_time)
+                end
+              end
+
+              sub_test_case "end_date is given as nDaysAgo" do
+                setup do
+                  @config[:start_date] = "2000-01-01"
+                  @config[:end_date] = "10DaysAgo"
+                end
+
+                test "config_diff end_date won't modify" do
+                  latest_time = Time.parse("2000-01-07")
+                  plugin = Plugin.new(config, nil, nil, @page_builder)
+                  expected = {
+                    start_date: latest_time.strftime("%Y-%m-%d"),
+                    last_record_time: latest_time.strftime("%Y-%m-%d %H:%M:%S %z"),
+                  }
+                  assert_equal expected, plugin.calculate_next_times(latest_time)
+                end
+              end
+            end
+          end
+
+          sub_test_case "ga:date" do
+            setup do
+              conf = valid_config["in"]
+              conf["time_series"] = "ga:date"
+              @config = embulk_config(conf)
+            end
+
+            sub_test_case "no records fetched" do
+              test "config_diff will keep previous" do
+                plugin = Plugin.new(config, nil, nil, @page_builder)
+                expected = {
+                  start_date: task["start_date"],
+                  end_date: task["end_date"],
+                  last_record_time: task["last_record_time"],
+                }
+                assert_equal expected, plugin.calculate_next_times(nil)
+              end
+            end
+
+            sub_test_case "updated" do
+              sub_test_case "end_date is given as YYYY-MM-DD" do
+                setup do
+                  @config[:start_date] = "2000-01-01"
+                  @config[:end_date] = "2000-01-05"
+                end
+
+                test "config_diff will modify" do
+                  latest_time = Time.parse("2000-01-07")
+                  plugin = Plugin.new(config, nil, nil, @page_builder)
+                  expected = {
+                    start_date: latest_time.strftime("%Y-%m-%d"),
+                    end_date: "today",
+                    last_record_time: latest_time.strftime("%Y-%m-%d %H:%M:%S %z"),
+                  }
+                  assert_equal expected, plugin.calculate_next_times(latest_time)
+                end
+              end
+
+              sub_test_case "end_date is given as nDaysAgo" do
+                setup do
+                  @config[:start_date] = "2000-01-01"
+                  @config[:end_date] = "10DaysAgo"
+                end
+
+                test "config_diff end_date won't modify" do
+                  latest_time = Time.parse("2000-01-07")
+                  plugin = Plugin.new(config, nil, nil, @page_builder)
+                  expected = {
+                    start_date: latest_time.strftime("%Y-%m-%d"),
+                    last_record_time: latest_time.strftime("%Y-%m-%d %H:%M:%S %z"),
+                  }
+                  assert_equal expected, plugin.calculate_next_times(latest_time)
+                end
+              end
+            end
+          end
+
+          def task
+            Plugin.task_from_config(@config)
+          end
+
+          def config
+            @config
+          end
+        end
+
         def valid_config
           fixture_load("valid.yml")
         end
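The "ignore records when old" tests above rely on the filter that `plugin.rb` applies inside `each_report_row`: a row is skipped when its time value is less than or equal to `last_record_time`. A minimal stand-alone sketch of that filter follows; the function name `select_new_rows` and the sample rows are hypothetical, and plain `Time` objects stand in for the values the client would produce.

```ruby
require "time"

# Hypothetical stand-alone filter: drop rows at or before last_record_time.
# A nil last_record_time_str keeps every row, as in the plugin.
def select_new_rows(rows, time_key, last_record_time_str)
  last_record_time = last_record_time_str ? Time.parse(last_record_time_str) : nil
  rows.reject { |row| last_record_time && row[time_key] <= last_record_time }
end

rows = [
  { "ga:dateHour" => Time.parse("2016-06-01 12:00:00 -0700"), "ga:pageviews" => 4 },
  { "ga:dateHour" => Time.parse("2016-06-01 13:00:00 -0700"), "ga:pageviews" => 7 },
]

kept = select_new_rows(rows, "ga:dateHour", "2016-06-01 12:00:00 -0700")
puts kept.length                            # prints 1
puts kept.first["ga:pageviews"]             # prints 7
puts select_new_rows(rows, "ga:dateHour", nil).length # prints 2
```

The comparison is inclusive, so a row stamped exactly at `last_record_time` is treated as already loaded, which is the case the tests exercise.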
data/test/fixtures/valid.yml
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: embulk-input-google_analytics
 version: !ruby/object:Gem::Version
-  version: 0.1.
+  version: 0.1.1
 platform: ruby
 authors:
 - uu59
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2016-07-
+date: 2016-07-13 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   requirement: !ruby/object:Gem::Requirement
@@ -66,6 +66,20 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
+- !ruby/object:Gem::Dependency
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0.5'
+  name: perfect_retry
+  prerelease: false
+  type: :runtime
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0.5'
 - !ruby/object:Gem::Dependency
   requirement: !ruby/object:Gem::Requirement
     requirements:
@@ -178,6 +192,20 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
+- !ruby/object:Gem::Dependency
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.0'
+  name: gem_release_helper
+  prerelease: false
+  type: :development
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.0'
 description: Loads records from Google Analytics.
 email:
 - k@uu59.org
@@ -197,6 +225,7 @@ files:
 - lib/embulk/input/google_analytics.rb
 - lib/embulk/input/google_analytics/client.rb
 - lib/embulk/input/google_analytics/plugin.rb
+- service_account.png
 - test/embulk/input/google_analytics/test_client.rb
 - test/embulk/input/google_analytics/test_plugin.rb
 - test/fixture_helper.rb