embulk-input-mixpanel 0.4.2 → 0.4.3
- checksums.yaml +4 -4
- data/CHANGELOG.md +3 -0
- data/README.md +44 -1
- data/Rakefile +1 -1
- data/embulk-input-mixpanel.gemspec +1 -1
- data/lib/embulk/input/mixpanel.rb +35 -0
- data/test/embulk/input/test_mixpanel.rb +91 -0
- data/test/run-test.rb +7 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 82bac0d9e5e93a4fc606b3d43f1e9cd35debf4b3
+  data.tar.gz: 82e5c6323b3d9db15ff54589b572b642cb6d93e1
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: e606300ab082a98aa443432a0b4f4246c40a942ca7724bb68679d66a889fce60cb36eb95b7dcf59e7a35181bb3175efc785cb3710d7764b58d800209589a140d
+  data.tar.gz: abf1f2528faf56e7c6e65d72d4f8e34a6beebda7f93de1eb16bd93130de2c4b647cf716053ec039a7ddd8b1fdbfa9c62864bb280334630b53af1f15db6137cd3
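checksums.yaml records a SHA1 and a SHA512 digest for the `metadata.gz` and `data.tar.gz` entries of the .gem archive. A minimal sketch for recomputing those digests with Ruby's standard `digest` library, assuming both entries have already been extracted into one directory; `gem_checksums` is a hypothetical helper name, not part of RubyGems.

```ruby
require "digest"

# Hypothetical helper: recompute the SHA1/SHA512 digests that checksums.yaml
# lists for the metadata.gz and data.tar.gz entries of a .gem archive.
# Assumes both entries were already extracted into `dir`.
def gem_checksums(dir, entries = %w[metadata.gz data.tar.gz])
  { "SHA1" => Digest::SHA1, "SHA512" => Digest::SHA512 }.map do |name, algo|
    # Digest::Class.file reads the file and returns a digest object
    digests = entries.map { |entry| [entry, algo.file(File.join(dir, entry)).hexdigest] }.to_h
    [name, digests]
  end.to_h
end
```

A .gem file is a plain tar archive, so extracting it (for example with `tar -xf embulk-input-mixpanel-0.4.3.gem` in an empty directory) and calling `gem_checksums(".")` should yield values comparable to the `+` lines above.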
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,6 @@
+## 0.4.3 - 2016-03-16
+* [enhancement] Custom properties json [#40](https://github.com/treasure-data/embulk-input-mixpanel/pull/40)
+
 ## 0.4.2 - 2016-03-08
 * [fixed] Fix Range request was not satisfied [#39](https://github.com/treasure-data/embulk-input-mixpanel/pull/39)
 
data/README.md
CHANGED
@@ -38,14 +38,57 @@ To get it, you should log in mixpanel website, and click gear icon at the lower
   - NOTE: Mixpanel API supports to export data from at least 2 days before to at most the previous day.
 - **fetch_days**: Count of days range for exporting (integer, optional, default: from_date - (today - 1))
   - NOTE: Mixpanel doesn't support to from_date > today - 2
-- **fetch_unknown_columns
+- **fetch_unknown_columns**(deprecated): If you want this plugin fetches unknown (unconfigured in config) columns (boolean, optional, default: true)
   - NOTE: If true, `unknown_columns` column is created and added unknown columns' data.
+- **fetch_custom_properties**: All custom properties into `custom_properties` key. "custom properties" are not desribed Mixpanel document [1](https://mixpanel.com/help/questions/articles/special-or-reserved-properties), [2](https://mixpanel.com/help/questions/articles/what-properties-do-mixpanels-libraries-store-by-default). (boolean, optional, default: false)
+  - NOTE: Cannot set both `fetch_unknown_columns` and `fetch_custom_properties` to `true`.
 - **event**: The event or events to filter data (array, optional, default: nil)
 - **where**: Expression to filter data (c.f. https://mixpanel.com/docs/api-documentation/data-export-api#segmentation-expressions) (string, optional, default: nil)
 - **bucket**:The data backet to filter data (string, optional, default: nil)
 - **retry_initial_wait_sec** Wait seconds for exponential backoff initial value (integer, default: 1)
 - **retry_limit**: Try to retry this times (integer, default: 5)
 
+### `fetch_unknown_columns` and `fetch_custom_properties`
+
+If you have such data and set config.yml as below.
+
+| event | $city | $custom | $foobar |
+| ----- | ------- | ------- | ------- |
+| ev | Tokyo | custom | foobar |
+
+(NOTE: `$city` is a [reserved key](https://mixpanel.com/help/questions/articles/what-properties-do-mixpanels-libraries-store-by-default), `$custom` and `$foobar` are not)
+
+```yaml
+in:
+  type: mixpanel
+  api_key: "API_KEY"
+  api_secret: "API_SECRET"
+  timezone: "US/Pacific"
+  from_date: "2015-07-19"
+  fetch_days: 5
+  columns:
+    - {name: event, type: string}
+    - {name: $custom, type: string}
+```
+
+
+`fetch_unknown_columns: true` will fetch as:
+
+| event | $custom | unknown_columns (json) |
+| ----- | ------- | ----------------- |
+| ev | custom | `{"$city":"Tokyo", "$foobar": "foobar"}` |
+
+`fetch_custom_properties: true` will fetch as:
+
+| event | $custom | custom_properties (json) |
+| ----- | ------- | ----------------- |
+| ev | custom | `{"$foobar": "foobar"}` |
+
+
+`fetch_unknown_columns` recognize `$city` and `$foobar` as `unknown_columns` because they are not described in config.yml.
+
+`fetch_custom_properties` recognize `$foobar` as `custom_properties`. `$custom` is also custom property but it was described in config.yml.
+
 ## Example
 
 ```yaml
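The README example contrasts how the two options split one record. A minimal sketch of that partition, assuming a record shaped like the Mixpanel export (an `"event"` key plus a `"properties"` hash) and a deliberately tiny stand-in for the plugin's `KNOWN_KEYS` list; `split_properties` and `KNOWN_KEYS_SAMPLE` are illustrative names, not the plugin's API. It only looks at `"properties"`, since the example configures `event` as a column anyway.

```ruby
# Illustrative stand-in for the plugin's KNOWN_KEYS (the real list is much longer).
KNOWN_KEYS_SAMPLE = %w[event distinct_id time $city $region $os].freeze

# Hypothetical helper mirroring the README example: returns the extra JSON
# value each option would produce for one record.
def split_properties(record, schema_names, known_keys = KNOWN_KEYS_SAMPLE)
  props = record["properties"]
  # fetch_unknown_columns: every property not configured in `columns`
  unknown = props.reject { |key, _| schema_names.include?(key) }
  # fetch_custom_properties: additionally drop Mixpanel's reserved/default keys
  custom = unknown.reject { |key, _| known_keys.include?(key) }
  { "unknown_columns" => unknown, "custom_properties" => custom }
end

record = {
  "event" => "ev",
  "properties" => { "$city" => "Tokyo", "$custom" => "custom", "$foobar" => "foobar" },
}
p split_properties(record, %w[event $custom])
```

For the sample row this reproduces the README tables: `$city` and `$foobar` end up in `unknown_columns`, while only `$foobar` ends up in `custom_properties`, because `$city` is a reserved key.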
data/lib/embulk/input/mixpanel.rb
CHANGED
@@ -12,6 +12,18 @@ module Embulk
       GUESS_RECORDS_COUNT = 10
       NOT_PROPERTY_COLUMN = "event".freeze
 
+      # https://mixpanel.com/help/questions/articles/special-or-reserved-properties
+      # https://mixpanel.com/help/questions/articles/what-properties-do-mixpanels-libraries-store-by-default
+      #
+      # JavaScript to extract key names from HTML: run it on Chrome Devtool when opening their document
+      # > Array.from(document.querySelectorAll("strong")).map(function(s){ return s.textContent.match(/[A-Z]/) ? s.parentNode.textContent.match(/\((.*?)\)/)[1] : s.textContent.split(",").join(" ") }).join(" ")
+      # > Array.from(document.querySelectorAll("li")).map(function(s){ m = s.textContent.match(/\((.*?)\)/); return m && m[1] }).filter(function(k) { return k && !k.match("utm") }).join(" ")
+      KNOWN_KEYS = %W(
+        #{NOT_PROPERTY_COLUMN}
+        distinct_id ip mp_name_tag mp_note token time mp_country_code length campaign_id $email $phone $distinct_id $ios_devices $android_devices $first_name $last_name $name $city $region $country_code $timezone $unsubscribed
+        $city $region mp_country_code $browser $browser_version $device $current_url $initial_referrer $initial_referring_domain $os $referrer $referring_domain $screen_height $screen_width $search_engine $city $region $mp_country_code $timezone $browser_version $browser $initial_referrer $initial_referring_domain $os $last_seen $city $region mp_country_code $app_release $app_version $carrier $ios_ifa $os_version $manufacturer $lib_version $model $os $screen_height $screen_width $wifi $city $region $mp_country_code $timezone $ios_app_release $ios_app_version $ios_device_model $ios_lib_version $ios_version $ios_ifa $last_seen $city $region mp_country_code $app_version $bluetooth_enabled $bluetooth_version $brand $carrier $has_nfc $has_telephone $lib_version $manufacturer $model $os $os_version $screen_dpi $screen_height $screen_width $wifi $google_play_services $city $region mp_country_code $timezone $android_app_version $android_app_version_code $android_lib_version $android_os $android_os_version $android_brand $android_model $android_manufacturer $last_seen
+      ).uniq.freeze
+
       # NOTE: It takes long time to fetch data between from_date to
       # to_date by one API request. So this plugin fetches data
       # between each 7 (SLICE_DAYS_COUNT) days.
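`KNOWN_KEYS` is built with `%W(...)`, which splits on whitespace and interpolates `#{NOT_PROPERTY_COLUMN}`; `.uniq` then drops the many repeated names such as `$city`. The `collect_custom_properties` method added later in this file calls `KNOWN_KEYS.include?` once per property, which is a linear scan of the array. A sketch of a `Set`-based alternative, with an abbreviated, illustrative key list rather than the plugin's full one; for a list of roughly a hundred keys the speed difference is likely modest.

```ruby
require "set"

NOT_PROPERTY_COLUMN = "event".freeze

# Abbreviated, illustrative key list: %W interpolates #{...} and splits on whitespace.
KNOWN_KEY_LIST = %W(
  #{NOT_PROPERTY_COLUMN}
  distinct_id time $city $region
  $city $region $os $browser
).uniq.freeze

# Set#include? is a hash lookup, unlike Array#include?, which scans the list.
KNOWN_KEY_SET = Set.new(KNOWN_KEY_LIST).freeze

# Hypothetical predicate; to_s mirrors the plugin's key.to_s comparison.
def known_key?(key)
  KNOWN_KEY_SET.include?(key.to_s)
end
```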
@@ -36,10 +48,15 @@ module Embulk
           api_secret: config.param(:api_secret, :string),
           schema: config.param(:columns, :array),
           fetch_unknown_columns: fetch_unknown_columns,
+          fetch_custom_properties: config.param(:fetch_custom_properties, :bool, default: false),
           retry_initial_wait_sec: config.param(:retry_initial_wait_sec, :integer, default: 1),
           retry_limit: config.param(:retry_limit, :integer, default: 5),
         }
 
+        if task[:fetch_unknown_columns] && task[:fetch_custom_properties]
+          raise Embulk::ConfigError.new("Don't set true both `fetch_unknown_columns` and `fetch_custom_properties`.")
+        end
+
         columns = task[:schema].map do |column|
           name = column["name"]
           type = column["type"].to_sym
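The new check in `transaction` rejects exactly one of the four flag combinations, which is what the `TestCustomProps` tests in test_mixpanel.rb exercise (three accepted pairs, one raising pair). A standalone sketch of the same rule; `ConfigError` is a local stand-in for `Embulk::ConfigError` so it runs without Embulk, and `validate_property_flags!` is a hypothetical name.

```ruby
# Local stand-in for Embulk::ConfigError so the sketch runs without Embulk.
class ConfigError < StandardError; end

# Hypothetical helper: the two JSON-column options are mutually exclusive.
def validate_property_flags!(task)
  if task[:fetch_unknown_columns] && task[:fetch_custom_properties]
    raise ConfigError, "Don't set true both `fetch_unknown_columns` and `fetch_custom_properties`."
  end
  task
end

# Mirrors the data-driven test: these three combinations are accepted.
[[false, false], [false, true], [true, false]].each do |unknown, custom|
  validate_property_flags!({ fetch_unknown_columns: unknown, fetch_custom_properties: custom })
end
```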
@@ -48,9 +65,14 @@ module Embulk
         end
 
         if fetch_unknown_columns
+          Embulk.logger.warn "Deprecated `unknown_columns`. Use `fetch_custom_properties` instead."
           columns << Column.new(nil, "unknown_columns", :json)
         end
 
+        if task[:fetch_custom_properties]
+          columns << Column.new(nil, "custom_properties", :json)
+        end
+
         resume(task, columns, 1, &control)
       end
 
@@ -115,6 +137,9 @@ module Embulk
               unknown_values = extract_unknown_values(record)
               values << unknown_values.to_json
             end
+            if task[:fetch_custom_properties]
+              values << collect_custom_properties(record)
+            end
             page_builder.add(values)
           end
 
@@ -153,6 +178,16 @@ module Embulk
         end
       end
 
+      def collect_custom_properties(record)
+        specified_columns = @schema.map{|col| col["name"]}
+        custom_keys = record["properties"].keys.find_all{|key| !KNOWN_KEYS.include?(key.to_s) && !specified_columns.include?(key.to_s) }
+        custom_keys.inject({}) do |result, key|
+          result.merge({
+            key => record["properties"][key]
+          })
+        end
+      end
+
       def extract_unknown_values(record)
         record_keys = record["properties"].keys + [NOT_PROPERTY_COLUMN]
         schema_keys = @schema.map {|column| column["name"]}
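`collect_custom_properties` filters the property keys with `find_all` and then rebuilds a hash with `inject`/`merge`, allocating a new hash on every iteration. Because it only keeps a subset of `record["properties"]`, `Hash#reject` expresses the same selection in one pass, returns `{}` when nothing matches, and preserves key order. A sketch of that equivalent formulation as a plain function; the plugin's version is an instance method reading `@schema` and `KNOWN_KEYS`, whereas here both are passed in, and the key list in the usage is illustrative.

```ruby
# Same selection as collect_custom_properties, written with Hash#reject.
# `known_keys` stands in for KNOWN_KEYS; `schema` is the configured `columns` array.
def custom_properties(record, schema, known_keys)
  specified = schema.map { |col| col["name"] }
  record["properties"].reject do |key, _value|
    known_keys.include?(key.to_s) || specified.include?(key.to_s)
  end
end

# Same record and schema as CustomPropertiesTest in test_mixpanel.rb.
record = {
  "event" => "EV",
  "properties" => { "$os" => "Android", "$specified" => "foo", "$foobar" => "foobar" },
}
schema = [{ "name" => "event", "type" => "string" }, { "name" => "$specified", "type" => "string" }]
p custom_properties(record, schema, %w[event $os])
```

With that input only `$foobar` survives: `$os` is a known key and `$specified` is a configured column, matching the row the test expects.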
data/test/embulk/input/test_mixpanel.rb
CHANGED
@@ -335,6 +335,44 @@ module Embulk
           end
         end
 
+        class TestCustomProps < self
+          setup do
+            stub(Mixpanel).resume {}
+          end
+
+          data(
+            "false/false" => [false, false],
+            "false/true" => [false, true],
+            "true/false" => [true, false],
+          )
+          def test_valid_combination(data)
+            fetch_unknown_columns, fetch_custom_properties = data
+            conf = DataSource[*transaction_config.merge(fetch_unknown_columns: fetch_unknown_columns, fetch_custom_properties: fetch_custom_properties).to_a.flatten(1)]
+
+            assert_nothing_raised do
+              Mixpanel.transaction(conf, &control)
+            end
+          end
+
+          def test_both_true_then_raise_config_error
+            conf = DataSource[*transaction_config.merge(fetch_unknown_columns: true, fetch_custom_properties: true).to_a.flatten(1)]
+
+            assert_raise(Embulk::ConfigError) do
+              Mixpanel.transaction(conf, &control)
+            end
+          end
+
+          private
+
+          def transaction_config
+            config.merge(
+              columns: schema,
+              fetch_days: 2,
+              timezone: "UTC",
+            )
+          end
+        end
+
         def test_resume
           today = Date.today
           control = proc { [{to_date: today.to_s}] }
@@ -463,6 +501,7 @@ module Embulk
             dates: DATES.to_a.map(&:to_s),
             params: Mixpanel.export_params(embulk_config),
             fetch_unknown_columns: false,
+            fetch_custom_properties: false,
             retry_initial_wait_sec: 0,
             retry_limit: 3,
           }
@@ -509,6 +548,56 @@ module Embulk
           @plugin.run
         end
 
+        class CustomPropertiesTest < self
+          def setup
+            super
+            @page_builder = Object.new
+            @plugin = Mixpanel.new(task, nil, nil, @page_builder)
+            stub(@plugin).fetch { [record] }
+          end
+
+          def test_run
+            stub(@plugin).preview? { false }
+
+            custom_property_keys = %w($foobar)
+
+            added = [
+              record["event"],
+              record["properties"]["$specified"],
+              custom_property_keys.map{|k| {k => record["properties"][k] }}.inject(&:merge)
+            ]
+
+            mock(@page_builder).add(added).at_least(1)
+            mock(@page_builder).finish
+
+            @plugin.run
+          end
+
+          private
+
+          def task
+            super.merge(schema: schema, fetch_unknown_columns: false, fetch_custom_properties: true)
+          end
+
+          def record
+            {
+              "event" => "EV",
+              "properties" => {
+                "$os" => "Android",
+                "$specified" => "foo",
+                "$foobar" => "foobar",
+              }
+            }
+          end
+
+          def schema
+            [
+              {"name" => "event", "type" => "string"},
+              {"name" => "$specified", "type" => "string"},
+            ]
+          end
+        end
+
         class UnknownColumnsTest < self
           def setup
             super
@@ -581,6 +670,7 @@ module Embulk
             dates: DATES.to_a.map(&:to_s),
             params: Mixpanel.export_params(embulk_config),
             fetch_unknown_columns: false,
+            fetch_custom_properties: false,
             retry_initial_wait_sec: 2,
             retry_limit: 3,
           }
@@ -615,6 +705,7 @@ module Embulk
             from_date: FROM_DATE,
             fetch_days: DAYS,
             fetch_unknown_columns: false,
+            fetch_custom_properties: false,
             retry_initial_wait_sec: 2,
             retry_limit: 3,
           }
data/test/run-test.rb
CHANGED
@@ -13,6 +13,12 @@ $LOAD_PATH.unshift(test_dir)
 
 ENV["TEST_UNIT_MAX_DIFF_TARGET_STRING_SIZE"] ||= "5000"
 
-
+if ENV["CI"]
+  require "codeclimate-test-reporter"
+  CodeClimate::TestReporter.start
+elsif ENV["COVERAGE"]
+  require 'simplecov'
+  SimpleCov.start 'test_frameworks'
+end
 
 exit Test::Unit::AutoRunner.run(true, test_dir)
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: embulk-input-mixpanel
 version: !ruby/object:Gem::Version
-  version: 0.4.
+  version: 0.4.3
 platform: ruby
 authors:
 - yoshihara
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2016-03-
+date: 2016-03-16 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   requirement: !ruby/object:Gem::Requirement