embulk-input-mixpanel 0.4.2 → 0.4.3
- checksums.yaml +4 -4
- data/CHANGELOG.md +3 -0
- data/README.md +44 -1
- data/Rakefile +1 -1
- data/embulk-input-mixpanel.gemspec +1 -1
- data/lib/embulk/input/mixpanel.rb +35 -0
- data/test/embulk/input/test_mixpanel.rb +91 -0
- data/test/run-test.rb +7 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 82bac0d9e5e93a4fc606b3d43f1e9cd35debf4b3
+  data.tar.gz: 82e5c6323b3d9db15ff54589b572b642cb6d93e1
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: e606300ab082a98aa443432a0b4f4246c40a942ca7724bb68679d66a889fce60cb36eb95b7dcf59e7a35181bb3175efc785cb3710d7764b58d800209589a140d
+  data.tar.gz: abf1f2528faf56e7c6e65d72d4f8e34a6beebda7f93de1eb16bd93130de2c4b647cf716053ec039a7ddd8b1fdbfa9c62864bb280334630b53af1f15db6137cd3
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,6 @@
+## 0.4.3 - 2016-03-16
+* [enhancement] Custom properties json [#40](https://github.com/treasure-data/embulk-input-mixpanel/pull/40)
+
 ## 0.4.2 - 2016-03-08
 * [fixed] Fix Range request was not satisfied [#39](https://github.com/treasure-data/embulk-input-mixpanel/pull/39)
 
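To make the `Custom properties json` enhancement concrete, here is a minimal standalone Ruby sketch of how the two options select keys, using the sample row from the README hunk in this diff. It is not the plugin's code: `RESERVED_KEYS` is a hypothetical, heavily abbreviated stand-in for the plugin's `KNOWN_KEYS`, `ConfigError` stands in for `Embulk::ConfigError`, and the helper names are invented for illustration.

```ruby
# Hypothetical stand-in for Embulk::ConfigError so the sketch runs without Embulk.
class ConfigError < StandardError; end

# Hypothetical, abbreviated stand-in for the plugin's KNOWN_KEYS list.
RESERVED_KEYS = %w(event $city $region $os).freeze

# The README states that the two options cannot both be true.
def check_fetch_options!(fetch_unknown_columns, fetch_custom_properties)
  if fetch_unknown_columns && fetch_custom_properties
    raise ConfigError, "set only one of fetch_unknown_columns / fetch_custom_properties"
  end
end

# Properties that are not configured as columns (the `unknown_columns` view).
def unknown_columns(properties, configured)
  properties.reject { |key, _| configured.include?(key) }
end

# Properties that are neither reserved nor configured (the `custom_properties` view).
def custom_properties(properties, configured)
  properties.reject { |key, _| RESERVED_KEYS.include?(key) || configured.include?(key) }
end

# Sample row and column configuration taken from the README example.
properties = { "$city" => "Tokyo", "$custom" => "custom", "$foobar" => "foobar" }
configured = %w(event $custom)

check_fetch_options!(false, true) # valid: only one option is enabled
p unknown_columns(properties, configured)
p custom_properties(properties, configured)
```

With the README's sample row, the first call keeps `$city` and `$foobar`, while the second keeps only `$foobar`, because `$city` is a reserved key and `$custom` is a configured column.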
data/README.md
CHANGED
@@ -38,14 +38,57 @@ To get it, you should log in mixpanel website, and click gear icon at the lower
   - NOTE: Mixpanel API supports to export data from at least 2 days before to at most the previous day.
 - **fetch_days**: Count of days range for exporting (integer, optional, default: from_date - (today - 1))
   - NOTE: Mixpanel doesn't support to from_date > today - 2
-- **fetch_unknown_columns
+- **fetch_unknown_columns**(deprecated): If you want this plugin fetches unknown (unconfigured in config) columns (boolean, optional, default: true)
   - NOTE: If true, `unknown_columns` column is created and added unknown columns' data.
+- **fetch_custom_properties**: All custom properties into `custom_properties` key. "custom properties" are not desribed Mixpanel document [1](https://mixpanel.com/help/questions/articles/special-or-reserved-properties), [2](https://mixpanel.com/help/questions/articles/what-properties-do-mixpanels-libraries-store-by-default). (boolean, optional, default: false)
+  - NOTE: Cannot set both `fetch_unknown_columns` and `fetch_custom_properties` to `true`.
 - **event**: The event or events to filter data (array, optional, default: nil)
 - **where**: Expression to filter data (c.f. https://mixpanel.com/docs/api-documentation/data-export-api#segmentation-expressions) (string, optional, default: nil)
 - **bucket**:The data backet to filter data (string, optional, default: nil)
 - **retry_initial_wait_sec** Wait seconds for exponential backoff initial value (integer, default: 1)
 - **retry_limit**: Try to retry this times (integer, default: 5)
 
+### `fetch_unknown_columns` and `fetch_custom_properties`
+
+If you have such data and set config.yml as below.
+
+| event | $city | $custom | $foobar |
+| ----- | ------- | ------- | ------- |
+| ev | Tokyo | custom | foobar |
+
+(NOTE: `$city` is a [reserved key](https://mixpanel.com/help/questions/articles/what-properties-do-mixpanels-libraries-store-by-default), `$custom` and `$foobar` are not)
+
+```yaml
+in:
+  type: mixpanel
+  api_key: "API_KEY"
+  api_secret: "API_SECRET"
+  timezone: "US/Pacific"
+  from_date: "2015-07-19"
+  fetch_days: 5
+  columns:
+    - {name: event, type: string}
+    - {name: $custom, type: string}
+```
+
+
+`fetch_unknown_columns: true` will fetch as:
+
+| event | $custom | unknown_columns (json) |
+| ----- | ------- | ----------------- |
+| ev | custom | `{"$city":"Tokyo", "$foobar": "foobar"}` |
+
+`fetch_custom_properties: true` will fetch as:
+
+| event | $custom | custom_properties (json) |
+| ----- | ------- | ----------------- |
+| ev | custom | `{"$foobar": "foobar"}` |
+
+
+`fetch_unknown_columns` recognize `$city` and `$foobar` as `unknown_columns` because they are not described in config.yml.
+
+`fetch_custom_properties` recognize `$foobar` as `custom_properties`. `$custom` is also custom property but it was described in config.yml.
+
 ## Example
 
 ```yaml
data/Rakefile
CHANGED
data/lib/embulk/input/mixpanel.rb
CHANGED
@@ -12,6 +12,18 @@ module Embulk
 GUESS_RECORDS_COUNT = 10
 NOT_PROPERTY_COLUMN = "event".freeze
 
+# https://mixpanel.com/help/questions/articles/special-or-reserved-properties
+# https://mixpanel.com/help/questions/articles/what-properties-do-mixpanels-libraries-store-by-default
+#
+# JavaScript to extract key names from HTML: run it on Chrome Devtool when opening their document
+# > Array.from(document.querySelectorAll("strong")).map(function(s){ return s.textContent.match(/[A-Z]/) ? s.parentNode.textContent.match(/\((.*?)\)/)[1] : s.textContent.split(",").join(" ") }).join(" ")
+# > Array.from(document.querySelectorAll("li")).map(function(s){ m = s.textContent.match(/\((.*?)\)/); return m && m[1] }).filter(function(k) { return k && !k.match("utm") }).join(" ")
+KNOWN_KEYS = %W(
+  #{NOT_PROPERTY_COLUMN}
+  distinct_id ip mp_name_tag mp_note token time mp_country_code length campaign_id $email $phone $distinct_id $ios_devices $android_devices $first_name $last_name $name $city $region $country_code $timezone $unsubscribed
+  $city $region mp_country_code $browser $browser_version $device $current_url $initial_referrer $initial_referring_domain $os $referrer $referring_domain $screen_height $screen_width $search_engine $city $region $mp_country_code $timezone $browser_version $browser $initial_referrer $initial_referring_domain $os $last_seen $city $region mp_country_code $app_release $app_version $carrier $ios_ifa $os_version $manufacturer $lib_version $model $os $screen_height $screen_width $wifi $city $region $mp_country_code $timezone $ios_app_release $ios_app_version $ios_device_model $ios_lib_version $ios_version $ios_ifa $last_seen $city $region mp_country_code $app_version $bluetooth_enabled $bluetooth_version $brand $carrier $has_nfc $has_telephone $lib_version $manufacturer $model $os $os_version $screen_dpi $screen_height $screen_width $wifi $google_play_services $city $region mp_country_code $timezone $android_app_version $android_app_version_code $android_lib_version $android_os $android_os_version $android_brand $android_model $android_manufacturer $last_seen
+).uniq.freeze
+
 # NOTE: It takes long time to fetch data between from_date to
 # to_date by one API request. So this plugin fetches data
 # between each 7 (SLICE_DAYS_COUNT) days.
@@ -36,10 +48,15 @@ module Embulk
   api_secret: config.param(:api_secret, :string),
   schema: config.param(:columns, :array),
   fetch_unknown_columns: fetch_unknown_columns,
+  fetch_custom_properties: config.param(:fetch_custom_properties, :bool, default: false),
   retry_initial_wait_sec: config.param(:retry_initial_wait_sec, :integer, default: 1),
   retry_limit: config.param(:retry_limit, :integer, default: 5),
 }
 
+if task[:fetch_unknown_columns] && task[:fetch_custom_properties]
+  raise Embulk::ConfigError.new("Don't set true both `fetch_unknown_columns` and `fetch_custom_properties`.")
+end
+
 columns = task[:schema].map do |column|
   name = column["name"]
   type = column["type"].to_sym
@@ -48,9 +65,14 @@ module Embulk
   end
 
   if fetch_unknown_columns
+    Embulk.logger.warn "Deprecated `unknown_columns`. Use `fetch_custom_properties` instead."
     columns << Column.new(nil, "unknown_columns", :json)
   end
 
+  if task[:fetch_custom_properties]
+    columns << Column.new(nil, "custom_properties", :json)
+  end
+
   resume(task, columns, 1, &control)
 end
 
@@ -115,6 +137,9 @@ module Embulk
     unknown_values = extract_unknown_values(record)
     values << unknown_values.to_json
   end
+  if task[:fetch_custom_properties]
+    values << collect_custom_properties(record)
+  end
   page_builder.add(values)
 end
 
@@ -153,6 +178,16 @@ module Embulk
   end
 end
 
+def collect_custom_properties(record)
+  specified_columns = @schema.map{|col| col["name"]}
+  custom_keys = record["properties"].keys.find_all{|key| !KNOWN_KEYS.include?(key.to_s) && !specified_columns.include?(key.to_s) }
+  custom_keys.inject({}) do |result, key|
+    result.merge({
+      key => record["properties"][key]
+    })
+  end
+end
+
 def extract_unknown_values(record)
   record_keys = record["properties"].keys + [NOT_PROPERTY_COLUMN]
   schema_keys = @schema.map {|column| column["name"]}
data/test/embulk/input/test_mixpanel.rb
CHANGED
@@ -335,6 +335,44 @@ module Embulk
   end
 end
 
+class TestCustomProps < self
+  setup do
+    stub(Mixpanel).resume {}
+  end
+
+  data(
+    "false/false" => [false, false],
+    "false/true" => [false, true],
+    "true/false" => [true, false],
+  )
+  def test_valid_combination(data)
+    fetch_unknown_columns, fetch_custom_properties = data
+    conf = DataSource[*transaction_config.merge(fetch_unknown_columns: fetch_unknown_columns, fetch_custom_properties: fetch_custom_properties).to_a.flatten(1)]
+
+    assert_nothing_raised do
+      Mixpanel.transaction(conf, &control)
+    end
+  end
+
+  def test_both_true_then_raise_config_error
+    conf = DataSource[*transaction_config.merge(fetch_unknown_columns: true, fetch_custom_properties: true).to_a.flatten(1)]
+
+    assert_raise(Embulk::ConfigError) do
+      Mixpanel.transaction(conf, &control)
+    end
+  end
+
+  private
+
+  def transaction_config
+    config.merge(
+      columns: schema,
+      fetch_days: 2,
+      timezone: "UTC",
+    )
+  end
+end
+
 def test_resume
   today = Date.today
   control = proc { [{to_date: today.to_s}] }
@@ -463,6 +501,7 @@ module Embulk
   dates: DATES.to_a.map(&:to_s),
   params: Mixpanel.export_params(embulk_config),
   fetch_unknown_columns: false,
+  fetch_custom_properties: false,
   retry_initial_wait_sec: 0,
   retry_limit: 3,
 }
@@ -509,6 +548,56 @@ module Embulk
   @plugin.run
 end
 
+class CustomPropertiesTest < self
+  def setup
+    super
+    @page_builder = Object.new
+    @plugin = Mixpanel.new(task, nil, nil, @page_builder)
+    stub(@plugin).fetch { [record] }
+  end
+
+  def test_run
+    stub(@plugin).preview? { false }
+
+    custom_property_keys = %w($foobar)
+
+    added = [
+      record["event"],
+      record["properties"]["$specified"],
+      custom_property_keys.map{|k| {k => record["properties"][k] }}.inject(&:merge)
+    ]
+
+    mock(@page_builder).add(added).at_least(1)
+    mock(@page_builder).finish
+
+    @plugin.run
+  end
+
+  private
+
+  def task
+    super.merge(schema: schema, fetch_unknown_columns: false, fetch_custom_properties: true)
+  end
+
+  def record
+    {
+      "event" => "EV",
+      "properties" => {
+        "$os" => "Android",
+        "$specified" => "foo",
+        "$foobar" => "foobar",
+      }
+    }
+  end
+
+  def schema
+    [
+      {"name" => "event", "type" => "string"},
+      {"name" => "$specified", "type" => "string"},
+    ]
+  end
+end
+
 class UnknownColumnsTest < self
   def setup
     super
@@ -581,6 +670,7 @@ module Embulk
   dates: DATES.to_a.map(&:to_s),
   params: Mixpanel.export_params(embulk_config),
   fetch_unknown_columns: false,
+  fetch_custom_properties: false,
   retry_initial_wait_sec: 2,
   retry_limit: 3,
 }
@@ -615,6 +705,7 @@ module Embulk
   from_date: FROM_DATE,
   fetch_days: DAYS,
   fetch_unknown_columns: false,
+  fetch_custom_properties: false,
   retry_initial_wait_sec: 2,
   retry_limit: 3,
 }
data/test/run-test.rb
CHANGED
@@ -13,6 +13,12 @@ $LOAD_PATH.unshift(test_dir)
 
 ENV["TEST_UNIT_MAX_DIFF_TARGET_STRING_SIZE"] ||= "5000"
 
-
+if ENV["CI"]
+  require "codeclimate-test-reporter"
+  CodeClimate::TestReporter.start
+elsif ENV["COVERAGE"]
+  require 'simplecov'
+  SimpleCov.start 'test_frameworks'
+end
 
 exit Test::Unit::AutoRunner.run(true, test_dir)
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: embulk-input-mixpanel
 version: !ruby/object:Gem::Version
-  version: 0.4.
+  version: 0.4.3
 platform: ruby
 authors:
 - yoshihara
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2016-03-
+date: 2016-03-16 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   requirement: !ruby/object:Gem::Requirement