embulk-input-mixpanel 0.4.2 → 0.4.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 530a912601d813797ba6239f907ddd05d5a877f0
- data.tar.gz: 4b1c952056474b426fc3e4fedff9a200c9362590
+ metadata.gz: 82bac0d9e5e93a4fc606b3d43f1e9cd35debf4b3
+ data.tar.gz: 82e5c6323b3d9db15ff54589b572b642cb6d93e1
  SHA512:
- metadata.gz: 581d5574847051fff79c93a2d14df66212ea31a7a8a9f5e3a275250a31c5a902383baf164e5f3c7471d34d34eeb9c1cabd5168d5ed920be4d3e16e0a736f702f
- data.tar.gz: b679af78c706c0723477136cfe94e51445a5bf3a5b9ec721649f6ed6a0cf74871d6984f490c9a24c47d7d84de9e24623302e8e8611a6a3ef96e0e04181fd7d4f
+ metadata.gz: e606300ab082a98aa443432a0b4f4246c40a942ca7724bb68679d66a889fce60cb36eb95b7dcf59e7a35181bb3175efc785cb3710d7764b58d800209589a140d
+ data.tar.gz: abf1f2528faf56e7c6e65d72d4f8e34a6beebda7f93de1eb16bd93130de2c4b647cf716053ec039a7ddd8b1fdbfa9c62864bb280334630b53af1f15db6137cd3
data/CHANGELOG.md CHANGED
@@ -1,3 +1,6 @@
+ ## 0.4.3 - 2016-03-16
+ * [enhancement] Custom properties json [#40](https://github.com/treasure-data/embulk-input-mixpanel/pull/40)
+
  ## 0.4.2 - 2016-03-08
  * [fixed] Fix Range request was not satisfied [#39](https://github.com/treasure-data/embulk-input-mixpanel/pull/39)
 
data/README.md CHANGED
@@ -38,14 +38,57 @@ To get it, you should log in mixpanel website, and click gear icon at the lower
  - NOTE: Mixpanel API supports exporting data from at least 2 days before to at most the previous day.
  - **fetch_days**: Number of days to export (integer, optional, default: from_date - (today - 1))
  - NOTE: Mixpanel doesn't support `from_date > today - 2`
- - **fetch_unknown_columns**: If you want this plugin fetches unknown (unconfigured in config) columns (boolean, optional, default: true)
+ - **fetch_unknown_columns** (deprecated): Whether this plugin fetches unknown (not configured in config) columns (boolean, optional, default: true)
  - NOTE: If true, an `unknown_columns` column is created and the unknown columns' data is added to it.
+ - **fetch_custom_properties**: Whether this plugin fetches all custom properties into a `custom_properties` column. "Custom properties" are properties that are not described in the Mixpanel documents [1](https://mixpanel.com/help/questions/articles/special-or-reserved-properties), [2](https://mixpanel.com/help/questions/articles/what-properties-do-mixpanels-libraries-store-by-default) and not configured in `columns`. (boolean, optional, default: false)
+ - NOTE: `fetch_unknown_columns` and `fetch_custom_properties` cannot both be set to `true`.
  - **event**: The event or events to filter data (array, optional, default: nil)
  - **where**: Expression to filter data (c.f. https://mixpanel.com/docs/api-documentation/data-export-api#segmentation-expressions) (string, optional, default: nil)
  - **bucket**: The data bucket to filter data (string, optional, default: nil)
  - **retry_initial_wait_sec**: Initial wait in seconds for exponential backoff (integer, default: 1)
  - **retry_limit**: Maximum number of retries (integer, default: 5)
 
+ ### `fetch_unknown_columns` and `fetch_custom_properties`
+
+ Suppose you have the following data and set config.yml as below.
+
+ | event | $city | $custom | $foobar |
+ | ----- | ------- | ------- | ------- |
+ | ev | Tokyo | custom | foobar |
+
+ (NOTE: `$city` is a [reserved key](https://mixpanel.com/help/questions/articles/what-properties-do-mixpanels-libraries-store-by-default), while `$custom` and `$foobar` are not.)
+
+ ```yaml
+ in:
+   type: mixpanel
+   api_key: "API_KEY"
+   api_secret: "API_SECRET"
+   timezone: "US/Pacific"
+   from_date: "2015-07-19"
+   fetch_days: 5
+   columns:
+     - {name: event, type: string}
+     - {name: $custom, type: string}
+ ```
+
+
+ `fetch_unknown_columns: true` fetches:
+
+ | event | $custom | unknown_columns (json) |
+ | ----- | ------- | ----------------- |
+ | ev | custom | `{"$city":"Tokyo", "$foobar": "foobar"}` |
+
+ `fetch_custom_properties: true` (with `fetch_unknown_columns: false`) fetches:
+
+ | event | $custom | custom_properties (json) |
+ | ----- | ------- | ----------------- |
+ | ev | custom | `{"$foobar": "foobar"}` |
+
+
+ `fetch_unknown_columns` treats `$city` and `$foobar` as `unknown_columns` because they are not described in config.yml.
+
+ `fetch_custom_properties` treats only `$foobar` as a custom property: `$city` is a reserved key, and `$custom` is also a custom property but is already described in config.yml.
+
  ## Example
 
  ```yaml
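The README example above can be reproduced with a short Ruby sketch. It is illustrative only: `split_properties` and `SAMPLE_KNOWN_KEYS` are hypothetical names, and the key list is a tiny stand-in for the plugin's much longer `KNOWN_KEYS`. `unknown_columns` receives every property not configured in `columns`; `custom_properties` additionally drops the known (reserved or default) keys.

```ruby
require "json"

# Hypothetical, shortened stand-in for the plugin's KNOWN_KEYS; only "$city"
# matters for the README example.
SAMPLE_KNOWN_KEYS = %w(event distinct_id $city $os).freeze

# Sketch: split a record's properties the way the README describes.
# `columns` are the names configured in config.yml.
def split_properties(record, columns)
  props = record["properties"]
  # fetch_unknown_columns: everything not configured in `columns`
  unknown = props.reject { |key, _| columns.include?(key) }
  # fetch_custom_properties: the same, minus known (reserved/default) keys
  custom = unknown.reject { |key, _| SAMPLE_KNOWN_KEYS.include?(key) }
  { unknown_columns: unknown, custom_properties: custom }
end

record = {
  "event" => "ev",
  "properties" => { "$city" => "Tokyo", "$custom" => "custom", "$foobar" => "foobar" },
}
result = split_properties(record, %w(event $custom))
puts result[:unknown_columns].to_json   # {"$city":"Tokyo","$foobar":"foobar"}
puts result[:custom_properties].to_json # {"$foobar":"foobar"}
```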
data/Rakefile CHANGED
@@ -5,7 +5,7 @@ task default: :test
 
  desc "Run tests"
  task :test do
- ruby("test/run-test.rb", "--use-color=yes", "--collector=dir")
+ ruby("--debug", "test/run-test.rb", "--use-color=yes", "--collector=dir")
  end
 
  Everyleaf::EmbulkHelper::Tasks.install(
@@ -1,7 +1,7 @@
 
  Gem::Specification.new do |spec|
  spec.name = "embulk-input-mixpanel"
- spec.version = "0.4.2"
+ spec.version = "0.4.3"
  spec.authors = ["yoshihara", "uu59"]
  spec.summary = "Mixpanel input plugin for Embulk"
  spec.description = "Loads records from Mixpanel."
@@ -12,6 +12,18 @@ module Embulk
  GUESS_RECORDS_COUNT = 10
  NOT_PROPERTY_COLUMN = "event".freeze
 
+ # https://mixpanel.com/help/questions/articles/special-or-reserved-properties
+ # https://mixpanel.com/help/questions/articles/what-properties-do-mixpanels-libraries-store-by-default
+ #
+ # JavaScript to extract key names from HTML: run it on Chrome Devtool when opening their document
+ # > Array.from(document.querySelectorAll("strong")).map(function(s){ return s.textContent.match(/[A-Z]/) ? s.parentNode.textContent.match(/\((.*?)\)/)[1] : s.textContent.split(",").join(" ") }).join(" ")
+ # > Array.from(document.querySelectorAll("li")).map(function(s){ m = s.textContent.match(/\((.*?)\)/); return m && m[1] }).filter(function(k) { return k && !k.match("utm") }).join(" ")
+ KNOWN_KEYS = %W(
+ #{NOT_PROPERTY_COLUMN}
+ distinct_id ip mp_name_tag mp_note token time mp_country_code length campaign_id $email $phone $distinct_id $ios_devices $android_devices $first_name $last_name $name $city $region $country_code $timezone $unsubscribed
+ $city $region mp_country_code $browser $browser_version $device $current_url $initial_referrer $initial_referring_domain $os $referrer $referring_domain $screen_height $screen_width $search_engine $city $region $mp_country_code $timezone $browser_version $browser $initial_referrer $initial_referring_domain $os $last_seen $city $region mp_country_code $app_release $app_version $carrier $ios_ifa $os_version $manufacturer $lib_version $model $os $screen_height $screen_width $wifi $city $region $mp_country_code $timezone $ios_app_release $ios_app_version $ios_device_model $ios_lib_version $ios_version $ios_ifa $last_seen $city $region mp_country_code $app_version $bluetooth_enabled $bluetooth_version $brand $carrier $has_nfc $has_telephone $lib_version $manufacturer $model $os $os_version $screen_dpi $screen_height $screen_width $wifi $google_play_services $city $region mp_country_code $timezone $android_app_version $android_app_version_code $android_lib_version $android_os $android_os_version $android_brand $android_model $android_manufacturer $last_seen
+ ).uniq.freeze
+
  # NOTE: It takes long time to fetch data between from_date to
  # to_date by one API request. So this plugin fetches data
  # between each 7 (SLICE_DAYS_COUNT) days.
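`KNOWN_KEYS` is built with `%W(...)`, which interpolates `#{NOT_PROPERTY_COLUMN}` and splits the rest on whitespace. Because the key names were pasted from several Mixpanel pages, entries such as `$city` repeat, and `.uniq` removes them. A minimal sketch with a shortened key list; the `Set` at the end is an alternative for constant-time lookups and is not what the plugin uses (`collect_custom_properties` calls `Array#include?`).

```ruby
require "set"

NOT_PROPERTY_COLUMN = "event".freeze

# %W(...) interpolates #{...} and splits on whitespace. A bare "$city" is kept
# literally; only "#$name" (with a leading "#") would interpolate a global.
keys = %W(
  #{NOT_PROPERTY_COLUMN}
  distinct_id $city $region
  $city $region $os
).uniq.freeze

p keys # ["event", "distinct_id", "$city", "$region", "$os"]

# Alternative, not used by the plugin: a frozen Set for constant-time lookups.
key_set = keys.to_set.freeze
p key_set.include?("$os")     # true
p key_set.include?("$foobar") # false
```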
@@ -36,10 +48,15 @@ module Embulk
  api_secret: config.param(:api_secret, :string),
  schema: config.param(:columns, :array),
  fetch_unknown_columns: fetch_unknown_columns,
+ fetch_custom_properties: config.param(:fetch_custom_properties, :bool, default: false),
  retry_initial_wait_sec: config.param(:retry_initial_wait_sec, :integer, default: 1),
  retry_limit: config.param(:retry_limit, :integer, default: 5),
  }
 
+ if task[:fetch_unknown_columns] && task[:fetch_custom_properties]
+ raise Embulk::ConfigError.new("Don't set true both `fetch_unknown_columns` and `fetch_custom_properties`.")
+ end
+
  columns = task[:schema].map do |column|
  name = column["name"]
  type = column["type"].to_sym
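The guard above and the column changes in the next hunk both sit in `transaction`. A minimal sketch of their combined effect, assuming hypothetical stand-ins (`ConfigError` for `Embulk::ConfigError`, a `Column` struct for Embulk's column class) and a hypothetical `build_columns` helper; it omits the deprecation warning. Because the README lists `fetch_unknown_columns` with `default: true`, a config that only adds `fetch_custom_properties: true` would appear to hit this guard unless it also sets `fetch_unknown_columns: false` explicitly.

```ruby
# Hypothetical stand-ins so the sketch runs without Embulk.
class ConfigError < StandardError; end
Column = Struct.new(:index, :name, :type)

# Sketch of the 0.4.3 additions to transaction: reject the invalid option
# combination, then append at most one extra JSON column after the schema.
def build_columns(task)
  if task[:fetch_unknown_columns] && task[:fetch_custom_properties]
    raise ConfigError, "Don't set true both `fetch_unknown_columns` and `fetch_custom_properties`."
  end

  columns = task[:schema].map { |c| Column.new(nil, c["name"], c["type"].to_sym) }
  columns << Column.new(nil, "unknown_columns", :json) if task[:fetch_unknown_columns]
  columns << Column.new(nil, "custom_properties", :json) if task[:fetch_custom_properties]
  columns
end

schema = [{ "name" => "event", "type" => "string" }, { "name" => "$custom", "type" => "string" }]

p build_columns({ schema: schema, fetch_unknown_columns: false, fetch_custom_properties: true }).map(&:name)
# ["event", "$custom", "custom_properties"]

begin
  build_columns({ schema: schema, fetch_unknown_columns: true, fetch_custom_properties: true })
rescue ConfigError => e
  puts "rejected: #{e.message}"
end
```

The extra JSON column comes after the configured ones, so each record's values have to be appended in the same order, which is what the `run` loop in the following hunks does.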
@@ -48,9 +65,14 @@ module Embulk
  end
 
  if fetch_unknown_columns
+ Embulk.logger.warn "Deprecated `unknown_columns`. Use `fetch_custom_properties` instead."
  columns << Column.new(nil, "unknown_columns", :json)
  end
 
+ if task[:fetch_custom_properties]
+ columns << Column.new(nil, "custom_properties", :json)
+ end
+
  resume(task, columns, 1, &control)
  end
 
@@ -115,6 +137,9 @@ module Embulk
  unknown_values = extract_unknown_values(record)
  values << unknown_values.to_json
  end
+ if task[:fetch_custom_properties]
+ values << collect_custom_properties(record)
+ end
  page_builder.add(values)
  end
 
@@ -153,6 +178,16 @@ module Embulk
  end
  end
 
+ def collect_custom_properties(record)
+ specified_columns = @schema.map{|col| col["name"]}
+ custom_keys = record["properties"].keys.find_all{|key| !KNOWN_KEYS.include?(key.to_s) && !specified_columns.include?(key.to_s) }
+ custom_keys.inject({}) do |result, key|
+ result.merge({
+ key => record["properties"][key]
+ })
+ end
+ end
+
  def extract_unknown_values(record)
  record_keys = record["properties"].keys + [NOT_PROPERTY_COLUMN]
  schema_keys = @schema.map {|column| column["name"]}
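`collect_custom_properties` keeps every property whose key is neither in `KNOWN_KEYS` nor a configured column. The sketch below compares the `inject`/`merge` shape from the diff with a single `Hash#reject`, assuming a shortened, hypothetical `SKETCH_KNOWN_KEYS` list and passing the schema as an argument instead of reading `@schema`. Note also that the run loop appends `unknown_values.to_json` (a String) for `unknown_columns`, but appends the Hash returned by `collect_custom_properties` unchanged.

```ruby
require "json"

# Hypothetical, shortened stand-in for the plugin's KNOWN_KEYS.
SKETCH_KNOWN_KEYS = %w(event distinct_id $os $city).freeze

# Same shape as the 0.4.3 helper, with the schema passed in instead of @schema.
def collect_custom_properties_inject(record, schema)
  specified_columns = schema.map { |col| col["name"] }
  custom_keys = record["properties"].keys.find_all { |key|
    !SKETCH_KNOWN_KEYS.include?(key.to_s) && !specified_columns.include?(key.to_s)
  }
  custom_keys.inject({}) { |result, key| result.merge({ key => record["properties"][key] }) }
end

# Equivalent single pass: Hash#reject keeps insertion order and avoids
# building a new Hash for every key the way repeated #merge does.
def collect_custom_properties_reject(record, schema)
  specified_columns = schema.map { |col| col["name"] }
  record["properties"].reject do |key, _|
    SKETCH_KNOWN_KEYS.include?(key.to_s) || specified_columns.include?(key.to_s)
  end
end

record = {
  "event" => "EV",
  "properties" => { "$os" => "Android", "$specified" => "foo", "$foobar" => "foobar" },
}
schema = [{ "name" => "event", "type" => "string" }, { "name" => "$specified", "type" => "string" }]
puts collect_custom_properties_reject(record, schema).to_json # {"$foobar":"foobar"}
```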
@@ -335,6 +335,44 @@ module Embulk
  end
  end
 
+ class TestCustomProps < self
+ setup do
+ stub(Mixpanel).resume {}
+ end
+
+ data(
+ "false/false" => [false, false],
+ "false/true" => [false, true],
+ "true/false" => [true, false],
+ )
+ def test_valid_combination(data)
+ fetch_unknown_columns, fetch_custom_properties = data
+ conf = DataSource[*transaction_config.merge(fetch_unknown_columns: fetch_unknown_columns, fetch_custom_properties: fetch_custom_properties).to_a.flatten(1)]
+
+ assert_nothing_raised do
+ Mixpanel.transaction(conf, &control)
+ end
+ end
+
+ def test_both_true_then_raise_config_error
+ conf = DataSource[*transaction_config.merge(fetch_unknown_columns: true, fetch_custom_properties: true).to_a.flatten(1)]
+
+ assert_raise(Embulk::ConfigError) do
+ Mixpanel.transaction(conf, &control)
+ end
+ end
+
+ private
+
+ def transaction_config
+ config.merge(
+ columns: schema,
+ fetch_days: 2,
+ timezone: "UTC",
+ )
+ end
+ end
+
  def test_resume
  today = Date.today
  control = proc { [{to_date: today.to_s}] }
@@ -463,6 +501,7 @@ module Embulk
  dates: DATES.to_a.map(&:to_s),
  params: Mixpanel.export_params(embulk_config),
  fetch_unknown_columns: false,
+ fetch_custom_properties: false,
  retry_initial_wait_sec: 0,
  retry_limit: 3,
  }
@@ -509,6 +548,56 @@ module Embulk
  @plugin.run
  end
 
+ class CustomPropertiesTest < self
+ def setup
+ super
+ @page_builder = Object.new
+ @plugin = Mixpanel.new(task, nil, nil, @page_builder)
+ stub(@plugin).fetch { [record] }
+ end
+
+ def test_run
+ stub(@plugin).preview? { false }
+
+ custom_property_keys = %w($foobar)
+
+ added = [
+ record["event"],
+ record["properties"]["$specified"],
+ custom_property_keys.map{|k| {k => record["properties"][k] }}.inject(&:merge)
+ ]
+
+ mock(@page_builder).add(added).at_least(1)
+ mock(@page_builder).finish
+
+ @plugin.run
+ end
+
+ private
+
+ def task
+ super.merge(schema: schema, fetch_unknown_columns: false, fetch_custom_properties: true)
+ end
+
+ def record
+ {
+ "event" => "EV",
+ "properties" => {
+ "$os" => "Android",
+ "$specified" => "foo",
+ "$foobar" => "foobar",
+ }
+ }
+ end
+
+ def schema
+ [
+ {"name" => "event", "type" => "string"},
+ {"name" => "$specified", "type" => "string"},
+ ]
+ end
+ end
+
  class UnknownColumnsTest < self
  def setup
  super
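The expected row in `CustomPropertiesTest#test_run` builds the custom-properties Hash with `map` plus `inject(&:merge)`. A sketch of the same expectation using `Hash#slice`, assuming Ruby 2.5 or later (older JRuby-based Embulk releases may not have `Hash#slice`). One behavioral difference worth knowing: with an empty key list, `inject(&:merge)` returns `nil`, while `slice` returns `{}`.

```ruby
require "json"

record = {
  "event" => "EV",
  "properties" => { "$os" => "Android", "$specified" => "foo", "$foobar" => "foobar" },
}
custom_property_keys = %w($foobar)

# Shape used in the test: one single-entry Hash per key, folded with #merge.
via_inject = custom_property_keys.map { |k| { k => record["properties"][k] } }.inject(&:merge)

# Hash#slice (Ruby >= 2.5) selects the same keys in one call.
via_slice = record["properties"].slice(*custom_property_keys)

expected_row = [record["event"], record["properties"]["$specified"], via_slice]
puts expected_row.to_json # ["EV","foo",{"$foobar":"foobar"}]

# Edge case: no custom keys.
p [].inject(&:merge)              # nil
p record["properties"].slice(*[]) # {}
```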
@@ -581,6 +670,7 @@ module Embulk
  dates: DATES.to_a.map(&:to_s),
  params: Mixpanel.export_params(embulk_config),
  fetch_unknown_columns: false,
+ fetch_custom_properties: false,
  retry_initial_wait_sec: 2,
  retry_limit: 3,
  }
@@ -615,6 +705,7 @@ module Embulk
  from_date: FROM_DATE,
  fetch_days: DAYS,
  fetch_unknown_columns: false,
+ fetch_custom_properties: false,
  retry_initial_wait_sec: 2,
  retry_limit: 3,
  }
data/test/run-test.rb CHANGED
@@ -13,6 +13,12 @@ $LOAD_PATH.unshift(test_dir)
 
  ENV["TEST_UNIT_MAX_DIFF_TARGET_STRING_SIZE"] ||= "5000"
 
- CodeClimate::TestReporter.start
+ if ENV["CI"]
+ require "codeclimate-test-reporter"
+ CodeClimate::TestReporter.start
+ elsif ENV["COVERAGE"]
+ require 'simplecov'
+ SimpleCov.start 'test_frameworks'
+ end
 
  exit Test::Unit::AutoRunner.run(true, test_dir)
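Coverage setup in `test/run-test.rb` now depends on environment variables: `CI` selects the Code Climate reporter, `COVERAGE` selects SimpleCov, and with neither set no coverage tool is loaded (previously `CodeClimate::TestReporter.start` ran unconditionally). A sketch that isolates that decision in a hypothetical `coverage_reporter` helper taking an env Hash, so it can be checked without either gem installed. Since every String is truthy in Ruby, any set value, even `CI=false`, selects Code Climate.

```ruby
# Hypothetical helper mirroring the branch in test/run-test.rb; it returns a
# symbol instead of requiring the gems, so it can run anywhere.
def coverage_reporter(env)
  if env["CI"]
    :codeclimate # run-test.rb: require "codeclimate-test-reporter"; CodeClimate::TestReporter.start
  elsif env["COVERAGE"]
    :simplecov   # run-test.rb: require 'simplecov'; SimpleCov.start 'test_frameworks'
  end
end

p coverage_reporter({ "CI" => "true" })                 # :codeclimate
p coverage_reporter({ "COVERAGE" => "1" })              # :simplecov
p coverage_reporter({ "CI" => "1", "COVERAGE" => "1" }) # :codeclimate (CI wins)
p coverage_reporter({ "CI" => "false" })                # :codeclimate (any set value is truthy)
p coverage_reporter({})                                 # nil

# In a real script the argument would be ENV, e.g. coverage_reporter(ENV).
```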
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: embulk-input-mixpanel
  version: !ruby/object:Gem::Version
- version: 0.4.2
+ version: 0.4.3
  platform: ruby
  authors:
  - yoshihara
@@ -9,7 +9,7 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2016-03-08 00:00:00.000000000 Z
+ date: 2016-03-16 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  requirement: !ruby/object:Gem::Requirement