embulk-input-mixpanel 0.4.2 → 0.4.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 530a912601d813797ba6239f907ddd05d5a877f0
4
- data.tar.gz: 4b1c952056474b426fc3e4fedff9a200c9362590
3
+ metadata.gz: 82bac0d9e5e93a4fc606b3d43f1e9cd35debf4b3
4
+ data.tar.gz: 82e5c6323b3d9db15ff54589b572b642cb6d93e1
5
5
  SHA512:
6
- metadata.gz: 581d5574847051fff79c93a2d14df66212ea31a7a8a9f5e3a275250a31c5a902383baf164e5f3c7471d34d34eeb9c1cabd5168d5ed920be4d3e16e0a736f702f
7
- data.tar.gz: b679af78c706c0723477136cfe94e51445a5bf3a5b9ec721649f6ed6a0cf74871d6984f490c9a24c47d7d84de9e24623302e8e8611a6a3ef96e0e04181fd7d4f
6
+ metadata.gz: e606300ab082a98aa443432a0b4f4246c40a942ca7724bb68679d66a889fce60cb36eb95b7dcf59e7a35181bb3175efc785cb3710d7764b58d800209589a140d
7
+ data.tar.gz: abf1f2528faf56e7c6e65d72d4f8e34a6beebda7f93de1eb16bd93130de2c4b647cf716053ec039a7ddd8b1fdbfa9c62864bb280334630b53af1f15db6137cd3
data/CHANGELOG.md CHANGED
@@ -1,3 +1,6 @@
1
+ ## 0.4.3 - 2016-03-16
2
+ * [enhancement] Custom properties json [#40](https://github.com/treasure-data/embulk-input-mixpanel/pull/40)
3
+
1
4
  ## 0.4.2 - 2016-03-08
2
5
  * [fixed] Fix Range request was not satisfied [#39](https://github.com/treasure-data/embulk-input-mixpanel/pull/39)
3
6
 
data/README.md CHANGED
@@ -38,14 +38,57 @@ To get it, you should log in mixpanel website, and click gear icon at the lower
38
38
  - NOTE: Mixpanel API supports to export data from at least 2 days before to at most the previous day.
39
39
  - **fetch_days**: Count of days range for exporting (integer, optional, default: from_date - (today - 1))
40
40
  - NOTE: Mixpanel doesn't support to from_date > today - 2
41
- - **fetch_unknown_columns**: If you want this plugin fetches unknown (unconfigured in config) columns (boolean, optional, default: true)
41
+ - **fetch_unknown_columns**(deprecated): If you want this plugin fetches unknown (unconfigured in config) columns (boolean, optional, default: true)
42
42
  - NOTE: If true, `unknown_columns` column is created and added unknown columns' data.
43
+ - **fetch_custom_properties**: Collect all custom properties into the `custom_properties` key. "Custom properties" are properties that are not described in the Mixpanel documents [1](https://mixpanel.com/help/questions/articles/special-or-reserved-properties), [2](https://mixpanel.com/help/questions/articles/what-properties-do-mixpanels-libraries-store-by-default). (boolean, optional, default: false)
44
+ - NOTE: Cannot set both `fetch_unknown_columns` and `fetch_custom_properties` to `true`.
43
45
  - **event**: The event or events to filter data (array, optional, default: nil)
44
46
  - **where**: Expression to filter data (c.f. https://mixpanel.com/docs/api-documentation/data-export-api#segmentation-expressions) (string, optional, default: nil)
45
47
  - **bucket**: The data bucket to filter data (string, optional, default: nil)
46
48
  - **retry_initial_wait_sec**: Initial wait in seconds for exponential backoff (integer, default: 1)
47
49
  - **retry_limit**: Try to retry this times (integer, default: 5)
48
50
 
51
+ ### `fetch_unknown_columns` and `fetch_custom_properties`
52
+
53
+ Suppose you have the following data and the config.yml shown below.
54
+
55
+ | event | $city | $custom | $foobar |
56
+ | ----- | ------- | ------- | ------- |
57
+ | ev | Tokyo | custom | foobar |
58
+
59
+ (NOTE: `$city` is a [reserved key](https://mixpanel.com/help/questions/articles/what-properties-do-mixpanels-libraries-store-by-default), `$custom` and `$foobar` are not)
60
+
61
+ ```yaml
62
+ in:
63
+ type: mixpanel
64
+ api_key: "API_KEY"
65
+ api_secret: "API_SECRET"
66
+ timezone: "US/Pacific"
67
+ from_date: "2015-07-19"
68
+ fetch_days: 5
69
+ columns:
70
+ - {name: event, type: string}
71
+ - {name: $custom, type: string}
72
+ ```
73
+
74
+
75
+ `fetch_unknown_columns: true` will fetch as:
76
+
77
+ | event | $custom | unknown_columns (json) |
78
+ | ----- | ------- | ----------------- |
79
+ | ev | custom | `{"$city":"Tokyo", "$foobar": "foobar"}` |
80
+
81
+ `fetch_custom_properties: true` will fetch as:
82
+
83
+ | event | $custom | custom_properties (json) |
84
+ | ----- | ------- | ----------------- |
85
+ | ev | custom | `{"$foobar": "foobar"}` |
86
+
87
+
88
+ `fetch_unknown_columns` treats `$city` and `$foobar` as unknown columns because they are not described in config.yml.
89
+
90
+ `fetch_custom_properties` treats `$foobar` as a custom property. `$custom` is also a custom property, but it is excluded from `custom_properties` because it is already described in config.yml.
91
+
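The difference between the two options described in the README section above can be sketched in plain Ruby. This is a minimal illustration, not the plugin's code: `SAMPLE_KNOWN_KEYS` is a hypothetical, shortened stand-in for the plugin's `KNOWN_KEYS` list, and the helper names `unknown_columns` and `custom_properties` are assumptions made for this example.

```ruby
require "json"

# Hypothetical, shortened stand-in for the plugin's KNOWN_KEYS list.
SAMPLE_KNOWN_KEYS = %w(event $city $region $os).freeze

# Column names configured under `columns:` in config.yml.
SCHEMA_NAMES = %w(event $custom).freeze

# The example record from the README table.
RECORD = {
  "event" => "ev",
  "properties" => {
    "$city" => "Tokyo",
    "$custom" => "custom",
    "$foobar" => "foobar",
  },
}.freeze

# fetch_unknown_columns: keep every property that is not listed in config.yml.
def unknown_columns(record, schema_names)
  record["properties"].reject { |key, _| schema_names.include?(key) }
end

# fetch_custom_properties: keep properties that are neither listed in
# config.yml nor reserved (known) Mixpanel keys.
def custom_properties(record, schema_names, known_keys)
  record["properties"].reject do |key, _|
    schema_names.include?(key) || known_keys.include?(key)
  end
end

puts unknown_columns(RECORD, SCHEMA_NAMES).to_json
puts custom_properties(RECORD, SCHEMA_NAMES, SAMPLE_KNOWN_KEYS).to_json
```

With the sample record, the first call keeps `$city` and `$foobar`, while the second keeps only `$foobar`, matching the two result tables in the README.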
49
92
  ## Example
50
93
 
51
94
  ```yaml
data/Rakefile CHANGED
@@ -5,7 +5,7 @@ task default: :test
5
5
 
6
6
  desc "Run tests"
7
7
  task :test do
8
- ruby("test/run-test.rb", "--use-color=yes", "--collector=dir")
8
+ ruby("--debug", "test/run-test.rb", "--use-color=yes", "--collector=dir")
9
9
  end
10
10
 
11
11
  Everyleaf::EmbulkHelper::Tasks.install(
@@ -1,7 +1,7 @@
1
1
 
2
2
  Gem::Specification.new do |spec|
3
3
  spec.name = "embulk-input-mixpanel"
4
- spec.version = "0.4.2"
4
+ spec.version = "0.4.3"
5
5
  spec.authors = ["yoshihara", "uu59"]
6
6
  spec.summary = "Mixpanel input plugin for Embulk"
7
7
  spec.description = "Loads records from Mixpanel."
@@ -12,6 +12,18 @@ module Embulk
12
12
  GUESS_RECORDS_COUNT = 10
13
13
  NOT_PROPERTY_COLUMN = "event".freeze
14
14
 
15
+ # https://mixpanel.com/help/questions/articles/special-or-reserved-properties
16
+ # https://mixpanel.com/help/questions/articles/what-properties-do-mixpanels-libraries-store-by-default
17
+ #
18
+ # JavaScript to extract key names from HTML: run it on Chrome Devtool when opening their document
19
+ # > Array.from(document.querySelectorAll("strong")).map(function(s){ return s.textContent.match(/[A-Z]/) ? s.parentNode.textContent.match(/\((.*?)\)/)[1] : s.textContent.split(",").join(" ") }).join(" ")
20
+ # > Array.from(document.querySelectorAll("li")).map(function(s){ m = s.textContent.match(/\((.*?)\)/); return m && m[1] }).filter(function(k) { return k && !k.match("utm") }).join(" ")
21
+ KNOWN_KEYS = %W(
22
+ #{NOT_PROPERTY_COLUMN}
23
+ distinct_id ip mp_name_tag mp_note token time mp_country_code length campaign_id $email $phone $distinct_id $ios_devices $android_devices $first_name $last_name $name $city $region $country_code $timezone $unsubscribed
24
+ $city $region mp_country_code $browser $browser_version $device $current_url $initial_referrer $initial_referring_domain $os $referrer $referring_domain $screen_height $screen_width $search_engine $city $region $mp_country_code $timezone $browser_version $browser $initial_referrer $initial_referring_domain $os $last_seen $city $region mp_country_code $app_release $app_version $carrier $ios_ifa $os_version $manufacturer $lib_version $model $os $screen_height $screen_width $wifi $city $region $mp_country_code $timezone $ios_app_release $ios_app_version $ios_device_model $ios_lib_version $ios_version $ios_ifa $last_seen $city $region mp_country_code $app_version $bluetooth_enabled $bluetooth_version $brand $carrier $has_nfc $has_telephone $lib_version $manufacturer $model $os $os_version $screen_dpi $screen_height $screen_width $wifi $google_play_services $city $region mp_country_code $timezone $android_app_version $android_app_version_code $android_lib_version $android_os $android_os_version $android_brand $android_model $android_manufacturer $last_seen
25
+ ).uniq.freeze
26
+
15
27
  # NOTE: It takes long time to fetch data between from_date to
16
28
  # to_date by one API request. So this plugin fetches data
17
29
  # between each 7 (SLICE_DAYS_COUNT) days.
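The `KNOWN_KEYS` constant added in the hunk above relies on two Ruby details: `%W( ... )` (capital W) allows `#{...}` interpolation inside the word list, and `.uniq` removes the names that appear several times because the list was pasted from multiple sections of the Mixpanel documents. A minimal sketch with a shortened, hypothetical key list (the real constant holds many more names):

```ruby
NOT_PROPERTY_COLUMN = "event".freeze

# Shortened, hypothetical key list for illustration only.
# %W interpolates #{...}; a bare `$city` is kept as a literal word.
KNOWN_KEYS = %W(
  #{NOT_PROPERTY_COLUMN}
  distinct_id $city $region
  $city $region mp_country_code
).uniq.freeze

p KNOWN_KEYS
```

Freezing the array guards against accidental mutation of the shared constant at runtime.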
@@ -36,10 +48,15 @@ module Embulk
36
48
  api_secret: config.param(:api_secret, :string),
37
49
  schema: config.param(:columns, :array),
38
50
  fetch_unknown_columns: fetch_unknown_columns,
51
+ fetch_custom_properties: config.param(:fetch_custom_properties, :bool, default: false),
39
52
  retry_initial_wait_sec: config.param(:retry_initial_wait_sec, :integer, default: 1),
40
53
  retry_limit: config.param(:retry_limit, :integer, default: 5),
41
54
  }
42
55
 
56
+ if task[:fetch_unknown_columns] && task[:fetch_custom_properties]
57
+ raise Embulk::ConfigError.new("Don't set true both `fetch_unknown_columns` and `fetch_custom_properties`.")
58
+ end
59
+
43
60
  columns = task[:schema].map do |column|
44
61
  name = column["name"]
45
62
  type = column["type"].to_sym
@@ -48,9 +65,14 @@ module Embulk
48
65
  end
49
66
 
50
67
  if fetch_unknown_columns
68
+ Embulk.logger.warn "Deprecated `unknown_columns`. Use `fetch_custom_properties` instead."
51
69
  columns << Column.new(nil, "unknown_columns", :json)
52
70
  end
53
71
 
72
+ if task[:fetch_custom_properties]
73
+ columns << Column.new(nil, "custom_properties", :json)
74
+ end
75
+
54
76
  resume(task, columns, 1, &control)
55
77
  end
56
78
 
@@ -115,6 +137,9 @@ module Embulk
115
137
  unknown_values = extract_unknown_values(record)
116
138
  values << unknown_values.to_json
117
139
  end
140
+ if task[:fetch_custom_properties]
141
+ values << collect_custom_properties(record)
142
+ end
118
143
  page_builder.add(values)
119
144
  end
120
145
 
@@ -153,6 +178,16 @@ module Embulk
153
178
  end
154
179
  end
155
180
 
181
+ def collect_custom_properties(record)
182
+ specified_columns = @schema.map{|col| col["name"]}
183
+ custom_keys = record["properties"].keys.find_all{|key| !KNOWN_KEYS.include?(key.to_s) && !specified_columns.include?(key.to_s) }
184
+ custom_keys.inject({}) do |result, key|
185
+ result.merge({
186
+ key => record["properties"][key]
187
+ })
188
+ end
189
+ end
190
+
156
191
  def extract_unknown_values(record)
157
192
  record_keys = record["properties"].keys + [NOT_PROPERTY_COLUMN]
158
193
  schema_keys = @schema.map {|column| column["name"]}
@@ -335,6 +335,44 @@ module Embulk
335
335
  end
336
336
  end
337
337
 
338
+ class TestCustomProps < self
339
+ setup do
340
+ stub(Mixpanel).resume {}
341
+ end
342
+
343
+ data(
344
+ "false/false" => [false, false],
345
+ "false/true" => [false, true],
346
+ "true/false" => [true, false],
347
+ )
348
+ def test_valid_combination(data)
349
+ fetch_unknown_columns, fetch_custom_properties = data
350
+ conf = DataSource[*transaction_config.merge(fetch_unknown_columns: fetch_unknown_columns, fetch_custom_properties: fetch_custom_properties).to_a.flatten(1)]
351
+
352
+ assert_nothing_raised do
353
+ Mixpanel.transaction(conf, &control)
354
+ end
355
+ end
356
+
357
+ def test_both_true_then_raise_config_error
358
+ conf = DataSource[*transaction_config.merge(fetch_unknown_columns: true, fetch_custom_properties: true).to_a.flatten(1)]
359
+
360
+ assert_raise(Embulk::ConfigError) do
361
+ Mixpanel.transaction(conf, &control)
362
+ end
363
+ end
364
+
365
+ private
366
+
367
+ def transaction_config
368
+ config.merge(
369
+ columns: schema,
370
+ fetch_days: 2,
371
+ timezone: "UTC",
372
+ )
373
+ end
374
+ end
375
+
338
376
  def test_resume
339
377
  today = Date.today
340
378
  control = proc { [{to_date: today.to_s}] }
@@ -463,6 +501,7 @@ module Embulk
463
501
  dates: DATES.to_a.map(&:to_s),
464
502
  params: Mixpanel.export_params(embulk_config),
465
503
  fetch_unknown_columns: false,
504
+ fetch_custom_properties: false,
466
505
  retry_initial_wait_sec: 0,
467
506
  retry_limit: 3,
468
507
  }
@@ -509,6 +548,56 @@ module Embulk
509
548
  @plugin.run
510
549
  end
511
550
 
551
+ class CustomPropertiesTest < self
552
+ def setup
553
+ super
554
+ @page_builder = Object.new
555
+ @plugin = Mixpanel.new(task, nil, nil, @page_builder)
556
+ stub(@plugin).fetch { [record] }
557
+ end
558
+
559
+ def test_run
560
+ stub(@plugin).preview? { false }
561
+
562
+ custom_property_keys = %w($foobar)
563
+
564
+ added = [
565
+ record["event"],
566
+ record["properties"]["$specified"],
567
+ custom_property_keys.map{|k| {k => record["properties"][k] }}.inject(&:merge)
568
+ ]
569
+
570
+ mock(@page_builder).add(added).at_least(1)
571
+ mock(@page_builder).finish
572
+
573
+ @plugin.run
574
+ end
575
+
576
+ private
577
+
578
+ def task
579
+ super.merge(schema: schema, fetch_unknown_columns: false, fetch_custom_properties: true)
580
+ end
581
+
582
+ def record
583
+ {
584
+ "event" => "EV",
585
+ "properties" => {
586
+ "$os" => "Android",
587
+ "$specified" => "foo",
588
+ "$foobar" => "foobar",
589
+ }
590
+ }
591
+ end
592
+
593
+ def schema
594
+ [
595
+ {"name" => "event", "type" => "string"},
596
+ {"name" => "$specified", "type" => "string"},
597
+ ]
598
+ end
599
+ end
600
+
512
601
  class UnknownColumnsTest < self
513
602
  def setup
514
603
  super
@@ -581,6 +670,7 @@ module Embulk
581
670
  dates: DATES.to_a.map(&:to_s),
582
671
  params: Mixpanel.export_params(embulk_config),
583
672
  fetch_unknown_columns: false,
673
+ fetch_custom_properties: false,
584
674
  retry_initial_wait_sec: 2,
585
675
  retry_limit: 3,
586
676
  }
@@ -615,6 +705,7 @@ module Embulk
615
705
  from_date: FROM_DATE,
616
706
  fetch_days: DAYS,
617
707
  fetch_unknown_columns: false,
708
+ fetch_custom_properties: false,
618
709
  retry_initial_wait_sec: 2,
619
710
  retry_limit: 3,
620
711
  }
data/test/run-test.rb CHANGED
@@ -13,6 +13,12 @@ $LOAD_PATH.unshift(test_dir)
13
13
 
14
14
  ENV["TEST_UNIT_MAX_DIFF_TARGET_STRING_SIZE"] ||= "5000"
15
15
 
16
- CodeClimate::TestReporter.start
16
+ if ENV["CI"]
17
+ require "codeclimate-test-reporter"
18
+ CodeClimate::TestReporter.start
19
+ elsif ENV["COVERAGE"]
20
+ require 'simplecov'
21
+ SimpleCov.start 'test_frameworks'
22
+ end
17
23
 
18
24
  exit Test::Unit::AutoRunner.run(true, test_dir)
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: embulk-input-mixpanel
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.4.2
4
+ version: 0.4.3
5
5
  platform: ruby
6
6
  authors:
7
7
  - yoshihara
@@ -9,7 +9,7 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2016-03-08 00:00:00.000000000 Z
12
+ date: 2016-03-16 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  requirement: !ruby/object:Gem::Requirement