metacrunch-elasticsearch 4.0.0 → 4.0.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
-   metadata.gz: 7fcdad44003a195fca6541d8c9156523dcde02ad
-   data.tar.gz: 964c93ec7c91b54ea76bf338c5997920546f3840
+   metadata.gz: 54f127cb7af8ebc5f9b7246d0241994a499fad0d
+   data.tar.gz: f7fca4e9e60e2d2c6951147bb57f177dc21350e2
  SHA512:
-   metadata.gz: 29bdc05cebb697ee8fbf6a434d2ad3861cf205c0a35324a77f1c1ef145599c72032f4adf2e5cc6e2cb395ac5bc744493942f87ddeb6f0310b342ac8285c04074
-   data.tar.gz: 60afb62d71c16d17404b5fd2b1bf735bce2011398a9bc014f0cb2c0e398df13d668c1601b9caa3e6beda1ed3f0d946da4bd37c0a59548159410182e86d3faf12
+   metadata.gz: 198b86f27ce9a318ea975201f95f837be6b56408d6c302d08ba29ba9ec3469868ff367959bcab670d147180a2cba9dae89a0c594ec0551ea807be938513aaa48
+   data.tar.gz: ef8828c5ff6eac6002edaaa0beca1cc7eb29125ca5251c53eab3a35ca21b25910dfd5a63afc3d2af70332f28a7142e8e5aa0b303720c932cd757d826031cfbd6
data/.circleci/config.yml ADDED
@@ -0,0 +1,46 @@
+ # Ruby CircleCI 2.0 configuration file
+ #
+ # Check https://circleci.com/docs/2.0/language-ruby/ for more details
+ #
+ version: 2
+ jobs:
+   build:
+     docker:
+       - image: circleci/ruby:2.4.1-node-browsers
+       - image: docker.elastic.co/elasticsearch/elasticsearch:5.6.3
+         environment:
+           - "discovery.type=single-node"
+           - "xpack.security.enabled=false"
+           - "http.host=0.0.0.0"
+           - "transport.host=127.0.0.1"
+
+     working_directory: ~/repo
+
+     steps:
+       - checkout
+
+       - run:
+           name: Waiting for elasticsearch to start up (30 sec.)
+           command: sleep 30
+
+       - run:
+           name: Install dependencies
+           command: bundle install --jobs=4 --retry=3 --path vendor/bundle
+
+       - run:
+           name: Install CodeClimate test coverage reporter
+           command: |
+             curl -L https://codeclimate.com/downloads/test-reporter/test-reporter-latest-linux-amd64 > ./cc-test-reporter
+             chmod +x ./cc-test-reporter
+             ./cc-test-reporter before-build
+
+       - run:
+           name: Run tests
+           command: |
+             mkdir /tmp/test-results
+             bundle exec rspec --format progress --format RspecJunitFormatter --out /tmp/test-results/rspec.xml
+
+       - run:
+           name: Upload test coverage report to CodeClimate
+           command: ./cc-test-reporter after-build --exit-code $?
+
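The CI configuration above waits a fixed 30 seconds for Elasticsearch to start. As an illustration only, the same wait could be expressed as a polling loop. The sketch below is not part of this release: `wait_until` and `elasticsearch_up?` are hypothetical helper names, and the URL assumes Elasticsearch's default HTTP port on localhost.

```ruby
require "net/http"
require "uri"

# Hypothetical helper (not part of the gem or the CI config): call the given
# block until it returns a truthy value or `timeout` seconds have passed.
# Returns true on success and false on timeout.
def wait_until(timeout: 30, interval: 1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    return true if yield
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
end

# Hypothetical probe: true if the endpoint answers with a 2xx status.
# The default URL is an assumption (Elasticsearch's default HTTP port).
def elasticsearch_up?(url = "http://localhost:9200")
  Net::HTTP.get_response(URI(url)).is_a?(Net::HTTPSuccess)
rescue SystemCallError, SocketError, Timeout::Error, EOFError
  false
end

# Possible usage in a CI step (left as a comment so loading this file
# does not open any network connection):
#
#   ok = wait_until(timeout: 30, interval: 1) { elasticsearch_up? }
#   abort "Elasticsearch did not come up in time" unless ok
```

Compared with a fixed sleep, a loop like this returns as soon as the service answers and fails explicitly if it never does.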
data/Gemfile CHANGED
@@ -5,7 +5,6 @@ gemspec
  group :development do
    gem "bundler", ">= 1.15"
    gem "rake", ">= 12.1"
-   gem "rspec", ">= 3.5.0", "< 4.0.0"

    if !ENV["CI"]
      gem "pry-byebug", ">= 3.5.0"
@@ -13,5 +12,8 @@ group :development do
  end

  group :test do
-   gem "simplecov", ">= 0.15.0"
+   gem "faker", ">= 1.8.4"
+   gem "rspec", ">= 3.5.0", "< 4.0.0"
+   gem "rspec_junit_formatter", ">= 0.3.0"
+   gem "simplecov", ">= 0.15.0"
  end
data/License.txt CHANGED
@@ -1,4 +1,4 @@
- Copyright (c) 2013 René Sprotte, Michael Sievers
+ Copyright (c) 2013 René Sprotte

  MIT License

data/Readme.md ADDED
@@ -0,0 +1,104 @@
+ metacrunch-elasticsearch
+ ========================
+
+ [![Gem Version](https://badge.fury.io/rb/metacrunch-elasticsearch.svg)](http://badge.fury.io/rb/metacrunch-elasticsearch)
+ [![Code Climate](https://codeclimate.com/github/ubpb/metacrunch-elasticsearch/badges/gpa.svg)](https://codeclimate.com/github/ubpb/metacrunch-elasticsearch)
+ [![Test Coverage](https://codeclimate.com/github/ubpb/metacrunch-elasticsearch/badges/coverage.svg)](https://codeclimate.com/github/ubpb/metacrunch-elasticsearch/coverage)
+ [![CircleCI](https://circleci.com/gh/ubpb/metacrunch-elasticsearch.svg?style=svg)](https://circleci.com/gh/ubpb/metacrunch-elasticsearch)
+
+ This is the official [Elasticsearch](https://www.elastic.co) package for the [metacrunch ETL toolkit](https://github.com/ubpb/metacrunch).
+
+ Installation
+ ------------
+
+ Include the gem in your `Gemfile`
+
+ ```ruby
+ gem "metacrunch-elasticsearch", "~> 4.0.1"
+ ```
+
+ and run `$ bundle install` to install it.
+
+ Or install it manually
+
+ ```
+ $ gem install metacrunch-elasticsearch
+ ```
+
+ Usage
+ -----
+
+ *Note: For working examples on how to use this package check out our [demo repository](https://github.com/ubpb/metacrunch-demo).*
+
+ ### `Metacrunch::Elasticsearch::Source`
+
+ This class provides a metacrunch `source` implementation that can be used to read data from Elasticsearch into a metacrunch job.
+
+ ```ruby
+ # my_job.metacrunch
+
+ # Create an Elasticsearch connection
+ elasticsearch = Elasticsearch::Client.new(...)
+
+ # Set the source
+ source Metacrunch::Elasticsearch::Source.new(elasticsearch, OPTIONS)
+ ```
+
+ **Options**
+
+ * `:search_options`: A hash with search options (including your query) as described [here](https://github.com/elastic/elasticsearch-ruby/blob/master/elasticsearch-api/lib/elasticsearch/api/actions/search.rb). We have set some meaningful defaults though: `size: 100`, `scroll: 1m`, `sort: ["_doc"]`. Depending on your use case, you may need to adjust `:size` and `:scroll` for optimal performance.
+ * `:total_hits_callback`: You can set a `Proc` that gets called with the total number of hits your query will match. You can use this callback to set up a progress bar, for example. Defaults to `nil`.
+
+
+ ### `Metacrunch::Elasticsearch::Destination`
+
+ This class provides a metacrunch `destination` implementation that can be used to write data from a metacrunch job to Elasticsearch.
+
+ The data passed to the destination must be in the proper format. You can use a transformation to transform your data before it reaches the destination.
+
+ As `Metacrunch::Elasticsearch::Destination` utilizes the Elasticsearch bulk API, the expected format must match one of the available options for the `body` parameter described [here](https://github.com/elastic/elasticsearch-ruby/blob/master/elasticsearch-api/lib/elasticsearch/api/actions/bulk.rb). Note that the bulk API is not limited to indexing records: you can update or delete records as well.
+
+ ```ruby
+ # my_job.metacrunch
+
+ # Transform data into a format that the destination can understand.
+ # In this example `data` is some hash.
+ transformation ->(data) do
+   {
+     index: {
+       _index: "my-index",
+       _type: "my-type",
+       _id: data.delete(:id),
+       data: data
+     }
+   }
+ end
+ ```
+
+ It is not efficient to call Elasticsearch for every single record. Therefore we can use a transformation with a buffer to create bulks of records. In this example we use a buffer size of 10. In production environments, and depending on your data, larger buffers may be useful.
+
+ ```ruby
+ # my_job.metacrunch
+
+ transformation ->(data) { data }, buffer: 10
+ ```
+
+ If these transformations are in place, you can now use the `Metacrunch::Elasticsearch::Destination` class as a destination.
+
+ ```ruby
+ # my_job.metacrunch
+
+ # Write data into Elasticsearch
+ destination Metacrunch::Elasticsearch::Destination.new(elasticsearch [, OPTIONS])
+ ```
+
+ **Options**
+
+ * `:raise_on_result_errors`: If set to `true`, an error is raised if one of the bulk operations returns an error. Defaults to `false`.
+ * `:result_callback`: You can set a `Proc` that gets called with the result from the bulk operation. Defaults to `nil`.
+ * `:bulk_options`: A hash of options for the Elasticsearch bulk API as described [here](https://github.com/elastic/elasticsearch-ruby/blob/master/elasticsearch-api/lib/elasticsearch/api/actions/bulk.rb). Setting `body` here will be ignored. Defaults to `{}`.
+
+ License
+ -------
+
+ metacrunch-elasticsearch is available at [GitHub](https://github.com/ubpb/metacrunch-elasticsearch) under the [MIT license](https://github.com/ubpb/metacrunch-elasticsearch/blob/master/License.txt).
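The record format and buffering described in the added Readme can be tried out in plain Ruby without the gem or a running cluster. The following is a sketch under stated assumptions: `to_bulk_action`, `records`, `bulks` and the two option hashes are illustrative names, `each_slice` only imitates the effect of `buffer: 10` rather than reproducing metacrunch's implementation, and the option values are arbitrary examples.

```ruby
# Builds the bulk "index" action shown in the Readme for one record.
# Index and type names are the Readme's placeholders.
to_bulk_action = ->(data) do
  data = data.dup # work on a copy so the caller's hash keeps its :id
  {
    index: {
      _index: "my-index",
      _type: "my-type",
      _id: data.delete(:id),
      data: data
    }
  }
end

# Sample input records (made up for this illustration).
records = (1..25).map { |i| { id: i, title: "Record #{i}" } }

# Stand-in for `buffer: 10`: group the transformed records into bulks of at
# most 10. This mimics the described effect only.
bulks = records.map(&to_bulk_action).each_slice(10).to_a

# Option hashes using the keys documented in the Readme. The values are
# examples, not recommendations.
source_options = {
  search_options: { size: 500, scroll: "5m" },
  total_hits_callback: ->(total) { puts "Expecting #{total} hits" }
}

destination_options = {
  raise_on_result_errors: true,
  result_callback: ->(result) { puts result.inspect },
  bulk_options: {}
}
```

With 25 records and a bulk size of 10, this yields three bulks, so a destination built on the bulk API would be called three times instead of 25.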
@@ -1,3 +1,5 @@
+ require "active_support"
+ require "active_support/core_ext"
  require "elasticsearch"

  module Metacrunch
@@ -1,5 +1,5 @@
  module Metacrunch
    module Elasticsearch
-     VERSION = "4.0.0"
+     VERSION = "4.0.1"
    end
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: metacrunch-elasticsearch
  version: !ruby/object:Gem::Version
-   version: 4.0.0
+   version: 4.0.1
  platform: ruby
  authors:
  - René Sprotte
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2017-09-27 00:00:00.000000000 Z
+ date: 2017-10-12 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: activesupport
@@ -44,11 +44,13 @@ executables: []
  extensions: []
  extra_rdoc_files: []
  files:
+ - ".circleci/config.yml"
  - ".gitignore"
  - ".rspec"
  - Gemfile
  - License.txt
  - Rakefile
+ - Readme.md
  - bin/console
  - lib/metacrunch/elasticsearch.rb
  - lib/metacrunch/elasticsearch/destination.rb