connectors_utility 8.4.0.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA256:
+   metadata.gz: 824ceaf9dec38db287b8ec5739847b5dc08b3c1ea140bce575076bceb8b6caf6
+   data.tar.gz: b287e744ebe57f49162437744649e0e2d304c3e0a2ecfbb1bdeb082e0463f9f8
+ SHA512:
+   metadata.gz: 856965cb7aca1080b86e241e688495c18228dd75d9ee1dc895aed16b3ab86f5a8694c6715bd0e88699803665590c3d4c18d302e3b593ee4fba4a85ccc921c778
+   data.tar.gz: 32398b3b4c4771665dcddd9f493f7ca1155757828cfcb9ddb52ca9f2f2e8eca09d4f8f1a84d6aee108d9af18da2dc5dd2eb1829009bc1bce1896a87b28fb7474
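The digests in checksums.yaml can be compared against the archive members of a downloaded gem. The following is a minimal sketch, assuming `metadata.gz` and `data.tar.gz` have already been extracted from the `.gem` file (which is a tar archive). The `checksum_matches?` helper and the paths in the usage comment are hypothetical; only `Digest::SHA256` from Ruby's standard library is relied on.

```ruby
require 'digest'

# Hypothetical helper: returns true when the SHA256 digest of the file at
# `path` equals `expected_hex` (compared case-insensitively).
def checksum_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.downcase
end

# Usage (the path is a placeholder for a file extracted from the .gem archive):
#   checksum_matches?(
#     'metadata.gz',
#     '824ceaf9dec38db287b8ec5739847b5dc08b3c1ea140bce575076bceb8b6caf6'
#   )
```

The SHA512 entries could be checked the same way by swapping in `Digest::SHA512`.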
data/LICENSE ADDED
@@ -0,0 +1,93 @@
+ Elastic License 2.0
+
+ URL: https://www.elastic.co/licensing/elastic-license
+
+ ## Acceptance
+
+ By using the software, you agree to all of the terms and conditions below.
+
+ ## Copyright License
+
+ The licensor grants you a non-exclusive, royalty-free, worldwide,
+ non-sublicensable, non-transferable license to use, copy, distribute, make
+ available, and prepare derivative works of the software, in each case subject to
+ the limitations and conditions below.
+
+ ## Limitations
+
+ You may not provide the software to third parties as a hosted or managed
+ service, where the service provides users with access to any substantial set of
+ the features or functionality of the software.
+
+ You may not move, change, disable, or circumvent the license key functionality
+ in the software, and you may not remove or obscure any functionality in the
+ software that is protected by the license key.
+
+ You may not alter, remove, or obscure any licensing, copyright, or other notices
+ of the licensor in the software. Any use of the licensor’s trademarks is subject
+ to applicable law.
+
+ ## Patents
+
+ The licensor grants you a license, under any patent claims the licensor can
+ license, or becomes able to license, to make, have made, use, sell, offer for
+ sale, import and have imported the software, in each case subject to the
+ limitations and conditions in this license. This license does not cover any
+ patent claims that you cause to be infringed by modifications or additions to
+ the software. If you or your company make any written claim that the software
+ infringes or contributes to infringement of any patent, your patent license for
+ the software granted under these terms ends immediately. If your company makes
+ such a claim, your patent license ends immediately for work on behalf of your
+ company.
+
+ ## Notices
+
+ You must ensure that anyone who gets a copy of any part of the software from you
+ also gets a copy of these terms.
+
+ If you modify the software, you must include in any modified copies of the
+ software prominent notices stating that you have modified the software.
+
+ ## No Other Rights
+
+ These terms do not imply any licenses other than those expressly granted in
+ these terms.
+
+ ## Termination
+
+ If you use the software in violation of these terms, such use is not licensed,
+ and your licenses will automatically terminate. If the licensor provides you
+ with a notice of your violation, and you cease all violation of this license no
+ later than 30 days after you receive that notice, your licenses will be
+ reinstated retroactively. However, if you violate these terms after such
+ reinstatement, any additional violation of these terms will cause your licenses
+ to terminate automatically and permanently.
+
+ ## No Liability
+
+ *As far as the law allows, the software comes as is, without any warranty or
+ condition, and the licensor will not be liable to you for any damages arising
+ out of these terms or the use or nature of the software, under any kind of
+ legal claim.*
+
+ ## Definitions
+
+ The **licensor** is the entity offering these terms, and the **software** is the
+ software the licensor makes available under these terms, including any portion
+ of it.
+
+ **you** refers to the individual or entity agreeing to these terms.
+
+ **your company** is any legal entity, sole proprietorship, or other kind of
+ organization that you work for, plus all organizations that have control over,
+ are under the control of, or are under common control with that
+ organization. **control** means ownership of substantially all the assets of an
+ entity, or the power to direct its management and policies by vote, contract, or
+ otherwise. Control can be direct or indirect.
+
+ **your licenses** are all the licenses granted to you for the software under
+ these terms.
+
+ **use** means anything you do with the software requiring one of your licenses.
+
+ **trademark** means trademarks, service marks, and similar rights.
data/NOTICE.txt ADDED
@@ -0,0 +1,2 @@
+ connectors
+ Copyright 2022 Elasticsearch B.V.
data/lib/connectors_utility.rb ADDED
@@ -0,0 +1,10 @@
+ #
+ # Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
+ # or more contributor license agreements. Licensed under the Elastic License;
+ # you may not use this file except in compliance with the Elastic License.
+ #
+
+ # frozen_string_literal: true
+
+ require_relative 'utility/elasticsearch/index/text_analysis_settings'
+ require_relative 'utility/elasticsearch/index/mappings'
data/lib/utility/elasticsearch/index/language_data.yml ADDED
@@ -0,0 +1,111 @@
+ ---
+ da:
+   name: Danish
+   stemmer: danish
+   stop_words: _danish_
+ de:
+   name: German
+   stemmer: light_german
+   stop_words: _german_
+ en:
+   name: English
+   stemmer: light_english
+   stop_words: _english_
+ es:
+   name: Spanish
+   stemmer: light_spanish
+   stop_words: _spanish_
+ fr:
+   name: French
+   stemmer: light_french
+   stop_words: _french_
+   custom_filter_definitions:
+     fr-elision:
+       type: elision
+       articles:
+         - l
+         - m
+         - t
+         - qu
+         - n
+         - s
+         - j
+         - d
+         - c
+         - jusqu
+         - quoiqu
+         - lorsqu
+         - puisqu
+       articles_case: true
+   prepended_filters:
+     - fr-elision
+ it:
+   name: Italian
+   stemmer: light_italian
+   stop_words: _italian_
+   custom_filter_definitions:
+     it-elision:
+       type: elision
+       articles:
+         - c
+         - l
+         - all
+         - dall
+         - dell
+         - nell
+         - sull
+         - coll
+         - pell
+         - gl
+         - agl
+         - dagl
+         - degl
+         - negl
+         - sugl
+         - un
+         - m
+         - t
+         - s
+         - v
+         - d
+       articles_case: true
+   prepended_filters:
+     - it-elision
+ ja:
+   name: Japanese
+   stemmer: light_english
+   stop_words: _english_
+   postpended_filters:
+     - cjk_bigram
+ ko:
+   name: Korean
+   stemmer: light_english
+   stop_words: _english_
+   postpended_filters:
+     - cjk_bigram
+ nl:
+   name: Dutch
+   stemmer: dutch
+   stop_words: _dutch_
+ pt:
+   name: Portuguese
+   stemmer: light_portuguese
+   stop_words: _portuguese_
+ pt-br:
+   name: Portuguese (Brazil)
+   stemmer: brazilian
+   stop_words: _brazilian_
+ ru:
+   name: Russian
+   stemmer: russian
+   stop_words: _russian_
+ th:
+   name: Thai
+   stemmer: light_english
+   stop_words: _thai_
+ zh:
+   name: Chinese
+   stemmer: light_english
+   stop_words: _english_
+   postpended_filters:
+     - cjk_bigram
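The language data above is plain YAML keyed by language code, with `custom_filter_definitions`, `prepended_filters`, and `postpended_filters` present only for some languages. The sketch below shows one way such data can be read with symbolized keys and a fallback for the optional lists. It uses a two-entry inline excerpt rather than the shipped file, and the `LANGUAGE_YAML`, `LANGUAGE_DATA`, and `postpended_filters_for` names are illustrative assumptions, not part of the gem.

```ruby
require 'yaml'

# Inline excerpt of the language data (two entries only), so the sketch
# runs without the gem's language_data.yml on disk.
LANGUAGE_YAML = <<~YAML
  en:
    name: English
    stemmer: light_english
    stop_words: _english_
  zh:
    name: Chinese
    stemmer: light_english
    stop_words: _english_
    postpended_filters:
      - cjk_bigram
YAML

# symbolize_names converts every mapping key, nested ones included, to a Symbol.
LANGUAGE_DATA = YAML.safe_load(LANGUAGE_YAML, symbolize_names: true)

# Hypothetical helper: optional keys are simply absent for most languages,
# so fall back to an empty list instead of returning nil.
def postpended_filters_for(code)
  LANGUAGE_DATA.fetch(code).fetch(:postpended_filters, [])
end
```

With this excerpt, `postpended_filters_for(:zh)` yields the `cjk_bigram` entry, while `postpended_filters_for(:en)` yields an empty array.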
data/lib/utility/elasticsearch/index/mappings.rb ADDED
@@ -0,0 +1,78 @@
+ #
+ # Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
+ # or more contributor license agreements. Licensed under the Elastic License;
+ # you may not use this file except in compliance with the Elastic License.
+ #
+
+ # frozen_string_literal: true
+
+ module Utility
+   module Elasticsearch
+     module Index
+       module Mappings
+         ENUM_IGNORE_ABOVE = 2048
+
+         WORKPLACE_SEARCH_SUBEXTRACTION_STAMP_FIELD_MAPPINGS = {
+           _subextracted_as_of: {
+             type: 'date'
+           },
+           _subextracted_version: {
+             type: 'keyword'
+           }
+         }.freeze
+
+         def self.default_text_fields_mappings(connectors_index:)
+           {
+             dynamic: true,
+             dynamic_templates: [
+               {
+                 data: {
+                   match_mapping_type: 'string',
+                   mapping: {
+                     type: 'text',
+                     analyzer: 'iq_text_base',
+                     index_options: 'freqs',
+                     fields: {
+                       'stem': {
+                         type: 'text',
+                         analyzer: 'iq_text_stem'
+                       },
+                       'prefix' => {
+                         type: 'text',
+                         analyzer: 'i_prefix',
+                         search_analyzer: 'q_prefix',
+                         index_options: 'docs'
+                       },
+                       'delimiter' => {
+                         type: 'text',
+                         analyzer: 'iq_text_delimiter',
+                         index_options: 'freqs'
+                       },
+                       'joined': {
+                         type: 'text',
+                         analyzer: 'i_text_bigram',
+                         search_analyzer: 'q_text_bigram',
+                         index_options: 'freqs'
+                       },
+                       'enum': {
+                         type: 'keyword',
+                         ignore_above: ENUM_IGNORE_ABOVE
+                       }
+                     }
+                   }
+                 }
+               }
+             ],
+             properties: {
+               id: {
+                 type: 'keyword'
+               }
+             }.tap do |properties|
+               properties.merge!(WORKPLACE_SEARCH_SUBEXTRACTION_STAMP_FIELD_MAPPINGS) if connectors_index
+             end
+           }
+         end
+       end
+     end
+   end
+ end
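Two details of mappings.rb are easy to miss: the `properties` hash is extended in place via `tap` and `merge!` only when `connectors_index` is truthy, and the `fields` hash mixes `'stem':` keys (which Ruby turns into Symbols) with `'prefix' =>` keys (which stay Strings). The standalone sketch below illustrates both behaviors. `STAMP_FIELDS`, `properties_for`, and `MIXED_KEYS` are hypothetical names chosen for the example, not identifiers from the gem.

```ruby
require 'json'

# Hypothetical stand-in for WORKPLACE_SEARCH_SUBEXTRACTION_STAMP_FIELD_MAPPINGS.
STAMP_FIELDS = {
  _subextracted_as_of: { type: 'date' },
  _subextracted_version: { type: 'keyword' }
}.freeze

# tap yields the freshly built hash, so merge! mutates that new hash
# (not the frozen constant) and the same hash is returned either way.
def properties_for(connectors_index:)
  { id: { type: 'keyword' } }.tap do |properties|
    properties.merge!(STAMP_FIELDS) if connectors_index
  end
end

# 'stem': produces a Symbol key, 'prefix' => produces a String key.
MIXED_KEYS = { 'stem': 1, 'prefix' => 2 }.freeze

# Both key types serialize to plain JSON strings.
MIXED_KEYS_JSON = JSON.generate(MIXED_KEYS)
```

Because both key types become strings once the hash is serialized to JSON, the mixed style should make no difference to the request body, although it does matter to Ruby code that looks the keys up directly.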
data/lib/utility/elasticsearch/index/text_analysis_settings.rb ADDED
@@ -0,0 +1,226 @@
+ #
+ # Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
+ # or more contributor license agreements. Licensed under the Elastic License;
+ # you may not use this file except in compliance with the Elastic License.
+ #
+
+ # frozen_string_literal: true
+
+ require 'yaml'
+
+ module Utility
+   module Elasticsearch
+     module Index
+       class TextAnalysisSettings
+         class UnsupportedLanguageCode < StandardError; end
+
+         DEFAULT_LANGUAGE = :en
+         FRONT_NGRAM_MAX_GRAM = 12
+         LANGUAGE_DATA_FILE_PATH = File.join(File.dirname(__FILE__), 'language_data.yml')
+
+         GENERIC_FILTERS = {
+           front_ngram: {
+             type: 'edge_ngram',
+             min_gram: 1,
+             max_gram: FRONT_NGRAM_MAX_GRAM
+           },
+           delimiter: {
+             type: 'word_delimiter_graph',
+             generate_word_parts: true,
+             generate_number_parts: true,
+             catenate_words: true,
+             catenate_numbers: true,
+             catenate_all: true,
+             preserve_original: false,
+             split_on_case_change: true,
+             split_on_numerics: true,
+             stem_english_possessive: true
+           },
+           bigram_joiner: {
+             type: 'shingle',
+             token_separator: '',
+             max_shingle_size: 2,
+             output_unigrams: false
+           },
+           bigram_joiner_unigrams: {
+             type: 'shingle',
+             token_separator: '',
+             max_shingle_size: 2,
+             output_unigrams: true
+           },
+           bigram_max_size: {
+             type: 'length',
+             min: 0,
+             max: 16
+           }
+         }.freeze
+
+         NON_ICU_ANALYSIS_SETTINGS = {
+           tokenizer_name: 'standard', folding_filters: %w(cjk_width lowercase asciifolding)
+         }.freeze
+
+         ICU_ANALYSIS_SETTINGS = {
+           tokenizer_name: 'icu_tokenizer', folding_filters: %w(icu_folding)
+         }.freeze
+
+         def initialize(language_code: nil, analysis_icu: false)
+           @language_code = (language_code || DEFAULT_LANGUAGE).to_sym
+
+           raise UnsupportedLanguageCode, "Language '#{language_code}' is not supported" unless language_data[@language_code]
+
+           @analysis_icu = analysis_icu
+           @analysis_settings = icu_settings(analysis_icu)
+         end
+
+         def to_h
+           {
+             analysis: {
+               analyzer: analyzer_definitions,
+               filter: filter_definitions
+             },
+             index: {
+               similarity: {
+                 default: {
+                   type: 'BM25'
+                 }
+               }
+             }
+           }
+         end
+
+         private
+
+         attr_reader :language_code, :analysis_settings
+
+         def icu_settings(analysis_settings)
+           return ICU_ANALYSIS_SETTINGS if analysis_settings
+
+           NON_ICU_ANALYSIS_SETTINGS
+         end
+
+         def stemmer_name
+           language_data[language_code][:stemmer]
+         end
+
+         def stop_words_name_or_list
+           language_data[language_code][:stop_words]
+         end
+
+         def custom_filter_definitions
+           language_data[language_code][:custom_filter_definitions] || {}
+         end
+
+         def prepended_filters
+           language_data[language_code][:prepended_filters] || []
+         end
+
+         def postpended_filters
+           language_data[language_code][:postpended_filters] || []
+         end
+
+         def stem_filter_name
+           "#{language_code}-stem-filter".to_sym
+         end
+
+         def stop_words_filter_name
+           "#{language_code}-stop-words-filter".to_sym
+         end
+
+         def filter_definitions
+           definitions = GENERIC_FILTERS.dup
+
+           definitions[stem_filter_name] = {
+             type: 'stemmer',
+             name: stemmer_name
+           }
+
+           definitions[stop_words_filter_name] = {
+             type: 'stop',
+             stopwords: stop_words_name_or_list
+           }
+
+           definitions.merge(custom_filter_definitions)
+         end
+
+         def analyzer_definitions
+           definitions = {}
+
+           definitions[:i_prefix] = {
+             tokenizer: analysis_settings[:tokenizer_name],
+             filter: [
+               *analysis_settings[:folding_filters],
+               'front_ngram'
+             ]
+           }
+
+           definitions[:q_prefix] = {
+             tokenizer: analysis_settings[:tokenizer_name],
+             filter: [
+               *analysis_settings[:folding_filters]
+             ]
+           }
+
+           definitions[:iq_text_base] = {
+             tokenizer: analysis_settings[:tokenizer_name],
+             filter: [
+               *analysis_settings[:folding_filters],
+               stop_words_filter_name
+             ]
+           }
+
+           definitions[:iq_text_stem] = {
+             tokenizer: analysis_settings[:tokenizer_name],
+             filter: [
+               *prepended_filters,
+               *analysis_settings[:folding_filters],
+               stop_words_filter_name,
+               stem_filter_name,
+               *postpended_filters
+             ]
+           }
+
+           definitions[:iq_text_delimiter] = {
+             tokenizer: 'whitespace',
+             filter: [
+               *prepended_filters,
+               'delimiter',
+               *analysis_settings[:folding_filters],
+               stop_words_filter_name,
+               stem_filter_name,
+               *postpended_filters
+             ]
+           }
+
+           definitions[:i_text_bigram] = {
+             tokenizer: analysis_settings[:tokenizer_name],
+             filter: [
+               *analysis_settings[:folding_filters],
+               stem_filter_name,
+               'bigram_joiner',
+               'bigram_max_size'
+             ]
+           }
+
+           definitions[:q_text_bigram] = {
+             tokenizer: analysis_settings[:tokenizer_name],
+             filter: [
+               *analysis_settings[:folding_filters],
+               stem_filter_name,
+               'bigram_joiner_unigrams',
+               'bigram_max_size'
+             ]
+           }
+
+           definitions
+         end
+
+         def language_data
+           @language_data ||= YAML.safe_load(
+             File.read(LANGUAGE_DATA_FILE_PATH),
+             symbolize_names: true
+           )
+         end
+       end
+     end
+   end
+ end
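The `iq_text_stem` analyzer above builds its filter chain in a fixed order: language-specific prepended filters, then the folding filters (ICU or non-ICU), then the stop-word and stemmer filters, then any postpended filters. The sketch below reproduces only that ordering as a standalone function so it can be run without the gem. `stem_filter_chain` and its arguments are hypothetical, and it returns plain strings where the gem uses Symbols for the stop-word and stemmer filter names.

```ruby
NON_ICU_FOLDING = %w[cjk_width lowercase asciifolding].freeze
ICU_FOLDING = %w[icu_folding].freeze

# Hypothetical helper mirroring the filter order of iq_text_stem.
# `entry` is one language's hash, as loaded from the language data.
def stem_filter_chain(code, entry, icu: false)
  [
    *entry.fetch(:prepended_filters, []),
    *(icu ? ICU_FOLDING : NON_ICU_FOLDING),
    "#{code}-stop-words-filter",
    "#{code}-stem-filter",
    *entry.fetch(:postpended_filters, [])
  ]
end

french = { prepended_filters: ['fr-elision'] }
chinese = { postpended_filters: ['cjk_bigram'] }

FRENCH_CHAIN = stem_filter_chain(:fr, french)
CHINESE_ICU_CHAIN = stem_filter_chain(:zh, chinese, icu: true)
```

With the gem itself loaded, the full settings hash would come from `Utility::Elasticsearch::Index::TextAnalysisSettings.new(language_code: :fr, analysis_icu: false).to_h`, per the class definition above. An unsupported code raises `UnsupportedLanguageCode`.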
metadata ADDED
@@ -0,0 +1,50 @@
+ --- !ruby/object:Gem::Specification
+ name: connectors_utility
+ version: !ruby/object:Gem::Version
+   version: 8.4.0.0
+ platform: ruby
+ authors:
+ - Elastic
+ autorequire:
+ bindir: bin
+ cert_chain: []
+ date: 2022-07-14 00:00:00.000000000 Z
+ dependencies: []
+ description: ''
+ email: ent-search-dev@elastic.co
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - LICENSE
+ - NOTICE.txt
+ - lib/connectors_utility.rb
+ - lib/utility/elasticsearch/index/language_data.yml
+ - lib/utility/elasticsearch/index/mappings.rb
+ - lib/utility/elasticsearch/index/text_analysis_settings.rb
+ homepage: https://github.com/elastic/connectors-ruby
+ licenses:
+ - Elastic-2.0
+ metadata:
+   revision: c9283d0e12a3ae8253225becbefef02d0c6153c8
+   repository: git@github.com:elastic/connectors.git
+ post_install_message:
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ requirements: []
+ rubygems_version: 3.0.3.1
+ signing_key:
+ specification_version: 4
+ summary: Gem containing shared Connector Services libraries
+ test_files: []