human_query_parser 1.0.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 1fd4940b679c88cf7ac1a25ca479d1b856f76205cab28f187ebbea7eb1a9b8c6
4
+ data.tar.gz: 5b85cdfd402e6aec048ce25bf81656f616efb52cafc17749d6fa9b1f57836dc0
5
+ SHA512:
6
+ metadata.gz: '08fd6f7b6cd8f10c2bdfea744746d2584343851d316779686908afe2c3181486406c7601b1ee43a04d74dd809a7c21b4cf9d3305c1d9bbc3b7c30c1f8d01875f'
7
+ data.tar.gz: cd437ebeec2376c80b65e9a5dc9d3cd8de94f5b50a7d07afbe62f0caa38cdd95310207e8383d012c23e66d92fd27287734f1a4e381eda1987738f147fc58e478
data/.gitignore ADDED
@@ -0,0 +1,10 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /_yardoc/
4
+ /Gemfile.lock
5
+ /coverage/
6
+ /doc/
7
+ /pkg/
8
+ /spec/reports/
9
+ /tmp/
10
+ *.gem
data/.rubocop.yml ADDED
@@ -0,0 +1,131 @@
1
+ AllCops:
2
+ Exclude:
3
+ - '*.gemspec'
4
+ - 'Gemfile*'
5
+ TargetRubyVersion: "2.3"
6
+
7
+ # Do not modify any of these rules without running your changes past the `@techleads` Slack group.
8
+
9
+ # In order to make CodeClimate run in less than 30 minutes (the time-out), we're temporarily turning off
10
+ # some of the low-risk, high-frequency cops. This should be a temporary solution. We should either
11
+ # fix all of these issues across the codebase, or tune the cops to only pull up these issues when they
12
+ # matter (e.g. setting the max line length to 120 instead of 80)
13
+
14
+ Style/StringLiterals:
15
+ Enabled: false # this cop slows rubocop down too much so CodeClimate times out
16
+ EnforcedStyle: double_quotes
17
+
18
+ Style/NumericLiterals:
19
+ Enabled: false # this cop slows rubocop down too much so CodeClimate times out
20
+
21
+ # Permanent rules below
22
+
23
+ Layout/AlignParameters:
24
+ Enabled: false # we haven't reached consensus yet on what this rule should be
25
+
26
+ Layout/CaseIndentation:
27
+ EnforcedStyle: end
28
+ IndentOneStep: false
29
+
30
+ # Layout/ElseAlignment:
31
+ # Enabled: false # we haven't reached consensus yet on what this rule should be
32
+
33
+ # Layout/MultilineMethodCallIndentation:
34
+ # Enabled: false # we haven't reached consensus yet on what this rule should be
35
+
36
+ Layout/MultilineOperationIndentation:
37
+ Enabled: true
38
+ EnforcedStyle: indented
39
+
40
+ Lint/AmbiguousRegexpLiteral:
41
+ Enabled: false # we disagree with this rule
42
+
43
+ # Lint/EndAlignment:
44
+ # EnforcedStyleAlignWith: start_of_line
45
+
46
+ Metrics/AbcSize:
47
+ # There are ~500 methods > 20 as of 3/7/18, vs. ~900 methods > 15 (which is the default)
48
+ Max: 20
49
+
50
+ Metrics/BlockLength:
51
+ Max: 25
52
+ ExcludedMethods: ["class_methods", "describe", "included"]
53
+
54
+ Metrics/ClassLength:
55
+ # nearly all classes over 200 lines are old and crufty and should be split up
56
+ Max: 200
57
+
58
+ Metrics/CyclomaticComplexity:
59
+ # There are ~100 methods > 8 as of 3/7/18, vs. 220 > 6 (which is the default)
60
+ Max: 8
61
+
62
+ Metrics/LineLength:
63
+ # There are 23k lines > 80 chars, 5k > 120, 2k > 140 chars as of 12/21/17
64
+ Max: 120
65
+
66
+ Metrics/MethodLength:
67
+ # There are ~100 methods > 35 lines as of 12/21/17
68
+ Max: 35
69
+
70
+ Metrics/ModuleLength:
71
+ # nearly all modules over 200 lines are old and crufty and should be split up
72
+ Max: 200
73
+
74
+ Metrics/PerceivedComplexity:
75
+ # There are ~100 methods > 8 as of 3/7/18, vs. ~150 methods > 7 (which is the default)
76
+ Max: 8
77
+
78
+ Naming/VariableNumber:
79
+ EnforcedStyle: snake_case # `condition_info_1` is easier to read than `condition_info1`
80
+
81
+ Performance/Casecmp:
82
+ Enabled: false # we generally prefer the readability of downcase over the performance of casecmp
83
+
84
+ Style/Alias:
85
+ EnforcedStyle: prefer_alias_method
86
+
87
+ Style/BracesAroundHashParameters:
88
+ Enabled: false # using braces can be a good choice, e.g., `assert_equal expected_json, { "foo" => "bar" }`
89
+
90
+ Style/ClassAndModuleChildren:
91
+ EnforcedStyle: compact
92
+ Exclude: ["share/**/*"]
93
+
94
+ Style/CollectionMethods:
95
+ Enabled: true
96
+ PreferredMethods:
97
+ collect: map
98
+ collect!: map!
99
+ inject: reduce
100
+ detect: find
101
+ find_all: select
102
+
103
+ Style/Documentation:
104
+ Enabled: false # don't require class and method doc comments
105
+
106
+ Style/EmptyMethod:
107
+ EnforcedStyle: expanded
108
+
109
+ Style/FrozenStringLiteralComment:
110
+ EnforcedStyle: never
111
+
112
+ Style/GuardClause:
113
+ Enabled: false # sometimes guard clauses are appropriate and sometimes they aren't
114
+
115
+ Style/IfUnlessModifier:
116
+ Enabled: false # sometimes it's clearer to use traditional if/end conditionals, even if they would fit on one line
117
+
118
+ Style/NumericPredicate:
119
+ Enabled: false # we disagree with this rule
120
+
121
+ Style/SymbolArray:
122
+ Enabled: false # undecided
123
+
124
+ Style/TrailingCommaInArguments:
125
+ EnforcedStyleForMultiline: consistent_comma
126
+
127
+ Style/TrailingCommaInLiteral:
128
+ EnforcedStyleForMultiline: consistent_comma
129
+
130
+ Style/WordArray:
131
+ Enabled: false # sometimes %w makes sense and sometimes it doesn't
data/CHANGELOG.md ADDED
@@ -0,0 +1,4 @@
1
+ ## [v1.0.0]
2
+ > 2018-03-29
3
+
4
+ * Extract gem from plm-website local gems
data/CODE_OF_CONDUCT.md ADDED
@@ -0,0 +1,46 @@
1
+ # Contributor Covenant Code of Conduct
2
+
3
+ ## Our Pledge
4
+
5
+ In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation.
6
+
7
+ ## Our Standards
8
+
9
+ Examples of behavior that contributes to creating a positive environment include:
10
+
11
+ * Using welcoming and inclusive language
12
+ * Being respectful of differing viewpoints and experiences
13
+ * Gracefully accepting constructive criticism
14
+ * Focusing on what is best for the community
15
+ * Showing empathy towards other community members
16
+
17
+ Examples of unacceptable behavior by participants include:
18
+
19
+ * The use of sexualized language or imagery and unwelcome sexual attention or advances
20
+ * Trolling, insulting/derogatory comments, and personal or political attacks
21
+ * Public or private harassment
22
+ * Publishing others' private information, such as a physical or electronic address, without explicit permission
23
+ * Other conduct which could reasonably be considered inappropriate in a professional setting
24
+
25
+ ## Our Responsibilities
26
+
27
+ Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.
28
+
29
+ Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.
30
+
31
+ ## Scope
32
+
33
+ This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers.
34
+
35
+ ## Enforcement
36
+
37
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at nbudin@patientslikeme.com. The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.
38
+
39
+ Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership.
40
+
41
+ ## Attribution
42
+
43
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at [http://contributor-covenant.org/version/1/4][version]
44
+
45
+ [homepage]: http://contributor-covenant.org
46
+ [version]: http://contributor-covenant.org/version/1/4/
data/Gemfile ADDED
@@ -0,0 +1,5 @@
1
+ source 'https://rubygems.org'
2
+
3
+ gemspec
4
+
5
+ gem 'rake'
data/Guardfile ADDED
@@ -0,0 +1,22 @@
1
+ # A sample Guardfile
2
+ # More info at https://github.com/guard/guard#readme
3
+
4
+ ## Uncomment and set this to only include directories you want to watch
5
+ # directories %w(app lib config test spec features) \
6
+ # .select{|d| Dir.exist?(d) ? d : UI.warning("Directory #{d} does not exist")}
7
+
8
+ ## Note: if you are using the `directories` clause above and you are not
9
+ ## watching the project directory ('.'), then you will want to move
10
+ ## the Guardfile to a watched dir and symlink it back, e.g.
11
+ #
12
+ # $ mkdir config
13
+ # $ mv Guardfile config/
14
+ # $ ln -s config/Guardfile .
15
+ #
16
+ # and, you'll have to watch "config/Guardfile" instead of "Guardfile"
17
+
18
+ guard :minitest do
19
+ watch(%r{^test/(.*)\/?(.*)_test\.rb$})
20
+ watch(%r{^lib/(.*/)?([^/]+)\.rb$}) { "test" }
21
+ watch(%r{^test/test_helper\.rb$}) { 'test' }
22
+ end
data/Jenkinsfile ADDED
@@ -0,0 +1,92 @@
1
+ pipeline {
2
+ agent none
3
+ options {
4
+ timeout(time: 1, unit: 'HOURS')
5
+ skipDefaultCheckout()
6
+ }
7
+
8
+ stages {
9
+ stage("Build Ruby") {
10
+ agent {
11
+ node {
12
+ label 'docker'
13
+ }
14
+ }
15
+
16
+ steps {
17
+ script {
18
+ with_ruby_build() {
19
+ script {
20
+ uid = sh(returnStdout: true, script: 'stat -c %u .').trim()
21
+ gid = sh(returnStdout: true, script: 'stat -c %g .').trim()
22
+ }
23
+
24
+ sh "chown -R ${uid}:${gid} vendor/bundle/"
25
+ sh "rm -rf vendor/bundle/ruby/2.3.0/cache"
26
+ stash name: 'ruby-bundle', includes: 'vendor/bundle/'
27
+ }
28
+ }
29
+ }
30
+ }
31
+
32
+ stage("Test") {
33
+ steps {
34
+ script {
35
+ node('docker') {
36
+ checkout([
37
+ $class: 'GitSCM',
38
+ branches: scm.branches,
39
+ doGenerateSubmoduleConfigurations: scm.doGenerateSubmoduleConfigurations,
40
+ extensions: scm.extensions + [[$class: 'CloneOption', noTags: true, reference: '', shallow: true]],
41
+ userRemoteConfigs: scm.userRemoteConfigs
42
+ ])
43
+ try {
44
+ docker.image('ruby:2.3.3').inside() {
45
+ sh 'rm -rf vendor/bundle'
46
+ unstash 'ruby-bundle'
47
+ sh 'bundle install --path=vendor/bundle'
48
+
49
+ withEnv([
50
+ 'DISABLE_SPRING=1',
51
+ 'TZ=America/New_York'
52
+ ]) {
53
+ sh 'bundle exec rake test'
54
+ }
55
+ }
56
+ }
57
+ finally {
58
+ junit 'test/reports/'
59
+ cleanWs()
60
+ }
61
+ }
62
+ }
63
+ }
64
+ }
65
+ }
66
+
67
+ post {
68
+ failure {
69
+ script {
70
+ if (env.BRANCH_NAME == 'master' || env.BRANCH_NAME == 'current') {
71
+ slackSend (channel: '#plm_website', color: '#FF0000', message: "FAILED ${env.JOB_NAME} [${env.BUILD_NUMBER}] (${env.RUN_DISPLAY_URL})")
72
+ }
73
+ }
74
+ }
75
+ }
76
+ }
77
+
78
+ def with_ruby_build(closure) {
79
+ docker.image('ruby:2.3.3').inside() {
80
+ checkout([
81
+ $class: 'GitSCM',
82
+ branches: scm.branches,
83
+ doGenerateSubmoduleConfigurations: scm.doGenerateSubmoduleConfigurations,
84
+ extensions: scm.extensions + [[$class: 'CloneOption', noTags: true, reference: '', shallow: true]],
85
+ userRemoteConfigs: scm.userRemoteConfigs
86
+ ])
87
+ sh 'rm -rf vendor/bundle'
88
+ sh 'bundle install --path=vendor/bundle'
89
+ closure()
90
+ cleanWs()
91
+ }
92
+ }
data/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2017-2018 PatientsLikeMe, Inc.
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,261 @@
1
+ # HumanQueryParser
2
+
3
+ A tool for taking search queries of the form most users will expect, and producing ElasticSearch queries that do what
4
+ most users would expect.
5
+
6
+ For example:
7
+
8
+ ```
9
+ some terms to search for
10
+ ```
11
+
12
+ will produce a query that returns results that match as many as possible of "some", "terms", "to", "search", and "for",
13
+ preferably in that order, allowing some level of misspelling (but penalizing results for it).
14
+
15
+ ```
16
+ "a phrase" term
17
+ ```
18
+
19
+ will produce a query that returns results that match the entirety of "a phrase", or the word "term", or preferably
20
+ both, allowing some level of misspelling (but penalizing results for it).
21
+
22
+ ```
23
+ +required optional
24
+ ```
25
+
26
+ will produce a query that returns results that include exactly the word "required", ranking results that also contain
27
+ "optional" higher, and allowing misspellings of "optional" but not of "required".
28
+
29
+ ```
30
+ +"required phrase" optional
31
+ ```
32
+
33
+ will produce a query that returns results that include exactly the phrase "required phrase", ranking results that also
34
+ contain "optional" higher, and allowing misspellings of "optional" but not of "required phrase".
35
+
36
+ ```
37
+ -negatory +affirmative
38
+ ```
39
+
40
+ will produce a query that returns results that include exactly the word "affirmative", but not the word "negatory".
41
+ Misspellings of either are not counted.
42
+
43
+ ```
44
+ -none -of -these -words
45
+ ```
46
+
47
+ will produce a query that returns results that don't include any of the words "none", "of", "these", or "words".
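The operator rules in the examples above can be illustrated with a small tokenizer. This is only a sketch of the idea, not the gem's actual parser: `classify_terms` and the `Term` struct are hypothetical names, and the sketch assumes quotes are balanced and that `+` or `-` only counts as an operator at the start of a term.

```ruby
# Hypothetical sketch: split a human query into required (+), forbidden (-),
# and optional terms. Not HumanQueryParser's real implementation.
Term = Struct.new(:text, :phrase)

def classify_terms(query)
  buckets = { required: [], forbidden: [], optional: [] }

  # Each match yields [operator, quoted_phrase, bare_word]; groups that did
  # not participate in the match are nil.
  query.scan(/([+-])?(?:"([^"]*)"|(\S+))/) do |op, phrase, word|
    text = phrase || word
    next if text.empty? # skip empty quoted phrases such as ""

    key = case op
          when "+" then :required
          when "-" then :forbidden
          else :optional
          end
    buckets[key] << Term.new(text, !phrase.nil?)
  end

  buckets
end

terms = classify_terms('+"required phrase" optional -negatory')
terms.each do |kind, list|
  puts "#{kind}: #{list.map(&:text).inspect}"
end
```

Keeping the three buckets separate mirrors the README's distinction between terms that tolerate misspellings (optional) and terms that do not (required and forbidden).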
48
+
49
+ ## Installation
50
+
51
+ Add this to your Gemfile:
52
+
53
+ ```ruby
54
+ gem 'human_query_parser'
55
+ ```
56
+
57
+ Then run:
58
+
59
+ ```bash
60
+ bundle install
61
+ ```
62
+
63
+ ## Usage
64
+
65
+ To compile a query, use the `HumanQueryParser.compile` method. This method takes two parameters:
66
+
67
+ 1. The text of the search query
68
+ 2. An array of field names to search against
69
+
70
+ The content of the field names is entirely up to you and the design of your ElasticSearch index, but typically
71
+ will be names of one or more text fields in an ElasticSearch document.
72
+
73
+ For example:
74
+
75
+ ```ruby
76
+ es_query = HumanQueryParser.compile("search query goes here", ['field1', 'field2'])
77
+ ```
78
+
79
+ You could then use this to query ElasticSearch in a variety of ways. For example, if you're using the official
80
+ [elasticsearch-model](https://github.com/elastic/elasticsearch-rails/tree/master/elasticsearch-model) gem, you could
81
+ do:
82
+
83
+ ```ruby
84
+ results = MyModel.search(query: es_query)
85
+ ```
86
+
87
+ Easy peasy!
88
+
89
+ ## Under the Hood
90
+
91
+ The above example returns the following ElasticSearch query:
92
+
93
+ ```json
94
+ {
95
+ "bool": {
96
+ "should": [
97
+ {
98
+ "multi_match": {
99
+ "fields": [
100
+ "field1",
101
+ "field2"
102
+ ],
103
+ "query": "search query goes here",
104
+ "max_expansions": 50,
105
+ "fuzziness": "AUTO"
106
+ }
107
+ },
108
+ {
109
+ "function_score": {
110
+ "query": {
111
+ "multi_match": {
112
+ "fields": [
113
+ "field1",
114
+ "field2"
115
+ ],
116
+ "query": "search query goes here",
117
+ "max_expansions": 50,
118
+ "fuzziness": "AUTO",
119
+ "operator": "and"
120
+ }
121
+ },
122
+ "boost": 6.0
123
+ }
124
+ },
125
+ {
126
+ "function_score": {
127
+ "query": {
128
+ "multi_match": {
129
+ "fields": [
130
+ "field1",
131
+ "field2"
132
+ ],
133
+ "query": "search query goes here",
134
+ "max_expansions": 50,
135
+ "fuzziness": "AUTO",
136
+ "type": "phrase"
137
+ }
138
+ },
139
+ "boost": 8.0
140
+ }
141
+ },
142
+ {
143
+ "multi_match": {
144
+ "fields": [
145
+ "field1",
146
+ "field2"
147
+ ],
148
+ "query": "search query goes here",
149
+ "max_expansions": 50,
150
+ "fuzziness": "AUTO",
151
+ "prefix_length": 3
152
+ }
153
+ }
154
+ ]
155
+ }
156
+ }
157
+ ```
158
+
159
+ It's a little complicated, yeah! Let's break this down piece by piece.
160
+
161
+ ```json
162
+ {
163
+ "multi_match": {
164
+ "fields": [
165
+ "field1",
166
+ "field2"
167
+ ],
168
+ "query": "search query goes here",
169
+ "max_expansions": 50,
170
+ "fuzziness": "AUTO"
171
+ }
172
+ }
173
+ ```
174
+
175
+ This is our basic, misspelling-allowed version of the query against all the fields we specified. We've found 80%
176
+ fuzziness and 50 expansions to produce good results and have hardcoded them in for now, but this may well become
177
+ configurable in future versions of HumanQueryParser.
178
+
179
+ ```json
180
+ {
181
+ "function_score": {
182
+ "query": {
183
+ "multi_match": {
184
+ "fields": [
185
+ "field1",
186
+ "field2"
187
+ ],
188
+ "query": "search query goes here",
189
+ "max_expansions": 50,
190
+ "fuzziness": "AUTO",
191
+ "operator": "and"
192
+ }
193
+ },
194
+ "boost": 6.0
195
+ }
196
+ }
197
+ ```
198
+
199
+ This is the same query with misspellings allowed, but all terms (or misspelled versions thereof) must be present in
200
+ order to match. If they all are, we boost the result score 6x.
201
+
202
+ ```json
203
+ {
204
+ "function_score": {
205
+ "query": {
206
+ "multi_match": {
207
+ "fields": [
208
+ "field1",
209
+ "field2"
210
+ ],
211
+ "query": "search query goes here",
212
+ "max_expansions": 50,
213
+ "fuzziness": "AUTO",
214
+ "type": "phrase"
215
+ }
216
+ },
217
+ "boost": 8.0
218
+ }
219
+ }
220
+ ```
221
+
222
+ This is a fundamentally different type of query, because it treats the entire search term set as a phrase.
223
+ Misspellings are still allowed, but all terms (or misspelled versions thereof) must be present,
224
+ *in the same order as in the query*. If they are, we boost the result score 8x.
225
+
226
+ ```json
227
+ {
228
+ "multi_match": {
229
+ "fields": [
230
+ "field1",
231
+ "field2"
232
+ ],
233
+ "query": "search query goes here",
234
+ "max_expansions": 50,
235
+ "fuzziness": "AUTO",
236
+ "prefix_length": 3
237
+ }
238
+ }
239
+ ```
240
+
241
+ This last fragment is yet another type of query. It allows a greater degree of misspelling (50% fuzziness)
242
+ as long as the first 3 characters are an exact match. This makes typeahead search easier: the user doesn't need
243
+ to know the entire spelling of the term, and the results shown as they type guide them.
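To make the four-part structure concrete, here is a minimal Ruby sketch that assembles the same `bool`/`should` shape from a query string and a field list. `fuzzy_should_clauses` is a hypothetical helper written for illustration, not part of the gem's API; the option values (50 expansions, `AUTO` fuzziness, 6.0 and 8.0 boosts, prefix length 3) are copied from the JSON shown in this section.

```ruby
require "json"

# Hypothetical helper (not HumanQueryParser's API): build the four "should"
# clauses described above for one free-text query.
def fuzzy_should_clauses(query, fields)
  # Options shared by every clause, taken from the JSON in this section.
  base = { fields: fields, query: query, max_expansions: 50, fuzziness: "AUTO" }

  # Wrap a multi_match variant in a function_score with the given boost.
  boosted = lambda do |extra, boost|
    { function_score: { query: { multi_match: base.merge(extra) }, boost: boost } }
  end

  {
    bool: {
      should: [
        { multi_match: base },                         # any term, misspellings allowed
        boosted.call({ operator: "and" }, 6.0),        # all terms present
        boosted.call({ type: "phrase" }, 8.0),         # all terms, in order
        { multi_match: base.merge(prefix_length: 3) }, # first 3 characters exact
      ],
    },
  }
end

puts JSON.pretty_generate(fuzzy_should_clauses("search query goes here", %w[field1 field2]))
```

Because every clause sits under `should`, a document only has to satisfy one of them to match; the stricter clauses exist to raise the score of better matches.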
244
+
245
+ This is the basic pattern for generated queries. For terms with + and - operators, misspelling isn't allowed, and
246
+ as a result we can use a single `multi_match` query rather than these four. Terms with + and - will use `must` and
247
+ `must_not` sections of the `bool` query respectively.
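As a rough illustration of the `must`/`must_not` handling, the sketch below builds a `bool` query from lists of required and forbidden terms. It is an assumption-laden example, not the gem's code: `exact_clause` and `strict_bool_query` are hypothetical names, and treating any term that contains a space as a phrase is a simplification made for this sketch.

```ruby
require "json"

# Hypothetical sketch: a single non-fuzzy multi_match per term.
# Omitting "fuzziness" means no misspellings are tolerated.
def exact_clause(term, fields)
  clause = { fields: fields, query: term }
  clause[:type] = "phrase" if term.include?(" ") # assumption: spaces imply a phrase
  { multi_match: clause }
end

# Required (+) terms go under "must", forbidden (-) terms under "must_not".
def strict_bool_query(required:, forbidden:, fields:)
  bool = {}
  bool[:must] = required.map { |t| exact_clause(t, fields) } unless required.empty?
  bool[:must_not] = forbidden.map { |t| exact_clause(t, fields) } unless forbidden.empty?
  { bool: bool }
end

query = strict_bool_query(
  required: ["affirmative"],
  forbidden: ["negatory"],
  fields: %w[field1 field2]
)
puts JSON.pretty_generate(query)
```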
248
+
249
+ ## Running Tests
250
+
251
+ 1. Change to the gem's directory
252
+ 2. Run `bundle`
253
+ 3. Run `rake`
254
+
255
+
256
+ ## Release Process
257
+ Once the pull request is merged to master, on the latest master:
258
+ 1. Update CHANGELOG.md. Version: [ major (breaking change: non-backwards
259
+ compatible release) | minor (new features) | patch (bugfixes) ]
260
+ 2. Update version in lib/human_query_parser/version.rb
261
+ 3. Release by running `bundle exec rake release`