RubyGems - human_query_parser - Versions diffs - 1.0.0 - Mend

human_query_parser 1.0.0

Files changed (32)

checksums.yaml +7 -0
data/.gitignore +10 -0
data/.rubocop.yml +131 -0
data/CHANGELOG.md +4 -0
data/CODE_OF_CONDUCT.md +46 -0
data/Gemfile +5 -0
data/Guardfile +22 -0
data/Jenkinsfile +92 -0
data/LICENSE +21 -0
data/README.md +261 -0
data/Rakefile +10 -0
data/bin/_guard-core +16 -0
data/bin/guard +16 -0
data/bin/rake +16 -0
data/human_query_parser.gemspec +29 -0
data/lib/human_query_parser.rb +14 -0
data/lib/human_query_parser/bareword.rb +63 -0
data/lib/human_query_parser/parser.rb +26 -0
data/lib/human_query_parser/phrase.rb +36 -0
data/lib/human_query_parser/query.rb +46 -0
data/lib/human_query_parser/term.rb +22 -0
data/lib/human_query_parser/transform.rb +14 -0
data/lib/human_query_parser/version.rb +3 -0
data/test/bareword_test.rb +52 -0
data/test/human_query_parser_test.rb +18 -0
data/test/parser_test.rb +136 -0
data/test/phrase_test.rb +33 -0
data/test/query_test.rb +159 -0
data/test/term_test.rb +16 -0
data/test/test_helper.rb +18 -0
data/test/transform_test.rb +185 -0
metadata +167 -0

checksums.yaml ADDED

@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: 1fd4940b679c88cf7ac1a25ca479d1b856f76205cab28f187ebbea7eb1a9b8c6
+  data.tar.gz: 5b85cdfd402e6aec048ce25bf81656f616efb52cafc17749d6fa9b1f57836dc0
+SHA512:
+  metadata.gz: '08fd6f7b6cd8f10c2bdfea744746d2584343851d316779686908afe2c3181486406c7601b1ee43a04d74dd809a7c21b4cf9d3305c1d9bbc3b7c30c1f8d01875f'
+  data.tar.gz: cd437ebeec2376c80b65e9a5dc9d3cd8de94f5b50a7d07afbe62f0caa38cdd95310207e8383d012c23e66d92fd27287734f1a4e381eda1987738f147fc58e478

data/.gitignore ADDED

@@ -0,0 +1,10 @@
+/.bundle/
+/.yardoc
+/_yardoc/
+/Gemfile.lock
+/coverage/
+/doc/
+/pkg/
+/spec/reports/
+/tmp/
+*.gem

data/.rubocop.yml ADDED

@@ -0,0 +1,131 @@
+AllCops:
+  Exclude:
+    - '*.gemspec'
+    - 'Gemfile*'
+  TargetRubyVersion: "2.3"
+# Do not modify any of these rules without running your changes past the `@techleads` Slack group.
+# In order to make CodeClimate run in less than 30 minutes (the time-out), we're temporarily turning off
+# some of the low-risk, high-frequency cops.  This should be a temporary solution.  We should either
+# fix all of these issues across the codebase, or tune the cops to only pull up these issues when they
+# matter (e.g. setting the max line length to 120 instead of 80)
+Style/StringLiterals:
+  Enabled: false # this cop slows rubocop down too much so CodeClimate times out
+  EnforcedStyle: double_quotes
+Style/NumericLiterals:
+  Enabled: false # this cop slows rubocop down too much so CodeClimate times out
+# Permanent rules below
+Layout/AlignParameters:
+  Enabled: false # we haven't reached consensus yet on what this rule should be
+Layout/CaseIndentation:
+  EnforcedStyle: end
+  IndentOneStep: false
+# Layout/ElseAlignment:
+#   Enabled: false # we haven't reached consensus yet on what this rule should be
+# Layout/MultilineMethodCallIndentation:
+#   Enabled: false # we haven't reached consensus yet on what this rule should be
+Layout/MultilineOperationIndentation:
+  Enabled: true
+  EnforcedStyle: indented
+Lint/AmbiguousRegexpLiteral:
+  Enabled: false # we disagree with this rule
+# Lint/EndAlignment:
+#   EnforcedStyleAlignWith: start_of_line
+Metrics/AbcSize:
+  # There are ~500 methods > 20 as of 3/7/18, vs. ~900 methods > 15 (which is the default)
+  Max: 20
+Metrics/BlockLength:
+  Max: 25
+  ExcludedMethods: ["class_methods", "describe", "included"]
+Metrics/ClassLength:
+  # nearly all classes over 200 lines are old and crufty and should be split up
+  Max: 200
+Metrics/CyclomaticComplexity:
+  # There are ~100 methods > 8 as of 3/7/18, vs. 220 > 6 (which is the default)
+  Max: 8
+Metrics/LineLength:
+  # There are 23k lines > 80 chars, 5k > 120, 2k > 140 chars as of 12/21/17
+  Max: 120
+Metrics/MethodLength:
+  # There are ~100 methods > 35 lines as of 12/21/17
+  Max: 35
+Metrics/ModuleLength:
+  # nearly all modules over 200 lines are old and crufty and should be split up
+  Max: 200
+Metrics/PerceivedComplexity:
+  # There are ~100 methods > 8 as of 3/7/18, vs. ~150 methods > 7 (which is the default)
+  Max: 8
+Naming/VariableNumber:
+  EnforcedStyle: snake_case # `condition_info_1` is easier to read than `condition_info1`
+Performance/Casecmp:
+  Enabled: false # we generally prefer the readability of downcase over the performance of casecmp
+Style/Alias:
+  EnforcedStyle: prefer_alias_method
+Style/BracesAroundHashParameters:
+  Enabled: false # using braces can be a good choice, e.g., `assert_equal expected_json, { "foo" => "bar" }`
+Style/ClassAndModuleChildren:
+  EnforcedStyle: compact
+  Exclude: ["share/**/*"]
+Style/CollectionMethods:
+  Enabled: true
+  PreferredMethods:
+    collect: map
+    collect!: map!
+    inject: reduce
+    detect: find
+    find_all: select
+Style/Documentation:
+  Enabled: false # don't require class and method doc comments
+Style/EmptyMethod:
+  EnforcedStyle: expanded
+Style/FrozenStringLiteralComment:
+  EnforcedStyle: never
+Style/GuardClause:
+  Enabled: false # sometimes guard clauses are appropriate and sometimes they aren't
+Style/IfUnlessModifier:
+  Enabled: false # sometimes it's clearer to use traditional if/end conditionals, even if they would fit on one line
+Style/NumericPredicate:
+  Enabled: false # we disagree with this rule
+Style/SymbolArray:
+  Enabled: false # undecided
+Style/TrailingCommaInArguments:
+  EnforcedStyleForMultiline: consistent_comma
+Style/TrailingCommaInLiteral:
+  EnforcedStyleForMultiline: consistent_comma
+Style/WordArray:
+  Enabled: false # sometimes %w makes sense and sometimes it doesn't

data/CHANGELOG.md ADDED

@@ -0,0 +1,4 @@
+## [v1.0.0]
+> 2018-03-29
+* Extract gem from plm-website local gems

data/CODE_OF_CONDUCT.md ADDED

@@ -0,0 +1,46 @@
+# Contributor Covenant Code of Conduct
+## Our Pledge
+In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation.
+## Our Standards
+Examples of behavior that contributes to creating a positive environment include:
+* Using welcoming and inclusive language
+* Being respectful of differing viewpoints and experiences
+* Gracefully accepting constructive criticism
+* Focusing on what is best for the community
+* Showing empathy towards other community members
+Examples of unacceptable behavior by participants include:
+* The use of sexualized language or imagery and unwelcome sexual attention or advances
+* Trolling, insulting/derogatory comments, and personal or political attacks
+* Public or private harassment
+* Publishing others' private information, such as a physical or electronic address, without explicit permission
+* Other conduct which could reasonably be considered inappropriate in a professional setting
+## Our Responsibilities
+Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.
+Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.
+## Scope
+This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers.
+## Enforcement
+Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at nbudin@patientslikeme.com. The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.
+Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership.
+## Attribution
+This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at [http://contributor-covenant.org/version/1/4][version]
+[homepage]: http://contributor-covenant.org
+[version]: http://contributor-covenant.org/version/1/4/

data/Gemfile ADDED

@@ -0,0 +1,5 @@
+source 'https://rubygems.org'
+gemspec
+gem 'rake'

data/Guardfile ADDED

@@ -0,0 +1,22 @@
+# A sample Guardfile
+# More info at https://github.com/guard/guard#readme
+## Uncomment and set this to only include directories you want to watch
+# directories %w(app lib config test spec features) \
+#  .select{|d| Dir.exist?(d) ? d : UI.warning("Directory #{d} does not exist")}
+## Note: if you are using the `directories` clause above and you are not
+## watching the project directory ('.'), then you will want to move
+## the Guardfile to a watched dir and symlink it back, e.g.
+#
+#  $ mkdir config
+#  $ mv Guardfile config/
+#  $ ln -s config/Guardfile .
+#
+# and, you'll have to watch "config/Guardfile" instead of "Guardfile"
+guard :minitest do
+  watch(%r{^test/(.*)\/?(.*)_test\.rb$})
+  watch(%r{^lib/(.*/)?([^/]+)\.rb$})     { "test" }
+  watch(%r{^test/test_helper\.rb$})      { 'test' }
+end

data/Jenkinsfile ADDED

@@ -0,0 +1,92 @@
+pipeline {
+  agent none
+  options {
+    timeout(time: 1, unit: 'HOURS')
+    skipDefaultCheckout()
+  }
+  stages {
+    stage("Build Ruby") {
+      agent {
+        node {
+          label 'docker'
+        }
+      }
+      steps {
+        script {
+          with_ruby_build() {
+            script {
+              uid = sh(returnStdout: true, script: 'stat -c %u .').trim()
+              gid = sh(returnStdout: true, script: 'stat -c %g .').trim()
+            }
+            sh "chown -R ${uid}:${gid} vendor/bundle/"
+            sh "rm -rf vendor/bundle/ruby/2.3.0/cache"
+            stash name: 'ruby-bundle', includes: 'vendor/bundle/'
+          }
+        }
+      }
+    }
+    stage("Test") {
+      steps {
+        script {
+          node('docker') {
+            checkout([
+              $class: 'GitSCM',
+              branches: scm.branches,
+              doGenerateSubmoduleConfigurations: scm.doGenerateSubmoduleConfigurations,
+              extensions: scm.extensions + [[$class: 'CloneOption', noTags: true, reference: '', shallow: true]],
+              userRemoteConfigs: scm.userRemoteConfigs
+            ])
+            try {
+              docker.image('ruby:2.3.3').inside() {
+                sh 'rm -rf vendor/bundle'
+                unstash 'ruby-bundle'
+                sh 'bundle install --path=vendor/bundle'
+                withEnv([
+                  'DISABLE_SPRING=1',
+                  'TZ=America/New_York'
+                ]) {
+                  sh 'bundle exec rake test'
+                }
+              }
+            }
+            finally {
+              junit 'test/reports/'
+              cleanWs()
+            }
+          }
+        }
+      }
+    }
+  }
+  post {
+    failure {
+      script {
+        if (env.BRANCH_NAME == 'master' || env.BRANCH_NAME == 'current') {
+          slackSend (channel: '#plm_website', color: '#FF0000', message: "FAILED ${env.JOB_NAME} [${env.BUILD_NUMBER}] (${env.RUN_DISPLAY_URL})")
+        }
+      }
+    }
+  }
+}
+def with_ruby_build(closure) {
+  docker.image('ruby:2.3.3').inside() {
+    checkout([
+      $class: 'GitSCM',
+      branches: scm.branches,
+      doGenerateSubmoduleConfigurations: scm.doGenerateSubmoduleConfigurations,
+      extensions: scm.extensions + [[$class: 'CloneOption', noTags: true, reference: '', shallow: true]],
+      userRemoteConfigs: scm.userRemoteConfigs
+    ])
+    sh 'rm -rf vendor/bundle'
+    sh 'bundle install --path=vendor/bundle'
+    closure()
+    cleanWs()
+  }
+}

data/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2017-2018 PatientsLikeMe, Inc.
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

data/README.md ADDED

@@ -0,0 +1,261 @@
+# HumanQueryParser
+A tool for taking search queries of the form most users will expect, and producing ElasticSearch queries that do what
+most users would expect.
+For example:
+```
+some terms to search for
+```
+will produce a query that returns results that match as many as possible of "some", "terms", "to", "search", and "for",
+preferably in that order, allowing some level of misspelling (but penalizing results for it).
+```
+"a phrase" term
+```
+will produce a query that returns results that match the entirety of "a phrase", or the word "term", or preferably
+both, allowing some level of misspelling (but penalizing results for it).
+```
++required optional
+```
+will produce a query that returns results that include exactly the word "required", ranking results that also contain
+"optional" higher, and allowing misspellings of "optional" but not of "required".
+```
++"required phrase" optional
+```
+will produce a query that returns results that include exactly the phrase "required phrase", ranking results that also
+contain "optional" higher, and allowing misspellings of "optional" but not of "required phrase".
+```
+-negatory +affirmative
+```
+will produce a query that returns results that include exactly the word "affirmative", but not the word "negatory".
+Misspellings of either are not counted.
+```
+-none -of -these -words
+```
+will produce a query that returns results that don't include any of the words "none", "of", "these", or "words".
+## Installation
+Add this to your Gemfile:
+```ruby
+gem 'human_query_parser'
+```
+Then run:
+```bash
+bundle install
+```
+## Usage
+To compile a query, use the `HumanQueryParser.compile` method.  This method takes two parameters:
+1. The text of the search query
+2. An array of field names to search against
+The content of the field names is entirely up to you and the design of your ElasticSearch index, but typically
+will be names of one or more text fields in an ElasticSearch document.
+For example:
+```ruby
+es_query = HumanQueryParser.compile("search query goes here", ['field1', 'field2'])
+```
+You could then use this to query ElasticSearch in a variety of ways.  For example, if you're using the official
+[elasticsearch-model](https://github.com/elastic/elasticsearch-rails/tree/master/elasticsearch-model) gem, you could
+do:
+```ruby
+results = MyModel.search(query: es_query)
+```
+Easy peasy!
+## Under the Hood
+The above example returns the following ElasticSearch query:
+```json
+{
+  "bool": {
+    "should": [
+      {
+        "multi_match": {
+          "fields": [
+            "field1",
+            "field2"
+          ],
+          "query": "search query goes here",
+          "max_expansions": 50,
+          "fuzziness": "AUTO"
+        }
+      },
+      {
+        "function_score": {
+          "query": {
+            "multi_match": {
+              "fields": [
+                "field1",
+                "field2"
+              ],
+              "query": "search query goes here",
+              "max_expansions": 50,
+              "fuzziness": "AUTO",
+              "operator": "and"
+            }
+          },
+          "boost": 6.0
+        }
+      },
+      {
+        "function_score": {
+          "query": {
+            "multi_match": {
+              "fields": [
+                "field1",
+                "field2"
+              ],
+              "query": "search query goes here",
+              "max_expansions": 50,
+              "fuzziness": "AUTO",
+              "type": "phrase"
+            }
+          },
+          "boost": 8.0
+        }
+      },
+      {
+        "multi_match": {
+          "fields": [
+            "field1",
+            "field2"
+          ],
+          "query": "search query goes here",
+          "max_expansions": 50,
+          "fuzziness": "AUTO",
+          "prefix_length": 3
+        }
+      }
+    ]
+  }
+}
+```
+It's a little complicated, yeah!  Let's break this down piece by piece.
+```json
+{
+  "multi_match": {
+    "fields": [
+      "field1",
+      "field2"
+    ],
+    "query": "search query goes here",
+    "max_expansions": 50,
+    "fuzziness": "AUTO"
+  }
+}
+```
+This is our basic, misspelling-allowed version of the query against all the fields we specified.  We've found `AUTO`
+fuzziness and 50 expansions to produce good results and have hardcoded them in for now, but this may well become
+configurable in future versions of HumanQueryParser.
+```json
+{
+  "function_score": {
+    "query": {
+      "multi_match": {
+        "fields": [
+          "field1",
+          "field2"
+        ],
+        "query": "search query goes here",
+        "max_expansions": 50,
+        "fuzziness": "AUTO",
+        "operator": "and"
+      }
+    },
+    "boost": 6.0
+  }
+}
+```
+This is the same query with misspellings allowed, but all terms (or misspelled versions thereof) must be present in
+order to match.  If they all are, we boost the result score 6x.
+```json
+{
+  "function_score": {
+    "query": {
+      "multi_match": {
+        "fields": [
+          "field1",
+          "field2"
+        ],
+        "query": "search query goes here",
+        "max_expansions": 50,
+        "fuzziness": "AUTO",
+        "type": "phrase"
+      }
+    },
+    "boost": 8.0
+  }
+}
+```
+This is a fundamentally different type of query, because it treats the entire search term set as a phrase.
+Misspellings are still allowed, but all terms (or misspelled versions thereof) must be present,
+*in the same order as in the query*.  If they are, we boost the result score 8x.
+```json
+{
+  "multi_match": {
+    "fields": [
+      "field1",
+      "field2"
+    ],
+    "query": "search query goes here",
+    "max_expansions": 50,
+    "fuzziness": "AUTO",
+    "prefix_length": 3
+  }
+}
+```
+This last fragment is yet another type of query.  It still allows misspellings (`AUTO` fuzziness), but only
+as long as the first 3 characters are an exact match.  This makes it easier to do typeahead search, because the user
+doesn't have to know the entire spelling of the term (results guide them as they type).
+This is the basic pattern for generated queries.  For terms with + and - operators, misspelling isn't allowed, and
+as a result we can use a single `multi_match` query rather than these four.  Terms with + and - will use `must` and
+`must_not` sections of the `bool` query respectively.
+## Running Tests
+1. Change to the gem's directory
+2. Run `bundle`
+3. Run `rake`
+## Release Process
+Once the pull request is merged to master, on the latest master:
+1. Update CHANGELOG.md. Version: [ major (breaking change: non-backwards
+   compatible release) | minor (new features) | patch (bugfixes) ]
+2. Update version in lib/human_query_parser/version.rb
+3. Release by running `bundle exec rake release`
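
Reviewer note (not part of the package): the README's "Under the Hood" section describes four `should` clauses that share one `multi_match` body and differ only in one extra option or a boost. The sketch below rebuilds that shape in plain Ruby to make the pattern easier to see. It is an illustration under stated assumptions, not the gem's implementation: the helper names `sketch_multi_match`, `sketch_boosted`, and `sketch_should_query` are hypothetical, and the values (50 expansions, `AUTO` fuzziness, boosts of 6.0 and 8.0, `prefix_length` of 3) are copied from the JSON shown in the README.

```ruby
require 'json'

# Hypothetical helper (not part of human_query_parser): builds the
# multi_match body that all four README clauses share, plus any extras.
def sketch_multi_match(text, fields, extra = {})
  {
    multi_match: {
      fields: fields,
      query: text,
      max_expansions: 50,   # value taken from the README's JSON
      fuzziness: "AUTO",    # value taken from the README's JSON
    }.merge(extra),
  }
end

# Hypothetical helper: wraps a query in function_score with a boost,
# mirroring the boosted clauses in the README.
def sketch_boosted(query, boost)
  { function_score: { query: query, boost: boost } }
end

# Hypothetical helper: assembles the four clauses in README order.
def sketch_should_query(text, fields)
  {
    bool: {
      should: [
        # 1. Basic fuzzy match against every field.
        sketch_multi_match(text, fields),
        # 2. Every term must match; boosted 6x.
        sketch_boosted(sketch_multi_match(text, fields, operator: "and"), 6.0),
        # 3. Terms must match as a phrase; boosted 8x.
        sketch_boosted(sketch_multi_match(text, fields, type: "phrase"), 8.0),
        # 4. The first 3 characters must match exactly (typeahead).
        sketch_multi_match(text, fields, prefix_length: 3),
      ],
    },
  }
end

puts JSON.pretty_generate(
  sketch_should_query("search query goes here", %w[field1 field2])
)
```

Printing the result should give JSON with the same structure as the README's full example, which makes it a convenient way to compare against whatever `HumanQueryParser.compile` actually returns for the same input.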