RubyGems - engtagger - Versions diffs - 0.4.0 → 0.4.1 - Mend

engtagger 0.4.0 → 0.4.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 0b61370e322595bd880097f51fe0728780fa6a01ee9975e6eb333c8720ff36d8
-  data.tar.gz: 0f990be4f4d5f71908d76f0fb52f2c925a2a01891a815cbc70eaf7a39f77edfe
+  metadata.gz: fe357706e69ed72bec9569babe91cc8531e2c1d0eac71ac8d248bdd74b97ba98
+  data.tar.gz: 02e6bb2ba29ecabf8e5087c5a2dc92ccad57ef3578fbd9c844f93188a4d39ced
 SHA512:
-  metadata.gz: ade5d1cf6fc11553519fe9217dffb06453e0ab7d69ab1532b3f2e2079dd05d035d90ce5ce92e4d0e1195f2a8f79df5b4d44c4cedb27f14df529ac0b0e91cf730
-  data.tar.gz: ff085546b0db152df0983dabea49ec5b0cf47525cca6118d3776378e908ea04fd675f0bb1daceb944d6be141615e3a5d9da5774025a0dc6ef609dd8b311b1412
+  metadata.gz: 5e477b0d839e825e8d49135cb6d6c72c21555454d6f722d00b994442cdaaba2b1afa84ab8f22f82f64b9540a0e24914180c59a563830ea11b2a0239921d3e88e
+  data.tar.gz: 49b02532d7ad940b25b19ba59df364fc553373371f7850f068729af3a339773417ad7f8d5e0e58ecc3facc6ef2168ae86bb9f94a46f9a412bf51d7de36fdab1e
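
The new digests above can be checked against a locally unpacked copy of the gem. What follows is a minimal sketch, not part of the package: the helper name `sha256_matches?` and the `data.tar.gz` path are assumptions made for illustration, and only the standard `digest` library is used.

```ruby
require "digest"

# Hypothetical helper: true when the SHA256 hex digest of `bytes`
# equals `expected_hex` (compared case-insensitively).
def sha256_matches?(bytes, expected_hex)
  Digest::SHA256.hexdigest(bytes) == expected_hex.downcase
end

# SHA256 value listed for data.tar.gz in the 0.4.1 checksums.yaml above.
EXPECTED_DATA_SHA256 =
  "02e6bb2ba29ecabf8e5087c5a2dc92ccad57ef3578fbd9c844f93188a4d39ced"

# Assumed location of the archive extracted from the .gem file.
path = "data.tar.gz"
if File.exist?(path)
  ok = sha256_matches?(File.binread(path), EXPECTED_DATA_SHA256)
  puts(ok ? "checksum matches" : "checksum MISMATCH")
else
  puts "#{path} not found; unpack the .gem archive first"
end
```

Reading the file with `File.binread` avoids any newline or encoding translation, which would otherwise change the digest.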

data/.rubocop.yml CHANGED

@@ -18,9 +18,6 @@ Naming/FileName:
 Security/MarshalLoad:
   Enabled: false
-Layout/EndOfLine:
-   Enabled: False
 Style/ClassVars:
   Enabled: false

data/README.md CHANGED

@@ -2,7 +2,7 @@
 English Part-of-Speech Tagger Library; a Ruby port of Lingua::EN::Tagger
-### Description
+## Description
 A Ruby port of Perl Lingua::EN::Tagger, a probability based, corpus-trained
 tagger that assigns POS tags to English text based on a lookup dictionary and
@@ -13,13 +13,13 @@ word morphology or can be set to be treated as nouns or other parts of speech.
 The tagger also extracts as many nouns and noun phrases as it can, using a set
 of regular expressions.
-### Features
+## Features
 * Assigns POS tags to English text
 * Extract noun phrases from tagged text
 * etc.
-### Synopsis
+## Synopsis
 ```ruby
 require 'engtagger'
@@ -72,7 +72,7 @@ nps = tgr.get_noun_phrases(tagged)
 #=> {"Alice"=>1, "cat"=>1, "fat cat"=>1, "big fat cat"=>1}
 ```
-### Tag Set
+## Tag Set
 The set of POS tags used here is a modified version of the Penn Treebank tagset. Tags with non-letter characters have been redefined to work better in our data structures. Also, the "Determiner" tag (DET) has been changed from 'DT', in order to avoid confusion with the HTML tag, `<DT>`.
@@ -122,26 +122,56 @@ The set of POS tags used here is a modified version of the Penn Treebank tagset.
     LRB     Punctuation, left bracket               (, {, [
     RRB     Punctuation, right bracket              ), }, ]
-### Install
+## Installation
-    gem install engtagger
+**Recommended Approach (without sudo):**
-### Author
+It is recommended to install the `engtagger` gem within your user environment without root privileges. This ensures proper file permissions and avoids potential issues. You can achieve this by using Ruby version managers like `rbenv` or `rvm` to manage your Ruby versions and gemsets.
-of this Ruby library
+To install without `sudo`, simply run:
-* Yoichiro Hasebe (yohasebe [at] gmail.com)
+```bash
+gem install engtagger
+```
+**Alternative Approach (with sudo):**
+If you must use `sudo` for installation, you'll need to adjust file permissions afterward to ensure accessibility.
+1. Install the gem with `sudo`:
+```bash
+sudo gem install engtagger
+```
+2. Grant necessary permissions to your user:
+```bash
+sudo chown -R $(whoami) /Library/Ruby/Gems/2.6.0/gems/engtagger-0.4.1
+```
+**Note:** The path above assumes you are using Ruby version 2.6.0.  If you are using a different version, you will need to modify the path accordingly.  You can find your Ruby version by running `ruby -v`.
+## Troubleshooting
+**Permission Issues:**
+If you encounter "cannot load such file" errors after installation, it might be due to incorrect file permissions. Ensure you've followed the instructions for adjusting permissions if you used `sudo` during installation.
+## Author
+Yoichiro Hasebe (yohasebe [at] gmail.com)
-### Contributors
+## Contributors
 Many thanks to the collaborators listed in the right column of this GitHub page.
-### Acknowledgement
+## Acknowledgement
 This Ruby library is a direct port of Lingua::EN::Tagger available at CPAN.
 The credit for the crucial part of its algorithm/design therefore goes to
 Aaron Coburn, the author of the original Perl version.
-### License
+## License
 This library is distributed under the GPL.  Please see the LICENSE file.
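
The new README hard-codes a gem path for Ruby 2.6.0 in its `chown` step. As a hedged sketch, the directories RubyGems actually uses on a given machine can be listed with the calls below; the helper name `gem_dirs` is hypothetical, while `Gem.default_dir`, `Gem.dir`, and `Gem.user_dir` are standard RubyGems methods.

```ruby
require "rubygems"

# Hypothetical helper: the directories RubyGems may install into.
def gem_dirs
  {
    default: Gem.default_dir, # system-wide default location
    active: Gem.dir,          # location currently in effect (honors GEM_HOME)
    user: Gem.user_dir        # per-user location, used by --user-install
  }
end

gem_dirs.each do |name, dir|
  # File.writable? is false for missing directories as well as read-only ones.
  puts "#{name}: #{dir} (writable: #{File.writable?(dir)})"
end
```

If the active directory is reported as not writable, a per-user install is one way to avoid `sudo`, and with it the later permission fix.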

data/lib/engtagger/porter.rb CHANGED

@@ -1,170 +1,169 @@
-# frozen_string_literal: true
-module Stemmable
-  STEP_2_LIST = {
-    "ational" => "ate", "tional" => "tion", "enci" => "ence", "anci" => "ance",
-    "izer" => "ize", "bli" => "ble",
-    "alli" => "al", "entli" => "ent", "eli" => "e", "ousli" => "ous",
-    "ization" => "ize", "ation" => "ate",
-    "ator" => "ate", "alism" => "al", "iveness" => "ive", "fulness" => "ful",
-    "ousness" => "ous", "aliti" => "al",
-    "iviti" => "ive", "biliti" => "ble", "logi" => "log"
-  }.freeze
-  STEP_3_LIST = {
-    "icate" => "ic", "ative" => "", "alize" => "al", "iciti" => "ic",
-    "ical" => "ic", "ful" => "", "ness" => ""
-  }.freeze
-  SUFFIX_1_REGEXP = /(
-                    ational  |
-                    tional   |
-                    enci     |
-                    anci     |
-                    izer     |
-                    bli      |
-                    alli     |
-                    entli    |
-                    eli      |
-                    ousli    |
-                    ization  |
-                    ation    |
-                    ator     |
-                    alism    |
-                    iveness  |
-                    fulness  |
-                    ousness  |
-                    aliti    |
-                    iviti    |
-                    biliti   |
-                    logi)$/x.freeze
-  SUFFIX_2_REGEXP = /(
-                      al       |
-                      ance     |
-                      ence     |
-                      er       |
-                      ic       |
-                      able     |
-                      ible     |
-                      ant      |
-                      ement    |
-                      ment     |
-                      ent      |
-                      ou       |
-                      ism      |
-                      ate      |
-                      iti      |
-                      ous      |
-                      ive      |
-                      ize)$/x.freeze
-  C = "[^aeiou]" # consonant
-  V = "[aeiouy]" # vowel
-  CC = "#{C}(?>[^aeiouy]*)" # consonant sequence
-  VV = "#{V}(?>[aeiou]*)"   # vowel sequence
-  MGR0 = /^(#{CC})?#{VV}#{CC}/o.freeze # [cc]vvcc... is m>0
-  MEQ1 = /^(#{CC})?#{VV}#{CC}(#{VV})?$/o.freeze # [cc]vvcc[vv] is m=1
-  MGR1 = /^(#{CC})?#{VV}#{CC}#{VV}#{CC}/o.freeze # [cc]vvccvvcc... is m>1
-  VOWEL_IN_STEM = /^(#{CC})?#{V}/o.freeze # vowel in stem
-  # Porter stemmer in Ruby.
-  #
-  # This is the Porter stemming algorithm, ported to Ruby from the
-  # version coded up in Perl.  It's easy to follow against the rules
-  # in the original paper in:
-  #
-  #   Porter, 1980, An algorithm for suffix stripping, Program, Vol. 14,
-  #   no. 3, pp 130-137,
-  #
-  # See also http://www.tartarus.org/~martin/PorterStemmer
-  #
-  # Send comments to raypereda@hotmail.com
-  #
-  def stem_porter
-    # make a copy of the given object and convert it to a string.
-    w = dup.to_str
-    return w if w.length < 3
-    # now map initial y to Y so that the patterns never treat it as vowel
-    w[0] = "Y" if w[0] == "y"
-    # Step 1a
-    case w
-    when /(ss|i)es$/
-      w = $` + $1
-    when /([^s])s$/
-      w = $` + $1
-    end
-    # Step 1b
-    case w
-    when /eed$/
-      w.chop! if $` =~ MGR0
-    when /(ed|ing)$/
-      stem = $`
-      if stem =~ VOWEL_IN_STEM
-        w = stem
-        case w
-        when /(at|bl|iz)$/             then w << "e"
-        when /([^aeiouylsz])\1$/       then w.chop!
-        when /^#{CC}#{V}[^aeiouwxy]$/o then w << "e"
-        end
-      end
-    end
-    if w =~ /y$/
-      stem = $`
-      w = stem + "i" if stem =~ VOWEL_IN_STEM
-    end
-    # Step 2
-    if w =~ SUFFIX_1_REGEXP
-      stem = $`
-      suffix = $1
-      # print "stem= " + stem + "\n" + "suffix=" + suffix + "\n"
-      w = stem + STEP_2_LIST[suffix] if stem =~ MGR0
-    end
-    # Step 3
-    if w =~ /(icate|ative|alize|iciti|ical|ful|ness)$/
-      stem = $`
-      suffix = $1
-      w = stem + STEP_3_LIST[suffix] if stem =~ MGR0
-    end
-    # Step 4
-    if w =~ SUFFIX_2_REGEXP
-      stem = $`
-      w = stem if stem =~ MGR1
-    elsif w =~ /(s|t)(ion)$/
-      stem = $` + $1
-      w = stem if stem =~ MGR1
-    end
-    #  Step 5
-    if w =~ /e$/
-      stem = $`
-      w = stem if (stem =~ MGR1) || (stem =~ MEQ1 && stem !~ /^#{CC}#{V}[^aeiouwxy]$/o)
-    end
-    w.chop! if w =~ /ll$/ && w =~ MGR1
-    # and turn initial Y back to y
-    w[0] = "y" if w[0] == "Y"
-    w
-  end
-  # make the stem_porter the default stem method, just in case we
-  # feel like having multiple stemmers available later.
-  alias stem stem_porter
-end
-# Add stem method to all Strings
-class String
-  include Stemmable
-end
+# frozen_string_literal: true
+module Stemmable
+  STEP_2_LIST = {
+    "ational" => "ate", "tional" => "tion", "enci" => "ence", "anci" => "ance",
+    "izer" => "ize", "bli" => "ble",
+    "alli" => "al", "entli" => "ent", "eli" => "e", "ousli" => "ous",
+    "ization" => "ize", "ation" => "ate",
+    "ator" => "ate", "alism" => "al", "iveness" => "ive", "fulness" => "ful",
+    "ousness" => "ous", "aliti" => "al",
+    "iviti" => "ive", "biliti" => "ble", "logi" => "log"
+  }.freeze
+  STEP_3_LIST = {
+    "icate" => "ic", "ative" => "", "alize" => "al", "iciti" => "ic",
+    "ical" => "ic", "ful" => "", "ness" => ""
+  }.freeze
+  SUFFIX_1_REGEXP = /(
+                    ational  |
+                    tional   |
+                    enci     |
+                    anci     |
+                    izer     |
+                    bli      |
+                    alli     |
+                    entli    |
+                    eli      |
+                    ousli    |
+                    ization  |
+                    ation    |
+                    ator     |
+                    alism    |
+                    iveness  |
+                    fulness  |
+                    ousness  |
+                    aliti    |
+                    iviti    |
+                    biliti   |
+                    logi)$/x.freeze
+  SUFFIX_2_REGEXP = /(
+                      al       |
+                      ance     |
+                      ence     |
+                      er       |
+                      ic       |
+                      able     |
+                      ible     |
+                      ant      |
+                      ement    |
+                      ment     |
+                      ent      |
+                      ou       |
+                      ism      |
+                      ate      |
+                      iti      |
+                      ous      |
+                      ive      |
+                      ize)$/x.freeze
+  C = "[^aeiou]" # consonant
+  V = "[aeiouy]" # vowel
+  CC = "#{C}(?>[^aeiouy]*)" # consonant sequence
+  VV = "#{V}(?>[aeiou]*)"   # vowel sequence
+  MGR0 = /^(#{CC})?#{VV}#{CC}/o.freeze # [cc]vvcc... is m>0
+  MEQ1 = /^(#{CC})?#{VV}#{CC}(#{VV})?$/o.freeze # [cc]vvcc[vv] is m=1
+  MGR1 = /^(#{CC})?#{VV}#{CC}#{VV}#{CC}/o.freeze # [cc]vvccvvcc... is m>1
+  VOWEL_IN_STEM = /^(#{CC})?#{V}/o.freeze # vowel in stem
+  # Porter stemmer in Ruby.
+  #
+  # This is the Porter stemming algorithm, ported to Ruby from the
+  # version coded up in Perl.  It's easy to follow against the rules
+  # in the original paper in:
+  #
+  #   Porter, 1980, An algorithm for suffix stripping, Program, Vol. 14,
+  #   no. 3, pp 130-137,
+  #
+  # See also http://www.tartarus.org/~martin/PorterStemmer
+  #
+  # Send comments to raypereda@hotmail.com
+  #
+  def stem_porter
+    # make a copy of the given object and convert it to a string.
+    w = dup.to_str
+    return w if w.length < 3
+    # now map initial y to Y so that the patterns never treat it as vowel
+    w[0] = "Y" if w[0] == "y"
+    # Step 1a
+    case w
+    when /(ss|i)es$/
+      w = $` + $1
+    when /([^s])s$/
+      w = $` + $1
+    end
+    # Step 1b
+    case w
+    when /eed$/
+      w.chop! if $` =~ MGR0
+    when /(ed|ing)$/
+      stem = $`
+      if stem =~ VOWEL_IN_STEM
+        w = stem
+        case w
+        when /(at|bl|iz)$/             then w << "e"
+        when /([^aeiouylsz])\1$/       then w.chop!
+        when /^#{CC}#{V}[^aeiouwxy]$/o then w << "e"
+        end
+      end
+    end
+    if w =~ /y$/
+      stem = $`
+      w = stem + "i" if stem =~ VOWEL_IN_STEM
+    end
+    # Step 2
+    if w =~ SUFFIX_1_REGEXP
+      stem = $`
+      suffix = $1
+      # print "stem= " + stem + "\n" + "suffix=" + suffix + "\n"
+      w = stem + STEP_2_LIST[suffix] if stem =~ MGR0
+    end
+    # Step 3
+    if w =~ /(icate|ative|alize|iciti|ical|ful|ness)$/
+      stem = $`
+      suffix = $1
+      w = stem + STEP_3_LIST[suffix] if stem =~ MGR0
+    end
+    # Step 4
+    if w =~ SUFFIX_2_REGEXP
+      stem = $`
+      w = stem if stem =~ MGR1
+    elsif w =~ /(s|t)(ion)$/
+      stem = $` + $1
+      w = stem if stem =~ MGR1
+    end
+    #  Step 5
+    if w =~ /e$/
+      stem = $`
+      w = stem if (stem =~ MGR1) || (stem =~ MEQ1 && stem !~ /^#{CC}#{V}[^aeiouwxy]$/o)
+    end
+    w.chop! if w =~ /ll$/ && w =~ MGR1
+    # and turn initial Y back to y
+    w[0] = "y" if w[0] == "Y"
+    w
+  end
+  # make the stem_porter the default stem method, just in case we
+  # feel like having multiple stemmers available later.
+  alias stem stem_porter
+end
+# Add stem method to all Strings
+class String
+  include Stemmable
+end
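
As rendered here, the removed and added sides of `porter.rb` read the same, which is consistent with a whitespace-only change such as a switch in line endings. The diff itself does not state this, so treat it as an assumption. A minimal sketch for testing that hypothesis on two copies of a file follows; both helper names are hypothetical.

```ruby
# Hypothetical helper: convert CRLF and bare CR line endings to LF.
def normalize_eol(text)
  text.gsub(/\r\n?/, "\n")
end

# True when the two texts differ, but only in their line endings.
def differs_only_in_line_endings?(old_text, new_text)
  old_text != new_text && normalize_eol(old_text) == normalize_eol(new_text)
end

crlf_copy = "module Stemmable\r\nend\r\n"
lf_copy   = "module Stemmable\nend\n"
puts differs_only_in_line_endings?(crlf_copy, lf_copy)
```

To compare real files, read both versions with `File.binread` so that Ruby does not translate newlines before the comparison.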

data/lib/engtagger/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 class EngTagger
-  VERSION = "0.4.0"
+  VERSION = "0.4.1"
 end

data/lib/engtagger.rb CHANGED

@@ -4,7 +4,7 @@
 require "rubygems"
 require "lru_redux"
-require_relative "engtagger/porter"
+require_relative "./engtagger/porter"
 module BoundedSpaceMemoizable
   def memoize(method, max_cache_size = 100_000)
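
The `require_relative` change above adds a leading `./` to the path. `require_relative` resolves its argument against the directory of the requiring file, so both spellings should name the same file. The sketch below models that resolution with `File.expand_path`, which is an approximation of what `require_relative` does; the helper name and the `/tmp/lib` directory are assumptions for illustration.

```ruby
# Hypothetical helper: do two relative paths resolve to the same absolute
# path when expanded against the same base directory?
def same_require_target?(base_dir, path_a, path_b)
  File.expand_path(path_a, base_dir) == File.expand_path(path_b, base_dir)
end

# "/tmp/lib" stands in for the directory that contains engtagger.rb.
puts same_require_target?("/tmp/lib", "engtagger/porter", "./engtagger/porter")
```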

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: engtagger
 version: !ruby/object:Gem::Version
-  version: 0.4.0
+  version: 0.4.1
 platform: ruby
 authors:
 - Yoichiro Hasebe
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2023-01-21 00:00:00.000000000 Z
+date: 2024-04-30 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: lru_redux
@@ -69,7 +69,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.4.2
+rubygems_version: 3.4.12
 signing_key:
 specification_version: 4
 summary: A probability based, corpus-trained English POS tagger