RubyGems - ruby-spacy - Versions diffs - 0.1.1 → 0.1.2 - Mend

ruby-spacy 0.1.1 → 0.1.2

This diff shows the changes between two publicly released versions of the package, as they appear in the public registry. It is provided for informational purposes only.

Files changed (10)

checksums.yaml +4 -4
data/Gemfile.lock +2 -1
data/README.md +82 -40
data/examples/get_started/morphology.rb +45 -0
data/examples/get_started/pos_tags_and_dependencies.rb +17 -17
data/examples/japanese/pos_tagging.rb +20 -20
data/lib/ruby-spacy.rb +20 -0
data/lib/ruby-spacy/version.rb +1 -1
metadata +3 -3
data/examples/linguistic_features/morphology.rb +0 -17

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 2cdb24ba1156e16b0cd14809b4a4ea0fe832257eed4060e6eba0d55314849151
-  data.tar.gz: ba2d9c1957f1b650cf0a8902db6b3e901762ba94470c0443debf8b68a8b5f8c0
+  metadata.gz: 9add9d3b065bbf5064652cb115f824221d929a20478d182782df5db564cc8f45
+  data.tar.gz: f07d502f79883a452e7f250f0fe784425511a0de4f8a43db0b29ca03801bd755
 SHA512:
-  metadata.gz: 68f4acdf7375c8bb4107681f3425a6b16ae3544c8c46c6f80cdb37643fe5c2fed2b6a2cac738325d0b3a5f9605495ac9230fa11090a167d6e8efc9d59066d88b
-  data.tar.gz: 8eb877bea7a8b5d8f699cbf6637797b8d5bc4e6c6dcea228a5db0c7f56fa7add1c44b6e587ee212837124015ed3e0512fdf6f9015cbf7090f6d87bd7d19f4842
+  metadata.gz: 373c795a148034f4191cfaf130a23f464dc2b43927bf6aa3165999c78797365ce2f976021ea8b9ab1dd083736e5f9a1da51a5ccf0156d00ec39dac9fd19bde7c
+  data.tar.gz: e370e503c23d15a0a44be84bf578775b0a4acc5557468c7fc9468cde44e0e084018be8dc17c3e7c21d9efdaf229611ca234614fcd2e811272051c7c2922b408d
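The four digests above change whenever the packaged `metadata.gz` and `data.tar.gz` archives change. As a minimal sketch of how values of this shape can be recomputed with Ruby's standard `digest` library (the helper name `digests_for` and the sample input are illustrative assumptions, not part of the gem or of RubyGems):

```ruby
require "digest"

# Hypothetical helper: returns the SHA256 and SHA512 hex digests of a byte string.
# In practice the input would be the bytes of metadata.gz or data.tar.gz
# taken from the .gem archive.
def digests_for(bytes)
  {
    "SHA256" => Digest::SHA256.hexdigest(bytes),
    "SHA512" => Digest::SHA512.hexdigest(bytes)
  }
end

sample = digests_for("example bytes") # placeholder input, not real gem content
puts sample["SHA256"].length # number of hex characters in a SHA256 digest
puts sample["SHA512"].length # number of hex characters in a SHA512 digest
```

Comparing the recomputed strings with the entries in `checksums.yaml` is one way to confirm that a downloaded archive matches the published version.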

data/Gemfile.lock CHANGED

@@ -1,7 +1,7 @@
 PATH
   remote: .
   specs:
-    ruby-spacy (0.1.0)
+    ruby-spacy (0.1.2)
       numpy (~> 0.4.0)
       pycall (~> 1.4.0)
       terminal-table (~> 3.0.1)
@@ -23,6 +23,7 @@ GEM
 PLATFORMS
   arm64-darwin-20
+  x86_64-darwin-20
 DEPENDENCIES
   github-markup

data/README.md CHANGED

@@ -111,12 +111,10 @@ Output:
 |:-----:|:--:|:-------:|:--:|:------:|:----:|:-------:|:---:|:-:|:--:|:-------:|
 | Apple | is | looking | at | buying | U.K. | startup | for | $ | 1  | billion |
-### Part-of-speech tagging
+### Part-of-speech and dependency
 → [spaCy: Part-of-speech tags and dependencies](https://spacy.io/usage/spacy-101#annotations-pos-deps)
-→ [POS and morphology tags](https://github.com/explosion/spaCy/blob/master/spacy/glossary.py)
 Ruby code:
 ```ruby
@@ -126,73 +124,117 @@ require "terminal-table"
 nlp = Spacy::Language.new("en_core_web_sm")
 doc = nlp.read("Apple is looking at buying U.K. startup for $1 billion")
+headings = ["text", "lemma", "pos", "tag", "dep"]
 rows = []
 doc.each do |token|
-  rows << [token.text, token.lemma_, token.pos_, token.tag_, token.dep_, token.shape_, token.is_alpha, token.is_stop]
+  rows << [token.text, token.lemma_, token.pos_, token.tag_, token.dep_]
 end
-headings = ["text", "lemma", "pos", "tag", "dep", "shape", "is_alpha", "is_stop"]
 table = Terminal::Table.new rows: rows, headings: headings
 puts table
 ```
 Output:
-| text    | lemma   | pos   | tag | dep      | shape | is_alpha | is_stop |
-|:--------|:--------|:------|:----|:---------|:------|:---------|:--------|
-| Apple   | Apple   | PROPN | NNP | nsubj    | Xxxxx | true     | false   |
-| is      | be      | AUX   | VBZ | aux      | xx    | true     | true    |
-| looking | look    | VERB  | VBG | ROOT     | xxxx  | true     | false   |
-| at      | at      | ADP   | IN  | prep     | xx    | true     | true    |
-| buying  | buy     | VERB  | VBG | pcomp    | xxxx  | true     | false   |
-| U.K.    | U.K.    | PROPN | NNP | dobj     | X.X.  | false    | false   |
-| startup | startup | NOUN  | NN  | advcl    | xxxx  | true     | false   |
-| for     | for     | ADP   | IN  | prep     | xxx   | true     | true    |
-| $       | $       | SYM   | $   | quantmod | $     | false    | false   |
-| 1       | 1       | NUM   | CD  | compound | d     | false    | false   |
-| billion | billion | NUM   | CD  | pobj     | xxxx  | true     | false   |
-### Part-of-speech tagging (Japanese)
+| text    | lemma   | pos   | tag | dep      |
+|:--------|:--------|:------|:----|:---------|
+| Apple   | Apple   | PROPN | NNP | nsubj    |
+| is      | be      | AUX   | VBZ | aux      |
+| looking | look    | VERB  | VBG | ROOT     |
+| at      | at      | ADP   | IN  | prep     |
+| buying  | buy     | VERB  | VBG | pcomp    |
+| U.K.    | U.K.    | PROPN | NNP | dobj     |
+| startup | startup | NOUN  | NN  | advcl    |
+| for     | for     | ADP   | IN  | prep     |
+| $       | $       | SYM   | $   | quantmod |
+| 1       | 1       | NUM   | CD  | compound |
+| billion | billion | NUM   | CD  | pobj     |
+### Part-of-speech and dependency (Japanese)
 Ruby code:
 ```ruby
-require( "ruby-spacy")
+require "ruby-spacy"
 require "terminal-table"
 nlp = Spacy::Language.new("ja_core_news_lg")
-doc = nlp.read("任天堂は1983年にファミリー・コンピュータを14,800円で発売した。")
+doc = nlp.read("任天堂は1983年にファミコンを14,800円で発売した。")
+headings = ["text", "lemma", "pos", "tag", "dep"]
 rows = []
 doc.each do |token|
-  rows << [token.text, token.lemma_, token.pos_, token.tag_, token.dep_, token.shape_, token.is_alpha, token.is_stop]
+  rows << [token.text, token.lemma_, token.pos_, token.tag_, token.dep_]
 end
-headings = ["text", "lemma", "pos", "tag", "dep", "shape", "is_alpha", "is_stop"]
 table = Terminal::Table.new rows: rows, headings: headings
 puts table
 ```
 Output:
-| text       | lemma      | pos   | tag                      | dep    | shape  | is_alpha | is_stop |
-|:-----------|:-----------|:------|:-------------------------|:-------|:-------|:---------|:--------|
-| 任天堂     | 任天堂     | PROPN | 名詞-固有名詞-一般       | nsubj  | xxx    | true     | false   |
-| は         | は         | ADP   | 助詞-係助詞              | case   | x      | true     | true    |
-| 1983       | 1983       | NUM   | 名詞-数詞                | nummod | dddd   | false    | false   |
-| 年         | 年         | NOUN  | 名詞-普通名詞-助数詞可能 | obl    | x      | true     | false   |
-| に         | に         | ADP   | 助詞-格助詞              | case   | x      | true     | true    |
-| ファミコン | ファミコン | NOUN  | 名詞-普通名詞-一般       | obj    | xxxx   | true     | false   |
-| を         | を         | ADP   | 助詞-格助詞              | case   | x      | true     | true    |
-| 14,800     | 14,800     | NUM   | 名詞-数詞                | fixed  | dd,ddd | false    | false   |
-| 円         | 円         | NOUN  | 名詞-普通名詞-助数詞可能 | obl    | x      | true     | false   |
-| で         | で         | ADP   | 助詞-格助詞              | case   | x      | true     | true    |
-| 発売       | 発売       | VERB  | 名詞-普通名詞-サ変可能   | ROOT   | xx     | true     | false   |
-| し         | する       | AUX   | 動詞-非自立可能          | aux    | x      | true     | true    |
-| た         | た         | AUX   | 助動詞                   | aux    | x      | true     | true    |
-| 。         | 。         | PUNCT | 補助記号-句点            | punct  | 。     | false    | false   |
+| text       | lemma      | pos   | tag                      | dep    |
+|:-----------|:-----------|:------|:-------------------------|:-------|
+| 任天堂     | 任天堂     | PROPN | 名詞-固有名詞-一般       | nsubj  |
+| は         | は         | ADP   | 助詞-係助詞              | case   |
+| 1983       | 1983       | NUM   | 名詞-数詞                | nummod |
+| 年         | 年         | NOUN  | 名詞-普通名詞-助数詞可能 | obl    |
+| に         | に         | ADP   | 助詞-格助詞              | case   |
+| ファミコン | ファミコン | NOUN  | 名詞-普通名詞-一般       | obj    |
+| を         | を         | ADP   | 助詞-格助詞              | case   |
+| 14,800     | 14,800     | NUM   | 名詞-数詞                | fixed  |
+| 円         | 円         | NOUN  | 名詞-普通名詞-助数詞可能 | obl    |
+| で         | で         | ADP   | 助詞-格助詞              | case   |
+| 発売       | 発売       | VERB  | 名詞-普通名詞-サ変可能   | ROOT   |
+| し         | する       | AUX   | 動詞-非自立可能          | aux    |
+| た         | た         | AUX   | 助動詞                   | aux    |
+| 。         | 。         | PUNCT | 補助記号-句点            | punct  |
+### Morphology
+→ [POS and morphology tags](https://github.com/explosion/spaCy/blob/master/spacy/glossary.py)
+Ruby code:
+```ruby
+require "ruby-spacy"
+require "terminal-table"
+nlp = Spacy::Language.new("en_core_web_sm")
+doc = nlp.read("Apple is looking at buying U.K. startup for $1 billion")
+headings = ["text", "shape", "is_alpha", "is_stop", "morphology"]
+rows = []
+doc.each do |token|
+  morph = token.morphology.map do |k, v|
+    "#{k} = #{v}"
+  end.join("\n")
+  rows << [token.text, token.shape_, token.is_alpha, token.is_stop, morph]
+end
+table = Terminal::Table.new rows: rows, headings: headings
+puts table
+```
+Output:
+| text    | shape | is_alpha | is_stop | morphology                                                                          |
+|:--------|:------|:---------|:--------|:------------------------------------------------------------------------------------|
+| Apple   | Xxxxx | true     | false   | NounType = Prop<br />Number = Sing                                                  |
+| is      | xx    | true     | true    | Mood = Ind<br />Number = Sing<br />Person = 3<br />Tense = Pres<br />VerbForm = Fin |
+| looking | xxxx  | true     | false   | Aspect = Prog<br />Tense = Pres<br />VerbForm = Part                                |
+| at      | xx    | true     | true    |                                                                                     |
+| buying  | xxxx  | true     | false   | Aspect = Prog<br />Tense = Pres<br />VerbForm = Part                                |
+| U.K.    | X.X.  | false    | false   | NounType = Prop<br />Number = Sing                                                  |
+| startup | xxxx  | true     | false   | Number = Sing                                                                       |
+| for     | xxx   | true     | true    |                                                                                     |
+| $       | $     | false    | false   |                                                                                     |
+| 1       | d     | false    | false   | NumType = Card                                                                      |
+| billion | xxxx  | true     | false   | NumType = Card                                                                      |
 ### Visualizing dependency
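Note that the README code in this hunk joins the feature pairs with `"\n"`, while the README output table shows them separated by `<br />`; the new example file keeps the `<br />` variant as a commented-out line. A minimal sketch of the two renderings, using a plain Ruby hash as a stand-in for the value returned by `token.morphology` (the helper `format_morph` and the literal hash are assumptions for illustration):

```ruby
# Stand-in for the features of the token "is" in the table above.
features = {
  "Mood" => "Ind",
  "Number" => "Sing",
  "Person" => "3",
  "Tense" => "Pres",
  "VerbForm" => "Fin"
}

# Hypothetical helper: renders each pair as "key = value" and joins the
# pairs with the given separator.
def format_morph(features, separator)
  features.map { |k, v| "#{k} = #{v}" }.join(separator)
end

puts format_morph(features, "\n")     # one feature per line, suits terminal-table cells
puts format_morph(features, "<br />") # single line, suits a Markdown table cell
```

An empty hash yields an empty string with either separator, which corresponds to the blank cells for tokens such as "at" and "for".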

data/examples/get_started/morphology.rb ADDED

@@ -0,0 +1,45 @@
+require "ruby-spacy"
+require "terminal-table"
+nlp = Spacy::Language.new("en_core_web_sm")
+doc = nlp.read("Apple is looking at buying U.K. startup for $1 billion")
+headings = ["text", "shape", "is_alpha", "is_stop", "morphology"]
+rows = []
+doc.each do |token|
+  morph = token.morphology.map do |k, v|
+    "#{k} = #{v}"
+  end.join("\n")
+  # end.join("<br />")
+  rows << [token.text, token.shape_, token.is_alpha, token.is_stop, morph]
+end
+table = Terminal::Table.new rows: rows, headings: headings
+puts table
+# +---------+-------+----------+---------+-----------------+
+# | text    | shape | is_alpha | is_stop | morphology      |
+# +---------+-------+----------+---------+-----------------+
+# | Apple   | Xxxxx | true     | false   | NounType = Prop |
+# |         |       |          |         | Number = Sing   |
+# | is      | xx    | true     | true    | Mood = Ind      |
+# |         |       |          |         | Number = Sing   |
+# |         |       |          |         | Person = 3      |
+# |         |       |          |         | Tense = Pres    |
+# |         |       |          |         | VerbForm = Fin  |
+# | looking | xxxx  | true     | false   | Aspect = Prog   |
+# |         |       |          |         | Tense = Pres    |
+# |         |       |          |         | VerbForm = Part |
+# | at      | xx    | true     | true    |                 |
+# | buying  | xxxx  | true     | false   | Aspect = Prog   |
+# |         |       |          |         | Tense = Pres    |
+# |         |       |          |         | VerbForm = Part |
+# | U.K.    | X.X.  | false    | false   | NounType = Prop |
+# |         |       |          |         | Number = Sing   |
+# | startup | xxxx  | true     | false   | Number = Sing   |
+# | for     | xxx   | true     | true    |                 |
+# | $       | $     | false    | false   |                 |
+# | 1       | d     | false    | false   | NumType = Card  |
+# | billion | xxxx  | true     | false   | NumType = Card  |
+# +---------+-------+----------+---------+-----------------+

data/examples/get_started/pos_tags_and_dependencies.rb CHANGED

@@ -4,28 +4,28 @@ require "terminal-table"
 nlp = Spacy::Language.new("en_core_web_sm")
 doc = nlp.read("Apple is looking at buying U.K. startup for $1 billion")
-headings = ["text", "lemma", "pos", "tag", "dep", "shape", "is_alpha", "is_stop"]
+headings = ["text", "lemma", "pos", "tag", "dep"]
 rows = []
 doc.each do |token|
-  rows << [token.text, token.lemma_, token.pos_, token.tag_, token.dep_, token.shape_, token.is_alpha, token.is_stop]
+  rows << [token.text, token.lemma_, token.pos_, token.tag_, token.dep_]
 end
 table = Terminal::Table.new rows: rows, headings: headings
 puts table
-# +---------+---------+-------+-----+----------+-------+----------+---------+
-# | text    | lemma   | pos   | tag | dep      | shape | is_alpha | is_stop |
-# +---------+---------+-------+-----+----------+-------+----------+---------+
-# | Apple   | Apple   | PROPN | NNP | nsubj    | Xxxxx | true     | false   |
-# | is      | be      | AUX   | VBZ | aux      | xx    | true     | true    |
-# | looking | look    | VERB  | VBG | ROOT     | xxxx  | true     | false   |
-# | at      | at      | ADP   | IN  | prep     | xx    | true     | true    |
-# | buying  | buy     | VERB  | VBG | pcomp    | xxxx  | true     | false   |
-# | U.K.    | U.K.    | PROPN | NNP | dobj     | X.X.  | false    | false   |
-# | startup | startup | NOUN  | NN  | advcl    | xxxx  | true     | false   |
-# | for     | for     | ADP   | IN  | prep     | xxx   | true     | true    |
-# | $       | $       | SYM   | $   | quantmod | $     | false    | false   |
-# | 1       | 1       | NUM   | CD  | compound | d     | false    | false   |
-# | billion | billion | NUM   | CD  | pobj     | xxxx  | true     | false   |
-# +---------+---------+-------+-----+----------+-------+----------+---------+
+# +---------+---------+-------+-----+----------+
+# | text    | lemma   | pos   | tag | dep      |
+# +---------+---------+-------+-----+----------+
+# | Apple   | Apple   | PROPN | NNP | nsubj    |
+# | is      | be      | AUX   | VBZ | aux      |
+# | looking | look    | VERB  | VBG | ROOT     |
+# | at      | at      | ADP   | IN  | prep     |
+# | buying  | buy     | VERB  | VBG | pcomp    |
+# | U.K.    | U.K.    | PROPN | NNP | dobj     |
+# | startup | startup | NOUN  | NN  | advcl    |
+# | for     | for     | ADP   | IN  | prep     |
+# | $       | $       | SYM   | $   | quantmod |
+# | 1       | 1       | NUM   | CD  | compound |
+# | billion | billion | NUM   | CD  | pobj     |
+# +---------+---------+-------+-----+----------+

data/examples/japanese/pos_tagging.rb CHANGED

@@ -4,31 +4,31 @@ require "terminal-table"
 nlp = Spacy::Language.new("ja_core_news_lg")
 doc = nlp.read("任天堂は1983年にファミコンを14,800円で発売した。")
-headings = ["text", "lemma", "pos", "tag", "dep", "shape", "is_alpha", "is_stop"]
+headings = ["text", "lemma", "pos", "tag", "dep"]
 rows = []
 doc.each do |token|
-  rows << [token.text, token.lemma_, token.pos_, token.tag_, token.dep_, token.shape_, token.is_alpha, token.is_stop]
+  rows << [token.text, token.lemma_, token.pos_, token.tag_, token.dep_]
 end
 table = Terminal::Table.new rows: rows, headings: headings
 puts table
-# +------------+------------+-------+--------------------------+--------+--------+----------+---------+
-# | text       | lemma      | pos   | tag                      | dep    | shape  | is_alpha | is_stop |
-# +------------+------------+-------+--------------------------+--------+--------+----------+---------+
-# | 任天堂     | 任天堂     | PROPN | 名詞-固有名詞-一般       | nsubj  | xxx    | true     | false   |
-# | は         | は         | ADP   | 助詞-係助詞              | case   | x      | true     | true    |
-# | 1983       | 1983       | NUM   | 名詞-数詞                | nummod | dddd   | false    | false   |
-# | 年         | 年         | NOUN  | 名詞-普通名詞-助数詞可能 | obl    | x      | true     | false   |
-# | に         | に         | ADP   | 助詞-格助詞              | case   | x      | true     | true    |
-# | ファミコン | ファミコン | NOUN  | 名詞-普通名詞-一般       | obj    | xxxx   | true     | false   |
-# | を         | を         | ADP   | 助詞-格助詞              | case   | x      | true     | true    |
-# | 14,800     | 14,800     | NUM   | 名詞-数詞                | fixed  | dd,ddd | false    | false   |
-# | 円         | 円         | NOUN  | 名詞-普通名詞-助数詞可能 | obl    | x      | true     | false   |
-# | で         | で         | ADP   | 助詞-格助詞              | case   | x      | true     | true    |
-# | 発売       | 発売       | VERB  | 名詞-普通名詞-サ変可能   | ROOT   | xx     | true     | false   |
-# | し         | する       | AUX   | 動詞-非自立可能          | aux    | x      | true     | true    |
-# | た         | た         | AUX   | 助動詞                   | aux    | x      | true     | true    |
-# | 。         | 。         | PUNCT | 補助記号-句点            | punct  | 。     | false    | false   |
-# +------------+------------+-------+--------------------------+--------+--------+----------+---------+
+# +------------+------------+-------+--------------------------+--------+
+# | text       | lemma      | pos   | tag                      | dep    |
+# +------------+------------+-------+--------------------------+--------+
+# | 任天堂     | 任天堂     | PROPN | 名詞-固有名詞-一般       | nsubj  |
+# | は         | は         | ADP   | 助詞-係助詞              | case   |
+# | 1983       | 1983       | NUM   | 名詞-数詞                | nummod |
+# | 年         | 年         | NOUN  | 名詞-普通名詞-助数詞可能 | obl    |
+# | に         | に         | ADP   | 助詞-格助詞              | case   |
+# | ファミコン | ファミコン | NOUN  | 名詞-普通名詞-一般       | obj    |
+# | を         | を         | ADP   | 助詞-格助詞              | case   |
+# | 14,800     | 14,800     | NUM   | 名詞-数詞                | fixed  |
+# | 円         | 円         | NOUN  | 名詞-普通名詞-助数詞可能 | obl    |
+# | で         | で         | ADP   | 助詞-格助詞              | case   |
+# | 発売       | 発売       | VERB  | 名詞-普通名詞-サ変可能   | ROOT   |
+# | し         | する       | AUX   | 動詞-非自立可能          | aux    |
+# | た         | た         | AUX   | 助動詞                   | aux    |
+# | 。         | 。         | PUNCT | 補助記号-句点            | punct  |
+# +------------+------------+-------+--------------------------+--------+

data/lib/ruby-spacy.rb CHANGED

@@ -252,6 +252,26 @@ module Spacy
       @text
     end
+    # Returns a hash or string of morphological information
+    # @param dict [Boolean] if true, a hash will be returned instead of a string
+    # @return [Hash, String]
+    def morphology(hash = true)
+      if @py_token.has_morph
+        morph_analysis = @py_token.morph
+        if hash
+          return morph_analysis.to_dict
+        else
+          return morph_analysis.to_s
+        end
+      else
+        if hash
+          results = {}
+        else
+          return ""
+        end
+      end
+    end
     # Methods defined in Python but not wrapped in ruby-spacy can be called by this dynamic method handling mechanism.
     def method_missing(name, *args)
       @py_token.send(name, *args)
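The added `morphology` method has four outcomes: a hash or a string when the token has morphological features, and an empty hash or an empty string when it does not. Its doc comment names the parameter `dict`, while the signature uses `hash`. The sketch below reproduces the same branching outside the gem so it can be run without spaCy; `FakeMorph`, `FakeToken`, `morphology_of`, and the `as_hash` parameter name are stand-ins invented for illustration, and the `key=value|key=value` string form is modeled on the output shown in the deleted `examples/linguistic_features/morphology.rb`.

```ruby
# Stand-in for the Python morph object (assumption for illustration only).
FakeMorph = Struct.new(:features) do
  def to_dict
    features
  end

  def to_s
    features.map { |k, v| "#{k}=#{v}" }.join("|")
  end
end

# Stand-in for the wrapped Python token (assumption for illustration only).
FakeToken = Struct.new(:has_morph, :morph)

# Same four outcomes as the added method, written with an early return.
def morphology_of(py_token, as_hash = true)
  unless py_token.has_morph
    return as_hash ? {} : ""
  end

  as_hash ? py_token.morph.to_dict : py_token.morph.to_s
end

tagged = FakeToken.new(true, FakeMorph.new({ "NounType" => "Prop", "Number" => "Sing" }))
untagged = FakeToken.new(false, nil)

p morphology_of(tagged)          # hash form
p morphology_of(tagged, false)   # string form
p morphology_of(untagged)        # empty hash
p morphology_of(untagged, false) # empty string
```

Returning an empty hash rather than `nil` for tokens without features is what lets the README example call `.map` on the result unconditionally.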

data/lib/ruby-spacy/version.rb CHANGED

@@ -2,5 +2,5 @@
 module Spacy
   # The version number of the module
-  VERSION = "0.1.1"
+  VERSION = "0.1.2"
 end

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: ruby-spacy
 version: !ruby/object:Gem::Version
-  version: 0.1.1
+  version: 0.1.2
 platform: ruby
 authors:
 - Yoichiro Hasebe
@@ -75,6 +75,7 @@ files:
 - bin/setup
 - examples/get_started/lexeme.rb
 - examples/get_started/linguistic_annotations.rb
+- examples/get_started/morphology.rb
 - examples/get_started/most_similar.rb
 - examples/get_started/named_entities.rb
 - examples/get_started/outputs/test_dep.svg
@@ -111,7 +112,6 @@ files:
 - examples/linguistic_features/iterating_children.rb
 - examples/linguistic_features/iterating_lefts_and_rights.rb
 - examples/linguistic_features/lemmatization.rb
-- examples/linguistic_features/morphology.rb
 - examples/linguistic_features/named_entity_recognition.rb
 - examples/linguistic_features/navigating_parse_tree.rb
 - examples/linguistic_features/noun_chunks.rb
@@ -149,7 +149,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.2.3
+rubygems_version: 3.2.11
 signing_key:
 specification_version: 4
 summary: A wrapper module for using spaCy natural language processing library from

data/examples/linguistic_features/morphology.rb DELETED

@@ -1,17 +0,0 @@
-require "ruby-spacy"
-require "terminal-table"
-nlp = Spacy::Language.new("en_core_web_sm")
-puts "Pipeline: " + nlp.pipe_names.to_s
-doc = nlp.read("I was reading the paper.")
-token = doc[0]
-puts "Morph features of the first word: " + token.morph.to_s
-puts "PronType of the word: " + token.morph.get("PronType").to_s
-# Pipeline: ["tok2vec", "tagger", "parser", "ner", "attribute_ruler", "lemmatizer"]
-# Morph features of the first word: Case=Nom|Number=Sing|Person=1|PronType=Prs
-# PronType of the word: ['Prs']