RubyGems: textmood 0.0.5 → 0.0.6

Files changed (5)

data/README.md CHANGED

@@ -1,14 +1,19 @@
 ## TextMood - Simple sentiment analyzer
-*TextMood* is a simple sentiment analyzer, provided as a Ruby gem with a command-line
-tool for simple interoperability with other processes. It takes text as input and
-returns a sentiment score. Above 0 is typically considered positive, below is
-considered negative.
+*TextMood* is a simple but powerful sentiment analyzer, provided as a Ruby gem with
+a command-line tool for simple interoperability with other processes. It takes text
+as input and returns a sentiment score.
-The goal is to have a robust and simple tool that comes with baseline sentiment files
-for many languages.
+The sentiment analysis is relatively simple, and works by splitting the text into
+tokens and comparing each token to a pre-selected sentiment score for that token.
+The combined score for all tokens is then returned.
+However, TextMood also supports doing multiple passes over the text, splitting
+it into tokens of N words (N-grams) for each pass. By adding multi-word tokens to
+the sentiment file and using this feature, you can achieve much greater accuracy
+than with just single-word analysis.
 ### Installation
-The easiest way to get the latest stable version is to use gem:
+The easiest way to get the latest stable version is to install the gem:
     gem install textmood
@@ -17,27 +22,50 @@ If you’d like to get the bleeding-edge version:
     git clone https://github.com/stiang/textmood
 ### Usage
-TextMood can be used as a ruby library or as a standalone CLI tool.
+TextMood can be used as a Ruby library or as a standalone CLI tool.
 #### Ruby library
-You can use textmood in a ruby program like this:
+You can use it in a Ruby program like this:
 ```ruby
 require "textmood"
 # The :lang parameter makes TextMood use one of the bundled language sentiment files
-scorer = TextMood.new(lang: "en_US")
-score = scorer.score_text("some text")
+tm = TextMood.new(lang: "en_US")
+score = tm.analyze("some text")
 #=> '1.121'
 # The :files parameter makes TextMood ignore the bundled sentiment files and use the
 # specified files instead. You can specify as many files as you want.
-scorer = TextMood.new(files: ["en_US-mod1.txt", "emoticons.txt"])
+tm = TextMood.new(files: ["en_US-mod1.txt", "emoticons.txt"])
+# Using :normalize_output, you can make TextMood return a normalized value:
+# 1 for positive, 0 for neutral and -1 for negative
+tm = TextMood.new(lang: "en_US", normalize_output: true)
+score = tm.analyze("some text")
+#=> '1'
+# :normalize_score will try to normalize the score to an integer between +/- 100,
+# based on how many tokens were scored, which can be useful when trying to compare
+# scores for texts of different length
+tm = TextMood.new(lang: "en_US", normalize_score: true)
+score = tm.analyze("some text")
+#=> '14'
+# :min_threshold and :max_threshold lets you customize the way :normalize_output
+# treats different values. The options below will make all scores below 1 negative,
+# 1-2 will be neutral, and above 2 will be positive.
+tm = TextMood.new(lang: "en_US",
+                      normalize_output: true,
+                      min_threshold: 1,
+                      max_threshold: 2)
+score = tm.analyze("some text")
+#=> '0'
 # TextMood will by default make one pass over the text, checking every word, but it
 # supports doing several passes for any range of word N-grams. Both the start and end
 # N-gram can be specified using the :start_ngram and :end_ngram options
-scorer = TextMood.new(lang: "en_US", debug: true, start_ngram: 2, end_ngram: 3)
-score = scorer.score_text("some long text with many words")
+tm = TextMood.new(lang: "en_US", debug: true, start_ngram: 2, end_ngram: 3)
+score = tm.analyze("some long text with many words")
 #(stdout): some long: 0.1
 #(stdout): long text: 0.1
 #(stdout): text with: -0.1
@@ -49,23 +77,10 @@ score = scorer.score_text("some long text with many words")
 #(stdout): with many words: 0.1
 #=> '0.1'
-# Using :normalize, you can make TextMood return a normalized value: 1 for positive,
-# 0 for neutral and -1 for negative
-scorer = TextMood.new(lang: "en_US", normalize: true)
-score = scorer.score_text("some text")
-#=> '1'
-# :min_threshold and :max_threshold lets you customize the way :normalize treats
-# different values. The options below will make all scores below 1 negative,
-# 1-2 will be neutral, and above 2 will be positive.
-scorer = TextMood.new(lang: "en_US", normalize: true, min_threshold: 1, max_threshold: 2)
-score = scorer.score_text("some text")
-#=> '0'
 # :debug prints out all tokens to stdout, alongs with their values (or 'nil' when the
 # token was not found)
-scorer = TextMood.new(lang: "en_US", debug: true)
-score = scorer.score_text("some text")
+tm = TextMood.new(lang: "en_US", debug: true)
+score = tm.analyze("some text")
 #(stdout): some: 0.1
 #(stdout): text: 0.1
 #(stdout): some text: -0.1
@@ -89,6 +104,8 @@ The cli tool has many useful options, mostly mirroring those of the library. Her
 output from `textmood -h`:
 ```
 Usage: textmood [options] "<text>"
+            OR
+       echo "<text>" | textmood [options]
 Returns a floating-point sentiment score of the provided text.
 Above 0 is considered positive, below is considered negative.
@@ -104,28 +121,32 @@ MANDATORY options:
                                      files will be loaded if this option is used.
 OPTIONAL options:
-        --start-ngram INTEGER        The lowest word N-gram number to split the text into
-                                     (default 1). Note that this only makes sense if the
-                                     sentiment file has tokens of similar N-gram length
+    -o, --normalize-output           Return 1 (positive), -1 (negative) or 0 (neutral)
+                                     instead of the actual score. See also --min and --max.
-        --end-ngram INTEGER          The highest word N-gram number to to split the text into
-                                     (default 1). Note that this only makes sense if the
-                                     sentiment file has tokens of similar N-gram length
+    -s, --normalize-score            Tries to normalize the score to an integer between +/- 100
+                                     according to the number of tokens that were scored, making
+                                     it more feasible to compare scores for texts of different
+                                     length
-    -n, --normalize-output           Return 1 (positive), -1 (negative) or 0 (neutral)
-                                     instead of the actual score. See also --min and --max.
+    -i, --min-threshold FLOAT        Scores lower than this are considered negative when
+                                     using --normalize-output (default 0.5). Note that the
+                                     threshold is compared to the normalized score, if applicable
-        --normalize-score            Return 1 (positive), -1 (negative) or 0 (neutral)
-                                     instead of the actual score. See also --min and --max.
+    -x, --max-threshold FLOAT        Scores higher than this are considered positive when
+                                     using --normalize-output (default 0.5). Note that the
+                                     threshold is compared to the normalized score, if applicable
-        --min-threshold FLOAT        Scores lower than this are considered negative when
-                                     using --normalize (default -0.5)
+    -b, --start-ngram INTEGER        The lowest word N-gram number to split the text into
+                                     (default 1). Note that this only makes sense if the
+                                     sentiment file has tokens of similar N-gram length
-        --max-threshold FLOAT        Scores higher than this are considered positive when
-                                     using --normalize (default 0.5)
+    -e, --end-ngram INTEGER          The highest word N-gram number to to split the text into
+                                     (default 1). Note that this only makes sense if the
+                                     sentiment file has tokens of similar N-gram length
-    -s, --skip-symbols               Do not include symbols file (emoticons etc.).
-                                     Only applies when using -l/--language.
+    -k, --skip-symbols               Do not include symbols file (emoticons etc.). Only applies
+                                     when using -l/--language.
     -d, --debug                      Prints out the score for each token in the provided text
                                      or 'nil' if the token was not found in the sentiment file
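
The README text in this diff describes scoring as a sum of per-token scores over one or more N-gram passes. A minimal sketch of that idea follows. It is not the gem's implementation: the `SENTIMENT` table and its values, and the helper names `ngrams` and `score`, are hypothetical, and the real tokenizer may treat case and punctuation differently.

```ruby
# Hypothetical sentiment table; the bundled sentiment files have their own
# format and values.
SENTIMENT = {
  "good" => 1.0,
  "bad" => -1.0,
  "not good" => -1.5
}.freeze

# Split text into word N-grams of length n.
def ngrams(text, n)
  text.downcase.split.each_cons(n).map { |words| words.join(" ") }
end

# One pass per N-gram length; tokens missing from the table contribute nothing.
def score(text, start_ngram: 1, end_ngram: 1)
  (start_ngram..end_ngram).sum do |n|
    ngrams(text, n).sum { |token| SENTIMENT.fetch(token, 0.0) }
  end
end

puts score("this is not good")                # single-word pass only
puts score("this is not good", end_ngram: 2)  # adds a two-word pass
```

With this toy table, the single-word pass matches only `good`, while the two-word pass also matches `not good` and pulls the total below zero. That is the accuracy gain the README attributes to multi-word tokens.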

data/bin/textmood CHANGED

@@ -63,7 +63,7 @@ opts_parser = OptionParser.new do |opts|
   opts.separator ""
   opts.on("-s", "--normalize-score", "Tries to normalize the score to an integer between +/- 100",
                                      "according to the number of tokens that were scored, making",
-                                     "it more feasible to compare scores between texts of different",
+                                     "it more feasible to compare scores for texts of different",
                                      "length") do |ns|
     options[:normalize_score] = true
   end
@@ -111,8 +111,8 @@ end
 opts_parser.parse!
 def do_main(text, options)
-  scorer = TextMood.new(options)
-  puts scorer.score_text(text)
+  tm = TextMood.new(options)
+  puts tm.analyze(text)
 end
 if ARGV[0]
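
The help text in the README hunk lists two invocation forms: text passed as an argument, or text piped on standard input. The rest of the dispatch is not shown in this diff, so the following is only a sketch of how such a fallback might look. The helper name `read_text` and its parameters are hypothetical.

```ruby
require "stringio"

# Prefer the first command-line argument; otherwise read piped input.
# Both sources are passed in so the helper can be exercised without a real pipe.
def read_text(argv, input)
  return argv[0] if argv[0]
  input.read.to_s.strip
end

puts read_text(["some text"], StringIO.new(""))
puts read_text([], StringIO.new("piped text\n"))
```

In a real script the call would presumably be `read_text(ARGV, $stdin)`.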

data/lib/textmood.rb CHANGED

@@ -37,7 +37,7 @@ class TextMood
   end
   # analyzes the sentiment of the provided text.
-  def score_text(text)
+  def analyze(text)
     sentiment_total = 0.0
     scores_added = 0
@@ -67,6 +67,8 @@ class TextMood
     end
   end
+  alias_method :analyse, :analyze
   private
   def score_token(token)
@@ -115,8 +117,8 @@ class TextMood
   end
   def normalize_score(score, count)
-    factor = NORMALIZE_TO / count
-    (score * factor).to_i
+    factor = NORMALIZE_TO.to_f / count.to_f
+    (score * factor).round
   end
 end
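
The `normalize_score` hunk replaces integer division and truncation with float division and rounding. The sketch below contrasts the two versions. `NORMALIZE_TO = 100` is an assumption based on the README's "+/- 100" wording, since the constant's value does not appear in this diff. The method names `normalize_old` and `normalize_new` are hypothetical.

```ruby
NORMALIZE_TO = 100 # assumed value; the constant is not shown in the diff

# 0.0.5 body: with Integer operands, `/` truncates before the multiplication.
def normalize_old(score, count)
  factor = NORMALIZE_TO / count
  (score * factor).to_i
end

# 0.0.6 body: float division, then rounding to the nearest integer.
def normalize_new(score, count)
  factor = NORMALIZE_TO.to_f / count.to_f
  (score * factor).round
end

puts normalize_old(0.5, 3)    # factor 33, 16.5 truncated to 16
puts normalize_new(0.5, 3)    # factor 33.33..., 16.67 rounded to 17
puts normalize_old(1.2, 200)  # factor 0, so the result is always 0
puts normalize_new(1.2, 200)  # factor 0.5, 0.6 rounded to 1
```

Under these assumptions, the old version returned 0 for any text with more scored tokens than `NORMALIZE_TO`, regardless of its sentiment. The new version avoids that.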

data/test/test.rb CHANGED

@@ -17,14 +17,14 @@ include Test::Unit::Assertions
 class TestScorer < Test::Unit::TestCase
   def setup
-    @scorer = TextMood.new({:lang => "en_US"})
+    @tm = TextMood.new({:lang => "en_US"})
   end
   def test_negative
     max = -0.01
     texts = ["This is just terrible"]
     texts.each do |text|
-      actual_score = @scorer.score_text(text)
+      actual_score = @tm.analyze(text)
       assert((actual_score < max), "actual: #{actual_score} >= max: #{max} for '#{text}'")
     end
   end
@@ -34,7 +34,7 @@ class TestScorer < Test::Unit::TestCase
     max =  0.5
     texts = ["This is neutral"]
     texts.each do |text, test_score|
-      actual_score = @scorer.score_text(text)
+      actual_score = @tm.analyze(text)
       assert((actual_score < max and actual_score > min), "min: #{min} <= actual: #{actual_score} >= max: #{max} for '#{text}'")
     end
   end
@@ -43,7 +43,7 @@ class TestScorer < Test::Unit::TestCase
     min = 0.01
     texts = ["This is amazing!"]
     texts.each do |text, test_score|
-      actual_score = @scorer.score_text(text)
+      actual_score = @tm.analyze(text)
       assert((actual_score >= min), "actual: #{actual_score} <= max: #{min} for '#{text}'")
     end
   end
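
The test changes follow from the rename of `score_text` to `analyze`. The diff adds an `analyse` alias but shows no alias for the old name, so callers written against 0.0.5 would apparently need updating. One way to support both versions is sketched below, using stand-in classes so it runs without the gem installed. `NewStub`, `LegacyStub`, and `sentiment` are hypothetical names.

```ruby
# Stand-in for the 0.0.6 interface.
class NewStub
  def analyze(text)
    text.include?("amazing") ? 1.0 : 0.0
  end
  alias_method :analyse, :analyze
end

# Stand-in for the 0.0.5 interface.
class LegacyStub
  def score_text(text)
    text.include?("amazing") ? 1.0 : 0.0
  end
end

# Call whichever scoring method the given object provides.
def sentiment(tm, text)
  if tm.respond_to?(:analyze)
    tm.analyze(text)
  else
    tm.score_text(text)
  end
end

puts sentiment(NewStub.new, "This is amazing!")
puts sentiment(LegacyStub.new, "This is amazing!")
```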

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: textmood
 version: !ruby/object:Gem::Version
-  version: 0.0.5
+  version: 0.0.6
   prerelease:
 platform: ruby
 authors: