textmood 0.0.5 → 0.0.6

Files changed (5)

data/README.md CHANGED

@@ -1,14 +1,19 @@
 ## TextMood - Simple sentiment analyzer
-*TextMood* is a simple sentiment analyzer, provided as a Ruby gem with a command-line
-tool for simple interoperability with other processes. It takes text as input and
-returns a sentiment score. Above 0 is typically considered positive, below is
-considered negative.
+*TextMood* is a simple but powerful sentiment analyzer, provided as a Ruby gem with
+a command-line tool for simple interoperability with other processes. It takes text
+as input and returns a sentiment score.
-The goal is to have a robust and simple tool that comes with baseline sentiment files
-for many languages.
+The sentiment analysis is relatively simple, and works by splitting the text into
+tokens and comparing each token to a pre-selected sentiment score for that token.
+The combined score for all tokens is then returned.
+However, TextMood also supports doing multiple passes over the text, splitting
+it into tokens of N words (N-grams) for each pass. By adding multi-word tokens to
+the sentiment file and using this feature, you can achieve much greater accuracy
+than with just single-word analysis.
 ### Installation
-The easiest way to get the latest stable version is to use gem:
+The easiest way to get the latest stable version is to install the gem:
     gem install textmood
@@ -17,27 +22,50 @@ If you’d like to get the bleeding-edge version:
     git clone https://github.com/stiang/textmood
 ### Usage
-TextMood can be used as a ruby library or as a standalone CLI tool.
+TextMood can be used as a Ruby library or as a standalone CLI tool.
 #### Ruby library
-You can use textmood in a ruby program like this:
+You can use it in a Ruby program like this:
 ```ruby
 require "textmood"
 # The :lang parameter makes TextMood use one of the bundled language sentiment files
-scorer = TextMood.new(lang: "en_US")
-score = scorer.score_text("some text")
+tm = TextMood.new(lang: "en_US")
+score = tm.analyze("some text")
 #=> '1.121'
 # The :files parameter makes TextMood ignore the bundled sentiment files and use the
 # specified files instead. You can specify as many files as you want.
-scorer = TextMood.new(files: ["en_US-mod1.txt", "emoticons.txt"])
+tm = TextMood.new(files: ["en_US-mod1.txt", "emoticons.txt"])
+# Using :normalize_output, you can make TextMood return a normalized value:
+# 1 for positive, 0 for neutral and -1 for negative
+tm = TextMood.new(lang: "en_US", normalize_output: true)
+score = tm.analyze("some text")
+#=> '1'
+# :normalize_score will try to normalize the score to an integer between +/- 100,
+# based on how many tokens were scored, which can be useful when trying to compare
+# scores for texts of different length
+tm = TextMood.new(lang: "en_US", normalize_score: true)
+score = tm.analyze("some text")
+#=> '14'
+# :min_threshold and :max_threshold lets you customize the way :normalize_output
+# treats different values. The options below will make all scores below 1 negative,
+# 1-2 will be neutral, and above 2 will be positive.
+tm = TextMood.new(lang: "en_US",
+                      normalize_output: true,
+                      min_threshold: 1,
+                      max_threshold: 2)
+score = tm.analyze("some text")
+#=> '0'
 # TextMood will by default make one pass over the text, checking every word, but it
 # supports doing several passes for any range of word N-grams. Both the start and end
 # N-gram can be specified using the :start_ngram and :end_ngram options
-scorer = TextMood.new(lang: "en_US", debug: true, start_ngram: 2, end_ngram: 3)
-score = scorer.score_text("some long text with many words")
+tm = TextMood.new(lang: "en_US", debug: true, start_ngram: 2, end_ngram: 3)
+score = tm.analyze("some long text with many words")
 #(stdout): some long: 0.1
 #(stdout): long text: 0.1
 #(stdout): text with: -0.1
@@ -49,23 +77,10 @@ score = scorer.score_text("some long text with many words")
 #(stdout): with many words: 0.1
 #=> '0.1'
-# Using :normalize, you can make TextMood return a normalized value: 1 for positive,
-# 0 for neutral and -1 for negative
-scorer = TextMood.new(lang: "en_US", normalize: true)
-score = scorer.score_text("some text")
-#=> '1'
-# :min_threshold and :max_threshold lets you customize the way :normalize treats
-# different values. The options below will make all scores below 1 negative,
-# 1-2 will be neutral, and above 2 will be positive.
-scorer = TextMood.new(lang: "en_US", normalize: true, min_threshold: 1, max_threshold: 2)
-score = scorer.score_text("some text")
-#=> '0'
 # :debug prints out all tokens to stdout, alongs with their values (or 'nil' when the
 # token was not found)
-scorer = TextMood.new(lang: "en_US", debug: true)
-score = scorer.score_text("some text")
+tm = TextMood.new(lang: "en_US", debug: true)
+score = tm.analyze("some text")
 #(stdout): some: 0.1
 #(stdout): text: 0.1
 #(stdout): some text: -0.1
@@ -89,6 +104,8 @@ The cli tool has many useful options, mostly mirroring those of the library. Her
 output from `textmood -h`:
 ```
 Usage: textmood [options] "<text>"
+            OR
+       echo "<text>" | textmood [options]
 Returns a floating-point sentiment score of the provided text.
 Above 0 is considered positive, below is considered negative.
@@ -104,28 +121,32 @@ MANDATORY options:
                                      files will be loaded if this option is used.
 OPTIONAL options:
-        --start-ngram INTEGER        The lowest word N-gram number to split the text into
-                                     (default 1). Note that this only makes sense if the
-                                     sentiment file has tokens of similar N-gram length
+    -o, --normalize-output           Return 1 (positive), -1 (negative) or 0 (neutral)
+                                     instead of the actual score. See also --min and --max.
-        --end-ngram INTEGER          The highest word N-gram number to to split the text into
-                                     (default 1). Note that this only makes sense if the
-                                     sentiment file has tokens of similar N-gram length
+    -s, --normalize-score            Tries to normalize the score to an integer between +/- 100
+                                     according to the number of tokens that were scored, making
+                                     it more feasible to compare scores for texts of different
+                                     length
-    -n, --normalize-output           Return 1 (positive), -1 (negative) or 0 (neutral)
-                                     instead of the actual score. See also --min and --max.
+    -i, --min-threshold FLOAT        Scores lower than this are considered negative when
+                                     using --normalize-output (default 0.5). Note that the
+                                     threshold is compared to the normalized score, if applicable
-        --normalize-score            Return 1 (positive), -1 (negative) or 0 (neutral)
-                                     instead of the actual score. See also --min and --max.
+    -x, --max-threshold FLOAT        Scores higher than this are considered positive when
+                                     using --normalize-output (default 0.5). Note that the
+                                     threshold is compared to the normalized score, if applicable
-        --min-threshold FLOAT        Scores lower than this are considered negative when
-                                     using --normalize (default -0.5)
+    -b, --start-ngram INTEGER        The lowest word N-gram number to split the text into
+                                     (default 1). Note that this only makes sense if the
+                                     sentiment file has tokens of similar N-gram length
-        --max-threshold FLOAT        Scores higher than this are considered positive when
-                                     using --normalize (default 0.5)
+    -e, --end-ngram INTEGER          The highest word N-gram number to to split the text into
+                                     (default 1). Note that this only makes sense if the
+                                     sentiment file has tokens of similar N-gram length
-    -s, --skip-symbols               Do not include symbols file (emoticons etc.).
-                                     Only applies when using -l/--language.
+    -k, --skip-symbols               Do not include symbols file (emoticons etc.). Only applies
+                                     when using -l/--language.
     -d, --debug                      Prints out the score for each token in the provided text
                                      or 'nil' if the token was not found in the sentiment file
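
The multi-pass N-gram scoring described in the README changes above can be illustrated with a minimal sketch. This is not the gem's implementation: the `SCORES` table and the `ngrams` and `score_sketch` helpers are hypothetical, and the real gem loads its scores from sentiment files. The option names `start_ngram` and `end_ngram` are borrowed from the README.

```ruby
# Hypothetical sentiment table (assumption); TextMood itself reads sentiment files.
SCORES = {
  "good" => 0.5,
  "bad" => -0.5,
  "not good" => -0.6
}.freeze

# Split the text into word N-grams of length n, e.g. n = 2 gives "this is", "is not", ...
def ngrams(text, n)
  words = text.downcase.scan(/[[:alnum:]']+/)
  words.each_cons(n).map { |group| group.join(" ") }
end

# Make one pass per N-gram length and sum the scores of all known tokens.
# Unknown tokens contribute nothing to the total.
def score_sketch(text, start_ngram: 1, end_ngram: 1)
  (start_ngram..end_ngram).sum do |n|
    ngrams(text, n).sum { |token| SCORES.fetch(token, 0.0) }
  end
end

single = score_sketch("This is not good")               # only "good" matches
multi  = score_sketch("This is not good", end_ngram: 2) # "good" and "not good" match

puts single
puts multi
```

With single-word tokens only, the text scores as positive because only "good" is found. Adding the two-word pass lets the "not good" entry pull the total below zero, which is the kind of accuracy gain the README attributes to multi-word tokens.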

data/bin/textmood CHANGED

@@ -63,7 +63,7 @@ opts_parser = OptionParser.new do |opts|
   opts.separator ""
   opts.on("-s", "--normalize-score", "Tries to normalize the score to an integer between +/- 100",
                                      "according to the number of tokens that were scored, making",
-                                     "it more feasible to compare scores between texts of different",
+                                     "it more feasible to compare scores for texts of different",
                                      "length") do |ns|
     options[:normalize_score] = true
   end
@@ -111,8 +111,8 @@ end
 opts_parser.parse!
 def do_main(text, options)
-  scorer = TextMood.new(options)
-  puts scorer.score_text(text)
+  tm = TextMood.new(options)
+  puts tm.analyze(text)
 end
 if ARGV[0]

data/lib/textmood.rb CHANGED

@@ -37,7 +37,7 @@ class TextMood
   end
   # analyzes the sentiment of the provided text.
-  def score_text(text)
+  def analyze(text)
     sentiment_total = 0.0
     scores_added = 0
@@ -67,6 +67,8 @@ class TextMood
     end
   end
+  alias_method :analyse, :analyze
   private
   def score_token(token)
@@ -115,8 +117,8 @@ class TextMood
   end
   def normalize_score(score, count)
-    factor = NORMALIZE_TO / count
-    (score * factor).to_i
+    factor = NORMALIZE_TO.to_f / count.to_f
+    (score * factor).round
   end
 end
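
The `normalize_score` change above matters because Ruby's `/` truncates when both operands are integers. The sketch below compares the two versions side by side. `NORMALIZE_TO = 100` is an assumption based on the README's "+/- 100" (the constant's definition is not part of this diff), and the `normalize_old` and `normalize_new` names are used here only for illustration.

```ruby
NORMALIZE_TO = 100 # assumed value; the constant's definition is not shown in the diff

# 0.0.5 behaviour: integer division, then truncation toward zero.
def normalize_old(score, count)
  factor = NORMALIZE_TO / count
  (score * factor).to_i
end

# 0.0.6 behaviour: float division, then rounding to the nearest integer.
def normalize_new(score, count)
  factor = NORMALIZE_TO.to_f / count.to_f
  (score * factor).round
end

# With more than 100 scored tokens, the old factor collapses to 0.
puts normalize_old(3.0, 150) # 100 / 150 is 0 in integer math
puts normalize_new(3.0, 150)

# With fewer tokens, the old version still loses precision twice.
puts normalize_old(1.6, 3)   # factor 33, result truncated
puts normalize_new(1.6, 3)   # factor 33.33..., result rounded
```

Under this assumption, any text with more than 100 scored tokens would have normalized to 0 in 0.0.5, regardless of its raw score, provided `count` is an integer as the `scores_added = 0` initialisation suggests.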

data/test/test.rb CHANGED

@@ -17,14 +17,14 @@ include Test::Unit::Assertions
 class TestScorer < Test::Unit::TestCase
   def setup
-    @scorer = TextMood.new({:lang => "en_US"})
+    @tm = TextMood.new({:lang => "en_US"})
   end
   def test_negative
     max = -0.01
     texts = ["This is just terrible"]
     texts.each do |text|
-      actual_score = @scorer.score_text(text)
+      actual_score = @tm.analyze(text)
       assert((actual_score < max), "actual: #{actual_score} >= max: #{max} for '#{text}'")
     end
   end
@@ -34,7 +34,7 @@ class TestScorer < Test::Unit::TestCase
     max =  0.5
     texts = ["This is neutral"]
     texts.each do |text, test_score|
-      actual_score = @scorer.score_text(text)
+      actual_score = @tm.analyze(text)
       assert((actual_score < max and actual_score > min), "min: #{min} <= actual: #{actual_score} >= max: #{max} for '#{text}'")
     end
   end
@@ -43,7 +43,7 @@ class TestScorer < Test::Unit::TestCase
     min = 0.01
     texts = ["This is amazing!"]
     texts.each do |text, test_score|
-      actual_score = @scorer.score_text(text)
+      actual_score = @tm.analyze(text)
       assert((actual_score >= min), "actual: #{actual_score} <= max: #{min} for '#{text}'")
     end
   end

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: textmood
 version: !ruby/object:Gem::Version
-  version: 0.0.5
+  version: 0.0.6
   prerelease:
 platform: ruby
 authors: