RubyGems - textmood - Versions diffs - 0.0.1 - Mend

textmood 0.0.1

Files changed (9)

data/LICENSE ADDED

@@ -0,0 +1,20 @@
+The MIT License (MIT)
+Copyright (c) 2013 Stian Grytøyr
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
+the Software, and to permit persons to whom the Software is furnished to do so,
+subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
+FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
+COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
+IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

data/README.md ADDED

@@ -0,0 +1,162 @@
+## TextMood - Simple sentiment analyzer
+*TextMood* is a simple sentiment analyzer, provided as a Ruby gem with a command-line
+tool for easy interoperability with other processes. It takes text as input and
+returns a sentiment score: a score above 0 is typically considered positive, and one
+below 0 negative.
+The goal is to have a robust and simple tool that comes with baseline sentiment files
+for many languages.
+### Installation
+The easiest way to get the latest stable version is to use gem:
+    gem install textmood
+If you’d like to get the bleeding-edge version:
+    git clone https://github.com/stiang/textmood
+### Usage
+TextMood can be used as a Ruby library or as a standalone CLI tool.
+#### Ruby library
+You can use TextMood in a Ruby program like this:
+```ruby
+require "textmood"
+# The :lang parameter makes TextMood use one of the bundled language sentiment files
+scorer = TextMood.new(lang: "en_US")
+score = scorer.score_text("some text")
+#=> '1.121'
+# The :files parameter makes TextMood ignore the bundled sentiment files and use the
+# specified files instead. You can specify as many files as you want.
+scorer = TextMood.new(files: ["en_US-mod1.txt", "emoticons.txt"])
+# By default, TextMood makes one pass over the text, checking every word, but it
+# can also make several passes for any range of word N-gram lengths. The start and
+# end N-gram lengths are set with the :start_ngram and :end_ngram options.
+scorer = TextMood.new(lang: "en_US", debug: true, start_ngram: 2, end_ngram: 3)
+score = scorer.score_text("some long text with many words")
+#=> some long: 0.1
+#=> long text: 0.1
+#=> text with: -0.1
+#=> with many: -0.1
+#=> many words: -0.1
+#=> some long text: -0.1
+#=> long text with: 0.1
+#=> text with many: 0.1
+#=> with many words: 0.1
+#=> '0.1'
+# Using :normalize, you can make TextMood return a normalized value: 1 for positive,
+# 0 for neutral and -1 for negative
+scorer = TextMood.new(lang: "en_US", normalize: true)
+score = scorer.score_text("some text")
+#=> '1'
+# :min_threshold and :max_threshold let you customize the way :normalize treats
+# different values. The options below make all scores below 1 negative, scores
+# from 1 to 2 neutral, and scores above 2 positive.
+scorer = TextMood.new(lang: "en_US", normalize: true, min_threshold: 1, max_threshold: 2)
+score = scorer.score_text("some text")
+#=> '0'
+# :debug prints all tokens to stdout, along with their values (or 'nil' when the
+# token was not found)
+scorer = TextMood.new(lang: "en_US", debug: true)
+score = scorer.score_text("some text")
+#=> some: 0.1
+#=> text: 0.1
+#=> some text: -0.1
+#=> '0.1'
+```
+#### CLI tool
+Alternatively, you can pass some UTF-8-encoded text to the CLI tool and get a score back, like so:
+```bash
+textmood -l en_US "<some text>"
+-0.4375
+```
+The CLI tool has many useful options, mostly mirroring those of the library:
+```
+Usage: textmood [options] "<text>"
+Returns a floating-point sentiment score of the provided text.
+Above 0 is considered positive, below is considered negative.
+MANDATORY options:
+    -l, --language LANGUAGE          The IETF language tag for the provided text.
+                                     Examples: en_US, no_NB
+              OR
+    -f, --file PATH TO FILE          Use the specified sentiment file. May be used
+                                     multiple times to load several files. No other
+                                     files will be loaded if this option is used.
+OPTIONAL options:
+        --start-ngram INTEGER        The lowest word N-gram number to split the text into
+                                     (default 1). Note that this only makes sense if the
+                                     sentiment file has tokens of similar N-gram length
+        --end-ngram INTEGER          The highest word N-gram number to split the text into
+                                     (default 1). Note that this only makes sense if the
+                                     sentiment file has tokens of similar N-gram length
+    -n, --normalize                  Return 1 (positive), -1 (negative) or 0 (neutral)
+                                     instead of the actual score. See also --min-threshold and --max-threshold.
+        --min-threshold FLOAT        Scores lower than this are considered negative when
+                                     using --normalize (default -0.5)
+        --max-threshold FLOAT        Scores higher than this are considered positive when
+                                     using --normalize (default 0.5)
+    -s, --skip-symbols               Do not include symbols file (emoticons etc.).
+                                     Only applies when using -l/--language.
+    -d, --debug                      Prints out the score for each token in the provided text
+                                     or 'nil' if the token was not found in the sentiment file
+    -h, --help                       Show this message
+```
+## Sentiment files
+The included sentiment files reside in the *lang* directory. I hope to add many
+more baseline sentiment files in the future.
+Sentiment files should be named according to the IETF language tag, like *en_US*,
+and contain one colon-separated line per token, like so:
+```
+1.0: epic
+1.0: good
+1.0: upright
+0.958: fortunate
+0.875: wonderfulness
+0.875: wonderful
+0.875: wide-eyed
+0.875: wholesomeness
+0.875: well-to-do
+0.875: well-situated
+0.6: well suited
+```
+The score is to the left of the first ':', and everything to the right is the
+(potentially multi-word) token.
+## Contribute
+Including baseline word/N-gram scores for many different languages is one
+of the expressed goals of this project. If you are able to contribute scores
+for a missing language or improve an existing one, it would be much appreciated!
+The process is the usual:
+* Fork
+* Add/improve
+* Pull request
+## Credits
+Loosely based on https://github.com/cmaclell/Basic-Tweet-Sentiment-Analyzer
+## Author
+Stian Grytøyr
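The sentiment-file format described in the README (a score, then the first `:`, then a possibly multi-word token) can be read with a few lines of Ruby. What follows is a minimal sketch, not TextMood's own loader; `parse_sentiment_lines` is a hypothetical helper. Splitting on the first colon only keeps tokens that themselves contain colons, such as emoticons, intact.

```ruby
# Minimal sketch of a reader for the "score: token" sentiment-file format.
# parse_sentiment_lines is a hypothetical helper, not part of TextMood's API.
def parse_sentiment_lines(lines)
  lines.each_with_object({}) do |line, scores|
    line = line.strip
    next if line.empty?                # skip blank lines
    score, token = line.split(":", 2)  # split on the FIRST ':' only
    next if token.nil?                 # skip malformed lines without a ':'
    scores[token.strip] = Float(score) # raises ArgumentError on a bad score
  end
end

sample = <<~TXT
  1.0: good
  0.875: well-to-do
  0.6: well suited
TXT

scores = parse_sentiment_lines(sample.lines)
p scores["well suited"] # prints 0.6
```

Reading a real file would look like `parse_sentiment_lines(File.readlines(path, encoding: "UTF-8"))`, where `path` points at one of the files in the *lang* directory.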

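The README describes N-gram passes (`:start_ngram` / `:end_ngram`) but does not say how token scores are combined. The sketch below assumes the final score is the plain sum of the scores of every matched word N-gram, with unknown tokens counting as 0. This is consistent with the README's debug examples (0.1 + 0.1 - 0.1 = 0.1), but it is an inference, not the gem's confirmed algorithm. It also assumes whitespace splitting is adequate tokenization. `word_ngrams` and `score_ngrams` are hypothetical helpers.

```ruby
# Sketch only: ASSUMES the text score is the sum of the scores of every word
# N-gram (lengths start_ngram..end_ngram) found in the sentiment table, and
# that unknown tokens count as 0. Hypothetical helpers, not TextMood's API.

# Returns all contiguous n-word sequences in the text, joined by single spaces.
def word_ngrams(text, n)
  text.split.each_cons(n).map { |gram| gram.join(" ") }
end

def score_ngrams(text, scores, start_ngram: 1, end_ngram: 1, debug: false)
  (start_ngram..end_ngram).sum do |n|
    word_ngrams(text, n).sum do |token|
      value = scores[token]
      # Mirrors the README's debug format: "token: value", or "nil" if unknown.
      puts "#{token}: #{value.nil? ? 'nil' : value}" if debug
      value || 0.0
    end
  end
end

scores = { "some" => 0.1, "text" => 0.1, "some text" => -0.1 }
total = score_ngrams("some text", scores, end_ngram: 2, debug: true)
# Debug lines: "some: 0.1", "text: 0.1", "some text: -0.1"
puts total.round(4) # prints 0.1
```

Note that `each_cons` simply yields nothing when N exceeds the number of words, so large `end_ngram` values are harmless on short texts.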
data/bin/textmood ADDED

@@ -0,0 +1,108 @@
+#!/usr/bin/env ruby
+#encoding: utf-8
+if RUBY_VERSION < '1.9'
+  $KCODE='u'
+else
+  Encoding.default_external = Encoding::UTF_8
+  Encoding.default_internal = Encoding::UTF_8
+end
+$:.unshift File.join(File.dirname(__FILE__), *%w{ .. lib })
+require "optparse"
+require "textmood"
+usage = "Usage: #{File.basename($0)} [options] \"<text>\""
+def mini_usage(usage, notext = false)
+  puts usage
+  puts ""
+  if notext
+    puts "ERROR: Quoted text must be provided after the last option."
+  else
+    puts "ERROR: An IETF language tag must be provided using the -l/--language option."
+  end
+  puts ""
+  puts "Use \"#{File.basename($0)} -h\" for full usage info."
+  puts ""
+  exit 20
+end
+if ARGV[0] != "-h" and ARGV[0] != "--help" and not (ARGV[0] and ARGV[1])
+  mini_usage(usage)
+end
+options = {:files => []}
+opts_parser = OptionParser.new do |opts|
+  opts.banner = usage
+  opts.separator ""
+  opts.separator "Returns a floating-point sentiment score of the provided text."
+  opts.separator "Above 0 is considered positive, below is considered negative."
+  opts.separator ""
+  opts.separator "MANDATORY options:"
+  opts.on("-l", "--language LANGUAGE", "The IETF language tag for the provided text.",
+                                       "Examples: en_US, no_NB") do |l|
+    options[:lang] = l
+  end
+  opts.separator ""
+  opts.separator "              OR "
+  opts.separator ""
+  opts.on("-f", "--file PATH TO FILE", "Use the specified sentiment file. May be used",
+                                       "multiple times to load several files. No other",
+                                       "files will be loaded if this option is used.") do |f|
+    options[:files] << f
+  end
+  opts.separator ""
+  opts.separator "OPTIONAL options:"
+  opts.on("--start-ngram INTEGER", "The lowest word N-gram number to split the text into",
+                                   "(default 1). Note that this only makes sense if the",
+                                   "sentiment file has tokens of similar N-gram length") do |start_ngram|
+    options[:start_ngram] = start_ngram.to_i
+  end
+  opts.separator ""
+  opts.on("--end-ngram INTEGER", "The highest word N-gram number to split the text into",
+                                 "(default 1). Note that this only makes sense if the",
+                                 "sentiment file has tokens of similar N-gram length") do |end_ngram|
+    options[:end_ngram] = end_ngram.to_i
+  end
+  opts.separator ""
+  opts.on("-n", "--normalize", "Return 1 (positive), -1 (negative) or 0 (neutral)",
+                               "instead of the actual score. See also --min-threshold and --max-threshold.") do |n|
+    options[:normalize] = true
+  end
+  opts.separator ""
+  opts.on("--min-threshold FLOAT", "Scores lower than this are considered negative when",
+                                   "using --normalize (default -0.5)") do |min|
+    options[:min_threshold] = min.to_f
+  end
+  opts.separator ""
+  opts.on("--max-threshold FLOAT", "Scores higher than this are considered positive when",
+                                   "using --normalize (default 0.5)") do |max|
+    options[:max_threshold] = max.to_f
+  end
+  opts.separator ""
+  opts.on("-s", "--skip-symbols", "Do not include symbols file (emoticons etc.).",
+                                  "Only applies when using -l/--language.") do |s|
+    options[:include_symbols] = false
+  end
+  opts.separator ""
+  opts.on("-d", "--debug", "Prints out the score for each token in the provided text",
+                           "or 'nil' if the token was not found in the sentiment file") do |d|
+    options[:debug] = true
+  end
+  opts.separator ""
+  opts.on_tail("-h", "--help", "Show this message") do
+    puts opts
+    puts ""
+    exit
+  end
+end
+opts_parser.parse!
+if ARGV[0]
+  scorer = TextMood.new(options)
+  puts scorer.score_text(ARGV[0])
+else
+  mini_usage(usage, true)
+end
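The `--normalize` option handled by the script above (`:normalize` in the library) collapses a raw score into -1, 0 or 1. Below is a minimal sketch of that mapping. It assumes strict comparisons against the thresholds, as the help text's "lower than" / "higher than" wording suggests. The defaults of -0.5 and 0.5 are taken from the help text. `normalize_score` is a hypothetical helper, not TextMood's actual implementation.

```ruby
# Sketch of the --normalize mapping, ASSUMING strict comparisons:
# below min_threshold => -1, above max_threshold => 1, otherwise 0.
# Defaults come from the CLI help text; normalize_score is hypothetical.
def normalize_score(score, min_threshold: -0.5, max_threshold: 0.5)
  if score < min_threshold
    -1
  elsif score > max_threshold
    1
  else
    0
  end
end

p normalize_score(-0.4375)                                   # prints 0
p normalize_score(1.121)                                     # prints 1
p normalize_score(1.121, min_threshold: 1, max_threshold: 2) # prints 0
```

With the thresholds from the README example (1 and 2), a raw score of 1.121 falls in the neutral band. That matches the `'0'` shown there for the same text that scores `'1.121'` without normalization.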