textmood 0.0.5 → 0.0.6

Files changed (5)
  1. data/README.md +66 -45
  2. data/bin/textmood +3 -3
  3. data/lib/textmood.rb +5 -3
  4. data/test/test.rb +4 -4
  5. metadata +1 -1
data/README.md CHANGED
@@ -1,14 +1,19 @@
1
1
  ## TextMood - Simple sentiment analyzer
2
- *TextMood* is a simple sentiment analyzer, provided as a Ruby gem with a command-line
3
- tool for simple interoperability with other processes. It takes text as input and
4
- returns a sentiment score. Above 0 is typically considered positive, below is
5
- considered negative.
2
+ *TextMood* is a simple but powerful sentiment analyzer, provided as a Ruby gem with
3
+ a command-line tool for simple interoperability with other processes. It takes text
4
+ as input and returns a sentiment score.
6
5
 
7
- The goal is to have a robust and simple tool that comes with baseline sentiment files
8
- for many languages.
6
+ The sentiment analysis is relatively simple, and works by splitting the text into
7
+ tokens and comparing each token to a pre-selected sentiment score for that token.
8
+ The combined score for all tokens is then returned.
9
+
10
+ However, TextMood also supports doing multiple passes over the text, splitting
11
+ it into tokens of N words (N-grams) for each pass. By adding multi-word tokens to
12
+ the sentiment file and using this feature, you can achieve much greater accuracy
13
+ than with just single-word analysis.
9
14
 
10
15
  ### Installation
11
- The easiest way to get the latest stable version is to use gem:
16
+ The easiest way to get the latest stable version is to install the gem:
12
17
 
13
18
  gem install textmood
14
19
 
@@ -17,27 +22,50 @@ If you’d like to get the bleeding-edge version:
17
22
  git clone https://github.com/stiang/textmood
18
23
 
19
24
  ### Usage
20
- TextMood can be used as a ruby library or as a standalone CLI tool.
25
+ TextMood can be used as a Ruby library or as a standalone CLI tool.
21
26
 
22
27
  #### Ruby library
23
- You can use textmood in a ruby program like this:
28
+ You can use it in a Ruby program like this:
24
29
  ```ruby
25
30
  require "textmood"
26
31
 
27
32
  # The :lang parameter makes TextMood use one of the bundled language sentiment files
28
- scorer = TextMood.new(lang: "en_US")
29
- score = scorer.score_text("some text")
33
+ tm = TextMood.new(lang: "en_US")
34
+ score = tm.analyze("some text")
30
35
  #=> '1.121'
31
36
 
32
37
  # The :files parameter makes TextMood ignore the bundled sentiment files and use the
33
38
  # specified files instead. You can specify as many files as you want.
34
- scorer = TextMood.new(files: ["en_US-mod1.txt", "emoticons.txt"])
39
+ tm = TextMood.new(files: ["en_US-mod1.txt", "emoticons.txt"])
40
+
41
+ # Using :normalize_output, you can make TextMood return a normalized value:
42
+ # 1 for positive, 0 for neutral and -1 for negative
43
+ tm = TextMood.new(lang: "en_US", normalize_output: true)
44
+ score = tm.analyze("some text")
45
+ #=> '1'
46
+
47
+ # :normalize_score will try to normalize the score to an integer between +/- 100,
48
+ # based on how many tokens were scored, which can be useful when trying to compare
49
+ # scores for texts of different length
50
+ tm = TextMood.new(lang: "en_US", normalize_score: true)
51
+ score = tm.analyze("some text")
52
+ #=> '14'
53
+
54
+ # :min_threshold and :max_threshold lets you customize the way :normalize_output
55
+ # treats different values. The options below will make all scores below 1 negative,
56
+ # 1-2 will be neutral, and above 2 will be positive.
57
+ tm = TextMood.new(lang: "en_US",
58
+ normalize_output: true,
59
+ min_threshold: 1,
60
+ max_threshold: 2)
61
+ score = tm.analyze("some text")
62
+ #=> '0'
35
63
 
36
64
  # TextMood will by default make one pass over the text, checking every word, but it
37
65
  # supports doing several passes for any range of word N-grams. Both the start and end
38
66
  # N-gram can be specified using the :start_ngram and :end_ngram options
39
- scorer = TextMood.new(lang: "en_US", debug: true, start_ngram: 2, end_ngram: 3)
40
- score = scorer.score_text("some long text with many words")
67
+ tm = TextMood.new(lang: "en_US", debug: true, start_ngram: 2, end_ngram: 3)
68
+ score = tm.analyze("some long text with many words")
41
69
  #(stdout): some long: 0.1
42
70
  #(stdout): long text: 0.1
43
71
  #(stdout): text with: -0.1
@@ -49,23 +77,10 @@ score = scorer.score_text("some long text with many words")
49
77
  #(stdout): with many words: 0.1
50
78
  #=> '0.1'
51
79
 
52
- # Using :normalize, you can make TextMood return a normalized value: 1 for positive,
53
- # 0 for neutral and -1 for negative
54
- scorer = TextMood.new(lang: "en_US", normalize: true)
55
- score = scorer.score_text("some text")
56
- #=> '1'
57
-
58
- # :min_threshold and :max_threshold lets you customize the way :normalize treats
59
- # different values. The options below will make all scores below 1 negative,
60
- # 1-2 will be neutral, and above 2 will be positive.
61
- scorer = TextMood.new(lang: "en_US", normalize: true, min_threshold: 1, max_threshold: 2)
62
- score = scorer.score_text("some text")
63
- #=> '0'
64
-
65
80
  # :debug prints out all tokens to stdout, alongs with their values (or 'nil' when the
66
81
  # token was not found)
67
- scorer = TextMood.new(lang: "en_US", debug: true)
68
- score = scorer.score_text("some text")
82
+ tm = TextMood.new(lang: "en_US", debug: true)
83
+ score = tm.analyze("some text")
69
84
  #(stdout): some: 0.1
70
85
  #(stdout): text: 0.1
71
86
  #(stdout): some text: -0.1
@@ -89,6 +104,8 @@ The cli tool has many useful options, mostly mirroring those of the library. Her
89
104
  output from `textmood -h`:
90
105
  ```
91
106
  Usage: textmood [options] "<text>"
107
+ OR
108
+ echo "<text>" | textmood [options]
92
109
 
93
110
  Returns a floating-point sentiment score of the provided text.
94
111
  Above 0 is considered positive, below is considered negative.
@@ -104,28 +121,32 @@ MANDATORY options:
104
121
  files will be loaded if this option is used.
105
122
 
106
123
  OPTIONAL options:
107
- --start-ngram INTEGER The lowest word N-gram number to split the text into
108
- (default 1). Note that this only makes sense if the
109
- sentiment file has tokens of similar N-gram length
124
+ -o, --normalize-output Return 1 (positive), -1 (negative) or 0 (neutral)
125
+ instead of the actual score. See also --min and --max.
110
126
 
111
- --end-ngram INTEGER The highest word N-gram number to to split the text into
112
- (default 1). Note that this only makes sense if the
113
- sentiment file has tokens of similar N-gram length
127
+ -s, --normalize-score Tries to normalize the score to an integer between +/- 100
128
+ according to the number of tokens that were scored, making
129
+ it more feasible to compare scores for texts of different
130
+ length
114
131
 
115
- -n, --normalize-output Return 1 (positive), -1 (negative) or 0 (neutral)
116
- instead of the actual score. See also --min and --max.
132
+ -i, --min-threshold FLOAT Scores lower than this are considered negative when
133
+ using --normalize-output (default 0.5). Note that the
134
+ threshold is compared to the normalized score, if applicable
117
135
 
118
- --normalize-score Return 1 (positive), -1 (negative) or 0 (neutral)
119
- instead of the actual score. See also --min and --max.
136
+ -x, --max-threshold FLOAT Scores higher than this are considered positive when
137
+ using --normalize-output (default 0.5). Note that the
138
+ threshold is compared to the normalized score, if applicable
120
139
 
121
- --min-threshold FLOAT Scores lower than this are considered negative when
122
- using --normalize (default -0.5)
140
+ -b, --start-ngram INTEGER The lowest word N-gram number to split the text into
141
+ (default 1). Note that this only makes sense if the
142
+ sentiment file has tokens of similar N-gram length
123
143
 
124
- --max-threshold FLOAT Scores higher than this are considered positive when
125
- using --normalize (default 0.5)
144
+ -e, --end-ngram INTEGER The highest word N-gram number to to split the text into
145
+ (default 1). Note that this only makes sense if the
146
+ sentiment file has tokens of similar N-gram length
126
147
 
127
- -s, --skip-symbols Do not include symbols file (emoticons etc.).
128
- Only applies when using -l/--language.
148
+ -k, --skip-symbols Do not include symbols file (emoticons etc.). Only applies
149
+ when using -l/--language.
129
150
 
130
151
  -d, --debug Prints out the score for each token in the provided text
131
152
  or 'nil' if the token was not found in the sentiment file
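
The README changes above describe scoring as splitting the text into tokens of N words and summing the score of each token found in the sentiment file. A minimal sketch of that idea follows. It is not the gem's implementation: the `SENTIMENT` table, the `sketch_score` helper, and the whitespace-based word splitting are all assumptions made for illustration.

```ruby
# Hypothetical sentiment table; the real gem loads its scores from sentiment files.
SENTIMENT = {
  "good" => 0.5,
  "bad" => -0.5,
  "not good" => -0.6
}.freeze

# Hypothetical helper: sums the scores of every N-word token for each N in
# start_ngram..end_ngram. Tokens missing from the table contribute 0.0.
# Assumes words are separated by whitespace, which may differ from the gem.
def sketch_score(text, start_ngram: 1, end_ngram: 1)
  words = text.downcase.split
  total = 0.0
  (start_ngram..end_ngram).each do |n|
    words.each_cons(n) do |gram|
      total += SENTIMENT.fetch(gram.join(" "), 0.0)
    end
  end
  total
end

puts sketch_score("not good")                              # single words only
puts sketch_score("not good", start_ngram: 2, end_ngram: 2) # two-word tokens only
```

With single words only, "not good" scores positive because only "good" matches. Once two-word tokens are included, the "not good" entry can pull the total negative, which is the accuracy gain the README attributes to multi-word tokens.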
data/bin/textmood CHANGED
@@ -63,7 +63,7 @@ opts_parser = OptionParser.new do |opts|
63
63
  opts.separator ""
64
64
  opts.on("-s", "--normalize-score", "Tries to normalize the score to an integer between +/- 100",
65
65
  "according to the number of tokens that were scored, making",
66
- "it more feasible to compare scores between texts of different",
66
+ "it more feasible to compare scores for texts of different",
67
67
  "length") do |ns|
68
68
  options[:normalize_score] = true
69
69
  end
@@ -111,8 +111,8 @@ end
111
111
  opts_parser.parse!
112
112
 
113
113
  def do_main(text, options)
114
- scorer = TextMood.new(options)
115
- puts scorer.score_text(text)
114
+ tm = TextMood.new(options)
115
+ puts tm.analyze(text)
116
116
  end
117
117
 
118
118
  if ARGV[0]
data/lib/textmood.rb CHANGED
@@ -37,7 +37,7 @@ class TextMood
37
37
  end
38
38
 
39
39
  # analyzes the sentiment of the provided text.
40
- def score_text(text)
40
+ def analyze(text)
41
41
  sentiment_total = 0.0
42
42
 
43
43
  scores_added = 0
@@ -67,6 +67,8 @@ class TextMood
67
67
  end
68
68
  end
69
69
 
70
+ alias_method :analyse, :analyze
71
+
70
72
  private
71
73
 
72
74
  def score_token(token)
@@ -115,8 +117,8 @@ class TextMood
115
117
  end
116
118
 
117
119
  def normalize_score(score, count)
118
- factor = NORMALIZE_TO / count
119
- (score * factor).to_i
120
+ factor = NORMALIZE_TO.to_f / count.to_f
121
+ (score * factor).round
120
122
  end
121
123
 
122
124
  end
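
The `normalize_score` change above replaces integer division and truncation with float division and rounding. The sketch below compares the two bodies side by side. It assumes `NORMALIZE_TO` is the integer 100 (inferred from the README's "+/- 100" wording; the constant's definition is not part of this diff) and that `count` is an integer, as `scores_added = 0` suggests. The `_old` and `_new` method names are illustrative only.

```ruby
# Assumption: the gem defines NORMALIZE_TO as the integer 100.
NORMALIZE_TO = 100

# Body from 0.0.5: Integer / Integer truncates, and to_i truncates again.
def normalize_score_old(score, count)
  factor = NORMALIZE_TO / count
  (score * factor).to_i
end

# Body from 0.0.6: float division keeps the fraction, round picks the nearest integer.
def normalize_score_new(score, count)
  factor = NORMALIZE_TO.to_f / count.to_f
  (score * factor).round
end

puts normalize_score_old(0.5, 3)   # factor is 33, prints 16
puts normalize_score_new(0.5, 3)   # factor is 33.33..., prints 17

# With more than NORMALIZE_TO tokens the old factor collapses to 0.
puts normalize_score_old(3.0, 150) # prints 0
puts normalize_score_new(3.0, 150) # prints 2
```

Under these assumptions, the old version returned 0 for any text with more than 100 scored tokens, regardless of its raw score.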
data/test/test.rb CHANGED
@@ -17,14 +17,14 @@ include Test::Unit::Assertions
17
17
  class TestScorer < Test::Unit::TestCase
18
18
 
19
19
  def setup
20
- @scorer = TextMood.new({:lang => "en_US"})
20
+ @tm = TextMood.new({:lang => "en_US"})
21
21
  end
22
22
 
23
23
  def test_negative
24
24
  max = -0.01
25
25
  texts = ["This is just terrible"]
26
26
  texts.each do |text|
27
- actual_score = @scorer.score_text(text)
27
+ actual_score = @tm.analyze(text)
28
28
  assert((actual_score < max), "actual: #{actual_score} >= max: #{max} for '#{text}'")
29
29
  end
30
30
  end
@@ -34,7 +34,7 @@ class TestScorer < Test::Unit::TestCase
34
34
  max = 0.5
35
35
  texts = ["This is neutral"]
36
36
  texts.each do |text, test_score|
37
- actual_score = @scorer.score_text(text)
37
+ actual_score = @tm.analyze(text)
38
38
  assert((actual_score < max and actual_score > min), "min: #{min} <= actual: #{actual_score} >= max: #{max} for '#{text}'")
39
39
  end
40
40
  end
@@ -43,7 +43,7 @@ class TestScorer < Test::Unit::TestCase
43
43
  min = 0.01
44
44
  texts = ["This is amazing!"]
45
45
  texts.each do |text, test_score|
46
- actual_score = @scorer.score_text(text)
46
+ actual_score = @tm.analyze(text)
47
47
  assert((actual_score >= min), "actual: #{actual_score} <= max: #{min} for '#{text}'")
48
48
  end
49
49
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: textmood
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.5
4
+ version: 0.0.6
5
5
  prerelease:
6
6
  platform: ruby
7
7
  authors: