huginn_naive_bayes_agent 0.1.3 → 0.1.4

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: c4bf23b64717fc62020b16dcc96b3a9767e71f42
- data.tar.gz: d3687649e9758f6f57e71478ec4edeb47c9d4c1e
+ metadata.gz: 7587d0f8882b59f15113b585cfb259bc14c8ca1e
+ data.tar.gz: be9c5416f658807eadff0e7deeeaae28a9260d65
  SHA512:
- metadata.gz: 97a60c81731ae167e119b8bb68852341f15d7d5e1d6dd03ce728662a9adcdbe42ca5b2886dcfa8ba9e3c4cfe9186fd6f621f916dc3a303cd7a24c29182bcaf27
- data.tar.gz: 86bc9dfe3906f16cac1851bc38257768ded8fbb7611a527c35946e19ac03939dd61e511e4056c44cc4171331be44f6c244fe13081c34fd2d18c37757c60c746b
+ metadata.gz: ab9ec982cc1bf2a5bc49001069b25bc10db65654c57edd591d8e6d446e802498505c499f78680b2b89afe6a16351792d5a06060e66b76cd205fb4dfc6f390bc0
+ data.tar.gz: bcd43b5588a809d542bbaefbfc962508bb90f98737f33e53673aea3cef56202e66f2643101c3fb3a427e60653b27138d8c04e11722e39cdfa2d507ce34ce1648
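A .gem file is a tar archive whose members include metadata.gz, data.tar.gz and checksums.yaml.gz. The hunk above records new SHA1 and SHA512 digests for the first two members. The following is a minimal sketch of checking extracted members against such a checksums file using Ruby's standard Digest and YAML libraries. It assumes the member bytes have already been pulled out of the .gem tarball. The helper names `compute_checksums` and `mismatched_members` are hypothetical.

```ruby
require 'digest'
require 'yaml'

# Compute SHA1 and SHA512 hex digests for each archive member.
# `members` maps a member name such as "metadata.gz" to its raw bytes
# (assumed to be already extracted from the .gem tar archive).
def compute_checksums(members)
  {
    'SHA1'   => members.transform_values { |bytes| Digest::SHA1.hexdigest(bytes) },
    'SHA512' => members.transform_values { |bytes| Digest::SHA512.hexdigest(bytes) }
  }
end

# Return the names of members whose computed digest differs from the
# digest recorded in a checksums.yaml document.
def mismatched_members(checksums_yaml, members)
  expected = YAML.safe_load(checksums_yaml)
  actual = compute_checksums(members)
  expected.flat_map do |algorithm, digests|
    digests.reject { |name, hex| actual.dig(algorithm, name) == hex }.keys
  end.uniq
end
```

An empty result means every recorded digest matched. Any name returned is a member whose bytes differ from the ones the release was published with.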
@@ -1,5 +1,6 @@
  require 'nbayes'
- require'yaml'
+ require 'yaml'
+ require 'fast_stemmer'
 
  module Agents
  class NaiveBayesAgent < Agent
@@ -18,7 +19,9 @@ module Agents
 
  However, if `nb_cats` is already populated, then the content from `nb_content` will be used as training data for the categories listed in `nb_cats`. For instance, say `nb_cats` consists of `trees`. Then `nb_content` will be used as training data for the category `trees`. The data is saved to the agent memory.
 
- Data in `nb_content` can be cleaned before classification. If `strip_punctuation` is set to true, the text in `nb_content` is stripped of punctuation before it is sent to the classifier. The changes are not saved to `nb_content`.
+ Data in `nb_content` can be cleaned before classification. If `strip_punctuation` is set to true, the text in `nb_content` is stripped of punctuation before it is sent to the classifier. The changes are not saved to `nb_content` but will affect the Agent's saved training data.
+
+ Content can also be "stemmed", reducing words to their base, by setting `stem` to true. Stemming will reduce "epistemology", "epistemologies", and "epistemological" all to "epistemolog". See [here](https://github.com/romanbsd/fast-stemmer) for the implementation used. Again, changes are not saved to `nb_content` but will affect the Agent's saved training data.
 
  When an event is received for classification, the Naive Bayes Agent will assign a value between 0 and 1 representing the likelihood that it falls under a category. The `min_value` option lets you choose the minimum threshold that must be reached before the event is labeled with that category. If `min_value` is set to 1, then the event is labeled with whichever category has the highest value.
 
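The clean-up steps described above can be sketched outside Huginn. The punctuation regex and the split/map/join stemming step are copied from the code in this diff. The `stemmer` parameter is an assumption: it stands in for fast-stemmer's `String#stem` so the sketch runs without that gem. `TOY_STEMMER` is a hypothetical suffix stripper, not the Porter algorithm that fast-stemmer implements, and it only handles the three example words.

```ruby
# Sketch of the agent's text preprocessing before training or classification.
# `stemmer` is a hypothetical callable (String -> String) replacing String#stem.
def preprocess(text, strip_punctuation: false, stem: false, stemmer: ->(word) { word })
  # Drop every character that is neither a word character nor whitespace.
  text = text.gsub(/[^[:word:]\s]/, '') if strip_punctuation
  # Stem word by word, then rejoin into a single string as the agent does.
  text = text.split(/\s+/).map { |word| stemmer.call(word) }.join(' ') if stem
  # The agent tokenizes on whitespace before calling the classifier.
  text.split(/\s+/)
end

# Toy suffix stripper (NOT Porter stemming); covers only the README examples.
TOY_STEMMER = ->(word) { word.sub(/(?:ical|ies|y)\z/, '') }

tokens = preprocess('epistemology, epistemologies; epistemological!',
                    strip_punctuation: true, stem: true, stemmer: TOY_STEMMER)
```

The same steps run on training content and on content to classify. As an inference from the code, not something the README states, enabling `stem` on an agent that already holds unstemmed training data would leave the stored tokens and the new tokens in different forms.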
@@ -26,7 +29,7 @@ module Agents
 
  To load trained data into an agent's memory, create a Manual Agent with `nb_cats : =loadYML` and `nb_content : your-well-formed-training-data-here`. Use the text input box, not the form view, by clicking "Toggle View" when inputting your training data else whitespace errors occur in the YML. Then submit this to your Naive Bayes Agent.
 
- #### Advanced Features
+ #### Advanced Naive Bayes Features
 
  ##### Only works if the nbayes dependency was installed from Github, version => .1.2. Rubygems is still .1.1
 
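The note above presumably means "nbayes 0.1.2 or newer". The following is a minimal sketch of that check using RubyGems' own version rules, which compare segments numerically and so rank 0.1.10 above 0.1.2. `advanced_nbayes_supported?` and `loaded_nbayes_version` are hypothetical helpers. `Gem.loaded_specs` only lists gems that RubyGems or Bundler has activated.

```ruby
require 'rubygems'

# True when the given nbayes version satisfies the ">= 0.1.2" requirement.
def advanced_nbayes_supported?(installed_version)
  Gem::Requirement.new('>= 0.1.2').satisfied_by?(Gem::Version.new(installed_version))
end

# Version string of the activated nbayes gem, or nil if it is not loaded.
def loaded_nbayes_version
  spec = Gem.loaded_specs['nbayes']
  spec && spec.version.to_s
end
```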
@@ -46,7 +49,8 @@ module Agents
  'min_value' => "0.5",
  'propagate_training_events' => 'true',
  'expected_update_period_in_days' => "7",
- 'strip_punctuation' => 'false'
+ 'strip_punctuation' => 'false',
+ 'stem' => 'false'
  }
  end
 
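The defaults store flags as the strings `'true'` and `'false'`, and the agent tests them with `== "true"`. The following is a minimal sketch of that check, assuming `interpolated` behaves like a plain Ruby Hash. `option_enabled?` is a hypothetical helper. The sketch shows that only the exact string `"true"` turns a feature on. It also shows that an agent whose options have no `'stem'` key, such as one saved before this release, keeps stemming off.

```ruby
# Mirrors the agent's `interpolated[key] == "true"` test.
# A missing key yields nil, which never equals "true".
def option_enabled?(options, key)
  options[key] == 'true'
end
```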
@@ -90,6 +94,9 @@ module Agents
  if interpolated['strip_punctuation'] == "true"
  nb_content = nb_content.gsub(/[^[:word:]\s]/, '') #https://stackoverflow.com/a/10074271
  end
+ if interpolated['stem'] == "true"
+ nb_content = nb_content.split(/\s+/).map{|word| word.stem}.join(" ")
+ end
  cats.each do |c|
  c.starts_with?('-') ? nbayes.untrain(nb_content.split(/\s+/), c[1..-1]) : nbayes.train(nb_content.split(/\s+/), c)
  end
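In the training loop above, a category with a leading `-` is untrained and any other category is trained. The following is a minimal sketch of that dispatch in plain Ruby. `ToyTokenCounts` is a hypothetical stand-in that only counts tokens per category. It is not the nbayes API, although its `train`/`untrain` methods take `(tokens, category)` in the same order as the calls in the diff. Plain Ruby spells the predicate `start_with?`; `starts_with?` is the ActiveSupport alias that a Rails app such as Huginn provides.

```ruby
# Hypothetical token counter standing in for the nbayes classifier.
class ToyTokenCounts
  attr_reader :counts

  def initialize
    @counts = Hash.new { |hash, category| hash[category] = Hash.new(0) }
  end

  def train(tokens, category)
    tokens.each { |token| @counts[category][token] += 1 }
  end

  def untrain(tokens, category)
    tokens.each do |token|
      @counts[category][token] -= 1 if @counts[category][token] > 0
    end
  end
end

# "-rocks" becomes [:untrain, "rocks"]; "trees" becomes [:train, "trees"].
def training_operations(categories)
  categories.map do |c|
    c.start_with?('-') ? [:untrain, c[1..-1]] : [:train, c]
  end
end

# Apply each operation to the model with the same token list.
def apply_training(model, tokens, categories)
  training_operations(categories).each do |operation, category|
    model.public_send(operation, tokens, category)
  end
end
```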
@@ -104,6 +111,9 @@ module Agents
  if interpolated['strip_punctuation'] == "true"
  nb_content = nb_content.gsub(/[^[:word:]\s]/, '') #https://stackoverflow.com/a/10074271
  end
+ if interpolated['stem'] == "true"
+ nb_content = nb_content.split(/\s+/).map{|word| word.stem}.join(" ")
+ end
  result = nbayes.classify(nb_content.split(/\s+/))
  if interpolated['min_value'].to_f == 1
  event.payload['nb_cats'] << (event.payload['nb_cats'].length == 0 ? result.max_class : " "+result.max_class)
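The `min_value` rule from the README can be sketched with a plain Hash of category to probability standing in for the classifier result. That stand-in is an assumption, since the nbayes result object is not shown here. With `min_value` equal to 1 only the top category is kept, matching the branch above. Otherwise every category at or above the threshold is kept; the ">=" is one reading of "must be reached". Labels are appended space-separated, as the agent does with `nb_cats`. Both helpers are hypothetical.

```ruby
# Pick category labels from a Hash of category => probability.
def label_categories(probabilities, min_value)
  return [] if probabilities.empty?

  threshold = min_value.to_f
  if threshold == 1
    [probabilities.max_by { |_category, p| p }.first]
  else
    probabilities.select { |_category, p| p >= threshold }.keys
  end
end

# Append labels to an existing space-separated nb_cats string.
def append_labels(existing, labels)
  ([existing.to_s] + labels).reject(&:empty?).join(' ')
end
```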
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: huginn_naive_bayes_agent
  version: !ruby/object:Gem::Version
- version: 0.1.3
+ version: 0.1.4
  platform: ruby
  authors:
  - Noah Greenstein
@@ -66,6 +66,20 @@ dependencies:
  - - ">="
  - !ruby/object:Gem::Version
  version: '0'
+ - !ruby/object:Gem::Dependency
+ name: fast-stemmer
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
+ type: :runtime
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
  description: The Huginn Naive Bayes agent uses some incoming Events as a training
  set for Naive Bayes Machine Learning. Then it classifies Events from other sources
  accordingly using tags. Acts as a Huginn Agent front end to the NBayes gem (https://github.com/oasic/nbayes).
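The gem's .gemspec file is not part of this diff. The following is a hypothetical fragment showing how an unconstrained runtime dependency is declared with RubyGems' `add_runtime_dependency`. A declaration like this yields the `fast-stemmer` entry with type `:runtime` and requirement `">= 0"` seen in the metadata above. Only the name, version and author are taken from the metadata.

```ruby
require 'rubygems'

# Hypothetical gemspec fragment; the real gemspec is not shown in the diff.
spec = Gem::Specification.new do |s|
  s.name    = 'huginn_naive_bayes_agent'
  s.version = '0.1.4'
  s.authors = ['Noah Greenstein']
  # No version constraint given, so RubyGems records the default ">= 0".
  s.add_runtime_dependency 'fast-stemmer'
end

stemmer_dependency = spec.runtime_dependencies.find { |dep| dep.name == 'fast-stemmer' }
```

The gem is published as `fast-stemmer` (hyphen), while the agent loads it with `require 'fast_stemmer'` (underscore), as both spellings in this diff show.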