huginn_naive_bayes_agent 0.1.3 → 0.1.4
- checksums.yaml +4 -4
- data/lib/huginn_naive_bayes_agent/naive_bayes_agent.rb +14 -4
- metadata +15 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 7587d0f8882b59f15113b585cfb259bc14c8ca1e
+data.tar.gz: be9c5416f658807eadff0e7deeeaae28a9260d65
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: ab9ec982cc1bf2a5bc49001069b25bc10db65654c57edd591d8e6d446e802498505c499f78680b2b89afe6a16351792d5a06060e66b76cd205fb4dfc6f390bc0
+data.tar.gz: bcd43b5588a809d542bbaefbfc962508bb90f98737f33e53673aea3cef56202e66f2643101c3fb3a427e60653b27138d8c04e11722e39cdfa2d507ce34ce1648
data/lib/huginn_naive_bayes_agent/naive_bayes_agent.rb
CHANGED
@@ -1,5 +1,6 @@
 require 'nbayes'
-require'yaml'
+require 'yaml'
+require 'fast_stemmer'
 
 module Agents
 class NaiveBayesAgent < Agent
@@ -18,7 +19,9 @@ module Agents
 
 However, if `nb_cats` is already populated, then the content from `nb_content` will be used as training data for the categories listed in `nb_cats`. For instance, say `nb_cats` consists of `trees`. Then `nb_content` will be used as training data for the category `trees`. The data is saved to the agent memory.
 
-Data in `nb_content` can be cleaned before classification. If `strip_punctuation` is set to true, the text in `nb_content` is stripped of punctuation before it is sent to the classifier. The changes are not saved to `nb_content
+Data in `nb_content` can be cleaned before classification. If `strip_punctuation` is set to true, the text in `nb_content` is stripped of punctuation before it is sent to the classifier. The changes are not saved to `nb_content` but will affect the Agent's saved training data.
+
+Content can also be "stemmed", reducing words to their base, by setting `stem` to true. Stemming will reduce "epistemology", "epistemologies", and "epistemological" all to "epistemolog". See [here](https://github.com/romanbsd/fast-stemmer) for the implementation used. Again, changes are not saved to `nb_content` but will affect the Agent's saved training data.
 
 When an event is received for classification, the Naive Bayes Agent will assign a value between 0 and 1 representing the likelihood that it falls under a category. The `min_value` option lets you choose the minimum threshold that must be reached before the event is labeled with that category. If `min_value` is set to 1, then the event is labeled with whichever category has the highest value.
 
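Reviewer note: the `min_value` rule described in the hunk above can be sketched in plain Ruby as follows. This is only an illustration under stated assumptions: `categories_for` and the `scores` hash are hypothetical names, not part of the gem, and the below-threshold branch is inferred from the description text because the diff shows only the `min_value == 1` case.

```ruby
# Hypothetical helper, not part of the gem.
# `scores` maps a category name to a value between 0 and 1.
def categories_for(scores, min_value)
  return [] if scores.empty?

  threshold = min_value.to_f
  # Special case from the description: min_value of 1 picks the single best category.
  return [scores.max_by { |_category, value| value }.first] if threshold == 1

  # Assumed behaviour otherwise: keep every category that reaches the threshold.
  scores.select { |_category, value| value >= threshold }.keys
end

scores = { 'trees' => 0.72, 'rocks' => 0.21, 'rivers' => 0.07 }
p categories_for(scores, '0.5')  # => ["trees"]
p categories_for(scores, '1')    # => ["trees"]
p categories_for(scores, '0.05') # => ["trees", "rocks", "rivers"]
```

The option arrives as a string (the default is `"0.5"`), which is why the sketch converts it with `to_f` before comparing.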
@@ -26,7 +29,7 @@ module Agents
 
 To load trained data into an agent's memory, create a Manual Agent with `nb_cats : =loadYML` and `nb_content : your-well-formed-training-data-here`. Use the text input box, not the form view, by clicking "Toggle View" when inputting your training data else whitespace errors occur in the YML. Then submit this to your Naive Bayes Agent.
 
-#### Advanced Features
+#### Advanced Naive Bayes Features
 
 ##### Only works if the nbayes dependency was installed from Github, version => .1.2. Rubygems is still .1.1
 
@@ -46,7 +49,8 @@ module Agents
 'min_value' => "0.5",
 'propagate_training_events' => 'true',
 'expected_update_period_in_days' => "7",
-'strip_punctuation' => 'false'
+'strip_punctuation' => 'false',
+'stem' => 'false'
 }
 end
 
@@ -90,6 +94,9 @@ module Agents
 if interpolated['strip_punctuation'] == "true"
 nb_content = nb_content.gsub(/[^[:word:]\s]/, '') #https://stackoverflow.com/a/10074271
 end
+if interpolated['stem'] == "true"
+nb_content = nb_content.split(/\s+/).map{|word| word.stem}.join(" ")
+end
 cats.each do |c|
 c.starts_with?('-') ? nbayes.untrain(nb_content.split(/\s+/), c[1..-1]) : nbayes.train(nb_content.split(/\s+/), c)
 end
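Reviewer note: the `cats.each` loop in the hunk above treats a category name with a leading `-` as a request to untrain. A minimal sketch of that dispatch, assuming a toy stand-in for the classifier so it runs without the nbayes gem: `ToyClassifier` and `apply_training` are hypothetical names, and the token counting is only a placeholder, not how NBayes stores its data. The sketch uses `start_with?` from core Ruby; `starts_with?` in the diff comes from ActiveSupport.

```ruby
# Hypothetical stand-in for the classifier; it only counts tokens per category.
class ToyClassifier
  attr_reader :counts

  def initialize
    @counts = Hash.new { |hash, category| hash[category] = Hash.new(0) }
  end

  def train(tokens, category)
    tokens.each { |token| @counts[category][token] += 1 }
  end

  def untrain(tokens, category)
    # Never let a count drop below zero.
    tokens.each { |token| @counts[category][token] -= 1 if @counts[category][token] > 0 }
  end
end

# Mirrors the loop in the hunk: "-trees" untrains "trees", "trees" trains it.
def apply_training(classifier, nb_content, cats)
  tokens = nb_content.split(/\s+/)
  cats.each do |c|
    if c.start_with?('-')
      classifier.untrain(tokens, c[1..-1])
    else
      classifier.train(tokens, c)
    end
  end
end

classifier = ToyClassifier.new
apply_training(classifier, 'oak pine oak', ['trees'])
apply_training(classifier, 'oak', ['-trees'])
p classifier.counts['trees'] # => {"oak"=>1, "pine"=>1}
```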
@@ -104,6 +111,9 @@ module Agents
 if interpolated['strip_punctuation'] == "true"
 nb_content = nb_content.gsub(/[^[:word:]\s]/, '') #https://stackoverflow.com/a/10074271
 end
+if interpolated['stem'] == "true"
+nb_content = nb_content.split(/\s+/).map{|word| word.stem}.join(" ")
+end
 result = nbayes.classify(nb_content.split(/\s+/))
 if interpolated['min_value'].to_f == 1
 event.payload['nb_cats'] << (event.payload['nb_cats'].length == 0 ? result.max_class : " "+result.max_class)
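Reviewer note: the training and classification hunks repeat the same two cleaning steps. One possible way to share them is sketched below. `clean_nb_content` and its keyword arguments are hypothetical and not part of the gem. The `stemmer:` argument exists only so the sketch runs without fast_stemmer installed; when it is omitted, the sketch calls `String#stem` exactly as the diff does, which assumes fast_stemmer has been loaded.

```ruby
# Hypothetical helper; name and keyword arguments are illustrative only.
def clean_nb_content(text, strip_punctuation: false, stem: false, stemmer: nil)
  # Same regex as the diff: drop anything that is not a word character or whitespace.
  text = text.gsub(/[^[:word:]\s]/, '') if strip_punctuation
  return text unless stem

  # Default to String#stem, which is only defined once fast_stemmer is required.
  stemmer ||= ->(word) { word.stem }
  text.split(/\s+/).map { |word| stemmer.call(word) }.join(' ')
end

# Toy stand-in so the example runs without the gem; this is NOT the Porter algorithm.
toy_stemmer = ->(word) { word.sub(/(ies|ical|y)\z/, '') }

puts clean_nb_content('Epistemology, epistemologies!',
                      strip_punctuation: true, stem: true, stemmer: toy_stemmer)
```

Both call sites could then pass `interpolated['strip_punctuation'] == "true"` and `interpolated['stem'] == "true"` to the same helper, which would keep training and classification input consistent.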
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: huginn_naive_bayes_agent
 version: !ruby/object:Gem::Version
-version: 0.1.
+version: 0.1.4
 platform: ruby
 authors:
 - Noah Greenstein
@@ -66,6 +66,20 @@ dependencies:
 - - ">="
 - !ruby/object:Gem::Version
 version: '0'
+- !ruby/object:Gem::Dependency
+name: fast-stemmer
+requirement: !ruby/object:Gem::Requirement
+requirements:
+- - ">="
+- !ruby/object:Gem::Version
+version: '0'
+type: :runtime
+prerelease: false
+version_requirements: !ruby/object:Gem::Requirement
+requirements:
+- - ">="
+- !ruby/object:Gem::Version
+version: '0'
 description: The Huginn Naive Bayes agent uses some incoming Events as a training
 set for Naive Bayes Machine Learning. Then it classifies Events from other sources
 accordingly using tags. Acts as a Huginn Agent front end to the NBayes gem (https://github.com/oasic/nbayes).