NaiveText 0.5.1 → 0.6.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +4 -0
- data/Gemfile +5 -0
- data/NaiveText.gemspec +2 -3
- data/README.md +3 -26
- data/lib/NaiveText/CategoriesFactory.rb +2 -2
- data/lib/NaiveText/ExamplesFactory.rb +1 -1
- data/lib/NaiveText/ExamplesGroup.rb +5 -2
- data/lib/NaiveText/version.rb +1 -1
- metadata +3 -45
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 3b04e3a990ab60596a6e4067f3e6e6b7b762e9e7
+  data.tar.gz: 95cefeef5c2030e33c7290eecb848ec85e3a4d86
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: d4b7734d40ca51cb0af57485ca7312007ba2ef0982f471cd3d95c000e488ea1d526bc6a03a84d52cfc1eeb41a3dc0793c986e7d9be49424ead2811042f0b8ce5
+  data.tar.gz: aed39b603081561255c043fbd61d9de06e0e91a14a628e1b324589e8eb0f6d4d3428248b9e18c6f35bf79a21852f7a121256a8fa16530f0774960526eeab3deb
data/CHANGELOG.md
CHANGED
@@ -2,6 +2,10 @@
 All notable changes to this project will be documented in this file.
 This project adheres to [Semantic Versioning](http://semver.org/).
 
+## [0.6.0]- 2015-11-30
+### Added
+- Added optional language_model, that make it possible to compare words based on the word stem. (Like 'testing', 'tests', 'tested' all matched with the stem 'test')
+
 ## [0.5.1] - 2015-11-21
 ### Added
 - Added optional default category. This category will be returned from NaiveText.build if the algorithm can't find a match with the existing text examples. Default value is NullCategory.
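The changelog entry above describes the language model as something that maps words to their stem. A minimal sketch of what such a callable could look like is given below; the name `naive_stemmer` and the suffix list are assumptions for illustration only and are not part of the gem, which merely expects an object responding to `call`.

```ruby
# Hypothetical, deliberately crude stemmer (not shipped with NaiveText):
# it strips a few common English suffixes from the end of a word.
naive_stemmer = lambda do |word|
  word.sub(/(ing|ed|s)\z/, '')
end

# The three inflected forms from the changelog, plus the stem itself.
words = %w[testing tests tested test]
stems = words.map { |w| naive_stemmer.call(w.downcase) }

puts stems.inspect
```

A real application would more likely wrap a proper stemming library in the lambda; the regular expression here only serves to make the idea concrete.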
data/Gemfile
CHANGED
data/NaiveText.gemspec
CHANGED
@@ -19,13 +19,12 @@ Gem::Specification.new do |spec|
   spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
   spec.require_paths = ["lib"]
 
+  spec.required_ruby_version = '>= 2.0.0'
+
   if spec.respond_to?(:metadata)
     spec.metadata['allowed_push_host'] = "https://rubygems.org"
   end
 
   spec.add_development_dependency "bundler", "~> 1.8"
   spec.add_development_dependency "rake", "~> 10.0"
-  spec.add_development_dependency "guard"
-  spec.add_development_dependency "guard-rspec"
-  spec.add_development_dependency "guard-rubocop"
 end
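The gemspec hunk above introduces the constraint `'>= 2.0.0'` for the Ruby version. As a rough illustration of how RubyGems evaluates such a constraint, the sketch below checks a few version strings against it; the sample versions are arbitrary and chosen only for the example.

```ruby
require 'rubygems'

# The same requirement string that the gemspec now declares.
requirement = Gem::Requirement.new('>= 2.0.0')

# Arbitrary sample versions, picked for illustration.
%w[1.9.3 2.0.0 2.2.3].each do |version|
  accepted = requirement.satisfied_by?(Gem::Version.new(version))
  puts "Ruby #{version}: #{accepted ? 'accepted' : 'rejected'}"
end
```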
data/README.md
CHANGED
@@ -6,7 +6,7 @@ NaiveText is a text classifier gem written in ruby and made to be easily integra
 
 Text classifier are used in many areas of IT. The filter spam, predict what a user wants to buy, detect which language a text is written in, ...
 
-The kind of classifier included in NaiveText, uses existing text examples (junk-makrde e-mails,
+The kind of classifier included in NaiveText, uses existing text examples (junk-makrde e-mails, already bought products, texts in different languages, ...) to calculate in which category (spam/e-mail, interesting_product/not_interesting_product, ...) a unknown text belongs.
 
 ## Installation
 
@@ -31,32 +31,9 @@ You can also use local files as examples (via ExamplesFactory.from_files('path/t
 
 
 
-
+## Example
 
-
-
-We will build a system which predicts if a new post is interesting to the user or if this post will bore him a sleep.
-
-In your system (an rails app of course) you haven a *Post* model with a text attribute containing the posts content. There are also two scopes on Post: *up_voted* and *down_voted*, which return all up/down voted posts.
-
-```ruby
-require 'NaiveText'
-
-interesting_examples = Post.up_voted
-boring_examples = Post.down_voted
-
-categories = [{name: 'interesting', examples: interesting_examples},
-              {name: 'boring', examples: boring_examples}];
-
-classifier = NaiveText.build(categories: categories)
-
-category = classifier.classify(new_interesting_post.text)
-category.name
-=> 'interesting'
-```
-Checkout the full example and some more in the
-[NaiveText-example repo](https://github.com/RicciFlowing/NaiveText-examples).
-Have fun using it!
+Can be found on the projects [homepage](https://ricciflowing.github.io/NaiveText/).
 
 ## Contributing
 
data/lib/NaiveText/CategoriesFactory.rb
CHANGED
@@ -3,7 +3,7 @@ class CategoriesFactory
     categories = []
     default = nil
     if config.is_a?(Array)
-      puts "The format [{name: name_of_category, path: path_to_trainings_data}] is deprecated and will be removed in
+      puts "The format [{name: name_of_category, path: path_to_trainings_data}] is deprecated and will be removed in version 1.0.0 (due in Jan. 2016). Use the following arguments instead: categories: [name: 'the name', examples:'An example']"
       config.each do |category_config|
         begin
           examples = ExamplesFactory.from_files(category_config[:path])
@@ -20,7 +20,7 @@ class CategoriesFactory
     else
       config[:categories].each do |category_config|
         begin
-          group = ExamplesGroup.new(examples: category_config[:examples])
+          group = ExamplesGroup.new(examples: category_config[:examples], language_model: config[:language_model] )
           category = Category.new(name: category_config[:name], examples: group, weight: category_config[:weight])
           categories << category
           if category_config[:name] == config[:default]
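The two CategoriesFactory hunks above read the keys `:categories`, `:default` and `:language_model` from the configuration hash, and `:name` and `:examples` from each category. A configuration using these keys might look roughly like the sketch below. The example texts, the category names and the `stemmer` lambda are assumptions made for illustration, as is passing the examples as arrays of strings.

```ruby
# Hypothetical language model: strips a few English suffixes.
stemmer = lambda { |word| word.sub(/(ing|ed|s)\z/, '') }

# Non-deprecated hash format; the top-level key names are the ones
# the factory code in the diff looks up.
config = {
  categories: [
    { name: 'interesting', examples: ['Ruby tests are fun'] },
    { name: 'boring',      examples: ['Tax forms again'] }
  ],
  default: 'boring',        # matched against a category name by the factory
  language_model: stemmer   # handed on to every ExamplesGroup
}

puts config[:categories].map { |category| category[:name] }.inspect
```

With the gem installed, a hash like this would presumably be handed to `NaiveText.build`; that call is left out here so the snippet runs on its own.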
data/lib/NaiveText/ExamplesGroup.rb
CHANGED
@@ -1,6 +1,7 @@
 class ExamplesGroup
   def initialize(args)
-    @examples
+    @examples = args[:examples].to_a || []
+    @language_model = args[:language_model] || lambda {|str| str}
     load_text
     split_text_into_words
     format_words
@@ -10,7 +11,7 @@ class ExamplesGroup
   end
 
   def count(word)
-    @words.count(word.downcase)
+    @words.count(@language_model.call(word.downcase))
   end
 
   def word_count
@@ -32,5 +33,7 @@ class ExamplesGroup
 
   def format_words
     @words.map! {|word| word.downcase}
+    @words.map! {|word| @language_model.call(word)}
+    @words
   end
 end
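Taken together, the ExamplesGroup hunks above apply the language model twice: once to every stored word and once to the word being counted. The stand-alone sketch below mimics that behaviour. `MiniExamplesGroup` is a hypothetical stand-in, not the gem's class; the way it splits text into words is an assumption, since `load_text` and `split_text_into_words` are not shown in the diff.

```ruby
# Hypothetical miniature of ExamplesGroup, for illustration only.
class MiniExamplesGroup
  def initialize(args)
    @examples = args[:examples].to_a
    # Same fallback as in the diff: an identity lambda.
    @language_model = args[:language_model] || lambda { |str| str }
    # Assumed tokenisation: split on non-word characters.
    @words = @examples.flat_map { |text| text.to_s.split(/\W+/) }
                      .reject(&:empty?)
                      .map { |word| @language_model.call(word.downcase) }
  end

  # Normalise the query the same way as the stored words.
  def count(word)
    @words.count(@language_model.call(word.downcase))
  end
end

stemmer  = lambda { |word| word.sub(/(ing|ed|s)\z/, '') }
examples = ['Testing the tests', 'We tested it']

stemmed = MiniExamplesGroup.new(examples: examples, language_model: stemmer)
plain   = MiniExamplesGroup.new(examples: examples)

puts stemmed.count('Test')  # all three inflected forms share one stem
puts plain.count('test')    # without a language model nothing matches
```

One detail worth noting about the diff itself: `nil.to_a` already returns an empty array in Ruby, so the `|| []` fallback after `args[:examples].to_a` never takes effect.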
data/lib/NaiveText/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: NaiveText
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.6.0
 platform: ruby
 authors:
 - RicciFlowing
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2015-
+date: 2015-12-01 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler
@@ -38,48 +38,6 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: '10.0'
-- !ruby/object:Gem::Dependency
-  name: guard
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: '0'
-  type: :development
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: '0'
-- !ruby/object:Gem::Dependency
-  name: guard-rspec
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: '0'
-  type: :development
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: '0'
-- !ruby/object:Gem::Dependency
-  name: guard-rubocop
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: '0'
-  type: :development
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: '0'
 description: NaiveText is a text classifier gem written in ruby and made to be easily
   integratable in your Rails app.
 email:
@@ -124,7 +82,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
-      version:
+      version: 2.0.0
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="