NaiveText 0.5.1 → 0.6.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +4 -0
- data/Gemfile +5 -0
- data/NaiveText.gemspec +2 -3
- data/README.md +3 -26
- data/lib/NaiveText/CategoriesFactory.rb +2 -2
- data/lib/NaiveText/ExamplesFactory.rb +1 -1
- data/lib/NaiveText/ExamplesGroup.rb +5 -2
- data/lib/NaiveText/version.rb +1 -1
- metadata +3 -45
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 3b04e3a990ab60596a6e4067f3e6e6b7b762e9e7
+data.tar.gz: 95cefeef5c2030e33c7290eecb848ec85e3a4d86
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: d4b7734d40ca51cb0af57485ca7312007ba2ef0982f471cd3d95c000e488ea1d526bc6a03a84d52cfc1eeb41a3dc0793c986e7d9be49424ead2811042f0b8ce5
+data.tar.gz: aed39b603081561255c043fbd61d9de06e0e91a14a628e1b324589e8eb0f6d4d3428248b9e18c6f35bf79a21852f7a121256a8fa16530f0774960526eeab3deb
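The SHA1 and SHA512 entries in checksums.yaml are hex digests of the gem's metadata.gz and data.tar.gz members. As a minimal sketch, digests in the same two algorithms can be computed with Ruby's standard `digest` library. The helper name `digests_for` is hypothetical and not part of RubyGems, and the input string is a placeholder for real archive bytes.

```ruby
require 'digest'

# Hypothetical helper (not a RubyGems API): returns hex digests in the
# two algorithms that checksums.yaml records.
def digests_for(bytes)
  {
    'SHA1'   => Digest::SHA1.hexdigest(bytes),
    'SHA512' => Digest::SHA512.hexdigest(bytes)
  }
end

# In practice the input would be the raw bytes of metadata.gz or
# data.tar.gz taken from the .gem archive, e.g. via File.binread(path).
sums = digests_for('example bytes')
puts sums['SHA1'].length    # 40 hex characters
puts sums['SHA512'].length  # 128 hex characters
```

Comparing the computed strings against the `+` lines of the hunk would confirm that a downloaded archive matches the published 0.6.0 release.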
data/CHANGELOG.md
CHANGED
@@ -2,6 +2,10 @@
 All notable changes to this project will be documented in this file.
 This project adheres to [Semantic Versioning](http://semver.org/).

+## [0.6.0]- 2015-11-30
+### Added
+- Added optional language_model, that make it possible to compare words based on the word stem. (Like 'testing', 'tests', 'tested' all matched with the stem 'test')
+
 ## [0.5.1] - 2015-11-21
 ### Added
 - Added optional default category. This category will be returned from NaiveText.build if the algorithm can't find a match with the existing text examples. Default value is NullCategory.
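The changelog entry describes the language model only in words. Judging from the code changes in this release, it is a callable that maps a word to its stem. The following is a minimal sketch of such a callable; the `SUFFIXES` list and the name `naive_stemmer` are illustrative assumptions, and the suffix stripping is deliberately crude, not the stemming a real application would use.

```ruby
# Deliberately naive suffix stripper, for illustration only. A real
# application would more likely plug in a proper stemming library.
SUFFIXES = %w[ing ed s].freeze

naive_stemmer = lambda do |word|
  # Only strip a suffix when at least three characters remain.
  suffix = SUFFIXES.find { |s| word.end_with?(s) && word.length > s.length + 2 }
  suffix ? word[0...-suffix.length] : word
end

%w[testing tests tested test].each do |word|
  puts "#{word} -> #{naive_stemmer.call(word)}"
end
```

All four inputs map to `test`, which is the behaviour the changelog example describes.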
data/Gemfile
CHANGED
data/NaiveText.gemspec
CHANGED
@@ -19,13 +19,12 @@ Gem::Specification.new do |spec|
 spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
 spec.require_paths = ["lib"]

+spec.required_ruby_version = '>= 2.0.0'
+
 if spec.respond_to?(:metadata)
 spec.metadata['allowed_push_host'] = "https://rubygems.org"
 end

 spec.add_development_dependency "bundler", "~> 1.8"
 spec.add_development_dependency "rake", "~> 10.0"
-spec.add_development_dependency "guard"
-spec.add_development_dependency "guard-rspec"
-spec.add_development_dependency "guard-rubocop"
 end
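The new `required_ruby_version` constraint uses the same requirement syntax as gem dependencies. As a small sketch of how a `'>= 2.0.0'` requirement evaluates, the snippet below uses RubyGems' own `Gem::Requirement` and `Gem::Version` classes; the three version strings are arbitrary examples, not values taken from the gem.

```ruby
require 'rubygems'

# The constraint string is the one added to the gemspec in 0.6.0.
requirement = Gem::Requirement.new('>= 2.0.0')

# Arbitrary sample Ruby versions to check against the constraint.
%w[1.9.3 2.0.0 2.2.3].each do |version|
  ok = requirement.satisfied_by?(Gem::Version.new(version))
  puts "Ruby #{version}: #{ok ? 'allowed' : 'rejected'}"
end
```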
data/README.md
CHANGED
@@ -6,7 +6,7 @@ NaiveText is a text classifier gem written in ruby and made to be easily integra

 Text classifier are used in many areas of IT. The filter spam, predict what a user wants to buy, detect which language a text is written in, ...

-The kind of classifier included in NaiveText, uses existing text examples (junk-makrde e-mails,
+The kind of classifier included in NaiveText, uses existing text examples (junk-makrde e-mails, already bought products, texts in different languages, ...) to calculate in which category (spam/e-mail, interesting_product/not_interesting_product, ...) a unknown text belongs.

 ## Installation

@@ -31,32 +31,9 @@ You can also use local files as examples (via ExamplesFactory.from_files('path/t



-
+## Example

-
-
-We will build a system which predicts if a new post is interesting to the user or if this post will bore him a sleep.
-
-In your system (an rails app of course) you haven a *Post* model with a text attribute containing the posts content. There are also two scopes on Post: *up_voted* and *down_voted*, which return all up/down voted posts.
-
-```ruby
-require 'NaiveText'
-
-interesting_examples = Post.up_voted
-boring_examples = Post.down_voted
-
-categories = [{name: 'interesting', examples: interesting_examples},
-{name: 'boring', examples: boring_examples}];
-
-classifier = NaiveText.build(categories: categories)
-
-category = classifier.classify(new_interesting_post.text)
-category.name
-=> 'interesting'
-```
-Checkout the full example and some more in the
-[NaiveText-example repo](https://github.com/RicciFlowing/NaiveText-examples).
-Have fun using it!
+Can be found on the projects [homepage](https://ricciflowing.github.io/NaiveText/).

 ## Contributing

data/lib/NaiveText/CategoriesFactory.rb
CHANGED
@@ -3,7 +3,7 @@ class CategoriesFactory
 categories = []
 default = nil
 if config.is_a?(Array)
-puts "The format [{name: name_of_category, path: path_to_trainings_data}] is deprecated and will be removed in
+puts "The format [{name: name_of_category, path: path_to_trainings_data}] is deprecated and will be removed in version 1.0.0 (due in Jan. 2016). Use the following arguments instead: categories: [name: 'the name', examples:'An example']"
 config.each do |category_config|
 begin
 examples = ExamplesFactory.from_files(category_config[:path])
@@ -20,7 +20,7 @@ class CategoriesFactory
 else
 config[:categories].each do |category_config|
 begin
-group = ExamplesGroup.new(examples: category_config[:examples])
+group = ExamplesGroup.new(examples: category_config[:examples], language_model: config[:language_model] )
 category = Category.new(name: category_config[:name], examples: group, weight: category_config[:weight])
 categories << category
 if category_config[:name] == config[:default]
data/lib/NaiveText/ExamplesGroup.rb
CHANGED
@@ -1,6 +1,7 @@
 class ExamplesGroup
 def initialize(args)
-@examples
+@examples = args[:examples].to_a || []
+@language_model = args[:language_model] || lambda {|str| str}
 load_text
 split_text_into_words
 format_words
@@ -10,7 +11,7 @@ class ExamplesGroup
 end

 def count(word)
-@words.count(word.downcase)
+@words.count(@language_model.call(word.downcase))
 end

 def word_count
@@ -32,5 +33,7 @@ class ExamplesGroup

 def format_words
 @words.map! {|word| word.downcase}
+@words.map! {|word| @language_model.call(word)}
+@words
 end
 end
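The ExamplesGroup hunks apply the language model in two places: once to every stored word and once to the word being counted, so both sides are compared in stemmed form. The sketch below illustrates that idea with a hypothetical stand-in class, `MiniExamplesGroup`. It is not the gem's implementation: it assumes examples are plain strings and splits them on non-word characters, whereas the real `load_text` and `split_text_into_words` are not shown in the diff. The `strip_s` lambda is likewise an invented toy model.

```ruby
# Hypothetical stand-in for ExamplesGroup, reduced to the parts the diff
# touches. Assumes examples are plain strings.
class MiniExamplesGroup
  def initialize(args)
    @examples = args[:examples].to_a
    # Fall back to the identity function when no language model is given.
    @language_model = args[:language_model] || lambda { |str| str }
    @words = @examples.flat_map { |text| text.split(/\W+/) }.reject(&:empty?)
    # Normalize stored words: downcase first, then apply the language model.
    @words.map! { |word| @language_model.call(word.downcase) }
  end

  # Normalize the query word the same way before counting matches.
  def count(word)
    @words.count(@language_model.call(word.downcase))
  end
end

# Toy language model: strips one trailing "s".
strip_s = lambda { |word| word.sub(/s\z/, '') }

examples = ['Tests pass', 'one test fails']
plain    = MiniExamplesGroup.new(examples: examples)
stemmed  = MiniExamplesGroup.new(examples: examples, language_model: strip_s)

puts plain.count('test')     # 1
puts stemmed.count('tests')  # 2
```

Without a language model only the literal word `test` matches. With the toy model, `Tests` and `test` both reduce to `test`, so the query `tests` counts two occurrences.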
data/lib/NaiveText/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: NaiveText
 version: !ruby/object:Gem::Version
-version: 0.
+version: 0.6.0
 platform: ruby
 authors:
 - RicciFlowing
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2015-
+date: 2015-12-01 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: bundler
@@ -38,48 +38,6 @@ dependencies:
 - - "~>"
 - !ruby/object:Gem::Version
 version: '10.0'
-- !ruby/object:Gem::Dependency
-name: guard
-requirement: !ruby/object:Gem::Requirement
-requirements:
-- - ">="
-- !ruby/object:Gem::Version
-version: '0'
-type: :development
-prerelease: false
-version_requirements: !ruby/object:Gem::Requirement
-requirements:
-- - ">="
-- !ruby/object:Gem::Version
-version: '0'
-- !ruby/object:Gem::Dependency
-name: guard-rspec
-requirement: !ruby/object:Gem::Requirement
-requirements:
-- - ">="
-- !ruby/object:Gem::Version
-version: '0'
-type: :development
-prerelease: false
-version_requirements: !ruby/object:Gem::Requirement
-requirements:
-- - ">="
-- !ruby/object:Gem::Version
-version: '0'
-- !ruby/object:Gem::Dependency
-name: guard-rubocop
-requirement: !ruby/object:Gem::Requirement
-requirements:
-- - ">="
-- !ruby/object:Gem::Version
-version: '0'
-type: :development
-prerelease: false
-version_requirements: !ruby/object:Gem::Requirement
-requirements:
-- - ">="
-- !ruby/object:Gem::Version
-version: '0'
 description: NaiveText is a text classifier gem written in ruby and made to be easily
 integratable in your Rails app.
 email:
@@ -124,7 +82,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
 requirements:
 - - ">="
 - !ruby/object:Gem::Version
-version:
+version: 2.0.0
 required_rubygems_version: !ruby/object:Gem::Requirement
 requirements:
 - - ">="