NaiveText 0.5.1 → 0.6.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: ec519a11d1de21e62b71bb470ef0097ed62910f8
- data.tar.gz: bc96077d8ed6fa7890692466a2954425a8719540
+ metadata.gz: 3b04e3a990ab60596a6e4067f3e6e6b7b762e9e7
+ data.tar.gz: 95cefeef5c2030e33c7290eecb848ec85e3a4d86
  SHA512:
- metadata.gz: 15446a47f72ed08af4c32924f5b7b246878bdc184f21a9616259a346c010baba9f25ca426a22f60bb29ff5d9e94b9cd311c41221fdb728542d794fa7777f5c01
- data.tar.gz: bfb9e832ddd6a8f3f7aebc8cc47c949f7aac93d1a718d6db461c827e247575c937ddf4ae23be3b89d6d2c0617b93d8a5f5ed1a830fc59b1f440c0bbd770e1e3b
+ metadata.gz: d4b7734d40ca51cb0af57485ca7312007ba2ef0982f471cd3d95c000e488ea1d526bc6a03a84d52cfc1eeb41a3dc0793c986e7d9be49424ead2811042f0b8ce5
+ data.tar.gz: aed39b603081561255c043fbd61d9de06e0e91a14a628e1b324589e8eb0f6d4d3428248b9e18c6f35bf79a21852f7a121256a8fa16530f0774960526eeab3deb
data/CHANGELOG.md CHANGED
@@ -2,6 +2,10 @@
  All notable changes to this project will be documented in this file.
  This project adheres to [Semantic Versioning](http://semver.org/).

+ ## [0.6.0]- 2015-11-30
+ ### Added
+ - Added optional language_model, that make it possible to compare words based on the word stem. (Like 'testing', 'tests', 'tested' all matched with the stem 'test')
+
  ## [0.5.1] - 2015-11-21
  ### Added
  - Added optional default category. This category will be returned from NaiveText.build if the algorithm can't find a match with the existing text examples. Default value is NullCategory.
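Reviewer note on the 0.6.0 entry: judging by the code hunks further down, the language_model is any object that responds to `call`, takes a word, and returns its normalized form. A minimal sketch of such a callable follows. The name `toy_stemmer` and the suffix list are assumptions made for illustration; NaiveText does not ship this stemmer, and a real project would likely plug in a proper stemming library instead.

```ruby
# Hypothetical toy stemmer, NOT part of NaiveText.
# It strips a few common English suffixes from the end of a word.
toy_stemmer = lambda do |word|
  word.sub(/(ing|ed|s)\z/, '')
end

# The three words from the changelog entry, plus the bare stem itself.
stems = %w[testing tests tested test].map { |w| toy_stemmer.call(w) }
puts stems.uniq.inspect
```

All four inputs reduce to the same stem, which is what lets the classifier treat them as one word.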
data/Gemfile CHANGED
@@ -2,3 +2,8 @@ source 'https://rubygems.org'

  # Specify your gem's dependencies in NaiveText.gemspec
  gemspec
+
+
+ spec.add_development_dependency "guard"
+ spec.add_development_dependency "guard-rspec"
+ spec.add_development_dependency "guard-rubocop"
data/NaiveText.gemspec CHANGED
@@ -19,13 +19,12 @@ Gem::Specification.new do |spec|
  spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
  spec.require_paths = ["lib"]

+ spec.required_ruby_version = '>= 2.0.0'
+
  if spec.respond_to?(:metadata)
  spec.metadata['allowed_push_host'] = "https://rubygems.org"
  end

  spec.add_development_dependency "bundler", "~> 1.8"
  spec.add_development_dependency "rake", "~> 10.0"
- spec.add_development_dependency "guard"
- spec.add_development_dependency "guard-rspec"
- spec.add_development_dependency "guard-rubocop"
  end
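Reviewer note on the new `required_ruby_version` line: RubyGems compares the interpreter's version against this requirement when the gem is installed. The comparison itself can be sketched with RubyGems' own `Gem::Requirement` and `Gem::Version` classes. The constant `REQUIRED_RUBY` and the helper `supported_ruby?` are hypothetical names used only for this illustration.

```ruby
require 'rubygems'

# Same constraint string as the gemspec line added in this hunk.
REQUIRED_RUBY = Gem::Requirement.new('>= 2.0.0')

# Hypothetical helper: true when the given Ruby version satisfies the constraint.
def supported_ruby?(version_string)
  REQUIRED_RUBY.satisfied_by?(Gem::Version.new(version_string))
end

puts supported_ruby?('1.9.3')      # an older interpreter is rejected
puts supported_ruby?('2.0.0')      # the lower bound itself is accepted
puts supported_ruby?(RUBY_VERSION) # the interpreter running this sketch
```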
data/README.md CHANGED
@@ -6,7 +6,7 @@ NaiveText is a text classifier gem written in ruby and made to be easily integra

  Text classifier are used in many areas of IT. The filter spam, predict what a user wants to buy, detect which language a text is written in, ...

- The kind of classifier included in NaiveText, uses existing text examples (junk-makrde e-mails, allready bought products, texts in different languages, ...) to calculate in which category (spam/e-mail, interesting_product/not_interesting_product, ...) a unknown text belongs.
+ The kind of classifier included in NaiveText, uses existing text examples (junk-makrde e-mails, already bought products, texts in different languages, ...) to calculate in which category (spam/e-mail, interesting_product/not_interesting_product, ...) a unknown text belongs.

  ## Installation

@@ -31,32 +31,9 @@ You can also use local files as examples (via ExamplesFactory.from_files('path/t



- ### Example
+ ## Example

- Lets pretend you write some kind of forum. A user can write posts and can vote them up or down.
-
- We will build a system which predicts if a new post is interesting to the user or if this post will bore him a sleep.
-
- In your system (an rails app of course) you haven a *Post* model with a text attribute containing the posts content. There are also two scopes on Post: *up_voted* and *down_voted*, which return all up/down voted posts.
-
- ```ruby
- require 'NaiveText'
-
- interesting_examples = Post.up_voted
- boring_examples = Post.down_voted
-
- categories = [{name: 'interesting', examples: interesting_examples},
- {name: 'boring', examples: boring_examples}];
-
- classifier = NaiveText.build(categories: categories)
-
- category = classifier.classify(new_interesting_post.text)
- category.name
- => 'interesting'
- ```
- Checkout the full example and some more in the
- [NaiveText-example repo](https://github.com/RicciFlowing/NaiveText-examples).
- Have fun using it!
+ Can be found on the projects [homepage](https://ricciflowing.github.io/NaiveText/).

  ## Contributing

@@ -3,7 +3,7 @@ class CategoriesFactory
  categories = []
  default = nil
  if config.is_a?(Array)
- puts "The format [{name: name_of_category, path: path_to_trainings_data}] is deprecated and will be removed in future versions. Use the following arguments instead: categories: [name: 'the name', examples:'An example']"
+ puts "The format [{name: name_of_category, path: path_to_trainings_data}] is deprecated and will be removed in version 1.0.0 (due in Jan. 2016). Use the following arguments instead: categories: [name: 'the name', examples:'An example']"
  config.each do |category_config|
  begin
  examples = ExamplesFactory.from_files(category_config[:path])
@@ -20,7 +20,7 @@ class CategoriesFactory
  else
  config[:categories].each do |category_config|
  begin
- group = ExamplesGroup.new(examples: category_config[:examples])
+ group = ExamplesGroup.new(examples: category_config[:examples], language_model: config[:language_model] )
  category = Category.new(name: category_config[:name], examples: group, weight: category_config[:weight])
  categories << category
  if category_config[:name] == config[:default]
@@ -7,7 +7,7 @@ class ExamplesFactory
  examples.push FileExample.new(path: dir_path+'/'+file_path)
  end
  rescue
- puts "Failed laoding" + dir_path
+ puts "Failed loading" + dir_path
  end
  examples
  end
@@ -1,6 +1,7 @@
  class ExamplesGroup
  def initialize(args)
- @examples = args[:examples].to_a || []
+ @examples = args[:examples].to_a || []
+ @language_model = args[:language_model] || lambda {|str| str}
  load_text
  split_text_into_words
  format_words
@@ -10,7 +11,7 @@ class ExamplesGroup
  end

  def count(word)
- @words.count(word.downcase)
+ @words.count(@language_model.call(word.downcase))
  end

  def word_count
@@ -32,5 +33,7 @@ class ExamplesGroup

  def format_words
  @words.map! {|word| word.downcase}
+ @words.map! {|word| @language_model.call(word)}
+ @words
  end
  end
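Reviewer note on the ExamplesGroup hunks: the language model is applied twice, once to every stored word in `format_words` and once to the query word in `count`, so both sides of the comparison are normalized the same way. When no model is given, the identity lambda keeps the old exact-match behavior. The sketch below reproduces only that mechanism. `MiniGroup`, the `/\W+/` word splitting, and the inline `stem` lambda are assumptions made for illustration; the real class loads and splits its examples in methods this diff does not show.

```ruby
# Hypothetical stand-in for ExamplesGroup, reduced to the lines this diff touches.
class MiniGroup
  def initialize(args)
    # Assumption: examples are plain strings, split on non-word characters.
    @words = args[:examples].to_a.flat_map { |text| text.split(/\W+/) }
    # Same default as the diff: an identity lambda when no model is given.
    @language_model = args[:language_model] || lambda { |str| str }
    # Mirrors format_words: downcase, then normalize through the model.
    @words.map! { |word| @language_model.call(word.downcase) }
  end

  # Mirrors count: the query word goes through the same normalization.
  def count(word)
    @words.count(@language_model.call(word.downcase))
  end
end

# Toy suffix stripper, for illustration only.
stem = lambda { |w| w.sub(/(ing|ed|s)\z/, '') }
examples = ['Testing the tests', 'We tested it']

plain   = MiniGroup.new(examples: examples)
stemmed = MiniGroup.new(examples: examples, language_model: stem)

puts plain.count('test')   # exact match only
puts stemmed.count('test') # 'testing', 'tests' and 'tested' all count
```

With these two example strings, the plain group finds no occurrence of 'test', while the stemmed group counts three.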
@@ -1,3 +1,3 @@
  module NaiveText
- VERSION = "0.5.1"
+ VERSION = "0.6.0"
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: NaiveText
  version: !ruby/object:Gem::Version
- version: 0.5.1
+ version: 0.6.0
  platform: ruby
  authors:
  - RicciFlowing
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2015-11-21 00:00:00.000000000 Z
+ date: 2015-12-01 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: bundler
@@ -38,48 +38,6 @@ dependencies:
  - - "~>"
  - !ruby/object:Gem::Version
  version: '10.0'
- - !ruby/object:Gem::Dependency
- name: guard
- requirement: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: '0'
- type: :development
- prerelease: false
- version_requirements: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: '0'
- - !ruby/object:Gem::Dependency
- name: guard-rspec
- requirement: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: '0'
- type: :development
- prerelease: false
- version_requirements: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: '0'
- - !ruby/object:Gem::Dependency
- name: guard-rubocop
- requirement: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: '0'
- type: :development
- prerelease: false
- version_requirements: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: '0'
  description: NaiveText is a text classifier gem written in ruby and made to be easily
  integratable in your Rails app.
  email:
@@ -124,7 +82,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
- version: '0'
+ version: 2.0.0
  required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="