jnb_classifier 0.4.1 → 0.4.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 471d53529cd3370211a12feaabc49e09afde67a3
- data.tar.gz: 87a9c51b9d20d7e4f1447ec16b61c87ea6a22079
+ metadata.gz: 99aaba58f7a9e3c702d91490ced2e45f1413e252
+ data.tar.gz: 7430e3ab70b3aa959c43e5a518e4323878b65378
  SHA512:
- metadata.gz: f8fec00adcc3ea7fd0136ecc9fadb1c8be81a61a590008bca7c0c77f7175211f0bacfa0fc1dac8d2ac256f7c8bf536bc107ffd764f4ddbf3d710bf8d7a33717f
- data.tar.gz: 5507ed1ed18e4e67c67ffe4d7837caee1ad055f1fc535e048d92bf91e101cc4fb44ff884af8b4ac3a8774b1278600e386d2c01fb2d3069a6195971ba8810a383
+ metadata.gz: b1365b424c12f58ede00c7c44060808daf52e2b2ea41f7ed74fbe7ea5aacb1b735c9d0126ecea9220b7618768d370407d6d72cc5195ac7924363486b33ff8d58
+ data.tar.gz: 34781f0bef1ab8d812eda44dbb3be5b33c1e37110dfe4a8dbbfa560ddbb7cbfd5781067c1eb6219c1e88c13206def3381cbd5ce8eb00d9604145a412ede6cc17
@@ -1,7 +1,7 @@
  PATH
  remote: .
  specs:
- jnb_classifier (0.4.1)
+ jnb_classifier (0.4.2)
  natto
 
  GEM
data/README.md CHANGED
@@ -1,12 +1,40 @@
  # JnbClassifier
 
- Welcome to your new gem! In this directory, you'll find the files you need to be able to package up your Ruby library into a gem. Put your Ruby code in the file `lib/jnb_classifier`. To experiment with that code, run `bin/console` for an interactive prompt.
+ Naive Bayes classifier for Japanese text.
+
+ ## Description
+ JnbClassifier helps you classify Japanese text.
+
+ JnbClassifier requires the morphological analyzer MeCab and a MeCab dictionary.
+
+ On Linux (Ubuntu), you can install MeCab and the dictionary as follows.
+
+ 1: MeCab
+ Download: https://drive.google.com/uc?export=download&id=0B4y35FiV1wh7cENtOXlicTFaRUE
+
+ ```bash
+ $ cd mecab-X.X
+ $ ./configure
+ $ make
+ $ make check
+ $ sudo make install
+ ```
+
+ 2: The dictionary for MeCab
+ Download: https://drive.google.com/uc?export=download&id=0B4y35FiV1wh7MWVlSDBCSXZMTXM
+
+ ```bash
+ $ tar zxfv mecab-ipadic-2.7.0-XXXX.tar.gz
+ $ cd mecab-ipadic-2.7.0-XXXX
+ $ ./configure --with-charset=utf8
+ $ make
+ $ sudo make install
+ ```
 
- TODO: Delete this and the text above, and describe your gem
 
  ## Installation
 
- Add this line to your application's Gemfile:
+ Add this line to your application's Gemfile
 
  ```ruby
  gem 'jnb_classifier'
@@ -22,7 +50,40 @@ Or install it yourself as:
 
  ## Usage
 
- TODO: Write usage instructions here
+ #### 1: Generate Document(label, attributes) from text
+ First, create documents. Each document takes two parameters: a category name (label) and the text.
+
+ ```ruby
+ document1 = JnbClassifier::Document.new('hayao', '私の住んでいる場所は風の谷ではありません。')
+ #=> @label="hayao", @attributes={"私"=>1, "場所"=>1, "風"=>1, "谷"=>1}
+ document2 = JnbClassifier::Document.new('hayao', '私は魔女の宅急便が大好きです')
+ #=> @label="hayao", @attributes={"私"=>1, "魔女"=>1, "宅急便"=>1, "大好き"=>1}
+ document3 = JnbClassifier::Document.new('hujiko', '僕、ドラえもん。')
+ #=> @label="hujiko", @attributes={"僕"=>1, "ドラえもん"=>1}
+ ```
+
+ #### 2: Learn documents
+ Your classifier learns from the documents you created.
+
+ The more appropriate samples you give it, the more accurate your classifier becomes.
+
+ ```ruby
+ nb = JnbClassifier::Classifier.new
+ nb.learn(document1)
+ nb.learn(document2)
+ nb.learn(document3)
+ ```
+
+ #### 3: Classify
+ When you pass an attributes hash to the `classify` method, it returns the category selected by Naive Bayes.
+
+ ```ruby
+ nb.classify({"私"=>1, "魔女"=>1, "宅急便"=>1})
+ #=> "hayao"
+ nb.result
+ #=> {"hayao"=>-16.72902719126572, "hujiko"=>-20.435583764494627} # the values are natural logarithms
+ ```
+
 
  ## Development
 
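The README notes that the values in `nb.result` are logarithms. The `Classifier` code in this diff builds them with `Math.log`, so they are natural-log, unnormalized naive Bayes scores. Here is a minimal sketch, assuming you want them as probabilities that sum to 1. The helper `log_scores_to_probabilities` is hypothetical and not part of jnb_classifier. It normalizes with the log-sum-exp trick, using the example scores from the README.

```ruby
# Hypothetical helper (not part of jnb_classifier): converts a hash of
# natural-log scores, like the one nb.result holds, into probabilities
# that sum to 1.
def log_scores_to_probabilities(log_scores)
  # Subtract the largest score before exponentiating (log-sum-exp trick)
  # so Math.exp does not underflow to 0.0 for very negative scores.
  max = log_scores.values.max
  exps = log_scores.transform_values { |v| Math.exp(v - max) }
  total = exps.values.sum
  exps.transform_values { |v| v / total }
end

scores = { "hayao" => -16.72902719126572, "hujiko" => -20.435583764494627 }
probs = log_scores_to_probabilities(scores)
probs.each { |label, prob| puts format("%-7s %.3f", label, prob) }
# hayao is about 0.976, hujiko about 0.024
```

Subtracting the maximum does not change the normalized result, because the common factor `Math.exp(-max)` cancels when dividing by `total`.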
@@ -0,0 +1,80 @@
+ # JnbClassifier
+
+ Naive Bayes classifier for Japanese text.
+
+ ## Description
+ JnbClassifier helps you classify Japanese text.
+
+ JnbClassifier requires the morphological analyzer MeCab and a MeCab dictionary.
+
+ On Linux (Ubuntu), you can install MeCab and the dictionary as follows.
+
+ 1: MeCab
+ Download: https://drive.google.com/uc?export=download&id=0B4y35FiV1wh7cENtOXlicTFaRUE
+
+ ```bash
+ $ cd mecab-X.X
+ $ ./configure
+ $ make
+ $ make check
+ $ sudo make install
+ ```
+
+ 2: The dictionary for MeCab
+ Download: https://drive.google.com/uc?export=download&id=0B4y35FiV1wh7MWVlSDBCSXZMTXM
+
+ ```bash
+ $ tar zxfv mecab-ipadic-2.7.0-XXXX.tar.gz
+ $ cd mecab-ipadic-2.7.0-XXXX
+ $ ./configure --with-charset=utf8
+ $ make
+ $ sudo make install
+ ```
+
+
+ ## Installation
+
+ Add this line to your application's Gemfile
+
+ ```ruby
+ gem 'jnb_classifier'
+ ```
+
+ And then execute:
+
+ $ bundle
+
+ Or install it yourself as:
+
+ $ gem install jnb_classifier
+
+ ## Usage
+
+ #### 1: generate hash from text
+
+ ```
+ document1 = JnbClassifier::Document.new('hayao','')
+
+ ```
+
+
+
+
+
+ ## Development
+
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
+
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
+
+ ## Contributing
+
+ Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/jnb_classifier. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct.
+
+ ## License
+
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
+
+ ## Code of Conduct
+
+ Everyone interacting in the JnbClassifier project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/[USERNAME]/jnb_classifier/blob/master/CODE_OF_CONDUCT.md).
@@ -7,8 +7,8 @@ module JnbClassifier
  attr_reader :result
 
  def initialize
- @frequency_table = Hash.new() # frequency table for each class
- @word_table = Hash.new() # word feature table
+ @frequency_table = Hash.new # frequency table for each class
+ @word_table = Hash.new # word feature table
  @label_count = Hash.new(0) # count by each label
  @total_count = 0 # total learned documents
  @result = Hash.new
@@ -36,13 +36,14 @@ module JnbClassifier
 
  # P(Label)
  @label_count.each{|label,freq|
- label_p[label] = Math.log(freq.to_f / @total_count.to_f)
+ label_p[label] = Math.log(freq.fdiv(@total_count))
  }
 
- # P(X|Label)
+ # P(X|Label)
  @frequency_table.each_key{|label|
+ deno = @label_count[label] + @word_table.size()
  @word_table.each_key{|word|
- laplace_word_p[label] += Math.log((@frequency_table[label][word] + 1).to_f / (@label_count[label] + @word_table.size()).to_f)
+ laplace_word_p[label] += Math.log( (@frequency_table[label][word] + 1).fdiv(deno) )
  }
  score[label] = laplace_word_p[label] + label_p[label]
  }
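The hunk above scores each label as log P(Label) plus a sum of add-one (Laplace) smoothed log P(word | Label) terms. The sketch below runs that idea in isolation as textbook multinomial naive Bayes with Laplace smoothing. `SketchNaiveBayes` is a hypothetical name, and the class is not the gem's code. It takes plain attribute hashes instead of `Document` objects, so it runs without MeCab. One difference matters: this sketch uses the multinomial denominator (total word occurrences under the label plus vocabulary size), whereas the hunk above sets `deno` to `@label_count[label] + @word_table.size()`.

```ruby
# Minimal multinomial naive Bayes with Laplace (add-one) smoothing.
# Illustrative sketch only; this is NOT the jnb_classifier source.
# Inputs are attribute hashes ({word => count}), the same shape that
# JnbClassifier::Document#attributes holds.
class SketchNaiveBayes
  attr_reader :result

  def initialize
    @doc_count  = Hash.new(0)                            # documents learned per label
    @word_count = Hash.new { |h, k| h[k] = Hash.new(0) } # label => {word => occurrences}
    @vocabulary = {}                                     # every word seen while learning
    @total_docs = 0
    @result     = {}
  end

  # Adds one labeled document to the counts.
  def learn(label, attributes)
    @doc_count[label] += 1
    @total_docs += 1
    attributes.each do |word, count|
      @word_count[label][word] += count
      @vocabulary[word] = true
    end
  end

  # Stores every label's log score in #result and returns the best label.
  def classify(attributes)
    @result = {}
    @doc_count.each_key do |label|
      # log P(label), estimated from document counts
      score = Math.log(@doc_count[label].fdiv(@total_docs))
      # Laplace denominator: word occurrences under this label + vocabulary size
      denominator = @word_count[label].values.sum + @vocabulary.size
      attributes.each do |word, count|
        # count * log P(word | label); the +1 keeps unseen words away from log(0)
        score += count * Math.log((@word_count[label][word] + 1).fdiv(denominator))
      end
      @result[label] = score
    end
    @result.max_by { |_label, value| value }.first
  end
end

nb = SketchNaiveBayes.new
nb.learn("hayao",  { "私" => 1, "場所" => 1, "風" => 1, "谷" => 1 })
nb.learn("hayao",  { "私" => 1, "魔女" => 1, "宅急便" => 1, "大好き" => 1 })
nb.learn("hujiko", { "僕" => 1, "ドラえもん" => 1 })

puts nb.classify({ "私" => 1, "魔女" => 1, "宅急便" => 1 }) # prints hayao
```

The training hashes are the attributes shown in the README's Usage section. Because the denominators differ, the log scores this sketch stores in `result` are not the ones the README lists for the gem, even though both pick `"hayao"` for this input.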
@@ -51,7 +52,7 @@ module JnbClassifier
  score.each{|label, value|
  @result[label] = value
  }
- score.max{ |x, y| x[1] <=> y[1] }
+ score.max_by{ |x| x[1] }
  end
  end
 
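The 0.4.2 changes to `Classifier` in this diff read as behavior-preserving rewrites:

- `Hash.new()` becomes `Hash.new`.
- `to_f / to_f` becomes `Integer#fdiv`.
- `max { |x, y| x[1] <=> y[1] }` becomes `max_by { |x| x[1] }`.

The script below checks each equivalence with core Ruby only. The sample score hash is made up for illustration. On a Hash, both `max` and `max_by` return a `[key, value]` pair rather than the key alone.

```ruby
# Core-Ruby checks that the 0.4.2 refactors keep the 0.4.1 behavior.

# Hash.new and Hash.new() are the same call: missing keys read as nil.
# Hash.new(0), used for @label_count, reads missing keys as 0, so += 1 works.
plain  = Hash.new
counts = Hash.new(0)
p plain["missing"]  # => nil
p counts["missing"] # => 0

# Integer#fdiv divides as floats, like the old freq.to_f / total.to_f.
freq, total = 2, 3
p freq.fdiv(total) == freq.to_f / total.to_f # => true

# On a Hash, max and max_by both see [key, value] pairs and return the
# winning pair. These scores are made up for illustration.
score = { "hayao" => -6.4, "hujiko" => -8.3 }
old_style = score.max { |x, y| x[1] <=> y[1] }
new_style = score.max_by { |x| x[1] }
p new_style              # => ["hayao", -6.4]
p old_style == new_style # => true
```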
@@ -60,23 +61,21 @@ module JnbClassifier
  attr_reader :label
  attr_reader :attributes
 
- def initialize(label, file_name)
+ def initialize(label,doc)
  @label = label # String
- @attributes = create_attributes(file_name) # Hash
+ @attributes = create_attributes(doc) # Hash
  end
 
- def create_attributes(file_name)
+ def create_attributes(doc)
  attributes = Hash.new(0)
- File.open(file_name) {|f|
- doc = f.read
- nm = Natto::MeCab.new
- nm.parse(doc) do |n|
- attributes[n.surface] += 1 if n.feature.match(/名詞/)
- end
- }
+ nm = Natto::MeCab.new
+ nm.parse(doc) do |n|
+ attributes[n.surface] += 1 if n.feature.match(/名詞/)
+ end
  attributes
  end
  end
 
  end
 
+
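In 0.4.2, `Document#create_attributes` receives the text itself instead of a file name. It counts every MeCab token whose feature string matches `名詞` (noun). The sketch below reproduces only that counting step, so it runs without MeCab or natto. `FakeNode` is a hypothetical stand-in that exposes the `surface` and `feature` readers the loop uses. The tokens are hand-written for illustration, not real MeCab output.

```ruby
# Runs the noun-counting loop from Document#create_attributes without MeCab.
# FakeNode is hypothetical: it models only the #surface and #feature readers
# that the loop calls on each node yielded by Natto::MeCab#parse.
FakeNode = Struct.new(:surface, :feature)

def count_nouns(nodes)
  attributes = Hash.new(0)
  nodes.each do |n|
    # Same filter as the gem: count the token if its feature mentions 名詞 (noun).
    attributes[n.surface] += 1 if n.feature.match(/名詞/)
  end
  attributes
end

# Hand-written tokens for "僕、ドラえもん。"; real MeCab feature strings
# are longer and depend on the installed dictionary.
nodes = [
  FakeNode.new("僕", "名詞,代名詞,一般"),
  FakeNode.new("、", "記号,読点"),
  FakeNode.new("ドラえもん", "名詞,固有名詞,人名"),
  FakeNode.new("。", "記号,句点")
]

counts = count_nouns(nodes) # returns {"僕"=>1, "ドラえもん"=>1}
p counts
```

The punctuation tokens are skipped because their features do not contain `名詞`. This matches the attributes the README shows for `document3`.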
@@ -1,3 +1,3 @@
  module JnbClassifier
- VERSION = "0.4.1"
+ VERSION = "0.4.2"
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: jnb_classifier
  version: !ruby/object:Gem::Version
- version: 0.4.1
+ version: 0.4.2
  platform: ruby
  authors:
  - chamao
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2017-11-03 00:00:00.000000000 Z
+ date: 2017-11-05 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: bundler
@@ -81,6 +81,7 @@ files:
  - Gemfile.lock
  - LICENSE.txt
  - README.md
+ - README.md.backup
  - Rakefile
  - bin/console
  - bin/setup