jnb_classifier 0.4.1 → 0.4.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 471d53529cd3370211a12feaabc49e09afde67a3
4
- data.tar.gz: 87a9c51b9d20d7e4f1447ec16b61c87ea6a22079
3
+ metadata.gz: 99aaba58f7a9e3c702d91490ced2e45f1413e252
4
+ data.tar.gz: 7430e3ab70b3aa959c43e5a518e4323878b65378
5
5
  SHA512:
6
- metadata.gz: f8fec00adcc3ea7fd0136ecc9fadb1c8be81a61a590008bca7c0c77f7175211f0bacfa0fc1dac8d2ac256f7c8bf536bc107ffd764f4ddbf3d710bf8d7a33717f
7
- data.tar.gz: 5507ed1ed18e4e67c67ffe4d7837caee1ad055f1fc535e048d92bf91e101cc4fb44ff884af8b4ac3a8774b1278600e386d2c01fb2d3069a6195971ba8810a383
6
+ metadata.gz: b1365b424c12f58ede00c7c44060808daf52e2b2ea41f7ed74fbe7ea5aacb1b735c9d0126ecea9220b7618768d370407d6d72cc5195ac7924363486b33ff8d58
7
+ data.tar.gz: 34781f0bef1ab8d812eda44dbb3be5b33c1e37110dfe4a8dbbfa560ddbb7cbfd5781067c1eb6219c1e88c13206def3381cbd5ce8eb00d9604145a412ede6cc17
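The hunk above only swaps the recorded digests of `metadata.gz` and `data.tar.gz`. As a minimal sketch of how such values could be recomputed for comparison, the following uses Ruby's standard `Digest` library; the helper name `gem_part_digests` is hypothetical, and the input string `"abc"` is a stand-in for the real file bytes.

```ruby
require "digest/sha1"
require "digest/sha2"

# Hypothetical helper (not part of RubyGems): returns the SHA1 and SHA512
# hex digests of a byte string, mirroring the two sections of checksums.yaml.
def gem_part_digests(bytes)
  {
    "SHA1"   => Digest::SHA1.hexdigest(bytes),
    "SHA512" => Digest::SHA512.hexdigest(bytes)
  }
end

# "abc" is placeholder input; a real check would pass the file contents.
digests = gem_part_digests("abc")
puts digests["SHA1"]          # 40 hex characters
puts digests["SHA512"].length # 128 hex characters
```

For an actual check, the bytes would come from something like `File.binread("metadata.gz")` after unpacking the `.gem` archive, and the result would be compared with the matching entry in `checksums.yaml`.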
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
1
1
  PATH
2
2
  remote: .
3
3
  specs:
4
- jnb_classifier (0.4.1)
4
+ jnb_classifier (0.4.2)
5
5
  natto
6
6
 
7
7
  GEM
data/README.md CHANGED
@@ -1,12 +1,40 @@
1
1
  # JnbClassifier
2
2
 
3
- Welcome to your new gem! In this directory, you'll find the files you need to be able to package up your Ruby library into a gem. Put your Ruby code in the file `lib/jnb_classifier`. To experiment with that code, run `bin/console` for an interactive prompt.
3
+ Naive Bayes classifier for Japanese text.
4
+
5
+ ## Description
6
+ JnbClassifier helps you classify Japanese text.
7
+
8
+ You need the morphological analysis tool MeCab and its dictionary to use JnbClassifier.
9
+
10
+ If you use Linux (Ubuntu), you can install MeCab and the dictionary as follows.
11
+
12
+ 1: MeCab
13
+ download site: https://drive.google.com/uc?export=download&id=0B4y35FiV1wh7cENtOXlicTFaRUE
14
+
15
+ ```bash
16
+ $ cd mecab-X.X
17
+ $ ./configure
18
+ $ make
19
+ $ make check
20
+ $ sudo make install
21
+ ```
22
+
23
+ 2: The dictionary for MeCab
24
+ download site: https://drive.google.com/uc?export=download&id=0B4y35FiV1wh7MWVlSDBCSXZMTXM
25
+
26
+ ```bash
27
+ $ tar zxfv mecab-ipadic-2.7.0-XXXX.tar.gz
28
+ $ cd mecab-ipadic-2.7.0-XXXX
29
+ $ ./configure --with-charset=utf8
30
+ $ make
31
+ $ sudo make install
32
+ ```
4
33
 
5
- TODO: Delete this and the text above, and describe your gem
6
34
 
7
35
  ## Installation
8
36
 
9
- Add this line to your application's Gemfile:
37
+ Add this line to your application's Gemfile
10
38
 
11
39
  ```ruby
12
40
  gem 'jnb_classifier'
@@ -22,7 +50,40 @@ Or install it yourself as:
22
50
 
23
51
  ## Usage
24
52
 
25
- TODO: Write usage instructions here
53
+ #### 1: Generate Document(label, attributes) from text
54
+ First, create documents. Each document takes two parameters: a category name and the text.
55
+
56
+ ```
57
+ document1 = JnbClassifier::Document.new('hayao','私の住んでいる場所は風の谷ではありません。')
58
+ #=> @label="hayao", @attributes={"私"=>1, "場所"=>1, "風"=>1, "谷"=>1}
59
+ document2 = JnbClassifier::Document.new('hayao','私は魔女の宅急便が大好きです')
60
+ #=> @label="hayao", @attributes={"私"=>1, "魔女"=>1, "宅急便"=>1, "大好き"=>1}
61
+ document3 = JnbClassifier::Document.new('hujiko','僕、ドラえもん。')
62
+ #=> @label="hujiko", @attributes={"僕"=>1, "ドラえもん"=>1}
63
+ ```
64
+
65
+ #### 2: Learn documents
66
+ Your classifier can learn from documents you created.
67
+
68
+ The more appropriate samples you give it, the smarter your classifier becomes.
69
+
70
+ ```
71
+ nb = JnbClassifier::Classifier.new
72
+ nb.learn(document1)
73
+ nb.learn(document2)
74
+ nb.learn(document3)
75
+ ```
76
+
77
+ #### 3: Classify
78
+ When you pass attributes (a hash), the classify method returns the category selected by Naive Bayes.
79
+
80
+ ```
81
+ nb.classify({"私"=>1, "魔女"=>1, "宅急便"=>1} )
82
+ #=> "hayao"
83
+ nb.result
84
+ #=> {"hayao"=>-16.72902719126572, "hujiko"=>-20.435583764494627} # the values are logarithms.
85
+ ```
86
+
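The `nb.result` values in the README are log scores, so they are hard to compare by eye. A minimal sketch, assuming they are unnormalized log posteriors (log prior plus summed log likelihoods, as the classifier hunk suggests), shows one way to turn them into probabilities. The helper `normalize_log_scores` is hypothetical and not part of the gem.

```ruby
# Log scores copied from the README example for nb.result.
log_scores = { "hayao" => -16.72902719126572, "hujiko" => -20.435583764494627 }

# Hypothetical helper (not part of the gem): convert log scores into
# probabilities that sum to 1. Subtracting the maximum first keeps
# Math.exp from underflowing when the scores are very negative.
def normalize_log_scores(log_scores)
  max = log_scores.values.max
  weights = log_scores.transform_values { |v| Math.exp(v - max) }
  total = weights.values.sum
  weights.transform_values { |w| w / total }
end

probabilities = normalize_log_scores(log_scores)
probabilities.each { |label, prob| puts format("%s: %.4f", label, prob) }
```

The ranking is unchanged by this step; it only rescales the scores so that the gap between the two labels is easier to read.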
26
87
 
27
88
  ## Development
28
89
 
data/README.md.backup ADDED
@@ -0,0 +1,80 @@
1
+ # JnbClassifier
2
+
3
+ Naive Bayes classifier for Japanese text.
4
+
5
+ ## Description
6
+ JnbClassifier helps you classify Japanese text.
7
+
8
+ You need the morphological analysis tool MeCab and its dictionary to use JnbClassifier.
9
+
10
+ If you use Linux (Ubuntu), you can install MeCab and the dictionary as follows.
11
+
12
+ 1: MeCab
13
+ download site: https://drive.google.com/uc?export=download&id=0B4y35FiV1wh7cENtOXlicTFaRUE
14
+
15
+ ```bash
16
+ $ cd mecab-X.X
17
+ $ ./configure
18
+ $ make
19
+ $ make check
20
+ $ sudo make install
21
+ ```
22
+
23
+ 2: The dictionary for MeCab
24
+ download site: https://drive.google.com/uc?export=download&id=0B4y35FiV1wh7MWVlSDBCSXZMTXM
25
+
26
+ ```bash
27
+ $ tar zxfv mecab-ipadic-2.7.0-XXXX.tar.gz
28
+ $ cd mecab-ipadic-2.7.0-XXXX
29
+ $ ./configure --with-charset=utf8
30
+ $ make
31
+ $ sudo make install
32
+ ```
33
+
34
+
35
+ ## Installation
36
+
37
+ Add this line to your application's Gemfile
38
+
39
+ ```ruby
40
+ gem 'jnb_classifier'
41
+ ```
42
+
43
+ And then execute:
44
+
45
+ $ bundle
46
+
47
+ Or install it yourself as:
48
+
49
+ $ gem install jnb_classifier
50
+
51
+ ## Usage
52
+
53
+ #### 1: generate hash from text
54
+
55
+ ```
56
+ document1 = JnbClassifier::Document.new('hayao','')
57
+
58
+ ```
59
+
60
+
61
+
62
+
63
+
64
+ ## Development
65
+
66
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
67
+
68
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
69
+
70
+ ## Contributing
71
+
72
+ Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/jnb_classifier. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct.
73
+
74
+ ## License
75
+
76
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
77
+
78
+ ## Code of Conduct
79
+
80
+ Everyone interacting in the JnbClassifier project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/[USERNAME]/jnb_classifier/blob/master/CODE_OF_CONDUCT.md).
@@ -7,8 +7,8 @@ module JnbClassifier
7
7
  attr_reader :result
8
8
 
9
9
  def initialize
10
- @frequency_table = Hash.new() # frequency table for each class
11
- @word_table = Hash.new() # word feature table
10
+ @frequency_table = Hash.new # frequency table for each class
11
+ @word_table = Hash.new # word feature table
12
12
  @label_count = Hash.new(0) # count by each label
13
13
  @total_count = 0 # total learned documents
14
14
  @result = Hash.new
@@ -36,13 +36,14 @@ module JnbClassifier
36
36
 
37
37
  # P(Label)
38
38
  @label_count.each{|label,freq|
39
- label_p[label] = Math.log(freq.to_f / @total_count.to_f)
39
+ label_p[label] = Math.log(freq.fdiv(@total_count))
40
40
  }
41
41
 
42
- # P(X|Label)
42
+ # P(X|Label)
43
43
  @frequency_table.each_key{|label|
44
+ deno = @label_count[label] + @word_table.size()
44
45
  @word_table.each_key{|word|
45
- laplace_word_p[label] += Math.log((@frequency_table[label][word] + 1).to_f / (@label_count[label] + @word_table.size()).to_f)
46
+ laplace_word_p[label] += Math.log( (@frequency_table[label][word] + 1).fdiv(deno) )
46
47
  }
47
48
  score[label] = laplace_word_p[label] + label_p[label]
48
49
  }
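The hunk above replaces `to_f` division with `Integer#fdiv` and hoists the smoothing denominator into `deno`. The following is a standalone sketch of that scoring step on made-up toy tables. The local names mirror the instance variables in the diff, but the method `laplace_log_scores`, the data, and the choice to score an explicit list of words are assumptions for illustration, not the gem's actual `classify` method.

```ruby
# Toy tables; the counts are invented for this sketch.
frequency_table = {
  "hayao"  => Hash.new(0).update("私" => 2, "魔女" => 1),
  "hujiko" => Hash.new(0).update("僕" => 1, "ドラえもん" => 1)
}
word_table  = { "私" => 1, "魔女" => 1, "僕" => 1, "ドラえもん" => 1 }
label_count = { "hayao" => 2, "hujiko" => 1 }
total_count = 3

# Hypothetical helper: log P(label) plus add-one (Laplace) smoothed
# log P(word | label) for each word, using the same denominator as the diff.
def laplace_log_scores(frequency_table, word_table, label_count, total_count, words)
  frequency_table.each_key.each_with_object({}) do |label, scores|
    prior = Math.log(label_count[label].fdiv(total_count))
    deno  = label_count[label] + word_table.size
    likelihood = words.sum(0.0) do |word|
      Math.log((frequency_table[label][word] + 1).fdiv(deno))
    end
    scores[label] = prior + likelihood
  end
end

scores = laplace_log_scores(frequency_table, word_table, label_count, total_count, ["魔女"])
scores.each { |label, value| puts "#{label}: #{value}" }
```

`fdiv` always performs floating-point division, so `2.fdiv(6)` gives the same result as `2.to_f / 6.to_f` without the two explicit conversions.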
@@ -51,7 +52,7 @@ module JnbClassifier
51
52
  score.each{|label, value|
52
53
  @result[label] = value
53
54
  }
54
- score.max{ |x, y| x[1] <=> y[1] }
55
+ score.max_by{ |x| x[1] }
55
56
  end
56
57
  end
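The change from `max` with a comparison block to `max_by` does not alter the result. A short sketch using the scores from the README example shows that both forms return the same `[label, value]` pair when called on a Hash; the variable names here are illustrative only.

```ruby
# Scores copied from the README example for nb.result.
score = { "hayao" => -16.72902719126572, "hujiko" => -20.435583764494627 }

# Old form: compare the value of each [label, value] pair explicitly.
old_style = score.max { |x, y| x[1] <=> y[1] }

# New form: name the sort key once and let max_by do the comparison.
new_style = score.max_by { |x| x[1] }

p new_style # prints ["hayao", -16.72902719126572]

# The pair can be destructured if only the label is needed.
label, value = new_style
puts label
```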
57
58
 
@@ -60,23 +61,21 @@ module JnbClassifier
60
61
  attr_reader :label
61
62
  attr_reader :attributes
62
63
 
63
- def initialize(label, file_name)
64
+ def initialize(label,doc)
64
65
  @label = label # String
65
- @attributes = create_attributes(file_name) # Hash
66
+ @attributes = create_attributes(doc) # Hash
66
67
  end
67
68
 
68
- def create_attributes(file_name)
69
+ def create_attributes(doc)
69
70
  attributes = Hash.new(0)
70
- File.open(file_name) {|f|
71
- doc = f.read
72
- nm = Natto::MeCab.new
73
- nm.parse(doc) do |n|
74
- attributes[n.surface] += 1 if n.feature.match(/名詞/)
75
- end
76
- }
71
+ nm = Natto::MeCab.new
72
+ nm.parse(doc) do |n|
73
+ attributes[n.surface] += 1 if n.feature.match(/名詞/)
74
+ end
77
75
  attributes
78
76
  end
79
77
  end
80
78
 
81
79
  end
82
80
 
81
+
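After this change, `create_attributes` takes the text itself instead of a file name and counts the nouns that MeCab reports. The counting step can be sketched without MeCab installed by feeding it hand-written nodes. The `Node` struct, the `count_nouns` helper, and the simplified feature strings below are all assumptions for illustration; they are not real natto output.

```ruby
# Hypothetical stand-in for the nodes yielded by Natto::MeCab#parse;
# only the two fields the diff uses are modelled.
Node = Struct.new(:surface, :feature)

# Same counting logic as create_attributes in the diff, applied to a
# prepared list of nodes instead of live MeCab output.
def count_nouns(nodes)
  attributes = Hash.new(0)
  nodes.each do |n|
    attributes[n.surface] += 1 if n.feature.match?(/名詞/)
  end
  attributes
end

# Hand-written, simplified feature strings (not actual MeCab output).
nodes = [
  Node.new("僕", "名詞,代名詞"),
  Node.new("、", "記号,読点"),
  Node.new("ドラえもん", "名詞,固有名詞"),
  Node.new("。", "記号,句点")
]

attributes = count_nouns(nodes)
p attributes
```

Because the hash is created with `Hash.new(0)`, looking up a word that never appeared returns 0 rather than `nil`, which is what lets the classifier add 1 to a missing count during smoothing.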
@@ -1,3 +1,3 @@
1
1
  module JnbClassifier
2
- VERSION = "0.4.1"
2
+ VERSION = "0.4.2"
3
3
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: jnb_classifier
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.4.1
4
+ version: 0.4.2
5
5
  platform: ruby
6
6
  authors:
7
7
  - chamao
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2017-11-03 00:00:00.000000000 Z
11
+ date: 2017-11-05 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: bundler
@@ -81,6 +81,7 @@ files:
81
81
  - Gemfile.lock
82
82
  - LICENSE.txt
83
83
  - README.md
84
+ - README.md.backup
84
85
  - Rakefile
85
86
  - bin/console
86
87
  - bin/setup