tiny-classifier 1.5 → 2.0

Files changed (11)

checksums.yaml +4 -4
data/README.md +12 -12
data/bin/tc-retrain +20 -0
data/lib/tiny-classifier/base.rb +10 -10
data/lib/tiny-classifier/classifier-generator.rb +2 -2
data/lib/tiny-classifier/classifier.rb +2 -2
data/lib/tiny-classifier/retrainer.rb +46 -0
data/lib/tiny-classifier/trainer.rb +14 -14
data/lib/tiny-classifier/untrainer.rb +3 -3
data/tiny-classifier.gemspec +1 -1
metadata +4 -1

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 4c41c82316eee9b5ec0f08c1d9671b2d277fd782
-  data.tar.gz: 438e6265907f3b04d3ba51d7b340faa895fccc9d
+  metadata.gz: ff0e4cd3aafc37e4ba99b782d3b7816c6bbb5f9f
+  data.tar.gz: d9b99162fe1711f3f7cdaaf3791f1df9d75df14f
 SHA512:
-  metadata.gz: 4688090925478c53dc9e2378ad59925f28e74f162a5305ffc50bb044f0ab9b4ca07e4b2ea46caedb5353e7884a3983355a4539c4a8e4ae31a9d64ecd4f6ac3d4
-  data.tar.gz: 6f849698470a95ecd3e2a58bfb0dcdb668f4caa14924955e249f14aa4626531b3a6f39e832e3bec2d23432de7b9226303491268ca8666b46f1373d6de77e17a9
+  metadata.gz: 933a1773de70ee773863a643556ec4dfedff6433ee3385b4ce8f3283efa1959e25027fae1712f410c39d70f72b74a06b231299d5041a12276afe8c771f948047
+  data.tar.gz: 93602fc9af2bda18c52beaefd980b912aa438b95526c549d965fde192c3f3061fbb1ebffaf13de221ec92ae57e7cf25b06adb3c0c83c90f16520aa3208037cab

data/README.md CHANGED

@@ -27,34 +27,34 @@ This is example on Ubuntu.
 Training:
 ```
-% echo "Hello, world!"        | tc-train --labels=positive,negative positive
-% echo "I'm very very happy!" | tc-train --labels=positive,negative positive
-% echo "I'm so bad..."        | tc-train --labels=positive,negative negative
-% echo "Oh my god!"           | tc-train --labels=positive,negative negative
+% echo "Hello, world!"        | tc-train --categories=positive,negative positive
+% echo "I'm very very happy!" | tc-train --categories=positive,negative positive
+% echo "I'm so bad..."        | tc-train --categories=positive,negative negative
+% echo "Oh my god!"           | tc-train --categories=positive,negative negative
 ```
-The training data will be saved as `tc.negative-positive.dat` (`tc.` is the fixed prefix, `.dat` is the fixed suffix. The middle part is filled by given labels automatically.) in the current directory. If you hope the file to be saved in any different place, please specify `--base-dir=/path/to/data/directory`.
+The training data will be saved as `tc.negative-positive.dat` (`tc.` is the fixed prefix, `.dat` is the fixed suffix. The middle part is filled by given categories automatically.) in the current directory. If you hope the file to be saved in any different place, please specify `--base-dir=/path/to/data/directory`.
 Untraining for mistakes:
 ```
-% echo "I'm so bad..." | tc-untrain --labels=positive,negative positive
+% echo "I'm so bad..." | tc-untrain --categories=positive,negative positive
 ```
 Testing to classify:
 ~~~
-% echo "Happy day?" | tc-classify --labels=positive,negative
+% echo "Happy day?" | tc-classify --categories=positive,negative
 positive
 ~~~
 If you think that the classifier has been enoughly trained, then you can generate a fixed classifier:
 ~~~
-% tc-generate-classifier --labels=positive,negative --output-dir=/path/to/dir
+% tc-generate-classifier --categories=positive,negative --output-dir=/path/to/dir
 ~~~
-Then a fixed classifier (executable Ruby script) will be generated as `tc-classify-negative-positive` (`tc-classify-` is the fixed prefix, rest is filled by given labels automatically.)
+Then a fixed classifier (executable Ruby script) will be generated as `tc-classify-negative-positive` (`tc-classify-` is the fixed prefix, rest is filled by given categories automatically.)
 ~~~
 % ls /path/to/dir/
@@ -67,8 +67,8 @@ positive
 ### Common
-`-l`, `--labels=LABELS` (required)
-:  A comman-separated list of labels. You should use only alphabetic characters. (Non-alphabetical characters will cause problems.)
+`-l`, `--categories=CATEGORIES` (required)
+:  A comman-separated list of categories. You should use only alphabetic characters. (Non-alphabetical characters will cause problems.)
 `-d`, `--data-dir=PATH` (optional)
 : The path to the directory that the training data to be saved. The current directory is the default value.
@@ -78,7 +78,7 @@ positive
 ### `tc-train` and `tc-untrain` specific parameters
-Both `tc-train` and `tc-untrain` require one command line argument: the label. You need to specify one of labels given via the `--labels` parameter.
+Both `tc-train` and `tc-untrain` require one command line argument: the category. You need to specify one of categories given via the `--categories` parameter.
 ### `tc-generate-classifier` specific parameters

data/bin/tc-retrain ADDED

@@ -0,0 +1,20 @@
+#!/usr/bin/env ruby
+# Copyright (C) 2017 YUKI "Piro" Hiroshi
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+require "tiny-classifier/retrainer"
+TinyClassifier::Retrainer.run

data/lib/tiny-classifier/base.rb CHANGED

@@ -48,9 +48,9 @@ module TinyClassifier
         @data_dir = data_dir
       end
-      parser.on("-l LABELS", "--labels=LABELS",
-                "List of labels (comma-separated)") do |labels|
-        @labels = normalize_labels(labels)
+      parser.on("-c CATEGORIES", "--categories=CATEGORIES",
+                "List of categories (comma-separated)") do |categories|
+        @categories = normalize_categories(categories)
       end
       parser.on("-t TOKENIZER", "--tokenizer=TOKENIZER",
@@ -61,14 +61,14 @@ module TinyClassifier
       parser
     end
-    def normalize_labels(labels)
-      labels
+    def normalize_categories(categories)
+      categories
         .strip
         .downcase
         .split(",")
         .collect(&:strip)
-        .reject do |label|
-          label.empty?
+        .reject do |category|
+          category.empty?
         end
         .sort
         .collect(&:capitalize)
@@ -79,8 +79,8 @@ module TinyClassifier
     end
     def prepare_data_file_name
-      labels = @labels.join("-").downcase
-      "tc.#{labels}.dat"
+      categories = @categories.join("-").downcase
+      "tc.#{categories}.dat"
     end
     def data_file_path
@@ -97,7 +97,7 @@ module TinyClassifier
         data = File.read(data_file_path.to_s)
         Marshal.load(data)
       else
-        ClassifierReborn::Bayes.new(*@labels)
+        ClassifierReborn::Bayes.new(*@categories)
       end
     end
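
The renamed helpers in this file decide both the in-memory category names and the name of the data file. The following is a minimal stand-alone sketch of that behavior, not the gem's code: it assumes nothing further is chained after `.collect(&:capitalize)` (the hunk is cut off at that point), and `data_file_name` is an illustrative free function that takes the list as an argument instead of reading `@categories`.

```ruby
# Stand-alone sketch of the normalization shown in the hunk above.
# Assumption: the chain ends at `.collect(&:capitalize)`.
def normalize_categories(categories)
  categories
    .strip
    .downcase
    .split(",")
    .collect(&:strip)
    .reject(&:empty?)
    .sort
    .collect(&:capitalize)
end

# Mirrors prepare_data_file_name; the argument replaces @categories.
def data_file_name(categories)
  "tc.#{categories.join("-").downcase}.dat"
end

categories = normalize_categories(" Positive, negative,, ")
puts categories.inspect
puts data_file_name(categories)
```

Because the list is sorted before use, `--categories=positive,negative` and `--categories=negative,positive` should resolve to the same `tc.negative-positive.dat` file. Note also that the parser in this hunk registers the short option as `-c`, while the README hunk above still documents `-l`.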

data/lib/tiny-classifier/classifier-generator.rb CHANGED

@@ -72,8 +72,8 @@ module TinyClassifier
     end
     def prepare_classifier_name
-      labels = @labels.join("-").downcase
-      "tc-classify-#{labels}"
+      categories = @categories.join("-").downcase
+      "tc-classify-#{categories}"
     end
     def output_file_path

data/lib/tiny-classifier/classifier.rb CHANGED

@@ -33,8 +33,8 @@ module TinyClassifier
         STDERR.puts("Error: No effective input.")
         false
       else
-        label = classifier.classify(input)
-        puts label.downcase
+        category = classifier.classify(input)
+        puts category.downcase
         true
       end
     end

data/lib/tiny-classifier/retrainer.rb ADDED

@@ -0,0 +1,46 @@
+# Copyright (C) 2017 YUKI "Piro" Hiroshi
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+require "tiny-classifier/trainer"
+module TinyClassifier
+  class Retrainer < Trainer
+    class << self
+      def run(argv=nil)
+        argv ||= ARGV.dup
+        retrainer = new
+        *categories = retrainer.parse_command_line_options(argv)
+        retrainer.run(wrong: categories[0],
+                      correct: categories[1])
+      end
+    end
+    def run(params)
+      if input.empty?
+        STDERR.puts("Error: No effective input.")
+        false
+      else
+        @category = params[:wrong]
+        prepare_category
+        classifier.send("untrain_#{@category}", input)
+        @category = params[:correct]
+        prepare_category
+        classifier.send("train_#{@category}", input)
+        save
+        true
+      end
+    end
+  end
+end
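
The new `Retrainer#run` untrains the input from the wrong category and then trains the same input under the correct one, with the two categories apparently taken from the first and second positional arguments. The sketch below illustrates that sequence only. `CountingClassifier` and `retrain` are hypothetical stand-ins invented for this example (the gem itself delegates to `ClassifierReborn::Bayes`), and the word counting is a deliberately crude substitute for a real Bayes model.

```ruby
# Hypothetical stand-in for the real classifier: it only counts words per
# category and exposes train_<category> / untrain_<category> methods like
# the ones the diff invokes through `send`.
class CountingClassifier
  attr_reader :counts

  def initialize(*categories)
    @counts = {}
    categories.each do |category|
      name = category.downcase
      @counts[name] = Hash.new(0)
      define_singleton_method("train_#{name}") { |text| update(name, text, 1) }
      define_singleton_method("untrain_#{name}") { |text| update(name, text, -1) }
    end
  end

  private

  # Adds delta to the count of every word of text in the given category.
  def update(category, text, delta)
    text.downcase.scan(/\w+/).each { |word| @counts[category][word] += delta }
  end
end

# Same order as Retrainer#run: untrain the wrong category first, then
# train the correct one with the identical input.
def retrain(classifier, input, wrong:, correct:)
  classifier.send("untrain_#{wrong}", input)
  classifier.send("train_#{correct}", input)
end

classifier = CountingClassifier.new("Positive", "Negative")
classifier.train_positive("so bad") # the original mistake
retrain(classifier, "so bad", wrong: "positive", correct: "negative")
puts classifier.counts.inspect
```

Doing both steps in one command means the data file is saved once, after the correction is complete, rather than once per `tc-untrain` and `tc-train` invocation.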

data/lib/tiny-classifier/trainer.rb CHANGED

@@ -21,45 +21,45 @@ module TinyClassifier
       def run(argv=nil)
         argv ||= ARGV.dup
         trainer = new
-        *labels = trainer.parse_command_line_options(argv)
-        trainer.run(label: labels.first)
+        *categories = trainer.parse_command_line_options(argv)
+        trainer.run(category: categories.first)
       end
     end
     def initialize
       super
-      option_parser.banner += " LABEL"
+      option_parser.banner += " CATEGORY"
     end
     def run(params)
-      @label = params[:label]
-      prepare_label
+      @category = params[:category]
+      prepare_category
       if input.empty?
         STDERR.puts("Error: No effective input.")
         false
       else
-        classifier.send("train_#{@label}", input)
+        classifier.send("train_#{@category}", input)
         save
         true
       end
     end
     private
-    def prepare_label
-      unless @label
-        STDERR.puts("Error: You need to specify the label for the input.")
+    def prepare_category
+      unless @category
+        STDERR.puts("Error: You need to specify the category for the input.")
         exit(false)
       end
-      @label = @label.downcase.strip
+      @category = @category.downcase.strip
-      if @label.empty?
-        STDERR.puts("Error: You need to specify the label for the input.")
+      if @category.empty?
+        STDERR.puts("Error: You need to specify the category for the input.")
         exit(false)
       end
-      unless @labels.include?(@label.capitalize)
-        STDERR.puts("Error: You need to specify one of valid labels: #{@labels.join(', ')}")
+      unless @categories.include?(@category.capitalize)
+        STDERR.puts("Error: You need to specify one of valid categories: #{@categories.join(', ')}")
         exit(false)
       end
     end
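
The checks in `prepare_category` can be summarized as: the category must be present, non-empty after trimming, and one of the normalized (capitalized) names from `--categories`. A rough sketch of the same checks follows, assuming an illustrative helper named `validate_category` that raises `ArgumentError` where the gem prints to `STDERR` and calls `exit(false)`:

```ruby
# Illustrative re-statement of the checks in prepare_category.
# Returns the normalized (downcased, stripped) category on success.
def validate_category(category, categories)
  missing = "You need to specify the category for the input."
  raise ArgumentError, missing unless category

  category = category.downcase.strip
  raise ArgumentError, missing if category.empty?

  # The stored list is capitalized, so compare against the capitalized form.
  unless categories.include?(category.capitalize)
    raise ArgumentError,
          "You need to specify one of valid categories: #{categories.join(', ')}"
  end
  category
end

valid = ["Negative", "Positive"]
puts validate_category(" Positive ", valid)

begin
  validate_category("neutral", valid)
rescue ArgumentError => e
  puts "Error: #{e.message}"
end
```

Raising instead of exiting is a choice made for this example so the checks can be exercised in isolation; it is not how the gem reports errors.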

data/lib/tiny-classifier/untrainer.rb CHANGED

@@ -18,13 +18,13 @@ require "tiny-classifier/trainer"
 module TinyClassifier
   class Untrainer < Trainer
     def run(params)
-      @label = params[:label]
-      prepare_label
+      @category = params[:category]
+      prepare_category
       if input.empty?
         STDERR.puts("Error: No effective input.")
         false
       else
-        classifier.send("untrain_#{@label}", input)
+        classifier.send("untrain_#{@category}", input)
         save
         true
       end

data/tiny-classifier.gemspec CHANGED

@@ -21,7 +21,7 @@ $LOAD_PATH.unshift(File.join(File.dirname(__FILE__), "lib"))
 Gem::Specification.new do |spec|
   spec.name = "tiny-classifier"
-  spec.version = "1.5"
+  spec.version = "2.0"
   spec.homepage = "https://github.com/piroor/tiny-classifier"
   spec.authors = ["YUKI \"Piro\" Hiroshi"]
   spec.email = ["piro.outsider.reflex@gmail.com"]

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: tiny-classifier
 version: !ruby/object:Gem::Version
-  version: '1.5'
+  version: '2.0'
 platform: ruby
 authors:
 - YUKI "Piro" Hiroshi
@@ -44,6 +44,7 @@ email:
 executables:
 - tc-classify
 - tc-generate-classifier
+- tc-retrain
 - tc-train
 - tc-untrain
 extensions: []
@@ -54,11 +55,13 @@ files:
 - Rakefile
 - bin/tc-classify
 - bin/tc-generate-classifier
+- bin/tc-retrain
 - bin/tc-train
 - bin/tc-untrain
 - lib/tiny-classifier/base.rb
 - lib/tiny-classifier/classifier-generator.rb
 - lib/tiny-classifier/classifier.rb
+- lib/tiny-classifier/retrainer.rb
 - lib/tiny-classifier/tokenizer.rb
 - lib/tiny-classifier/trainer.rb
 - lib/tiny-classifier/untrainer.rb