RubyGems - plain_text - Versions diffs - 0.2 → 0.3 - Mend

plain_text 0.2 → 0.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (14)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 2c015ed947812371558456375c2933f0d03720082899f8c58c699419eda77f1b
-  data.tar.gz: fafc479d9bb492bd3b3ad140ec7a58d2cc0e7bc49dec4ad80c4111ad1f63e3df
+  metadata.gz: e93dc475c0c5f66817fbe63e6824f1ec9d1a36c487126548eec0dead4dfde3f4
+  data.tar.gz: e7e64d6aa8dd28ea282cd11f0bfc7d0a55476735ec9b6e7db7a638a48e0457d8
 SHA512:
-  metadata.gz: cb7d054e24cc85c64bbb556d4de30b3b54c9b51b409519d9b7f307fbe64dc05dc32e6e7cbeccc027b41c842a31ec5b489e60801b1c1c1f72e587157f62f38391
-  data.tar.gz: aef2b0ebd0c69f694c438cbf8d8e62d6d754d92c5d804553649c681d6c088bd9bb363197d9fb209b184aa49fb44ef5e733268e1d53a19bc7dfef260c86dee88c
+  metadata.gz: e877ab86109aaf3078990ae1ac55f534026ae8ff941f22af38e39d6655356ed9b5a1f69d4d1f7cfc05efcf812e29efe185dba94e97db88c5383d49eb6b487579
+  data.tar.gz: c2cbd7f86fd1779ab0cd18136a44680f58fce87f3c9c8068e8ef7a5c049ed40d47f2a0614f169902780a9a362a0a6bf46fee7f626ef74cf94fb2425266bac1a1
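
The SHA256 and SHA512 entries above are hex digests of the two archives packed inside the .gem file. As a minimal sketch of how such a value could be checked locally, Ruby's standard `digest` library is enough. The helper name `digest_matches?` and the sample data are assumptions for illustration; neither is part of plain_text or RubyGems.

```ruby
require 'digest'

# Hypothetical helper (not part of plain_text or RubyGems).
# Returns true if the hex digest of +data+ under +algo+ equals +expected_hex+.
def digest_matches?(data, expected_hex, algo: Digest::SHA256)
  algo.hexdigest(data) == expected_hex.strip.downcase
end

# Sample data only; for a real check, pass File.binread("metadata.gz")
# together with the matching value from checksums.yaml.
sample   = "abc"
expected = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
puts digest_matches?(sample, expected)
```

Passing `algo: Digest::SHA512` would check the longer digests in the same way.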

data/ChangeLog CHANGED

@@ -1,3 +1,19 @@
+-----
+(Version: 0.3)
+2019-10-27  Masa Sakano
+  * Added 3 executables textclean, head.rb, tail.rb in bin/ together with their tests
+  * lib/plaintext.rb refactoring
+    * Added a new constant `DEF_METHOD_OPTS`
+  * bin/countchar refactoring
+-----
+(Version: 0.3)
+2019-10-27  Masa Sakano
+  * Added 3 executables textclean, head.rb, tail.rb in bin/ together with their tests
+  * lib/plaintext.rb refactoring
+    * Added a new constant `DEF_METHOD_OPTS`
+  * bin/countchar refactoring
 -----
 (Version: 0.2)
 2019-10-27  Masa Sakano

data/README.en.rdoc CHANGED

@@ -7,8 +7,9 @@ This module provides utility functions and methods to handle plain
 text.  In the namespace, classes Part/Paragraph/Boundary are defined,
 which represent the logical structure of a document and another class
 ParseRule, which describes the rules to parse plain text to produce a Part-type Ruby instance.
-This package also provides a command-line program to count the number
-of characters, especially useful for documents in Asian (CJK) chatacters.
+This package also provides a few command-line programs, such as one to count the number
+of characters (especially useful for documents in Asian (CJK)
+characters) and advanced head/tail commands.
 == Design concept
@@ -93,7 +94,10 @@ it is applied to each Paragraph and Section separately to split them further.
 standard methods to apply the rules to an object (either String or
 {PlainText::Part}.
-== Command-line tool
+== Command-line tools
+All the commands here accept the +-h+ (or +--help+) option to print the
+help message.
 === countchar
@@ -102,9 +106,54 @@ Counts the number of characters in a file(s) or STDIN.
 The simplest example to run the command-line script is
    countchar YourFile.txt
-You may start with
-   countchar --help
-to see the available options.
+=== textclean
+Wrapper command of {PlainText.clean_text}.
+Outputs *cleaned* text; for example, it truncates 3 or more consecutive linebreaks
+into 2.  See the reference of {PlainText.clean_text} for details.
+=== head.rb
+This gives advanced functions, in addition to the standard +head+, including
+Regexp:: It can accept a Ruby Regexp to determine the boundary (beginning to the first-matched line).
+Character-based:: With the +--char+ option, it handles the file in units of a character, which is especially handy for dealing with multi-byte characters like UTF-8.
+Inverse:: It can invert the counting to output everything but the initial NUM lines.
+A few examples are
+  head.rb -n 5 < try.txt
+    # the same as the UNIX head; printing the first 5 lines
+  head.rb -i -n 5 try.txt
+    # printing everything but the first 5 lines
+    # The same as the UNIX command:  tail -n +6
+  head.rb -e '^===+' try.txt
+    # => first line up to the line that begins with 3 or more "="
+  head.rb -x -e '^===+' try.txt
+    # => first line up to the line before the one that begins with 3 or more "="
+The suffix +.rb+ is used to distinguish this command from the UNIX-shell standard command.
+=== tail.rb
+This gives advanced functions, in addition to the standard +tail+, including
+Regexp:: It can accept a Ruby Regexp to determine the boundary (last-matched line to the end).
+Character-based:: With the +--char+ option, it handles the file in units of a character, which is especially handy for dealing with multi-byte characters like UTF-8.
+Inverse:: It can invert the counting to output everything but the last NUM lines.
+Note the UNIX form of
+  tail -n +6
+(which I think is a bit counter-intuitive format) is equivalent to
+  head.rb -i -n 5
+The suffix +.rb+ is used to distinguish this command from the UNIX-shell standard command.
 == Miscellaneous
@@ -119,13 +168,13 @@ sent by a String instance +s+ = +"XQabXXcXQ"+:
     s.split(/X+Q?/)         #=> ["", "ab", "c"],
     s.split(/X+Q?/, -1)     #=> ["", "ab", "c", ""],
     s.split(/X+(Q?)/, -1)   #=> ["", "Q", "ab", "", "c", "Q", ""],
-    s.split(/(X+(Q?))/, -1) #=> ["", "XQ", "Q", "ab", "XX", "", "c", "XQ", "Q", ""],
+    s.split(/(X+(Q?))/, -1) #=> ["", "XQ", "Q", "ab", "XX", "", "c", "XQ", "Q", ""],
 With this method,
     s.split_with_delimiter(/X+(Q?)/)
                             #=> ["", "XQ", "ab", "XX", "c", "XQ"]
 from which the original string is always easily recovered by simple +join+.
 Also, {PlainText::Util} contains some miscellaneous methods.
@@ -134,8 +183,6 @@ Also, {PlainText::Util} contains some miscellaneous methods.
 Work in progress...
-It is still in a preliminary state.
 == Install
 This script requires {Ruby}[http://www.ruby-lang.org] Version 2.0
@@ -153,6 +200,8 @@ explicitly with your Ruby command as
 == Developer's note
+The source code is also maintained on {GitHub}[https://github.com/masasakano/plain_text]
 === Tests
 Ruby codes under the directory <tt>test/</tt> are the test scripts.

data/bin/countchar CHANGED

@@ -11,22 +11,17 @@ __EOF__
 # Initialising the hash for the command-line options.
 OPTS = {
-  preserve_paragraph: true,
-  boundary_style: true,
-  lbs_style: :t, # :truncate,
-  lb_is_space: false,
-  sps_style: :truncate,
-  delete_asian_space: true,
-  linehead_style: :delete,
-  linetail_style: :delete,
-  firstsps_style: :delete,
-  lastsps_style:  :truncate,
   line_i: nil,
   line_f: nil,
   # :chatter => 3,        # Default
   debug: false,
 }
+# Load the default values from the Module
+PlainText::DEF_METHOD_OPTS[:count_char].each_key do |ek|
+  OPTS[ek] ||= PlainText::DEF_METHOD_OPTS[:count_char][ek]
+end
 # Function to handle the command-line arguments.
 #
 # ARGV will be modified, and the constant variable OPTS is set.
@@ -45,8 +40,9 @@ def handle_argv
   opt.parse!(ARGV)
+  OPTS[:lbs_style] = OPTS[:lbs_style].to_s[0].to_sym
   unless %i(t d n).include? OPTS[:lbs_style]
-    warn "ERROR: --lbs-style must be one of (t(runcate)|d(elete)|n(one))."; exit 1
+    warn "ERROR: --lbs-style must be one of (t(runcate)|d(elete)|n(one)), but given (#{OPTS[:lbs_style].inspect})"; exit 1
   end
   OPTS
@@ -67,20 +63,15 @@ end
 # Handle the command-line options => OPTS
 opts = handle_argv()
+valid_keys = PlainText::DEF_METHOD_OPTS[:count_char].keys
+opts.each_key do |ek|
+  opts.delete ek if !valid_keys.include? ek
+end
 str = ARGF.read
-puts str.count_char(
-       preserve_paragraph: opts[:preserve_paragraph],
-       boundary_style:     opts[:boundary_style],
-       lbs_style:          opts[:lbs_style],
-       lb_is_space:        opts[:lb_is_space],
-       sps_style:          opts[:sps_style],
-       delete_asian_space: opts[:delete_asian_space],
-       linehead_style:     opts[:linehead_style],
-       linetail_style:     opts[:linetail_style],
-       firstsps_style:     opts[:firstsps_style],
-       lastsps_style:      opts[:lastsps_style],
-     )
+puts PlainText.count_char(str, **opts)
+# str.count_char() should be equivalent.
 exit
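
The refactoring above replaces the hand-written option list with two steps: fill in defaults from `PlainText::DEF_METHOD_OPTS[:count_char]`, then drop every key that the method does not accept. A minimal sketch of the same pattern with `Hash#merge`, `Hash#compact` and `Hash#slice` (Ruby 2.5 or later) follows. The constant `DEFAULTS` and the sample values are assumptions for illustration, not the gem's actual table.

```ruby
# Illustrative defaults; NOT the real PlainText::DEF_METHOD_OPTS[:count_char].
DEFAULTS = { lbs_style: :delete, lb_is_space: false, delete_asian_space: true }.freeze

# Options as a command-line parser might leave them (nil means "not given").
cli_opts = { lbs_style: :t, lb_is_space: nil, line_i: nil, debug: false }

# Step 1: defaults first, then every explicitly given (non-nil) option on top.
merged = DEFAULTS.merge(cli_opts.compact)

# Step 2: keep only the keys that the target method accepts.
method_opts = merged.slice(*DEFAULTS.keys)

p method_opts
```

One design point, under the same assumptions: merging only non-nil values keeps an explicit `false` from the command line even when the default is `true`, which `||=` would overwrite if it were applied after parsing rather than before.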

data/bin/head.rb ADDED

@@ -0,0 +1,87 @@
+#!/usr/bin/env ruby
+# -*- coding: utf-8 -*-
+require 'optparse'
+require 'plain_text'
+BANNER = <<"__EOF__"
+USAGE: #{File.basename($0)} [options] [INFILE.txt] < STDIN
+  Head command with (multi-byte) character-based manipulation and Regexp.
+__EOF__
+# Initialising the hash for the command-line options.
+OPTS = {
+  num: PlainText::DEF_HEADTAIL_N_LINES,
+  unit: :line,
+  inclusive: true,
+  inverse: false,  # unique option
+  # :chatter => 3,        # Default
+  debug: false,
+}
+# Function to handle the command-line arguments.
+#
+# ARGV will be modified, and the constant variable OPTS is set.
+#
+# @return [Hash]  Optional-argument hash.
+#
+def handle_argv
+  opt = OptionParser.new(BANNER)
+  opt.separator "Options:"        # Way to control a help message.
+  opt.on('-n NUM', '--line=NUM', sprintf("Number of lines (Def: %d).", PlainText::DEF_HEADTAIL_N_LINES), Integer) { |v| OPTS[:num]=v }
+  opt.on('-c NUM', '--byte=NUM', sprintf("Number of bytes, instead of lines."), Integer) { |v| OPTS[:unit] = :byte; OPTS[:num]=v }
+  opt.on(  '--char=NUM',    sprintf("Number of characters, instead of lines."), Integer) { |v| OPTS[:unit] = :char; OPTS[:num]=v }
+  opt.on('-e REGEXP', '--regexp=REGEXP', sprintf("Regexp for the boundary, instead of a number.", (!OPTS[:num]).inspect)) {|v| OPTS[:num] = Regexp.new v}
+  opt.on('-x', '--[no-]exclusive', sprintf("The line that matches is excluded? (Def: %s)", (!OPTS[:inclusive]).inspect), FalseClass) {|v| OPTS[:inclusive] = v}
+  opt.on('-i', '--[no-]inverse', sprintf("Inverse the result (print after NUM-th line) (Def: %s)", (!OPTS[:inverse]).inspect), TrueClass) {|v| OPTS[:inverse] = v}
+  # opt.on(  '--version', "Display the version and exits.", TrueClass) {|v| OPTS[:version] = v}  # Consider opts.on_tail
+  # opt.on(  '--[no-]debug', "Debug (Def: false)", TrueClass) {|v| OPTS[:debug] = v}
+  # opt.separator ""        # Way to control a help message.
+  # opt.separator "Note:"
+  # opt.separator " Spaces are truncated in default."
+  begin
+    opt.parse!(ARGV)
+  rescue OptionParser::MissingArgument => er
+    # Missing argument like "-b" without a number.
+    warn er
+    exit 1
+  end
+  OPTS
+end
+################################################
+# MAIN
+################################################
+$stdout.sync=true
+$stderr.sync=true
+class String
+  include PlainText
+end
+# Handle the command-line options => OPTS
+opts = handle_argv()
+num_in = opts[:num]
+is_inverse = opts[:inverse]
+%i(num inverse debug).each do |ek|
+  opts.delete ek if opts.has_key? ek
+end
+str = ARGF.read
+# A linebreak guaranteed at the end.
+if is_inverse
+  puts PlainText.head_inverse(str, num_in, **opts)
+else
+  puts PlainText.head(str, num_in, **opts)
+end
+exit
+__END__
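
The script above passes either an Integer or a Regexp as the boundary, plus an `inclusive` flag set by `-x`. The following is a rough sketch of what line-based selection with such a boundary could look like. The function `head_until` is a hypothetical stand-in, not `PlainText.head`; in particular, returning the whole input when nothing matches is an assumption.

```ruby
# Hypothetical stand-in for a line-based "head" (NOT PlainText.head).
# +boundary+ is either an Integer (number of lines) or a Regexp
# (the first matching line ends the output).
def head_until(str, boundary, inclusive: true)
  lines = str.lines
  case boundary
  when Integer
    lines.first(boundary).join
  when Regexp
    idx = lines.index { |line| boundary =~ line }
    return str if idx.nil?   # Assumption: no match means "print everything".
    lines.first(inclusive ? idx + 1 : idx).join
  else
    raise ArgumentError, "boundary must be an Integer or a Regexp"
  end
end

text = "title\nintro\n====\nbody\n"
print head_until(text, /^===+/)                    # up to and including the "====" line
print head_until(text, /^===+/, inclusive: false)  # up to the line before it
```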

data/bin/tail.rb ADDED

@@ -0,0 +1,87 @@
+#!/usr/bin/env ruby
+# -*- coding: utf-8 -*-
+require 'optparse'
+require 'plain_text'
+BANNER = <<"__EOF__"
+USAGE: #{File.basename($0)} [options] [INFILE.txt] < STDIN
+  tail command with (multi-byte) character-based manipulation and Regexp.
+__EOF__
+# Initialising the hash for the command-line options.
+OPTS = {
+  num: PlainText::DEF_HEADTAIL_N_LINES,
+  unit: :line,
+  inclusive: true,
+  inverse: false,  # unique option
+  # :chatter => 3,        # Default
+  debug: false,
+}
+# Function to handle the command-line arguments.
+#
+# ARGV will be modified, and the constant variable OPTS is set.
+#
+# @return [Hash]  Optional-argument hash.
+#
+def handle_argv
+  opt = OptionParser.new(BANNER)
+  opt.separator "Options:"        # Way to control a help message.
+  opt.on('-n NUM', '--line=NUM', sprintf("Number of lines (Def: %d).", PlainText::DEF_HEADTAIL_N_LINES), Integer) { |v| OPTS[:num]=v }
+  opt.on('-c NUM', '--byte=NUM', sprintf("Number of bytes, instead of lines."), Integer) { |v| OPTS[:unit] = :byte; OPTS[:num]=v }
+  opt.on(  '--char=NUM',    sprintf("Number of characters, instead of lines."), Integer) { |v| OPTS[:unit] = :char; OPTS[:num]=v }
+  opt.on('-e REGEXP', '--regexp=REGEXP', sprintf("Regexp for the boundary, instead of a number.", (!OPTS[:num]).inspect)) {|v| OPTS[:num] = Regexp.new v}
+  opt.on('-x', '--[no-]exclusive', sprintf("The line that matches is excluded? (Def: %s)", (!OPTS[:inclusive]).inspect), FalseClass) {|v| OPTS[:inclusive] = v}
+  opt.on('-i', '--[no-]inverse', sprintf("Inverse the result (print after NUM-th line) (Def: %s)", (!OPTS[:inverse]).inspect), TrueClass) {|v| OPTS[:inverse] = v}
+  # opt.on(  '--version', "Display the version and exits.", TrueClass) {|v| OPTS[:version] = v}  # Consider opts.on_tail
+  # opt.on(  '--[no-]debug', "Debug (Def: false)", TrueClass) {|v| OPTS[:debug] = v}
+  opt.separator ""        # Way to control a help message.
+  opt.separator "Note:"
+  opt.separator "  UNIX command of 'tail -n +5' is equivalent to 'head.rb -i -n 5'"
+  begin
+    opt.parse!(ARGV)
+  rescue OptionParser::MissingArgument => er
+    # Missing argument like "-b" without a number.
+    warn er
+    exit 1
+  end
+  OPTS
+end
+################################################
+# MAIN
+################################################
+$stdout.sync=true
+$stderr.sync=true
+class String
+  include PlainText
+end
+# Handle the command-line options => OPTS
+opts = handle_argv()
+num_in = opts[:num]
+is_inverse = opts[:inverse]
+%i(num inverse debug).each do |ek|
+  opts.delete ek if opts.has_key? ek
+end
+str = ARGF.read
+# A linebreak guaranteed at the end.
+if is_inverse
+  puts PlainText.tail_inverse(str, num_in, **opts)
+else
+  puts PlainText.tail(str, num_in, **opts)
+end
+exit
+__END__

data/bin/textclean ADDED

@@ -0,0 +1,103 @@
+#!/usr/bin/env ruby
+# -*- coding: utf-8 -*-
+require 'optparse'
+require 'plain_text'
+BANNER = <<"__EOF__"
+USAGE: #{File.basename($0)} [options] [INFILE.txt] < STDIN
+  Clean the text file INFILE (or STDIN), unifying linebreaks, and outputs it.
+__EOF__
+# Initialising the hash for the command-line options.
+OPTS = {
+  line_i: nil,
+  line_f: nil,
+  # :chatter => 3,        # Default
+  debug: false,
+}
+# Load the default values from the Module
+PlainText::DEF_METHOD_OPTS[:clean_text].each_key do |ek|
+  OPTS[ek] ||= PlainText::DEF_METHOD_OPTS[:clean_text][ek]
+end
+# Function to handle the command-line arguments.
+#
+# ARGV will be modified, and the constant variable OPTS is set.
+#
+# @return [Hash]  Optional-argument hash.
+#
+def handle_argv
+  opt = OptionParser.new(BANNER)
+  opt.on(  '--[no-]preserve_paragraph', sprintf("Preserved paragraph structures? (Def: %s)", OPTS[:preserve_paragraph].inspect), TrueClass) {|v| OPTS[:preserve_paragraph] = v}
+  opt.on(  '--boundary-style=STYLE', sprintf("One of (t(runcate)(2)|d(elete)|n(one)) (Def: truncate).")) { |v| OPTS[:boundary_style]=v.strip }
+  opt.on(  '--lbs-style=STYLE', sprintf("One of (t(runcate)|d(elete)|n(one)) (Def: %s).", OPTS[:lbs_style])) { |v| OPTS[:lbs_style]=v.strip }
+  opt.on(  '--[no-]lb-is-space', sprintf("Linebraeks are equivalent to spaces? (Def: %s)", OPTS[:lb_is_space].inspect), TrueClass) {|v| OPTS[:lb_is_space] = v}
+  opt.on(  '--sps-style=STYLE', sprintf("One of (t(runcate)|d(elete)|n(one)) (Def: %s).", OPTS[:sps_style])) { |v| OPTS[:sps_style]=v.strip }
+  opt.on(  '--[no-]delete-asian-space', sprintf("Deletes spaces between, before or after a CJK character? (Def: %s)", OPTS[:delete_asian_space].inspect), TrueClass) {|v| OPTS[:delete_asian_space] = v}
+  opt.on(  '--linehead-style=STYLE', sprintf("One of (t(runcate)|d(elete)|n(one)) (Def: %s).", OPTS[:linehead_style])) { |v| OPTS[:linehead_style]=v.strip }
+  opt.on(  '--linetail-style=STYLE', sprintf("One of (t(runcate)|d(elete)|n(one)) (Def: %s).", OPTS[:linetail_style])) { |v| OPTS[:linetail_style]=v.strip }
+  opt.on(  '--firstlbs-style=STYLE', sprintf("One of (t(runcate)|d(elete)|n(one)) (Def: %s).", OPTS[:firstlbs_style])) { |v| OPTS[:firstlbs_style]=v.strip }
+  opt.on(  '--lastsps-style=STYLE', sprintf("One of (t(runcate)|d(elete)|n(one)|m(arkdown)) (Def: %s).", OPTS[:lastsps_style])) { |v| OPTS[:lastsps_style]=v.strip }
+  # opt.on(  '--version', "Display the version and exits.", TrueClass) {|v| OPTS[:version] = v}  # Consider opts.on_tail
+  opt.on(  '--[no-]debug', "Debug (Def: false)", TrueClass) {|v| OPTS[:debug] = v}
+  # opt.separator ""        # Way to control a help message.
+  # opt.separator "Note:"
+  # opt.separator " Spaces are truncated in default."
+  opt.parse!(ARGV)
+  if (OPTS[:boundary_style].class.method_defined?(:to_str) &&
+      /\A(t(runcate)?(2)?|d(elete)?|n(one)?)\z/ =~ OPTS[:boundary_style])
+    OPTS[:boundary_style] = OPTS[:boundary_style].to_sym
+  end
+  %w(lbs sps linehead linetail firstlbs lastsps).each do |ek_head|
+    sym_k = (ek_head+"_style").to_sym
+    trysym = OPTS[sym_k].to_s[0].to_sym  # Symbol of 1 character (nb., NOT boundary_style)
+    if  (!(%i(t d n).include? trysym)   && (sym_k != :lastsps_style) ||
+         !(%i(t d n m).include? trysym) && (sym_k == :lastsps_style))
+      errmsg = sprintf(
+        "ERROR: --%s-style must be one of (t(runcate)|d(elete)%s|n(one)), but given %s.",
+        ek_head,
+        ((ek_head == "lastsps") ? "|m(arkdown)" : ""),
+        OPTS[sym_k].inspect
+      )
+      warn errmsg
+      exit 1
+    end
+    OPTS[sym_k] = trysym
+  end
+  OPTS
+end
+################################################
+# MAIN
+################################################
+$stdout.sync=true
+$stderr.sync=true
+class String
+  include PlainText
+end
+# Handle the command-line options => OPTS
+opts = handle_argv()
+valid_keys = PlainText::DEF_METHOD_OPTS[:clean_text].keys
+opts.each_key do |ek|
+  opts.delete ek if !valid_keys.include? ek
+end
+str = ARGF.read
+print PlainText.clean_text(str, **opts)
+exit
+__END__
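
The validation loop in `handle_argv` above reduces every `*-style` option to a one-letter Symbol and rejects anything outside the allowed set, with `m(arkdown)` permitted only for `lastsps`. The same idea can be sketched as a small standalone function. The name `normalize_style` and its `allow_markdown` keyword are hypothetical and not part of the gem; raising `ArgumentError` instead of calling `exit 1` is also an assumption made to keep the sketch testable.

```ruby
# Hypothetical helper sketching the validation loop of bin/textclean.
# Accepts a String or Symbol such as "truncate", :d or "none" and
# returns its first letter as a Symbol.
def normalize_style(value, allow_markdown: false)
  valid = allow_markdown ? %i(t d n m) : %i(t d n)
  first = value.to_s[0]            # nil for an empty string
  sym   = first && first.to_sym
  unless valid.include?(sym)
    raise ArgumentError, "style must be one of #{valid.inspect}, but given #{value.inspect}"
  end
  sym
end

p normalize_style("truncate")
p normalize_style(:delete)
p normalize_style("markdown", allow_markdown: true)
```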

data/lib/plain_text.rb CHANGED

@@ -25,6 +25,36 @@ module PlainText
   # Default number of lines to extract for {#head} and {#tail}
   DEF_HEADTAIL_N_LINES = 10
+  # Default options for class/instance methods
+  DEF_METHOD_OPTS = {
+    :clean_text => {
+      preserve_paragraph: true,
+      boundary_style: true,  # If unspecified, will be replaced with lb_out * 2
+      lbs_style: :truncate,
+      lb_is_space: false,
+      sps_style: :truncate,
+      delete_asian_space: true,
+      linehead_style: :none,
+      linetail_style: :delete,
+      firstlbs_style: :delete,
+      lastsps_style:  :truncate,
+      lb: $/,
+      lb_out: nil,           # If unspecified, will be replaced with lb
+    },
+    :count_char => {
+      lbs_style: :delete,
+      linehead_style: :delete,
+      lastsps_style: :delete,
+      lb_out: "\n",
+    },
+  }
+  # Adjusts DEF_METHOD_OPTS[:count_char]
+  DEF_METHOD_OPTS[:clean_text].each_key do |ek|
+    # %i(preserve_paragraph boundary_style lb_is_space sps_style delete_asian_space linetail_style firstlbs_style lb).each do |ek|
+    DEF_METHOD_OPTS[:count_char][ek] ||= DEF_METHOD_OPTS[:clean_text][ek]
+  end
   # Call instance method as a Module function
   #
   # The return String includes {PlainText} as Singleton.
@@ -39,33 +69,39 @@ module PlainText
   end
   # If the class of the obj does not "include" this module, do so in the singular class.
-  #
+  #
   # @param obj [Object] Maybe String. For which a singular class def is run, if the condition is met.
   # @return [TrueClass, NilClass] true if the singular class def is run. Else nil.
   def self.extend_this(obj)
-    return nil if defined? obj.delete_spaces_bw_cjk_european!
+    return nil if defined? obj.delete_spaces_bw_cjk_european!
     obj.extend(PlainText)
     true
   end
-  # Module function of {#count_char}
+  # Count the number of characters
+  #
+  # See {PlainText#clean_text!} for the optional parameters.  The defaults of a few of the optional parameters are different from it,
+  # such as the default for +lb_out+ is +"\n"+ (newline, so that a line-break is 1 byte in size).
+  # It is so that this method is more optimized for East-Asian (CJK) characters, given this method is most useful for CJK Strings,
+  # whereas, for European alphabets, counting the number of words, rather than characters as in this method, would be more standard.
   #
   # @param instr [String] String for which the number of chars is counted
   # @param (see #count_char)
   # @return [Integer]
   def self.count_char(instr, *rest,
-                 lbs_style: :delete,
-                 linehead_style: :delete,
-                 lastsps_style: :delete,
-                 lb_out: "\n",
-                 **k)
-    clean_text(instr, *rest, lbs_style: lbs_style, lastsps_style: lastsps_style, lb_out: lb_out, **k).size
+        lbs_style:      DEF_METHOD_OPTS[:count_char][:lbs_style],
+        linehead_style: DEF_METHOD_OPTS[:count_char][:linehead_style],
+        lastsps_style:  DEF_METHOD_OPTS[:count_char][:lastsps_style],
+        lb_out:         DEF_METHOD_OPTS[:count_char][:lb_out],
+        **k
+      )
+    clean_text(instr, *rest, lbs_style: lbs_style, linehead_style: linehead_style, lastsps_style: lastsps_style, lb_out: lb_out, **k).size
   end
   # Cleans the text
   #
-  # Such as, removing extra spaces, normalising the linebreaks, etc.
+  # Such as, removing extra spaces, normalising the linebreaks, etc.
   #
   # In default,
   #
@@ -77,9 +113,9 @@ module PlainText
   # * Trailing white spaces in each line are deleted: +linetail_style=:delete+
   # * Line-breaks at the beginning of the entire input string are deleted: +firstlbs_style=:delete+
   # * Trailing white spaces and line-breaks at the end of the entire input string are truncated into a single linebreak: +lastsps_style=:truncate+
-  #
+  #
   # For a String with predominantly CJK characters, the following setting is recommended:
-  #
+  #
   # * +lbs_style: :delete+
   # * +delete_asian_space: true+ (Default)
   #
@@ -111,26 +147,26 @@ module PlainText
   #
   def self.clean_text(
         prt,
-        preserve_paragraph: true,
-        boundary_style: true,  # If unspecified, will be replaced with lb_out * 2
-        lbs_style: :truncate,
-        lb_is_space: false,
-        sps_style: :truncate,
-        delete_asian_space: true,
-        linehead_style: :none,
-        linetail_style: :delete,
-        firstlbs_style: :delete,
-        lastsps_style:  :truncate,
-        lb: $/,
-        lb_out: nil,           # If unspecified, will be replaced with lb
+        preserve_paragraph: DEF_METHOD_OPTS[:clean_text][:preserve_paragraph],
+        boundary_style:     DEF_METHOD_OPTS[:clean_text][:boundary_style], # If unspecified, will be replaced with lb_out * 2
+        lbs_style:      DEF_METHOD_OPTS[:clean_text][:lbs_style],
+        lb_is_space:    DEF_METHOD_OPTS[:clean_text][:lb_is_space],
+        sps_style:      DEF_METHOD_OPTS[:clean_text][:sps_style],
+        delete_asian_space: DEF_METHOD_OPTS[:clean_text][:delete_asian_space],
+        linehead_style: DEF_METHOD_OPTS[:clean_text][:linehead_style],
+        linetail_style: DEF_METHOD_OPTS[:clean_text][:linetail_style],
+        firstlbs_style: DEF_METHOD_OPTS[:clean_text][:firstlbs_style],
+        lastsps_style:  DEF_METHOD_OPTS[:clean_text][:lastsps_style],
+        lb:     DEF_METHOD_OPTS[:clean_text][:lb],
+        lb_out: DEF_METHOD_OPTS[:clean_text][:lb_out], # If unspecified, will be replaced with lb
         is_debug: false
       )
-isdebug = true if prt == "\n  \n abc\n\n \ndef\n\n \n\n"
+#isdebug = true if prt == "foo\n\n\nbar\n"
     lb_out ||= lb  # Output linebreak
     boundary_style = lb_out*2 if true       == boundary_style
     boundary_style = ""       if [:delete, :d].include? boundary_style
-    lastsps_style  = lb_out   if :linebreak == lastsps_style
+    lastsps_style  = lb_out   if :linebreak == lastsps_style
     if !prt.class.method_defined? :last_significant_element
       # Construct a Part instance from the given String.
@@ -172,7 +208,7 @@ isdebug = true if prt == "\n  \n abc\n\n \ndef\n\n \n\n"
     clean_text_file_head_tail!( prt,
       firstlbs_style: firstlbs_style,
       lastsps_style:  lastsps_style,
-      is_debug: isdebug
+      is_debug: is_debug
     )
     # Replaces the linebreaks to the specified one
@@ -254,13 +290,13 @@ isdebug = true if prt == "\n  \n abc\n\n \ndef\n\n \n\n"
   # Class methods (Private)
   ##########
-  # @param prt [PlainText:Part] (see Plaintext.clean_text#prt)
-  # @param boundary_style (see Plaintext.clean_text#boundary_style)
+  # @param prt [PlainText:Part] (see PlainText.clean_text)
+  # @param boundary_style (see PlainText.clean_text)
   # @return [void]
   #
-  # @see Plaintext.clean_text
+  # @see PlainText.clean_text
   def self.clean_text_boundary!( prt,
-        boundary_style: $/*2,
+        boundary_style: ,
         is_debug: false
       )
@@ -280,20 +316,20 @@ isdebug = true if prt == "\n  \n abc\n\n \ndef\n\n \n\n"
   end # self.clean_text_boundary!
   private_class_method :clean_text_boundary!
-  # @param prt [PlainText:Part] (see Plaintext.clean_text#prt)
-  # @param lbs_style (see Plaintext.clean_text#lbs_style)
-  # @param sps_style (see Plaintext.clean_text#sps_style)
-  # @param lb_is_space (see Plaintext.clean_text#lb_is_space)
-  # @param delete_asian_space (see Plaintext.clean_text#delete_asian_space)
+  # @param prt [PlainText:Part] (see PlainText.clean_text)
+  # @param lbs_style (see PlainText.clean_text)
+  # @param sps_style (see PlainText.clean_text)
+  # @param lb_is_space (see PlainText.clean_text)
+  # @param delete_asian_space (see PlainText.clean_text)
   # @return [void]
   #
-  # @see Plaintext.clean_text
+  # @see PlainText.clean_text
   def self.clean_text_lbs_sps!(
         prt,
-        lbs_style: :truncate,
-        lb_is_space: false,
-        sps_style: :truncate,
-        delete_asian_space: true,
+        lbs_style:          ,
+        lb_is_space:        ,
+        sps_style:          ,
+        delete_asian_space: ,
         is_debug: false
       )
@@ -328,16 +364,16 @@ isdebug = true if prt == "\n  \n abc\n\n \ndef\n\n \n\n"
   end # self.clean_text_lbs_sps!
   private_class_method :clean_text_lbs_sps!
-  # @param prt [PlainText:Part] (see Plaintext.clean_text#prt)
-  # @param linehead_style [Symbol, String] (see Plaintext.clean_text#linehead_style)
-  # @param linetail_style [Symbol, String] (see Plaintext.clean_text#linetail_style)
+  # @param prt [PlainText:Part] (see PlainText.clean_text)
+  # @param linehead_style [Symbol, String] (see PlainText.clean_text)
+  # @param linetail_style [Symbol, String] (see PlainText.clean_text)
   # @return [void]
   #
-  # @see Plaintext.clean_text
+  # @see PlainText.clean_text
   def self.clean_text_line_head_tail!(
         prt,
-        linehead_style: :none,
-        linetail_style: :delete,
+        linehead_style: ,
+        linetail_style: ,
         is_debug: false
       )
@@ -371,16 +407,16 @@ isdebug = true if prt == "\n  \n abc\n\n \ndef\n\n \n\n"
   end # self.clean_text_line_head_tail!
   private_class_method :clean_text_line_head_tail!
-  # @param prt [PlainText:Part] (see Plaintext.clean_text#prt)
-  # @param firstlbs_style [Symbol, String] (see Plaintext.clean_text#firstlbs_style)
-  # @param lastsps_style [Symbol, String]  (see Plaintext.clean_text#lastsps_style)
+  # @param prt [PlainText:Part] (see PlainText.clean_text#prt)
+  # @param firstlbs_style [Symbol, String] (see PlainText.clean_text#firstlbs_style)
+  # @param lastsps_style [Symbol, String]  (see PlainText.clean_text#lastsps_style)
   # @return [void]
   #
-  # @see Plaintext.clean_text
+  # @see PlainText.clean_text
   def self.clean_text_file_head_tail!(
         prt,
-        firstlbs_style: :delete,
-        lastsps_style:  :truncate,
+        firstlbs_style: ,
+        lastsps_style:  ,
         is_debug: false
       )
@@ -452,19 +488,18 @@ isdebug = true if prt == "\n  \n abc\n\n \ndef\n\n \n\n"
   #
   # uses Part to transform a Paragraph into a Part
   #
-  # @param prt [PlainText:Part] (see Plaintext.clean_text#prt)
-  # @param sps_style (see Plaintext.clean_text#sps_style)
+  # @param prt [PlainText:Part] (see PlainText.clean_text)
+  # @param sps_style (see PlainText.clean_text)
   # @return [void]
   #
-  # @see Plaintext.clean_text
+  # @see PlainText.clean_text
   def self.clean_text_sps!(
         prt,
-        sps_style: :truncate,
+        sps_style: ,
         is_debug: false
       )
     prt.parts.each do |e_pa|
-      ru = ParseRule
       # Each line treated as a Paragraph, and [[:space:]]+ between them as a Boundary.
       # Then, to work on anything within a line except for line-head/tail is easy.
       prt_para = Part.parse(e_pa, rule: ParseRule::RuleEachLineStrip).map_parts { |e_li|
@@ -490,21 +525,16 @@ isdebug = true if prt == "\n  \n abc\n\n \ndef\n\n \n\n"
   ####################################################
   # Count the number of characters
-  #
-  # See {PlainText#clean_text!} for the optional parameters.  The defaults of a few of the optional parameters are different from {PlainText#clean_text!},
-  # such as the default for +lb_out+ is "\n" (so that a line-break is 1 byte in size).
+  #
+  # See {PlainText.count_char} and further {PlainText.clean_text!} for the optional parameters.  The defaults of a few of the optional parameters are different from the latter,
+  # such as the default for +lb_out+ is +"\n"+ (newline, so that a line-break is 1 byte in size).
   # It is so that this method is more optimized for East-Asian (CJK) characters, given this method is most useful for CJK Strings,
   # whereas, for European alphabets, counting the number of words, rather than characters as in this method, would be more standard.
   #
-  # @param (see PlainText#clean_text!)
+  # @param (see {PlainText.count_char})
   # @return [Integer]
-  def count_char(*rest,
-                 lbs_style: :delete,
-                 linehead_style: :delete,
-                 lastsps_style: :none,
-                 lb_out: "\n",
-                 **k)
-    PlainText.clean_text(self, *rest, lbs_style: lbs_style, lastsps_style: lastsps_style, lb_out: lb_out, **k).size
+  def count_char(*rest, **k)
+    PlainText.public_send(__method__, self, *rest, **k)
   end
   # Delete all the spaces between CJK and European characters or numbers.
@@ -732,7 +762,7 @@ isdebug = true if prt == "\n  \n abc\n\n \ndef\n\n \n\n"
   # till the last one is returned.  "The next line" means (1) the line immediately after the match
   # if the matched string has the linebreak at the end, or (2) the line after the first linebreak after the matched string,
   # where the trailing characters after the matched string to the linebreak (inclusive) is ignored.
-  #
+  #
   # = Tips =
   # To specify the *last* line that matches the Regexp, consider prefixing +(?:.*)+ with the option +m+,
   # e.g., +/(?:.*)ABC/m+
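
The changes to this file follow one pattern: keyword defaults are read from a single table (`DEF_METHOD_OPTS`), and the instance method forwards to the module function of the same name through `public_send(__method__, ...)`. A toy sketch of that pattern is given below. The module `Greeter`, the class `Name` and all values are assumptions for illustration only and have nothing to do with the gem's API.

```ruby
module Greeter
  # Single table of defaults, keyed by method name (illustrative values only).
  DEF_OPTS = { greet: { greeting: "Hello", mark: "!" } }.freeze

  # Module-function form: the keyword defaults are read from the table.
  def self.greet(name, greeting: DEF_OPTS[:greet][:greeting], mark: DEF_OPTS[:greet][:mark])
    "#{greeting}, #{name}#{mark}"
  end

  # Instance form: forwards to the module function of the same name,
  # so the defaults are defined in one place only.
  def greet(**kwd)
    Greeter.public_send(__method__, self, **kwd)
  end
end

# A String subclass is used here to avoid reopening String itself.
class Name < String
  include Greeter
end

puts Greeter.greet("Alice")
puts Name.new("Bob").greet(mark: "?")
```

With this layout the two forms cannot drift apart in their defaults, because the instance method declares none of its own.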

data/plain_text.gemspec CHANGED

@@ -5,9 +5,9 @@ require 'date'
 Gem::Specification.new do |s|
   s.name = %q{plain_text}.sub(/.*/){|c| (c == File.basename(Dir.pwd)) ? c : raise("ERROR: s.name=(#{c}) in gemspec seems wrong!")}
-  s.version = "0.2"
+  s.version = "0.3"
   # s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
-  %w(countchar).each do |f|
+  %w(countchar textclean head.rb tail.rb).each do |f|
     path = s.bindir+'/'+f
     File.executable?(path) ? s.executables << f : raise("ERROR: Executable (#{path}) is not executable!")
   end

data/test/testcountchar.rb ADDED

@@ -0,0 +1,46 @@
+# -*- encoding: utf-8 -*-
+# Tests of an executable.
+#
+# @author: M. Sakano (Wise Babel Ltd)
+require 'open3'
+$stdout.sync=true
+$stderr.sync=true
+# print '$LOAD_PATH=';p $LOAD_PATH
+#################################################
+# Unit Test
+#################################################
+gem "minitest"
+# require 'minitest/unit'
+require 'minitest/autorun'
+class TestUnitCountchar < MiniTest::Test
+  T = true
+  F = false
+  SCFNAME = File.basename(__FILE__)
+  EXE = "%s/../bin/%s" % [File.dirname(__FILE__), File.basename(__FILE__).sub(/^test_?(.+)\.rb/, '\1')]
+  def setup
+  end
+  def teardown
+  end
+  def test_countchar01
+    o, e, s = Open3.capture3 EXE
+    assert_equal 0, s.exitstatus, "error is raised: STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
+    assert_equal "0", o.chomp
+    assert_empty e
+    stin = "foo\n\n\nbar\n"
+    o, e, s = Open3.capture3 EXE, stdin_data: stin
+    assert_equal 0, s.exitstatus
+    assert_equal stin.size-2, o.to_i
+    assert_empty e
+  end
+end # class TestUnitCountchar < MiniTest::Test
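
The test above drives an executable through `Open3.capture3` and feeds it standard input via `stdin_data:`. A minimal sketch of that pattern on its own, assuming a stand-in child process (the current Ruby interpreter running a one-liner) rather than the gem's `bin/countchar`:

```ruby
require 'open3'
require 'rbconfig'

# Stand-in child process (an assumption for illustration): the running Ruby
# interpreter, printing the number of characters it reads from STDIN.
child = [RbConfig.ruby, '-e', 'print $stdin.read.size']

# capture3 returns the child's STDOUT, STDERR and a Process::Status.
stdout_str, stderr_str, status = Open3.capture3(*child, stdin_data: "foo\n\n\nbar\n")

puts "exit=#{status.exitstatus} stdout=#{stdout_str.inspect} stderr=#{stderr_str.inspect}"
```

Passing the command as separate arguments avoids going through a shell, whereas the tests above build a single command string such as `EXE+' -i'`, which is run via the shell.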

data/test/testhead_rb.rb ADDED

@@ -0,0 +1,70 @@
+# -*- encoding: utf-8 -*-
+# Tests of an executable.
+#
+# @author: M. Sakano (Wise Babel Ltd)
+require 'open3'
+$stdout.sync=true
+$stderr.sync=true
+# print '$LOAD_PATH=';p $LOAD_PATH
+#################################################
+# Unit Test
+#################################################
+gem "minitest"
+# require 'minitest/unit'
+require 'minitest/autorun'
+class TestUnitHeadRb < MiniTest::Test
+  T = true
+  F = false
+  SCFNAME = File.basename(__FILE__)
+  EXE = "%s/../bin/%s" % [File.dirname(__FILE__), File.basename(__FILE__).sub(/^test_?(.+)\.rb/, '\1').sub(/_rb$/, '.rb')]
+  def setup
+  end
+  def teardown
+  end
+  def test_countchar01
+    o, e, s = Open3.capture3 EXE
+    assert_equal 0, s.exitstatus, "error is raised: STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
+    assert_equal "\n", o
+    assert_empty e
+    stin = "1\n2\n3\n4\n5\n6\n7\n8\n9\nA\nB\n"
+    o, e, s = Open3.capture3 EXE, stdin_data: stin
+    assert_equal 0, s.exitstatus
+    assert_equal stin[0..19], o
+    assert_empty e
+    o, e, s = Open3.capture3 EXE+' -i', stdin_data: stin
+    assert_equal 0, s.exitstatus
+    assert_equal stin[20..-1], o, "Wrong! STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
+    assert_empty e
+    o, e, s = Open3.capture3 EXE+' -n 10', stdin_data: stin
+    assert_equal 0, s.exitstatus
+    assert_equal stin[0..19], o
+    assert_empty e
+    o, e, s = Open3.capture3 EXE+' -b', stdin_data: stin
+    assert_equal 1, s.exitstatus, "error is raised: STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
+    assert_match(/missing/i, e)
+    o, e, s = Open3.capture3 EXE+' -e "[5-9]"', stdin_data: stin
+    assert_equal 0, s.exitstatus
+    assert_equal stin[0..9], o, "Wrong! STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
+    assert_empty e
+    o, e, s = Open3.capture3 EXE+' -e "[5-9]" -x', stdin_data: stin
+    assert_equal 0, s.exitstatus, "error is raised: STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
+    assert_equal stin[0..7], o, "Wrong! STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
+    assert_empty e
+  end
+end # class TestUnitHeadRb < MiniTest::Test
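
The character ranges in the assertions above are easier to read as line counts, because every line of `stin` is exactly two characters long. The sketch below uses plain Ruby only and does not call `head.rb`. The reading that `-e` stops at the first matching line and `-x` excludes that line is inferred from the assertions, not from the executable's documentation.

```ruby
stin  = "1\n2\n3\n4\n5\n6\n7\n8\n9\nA\nB\n"
lines = stin.lines                  # 11 lines, 2 characters each

first_ten = lines.first(10).join    # same span as stin[0..19]
remainder = lines.drop(10).join     # same span as stin[20..-1]

# Index of the first line matching the pattern used in the test.
idx = lines.index { |line| line =~ /[5-9]/ }

through_match = lines[0..idx].join  # same span as stin[0..9]
before_match  = lines[0...idx].join # same span as stin[0..7]

puts [first_ten, remainder, through_match, before_match].map(&:inspect)
```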

data/test/testtail_rb.rb ADDED

@@ -0,0 +1,70 @@
+# -*- encoding: utf-8 -*-
+# Tests of an executable.
+#
+# @author: M. Sakano (Wise Babel Ltd)
+require 'open3'
+$stdout.sync=true
+$stderr.sync=true
+# print '$LOAD_PATH=';p $LOAD_PATH
+#################################################
+# Unit Test
+#################################################
+gem "minitest"
+# require 'minitest/unit'
+require 'minitest/autorun'
+class TestUnitTailRb < MiniTest::Test
+  T = true
+  F = false
+  SCFNAME = File.basename(__FILE__)
+  EXE = "%s/../bin/%s" % [File.dirname(__FILE__), File.basename(__FILE__).sub(/^test_?(.+)\.rb/, '\1').sub(/_rb$/, '.rb')]
+  def setup
+  end
+  def teardown
+  end
+  def test_countchar01
+    o, e, s = Open3.capture3 EXE
+    assert_equal 0, s.exitstatus, "error is raised: STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
+    assert_equal "\n", o
+    assert_empty e
+    stin = "1\n2\n3\n4\n5\n6\n7\n8\n9\nA\nB\n"
+    o, e, s = Open3.capture3 EXE, stdin_data: stin
+    assert_equal 0, s.exitstatus
+    assert_equal stin[2..-1], o
+    assert_empty e
+    o, e, s = Open3.capture3 EXE+' -i', stdin_data: stin
+    assert_equal 0, s.exitstatus
+    assert_equal stin[0..1], o, "Wrong! STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
+    assert_empty e
+    o, e, s = Open3.capture3 EXE+' -n 10', stdin_data: stin
+    assert_equal 0, s.exitstatus
+    assert_equal stin[2..-1], o
+    assert_empty e
+    o, e, s = Open3.capture3 EXE+' -b', stdin_data: stin
+    assert_equal 1, s.exitstatus, "error is raised: STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
+    assert_match(/missing/i, e)
+    o, e, s = Open3.capture3 EXE+' -e "[5-9]"', stdin_data: stin
+    assert_equal 0, s.exitstatus
+    assert_equal stin[-6..-1], o, "Wrong! STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
+    assert_empty e
+    o, e, s = Open3.capture3 EXE+' -e "[5-9]" -x', stdin_data: stin
+    assert_equal 0, s.exitstatus, "error is raised: STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
+    assert_equal stin[-4..-1], o, "Wrong! STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
+    assert_empty e
+  end
+end # class TestUnitTailRb < MiniTest::Test
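
The negative ranges in the assertions above can likewise be restated in terms of lines. A minimal sketch in plain Ruby, again without calling `tail.rb`. It assumes, based only on the expected values, that `-e` starts at the last matching line and `-x` skips that line.

```ruby
stin  = "1\n2\n3\n4\n5\n6\n7\n8\n9\nA\nB\n"
lines = stin.lines                      # 11 lines, 2 characters each

last_ten   = lines.last(10).join        # same span as stin[2..-1]
first_line = lines.first(1).join        # same span as stin[0..1]

# Index of the last line matching the pattern used in the test.
idx = lines.rindex { |line| line =~ /[5-9]/ }

from_match  = lines[idx..-1].join       # same span as stin[-6..-1]
after_match = lines[(idx + 1)..-1].join # same span as stin[-4..-1]

puts [last_ten, first_line, from_match, after_match].map(&:inspect)
```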

data/test/testtextclean.rb ADDED

@@ -0,0 +1,52 @@
+# -*- encoding: utf-8 -*-
+# Tests of an executable.
+#
+# @author: M. Sakano (Wise Babel Ltd)
+require 'open3'
+$stdout.sync=true
+$stderr.sync=true
+# print '$LOAD_PATH=';p $LOAD_PATH
+#################################################
+# Unit Test
+#################################################
+gem "minitest"
+# require 'minitest/unit'
+require 'minitest/autorun'
+class TestUnitTextclean < MiniTest::Test
+  T = true
+  F = false
+  SCFNAME = File.basename(__FILE__)
+  EXE = "%s/../bin/%s" % [File.dirname(__FILE__), File.basename(__FILE__).sub(/^test_?(.+)\.rb/, '\1')]
+  def setup
+  end
+  def teardown
+  end
+  def test_textclean01
+    o, e, s = Open3.capture3 EXE
+    assert_equal 0, s.exitstatus, "error is raised: STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
+    assert_equal "", o.chomp
+    assert_empty e
+    stin = "foo\n\n\nbar\n"
+    s2   = "foo\n\nbar\n"
+    #o, e, s = Open3.capture3 EXE, stdin_data: stin
+    #assert_equal 0, s.exitstatus
+    #assert_equal s2, o
+    #assert_empty e
+    o, e, s = Open3.capture3 EXE+' --lastsps-style=delete', stdin_data: stin
+    assert_equal 0, s.exitstatus
+    assert_equal s2.chop.chomp, o, "Wrong! STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
+    assert_empty e
+  end
+end # class TestUnitTextclean < MiniTest::Test
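
The expected value `s2.chop.chomp` in the last assertion is worth spelling out. A minimal sketch using only core `String` methods: `chop` removes the final newline, after which `chomp` has nothing left to strip. The resulting length also matches `stin.size-2` in the countchar test, which suggests (though the diff does not state it) that both executables apply the same cleaning.

```ruby
stin = "foo\n\n\nbar\n"
s2   = "foo\n\nbar\n"

chopped  = s2.chop        # drops the last character, here the trailing "\n"
expected = chopped.chomp  # no trailing newline remains, so this is a no-op

puts expected.inspect
puts expected.size
puts stin.size - 2
```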

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: plain_text
 version: !ruby/object:Gem::Version
-  version: '0.2'
+  version: '0.3'
 platform: ruby
 authors:
 - Masa Sakano
@@ -17,6 +17,9 @@ description: This module provides utility functions and methods to handle plain
 email:
 executables:
 - countchar
+- textclean
+- head.rb
+- tail.rb
 extensions: []
 extra_rdoc_files:
 - README.en.rdoc
@@ -28,6 +31,9 @@ files:
 - README.en.rdoc
 - Rakefile
 - bin/countchar
+- bin/head.rb
+- bin/tail.rb
+- bin/textclean
 - lib/plain_text.rb
 - lib/plain_text/parse_rule.rb
 - lib/plain_text/part.rb
@@ -40,6 +46,10 @@ files:
 - test/test_plain_text_parse_rule.rb
 - test/test_plain_text_part.rb
 - test/test_plain_text_split.rb
+- test/testcountchar.rb
+- test/testhead_rb.rb
+- test/testtail_rb.rb
+- test/testtextclean.rb
 homepage: https://www.wisebabel.com
 licenses:
 - MIT
@@ -67,6 +77,10 @@ specification_version: 4
 summary: Module to handle Plain-Text
 test_files:
 - test/test_plain_text_parse_rule.rb
+- test/testtail_rb.rb
 - test/test_plain_text_part.rb
 - test/test_plain_text.rb
+- test/testcountchar.rb
+- test/testtextclean.rb
 - test/test_plain_text_split.rb
+- test/testhead_rb.rb