plain_text 0.2 → 0.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 2c015ed947812371558456375c2933f0d03720082899f8c58c699419eda77f1b
4
- data.tar.gz: fafc479d9bb492bd3b3ad140ec7a58d2cc0e7bc49dec4ad80c4111ad1f63e3df
3
+ metadata.gz: e93dc475c0c5f66817fbe63e6824f1ec9d1a36c487126548eec0dead4dfde3f4
4
+ data.tar.gz: e7e64d6aa8dd28ea282cd11f0bfc7d0a55476735ec9b6e7db7a638a48e0457d8
5
5
  SHA512:
6
- metadata.gz: cb7d054e24cc85c64bbb556d4de30b3b54c9b51b409519d9b7f307fbe64dc05dc32e6e7cbeccc027b41c842a31ec5b489e60801b1c1c1f72e587157f62f38391
7
- data.tar.gz: aef2b0ebd0c69f694c438cbf8d8e62d6d754d92c5d804553649c681d6c088bd9bb363197d9fb209b184aa49fb44ef5e733268e1d53a19bc7dfef260c86dee88c
6
+ metadata.gz: e877ab86109aaf3078990ae1ac55f534026ae8ff941f22af38e39d6655356ed9b5a1f69d4d1f7cfc05efcf812e29efe185dba94e97db88c5383d49eb6b487579
7
+ data.tar.gz: c2cbd7f86fd1779ab0cd18136a44680f58fce87f3c9c8068e8ef7a5c049ed40d47f2a0614f169902780a9a362a0a6bf46fee7f626ef74cf94fb2425266bac1a1
data/ChangeLog CHANGED
@@ -1,3 +1,19 @@
1
+ -----
2
+ (Version: 0.3)
3
+ 2019-10-27 Masa Sakano
4
+ * Added 3 executables textclean, head.rb, tail.rb in bin/ together with their tests
5
+ * lib/plaintext.rb refactoring
6
+ * Added a new constant `DEF_METHOD_OPTS`
7
+ * bin/countchar refactoring
8
+
9
+ -----
10
+ (Version: 0.3)
11
+ 2019-10-27 Masa Sakano
12
+ * Added 3 executables textclean, head.rb, tail.rb in bin/ together with their tests
13
+ * lib/plaintext.rb refactoring
14
+ * Added a new constant `DEF_METHOD_OPTS`
15
+ * bin/countchar refactoring
16
+
1
17
  -----
2
18
  (Version: 0.2)
3
19
  2019-10-27 Masa Sakano
data/README.en.rdoc CHANGED
@@ -7,8 +7,9 @@ This module provides utility functions and methods to handle plain
7
7
  text. In the namespace, classes Part/Paragraph/Boundary are defined,
8
8
  which represent the logical structure of a document and another class
9
9
  ParseRule, which describes the rules to parse plain text to produce a Part-type Ruby instance.
10
- This package also provides a command-line program to count the number
11
- of characters, especially useful for documents in Asian (CJK) chatacters.
10
+ This package also provides a few command-line programs, such as one to count the number
11
+ of characters (especially useful for documents in Asian (CJK)
12
+ characters) and advanced head/tail commands.
12
13
 
13
14
  == Design concept
14
15
 
@@ -93,7 +94,10 @@ it is applied to each Paragraph and Section separately to split them further.
93
94
  standard methods to apply the rules to an object (either String or
94
95
  {PlainText::Part}.
95
96
 
96
- == Command-line tool
97
+ == Command-line tools
98
+
99
+ All the commands here accept the +-h+ (or +--help+) option to print the
100
+ help message.
97
101
 
98
102
  === countchar
99
103
 
@@ -102,9 +106,54 @@ Counts the number of characters in a file(s) or STDIN.
102
106
  The simplest example to run the command-line script is
103
107
  countchar YourFile.txt
104
108
 
105
- You may start with
106
- countchar --help
107
- to see the available options.
109
+ === textclean
110
+
111
+ Wrapper command of {PlainText.clean_text}.
112
+ Outputs *cleaned* text, for example, truncating more than 3 linebreaks
113
+ into 2. See the reference of {PlainText.clean_text} for details.
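As a rough illustration of the linebreak truncation described above, here is a minimal sketch in plain Ruby. It is not the implementation of {PlainText.clean_text}; the helper name +truncate_linebreaks+ is hypothetical, and it assumes that a run of 3 or more linebreaks is what gets reduced to 2.

```ruby
# Hypothetical helper (not part of the plain_text gem):
# collapse every run of 3 or more linebreaks into exactly 2.
def truncate_linebreaks(text, lb = "\n")
  text.gsub(/(?:#{Regexp.escape(lb)}){3,}/, lb * 2)
end

# A paragraph break made of 4 newlines is reduced to 2 newlines.
puts truncate_linebreaks("foo\n\n\n\nbar\n").inspect
```

The real command handles many more cases (spaces, line heads and tails, CJK spacing), which this sketch deliberately ignores.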
114
+
115
+ === head.rb
116
+
117
+ This gives advanced functions, in addition to the standard +head+, including
118
+
119
+ Regexp:: It can accept Ruby Regexp to determine the boundary (beginning to the first-matched line).
120
+ Character-based:: With the +--char+ option, it handles the file in units of a character, which is especially handy for dealing with multi-byte characters like UTF-8.
121
+ Inverse:: It can invert the counting to output everything but the initial NUM lines.
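To see why a character-based unit matters for multi-byte text, the following sketch compares cutting a UTF-8 string by characters and by bytes. It uses only core Ruby String methods and is independent of how head.rb is actually implemented; the sample string is made up for illustration.

```ruby
# Sample UTF-8 string: three CJK characters (3 bytes each) followed by "abc".
str = "\u65E5\u672C\u8A9Eabc"

p str.size                             # number of characters
p str.bytesize                         # number of bytes
p str[0, 2]                            # first 2 characters: always valid text
p str.byteslice(0, 2).valid_encoding?  # first 2 bytes cut a character in half
```

Cutting by bytes can end in the middle of a character, whereas cutting by characters never does.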
122
+
123
+ A few examples are
124
+
125
+ head.rb -n 5 < try.txt
126
+ # the same as the UNIX head; printing the first 5 lines
127
+
128
+ head.rb -i -n 5 try.txt
129
+ # printing everything but the first 5 lines
130
+ # The same as the UNIX command: tail -n +5
131
+
132
+ head.rb -e '^===+' try.txt
133
+ # => first line up to the line that begins with 3 or more "="
134
+
135
+ head.rb -x -e '^===+' try.txt
136
+ # => first line up to the line before the one that begins with 3 or more "="
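The Regexp-based examples above can be pictured with a short sketch in plain Ruby. The function name +head_by_regexp+ and its +inclusive+ keyword are hypothetical, and returning the whole text when nothing matches is an assumption; the behaviour of the real {PlainText.head} may differ.

```ruby
# Hypothetical sketch: return the text from the first line up to the first
# line matching +re+, including that line or not.
def head_by_regexp(text, re, inclusive: true)
  lines = text.lines
  idx = lines.index { |line| line =~ re }
  return text if idx.nil?             # assumption: no match returns everything
  last = inclusive ? idx : idx - 1
  last < 0 ? "" : lines[0..last].join # guard: lines[0..-1] would be all lines
end

sample = "title\nintro\n====\nbody\n"
print head_by_regexp(sample, /^===+/)                   # like: head.rb -e '^===+'
print head_by_regexp(sample, /^===+/, inclusive: false) # like: head.rb -x -e '^===+'
```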
137
+
138
+ The suffix +.rb+ is used to distinguish this command from the UNIX-shell standard command.
139
+
140
+ === tail.rb
141
+
142
+ This gives advanced functions, in addition to the standard +tail+, including
143
+
144
+ Regexp:: It can accept Ruby Regexp to determine the boundary (last-matched line to the end).
145
+ Character-based:: With the +--char+ option, it handles the file in units of a character, which is especially handy for dealing with multi-byte characters like UTF-8.
146
+ Inverse:: It can invert the counting to output everything but the last NUM lines.
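The Regexp and Inverse modes listed above can be sketched in plain Ruby as follows. Both helper names are hypothetical, and returning an empty string when nothing matches is an assumption; neither is taken from the implementation of {PlainText.tail}.

```ruby
# Hypothetical sketch: from the last line matching +re+ to the end.
def tail_from_last_match(text, re)
  lines = text.lines
  idx = lines.rindex { |line| line =~ re }
  idx ? lines[idx..-1].join : ""      # assumption: no match returns ""
end

# Hypothetical sketch: everything but the last +num+ lines.
def all_but_last(text, num)
  lines = text.lines
  keep = [lines.size - num, 0].max    # never a negative count
  lines.first(keep).join
end

sample = "a\n== x\nb\n== y\nc\n"
print tail_from_last_match(sample, /^==/)
print all_but_last(sample, 2)
```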
147
+
148
+ Note the UNIX form of
149
+
150
+ tail -n +5
151
+
152
+ (which I think is a somewhat counter-intuitive format) is equivalent to
153
+
154
+ head.rb -i -n 5
155
+
156
+ The suffix +.rb+ is used to distinguish this command from the UNIX-shell standard command.
108
157
 
109
158
  == Miscellaneous
110
159
 
@@ -119,13 +168,13 @@ sent by a String instance +s+ = +"XQabXXcXQ"+:
119
168
  s.split(/X+Q?/) #=> ["", "ab", "c"],
120
169
  s.split(/X+Q?/, -1) #=> ["", "ab", "c", ""],
121
170
  s.split(/X+(Q?)/, -1) #=> ["", "Q", "ab", "", "c", "Q", ""],
122
- s.split(/(X+(Q?))/, -1) #=> ["", "XQ", "Q", "ab", "XX", "", "c", "XQ", "Q", ""],
123
-
171
+ s.split(/(X+(Q?))/, -1) #=> ["", "XQ", "Q", "ab", "XX", "", "c", "XQ", "Q", ""],
172
+
124
173
  With this method,
125
174
 
126
175
  s.split_with_delimiter(/X+(Q?)/)
127
176
  #=> ["", "XQ", "ab", "XX", "c", "XQ"]
128
-
177
+
129
178
  from which the original string is always easily recovered by simple +join+.
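For readers who want to see the idea without the gem, here is a minimal sketch that reproduces the result shown above using only core Ruby. The function name +split_with_delimiter_sketch+ is hypothetical and this is not the gem's implementation; in particular, omitting a trailing empty string is an assumption inferred from the single example above.

```ruby
# Hypothetical sketch: split +str+ by +re+, keeping each whole delimiter
# (not its capture groups) as its own element.
def split_with_delimiter_sketch(str, re)
  out = []
  pos = 0
  str.scan(re) do
    m = Regexp.last_match
    out << str[pos...m.begin(0)] << m[0]  # text before the delimiter, then the delimiter
    pos = m.end(0)
  end
  out << str[pos..-1] unless pos == str.size  # trailing text, if any
  out
end

s = "XQabXXcXQ"
parts = split_with_delimiter_sketch(s, /X+(Q?)/)
p parts
p parts.join == s   # the original string is recovered by a simple join
```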
130
179
 
131
180
  Also, {PlainText::Util} contains some miscellaneous methods.
@@ -134,8 +183,6 @@ Also, {PlainText::Util} contains some miscellaneous methods.
134
183
 
135
184
  Work in progress...
136
185
 
137
- It is still in a preliminary state.
138
-
139
186
  == Install
140
187
 
141
188
  This script requires {Ruby}[http://www.ruby-lang.org] Version 2.0
@@ -153,6 +200,8 @@ explicitly with your Ruby command as
153
200
 
154
201
  == Developer's note
155
202
 
203
+ The source code is also maintained on {GitHub}[https://github.com/masasakano/plain_text]
204
+
156
205
  === Tests
157
206
 
158
207
  Ruby codes under the directory <tt>test/</tt> are the test scripts.
data/bin/countchar CHANGED
@@ -11,22 +11,17 @@ __EOF__
11
11
 
12
12
  # Initialising the hash for the command-line options.
13
13
  OPTS = {
14
- preserve_paragraph: true,
15
- boundary_style: true,
16
- lbs_style: :t, # :truncate,
17
- lb_is_space: false,
18
- sps_style: :truncate,
19
- delete_asian_space: true,
20
- linehead_style: :delete,
21
- linetail_style: :delete,
22
- firstsps_style: :delete,
23
- lastsps_style: :truncate,
24
14
  line_i: nil,
25
15
  line_f: nil,
26
16
  # :chatter => 3, # Default
27
17
  debug: false,
28
18
  }
29
19
 
20
+ # Load the default values from the Module
21
+ PlainText::DEF_METHOD_OPTS[:count_char].each_key do |ek|
22
+ OPTS[ek] ||= PlainText::DEF_METHOD_OPTS[:count_char][ek]
23
+ end
24
+
30
25
  # Function to handle the command-line arguments.
31
26
  #
32
27
  # ARGV will be modified, and the constant variable OPTS is set.
@@ -45,8 +40,9 @@ def handle_argv
45
40
 
46
41
  opt.parse!(ARGV)
47
42
 
43
+ OPTS[:lbs_style] = OPTS[:lbs_style].to_s[0].to_sym
48
44
  unless %i(t d n).include? OPTS[:lbs_style]
49
- warn "ERROR: --lbs-style must be one of (t(runcate)|d(elete)|n(one))."; exit 1
45
+ warn "ERROR: --lbs-style must be one of (t(runcate)|d(elete)|n(one)), but given (#{OPTS[:lbs_style].inspect})"; exit 1
50
46
  end
51
47
 
52
48
  OPTS
@@ -67,20 +63,15 @@ end
67
63
  # Handle the command-line options => OPTS
68
64
  opts = handle_argv()
69
65
 
66
+ valid_keys = PlainText::DEF_METHOD_OPTS[:count_char].keys
67
+ opts.each_key do |ek|
68
+ opts.delete ek if !valid_keys.include? ek
69
+ end
70
+
70
71
  str = ARGF.read
71
72
 
72
- puts str.count_char(
73
- preserve_paragraph: opts[:preserve_paragraph],
74
- boundary_style: opts[:boundary_style],
75
- lbs_style: opts[:lbs_style],
76
- lb_is_space: opts[:lb_is_space],
77
- sps_style: opts[:sps_style],
78
- delete_asian_space: opts[:delete_asian_space],
79
- linehead_style: opts[:linehead_style],
80
- linetail_style: opts[:linetail_style],
81
- firstsps_style: opts[:firstsps_style],
82
- lastsps_style: opts[:lastsps_style],
83
- )
73
+ puts PlainText.count_char(str, **opts)
74
+ # str.count_char() should be equivalent.
84
75
 
85
76
  exit
86
77
 
data/bin/head.rb ADDED
@@ -0,0 +1,87 @@
1
+ #!/usr/bin/env ruby
2
+ # -*- coding: utf-8 -*-
3
+
4
+ require 'optparse'
5
+ require 'plain_text'
6
+
7
+ BANNER = <<"__EOF__"
8
+ USAGE: #{File.basename($0)} [options] [INFILE.txt] < STDIN
9
+ Head command with (multi-byte) character-based manipulation and Regexp.
10
+ __EOF__
11
+
12
+ # Initialising the hash for the command-line options.
13
+ OPTS = {
14
+ num: PlainText::DEF_HEADTAIL_N_LINES,
15
+ unit: :line,
16
+ inclusive: true,
17
+ inverse: false, # unique option
18
+ # :chatter => 3, # Default
19
+ debug: false,
20
+ }
21
+
22
+ # Function to handle the command-line arguments.
23
+ #
24
+ # ARGV will be modified, and the constant variable OPTS is set.
25
+ #
26
+ # @return [Hash] Optional-argument hash.
27
+ #
28
+ def handle_argv
29
+ opt = OptionParser.new(BANNER)
30
+ opt.separator "Options:" # Way to control a help message.
31
+ opt.on('-n NUM', '--line=NUM', sprintf("Number of lines (Def: %d).", PlainText::DEF_HEADTAIL_N_LINES), Integer) { |v| OPTS[:num]=v }
32
+ opt.on('-c NUM', '--byte=NUM', sprintf("Number of bytes, instead of lines."), Integer) { |v| OPTS[:unit] = :byte; OPTS[:num]=v }
33
+ opt.on( '--char=NUM', sprintf("Number of characters, instead of lines."), Integer) { |v| OPTS[:unit] = :char; OPTS[:num]=v }
34
+ opt.on('-e REGEXP', '--regexp=REGEXP', "Regexp for the boundary, instead of a number.") {|v| OPTS[:num] = Regexp.new v}
35
+ opt.on('-x', '--[no-]exclusive', sprintf("The line that matches is excluded? (Def: %s)", (!OPTS[:inclusive]).inspect), FalseClass) {|v| OPTS[:inclusive] = v}
36
+ opt.on('-i', '--[no-]inverse', sprintf("Inverse the result (print after NUM-th line) (Def: %s)", OPTS[:inverse].inspect), TrueClass) {|v| OPTS[:inverse] = v}
37
+ # opt.on( '--version', "Display the version and exits.", TrueClass) {|v| OPTS[:version] = v} # Consider opts.on_tail
38
+ # opt.on( '--[no-]debug', "Debug (Def: false)", TrueClass) {|v| OPTS[:debug] = v}
39
+ # opt.separator "" # Way to control a help message.
40
+ # opt.separator "Note:"
41
+ # opt.separator " Spaces are truncated in default."
42
+
43
+ begin
44
+ opt.parse!(ARGV)
45
+ rescue OptionParser::MissingArgument => er
46
+ # Missing argument like "-b" without a number.
47
+ warn er
48
+ exit 1
49
+ end
50
+
51
+ OPTS
52
+ end
53
+
54
+
55
+ ################################################
56
+ # MAIN
57
+ ################################################
58
+
59
+ $stdout.sync=true
60
+ $stderr.sync=true
61
+
62
+ class String
63
+ include PlainText
64
+ end
65
+
66
+ # Handle the command-line options => OPTS
67
+ opts = handle_argv()
68
+ num_in = opts[:num]
69
+ is_inverse = opts[:inverse]
70
+
71
+ %i(num inverse debug).each do |ek|
72
+ opts.delete ek if opts.has_key? ek
73
+ end
74
+
75
+ str = ARGF.read
76
+
77
+ # A linebreak guaranteed at the end.
78
+ if is_inverse
79
+ puts PlainText.head_inverse(str, num_in, **opts)
80
+ else
81
+ puts PlainText.head(str, num_in, **opts)
82
+ end
83
+
84
+ exit
85
+
86
+ __END__
87
+
data/bin/tail.rb ADDED
@@ -0,0 +1,87 @@
1
+ #!/usr/bin/env ruby
2
+ # -*- coding: utf-8 -*-
3
+
4
+ require 'optparse'
5
+ require 'plain_text'
6
+
7
+ BANNER = <<"__EOF__"
8
+ USAGE: #{File.basename($0)} [options] [INFILE.txt] < STDIN
9
+ tail command with (multi-byte) character-based manipulation and Regexp.
10
+ __EOF__
11
+
12
+ # Initialising the hash for the command-line options.
13
+ OPTS = {
14
+ num: PlainText::DEF_HEADTAIL_N_LINES,
15
+ unit: :line,
16
+ inclusive: true,
17
+ inverse: false, # unique option
18
+ # :chatter => 3, # Default
19
+ debug: false,
20
+ }
21
+
22
+ # Function to handle the command-line arguments.
23
+ #
24
+ # ARGV will be modified, and the constant variable OPTS is set.
25
+ #
26
+ # @return [Hash] Optional-argument hash.
27
+ #
28
+ def handle_argv
29
+ opt = OptionParser.new(BANNER)
30
+ opt.separator "Options:" # Way to control a help message.
31
+ opt.on('-n NUM', '--line=NUM', sprintf("Number of lines (Def: %d).", PlainText::DEF_HEADTAIL_N_LINES), Integer) { |v| OPTS[:num]=v }
32
+ opt.on('-c NUM', '--byte=NUM', sprintf("Number of bytes, instead of lines."), Integer) { |v| OPTS[:unit] = :byte; OPTS[:num]=v }
33
+ opt.on( '--char=NUM', sprintf("Number of characters, instead of lines."), Integer) { |v| OPTS[:unit] = :char; OPTS[:num]=v }
34
+ opt.on('-e REGEXP', '--regexp=REGEXP', "Regexp for the boundary, instead of a number.") {|v| OPTS[:num] = Regexp.new v}
35
+ opt.on('-x', '--[no-]exclusive', sprintf("The line that matches is excluded? (Def: %s)", (!OPTS[:inclusive]).inspect), FalseClass) {|v| OPTS[:inclusive] = v}
36
+ opt.on('-i', '--[no-]inverse', sprintf("Inverse the result (print everything but the last NUM lines) (Def: %s)", OPTS[:inverse].inspect), TrueClass) {|v| OPTS[:inverse] = v}
37
+ # opt.on( '--version', "Display the version and exits.", TrueClass) {|v| OPTS[:version] = v} # Consider opts.on_tail
38
+ # opt.on( '--[no-]debug', "Debug (Def: false)", TrueClass) {|v| OPTS[:debug] = v}
39
+ opt.separator "" # Way to control a help message.
40
+ opt.separator "Note:"
41
+ opt.separator " UNIX command of 'tail -n +5' is equivalent to 'head.rb -i -n 5'"
42
+
43
+ begin
44
+ opt.parse!(ARGV)
45
+ rescue OptionParser::MissingArgument => er
46
+ # Missing argument like "-n" without a number.
47
+ warn er
48
+ exit 1
49
+ end
50
+
51
+ OPTS
52
+ end
53
+
54
+
55
+ ################################################
56
+ # MAIN
57
+ ################################################
58
+
59
+ $stdout.sync=true
60
+ $stderr.sync=true
61
+
62
+ class String
63
+ include PlainText
64
+ end
65
+
66
+ # Handle the command-line options => OPTS
67
+ opts = handle_argv()
68
+ num_in = opts[:num]
69
+ is_inverse = opts[:inverse]
70
+
71
+ %i(num inverse debug).each do |ek|
72
+ opts.delete ek if opts.has_key? ek
73
+ end
74
+
75
+ str = ARGF.read
76
+
77
+ # A linebreak guaranteed at the end.
78
+ if is_inverse
79
+ puts PlainText.tail_inverse(str, num_in, **opts)
80
+ else
81
+ puts PlainText.tail(str, num_in, **opts)
82
+ end
83
+
84
+ exit
85
+
86
+ __END__
87
+
data/bin/textclean ADDED
@@ -0,0 +1,103 @@
1
+ #!/usr/bin/env ruby
2
+ # -*- coding: utf-8 -*-
3
+
4
+ require 'optparse'
5
+ require 'plain_text'
6
+
7
+ BANNER = <<"__EOF__"
8
+ USAGE: #{File.basename($0)} [options] [INFILE.txt] < STDIN
9
+ Clean the text file INFILE (or STDIN), unifying linebreaks, and outputs it.
10
+ __EOF__
11
+
12
+ # Initialising the hash for the command-line options.
13
+ OPTS = {
14
+ line_i: nil,
15
+ line_f: nil,
16
+ # :chatter => 3, # Default
17
+ debug: false,
18
+ }
19
+
20
+ # Load the default values from the Module
21
+ PlainText::DEF_METHOD_OPTS[:clean_text].each_key do |ek|
22
+ OPTS[ek] ||= PlainText::DEF_METHOD_OPTS[:clean_text][ek]
23
+ end
24
+
25
+ # Function to handle the command-line arguments.
26
+ #
27
+ # ARGV will be modified, and the constant variable OPTS is set.
28
+ #
29
+ # @return [Hash] Optional-argument hash.
30
+ #
31
+ def handle_argv
32
+ opt = OptionParser.new(BANNER)
33
+ opt.on( '--[no-]preserve_paragraph', sprintf("Preserved paragraph structures? (Def: %s)", OPTS[:preserve_paragraph].inspect), TrueClass) {|v| OPTS[:preserve_paragraph] = v}
34
+ opt.on( '--boundary-style=STYLE', sprintf("One of (t(runcate)(2)|d(elete)|n(one)) (Def: truncate).")) { |v| OPTS[:boundary_style]=v.strip }
35
+ opt.on( '--lbs-style=STYLE', sprintf("One of (t(runcate)|d(elete)|n(one)) (Def: %s).", OPTS[:lbs_style])) { |v| OPTS[:lbs_style]=v.strip }
36
+ opt.on( '--[no-]lb-is-space', sprintf("Linebreaks are equivalent to spaces? (Def: %s)", OPTS[:lb_is_space].inspect), TrueClass) {|v| OPTS[:lb_is_space] = v}
37
+ opt.on( '--sps-style=STYLE', sprintf("One of (t(runcate)|d(elete)|n(one)) (Def: %s).", OPTS[:sps_style])) { |v| OPTS[:sps_style]=v.strip }
38
+ opt.on( '--[no-]delete-asian-space', sprintf("Deletes spaces between, before or after a CJK character? (Def: %s)", OPTS[:delete_asian_space].inspect), TrueClass) {|v| OPTS[:delete_asian_space] = v}
39
+ opt.on( '--linehead-style=STYLE', sprintf("One of (t(runcate)|d(elete)|n(one)) (Def: %s).", OPTS[:linehead_style])) { |v| OPTS[:linehead_style]=v.strip }
40
+ opt.on( '--linetail-style=STYLE', sprintf("One of (t(runcate)|d(elete)|n(one)) (Def: %s).", OPTS[:linetail_style])) { |v| OPTS[:linetail_style]=v.strip }
41
+ opt.on( '--firstlbs-style=STYLE', sprintf("One of (t(runcate)|d(elete)|n(one)) (Def: %s).", OPTS[:firstlbs_style])) { |v| OPTS[:firstlbs_style]=v.strip }
42
+ opt.on( '--lastsps-style=STYLE', sprintf("One of (t(runcate)|d(elete)|n(one)|m(arkdown)) (Def: %s).", OPTS[:lastsps_style])) { |v| OPTS[:lastsps_style]=v.strip }
43
+ # opt.on( '--version', "Display the version and exits.", TrueClass) {|v| OPTS[:version] = v} # Consider opts.on_tail
44
+ opt.on( '--[no-]debug', "Debug (Def: false)", TrueClass) {|v| OPTS[:debug] = v}
45
+ # opt.separator "" # Way to control a help message.
46
+ # opt.separator "Note:"
47
+ # opt.separator " Spaces are truncated in default."
48
+
49
+ opt.parse!(ARGV)
50
+
51
+ if (OPTS[:boundary_style].class.method_defined?(:to_str) &&
52
+ /\A(t(runcate)?(2)?|d(elete)?|n(one)?)\z/ =~ OPTS[:boundary_style])
53
+ OPTS[:boundary_style] = OPTS[:boundary_style].to_sym
54
+ end
55
+
56
+ %w(lbs sps linehead linetail firstlbs lastsps).each do |ek_head|
57
+ sym_k = (ek_head+"_style").to_sym
58
+ trysym = OPTS[sym_k].to_s[0].to_sym # Symbol of 1 character (nb., NOT boundary_style)
59
+ if (!(%i(t d n).include? trysym) && (sym_k != :lastsps_style) ||
60
+ !(%i(t d n m).include? trysym) && (sym_k == :lastsps_style))
61
+ errmsg = sprintf(
62
+ "ERROR: --%s-style must be one of (t(runcate)|d(elete)%s|n(one)), but given %s.",
63
+ ek_head,
64
+ ((ek_head == "lastsps") ? "|m(arkdown)" : ""),
65
+ OPTS[sym_k].inspect
66
+ )
67
+ warn errmsg
68
+ exit 1
69
+ end
70
+ OPTS[sym_k] = trysym
71
+ end
72
+
73
+ OPTS
74
+ end
75
+
76
+
77
+ ################################################
78
+ # MAIN
79
+ ################################################
80
+
81
+ $stdout.sync=true
82
+ $stderr.sync=true
83
+
84
+ class String
85
+ include PlainText
86
+ end
87
+
88
+ # Handle the command-line options => OPTS
89
+ opts = handle_argv()
90
+
91
+ valid_keys = PlainText::DEF_METHOD_OPTS[:clean_text].keys
92
+ opts.each_key do |ek|
93
+ opts.delete ek if !valid_keys.include? ek
94
+ end
95
+
96
+ str = ARGF.read
97
+
98
+ print PlainText.clean_text(str, **opts)
99
+
100
+ exit
101
+
102
+ __END__
103
+
data/lib/plain_text.rb CHANGED
@@ -25,6 +25,36 @@ module PlainText
25
25
  # Default number of lines to extract for {#head} and {#tail}
26
26
  DEF_HEADTAIL_N_LINES = 10
27
27
 
28
+ # Default options for class/instance methods
29
+ DEF_METHOD_OPTS = {
30
+ :clean_text => {
31
+ preserve_paragraph: true,
32
+ boundary_style: true, # If unspecified, will be replaced with lb_out * 2
33
+ lbs_style: :truncate,
34
+ lb_is_space: false,
35
+ sps_style: :truncate,
36
+ delete_asian_space: true,
37
+ linehead_style: :none,
38
+ linetail_style: :delete,
39
+ firstlbs_style: :delete,
40
+ lastsps_style: :truncate,
41
+ lb: $/,
42
+ lb_out: nil, # If unspecified, will be replaced with lb
43
+ },
44
+ :count_char => {
45
+ lbs_style: :delete,
46
+ linehead_style: :delete,
47
+ lastsps_style: :delete,
48
+ lb_out: "\n",
49
+ },
50
+ }
51
+
52
+ # Adjusts DEF_METHOD_OPTS[:count_char]
53
+ DEF_METHOD_OPTS[:clean_text].each_key do |ek|
54
+ # %i(preserve_paragraph boundary_style lb_is_space sps_style delete_asian_space linetail_style firstlbs_style lb).each do |ek|
55
+ DEF_METHOD_OPTS[:count_char][ek] ||= DEF_METHOD_OPTS[:clean_text][ek]
56
+ end
57
+
28
58
  # Call instance method as a Module function
29
59
  #
30
60
  # The return String includes {PlainText} as Singleton.
@@ -39,33 +69,39 @@ module PlainText
39
69
  end
40
70
 
41
71
  # If the class of the obj does not "include" this module, do so in the singular class.
42
- #
72
+ #
43
73
  # @param obj [Object] Maybe String. For which a singular class def is run, if the condition is met.
44
74
  # @return [TrueClass, NilClass] true if the singular class def is run. Else nil.
45
75
  def self.extend_this(obj)
46
- return nil if defined? obj.delete_spaces_bw_cjk_european!
76
+ return nil if defined? obj.delete_spaces_bw_cjk_european!
47
77
  obj.extend(PlainText)
48
78
  true
49
79
  end
50
80
 
51
- # Module function of {#count_char}
81
+ # Count the number of characters
82
+ #
83
+ # See {PlainText#clean_text!} for the optional parameters. A few of their defaults differ from those of that method;
84
+ # for example, the default for +lb_out+ is +"\n"+ (newline, so that a line-break is 1 byte in size).
85
+ # It is so that this method is more optimized for East-Asian (CJK) characters, given this method is most useful for CJK Strings,
86
+ # whereas, for European alphabets, counting the number of words, rather than characters as in this method, would be more standard.
52
87
  #
53
88
  # @param instr [String] String for which the number of chars is counted
54
89
  # @param (see #count_char)
55
90
  # @return [Integer]
56
91
  def self.count_char(instr, *rest,
57
- lbs_style: :delete,
58
- linehead_style: :delete,
59
- lastsps_style: :delete,
60
- lb_out: "\n",
61
- **k)
62
- clean_text(instr, *rest, lbs_style: lbs_style, lastsps_style: lastsps_style, lb_out: lb_out, **k).size
92
+ lbs_style: DEF_METHOD_OPTS[:count_char][:lbs_style],
93
+ linehead_style: DEF_METHOD_OPTS[:count_char][:linehead_style],
94
+ lastsps_style: DEF_METHOD_OPTS[:count_char][:lastsps_style],
95
+ lb_out: DEF_METHOD_OPTS[:count_char][:lb_out],
96
+ **k
97
+ )
98
+ clean_text(instr, *rest, lbs_style: lbs_style, linehead_style: linehead_style, lastsps_style: lastsps_style, lb_out: lb_out, **k).size
63
99
  end
64
100
 
65
101
 
66
102
  # Cleans the text
67
103
  #
68
- # Such as, removing extra spaces, normalising the linebreaks, etc.
104
+ # Such as, removing extra spaces, normalising the linebreaks, etc.
69
105
  #
70
106
  # In default,
71
107
  #
@@ -77,9 +113,9 @@ module PlainText
77
113
  # * Trailing white spaces in each line are deleted: +linetail_style=:delete+
78
114
  # * Line-breaks at the beginning of the entire input string are deleted: +firstlbs_style=:delete+
79
115
  # * Trailing white spaces and line-breaks at the end of the entire input string are truncated into a single linebreak: +lastsps_style=:truncate+
80
- #
116
+ #
81
117
  # For a String with predominantly CJK characters, the following setting is recommended:
82
- #
118
+ #
83
119
  # * +lbs_style: :delete+
84
120
  # * +delete_asian_space: true+ (Default)
85
121
  #
@@ -111,26 +147,26 @@ module PlainText
111
147
  #
112
148
  def self.clean_text(
113
149
  prt,
114
- preserve_paragraph: true,
115
- boundary_style: true, # If unspecified, will be replaced with lb_out * 2
116
- lbs_style: :truncate,
117
- lb_is_space: false,
118
- sps_style: :truncate,
119
- delete_asian_space: true,
120
- linehead_style: :none,
121
- linetail_style: :delete,
122
- firstlbs_style: :delete,
123
- lastsps_style: :truncate,
124
- lb: $/,
125
- lb_out: nil, # If unspecified, will be replaced with lb
150
+ preserve_paragraph: DEF_METHOD_OPTS[:clean_text][:preserve_paragraph],
151
+ boundary_style: DEF_METHOD_OPTS[:clean_text][:boundary_style], # If unspecified, will be replaced with lb_out * 2
152
+ lbs_style: DEF_METHOD_OPTS[:clean_text][:lbs_style],
153
+ lb_is_space: DEF_METHOD_OPTS[:clean_text][:lb_is_space],
154
+ sps_style: DEF_METHOD_OPTS[:clean_text][:sps_style],
155
+ delete_asian_space: DEF_METHOD_OPTS[:clean_text][:delete_asian_space],
156
+ linehead_style: DEF_METHOD_OPTS[:clean_text][:linehead_style],
157
+ linetail_style: DEF_METHOD_OPTS[:clean_text][:linetail_style],
158
+ firstlbs_style: DEF_METHOD_OPTS[:clean_text][:firstlbs_style],
159
+ lastsps_style: DEF_METHOD_OPTS[:clean_text][:lastsps_style],
160
+ lb: DEF_METHOD_OPTS[:clean_text][:lb],
161
+ lb_out: DEF_METHOD_OPTS[:clean_text][:lb_out], # If unspecified, will be replaced with lb
126
162
  is_debug: false
127
163
  )
128
164
 
129
- isdebug = true if prt == "\n \n abc\n\n \ndef\n\n \n\n"
165
+ #isdebug = true if prt == "foo\n\n\nbar\n"
130
166
  lb_out ||= lb # Output linebreak
131
167
  boundary_style = lb_out*2 if true == boundary_style
132
168
  boundary_style = "" if [:delete, :d].include? boundary_style
133
- lastsps_style = lb_out if :linebreak == lastsps_style
169
+ lastsps_style = lb_out if :linebreak == lastsps_style
134
170
 
135
171
  if !prt.class.method_defined? :last_significant_element
136
172
  # Construct a Part instance from the given String.
@@ -172,7 +208,7 @@ isdebug = true if prt == "\n \n abc\n\n \ndef\n\n \n\n"
172
208
  clean_text_file_head_tail!( prt,
173
209
  firstlbs_style: firstlbs_style,
174
210
  lastsps_style: lastsps_style,
175
- is_debug: isdebug
211
+ is_debug: is_debug
176
212
  )
177
213
 
178
214
  # Replaces the linebreaks to the specified one
@@ -254,13 +290,13 @@ isdebug = true if prt == "\n \n abc\n\n \ndef\n\n \n\n"
254
290
  # Class methods (Private)
255
291
  ##########
256
292
 
257
- # @param prt [PlainText:Part] (see Plaintext.clean_text#prt)
258
- # @param boundary_style (see Plaintext.clean_text#boundary_style)
293
+ # @param prt [PlainText::Part] (see PlainText.clean_text)
294
+ # @param boundary_style (see PlainText.clean_text)
259
295
  # @return [void]
260
296
  #
261
- # @see Plaintext.clean_text
297
+ # @see PlainText.clean_text
262
298
  def self.clean_text_boundary!( prt,
263
- boundary_style: $/*2,
299
+ boundary_style: ,
264
300
  is_debug: false
265
301
  )
266
302
 
@@ -280,20 +316,20 @@ isdebug = true if prt == "\n \n abc\n\n \ndef\n\n \n\n"
280
316
  end # self.clean_text_boundary!
281
317
  private_class_method :clean_text_boundary!
282
318
 
283
- # @param prt [PlainText:Part] (see Plaintext.clean_text#prt)
284
- # @param lbs_style (see Plaintext.clean_text#lbs_style)
285
- # @param sps_style (see Plaintext.clean_text#sps_style)
286
- # @param lb_is_space (see Plaintext.clean_text#lb_is_space)
287
- # @param delete_asian_space (see Plaintext.clean_text#delete_asian_space)
319
+ # @param prt [PlainText::Part] (see PlainText.clean_text)
320
+ # @param lbs_style (see PlainText.clean_text)
321
+ # @param sps_style (see PlainText.clean_text)
322
+ # @param lb_is_space (see PlainText.clean_text)
323
+ # @param delete_asian_space (see PlainText.clean_text)
288
324
  # @return [void]
289
325
  #
290
- # @see Plaintext.clean_text
326
+ # @see PlainText.clean_text
291
327
  def self.clean_text_lbs_sps!(
292
328
  prt,
293
- lbs_style: :truncate,
294
- lb_is_space: false,
295
- sps_style: :truncate,
296
- delete_asian_space: true,
329
+ lbs_style: ,
330
+ lb_is_space: ,
331
+ sps_style: ,
332
+ delete_asian_space: ,
297
333
  is_debug: false
298
334
  )
299
335
 
@@ -328,16 +364,16 @@ isdebug = true if prt == "\n \n abc\n\n \ndef\n\n \n\n"
328
364
  end # self.clean_text_lbs_sps!
329
365
  private_class_method :clean_text_lbs_sps!
330
366
 
331
- # @param prt [PlainText:Part] (see Plaintext.clean_text#prt)
332
- # @param linehead_style [Symbol, String] (see Plaintext.clean_text#linehead_style)
333
- # @param linetail_style [Symbol, String] (see Plaintext.clean_text#linetail_style)
367
+ # @param prt [PlainText::Part] (see PlainText.clean_text)
368
+ # @param linehead_style [Symbol, String] (see PlainText.clean_text)
369
+ # @param linetail_style [Symbol, String] (see PlainText.clean_text)
334
370
  # @return [void]
335
371
  #
336
- # @see Plaintext.clean_text
372
+ # @see PlainText.clean_text
337
373
  def self.clean_text_line_head_tail!(
338
374
  prt,
339
- linehead_style: :none,
340
- linetail_style: :delete,
375
+ linehead_style: ,
376
+ linetail_style: ,
341
377
  is_debug: false
342
378
  )
343
379
 
@@ -371,16 +407,16 @@ isdebug = true if prt == "\n \n abc\n\n \ndef\n\n \n\n"
371
407
  end # self.clean_text_line_head_tail!
372
408
  private_class_method :clean_text_line_head_tail!
373
409
 
374
- # @param prt [PlainText:Part] (see Plaintext.clean_text#prt)
375
- # @param firstlbs_style [Symbol, String] (see Plaintext.clean_text#firstlbs_style)
376
- # @param lastsps_style [Symbol, String] (see Plaintext.clean_text#lastsps_style)
410
+ # @param prt [PlainText::Part] (see PlainText.clean_text)
411
+ # @param firstlbs_style [Symbol, String] (see PlainText.clean_text)
412
+ # @param lastsps_style [Symbol, String] (see PlainText.clean_text)
377
413
  # @return [void]
378
414
  #
379
- # @see Plaintext.clean_text
415
+ # @see PlainText.clean_text
380
416
  def self.clean_text_file_head_tail!(
381
417
  prt,
382
- firstlbs_style: :delete,
383
- lastsps_style: :truncate,
418
+ firstlbs_style: ,
419
+ lastsps_style: ,
384
420
  is_debug: false
385
421
  )
386
422
 
@@ -452,19 +488,18 @@ isdebug = true if prt == "\n \n abc\n\n \ndef\n\n \n\n"
452
488
  #
453
489
  # uses Part to transform a Paragraph into a Part
454
490
  #
455
- # @param prt [PlainText:Part] (see Plaintext.clean_text#prt)
456
- # @param sps_style (see Plaintext.clean_text#sps_style)
491
+ # @param prt [PlainText::Part] (see PlainText.clean_text)
492
+ # @param sps_style (see PlainText.clean_text)
457
493
  # @return [void]
458
494
  #
459
- # @see Plaintext.clean_text
495
+ # @see PlainText.clean_text
460
496
  def self.clean_text_sps!(
461
497
  prt,
462
- sps_style: :truncate,
498
+ sps_style: ,
463
499
  is_debug: false
464
500
  )
465
501
 
466
502
  prt.parts.each do |e_pa|
467
- ru = ParseRule
468
503
  # Each line treated as a Paragraph, and [[:space:]]+ between them as a Boundary.
469
504
  # Then, to work on anything within a line except for line-head/tail is easy.
470
505
  prt_para = Part.parse(e_pa, rule: ParseRule::RuleEachLineStrip).map_parts { |e_li|
@@ -490,21 +525,16 @@ isdebug = true if prt == "\n \n abc\n\n \ndef\n\n \n\n"
490
525
  ####################################################
491
526
 
492
527
  # Count the number of characters
493
- #
494
- # See {PlainText#clean_text!} for the optional parameters. The defaults of a few of the optional parameters are different from {PlainText#clean_text!},
495
- # such as the default for +lb_out+ is "\n" (so that a line-break is 1 byte in size).
528
+ #
529
+ # See {PlainText.count_char} and, further, {PlainText.clean_text!} for the optional parameters. A few of their defaults differ from those of the latter;
530
+ # for example, the default for +lb_out+ is +"\n"+ (newline, so that a line-break is 1 byte in size).
496
531
  # It is so that this method is more optimized for East-Asian (CJK) characters, given this method is most useful for CJK Strings,
497
532
  # whereas, for European alphabets, counting the number of words, rather than characters as in this method, would be more standard.
498
533
  #
499
- # @param (see PlainText#clean_text!)
534
+ # @param (see PlainText.count_char)
500
535
  # @return [Integer]
501
- def count_char(*rest,
502
- lbs_style: :delete,
503
- linehead_style: :delete,
504
- lastsps_style: :none,
505
- lb_out: "\n",
506
- **k)
507
- PlainText.clean_text(self, *rest, lbs_style: lbs_style, lastsps_style: lastsps_style, lb_out: lb_out, **k).size
536
+ def count_char(*rest, **k)
537
+ PlainText.public_send(__method__, self, *rest, **k)
508
538
  end
509
539
 
510
540
  # Delete all the spaces between CJK and European characters or numbers.
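The refactored `count_char` above forwards to a module-level function of the same name through `__method__`. The following is a minimal sketch of that delegation pattern only. `TextUtil`, `MyString`, the `ignore_spaces` option and the whitespace rule are hypothetical names made up for illustration, not the gem's actual implementation or defaults.

```ruby
# Hypothetical stand-in for a module such as PlainText (an assumption, not the gem's code).
module TextUtil
  # Module-level function that owns the logic and the default options.
  # Here it counts characters, optionally ignoring all whitespace.
  def self.count_char(str, ignore_spaces: true)
    ignore_spaces ? str.gsub(/[[:space:]]+/, '').size : str.size
  end

  # Instance-side wrapper: __method__ evaluates to :count_char, so the call is
  # forwarded to TextUtil.count_char with the receiver as the first argument.
  def count_char(*rest, **k)
    TextUtil.public_send(__method__, self, *rest, **k)
  end
end

# Hypothetical String subclass that mixes in the wrapper.
class MyString < String
  include TextUtil
end

s = MyString.new("foo\n\n\nbar\n")
puts s.count_char                        # whitespace ignored
puts s.count_char(ignore_spaces: false)  # raw size
```

With this arrangement the defaults live in one place (the module-level function), so the wrapper does not need to repeat them.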
@@ -732,7 +762,7 @@ isdebug = true if prt == "\n \n abc\n\n \ndef\n\n \n\n"
  # till the last one is returned. "The next line" means (1) the line immediately after the match
  # if the matched string has the linebreak at the end, or (2) the line after the first linebreak after the matched string,
  # where the trailing characters after the matched string to the linebreak (inclusive) is ignored.
- #
+ #
  # = Tips =
  # To specify the *last* line that matches the Regexp, consider prefixing +(?:.*)+ with the option +m+,
  # e.g., +/(?:.*)ABC/m+
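The tip in the comment above relies on plain Ruby Regexp behaviour: with the `m` option `.` also matches newlines, so a greedy `(?:.*)` prefix pushes the match to the last occurrence. A small sketch using only core `Regexp` and `MatchData` (the sample text is made up for illustration, and no method of the gem is called):

```ruby
# Sample text (an assumption for illustration) with two lines containing "ABC".
text = "ABC first\nmiddle\nABC last\ntail\n"

first = /ABC/.match(text)          # leftmost occurrence
last  = /(?:.*)ABC/m.match(text)   # greedy prefix + /m reaches the last occurrence

puts first.end(0)             # offset just after the first "ABC" => 3
puts last.end(0)              # offset just after the last "ABC"  => 20
puts last.post_match.inspect  # => " last\ntail\n"
```

Everything after `last.end(0)` up to the next linebreak is the "trailing characters" the comment refers to.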
data/plain_text.gemspec CHANGED
@@ -5,9 +5,9 @@ require 'date'
  
  Gem::Specification.new do |s|
  s.name = %q{plain_text}.sub(/.*/){|c| (c == File.basename(Dir.pwd)) ? c : raise("ERROR: s.name=(#{c}) in gemspec seems wrong!")}
- s.version = "0.2"
+ s.version = "0.3"
  # s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
- %w(countchar).each do |f|
+ %w(countchar textclean head.rb tail.rb).each do |f|
  path = s.bindir+'/'+f
  File.executable?(path) ? s.executables << f : raise("ERROR: Executable (#{path}) is not executable!")
  end
data/test/testcountchar.rb ADDED
@@ -0,0 +1,46 @@
+ # -*- encoding: utf-8 -*-
+
+ # Tests of an executable.
+ #
+ # @author: M. Sakano (Wise Babel Ltd)
+
+ require 'open3'
+
+ $stdout.sync=true
+ $stderr.sync=true
+ # print '$LOAD_PATH=';p $LOAD_PATH
+
+ #################################################
+ # Unit Test
+ #################################################
+
+ gem "minitest"
+ # require 'minitest/unit'
+ require 'minitest/autorun'
+
+ class TestUnitCountchar < MiniTest::Test
+ T = true
+ F = false
+ SCFNAME = File.basename(__FILE__)
+ EXE = "%s/../bin/%s" % [File.dirname(__FILE__), File.basename(__FILE__).sub(/^test_?(.+)\.rb/, '\1')]
+
+ def setup
+ end
+
+ def teardown
+ end
+
+ def test_countchar01
+ o, e, s = Open3.capture3 EXE
+ assert_equal 0, s.exitstatus, "error is raised: STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
+ assert_equal "0", o.chomp
+ assert_empty e
+
+ stin = "foo\n\n\nbar\n"
+ o, e, s = Open3.capture3 EXE, stdin_data: stin
+ assert_equal 0, s.exitstatus
+ assert_equal stin.size-2, o.to_i
+ assert_empty e
+ end
+ end # class TestUnitCountchar < MiniTest::Test
+
data/test/testhead_rb.rb ADDED
@@ -0,0 +1,70 @@
+ # -*- encoding: utf-8 -*-
+
+ # Tests of an executable.
+ #
+ # @author: M. Sakano (Wise Babel Ltd)
+
+ require 'open3'
+
+ $stdout.sync=true
+ $stderr.sync=true
+ # print '$LOAD_PATH=';p $LOAD_PATH
+
+ #################################################
+ # Unit Test
+ #################################################
+
+ gem "minitest"
+ # require 'minitest/unit'
+ require 'minitest/autorun'
+
+ class TestUnitHeadRb < MiniTest::Test
+ T = true
+ F = false
+ SCFNAME = File.basename(__FILE__)
+ EXE = "%s/../bin/%s" % [File.dirname(__FILE__), File.basename(__FILE__).sub(/^test_?(.+)\.rb/, '\1').sub(/_rb$/, '.rb')]
+
+ def setup
+ end
+
+ def teardown
+ end
+
+ def test_countchar01
+ o, e, s = Open3.capture3 EXE
+ assert_equal 0, s.exitstatus, "error is raised: STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
+ assert_equal "\n", o
+ assert_empty e
+
+ stin = "1\n2\n3\n4\n5\n6\n7\n8\n9\nA\nB\n"
+ o, e, s = Open3.capture3 EXE, stdin_data: stin
+ assert_equal 0, s.exitstatus
+ assert_equal stin[0..19], o
+ assert_empty e
+
+ o, e, s = Open3.capture3 EXE+' -i', stdin_data: stin
+ assert_equal 0, s.exitstatus
+ assert_equal stin[20..-1], o, "Wrong! STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
+ assert_empty e
+
+ o, e, s = Open3.capture3 EXE+' -n 10', stdin_data: stin
+ assert_equal 0, s.exitstatus
+ assert_equal stin[0..19], o
+ assert_empty e
+
+ o, e, s = Open3.capture3 EXE+' -b', stdin_data: stin
+ assert_equal 1, s.exitstatus, "error is raised: STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
+ assert_match(/missing/i, e)
+
+ o, e, s = Open3.capture3 EXE+' -e "[5-9]"', stdin_data: stin
+ assert_equal 0, s.exitstatus
+ assert_equal stin[0..9], o, "Wrong! STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
+ assert_empty e
+
+ o, e, s = Open3.capture3 EXE+' -e "[5-9]" -x', stdin_data: stin
+ assert_equal 0, s.exitstatus, "error is raised: STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
+ assert_equal stin[0..7], o, "Wrong! STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
+ assert_empty e
+ end
+ end # class TestUnitHeadRb < MiniTest::Test
+
data/test/testtail_rb.rb ADDED
@@ -0,0 +1,70 @@
+ # -*- encoding: utf-8 -*-
+
+ # Tests of an executable.
+ #
+ # @author: M. Sakano (Wise Babel Ltd)
+
+ require 'open3'
+
+ $stdout.sync=true
+ $stderr.sync=true
+ # print '$LOAD_PATH=';p $LOAD_PATH
+
+ #################################################
+ # Unit Test
+ #################################################
+
+ gem "minitest"
+ # require 'minitest/unit'
+ require 'minitest/autorun'
+
+ class TestUnitTailRb < MiniTest::Test
+ T = true
+ F = false
+ SCFNAME = File.basename(__FILE__)
+ EXE = "%s/../bin/%s" % [File.dirname(__FILE__), File.basename(__FILE__).sub(/^test_?(.+)\.rb/, '\1').sub(/_rb$/, '.rb')]
+
+ def setup
+ end
+
+ def teardown
+ end
+
+ def test_countchar01
+ o, e, s = Open3.capture3 EXE
+ assert_equal 0, s.exitstatus, "error is raised: STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
+ assert_equal "\n", o
+ assert_empty e
+
+ stin = "1\n2\n3\n4\n5\n6\n7\n8\n9\nA\nB\n"
+ o, e, s = Open3.capture3 EXE, stdin_data: stin
+ assert_equal 0, s.exitstatus
+ assert_equal stin[2..-1], o
+ assert_empty e
+
+ o, e, s = Open3.capture3 EXE+' -i', stdin_data: stin
+ assert_equal 0, s.exitstatus
+ assert_equal stin[0..1], o, "Wrong! STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
+ assert_empty e
+
+ o, e, s = Open3.capture3 EXE+' -n 10', stdin_data: stin
+ assert_equal 0, s.exitstatus
+ assert_equal stin[2..-1], o
+ assert_empty e
+
+ o, e, s = Open3.capture3 EXE+' -b', stdin_data: stin
+ assert_equal 1, s.exitstatus, "error is raised: STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
+ assert_match(/missing/i, e)
+
+ o, e, s = Open3.capture3 EXE+' -e "[5-9]"', stdin_data: stin
+ assert_equal 0, s.exitstatus
+ assert_equal stin[-6..-1], o, "Wrong! STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
+ assert_empty e
+
+ o, e, s = Open3.capture3 EXE+' -e "[5-9]" -x', stdin_data: stin
+ assert_equal 0, s.exitstatus, "error is raised: STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
+ assert_equal stin[-4..-1], o, "Wrong! STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
+ assert_empty e
+ end
+ end # class TestUnitTailRb < MiniTest::Test
+
data/test/testtextclean.rb ADDED
@@ -0,0 +1,52 @@
+ # -*- encoding: utf-8 -*-
+
+ # Tests of an executable.
+ #
+ # @author: M. Sakano (Wise Babel Ltd)
+
+ require 'open3'
+
+ $stdout.sync=true
+ $stderr.sync=true
+ # print '$LOAD_PATH=';p $LOAD_PATH
+
+ #################################################
+ # Unit Test
+ #################################################
+
+ gem "minitest"
+ # require 'minitest/unit'
+ require 'minitest/autorun'
+
+ class TestUnitTextclean < MiniTest::Test
+ T = true
+ F = false
+ SCFNAME = File.basename(__FILE__)
+ EXE = "%s/../bin/%s" % [File.dirname(__FILE__), File.basename(__FILE__).sub(/^test_?(.+)\.rb/, '\1')]
+
+ def setup
+ end
+
+ def teardown
+ end
+
+ def test_textclean01
+ o, e, s = Open3.capture3 EXE
+ assert_equal 0, s.exitstatus, "error is raised: STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
+ assert_equal "", o.chomp
+ assert_empty e
+
+ stin = "foo\n\n\nbar\n"
+ s2 = "foo\n\nbar\n"
+ #o, e, s = Open3.capture3 EXE, stdin_data: stin
+ #assert_equal 0, s.exitstatus
+ #assert_equal s2, o
+ #assert_empty e
+
+ o, e, s = Open3.capture3 EXE+' --lastsps-style=delete', stdin_data: stin
+ assert_equal 0, s.exitstatus
+ assert_equal s2.chop.chomp, o, "Wrong! STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
+ assert_empty e
+ end
+ end # class TestUnitTextclean < MiniTest::Test
+
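The four added test files all drive an executable with `Open3.capture3`, feeding text through `stdin_data:` and checking STDOUT, STDERR and the exit status. A minimal sketch of that pattern follows. The child command here is an assumption for illustration (the current Ruby interpreter running a one-liner), not one of the gem's executables.

```ruby
require 'open3'
require 'rbconfig'

# Throwaway child command (hypothetical): the running Ruby interpreter
# executing a one-liner that prints the size of its standard input.
cmd = [RbConfig.ruby, '-e', 'print $stdin.read.size']

stin = "foo\n\n\nbar\n"

# capture3 returns the child's STDOUT, STDERR and a Process::Status.
out, err, status = Open3.capture3(*cmd, stdin_data: stin)

puts out                # size of the input as seen by the child
puts err.empty?         # true when nothing was written to STDERR
puts status.exitstatus  # 0 on success
```

Passing the command as separate arguments avoids shell quoting, whereas the tests above build a single command string such as `EXE+' -n 10'`, which goes through the shell.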
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: plain_text
  version: !ruby/object:Gem::Version
- version: '0.2'
+ version: '0.3'
  platform: ruby
  authors:
  - Masa Sakano
@@ -17,6 +17,9 @@ description: This module provides utility functions and methods to handle plain
  email:
  executables:
  - countchar
+ - textclean
+ - head.rb
+ - tail.rb
  extensions: []
  extra_rdoc_files:
  - README.en.rdoc
@@ -28,6 +31,9 @@ files:
  - README.en.rdoc
  - Rakefile
  - bin/countchar
+ - bin/head.rb
+ - bin/tail.rb
+ - bin/textclean
  - lib/plain_text.rb
  - lib/plain_text/parse_rule.rb
  - lib/plain_text/part.rb
@@ -40,6 +46,10 @@ files:
  - test/test_plain_text_parse_rule.rb
  - test/test_plain_text_part.rb
  - test/test_plain_text_split.rb
+ - test/testcountchar.rb
+ - test/testhead_rb.rb
+ - test/testtail_rb.rb
+ - test/testtextclean.rb
  homepage: https://www.wisebabel.com
  licenses:
  - MIT
@@ -67,6 +77,10 @@ specification_version: 4
  summary: Module to handle Plain-Text
  test_files:
  - test/test_plain_text_parse_rule.rb
+ - test/testtail_rb.rb
  - test/test_plain_text_part.rb
  - test/test_plain_text.rb
+ - test/testcountchar.rb
+ - test/testtextclean.rb
  - test/test_plain_text_split.rb
+ - test/testhead_rb.rb