plain_text 0.2 → 0.3
- checksums.yaml +4 -4
- data/ChangeLog +16 -0
- data/README.en.rdoc +60 -11
- data/bin/countchar +14 -23
- data/bin/head.rb +87 -0
- data/bin/tail.rb +87 -0
- data/bin/textclean +103 -0
- data/lib/plain_text.rb +100 -70
- data/plain_text.gemspec +2 -2
- data/test/testcountchar.rb +46 -0
- data/test/testhead_rb.rb +70 -0
- data/test/testtail_rb.rb +70 -0
- data/test/testtextclean.rb +52 -0
- metadata +15 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: e93dc475c0c5f66817fbe63e6824f1ec9d1a36c487126548eec0dead4dfde3f4
|
4
|
+
data.tar.gz: e7e64d6aa8dd28ea282cd11f0bfc7d0a55476735ec9b6e7db7a638a48e0457d8
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: e877ab86109aaf3078990ae1ac55f534026ae8ff941f22af38e39d6655356ed9b5a1f69d4d1f7cfc05efcf812e29efe185dba94e97db88c5383d49eb6b487579
|
7
|
+
data.tar.gz: c2cbd7f86fd1779ab0cd18136a44680f58fce87f3c9c8068e8ef7a5c049ed40d47f2a0614f169902780a9a362a0a6bf46fee7f626ef74cf94fb2425266bac1a1
|
data/ChangeLog
CHANGED
@@ -1,3 +1,19 @@
|
|
1
|
+
-----
|
2
|
+
(Version: 0.3)
|
3
|
+
2019-10-27 Masa Sakano
|
4
|
+
* Added 3 executables textclean, head.rb, tail.rb in bin/ together with their tests
|
5
|
+
* lib/plain_text.rb refactoring
|
6
|
+
* Added a new constant `DEF_METHOD_OPTS`
|
7
|
+
* bin/countchar refactoring
|
8
|
+
|
9
|
+
-----
|
10
|
+
(Version: 0.3)
|
11
|
+
2019-10-27 Masa Sakano
|
12
|
+
* Added 3 executables textclean, head.rb, tail.rb in bin/ together with their tests
|
13
|
+
* lib/plain_text.rb refactoring
|
14
|
+
* Added a new constant `DEF_METHOD_OPTS`
|
15
|
+
* bin/countchar refactoring
|
16
|
+
|
1
17
|
-----
|
2
18
|
(Version: 0.2)
|
3
19
|
2019-10-27 Masa Sakano
|
data/README.en.rdoc
CHANGED
@@ -7,8 +7,9 @@ This module provides utility functions and methods to handle plain
|
|
7
7
|
text. In the namespace, classes Part/Paragraph/Boundary are defined,
|
8
8
|
which represent the logical structure of a document and another class
|
9
9
|
ParseRule, which describes the rules to parse plain text to produce a Part-type Ruby instance.
|
10
|
-
This package also provides a command-line
|
11
|
-
of characters
|
10
|
+
This package also provides a few command-line programs, such as counting the number
|
11
|
+
of characters (especially useful for documents in Asian (CJK)
|
12
|
+
characters) and advanced head/tail commands.
|
12
13
|
|
13
14
|
== Design concept
|
14
15
|
|
@@ -93,7 +94,10 @@ it is applied to each Paragraph and Section separately to split them further.
|
|
93
94
|
standard methods to apply the rules to an object (either String or
|
94
95
|
{PlainText::Part}.
|
95
96
|
|
96
|
-
== Command-line
|
97
|
+
== Command-line tools
|
98
|
+
|
99
|
+
All the commands here accept +-h+ (or +--help+) option to print the
|
100
|
+
help message.
|
97
101
|
|
98
102
|
=== countchar
|
99
103
|
|
@@ -102,9 +106,54 @@ Counts the number of characters in a file(s) or STDIN.
|
|
102
106
|
The simplest example to run the command-line script is
|
103
107
|
countchar YourFile.txt
|
104
108
|
|
105
|
-
|
106
|
-
|
107
|
-
|
109
|
+
=== textclean
|
110
|
+
|
111
|
+
Wrapper command of {PlainText.clean_text}.
|
112
|
+
Outputs *cleaned* text, for example by truncating more than 3 linebreaks
|
113
|
+
into 2. See the reference of {PlainText.clean_text} for detail.
|
114
|
+
|
115
|
+
=== head.rb
|
116
|
+
|
117
|
+
This gives advanced functions, in addition to the standard +head+, including
|
118
|
+
|
119
|
+
Regexp:: It can accept Ruby Regexp to determine the boundary (beginning to the first-matched line).
|
120
|
+
Character-based:: With +--char+ option, it handles the file in units of a character, which is especially handy to deal with multi-byte characters like UTF-8.
|
121
|
+
Inverse:: It can invert the counting to output everything but the initial NUM lines.
|
122
|
+
|
123
|
+
A few examples are
|
124
|
+
|
125
|
+
head.rb -n 5 < try.txt
|
126
|
+
# the same as the UNIX head; printing the first 5 lines
|
127
|
+
|
128
|
+
head.rb -i -n 5 try.txt
|
129
|
+
# printing everything but the first 5 lines
|
130
|
+
# The same as the UNIX command: tail -n +5
|
131
|
+
|
132
|
+
head.rb -e '^===+' try.txt
|
133
|
+
# => first line up to the line that begins with 3 or more "="
|
134
|
+
|
135
|
+
head.rb -x -e '^===+' try.txt
|
136
|
+
# => first line up to the line before the one that begins with 3 or more "="
|
137
|
+
|
138
|
+
The suffix +.rb+ is used to distinguish this command from the UNIX-shell standard command.
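The Regexp boundary used by the +-e+ and +-x+ examples above can be pictured with a minimal sketch. This is not the gem's implementation: +head_until+ and its +inclusive+ keyword are hypothetical names chosen for illustration, and returning the whole text when nothing matches is an assumption.

```ruby
# Hypothetical helper (not part of plain_text): returns the text from the
# beginning up to the first line that matches +re+.
def head_until(text, re, inclusive: true)
  out = []
  text.each_line do |line|
    if line =~ re
      out << line if inclusive   # inclusive: false mimics the -x option
      break
    end
    out << line
  end
  out.join
end

text = "title\nbody\n====\nrest\n"
print head_until(text, /^===+/)                    # up to and including "===="
print head_until(text, /^===+/, inclusive: false)  # stops before "===="
```

Processing line by line keeps the boundary logic independent of how many lines precede the match.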
|
139
|
+
|
140
|
+
=== tail.rb
|
141
|
+
|
142
|
+
This gives advanced functions, in addition to the standard +tail+, including
|
143
|
+
|
144
|
+
Regexp:: It can accept Ruby Regexp to determine the boundary (last-matched line to the end).
|
145
|
+
Character-based:: With +--char+ option, it handles the file in units of a character, which is especially handy to deal with multi-byte characters like UTF-8.
|
146
|
+
Inverse:: It can invert the counting to output everything but the last NUM lines.
|
147
|
+
|
148
|
+
Note the UNIX form of
|
149
|
+
|
150
|
+
tail -n +5
|
151
|
+
|
152
|
+
(which I think is a somewhat counter-intuitive format) is equivalent to
|
153
|
+
|
154
|
+
head.rb -i -n 5
|
155
|
+
|
156
|
+
The suffix +.rb+ is used to distinguish this command from the UNIX-shell standard command.
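Why a character-based unit matters for multi-byte text can be seen in a short sketch. The helper +tail_chars+ is a hypothetical name, not a method of this package; it only illustrates the difference between counting characters and counting bytes in a UTF-8 String.

```ruby
# Hypothetical helper (not part of plain_text): the last +num+ characters.
def tail_chars(text, num)
  return "" if num <= 0
  text.chars.last(num).join
end

# Three kanji followed by "abc", written with escapes so that the
# String is UTF-8 regardless of the source-file encoding.
s = "\u65E5\u672C\u8A9Eabc"

puts s.size              # number of characters
puts s.bytesize          # number of bytes (each kanji takes 3 bytes in UTF-8)
puts tail_chars(s, 4)    # the last kanji followed by "abc"
```

Cutting the same String after a fixed number of bytes could split a kanji in the middle, which is the situation a character-based option avoids.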
|
108
157
|
|
109
158
|
== Miscellaneous
|
110
159
|
|
@@ -119,13 +168,13 @@ sent by a String instance +s+ = +"XQabXXcXQ"+:
|
|
119
168
|
s.split(/X+Q?/) #=> ["", "ab", "c"],
|
120
169
|
s.split(/X+Q?/, -1) #=> ["", "ab", "c", ""],
|
121
170
|
s.split(/X+(Q?)/, -1) #=> ["", "Q", "ab", "", "c", "Q", ""],
|
122
|
-
s.split(/(X+(Q?))/, -1) #=> ["", "XQ", "Q", "ab", "XX", "", "c", "XQ", "Q", ""],
|
123
|
-
|
171
|
+
s.split(/(X+(Q?))/, -1) #=> ["", "XQ", "Q", "ab", "XX", "", "c", "XQ", "Q", ""],
|
172
|
+
|
124
173
|
With this method,
|
125
174
|
|
126
175
|
s.split_with_delimiter(/X+(Q?)/)
|
127
176
|
#=> ["", "XQ", "ab", "XX", "c", "XQ"]
|
128
|
-
|
177
|
+
|
129
178
|
from which the original string is always easily recovered by simple +join+.
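The behaviour described here can be approximated with the standard library alone. The following is a sketch under assumptions, not the gem's code: +split_keeping_delimiters+ is a hypothetical name, and dropping a trailing empty element is inferred from the example output above.

```ruby
# Hypothetical stand-in for split_with_delimiter: alternates the text
# between matches and the whole matched delimiter (groups are ignored).
def split_keeping_delimiters(str, re)
  parts = []
  pos = 0
  str.scan(re) do
    m = Regexp.last_match
    parts << str[pos...m.begin(0)] << m[0]
    pos = m.end(0)
  end
  parts << str[pos..-1] unless pos == str.size
  parts
end

s = "XQabXXcXQ"
parts = split_keeping_delimiters(s, /X+(Q?)/)
p parts
p parts.join == s
```

Because every character of the input lands in exactly one element, +join+ restores the original String.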
|
130
179
|
|
131
180
|
Also, {PlainText::Util} contains some miscellaneous methods.
|
@@ -134,8 +183,6 @@ Also, {PlainText::Util} contains some miscellaneous methods.
|
|
134
183
|
|
135
184
|
Work in progress...
|
136
185
|
|
137
|
-
It is still in a preliminary state.
|
138
|
-
|
139
186
|
== Install
|
140
187
|
|
141
188
|
This script requires {Ruby}[http://www.ruby-lang.org] Version 2.0
|
@@ -153,6 +200,8 @@ explicitly with your Ruby command as
|
|
153
200
|
|
154
201
|
== Developer's note
|
155
202
|
|
203
|
+
The source code is also maintained on {GitHub}[https://github.com/masasakano/plain_text]
|
204
|
+
|
156
205
|
=== Tests
|
157
206
|
|
158
207
|
Ruby codes under the directory <tt>test/</tt> are the test scripts.
|
data/bin/countchar
CHANGED
@@ -11,22 +11,17 @@ __EOF__
|
|
11
11
|
|
12
12
|
# Initialising the hash for the command-line options.
|
13
13
|
OPTS = {
|
14
|
-
preserve_paragraph: true,
|
15
|
-
boundary_style: true,
|
16
|
-
lbs_style: :t, # :truncate,
|
17
|
-
lb_is_space: false,
|
18
|
-
sps_style: :truncate,
|
19
|
-
delete_asian_space: true,
|
20
|
-
linehead_style: :delete,
|
21
|
-
linetail_style: :delete,
|
22
|
-
firstsps_style: :delete,
|
23
|
-
lastsps_style: :truncate,
|
24
14
|
line_i: nil,
|
25
15
|
line_f: nil,
|
26
16
|
# :chatter => 3, # Default
|
27
17
|
debug: false,
|
28
18
|
}
|
29
19
|
|
20
|
+
# Load the default values from the Module
|
21
|
+
PlainText::DEF_METHOD_OPTS[:count_char].each_key do |ek|
|
22
|
+
OPTS[ek] ||= PlainText::DEF_METHOD_OPTS[:count_char][ek]
|
23
|
+
end
|
24
|
+
|
30
25
|
# Function to handle the command-line arguments.
|
31
26
|
#
|
32
27
|
# ARGV will be modified, and the constant variable OPTS is set.
|
@@ -45,8 +40,9 @@ def handle_argv
|
|
45
40
|
|
46
41
|
opt.parse!(ARGV)
|
47
42
|
|
43
|
+
OPTS[:lbs_style] = OPTS[:lbs_style].to_s[0].to_sym
|
48
44
|
unless %i(t d n).include? OPTS[:lbs_style]
|
49
|
-
warn "ERROR: --lbs-style must be one of (t(runcate)|d(elete)|n(one))."; exit 1
|
45
|
+
warn "ERROR: --lbs-style must be one of (t(runcate)|d(elete)|n(one)), but given (#{OPTS[:lbs_style].inspect})"; exit 1
|
50
46
|
end
|
51
47
|
|
52
48
|
OPTS
|
@@ -67,20 +63,15 @@ end
|
|
67
63
|
# Handle the command-line options => OPTS
|
68
64
|
opts = handle_argv()
|
69
65
|
|
66
|
+
valid_keys = PlainText::DEF_METHOD_OPTS[:count_char].keys
|
67
|
+
opts.each_key do |ek|
|
68
|
+
opts.delete ek if !valid_keys.include? ek
|
69
|
+
end
|
70
|
+
|
70
71
|
str = ARGF.read
|
71
72
|
|
72
|
-
puts
|
73
|
-
|
74
|
-
boundary_style: opts[:boundary_style],
|
75
|
-
lbs_style: opts[:lbs_style],
|
76
|
-
lb_is_space: opts[:lb_is_space],
|
77
|
-
sps_style: opts[:sps_style],
|
78
|
-
delete_asian_space: opts[:delete_asian_space],
|
79
|
-
linehead_style: opts[:linehead_style],
|
80
|
-
linetail_style: opts[:linetail_style],
|
81
|
-
firstsps_style: opts[:firstsps_style],
|
82
|
-
lastsps_style: opts[:lastsps_style],
|
83
|
-
)
|
73
|
+
puts PlainText.count_char(str, **opts)
|
74
|
+
# str.count_char() should be equivalent.
|
84
75
|
|
85
76
|
exit
|
86
77
|
|
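The key filtering in the script above (deleting every option that the library method does not accept) can also be written without mutating the Hash while iterating over it. This is a sketch only: +only_valid+ is a hypothetical helper, and the keys and values below are a small illustrative subset.

```ruby
# Hypothetical helper: keeps only the options whose key is in +valid_keys+
# and returns a new Hash, leaving the original untouched.
def only_valid(opts, valid_keys)
  opts.select { |key, _value| valid_keys.include?(key) }
end

valid_keys = %i(lbs_style linehead_style lastsps_style lb_out)
opts = { lbs_style: :d, lb_out: "\n", line_i: nil, line_f: nil, debug: false }

p only_valid(opts, valid_keys)   # script-only keys such as :debug are dropped
```

Returning a new Hash keeps the script-only options (such as +:debug+) available for later use.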
data/bin/head.rb
ADDED
@@ -0,0 +1,87 @@
|
|
1
|
+
#!/usr/bin/env ruby
|
2
|
+
# -*- coding: utf-8 -*-
|
3
|
+
|
4
|
+
require 'optparse'
|
5
|
+
require 'plain_text'
|
6
|
+
|
7
|
+
BANNER = <<"__EOF__"
|
8
|
+
USAGE: #{File.basename($0)} [options] [INFILE.txt] < STDIN
|
9
|
+
Head command with (multi-byte) character-based manipulation and Regexp.
|
10
|
+
__EOF__
|
11
|
+
|
12
|
+
# Initialising the hash for the command-line options.
|
13
|
+
OPTS = {
|
14
|
+
num: PlainText::DEF_HEADTAIL_N_LINES,
|
15
|
+
unit: :line,
|
16
|
+
inclusive: true,
|
17
|
+
inverse: false, # unique option
|
18
|
+
# :chatter => 3, # Default
|
19
|
+
debug: false,
|
20
|
+
}
|
21
|
+
|
22
|
+
# Function to handle the command-line arguments.
|
23
|
+
#
|
24
|
+
# ARGV will be modified, and the constant variable OPTS is set.
|
25
|
+
#
|
26
|
+
# @return [Hash] Optional-argument hash.
|
27
|
+
#
|
28
|
+
def handle_argv
|
29
|
+
opt = OptionParser.new(BANNER)
|
30
|
+
opt.separator "Options:" # Way to control a help message.
|
31
|
+
opt.on('-n NUM', '--line=NUM', sprintf("Number of lines (Def: %d).", PlainText::DEF_HEADTAIL_N_LINES), Integer) { |v| OPTS[:num]=v }
|
32
|
+
opt.on('-c NUM', '--byte=NUM', sprintf("Number of bytes, instead of lines."), Integer) { |v| OPTS[:unit] = :byte; OPTS[:num]=v }
|
33
|
+
opt.on( '--char=NUM', sprintf("Number of characters, instead of lines."), Integer) { |v| OPTS[:unit] = :char; OPTS[:num]=v }
|
34
|
+
opt.on('-e REGEXP', '--regexp=REGEXP', sprintf("Regexp for the boundary, instead of a number.", (!OPTS[:num]).inspect)) {|v| OPTS[:num] = Regexp.new v}
|
35
|
+
opt.on('-x', '--[no-]exclusive', sprintf("The line that matches is excluded? (Def: %s)", (!OPTS[:inclusive]).inspect), FalseClass) {|v| OPTS[:inclusive] = v}
|
36
|
+
opt.on('-i', '--[no-]inverse', sprintf("Inverse the result (print after NUM-th line) (Def: %s)", (!OPTS[:inverse]).inspect), TrueClass) {|v| OPTS[:inverse] = v}
|
37
|
+
# opt.on( '--version', "Display the version and exits.", TrueClass) {|v| OPTS[:version] = v} # Consider opts.on_tail
|
38
|
+
# opt.on( '--[no-]debug', "Debug (Def: false)", TrueClass) {|v| OPTS[:debug] = v}
|
39
|
+
# opt.separator "" # Way to control a help message.
|
40
|
+
# opt.separator "Note:"
|
41
|
+
# opt.separator " Spaces are truncated in default."
|
42
|
+
|
43
|
+
begin
|
44
|
+
opt.parse!(ARGV)
|
45
|
+
rescue OptionParser::MissingArgument => er
|
46
|
+
# Missing argument like "-n" without a number.
|
47
|
+
warn er
|
48
|
+
exit 1
|
49
|
+
end
|
50
|
+
|
51
|
+
OPTS
|
52
|
+
end
|
53
|
+
|
54
|
+
|
55
|
+
################################################
|
56
|
+
# MAIN
|
57
|
+
################################################
|
58
|
+
|
59
|
+
$stdout.sync=true
|
60
|
+
$stderr.sync=true
|
61
|
+
|
62
|
+
class String
|
63
|
+
include PlainText
|
64
|
+
end
|
65
|
+
|
66
|
+
# Handle the command-line options => OPTS
|
67
|
+
opts = handle_argv()
|
68
|
+
num_in = opts[:num]
|
69
|
+
is_inverse = opts[:inverse]
|
70
|
+
|
71
|
+
%i(num inverse debug).each do |ek|
|
72
|
+
opts.delete ek if opts.has_key? ek
|
73
|
+
end
|
74
|
+
|
75
|
+
str = ARGF.read
|
76
|
+
|
77
|
+
# A linebreak guaranteed at the end.
|
78
|
+
if is_inverse
|
79
|
+
puts PlainText.head_inverse(str, num_in, **opts)
|
80
|
+
else
|
81
|
+
puts PlainText.head(str, num_in, **opts)
|
82
|
+
end
|
83
|
+
|
84
|
+
exit
|
85
|
+
|
86
|
+
__END__
|
87
|
+
|
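The script above rescues only +OptionParser::MissingArgument+, so an unknown option or a non-numeric NUM would presumably still end in a backtrace. A minimal sketch of a broader handler follows; +parse_head_args+ is a hypothetical wrapper, only two of the options are reproduced, and returning +nil+ instead of calling +exit+ is a choice made here so the behaviour is easy to check.

```ruby
require 'optparse'

# Hypothetical wrapper: parses +argv+ destructively and returns the option
# Hash, or nil after printing the error message to STDERR.
def parse_head_args(argv)
  opts = { num: 10, unit: :line }   # 10 mirrors DEF_HEADTAIL_N_LINES
  parser = OptionParser.new
  parser.on('-n NUM', '--line=NUM', Integer) { |v| opts[:num] = v }
  parser.on('--char=NUM', Integer) { |v| opts[:unit] = :char; opts[:num] = v }
  parser.parse!(argv)
  opts
rescue OptionParser::ParseError => e
  # ParseError is the parent of MissingArgument, InvalidArgument and
  # InvalidOption, so all three are reported in the same way.
  warn e.message
  nil
end

p parse_head_args(%w(-n 5 try.txt))
p parse_head_args(%w(-n))
```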
data/bin/tail.rb
ADDED
@@ -0,0 +1,87 @@
|
|
1
|
+
#!/usr/bin/env ruby
|
2
|
+
# -*- coding: utf-8 -*-
|
3
|
+
|
4
|
+
require 'optparse'
|
5
|
+
require 'plain_text'
|
6
|
+
|
7
|
+
BANNER = <<"__EOF__"
|
8
|
+
USAGE: #{File.basename($0)} [options] [INFILE.txt] < STDIN
|
9
|
+
tail command with (multi-byte) character-based manipulation and Regexp.
|
10
|
+
__EOF__
|
11
|
+
|
12
|
+
# Initialising the hash for the command-line options.
|
13
|
+
OPTS = {
|
14
|
+
num: PlainText::DEF_HEADTAIL_N_LINES,
|
15
|
+
unit: :line,
|
16
|
+
inclusive: true,
|
17
|
+
inverse: false, # unique option
|
18
|
+
# :chatter => 3, # Default
|
19
|
+
debug: false,
|
20
|
+
}
|
21
|
+
|
22
|
+
# Function to handle the command-line arguments.
|
23
|
+
#
|
24
|
+
# ARGV will be modified, and the constant variable OPTS is set.
|
25
|
+
#
|
26
|
+
# @return [Hash] Optional-argument hash.
|
27
|
+
#
|
28
|
+
def handle_argv
|
29
|
+
opt = OptionParser.new(BANNER)
|
30
|
+
opt.separator "Options:" # Way to control a help message.
|
31
|
+
opt.on('-n NUM', '--line=NUM', sprintf("Number of lines (Def: %d).", PlainText::DEF_HEADTAIL_N_LINES), Integer) { |v| OPTS[:num]=v }
|
32
|
+
opt.on('-c NUM', '--byte=NUM', sprintf("Number of bytes, instead of lines."), Integer) { |v| OPTS[:unit] = :byte; OPTS[:num]=v }
|
33
|
+
opt.on( '--char=NUM', sprintf("Number of characters, instead of lines."), Integer) { |v| OPTS[:unit] = :char; OPTS[:num]=v }
|
34
|
+
opt.on('-e REGEXP', '--regexp=REGEXP', sprintf("Regexp for the boundary, instead of a number.", (!OPTS[:num]).inspect)) {|v| OPTS[:num] = Regexp.new v}
|
35
|
+
opt.on('-x', '--[no-]exclusive', sprintf("The line that matches is excluded? (Def: %s)", (!OPTS[:inclusive]).inspect), FalseClass) {|v| OPTS[:inclusive] = v}
|
36
|
+
opt.on('-i', '--[no-]inverse', sprintf("Inverse the result (print after NUM-th line) (Def: %s)", (!OPTS[:inverse]).inspect), TrueClass) {|v| OPTS[:inverse] = v}
|
37
|
+
# opt.on( '--version', "Display the version and exits.", TrueClass) {|v| OPTS[:version] = v} # Consider opts.on_tail
|
38
|
+
# opt.on( '--[no-]debug', "Debug (Def: false)", TrueClass) {|v| OPTS[:debug] = v}
|
39
|
+
opt.separator "" # Way to control a help message.
|
40
|
+
opt.separator "Note:"
|
41
|
+
opt.separator " UNIX command of 'tail -n +5' is equivalent to 'head.rb -i -n 5'"
|
42
|
+
|
43
|
+
begin
|
44
|
+
opt.parse!(ARGV)
|
45
|
+
rescue OptionParser::MissingArgument => er
|
46
|
+
# Missing argument like "-n" without a number.
|
47
|
+
warn er
|
48
|
+
exit 1
|
49
|
+
end
|
50
|
+
|
51
|
+
OPTS
|
52
|
+
end
|
53
|
+
|
54
|
+
|
55
|
+
################################################
|
56
|
+
# MAIN
|
57
|
+
################################################
|
58
|
+
|
59
|
+
$stdout.sync=true
|
60
|
+
$stderr.sync=true
|
61
|
+
|
62
|
+
class String
|
63
|
+
include PlainText
|
64
|
+
end
|
65
|
+
|
66
|
+
# Handle the command-line options => OPTS
|
67
|
+
opts = handle_argv()
|
68
|
+
num_in = opts[:num]
|
69
|
+
is_inverse = opts[:inverse]
|
70
|
+
|
71
|
+
%i(num inverse debug).each do |ek|
|
72
|
+
opts.delete ek if opts.has_key? ek
|
73
|
+
end
|
74
|
+
|
75
|
+
str = ARGF.read
|
76
|
+
|
77
|
+
# A linebreak guaranteed at the end.
|
78
|
+
if is_inverse
|
79
|
+
puts PlainText.tail_inverse(str, num_in, **opts)
|
80
|
+
else
|
81
|
+
puts PlainText.tail(str, num_in, **opts)
|
82
|
+
end
|
83
|
+
|
84
|
+
exit
|
85
|
+
|
86
|
+
__END__
|
87
|
+
|
data/bin/textclean
ADDED
@@ -0,0 +1,103 @@
|
|
1
|
+
#!/usr/bin/env ruby
|
2
|
+
# -*- coding: utf-8 -*-
|
3
|
+
|
4
|
+
require 'optparse'
|
5
|
+
require 'plain_text'
|
6
|
+
|
7
|
+
BANNER = <<"__EOF__"
|
8
|
+
USAGE: #{File.basename($0)} [options] [INFILE.txt] < STDIN
|
9
|
+
Clean the text file INFILE (or STDIN), unifying linebreaks, and outputs it.
|
10
|
+
__EOF__
|
11
|
+
|
12
|
+
# Initialising the hash for the command-line options.
|
13
|
+
OPTS = {
|
14
|
+
line_i: nil,
|
15
|
+
line_f: nil,
|
16
|
+
# :chatter => 3, # Default
|
17
|
+
debug: false,
|
18
|
+
}
|
19
|
+
|
20
|
+
# Load the default values from the Module
|
21
|
+
PlainText::DEF_METHOD_OPTS[:clean_text].each_key do |ek|
|
22
|
+
OPTS[ek] ||= PlainText::DEF_METHOD_OPTS[:clean_text][ek]
|
23
|
+
end
|
24
|
+
|
25
|
+
# Function to handle the command-line arguments.
|
26
|
+
#
|
27
|
+
# ARGV will be modified, and the constant variable OPTS is set.
|
28
|
+
#
|
29
|
+
# @return [Hash] Optional-argument hash.
|
30
|
+
#
|
31
|
+
def handle_argv
|
32
|
+
opt = OptionParser.new(BANNER)
|
33
|
+
opt.on( '--[no-]preserve_paragraph', sprintf("Preserved paragraph structures? (Def: %s)", OPTS[:preserve_paragraph].inspect), TrueClass) {|v| OPTS[:preserve_paragraph] = v}
|
34
|
+
opt.on( '--boundary-style=STYLE', sprintf("One of (t(runcate)(2)|d(elete)|n(one)) (Def: truncate).")) { |v| OPTS[:boundary_style]=v.strip }
|
35
|
+
opt.on( '--lbs-style=STYLE', sprintf("One of (t(runcate)|d(elete)|n(one)) (Def: %s).", OPTS[:lbs_style])) { |v| OPTS[:lbs_style]=v.strip }
|
36
|
+
opt.on(  '--[no-]lb-is-space', sprintf("Linebreaks are equivalent to spaces? (Def: %s)", OPTS[:lb_is_space].inspect), TrueClass) {|v| OPTS[:lb_is_space] = v}
|
37
|
+
opt.on( '--sps-style=STYLE', sprintf("One of (t(runcate)|d(elete)|n(one)) (Def: %s).", OPTS[:sps_style])) { |v| OPTS[:sps_style]=v.strip }
|
38
|
+
opt.on( '--[no-]delete-asian-space', sprintf("Deletes spaces between, before or after a CJK character? (Def: %s)", OPTS[:delete_asian_space].inspect), TrueClass) {|v| OPTS[:delete_asian_space] = v}
|
39
|
+
opt.on( '--linehead-style=STYLE', sprintf("One of (t(runcate)|d(elete)|n(one)) (Def: %s).", OPTS[:linehead_style])) { |v| OPTS[:linehead_style]=v.strip }
|
40
|
+
opt.on( '--linetail-style=STYLE', sprintf("One of (t(runcate)|d(elete)|n(one)) (Def: %s).", OPTS[:linetail_style])) { |v| OPTS[:linetail_style]=v.strip }
|
41
|
+
opt.on( '--firstlbs-style=STYLE', sprintf("One of (t(runcate)|d(elete)|n(one)) (Def: %s).", OPTS[:firstlbs_style])) { |v| OPTS[:firstlbs_style]=v.strip }
|
42
|
+
opt.on( '--lastsps-style=STYLE', sprintf("One of (t(runcate)|d(elete)|n(one)|m(arkdown)) (Def: %s).", OPTS[:lastsps_style])) { |v| OPTS[:lastsps_style]=v.strip }
|
43
|
+
# opt.on( '--version', "Display the version and exits.", TrueClass) {|v| OPTS[:version] = v} # Consider opts.on_tail
|
44
|
+
opt.on( '--[no-]debug', "Debug (Def: false)", TrueClass) {|v| OPTS[:debug] = v}
|
45
|
+
# opt.separator "" # Way to control a help message.
|
46
|
+
# opt.separator "Note:"
|
47
|
+
# opt.separator " Spaces are truncated in default."
|
48
|
+
|
49
|
+
opt.parse!(ARGV)
|
50
|
+
|
51
|
+
if (OPTS[:boundary_style].class.method_defined?(:to_str) &&
|
52
|
+
/\A(t(runcate)?(2)?|d(elete)?|n(one)?)\z/ =~ OPTS[:boundary_style])
|
53
|
+
OPTS[:boundary_style] = OPTS[:boundary_style].to_sym
|
54
|
+
end
|
55
|
+
|
56
|
+
%w(lbs sps linehead linetail firstlbs lastsps).each do |ek_head|
|
57
|
+
sym_k = (ek_head+"_style").to_sym
|
58
|
+
trysym = OPTS[sym_k].to_s[0].to_sym # Symbol of 1 character (nb., NOT boundary_style)
|
59
|
+
if (!(%i(t d n).include? trysym) && (sym_k != :lastsps_style) ||
|
60
|
+
!(%i(t d n m).include? trysym) && (sym_k == :lastsps_style))
|
61
|
+
errmsg = sprintf(
|
62
|
+
"ERROR: --%s-style must be one of (t(runcate)|d(elete)%s|n(one)), but given %s.",
|
63
|
+
ek_head,
|
64
|
+
((ek_head == "lastsps") ? "|m(arkdown)" : ""),
|
65
|
+
OPTS[sym_k].inspect
|
66
|
+
)
|
67
|
+
warn errmsg
|
68
|
+
exit 1
|
69
|
+
end
|
70
|
+
OPTS[sym_k] = trysym
|
71
|
+
end
|
72
|
+
|
73
|
+
OPTS
|
74
|
+
end
|
75
|
+
|
76
|
+
|
77
|
+
################################################
|
78
|
+
# MAIN
|
79
|
+
################################################
|
80
|
+
|
81
|
+
$stdout.sync=true
|
82
|
+
$stderr.sync=true
|
83
|
+
|
84
|
+
class String
|
85
|
+
include PlainText
|
86
|
+
end
|
87
|
+
|
88
|
+
# Handle the command-line options => OPTS
|
89
|
+
opts = handle_argv()
|
90
|
+
|
91
|
+
valid_keys = PlainText::DEF_METHOD_OPTS[:clean_text].keys
|
92
|
+
opts.each_key do |ek|
|
93
|
+
opts.delete ek if !valid_keys.include? ek
|
94
|
+
end
|
95
|
+
|
96
|
+
str = ARGF.read
|
97
|
+
|
98
|
+
print PlainText.clean_text(str, **opts)
|
99
|
+
|
100
|
+
exit
|
101
|
+
|
102
|
+
__END__
|
103
|
+
|
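The style validation in the script above reduces each +--*-style+ value to its first letter and checks it against the allowed set. The idea can be sketched in isolation as follows; +normalize_style+ is a hypothetical name, and raising +ArgumentError+ (rather than printing a message and exiting) is an assumption made so that the function can be used outside a script.

```ruby
# Hypothetical helper: converts "truncate", :delete, "n" and so on into a
# one-letter Symbol and rejects anything outside +allowed+.
def normalize_style(value, allowed = %i(t d n))
  sym = value.to_s[0].to_s.to_sym   # the extra to_s guards against ""
  unless allowed.include?(sym)
    raise ArgumentError,
          "style must begin with one of #{allowed.inspect}, but given #{value.inspect}"
  end
  sym
end

p normalize_style("truncate")
p normalize_style("markdown", %i(t d n m))   # the wider set used for lastsps
```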
data/lib/plain_text.rb
CHANGED
@@ -25,6 +25,36 @@ module PlainText
|
|
25
25
|
# Default number of lines to extract for {#head} and {#tail}
|
26
26
|
DEF_HEADTAIL_N_LINES = 10
|
27
27
|
|
28
|
+
# Default options for class/instance methods
|
29
|
+
DEF_METHOD_OPTS = {
|
30
|
+
:clean_text => {
|
31
|
+
preserve_paragraph: true,
|
32
|
+
boundary_style: true, # If unspecified, will be replaced with lb_out * 2
|
33
|
+
lbs_style: :truncate,
|
34
|
+
lb_is_space: false,
|
35
|
+
sps_style: :truncate,
|
36
|
+
delete_asian_space: true,
|
37
|
+
linehead_style: :none,
|
38
|
+
linetail_style: :delete,
|
39
|
+
firstlbs_style: :delete,
|
40
|
+
lastsps_style: :truncate,
|
41
|
+
lb: $/,
|
42
|
+
lb_out: nil, # If unspecified, will be replaced with lb
|
43
|
+
},
|
44
|
+
:count_char => {
|
45
|
+
lbs_style: :delete,
|
46
|
+
linehead_style: :delete,
|
47
|
+
lastsps_style: :delete,
|
48
|
+
lb_out: "\n",
|
49
|
+
},
|
50
|
+
}
|
51
|
+
|
52
|
+
# Adjusts DEF_METHOD_OPTS[:count_char]
|
53
|
+
DEF_METHOD_OPTS[:clean_text].each_key do |ek|
|
54
|
+
# %i(preserve_paragraph boundary_style lb_is_space sps_style delete_asian_space linetail_style firstlbs_style lb).each do |ek|
|
55
|
+
DEF_METHOD_OPTS[:count_char][ek] ||= DEF_METHOD_OPTS[:clean_text][ek]
|
56
|
+
end
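The loop above fills every key missing from the +:count_char+ defaults with the +:clean_text+ value. For the values in this diff the result appears to be the same as a plain +Hash#merge+, as the sketch below illustrates with a subset of the keys. The constant names are hypothetical. Note that the two approaches would differ if an override were +nil+ or +false+, because +||=+ replaces such values whereas +merge+ keeps them.

```ruby
# A subset of the defaults listed in the diff (names are hypothetical).
CLEAN_TEXT_DEFAULTS = {
  lbs_style:      :truncate,
  sps_style:      :truncate,
  linehead_style: :none,
  lastsps_style:  :truncate,
  lb_out:         nil,
}

COUNT_CHAR_OVERRIDES = {
  lbs_style:      :delete,
  linehead_style: :delete,
  lastsps_style:  :delete,
  lb_out:         "\n",
}

# Later keys win, so the overrides take precedence over the base defaults.
COUNT_CHAR_DEFAULTS = CLEAN_TEXT_DEFAULTS.merge(COUNT_CHAR_OVERRIDES)

p COUNT_CHAR_DEFAULTS
```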
|
57
|
+
|
28
58
|
# Call instance method as a Module function
|
29
59
|
#
|
30
60
|
# The return String includes {PlainText} as Singleton.
|
@@ -39,33 +69,39 @@ module PlainText
|
|
39
69
|
end
|
40
70
|
|
41
71
|
# If the class of the obj does not "include" this module, do so in the singular class.
|
42
|
-
#
|
72
|
+
#
|
43
73
|
# @param obj [Object] Maybe String. For which a singular class def is run, if the condition is met.
|
44
74
|
# @return [TrueClass, NilClass] true if the singular class def is run. Else nil.
|
45
75
|
def self.extend_this(obj)
|
46
|
-
return nil if defined? obj.delete_spaces_bw_cjk_european!
|
76
|
+
return nil if defined? obj.delete_spaces_bw_cjk_european!
|
47
77
|
obj.extend(PlainText)
|
48
78
|
true
|
49
79
|
end
|
50
80
|
|
51
|
-
#
|
81
|
+
# Count the number of characters
|
82
|
+
#
|
83
|
+
# See {PlainText#clean_text!} for the optional parameters. The defaults of a few of the optional parameters are different from it,
|
84
|
+
# such as the default for +lb_out+ is +"\n"+ (newline, so that a line-break is 1 byte in size).
|
85
|
+
# It is so that this method is more optimized for East-Asian (CJK) characters, given this method is most useful for CJK Strings,
|
86
|
+
# whereas, for European alphabets, counting the number of words, rather than characters as in this method, would be more standard.
|
52
87
|
#
|
53
88
|
# @param instr [String] String for which the number of chars is counted
|
54
89
|
# @param (see #count_char)
|
55
90
|
# @return [Integer]
|
56
91
|
def self.count_char(instr, *rest,
|
57
|
-
|
58
|
-
|
59
|
-
|
60
|
-
|
61
|
-
|
62
|
-
|
92
|
+
lbs_style: DEF_METHOD_OPTS[:count_char][:lbs_style],
|
93
|
+
linehead_style: DEF_METHOD_OPTS[:count_char][:linehead_style],
|
94
|
+
lastsps_style: DEF_METHOD_OPTS[:count_char][:lastsps_style],
|
95
|
+
lb_out: DEF_METHOD_OPTS[:count_char][:lb_out],
|
96
|
+
**k
|
97
|
+
)
|
98
|
+
clean_text(instr, *rest, lbs_style: lbs_style, linehead_style: linehead_style, lastsps_style: lastsps_style, lb_out: lb_out, **k).size
|
63
99
|
end
|
64
100
|
|
65
101
|
|
66
102
|
# Cleans the text
|
67
103
|
#
|
68
|
-
# Such as, removing extra spaces, normalising the linebreaks, etc.
|
104
|
+
# Such as, removing extra spaces, normalising the linebreaks, etc.
|
69
105
|
#
|
70
106
|
# In default,
|
71
107
|
#
|
@@ -77,9 +113,9 @@ module PlainText
|
|
77
113
|
# * Trailing white spaces in each line are deleted: +linetail_style=:delete+
|
78
114
|
# * Line-breaks at the beginning of the entire input string are deleted: +firstlbs_style=:delete+
|
79
115
|
# * Trailing white spaces and line-breaks at the end of the entire input string are truncated into a single linebreak: +lastsps_style=:truncate+
|
80
|
-
#
|
116
|
+
#
|
81
117
|
# For a String with predominantly CJK characters, the following setting is recommended:
|
82
|
-
#
|
118
|
+
#
|
83
119
|
# * +lbs_style: :delete+
|
84
120
|
# * +delete_asian_space: true+ (Default)
|
85
121
|
#
|
@@ -111,26 +147,26 @@ module PlainText
|
|
111
147
|
#
|
112
148
|
def self.clean_text(
|
113
149
|
prt,
|
114
|
-
preserve_paragraph:
|
115
|
-
boundary_style:
|
116
|
-
lbs_style:
|
117
|
-
lb_is_space:
|
118
|
-
sps_style:
|
119
|
-
delete_asian_space:
|
120
|
-
linehead_style: :
|
121
|
-
linetail_style: :
|
122
|
-
firstlbs_style: :
|
123
|
-
lastsps_style: :
|
124
|
-
lb:
|
125
|
-
lb_out:
|
150
|
+
preserve_paragraph: DEF_METHOD_OPTS[:clean_text][:preserve_paragraph],
|
151
|
+
boundary_style: DEF_METHOD_OPTS[:clean_text][:boundary_style], # If unspecified, will be replaced with lb_out * 2
|
152
|
+
lbs_style: DEF_METHOD_OPTS[:clean_text][:lbs_style],
|
153
|
+
lb_is_space: DEF_METHOD_OPTS[:clean_text][:lb_is_space],
|
154
|
+
sps_style: DEF_METHOD_OPTS[:clean_text][:sps_style],
|
155
|
+
delete_asian_space: DEF_METHOD_OPTS[:clean_text][:delete_asian_space],
|
156
|
+
linehead_style: DEF_METHOD_OPTS[:clean_text][:linehead_style],
|
157
|
+
linetail_style: DEF_METHOD_OPTS[:clean_text][:linetail_style],
|
158
|
+
firstlbs_style: DEF_METHOD_OPTS[:clean_text][:firstlbs_style],
|
159
|
+
lastsps_style: DEF_METHOD_OPTS[:clean_text][:lastsps_style],
|
160
|
+
lb: DEF_METHOD_OPTS[:clean_text][:lb],
|
161
|
+
lb_out: DEF_METHOD_OPTS[:clean_text][:lb_out], # If unspecified, will be replaced with lb
|
126
162
|
is_debug: false
|
127
163
|
)
|
128
164
|
|
129
|
-
isdebug = true if prt == "\n
|
165
|
+
#isdebug = true if prt == "foo\n\n\nbar\n"
|
130
166
|
lb_out ||= lb # Output linebreak
|
131
167
|
boundary_style = lb_out*2 if true == boundary_style
|
132
168
|
boundary_style = "" if [:delete, :d].include? boundary_style
|
133
|
-
lastsps_style = lb_out if :linebreak == lastsps_style
|
169
|
+
lastsps_style = lb_out if :linebreak == lastsps_style
|
134
170
|
|
135
171
|
if !prt.class.method_defined? :last_significant_element
|
136
172
|
# Construct a Part instance from the given String.
|
@@ -172,7 +208,7 @@ isdebug = true if prt == "\n \n abc\n\n \ndef\n\n \n\n"
|
|
172
208
|
clean_text_file_head_tail!( prt,
|
173
209
|
firstlbs_style: firstlbs_style,
|
174
210
|
lastsps_style: lastsps_style,
|
175
|
-
is_debug:
|
211
|
+
is_debug: is_debug
|
176
212
|
)
|
177
213
|
|
178
214
|
# Replaces the linebreaks to the specified one
|
@@ -254,13 +290,13 @@ isdebug = true if prt == "\n \n abc\n\n \ndef\n\n \n\n"
|
|
254
290
|
# Class methods (Private)
|
255
291
|
##########
|
256
292
|
|
257
|
-
# @param prt [PlainText:Part] (see
|
258
|
-
# @param boundary_style (see
|
293
|
+
# @param prt [PlainText:Part] (see PlainText.clean_text)
|
294
|
+
# @param boundary_style (see PlainText.clean_text)
|
259
295
|
# @return [void]
|
260
296
|
#
|
261
|
-
# @see
|
297
|
+
# @see PlainText.clean_text
|
262
298
|
def self.clean_text_boundary!( prt,
|
263
|
-
boundary_style:
|
299
|
+
boundary_style: ,
|
264
300
|
is_debug: false
|
265
301
|
)
|
266
302
|
|
@@ -280,20 +316,20 @@ isdebug = true if prt == "\n \n abc\n\n \ndef\n\n \n\n"
|
|
280
316
|
end # self.clean_text_boundary!
|
281
317
|
private_class_method :clean_text_boundary!
|
282
318
|
|
283
|
-
# @param prt [PlainText:Part] (see
|
284
|
-
# @param lbs_style (see
|
285
|
-
# @param sps_style (see
|
286
|
-
# @param lb_is_space (see
|
287
|
-
# @param delete_asian_space (see
|
319
|
+
# @param prt [PlainText:Part] (see PlainText.clean_text)
|
320
|
+
# @param lbs_style (see PlainText.clean_text)
|
321
|
+
# @param sps_style (see PlainText.clean_text)
|
322
|
+
# @param lb_is_space (see PlainText.clean_text)
|
323
|
+
# @param delete_asian_space (see PlainText.clean_text)
|
288
324
|
# @return [void]
|
289
325
|
#
|
290
|
-
# @see
|
326
|
+
# @see PlainText.clean_text
|
291
327
|
def self.clean_text_lbs_sps!(
|
292
328
|
prt,
|
293
|
-
lbs_style:
|
294
|
-
lb_is_space:
|
295
|
-
sps_style:
|
296
|
-
delete_asian_space:
|
329
|
+
lbs_style: ,
|
330
|
+
lb_is_space: ,
|
331
|
+
sps_style: ,
|
332
|
+
delete_asian_space: ,
|
297
333
|
is_debug: false
|
298
334
|
)
|
299
335
|
|
@@ -328,16 +364,16 @@ isdebug = true if prt == "\n \n abc\n\n \ndef\n\n \n\n"
|
|
328
364
|
end # self.clean_text_lbs_sps!
|
329
365
|
private_class_method :clean_text_lbs_sps!
|
330
366
|
|
331
|
-
# @param prt [PlainText:Part] (see
|
332
|
-
# @param linehead_style [Symbol, String] (see
|
333
|
-
# @param linetail_style [Symbol, String] (see
|
367
|
+
# @param prt [PlainText:Part] (see PlainText.clean_text)
|
368
|
+
# @param linehead_style [Symbol, String] (see PlainText.clean_text)
|
369
|
+
# @param linetail_style [Symbol, String] (see PlainText.clean_text)
|
334
370
|
# @return [void]
|
335
371
|
#
|
336
|
-
# @see
|
372
|
+
# @see PlainText.clean_text
|
337
373
|
def self.clean_text_line_head_tail!(
|
338
374
|
prt,
|
339
|
-
linehead_style:
|
340
|
-
linetail_style:
|
375
|
+
linehead_style: ,
|
376
|
+
linetail_style: ,
|
341
377
|
is_debug: false
|
342
378
|
)
|
343
379
|
|
@@ -371,16 +407,16 @@ isdebug = true if prt == "\n \n abc\n\n \ndef\n\n \n\n"
|
|
371
407
|
end # self.clean_text_line_head_tail!
|
372
408
|
private_class_method :clean_text_line_head_tail!
|
373
409
|
|
374
|
-
# @param prt [PlainText:Part] (see
|
375
|
-
# @param firstlbs_style [Symbol, String] (see
|
376
|
-
# @param lastsps_style [Symbol, String] (see
|
410
|
+
# @param prt [PlainText:Part] (see PlainText.clean_text#prt)
|
411
|
+
# @param firstlbs_style [Symbol, String] (see PlainText.clean_text#firstlbs_style)
|
412
|
+
# @param lastsps_style [Symbol, String] (see PlainText.clean_text#lastsps_style)
|
377
413
|
# @return [void]
|
378
414
|
#
|
379
|
-
# @see
|
415
|
+
# @see PlainText.clean_text
|
380
416
|
def self.clean_text_file_head_tail!(
|
381
417
|
prt,
|
382
|
-
firstlbs_style:
|
383
|
-
lastsps_style:
|
418
|
+
firstlbs_style: ,
|
419
|
+
lastsps_style: ,
|
384
420
|
is_debug: false
|
385
421
|
)
|
386
422
|
|
@@ -452,19 +488,18 @@ isdebug = true if prt == "\n \n abc\n\n \ndef\n\n \n\n"
|
|
452
488
|
#
|
453
489
|
# uses Part to transform a Paragraph into a Part
|
454
490
|
#
|
455
|
-
# @param prt [PlainText:Part] (see
|
456
|
-
# @param sps_style (see
|
491
|
+
# @param prt [PlainText:Part] (see PlainText.clean_text)
|
492
|
+
# @param sps_style (see PlainText.clean_text)
|
457
493
|
# @return [void]
|
458
494
|
#
|
459
|
-
# @see
|
495
|
+
# @see PlainText.clean_text
|
460
496
|
def self.clean_text_sps!(
|
461
497
|
prt,
|
462
|
-
sps_style:
|
498
|
+
sps_style: ,
|
463
499
|
is_debug: false
|
464
500
|
)
|
465
501
|
|
466
502
|
prt.parts.each do |e_pa|
|
467
|
-
ru = ParseRule
|
468
503
|
# Each line treated as a Paragraph, and [[:space:]]+ between them as a Boundary.
|
469
504
|
# Then, to work on anything within a line except for line-head/tail is easy.
|
470
505
|
prt_para = Part.parse(e_pa, rule: ParseRule::RuleEachLineStrip).map_parts { |e_li|
|
@@ -490,21 +525,16 @@ isdebug = true if prt == "\n \n abc\n\n \ndef\n\n \n\n"
|
|
490
525
|
####################################################
|
491
526
|
|
492
527
|
# Count the number of characters
|
493
|
-
#
|
494
|
-
# See {PlainText
|
495
|
-
# such as the default for +lb_out+ is "\n" (so that a line-break is 1 byte in size).
|
528
|
+
#
|
529
|
+
# See {PlainText.count_char} and further {PlainText.clean_text!} for the optional parameters. The defaults of a few of the optional parameters are different from the latter,
|
530
|
+
# such as the default for +lb_out+ is +"\n"+ (newline, so that a line-break is 1 byte in size).
|
496
531
|
# It is so that this method is more optimized for East-Asian (CJK) characters, given this method is most useful for CJK Strings,
|
497
532
|
# whereas, for European alphabets, counting the number of words, rather than characters as in this method, would be more standard.
|
498
533
|
#
|
499
|
-
# @param (see PlainText
|
534
|
+
# @param (see {PlainText.count_char})
|
500
535
|
# @return [Integer]
|
501
|
-
def count_char(*rest,
|
502
|
-
|
503
|
-
linehead_style: :delete,
|
504
|
-
lastsps_style: :none,
|
505
|
-
lb_out: "\n",
|
506
|
-
**k)
|
507
|
-
PlainText.clean_text(self, *rest, lbs_style: lbs_style, lastsps_style: lastsps_style, lb_out: lb_out, **k).size
|
536
|
+
def count_char(*rest, **k)
|
537
|
+
PlainText.public_send(__method__, self, *rest, **k)
|
508
538
|
end
|
509
539
|
|
510
540
|
# Delete all the spaces between CJK and European characters or numbers.
|
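The refactored `count_char` instance method in the hunk above forwards to the module-level method of the same name through `__method__`. Below is a minimal sketch of that delegation pattern. The `TextUtil` module, the `MyString` class and the toy counting logic are hypothetical stand-ins for illustration, not PlainText's implementation.

```ruby
# Hypothetical stand-in for a module-level API (not the gem's code).
module TextUtil
  # Toy counter: normalises every line break to +lb_out+, then counts characters.
  def self.count_char(str, lb_out: "\n")
    str.gsub(/\r\n|\r|\n/, lb_out).size
  end

  # Instance-side wrapper, meant to be mixed into String-like classes.
  module Ext
    def count_char(*rest, **opts)
      # __method__ is :count_char here, so the module-level method of the
      # same name is called with self as its first argument.
      TextUtil.public_send(__method__, self, *rest, **opts)
    end
  end
end

class MyString < String
  include TextUtil::Ext
end

s = MyString.new("a\r\nb")
p s.count_char                  # 3 ("\r\n" counted as one character)
p s.count_char(lb_out: "\r\n")  # 4
```

With this arrangement the option defaults live only in the module-level method, so the wrapper does not have to repeat them.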
@@ -732,7 +762,7 @@ isdebug = true if prt == "\n \n abc\n\n \ndef\n\n \n\n"
|
|
732
762
|
# till the last one is returned. "The next line" means (1) the line immediately after the match
|
733
763
|
# if the matched string has the linebreak at the end, or (2) the line after the first linebreak after the matched string,
|
734
764
|
# where the trailing characters after the matched string to the linebreak (inclusive) is ignored.
|
735
|
-
#
|
765
|
+
#
|
736
766
|
# = Tips =
|
737
767
|
# To specify the *last* line that matches the Regexp, consider prefixing +(?:.*)+ with the option +m+,
|
738
768
|
# e.g., +/(?:.*)ABC/m+
|
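The tip in the comment above relies on greedy matching: with the `m` option, `.` also matches newlines, so `(?:.*)` consumes as much as possible and the match ends at the last occurrence. A small sketch with a made-up four-line string (the sample text and variable names are assumptions for illustration only):

```ruby
text = "1 ABC\n2 xyz\n3 ABC\n4 end\n"

# Without the m option, "." stops at the newline, so the first line matches.
first = /(?:.*)ABC/.match(text)
# With the m option, the greedy ".*" runs across lines up to the last "ABC".
last  = /(?:.*)ABC/m.match(text)

p first.end(0)            # 5
p last.end(0)             # 17
p text[0...last.end(0)]   # "1 ABC\n2 xyz\n3 ABC"
```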
data/plain_text.gemspec
CHANGED
@@ -5,9 +5,9 @@ require 'date'
|
|
5
5
|
|
6
6
|
Gem::Specification.new do |s|
|
7
7
|
s.name = %q{plain_text}.sub(/.*/){|c| (c == File.basename(Dir.pwd)) ? c : raise("ERROR: s.name=(#{c}) in gemspec seems wrong!")}
|
8
|
-
s.version = "0.
|
8
|
+
s.version = "0.3"
|
9
9
|
# s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
|
10
|
-
%w(countchar).each do |f|
|
10
|
+
%w(countchar textclean head.rb tail.rb).each do |f|
|
11
11
|
path = s.bindir+'/'+f
|
12
12
|
File.executable?(path) ? s.executables << f : raise("ERROR: Executable (#{path}) is not executable!")
|
13
13
|
end
|
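The gemspec loop above registers each name only if the file under `bindir` is executable, and raises otherwise. A self-contained sketch of the same check follows, run against a temporary directory. The helper name `collect_executables` and the temporary file are hypothetical, and the permission bits assume a Unix-like file system.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: returns the names whose files are executable,
# raising on the first one that is not (mirrors the gemspec's guard).
def collect_executables(bindir, names)
  names.map do |f|
    path = File.join(bindir, f)
    raise "ERROR: Executable (#{path}) is not executable!" unless File.executable?(path)
    f
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'textclean')
  File.write(path, "#!/usr/bin/env ruby\n")
  FileUtils.chmod(0755, path)
  p collect_executables(dir, %w(textclean))
end
```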
data/test/testcountchar.rb
ADDED
@@ -0,0 +1,46 @@
|
|
1
|
+
# -*- encoding: utf-8 -*-
|
2
|
+
|
3
|
+
# Tests of an executable.
|
4
|
+
#
|
5
|
+
# @author: M. Sakano (Wise Babel Ltd)
|
6
|
+
|
7
|
+
require 'open3'
|
8
|
+
|
9
|
+
$stdout.sync=true
|
10
|
+
$stderr.sync=true
|
11
|
+
# print '$LOAD_PATH=';p $LOAD_PATH
|
12
|
+
|
13
|
+
#################################################
|
14
|
+
# Unit Test
|
15
|
+
#################################################
|
16
|
+
|
17
|
+
gem "minitest"
|
18
|
+
# require 'minitest/unit'
|
19
|
+
require 'minitest/autorun'
|
20
|
+
|
21
|
+
class TestUnitCountchar < MiniTest::Test
|
22
|
+
T = true
|
23
|
+
F = false
|
24
|
+
SCFNAME = File.basename(__FILE__)
|
25
|
+
EXE = "%s/../bin/%s" % [File.dirname(__FILE__), File.basename(__FILE__).sub(/^test_?(.+)\.rb/, '\1')]
|
26
|
+
|
27
|
+
def setup
|
28
|
+
end
|
29
|
+
|
30
|
+
def teardown
|
31
|
+
end
|
32
|
+
|
33
|
+
def test_countchar01
|
34
|
+
o, e, s = Open3.capture3 EXE
|
35
|
+
assert_equal 0, s.exitstatus, "error is raised: STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
|
36
|
+
assert_equal "0", o.chomp
|
37
|
+
assert_empty e
|
38
|
+
|
39
|
+
stin = "foo\n\n\nbar\n"
|
40
|
+
o, e, s = Open3.capture3 EXE, stdin_data: stin
|
41
|
+
assert_equal 0, s.exitstatus
|
42
|
+
assert_equal stin.size-2, o.to_i
|
43
|
+
assert_empty e
|
44
|
+
end
|
45
|
+
end # class TestUnitCountchar < MiniTest::Test
|
46
|
+
|
data/test/testhead_rb.rb
ADDED
@@ -0,0 +1,70 @@
|
|
1
|
+
# -*- encoding: utf-8 -*-
|
2
|
+
|
3
|
+
# Tests of an executable.
|
4
|
+
#
|
5
|
+
# @author: M. Sakano (Wise Babel Ltd)
|
6
|
+
|
7
|
+
require 'open3'
|
8
|
+
|
9
|
+
$stdout.sync=true
|
10
|
+
$stderr.sync=true
|
11
|
+
# print '$LOAD_PATH=';p $LOAD_PATH
|
12
|
+
|
13
|
+
#################################################
|
14
|
+
# Unit Test
|
15
|
+
#################################################
|
16
|
+
|
17
|
+
gem "minitest"
|
18
|
+
# require 'minitest/unit'
|
19
|
+
require 'minitest/autorun'
|
20
|
+
|
21
|
+
class TestUnitHeadRb < MiniTest::Test
|
22
|
+
T = true
|
23
|
+
F = false
|
24
|
+
SCFNAME = File.basename(__FILE__)
|
25
|
+
EXE = "%s/../bin/%s" % [File.dirname(__FILE__), File.basename(__FILE__).sub(/^test_?(.+)\.rb/, '\1').sub(/_rb$/, '.rb')]
|
26
|
+
|
27
|
+
def setup
|
28
|
+
end
|
29
|
+
|
30
|
+
def teardown
|
31
|
+
end
|
32
|
+
|
33
|
+
def test_countchar01
|
34
|
+
o, e, s = Open3.capture3 EXE
|
35
|
+
assert_equal 0, s.exitstatus, "error is raised: STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
|
36
|
+
assert_equal "\n", o
|
37
|
+
assert_empty e
|
38
|
+
|
39
|
+
stin = "1\n2\n3\n4\n5\n6\n7\n8\n9\nA\nB\n"
|
40
|
+
o, e, s = Open3.capture3 EXE, stdin_data: stin
|
41
|
+
assert_equal 0, s.exitstatus
|
42
|
+
assert_equal stin[0..19], o
|
43
|
+
assert_empty e
|
44
|
+
|
45
|
+
o, e, s = Open3.capture3 EXE+' -i', stdin_data: stin
|
46
|
+
assert_equal 0, s.exitstatus
|
47
|
+
assert_equal stin[20..-1], o, "Wrong! STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
|
48
|
+
assert_empty e
|
49
|
+
|
50
|
+
o, e, s = Open3.capture3 EXE+' -n 10', stdin_data: stin
|
51
|
+
assert_equal 0, s.exitstatus
|
52
|
+
assert_equal stin[0..19], o
|
53
|
+
assert_empty e
|
54
|
+
|
55
|
+
o, e, s = Open3.capture3 EXE+' -b', stdin_data: stin
|
56
|
+
assert_equal 1, s.exitstatus, "error is raised: STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
|
57
|
+
assert_match(/missing/i, e)
|
58
|
+
|
59
|
+
o, e, s = Open3.capture3 EXE+' -e "[5-9]"', stdin_data: stin
|
60
|
+
assert_equal 0, s.exitstatus
|
61
|
+
assert_equal stin[0..9], o, "Wrong! STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
|
62
|
+
assert_empty e
|
63
|
+
|
64
|
+
o, e, s = Open3.capture3 EXE+' -e "[5-9]" -x', stdin_data: stin
|
65
|
+
assert_equal 0, s.exitstatus, "error is raised: STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
|
66
|
+
assert_equal stin[0..7], o, "Wrong! STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
|
67
|
+
assert_empty e
|
68
|
+
end
|
69
|
+
end # class TestUnitHeadRb < MiniTest::Test
|
70
|
+
|
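Each test file above derives the path of the executable under test from its own file name (the `EXE` constant). A short sketch of just the name mapping, using the same two `sub` calls as the test for `head.rb`; the helper name `exe_name_for` is hypothetical.

```ruby
# Hypothetical helper wrapping the substitutions used for EXE in the tests.
def exe_name_for(test_basename)
  test_basename
    .sub(/^test_?(.+)\.rb/, '\1')  # strip the "test" prefix and the ".rb" suffix
    .sub(/_rb$/, '.rb')            # a trailing "_rb" stands for a ".rb" executable
end

p exe_name_for('testhead_rb.rb')    # "head.rb"
p exe_name_for('testcountchar.rb')  # "countchar"
```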
data/test/testtail_rb.rb
ADDED
@@ -0,0 +1,70 @@
|
|
1
|
+
# -*- encoding: utf-8 -*-
|
2
|
+
|
3
|
+
# Tests of an executable.
|
4
|
+
#
|
5
|
+
# @author: M. Sakano (Wise Babel Ltd)
|
6
|
+
|
7
|
+
require 'open3'
|
8
|
+
|
9
|
+
$stdout.sync=true
|
10
|
+
$stderr.sync=true
|
11
|
+
# print '$LOAD_PATH=';p $LOAD_PATH
|
12
|
+
|
13
|
+
#################################################
|
14
|
+
# Unit Test
|
15
|
+
#################################################
|
16
|
+
|
17
|
+
gem "minitest"
|
18
|
+
# require 'minitest/unit'
|
19
|
+
require 'minitest/autorun'
|
20
|
+
|
21
|
+
class TestUnitTailRb < MiniTest::Test
|
22
|
+
T = true
|
23
|
+
F = false
|
24
|
+
SCFNAME = File.basename(__FILE__)
|
25
|
+
EXE = "%s/../bin/%s" % [File.dirname(__FILE__), File.basename(__FILE__).sub(/^test_?(.+)\.rb/, '\1').sub(/_rb$/, '.rb')]
|
26
|
+
|
27
|
+
def setup
|
28
|
+
end
|
29
|
+
|
30
|
+
def teardown
|
31
|
+
end
|
32
|
+
|
33
|
+
def test_countchar01
|
34
|
+
o, e, s = Open3.capture3 EXE
|
35
|
+
assert_equal 0, s.exitstatus, "error is raised: STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
|
36
|
+
assert_equal "\n", o
|
37
|
+
assert_empty e
|
38
|
+
|
39
|
+
stin = "1\n2\n3\n4\n5\n6\n7\n8\n9\nA\nB\n"
|
40
|
+
o, e, s = Open3.capture3 EXE, stdin_data: stin
|
41
|
+
assert_equal 0, s.exitstatus
|
42
|
+
assert_equal stin[2..-1], o
|
43
|
+
assert_empty e
|
44
|
+
|
45
|
+
o, e, s = Open3.capture3 EXE+' -i', stdin_data: stin
|
46
|
+
assert_equal 0, s.exitstatus
|
47
|
+
assert_equal stin[0..1], o, "Wrong! STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
|
48
|
+
assert_empty e
|
49
|
+
|
50
|
+
o, e, s = Open3.capture3 EXE+' -n 10', stdin_data: stin
|
51
|
+
assert_equal 0, s.exitstatus
|
52
|
+
assert_equal stin[2..-1], o
|
53
|
+
assert_empty e
|
54
|
+
|
55
|
+
o, e, s = Open3.capture3 EXE+' -b', stdin_data: stin
|
56
|
+
assert_equal 1, s.exitstatus, "error is raised: STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
|
57
|
+
assert_match(/missing/i, e)
|
58
|
+
|
59
|
+
o, e, s = Open3.capture3 EXE+' -e "[5-9]"', stdin_data: stin
|
60
|
+
assert_equal 0, s.exitstatus
|
61
|
+
assert_equal stin[-6..-1], o, "Wrong! STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
|
62
|
+
assert_empty e
|
63
|
+
|
64
|
+
o, e, s = Open3.capture3 EXE+' -e "[5-9]" -x', stdin_data: stin
|
65
|
+
assert_equal 0, s.exitstatus, "error is raised: STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
|
66
|
+
assert_equal stin[-4..-1], o, "Wrong! STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
|
67
|
+
assert_empty e
|
68
|
+
end
|
69
|
+
end # class TestUnitTailRb < MiniTest::Test
|
70
|
+
|
data/test/testtextclean.rb
ADDED
@@ -0,0 +1,52 @@
|
|
1
|
+
# -*- encoding: utf-8 -*-
|
2
|
+
|
3
|
+
# Tests of an executable.
|
4
|
+
#
|
5
|
+
# @author: M. Sakano (Wise Babel Ltd)
|
6
|
+
|
7
|
+
require 'open3'
|
8
|
+
|
9
|
+
$stdout.sync=true
|
10
|
+
$stderr.sync=true
|
11
|
+
# print '$LOAD_PATH=';p $LOAD_PATH
|
12
|
+
|
13
|
+
#################################################
|
14
|
+
# Unit Test
|
15
|
+
#################################################
|
16
|
+
|
17
|
+
gem "minitest"
|
18
|
+
# require 'minitest/unit'
|
19
|
+
require 'minitest/autorun'
|
20
|
+
|
21
|
+
class TestUnitTextclean < MiniTest::Test
|
22
|
+
T = true
|
23
|
+
F = false
|
24
|
+
SCFNAME = File.basename(__FILE__)
|
25
|
+
EXE = "%s/../bin/%s" % [File.dirname(__FILE__), File.basename(__FILE__).sub(/^test_?(.+)\.rb/, '\1')]
|
26
|
+
|
27
|
+
def setup
|
28
|
+
end
|
29
|
+
|
30
|
+
def teardown
|
31
|
+
end
|
32
|
+
|
33
|
+
def test_textclean01
|
34
|
+
o, e, s = Open3.capture3 EXE
|
35
|
+
assert_equal 0, s.exitstatus, "error is raised: STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
|
36
|
+
assert_equal "", o.chomp
|
37
|
+
assert_empty e
|
38
|
+
|
39
|
+
stin = "foo\n\n\nbar\n"
|
40
|
+
s2 = "foo\n\nbar\n"
|
41
|
+
#o, e, s = Open3.capture3 EXE, stdin_data: stin
|
42
|
+
#assert_equal 0, s.exitstatus
|
43
|
+
#assert_equal s2, o
|
44
|
+
#assert_empty e
|
45
|
+
|
46
|
+
o, e, s = Open3.capture3 EXE+' --lastsps-style=delete', stdin_data: stin
|
47
|
+
assert_equal 0, s.exitstatus
|
48
|
+
assert_equal s2.chop.chomp, o, "Wrong! STDOUT="+o.inspect+" STDERR="+(e.empty? ? '""' : ":\n"+e)
|
49
|
+
assert_empty e
|
50
|
+
end
|
51
|
+
end # class TestUnitTextclean < MiniTest::Test
|
52
|
+
|
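The four test files above all follow the same pattern: run the executable with `Open3.capture3`, optionally feed it STDIN through `stdin_data:`, then check STDOUT, STDERR and the exit status. A minimal sketch of that pattern follows. To stay self-contained it uses one-line Ruby scripts run via `ruby -e` as stand-ins for the gem's executables; those scripts are assumptions for illustration.

```ruby
require 'open3'
require 'rbconfig'

# Stand-in "executable": upcases whatever arrives on STDIN.
child = 'print $stdin.read.upcase'
out, err, status = Open3.capture3(RbConfig.ruby, '-e', child, stdin_data: "foo\n")
p out                # "FOO\n"
p status.exitstatus  # 0

# Stand-in for a failing run: a message on STDERR and a non-zero exit status.
bad = 'warn "missing argument"; exit 1'
_out2, err2, status2 = Open3.capture3(RbConfig.ruby, '-e', bad)
p status2.exitstatus # 1
```

Passing the program and its arguments as separate strings, as here, avoids the shell. The tests above build a single command string instead, which is run through the shell whenever options are appended.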
metadata
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: plain_text
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: '0.
|
4
|
+
version: '0.3'
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Masa Sakano
|
@@ -17,6 +17,9 @@ description: This module provides utility functions and methods to handle plain
|
|
17
17
|
email:
|
18
18
|
executables:
|
19
19
|
- countchar
|
20
|
+
- textclean
|
21
|
+
- head.rb
|
22
|
+
- tail.rb
|
20
23
|
extensions: []
|
21
24
|
extra_rdoc_files:
|
22
25
|
- README.en.rdoc
|
@@ -28,6 +31,9 @@ files:
|
|
28
31
|
- README.en.rdoc
|
29
32
|
- Rakefile
|
30
33
|
- bin/countchar
|
34
|
+
- bin/head.rb
|
35
|
+
- bin/tail.rb
|
36
|
+
- bin/textclean
|
31
37
|
- lib/plain_text.rb
|
32
38
|
- lib/plain_text/parse_rule.rb
|
33
39
|
- lib/plain_text/part.rb
|
@@ -40,6 +46,10 @@ files:
|
|
40
46
|
- test/test_plain_text_parse_rule.rb
|
41
47
|
- test/test_plain_text_part.rb
|
42
48
|
- test/test_plain_text_split.rb
|
49
|
+
- test/testcountchar.rb
|
50
|
+
- test/testhead_rb.rb
|
51
|
+
- test/testtail_rb.rb
|
52
|
+
- test/testtextclean.rb
|
43
53
|
homepage: https://www.wisebabel.com
|
44
54
|
licenses:
|
45
55
|
- MIT
|
@@ -67,6 +77,10 @@ specification_version: 4
|
|
67
77
|
summary: Module to handle Plain-Text
|
68
78
|
test_files:
|
69
79
|
- test/test_plain_text_parse_rule.rb
|
80
|
+
- test/testtail_rb.rb
|
70
81
|
- test/test_plain_text_part.rb
|
71
82
|
- test/test_plain_text.rb
|
83
|
+
- test/testcountchar.rb
|
84
|
+
- test/testtextclean.rb
|
72
85
|
- test/test_plain_text_split.rb
|
86
|
+
- test/testhead_rb.rb
|