plain_text 0.4 → 0.5
- checksums.yaml +4 -4
- data/ChangeLog +28 -0
- data/README.en.rdoc +52 -8
- data/bin/head.rb +27 -13
- data/bin/tail.rb +28 -12
- data/bin/yard2md_afterclean +213 -0
- data/lib/plain_text/parse_rule.rb +4 -1
- data/lib/plain_text/part.rb +103 -0
- data/lib/plain_text/split.rb +74 -0
- data/lib/plain_text/util.rb +71 -10
- data/lib/plain_text.rb +153 -28
- data/plain_text.gemspec +9 -5
- data/test/test_plain_text.rb +110 -1
- data/test/test_plain_text_part.rb +80 -0
- data/test/test_plain_text_split.rb +29 -0
- data/test/test_plain_text_util.rb +36 -0
- data/test/testhead_rb.rb +59 -4
- data/test/testtail_rb.rb +58 -8
- data/test/testyard2md_afterclean.rb +71 -0
- metadata +11 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: e80b87e7f19d6f0e9799371f333126010239270db7026d001d7f97f11ec37146
|
4
|
+
data.tar.gz: 17882ccf6af631b485a7548e01b594423e525bdf154c246adebe97862bfc0d3a
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 8faa943dddd4f29791e39403db20c5967fe101b7cb7c8b3d72e3d601d9d0abcafdac78eb3e3d8d6a0a82140651ff8e0d91fba759013d724ff1676c4c4275136b
|
7
|
+
data.tar.gz: 37dbb8cb4a40b8cd53e85c41158cf1a1da2740805138b4845de44b7212fc708b1e8759febc3950b8fc13555512616d0104d54c043950282eed37f660959dbb56
|
data/ChangeLog
CHANGED
@@ -1,3 +1,31 @@
|
|
1
|
+
-----
|
2
|
+
(Version: 0.5)
|
3
|
+
2019-11-07 Masa Sakano
|
4
|
+
* bin/head.rb, bin/tail.rb (hence `lib/plain_text.rb`)
|
5
|
+
* "-p|--padding" option added.
|
6
|
+
* Algorithm in `PlainText#tail_regexp` well simplified.
|
7
|
+
* Some boundary-condition bugs fixed.
|
8
|
+
* `PlainText#Split` (`lib/plain_text/split.rb`)
|
9
|
+
* Added public methods {#count_regexp} and {#count_lines} and their corresponding class methods.
|
10
|
+
* New Ruby executable script: `bin/yard2md_afterclean`
|
11
|
+
|
12
|
+
-----
|
13
|
+
2019-11-06 Masa Sakano
|
14
|
+
* head.rb, tail.rb
|
15
|
+
* "-i|--[no]-inverse" command-line option renamed to "-r|--[no-]reverse"
|
16
|
+
* "-i|--[no-]ignore-case" option added.
|
17
|
+
* "-m|--[no-]multi-line" option added.
|
18
|
+
|
19
|
+
-----
|
20
|
+
2019-11-06 Masa Sakano
|
21
|
+
* PlainText::Util (`plain_text/util.rb`)
|
22
|
+
* All the methods are now private.
|
23
|
+
* New dedicated test code file: `test/test_plain_text_util.rb`
|
24
|
+
* PlainText::Part (`plain_text/part.rb`)
|
25
|
+
* Two new public methods `merge_para!` and `merge_para_if`
|
26
|
+
* head.rb, tail.rb (hence `plain_text.rb`)
|
27
|
+
* Fixed a critical bug in the null case with a Regexp option.
|
28
|
+
|
1
29
|
-----
|
2
30
|
(Version: 0.4)
|
3
31
|
2019-10-29 Masa Sakano
|
data/README.en.rdoc
CHANGED
@@ -11,6 +11,11 @@ This package also provides a few command-line programs, such as counting the num
|
|
11
11
|
of characters (especially useful for documents in Asian (CJK)
|
12
12
|
characters) and advanced head/tail commands.
|
13
13
|
|
14
|
+
The master of this README file, as well as the document for all the methods, is found in
|
15
|
+
{RubyGems/plain_text}[https://rubygems.org/gems/plain_text]
|
16
|
+
and in {Github}[https://github.com/masasakano/plain_text]
|
17
|
+
where all the hyperlinks are active.
|
18
|
+
|
14
19
|
== Design concept
|
15
20
|
|
16
21
|
=== PlainText - Module and root Namespace
|
@@ -104,6 +109,7 @@ help message.
|
|
104
109
|
Counts the number of characters in a file(s) or STDIN.
|
105
110
|
|
106
111
|
The simplest example to run the command-line script is
|
112
|
+
|
107
113
|
countchar YourFile.txt
|
108
114
|
|
109
115
|
=== textclean
|
@@ -116,9 +122,9 @@ into 2. See the reference of {PlainText.clean_text} for detail.
|
|
116
122
|
|
117
123
|
This gives advanced functions, in addition to the standard +head+, including
|
118
124
|
|
119
|
-
Regexp:: It can accept Ruby Regexp to determine the boundary (beginning to the first-matched line).
|
125
|
+
Regexp:: It can accept Ruby Regexp to determine the boundary (beginning to the first-matched line), including ignore-case, multi-line, extra *padding-line* etc.
|
120
126
|
Character-based:: With +--char+ option, it handles the file in units of a character, which is especially handy to deal with multi-byte characters like UTF-8.
|
121
|
-
|
127
|
+
Reverse:: It can *reverse* the behaviour - invert the counting to output everything but the initial NUM lines.
|
122
128
|
|
123
129
|
A few examples are
|
124
130
|
|
@@ -130,10 +136,17 @@ A few examples are
|
|
130
136
|
# The same as the UNIX command: tail -n +5
|
131
137
|
|
132
138
|
head.rb -e '^===+' try.txt
|
133
|
-
# =>
|
139
|
+
# => from the top up to the line that begins with 3 or more "="
|
134
140
|
|
135
141
|
head.rb -x -e '^===+' try.txt
|
136
|
-
# =>
|
142
|
+
# => from the top up to the line before what begins with 3 or more "="
|
143
|
+
|
144
|
+
head.rb -e '^===+' -p 3 try.txt
|
145
|
+
# => from the top up to 3 lines after what begins with 3 or more "="
|
146
|
+
|
147
|
+
head.rb -e '([a-z])\1$' --padding=-2 try.txt
|
148
|
+
# => from the top up to 2 lines before what ends with 2
|
149
|
+
# consecutive same letters (case-insensitive) like "AA" or "qQ"
|
137
150
|
|
138
151
|
The suffix +.rb+ is used to distinguish this command from the UNIX-shell standard command.
|
139
152
|
|
@@ -141,9 +154,11 @@ The suffix +.rb+ is used to distinguish this command from the UNIX-shell standar
|
|
141
154
|
|
142
155
|
This gives advanced functions, in addition to the standard +tail+, including
|
143
156
|
|
144
|
-
Regexp:: It can accept Ruby Regexp to determine the boundary (last-matched line to the end).
|
157
|
+
Regexp:: It can accept Ruby Regexp to determine the boundary (last-matched line to the end), including ignore-case, multi-line, extra *padding-line* etc.
|
145
158
|
Character-based:: With +--char+ option, it handles the file in units of a character, which is especially handy to deal with multi-byte characters like UTF-8.
|
146
|
-
|
159
|
+
Reverse:: It can *reverse* the behaviour - invert the counting to output everything but the last NUM lines.
|
160
|
+
|
161
|
+
See +head.rb+ for practical examples.
|
147
162
|
|
148
163
|
Note the UNIX form of
|
149
164
|
|
@@ -155,6 +170,18 @@ Note the UNIX form of
|
|
155
170
|
|
156
171
|
The suffix +.rb+ is used to distinguish this command from the UNIX-shell standard command.
|
157
172
|
|
173
|
+
=== yard2md_afterclean
|
174
|
+
|
175
|
+
This stands for "yard to markdown - after-clean".
|
176
|
+
|
177
|
+
The standard conversion way of RDoc (written for yard) with +rdoc+ library
|
178
|
+
|
179
|
+
RDoc::Markup::ToMarkdown.new.convert
|
180
|
+
|
181
|
+
is limited, with the produced markdown having a fair number of flaws.
|
182
|
+
This command tries to botch-fix it. The result is
|
183
|
+
still not perfect but does some good automation job.
|
184
|
+
|
158
185
|
== Miscellaneous
|
159
186
|
|
160
187
|
Module {PlainText::Split} contains an instance method (and class
|
@@ -188,19 +215,36 @@ Work in progress...
|
|
188
215
|
This script requires {Ruby}[http://www.ruby-lang.org] Version 2.0
|
189
216
|
or above (possibly 2.2 or above?).
|
190
217
|
|
191
|
-
|
218
|
+
For use of the library, if your Ruby script declares
|
219
|
+
|
220
|
+
require "plain_text"
|
221
|
+
|
222
|
+
all the related libraries should be read.
|
223
|
+
If you +include PlainText+ from String, it would be handy, though
|
224
|
+
not mandatory to use this library.
|
225
|
+
|
226
|
+
As for the command-line script files, they can be put in any of your command-line search
|
192
227
|
paths. Make sure the RUBYLIB environment
|
193
228
|
variable contains the library directory to this gem, which is
|
229
|
+
|
194
230
|
/THIS/GEM/LIBRARY/PATH/plain_text/lib
|
195
231
|
|
232
|
+
(which should be set automatically, as long as you use the standard Gem environment).
|
196
233
|
You may need to modify the first line (Shebang line) of the script to suit your
|
197
234
|
environment (it should be unnecessary for Linux and MacOS), or run it
|
198
235
|
explicitly with your Ruby command as
|
236
|
+
|
199
237
|
Prompt% /YOUR/ENV/ruby /YOUR/INSTALLED/countchar
|
200
238
|
|
201
239
|
== Developer's note
|
202
240
|
|
203
|
-
The source
|
241
|
+
The source codes are annotated in the {YARD}[https://yardoc.org/] format. You
|
242
|
+
can view it in
|
243
|
+
{RubyGems/plain_text}[https://rubygems.org/gems/plain_text] .
|
244
|
+
|
245
|
+
The source code is maintained also in
|
246
|
+
{Github}[https://github.com/masasakano/plain_text] (no intuitive
|
247
|
+
interface for annotation)
|
204
248
|
|
205
249
|
=== Tests
|
206
250
|
|
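The `-x` and `-p` semantics described in the README examples for +head.rb+ can be sketched in plain Ruby. This is only a minimal illustration under stated assumptions, not the gem's implementation: the method name `head_until` and its keyword names are hypothetical, and returning the whole text when nothing matches is an assumption rather than documented behaviour.

```ruby
# Hypothetical stand-in for "head up to the first matching line".
# It does NOT call the plain_text gem; names and the no-match
# behaviour are assumptions made for illustration only.
def head_until(text, regexp, inclusive: true, padding: 0)
  lines = text.lines
  idx = lines.index { |line| line =~ regexp }
  return text if idx.nil? # assumption: no match keeps everything

  # Number of lines to keep: the matched line counts only when inclusive.
  # Padding is applied afterwards and may be negative.
  keep = idx + (inclusive ? 1 : 0) + padding
  keep = [[keep, 0].max, lines.size].min # clamp to the valid range
  lines.first(keep).join
end

sample = "a\nb\n====\nc\nd\ne\nf\n"
puts head_until(sample, /^===+/, padding: 1)
```

Applying the padding after the inclusive/exclusive decision follows the help-text note that padding is calculated after Option -x is considered.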
data/bin/head.rb
CHANGED
@@ -13,10 +13,13 @@ __EOF__
|
|
13
13
|
OPTS = {
|
14
14
|
num: PlainText::DEF_HEADTAIL_N_LINES,
|
15
15
|
unit: :line,
|
16
|
+
ignore_case: false,
|
16
17
|
inclusive: true,
|
17
|
-
inverse: false, #
|
18
|
+
inverse: false, # Option --reverse
|
19
|
+
multi_line: false,
|
20
|
+
padding: 0,
|
18
21
|
# :chatter => 3, # Default
|
19
|
-
debug: false,
|
22
|
+
# debug: false,
|
20
23
|
}
|
21
24
|
|
22
25
|
# Function to handle the command-line arguments.
|
@@ -31,14 +34,19 @@ def handle_argv
|
|
31
34
|
opt.on('-n NUM', '--line=NUM', sprintf("Number of lines (Def: %d).", PlainText::DEF_HEADTAIL_N_LINES), Integer) { |v| OPTS[:num]=v }
|
32
35
|
opt.on('-c NUM', '--byte=NUM', sprintf("Number of bytes, instead of lines."), Integer) { |v| OPTS[:unit] = :byte; OPTS[:num]=v }
|
33
36
|
opt.on( '--char=NUM', sprintf("Number of characters, instead of lines."), Integer) { |v| OPTS[:unit] = :char; OPTS[:num]=v }
|
34
|
-
opt.on('-e REGEXP', '--regexp=REGEXP', sprintf("Regexp for the boundary, instead of a number.", (!OPTS[:num]).inspect)) {|v| OPTS[:num] =
|
37
|
+
opt.on('-e REGEXP', '--regexp=REGEXP', sprintf("Regexp for the boundary, instead of a number.", (!OPTS[:num]).inspect)) {|v| OPTS[:num] = v}
|
38
|
+
opt.on('-i', '--[no-]ignore-case', sprintf("Ignore case distinctions in Regexp (Def: %s)", (!OPTS[:ignore_case]).inspect), TrueClass) {|v| OPTS[:ignore_case] = v}
|
39
|
+
opt.on('-m', '--[no-]multi-line', sprintf("Multi-line match (option m) in Regexp (Def: %s)", (!OPTS[:multi_line]).inspect), TrueClass) {|v| OPTS[:multi_line] = v}
|
35
40
|
opt.on('-x', '--[no-]exclusive', sprintf("The line that matches is excluded? (Def: %s)", (!OPTS[:inclusive]).inspect), FalseClass) {|v| OPTS[:inclusive] = v}
|
36
|
-
opt.on('-
|
41
|
+
opt.on('-p NUM', '--padding=NUM', sprintf("The number of lines included as 'padding' below the matched line (Def: %s)", (!OPTS[:padding]).inspect), Integer) {|v| OPTS[:padding] = v}
|
42
|
+
opt.on('-r', '--[no-]reverse', sprintf("Reverse the behaviour (run AFTER - (inc|ex)clusive and padding) (Def: %s)", (!OPTS[:inverse]).inspect), TrueClass) {|v| OPTS[:inverse] = v} # WARNING-NOTE: the Hash keyword is "inverse" as opposed to "reverse"
|
37
43
|
# opt.on( '--version', "Display the version and exits.", TrueClass) {|v| OPTS[:version] = v} # Consider opts.on_tail
|
38
44
|
# opt.on( '--[no-]debug', "Debug (Def: false)", TrueClass) {|v| OPTS[:debug] = v}
|
39
45
|
# opt.separator "" # Way to control a help message.
|
40
|
-
|
41
|
-
|
46
|
+
opt.separator "Note:"
|
47
|
+
opt.separator " Option -m means '.' includes a newline. '\\s' includes it regardless."
|
48
|
+
opt.separator " 'Padding' (-p) is calculated after Option -x is considered."
|
49
|
+
opt.separator " Negative 'Padding' like '--padding=-3' reduces the number of lines by 3."
|
42
50
|
|
43
51
|
begin
|
44
52
|
opt.parse!(ARGV)
|
@@ -48,6 +56,12 @@ def handle_argv
|
|
48
56
|
exit 1
|
49
57
|
end
|
50
58
|
|
59
|
+
if OPTS[:num].respond_to? :to_str
|
60
|
+
# Regexp specified with --regexp=REGEXP
|
61
|
+
cond = (0 | (OPTS[:ignore_case] ? Regexp::IGNORECASE : 0) | (OPTS[:multi_line] ? Regexp::MULTILINE : 0))
|
62
|
+
OPTS[:num] = Regexp.new OPTS[:num], cond
|
63
|
+
end
|
64
|
+
|
51
65
|
OPTS
|
52
66
|
end
|
53
67
|
|
@@ -67,19 +81,19 @@ end
|
|
67
81
|
opts = handle_argv()
|
68
82
|
num_in = opts[:num]
|
69
83
|
is_inverse = opts[:inverse]
|
84
|
+
# $DEBUG = true if opts[:debug] # Better specify by running this script with ruby --debug
|
70
85
|
|
71
|
-
%i(num inverse debug).each do |ek|
|
86
|
+
%i(num ignore_case inverse multi_line debug).each do |ek|
|
72
87
|
opts.delete ek if opts.has_key? ek
|
73
88
|
end
|
74
89
|
|
75
90
|
str = ARGF.read
|
76
91
|
|
77
|
-
|
78
|
-
|
79
|
-
|
80
|
-
|
81
|
-
|
82
|
-
end
|
92
|
+
method = (is_inverse ? :head_inverse : :head)
|
93
|
+
sout = PlainText.public_send(method, str, num_in, **opts)
|
94
|
+
|
95
|
+
# A linebreak guaranteed at the end, unless it is empty.
|
96
|
+
puts sout if !sout.empty?
|
83
97
|
|
84
98
|
exit
|
85
99
|
|
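The way `handle_argv` turns the `-e` string into a Regexp with the `-i` and `-m` flags can be sketched in isolation as follows. The helper name `build_boundary_regexp` is hypothetical and is not part of the gem; only `Regexp.new`, `Regexp::IGNORECASE` and `Regexp::MULTILINE` are standard Ruby.

```ruby
# Hypothetical helper mirroring the flag handling shown in the diff above.
def build_boundary_regexp(pattern, ignore_case: false, multi_line: false)
  flags = 0
  flags |= Regexp::IGNORECASE if ignore_case
  flags |= Regexp::MULTILINE  if multi_line # lets '.' match a newline
  Regexp.new(pattern, flags)
end

re = build_boundary_regexp('^end', ignore_case: true)
puts re.match?("END of file") # the match ignores case
```

OR-ing the option constants into one Integer keeps the call to `Regexp.new` uniform regardless of which flags were requested.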
data/bin/tail.rb
CHANGED
@@ -13,10 +13,13 @@ __EOF__
|
|
13
13
|
OPTS = {
|
14
14
|
num: PlainText::DEF_HEADTAIL_N_LINES,
|
15
15
|
unit: :line,
|
16
|
+
ignore_case: false,
|
16
17
|
inclusive: true,
|
17
|
-
inverse: false, #
|
18
|
+
inverse: false, # Option --reverse
|
19
|
+
multi_line: false,
|
20
|
+
padding: 0,
|
18
21
|
# :chatter => 3, # Default
|
19
|
-
debug: false,
|
22
|
+
# debug: false,
|
20
23
|
}
|
21
24
|
|
22
25
|
# Function to handle the command-line arguments.
|
@@ -31,14 +34,21 @@ def handle_argv
|
|
31
34
|
opt.on('-n NUM', '--line=NUM', sprintf("Number of lines (Def: %d).", PlainText::DEF_HEADTAIL_N_LINES), Integer) { |v| OPTS[:num]=v }
|
32
35
|
opt.on('-c NUM', '--byte=NUM', sprintf("Number of bytes, instead of lines."), Integer) { |v| OPTS[:unit] = :byte; OPTS[:num]=v }
|
33
36
|
opt.on( '--char=NUM', sprintf("Number of characters, instead of lines."), Integer) { |v| OPTS[:unit] = :char; OPTS[:num]=v }
|
34
|
-
opt.on('-e REGEXP', '--regexp=REGEXP', sprintf("Regexp for the boundary, instead of a number.", (!OPTS[:num]).inspect)) {|v| OPTS[:num] =
|
37
|
+
opt.on('-e REGEXP', '--regexp=REGEXP', sprintf("Regexp for the boundary, instead of a number.", (!OPTS[:num]).inspect)) {|v| OPTS[:num] = v}
|
38
|
+
opt.on('-i', '--[no-]ignore-case', sprintf("Ignore case distinctions in Regexp (Def: %s)", (!OPTS[:ignore_case]).inspect), TrueClass) {|v| OPTS[:ignore_case] = v}
|
39
|
+
opt.on('-m', '--[no-]multi-line', sprintf("Multi-line match (option m) in Regexp (Def: %s)", (!OPTS[:multi_line]).inspect), TrueClass) {|v| OPTS[:multi_line] = v}
|
35
40
|
opt.on('-x', '--[no-]exclusive', sprintf("The line that matches is excluded? (Def: %s)", (!OPTS[:inclusive]).inspect), FalseClass) {|v| OPTS[:inclusive] = v}
|
36
|
-
opt.on('-
|
41
|
+
opt.on('-p NUM', '--padding=NUM', sprintf("The number of lines included as 'padding' below the matched line (Def: %s)", (!OPTS[:padding]).inspect), Integer) {|v| OPTS[:padding] = v}
|
42
|
+
opt.on('-p NUM', '--padding=NUM', sprintf("The number of lines included as 'padding' below the matched line (Def: %s)", (!OPTS[:padding]).inspect), Integer) {|v| OPTS[:padding] = v}
|
43
|
+
opt.on('-r', '--[no-]reverse', sprintf("Reverse the behaviour (run AFTER - (inc|ex)clusive and padding) (Def: %s)", (!OPTS[:inverse]).inspect), TrueClass) {|v| OPTS[:inverse] = v} # WARNING-NOTE: the Hash keyword is "inverse" as opposed to "reverse"
|
37
44
|
# opt.on( '--version', "Display the version and exits.", TrueClass) {|v| OPTS[:version] = v} # Consider opts.on_tail
|
38
45
|
# opt.on( '--[no-]debug', "Debug (Def: false)", TrueClass) {|v| OPTS[:debug] = v}
|
39
46
|
opt.separator "" # Way to control a help message.
|
40
47
|
opt.separator "Note:"
|
41
|
-
opt.separator " UNIX command of 'tail -n +5' is equivalent to 'head.rb
|
48
|
+
opt.separator " UNIX command of 'tail -n +5' is equivalent to 'head.rb --reverse -n 5'"
|
49
|
+
opt.separator " Option -m means '.' includes a newline. '\\s' includes it regardless."
|
50
|
+
opt.separator " 'Padding' (-p) is calculated after Option -x is considered."
|
51
|
+
opt.separator " Negative 'Padding' like '--padding=-3' reduces the number of lines by 3."
|
42
52
|
|
43
53
|
begin
|
44
54
|
opt.parse!(ARGV)
|
@@ -48,6 +58,12 @@ def handle_argv
|
|
48
58
|
exit 1
|
49
59
|
end
|
50
60
|
|
61
|
+
if OPTS[:num].respond_to? :to_str
|
62
|
+
# Regexp specified with --regexp=REGEXP
|
63
|
+
cond = (0 | (OPTS[:ignore_case] ? Regexp::IGNORECASE : 0) | (OPTS[:multi_line] ? Regexp::MULTILINE : 0))
|
64
|
+
OPTS[:num] = Regexp.new OPTS[:num], cond
|
65
|
+
end
|
66
|
+
|
51
67
|
OPTS
|
52
68
|
end
|
53
69
|
|
@@ -67,19 +83,19 @@ end
|
|
67
83
|
opts = handle_argv()
|
68
84
|
num_in = opts[:num]
|
69
85
|
is_inverse = opts[:inverse]
|
86
|
+
# $DEBUG = true if opts[:debug] # Better specify by running this script with ruby --debug
|
70
87
|
|
71
|
-
%i(num inverse debug).each do |ek|
|
88
|
+
%i(num ignore_case inverse multi_line debug).each do |ek|
|
72
89
|
opts.delete ek if opts.has_key? ek
|
73
90
|
end
|
74
91
|
|
75
92
|
str = ARGF.read
|
76
93
|
|
77
|
-
|
78
|
-
|
79
|
-
|
80
|
-
|
81
|
-
|
82
|
-
end
|
94
|
+
method = (is_inverse ? :tail_inverse : :tail)
|
95
|
+
sout = PlainText.public_send(method, str, num_in, **opts)
|
96
|
+
|
97
|
+
# A linebreak guaranteed at the end, unless it is empty.
|
98
|
+
puts sout if !sout.empty?
|
83
99
|
|
84
100
|
exit
|
85
101
|
|
data/bin/yard2md_afterclean
ADDED
@@ -0,0 +1,213 @@
|
|
1
|
+
#!/usr/bin/env ruby
|
2
|
+
# -*- coding: utf-8 -*-
|
3
|
+
|
4
|
+
require 'optparse'
|
5
|
+
require 'open3'
|
6
|
+
require 'plain_text'
|
7
|
+
|
8
|
+
BANNER = <<"__EOF__"
|
9
|
+
USAGE: #{File.basename($0)} [options] [INFILE.txt] < STDIN
|
10
|
+
Clean the partially ill-formatted (Github) Markdown converted from yard-Rdoc.
|
11
|
+
__EOF__
|
12
|
+
|
13
|
+
# Initialising the hash for the command-line options.
|
14
|
+
OPTS = {
|
15
|
+
lang: 'ruby',
|
16
|
+
# :chatter => 3, # Default
|
17
|
+
debug: false,
|
18
|
+
}
|
19
|
+
|
20
|
+
# Function to handle the command-line arguments.
|
21
|
+
#
|
22
|
+
# ARGV will be modified, and the constant variable OPTS is set.
|
23
|
+
#
|
24
|
+
# @return [Hash] Optional-argument hash.
|
25
|
+
#
|
26
|
+
def handle_argv
|
27
|
+
opt = OptionParser.new(BANNER)
|
28
|
+
opt.on( '--lang=LANGUAGE', sprintf("Programming Language like ruby (Def: %s).", OPTS[:lang])) { |v| OPTS[:lang]=v.strip }
|
29
|
+
# opt.on( '--version', "Display the version and exits.", TrueClass) {|v| OPTS[:version] = v} # Consider opts.on_tail
|
30
|
+
opt.on( '--[no-]debug', "Debug (Def: false)", TrueClass) {|v| OPTS[:debug] = v}
|
31
|
+
# opt.separator "" # Way to control a help message.
|
32
|
+
# opt.separator "Note:"
|
33
|
+
# opt.separator " Spaces are truncated in default."
|
34
|
+
|
35
|
+
opt.parse!(ARGV)
|
36
|
+
|
37
|
+
OPTS
|
38
|
+
end
|
39
|
+
|
40
|
+
def fix_string_based(str)
|
41
|
+
fix_def_list(
|
42
|
+
fix_inline_link(
|
43
|
+
fix_inline_code(str)
|
44
|
+
)
|
45
|
+
)
|
46
|
+
end
|
47
|
+
|
48
|
+
# Removes some markdown formatting (for definition list etc)
|
49
|
+
def remove_mdfmt(str)
|
50
|
+
str.gsub(/`([^`\n]+)`/, '<tt>\1</tt>').gsub(/\*+([^*\n]+)\*+/, '<strong>\1</strong>').gsub(/\&/, '&').gsub(/</, '<').gsub(/>/, '>').gsub(/"/, '"')
|
51
|
+
end
|
52
|
+
|
53
|
+
# Removes some markdown formatting (for definition list etc)
|
54
|
+
def remove_mdfmt_raw(str)
|
55
|
+
str.gsub(/`([^`\n]+)`/, '\1').gsub(/\*+([^*\n]+)\*+/, '\1').gsub(/\&/, '&').gsub(/</, '<').gsub(/>/, '>').gsub(/"/, '"')
|
56
|
+
end
|
57
|
+
|
58
|
+
|
59
|
+
# returns the string where the definition list is rewritten for github
|
60
|
+
#
|
61
|
+
# Similar to {#fix_inline_code} but for def list
|
62
|
+
#
|
63
|
+
# @param str [String]
|
64
|
+
# @return [String]
|
65
|
+
def fix_def_list(str)
|
66
|
+
str.gsub(/^(\S+[^\n]*)\n:((?:\s+[^\n]+(?:\n|\z))+)/m){
|
67
|
+
sdt, sdd = $1, $2
|
68
|
+
"<dt>%s</dt>\n<dd>%s</dd>\n"%[remove_mdfmt_raw(sdt), remove_mdfmt(sdd.chop)]
|
69
|
+
}.gsub(/(\s+\n|\A)(<dt>)/m, '\1<dl>'+"\n"+'\2').gsub(%r@(</dd>[[:blank:]]*)(\n(?:\s+|\z))@, '\1'+"\n"+'</dl>\2')
|
70
|
+
end
|
71
|
+
|
72
|
+
# returns the string where inline code are fixed.
|
73
|
+
#
|
74
|
+
# More than 2 words are left like
|
75
|
+
#
|
76
|
+
# +abc def+
|
77
|
+
#
|
78
|
+
# which should be converted into
|
79
|
+
#
|
80
|
+
# `abc def`
|
81
|
+
#
|
82
|
+
# This is assuming the current paragraph is not a code block.
|
83
|
+
# This does not *properly* take into account the escape sequence.
|
84
|
+
# For example, '+a\+ b+' is not properly taken into account
|
85
|
+
# (though RDoc may not do, either)!
|
86
|
+
#
|
87
|
+
# Note if words between '+' straddle over more than 2 lines, something may be wrong,
|
88
|
+
# and hence they are ignored.
|
89
|
+
#
|
90
|
+
# @param str [String]
|
91
|
+
# @return [String]
|
92
|
+
def fix_inline_code(str)
|
93
|
+
str.gsub(/(?<!\\)((?:\\\\)*)\+([^+\n]+)(\n[^+\n]+)?(?<!\\)(\\\\)*\+/m){
|
94
|
+
($1 ? $1 : "")+'`'+$2+($3 ? ' '+$3[1..-1] : '')+'`'+($4 ? $4 : "")
|
95
|
+
}
|
96
|
+
end
|
97
|
+
|
98
|
+
# returns the string where multi-line links are fixed.
|
99
|
+
#
|
100
|
+
# Similar to {#fix_inline_code} but for links
|
101
|
+
#
|
102
|
+
# @param str [String]
|
103
|
+
# @return [String]
|
104
|
+
def fix_inline_link(str)
|
105
|
+
str.gsub(%r@(?<!\\)((?:\\\\)*)\[([^\]\n]+)(\n[^\]\n]+)?(?<!\\)(\\\\)*\](\(https?://[^)]+\))@m){
|
106
|
+
($1 ? $1 : "")+'['+$2+($3 ? ' '+$3[1..-1] : '')+']'+($4 ? $4 : "")+$5.gsub(/\s*\n+\s*/m, '')
|
107
|
+
}
|
108
|
+
end
|
109
|
+
|
110
|
+
# Indent of the current line
|
111
|
+
#
|
112
|
+
# @param str [String]
|
113
|
+
# @param lb [String] Linebreak: default $/
|
114
|
+
# @return [Integer]
|
115
|
+
def indent_line(str)
|
116
|
+
/\A(\s*)/ =~ str
|
117
|
+
$1.size
|
118
|
+
end
|
119
|
+
|
120
|
+
# Returns the minimum indent of the input String, excluding blank lines.
|
121
|
+
#
|
122
|
+
# @param str [String]
|
123
|
+
# @param lb [String] Linebreak: default $/ (ignored so far)
|
124
|
+
# @return [Integer]
|
125
|
+
def min_indent(str, lb=$/)
|
126
|
+
return 0 if str.empty?
|
127
|
+
lines = PlainText::Part.parse(str).parts.join("\n").split("\n")
|
128
|
+
lines.map{|ec| indent_line(ec)}.min
|
129
|
+
end
|
130
|
+
|
131
|
+
# True if it looks like Markdown code block.
|
132
|
+
#
|
133
|
+
# Neither Github-style "```ruby" nor pandoc-style "~~~~{#mycode...}" is
|
134
|
+
# assumed to be used.
|
135
|
+
# This is not accurate and can be cheated if it is already indented as list.
|
136
|
+
#
|
137
|
+
# @param str [String]
|
138
|
+
# @param indent [Integer] Base indent. If it is 0, 4 or more indents are the conditions.
|
139
|
+
def md_code_block?(str, indent=0, *rest)
|
140
|
+
return nil if str.empty?
|
141
|
+
(min_indent(str, *rest) - indent) >= 4
|
142
|
+
end
|
143
|
+
|
144
|
+
# Returns the last indent of the paragraph if it ends with a list.
|
145
|
+
#
|
146
|
+
# @param str [String]
|
147
|
+
# @param indent_prev [Integer] The minimum indent for an item to keep being in the list in the previous paragraph.
|
148
|
+
# @param lb [String] Linebreak: default $/
|
149
|
+
# @return [Integer]
|
150
|
+
def last_indent(str, indent_prev=0, lb=$/)
|
151
|
+
return indent_prev if !str || str.empty?
|
152
|
+
lines = PlainText::Part.parse(str).parts.join("\n").split("\n")
|
153
|
+
# Note: numsps = 2 # "2." takes up 2 spaces, whereas "12." takes 3.
|
154
|
+
lines.each do |ec|
|
155
|
+
cind = indent_line(ec)
|
156
|
+
if cind - indent_prev >= 4 # Code block! ##### Maybe deals with it in future!!
|
157
|
+
# This means it is indented 4 or more spaces from the previous.
|
158
|
+
elsif /^(\s*)(?:(\*\s)|(\d+\.(?:\s|$)))/ =~ ec
|
159
|
+
# Reset the indent
|
160
|
+
ind_now = $1.size + ($2 || $3).size + 1 # maybe +2 (for Rdoc2md?)
|
161
|
+
indent_prev = ind_now # Deeper or shallower or same-level list.
|
162
|
+
# numsps = $3.size + 1 if $3 && !$3.empty?
|
163
|
+
elsif cind < indent_prev - 1 # 1 is a margin...
|
164
|
+
# Breaks out from the previous list.
|
165
|
+
indent_prev = cind
|
166
|
+
end
|
167
|
+
end
|
168
|
+
indent_prev
|
169
|
+
end
|
170
|
+
|
171
|
+
################################################
|
172
|
+
# MAIN
|
173
|
+
################################################
|
174
|
+
|
175
|
+
$stdout.sync=true
|
176
|
+
$stderr.sync=true
|
177
|
+
|
178
|
+
#class String
|
179
|
+
# include PlainText
|
180
|
+
#end
|
181
|
+
|
182
|
+
# Handle the command-line options => OPTS
|
183
|
+
opts = handle_argv()
|
184
|
+
|
185
|
+
strin = ARGF.read
|
186
|
+
## split to paras, fixing inline code blocks
|
187
|
+
mdpart = PlainText::Part.parse(strin)
|
188
|
+
|
189
|
+
indent_prev = last_indent(mdpart[0])
|
190
|
+
mdpart.merge_para_if{ |pbp, _, _|
|
191
|
+
prev_cb = md_code_block?(pbp[0], indent_prev)
|
192
|
+
next_cb = md_code_block?(pbp[2], indent_prev)
|
193
|
+
next true if prev_cb && next_cb
|
194
|
+
indent_prev = last_indent(pbp[2], indent_prev)
|
195
|
+
false
|
196
|
+
}
|
197
|
+
|
198
|
+
indent_next = 0
|
199
|
+
mdpart = mdpart.map_part{|ec|
|
200
|
+
indent_prev = indent_next
|
201
|
+
indent_next = last_indent(ec, indent_prev)
|
202
|
+
next fix_string_based(ec) if !md_code_block?(ec, indent_prev)
|
203
|
+
inde = " "*indent_prev
|
204
|
+
st = ec.gsub(/^ /, '')
|
205
|
+
"%s```%s\n%s\n%s```"%[inde, opts[:lang], st, inde, opts[:lang]]
|
206
|
+
}
|
207
|
+
|
208
|
+
puts mdpart.join('')
|
209
|
+
|
210
|
+
exit
|
211
|
+
|
212
|
+
__END__
|
213
|
+
|
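The idea behind `fix_inline_code` in the script above, converting leftover RDoc `+abc def+` markup into Markdown backticks, can be sketched in a simplified form. The method name `plus_to_backticks` is hypothetical; unlike the script, this sketch handles only a single line and ignores doubled backslashes.

```ruby
# Simplified, hypothetical variant of the inline-code fix.
# A '+' preceded by a backslash is not treated as an opening marker.
def plus_to_backticks(text)
  text.gsub(/(?<!\\)\+([^+\n]+)\+/) { "`#{Regexp.last_match(1)}`" }
end

puts plus_to_backticks("Run +abc def+ first.")
```

As in the original, plain arithmetic such as `1 + 2 + 3` would also be picked up, so the conversion is best applied only to paragraphs that are not code blocks.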
data/lib/plain_text/parse_rule.rb
CHANGED
@@ -89,6 +89,8 @@ module PlainText
|
|
89
89
|
#
|
90
90
|
class ParseRule
|
91
91
|
|
92
|
+
include PlainText::Util
|
93
|
+
|
92
94
|
# Main Array of rules (Proc or Regexp). Do not delete or add the contents, as it would have a knock-on effect, especially with {#names}!
|
93
95
|
# Use {#rule_at} to get a rule for the index/key.
|
94
96
|
# The private method {#rule_at}(-1) is the same as {#rules}[-1],
|
@@ -283,7 +285,8 @@ module PlainText
|
|
283
285
|
# @param index_rules [Integer] Index for {#rules}. A negative index is allowed.
|
284
286
|
# @return [Integer] Non-negative index where name is set; i.e., if index=-1 is specified for {#rules} with a size of 3, the returned value is 2 (the last index of it).
|
285
287
|
def set_name_at(name, index_rules)
|
286
|
-
index =
|
288
|
+
index = positive_array_index_checked(index_rules, @rules, accept_too_big: false, varname: 'rules')
|
289
|
+
# index = PlainText::Util.positive_array_index_checked(index_rules, @rules, accept_too_big: false, varname: 'rules')
|
287
290
|
if !name
|
288
291
|
@names[index] = nil
|
289
292
|
return index
|