plain_text 0.4 → 0.5

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: e0bbafc2df85dc7fab4a71b03126805e7f2e9916ae0e3fee91e31d9584e5a6ee
- data.tar.gz: b385be9df6ce8d8c1081a8c5a233daaa60659ccc25c44722c4a333d8cb81a8c0
+ metadata.gz: e80b87e7f19d6f0e9799371f333126010239270db7026d001d7f97f11ec37146
+ data.tar.gz: 17882ccf6af631b485a7548e01b594423e525bdf154c246adebe97862bfc0d3a
  SHA512:
- metadata.gz: c3fb2676c18ad0f3e637fc4bac35af7b7fba6d664c852265cd59485d0909dd4e6739c8a4dc12257016472ef31ac8719ed85be061abfe9955a4adf5fcb8b16d29
- data.tar.gz: dfa034f4130c02aa7ba12817da0395b78f5d6dfcfd349a80966a20e8aa98652ec4c2d2b8c4762b65d25dbfbe39244c2375663085b2189077a04a732300ca6dc7
+ metadata.gz: 8faa943dddd4f29791e39403db20c5967fe101b7cb7c8b3d72e3d601d9d0abcafdac78eb3e3d8d6a0a82140651ff8e0d91fba759013d724ff1676c4c4275136b
+ data.tar.gz: 37dbb8cb4a40b8cd53e85c41158cf1a1da2740805138b4845de44b7212fc708b1e8759febc3950b8fc13555512616d0104d54c043950282eed37f660959dbb56
data/ChangeLog CHANGED
@@ -1,3 +1,31 @@
+ -----
+ (Version: 0.5)
+ 2019-11-07 Masa Sakano
+ * bin/head.rb, bin/tail.rb (hence `lib/plain_text.rb`)
+ * "-p|--padding" option added.
+ * Algorithm in `PlainText#tail_regexp` well simplified.
+ * Some boundary-condtion bugs fixed.
+ * `PlainText#Split` (`lib/plain_text/split.rb`)
+ * Added public methods {#count_regexp} and {#count_lines} and their corresponding class methods.
+ * New Ruby executable script: `bin/yard2md_afterclean`
+
+ -----
+ 2019-11-06 Masa Sakano
+ * head.rb, tail.rb
+ * "-i|--[no]-inverse" command-line option renamed to "-r|--[no-]reverse"
+ * "-i|--[no-]ignore-case" option added.
+ * "-m|--[no-]multi-line" option added.
+
+ -----
+ 2019-11-06 Masa Sakano
+ * PlainText::Util (`plain_text/util.rb`)
+ * All the methods are now private.
+ * New dedicated test code file: `lib/plain_text/util.rb`
+ * PlainText::Part (`plain_text/part.rb`)
+ * Two new public methods `merge_para!` and `merge_para_if`
+ * head.rb, tail.rb (hence `plain_text.rb`)
+ * Fixed a critical bug in the null case with a Regexp option.
+
  -----
  (Version: 0.4)
  2019-10-29 Masa Sakano
data/README.en.rdoc CHANGED
@@ -11,6 +11,11 @@ This package also provides a few command-line programs, such as counting the num
  of characters (especially useful for documents in Asian (CJK)
  chatacters) and advanced head/tail commands.

+ The master of this README file, as well as the document for all the methods, is found in
+ {RubyGems/plain_text}[https://rubygems.org/gems/plain_text]
+ and in {Github}[https://github.com/masasakano/plain_text]
+ where all the hyperlinks are active.
+
  == Design concept

  === PlainText - Module and root Namespace
@@ -104,6 +109,7 @@ help message.
  Counts the number of characters in a file(s) or STDIN.

  The simplest example to run the command-line script is
+
    countchar YourFile.txt

  === textclean
@@ -116,9 +122,9 @@ into 2. See the reference of {PlainText.clean_text} for detail.

  This gives advanced functions, in addition to the standard +head+, including

- Regexp:: It can accept Ruby Regexp to determine the boundary (beginning to the first-matched line).
+ Regexp:: It can accept Ruby Regexp to determine the boundary (beginning to the first-matched line), including ignore-case, multi-line, extra *padding-line* etc.
  Character-based:: With +--char+ option, it handles the file in units of a chracter, which is especially handy to deal with multi-byte characters like UTF-8.
- Inverse:: It can inverse the counting to ouput everything but initial NUM lines.
+ Reverse:: It can *reverese* the behaviour - inverse the counting to ouput everything but initial NUM lines.

  A few examples are

@@ -130,10 +136,17 @@ A few examples are
    # The same as the UNIX command: tail -n +5

    head.rb -e '^===+' try.txt
-   # => first line up to the line that begins with more than 3 "="
+   # => from the top up to the line that begins with more than 3 "="

    head.rb -x -e '^===+' try.txt
-   # => first line up to the line before what begins with more than 3 "="
+   # => from the top up to the line before what begins with more than 3 "="
+
+   head.rb -e '^===+' -p 3 try.txt
+   # => from the top up to 3 lines after what begins with more than 3 "="
+
+   head.rb -e '([a-z])\1$' --padding=-2 try.txt
+   # => from the top up to 2 lines before what ends with 2
+   # consecutive same letters (case-insentive) like "AA" or "qQ"

  The suffix +.rb+ is used to distinguish this command from the UNIX-shell standard command.
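
The boundary rules in the examples above (inclusive or exclusive match, then padding) can be sketched in plain Ruby. This is only an illustration under stated assumptions: `head_until` is a hypothetical helper, not part of the gem's API, and returning an empty string when nothing matches is an assumed behaviour rather than the documented one.

```ruby
# Hypothetical helper (NOT the gem's API): returns the leading lines of +text+
# up to the first line that matches +re+.
#   inclusive: keep the matched line itself (the -x option switches this off)
#   padding:   extra lines kept after the boundary; a negative value trims lines
def head_until(text, re, inclusive: true, padding: 0)
  lines = text.lines
  idx = lines.index { |line| line =~ re }
  return "" if idx.nil?                       # assumption: no match gives empty output

  # Padding is applied after the inclusive/exclusive decision.
  count = idx + (inclusive ? 1 : 0) + padding
  count = [[count, 0].max, lines.size].min    # clamp to the available lines
  lines.first(count).join
end

sample = "a\nb\n====\nc\nd\ne\nf\n"
puts head_until(sample, /^===+/)                    # up to and including "===="
puts head_until(sample, /^===+/, inclusive: false)  # up to the line before "===="
puts head_until(sample, /^===+/, padding: 3)        # three more lines after "===="
puts head_until(sample, /^===+/, padding: -2)       # stops two lines before "===="
```

Applying the padding last mirrors the note in the help text that padding is calculated after option -x is considered.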
 
@@ -141,9 +154,11 @@ The suffix +.rb+ is used to distinguish this command from the UNIX-shell standar

  This gives advanced functions, in addition to the standard +tail+, including

- Regexp:: It can accept Ruby Regexp to determine the boundary (last-matched line to the end).
+ Regexp:: It can accept Ruby Regexp to determine the boundary (last-matched line to the end), including ignore-case, multi-line, extra *padding-line* etc.
  Character-based:: With +--char+ option, it handles the file in units of a chracter, which is especially handy to deal with multi-byte characters like UTF-8.
- Inverse:: It can inverse the counting to ouput everything but the last NUM lines.
+ Reverse:: It can *reverese* the behaviour - inverse the counting to ouput everything but the last NUM lines.
+
+ See +head.rb+ for practical examples.

  Note the UNIX form of

@@ -155,6 +170,18 @@ Note the UNIX form of

  The suffix +.rb+ is used to distinguish this command from the UNIX-shell standard command.

+ === yard2md_afterclean
+
+ This stands for "yard to markdown - after-clean".
+
+ The standard conversion way of RDoc (written for yard) with +rdoc+ library
+
+   RDoc::Markup::ToMarkdown.new.convert
+
+ is limited, with the produced markdown having a fair number of flaws.
+ This command tries to botch-fix it. The result is
+ still not perfect but does some good automation job.
+
  == Miscellaneous

  Module {PlainText::Split} contains an instance method (and class
@@ -188,19 +215,36 @@ Work in progress...
  This script requires {Ruby}[http://www.ruby-lang.org] Version 2.0
  or above (possibley 2.2 or above?).

- As for the command-line script file, it can be put in any of your command-line search
+ For use of the library, if your Ruby script declares
+
+   require "plain_text"
+
+ all the related libraries should be read.
+ If you +include PlainText+ from String, it would be handy, though
+ not mandatory to use this library.
+
+ As for the command-line script files, they can be put in any of your command-line search
  paths. Make sure the RUBYLIB environment
  variable contains the library directory to this gem, which is
+
    /THIS/GEM/LIBRARY/PATH/plain_text/lib

+ (which should be set automatically, as long as you use the standard Gem environment).
  You may need to modify the first line (Shebang line) of the script to suit your
  environment (it should be unnecessary for Linux and MacOS), or run it
  explicitly with your Ruby command as
+
    Prompt% /YOUR/ENV/ruby /YOUR/INSTALLED/countchar

  == Developer's note

- The source code is maintained also in {Github}[https://github.com/masasakano/plain_text]
+ The source codes are annotated in the {YARD}[https://yardoc.org/] format. You
+ can view it in
+ {RubyGems/plain_text}[https://rubygems.org/gems/plain_text] .
+
+ The source code is maintained also in
+ {Github}[https://github.com/masasakano/plain_text] (no intuitive
+ interface for annotation)

  === Tests

data/bin/head.rb CHANGED
@@ -13,10 +13,13 @@ __EOF__
  OPTS = {
    num: PlainText::DEF_HEADTAIL_N_LINES,
    unit: :line,
+   ignore_case: false,
    inclusive: true,
-   inverse: false, # unique option
+   inverse: false, # Option --reverse
+   multi_line: false,
+   padding: 0,
    # :chatter => 3, # Default
-   debug: false,
+   # debug: false,
  }

  # Function to handle the command-line arguments.
@@ -31,14 +34,19 @@ def handle_argv
    opt.on('-n NUM', '--line=NUM', sprintf("Number of lines (Def: %d).", PlainText::DEF_HEADTAIL_N_LINES), Integer) { |v| OPTS[:num]=v }
    opt.on('-c NUM', '--byte=NUM', sprintf("Number of bytes, instead of lines."), Integer) { |v| OPTS[:unit] = :byte; OPTS[:num]=v }
    opt.on( '--char=NUM', sprintf("Number of characters, instead of lines."), Integer) { |v| OPTS[:unit] = :char; OPTS[:num]=v }
-   opt.on('-e REGEXP', '--regexp=REGEXP', sprintf("Regexp for the boundary, instead of a number.", (!OPTS[:num]).inspect)) {|v| OPTS[:num] = Regexp.new v}
+   opt.on('-e REGEXP', '--regexp=REGEXP', sprintf("Regexp for the boundary, instead of a number.", (!OPTS[:num]).inspect)) {|v| OPTS[:num] = v}
+   opt.on('-i', '--[no-]ignore-case', sprintf("Ignore case distinctions in Regexp (Def: %s)", (!OPTS[:ignore_case]).inspect), TrueClass) {|v| OPTS[:ignore_case] = v}
+   opt.on('-m', '--[no-]multi-line', sprintf("Multi-line match (option m) in Regexp (Def: %s)", (!OPTS[:multi_line]).inspect), TrueClass) {|v| OPTS[:multi_line] = v}
    opt.on('-x', '--[no-]exclusive', sprintf("The line that matches is excluded? (Def: %s)", (!OPTS[:inclusive]).inspect), FalseClass) {|v| OPTS[:inclusive] = v}
-   opt.on('-i', '--[no-]inverse', sprintf("Inverse the result (print after NUM-th line) (Def: %s)", (!OPTS[:inverse]).inspect), TrueClass) {|v| OPTS[:inverse] = v}
+   opt.on('-p NUM', '--padding=NUM', sprintf("The number of lines included as 'padding' below the matched line (Def: %s)", (!OPTS[:padding]).inspect), Integer) {|v| OPTS[:padding] = v}
+   opt.on('-r', '--[no-]reverse', sprintf("Reverse the behaviour (run AFTER - (inc|ex)clusive and padding) (Def: %s)", (!OPTS[:inverse]).inspect), TrueClass) {|v| OPTS[:inverse] = v} # WARNING-NOTE: the Hash keyword is "inverse" as opposed to "reverse"
    # opt.on( '--version', "Display the version and exits.", TrueClass) {|v| OPTS[:version] = v} # Consider opts.on_tail
    # opt.on( '--[no-]debug', "Debug (Def: false)", TrueClass) {|v| OPTS[:debug] = v}
    # opt.separator "" # Way to control a help message.
-   # opt.separator "Note:"
-   # opt.separator " Spaces are truncated in default."
+   opt.separator "Note:"
+   opt.separator " Option -m means '.' includes a newline. '\\s' includes it regardless."
+   opt.separator " 'Padding' (-p) is calculated after Option -x is considered."
+   opt.separator " Negative 'Padding' like '--padding=-3' reduces the number of lines by 3."

    begin
      opt.parse!(ARGV)
@@ -48,6 +56,12 @@ def handle_argv
      exit 1
    end

+   if OPTS[:num].respond_to? :to_str
+     # Regexp specified with --regexp=REGEXP
+     cond = (0 | (OPTS[:ignore_case] ? Regexp::IGNORECASE : 0) | (OPTS[:multi_line] ? Regexp::MULTILINE : 0))
+     OPTS[:num] = Regexp.new OPTS[:num], cond
+   end
+
    OPTS
  end
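
The deferred Regexp construction in this hunk (keep the pattern as a String during option parsing, then apply the flags once all switches are known) can be tried in isolation. The following is a minimal sketch; `build_regexp` is a hypothetical name used only for illustration, while `Regexp::IGNORECASE`, `Regexp::MULTILINE` and `Regexp.new(pattern, flags)` are standard Ruby.

```ruby
# Hypothetical helper: builds a Regexp from a pattern String and two boolean
# switches that correspond to the -i and -m command-line options.
def build_regexp(pattern, ignore_case: false, multi_line: false)
  flags = 0
  flags |= Regexp::IGNORECASE if ignore_case
  flags |= Regexp::MULTILINE  if multi_line   # makes "." match a newline as well
  Regexp.new(pattern, flags)
end

re = build_regexp('([a-z])\1$', ignore_case: true)
p re.match?("xAA")                                      # matches only because of -i
p build_regexp('([a-z])\1$').match?("xAA")              # no match without -i
p build_regexp('a.b', multi_line: true).match?("a\nb")  # "." spans the newline
p build_regexp('a.b').match?("a\nb")                    # "." stops at the newline
```

Building the Regexp after parsing means the result does not depend on the order in which `-e`, `-i` and `-m` appear on the command line.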
 
@@ -67,19 +81,19 @@ end
  opts = handle_argv()
  num_in = opts[:num]
  is_inverse = opts[:inverse]
+ # $DEBUG = true if opts[:debug] # Better specify by running this script with ruby --debug

- %i(num inverse debug).each do |ek|
+ %i(num ignore_case inverse multi_line debug).each do |ek|
    opts.delete ek if opts.has_key? ek
  end

  str = ARGF.read

- # A linebreak guaranteed at the end.
- if is_inverse
-   puts PlainText.head_inverse(str, num_in, **opts)
- else
-   puts PlainText.head(str, num_in, **opts)
- end
+ method = (is_inverse ? :head_inverse : :head)
+ sout = PlainText.public_send(method, str, num_in, **opts)
+
+ # A linebreak guaranteed at the end, unless it is empty.
+ puts sout if !sout.empty?

  exit
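
The reason for the new empty-check is that `puts ""` still prints a newline, so an empty result would otherwise produce a stray blank line. The sketch below reproduces the dispatch pattern with stand-in methods; the `Demo` module and the `emit` helper are hypothetical and only imitate the shape of `PlainText.head` and `PlainText.head_inverse`, not their actual behaviour.

```ruby
require 'stringio'

# Hypothetical stand-ins for the two library entry points.
module Demo
  def self.head(str, num)
    str.lines.first(num).join
  end

  def self.head_inverse(str, num)
    str.lines.drop(num).join
  end
end

# Chooses the method by name, calls it with public_send, and prints the
# result only when it is non-empty. Returns the selected text.
def emit(str, num, inverse: false, out: $stdout)
  method = inverse ? :head_inverse : :head
  sout = Demo.public_send(method, str, num)
  out.puts sout unless sout.empty?   # puts appends a newline only if one is missing
  sout
end

buffer = StringIO.new
emit("a\nb\nc", 2, out: buffer)                 # writes "a\nb\n"
emit("a\nb\nc", 5, inverse: true, out: buffer)  # empty result, writes nothing
print buffer.string
```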
 
data/bin/tail.rb CHANGED
@@ -13,10 +13,13 @@ __EOF__
  OPTS = {
    num: PlainText::DEF_HEADTAIL_N_LINES,
    unit: :line,
+   ignore_case: false,
    inclusive: true,
-   inverse: false, # unique option
+   inverse: false, # Option --reverse
+   multi_line: false,
+   padding: 0,
    # :chatter => 3, # Default
-   debug: false,
+   # debug: false,
  }

  # Function to handle the command-line arguments.
@@ -31,14 +34,21 @@ def handle_argv
    opt.on('-n NUM', '--line=NUM', sprintf("Number of lines (Def: %d).", PlainText::DEF_HEADTAIL_N_LINES), Integer) { |v| OPTS[:num]=v }
    opt.on('-c NUM', '--byte=NUM', sprintf("Number of bytes, instead of lines."), Integer) { |v| OPTS[:unit] = :byte; OPTS[:num]=v }
    opt.on( '--char=NUM', sprintf("Number of characters, instead of lines."), Integer) { |v| OPTS[:unit] = :char; OPTS[:num]=v }
-   opt.on('-e REGEXP', '--regexp=REGEXP', sprintf("Regexp for the boundary, instead of a number.", (!OPTS[:num]).inspect)) {|v| OPTS[:num] = Regexp.new v}
+   opt.on('-e REGEXP', '--regexp=REGEXP', sprintf("Regexp for the boundary, instead of a number.", (!OPTS[:num]).inspect)) {|v| OPTS[:num] = v}
+   opt.on('-i', '--[no-]ignore-case', sprintf("Ignore case distinctions in Regexp (Def: %s)", (!OPTS[:ignore_case]).inspect), TrueClass) {|v| OPTS[:ignore_case] = v}
+   opt.on('-m', '--[no-]multi-line', sprintf("Multi-line match (option m) in Regexp (Def: %s)", (!OPTS[:multi_line]).inspect), TrueClass) {|v| OPTS[:multi_line] = v}
    opt.on('-x', '--[no-]exclusive', sprintf("The line that matches is excluded? (Def: %s)", (!OPTS[:inclusive]).inspect), FalseClass) {|v| OPTS[:inclusive] = v}
-   opt.on('-i', '--[no-]inverse', sprintf("Inverse the result (print after NUM-th line) (Def: %s)", (!OPTS[:inverse]).inspect), TrueClass) {|v| OPTS[:inverse] = v}
+   opt.on('-p NUM', '--padding=NUM', sprintf("The number of lines included as 'padding' below the matched line (Def: %s)", (!OPTS[:padding]).inspect), Integer) {|v| OPTS[:padding] = v}
+   opt.on('-p NUM', '--padding=NUM', sprintf("The number of lines included as 'padding' below the matched line (Def: %s)", (!OPTS[:padding]).inspect), Integer) {|v| OPTS[:padding] = v}
+   opt.on('-r', '--[no-]reverse', sprintf("Reverse the behaviour (run AFTER - (inc|ex)clusive and padding) (Def: %s)", (!OPTS[:inverse]).inspect), TrueClass) {|v| OPTS[:inverse] = v} # WARNING-NOTE: the Hash keyword is "inverse" as opposed to "reverse"
    # opt.on( '--version', "Display the version and exits.", TrueClass) {|v| OPTS[:version] = v} # Consider opts.on_tail
    # opt.on( '--[no-]debug', "Debug (Def: false)", TrueClass) {|v| OPTS[:debug] = v}
    opt.separator "" # Way to control a help message.
    opt.separator "Note:"
-   opt.separator " UNIX command of 'tail -n +5' is equivalent to 'head.rb -i -n 5'"
+   opt.separator " UNIX command of 'tail -n +5' is equivalent to 'head.rb --reverse -n 5'"
+   opt.separator " Option -m means '.' includes a newline. '\\s' includes it regardless."
+   opt.separator " 'Padding' (-p) is calculated after Option -x is considered."
+   opt.separator " Negative 'Padding' like '--padding=-3' reduces the number of lines by 3."

    begin
      opt.parse!(ARGV)
@@ -48,6 +58,12 @@ def handle_argv
      exit 1
    end

+   if OPTS[:num].respond_to? :to_str
+     # Regexp specified with --regexp=REGEXP
+     cond = (0 | (OPTS[:ignore_case] ? Regexp::IGNORECASE : 0) | (OPTS[:multi_line] ? Regexp::MULTILINE : 0))
+     OPTS[:num] = Regexp.new OPTS[:num], cond
+   end
+
    OPTS
  end

@@ -67,19 +83,19 @@ end
  opts = handle_argv()
  num_in = opts[:num]
  is_inverse = opts[:inverse]
+ # $DEBUG = true if opts[:debug] # Better specify by running this script with ruby --debug

- %i(num inverse debug).each do |ek|
+ %i(num ignore_case inverse multi_line debug).each do |ek|
    opts.delete ek if opts.has_key? ek
  end

  str = ARGF.read

- # A linebreak guaranteed at the end.
- if is_inverse
-   puts PlainText.tail_inverse(str, num_in, **opts)
- else
-   puts PlainText.tail(str, num_in, **opts)
- end
+ method = (is_inverse ? :tail_inverse : :tail)
+ sout = PlainText.public_send(method, str, num_in, **opts)
+
+ # A linebreak guaranteed at the end, unless it is empty.
+ puts sout if !sout.empty?

  exit

data/bin/yard2md_afterclean ADDED
@@ -0,0 +1,213 @@
+ #!/usr/bin/env ruby
+ # -*- coding: utf-8 -*-
+
+ require 'optparse'
+ require 'open3'
+ require 'plain_text'
+
+ BANNER = <<"__EOF__"
+ USAGE: #{File.basename($0)} [options] [INFILE.txt] < STDIN
+ Clean the partially ill-formated (Github) Markdown converted from yard-Rdoc.
+ __EOF__
+
+ # Initialising the hash for the command-line options.
+ OPTS = {
+   lang: 'ruby',
+   # :chatter => 3, # Default
+   debug: false,
+ }
+
+ # Function to handle the command-line arguments.
+ #
+ # ARGV will be modified, and the constant variable OPTS is set.
+ #
+ # @return [Hash] Optional-argument hash.
+ #
+ def handle_argv
+   opt = OptionParser.new(BANNER)
+   opt.on( '--lang=LANGUAGE', sprintf("Programming Language like ruby (Def: %s).", OPTS[:lang])) { |v| OPTS[:lang]=v.strip }
+   # opt.on( '--version', "Display the version and exits.", TrueClass) {|v| OPTS[:version] = v} # Consider opts.on_tail
+   opt.on( '--[no-]debug', "Debug (Def: false)", TrueClass) {|v| OPTS[:debug] = v}
+   # opt.separator "" # Way to control a help message.
+   # opt.separator "Note:"
+   # opt.separator " Spaces are truncated in default."
+
+   opt.parse!(ARGV)
+
+   OPTS
+ end
+
+ def fix_string_based(str)
+   fix_def_list(
+     fix_inline_link(
+       fix_inline_code(str)
+     )
+   )
+ end
+
+ # Removes some markdown formatting (for definition list etc)
+ def remove_mdfmt(str)
+   str.gsub(/`([^`\n]+)`/, '<tt>\1</tt>').gsub(/\*+([^*\n]+)\*+/, '<strong>\1</strong>').gsub(/\&/, '&amp;').gsub(/</, '&lt;').gsub(/>/, '&gt;').gsub(/"/, '&quot;')
+ end
+
+ # Removes some markdown formatting (for definition list etc)
+ def remove_mdfmt_raw(str)
+   str.gsub(/`([^`\n]+)`/, '\1').gsub(/\*+([^*\n]+)\*+/, '\1').gsub(/\&/, '&amp;').gsub(/</, '&lt;').gsub(/>/, '&gt;').gsub(/"/, '&quot;')
+ end
+
+
+ # returns the string where the definition list is rewritten for github
+ #
+ # Similar to {#fix_inline_code} but for def list
+ #
+ # @param str [String]
+ # @return [String]
+ def fix_def_list(str)
+   str.gsub(/^(\S+[^\n]*)\n:((?:\s+[^\n]+(?:\n|\z))+)/m){
+     sdt, sdd = $1, $2
+     "<dt>%s</dt>\n<dd>%s</dd>\n"%[remove_mdfmt_raw(sdt), remove_mdfmt(sdd.chop)]
+   }.gsub(/(\s+\n|\A)(<dt>)/m, '\1<dl>'+"\n"+'\2').gsub(%r@(</dd>[[:blank:]]*)(\n(?:\s+|\z))@, '\1'+"\n"+'</dl>\2')
+ end
+
+ # returns the string where inline code are fixed.
+ #
+ # More than 2 words are left like
+ #
+ # +abc def+
+ #
+ # which should be converted into
+ #
+ # `abc def`
+ #
+ # This is assuming the current paragraph is not a code block.
+ # This does not *properly* take into account the escape sequence.
+ # For example, '+a\+ b+' is not properly taken into account
+ # (though RDoc may not do, either)!
+ #
+ # Note if words between '+' straddle over more than 2 lines, something may be wrong,
+ # and hence they are ignored.
+ #
+ # @param str [String]
+ # @return [String]
+ def fix_inline_code(str)
+   str.gsub(/(?<!\\)((?:\\\\)*)\+([^+\n]+)(\n[^+\n]+)?(?<!\\)(\\\\)*\+/m){
+     ($1 ? $1 : "")+'`'+$2+($3 ? ' '+$3[1..-1] : '')+'`'+($4 ? $4 : "")
+   }
+ end
+
+ # returns the string where multi-line links are fixed.
+ #
+ # Similar to {#fix_inline_code} but for links
+ #
+ # @param str [String]
+ # @return [String]
+ def fix_inline_link(str)
+   str.gsub(%r@(?<!\\)((?:\\\\)*)\[([^\]\n]+)(\n[^\]\n]+)?(?<!\\)(\\\\)*\](\(https?://[^)]+\))@m){
+     ($1 ? $1 : "")+'['+$2+($3 ? ' '+$3[1..-1] : '')+']'+($4 ? $4 : "")+$5.gsub(/\s*\n+\s*/m, '')
+   }
+ end
+
+ # Indent of the current line
+ #
+ # @param str [String]
+ # @param lb [String] Linebreak: default $/
+ # @return [Integer]
+ def indent_line(str)
+   /\A(\s*)/ =~ str
+   $1.size
+ end
+
+ # Returns the minimum indent of the input String, excluding blank lines.
+ #
+ # @param str [String]
+ # @param lb [String] Linebreak: default $/ (ignored so far)
+ # @return [Integer]
+ def min_indent(str, lb=$/)
+   return 0 if str.empty?
+   lines = PlainText::Part.parse(str).parts.join("\n").split("\n")
+   lines.map{|ec| indent_line(ec)}.min
+ end
+
+ # True if it looks like Markdown code block.
+ #
+ # Neither Github-style "```ruby" nor pandoc-style "~~~~{#mycode...}" is
+ # assumed not to be used.
+ # This is not accurate and can be cheated if it is already indented as list.
+ #
+ # @param str [String]
+ # @param indent [Integer] Base indent. If it is 0, 4 or more indents are the conditions.
+ def md_code_block?(str, indent=0, *rest)
+   return nil if str.empty?
+   (min_indent(str, *rest) - indent) >= 4
+ end
+
+ # Returns the last indent of the paragraph if it ends with a list.
+ #
+ # @param str [String]
+ # @param indent_prev [Integer] The minimum indent for an item to keep being in the list in the previous paragraph.
+ # @param lb [String] Linebreak: default $/
+ # @return [Integer]
+ def last_indent(str, indent_prev=0, lb=$/)
+   return indent_prev if !str || str.empty?
+   lines = PlainText::Part.parse(str).parts.join("\n").split("\n")
+   # Note: numsps = 2 # "2." takes up 2 spaces, whereas "12." takes 3.
+   lines.each do |ec|
+     cind = indent_line(ec)
+     if cind - indent_prev >= 4 # Code block! ##### Maybe deals with it in future!!
+       # This means it is indented more than 5 spaces from the previous.
+     elsif /^(\s*)(?:(\*\s)|(\d+\.(?:\s|$)))/ =~ ec
+       # Reset the indent
+       ind_now = $1.size + ($2 || $3).size + 1 # maybe +2 (for Rdoc2md?)
+       indent_prev = ind_now # Deeper or shallower or same-level list.
+       # numsps = $3.size + 1 if $3 && !$3.empty?
+     elsif cind < indent_prev - 1 # 1 is a margin...
+       # Breaks out from the previous list.
+       indent_prev = cind
+     end
+   end
+   indent_prev
+ end
+
+ ################################################
+ # MAIN
+ ################################################
+
+ $stdout.sync=true
+ $stderr.sync=true
+
+ #class String
+ # include PlainText
+ #end
+
+ # Handle the command-line options => OPTS
+ opts = handle_argv()
+
+ strin = ARGF.read
+ ## split to paras, fixing inline code blocks
+ mdpart = PlainText::Part.parse(strin)
+
+ indent_prev = last_indent(mdpart[0])
+ mdpart.merge_para_if{ |pbp, _, _|
+   prev_cb = md_code_block?(pbp[0], indent_prev)
+   next_cb = md_code_block?(pbp[2], indent_prev)
+   next true if prev_cb && next_cb
+   indent_prev = last_indent(pbp[2], indent_prev)
+   false
+ }
+
+ indent_next = 0
+ mdpart = mdpart.map_part{|ec|
+   indent_prev = indent_next
+   indent_next = last_indent(ec, indent_prev)
+   next fix_string_based(ec) if !md_code_block?(ec, indent_prev)
+   inde = " "*indent_prev
+   st = ec.gsub(/^ /, '')
+   "%s```%s\n%s\n%s```"%[inde, opts[:lang], st, inde, opts[:lang]]
+ }
+
+ puts mdpart.join('')
+
+ exit
+
+ __END__
+
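
Two of the clean-up steps in this script, turning RDoc-style `+code+` into backticks and wrapping an indented paragraph in a fenced block, can be illustrated without the `plain_text` library. The sketch below is deliberately simplified and is not the script's implementation: `plus_to_backticks` and `fence_code_block` are hypothetical names, escaped `+` characters and multi-line spans are ignored, and a fixed 4-space indent is assumed.

```ruby
# Simplified illustration (NOT the script's regexp): converts +words+ on a
# single line into `words`. Escapes such as '\+' are not handled, so plain
# arithmetic like "1 + 2 + 3" would also be rewritten.
def plus_to_backticks(str)
  str.gsub(/\+([^+\n]+)\+/) { "`#{$1}`" }
end

# Wraps a paragraph in a fenced block when every non-blank line is indented
# by at least 4 spaces (an assumed threshold); otherwise returns it unchanged.
def fence_code_block(para, lang: 'ruby')
  lines = para.lines.map(&:chomp)
  body  = lines.reject { |l| l.strip.empty? }
  return para if body.empty? || !body.all? { |l| l.start_with?(' ' * 4) }

  fence = '`' * 3
  stripped = lines.map { |l| l.sub(/\A {4}/, '') }
  "#{fence}#{lang}\n#{stripped.join("\n")}\n#{fence}"
end

puts plus_to_backticks("Use +abc def+ here")
puts fence_code_block("    puts 1\n    puts 2\n")
puts fence_code_block("plain text\n")
```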
@@ -89,6 +89,8 @@ module PlainText
    #
    class ParseRule

+     include PlainText::Util
+
      # Main Array of rules (Proc or Regexp). Do not delete or add the contents, as it would have a knock-on effect, especially with {#names}!
      # Use {#rule_at} to get a rule for the index/key.
      # The private method {#rule_at}(-1) is the same as {#rules}[-1],
@@ -283,7 +285,8 @@ module PlainText
      # @param index_rules [Integer] Index for {#rules}. A negative index is allowed.
      # @return [Integer] Non-negative index where name is set; i.e., if index=-1 is specified for {#rules} with a size of 3, the returned value is 2 (the last index of it).
      def set_name_at(name, index_rules)
-       index = PlainText::Util.positive_array_index_checked(index_rules, @rules, accept_too_big: false, varname: 'rules')
+       index = positive_array_index_checked(index_rules, @rules, accept_too_big: false, varname: 'rules')
+       # index = PlainText::Util.positive_array_index_checked(index_rules, @rules, accept_too_big: false, varname: 'rules')
        if !name
          @names[index] = nil
          return index
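
The change above follows from the ChangeLog note that the utility methods are now private: a class mixes the module in and calls the helper without an explicit receiver. A minimal sketch of that pattern is given below; the `Util` and `Rules` names and the `positive_index` helper are hypothetical and far simpler than `positive_array_index_checked`.

```ruby
# Hypothetical utility module whose methods are private instance methods.
module Util
  private

  # Converts a possibly negative index into a non-negative one.
  # Raises IndexError if the index is out of range.
  def positive_index(index, ary)
    pos = index < 0 ? ary.size + index : index
    raise IndexError, "index #{index} out of range" if pos < 0 || pos >= ary.size
    pos
  end
end

class Rules
  include Util   # makes positive_index callable inside instances only

  def initialize(rules)
    @rules = rules
  end

  def locate(index)
    positive_index(index, @rules)   # no receiver, as required for a private method
  end
end

rules = Rules.new(%w[a b c])
p rules.locate(-1)
p rules.respond_to?(:positive_index)   # private methods are hidden from callers
```

With this layout neither `Util.positive_index(...)` nor `rules.positive_index(...)` works from outside, which is why the call in the hunk had to drop the module prefix.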