reverse_asciidoctor 0.2.0 → 0.2.1
- checksums.yaml +4 -4
- data/Gemfile +2 -0
- data/README.adoc +68 -55
- data/bin/reverse_asciidoctor +1 -0
- data/bin/w2a +11 -2
- data/lib/reverse_asciidoctor/cleaner.rb +5 -0
- data/lib/reverse_asciidoctor/config.rb +6 -1
- data/lib/reverse_asciidoctor/converters/math.rb +8 -2
- data/lib/reverse_asciidoctor/version.rb +1 -1
- data/reverse_asciidoctor.gemspec +1 -0
- metadata +17 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 4401fdaa4562099df961e17489d733cfed7aa2b2
+data.tar.gz: 0dd3c32893e3f3abe2b8a90ee0812966de9b3cbf
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: d4e5b592c0a68ac39bf0846f43b8955720ad5457219942f40afe97cc74568d37f8a01072b2d52660923292ea516aa967b42bb2d649586548b1d9c95c74281422
+data.tar.gz: ca556358cd939540a1c1e745eb2438b78038f4da121a05cce0da35e43ff5f0b8f1e3d44f7cf99eba8448874b65a64335427690bd4f606fe0f04a4feecdd52e8b
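The digests recorded in checksums.yaml can be compared against a downloaded archive member. The following is a minimal sketch, assuming `metadata.gz` or `data.tar.gz` has already been extracted from the `.gem` file; the helper name `digests_match?` and the sample bytes are illustrative and not part of the package.

```ruby
require 'digest/sha1'
require 'digest/sha2'

# Hypothetical helper: compare raw bytes against the SHA1 and SHA512
# hex digests recorded for one entry of checksums.yaml.
def digests_match?(bytes, sha1:, sha512:)
  Digest::SHA1.hexdigest(bytes) == sha1 &&
    Digest::SHA512.hexdigest(bytes) == sha512
end

# Stand-in data so the sketch runs without the real archive.
sample = 'not the real data.tar.gz'
expected = {
  sha1: Digest::SHA1.hexdigest(sample),
  sha512: Digest::SHA512.hexdigest(sample)
}

puts digests_match?(sample, **expected)       # bytes checked against their own digests
puts digests_match?(sample + 'x', **expected) # bytes changed after the digests were taken
```

For a real check, the bytes would come from something like `File.binread('data.tar.gz')` and the expected values from the SHA1 and SHA512 entries listed above.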
data/Gemfile
CHANGED
data/README.adoc
CHANGED
@@ -25,12 +25,66 @@ or add it to your Gemfile
 gem 'reverse_asciidoctor'
 ----
 
+== Usage
+
+=== Ruby
+
+You can convert html content as string or Nokogiri document:
+
+[source,ruby]
+----
+input = '<strong>feelings</strong>'
+result = ReverseAsciidoctor.convert input
+result.inspect # " *feelings* "
+----
+
+=== Commandline
+
+It's also possible to convert html files to markdown using the binary:
+
+[source,console]
+----
+$ bin/reverse_asciidoctor file.html > file.adoc
+$ cat file.html | bin/reverse_asciidoctor > file.adoc
+----
+
+In addition, the `bin/w2a` script (adapted from the `bin/w2m` script in
+https://github.com/benbalter/word-to-markdown[Ben Balter's word-to-markdown])
+extracts HTML from Word docx documents, and converts it to Asciidoc.
+
+[source,console]
+----
+$ bundle exec bin/w2a document.docx > document.adoc
+----
+
+The script presumes that LibreOffice has already been installed: it uses LibreOffice's
+export to XHTML. LibreOffice's export of XHTML is superior to the native Microsoft Word export
+to HTML: it exports lists (which Word keeps as paragraphs), and it exports OOMML into MathML.
+On the other hand, the LibreOffice export relies on default styling being used in the
+document, and it may not cope with ordered lists or headings with customised appearance.
+For best results, reset the styles in the document you're converting to those in
+the default Normal.dot template.
+
+If you wish to convert the MathML in the document to AsciiMath, run the script with the
+`--mathml2asciimath` option:
+
+[source,console]
+----
+$ bundle exec bin/w2a --mathml2asciimath document.docx > document.adoc
+----
+
+Note that some information in OOMML is not preserved in the export to MathML from LibreOffice;
+in particular, font shifts such as double-struck fonts.
+
+Note that the LibreOffice exporter does seem to drop some text (possibly associated with
+MathML); use with caution.
+
 == Features
 
 As a port of reverse_markdown, reverse_asciidoctor shares its features:
 
 * Module based - if you miss a tag, just add it
-* Can deal with nested lists
+* Can deal with nested lists
 * Inline and block code is supported
 * Supports blockquote
|
@@ -43,7 +97,7 @@ It supports the following html tags supported by reverse_markdown:
 * `div`, `article`
 * `em`, `i` (added: `cite`)
 * `h1`, `h2`, `h3`, `h4`, `h5`, `h6`, `hr`
-* `img`
+* `img`
 * `li`, `ol`, `ul` (added: `dir`)
 * `p`, `pre`
 * `strong`, `b`
@@ -60,7 +114,7 @@ In addition, it supports:
 
 * `aside`
 * `audio`, `video` (with `@src` attributes)
-* `figure`, `figcaption`
+* `figure`, `figcaption`
 * `mark`
 * `q`
 * `sub`, `sup`
@@ -77,19 +131,15 @@ It also supports MathML... sort of.
 
 * Asciidoctor supports AsciiMath and LaTeX for stem expressions. HTML uses MathML.
 The gem will recognise MathML expressions in HTML, and will wrap them in Asciidoctor
-`stem:[ ]` macros. The result of this gem is not actually legal Asciidoctor for stem:
+`stem:[ ]` macros. The result of this gem is not actually legal Asciidoctor for stem:
 Asciidoctor will presumably
 think this is AsciiMath in the `stem:[ ]` macro, try to pass it into MathJax as
 AsciiMath, and fail. But of course, MathJax has no problem with MathML, and some postprocessing
 on the Asciidoctor output can ensure that the MathML is treated by MathJax (or whatever else
 uses the output) as such; so this is still much better than nothing for stem processing.
-
-
-
-when MathJax is entirely happy with MathML anyway.
-** https://github.com/transpect/mml2tex looks rather more robust, and is also used
-to export Word documents and their OOMML to LaTeX via MathML. But we'd still rather
-keep the MathML in place.
+** The gem will optionally invoke the https://github.com/riboseinc/mathml2asciimath
+gem, to convert MathML to AsciiMath. The conversion is not perfect, and will need to be
+post-edited; but it's a lot better than nothing.
 
 The gem does not support:
 
@@ -105,47 +155,8 @@ The gem does not support:
 * `del`, `ins`
 * `footer`, `header`, `main`, `nav`, `details`, `section`, `summary`, `template`
 
-== Usage
-
-=== Ruby
-
-You can convert html content as string or Nokogiri document:
-
-[source,ruby]
------
-input = '<strong>feelings</strong>'
-result = ReverseAsciidoctor.convert input
-result.inspect # " *feelings* "
------
-
-=== Commandline
-
-It's also possible to convert html files to markdown using the binary:
-
-[source,console]
------
-$ bin/reverse_asciidoctor file.html > file.adoc
-$ cat file.html | bin/reverse_asciidoctor > file.adoc
------
-
-In addition, the `bin/w2a` script (adapted from the `bin/w2m` script in
-https://github.com/benbalter/word-to-markdown[Ben Balter's word-to-markdown])
-extracts HTML from Word docx documents, and converts it to Asciidoc.
-
-[source,console]
------
-$ bundle exec bin/w2a document.docx > document.adoc
------
-
-The script presumes that LibreOffice has already been installed: it uses LibreOffice's
-export to XHTML. LibreOffice's export of XHTML is superior to the native Microsoft Word export
-to HTML: it exports lists (which Word keeps as paragraphs), and it exports OOMML into MathML.
-On the other hand, the LibreOffice export relies on default styling being used in the
-document, and it may not cope with ordered lists or headings with customised appearance.
-For best results, reset the styles in the document you're converting to those in
-the default Normal.dot template.
 
-
+== Configuration
 
 The following options are available:
 
@@ -157,17 +168,19 @@ The following options are available:
 * `tag_border` (default `' '`) - how to handle tag borders. valid options are:
 ** `' '` - Add whitespace if there is none at tag borders.
 ** `''` - Do not not add whitespace.
+* `mathml2asciimath` - if `true`, will use the https://github.com/riboseinc/mathml2asciimath gem
+to convert MathML to AsciiMath
 
-
+=== As options
 
 Just pass your chosen configuration options in after the input. The given options will last for this operation only.
 
 [source,ruby]
 ----
-ReverseAsciidoctor.convert(input, unknown_tags: :raise)
+ReverseAsciidoctor.convert(input, unknown_tags: :raise, mathml2asciimath: true)
 ----
 
-
+=== Preconfigure
 
 Or configure it block style on a initializer level. These configurations will last for all conversions until they are set to something different.
 
@@ -175,7 +188,7 @@ Or configure it block style on a initializer level. These configurations will la
 ----
 ReverseAsciidoctor.config do |config|
 config.unknown_tags = :bypass
-config.
+config.mathml2asciimath = true
 config.tag_border = ''
 end
 ----
data/bin/reverse_asciidoctor
CHANGED
@@ -9,6 +9,7 @@ OptionParser.new do |opts|
 opts.banner = "Usage: reverse_asciidoctor [options] <file>"
 
 opts.on('-u', '--unknown_tags [pass_through, drop, bypass, raise]', 'Unknown tag handling (default: pass_through)') { |v| ReverseMarkdown.config.unknown_tags = v }
+opts.on('-a', '--mathml2asciimath', 'Convert MathML to AsciiMath') { |v| ReverseMarkdown.config.mathml2asciimath = true }
 end.parse!
 
 puts ReverseAsciidoctor.convert(ARGF.read)
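The added `-a` switch is a plain boolean flag handled by OptionParser. Below is a minimal, self-contained sketch of that pattern. `SketchConfig` and `parse_cli` are hypothetical stand-ins rather than the gem's API, and `--unknown_tags` is given a mandatory value here for simplicity. Note that the script in the diff writes both options to `ReverseMarkdown.config` while converting through `ReverseAsciidoctor.convert`; whether those two refer to the same configuration object is not visible in this diff.

```ruby
require 'optparse'

# Hypothetical stand-in for the gem's configuration object.
SketchConfig = Struct.new(:unknown_tags, :mathml2asciimath)

# Parse an argument list and return the resulting config together with
# the remaining (non-option) arguments, such as the input file name.
def parse_cli(argv)
  config = SketchConfig.new(:pass_through, false)
  parser = OptionParser.new do |opts|
    opts.banner = 'Usage: reverse_asciidoctor [options] <file>'
    opts.on('-u', '--unknown_tags MODE', 'Unknown tag handling') do |mode|
      config.unknown_tags = mode.to_sym
    end
    opts.on('-a', '--mathml2asciimath', 'Convert MathML to AsciiMath') do
      config.mathml2asciimath = true
    end
  end
  rest = parser.parse(argv) # non-destructive: argv itself is left untouched
  [config, rest]
end

config, files = parse_cli(%w[-u raise -a file.html])
puts config.inspect
puts files.inspect
```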
data/bin/w2a
CHANGED
@@ -12,20 +12,29 @@ def scrub_whitespace(string)
 string.gsub!(/([ ]+)$/, '') # line trailing whitespace
 string.gsub!(/\n\n\n\n/, "\n\n") # Quadruple line breaks
 string.delete!(' ') # Unicode non-breaking spaces, injected as tabs
+# following added by me
+string.gsub!(%r{<h[1-9][^>]*></h1>}, " ") # I don't know why Libre Office is inserting them, but they need to go
+string.gsub!(%r{<h1[^>]* style="vertical-align: super;[^>]*>([^<]+)</h1>},
+"<sup>\\1</sup>") # I absolutely don't know why Libre Office is rendering superscripts as h1
 string
 end
 
-if ARGV.size != 1 || ARGV[0] == '--help'
+if ARGV.size != 1 && ARGV[0] != "--mathml2asciimath" || ARGV[0] == '--help'
 puts 'Usage: bundle exec w2m path/to/document.docx'
 exit 1
 end
 
+if ARGV[0] == "--mathml2asciimath" && ARGV[1]
+ReverseAsciidoctor.config.mathml2asciimath = true
+ARGV[0] = ARGV[1]
+end
+
 if ARGV[0] == '--version'
 puts "WordToMarkdown v#{WordToMarkdown::VERSION}"
 puts "LibreOffice v#{WordToMarkdown.soffice.version}" unless Gem.win_platform?
 else
 doc = WordToMarkdown.new ARGV[0]
-# puts doc.
+# puts scrub_whitespace(doc.document.html)
 puts ReverseAsciidoctor.convert(scrub_whitespace(doc.document.html), WordToMarkdown::REVERSE_MARKDOWN_OPTIONS)
 end
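The two substitutions added to `scrub_whitespace` can be tried in isolation. The sketch below applies the same patterns without mutating its argument. The function name `scrub_libreoffice_headings` and the sample markup are assumptions made for illustration and are not taken from real LibreOffice output.

```ruby
# Hypothetical, non-mutating variant of the two gsub! calls added in bin/w2a.
def scrub_libreoffice_headings(html)
  # Drop empty headings. As written in the diff, the closing tag is
  # always </h1>, so an empty <h2></h2> is left alone.
  html = html.gsub(%r{<h[1-9][^>]*></h1>}, ' ')
  # Turn h1 elements styled as superscript into <sup> elements.
  html.gsub(%r{<h1[^>]* style="vertical-align: super;[^>]*>([^<]+)</h1>},
            '<sup>\1</sup>')
end

# Invented sample resembling the markup the comments in the diff describe.
sample = 'x<h1 class="western" style="vertical-align: super; font-size: 8pt">2</h1>' \
         '<h1 class="western"></h1>'

puts scrub_libreoffice_headings(sample)
```

The superscript pattern requires a `style` attribute that begins with `vertical-align: super;` and follows at least one other attribute or space, so headings styled differently pass through unchanged.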
|
data/lib/reverse_asciidoctor/cleaner.rb
CHANGED
@@ -18,6 +18,11 @@ module ReverseAsciidoctor
 end
 
 def remove_inner_whitespaces(string)
+unless string.nil?
+string.gsub!(/\n stem:\[/, "\nstem:[")
+string.gsub!(/(stem:\[([^\]]|\\\])*\])\n(?=\S)/, "\\1 ")
+string.gsub!(/(stem:\[([^\]]|\\\])*\])\s+(?=[\^-])/, "\\1")
+end
 string.each_line.inject("") do |memo, line|
 memo + preserve_border_whitespaces(line) do
 line.strip.gsub(/[ \t]{2,}/, ' ')
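The three substitutions added to `remove_inner_whitespaces` tidy whitespace around `stem:[...]` macros. A minimal sketch of their combined effect, using the same patterns in a non-mutating helper; the name `tidy_stem_whitespace` and the sample strings are illustrative assumptions.

```ruby
# Hypothetical, non-mutating variant of the three gsub! calls in the diff.
def tidy_stem_whitespace(text)
  text
    # 1. Remove the single space before a stem macro at the start of a line.
    .gsub(/\n stem:\[/, "\nstem:[")
    # 2. Replace a line break after a stem macro with a space when
    #    non-blank text follows on the next line.
    .gsub(/(stem:\[([^\]]|\\\])*\])\n(?=\S)/, '\1 ')
    # 3. Remove whitespace between a stem macro and a following ^ or -.
    .gsub(/(stem:\[([^\]]|\\\])*\])\s+(?=[\^-])/, '\1')
end

puts tidy_stem_whitespace("A\n stem:[x]\nB").inspect
puts tidy_stem_whitespace('stem:[x] ^2').inspect
```

The `([^\]]|\\\])*` group lets the macro body contain escaped closing brackets (`\]`), which matters because the Math converter in this release escapes brackets inside the macro.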
data/lib/reverse_asciidoctor/config.rb
CHANGED
@@ -1,9 +1,10 @@
 module ReverseAsciidoctor
 class Config
-attr_accessor :unknown_tags, :tag_border
+attr_accessor :unknown_tags, :tag_border, :mathml2asciimath
 
 def initialize
 @unknown_tags = :pass_through
+@mathml2asciimath = false
 @em_delimiter = '_'.freeze
 @strong_delimiter = '*'.freeze
 @inline_options = {}
@@ -21,6 +22,10 @@ module ReverseAsciidoctor
 @inline_options[:unknown_tags] || @unknown_tags
 end
 
+def mathml2asciimath
+@inline_options[:mathml2asciimath] || @mathml2asciimath
+end
+
 def tag_border
 @inline_options[:tag_border] || @tag_border
 end
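The new reader falls back from per-call options to the preconfigured value with `||`. The sketch below isolates that lookup. `SketchConfig` is a stand-in, and its `with` method is an assumed way of scoping per-call options that does not appear in this diff. One consequence of `||` is that a per-call `false` cannot switch off a preconfigured `true`.

```ruby
# Hypothetical stand-in reproducing only the lookup added in config.rb.
class SketchConfig
  attr_writer :mathml2asciimath

  def initialize
    @mathml2asciimath = false
    @inline_options = {}
  end

  # Assumed helper: apply per-call options for the duration of a block.
  def with(options = {})
    @inline_options = options
    yield
  ensure
    @inline_options = {}
  end

  # Same expression as the reader added in the diff.
  def mathml2asciimath
    @inline_options[:mathml2asciimath] || @mathml2asciimath
  end
end

config = SketchConfig.new
inline_on = config.with(mathml2asciimath: true) { config.mathml2asciimath }
after_call = config.mathml2asciimath

config.mathml2asciimath = true
inline_off = config.with(mathml2asciimath: false) { config.mathml2asciimath }

puts [inline_on, after_call, inline_off].inspect
```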
data/lib/reverse_asciidoctor/converters/math.rb
CHANGED
@@ -1,11 +1,17 @@
-#
+# Unless run with ReverseAsciidoctor.config.mathml2asciimath,
+# this is cheating: we're injecting MathML into Asciidoctor, but
 # Asciidoctor only understands AsciiMath or LaTeX
 
+require "mathml2asciimath"
+
 module ReverseAsciidoctor
 module Converters
 class Math < Base
 def convert(node, state = {})
-
+stem = node.to_s.gsub(/\n/, " ")
+stem = MathML2AsciiMath.m2a(stem) if ReverseAsciidoctor.config.mathml2asciimath
+stem = stem.gsub(/\[/, "\\[").gsub(/\]/, "\\]").gsub(/\(\(([^\)]+)\)\)/, "(\\1)") unless stem.nil?
+" stem:[" << stem << "] "
 end
 end
 
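The post-processing in the new `convert` body can be exercised without Nokogiri or the mathml2asciimath gem. In this sketch, `wrap_stem` is a hypothetical helper and the optional `converter:` lambda merely stands in for `MathML2AsciiMath.m2a`; unlike the code in the diff, it coerces a `nil` converter result to an empty string instead of failing at the concatenation.

```ruby
# Hypothetical helper mirroring the string handling in Math#convert.
def wrap_stem(markup, converter: nil)
  stem = markup.gsub(/\n/, ' ')                  # flatten line breaks
  stem = converter.call(stem).to_s if converter  # optional MathML -> AsciiMath step
  stem = stem.gsub(/\[/, '\\[')                  # escape [ so it survives inside the macro
             .gsub(/\]/, '\\]')                  # escape ] so it does not end the macro early
             .gsub(/\(\(([^\)]+)\)\)/, '(\1)')   # collapse doubled parentheses
  ' stem:[' + stem + '] '
end

puts wrap_stem("x[1]\n+((a+b))").inspect
puts wrap_stem('<math>x</math>', converter: ->(_mathml) { 'x' }).inspect
```

Escaping the brackets is what makes the `\]` alternative in the cleaner's `stem:[...]` patterns necessary.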
|
data/reverse_asciidoctor.gemspec
CHANGED
@@ -19,6 +19,7 @@ Gem::Specification.new do |s|
 
 # specify any dependencies here; for example:
 s.add_dependency 'nokogiri'
+s.add_dependency 'mathml2asciimath'
 s.add_development_dependency 'rspec'
 s.add_development_dependency 'simplecov'
 s.add_development_dependency 'rake'
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: reverse_asciidoctor
 version: !ruby/object:Gem::Version
-version: 0.2.
+version: 0.2.1
 platform: ruby
 authors:
 - Ribose Inc.
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2018-
+date: 2018-05-03 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: nokogiri
@@ -24,6 +24,20 @@ dependencies:
 - - ">="
 - !ruby/object:Gem::Version
 version: '0'
+- !ruby/object:Gem::Dependency
+name: mathml2asciimath
+requirement: !ruby/object:Gem::Requirement
+requirements:
+- - ">="
+- !ruby/object:Gem::Version
+version: '0'
+type: :runtime
+prerelease: false
+version_requirements: !ruby/object:Gem::Requirement
+requirements:
+- - ">="
+- !ruby/object:Gem::Version
+version: '0'
 - !ruby/object:Gem::Dependency
 name: rspec
 requirement: !ruby/object:Gem::Requirement
@@ -229,7 +243,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version: 2.6.
+rubygems_version: 2.6.12
 signing_key:
 specification_version: 4
 summary: Convert html code into asciidoctor.