reverse_asciidoctor 0.2.0 → 0.2.1
- checksums.yaml +4 -4
- data/Gemfile +2 -0
- data/README.adoc +68 -55
- data/bin/reverse_asciidoctor +1 -0
- data/bin/w2a +11 -2
- data/lib/reverse_asciidoctor/cleaner.rb +5 -0
- data/lib/reverse_asciidoctor/config.rb +6 -1
- data/lib/reverse_asciidoctor/converters/math.rb +8 -2
- data/lib/reverse_asciidoctor/version.rb +1 -1
- data/reverse_asciidoctor.gemspec +1 -0
- metadata +17 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 4401fdaa4562099df961e17489d733cfed7aa2b2
+  data.tar.gz: 0dd3c32893e3f3abe2b8a90ee0812966de9b3cbf
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: d4e5b592c0a68ac39bf0846f43b8955720ad5457219942f40afe97cc74568d37f8a01072b2d52660923292ea516aa967b42bb2d649586548b1d9c95c74281422
+  data.tar.gz: ca556358cd939540a1c1e745eb2438b78038f4da121a05cce0da35e43ff5f0b8f1e3d44f7cf99eba8448874b65a64335427690bd4f606fe0f04a4feecdd52e8b
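The two digest families above cover the `metadata.gz` and `data.tar.gz` archives packed inside the `.gem` file. A minimal sketch of how entries of this shape can be produced with Ruby's standard `digest` library follows; the helper name `checksum_entries` and the byte strings are assumptions for illustration, not the real archives:

```ruby
require "digest"

# Hypothetical helper: builds a checksums.yaml-like hash for named blobs.
# The blob contents used below are stand-ins, not the real gem archives.
def checksum_entries(blobs)
  {
    "SHA1"   => blobs.transform_values { |bytes| Digest::SHA1.hexdigest(bytes) },
    "SHA512" => blobs.transform_values { |bytes| Digest::SHA512.hexdigest(bytes) },
  }
end

entries = checksum_entries(
  "metadata.gz" => "stand-in metadata bytes",
  "data.tar.gz" => "stand-in data bytes"
)

# Print in the same two-level layout as checksums.yaml.
entries.each do |algorithm, digests|
  puts "#{algorithm}:"
  digests.each { |name, hex| puts "  #{name}: #{hex}" }
end
```

Because both archives change whenever any packaged file or the gem metadata changes, all four digests differ between releases.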
data/Gemfile
CHANGED
data/README.adoc
CHANGED
@@ -25,12 +25,66 @@ or add it to your Gemfile
 gem 'reverse_asciidoctor'
 ----

+== Usage
+
+=== Ruby
+
+You can convert html content as string or Nokogiri document:
+
+[source,ruby]
+----
+input = '<strong>feelings</strong>'
+result = ReverseAsciidoctor.convert input
+result.inspect # " *feelings* "
+----
+
+=== Commandline
+
+It's also possible to convert html files to markdown using the binary:
+
+[source,console]
+----
+$ bin/reverse_asciidoctor file.html > file.adoc
+$ cat file.html | bin/reverse_asciidoctor > file.adoc
+----
+
+In addition, the `bin/w2a` script (adapted from the `bin/w2m` script in
+https://github.com/benbalter/word-to-markdown[Ben Balter's word-to-markdown])
+extracts HTML from Word docx documents, and converts it to Asciidoc.
+
+[source,console]
+----
+$ bundle exec bin/w2a document.docx > document.adoc
+----
+
+The script presumes that LibreOffice has already been installed: it uses LibreOffice's
+export to XHTML. LibreOffice's export of XHTML is superior to the native Microsoft Word export
+to HTML: it exports lists (which Word keeps as paragraphs), and it exports OOMML into MathML.
+On the other hand, the LibreOffice export relies on default styling being used in the
+document, and it may not cope with ordered lists or headings with customised appearance.
+For best results, reset the styles in the document you're converting to those in
+the default Normal.dot template.
+
+If you wish to convert the MathML in the document to AsciiMath, run the script with the
+`--mathml2asciimath` option:
+
+[source,console]
+----
+$ bundle exec bin/w2a --mathml2asciimath document.docx > document.adoc
+----
+
+Note that some information in OOMML is not preserved in the export to MathML from LibreOffice;
+in particular, font shifts such as double-struck fonts.
+
+Note that the LibreOffice exporter does seem to drop some text (possibly associated with
+MathML); use with caution.
+
 == Features

 As a port of reverse_markdown, reverse_asciidoctor shares its features:

 * Module based - if you miss a tag, just add it
-* Can deal with nested lists
+* Can deal with nested lists
 * Inline and block code is supported
 * Supports blockquote

@@ -43,7 +97,7 @@ It supports the following html tags supported by reverse_markdown:
 * `div`, `article`
 * `em`, `i` (added: `cite`)
 * `h1`, `h2`, `h3`, `h4`, `h5`, `h6`, `hr`
-* `img`
+* `img`
 * `li`, `ol`, `ul` (added: `dir`)
 * `p`, `pre`
 * `strong`, `b`
@@ -60,7 +114,7 @@ In addition, it supports:

 * `aside`
 * `audio`, `video` (with `@src` attributes)
-* `figure`, `figcaption`
+* `figure`, `figcaption`
 * `mark`
 * `q`
 * `sub`, `sup`
@@ -77,19 +131,15 @@ It also supports MathML... sort of.

 * Asciidoctor supports AsciiMath and LaTeX for stem expressions. HTML uses MathML.
 The gem will recognise MathML expressions in HTML, and will wrap them in Asciidoctor
-`stem:[ ]` macros. The result of this gem is not actually legal Asciidoctor for stem:
+`stem:[ ]` macros. The result of this gem is not actually legal Asciidoctor for stem:
 Asciidoctor will presumably
 think this is AsciiMath in the `stem:[ ]` macro, try to pass it into MathJax as
 AsciiMath, and fail. But of course, MathJax has no problem with MathML, and some postprocessing
 on the Asciidoctor output can ensure that the MathML is treated by MathJax (or whatever else
 uses the output) as such; so this is still much better than nothing for stem processing.
-
-
-
-when MathJax is entirely happy with MathML anyway.
-** https://github.com/transpect/mml2tex looks rather more robust, and is also used
-to export Word documents and their OOMML to LaTeX via MathML. But we'd still rather
-keep the MathML in place.
+** The gem will optionally invoke the https://github.com/riboseinc/mathml2asciimath
+gem, to convert MathML to AsciiMath. The conversion is not perfect, and will need to be
+post-edited; but it's a lot better than nothing.

 The gem does not support:

@@ -105,47 +155,8 @@ The gem does not support:
 * `del`, `ins`
 * `footer`, `header`, `main`, `nav`, `details`, `section`, `summary`, `template`

-== Usage
-
-=== Ruby
-
-You can convert html content as string or Nokogiri document:
-
-[source,ruby]
------
-input = '<strong>feelings</strong>'
-result = ReverseAsciidoctor.convert input
-result.inspect # " *feelings* "
------
-
-=== Commandline
-
-It's also possible to convert html files to markdown using the binary:
-
-[source,console]
------
-$ bin/reverse_asciidoctor file.html > file.adoc
-$ cat file.html | bin/reverse_asciidoctor > file.adoc
------
-
-In addition, the `bin/w2a` script (adapted from the `bin/w2m` script in
-https://github.com/benbalter/word-to-markdown[Ben Balter's word-to-markdown])
-extracts HTML from Word docx documents, and converts it to Asciidoc.
-
-[source,console]
------
-$ bundle exec bin/w2a document.docx > document.adoc
------
-
-The script presumes that LibreOffice has already been installed: it uses LibreOffice's
-export to XHTML. LibreOffice's export of XHTML is superior to the native Microsoft Word export
-to HTML: it exports lists (which Word keeps as paragraphs), and it exports OOMML into MathML.
-On the other hand, the LibreOffice export relies on default styling being used in the
-document, and it may not cope with ordered lists or headings with customised appearance.
-For best results, reset the styles in the document you're converting to those in
-the default Normal.dot template.

-
+== Configuration

 The following options are available:

@@ -157,17 +168,19 @@ The following options are available:
 * `tag_border` (default `' '`) - how to handle tag borders. valid options are:
 ** `' '` - Add whitespace if there is none at tag borders.
 ** `''` - Do not not add whitespace.
+* `mathml2asciimath` - if `true`, will use the https://github.com/riboseinc/mathml2asciimath gem
+to convert MathML to AsciiMath

-
+=== As options

 Just pass your chosen configuration options in after the input. The given options will last for this operation only.

 [source,ruby]
 ----
-ReverseAsciidoctor.convert(input, unknown_tags: :raise)
+ReverseAsciidoctor.convert(input, unknown_tags: :raise, mathml2asciimath: true)
 ----

-
+=== Preconfigure

 Or configure it block style on a initializer level. These configurations will last for all conversions until they are set to something different.

@@ -175,7 +188,7 @@ Or configure it block style on a initializer level. These configurations will la
 ----
 ReverseAsciidoctor.config do |config|
   config.unknown_tags = :bypass
-  config.
+  config.mathml2asciimath = true
   config.tag_border = ''
 end
 ----
data/bin/reverse_asciidoctor
CHANGED
@@ -9,6 +9,7 @@ OptionParser.new do |opts|
   opts.banner = "Usage: reverse_asciidoctor [options] <file>"

   opts.on('-u', '--unknown_tags [pass_through, drop, bypass, raise]', 'Unknown tag handling (default: pass_through)') { |v| ReverseMarkdown.config.unknown_tags = v }
+  opts.on('-a', '--mathml2asciimath', 'Convert MathML to AsciiMath') { |v| ReverseMarkdown.config.mathml2asciimath = true }
 end.parse!

 puts ReverseAsciidoctor.convert(ARGF.read)
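The new `-a` switch is a plain boolean flag: it takes no argument and simply turns the conversion on. The sketch below shows that wiring in isolation with Ruby's standard `optparse`; the `CliOptions` struct and the `parse_cli` helper are assumptions standing in for the gem's configuration object. Note that the hunk sets the flag on `ReverseMarkdown.config`, while the math converter in this diff reads `ReverseAsciidoctor.config`, so the flag probably needs to land on the latter to take effect.

```ruby
require "optparse"

# Stand-in for the gem's configuration object (an assumption for this sketch).
CliOptions = Struct.new(:unknown_tags, :mathml2asciimath)

# Hypothetical helper: parses a copy of argv and returns the populated
# options together with the remaining non-option arguments.
def parse_cli(argv)
  options = CliOptions.new("pass_through", false)
  parser = OptionParser.new do |opts|
    opts.banner = "Usage: reverse_asciidoctor [options] <file>"
    opts.on("-u", "--unknown_tags MODE", "Unknown tag handling") { |v| options.unknown_tags = v }
    opts.on("-a", "--mathml2asciimath", "Convert MathML to AsciiMath") { options.mathml2asciimath = true }
  end
  rest = parser.parse(argv) # non-destructive: argv itself is left untouched
  [options, rest]
end

options, files = parse_cli(["-a", "file.html"])
puts options.mathml2asciimath # true
puts files.inspect            # ["file.html"]
```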
data/bin/w2a
CHANGED
@@ -12,20 +12,29 @@ def scrub_whitespace(string)
   string.gsub!(/([ ]+)$/, '') # line trailing whitespace
   string.gsub!(/\n\n\n\n/, "\n\n") # Quadruple line breaks
   string.delete!(' ') # Unicode non-breaking spaces, injected as tabs
+  # following added by me
+  string.gsub!(%r{<h[1-9][^>]*></h1>}, " ") # I don't know why Libre Office is inserting them, but they need to go
+  string.gsub!(%r{<h1[^>]* style="vertical-align: super;[^>]*>([^<]+)</h1>},
+               "<sup>\\1</sup>") # I absolutely don't know why Libre Office is rendering superscripts as h1
   string
 end

-if ARGV.size != 1 || ARGV[0] == '--help'
+if ARGV.size != 1 && ARGV[0] != "--mathml2asciimath" || ARGV[0] == '--help'
   puts 'Usage: bundle exec w2m path/to/document.docx'
   exit 1
 end

+if ARGV[0] == "--mathml2asciimath" && ARGV[1]
+  ReverseAsciidoctor.config.mathml2asciimath = true
+  ARGV[0] = ARGV[1]
+end
+
 if ARGV[0] == '--version'
   puts "WordToMarkdown v#{WordToMarkdown::VERSION}"
   puts "LibreOffice v#{WordToMarkdown.soffice.version}" unless Gem.win_platform?
 else
   doc = WordToMarkdown.new ARGV[0]
-  # puts doc.
+  # puts scrub_whitespace(doc.document.html)
   puts ReverseAsciidoctor.convert(scrub_whitespace(doc.document.html), WordToMarkdown::REVERSE_MARKDOWN_OPTIONS)
 end

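The two substitutions added to `scrub_whitespace` can be tried in isolation. The following is a minimal sketch that copies the two patterns into a non-mutating helper; the name `scrub_libreoffice_headings` and the sample markup are assumptions for illustration, not actual LibreOffice output.

```ruby
# Stand-alone copy of the two substitutions added to scrub_whitespace.
# Uses gsub (non-mutating) instead of gsub! so the input is left intact.
def scrub_libreoffice_headings(html)
  html
    .gsub(%r{<h[1-9][^>]*></h1>}, " ") # empty headings become a single space
    .gsub(%r{<h1[^>]* style="vertical-align: super;[^>]*>([^<]+)</h1>},
          "<sup>\\1</sup>")            # superscript-styled h1 becomes <sup>
end

# Made-up sample: one empty heading, one superscript rendered as h1.
sample = '<p>x<h1 class="a"></h1>' \
         '<h1 class="b" style="vertical-align: super; font-size: 58%">2</h1></p>'

puts scrub_libreoffice_headings(sample) # <p>x <sup>2</sup></p>
```

The first pattern opens on any of `h1` to `h9` but closes only on `</h1>`, so an empty `<h2></h2>` passes through this sketch unchanged.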
data/lib/reverse_asciidoctor/cleaner.rb
CHANGED
@@ -18,6 +18,11 @@ module ReverseAsciidoctor
     end

     def remove_inner_whitespaces(string)
+      unless string.nil?
+        string.gsub!(/\n stem:\[/, "\nstem:[")
+        string.gsub!(/(stem:\[([^\]]|\\\])*\])\n(?=\S)/, "\\1 ")
+        string.gsub!(/(stem:\[([^\]]|\\\])*\])\s+(?=[\^-])/, "\\1")
+      end
       string.each_line.inject("") do |memo, line|
         memo + preserve_border_whitespaces(line) do
           line.strip.gsub(/[ \t]{2,}/, ' ')
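The three substitutions added here tidy the whitespace around `stem:[ ]` macros. A minimal sketch of their combined effect follows, using a non-mutating helper; the name `tidy_stem_whitespace` and the sample text are assumptions for illustration.

```ruby
# Stand-alone copy of the three stem-related substitutions, applied in order.
def tidy_stem_whitespace(text)
  text
    .gsub(/\n stem:\[/, "\nstem:[")                             # drop the space before a line-initial stem
    .gsub(/(stem:\[([^\]]|\\\])*\])\n(?=\S)/, "\\1 ")            # join a stem to the text on the next line
    .gsub(/(stem:\[([^\]]|\\\])*\])\s+(?=[\^-])/, "\\1")         # no gap before a following ^ or -
end

sample = "Let\n stem:[x]\nbe a number and stem:[y] ^2^ too\n"
puts tidy_stem_whitespace(sample).inspect
# "Let\nstem:[x] be a number and stem:[y]^2^ too\n"
```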
data/lib/reverse_asciidoctor/config.rb
CHANGED
@@ -1,9 +1,10 @@
 module ReverseAsciidoctor
   class Config
-    attr_accessor :unknown_tags, :tag_border
+    attr_accessor :unknown_tags, :tag_border, :mathml2asciimath

     def initialize
       @unknown_tags = :pass_through
+      @mathml2asciimath = false
       @em_delimiter = '_'.freeze
       @strong_delimiter = '*'.freeze
       @inline_options = {}
@@ -21,6 +22,10 @@ module ReverseAsciidoctor
       @inline_options[:unknown_tags] || @unknown_tags
     end

+    def mathml2asciimath
+      @inline_options[:mathml2asciimath] || @mathml2asciimath
+    end
+
     def tag_border
       @inline_options[:tag_border] || @tag_border
     end
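The new reader follows the same pattern as the neighbouring `unknown_tags` and `tag_border` readers: a per-call option is consulted first, then the global setting. The sketch below is a simplified stand-in, not the gem's full `Config` class; the class name `SketchConfig` and the `with` helper are assumptions made for illustration.

```ruby
# Simplified stand-in for the Config class shown in this diff.
class SketchConfig
  attr_writer :mathml2asciimath

  def initialize
    @mathml2asciimath = false
    @inline_options = {}
  end

  # Hypothetical helper: applies per-call options for the duration of a block,
  # then clears them again even if the block raises.
  def with(options = {})
    @inline_options = options
    yield
  ensure
    @inline_options = {}
  end

  # Same lookup as in the hunk: per-call option first, then the global value.
  def mathml2asciimath
    @inline_options[:mathml2asciimath] || @mathml2asciimath
  end
end

config = SketchConfig.new
puts config.mathml2asciimath                                          # false
puts config.with(mathml2asciimath: true) { config.mathml2asciimath }  # true
puts config.mathml2asciimath                                          # false again
```

One consequence of the `||` lookup, at least in this sketch: a per-call `false` cannot switch off a global `true`, because `false || true` evaluates to `true`.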
data/lib/reverse_asciidoctor/converters/math.rb
CHANGED
@@ -1,11 +1,17 @@
-#
+# Unless run with ReverseAsciidoctor.config.mathml2asciimath,
+# this is cheating: we're injecting MathML into Asciidoctor, but
 # Asciidoctor only understands AsciiMath or LaTeX

+require "mathml2asciimath"
+
 module ReverseAsciidoctor
   module Converters
     class Math < Base
       def convert(node, state = {})
-
+        stem = node.to_s.gsub(/\n/, " ")
+        stem = MathML2AsciiMath.m2a(stem) if ReverseAsciidoctor.config.mathml2asciimath
+        stem = stem.gsub(/\[/, "\\[").gsub(/\]/, "\\]").gsub(/\(\(([^\)]+)\)\)/, "(\\1)") unless stem.nil?
+        " stem:[" << stem << "] "
       end
     end

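Apart from the optional MathML-to-AsciiMath call, the new `convert` body is plain string post-processing: flatten newlines, escape square brackets so they do not end the `stem:[ ]` macro early, and collapse doubled parentheses. A minimal sketch of just that part follows; the helper name `wrap_stem` is an assumption, and its argument stands in for whatever the earlier steps produce.

```ruby
# Stand-alone version of the string post-processing in Math#convert.
# The MathML-to-AsciiMath step is skipped; `expression` is a stand-in for
# its output (or for the raw MathML when the option is off).
def wrap_stem(expression)
  stem = expression.gsub(/\n/, " ")
  stem = stem
         .gsub(/\[/, "\\[")                    # escape [ inside the macro
         .gsub(/\]/, "\\]")                    # escape ] inside the macro
         .gsub(/\(\(([^\)]+)\)\)/, "(\\1)")    # ((x)) becomes (x)
  " stem:[" + stem + "] "
end

puts wrap_stem("a_[i]\n+ ((b + c))")
# prints:  stem:[a_\[i\] + (b + c)]  (with one leading and one trailing space)
```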
data/reverse_asciidoctor.gemspec
CHANGED
@@ -19,6 +19,7 @@ Gem::Specification.new do |s|

   # specify any dependencies here; for example:
   s.add_dependency 'nokogiri'
+  s.add_dependency 'mathml2asciimath'
   s.add_development_dependency 'rspec'
   s.add_development_dependency 'simplecov'
   s.add_development_dependency 'rake'
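A dependency declared without a version constraint is recorded as a runtime dependency with the open requirement `>= 0`, which matches the entry added to `metadata` in this release. The sketch below shows this with the RubyGems API on a throwaway specification; the gem name `sketch_gem` and its version are assumptions for illustration only.

```ruby
require "rubygems"

# Throwaway specification; only the dependency lines mirror the hunk above.
spec = Gem::Specification.new do |s|
  s.name    = "sketch_gem" # hypothetical name, for illustration only
  s.version = "0.0.1"
  s.add_dependency "nokogiri"
  s.add_dependency "mathml2asciimath"
  s.add_development_dependency "rspec"
end

# Development dependencies are excluded from runtime_dependencies.
spec.runtime_dependencies.each do |dep|
  puts "#{dep.name} (#{dep.requirement}) #{dep.type}"
end
```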
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: reverse_asciidoctor
 version: !ruby/object:Gem::Version
-  version: 0.2.
+  version: 0.2.1
 platform: ruby
 authors:
 - Ribose Inc.
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2018-
+date: 2018-05-03 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: nokogiri
@@ -24,6 +24,20 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
+- !ruby/object:Gem::Dependency
+  name: mathml2asciimath
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
 - !ruby/object:Gem::Dependency
   name: rspec
   requirement: !ruby/object:Gem::Requirement
@@ -229,7 +243,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
       version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version: 2.6.
+rubygems_version: 2.6.12
 signing_key:
 specification_version: 4
 summary: Convert html code into asciidoctor.