reverse_asciidoctor 0.2.0 → 0.2.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 14f4831267dd9a4738994bd00b46eb45b4142441
- data.tar.gz: ff0f1af49f22bbf573f02b0d3df20d725ab95df3
+ metadata.gz: 4401fdaa4562099df961e17489d733cfed7aa2b2
+ data.tar.gz: 0dd3c32893e3f3abe2b8a90ee0812966de9b3cbf
  SHA512:
- metadata.gz: 2f9b24155a4091c855f8035353c1a192d339f2563e75fcfc93f52858462efc850a7ddeed813bbbc70c57524ae722ed19cba44401c808ac9c72fa8a776f968b84
- data.tar.gz: '087d8216c45fffc860af770aff3ad1f81a473ba7c8e5b92bfbf9db9b60b59fb6f9bae6fe58b6a14756e7514232594363b9da168cb835e10ab5b836f048342183'
+ metadata.gz: d4e5b592c0a68ac39bf0846f43b8955720ad5457219942f40afe97cc74568d37f8a01072b2d52660923292ea516aa967b42bb2d649586548b1d9c95c74281422
+ data.tar.gz: ca556358cd939540a1c1e745eb2438b78038f4da121a05cce0da35e43ff5f0b8f1e3d44f7cf99eba8448874b65a64335427690bd4f606fe0f04a4feecdd52e8b
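A `.gem` file is a tar archive whose `checksums.yaml.gz` records digests of its `metadata.gz` and `data.tar.gz` members, which is why both entries change with every release. The sketch below is a hypothetical helper (`checksum_mismatches` is not part of RubyGems or of this gem) that compares member bytes against a hash shaped like the YAML above, using only Ruby's standard `digest` library. It assumes the members have already been read into memory.

```ruby
require "digest"

# Hypothetical helper: compares archive members against a checksums.yaml-style
# hash of the form { "SHA1" => { name => hex }, "SHA512" => { name => hex } }.
# Returns the [algorithm, member] pairs whose digests do not match.
def checksum_mismatches(members, recorded)
  algorithms = { "SHA1" => Digest::SHA1, "SHA512" => Digest::SHA512 }
  recorded.flat_map do |algo, entries|
    digest_class = algorithms.fetch(algo)
    entries
      .reject { |name, hex| digest_class.hexdigest(members.fetch(name)) == hex }
      .map { |name, _hex| [algo, name] }
  end
end
```

The single quotes around the `data.tar.gz` SHA512 value are only YAML quoting; once loaded, it is compared as an ordinary hex string.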
data/Gemfile CHANGED
@@ -1,4 +1,6 @@
  source "http://rubygems.org"
 
+ gem "mathml2asciimath", git: "https://github.com/riboseinc/mathml2asciimath"
+
  # Specify your gem's dependencies in reverse_markdown.gemspec
  gemspec
@@ -25,12 +25,66 @@ or add it to your Gemfile
  gem 'reverse_asciidoctor'
  ----
 
+ == Usage
+
+ === Ruby
+
+ You can convert html content as string or Nokogiri document:
+
+ [source,ruby]
+ ----
+ input = '<strong>feelings</strong>'
+ result = ReverseAsciidoctor.convert input
+ result.inspect # " *feelings* "
+ ----
+
+ === Commandline
+
+ It's also possible to convert html files to markdown using the binary:
+
+ [source,console]
+ ----
+ $ bin/reverse_asciidoctor file.html > file.adoc
+ $ cat file.html | bin/reverse_asciidoctor > file.adoc
+ ----
+
+ In addition, the `bin/w2a` script (adapted from the `bin/w2m` script in
+ https://github.com/benbalter/word-to-markdown[Ben Balter's word-to-markdown])
+ extracts HTML from Word docx documents, and converts it to Asciidoc.
+
+ [source,console]
+ ----
+ $ bundle exec bin/w2a document.docx > document.adoc
+ ----
+
+ The script presumes that LibreOffice has already been installed: it uses LibreOffice's
+ export to XHTML. LibreOffice's export of XHTML is superior to the native Microsoft Word export
+ to HTML: it exports lists (which Word keeps as paragraphs), and it exports OOMML into MathML.
+ On the other hand, the LibreOffice export relies on default styling being used in the
+ document, and it may not cope with ordered lists or headings with customised appearance.
+ For best results, reset the styles in the document you're converting to those in
+ the default Normal.dot template.
+
+ If you wish to convert the MathML in the document to AsciiMath, run the script with the
+ `--mathml2asciimath` option:
+
+ [source,console]
+ ----
+ $ bundle exec bin/w2a --mathml2asciimath document.docx > document.adoc
+ ----
+
+ Note that some information in OOMML is not preserved in the export to MathML from LibreOffice;
+ in particular, font shifts such as double-struck fonts.
+
+ Note that the LibreOffice exporter does seem to drop some text (possibly associated with
+ MathML); use with caution.
+
  == Features
 
  As a port of reverse_markdown, reverse_asciidoctor shares its features:
 
  * Module based - if you miss a tag, just add it
- * Can deal with nested lists
+ * Can deal with nested lists
  * Inline and block code is supported
  * Supports blockquote
 
@@ -43,7 +97,7 @@ It supports the following html tags supported by reverse_markdown:
  * `div`, `article`
  * `em`, `i` (added: `cite`)
  * `h1`, `h2`, `h3`, `h4`, `h5`, `h6`, `hr`
- * `img`
+ * `img`
  * `li`, `ol`, `ul` (added: `dir`)
  * `p`, `pre`
  * `strong`, `b`
@@ -60,7 +114,7 @@ In addition, it supports:
 
  * `aside`
  * `audio`, `video` (with `@src` attributes)
- * `figure`, `figcaption`
+ * `figure`, `figcaption`
  * `mark`
  * `q`
  * `sub`, `sup`
@@ -77,19 +131,15 @@ It also supports MathML... sort of.
 
  * Asciidoctor supports AsciiMath and LaTeX for stem expressions. HTML uses MathML.
  The gem will recognise MathML expressions in HTML, and will wrap them in Asciidoctor
- `stem:[ ]` macros. The result of this gem is not actually legal Asciidoctor for stem:
+ `stem:[ ]` macros. The result of this gem is not actually legal Asciidoctor for stem:
  Asciidoctor will presumably
  think this is AsciiMath in the `stem:[ ]` macro, try to pass it into MathJax as
  AsciiMath, and fail. But of course, MathJax has no problem with MathML, and some postprocessing
  on the Asciidoctor output can ensure that the MathML is treated by MathJax (or whatever else
  uses the output) as such; so this is still much better than nothing for stem processing.
- * An alternative would be to attempt to map MathML to either LaTeX or AsciiMath.
- ** The self-description of https://github.com/learningobjectsinc/mathml-to-asciimath
- ("subset"... "this module is not: comprehensive, performant") does not recommend it,
- when MathJax is entirely happy with MathML anyway.
- ** https://github.com/transpect/mml2tex looks rather more robust, and is also used
- to export Word documents and their OOMML to LaTeX via MathML. But we'd still rather
- keep the MathML in place.
+ ** The gem will optionally invoke the https://github.com/riboseinc/mathml2asciimath
+ gem, to convert MathML to AsciiMath. The conversion is not perfect, and will need to be
+ post-edited; but it's a lot better than nothing.
 
  The gem does not support:
 
@@ -105,47 +155,8 @@ The gem does not support:
  * `del`, `ins`
  * `footer`, `header`, `main`, `nav`, `details`, `section`, `summary`, `template`
 
- == Usage
-
- === Ruby
-
- You can convert html content as string or Nokogiri document:
-
- [source,ruby]
- ----
- input = '<strong>feelings</strong>'
- result = ReverseAsciidoctor.convert input
- result.inspect # " *feelings* "
- ----
-
- === Commandline
-
- It's also possible to convert html files to markdown using the binary:
-
- [source,console]
- ----
- $ bin/reverse_asciidoctor file.html > file.adoc
- $ cat file.html | bin/reverse_asciidoctor > file.adoc
- ----
-
- In addition, the `bin/w2a` script (adapted from the `bin/w2m` script in
- https://github.com/benbalter/word-to-markdown[Ben Balter's word-to-markdown])
- extracts HTML from Word docx documents, and converts it to Asciidoc.
-
- [source,console]
- ----
- $ bundle exec bin/w2a document.docx > document.adoc
- ----
-
- The script presumes that LibreOffice has already been installed: it uses LibreOffice's
- export to XHTML. LibreOffice's export of XHTML is superior to the native Microsoft Word export
- to HTML: it exports lists (which Word keeps as paragraphs), and it exports OOMML into MathML.
- On the other hand, the LibreOffice export relies on default styling being used in the
- document, and it may not cope with ordered lists or headings with customised appearance.
- For best results, reset the styles in the document you're converting to those in
- the default Normal.dot template.
 
- === Configuration
+ == Configuration
 
  The following options are available:
 
@@ -157,17 +168,19 @@ The following options are available:
  * `tag_border` (default `' '`) - how to handle tag borders. valid options are:
  ** `' '` - Add whitespace if there is none at tag borders.
  ** `''` - Do not not add whitespace.
+ * `mathml2asciimath` - if `true`, will use the https://github.com/riboseinc/mathml2asciimath gem
+ to convert MathML to AsciiMath
 
- ==== As options
+ === As options
 
  Just pass your chosen configuration options in after the input. The given options will last for this operation only.
 
  [source,ruby]
  ----
- ReverseAsciidoctor.convert(input, unknown_tags: :raise)
+ ReverseAsciidoctor.convert(input, unknown_tags: :raise, mathml2asciimath: true)
  ----
 
- ==== Preconfigure
+ === Preconfigure
 
  Or configure it block style on a initializer level. These configurations will last for all conversions until they are set to something different.
 
@@ -175,7 +188,7 @@ Or configure it block style on a initializer level. These configurations will la
  ----
  ReverseAsciidoctor.config do |config|
  config.unknown_tags = :bypass
- config.github_flavored = true
+ config.mathml2asciimath = true
  config.tag_border = ''
  end
  ----
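The README describes two configuration layers: block-style settings that persist for all later conversions, and options passed to `convert` that apply to that call only. A minimal sketch of that layering follows. `MiniConverter`, its `Config` struct and `settings_for` are hypothetical names, and combining the layers with `Hash#merge` is an assumption made for illustration, not the gem's implementation.

```ruby
# Hypothetical module mirroring the README's two configuration styles.
module MiniConverter
  Config = Struct.new(:unknown_tags, :mathml2asciimath, :tag_border)

  # Block-style global configuration: the same object is reused by every call.
  def self.config
    @config ||= Config.new(:pass_through, false, " ")
    yield @config if block_given?
    @config
  end

  # Per-call options are layered over the global settings for this call only;
  # the global config object itself is never modified here.
  def self.settings_for(options = {})
    config.to_h.merge(options)
  end
end
```

Because `settings_for` builds a fresh hash, a per-call `unknown_tags: :raise` does not leak into the next conversion, which matches the README's "will last for this operation only".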
@@ -9,6 +9,7 @@ OptionParser.new do |opts|
  opts.banner = "Usage: reverse_asciidoctor [options] <file>"
 
  opts.on('-u', '--unknown_tags [pass_through, drop, bypass, raise]', 'Unknown tag handling (default: pass_through)') { |v| ReverseMarkdown.config.unknown_tags = v }
+ opts.on('-a', '--mathml2asciimath', 'Convert MathML to AsciiMath') { |v| ReverseMarkdown.config.mathml2asciimath = true }
  end.parse!
 
  puts ReverseAsciidoctor.convert(ARGF.read)
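The new `-a` switch is a plain boolean flag handled by Ruby's standard `OptionParser`. Both switches in this hunk write to `ReverseMarkdown.config`, whereas the Math converter later in this diff reads `ReverseAsciidoctor.config.mathml2asciimath`; whether those names refer to the same object is not visible here. The sketch below shows the same `OptionParser` pattern collecting the flags into a local hash instead, so there is only one place that decides where they end up. `parse_cli` and its hash keys are hypothetical.

```ruby
require "optparse"

# Hypothetical helper: parses reverse_asciidoctor-style switches into a hash
# and removes them from argv, leaving only the input file name(s).
def parse_cli(argv)
  options = { unknown_tags: :pass_through, mathml2asciimath: false }
  OptionParser.new do |opts|
    opts.banner = "Usage: reverse_asciidoctor [options] <file>"
    # Optional argument: only overrides the default when a mode is supplied.
    opts.on("-u", "--unknown_tags [MODE]", "Unknown tag handling") do |v|
      options[:unknown_tags] = v.to_sym if v
    end
    # Boolean flag: present means true.
    opts.on("-a", "--mathml2asciimath", "Convert MathML to AsciiMath") do
      options[:mathml2asciimath] = true
    end
  end.parse!(argv)
  options
end
```

A caller would then copy the hash into whichever configuration object the converter actually reads, before calling `convert`.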
data/bin/w2a CHANGED
@@ -12,20 +12,29 @@ def scrub_whitespace(string)
  string.gsub!(/([ ]+)$/, '') # line trailing whitespace
  string.gsub!(/\n\n\n\n/, "\n\n") # Quadruple line breaks
  string.delete!(' ') # Unicode non-breaking spaces, injected as tabs
+ # following added by me
+ string.gsub!(%r{<h[1-9][^>]*></h1>}, " ") # I don't know why Libre Office is inserting them, but they need to go
+ string.gsub!(%r{<h1[^>]* style="vertical-align: super;[^>]*>([^<]+)</h1>},
+ "<sup>\\1</sup>") # I absolutely don't know why Libre Office is rendering superscripts as h1
  string
  end
 
- if ARGV.size != 1 || ARGV[0] == '--help'
+ if ARGV.size != 1 && ARGV[0] != "--mathml2asciimath" || ARGV[0] == '--help'
  puts 'Usage: bundle exec w2m path/to/document.docx'
  exit 1
  end
 
+ if ARGV[0] == "--mathml2asciimath" && ARGV[1]
+ ReverseAsciidoctor.config.mathml2asciimath = true
+ ARGV[0] = ARGV[1]
+ end
+
  if ARGV[0] == '--version'
  puts "WordToMarkdown v#{WordToMarkdown::VERSION}"
  puts "LibreOffice v#{WordToMarkdown.soffice.version}" unless Gem.win_platform?
  else
  doc = WordToMarkdown.new ARGV[0]
- # puts doc.to_s
+ # puts scrub_whitespace(doc.document.html)
  puts ReverseAsciidoctor.convert(scrub_whitespace(doc.document.html), WordToMarkdown::REVERSE_MARKDOWN_OPTIONS)
  end
 
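The two substitutions added to `scrub_whitespace` work around LibreOffice XHTML quirks: empty heading elements, and superscripts emitted as `h1` elements styled with `vertical-align: super`. Below is a stand-alone sketch of those clean-ups; `scrub_libreoffice_html` is a hypothetical name. One deliberate difference: the empty-heading pattern here uses a backreference (`</h\1>`) so that an empty `h2`, `h3` and so on is removed too, whereas the hunk's pattern only matches elements that close with `</h1>`.

```ruby
# Empty heading of any level, e.g. <h2 class="x"></h2>; \1 ties the closing
# tag to the opening level (a variation on the hunk's </h1>-only pattern).
EMPTY_HEADING = %r{<h([1-9])[^>]*></h\1>}

# LibreOffice-style superscript rendered as a styled h1 (same pattern as the hunk).
SUPERSCRIPT_H1 = %r{<h1[^>]* style="vertical-align: super;[^>]*>([^<]+)</h1>}

# Hypothetical non-destructive version of the two added gsub! calls.
def scrub_libreoffice_html(html)
  html.gsub(EMPTY_HEADING, " ")
      .gsub(SUPERSCRIPT_H1, "<sup>\\1</sup>")
end
```

Returning a new string (`gsub`) rather than mutating (`gsub!`) makes the clean-up easier to test in isolation.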
@@ -18,6 +18,11 @@ module ReverseAsciidoctor
  end
 
  def remove_inner_whitespaces(string)
+ unless string.nil?
+ string.gsub!(/\n stem:\[/, "\nstem:[")
+ string.gsub!(/(stem:\[([^\]]|\\\])*\])\n(?=\S)/, "\\1 ")
+ string.gsub!(/(stem:\[([^\]]|\\\])*\])\s+(?=[\^-])/, "\\1")
+ end
  string.each_line.inject("") do |memo, line|
  memo + preserve_border_whitespaces(line) do
  line.strip.gsub(/[ \t]{2,}/, ' ')
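The three substitutions added to `remove_inner_whitespaces` tidy the spacing around `stem:[...]` macros, which the Math converter now pads with a space on each side. They drop the space left at the start of a line, join a macro to the following line when that line starts with text, and glue a macro to a directly following `^` or `-`. The inner group `([^\]]|\\\])*` lets an escaped `\]` appear inside the macro. The sketch below applies the same patterns without mutating the input; `tidy_stem_whitespace` is a hypothetical name.

```ruby
# Hypothetical non-destructive version of the stem-related clean-ups.
def tidy_stem_whitespace(text)
  return text if text.nil?

  # 1. "\n stem:[" -> "\nstem:[" (remove padding at the start of a line)
  # 2. macro followed by a newline and text -> macro, space, text
  # 3. macro followed by whitespace and ^ or - -> macro glued to the operator
  text
    .gsub(/\n stem:\[/, "\nstem:[")
    .gsub(/(stem:\[([^\]]|\\\])*\])\n(?=\S)/, "\\1 ")
    .gsub(/(stem:\[([^\]]|\\\])*\])\s+(?=[\^-])/, "\\1")
end
```

For example, `"Energy\n stem:[E = mc^2]\nis conserved"` becomes `"Energy\nstem:[E = mc^2] is conserved"`.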
@@ -1,9 +1,10 @@
  module ReverseAsciidoctor
  class Config
- attr_accessor :unknown_tags, :tag_border
+ attr_accessor :unknown_tags, :tag_border, :mathml2asciimath
 
  def initialize
  @unknown_tags = :pass_through
+ @mathml2asciimath = false
  @em_delimiter = '_'.freeze
  @strong_delimiter = '*'.freeze
  @inline_options = {}
@@ -21,6 +22,10 @@ module ReverseAsciidoctor
  @inline_options[:unknown_tags] || @unknown_tags
  end
 
+ def mathml2asciimath
+ @inline_options[:mathml2asciimath] || @mathml2asciimath
+ end
+
  def tag_border
  @inline_options[:tag_border] || @tag_border
  end
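The new `Config#mathml2asciimath` reader follows the same lookup as the other accessors: the per-call (inline) option first, then the global setting. Because the lookup uses `||`, a per-call `false` cannot switch off a globally enabled option; `false || true` is `true`. The sketch below contrasts that with `Hash#fetch`, which falls back only when the key is absent. `LookupSketch` is a hypothetical class, not the gem's `Config`.

```ruby
# Hypothetical sketch of the inline-then-global lookup used by Config.
class LookupSketch
  def initialize(global_value, inline_options)
    @mathml2asciimath = global_value
    @inline_options = inline_options
  end

  # Same shape as the hunk: any falsy inline value falls through to the global one.
  def lookup_with_or
    @inline_options[:mathml2asciimath] || @mathml2asciimath
  end

  # Alternative: only a missing key falls back, so an explicit false is honoured.
  def lookup_with_fetch
    @inline_options.fetch(:mathml2asciimath, @mathml2asciimath)
  end
end
```

With the `||` form, enabling the option per call works, but disabling it per call after a global `config.mathml2asciimath = true` does not.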
@@ -1,11 +1,17 @@
- # This is cheating: we're injecting MathML into Asciidoctor, but
+ # Unless run with ReverseAsciidoctor.config.mathml2asciimath,
+ # this is cheating: we're injecting MathML into Asciidoctor, but
  # Asciidoctor only understands AsciiMath or LaTeX
 
+ require "mathml2asciimath"
+
  module ReverseAsciidoctor
  module Converters
  class Math < Base
  def convert(node, state = {})
- "stem:[" << node.to_s.gsub(/\n/, " ") << "]"
+ stem = node.to_s.gsub(/\n/, " ")
+ stem = MathML2AsciiMath.m2a(stem) if ReverseAsciidoctor.config.mathml2asciimath
+ stem = stem.gsub(/\[/, "\\[").gsub(/\]/, "\\]").gsub(/\(\(([^\)]+)\)\)/, "(\\1)") unless stem.nil?
+ " stem:[" << stem << "] "
  end
  end
 
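The new `Math#convert` body flattens newlines, optionally converts the MathML to AsciiMath, escapes square brackets so they cannot close the `stem:[...]` macro early, collapses doubled parentheses, and pads the macro with a space on each side. The sketch below reproduces those steps on a plain string. `stem_macro` and its `to_asciimath:` callable are hypothetical; the callable stands in for `MathML2AsciiMath.m2a`, which is not loaded here. The escaping uses the block form of `gsub` so the backslash is inserted literally.

```ruby
# Hypothetical stand-alone version of the new Math#convert steps.
# to_asciimath: optional callable standing in for MathML2AsciiMath.m2a.
def stem_macro(mathml, to_asciimath: nil)
  stem = mathml.gsub(/\n/, " ")
  stem = to_asciimath.call(stem) if to_asciimath
  unless stem.nil?
    # Escape [ and ] so they cannot close the macro early, then turn ((x)) into (x).
    stem = stem.gsub(/[\[\]]/) { |b| "\\#{b}" }.gsub(/\(\(([^\)]+)\)\)/) { "(#{$1})" }
  end
  # A nil conversion result yields an empty macro here; the hunk's << would raise instead.
  " stem:[#{stem}] "
end
```

The surrounding spaces are what the `remove_inner_whitespaces` patterns elsewhere in this release then trim at line starts and before `^` or `-`.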
@@ -1,3 +1,3 @@
  module ReverseAsciidoctor
- VERSION = '0.2.0'
+ VERSION = '0.2.1'
  end
@@ -19,6 +19,7 @@ Gem::Specification.new do |s|
 
  # specify any dependencies here; for example:
  s.add_dependency 'nokogiri'
+ s.add_dependency 'mathml2asciimath'
  s.add_development_dependency 'rspec'
  s.add_development_dependency 'simplecov'
  s.add_development_dependency 'rake'
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: reverse_asciidoctor
  version: !ruby/object:Gem::Version
- version: 0.2.0
+ version: 0.2.1
  platform: ruby
  authors:
  - Ribose Inc.
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2018-04-17 00:00:00.000000000 Z
+ date: 2018-05-03 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: nokogiri
@@ -24,6 +24,20 @@ dependencies:
  - - ">="
  - !ruby/object:Gem::Version
  version: '0'
+ - !ruby/object:Gem::Dependency
+ name: mathml2asciimath
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
+ type: :runtime
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
  - !ruby/object:Gem::Dependency
  name: rspec
  requirement: !ruby/object:Gem::Requirement
@@ -229,7 +243,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  version: '0'
  requirements: []
  rubyforge_project:
- rubygems_version: 2.6.14
+ rubygems_version: 2.6.12
  signing_key:
  specification_version: 4
  summary: Convert html code into asciidoctor.