docx2gfm 0.2.0
- checksums.yaml +7 -0
- data/.gitignore +10 -0
- data/Gemfile +6 -0
- data/LICENSE.txt +21 -0
- data/MOTIVATION.md +19 -0
- data/README.md +82 -0
- data/Rakefile +2 -0
- data/bin/console +14 -0
- data/bin/docx2gfm +4 -0
- data/bin/setup +8 -0
- data/docx2gfm.gemspec +36 -0
- data/examples/sample-pure-pandoc.md +103 -0
- data/examples/sample.docx +0 -0
- data/examples/sample.md +112 -0
- data/lib/docx2gfm.rb +59 -0
- data/lib/docx2gfm/assets/front-matter.md +8 -0
- data/lib/docx2gfm/docx_gfm_converter.rb +117 -0
- data/lib/docx2gfm/version.rb +3 -0
- metadata +91 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
|
|
1
|
+
---
|
2
|
+
SHA256:
|
3
|
+
metadata.gz: d9a2da3b4b09ea75ecb7523a3a01e0b275237239acea98dbc20c1371afa8a3ea
|
4
|
+
data.tar.gz: 35624ce545980e73a9f931be6f94d5fb7e2c642a8854d2c75d1041cc9d3cb9d4
|
5
|
+
SHA512:
|
6
|
+
metadata.gz: '0861e90f7edda3bb08c31f7e5963e3ed7db5866f9bb2e61787aa40a0ee2b48c3ca2d00dab1e44411df635004d2796f1025456979fb7ad46e291d0aec72ad88af'
|
7
|
+
data.tar.gz: e7a5df12dc792018f6c4ff469289e8d5d16cfc4feccfc62a2b507af11163778616ff0c3632f4a9cd398486abb26673abbb78df571c8851d76bd87e7b1d4684e7
|
data/.gitignore
ADDED
data/Gemfile
ADDED
data/LICENSE.txt
ADDED
@@ -0,0 +1,21 @@
|
|
1
|
+
The MIT License (MIT)
|
2
|
+
|
3
|
+
Copyright (c) 2019 Sebastian Spier
|
4
|
+
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
7
|
+
in the Software without restriction, including without limitation the rights
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
10
|
+
furnished to do so, subject to the following conditions:
|
11
|
+
|
12
|
+
The above copyright notice and this permission notice shall be included in
|
13
|
+
all copies or substantial portions of the Software.
|
14
|
+
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
|
21
|
+
THE SOFTWARE.
|
data/MOTIVATION.md
ADDED
@@ -0,0 +1,19 @@
|
|
1
|
+
# Motivation
|
2
|
+
|
3
|
+
docx2gfm is a wrapper around [pandoc][pandoc], which does the heavy lifting of the docx to markdown conversion. So why not use pandoc directly, you may ask?
|
4
|
+
|
5
|
+
The best pandoc configuration that I could find so far is this:
|
6
|
+
|
7
|
+
```
|
8
|
+
pandoc examples/sample.docx --wrap=none --atx-headers --reference-links -f docx -t markdown-bracketed_spans-link_attributes-smart-simple_tables -s
|
9
|
+
```
|
10
|
+
|
11
|
+
It produces the markdown output as shown in [examples/sample-pure-pandoc.md](./examples/sample-pure-pandoc.md).
|
12
|
+
|
13
|
+
While this is pretty good already, this markdown has the following shortcomings:
|
14
|
+
|
15
|
+
* lists have superfluous spaces before each list item
|
16
|
+
* HTML formatting is created for underlines, e.g. `<span class="underline">In mattis lectus</span>`. One could use something similar to `sed -e 's/<[^>]*>//g'` to get rid of the HTML. However, this would also remove the HTML placeholders for the images, which are good to keep.
|
17
|
+
* less pretty reference-links at the end of the file
|
18
|
+
|
19
|
+
[pandoc]: https://pandoc.org/installing.html
|
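The underline shortcoming described in MOTIVATION.md can be illustrated with a small sketch. It assumes that only the `<span class="underline">` wrappers should be dropped, while other HTML (such as the `<img>` placeholders) stays in place. The helper name `strip_underline_spans` is hypothetical and is not part of the gem.

```ruby
# Hypothetical helper (not part of docx2gfm): remove only the underline
# spans that pandoc emits, leaving other HTML such as <img> untouched.
def strip_underline_spans(markdown)
  # Non-greedy match so that each span is unwrapped on its own;
  # the /m flag lets a span run across line breaks.
  markdown.gsub(%r{<span class="underline">(.*?)</span>}m, '\1')
end

sample = 'See <span class="underline">In mattis lectus</span> <img src="media/image1.jpg" />'
puts strip_underline_spans(sample)
```

Unlike the generic `sed` expression, this targets one specific tag, so image placeholders survive the cleanup.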
data/README.md
ADDED
@@ -0,0 +1,82 @@
|
|
1
|
+
# docx2gfm - docx to github-flavored-markdown converter
|
2
|
+
|
3
|
+
If you need to convert `.docx` documents to markdown, then `docx2gfm` is for you, as it makes the process faster.
|
4
|
+
|
5
|
+
"Don't tell me, show me!" Ok ok! `docx2gfm` turns [this docx file](./examples/sample.docx) into [this markdown](./examples/sample.md). Also see the original [Google Doc][gDoc].
|
6
|
+
|
7
|
+
Some post-processing of the markdown is still required but `docx2gfm` already makes the conversion process much faster.
|
8
|
+
|
9
|
+
## The Long Story
|
10
|
+
|
11
|
+
I maintain an engineering blog that uses [jekyll][jekyll] to generate static pages.
|
12
|
+
|
13
|
+
In our blogging process, authors write each blog post as a Google Doc to collect feedback. Once the post is ready for publishing, they convert the Google Doc to [github-flavored-markdown][gfm], as that is what [jekyll][jekyll] needs as input to render the HTML for the blog.
|
14
|
+
|
15
|
+
We used to do this conversion step manually. This was tedious, boring, and in parts error-prone.
|
16
|
+
|
17
|
+
With `docx2gfm` you can do this conversion quickly, and have more time to write new blog posts ... or drink coffee :)
|
18
|
+
|
19
|
+
Technically, `docx2gfm` is a thin wrapper around [pandoc][pandoc]. [MOTIVATION.md](./MOTIVATION.md) explains the technical approach we chose in more detail.
|
20
|
+
|
21
|
+
## Installation
|
22
|
+
|
23
|
+
- install ruby
|
24
|
+
- install [pandoc][pandoc]
|
25
|
+
- install this gem: `gem install docx2gfm`
|
26
|
+
|
27
|
+
## Usage
|
28
|
+
|
29
|
+
1. download your Google Doc as a `.docx` file e.g. `my_post.docx` (File >> Download as >> Microsoft Word (.docx))
|
30
|
+
1. convert docx to github-flavored-markdown:
|
31
|
+
|
32
|
+
```
|
33
|
+
docx2gfm -f my_post.docx > my_post.md
|
34
|
+
```
|
35
|
+
|
36
|
+
To learn more about the available options please refer to the built-in help.
|
37
|
+
|
38
|
+
```
|
39
|
+
$ docx2gfm -h
|
40
|
+
|
41
|
+
Usage: docx2gfm [options]
|
42
|
+
-f, --file FILE (required) The .docx file to convert to markdown
|
43
|
+
-j, --[no-]jekyll (optional) Prefix the markdown output with a jekyll frontmatter. Default: --jekyll
|
44
|
+
-r, --[no-]ref-style-links (optional) Create reference style links at the end of the markdown. Default: --ref-style-links
|
45
|
+
-h, --help Display this help screen
|
46
|
+
```
|
47
|
+
|
48
|
+
## Finishing touches for your markdown
|
49
|
+
|
50
|
+
The markdown produced by `docx2gfm` is good but not perfect. You still have to do some manual steps:
|
51
|
+
|
52
|
+
* Adapt the YAML Frontmatter (if you used the `--jekyll` option)
|
53
|
+
* Add the correct image links
|
54
|
+
* Add code blocks
|
55
|
+
* Add quotes
|
56
|
+
* Add tables
|
57
|
+
|
58
|
+
## Alternatives to docx2gfm
|
59
|
+
|
60
|
+
* Word to Markdown Converter: [online](https://word-to-markdown.herokuapp.com/), [source](https://github.com/benbalter/word-to-markdown)
|
61
|
+
* [Writage](http://www.writage.com) - Markdown plugin for Microsoft Word
|
62
|
+
|
63
|
+
## Development
|
64
|
+
|
65
|
+
After checking out the repo, run `bin/setup` to install dependencies. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
|
66
|
+
|
67
|
+
To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
|
68
|
+
|
69
|
+
## Contributing
|
70
|
+
|
71
|
+
`docx2gfm` is far from perfect.
|
72
|
+
Bug reports and pull requests are welcome on GitHub at [github.com/spier/docx2gfm](https://github.com/spier/docx2gfm).
|
73
|
+
|
74
|
+
## License
|
75
|
+
|
76
|
+
The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
|
77
|
+
|
78
|
+
[uth]: https://underthehood.meltwater.com/
|
79
|
+
[gfm]: https://guides.github.com/features/mastering-markdown/
|
80
|
+
[gDoc]: https://docs.google.com/document/d/16Kww2ic-YgFKskfDxYJu6o_ooSF3IORJh8Ho7XbgngI/edit
|
81
|
+
[pandoc]: https://pandoc.org/installing.html
|
82
|
+
[jekyll]: https://jekyllrb.com
|
data/Rakefile
ADDED
data/bin/console
ADDED
@@ -0,0 +1,14 @@
|
|
1
|
+
#!/usr/bin/env ruby
|
2
|
+
|
3
|
+
require "bundler/setup"
|
4
|
+
require "docx2gfm"
|
5
|
+
|
6
|
+
# You can add fixtures and/or initialization code here to make experimenting
|
7
|
+
# with your gem easier. You can also use a different console, if you like.
|
8
|
+
|
9
|
+
# (If you use this, don't forget to add pry to your Gemfile!)
|
10
|
+
# require "pry"
|
11
|
+
# Pry.start
|
12
|
+
|
13
|
+
require "irb"
|
14
|
+
IRB.start(__FILE__)
|
data/bin/docx2gfm
ADDED
data/bin/setup
ADDED
data/docx2gfm.gemspec
ADDED
@@ -0,0 +1,36 @@
|
|
1
|
+
|
2
|
+
lib = File.expand_path("../lib", __FILE__)
|
3
|
+
$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
|
4
|
+
require "docx2gfm/version"
|
5
|
+
|
6
|
+
Gem::Specification.new do |spec|
|
7
|
+
spec.name = "docx2gfm"
|
8
|
+
spec.version = Docx2gfm::VERSION
|
9
|
+
spec.authors = ["Sebastian Spier"]
|
10
|
+
spec.email = ["github@spier.hu"]
|
11
|
+
|
12
|
+
spec.summary = "Convert a docx file, to github-flavored-markdown"
|
13
|
+
spec.description = "Convert a docx file, to github-flavored-markdown. thin wrapper around pandoc."
|
14
|
+
spec.homepage = "https://github.com/spier/docx2gfm"
|
15
|
+
spec.license = "MIT"
|
16
|
+
|
17
|
+
# Prevent pushing this gem to RubyGems.org. To allow pushes either set the 'allowed_push_host'
|
18
|
+
# to allow pushing to a single host or delete this section to allow pushing to any host.
|
19
|
+
if spec.respond_to?(:metadata)
|
20
|
+
spec.metadata["allowed_push_host"] = "https://rubygems.org"
|
21
|
+
else
|
22
|
+
raise "RubyGems 2.0 or newer is required to protect against " \
|
23
|
+
"public gem pushes."
|
24
|
+
end
|
25
|
+
|
26
|
+
spec.files = `git ls-files -z`.split("\x0").reject do |f|
|
27
|
+
f.match(%r{^(test|spec|features)/})
|
28
|
+
end
|
29
|
+
spec.bindir = "bin"
|
30
|
+
spec.require_paths = ["lib"]
|
31
|
+
|
32
|
+
spec.executables << "docx2gfm"
|
33
|
+
|
34
|
+
spec.add_development_dependency "bundler", "~> 1.16"
|
35
|
+
spec.add_development_dependency "rake", "~> 10.0"
|
36
|
+
end
|
data/examples/sample-pure-pandoc.md
ADDED
@@ -0,0 +1,103 @@
|
|
1
|
+
---
|
2
|
+
title: '<span id="_iq8c5s4280y5" class="anchor"></span>A Google Doc (docx) for testing the markdown conversation with docx2gfm <img src="media/image1.jpg" style="width:1.77604in;height:1.98311in" />'
|
3
|
+
---
|
4
|
+
|
5
|
+
Nulla ante dui, efficitur ut accumsan id, imperdiet ac urna. Duis nec eros non ex posuere scelerisque. Duis non dui quam. Vivamus pretium pretium lacus sit amet volutpat. In sollicitudin massa euismod, consectetur est in, malesuada sem. Pellentesque ullamcorper ligula blandit lacinia cursus. Nunc sit amet quam dapibus, blandit lacus in, pellentesque lacus. Morbi varius est sapien, vel imperdiet turpis varius vitae. Sed laoreet eu magna vel dictum. Nullam eget iaculis nisl, et congue orci. Curabitur hendrerit fermentum sapien fringilla vehicula. Ut rutrum pretium ligula in accumsan. Donec sed facilisis justo.
|
6
|
+
|
7
|
+
## What is this
|
8
|
+
|
9
|
+
The document that you are looking at was produced by exporting [<span class="underline">this Google doc</span>] as a .docx file, and then running it through this experimental [<span class="underline">docx2gfm</span>] converter.
|
10
|
+
|
11
|
+
Let's link to the [<span class="underline">github repo again</span>][<span class="underline">docx2gfm</span>] but with a different link name.
|
12
|
+
|
13
|
+
## Emphasize section
|
14
|
+
|
15
|
+
We **may be Bolder**, we *may be Italian* (ouch!), we ~~may be on Strike~~, or <span class="underline">under the line</span>.
|
16
|
+
|
17
|
+
Not all of these work in [<span class="underline">github-flavored-markdown</span>], but that is ok. Remember it is a markup language, not a [<span class="underline">text formatting language</span>].
|
18
|
+
|
19
|
+
## List section
|
20
|
+
|
21
|
+
- Nam leo ipsum, commodo ut quam in, ultrices dictum nibh. Donec vehicula velit mi, a vehicula ante bibendum id. Duis sit amet pellentesque nibh. Nunc pretium, enim in sodales sollicitudin, sem nisi porta leo, a tempor neque augue a felis.
|
22
|
+
|
23
|
+
- With bullets
|
24
|
+
|
25
|
+
- Mauris nec tempus enim, a porttitor leo. Sed eu varius elit. Cras aliquam felis non ante porta maximus. [<span class="underline">Mauris</span>] quis aliquam diam, ac vestibulum odio. Etiam et commodo tellus.
|
26
|
+
|
27
|
+
1. Numbered lists are great
|
28
|
+
|
29
|
+
2. If one comes after the …
|
30
|
+
|
31
|
+
3. … other
|
32
|
+
|
33
|
+
- Bullet A
|
34
|
+
|
35
|
+
- Bullet A1
|
36
|
+
|
37
|
+
- Bullet A2
|
38
|
+
|
39
|
+
- Bullet B
|
40
|
+
|
41
|
+
- Bullet B1
|
42
|
+
|
43
|
+
- Bullet B2
|
44
|
+
|
45
|
+
- Bullet C
|
46
|
+
|
47
|
+
- Bullet C1
|
48
|
+
|
49
|
+
- Bullet C1a
|
50
|
+
|
51
|
+
## An image
|
52
|
+
|
53
|
+
## Headline 1<img src="media/image1.jpg" style="width:6.27083in;height:2.3125in" />
|
54
|
+
|
55
|
+
[<span class="underline">Lorem ipsum dolor sit amet</span>], consectetur adipiscing elit. Sed cursus laoreet leo, non tempor libero malesuada a. Donec commodo tempor neque, vitae tempor nibh. Vestibulum scelerisque purus ipsum, ac efficitur ipsum feugiat sit amet. Praesent luctus tincidunt ante at cursus. Etiam dignissim aliquet lacus, eu sodales nunc porttitor in. Nunc ut luctus purus. Donec eu vehicula enim. Vivamus eu hendrerit velit. Vivamus pellentesque facilisis ex molestie cursus. Fusce egestas tellus quis tortor cursus imperdiet. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae; Donec facilisis ipsum nulla, eget dapibus lectus interdum vel. Nam leo ipsum, commodo ut quam in, ultrices dictum nibh. Donec vehicula velit mi, a vehicula ante bibendum id. Duis sit amet pellentesque nibh. Nunc pretium, enim in sodales sollicitudin, sem nisi porta leo, a tempor neque augue a felis.
|
56
|
+
|
57
|
+
- A list
|
58
|
+
|
59
|
+
- With bullets
|
60
|
+
|
61
|
+
- And even more bullets
|
62
|
+
|
63
|
+
## Headline 2
|
64
|
+
|
65
|
+
Nullam vehicula ac dolor ac ultrices. Integer lacinia urna eu vestibulum tempor. Ut lacus ante, scelerisque consectetur odio blandit, vulputate dapibus nunc. Curabitur mi ex, ullamcorper sit amet diam ac, blandit consectetur tellus. Pellentesque ut eros sit amet enim consequat fermentum at quis orci. In nunc eros, vestibulum mollis posuere non, venenatis a justo. Suspendisse tincidunt, mi vel tincidunt auctor, lacus sem hendrerit ex, eget maximus mi risus eu felis. Mauris nec tempus enim, a porttitor leo. Sed eu varius elit. Cras aliquam felis non ante porta maximus. [<span class="underline">Mauris</span>] quis aliquam diam, ac vestibulum odio. Etiam et commodo tellus.
|
66
|
+
|
67
|
+
1. Numbered lists are great
|
68
|
+
|
69
|
+
2. If one comes after the …
|
70
|
+
|
71
|
+
3. … other
|
72
|
+
|
73
|
+
**Ut bibendum turpis ex**, id consequat elit euismod et. Nam lectus arcu, pharetra tincidunt iaculis a, interdum vitae arcu. <span class="underline">Cras ac lectus quis</span> risus laoreet scelerisque. Curabitur metus ex, sagittis sit amet quam vitae, consequat sodales velit. Nulla quis ex urna. Integer et tortor odio. Curabitur euismod feugiat mollis. *Nullam vestibulum tempus feugiat*. Vestibulum porta aliquam mauris non aliquet. Sed aliquam erat ac bibendum lacinia. Nam iaculis ornare lorem, sed consectetur dolor.
|
74
|
+
|
75
|
+
[<span class="underline">In mattis lectus</span>] accumsan diam accumsan, in bibendum justo fringilla. Aenean lacinia aliquam ligula vel semper. Phasellus purus ipsum, condimentum in tellus nec, finibus consectetur elit. Nam fringilla, diam id dignissim sodales, magna elit tempor arcu, at pharetra nisi metus eget purus. Donec mollis ac leo in aliquet. Proin congue congue diam, ac euismod est blandit non. Aliquam ac efficitur eros. Mauris mollis commodo fermentum. Donec mattis sit amet risus nec vestibulum. Donec vel sollicitudixn dolor. Aliquam aliquet neque ac augue consequat porta. Praesent tristique lobortis tincidunt. Etiam dapibus consequat fringilla. Etiam ac orci a nunc rutrum finibus eget a turpis. Duis vitae venenatis magna, ut vulputate turpis. Nunc fringilla non tellus at egestas.
|
76
|
+
|
77
|
+
## Table Tests
|
78
|
+
|
79
|
+
| **TH 1** | **TH 2** | **TH 3** |
|
80
|
+
|----------|----------|----------|
|
81
|
+
| TD 1 | TD 2 | TD 3 |
|
82
|
+
| TD 4 | TD 5 | TD 6 |
|
83
|
+
| TD 7 | TD 8 | TD 9 |
|
84
|
+
|
85
|
+
## Quote!
|
86
|
+
|
87
|
+
Every once in a while you need a great quote. To do so, write your quote and then move the indentation level in the Google Doc to the right, as shown below:
|
88
|
+
|
89
|
+
> Blogging is great for your charma.
|
90
|
+
>
|
91
|
+
> \- Sebastian Spier (2019)
|
92
|
+
|
93
|
+
## Headline 3
|
94
|
+
|
95
|
+
In closing: Goodbye!
|
96
|
+
|
97
|
+
[<span class="underline">this Google doc</span>]: https://docs.google.com/document/d/1oKGYVORih0GNC1CZHKv0d2IirCtcgMu0O1sifTfH5zo/edit#
|
98
|
+
[<span class="underline">docx2gfm</span>]: https://github.com/meltwater/docx2gfm
|
99
|
+
[<span class="underline">github-flavored-markdown</span>]: https://help.github.com/articles/basic-writing-and-formatting-syntax/
|
100
|
+
[<span class="underline">text formatting language</span>]: https://softwareengineering.stackexchange.com/questions/207727/why-there-is-no-markdown-for-underline
|
101
|
+
[<span class="underline">Mauris</span>]: https://underthehood.meltwater.com
|
102
|
+
[<span class="underline">Lorem ipsum dolor sit amet</span>]: https://loremipsum.io/
|
103
|
+
[<span class="underline">In mattis lectus</span>]: https://spier.hu
|
data/examples/sample.docx
ADDED
Binary file
|
data/examples/sample.md
ADDED
@@ -0,0 +1,112 @@
|
|
1
|
+
---
|
2
|
+
layout: post
|
3
|
+
title: "YOUR POST TITLE"
|
4
|
+
comments: true
|
5
|
+
categories: [tag1, tag2, multi word tag]
|
6
|
+
author: <a href="link to twitter, personal blog, linkedin, etc">YOUR NAME</a>
|
7
|
+
image: "/images/own/post_directory/logo_image_NOT_SVG_FORMAT.png"
|
8
|
+
---
|
9
|
+
|
10
|
+
---
|
11
|
+
title: '<span id="_iq8c5s4280y5" class="anchor"></span>A Google Doc (docx) for testing the markdown conversation with docx2gfm <img src="media/image1.jpg" style="width:1.77604in;height:1.98311in" />'
|
12
|
+
---
|
13
|
+
|
14
|
+
Nulla ante dui, efficitur ut accumsan id, imperdiet ac urna. Duis nec eros non ex posuere scelerisque. Duis non dui quam. Vivamus pretium pretium lacus sit amet volutpat. In sollicitudin massa euismod, consectetur est in, malesuada sem. Pellentesque ullamcorper ligula blandit lacinia cursus. Nunc sit amet quam dapibus, blandit lacus in, pellentesque lacus. Morbi varius est sapien, vel imperdiet turpis varius vitae. Sed laoreet eu magna vel dictum. Nullam eget iaculis nisl, et congue orci. Curabitur hendrerit fermentum sapien fringilla vehicula. Ut rutrum pretium ligula in accumsan. Donec sed facilisis justo.
|
15
|
+
|
16
|
+
## What is this
|
17
|
+
|
18
|
+
The document that you are looking at was produced by exporting [this Google doc][this-google-doc] as a .docx file, and then running it through this experimental [docx2gfm][docx2gfm] converter.
|
19
|
+
|
20
|
+
Let's link to the [github repo again][docx2gfm] but with a different link name.
|
21
|
+
|
22
|
+
## Emphasize section
|
23
|
+
|
24
|
+
We **may be Bolder**, we *may be Italian* (ouch!), we ~~may be on Strike~~, or under the line.
|
25
|
+
|
26
|
+
Not all of these work in [github-flavored-markdown][github-flavored-markdown], but that is ok. Remember it is a markup language, not a [text formatting language][text-formatting-language].
|
27
|
+
|
28
|
+
## List section
|
29
|
+
|
30
|
+
- Nam leo ipsum, commodo ut quam in, ultrices dictum nibh. Donec vehicula velit mi, a vehicula ante bibendum id. Duis sit amet pellentesque nibh. Nunc pretium, enim in sodales sollicitudin, sem nisi porta leo, a tempor neque augue a felis.
|
31
|
+
|
32
|
+
- With bullets
|
33
|
+
|
34
|
+
- Mauris nec tempus enim, a porttitor leo. Sed eu varius elit. Cras aliquam felis non ante porta maximus. [Mauris][mauris] quis aliquam diam, ac vestibulum odio. Etiam et commodo tellus.
|
35
|
+
|
36
|
+
1. Numbered lists are great
|
37
|
+
|
38
|
+
2. If one comes after the …
|
39
|
+
|
40
|
+
3. … other
|
41
|
+
|
42
|
+
- Bullet A
|
43
|
+
|
44
|
+
- Bullet A1
|
45
|
+
|
46
|
+
- Bullet A2
|
47
|
+
|
48
|
+
- Bullet B
|
49
|
+
|
50
|
+
- Bullet B1
|
51
|
+
|
52
|
+
- Bullet B2
|
53
|
+
|
54
|
+
- Bullet C
|
55
|
+
|
56
|
+
- Bullet C1
|
57
|
+
|
58
|
+
- Bullet C1a
|
59
|
+
|
60
|
+
## An image
|
61
|
+
|
62
|
+
## Headline 1<img src="media/image1.jpg" style="width:6.27083in;height:2.3125in" />
|
63
|
+
|
64
|
+
[Lorem ipsum dolor sit amet][lorem-ipsum-dolor-sit-amet], consectetur adipiscing elit. Sed cursus laoreet leo, non tempor libero malesuada a. Donec commodo tempor neque, vitae tempor nibh. Vestibulum scelerisque purus ipsum, ac efficitur ipsum feugiat sit amet. Praesent luctus tincidunt ante at cursus. Etiam dignissim aliquet lacus, eu sodales nunc porttitor in. Nunc ut luctus purus. Donec eu vehicula enim. Vivamus eu hendrerit velit. Vivamus pellentesque facilisis ex molestie cursus. Fusce egestas tellus quis tortor cursus imperdiet. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae; Donec facilisis ipsum nulla, eget dapibus lectus interdum vel. Nam leo ipsum, commodo ut quam in, ultrices dictum nibh. Donec vehicula velit mi, a vehicula ante bibendum id. Duis sit amet pellentesque nibh. Nunc pretium, enim in sodales sollicitudin, sem nisi porta leo, a tempor neque augue a felis.
|
65
|
+
|
66
|
+
- A list
|
67
|
+
|
68
|
+
- With bullets
|
69
|
+
|
70
|
+
- And even more bullets
|
71
|
+
|
72
|
+
## Headline 2
|
73
|
+
|
74
|
+
Nullam vehicula ac dolor ac ultrices. Integer lacinia urna eu vestibulum tempor. Ut lacus ante, scelerisque consectetur odio blandit, vulputate dapibus nunc. Curabitur mi ex, ullamcorper sit amet diam ac, blandit consectetur tellus. Pellentesque ut eros sit amet enim consequat fermentum at quis orci. In nunc eros, vestibulum mollis posuere non, venenatis a justo. Suspendisse tincidunt, mi vel tincidunt auctor, lacus sem hendrerit ex, eget maximus mi risus eu felis. Mauris nec tempus enim, a porttitor leo. Sed eu varius elit. Cras aliquam felis non ante porta maximus. [Mauris][mauris] quis aliquam diam, ac vestibulum odio. Etiam et commodo tellus.
|
75
|
+
|
76
|
+
1. Numbered lists are great
|
77
|
+
|
78
|
+
2. If one comes after the …
|
79
|
+
|
80
|
+
3. … other
|
81
|
+
|
82
|
+
**Ut bibendum turpis ex**, id consequat elit euismod et. Nam lectus arcu, pharetra tincidunt iaculis a, interdum vitae arcu. Cras ac lectus quis risus laoreet scelerisque. Curabitur metus ex, sagittis sit amet quam vitae, consequat sodales velit. Nulla quis ex urna. Integer et tortor odio. Curabitur euismod feugiat mollis. *Nullam vestibulum tempus feugiat*. Vestibulum porta aliquam mauris non aliquet. Sed aliquam erat ac bibendum lacinia. Nam iaculis ornare lorem, sed consectetur dolor.
|
83
|
+
|
84
|
+
[In mattis lectus][in-mattis-lectus] accumsan diam accumsan, in bibendum justo fringilla. Aenean lacinia aliquam ligula vel semper. Phasellus purus ipsum, condimentum in tellus nec, finibus consectetur elit. Nam fringilla, diam id dignissim sodales, magna elit tempor arcu, at pharetra nisi metus eget purus. Donec mollis ac leo in aliquet. Proin congue congue diam, ac euismod est blandit non. Aliquam ac efficitur eros. Mauris mollis commodo fermentum. Donec mattis sit amet risus nec vestibulum. Donec vel sollicitudixn dolor. Aliquam aliquet neque ac augue consequat porta. Praesent tristique lobortis tincidunt. Etiam dapibus consequat fringilla. Etiam ac orci a nunc rutrum finibus eget a turpis. Duis vitae venenatis magna, ut vulputate turpis. Nunc fringilla non tellus at egestas.
|
85
|
+
|
86
|
+
## Table Tests
|
87
|
+
|
88
|
+
| **TH 1** | **TH 2** | **TH 3** |
|
89
|
+
|----------|----------|----------|
|
90
|
+
| TD 1 | TD 2 | TD 3 |
|
91
|
+
| TD 4 | TD 5 | TD 6 |
|
92
|
+
| TD 7 | TD 8 | TD 9 |
|
93
|
+
|
94
|
+
## Quote!
|
95
|
+
|
96
|
+
Every once in a while you need a great quote. To do so, write your quote and then move the indentation level in the Google Doc to the right, as shown below:
|
97
|
+
|
98
|
+
> Blogging is great for your charma.
|
99
|
+
>
|
100
|
+
> \- Sebastian Spier (2019)
|
101
|
+
|
102
|
+
## Headline 3
|
103
|
+
|
104
|
+
In closing: Goodbye!
|
105
|
+
|
106
|
+
[this-google-doc]: https://docs.google.com/document/d/1oKGYVORih0GNC1CZHKv0d2IirCtcgMu0O1sifTfH5zo/edit#
|
107
|
+
[docx2gfm]: https://github.com/meltwater/docx2gfm
|
108
|
+
[github-flavored-markdown]: https://help.github.com/articles/basic-writing-and-formatting-syntax/
|
109
|
+
[text-formatting-language]: https://softwareengineering.stackexchange.com/questions/207727/why-there-is-no-markdown-for-underline
|
110
|
+
[mauris]: https://underthehood.meltwater.com
|
111
|
+
[lorem-ipsum-dolor-sit-amet]: https://loremipsum.io/
|
112
|
+
[in-mattis-lectus]: https://spier.hu
|
data/lib/docx2gfm.rb
ADDED
@@ -0,0 +1,59 @@
|
|
1
|
+
require 'docx2gfm/version'
|
2
|
+
require 'docx2gfm/docx_gfm_converter'
|
3
|
+
|
4
|
+
require 'pp'
|
5
|
+
require 'optparse'
|
6
|
+
|
7
|
+
module Docx2gfm
|
8
|
+
|
9
|
+
class Runner
|
10
|
+
|
11
|
+
def self.run
|
12
|
+
|
13
|
+
# set default values for options
|
14
|
+
options = {}
|
15
|
+
options[:jekyll] = true
|
16
|
+
options[:ref_style_links] = true
|
17
|
+
|
18
|
+
# specify available options for the CLI
|
19
|
+
parser = OptionParser.new do |opts|
|
20
|
+
opts.banner = 'Usage: docx2gfm [options]'
|
21
|
+
|
22
|
+
opts.on('-f', '--file FILE', '(required) The .docx file to convert to markdown') do |v|
|
23
|
+
options[:file] = v
|
24
|
+
end
|
25
|
+
opts.on('-j', '--[no-]jekyll', '(optional) Prefix the markdown output with a jekyll frontmatter. Default: --jekyll') do |v|
|
26
|
+
options[:jekyll] = v
|
27
|
+
end
|
28
|
+
opts.on('-r', '--[no-]ref-style-links', '(optional) Create reference style links at the end of the markdown. Default: --ref-style-links') do |v|
|
29
|
+
options[:ref_style_links] = v
|
30
|
+
end
|
31
|
+
opts.on('-h', '--help', 'Display this help screen') do
|
32
|
+
puts opts
|
33
|
+
exit
|
34
|
+
end
|
35
|
+
end
|
36
|
+
|
37
|
+
# most useful way of creating a required parameter with OptionParser
|
38
|
+
# https://stackoverflow.com/questions/1541294/how-do-you-specify-a-required-switch-not-argument-with-ruby-optionparser/1542658#1542658
|
39
|
+
begin
|
40
|
+
parser.parse!
|
41
|
+
mandatory = [:file]
|
42
|
+
missing = mandatory.select{ |param| options[param].nil? }
|
43
|
+
raise OptionParser::MissingArgument, missing.join(', ') unless missing.empty?
|
44
|
+
rescue OptionParser::ParseError => e
|
45
|
+
puts e
|
46
|
+
puts parser
|
47
|
+
exit 1
|
48
|
+
end
|
49
|
+
|
50
|
+
# pass on options to the Docx2Gfm Converter, and run the conversion
|
51
|
+
doc = DocxGfmConverter.new(options)
|
52
|
+
doc.process_markdown
|
53
|
+
puts doc
|
54
|
+
|
55
|
+
end #run
|
56
|
+
|
57
|
+
end #class
|
58
|
+
|
59
|
+
end # module
|
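The option handling in `lib/docx2gfm.rb` can be tried out in isolation with a sketch like the following. It assumes the same three switches as the runner, but parses an explicit array instead of the process arguments so that it can be called from a console or a test. The method name `parse_cli` is hypothetical and is not part of the gem.

```ruby
require 'optparse'

# Hypothetical helper (not part of docx2gfm): parse an explicit argv array
# and return the options hash, raising if the required --file is missing.
def parse_cli(argv)
  # Defaults mirror the runner: both optional features are switched on.
  options = { jekyll: true, ref_style_links: true }

  parser = OptionParser.new do |opts|
    opts.banner = 'Usage: docx2gfm [options]'
    opts.on('-f', '--file FILE', 'The .docx file to convert') { |v| options[:file] = v }
    # "--[no-]name" switches yield true for --name and false for --no-name.
    opts.on('-j', '--[no-]jekyll', 'Prefix output with front matter') { |v| options[:jekyll] = v }
    opts.on('-r', '--[no-]ref-style-links', 'Use reference-style links') { |v| options[:ref_style_links] = v }
  end

  # parse (without the bang) leaves the caller's array unmodified.
  parser.parse(argv)

  # OptionParser has no built-in notion of a required switch, so check by hand.
  raise OptionParser::MissingArgument, 'file' if options[:file].nil?

  options
end

p parse_cli(['-f', 'my_post.docx', '--no-jekyll'])
```

Raising instead of calling `exit` inside the helper keeps the decision about the exit status with the caller.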
data/lib/docx2gfm/docx_gfm_converter.rb
ADDED
@@ -0,0 +1,117 @@
|
|
1
|
+
class DocxGfmConverter
|
2
|
+
attr_accessor :options, :content
|
3
|
+
|
4
|
+
def initialize(options)
|
5
|
+
@options = options
|
6
|
+
end
|
7
|
+
|
8
|
+
# perform all conversion and cleanup steps
|
9
|
+
def process_gfm()
|
10
|
+
docx_2_gfm(@options[:file])
|
11
|
+
cleanup_content_gfm()
|
12
|
+
create_ref_style_links() if @options[:ref_style_links]
|
13
|
+
add_frontmatter() if @options[:jekyll]
|
14
|
+
end
|
15
|
+
|
16
|
+
def process_markdown()
|
17
|
+
docx_2_markdown(@options[:file])
|
18
|
+
cleanup_content_markdown()
|
19
|
+
create_ref_style_links() if @options[:ref_style_links]
|
20
|
+
add_frontmatter() if @options[:jekyll]
|
21
|
+
end
|
22
|
+
|
23
|
+
# output this document (i.e. the markdown content)
|
24
|
+
def to_s
|
25
|
+
@content
|
26
|
+
end
|
27
|
+
|
28
|
+
# convert docx to initial markdown
|
29
|
+
def docx_2_gfm(file)
|
30
|
+
# TODO before reading the file, I could check if the file exists
|
31
|
+
# TODO check out pandoc options that might be useful e.g. --extract-media='/images/own/'
|
32
|
+
@content = IO.popen(['pandoc', file, '-f', 'docx', '-t', 'gfm', '--wrap=none'], &:read)
|
33
|
+
end
|
34
|
+
|
35
|
+
def docx_2_markdown(file)
|
36
|
+
# TODO before reading the file, I could check if the file exists
|
37
|
+
# TODO check out pandoc options that might be useful e.g. --extract-media='/images/own/'
|
38
|
+
@content = IO.popen(['pandoc', file, '--wrap=none', '--atx-headers', '-f', 'docx', '-t', 'markdown-bracketed_spans-link_attributes-smart-simple_tables', '-s'], &:read)
|
39
|
+
end
|
40
|
+
|
41
|
+
# this removes all sorts of strange stuff that pandoc generates when
|
42
|
+
# converting a .docx exported from Google Docs into GFM
|
43
|
+
def cleanup_content_gfm()
|
44
|
+
# remove escaping in front of exclamation marks
|
45
|
+
@content = @content.gsub(/\\!/, '!')
|
46
|
+
|
47
|
+
# remove underlining of anchors. Anchors are styled by the markdown renderer, so no need to add any explicit formatting here pandoc!
|
48
|
+
# example: [<span class="underline">In mattis lectus</span>](https://spier.hu) => [In mattis lectus](https://spier.hu)
|
49
|
+
@content = @content.gsub(/\[<span class="underline">(.*?)<\/span>\]/m, '[\1]')
|
50
|
+
|
51
|
+
# remove underlining of regular text (not anchors), as markdown has no underline syntax
|
52
|
+
# example: <span class="underline">Cras ac lectus quis</span> => Cras ac lectus quis
|
53
|
+
# Underlining text is not possible??? ok, so I could spit out a warning here, as the author used a formatting feature that our blog does not support
|
54
|
+
@content = @content.gsub(/<span class="underline">(.*?)<\/span>/m, '\1')
|
55
|
+
|
56
|
+
# fix unordered lists
|
57
|
+
@content = @content.gsub(/^(\s*)- > /, '\1- ')
|
58
|
+
@content = @content.gsub(/^(\s*)> /, '\1 ')
|
59
|
+
|
60
|
+
# fix ordered lists
|
61
|
+
@content = @content.gsub(/^(\d+\.) > /, '\1 ')
|
62
|
+
|
63
|
+
# remove `<!-- end list -->`
|
64
|
+
# See http://pandoc.org/MANUAL.html => "Ending a list"
|
65
|
+
@content = @content.gsub(/<!-- end list -->/,'')
|
66
|
+
end
|
67
|
+
|
68
|
+
def cleanup_content_markdown()
|
69
|
+
# remove underlining from links
|
70
|
+
@content = @content.gsub(/\[<span class="underline">(.*?)<\/span>\]/m, '[\1]')
|
71
|
+
|
72
|
+
# remove underlining from all other text (and print a warning)
|
73
|
+
@content = @content.gsub(/<span class="underline">(.*?)<\/span>/m) do |match|
|
74
|
+
STDERR.puts "Underline is not supported in markdown. Removing underlining from '#{$1}'."
|
75
|
+
$1
|
76
|
+
end
|
77
|
+
|
78
|
+
# fix lists - remove unnecessary spacing before list items
|
79
|
+
# 1. Numbered lists are great
|
80
|
+
# - And even more bullets
|
81
|
+
@content = @content.gsub(/^(\s*)(-|\d+\.)\s+(\S)/, '\1\2 \3')
|
82
|
+
|
83
|
+
# fix spacing in front of reference links
|
84
|
+
@content = @content.gsub(/^ +(\[.+?\]:)/, '\1')
|
85
|
+
end
|
86
|
+
|
87
|
+
def add_frontmatter()
|
88
|
+
asset_file = File.join(File.dirname(__FILE__), '/assets/front-matter.md')
|
89
|
+
front_matter = File.read(asset_file)
|
90
|
+
@content = front_matter + "\n" + @content
|
91
|
+
end
|
92
|
+
|
93
|
+
def clean_link_placeholder(text)
|
94
|
+
text.downcase.gsub(/\s/,'-')
|
95
|
+
end
|
96
|
+
|
97
|
+
def create_ref_style_links()
|
98
|
+
# matcher = content.scan(/[^!]\[(?<text>.*?)\]\((?<url>.*?)\)/)
|
99
|
+
# TODO using named groups below would be more descriptive. Need to figure out how.
|
100
|
+
link_dictionary = {}
|
101
|
+
|
102
|
+
@content.gsub!(/([^!])\[(.*?)\]\((.*?)\)/) do |match|
|
103
|
+
cleaned_link_placeholder = clean_link_placeholder($2)
|
104
|
+
unless link_dictionary.key?($3)
|
105
|
+
link_dictionary[$3] = cleaned_link_placeholder
|
106
|
+
end
|
107
|
+
"#{$1}[#{$2}][#{link_dictionary[$3]}]"
|
108
|
+
end
|
109
|
+
|
110
|
+
# add link references to the end of the content
|
111
|
+
@content += "\n"
|
112
|
+
link_dictionary.each_pair do |url, label|
|
113
|
+
@content += "[#{label}]: #{url}\n"
|
114
|
+
end
|
115
|
+
end
|
116
|
+
|
117
|
+
end #class
|
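The TODO in `create_ref_style_links` mentions named groups. One possible shape for that, as a standalone sketch rather than the gem's actual code, is shown below. The constant `INLINE_LINK` and the method `to_reference_links` are hypothetical names. As in the original, the first link text seen for a URL becomes its label, and image links (preceded by `!`) are left alone.

```ruby
# Hypothetical standalone version of the inline-to-reference link rewrite,
# using named capture groups instead of $1, $2 and $3.
INLINE_LINK = /(?<prefix>[^!])\[(?<text>.*?)\]\((?<url>.*?)\)/

def to_reference_links(markdown)
  labels = {} # url => label; the first link text seen for a URL wins

  body = markdown.gsub(INLINE_LINK) do
    m = Regexp.last_match
    # Build the label as the converter does: lowercase, whitespace to dashes.
    labels[m[:url]] ||= m[:text].downcase.gsub(/\s/, '-')
    "#{m[:prefix]}[#{m[:text]}][#{labels[m[:url]]}]"
  end

  # Append the collected reference definitions after a blank line.
  references = labels.map { |url, label| "[#{label}]: #{url}" }
  ([body, ''] + references).join("\n")
end

puts to_reference_links('See the [github repo again](https://github.com/meltwater/docx2gfm) now.')
```

Because the pattern requires one non-`!` character before the opening bracket, a link at the very start of the input is not rewritten. The sketch keeps this limitation of the original expression.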
metadata
ADDED
@@ -0,0 +1,91 @@
|
|
1
|
+
--- !ruby/object:Gem::Specification
|
2
|
+
name: docx2gfm
|
3
|
+
version: !ruby/object:Gem::Version
|
4
|
+
version: 0.2.0
|
5
|
+
platform: ruby
|
6
|
+
authors:
|
7
|
+
- Sebastian Spier
|
8
|
+
autorequire:
|
9
|
+
bindir: bin
|
10
|
+
cert_chain: []
|
11
|
+
date: 2019-11-17 00:00:00.000000000 Z
|
12
|
+
dependencies:
|
13
|
+
- !ruby/object:Gem::Dependency
|
14
|
+
name: bundler
|
15
|
+
requirement: !ruby/object:Gem::Requirement
|
16
|
+
requirements:
|
17
|
+
- - "~>"
|
18
|
+
- !ruby/object:Gem::Version
|
19
|
+
version: '1.16'
|
20
|
+
type: :development
|
21
|
+
prerelease: false
|
22
|
+
version_requirements: !ruby/object:Gem::Requirement
|
23
|
+
requirements:
|
24
|
+
- - "~>"
|
25
|
+
- !ruby/object:Gem::Version
|
26
|
+
version: '1.16'
|
27
|
+
- !ruby/object:Gem::Dependency
|
28
|
+
name: rake
|
29
|
+
requirement: !ruby/object:Gem::Requirement
|
30
|
+
requirements:
|
31
|
+
- - "~>"
|
32
|
+
- !ruby/object:Gem::Version
|
33
|
+
version: '10.0'
|
34
|
+
type: :development
|
35
|
+
prerelease: false
|
36
|
+
version_requirements: !ruby/object:Gem::Requirement
|
37
|
+
requirements:
|
38
|
+
- - "~>"
|
39
|
+
- !ruby/object:Gem::Version
|
40
|
+
version: '10.0'
|
41
|
+
description: Convert a docx file, to github-flavored-markdown. thin wrapper around
|
42
|
+
pandoc.
|
43
|
+
email:
|
44
|
+
- github@spier.hu
|
45
|
+
executables:
|
46
|
+
- docx2gfm
|
47
|
+
extensions: []
|
48
|
+
extra_rdoc_files: []
|
49
|
+
files:
|
50
|
+
- ".gitignore"
|
51
|
+
- Gemfile
|
52
|
+
- LICENSE.txt
|
53
|
+
- MOTIVATION.md
|
54
|
+
- README.md
|
55
|
+
- Rakefile
|
56
|
+
- bin/console
|
57
|
+
- bin/docx2gfm
|
58
|
+
- bin/setup
|
59
|
+
- docx2gfm.gemspec
|
60
|
+
- examples/sample-pure-pandoc.md
|
61
|
+
- examples/sample.docx
|
62
|
+
- examples/sample.md
|
63
|
+
- lib/docx2gfm.rb
|
64
|
+
- lib/docx2gfm/assets/front-matter.md
|
65
|
+
- lib/docx2gfm/docx_gfm_converter.rb
|
66
|
+
- lib/docx2gfm/version.rb
|
67
|
+
homepage: https://github.com/spier/docx2gfm
|
68
|
+
licenses:
|
69
|
+
- MIT
|
70
|
+
metadata:
|
71
|
+
allowed_push_host: https://rubygems.org
|
72
|
+
post_install_message:
|
73
|
+
rdoc_options: []
|
74
|
+
require_paths:
|
75
|
+
- lib
|
76
|
+
required_ruby_version: !ruby/object:Gem::Requirement
|
77
|
+
requirements:
|
78
|
+
- - ">="
|
79
|
+
- !ruby/object:Gem::Version
|
80
|
+
version: '0'
|
81
|
+
required_rubygems_version: !ruby/object:Gem::Requirement
|
82
|
+
requirements:
|
83
|
+
- - ">="
|
84
|
+
- !ruby/object:Gem::Version
|
85
|
+
version: '0'
|
86
|
+
requirements: []
|
87
|
+
rubygems_version: 3.0.3
|
88
|
+
signing_key:
|
89
|
+
specification_version: 4
|
90
|
+
summary: Convert a docx file, to github-flavored-markdown
|
91
|
+
test_files: []
|