docx2gfm 0.2.0

checksums.yaml.gz ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: d9a2da3b4b09ea75ecb7523a3a01e0b275237239acea98dbc20c1371afa8a3ea
4
+ data.tar.gz: 35624ce545980e73a9f931be6f94d5fb7e2c642a8854d2c75d1041cc9d3cb9d4
5
+ SHA512:
6
+ metadata.gz: '0861e90f7edda3bb08c31f7e5963e3ed7db5866f9bb2e61787aa40a0ee2b48c3ca2d00dab1e44411df635004d2796f1025456979fb7ad46e291d0aec72ad88af'
7
+ data.tar.gz: e7a5df12dc792018f6c4ff469289e8d5d16cfc4feccfc62a2b507af11163778616ff0c3632f4a9cd398486abb26673abbb78df571c8851d76bd87e7b1d4684e7
data/.gitignore ADDED
@@ -0,0 +1,10 @@
1
+ Gemfile.lock
2
+
3
+ /.bundle/
4
+ /.yardoc
5
+ /_yardoc/
6
+ /coverage/
7
+ /doc/
8
+ /pkg/
9
+ /spec/reports/
10
+ /tmp/
data/Gemfile ADDED
@@ -0,0 +1,6 @@
1
+ source 'https://rubygems.org'
2
+
3
+ git_source(:github) {|repo_name| "https://github.com/#{repo_name}" }
4
+
5
+ # Specify your gem's dependencies in docx2gfm.gemspec
6
+ gemspec
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2019 Sebastian Spier
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/MOTIVATION.md ADDED
@@ -0,0 +1,19 @@
1
+ # Motivation
2
+
3
+ docx2gfm is a wrapper around [pandoc][pandoc], which does the heavy lifting of the docx to markdown conversion. So why not use pandoc directly, you may ask?
4
+
5
+ The best pandoc configuration that I could find so far is this:
6
+
7
+ ```
8
+ pandoc examples/sample.docx --wrap=none --atx-headers --reference-links -f docx -t markdown-bracketed_spans-link_attributes-smart-simple_tables -s
9
+ ```
10
+
11
+ It produces the markdown output as shown in [examples/sample-pure-pandoc.md](./examples/sample-pure-pandoc.md).
12
+
13
+ While this is pretty good already, this markdown has the following shortcomings:
14
+
15
+ * lists have superfluous spaces before each list item
16
+ * underlines are rendered as HTML, e.g. `<span class="underline">In mattis lectus</span>`. One could strip the HTML with something like `sed -e 's/<[^>]*>//g'`, but that would also remove the HTML placeholders for the images, which are worth keeping.
17
+ * the reference links at the end of the file are hard to read, as their labels still contain the underline `<span>` markup
18
+
19
+ [pandoc]: https://pandoc.org/installing.html
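MOTIVATION.md rejects a blanket `sed` because it also deletes the image placeholders. A minimal Ruby sketch of the narrower alternative, modeled on the regexes in `DocxGfmConverter#cleanup_content_markdown` further down in this diff: unwrap only the underline spans and collapse the extra spaces pandoc puts after list markers, leaving `<img>` tags alone. The helper name `tidy_pandoc_markdown` and the sample input are illustrative assumptions, not part of the gem.

```ruby
# Hypothetical helper (not part of docx2gfm), modeled on the regexes in
# DocxGfmConverter#cleanup_content_markdown.
def tidy_pandoc_markdown(markdown)
  # Unwrap underline spans but keep their text. Unlike a blanket
  # sed 's/<[^>]*>//g', this leaves other HTML such as <img> tags intact.
  text = markdown.gsub(%r{<span class="underline">(.*?)</span>}m, '\1')
  # Collapse the run of spaces pandoc puts after "-" and "1." list markers,
  # keeping the leading indentation of nested items.
  text.gsub(/^(\s*)(-|\d+\.)\s+(\S)/, '\1\2 \3')
end

sample = <<~MD
  -   Bullet A
      -   Bullet A1
  1.  Numbered lists are great
  <span class="underline">Cras ac lectus quis</span> <img src="media/image1.jpg" />
MD

puts tidy_pandoc_markdown(sample)
```

Targeting the one tag that causes trouble keeps the image placeholders available for the manual finishing touches listed in README.md.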
data/README.md ADDED
@@ -0,0 +1,82 @@
1
+ # docx2gfm - docx to github-flavored-markdown converter
2
+
3
+ If you need to convert `.docx` documents to markdown, then `docx2gfm` is for you, as it makes the process faster.
4
+
5
+ "Don't tell me, show me!" OK, OK! `docx2gfm` turns [this docx file](./examples/sample.docx) into [this markdown](./examples/sample.md). Also see the original [Google Doc][gDoc].
6
+
7
+ Some post-processing of the markdown is still required, but `docx2gfm` already makes the conversion process much faster.
8
+
9
+ ## The Long Story
10
+
11
+ I maintain an [engineering blog][uth] that uses [jekyll][jekyll] to generate static pages.
12
+
13
+ In our blogging process, authors write each blog post as a Google Doc to collect feedback. Once the post is ready for publishing, they convert the Google Doc to [github-flavored-markdown][gfm], as that is what [jekyll][jekyll] needs as input to render the HTML for the blog.
14
+
15
+ We used to do this conversion step manually. This was tedious, boring, and in parts error-prone.
16
+
17
+ With `docx2gfm` you can do this conversion quickly, and have more time to write new blog posts ... or drink coffee :)
18
+
19
+ Technically, `docx2gfm` is a thin wrapper around [pandoc][pandoc]. In [MOTIVATION.md](./MOTIVATION.md) you can find more about the technical approach we chose.
20
+
21
+ ## Installation
22
+
23
+ - install Ruby
24
+ - install [pandoc][pandoc]
25
+ - install this gem: `gem install docx2gfm`
26
+
27
+ ## Usage
28
+
29
+ 1. download your Google Doc as a `.docx` file e.g. `my_post.docx` (File >> Download as >> Microsoft Word (.docx))
30
+ 1. convert docx to github-flavored-markdown:
31
+
32
+ ```
33
+ docx2gfm -f my_post.docx > my_post.md
34
+ ```
35
+
36
+ To learn more about the available options, please refer to the built-in help.
37
+
38
+ ```
39
+ $ docx2gfm -h
40
+
41
+ Usage: docx2gfm [options]
42
+ -f, --file FILE (required) The .docx file to convert to markdown
43
+ -j, --[no-]jekyll (optional) Prefix the markdown output with a jekyll frontmatter. Default: --jekyll
44
+ -r, --[no-]ref-style-links (optional) Create reference style links at the end of the markdown. Default: --ref-style-links
45
+ -h, --help Display this help screen
46
+ ```
47
+
48
+ ## Finishing touches for your markdown
49
+
50
+ The markdown produced by `docx2gfm` is good but not perfect. You still have to do some manual steps:
51
+
52
+ * Adapt the YAML Frontmatter (if you used the `--jekyll` option)
53
+ * Add the correct image links
54
+ * Add code blocks
55
+ * Add quotes
56
+ * Add tables
57
+
58
+ ## Alternatives to docx2gfm
59
+
60
+ * Word to Markdown Converter: [online](https://word-to-markdown.herokuapp.com/), [source](https://github.com/benbalter/word-to-markdown)
61
+ * [Writage](http://www.writage.com) - Markdown plugin for Microsoft Word
62
+
63
+ ## Development
64
+
65
+ After checking out the repo, run `bin/setup` to install dependencies. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
66
+
67
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
68
+
69
+ ## Contributing
70
+
71
+ `docx2gfm` is far from perfect.
72
+ Bug reports and pull requests are welcome on GitHub at [github.com/spier/docx2gfm](https://github.com/spier/docx2gfm).
73
+
74
+ ## License
75
+
76
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
77
+
78
+ [uth]: https://underthehood.meltwater.com/
79
+ [gfm]: https://guides.github.com/features/mastering-markdown/
80
+ [gDoc]: https://docs.google.com/document/d/16Kww2ic-YgFKskfDxYJu6o_ooSF3IORJh8Ho7XbgngI/edit
81
+ [pandoc]: https://pandoc.org/installing.html
82
+ [jekyll]: https://jekyllrb.com
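README.md lists "Add the correct image links" among the manual finishing touches: pandoc leaves images as `<img src="media/...">` tags, as seen in `examples/sample.md`. A minimal Ruby sketch of automating part of that step, assuming the image files end up in one blog directory. The directory `/images/own/post_directory` is borrowed from the front-matter template's placeholder, and `rewrite_image_links` is a hypothetical name.

```ruby
# Hypothetical post-processing step, not part of docx2gfm.
# Assumption: the images from the .docx are copied to IMAGE_DIR on the blog.
IMAGE_DIR = "/images/own/post_directory".freeze

# Replace <img src="media/FILE" ... /> placeholders with markdown image links.
# Pandoc's style attribute (width/height in inches) is dropped; alt text stays empty.
def rewrite_image_links(markdown, image_dir = IMAGE_DIR)
  markdown.gsub(%r{<img\s+src="media/([^"]+)"[^>]*/>}) do
    "![](#{image_dir}/#{Regexp.last_match(1)})"
  end
end

puts rewrite_image_links('<img src="media/image1.jpg" style="width:6.27083in;height:2.3125in" />')
# prints: ![](/images/own/post_directory/image1.jpg)
```

This only rewrites the links; the current pandoc call does not extract the image files themselves (see the `--extract-media` TODO in `docx_gfm_converter.rb`).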
data/Rakefile ADDED
@@ -0,0 +1,2 @@
1
+ require "bundler/gem_tasks"
2
+ task :default => :spec
data/bin/console ADDED
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "bundler/setup"
4
+ require "docx2gfm"
5
+
6
+ # You can add fixtures and/or initialization code here to make experimenting
7
+ # with your gem easier. You can also use a different console, if you like.
8
+
9
+ # (If you use this, don't forget to add pry to your Gemfile!)
10
+ # require "pry"
11
+ # Pry.start
12
+
13
+ require "irb"
14
+ IRB.start(__FILE__)
data/bin/docx2gfm ADDED
@@ -0,0 +1,4 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require 'docx2gfm'
4
+ Docx2gfm::Runner.run
data/bin/setup ADDED
@@ -0,0 +1,8 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+ set -vx
5
+
6
+ bundle install
7
+
8
+ # Do any other automated setup that you need to do here
data/docx2gfm.gemspec ADDED
@@ -0,0 +1,36 @@
1
+
2
+ lib = File.expand_path("../lib", __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
+ require "docx2gfm/version"
5
+
6
+ Gem::Specification.new do |spec|
7
+ spec.name = "docx2gfm"
8
+ spec.version = Docx2gfm::VERSION
9
+ spec.authors = ["Sebastian Spier"]
10
+ spec.email = ["github@spier.hu"]
11
+
12
+ spec.summary = "Convert a docx file, to github-flavored-markdown"
13
+ spec.description = "Convert a docx file, to github-flavored-markdown. thin wrapper around pandoc."
14
+ spec.homepage = "https://github.com/spier/docx2gfm"
15
+ spec.license = "MIT"
16
+
17
+ # Restrict pushes to RubyGems.org. To push to another single host, change 'allowed_push_host';
18
+ # delete this section to allow pushing to any host.
19
+ if spec.respond_to?(:metadata)
20
+ spec.metadata["allowed_push_host"] = "https://rubygems.org"
21
+ else
22
+ raise "RubyGems 2.0 or newer is required to protect against " \
23
+ "public gem pushes."
24
+ end
25
+
26
+ spec.files = `git ls-files -z`.split("\x0").reject do |f|
27
+ f.match(%r{^(test|spec|features)/})
28
+ end
29
+ spec.bindir = "bin"
30
+ spec.require_paths = ["lib"]
31
+
32
+ spec.executables << "docx2gfm"
33
+
34
+ spec.add_development_dependency "bundler", "~> 1.16"
35
+ spec.add_development_dependency "rake", "~> 10.0"
36
+ end
data/examples/sample-pure-pandoc.md ADDED
@@ -0,0 +1,103 @@
1
+ ---
2
+ title: '<span id="_iq8c5s4280y5" class="anchor"></span>A Google Doc (docx) for testing the markdown conversation with docx2gfm <img src="media/image1.jpg" style="width:1.77604in;height:1.98311in" />'
3
+ ---
4
+
5
+ Nulla ante dui, efficitur ut accumsan id, imperdiet ac urna. Duis nec eros non ex posuere scelerisque. Duis non dui quam. Vivamus pretium pretium lacus sit amet volutpat. In sollicitudin massa euismod, consectetur est in, malesuada sem. Pellentesque ullamcorper ligula blandit lacinia cursus. Nunc sit amet quam dapibus, blandit lacus in, pellentesque lacus. Morbi varius est sapien, vel imperdiet turpis varius vitae. Sed laoreet eu magna vel dictum. Nullam eget iaculis nisl, et congue orci. Curabitur hendrerit fermentum sapien fringilla vehicula. Ut rutrum pretium ligula in accumsan. Donec sed facilisis justo.
6
+
7
+ ## What is this
8
+
9
+ The document that you are looking at was produced by exporting [<span class="underline">this Google doc</span>] as a .docx file, and then running it through this experimental [<span class="underline">docx2gfm</span>] converter.
10
+
11
+ Let's link to the [<span class="underline">github repo again</span>][<span class="underline">docx2gfm</span>] but with a different link name.
12
+
13
+ ## Emphasize section
14
+
15
+ We **may be Bolder**, we *may be Italian* (ouch!), we ~~may be on Strike~~, or <span class="underline">under the line</span>.
16
+
17
+ Not all of these work in [<span class="underline">github-flavored-markdown</span>], but that is ok. Remember it is a markup language, not a [<span class="underline">text formatting language</span>].
18
+
19
+ ## List section
20
+
21
+ - Nam leo ipsum, commodo ut quam in, ultrices dictum nibh. Donec vehicula velit mi, a vehicula ante bibendum id. Duis sit amet pellentesque nibh. Nunc pretium, enim in sodales sollicitudin, sem nisi porta leo, a tempor neque augue a felis.
22
+
23
+ - With bullets
24
+
25
+ - Mauris nec tempus enim, a porttitor leo. Sed eu varius elit. Cras aliquam felis non ante porta maximus. [<span class="underline">Mauris</span>] quis aliquam diam, ac vestibulum odio. Etiam et commodo tellus.
26
+
27
+ 1. Numbered lists are great
28
+
29
+ 2. If one comes after the …
30
+
31
+ 3. … other
32
+
33
+ - Bullet A
34
+
35
+ - Bullet A1
36
+
37
+ - Bullet A2
38
+
39
+ - Bullet B
40
+
41
+ - Bullet B1
42
+
43
+ - Bullet B2
44
+
45
+ - Bullet C
46
+
47
+ - Bullet C1
48
+
49
+ - Bullet C1a
50
+
51
+ ## An image
52
+
53
+ ## Headline 1<img src="media/image1.jpg" style="width:6.27083in;height:2.3125in" />
54
+
55
+ [<span class="underline">Lorem ipsum dolor sit amet</span>], consectetur adipiscing elit. Sed cursus laoreet leo, non tempor libero malesuada a. Donec commodo tempor neque, vitae tempor nibh. Vestibulum scelerisque purus ipsum, ac efficitur ipsum feugiat sit amet. Praesent luctus tincidunt ante at cursus. Etiam dignissim aliquet lacus, eu sodales nunc porttitor in. Nunc ut luctus purus. Donec eu vehicula enim. Vivamus eu hendrerit velit. Vivamus pellentesque facilisis ex molestie cursus. Fusce egestas tellus quis tortor cursus imperdiet. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae; Donec facilisis ipsum nulla, eget dapibus lectus interdum vel. Nam leo ipsum, commodo ut quam in, ultrices dictum nibh. Donec vehicula velit mi, a vehicula ante bibendum id. Duis sit amet pellentesque nibh. Nunc pretium, enim in sodales sollicitudin, sem nisi porta leo, a tempor neque augue a felis.
56
+
57
+ - A list
58
+
59
+ - With bullets
60
+
61
+ - And even more bullets
62
+
63
+ ## Headline 2
64
+
65
+ Nullam vehicula ac dolor ac ultrices. Integer lacinia urna eu vestibulum tempor. Ut lacus ante, scelerisque consectetur odio blandit, vulputate dapibus nunc. Curabitur mi ex, ullamcorper sit amet diam ac, blandit consectetur tellus. Pellentesque ut eros sit amet enim consequat fermentum at quis orci. In nunc eros, vestibulum mollis posuere non, venenatis a justo. Suspendisse tincidunt, mi vel tincidunt auctor, lacus sem hendrerit ex, eget maximus mi risus eu felis. Mauris nec tempus enim, a porttitor leo. Sed eu varius elit. Cras aliquam felis non ante porta maximus. [<span class="underline">Mauris</span>] quis aliquam diam, ac vestibulum odio. Etiam et commodo tellus.
66
+
67
+ 1. Numbered lists are great
68
+
69
+ 2. If one comes after the …
70
+
71
+ 3. … other
72
+
73
+ **Ut bibendum turpis ex**, id consequat elit euismod et. Nam lectus arcu, pharetra tincidunt iaculis a, interdum vitae arcu. <span class="underline">Cras ac lectus quis</span> risus laoreet scelerisque. Curabitur metus ex, sagittis sit amet quam vitae, consequat sodales velit. Nulla quis ex urna. Integer et tortor odio. Curabitur euismod feugiat mollis. *Nullam vestibulum tempus feugiat*. Vestibulum porta aliquam mauris non aliquet. Sed aliquam erat ac bibendum lacinia. Nam iaculis ornare lorem, sed consectetur dolor.
74
+
75
+ [<span class="underline">In mattis lectus</span>] accumsan diam accumsan, in bibendum justo fringilla. Aenean lacinia aliquam ligula vel semper. Phasellus purus ipsum, condimentum in tellus nec, finibus consectetur elit. Nam fringilla, diam id dignissim sodales, magna elit tempor arcu, at pharetra nisi metus eget purus. Donec mollis ac leo in aliquet. Proin congue congue diam, ac euismod est blandit non. Aliquam ac efficitur eros. Mauris mollis commodo fermentum. Donec mattis sit amet risus nec vestibulum. Donec vel sollicitudixn dolor. Aliquam aliquet neque ac augue consequat porta. Praesent tristique lobortis tincidunt. Etiam dapibus consequat fringilla. Etiam ac orci a nunc rutrum finibus eget a turpis. Duis vitae venenatis magna, ut vulputate turpis. Nunc fringilla non tellus at egestas.
76
+
77
+ ## Table Tests
78
+
79
+ | **TH 1** | **TH 2** | **TH 3** |
80
+ |----------|----------|----------|
81
+ | TD 1 | TD 2 | TD 3 |
82
+ | TD 4 | TD 5 | TD 6 |
83
+ | TD 7 | TD 8 | TD 9 |
84
+
85
+ ## Quote!
86
+
87
+ Every once in a while you need a great quote. To do so, write your quote and then move the indentation level in the Google Doc to the right, as shown below:
88
+
89
+ > Blogging is great for your charma.
90
+ >
91
+ > \- Sebastian Spier (2019)
92
+
93
+ ## Headline 3
94
+
95
+ In closing: Goodbye!
96
+
97
+ [<span class="underline">this Google doc</span>]: https://docs.google.com/document/d/1oKGYVORih0GNC1CZHKv0d2IirCtcgMu0O1sifTfH5zo/edit#
98
+ [<span class="underline">docx2gfm</span>]: https://github.com/meltwater/docx2gfm
99
+ [<span class="underline">github-flavored-markdown</span>]: https://help.github.com/articles/basic-writing-and-formatting-syntax/
100
+ [<span class="underline">text formatting language</span>]: https://softwareengineering.stackexchange.com/questions/207727/why-there-is-no-markdown-for-underline
101
+ [<span class="underline">Mauris</span>]: https://underthehood.meltwater.com
102
+ [<span class="underline">Lorem ipsum dolor sit amet</span>]: https://loremipsum.io/
103
+ [<span class="underline">In mattis lectus</span>]: https://spier.hu
data/examples/sample.docx ADDED
Binary file
data/examples/sample.md ADDED
@@ -0,0 +1,112 @@
1
+ ---
2
+ layout: post
3
+ title: "YOUR POST TITLE"
4
+ comments: true
5
+ categories: [tag1, tag2, multi word tag]
6
+ author: <a href="link to twitter, personal blog, linkedin, etc">YOUR NAME</a>
7
+ image: "/images/own/post_directory/logo_image_NOT_SVG_FORMAT.png"
8
+ ---
9
+
10
+ ---
11
+ title: '<span id="_iq8c5s4280y5" class="anchor"></span>A Google Doc (docx) for testing the markdown conversation with docx2gfm <img src="media/image1.jpg" style="width:1.77604in;height:1.98311in" />'
12
+ ---
13
+
14
+ Nulla ante dui, efficitur ut accumsan id, imperdiet ac urna. Duis nec eros non ex posuere scelerisque. Duis non dui quam. Vivamus pretium pretium lacus sit amet volutpat. In sollicitudin massa euismod, consectetur est in, malesuada sem. Pellentesque ullamcorper ligula blandit lacinia cursus. Nunc sit amet quam dapibus, blandit lacus in, pellentesque lacus. Morbi varius est sapien, vel imperdiet turpis varius vitae. Sed laoreet eu magna vel dictum. Nullam eget iaculis nisl, et congue orci. Curabitur hendrerit fermentum sapien fringilla vehicula. Ut rutrum pretium ligula in accumsan. Donec sed facilisis justo.
15
+
16
+ ## What is this
17
+
18
+ The document that you are looking at was produced by exporting [this Google doc][this-google-doc] as a .docx file, and then running it through this experimental [docx2gfm][docx2gfm] converter.
19
+
20
+ Let's link to the [github repo again][docx2gfm] but with a different link name.
21
+
22
+ ## Emphasize section
23
+
24
+ We **may be Bolder**, we *may be Italian* (ouch!), we ~~may be on Strike~~, or under the line.
25
+
26
+ Not all of these work in [github-flavored-markdown][github-flavored-markdown], but that is ok. Remember it is a markup language, not a [text formatting language][text-formatting-language].
27
+
28
+ ## List section
29
+
30
+ - Nam leo ipsum, commodo ut quam in, ultrices dictum nibh. Donec vehicula velit mi, a vehicula ante bibendum id. Duis sit amet pellentesque nibh. Nunc pretium, enim in sodales sollicitudin, sem nisi porta leo, a tempor neque augue a felis.
31
+
32
+ - With bullets
33
+
34
+ - Mauris nec tempus enim, a porttitor leo. Sed eu varius elit. Cras aliquam felis non ante porta maximus. [Mauris][mauris] quis aliquam diam, ac vestibulum odio. Etiam et commodo tellus.
35
+
36
+ 1. Numbered lists are great
37
+
38
+ 2. If one comes after the …
39
+
40
+ 3. … other
41
+
42
+ - Bullet A
43
+
44
+ - Bullet A1
45
+
46
+ - Bullet A2
47
+
48
+ - Bullet B
49
+
50
+ - Bullet B1
51
+
52
+ - Bullet B2
53
+
54
+ - Bullet C
55
+
56
+ - Bullet C1
57
+
58
+ - Bullet C1a
59
+
60
+ ## An image
61
+
62
+ ## Headline 1<img src="media/image1.jpg" style="width:6.27083in;height:2.3125in" />
63
+
64
+ [Lorem ipsum dolor sit amet][lorem-ipsum-dolor-sit-amet], consectetur adipiscing elit. Sed cursus laoreet leo, non tempor libero malesuada a. Donec commodo tempor neque, vitae tempor nibh. Vestibulum scelerisque purus ipsum, ac efficitur ipsum feugiat sit amet. Praesent luctus tincidunt ante at cursus. Etiam dignissim aliquet lacus, eu sodales nunc porttitor in. Nunc ut luctus purus. Donec eu vehicula enim. Vivamus eu hendrerit velit. Vivamus pellentesque facilisis ex molestie cursus. Fusce egestas tellus quis tortor cursus imperdiet. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae; Donec facilisis ipsum nulla, eget dapibus lectus interdum vel. Nam leo ipsum, commodo ut quam in, ultrices dictum nibh. Donec vehicula velit mi, a vehicula ante bibendum id. Duis sit amet pellentesque nibh. Nunc pretium, enim in sodales sollicitudin, sem nisi porta leo, a tempor neque augue a felis.
65
+
66
+ - A list
67
+
68
+ - With bullets
69
+
70
+ - And even more bullets
71
+
72
+ ## Headline 2
73
+
74
+ Nullam vehicula ac dolor ac ultrices. Integer lacinia urna eu vestibulum tempor. Ut lacus ante, scelerisque consectetur odio blandit, vulputate dapibus nunc. Curabitur mi ex, ullamcorper sit amet diam ac, blandit consectetur tellus. Pellentesque ut eros sit amet enim consequat fermentum at quis orci. In nunc eros, vestibulum mollis posuere non, venenatis a justo. Suspendisse tincidunt, mi vel tincidunt auctor, lacus sem hendrerit ex, eget maximus mi risus eu felis. Mauris nec tempus enim, a porttitor leo. Sed eu varius elit. Cras aliquam felis non ante porta maximus. [Mauris][mauris] quis aliquam diam, ac vestibulum odio. Etiam et commodo tellus.
75
+
76
+ 1. Numbered lists are great
77
+
78
+ 2. If one comes after the …
79
+
80
+ 3. … other
81
+
82
+ **Ut bibendum turpis ex**, id consequat elit euismod et. Nam lectus arcu, pharetra tincidunt iaculis a, interdum vitae arcu. Cras ac lectus quis risus laoreet scelerisque. Curabitur metus ex, sagittis sit amet quam vitae, consequat sodales velit. Nulla quis ex urna. Integer et tortor odio. Curabitur euismod feugiat mollis. *Nullam vestibulum tempus feugiat*. Vestibulum porta aliquam mauris non aliquet. Sed aliquam erat ac bibendum lacinia. Nam iaculis ornare lorem, sed consectetur dolor.
83
+
84
+ [In mattis lectus][in-mattis-lectus] accumsan diam accumsan, in bibendum justo fringilla. Aenean lacinia aliquam ligula vel semper. Phasellus purus ipsum, condimentum in tellus nec, finibus consectetur elit. Nam fringilla, diam id dignissim sodales, magna elit tempor arcu, at pharetra nisi metus eget purus. Donec mollis ac leo in aliquet. Proin congue congue diam, ac euismod est blandit non. Aliquam ac efficitur eros. Mauris mollis commodo fermentum. Donec mattis sit amet risus nec vestibulum. Donec vel sollicitudixn dolor. Aliquam aliquet neque ac augue consequat porta. Praesent tristique lobortis tincidunt. Etiam dapibus consequat fringilla. Etiam ac orci a nunc rutrum finibus eget a turpis. Duis vitae venenatis magna, ut vulputate turpis. Nunc fringilla non tellus at egestas.
85
+
86
+ ## Table Tests
87
+
88
+ | **TH 1** | **TH 2** | **TH 3** |
89
+ |----------|----------|----------|
90
+ | TD 1 | TD 2 | TD 3 |
91
+ | TD 4 | TD 5 | TD 6 |
92
+ | TD 7 | TD 8 | TD 9 |
93
+
94
+ ## Quote!
95
+
96
+ Every once in a while you need a great quote. To do so, write your quote and then move the indentation level in the Google Doc to the right, as shown below:
97
+
98
+ > Blogging is great for your charma.
99
+ >
100
+ > \- Sebastian Spier (2019)
101
+
102
+ ## Headline 3
103
+
104
+ In closing: Goodbye!
105
+
106
+ [this-google-doc]: https://docs.google.com/document/d/1oKGYVORih0GNC1CZHKv0d2IirCtcgMu0O1sifTfH5zo/edit#
107
+ [docx2gfm]: https://github.com/meltwater/docx2gfm
108
+ [github-flavored-markdown]: https://help.github.com/articles/basic-writing-and-formatting-syntax/
109
+ [text-formatting-language]: https://softwareengineering.stackexchange.com/questions/207727/why-there-is-no-markdown-for-underline
110
+ [mauris]: https://underthehood.meltwater.com
111
+ [lorem-ipsum-dolor-sit-amet]: https://loremipsum.io/
112
+ [in-mattis-lectus]: https://spier.hu
data/lib/docx2gfm.rb ADDED
@@ -0,0 +1,59 @@
1
+ require 'docx2gfm/version'
2
+ require 'docx2gfm/docx_gfm_converter'
3
+
4
+ require 'pp'
5
+ require 'optparse'
6
+
7
+ module Docx2gfm
8
+
9
+ class Runner
10
+
11
+ def self.run
12
+
13
+ # set default values for options
14
+ options = {}
15
+ options[:jekyll] = true
16
+ options[:ref_style_links] = true
17
+
18
+ # specify available options for the CLI
19
+ parser = OptionParser.new do |opts|
20
+ opts.banner = 'Usage: docx2gfm [options]'
21
+
22
+ opts.on('-f', '--file FILE', '(required) The .docx file to convert to markdown') do |v|
23
+ options[:file] = v
24
+ end
25
+ opts.on('-j', '--[no-]jekyll', '(optional) Prefix the markdown output with a jekyll frontmatter. Default: --jekyll') do |v|
26
+ options[:jekyll] = v
27
+ end
28
+ opts.on('-r', '--[no-]ref-style-links', '(optional) Create reference style links at the end of the markdown. Default: --ref-style-links') do |v|
29
+ options[:ref_style_links] = v
30
+ end
31
+ opts.on('-h', '--help', 'Display this help screen') do
32
+ puts opts
33
+ exit
34
+ end
35
+ end
36
+
37
+ # most useful way of creating a required parameter with OptionParser
38
+ # https://stackoverflow.com/questions/1541294/how-do-you-specify-a-required-switch-not-argument-with-ruby-optionparser/1542658#1542658
39
+ begin
40
+ parser.parse!
41
+ mandatory = [:file]
42
+ missing = mandatory.select{ |param| options[param].nil? }
43
+ raise OptionParser::MissingArgument, missing.join(', ') unless missing.empty?
44
+ rescue OptionParser::ParseError => e
45
+ puts e
46
+ puts parser
47
+ exit 1
48
+ end
49
+
50
+ # pass on options to the Docx2Gfm Converter, and run the conversion
51
+ doc = DocxGfmConverter.new(options)
52
+ doc.process_markdown
53
+ puts doc
54
+
55
+ end #run
56
+
57
+ end #class
58
+
59
+ end # module
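`Runner.run` above relies on a common OptionParser idiom: OptionParser knows required *arguments* (`--file FILE`) but not required *switches*, so the presence of `--file` is checked after `parse!`. A self-contained sketch of the same idiom that parses an explicit array instead of `ARGV`, which makes it easy to exercise; `parse_cli` is a hypothetical name and the help texts are shortened.

```ruby
require "optparse"

# Hypothetical helper mirroring the option handling in Docx2gfm::Runner.run.
def parse_cli(argv)
  options = { jekyll: true, ref_style_links: true } # defaults, as in the gem
  parser = OptionParser.new do |opts|
    opts.banner = "Usage: docx2gfm [options]"
    opts.on("-f", "--file FILE", "The .docx file to convert") { |v| options[:file] = v }
    opts.on("-j", "--[no-]jekyll", "Prefix a jekyll front matter") { |v| options[:jekyll] = v }
    opts.on("-r", "--[no-]ref-style-links", "Reference style links") { |v| options[:ref_style_links] = v }
  end
  parser.parse!(argv.dup) # dup so the caller's array is not consumed

  # OptionParser has no mandatory switches, so check for them after parsing.
  missing = [:file].select { |key| options[key].nil? }
  raise OptionParser::MissingArgument, missing.join(", ") unless missing.empty?

  options
end

p parse_cli(["-f", "my_post.docx", "--no-jekyll"])
```

Because `OptionParser::MissingArgument` is a subclass of `OptionParser::ParseError`, a single `rescue OptionParser::ParseError` handles both a missing `--file` and malformed switches, which is what `Runner.run` does.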
data/lib/docx2gfm/assets/front-matter.md ADDED
@@ -0,0 +1,8 @@
1
+ ---
2
+ layout: post
3
+ title: "YOUR POST TITLE"
4
+ comments: true
5
+ categories: [tag1, tag2, multi word tag]
6
+ author: <a href="link to twitter, personal blog, linkedin, etc">YOUR NAME</a>
7
+ image: "/images/own/post_directory/logo_image_NOT_SVG_FORMAT.png"
8
+ ---
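When `--jekyll` is on (the default), `add_frontmatter` prepends this template verbatim, and README.md lists adapting it as a manual step. A minimal sketch of filling in just the title placeholder; `set_post_title` is hypothetical, and the other placeholders (`categories`, `author`, `image`) are left for manual editing.

```ruby
# Hypothetical helper, not part of docx2gfm: replace the title placeholder
# of the prepended jekyll front matter. Other placeholders are left alone.
def set_post_title(markdown, title)
  # Escape backslashes and double quotes for a YAML double-quoted string.
  # Block forms avoid gsub's special handling of "\\" in replacement strings.
  escaped = title.gsub("\\") { "\\\\" }.gsub('"') { '\\"' }
  markdown.sub(/^title: "YOUR POST TITLE"$/) { %(title: "#{escaped}") }
end

front_matter = <<~YAML
  ---
  layout: post
  title: "YOUR POST TITLE"
  comments: true
  ---
YAML

puts set_post_title(front_matter, 'From "docx" to GFM')
```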
data/lib/docx2gfm/docx_gfm_converter.rb ADDED
@@ -0,0 +1,117 @@
1
+ class DocxGfmConverter
2
+ attr_accessor :options, :content
3
+
4
+ def initialize(options)
5
+ @options = options
6
+ end
7
+
8
+ # perform all conversion and cleanup steps
9
+ def process_gfm()
10
+ docx_2_gfm(@options[:file])
11
+ cleanup_content_gfm()
12
+ create_ref_style_links() if @options[:ref_style_links]
13
+ add_frontmatter() if @options[:jekyll]
14
+ end
15
+
16
+ def process_markdown()
17
+ docx_2_markdown(@options[:file])
18
+ cleanup_content_markdown()
19
+ create_ref_style_links() if @options[:ref_style_links]
20
+ add_frontmatter() if @options[:jekyll]
21
+ end
22
+
23
+ # output this document (i.e. the markdown content)
24
+ def to_s
25
+ @content
26
+ end
27
+
28
+ # convert docx to initial markdown
29
+ def docx_2_gfm(file)
30
+ # TODO before reading the file, I could check if the file exists
31
+ # TODO check out pandoc options that might be useful e.g. --extract-media='/images/own/'
32
+ @content = IO.popen(["pandoc", file, "-f", "docx", "-t", "gfm", "--wrap=none"], &:read)
33
+ end
34
+
35
+ def docx_2_markdown(file)
36
+ # TODO before reading the file, I could check if the file exists
37
+ # TODO check out pandoc options that might be useful e.g. --extract-media='/images/own/'
38
+ @content = IO.popen(["pandoc", file, "--wrap=none", "--atx-headers", "-f", "docx", "-t", "markdown-bracketed_spans-link_attributes-smart-simple_tables", "-s"], &:read)
39
+ end
40
+
41
+ # this removes all sorts of strange stuff that pandoc generates when
42
+ # converting a .docx exported from Google Docs into GFM
43
+ def cleanup_content_gfm()
44
+ # remove escaping in front of exclamation marks
45
+ @content = @content.gsub /\\!/, '!'
46
+
47
+ # remove underlining of anchors. Anchors are styled by the markdown renderer, so no need to add any explicit formatting here pandoc!
48
+ # example: [<span class="underline">In mattis lectus</span>](https://spier.hu) => [In mattis lectus](https://spier.hu)
49
+ @content = @content.gsub /\[<span class="underline">(.*?)<\/span>\]/m,'[\1]'
50
+
51
+ # remove underlining of regular text (not anchors), keeping only the text
52
+ # example: <span class="underline">Cras ac lectus quis</span> => Cras ac lectus quis
53
+ # markdown has no underline syntax; a warning could be printed here (cleanup_content_markdown does so)
54
+ @content = @content.gsub /<span class="underline">(.*?)<\/span>/m,'\1'
55
+
56
+ # fix unordered lists
57
+ @content = @content.gsub(/^(\s*)- > /, '\1- ')
58
+ @content = @content.gsub(/^(\s*)> /, '\1 ')
59
+
60
+ # fix ordered lists
61
+ @content = @content.gsub(/^(\d+\.) > /, '\1 ')
62
+
63
+ # remove `<!-- end list -->`
64
+ # See http://pandoc.org/MANUAL.html => "Ending a list"
65
+ @content = @content.gsub(/<!-- end list -->/,'')
66
+ end
67
+
68
+ def cleanup_content_markdown()
69
+ # remove underlining from links
70
+ @content = @content.gsub /\[<span class="underline">(.*?)<\/span>\]/m,'[\1]'
71
+
72
+ # remove underlining from all other text (and print a warning)
73
+ @content = @content.gsub(/<span class="underline">(.*?)<\/span>/m) do |match|
74
+ STDERR.puts "Underline is not supported in markdown. Removing underlining from '#{$1}'."
75
+ $1
76
+ end
77
+
78
+ # fix lists - remove unnecessary spacing before list items
79
+ # 1. Numbered lists are great
80
+ # - And even more bullets
81
+ @content = @content.gsub(/^(\s*)(-|\d+\.)\s+(\S)/, '\1\2 \3')
82
+
83
+ # fix spacing in front of reference links
84
+ @content = @content.gsub(/^ +(\[.+?\]:)/, '\1')
85
+ end
86
+
87
+ def add_frontmatter()
88
+ asset_file = File.join(File.dirname(__FILE__), '/assets/front-matter.md')
89
+ front_matter = File.read(asset_file)
90
+ @content = front_matter + "\n" + @content
91
+ end
92
+
93
+ def clean_link_placeholder(text)
94
+ text.downcase.gsub(/\s/,'-')
95
+ end
96
+
97
+ def create_ref_style_links()
98
+ # matcher = content.scan(/[^!]\[(?<text>.*?)\]\((?<url>.*?)\)/)
99
+ # TODO using named groups below would be more descriptive. Need to figure out how.
100
+ link_dictionary = {}
101
+
102
+ @content.gsub!(/([^!])\[(.*?)\]\((.*?)\)/) do |match|
103
+ cleaned_link_placeholder = clean_link_placeholder($2)
104
+ if not link_dictionary.has_key?($3)
105
+ link_dictionary[$3] = cleaned_link_placeholder
106
+ end
107
+ "#{$1}[#{$2}][#{link_dictionary[$3]}]"
108
+ end
109
+
110
+ # add link references to the end of the content
111
+ @content += "\n"
112
+ link_dictionary.each_pair do |url, label|
113
+ @content += "[#{label}]: #{url}\n"
114
+ end
115
+ end
116
+
117
+ end #class
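`create_ref_style_links` turns inline links into reference-style links and appends one definition per distinct URL. A standalone sketch of the same idea, with two deliberate differences: a negative lookbehind `(?<!!)` skips images without needing a preceding character (so a link at the very start of the text is also converted), and a blank line is forced before the definitions, because in CommonMark a link reference definition cannot interrupt a paragraph. The name `to_reference_links` is hypothetical.

```ruby
# Standalone sketch of the idea behind DocxGfmConverter#create_ref_style_links.
def to_reference_links(markdown)
  labels = {} # url => label, kept in order of first appearance
  # (?<!!) skips images such as ![alt](src).
  body = markdown.gsub(/(?<!!)\[([^\]]*)\]\(([^)]*)\)/) do
    text = Regexp.last_match(1)
    url = Regexp.last_match(2)
    # Label derived as in clean_link_placeholder: lowercase, whitespace -> "-".
    labels[url] ||= text.downcase.gsub(/\s/, "-")
    "[#{text}][#{labels[url]}]"
  end
  return body if labels.empty?

  definitions = labels.map { |url, label| "[#{label}]: #{url}" }
  # Blank line first: a reference definition cannot interrupt a paragraph.
  "#{body.chomp}\n\n#{definitions.join("\n")}\n"
end

puts to_reference_links("See [this Google doc](https://example.com/doc) and ![img](media/image1.jpg)")
```

As in the original, two different URLs whose link texts normalize to the same label would get duplicate labels; appending a counter would be one way to disambiguate them.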
data/lib/docx2gfm/version.rb ADDED
@@ -0,0 +1,3 @@
1
+ module Docx2gfm
2
+ VERSION = "0.2.0"
3
+ end
metadata ADDED
@@ -0,0 +1,91 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: docx2gfm
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.2.0
5
+ platform: ruby
6
+ authors:
7
+ - Sebastian Spier
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2019-11-17 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: bundler
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '1.16'
20
+ type: :development
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '1.16'
27
+ - !ruby/object:Gem::Dependency
28
+ name: rake
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '10.0'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '10.0'
41
+ description: Convert a docx file, to github-flavored-markdown. thin wrapper around
42
+ pandoc.
43
+ email:
44
+ - github@spier.hu
45
+ executables:
46
+ - docx2gfm
47
+ extensions: []
48
+ extra_rdoc_files: []
49
+ files:
50
+ - ".gitignore"
51
+ - Gemfile
52
+ - LICENSE.txt
53
+ - MOTIVATION.md
54
+ - README.md
55
+ - Rakefile
56
+ - bin/console
57
+ - bin/docx2gfm
58
+ - bin/setup
59
+ - docx2gfm.gemspec
60
+ - examples/sample-pure-pandoc.md
61
+ - examples/sample.docx
62
+ - examples/sample.md
63
+ - lib/docx2gfm.rb
64
+ - lib/docx2gfm/assets/front-matter.md
65
+ - lib/docx2gfm/docx_gfm_converter.rb
66
+ - lib/docx2gfm/version.rb
67
+ homepage: https://github.com/spier/docx2gfm
68
+ licenses:
69
+ - MIT
70
+ metadata:
71
+ allowed_push_host: https://rubygems.org
72
+ post_install_message:
73
+ rdoc_options: []
74
+ require_paths:
75
+ - lib
76
+ required_ruby_version: !ruby/object:Gem::Requirement
77
+ requirements:
78
+ - - ">="
79
+ - !ruby/object:Gem::Version
80
+ version: '0'
81
+ required_rubygems_version: !ruby/object:Gem::Requirement
82
+ requirements:
83
+ - - ">="
84
+ - !ruby/object:Gem::Version
85
+ version: '0'
86
+ requirements: []
87
+ rubygems_version: 3.0.3
88
+ signing_key:
89
+ specification_version: 4
90
+ summary: Convert a docx file, to github-flavored-markdown
91
+ test_files: []