word-to-markdown 1.1.7 → 1.1.9

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
- SHA1:
3
- metadata.gz: 9d852a4ac53478b17c5336883ec861ea52bfac83
4
- data.tar.gz: 4505d3165e6e167a5908cf277d9c28d42b8b66e5
2
+ SHA256:
3
+ metadata.gz: f2b816a2ad9402eb1c45f74f806482502608337ac9127d072647bd4f1f97cd39
4
+ data.tar.gz: ff35f5c9f2e89c0e781ea864552f6f20bafd23083301c4d72c5c95c7aae4f38c
5
5
  SHA512:
6
- metadata.gz: b14cf7b9341a1f779b0943c050822bd8c2c358851e99c0e712722f6090f8d7d8ca3f2e1c79ded4663b51fca3706c8f490e1bcdda99d77d85e7aa0c0895615597
7
- data.tar.gz: d3ea8bb028083f5e9752f5989700c98de481e654e2a8188efc7d9741049605036c36b26fd17df6ef3d9aca2ef7ef358c736e589af980dea98b12be8f08025868
6
+ metadata.gz: e74c055913709cd0fa871ba95cf22b22a86089bb18fd9e88cd27d1dec3ec3b4927708fc9a591ba278b3e5cdc98150178cd17547ba76579702893645554370c83
7
+ data.tar.gz: 9b01d816e8f95d43fb19dc828213e3db4f7b0897957c6e3eba42b4f4fe888d3564db3b3730b416f9eb10837ca4911fd7083c73b3556c3313dd12cefe2f8ae597
data/LICENSE.md ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2014, Ben Balter
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,94 @@
1
+ # Word to Markdown converter
2
+
3
+ A Ruby gem to liberate content from [the jail that is Word documents](http://ben.balter.com/2012/10/19/we-ve-been-trained-to-make-paper/#jailbreaking-content)
4
+
5
+ [![CI](https://github.com/benbalter/word-to-markdown/actions/workflows/ci.yml/badge.svg)](https://github.com/benbalter/word-to-markdown/actions/workflows/ci.yml) [![Gem Version](https://badge.fury.io/rb/word-to-markdown.png)](http://badge.fury.io/rb/word-to-markdown) [![Inline docs](http://inch-ci.org/github/benbalter/word-to-markdown.png)](http://inch-ci.org/github/benbalter/word-to-markdown) [![Build status](https://ci.appveyor.com/api/projects/status/x2gnsfvli3q47a2e/branch/master?svg=true)](https://ci.appveyor.com/project/benbalter/word-to-markdown/branch/master) [![Maintainability](https://api.codeclimate.com/v1/badges/aae0d67ea7db185f1595/maintainability)](https://codeclimate.com/github/benbalter/word-to-markdown/maintainability) [![Test Coverage](https://api.codeclimate.com/v1/badges/aae0d67ea7db185f1595/test_coverage)](https://codeclimate.com/github/benbalter/word-to-markdown/test_coverage)
6
+
7
+ ## The problem
8
+
9
+ > Our default content publishing workflow is terribly broken. [We've all been trained to make paper](http://ben.balter.com/2012/10/19/we-ve-been-trained-to-make-paper/), yet today, content authored once is more commonly consumed in multiple formats, and rarely, if ever, does it embody physical form. Put another way, our go-to content authoring workflow remains relatively unchanged since it was conceived in the early 80s.
10
+ >
11
+ > I'm asked regularly by government employees — knowledge workers who fire up a desktop word processor as the first step to any project — for an automated pipeline to convert Microsoft Word documents to [Markdown](http://guides.github.com/overviews/mastering-markdown/), the *lingua franca* of the internet, but as my recent foray into building [just such a converter](http://word-to-markdown.herokuapp.com/) proves, it's not that simple.
12
+ >
13
+ > Markdown isn't just an alternative format. Markdown forces you to write for the web.
14
+
15
+ **[Read more](http://ben.balter.com/2014/03/31/word-versus-markdown-more-than-mere-semantics/)**
16
+
17
+ ## Just want to convert a Microsoft Word (or Google) document to Markdown?
18
+
19
+ You can use this **[hosted service](https://word2md.com/)** (or check out [its source](https://github.com/benbalter/word-to-markdown-server)).
20
+
21
+ ## Install
22
+
23
+ You'll need to install [LibreOffice](http://www.libreoffice.org/). Then:
24
+
25
+ ```bash
26
+ gem install word-to-markdown
27
+ ```
28
+
29
+ ## Usage
30
+
31
+ ```ruby
32
+ file = WordToMarkdown.new("/path/to/document.docx")
33
+ => <WordToMarkdown path="/path/to/document.docx">
34
+
35
+ file.to_s
36
+ => "# Test\n\n This is a test"
37
+
38
+ file.document.tree
39
+ => <Nokogiri Document>
40
+ ```
41
+
42
+ ### Command line usage
43
+
44
+ Once you've installed the gem, it's just:
45
+
46
+ ```
47
+ $ w2m path/to/document.docx
48
+ ```
49
+
50
+ *Outputs the resulting markdown to stdout*
51
+
52
+ ## Supports
53
+
54
+ * Paragraphs
55
+ * Numbered lists
56
+ * Unnumbered lists
57
+ * Nested lists
58
+ * Italic
59
+ * Bold
60
+ * Explicit headings (e.g., selected as "Heading 1" or "Heading 2")
61
+ * Implicit headings (e.g., text with a larger font size relative to paragraph text)
62
+ * Images
63
+ * Tables
64
+ * Hyperlinks
65
+
66
+ ## Requirements and configuration
67
+
68
+ Word-to-markdown requires `soffice`, a command line interface to LibreOffice that works on Linux, Mac, and Windows. To install soffice, see [the LibreOffice documentation](https://www.libreoffice.org/get-help/install-howto/).
69
+
70
+ ## Testing
71
+
72
+ ```
73
+ script/cibuild
74
+ ```
75
+
76
+ ## Docker
77
+
78
+ First, create the `Gemfile.lock` by installing the dependencies:
79
+
80
+ ```
81
+ bundle install
82
+ ```
83
+
84
+ Everything you need to run the executable locally:
85
+
86
+ ```
87
+ docker-compose build
88
+ docker-compose run --rm app bundle exec w2m --help
89
+ docker-compose run --rm app bundle exec w2m test/fixtures/em.docx
90
+ ```
91
+
92
+ ## Hosted service
93
+
94
+ [Word-to-markdown-server](https://github.com/benbalter/word-to-markdown-server) contains a lightweight server for converting Word Documents as a service. A live version runs at [word2md.com](https://word2md.com).
data/bin/w2m CHANGED
@@ -1,13 +1,14 @@
1
1
  #!/usr/bin/env ruby
2
+ # frozen_string_literal: true
2
3
 
3
4
  require 'word-to-markdown'
4
5
 
5
- if ARGV.size != 1 || ARGV[0] == "--help"
6
- puts "Usage: bundle exec w2m path/to/document.docx"
6
+ if ARGV.size != 1 || ARGV[0] == '--help'
7
+ puts 'Usage: bundle exec w2m path/to/document.docx'
7
8
  exit 1
8
9
  end
9
10
 
10
- if ARGV[0] == "--version"
11
+ if ARGV[0] == '--version'
11
12
  puts "WordToMarkdown v#{WordToMarkdown::VERSION}"
12
13
  puts "LibreOffice v#{WordToMarkdown.soffice.version}" unless Gem.win_platform?
13
14
  else
data/lib/cliver/dependency_ext.rb CHANGED
@@ -1,16 +1,18 @@
1
+ # frozen_string_literal: true
2
+
1
3
  require 'sys/proctable'
2
4
 
3
5
  module Cliver
4
6
  class Dependency
5
-
6
7
  include Sys
7
8
 
8
9
  # Memoized shortcut for detect
9
10
  # Returns the path to the detected dependency
10
11
  # Raises an error if the dependency was not satisfied
11
- def path
12
+ def detected_path
12
13
  @detected_path ||= detect!
13
14
  end
15
+ alias path detected_path
14
16
 
15
17
  # Is the detected dependency currently open?
16
18
  def open?
@@ -22,14 +24,15 @@ module Cliver
22
24
 
23
25
  # Returns the version of the resolved dependency
24
26
  def version
25
- return @detected_version if defined? @detected_version
27
+ return @version if defined? @version
26
28
  return if Gem.win_platform?
27
- version = installed_versions.find { |p, v| p == path }
28
- @detected_version = version.nil? ? nil : version[1]
29
+
30
+ version = installed_versions.find { |p, _v| p == path }
31
+ @version = version.nil? ? nil : version[1]
29
32
  end
30
33
 
31
34
  def major_version
32
- version.split(".").first if version
35
+ version&.split('.')&.first
33
36
  end
34
37
  end
35
38
  end
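The `major_version` change above swaps an explicit `if version` guard for the safe navigation operator. A minimal sketch of why the two forms behave the same, assuming a hypothetical `FakeDependency` struct that stands in for the patched `Cliver::Dependency` (only the version parsing is modelled; nothing here calls Cliver itself):

```ruby
# Hypothetical stand-in for the patched class; `version` is either a
# dotted version string or nil, as in the hunk above.
FakeDependency = Struct.new(:version) do
  # 1.1.7 style: explicit guard on a nil version
  def major_version_old
    version.split('.').first if version
  end

  # 1.1.9 style: `&.` short-circuits to nil when the receiver is nil
  def major_version_new
    version&.split('.')&.first
  end
end

[FakeDependency.new('5.4.2'), FakeDependency.new(nil)].each do |dep|
  puts "old=#{dep.major_version_old.inspect} new=#{dep.major_version_new.inspect}"
end
```

Both forms return `nil` for a missing version, so callers that compare the result with `'5'` should see no difference.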
data/lib/nokogiri/xml/element.rb CHANGED
@@ -1,7 +1,8 @@
1
+ # frozen_string_literal: true
2
+
1
3
  module Nokogiri
2
4
  module XML
3
5
  class Element
4
-
5
6
  DEFAULT_FONT_SIZE = 12.to_f
6
7
 
7
8
  # The node's font size
@@ -13,11 +14,11 @@ module Nokogiri
13
14
  end
14
15
 
15
16
  def bold?
16
- styles['font-weight'] && styles['font-weight'] == "bold"
17
+ styles['font-weight'] && styles['font-weight'] == 'bold'
17
18
  end
18
19
 
19
20
  def italic?
20
- styles['font-style'] && styles['font-style'] == "italic"
21
+ styles['font-style'] && styles['font-style'] == 'italic'
21
22
  end
22
23
  end
23
24
  end
@@ -1,18 +1,29 @@
1
- # encoding: utf-8
1
+ # frozen_string_literal: true
2
+
2
3
  class WordToMarkdown
3
4
  class Converter
4
-
5
5
  attr_reader :document
6
6
 
7
- HEADING_DEPTH = 6 # Number of headings to guess, e.g., h6
8
- HEADING_STEP = 100/HEADING_DEPTH
7
+ # Number of headings to guess, e.g., h6
8
+ HEADING_DEPTH = 6
9
+
10
+ # Percentile step for each heading
11
+ HEADING_STEP = 100 / HEADING_DEPTH
12
+
13
+ # Minimum heading size
9
14
  MIN_HEADING_SIZE = 20
10
- UNICODE_BULLETS = ["○", "o", "●", "\u2022", "\\p{C}"]
11
15
 
16
+ # Unicode bullets to strip when processing
17
+ UNICODE_BULLETS = ['○', 'o', '●', "\u2022", '\\p{C}'].freeze
18
+
19
+ # @param document [WordToMarkdown::Document] The document to convert
12
20
  def initialize(document)
13
21
  @document = document
14
22
  end
15
23
 
24
+ # Convert the document
25
+ #
26
+ # Note: this action is destructive!
16
27
  def convert!
17
28
  # Fonts and headings
18
29
  semanticize_font_styles!
@@ -29,35 +40,35 @@ class WordToMarkdown
29
40
  remove_numbering_from_list_items!
30
41
  end
31
42
 
32
- # Returns an array of Nokogiri nodes that are implicit headings
43
+ # @return [Array<Nokogiri::Node>] Return an array of Nokogiri Nodes that are implicit headings
33
44
  def implicit_headings
34
45
  @implicit_headings ||= begin
35
46
  headings = []
36
- @document.tree.css("[style]").each do |element|
47
+ @document.tree.css('[style]').each do |element|
37
48
  headings.push element unless element.font_size.nil? || element.font_size < MIN_HEADING_SIZE
38
49
  end
39
50
  headings
40
51
  end
41
52
  end
42
53
 
43
- # Returns an array of font-sizes for implicit headings in the document
54
+ # @return [Array<Integer>] An array of font-sizes for implicit headings in the document
44
55
  def font_sizes
45
56
  @font_sizes ||= begin
46
57
  sizes = []
47
- @document.tree.css("[style]").each do |element|
58
+ @document.tree.css('[style]').each do |element|
48
59
  sizes.push element.font_size.round(-1) unless element.font_size.nil?
49
60
  end
50
- sizes.uniq.sort
61
+ sizes.uniq.sort.extend(DescriptiveStatistics)
51
62
  end
52
63
  end
53
64
 
54
65
  # Given a Nokogiri node, guess what heading it represents, if any
55
66
  #
56
- # node - the nokigiri node
57
- #
58
- # retuns the heading tag (e.g., H1), or nil
67
+ # @param node [Nokogiri::Node] the nokogiri node
68
+ # @return [String, nil] the heading tag (e.g., H1), or nil
59
69
  def guess_heading(node)
60
- return nil if node.font_size == nil
70
+ return nil if node.font_size.nil?
71
+
61
72
  [*1...HEADING_DEPTH].each do |heading|
62
73
  return "h#{heading}" if node.font_size >= h(heading)
63
74
  end
@@ -67,51 +78,58 @@ class WordToMarkdown
67
78
  # Minimum font size required for a given heading
68
79
  # e.g., H(2) would represent the minimum font size of an implicit h2
69
80
  #
70
- # n - the heading number, e.g., 1, 2
81
+ # @param num [Integer] the heading number, e.g., 1, 2
71
82
  #
72
- # returns the minimum font size as an integer
73
- def h(n)
74
- font_sizes.percentile ((HEADING_DEPTH-1)-n) * HEADING_STEP
83
+ # @return [Integer] the minimum font size
84
+ def h(num)
85
+ font_sizes.percentile(((HEADING_DEPTH - 1) - num) * HEADING_STEP)
75
86
  end
76
87
 
88
+ # Convert span-based font styles to `strong`s and `em`s
77
89
  def semanticize_font_styles!
78
- @document.tree.css("span").each do |node|
90
+ @document.tree.css('span').each do |node|
79
91
  if node.bold?
80
- node.node_name = "strong"
92
+ node.node_name = 'strong'
81
93
  elsif node.italic?
82
- node.node_name = "em"
94
+ node.node_name = 'em'
83
95
  end
84
96
  end
85
97
  end
86
98
 
99
+ # Remove top-level paragraphs from table cells
87
100
  def remove_paragraphs_from_tables!
88
- @document.tree.search("td p").each { |node| node.node_name = "span" }
101
+ @document.tree.search('td p').each { |node| node.node_name = 'span' }
89
102
  end
90
103
 
104
+ # Remove top-level paragraphs from list items
91
105
  def remove_paragraphs_from_list_items!
92
- @document.tree.search("li p").each { |node| node.node_name = "span" }
106
+ @document.tree.search('li p').each { |node| node.node_name = 'span' }
93
107
  end
94
108
 
109
+ # Remove prepended unicode bullets from list items
95
110
  def remove_unicode_bullets_from_list_items!
96
- path = WordToMarkdown.soffice.major_version == "5" ? "li span span" : "li span"
111
+ path = WordToMarkdown.soffice.major_version == '5' ? 'li span span' : 'li span'
97
112
  @document.tree.search(path).each do |span|
98
- span.inner_html = span.inner_html.gsub /^([#{UNICODE_BULLETS.join("")}]+)/, ""
113
+ span.inner_html = span.inner_html.gsub(/^([#{UNICODE_BULLETS.join}]+)/, '')
99
114
  end
100
115
  end
101
116
 
117
+ # Remove prepended numbers from list items
102
118
  def remove_numbering_from_list_items!
103
- path = WordToMarkdown.soffice.major_version == "5" ? "li span span" : "li span"
119
+ path = WordToMarkdown.soffice.major_version == '5' ? 'li span span' : 'li span'
104
120
  @document.tree.search(path).each do |span|
105
- span.inner_html = span.inner_html.gsub /^[a-zA-Z0-9]+\./m, ""
121
+ span.inner_html = span.inner_html.gsub(/^[a-zA-Z0-9]+\./m, '')
106
122
  end
107
123
  end
108
124
 
125
+ # Remove whitespace from list items
109
126
  def remove_whitespace_from_list_items!
110
- @document.tree.search("li span").each { |span| span.inner_html.strip! }
127
+ @document.tree.search('li span').each { |span| span.inner_html.strip! }
111
128
  end
112
129
 
130
+ # Convert table headers to `th`s
113
131
  def semanticize_table_headers!
114
- @document.tree.search("table tr:first td").each { |node| node.node_name = "th" }
132
+ @document.tree.search('table tr:first td').each { |node| node.node_name = 'th' }
115
133
  end
116
134
 
117
135
  # Try to guess heading where implicit based on font size
@@ -121,6 +139,5 @@ class WordToMarkdown
121
139
  element.node_name = heading unless heading.nil?
122
140
  end
123
141
  end
124
-
125
142
  end
126
143
  end
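The heading guess in the converter hunks above maps a font size to `h1`..`h5` by comparing it against percentile thresholds of the document's font sizes. A rough sketch of that arithmetic follows. The `percentile` helper here is a hypothetical linear-interpolation version written for illustration and may not match what `DescriptiveStatistics` computes; `threshold` and the standalone `guess_heading` are likewise illustrative names, not the gem's API, and the trailing `nil` is an assumption since the hunk does not show the end of the method.

```ruby
HEADING_DEPTH = 6
HEADING_STEP = 100 / HEADING_DEPTH # integer division, so 16

# Hypothetical percentile helper (linear interpolation between ranks).
# Expects an already sorted array of numbers.
def percentile(sorted, pct)
  return sorted.last if pct >= 100

  rank = pct / 100.0 * (sorted.size - 1)
  lower = sorted[rank.floor]
  upper = sorted[rank.ceil]
  lower + (upper - lower) * (rank - rank.floor)
end

# Mirrors h(num): the minimum font size treated as heading level `num`
def threshold(sizes, num)
  percentile(sizes, ((HEADING_DEPTH - 1) - num) * HEADING_STEP)
end

# Mirrors guess_heading: the first level whose threshold the size meets
def guess_heading(sizes, font_size)
  (1...HEADING_DEPTH).each do |num|
    return "h#{num}" if font_size >= threshold(sizes, num)
  end
  nil
end

sizes = [20, 30, 40]
[40, 30, 20, 10].each do |size|
  puts "#{size}pt -> #{guess_heading(sizes, size).inspect}"
end
```

Because the range `1...HEADING_DEPTH` excludes its end, only five levels are ever guessed, and the lowest threshold sits at the 0th percentile, that is, the smallest size in the list.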
@@ -1,50 +1,55 @@
1
- # encoding: utf-8
1
+ # frozen_string_literal: true
2
+
2
3
  class WordToMarkdown
3
4
  class Document
4
5
  class NotFoundError < StandardError; end
5
- class ConverstionError < StandardError; end
6
6
 
7
- attr_reader :path, :raw_html, :tmpdir
7
+ class ConversionError < StandardError; end
8
+
9
+ attr_reader :path, :tmpdir
8
10
 
11
+ # @param path [string] Path to the Word document
12
+ # @param tmpdir [string] Path to a working directory to use
9
13
  def initialize(path, tmpdir = nil)
10
14
  @path = File.expand_path path, Dir.pwd
11
15
  @tmpdir = tmpdir || Dir.mktmpdir
12
16
  raise NotFoundError, "File #{@path} does not exist" unless File.exist?(@path)
13
17
  end
14
18
 
19
+ # @return [String] the document's extension
15
20
  def extension
16
21
  File.extname path
17
22
  end
18
23
 
24
+ # @return [Nokogiri::Document]
19
25
  def tree
20
26
  @tree ||= begin
21
27
  tree = Nokogiri::HTML(normalized_html)
22
- tree.css("title").remove
28
+ tree.css('title').remove
23
29
  tree
24
30
  end
25
31
  end
26
32
 
27
- # Returns the html representation of the document
33
+ # @return [String] the html representation of the document
28
34
  def html
29
- tree.to_html.gsub("</li>\n", "</li>")
35
+ tree.to_html.gsub("</li>\n", '</li>')
30
36
  end
31
37
 
32
- # Returns the markdown representation of the document
33
- def to_s
38
+ # @return [String] the markdown representation of the document
39
+ def markdown
34
40
  @markdown ||= scrub_whitespace(ReverseMarkdown.convert(html, WordToMarkdown::REVERSE_MARKDOWN_OPTIONS))
35
41
  end
42
+ alias to_s markdown
36
43
 
37
44
  # Determine the document encoding
38
45
  #
39
- # html - the raw html export
40
- #
41
- # Returns the encoding, defaulting to "UTF-8"
46
+ # @return [String] the encoding, defaulting to "UTF-8"
42
47
  def encoding
43
- match = raw_html.encode("UTF-8", :invalid => :replace, :replace => "").match(/charset=([^\"]+)/)
48
+ match = raw_html.encode('UTF-8', invalid: :replace, replace: '').match(/charset=([^"]+)/)
44
49
  if match
45
- match[1].sub("macintosh", "MacRoman")
50
+ match[1].sub('macintosh', 'MacRoman')
46
51
  else
47
- "UTF-8"
52
+ 'UTF-8'
48
53
  end
49
54
  end
50
55
 
@@ -52,55 +57,59 @@ class WordToMarkdown
52
57
 
53
58
  # Perform pre-processing normalization
54
59
  #
55
- # html - the raw html input from the export
56
- #
57
- # Returns the normalized html
60
+ # @return [String] the normalized html
58
61
  def normalized_html
59
- html = raw_html.force_encoding(encoding)
60
- html = html.encode("UTF-8", :invalid => :replace, :replace => "")
61
- html = Premailer.new(html, :with_html_string => true, :input_encoding => "UTF-8").to_inline_css
62
- html.gsub! /\n|\r/," " # Remove linebreaks
63
- html.gsub! /“|”/, '"' # Straighten curly double quotes
64
- html.gsub! /‘|’/, "'" # Straighten curly single quotes
65
- html.gsub! />\s+</, "><" # Remove extra whitespace between tags
62
+ html = raw_html.dup.force_encoding(encoding)
63
+ html = html.encode('UTF-8', invalid: :replace, replace: '')
64
+ html = Premailer.new(html, with_html_string: true, input_encoding: 'UTF-8').to_inline_css
65
+ html.gsub!(/\n|\r/, ' ') # Remove linebreaks
66
+ html.gsub!(/“|”/, '"') # Straighten curly double quotes
67
+ html.gsub!(/‘|’/, "'") # Straighten curly single quotes
68
+ html.gsub!(/>\s+</, '><') # Remove extra whitespace between tags
66
69
  html
67
70
  end
68
71
 
69
72
  # Perform post-processing normalization of certain Word quirks
70
73
  #
71
- # string - the markdown representation of the document
74
+ # @param string [String] the markdown representation of the document
72
75
  #
73
- # Returns the normalized markdown
76
+ # @return [String] the normalized markdown
74
77
  def scrub_whitespace(string)
75
- string.gsub!("&nbsp;", " ") # HTML encoded spaces
76
- string.sub!(/\A[[:space:]]+/,'') # document leading whitespace
77
- string.sub!(/[[:space:]]+\z/,'') # document trailing whitespace
78
- string.gsub!(/([ ]+)$/, '') # line trailing whitespace
79
- string.gsub!(/\n\n\n\n/,"\n\n") # Quadruple line breaks
80
- string.gsub!(/\u00A0/, "") # Unicode non-breaking spaces, injected as tabs
78
+ string = string.dup
79
+ string.gsub!('&nbsp;', ' ') # HTML encoded spaces
80
+ string.sub!(/\A[[:space:]]+/, '') # document leading whitespace
81
+ string.sub!(/[[:space:]]+\z/, '') # document trailing whitespace
82
+ string.gsub!(/([ ]+)$/, '') # line trailing whitespace
83
+ string.gsub!(/\n\n\n\n/, "\n\n") # Quadruple line breaks
84
+ string.delete!("\u00A0") # Unicode non-breaking spaces, injected as tabs
85
+ string.gsub!(/\*\*\ +(?!\*|_)([[:punct:]])/, '**\1') # Remove extra space after bold
81
86
  string
82
87
  end
83
88
 
89
+ # @return [String] the path to the intermediary HTML document
84
90
  def dest_path
85
- dest_filename = File.basename(path).gsub(/#{Regexp.escape(extension)}$/, ".html")
91
+ dest_filename = File.basename(path).gsub(/#{Regexp.escape(extension)}$/, '.html')
86
92
  File.expand_path(dest_filename, tmpdir)
87
93
  end
88
94
 
95
+ # @return [String] the unnormalized HTML representation
89
96
  def raw_html
90
97
  @raw_html ||= begin
91
- WordToMarkdown::run_command '--headless', '--convert-to', filter, path, '--outdir', tmpdir
92
- raise ConverstionError, "Failed to convert #{path}" unless File.exists?(dest_path)
98
+ WordToMarkdown.run_command '--headless', '--convert-to', filter, path, '--outdir', tmpdir
99
+ raise ConversionError, "Failed to convert #{path}" unless File.exist?(dest_path)
100
+
93
101
  html = File.read dest_path
94
102
  File.delete dest_path
95
103
  html
96
104
  end
97
105
  end
98
106
 
107
+ # @return [String] the LibreOffice filter to use for conversion
99
108
  def filter
100
- if WordToMarkdown.soffice.major_version == "5"
101
- "html:XHTML Writer File:UTF8"
109
+ if WordToMarkdown.soffice.major_version == '5'
110
+ 'html:XHTML Writer File:UTF8'
102
111
  else
103
- "html"
112
+ 'html'
104
113
  end
105
114
  end
106
115
  end
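The `.dup` calls added in `normalized_html` and `scrub_whitespace` go hand in hand with the new `# frozen_string_literal: true` comment: once strings may be frozen, the bang methods (`gsub!`, `sub!`, `delete!`) can no longer mutate them in place. A small sketch of the failure mode and the fix, using hypothetical helper names and an explicitly frozen input so the behaviour does not depend on where the magic comment sits:

```ruby
# Hypothetical helper that mutates its argument, as the 1.1.7 code did
def scrub_in_place(string)
  string.gsub!('&nbsp;', ' ')
  string
end

# Hypothetical helper that follows the 1.1.9 pattern: copy, then mutate
def scrub_copy(string)
  string = string.dup # an unfrozen copy; the caller's string is untouched
  string.gsub!('&nbsp;', ' ')
  string
end

input = 'a&nbsp;b'.freeze

begin
  scrub_in_place(input)
rescue FrozenError => e
  puts "in-place scrub failed: #{e.class}"
end

puts scrub_copy(input)
puts input # the original is left as it was
```

Duplicating first also means the method no longer changes a string the caller still holds, which is a side benefit regardless of freezing.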
@@ -1,3 +1,5 @@
1
+ # frozen_string_literal: true
2
+
1
3
  class WordToMarkdown
2
- VERSION = "1.1.7"
4
+ VERSION = '1.1.9'
3
5
  end
@@ -1,4 +1,6 @@
1
- require 'descriptive_statistics'
1
+ # frozen_string_literal: true
2
+
3
+ require 'descriptive_statistics/safe'
2
4
  require 'reverse_markdown'
3
5
  require 'nokogiri-styles'
4
6
  require 'premailer'
@@ -16,86 +18,94 @@ require_relative 'nokogiri/xml/element'
16
18
  require_relative 'cliver/dependency_ext'
17
19
 
18
20
  class WordToMarkdown
19
-
20
21
  attr_reader :document, :converter
21
22
 
23
+ # Options to be passed to Reverse Markdown
22
24
  REVERSE_MARKDOWN_OPTIONS = {
23
25
  unknown_tags: :bypass,
24
26
  github_flavored: true
25
- }
27
+ }.freeze
26
28
 
29
+ # Minimum version of LibreOffice Required
27
30
  SOFFICE_VERSION_REQUIREMENT = '> 4.0'
28
31
 
32
+ # Paths to look for LibreOffice, in order of preference
29
33
  PATHS = [
30
- "*", # Sub'd for ENV["PATH"]
31
- "~/Applications/LibreOffice.app/Contents/MacOS",
32
- "/Applications/LibreOffice.app/Contents/MacOS",
33
- "/Program Files/LibreOffice 5/program",
34
- "/Program Files (x86)/LibreOffice 4/program"
35
- ]
34
+ '*', # Sub'd for ENV["PATH"]
35
+ '~/Applications/LibreOffice.app/Contents/MacOS',
36
+ '/Applications/LibreOffice.app/Contents/MacOS',
37
+ '/Program Files/LibreOffice 5/program',
38
+ '/Program Files (x86)/LibreOffice 4/program'
39
+ ].freeze
36
40
 
37
41
  # Create a new WordToMarkdown object
38
42
  #
39
- # input - a HTML string or path to an HTML file
40
- #
41
- # Returns the WordToMarkdown object
43
+ # @param path [string] Path to the Word document
44
+ # @param tmpdir [string] Path to a working directory to use
45
+ # @return [WordToMarkdown] WordToMarkdown object with the converted document
42
46
  def initialize(path, tmpdir = nil)
43
47
  @document = WordToMarkdown::Document.new path, tmpdir
44
48
  @converter = WordToMarkdown::Converter.new @document
45
49
  converter.convert!
46
50
  end
47
51
 
48
- def self.run_command(*args)
49
- raise "LibreOffice already running" if soffice.open?
50
-
51
- output, status = Open3.capture2e(soffice.path, *args)
52
- logger.debug output
53
- raise "Command `#{soffice.path} #{args.join(" ")}` failed: #{output}" if status.exitstatus != 0
54
- output
52
+ # Helper method to return the document body, as markdown
53
+ # @return [string] the document body, as markdown
54
+ def to_s
55
+ document.to_s
55
56
  end
56
57
 
57
- # Returns a Cliver::Dependency object representing our soffice dependency
58
- #
59
- # Attempts to resolve by looking at PATH followed by paths in the PATHS constant
60
- #
61
- # Methods used internally:
62
- # path - returns the resolved path. Raises an error if not satisfied
63
- # version - returns the resolved version
64
- # open - is the dependency currently open/running?
65
- def self.soffice
66
- @@soffice_dependency ||= Cliver::Dependency.new("soffice", *soffice_dependency_args)
67
- end
58
+ class << self
59
+ # Run an soffice command
60
+ #
61
+ # @param args [string] one or more arguments to pass to the soffice command
62
+ # @return [string] the command output
63
+ def run_command(*args)
64
+ raise 'LibreOffice already running' if soffice.open?
68
65
 
69
- def self.logger
70
- @@logger ||= begin
71
- logger = Logger.new(STDOUT)
72
- logger.level = Logger::ERROR unless ENV["DEBUG"]
73
- logger
66
+ output, status = Open3.capture2e(soffice.path, *args)
67
+ logger.debug output
68
+ raise "Command `#{soffice.path} #{args.join(' ')}` failed: #{output}" if status.exitstatus != 0
69
+
70
+ output
74
71
  end
75
- end
76
72
 
77
- # Pretty print the class in console
78
- def inspect
79
- "<WordToMarkdown path=\"#{@document.path}\">"
80
- end
73
+ # Returns a Cliver::Dependency object representing our soffice dependency
74
+ #
75
+ # Attempts to resolve by looking at PATH followed by paths in the PATHS constant
76
+ #
77
+ # Methods used internally:
78
+ # path - returns the resolved path. Raises an error if not satisfied
79
+ # version - returns the resolved version
80
+ # open - is the dependency currently open/running?
81
+ # @return Cliver::Dependency instance
82
+ def soffice
83
+ @soffice ||= Cliver::Dependency.new('soffice', *soffice_dependency_args)
84
+ end
81
85
 
82
- def to_s
83
- document.to_s
84
- end
86
+ # @return Logger instance
87
+ def logger
88
+ @logger ||= begin
89
+ logger = Logger.new($stdout)
90
+ logger.level = Logger::ERROR unless ENV['DEBUG']
91
+ logger
92
+ end
93
+ end
85
94
 
86
- private
95
+ private
87
96
 
88
- # Workaround for two upstream bugs:
89
- # 1. `soffice.exe --version` on windows opens a popup and retuns a null string when manually closed
90
- # 2. Even if the second argument to Cliver is nil, Cliver thinks there's a requirement
91
- # and will shell out to `soffice.exe --version`
92
- # In order to support Windows, don't pass *any* version requirement to Cliver
93
- def self.soffice_dependency_args
94
- args = [:path => PATHS.join(File::PATH_SEPARATOR)]
95
- if Gem.win_platform?
96
- args
97
- else
98
- args.unshift SOFFICE_VERSION_REQUIREMENT
97
+ # Workaround for two upstream bugs:
98
+ # 1. `soffice.exe --version` on windows opens a popup and returns a null string when manually closed
99
+ # 2. Even if the second argument to Cliver is nil, Cliver thinks there's a requirement
100
+ # and will shell out to `soffice.exe --version`
101
+ # In order to support Windows, don't pass *any* version requirement to Cliver
102
+ def soffice_dependency_args
103
+ args = [path: PATHS.join(File::PATH_SEPARATOR)]
104
+ if Gem.win_platform?
105
+ args
106
+ else
107
+ args.unshift SOFFICE_VERSION_REQUIREMENT
108
+ end
99
109
  end
100
110
  end
101
111
  end
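Moving the class methods into `class << self` in the hunk above does more than tidy the layout. In Ruby, a bare `private` does not apply to methods defined with `def self.name`, so the 1.1.7 `soffice_dependency_args` was most likely still publicly callable; inside `class << self` the same keyword does take effect. The sketch below shows the difference with hypothetical `OldStyle` and `NewStyle` classes, and also shows memoization through a class-level instance variable in place of `@@` class variables:

```ruby
class OldStyle
  private

  # `private` above has no effect on singleton methods defined this way
  def self.helper
    :old
  end
end

class NewStyle
  class << self
    # Memoized in an instance variable of the class object itself
    def tool
      @tool ||= Object.new
    end

    private

    # Inside `class << self`, this really is private
    def helper
      :new
    end
  end
end

puts OldStyle.respond_to?(:helper)       # helper is still public
puts NewStyle.respond_to?(:helper)       # helper is private
puts NewStyle.tool.equal?(NewStyle.tool) # same memoized object each time
```

Class-level instance variables are also not shared with subclasses the way `@@` variables are, which may or may not matter for this gem.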
metadata CHANGED
@@ -1,29 +1,29 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: word-to-markdown
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.1.7
4
+ version: 1.1.9
5
5
  platform: ruby
6
6
  authors:
7
7
  - Ben Balter
8
- autorequire:
8
+ autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2016-01-04 00:00:00.000000000 Z
11
+ date: 2025-01-08 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
- name: reverse_markdown
14
+ name: cliver
15
15
  requirement: !ruby/object:Gem::Requirement
16
16
  requirements:
17
17
  - - "~>"
18
18
  - !ruby/object:Gem::Version
19
- version: '0.6'
19
+ version: '0.3'
20
20
  type: :runtime
21
21
  prerelease: false
22
22
  version_requirements: !ruby/object:Gem::Requirement
23
23
  requirements:
24
24
  - - "~>"
25
25
  - !ruby/object:Gem::Version
26
- version: '0.6'
26
+ version: '0.3'
27
27
  - !ruby/object:Gem::Dependency
28
28
  name: descriptive_statistics
29
29
  requirement: !ruby/object:Gem::Requirement
@@ -38,6 +38,20 @@ dependencies:
38
38
  - - "~>"
39
39
  - !ruby/object:Gem::Version
40
40
  version: '2.5'
41
+ - !ruby/object:Gem::Dependency
42
+ name: nokogiri-styles
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - "~>"
46
+ - !ruby/object:Gem::Version
47
+ version: '0.1'
48
+ type: :runtime
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - "~>"
53
+ - !ruby/object:Gem::Version
54
+ version: '0.1'
41
55
  - !ruby/object:Gem::Dependency
42
56
  name: premailer
43
57
  requirement: !ruby/object:Gem::Requirement
@@ -53,131 +67,151 @@ dependencies:
53
67
  - !ruby/object:Gem::Version
54
68
  version: '1.8'
55
69
  - !ruby/object:Gem::Dependency
56
- name: nokogiri-styles
70
+ name: reverse_markdown
57
71
  requirement: !ruby/object:Gem::Requirement
58
72
  requirements:
59
- - - "~>"
73
+ - - ">="
60
74
  - !ruby/object:Gem::Version
61
- version: '0.1'
75
+ version: '1'
76
+ - - "<"
77
+ - !ruby/object:Gem::Version
78
+ version: '3'
62
79
  type: :runtime
63
80
  prerelease: false
64
81
  version_requirements: !ruby/object:Gem::Requirement
65
82
  requirements:
66
- - - "~>"
83
+ - - ">="
67
84
  - !ruby/object:Gem::Version
68
- version: '0.1'
85
+ version: '1'
86
+ - - "<"
87
+ - !ruby/object:Gem::Version
88
+ version: '3'
69
89
  - !ruby/object:Gem::Dependency
70
90
  name: sys-proctable
71
91
  requirement: !ruby/object:Gem::Requirement
72
92
  requirements:
73
93
  - - "~>"
74
94
  - !ruby/object:Gem::Version
75
- version: '0.9'
95
+ version: '1.0'
76
96
  type: :runtime
77
97
  prerelease: false
78
98
  version_requirements: !ruby/object:Gem::Requirement
79
99
  requirements:
80
100
  - - "~>"
81
101
  - !ruby/object:Gem::Version
82
- version: '0.9'
102
+ version: '1.0'
83
103
  - !ruby/object:Gem::Dependency
84
- name: cliver
104
+ name: minitest
85
105
  requirement: !ruby/object:Gem::Requirement
86
106
  requirements:
87
107
  - - "~>"
88
108
  - !ruby/object:Gem::Version
89
- version: '0.3'
90
- type: :runtime
109
+ version: '5.0'
110
+ type: :development
91
111
  prerelease: false
92
112
  version_requirements: !ruby/object:Gem::Requirement
93
113
  requirements:
94
114
  - - "~>"
95
115
  - !ruby/object:Gem::Version
96
- version: '0.3'
116
+ version: '5.0'
97
117
  - !ruby/object:Gem::Dependency
98
- name: rake
118
+ name: mocha
99
119
  requirement: !ruby/object:Gem::Requirement
100
120
  requirements:
101
121
  - - "~>"
102
122
  - !ruby/object:Gem::Version
103
- version: '10.4'
123
+ version: '1.1'
104
124
  type: :development
105
125
  prerelease: false
106
126
  version_requirements: !ruby/object:Gem::Requirement
107
127
  requirements:
108
128
  - - "~>"
109
129
  - !ruby/object:Gem::Version
110
- version: '10.4'
130
+ version: '1.1'
111
131
  - !ruby/object:Gem::Dependency
112
- name: shoulda
132
+ name: pry
113
133
  requirement: !ruby/object:Gem::Requirement
114
134
  requirements:
115
135
  - - "~>"
116
136
  - !ruby/object:Gem::Version
117
- version: '3.5'
137
+ version: '0.10'
118
138
  type: :development
119
139
  prerelease: false
120
140
  version_requirements: !ruby/object:Gem::Requirement
121
141
  requirements:
122
142
  - - "~>"
123
143
  - !ruby/object:Gem::Version
124
- version: '3.5'
144
+ version: '0.10'
125
145
  - !ruby/object:Gem::Dependency
126
- name: bundler
146
+ name: rake
127
147
  requirement: !ruby/object:Gem::Requirement
128
148
  requirements:
129
149
  - - "~>"
130
150
  - !ruby/object:Gem::Version
131
- version: '1.6'
151
+ version: '13.0'
132
152
  type: :development
133
153
  prerelease: false
134
154
  version_requirements: !ruby/object:Gem::Requirement
135
155
  requirements:
136
156
  - - "~>"
137
157
  - !ruby/object:Gem::Version
138
- version: '1.6'
158
+ version: '13.0'
139
159
  - !ruby/object:Gem::Dependency
140
- name: pry
160
+ name: rubocop
141
161
  requirement: !ruby/object:Gem::Requirement
142
162
  requirements:
143
163
  - - "~>"
144
164
  - !ruby/object:Gem::Version
145
- version: '0.10'
165
+ version: '1.0'
146
166
  type: :development
147
167
  prerelease: false
148
168
  version_requirements: !ruby/object:Gem::Requirement
149
169
  requirements:
150
170
  - - "~>"
151
171
  - !ruby/object:Gem::Version
152
- version: '0.10'
172
+ version: '1.0'
153
173
  - !ruby/object:Gem::Dependency
154
- name: mocha
174
+ name: rubocop-minitest
155
175
  requirement: !ruby/object:Gem::Requirement
156
176
  requirements:
157
177
  - - "~>"
158
178
  - !ruby/object:Gem::Version
159
- version: '1.1'
179
+ version: '0.3'
160
180
  type: :development
161
181
  prerelease: false
162
182
  version_requirements: !ruby/object:Gem::Requirement
163
183
  requirements:
164
184
  - - "~>"
165
185
  - !ruby/object:Gem::Version
166
- version: '1.1'
186
+ version: '0.3'
167
187
  - !ruby/object:Gem::Dependency
168
- name: minitest
188
+ name: rubocop-performance
169
189
  requirement: !ruby/object:Gem::Requirement
170
190
  requirements:
171
191
  - - "~>"
172
192
  - !ruby/object:Gem::Version
173
- version: '5.0'
193
+ version: '1.5'
174
194
  type: :development
175
195
  prerelease: false
176
196
  version_requirements: !ruby/object:Gem::Requirement
177
197
  requirements:
178
198
  - - "~>"
179
199
  - !ruby/object:Gem::Version
180
- version: '5.0'
200
+ version: '1.5'
201
+ - !ruby/object:Gem::Dependency
202
+ name: shoulda
203
+ requirement: !ruby/object:Gem::Requirement
204
+ requirements:
205
+ - - "~>"
206
+ - !ruby/object:Gem::Version
207
+ version: '4.0'
208
+ type: :development
209
+ prerelease: false
210
+ version_requirements: !ruby/object:Gem::Requirement
211
+ requirements:
212
+ - - "~>"
213
+ - !ruby/object:Gem::Version
214
+ version: '4.0'
181
215
  description: Ruby Gem to convert Word documents to markdown.
182
216
  email: ben.balter@github.com
183
217
  executables:
@@ -185,6 +219,8 @@ executables:
185
219
  extensions: []
186
220
  extra_rdoc_files: []
187
221
  files:
222
+ - LICENSE.md
223
+ - README.md
188
224
  - bin/w2m
189
225
  - lib/cliver/dependency_ext.rb
190
226
  - lib/nokogiri/xml/element.rb
@@ -195,8 +231,9 @@ files:
195
231
  homepage: https://github.com/benbalter/word-to-markdown
196
232
  licenses:
197
233
  - MIT
198
- metadata: {}
199
- post_install_message:
234
+ metadata:
235
+ rubygems_mfa_required: 'true'
236
+ post_install_message:
200
237
  rdoc_options: []
201
238
  require_paths:
202
239
  - lib
@@ -211,9 +248,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
211
248
  - !ruby/object:Gem::Version
212
249
  version: '0'
213
250
  requirements: []
214
- rubyforge_project:
215
- rubygems_version: 2.5.1
216
- signing_key:
251
+ rubygems_version: 3.5.16
252
+ signing_key:
217
253
  specification_version: 4
218
254
  summary: Ruby Gem to convert Word documents to markdown
219
255
  test_files: []
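The metadata hunks above change the `reverse_markdown` constraint from `~> 0.6` to the pair `>= 1`, `< 3`. A short sketch of what each constraint accepts, using RubyGems' own `Gem::Requirement` and `Gem::Version` classes; the version numbers tested are arbitrary illustrations, not a claim about which releases exist:

```ruby
require 'rubygems'

old_requirement = Gem::Requirement.new('~> 0.6')      # 1.1.7 metadata
new_requirement = Gem::Requirement.new('>= 1', '< 3') # 1.1.9 metadata

%w[0.6.0 1.4.0 2.1.1 3.0.0].each do |candidate|
  version = Gem::Version.new(candidate)
  old_ok = old_requirement.satisfied_by?(version)
  new_ok = new_requirement.satisfied_by?(version)
  puts "#{candidate}: old=#{old_ok} new=#{new_ok}"
end
```

The pessimistic `~> 0.6` allows only `0.x` releases from `0.6` upward, so the two ranges do not overlap; an application pinned to a `0.x` release of `reverse_markdown` would need to move to `1.x` or `2.x` to resolve against 1.1.9.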