html2tex 0.1.2 → 0.1.3

data/README.md CHANGED
@@ -59,3 +59,22 @@ comprehensive.
  StringScanner is used to process the HTML, but cannot read from a stream
  directly, so the entire input document must be read into memory as a string
  first.
+
+ UTF-8 is assumed everywhere; other character encodings will produce odd
+ results. If the HTML file to be processed is not in UTF-8 encoding with unix
+ line endings (at least, on Linux/OS X/etc.), _fix that first_. The usual
+ suspects will help here:
+
+ iconv -f windows-1252 -t utf-8 < somefile-win1252.html > somefile-utf8.html
+ dos2unix somefile-utf8.html
+
+ Next steps
+ ----------
+
+ If you have XeLaTex, you can easily turn the generated `.tex` file into a PDF:
+
+ xelatex my-book.tex
+
+ For better results, tweak the font settings or use a custom class like [this][ebook.cls].
+
+ [ebook.cls]: http://github.com/threedaymonk/gutenberg2pdf/blob/master/ebook.cls
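
Review note: the new README text recommends `iconv` and `dos2unix` for preparing input. For anyone who would rather do the same normalization from Ruby, a minimal sketch follows. The helper name `normalize_html` and the default source encoding are assumptions for illustration; nothing like this is part of html2tex.

```ruby
# Hypothetical helper (not part of html2tex): take raw bytes in a known
# legacy encoding and return UTF-8 text with Unix line endings.
def normalize_html(raw, source_encoding = "Windows-1252")
  # Label the bytes with their real encoding, then transcode to UTF-8.
  text = raw.dup.force_encoding(source_encoding).encode("UTF-8")
  # Convert CRLF and bare CR line endings to LF.
  text.gsub(/\r\n?/, "\n")
end
```

Typical use would be `normalize_html(File.binread("somefile-win1252.html"))`, with the result passed on as the in-memory string that StringScanner requires.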
@@ -21,6 +21,7 @@ class HTML2TeX
  private
  def read_html_head
  scanner.scan %r{\s*}
+ scanner.scan %r{<\?xml[^>]*?\?>\s*}i
  scanner.scan %r{<!doctype[^>]*>\s*}i
  scanner.scan %r{<html[^>]*>\s*}i
  if head = scanner.scan(%r{<head[^>]*>.*?</head>}im)
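
Review note: the added line lets `read_html_head` step over an optional XML declaration before the doctype, which XHTML documents commonly start with. The effect can be checked in isolation with the sketch below. The method name `skip_prolog` is hypothetical and only mirrors the scan calls shown in this hunk; it is not html2tex's API.

```ruby
require "strscan"

# Hypothetical stand-alone copy of the prolog-skipping scans from the hunk.
# Returns whatever remains of the input after the optional prolog.
def skip_prolog(html)
  scanner = StringScanner.new(html)
  scanner.scan(/\s*/)                  # leading whitespace
  scanner.scan(/<\?xml[^>]*?\?>\s*/i)  # XML declaration (the line added in 0.1.3)
  scanner.scan(/<!doctype[^>]*>\s*/i)  # doctype
  scanner.scan(/<html[^>]*>\s*/i)      # opening <html> tag
  scanner.rest
end
```

Each `scan` returns `nil` and leaves the position unchanged when its pattern does not match at the current position, so documents without an XML declaration are handled as before.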
@@ -1,3 +1,3 @@
  class HTML2TeX
- VERSION = "0.1.2"
+ VERSION = "0.1.3"
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: html2tex
  version: !ruby/object:Gem::Version
- version: 0.1.2
+ version: 0.1.3
  platform: ruby
  authors:
  - Paul Battley
@@ -9,7 +9,7 @@ autorequire:
  bindir: bin
  cert_chain: []
 
- date: 2010-05-09 00:00:00 +01:00
+ date: 2010-05-10 00:00:00 +01:00
  default_executable:
  dependencies:
  - !ruby/object:Gem::Dependency