text-hyphen 1.0.0

Files changed (40)
  1. data/ChangeLog +4 -0
  2. data/Changelog +4 -0
  3. data/INSTALL +6 -0
  4. data/LICENCE +47 -0
  5. data/README +56 -0
  6. data/Rakefile +116 -0
  7. data/bin/hyphen +107 -0
  8. data/lib/text/hyphen.rb +289 -0
  9. data/lib/text/hyphen/language.rb +112 -0
  10. data/lib/text/hyphen/language/ca.rb +174 -0
  11. data/lib/text/hyphen/language/cs.rb +363 -0
  12. data/lib/text/hyphen/language/da.rb +118 -0
  13. data/lib/text/hyphen/language/de1.rb +723 -0
  14. data/lib/text/hyphen/language/de2.rb +685 -0
  15. data/lib/text/hyphen/language/en_uk.rb +791 -0
  16. data/lib/text/hyphen/language/en_us.rb +493 -0
  17. data/lib/text/hyphen/language/es.rb +289 -0
  18. data/lib/text/hyphen/language/et.rb +337 -0
  19. data/lib/text/hyphen/language/eu.rb +115 -0
  20. data/lib/text/hyphen/language/fi.rb +113 -0
  21. data/lib/text/hyphen/language/fr.rb +392 -0
  22. data/lib/text/hyphen/language/ga.rb +608 -0
  23. data/lib/text/hyphen/language/hr.rb +124 -0
  24. data/lib/text/hyphen/language/hsb.rb +180 -0
  25. data/lib/text/hyphen/language/hu1.rb +385 -0
  26. data/lib/text/hyphen/language/hu2.rb +1283 -0
  27. data/lib/text/hyphen/language/ia.rb +73 -0
  28. data/lib/text/hyphen/language/id.rb +97 -0
  29. data/lib/text/hyphen/language/is.rb +390 -0
  30. data/lib/text/hyphen/language/it.rb +135 -0
  31. data/lib/text/hyphen/language/la.rb +134 -0
  32. data/lib/text/hyphen/language/mn.rb +103 -0
  33. data/lib/text/hyphen/language/nl.rb +1253 -0
  34. data/lib/text/hyphen/language/no1.rb +303 -0
  35. data/lib/text/hyphen/language/no2.rb +138 -0
  36. data/lib/text/hyphen/language/pl.rb +480 -0
  37. data/lib/text/hyphen/language/pt.rb +56 -0
  38. data/lib/text/hyphen/language/sv.rb +449 -0
  39. data/tests/tc_text_hyphen.rb +62 -0
  40. metadata +90 -0
data/ChangeLog ADDED
@@ -0,0 +1,4 @@
1
+ == Text::Hyphen 1.0.0
2
+ * Initial version based on TeX::Hyphen 0.4.0 (some changes have been backported
3
+ to TeX::Hyphen 0.5.0).
4
+ * Incorporated many hyphenation pattern files from CTAN.
data/Changelog ADDED
@@ -0,0 +1,4 @@
1
+ == Text::Hyphen 1.0.0
2
+ * Initial version based on TeX::Hyphen 0.4.0 (some changes have been backported
3
+ to TeX::Hyphen 0.5.0).
4
+ * Incorporated many hyphenation pattern files from CTAN.
data/INSTALL ADDED
@@ -0,0 +1,6 @@
1
+ Installing this package is as simple as:
2
+
3
+ % ruby install.rb
4
+
5
+ Alternatively, you can use the RubyGem version of Text::Hyphen available as
6
+ text-hyphen-1.0.0.gem from the usual sources.
data/LICENCE ADDED
@@ -0,0 +1,47 @@
1
+ Text::Hyphen is copyright (c) 2004 Austin Ziegler
2
+
3
+ Licensing for Text::Hyphen is unfortunately complex because of the various
4
+ copyrights and licences of the source hyphenation files. Some of these files
5
+ are available only under the TeX licence and others are available only under
6
+ the GNU GPL while others are public domain. Each language file has these
7
+ licences embedded within the file. Please consult each file's licence to
8
+ ensure that it is compatible with your application.
9
+
10
+ The copyright on the Text::Hyphen application/library and the Ruby
11
+ translations of hyphenation files belongs to Austin Ziegler. All other
12
+ copyrights on original versions still stand; Text::Hyphen is a derivative
13
+ work of these and other projects.
14
+
15
+ Application and Compilation Licences
16
+ ------------------------------------
17
+ Text::Hyphen, the application/library is licensed under the same terms as
18
+ Ruby. Note that this specifically refers to the contents of bin/hyphen,
19
+ lib/text/hyphen.rb, and lib/text/hyphen/language.rb.
20
+
21
+ Individual language hyphenation files are NOT licensed under these terms, but
22
+ under the following MIT-style licence and the original hyphenation pattern
23
+ licenses. The copyright for the original TeX hyphenation files is held by the
24
+ original authors; any mistakes in conversion of these files to Ruby are
25
+ attributable to the contributors to the Text::Hyphen package only.
26
+
27
+ The compilation package Text::Hyphen is licensed under the same terms as Ruby.
28
+
29
+ Blanket Language Hyphenation File Licence
30
+ -----------------------------------------
31
+ Permission is hereby granted, free of charge, to any person obtaining a copy of
32
+ this software and associated documentation files (the "Software"), to deal in
33
+ the Software without restriction, including without limitation the rights to
34
+ use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
35
+ of the Software, and to permit persons to whom the Software is furnished to do
36
+ so, subject to the following conditions:
37
+
38
+ The above copyright notice and this permission notice shall be included in all
39
+ copies or substantial portions of the Software.
40
+
41
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
42
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
43
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
44
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
45
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
46
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
47
+ SOFTWARE.
data/README ADDED
@@ -0,0 +1,56 @@
1
+ Text::Hyphen README
2
+ ===================
3
+
4
+ Text::Hyphen will properly hyphenate various words according to the rules of
5
+ the language the word is written in. The algorithm is based on that of the TeX
6
+ typesetting system by Donald E. Knuth. This is originally based on the Perl
7
+ implementation of TeX::Hyphen[1] and the Ruby port TeX::Hyphen[2]. The
8
+ language hyphenation pattern files are based on the sources available from
9
+ CTAN[3] as of 2004.12.19 and have been translated by Austin Ziegler.
10
+
11
+ This release is 1.0, the initial release of Text::Hyphen, representing a
12
+ significant improvement over its predecessor, TeX::Hyphen.
13
+
14
+ require 'text/hyphen'
15
+ hh = Text::Hyphen.new(:language => 'en_us', :left => 2, :right => 2)
16
+ # Defaults to the above
17
+ hh = Text::Hyphen.new
18
+
19
+ word = "representation"
20
+ points = hh.hyphenate(word) #=> [3, 5, 8, 10]
21
+ puts hh.visualize(word) #=> rep-re-sen-ta-tion
22
+
23
+ Text::Hyphen is truly multilingual in nature[4]. As an example, consider the
24
+ difference between the following:
25
+
26
+ require 'text/hyphen'
27
+ # Using left and right minimum values of 0 ensures that you will see all
28
+ # possible hyphenation points, not just those that meet the minimum
29
+ # width requirements.
30
+ en = Text::Hyphen.new(:left => 0, :right => 0)
31
+ fr = Text::Hyphen.new(:language => "fr", :left => 0, :right => 0)
32
+
33
+ puts en.visualise("organiser") #=> or-gan-iser
34
+ puts fr.visualise("organiser") #=> or-ga-ni-ser
35
+
36
+ As you can see, the hyphenation is distinct between the two hyphenators.
37
+ Additional improvements over TeX::Hyphen include thread safety (except for
38
+ debug control) and support for UTF-8.
39
+
40
+ It is very important to read the LICENCE file and each language file desired,
41
+ as some languages may be held under a more strict licence than that granted by
42
+ LICENCE.
43
+
44
+ Copyright
45
+ =========
46
+ # Copyright 2004 Austin Ziegler <text-hyphen@halostatue.ca>
47
+ # See the LICENCE file for more information.
48
+
49
+ [1] <http://search.cpan.org/author/JANPAZ/TeX-Hyphen-0.140/lib/TeX/Hyphen.pm>
50
+ Maintained by Jan Pazdziora.
51
+ [2] Available at <http://rubyforge.org/projects/text-format>.
52
+ [3] <http://www.ctan.org>
53
+ [4] There are some bugs and design decisions in the original Perl
54
+ implementation of TeX::Hyphen, carried over to the Ruby port of
55
+ TeX::Hyphen, that make both unsuitable for most multilingual
56
+ implementations.
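The README example pairs a list of hyphenation points with its visualisation. A minimal sketch of that mapping follows, assuming the points are zero-based character offsets as in the `[3, 5, 8, 10]` example. The helper `insert_hyphens` is hypothetical and not part of Text::Hyphen.

```ruby
# Hypothetical helper (not part of Text::Hyphen): turns a list of
# hyphenation points into a hyphenated string.
def insert_hyphens(word, points)
  out = word.dup
  # Each hyphen already inserted shifts later offsets right by one,
  # so the running index n is added to every position.
  points.sort.each_with_index do |pos, n|
    out.insert(pos + n, '-')
  end
  out
end

puts insert_hyphens('representation', [3, 5, 8, 10]) # prints rep-re-sen-ta-tion
```

The input string is duplicated first so that the caller's word is left unmodified.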
data/Rakefile ADDED
@@ -0,0 +1,116 @@
1
+ #! /usr/bin/env rake
2
+ $LOAD_PATH.unshift('lib')
3
+
4
+ require 'rubygems'
5
+ require 'rake/gempackagetask'
6
+ require 'text/hyphen'
7
+ require 'archive/tar/minitar'
8
+ require 'zlib'
9
+
10
+ DISTDIR = "text-hyphen-#{Text::Hyphen::VERSION}"
11
+ TARDIST = "../#{DISTDIR}.tar.gz"
12
+
13
+ DATE_RE = %r<(\d{4})[./-]?(\d{2})[./-]?(\d{2})(?:[\sT]?(\d{2})[:.]?(\d{2})[:.]?(\d{2})?)?>
14
+
15
+ if ENV['RELEASE_DATE']
16
+ year, month, day, hour, minute, second = DATE_RE.match(ENV['RELEASE_DATE']).captures
17
+ year ||= 0
18
+ month ||= 0
19
+ day ||= 0
20
+ hour ||= 0
21
+ minute ||= 0
22
+ second ||= 0
23
+ ReleaseDate = Time.mktime(year, month, day, hour, minute, second)
24
+ else
25
+ ReleaseDate = nil
26
+ end
27
+
28
+ task :test do |t|
29
+ require 'test/unit/testsuite'
30
+ require 'test/unit/ui/console/testrunner'
31
+
32
+ runner = Test::Unit::UI::Console::TestRunner
33
+
34
+ $LOAD_PATH.unshift('tests')
35
+ $stderr.puts "Checking for test cases:" if t.verbose
36
+ Dir['tests/tc_*.rb'].each do |testcase|
37
+ $stderr.puts "\t#{testcase}" if t.verbose
38
+ load testcase
39
+ end
40
+
41
+ suite = Test::Unit::TestSuite.new("Text::Hyphen")
42
+
43
+ ObjectSpace.each_object(Class) do |testcase|
44
+ suite << testcase.suite if testcase < Test::Unit::TestCase
45
+ end
46
+
47
+ runner.run(suite)
48
+ end
49
+
50
+ spec = eval(File.read("text-hyphen.gemspec"))
51
+ spec.version = Text::Hyphen::VERSION
52
+ desc "Build the RubyGem for Text::Hyphen"
53
+ task :gem => [ :test ]
54
+ Rake::GemPackageTask.new(spec) do |g|
55
+ g.need_tar = false
56
+ g.need_zip = false
57
+ g.package_dir = ".."
58
+ end
59
+
60
+ desc "Build a Text::Hyphen .tar.gz distribution."
61
+ task :tar => [ TARDIST ]
62
+ file TARDIST => [ :test ] do |t|
63
+ current = File.basename(Dir.pwd)
64
+ Dir.chdir("..") do
65
+ begin
66
+ files = Dir["#{current}/**/*"].select { |dd| dd !~ %r{(?:/CVS/?|~$)} }
67
+ files.map! do |dd|
68
+ ddnew = dd.gsub(/^#{current}/, DISTDIR)
69
+ mtime = ReleaseDate || File.stat(dd).mtime
70
+ if File.directory?(dd)
71
+ { :name => ddnew, :mode => 0755, :dir => true, :mtime => mtime }
72
+ else
73
+ if dd =~ %r{bin/}
74
+ mode = 0755
75
+ else
76
+ mode = 0644
77
+ end
78
+ data = File.read(dd)
79
+ { :name => ddnew, :mode => mode, :data => data, :size => data.size,
80
+ :mtime => mtime }
81
+ end
82
+ end
83
+
84
+ ff = File.open(t.name.gsub(%r{^\.\./}o, ''), "wb")
85
+ gz = Zlib::GzipWriter.new(ff)
86
+ tw = Archive::Tar::Minitar::Writer.new(gz)
87
+
88
+ files.each do |entry|
89
+ if entry[:dir]
90
+ tw.mkdir(entry[:name], entry)
91
+ else
92
+ tw.add_file_simple(entry[:name], entry) { |os| os.write(entry[:data]) }
93
+ end
94
+ end
95
+ ensure
96
+ tw.close if tw
97
+ gz.close if gz
98
+ end
99
+ end
100
+ end
101
+ task TARDIST => [ :test ]
102
+
103
+ def sign(file)
104
+ system %("C:/Program Files/Windows Privacy Tools/GnuPG/gpg.exe" -ba #{file}).gsub(%r{/}) { "\\" }
105
+ raise "Error signing with GPG." unless File.exist?("#{file}.asc")
106
+ end
107
+
108
+ task :signtar => [ :tar ] do
109
+ sign TARDIST
110
+ end
111
+ task :signgem => [ :gem ] do
112
+ sign "../#{DISTDIR}.gem"
113
+ end
114
+
115
+ desc "Build everything."
116
+ task :default => [ :signtar, :signgem ]
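The Rakefile above reads an optional RELEASE_DATE and splits it with DATE_RE. The sketch below shows how that pattern behaves on a few inputs. `parse_release_date` is a hypothetical helper introduced only for illustration; unlike the Rakefile, it returns nil when the input does not match and converts the captured strings to integers before building the Time.

```ruby
# Same pattern as DATE_RE in the Rakefile above.
DATE_RE = %r<(\d{4})[./-]?(\d{2})[./-]?(\d{2})(?:[\sT]?(\d{2})[:.]?(\d{2})[:.]?(\d{2})?)?>

# Hypothetical helper: returns a local Time, or nil when the text
# contains nothing that looks like a date.
def parse_release_date(text)
  match = DATE_RE.match(text.to_s)
  return nil unless match

  # Missing time fields are captured as nil, and nil.to_i is 0.
  year, month, day, hour, minute, second = match.captures.map { |part| part.to_i }
  Time.local(year, month, day, hour, minute, second)
end

p parse_release_date('2004-12-20 22:43:03')
p parse_release_date('20041220')
p parse_release_date('unreleased')
```

Returning nil for unmatched input is a design choice of this sketch; it avoids calling `captures` on a failed match.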
data/bin/hyphen ADDED
@@ -0,0 +1,107 @@
1
+ #!/usr/bin/env ruby
2
+ # Text::Hyphen
3
+ # Copyright 2003 - 2004, Martin DeMello and Austin Ziegler
4
+ #
5
+ # Licensed under the same terms as Ruby.
6
+ #
7
+ # $Id: hyphen,v 1.2 2004/12/20 22:43:03 austin Exp $
8
+ #++
9
+
10
+ require 'optparse'
11
+ require 'ostruct'
12
+
13
+ begin
14
+ require 'text/hyphen'
15
+ rescue LoadError
16
+ require 'rubygems'
17
+ require 'text/hyphen'
18
+ end
19
+
20
+ options = OpenStruct.new
21
+ options.action = :visualise
22
+ ARGV.options do |opt|
23
+ opt.banner = "Usage: #{File.basename($0)} [options] [mode] word+"
24
+ opt.separator ""
25
+ opt.separator "Modes"
26
+ opt.on('-V', '--visualise', 'Visualises the hyphenation of the word.', 'Default action.') { |mode|
27
+ options.action = :visualise
28
+ }
29
+ opt.on('-P', '--points', 'Shows the letters on which a word will', 'be hyphenated.') { |mode|
30
+ options.action = :hyphenate
31
+ }
32
+ opt.on('-H', '--hyphenate-to SIZE', Numeric, 'Hyphenates the word so that the first', 'point is at most SIZE letters.') { |size|
33
+ options.action = :hyphenate_to
34
+ options.size = size.to_i
35
+ }
36
+ opt.on("-S", "--stats", 'Shows the hyphenation statistics for the', 'current language pattern dictionary.') { |mode|
37
+ options.action = :stats
38
+ }
39
+
40
+ opt.separator ""
41
+ opt.separator "Options"
42
+ opt.on('-L', '--left SIZE', Integer, 'Sets the minimum number of letters on', 'the left side of the word.') { |left|
43
+ options.left = left.to_i
44
+ }
45
+ opt.on('-R', '--right SIZE', Integer, 'Sets the minimum number of letters on', 'the right side of the word.') { |right|
46
+ options.right = right.to_i
47
+ }
48
+ opt.on('-l', '--language LANGUAGE', 'Loads the specified language resource.') { |lang|
49
+ options.language = lang
50
+ }
51
+
52
+ opt.separator ""
53
+ opt.on_tail('-h', '--help', 'Shows this help') {
54
+ $stderr.puts opt
55
+ exit 0
56
+ }
57
+ opt.on_tail('-v', '--version', 'Display the program and library version.') {
58
+ $stderr.puts "#{File.basename($0)}: Text::Hyphen version #{Text::Hyphen::VERSION}"
59
+ exit 0
60
+ }
61
+ opt.parse!
62
+ end
63
+
64
+ if ARGV.empty? and options.action != :stats
65
+ $stderr.puts ARGV.options
66
+ exit 0
67
+ end
68
+
69
+ hyphenator = Text::Hyphen.new do |h|
70
+ h.left = options.left if options.left
71
+ h.right = options.right if options.right
72
+ h.language = options.language if options.language
73
+ end
74
+
75
+ case options.action
76
+ when :visualise
77
+ size = 80
78
+ ARGV.each do |word|
79
+ vis = hyphenator.visualise(word)
80
+ if (size - vis.size - 1) < 0
81
+ puts
82
+ size = 80
83
+ end
84
+ size -= (vis.size + 1)
85
+ print "#{vis} "
86
+ end
87
+ when :hyphenate
88
+ ARGV.each do |word|
89
+ hyp = hyphenator.hyphenate(word)
90
+ print "#{word}: "
91
+ hyp.each { |pt| print "#{word[pt, 1]} " }
92
+ puts
93
+ end
94
+ when :hyphenate_to
95
+ size = 80
96
+ ARGV.each do |word|
97
+ vis = hyphenator.hyphenate_to(word, options.size).compact.join
98
+ if (size - vis.size - 1) < 0
99
+ puts
100
+ size = 80
101
+ end
102
+ size -= (vis.size + 1)
103
+ print "#{vis} "
104
+ end
105
+ when :stats
106
+ puts hyphenator.stats
107
+ end
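The visualise and hyphenate-to modes of the script above keep an 80-column budget and start a new line once the next word no longer fits. A minimal sketch of that packing logic follows. `pack_words` is a hypothetical name, and the sketch returns the lines instead of printing them as the script does. As an assumption of this sketch, a word is never moved to a new line when the current line is still empty.

```ruby
# Hypothetical helper: groups words into lines of at most `width`
# columns, counting one trailing space per word as bin/hyphen does.
def pack_words(words, width = 80)
  lines = []
  current = []
  remaining = width

  words.each do |word|
    # Start a new line when the word plus its separator will not fit.
    if !current.empty? && remaining - word.size - 1 < 0
      lines << current.join(' ')
      current = []
      remaining = width
    end
    current << word
    remaining -= word.size + 1
  end

  lines << current.join(' ') unless current.empty?
  lines
end

puts pack_words(%w[rep-re-sen-ta-tion or-gan-iser hy-phen-ation], 30)
```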
data/lib/text/hyphen.rb ADDED
@@ -0,0 +1,289 @@
1
+ module Text; end
2
+
3
+ # = Introduction
4
+ # Text::Hyphen -- hyphenate words using modified versions of TeX
5
+ # hyphenation patterns.
6
+ #
7
+ # == Usage
8
+ # require 'text/hyphen'
9
+ # hh = Text::Hyphen.new(:language => 'en_us', :left => 2, :right => 2)
10
+ # # Defaults to the above
11
+ # hh = Text::Hyphen.new
12
+ #
13
+ # word = "representation"
14
+ # points = hh.hyphenate(word) #=> [3, 5, 8, 10]
15
+ # puts hh.visualize(word) #=> rep-re-sen-ta-tion
16
+ #
17
+ # en = Text::Hyphen.new(:left => 0, :right => 0)
18
+ # fr = Text::Hyphen.new(:language => "fr", :left => 0, :right => 0)
19
+ # puts en.visualise("organiser") #=> or-gan-iser
20
+ # puts fr.visualise("organiser") #=> or-ga-ni-ser
21
+ #
22
+ # == Description
23
+ # Creates a new Hyphen object and loads the language patterns into
24
+ # memory. The hyphenator can then be asked for the hyphenation of
25
+ # a word. If no language is specified, then the language en_us (EN_US)
26
+ # is used by default.
27
+ #
28
+ # Copyright:: Copyright (c) 2004 Austin Ziegler
29
+ # Version:: 1.0.0
30
+ # Based On:: <tt>TeX::Hyphen</tt> 0.4 Copyright (c) 2003 - 2004
31
+ # Martin DeMello and Austin Ziegler, in turn based on
32
+ # Perl's <tt>TeX::Hyphen</tt>
33
+ # [http://search.cpan.org/author/JANPAZ/TeX-Hyphen-0.140/lib/TeX/Hyphen.pm]
34
+ # Copyright (c) 1997 - 2002 Jan Pazdziora
35
+ #
36
+ # == Licence
37
+ # Licensing for Text::Hyphen is unfortunately complex because of the
38
+ # various copyrights and licences of the source hyphenation files. Some of
39
+ # these files are available only under the TeX licence and others are
40
+ # available only under the GNU GPL while others are public domain. Each
41
+ # language file has these licences embedded within the file. Please
42
+ # consult each file's licence to ensure that it is compatible with your
43
+ # application.
44
+ #
45
+ # The copyright on the Text::Hyphen application/library and the Ruby
46
+ # translations of hyphenation files belongs to Austin Ziegler. All other
47
+ # copyrights on original versions still stand; Text::Hyphen is a derivative
48
+ # work of these and other projects.
49
+ #
50
+ # === Application and Compilation Licences
51
+ # Text::Hyphen, the application/library is licensed under the same terms
52
+ # as Ruby. Note that this specifically refers to the contents of
53
+ # bin/hyphen, lib/text/hyphen.rb, and lib/text/hyphen/language.rb.
54
+ #
55
+ # Individual language hyphenation files are NOT licensed under these
56
+ # terms, but under the following MIT-style licence and the original
57
+ # hyphenation pattern licenses. The copyright for the original TeX
58
+ # hyphenation files is held by the original authors; any mistakes in
59
+ # conversion of these files to Ruby are attributable to the contributors to
60
+ # the Text::Hyphen package only.
61
+ #
62
+ # The compilation package Text::Hyphen is licensed under the same terms as
63
+ # Ruby.
64
+ #
65
+ # === Blanket Language Hyphenation File Licence
66
+ # Permission is hereby granted, free of charge, to any person obtaining
67
+ # a copy of this software and associated documentation files (the
68
+ # "Software"), to deal in the Software without restriction, including
69
+ # without limitation the rights to use, copy, modify, merge, publish,
70
+ # distribute, sublicense, and/or sell copies of the Software, and to
71
+ # permit persons to whom the Software is furnished to do so, subject to
72
+ # the following conditions:
73
+ #
74
+ # The above copyright notice and this permission notice shall be included
75
+ # in all copies or substantial portions of the Software.
76
+ #
77
+ # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
78
+ # OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
79
+ # MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
80
+ # IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
81
+ # CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
82
+ # TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
83
+ # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
84
+ class Text::Hyphen
85
+ DEBUG = false
86
+ VERSION = '1.0.0'
87
+
88
+ DEFAULT_MIN_LEFT = 2
89
+ DEFAULT_MIN_RIGHT = 2
90
+
91
+ # No fewer than this number of letters will show up to the left of the
92
+ # hyphen. This overrides the default specified in the language.
93
+ attr_accessor :left
94
+ # No fewer than this number of letters will show up to the right of the
95
+ # hyphen. This overrides the default specified in the language.
96
+ attr_accessor :right
97
+ # The name of the language to be used in hyphenating words. This will be
98
+ # a two or three character ISO 639 code, with the two character form
99
+ # being the canonical resource name. This will load the language
100
+ # hyphenation definitions from text/hyphen/language/&lt;code&gt; as
101
+ # a Ruby class. The resource 'text/hyphen/language/en_us' defines the
102
+ # language class Text::Hyphen::Language::EN_US. It also defines the
103
+ # secondary forms Text::Hyphen::Language::EN and
104
+ # Text::Hyphen::Language::ENG_US.
105
+ #
106
+ # Minimal transformations will be performed on the language code
107
+ # provided, such that any dashes are converted to underscores (e.g.,
108
+ # 'en-us' becomes 'en_us') and all characters are regularised. Resource
109
+ # names will be downcased and class names will be upcased (e.g., 'Pt'
110
+ # for the Portuguese language becomes 'pt' and 'PT', respectively).
111
+ #
112
+ # The language may also be specified as an instance of
113
+ # Text::Hyphen::Language.
114
+ attr_accessor :language
115
+ def language=(lang) #:nodoc:
116
+ require 'text/hyphen/language' unless defined?(Text::Hyphen::Language)
117
+ if lang.kind_of?(Text::Hyphen::Language)
118
+ @iso_language = lang.to_s.split(%r{::}o)[-1].downcase
119
+ @language = lang
120
+ else
121
+ @iso_language = lang.to_s.downcase.tr('-', '_')
122
+ load_language
123
+ end
124
+ @iso_language
125
+ end
126
+ # Returns the language's ISO 639 ID, e.g., "en_us" or "pt".
127
+ attr_reader :iso_language
128
+
129
+ # The following initializations are equivalent:
130
+ #
131
+ # hyp = Text::Hyphen.new(:language => "eu")
132
+ # hyp = Text::Hyphen.new { |h| h.language = "eu" }
133
+ def initialize(options = {}) # :yields self:
134
+ @iso_language = options[:language]
135
+ @left = options[:left]
136
+ @right = options[:right]
137
+
138
+ @cache = {}
139
+ @vcache = {}
140
+
141
+ @hyphen = {}
142
+ @begin_hyphen = {}
143
+ @end_hyphen = {}
144
+ @both_hyphen = {}
145
+ @exception = {}
146
+
147
+ @first_load = true
148
+ yield self if block_given?
149
+ @first_load = false
150
+
151
+ load_language
152
+
153
+ @left ||= DEFAULT_MIN_LEFT
154
+ @right ||= DEFAULT_MIN_RIGHT
155
+ end
156
+
157
+ # Returns a list of places where the word can be divided, as
158
+ #
159
+ # hyp.hyphenate('representation')
160
+ #
161
+ # returns [3, 5, 8, 10]. If the word has been hyphenated previously, it
162
+ # will be returned from a per-instance cache.
163
+ def hyphenate(word)
164
+ word = word.downcase
165
+ $stderr.puts "Hyphenating #{word}" if DEBUG
166
+ return @cache[word] if @cache.has_key?(word)
167
+ res = @language.exceptions[word]
168
+ return @cache[word] = make_result_list(res) if res
169
+
170
+ result = [0] * (word.split(//).size + 1)
171
+ rightstop = word.split(//).size - @right
172
+
173
+ updater = Proc.new do |hash, str, pos|
174
+ if hash.has_key?(str)
175
+ $stderr.print "#{pos}: #{str}: #{hash[str]}" if DEBUG
176
+ hash[str].split(//).each_with_index do |cc, ii|
177
+ cc = cc.to_i
178
+ result[ii + pos] = cc if cc > result[ii + pos]
179
+ end
180
+ $stderr.print ": #{result}\n" if DEBUG
181
+ end
182
+ end
183
+
184
+ # Walk the word
185
+ (0..rightstop).each do |pos|
186
+ restlength = word.length - pos
187
+ (1..restlength).each do |length|
188
+ substr = word[pos, length]
189
+ updater[@language.hyphen, substr, pos]
190
+ updater[@language.start, substr, pos] if pos.zero?
191
+ updater[@language.stop, substr, pos] if (length == restlength)
192
+ end
193
+ end
194
+
195
+ updater[@language.both, word, 0] if @language.both[word]
196
+
197
+ (0..@left).each { |i| result[i] = 0 }
198
+ ((-1 - @right)..(-1)).each { |i| result[i] = 0 }
199
+ @cache[word] = make_result_list(result)
200
+ end
201
+
202
+ # Returns a visualization of the hyphenation points, so:
203
+ #
204
+ # hyp.visualize('representation')
205
+ #
206
+ # returns <tt>rep-re-sen-ta-tion</tt>, at least for English patterns. If
207
+ # the word has been visualised previously, it will be returned from
208
+ # a per-instance cache.
209
+ def visualise(word)
210
+ return @vcache[word] if @vcache.has_key?(word)
211
+ w = word.dup
212
+ hyphenate(w).each_with_index do |pos, n|
213
+ w[pos.to_i + n, 0] = '-' if pos != 0
214
+ end
215
+ @vcache[word] = w
216
+ end
217
+
218
+ alias visualize visualise
219
+
220
+ def clear_cache!
221
+ @cache.clear
222
+ @vcache.clear
223
+ end
224
+
225
+ # This function will hyphenate a word so that the first point is at most
226
+ # +size+ characters.
227
+ def hyphenate_to(word, size)
228
+ point = hyphenate(word).reject { |e| e >= size }.max
229
+ if point.nil?
230
+ [nil, word]
231
+ else
232
+ [word[0 ... point] + "-", word[point .. -1]]
233
+ end
234
+ end
235
+
236
+ # Returns a formatted summary of the pattern counts for the current language.
237
+ def stats
238
+ _b = @language.both.size
239
+ _s = @language.start.size
240
+ _e = @language.stop.size
241
+ _h = @language.hyphen.size
242
+ _x = @language.exceptions.size
243
+ _T = _b + _s + _e + _h + _x
244
+
245
+ s = <<-EOS
246
+
247
+ The language '%s' contains %d total hyphenation patterns.
248
+ % 6d patterns are word start patterns.
249
+ % 6d patterns are word stop patterns.
250
+ % 6d patterns are word start/stop patterns.
251
+ % 6d patterns are normal patterns.
252
+ % 6d patterns are exceptions.
253
+
254
+ EOS
255
+ s % [ @iso_language, _T, _s, _e, _b, _h, _x ]
256
+ end
257
+
258
+ private
259
+ def updateresult(hash, str, pos) #:nodoc:
260
+ if hash.has_key?(str)
261
+ STDERR.print "#{pos}: #{str}: #{hash[str]}" if DEBUG
262
+ hash[str].split('').each_with_index do |c, i|
263
+ c = c.to_i
264
+ @result[i + pos] = c if c > @result[i + pos]
265
+ end
266
+ STDERR.puts ": #{@result}" if DEBUG
267
+ end
268
+ end
269
+
270
+ def make_result_list(res) #:nodoc:
271
+ r = []
272
+ res.each_with_index { |c, i| r << i * (c.to_i % 2) }
273
+ r.reject { |i| i.to_i == 0 }
274
+ end
275
+
276
+ def load_language
277
+ return if @first_load
278
+
279
+ @iso_language ||= "en_us"
280
+
281
+ require "text/hyphen/language/#{@iso_language}"
282
+
283
+ @language = Text::Hyphen::Language.const_get(@iso_language.upcase)
284
+ @left ||= @language.left
285
+ @right ||= @language.right
286
+
287
+ @iso_language
288
+ end
289
+ end
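The `hyphenate` method above scores every gap in a word by looking up each substring in the pattern tables, keeps the highest digit per gap, and treats odd scores as break points. The sketch below shows that scoring under loud assumptions: the pattern table is a toy one chosen for illustration and is not taken from any language file, there are no start, stop, or exception tables, and the margin handling is simplified and does not reproduce the library's exact index arithmetic. `toy_hyphenate` and `TOY_PATTERNS` are hypothetical names.

```ruby
# Toy pattern table (illustrative only). Keys are letter runs; each value
# holds one digit per gap around those letters, so it is one character
# longer than its key.
TOY_PATTERNS = {
  'hyph'  => '00300',
  'hen'   => '0020',
  'hena'  => '00004',
  'henat' => '000500',
  'na'    => '100',
  'nat'   => '0200',
  'tio'   => '1000',
  'io'    => '200',
  'on'    => '020'
}.freeze

# Hypothetical sketch of the scoring loop; returns the break offsets.
def toy_hyphenate(word, patterns, left = 2, right = 2)
  word = word.downcase
  size = word.length
  scores = Array.new(size + 1, 0) # scores[i] is the gap before letter i

  (0...size).each do |pos|
    (1..(size - pos)).each do |len|
      digits = patterns[word[pos, len]]
      next unless digits

      # Keep the largest digit seen for each gap.
      digits.each_char.with_index do |digit, offset|
        value = digit.to_i
        scores[pos + offset] = value if value > scores[pos + offset]
      end
    end
  end

  # Odd scores mark break points; simplified margins keep at least
  # `left` letters before and `right` letters after a break.
  (0..size).select { |i| scores[i].odd? && i >= left && i <= size - right }
end

p toy_hyphenate('hyphenation', TOY_PATTERNS) # prints [2, 6]
```

Even digits suppress a break and odd digits allow one, which is why the higher `5` from `henat` overrides the `2` from `nat` at the same gap.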