crawdad 0.0.1 → 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,13 @@
1
+ = Version 0.1.0 (April 22, 2010)
2
+
3
+ * Add support for centered, ragged-right, and ragged-left text.
4
+
5
+ * Work around an issue with hyphenating some words like "to" using
6
+ text-hyphen.
7
+
8
+ * Fix segfault when a token stream starts with glue.
9
+
10
+
11
+ = Version 0.0.1 (March 19, 2010)
12
+
13
+ * Initial release.
@@ -0,0 +1,61 @@
1
+ # Crawdad: Knuth-Plass linebreaking for Ruby
2
+
3
+ Crawdad is a basic implementation of the Knuth-Plass linebreaking
4
+ algorithm for breaking paragraphs of text into lines. It uses a total-fit
5
+ method to ensure consistency across the entire paragraph at once. The
6
+ algorithm is a simplified version of the one used in the TeX typesetting
7
+ system.
8
+
9
+ ## References
10
+
11
+ The canonical reference for this algorithm is "Breaking Paragraphs Into
12
+ Lines", by Donald E. Knuth and Michael F. Plass, originally published in
13
+ Software Practice and Experience 11 (1981), pp. 1119-1184. It is reprinted
14
+ in the excellent monograph Digital Typography (Knuth 1999).
15
+
16
+ This implementation was inspired by Bram Stein's TypeSet project, an
17
+ implementation of Knuth-Plass in Javascript that uses HTML5 Canvas.
18
+
19
+ http://www.bramstein.com/projects/typeset/
20
+
21
+ ## To Do
22
+
23
+ ### Short-Term
24
+
25
+ * Collect all of our constants and magic numbers and expose them somewhat to
26
+ the user. Survey other software (TeX?) to harvest good starting values for
27
+ these parameters.
28
+
29
+ * Fix the demerits calculation to the TeX "improved" formula (Digital
30
+ Typography p. 154). Thanks to Bram Stein for pointing this out.
31
+
32
+ * Implement the looseness parameter q (algorithm "Choose the appropriate
33
+ active node", Digital Typography p. 120).
34
+
35
+ ### Long-Term
36
+
37
+ * Bring this into Prawn, and integrate (if possible) with the Text::Box API.
38
+
39
+ * The tokenizer could be smarter; it should recognize more than just
40
+ low-ASCII hyphens as hyphens / dashes, and it can get confused when
41
+ whitespace and hyphens interact.
42
+
43
+ * Automatically relax the thresholds when the constraints cannot be
44
+ satisfied? Or we could look into TeX's two-pass method (pp. 121-122).
45
+
46
+ ## Acknowledgements
47
+
48
+ Thanks are due to the following individuals:
49
+
50
+ * Donald Knuth and Michael Plass created the original algorithm and data
51
+ structures.
52
+
53
+ * Bram Stein wrote the aforementioned Javascript implementation of the
54
+ Knuth-Plass algorithm. His code was helpful in exposing some of the darker
55
+ corners of the original authors' description of the method.
56
+
57
+ ## License
58
+
59
+ Crawdad is copyrighted free software, written by Brad Ediger. See the
60
+ LICENSE file for details.
61
+
data/Rakefile CHANGED
@@ -4,10 +4,10 @@ require 'rake/testtask'
4
4
  require 'rake/rdoctask'
5
5
  require 'rake/gempackagetask'
6
6
 
7
- CRAWDAD_VERSION = '0.0.1'
8
-
7
+ # Build must be the default task, to fake out using a Makefile to build a
8
+ # non-Ruby extension with Rubygems. There's probably an easier way, but I can't
9
+ # find it.
9
10
  task :default => [:build]
10
-
11
11
  task :build do
12
12
  system "make -Cext/crawdad"
13
13
  end
@@ -27,29 +27,7 @@ Rake::RDocTask.new do |rdoc|
27
27
  rdoc.title = "Crawdad Documentation"
28
28
  end
29
29
 
30
- spec = Gem::Specification.new do |spec|
31
- spec.name = 'crawdad'
32
- spec.version = CRAWDAD_VERSION
33
- spec.platform = Gem::Platform::RUBY
34
- spec.summary = "Knuth-Plass linebreaking for Ruby"
35
- spec.files = FileList["lib/**/**/*"] + FileList["ext/crawdad/*"]
36
- spec.require_paths << 'ext'
37
-
38
- binaries = FileList['ext/crawdad/*.bundle', 'ext/crawdad/*.so']
39
- spec.extensions << 'Rakefile'
40
- spec.files += binaries.to_a
41
-
42
- spec.has_rdoc = true
43
- spec.rdoc_options << '--title' << 'Crawdad Documentation' << '-q'
44
- spec.author = 'Brad Ediger'
45
- spec.email = 'brad.ediger@madriska.com'
46
- spec.homepage = 'http://github.com/madriska/crawdad'
47
- spec.description = <<END_DESC
48
- Crawdad is an implementation of Knuth-Plass linebreaking (justification)
49
- for Ruby.
50
- END_DESC
51
- end
52
-
30
+ spec = Gem::Specification.load("crawdad.gemspec")
53
31
  Rake::GemPackageTask.new(spec) do |pkg|
54
32
  pkg.need_tar = true
55
33
  end
@@ -0,0 +1,24 @@
1
+ Gem::Specification.new do |spec|
2
+ spec.name = 'crawdad'
3
+ spec.version = '0.1.0'
4
+ spec.platform = Gem::Platform::RUBY
5
+ spec.summary = "Knuth-Plass linebreaking for Ruby"
6
+ spec.files = FileList["lib/**/**/*", "ext/crawdad/*", "README.markdown",
7
+ "crawdad.gemspec", "CHANGELOG"]
8
+ spec.require_paths << 'ext'
9
+
10
+ binaries = FileList['ext/crawdad/*.bundle', 'ext/crawdad/*.so']
11
+ spec.extensions << 'Rakefile'
12
+ spec.files += binaries.to_a
13
+
14
+ spec.has_rdoc = true
15
+ spec.rdoc_options << '--title' << 'Crawdad Documentation' << '-q'
16
+ spec.author = 'Brad Ediger'
17
+ spec.email = 'brad.ediger@madriska.com'
18
+ spec.homepage = 'http://github.com/madriska/crawdad'
19
+ spec.description = <<END_DESC
20
+ Crawdad is an implementation of Knuth-Plass linebreaking (justification)
21
+ for Ruby.
22
+ END_DESC
23
+ end
24
+
@@ -1,6 +1,6 @@
1
1
  OS:=$(shell uname | sed 's/[-_].*//')
2
2
  CFLAGS=-Wall -O2 -fPIC
3
- #CFLAGS=-Wall -fPIC -g
3
+ #CFLAGS=-Wall -fPIC -ggdb
4
4
  SHARED=-shared
5
5
  SOEXT:=.so
6
6
 
@@ -107,7 +107,7 @@ void foreach_legal_breakpoint(token *stream[], float width, float threshold,
107
107
  tw += t->box.width;
108
108
  break;
109
109
  case GLUE:
110
- if(stream[i-1]->box.type == BOX)
110
+ if((i > 0) && (stream[i-1]->box.type == BOX))
111
111
  fn(stream, i, tw, ty, tz, width, threshold);
112
112
  tw += t->glue.width;
113
113
  ty += t->glue.stretch;
Binary file
Binary file
@@ -14,6 +14,10 @@ module Crawdad
14
14
  layout :type, Type,
15
15
  :width, :float,
16
16
  :content, :string
17
+
18
+ def inspect
19
+ "(box %.2f %s)" % [self[:width], self[:content].inspect]
20
+ end
17
21
  end
18
22
 
19
23
  def box(width, content)
@@ -29,6 +33,10 @@ module Crawdad
29
33
  :width, :float,
30
34
  :stretch, :float,
31
35
  :shrink, :float
36
+
37
+ def inspect
38
+ "(glue %.2f %.2f %.2f)" % [self[:width], self[:stretch], self[:shrink]]
39
+ end
32
40
  end
33
41
 
34
42
  def glue(width, stretch, shrink)
@@ -48,6 +56,11 @@ module Crawdad
48
56
  :width, :float,
49
57
  :penalty, :float,
50
58
  :flagged, :int
59
+
60
+ def inspect
61
+ "(penalty %.2f %.2f#{" F" if self[:flagged] == 1})" %
62
+ [self[:penalty], self[:width]]
63
+ end
51
64
  end
52
65
 
53
66
  def penalty(penalty, width=0.0, flagged=false)
@@ -36,7 +36,15 @@ module Crawdad
36
36
  # number of PDF points.
37
37
  #
38
38
  def paragraph(text, options={})
39
+ @align = options[:align] || :justify
40
+
39
41
  hyphenator = if options[:hyphenation]
42
+ # Box-glue-penalty model does not easily permit optional hyphenation
43
+ # with the construction we use for centered text.
44
+ if @align == :center
45
+ raise ArgumentError, "Hyphenation is not supported with centered text"
46
+ end
47
+
40
48
  begin
41
49
  gem 'text-hyphen'
42
50
  require 'text/hyphen'
@@ -49,78 +57,125 @@ module Crawdad
49
57
  @hyphenators[language] ||= Text::Hyphen.new(:language => language)
50
58
  end
51
59
 
52
- stream = []
53
-
54
- if w = options[:indent]
55
- stream << box(w, "")
56
- end
57
-
58
- # Interword glue can stretch by half and shrink by a third.
59
- # TODO: optimal stretch/shrink ratios
60
- space_width = @pdf.width_of(" ")
61
- interword_glue = glue(space_width,
62
- space_width / 2.0,
63
- space_width / 3.0)
64
-
65
- # TODO: optimal values for sentence space w/y/z
66
- sentence_space_width = space_width * 1.5
67
- sentence_glue = glue(sentence_space_width,
68
- sentence_space_width / 2.0,
69
- sentence_space_width / 3.0)
60
+ stream = starting_tokens(options[:indent])
70
61
 
71
62
  # Break paragraph on whitespace.
72
63
  # TODO: how should "battle-\nfield" be tokenized?
73
- text.strip.split(/\s+/).each do |word|
64
+ words = text.strip.split(/\s+/)
65
+
66
+ words.each_with_index do |word, i|
74
67
  w = StringScanner.new(word)
75
68
 
76
69
  # For hyphenated words, follow each hyphen by a zero-width flagged
77
70
  # penalty.
78
- # TODO: recognize dashes in all their variants
79
71
  while seg = w.scan(/[^-]+-/) # "night-time" --> "<<night->>time"
80
- stream.concat add_word_segment(seg, hyphenator)
72
+ stream += word_segment(seg, hyphenator)
81
73
  end
82
74
 
83
- stream.concat(add_word_segment(w.rest, hyphenator))
84
-
85
- # TODO: add ties (~) or some other way to denote a period that
86
- # doesn't end a sentence.
87
- if w.rest =~ /\.$/
88
- stream << sentence_glue
89
- else
90
- stream << interword_glue
75
+ stream += word_segment(w.rest, hyphenator)
76
+
77
+ unless i == words.length - 1
78
+ stream += interword_tokens
91
79
  end
92
80
  end
93
81
 
94
- # Remove extra glue at the end.
95
- stream.pop if token_type(stream.last) == :glue
96
-
97
- # Finish paragraph with a penalty inhibiting a break, finishing glue (to
98
- # pad out the last line), and a forced break to finish the paragraph.
99
- stream << penalty(Infinity)
100
- stream << glue(0, Infinity, 0)
101
- stream << penalty(-Infinity)
82
+ # Add needed tokens to finish off the paragraph.
83
+ stream += finishing_tokens
102
84
 
103
85
  stream
104
86
  end
105
87
 
106
88
  protected
107
89
 
108
- # Returns a series of tokens representing the given word. Hyphenates using
109
- # the given +hyphenator+, if provided. Appends a zero-width flagged penalty
110
- # if the given word ends in a hyphen.
90
+ # Width of one space.
91
+ #
92
+ def space
93
+ @space ||= @pdf.width_of(" ")
94
+ end
95
+
96
+ # Tokens used to start a paragraph. Accepts one argument, +indent_width+,
97
+ # the amount by which to indent the first line, which only really makes
98
+ # sense for justified or ragged-left text.
99
+ #
100
+ def starting_tokens(indent_width)
101
+ if @align == :center
102
+ [box(0, ""), glue(0, 3*space, 0)]
103
+ elsif indent_width
104
+ [box(w, "")]
105
+ else
106
+ []
107
+ end
108
+ end
109
+
110
+ # Tokens used between words in a sentence. This depends on @align; the
111
+ # box-glue-penalty model is flexible enough to accommodate ragged (right or
112
+ # left), centered, or justified text.
111
113
  #
112
- def add_word_segment(word, hyphenator)
114
+ # See Digital Typography pp. 93-95 for details.
115
+ #
116
+ def interword_tokens
117
+ case @align
118
+ when :justify
119
+ [glue(space, space / 2.0, space / 3.0)]
120
+ when :center
121
+ [glue(0, 3*space, 0), penalty(0), glue(space, -6*space, 0), box(0, ""),
122
+ penalty(Infinity), glue(0, 3*space, 0)]
123
+ else # :right, :left
124
+ [glue(0, 3*space, 0), penalty(0), glue(space, -3*space, 0)]
125
+ end
126
+ end
127
+
128
+ # Tokens representing a possible hyphenation point.
129
+ #
130
+ def optional_hyphen
131
+ hyphen = @pdf.width_of('-')
132
+
133
+ if @align == :justify
134
+ [penalty(50, hyphen, true)]
135
+ else # :left or :right (:center is incompatible with hyphenation)
136
+ # Hyphens cost 10 times more in unjustified text because we can usually
137
+ # do better to avoid them.
138
+ [penalty(Infinity), glue(0, 3*space, 0), penalty(500, hyphen, true),
139
+ glue(0, -3*space, 0)]
140
+ end
141
+ end
142
+
143
+ # Tokens to finish out a paragraph -- pad out the last line if needed, and
144
+ # force a break.
145
+ #
146
+ def finishing_tokens
147
+ if @align == :center
148
+ [glue(0, 3*space, 0), penalty(-Infinity)]
149
+ else
150
+ [penalty(Infinity), glue(0, Infinity, 0), penalty(-Infinity)]
151
+ end
152
+ end
153
+
154
+ # Returns tokens representing the given word. Hyphenates using the given
155
+ # +hyphenator+, if provided. Appends a zero-width flagged penalty if the
156
+ # given word ends in a hyphen.
157
+ #
158
+ def word_segment(word, hyphenator)
113
159
  tokens = []
114
160
 
115
161
  if hyphenator
116
- hyphen_width = @pdf.width_of('-')
162
+ begin
163
+ splits = hyphenator.hyphenate(word)
164
+ rescue NoMethodError => e
165
+ if e.message =~ /each_with_index/
166
+ # known issue wth text-hyphen 1.0.0:
167
+ # http://rubyforge.org/tracker/index.php?func=detail&aid=28128&group_id=294&atid=1195
168
+ splits = []
169
+ else
170
+ raise
171
+ end
172
+ end
117
173
 
118
- splits = hyphenator.hyphenate(word)
119
- # For each hyphenated segment, add the box with an optional penalty.
174
+ # For each hyphenated segment, add the box with an optional hyphen.
120
175
  [0, *splits].each_cons(2) do |a, b|
121
176
  seg = word[a...b]
122
177
  tokens << box(@pdf.width_of(seg), seg)
123
- tokens << penalty(50, @pdf.width_of('-'), true)
178
+ tokens += optional_hyphen
124
179
  end
125
180
 
126
181
  last = word[(splits.last || 0)..-1]
metadata CHANGED
@@ -1,7 +1,12 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: crawdad
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.1
4
+ prerelease: false
5
+ segments:
6
+ - 0
7
+ - 1
8
+ - 0
9
+ version: 0.1.0
5
10
  platform: ruby
6
11
  authors:
7
12
  - Brad Ediger
@@ -9,11 +14,11 @@ autorequire:
9
14
  bindir: bin
10
15
  cert_chain: []
11
16
 
12
- date: 2010-03-19 00:00:00 -05:00
17
+ date: 2010-04-22 00:00:00 -05:00
13
18
  default_executable:
14
19
  dependencies: []
15
20
 
16
- description: Crawdad is an implementation of Knuth-Plass linebreaking (justification) for Ruby.
21
+ description: " Crawdad is an implementation of Knuth-Plass linebreaking (justification)\n for Ruby.\n"
17
22
  email: brad.ediger@madriska.com
18
23
  executables: []
19
24
 
@@ -22,28 +27,34 @@ extensions:
22
27
  extra_rdoc_files: []
23
28
 
24
29
  files:
25
- - lib/crawdad
26
- - lib/crawdad/prawn_tokenizer.rb
27
30
  - lib/crawdad/breakpoint.rb
28
- - lib/crawdad/native.rb
29
- - lib/crawdad/ffi.rb
30
- - lib/crawdad/paragraph.rb
31
31
  - lib/crawdad/compatibility.rb
32
- - lib/crawdad/tokens.rb
33
- - lib/crawdad/ffi
32
+ - lib/crawdad/ffi/breakpoint_node.rb
34
33
  - lib/crawdad/ffi/paragraph.rb
35
34
  - lib/crawdad/ffi/tokens.rb
36
- - lib/crawdad/ffi/breakpoint_node.rb
35
+ - lib/crawdad/ffi.rb
36
+ - lib/crawdad/native.rb
37
+ - lib/crawdad/paragraph.rb
38
+ - lib/crawdad/prawn_tokenizer.rb
39
+ - lib/crawdad/tokens.rb
37
40
  - lib/crawdad.rb
38
41
  - ext/crawdad/breakpoint.h
39
- - ext/crawdad/tokens.c
40
- - ext/crawdad/paragraph.h
42
+ - ext/crawdad/crawdad.bundle
41
43
  - ext/crawdad/Makefile
42
- - ext/crawdad/tokens.h
43
44
  - ext/crawdad/paragraph.c
45
+ - ext/crawdad/paragraph.h
46
+ - ext/crawdad/paragraph.o
47
+ - ext/crawdad/tokens.c
48
+ - ext/crawdad/tokens.h
49
+ - ext/crawdad/tokens.o
50
+ - README.markdown
51
+ - crawdad.gemspec
52
+ - CHANGELOG
44
53
  - Rakefile
45
54
  has_rdoc: true
46
55
  homepage: http://github.com/madriska/crawdad
56
+ licenses: []
57
+
47
58
  post_install_message:
48
59
  rdoc_options:
49
60
  - --title
@@ -56,20 +67,22 @@ required_ruby_version: !ruby/object:Gem::Requirement
56
67
  requirements:
57
68
  - - ">="
58
69
  - !ruby/object:Gem::Version
70
+ segments:
71
+ - 0
59
72
  version: "0"
60
- version:
61
73
  required_rubygems_version: !ruby/object:Gem::Requirement
62
74
  requirements:
63
75
  - - ">="
64
76
  - !ruby/object:Gem::Version
77
+ segments:
78
+ - 0
65
79
  version: "0"
66
- version:
67
80
  requirements: []
68
81
 
69
82
  rubyforge_project:
70
- rubygems_version: 1.3.1
83
+ rubygems_version: 1.3.6
71
84
  signing_key:
72
- specification_version: 2
85
+ specification_version: 3
73
86
  summary: Knuth-Plass linebreaking for Ruby
74
87
  test_files: []
75
88