ritex 0.1 → 0.2

data/README CHANGED
@@ -1,14 +1,12 @@
  Author:: William Morgan (mailto: wmorgan-ritex@masanjin.net)
- Copyright:: Copyright 2005 William Morgan
+ Copyright:: Copyright 2005--2009 William Morgan
  License:: GNU GPL version 2
 
  = Introduction
 
- Ritex converts expressions from WebTeX into MathML. WebTeX is an
- adaptation of TeX math syntax for web display.
-
- Ritex makes inserting math into HTML pages easy. It supports most TeX
- math syntax as well as macros.
+ Ritex converts expressions from WebTeX into MathML. WebTeX is an adaptation of
+ TeX math syntax for web display. Ritex supports most TeX math syntax, and
+ supports macros.
 
  For example, Ritex turns
  \alpha^\beta
@@ -20,20 +18,18 @@ into
  </msup>
  </math>
 
- Ritex is based heavily on itex2mml
- (http://pear.math.pitt.edu/mathzilla/itex2mmlItex.html), a popular TeX
- math to MathML convertor--so much so that the default correct answer
- to unit tests is to do whatever itex2mml does!
+ = How to get Ritex
 
- Ritex features several advantages over itex2mml:
+ To install, use RubyGems: gem install ritex.
+ For examples and git instructions, see the home page: http://masanjin.net/ritex/
 
- * It's written in Ruby (hey, I consider that an advantage).
- * It supports macros.
- * It handles unary minus better.
- * It's easier to extend.
+ = News about Ritex
+
+ See the blog: http://all-thing.net/label/ritex/.
 
  = Synopsis
 
+ require 'rubygems'
  require 'ritex'
  p = Ritex::Parser.new
  ARGF.each { |l| puts p.parse(l) }
@@ -48,49 +44,44 @@ Ritex features several advantages over itex2mml:
  end
  end
 
- = Using Ritex
-
- Calling Ritex from Ruby is very simple. If the synopsis above isn't
- enough, see the documentation for Ritex::Parser for the gory details.
+ See the documentation for Ritex::Parser for gory details.
 
- = Creating MathML with Ritex
+ = Ritex's MathML output
 
- Ritex parses WebTeX. WebTeX is an adapation of the TeX math syntax
- which is designed for web page display. The WebTeX documentation can
- be found at
- http://stuff.mit.edu/afs/athena/software/webeq/currenthome/docs/webtex/toc.html.
+ To be pedantic, Ritex is a WebTeX to MathML converter. WebTeX is an adaptation
+ of the TeX math syntax which is designed for web page display. WebTeX
+ documentation can be found at:
+ http://stuff.mit.edu/afs/athena/software/webeq/currenthome/docs/webtex/toc.html
 
- If you're familiar with TeX math syntax, you'll feel right at
- home. But there are several important differences between it and
- WebTeX. Most notably:
+ If you're familiar with TeX math syntax, it's mostly the same, but there are
+ several important differences in WebTeX. Notably:
 
- * arrays: different \array syntax; no \eqnarray or \align
+ * arrays: Use \array syntax; there's no \eqnarray or \align
  * macro definitions: \define; no \newcommand or \def
- * \left and \right no longer need "invisible" delimiters
+ * \left and \right no longer need "invisible" delimiters like "."
 
  These differences are explained in the WebTeX documentation.
 
- Ritex is based heavily on Itex2mml. Itex2mml accepts what it calls
- "Itex", an extension of WebTeX which adds a few aliases (like
- \infinity for \infty) and markups (like \underoverset). Ritex supports
- these extensions. Regardless, I've chosen to say that Ritex parses
- WebTeX rather than Itex, mainly because the former includes macros and
- is better documented.
+ Ritex also supports many of itex2MML's various extensions to WebTeX, mainly
+ consisting of additional aliases (e.g. \infinity for \infty) and markup (e.g.
+ \underoverset).
 
- Itex is described at
- http://pear.math.pitt.edu/mathzilla/itex2mmlItex.html.
+ = Comparison with itex2MML
 
- See the ReleaseNotes for features in WebTeX that are currently
- unimplemented in Ritex.
-
- = Differences between Ritex and itex2mml
-
- If you're familiar with itex2mml, there are a few subtle differences
- between the two:
- * A sequence of letters like "abc" is treated as three separate
- variables and not as one variable. I believe that's the TeX Way (tm).
- * \ (backslash space) is a medium space, not an undefined character.
- * Sequences like "x--3" will correctly mark the second operator as a
- unary minus.
- * And of course, macros.
+ itex2MML is another option for converting LaTeX-like math into MathML. It has
+ Ruby bindings. Compared against itex2MML version 1.3.7 (3/7/2009), Ritex has
+ several differences:
 
+ * It supports macros.
+ * It's written in Ruby.
+ * It fixes several output bugs:
+ - Operators like < and > produce <mo> (math operator) tags instead of
+ <mi> (math identifier) tags.
+ - A sequence of letters like "abc" is treated as three separate variables and
+ not as one variable. That's The TeX Way (tm).
+ - \ (backslash space) is a medium space, not an undefined character.
+ - \binom output does not add extra parentheses.
+ - Empty delimiters are accepted (e.g. "\left" is sufficient; no need for
+ "\left.") as per WebTeX spec.
+ - \cellopts{} can be elided in arrays as per WebTeX examples.
+ * It's slower.
@@ -1,11 +1,19 @@
+ version 0.2, 04/02/2009
+ ======================
+
+ After almost four years, a new version! Many upgrades and updates.
+ Highlights include:
+ * array options support
+ * better unary minus detection
+ * better testing
+
  version 0.1, 9/15/2005
- ----------------------
+ ======================
 
  First version. Highly experimental!
 
- Unimplemented features:
+ Unimplemented WebTeX features:
  * \floatleft, \floatright
- * array options
  * \tensor and \multiscripts
  * macros with 4 or more arguments
- * \bghighlight, \fghighlight, \statusline
+ * \bghighlight, \fghighlight, \statusline
@@ -1,9 +1,9 @@
  ## lib/ritex.rb -- contains Ritex::Parser
  ## Author:: William Morgan (mailto: wmorgan-ritex@masanjin.net)
- ## Copyright:: Copyright 2005 William Morgan
+ ## Copyright:: Copyright 2005-2009 William Morgan
  ## License:: GNU GPL version 2
  ##
- ## :title:Ritex: a Ruby itex to mathml converter
+ ## :title:Ritex: a Ruby WebTeX to MathML converter
  ## :main:README
 
  require "ritex/parser"
@@ -17,9 +17,6 @@ require 'racc/parser' # just for Racc::ParserError
  ## Ritex::Parser.
  module Ritex
 
- ## See #merror=
- class Error < Racc::ParseError; end
-
  ## This is not ideal by any means. Until we can call a Proc with an
  ## arbitrary binding (Ruby 1.9?), we will relay all #markup and
  ## #lookup calls within the module to a registered parser, so that the
@@ -31,6 +28,9 @@ class Error < Racc::ParseError; end
  attr_accessor :global_parser
  module_function :global_parser, :global_parser=
 
+ ## Thrown by Parser upon errors. See Parser#merror=.
+ class Error < StandardError; end
+
  ## The parser for itex and the main entry point for Ritex. This class
  ## is partially defined here and partially generated by Racc from
  ## lib/parser.y.
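In 0.2, Ritex::Error inherits from StandardError rather than Racc::ParseError, so a caller that rescued Racc::ParseError to catch Ritex failures would no longer catch it. The sketch below illustrates the difference in plain Ruby; FakeRaccParseError is a hypothetical stand-in for Racc::ParseError so the example runs without racc, and the Old/New modules are illustrative only.

```ruby
# Hypothetical stand-in for Racc::ParseError, so the sketch runs without racc.
class FakeRaccParseError < StandardError; end

module Old # 0.1-style hierarchy
  class Error < FakeRaccParseError; end
end

module New # 0.2-style hierarchy
  class Error < StandardError; end
end

# Raises the given class and reports which rescue clause caught it.
def caught_by(exc_class)
  raise exc_class, "boom"
rescue FakeRaccParseError
  :racc_rescue
rescue StandardError
  :standard_rescue
end

puts caught_by(Old::Error) # racc_rescue
puts caught_by(New::Error) # standard_rescue
```

After upgrading, rescuing Ritex::Error (and Ritex::LexError for lexing failures) by name would be the safer pattern.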
@@ -69,7 +69,7 @@ class Parser
  def parse s, wrap = true, inline = true
  @lex = Lexer.new(self, s)
  r = yyparse @lex, :lex
- r = markup r, (inline ? :math : :displaymath) if wrap unless r.empty?
+ r = markup r, (inline ? :math : :displaymath) if wrap
  r
  end
 
@@ -82,18 +82,17 @@ class Parser
  ## Delete all macros
  def flush_macros; @macros = {}; end
 
- def markup what, tag, opts=nil #:nodoc:
+ def markup what, tag, opts=[] #:nodoc:
  case @format
  when :mathml
- # puts "x marking up #{type}, member? #{MathML::MARKUP.member? type}"
- tag, opts =
- case tag
+ tag, opts = case tag
  when String
  [tag, opts]
  when Symbol
- MathML::MARKUP[tag]
+ a, b = MathML::MARKUP[tag]
+ [a, [b, opts].flatten.compact]
  end
- if opts
+ unless opts.empty?
  "<#{tag} #{opts}>#{what}</#{tag}>"
  else
  "<#{tag}>#{what}</#{tag}>"
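The rewritten markup resolves a Symbol tag through MathML::MARKUP into a tag plus default options, then appends any caller-supplied options, flattening and dropping nils. A standalone sketch of that flow follows. The MARKUP table here is hypothetical (the real one is not part of this diff), and the sketch joins options with spaces explicitly, because interpolating an Array relies on Array#to_s, which joins elements with no separator in Ruby 1.8 but returns the inspect form in 1.9 and later.

```ruby
# Hypothetical stand-in for MathML::MARKUP: symbol => [tag, default options].
MARKUP = {
  math:        ["math", 'xmlns="http://www.w3.org/1998/Math/MathML"'],
  displaymath: ["math", ['xmlns="http://www.w3.org/1998/Math/MathML"', 'display="block"']],
  msup:        ["msup", nil],
}

# Sketch of the 0.2 markup logic, without the @format dispatch.
def markup(what, tag, opts = [])
  tag, opts = case tag
              when String
                [tag, Array(opts)]              # Array() also accepts one option string
              when Symbol
                a, b = MARKUP[tag]
                [a, [b, opts].flatten.compact]  # defaults first, then caller's options
              end
  if opts.empty?
    "<#{tag}>#{what}</#{tag}>"
  else
    "<#{tag} #{opts.join(' ')}>#{what}</#{tag}>"
  end
end

puts markup("x", :msup)                      # <msup>x</msup>
puts markup("x", "mo", ['stretchy="false"']) # <mo stretchy="false">x</mo>
puts markup("x", :displaymath)
# <math xmlns="http://www.w3.org/1998/Math/MathML" display="block">x</math>
```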
@@ -109,6 +108,20 @@ class Parser
  end
  end
 
+ def token o #:nodoc:
+ case @format
+ when :mathml
+ MathML::TOKENS[o] || o
+ end
+ end
+
+ def op o, opts=[]
+ case @format
+ when :mathml
+ markup(token(o), "mo", opts)
+ end
+ end
+
  def funcs #:nodoc:
  case @format
  when :mathml
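The new token and op helpers pass operator characters through MathML::TOKENS (added to the MathML entities in this release) before wrapping them in <mo>, which appears to be how < and > now come out as escaped <mo> operators. A self-contained sketch, copying the TOKENS entries from this diff and inlining the markup call; the form="prefix" attribute in the usage line is just an example MathML attribute, not something this diff shows Ritex emitting.

```ruby
# Entries copied from the TOKENS hash this release adds to the MathML entities.
TOKENS = {
  "-" => "&minus;",
  "&" => "&amp;",
  ">" => "&gt;",
  "<" => "&lt;",
}

# Map an operator character to its MathML text; unknown characters pass through.
def token(o)
  TOKENS[o] || o
end

# Wrap the mapped operator in an <mo> element, adding any attributes given.
def op(o, opts = [])
  attrs = opts.empty? ? "" : " #{opts.join(' ')}"
  "<mo#{attrs}>#{token(o)}</mo>"
end

puts op("<")                    # <mo>&lt;</mo>
puts op("+")                    # <mo>+</mo>
puts op("-", ['form="prefix"']) # <mo form="prefix">&minus;</mo>
```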
@@ -170,7 +183,7 @@ private
  elsif envs.member? name
  envs[name][*a]
  else
- error "unknown function, macro or environment #{name}"
+ error "unknown function, macro or environment #{name.inspect}"
  end
  end
 
@@ -191,10 +204,6 @@ private
  @macros[sym].instance_eval "def arity; #{arity}; end" # hack!
  ""
  end
-
- def warn s
- $stderr.puts "warning: #{s}"
- end
  end
 
  end
@@ -1,33 +1,34 @@
  ## lib/ritex/lexer.rb -- contains Ritex::Lexer
  ## Author:: William Morgan (mailto: wmorgan-ritex@masanjin.net)
- ## Copyright:: Copyright 2005 William Morgan
+ ## Copyright:: Copyright 2005--2009 William Morgan
  ## License:: GNU GPL version 2
 
  require 'racc/parser' # just for Racc::ParseError
 
  module Ritex
 
- ## thrown upon lexing errors
- class LexError < Racc::ParseError; end
+ ## Thrown by Lexer upon lexing errors.
+ class LexError < StandardError; end
 
- ## The lexer splits input stream into tokens. These are handed to the
+ ## The lexer splits an input stream into tokens. These are handed to the
  ## parser. Ritex::Parser takes care of setting up and configuring the
  ## lexer.
  ##
- ## In order to support macros, the lexer maintains a stack of
- ## strings. Pushing a string onto the stack will cause #lex to yield
- ## tokens from that string, until it reaches the end, at which point
- ## it will discard the string and resume yielding tokens from the
- ## previous string.
+ ## In order to support macros, the lexer maintains a stack of strings.
+ ## Pushing a string onto the stack will cause #lex to yield tokens from
+ ## that string, until it reaches the end, at which point it will discard
+ ## the string and resume yielding tokens from the previous string.
  ##
- ## The lexer has two states. Normally it ignores all spacing. After
- ## hitting an ENV token it will start returning SPACE tokens for each
- ## space until it hits a '}'.
- ##
- ## The lexer also handles unary minus. It decides whether a '-' is
- ## unary or binary by considering the previous token.
+ ## To handle macros, the lexer is stateful. Normally it ignores all
+ ## spacing. After hitting an ENV token it will start returning SPACE
+ ## tokens for each space until it hits a '}'.
  class Lexer
  TOKENS = '+-\/\*|\.,;:<>=()#&\[\]^_!?~%\'{} ' # passed as themselves
+ OPERATOR_TOKENS = ' ,' # things that can be \'d to become operators
+ WORDS = %w(array arrayopts define left right rowopts cellopts colalign rowalign align padding equalcols equalrows rowlines collines
+ frame rowspan colspan) # passed as special tokens
+
+ WORDS_SEARCH = WORDS.map { |w| [/\A\\(#{Regexp.escape w})\b/, w.upcase.intern] }
 
  ## _s_ is an initial string to push on the stack, or nil.
  def initialize parser, s = nil
@@ -41,18 +42,21 @@ class Lexer
 
  ## Yield token and value pairs from the string stack.
  def lex #:yields: token, value
- @lastop = nil
+ ## actually this function does nothing right now except call
+ ## lex_inner. if we switch to more stateful tokenization this
+ ## might do something more.
  lex_inner do |sym, val|
- @lastop = val
  yield sym, val
  end
  end
 
  ## For debugging purposes.
  def dlex #:nodoc:
- lex do |sym, val|
- puts "** got #{sym.inspect}: [#{val}]"
- yield sym, val
+ while true
+ lex do |sym, val|
+ puts "GOT: #{sym} => #{val.inspect}"
+ return unless sym
+ end
  end
  end
 
@@ -62,38 +66,26 @@ private
  state = :normal
 
  until @s.empty?
- # puts "- @s length #{@s.length}: #{@s.inspect}"
  s, i = @s.first
  if i >= s.length
  @s.shift
  next
  end
 
- # puts "> now have #{s[i .. s.length]}"
- case s[i .. s.length]
+ next if WORDS_SEARCH.any? do |regex, token|
+ if s[i .. -1] =~ regex
+ name = $1
+ @s.first[1] += name.length + 1
+ yield [token, name]
+ state = :env if @parser.envs[name]
+ true
+ end
+ end
+
+ case s[i .. -1]
  when /\A(\s+)/
  @s.first[1] += $1.length
  yield [:SPACE, $1] if state == :env
- when /\A(\\array)/
- @s.first[1] += $1.length
- yield [:ARRAY, $1]
- when /\A(\\define)/
- @s.first[1] += $1.length
- yield [:DEFINE, $1]
- when /\A(\\left)/
- @s.first[1] += $1.length
- yield [:LEFT, $1]
- when /\A(\\right)/
- @s.first[1] += $1.length
- yield [:RIGHT, $1]
- when /\A-/
- @s.first[1] += 1
- if [[nil, '{', '(', '[', '+', '-', '/', '*', '=', '<', '>', '&'],
- @parser.op_symbols].any? { |x| x.member? @lastop }
- yield [:UNARYMINUS, '-']
- else
- yield ['-', '-']
- end
  when /\A([#{TOKENS}])/
  @s.first[1] += 1
  state = :normal if (state == :env) && ($1 == '}')
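The lexer now tries every keyword in WORDS with an anchored regex ending in \b before falling through to the generic cases. The word boundary matters: it keeps \left from matching the start of \leftarrow, and \array from matching the start of \arrayopts. The sketch below reuses the WORDS list and regex construction from the diff; match_word is a hypothetical helper that, unlike the real lexer, only reports the token instead of advancing a string stack.

```ruby
# Keyword list and regex construction copied from Lexer::WORDS / WORDS_SEARCH.
WORDS = %w(array arrayopts define left right rowopts cellopts colalign rowalign align
           padding equalcols equalrows rowlines collines frame rowspan colspan)
WORDS_SEARCH = WORDS.map { |w| [/\A\\(#{Regexp.escape w})\b/, w.upcase.intern] }

# Hypothetical helper: returns [token, name, chars_consumed] for a keyword at the
# start of s, or nil. chars_consumed counts the leading backslash, as the lexer does.
def match_word(s)
  WORDS_SEARCH.each do |regex, token|
    return [token, $1, $1.length + 1] if s =~ regex
  end
  nil
end

p match_word("\\left( x")    # [:LEFT, "left", 5]
p match_word("\\leftarrow")  # nil
p match_word("\\arrayopts{") # [:ARRAYOPTS, "arrayopts", 10]
```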
@@ -101,13 +93,15 @@ private
  when /\A(\\\\)/
  @s.first[1] += $1.length
  yield [:DOUBLEBACK, $1]
+ when /\A\\([#{OPERATOR_TOKENS}])/
+ @s.first[1] += $1.length + 1
+ yield [:OPERATOR, $1]
  when /\A\\([#{TOKENS}\\\\])/
  @s.first[1] += $1.length + 1
  yield [:SYMBOL, $1]
- when /\A\\([a-zA-Z][a-zA-Z*\d]+)/
+ when /\A\\([a-zA-Z][a-zA-Z*\d]*)/
  name = $1
  type = :SYMBOL
- # puts "** checking #{name} against specials list #{specs.keys * ' '}, got #{specs[name].inspect}"
  if @parser.funcs.member? name
  proc = @parser.funcs[name]
  type = [:FUNC0, :FUNC1, :FUNC2, :FUNC3][proc.arity]
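Two lexer changes in this hunk are easy to check in isolation. The command-name regex changes its quantifier from + to *, so a single-letter command such as \P (mapped to the pilcrow in DEFAULTS) can now match that branch, where the old pattern required at least two characters after the backslash. And \, and \ (backslash space) now produce OPERATOR tokens. The regexes below are copied from the diff; command_name is a hypothetical helper.

```ruby
# Command-name patterns before and after this release (copied from the diff).
OLD_COMMAND = /\A\\([a-zA-Z][a-zA-Z*\d]+)/
NEW_COMMAND = /\A\\([a-zA-Z][a-zA-Z*\d]*)/

# A backslash followed by one of these characters now yields an OPERATOR token.
OPERATOR_TOKENS = ' ,'
OPERATOR = /\A\\([#{OPERATOR_TOKENS}])/

# Hypothetical helper: the command name matched at the start of s, or nil.
def command_name(regex, s)
  m = regex.match(s)
  m && m[1]
end

p command_name(OLD_COMMAND, "\\P") # nil: the old pattern needs two or more characters
p command_name(NEW_COMMAND, "\\P") # "P"
p OPERATOR.match("\\,x")[1]        # ","
p OPERATOR.match("\\ x")[1]        # " "
```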
@@ -125,16 +119,15 @@ private
  when /\A(-?(\d+|\d*\.\d+))/
  @s.first[1] += $1.length
  yield [:NUMBER, $1]
- when /\A(\w)/
+ when /\A([a-zA-Z]+)/
  @s.first[1] += $1.length
  yield [:VAR, $1]
  else
  raise LexError, "unlexable at position #{i}: #{s[i .. [s.length, i + 20].min]}"
  end
  end
- yield [false, false]
- end
 
+ yield [false, false] # done!
+ end
  end
-
  end
@@ -13,32 +13,22 @@ module Ritex
  ## incorrect because we programmatically modify the globals in this package.
  module MathML
 
- ## Default entities, stolen from
+ ## Default entities, mostly stolen from
  ## http://www.orcca.on.ca/mathml/texmml/texmml.xml. We overwrite many
  ## of these below.
  DEFAULTS = {
- "\"" => "<mo>&quot;</mo>",
- "|" => "<mo>&#x2225;</mo>",
+ "{" => "<mo>{</mo>",
+ "}" => "<mo>}</mo>",
  "Vert" => "<mo>&#x2225;</mo>",
- "|" => "<mo>&#x2223;</mo>",
  "vert" => "<mo>&#x2223;</mo>",
- "(" => "<mo>(</mo>",
- "[" => "<mo>[</mo>",
  "lbrack" => "<mo>[</mo>",
- "{" => "<mo>{</mo>",
  "lbrace" => "<mo>{</mo>",
- "<" => "<mo>&lt;</mo>",
- "/" => "<mo>/</mo>",
  "lfloor" => "<mo>&#x230A;</mo>",
  "lceil" => "<mo>&#x2308;</mo>",
  "langle" => "<mo>&#x2329;</mo>",
  "lgroup" => "<mo>(</mo>",
- ")" => "<mo>)</mo>",
- "]" => "<mo>]</mo>",
  "rbrack" => "<mo>]</mo>",
- "}" => "<mo>}</mo>",
  "rbrace" => "<mo>}</mo>",
- ">" => "<mo>&gt;</mo>",
  "backslash" => "<mo>\\</mo>",
  "rfloor" => "<mo>&#x230B;</mo>",
  "rceil" => "<mo>&#x2309;</mo>",
@@ -331,13 +321,7 @@ DEFAULTS = {
  "varPsi" => "<mi>&#x03A8;</mi>",
  "varOmega" => "<mi>&#x03A9;</mi>",
  "colon" => "<mo>:</mo>",
- "*" => "<mo>*</mo>",
- "#" => "<mo>#</mo>",
- "$" => "<mo>$</mo>",
- "%" => "<mo>%</mo>",
  "&" => "<mo>&amp;</mo>",
- "_" => "<mo>_</mo>",
- "!" => "<mo>!</mo>",
  "aleph" => "<mo>&#x2135;</mo>",
  "imath" => "<mo>&#x2373;</mo>",
  "jmath" => "<mo>&#x006A;</mo>",
@@ -406,8 +390,6 @@ DEFAULTS = {
  "copyright" => "<mo>&#x00A9;</mo>",
  "P" => "<mo>&#x00B6;</mo>",
  "pounds" => "<mo>&#x00A3;</mo>",
- "+" => "<mo>+</mo>",
- "-" => "<mo>-</mo>",
  "pm" => "<mo>&#x00B1;</mo>",
  "mp" => "<mo>&#x00B1;</mo>",
  "times" => "<mo>&#x00D7;</mo>",
@@ -649,7 +631,8 @@ NOTATION = generate "mo", "", {
  },
  %w(rfloor rceil rang rangle)
 
- NOTATION["cdots"] = "<mo>&sdot; &sdot; &sdot;</mo>"
+ #NOTATION["cdots"] = "<mo>&sdot; &sdot; &sdot;</mo>"
+ NOTATION["cdots"] = "<mo>&ctdot;</mo>"
  NOTATION["pmod"] = "&nbsp; mod"
 
  ## unary operators ("MOB")
@@ -664,7 +647,7 @@ UNARY_OPERATORS = generate "mo", 'lspace="thinmathspace" rspace="thinmathspace"'
  ["bigotimes", "Otimes"],
  ["bigoplus", "Oplus"],
  ]
- UNARY_OPERATORS["lim"] = "<mo lspace=\"thinmathspace\" rspace=\"thinmathspace\">lim</mo>"
+ #UNARY_OPERATORS["lim"] = "<mo lspace=\"thinmathspace\" rspace=\"thinmathspace\">lim</mo>"
 
  ## spaces
  SPACES = {
@@ -680,7 +663,14 @@ SPACES = {
 
  ## functions
  MATH_FUNCTIONS = {}
- %w(arccos arcsin arctan arg cos cosh cot coth csc deg det dim exp gcd hom inf ker lg liminf linmsup ln log bmod mod max min Pr sec sin sinh sup tan tanh).each { |x| MATH_FUNCTIONS[x] = "<mo lspace=\"0em\" rspace=\"thinmathspace\">#{x}</mo>" }
+ %w(arccos arcsin arctan arg cos cosh cot coth csc deg det dim exp gcd hom inf ker lg liminf linmsup ln log bmod mod max min Pr sec sin sinh sup tan tanh).each { |x| MATH_FUNCTIONS[x] = "<mi>#{x}</mi>" }
+
+ TOKENS = {
+ "-" => "&minus;",
+ "&" => "&amp;",
+ ">" => "&gt;",
+ "<" => "&lt;",
+ }
 
  ENTITIES = DEFAULTS.merge(NUMS).merge(GREEK).merge(OPERATORS).merge(NOTATION).merge(UNARY_OPERATORS).merge(SPACES).merge(MATH_FUNCTIONS)
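ENTITIES is built by chaining Hash#merge, and merge keeps the argument's value when keys collide, so tables later in the chain override earlier ones. That ordering is why the new <mi>-based MATH_FUNCTIONS entries would take effect even if an earlier table defined the same name. A trimmed-down sketch; EARLIER and its "sin" entry are hypothetical stand-ins that force a collision, while the NOTATION and MATH_FUNCTIONS entries mirror this diff.

```ruby
# Hypothetical earlier table with an old-style entry, to force a key collision.
EARLIER = {
  "lbrace" => "<mo>{</mo>",
  "sin"    => '<mo lspace="0em" rspace="thinmathspace">sin</mo>',
}

# Mirrors the 0.2 cdots entry.
NOTATION = { "cdots" => "<mo>&ctdot;</mo>" }

# Mirrors the 0.2 MATH_FUNCTIONS construction, for a few names only.
MATH_FUNCTIONS = {}
%w(sin cos log).each { |x| MATH_FUNCTIONS[x] = "<mi>#{x}</mi>" }

# Hash#merge keeps the right-hand value on duplicate keys, so later tables win.
ENTITIES = EARLIER.merge(NOTATION).merge(MATH_FUNCTIONS)

puts ENTITIES["sin"]    # <mi>sin</mi>
puts ENTITIES["lbrace"] # <mo>{</mo>
puts ENTITIES.size      # 5
```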