ritex 0.1

data/README ADDED
@@ -0,0 +1,96 @@
1
+ Author:: William Morgan (mailto: wmorgan-ritex@masanjin.net)
2
+ Copyright:: Copyright 2005 William Morgan
3
+ License:: GNU GPL version 2
4
+
5
+ = Introduction
6
+
7
+ Ritex converts expressions from WebTeX into MathML. WebTeX is an
8
+ adaptation of TeX math syntax for web display.
9
+
10
+ Ritex makes inserting math into HTML pages easy. It supports most TeX
11
+ math syntax as well as macros.
12
+
13
+ For example, Ritex turns
14
+ \alpha^\beta
15
+ into
16
+ <math xmlns="http://www.w3.org/1998/Math/MathML">
17
+ <msup>
18
+ <mi>&alpha;</mi>
19
+ <mi>&beta;</mi>
20
+ </msup>
21
+ </math>
22
+
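The nesting in the output above can be reproduced with a small tag-wrapping helper. This is only a sketch in the spirit of the parser's markup step; `wrap_tag` is a hypothetical name and is not part of Ritex.

```ruby
# Hypothetical helper (not part of Ritex): wraps content in an XML element,
# optionally with an attribute string.
def wrap_tag(content, tag, attrs = nil)
  if attrs
    "<#{tag} #{attrs}>#{content}</#{tag}>"
  else
    "<#{tag}>#{content}</#{tag}>"
  end
end

# Build the MathML for \alpha^\beta from the inside out.
alpha  = wrap_tag("&alpha;", "mi")
beta   = wrap_tag("&beta;", "mi")
msup   = wrap_tag(alpha + beta, "msup")
mathml = wrap_tag(msup, "math", 'xmlns="http://www.w3.org/1998/Math/MathML"')
puts mathml
```

The result is the same markup as shown above, just without the line breaks and indentation.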
23
+ Ritex is based heavily on itex2mml
24
+ (http://pear.math.pitt.edu/mathzilla/itex2mmlItex.html), a popular TeX
25
+ math to MathML converter--so much so that the default correct answer
26
+ to unit tests is to do whatever itex2mml does!
27
+
28
+ Ritex features several advantages over itex2mml:
29
+
30
+ * It's written in Ruby (hey, I consider that an advantage).
31
+ * It supports macros.
32
+ * It handles unary minus better.
33
+ * It's easier to extend.
34
+
35
+ = Synopsis
36
+
37
+ require 'ritex'
38
+ p = Ritex::Parser.new
39
+ ARGF.each { |l| puts p.parse(l) }
40
+
41
+ ## or ...
42
+
43
+ ARGF.each do |l|
44
+ begin
45
+ puts p.parse(l)
46
+ rescue Racc::ParseError
47
+ $stderr.puts "invalid input"
48
+ end
49
+ end
50
+
51
+ = Using Ritex
52
+
53
+ Calling Ritex from Ruby is very simple. If the synopsis above isn't
54
+ enough, see the documentation for Ritex::Parser for the gory details.
55
+
56
+ = Creating MathML with Ritex
57
+
58
+ Ritex parses WebTeX. WebTeX is an adaptation of the TeX math syntax
59
+ which is designed for web page display. The WebTeX documentation can
60
+ be found at
61
+ http://stuff.mit.edu/afs/athena/software/webeq/currenthome/docs/webtex/toc.html.
62
+
63
+ If you're familiar with TeX math syntax, you'll feel right at
64
+ home. But there are several important differences between it and
65
+ WebTeX. Most notably:
66
+
67
+ * arrays: different \array syntax; no \eqnarray or \align
68
+ * macro definitions: \define; no \newcommand or \def
69
+ * \left and \right no longer need "invisible" delimiters
70
+
71
+ These differences are explained in the WebTeX documentation.
72
+
73
+ Ritex is based heavily on itex2mml. Itex2mml accepts what it calls
74
+ "Itex", an extension of WebTeX which adds a few aliases (like
75
+ \infinity for \infty) and markups (like \underoverset). Ritex supports
76
+ these extensions. Regardless, I've chosen to say that Ritex parses
77
+ WebTeX rather than Itex, mainly because the former includes macros and
78
+ is better documented.
79
+
80
+ Itex is described at
81
+ http://pear.math.pitt.edu/mathzilla/itex2mmlItex.html.
82
+
83
+ See the ReleaseNotes for features in WebTeX that are currently
84
+ unimplemented in Ritex.
85
+
86
+ = Differences between Ritex and itex2mml
87
+
88
+ If you're familiar with itex2mml, there are a few subtle differences
89
+ between the two:
90
+ * A sequence of letters like "abc" is treated as three separate
91
+ variables and not as one variable. I believe that's the TeX Way (tm).
92
+ * \ (backslash space) is a medium space, not an undefined character.
93
+ * Sequences like "x--3" will correctly mark the second operator as a
94
+ unary minus.
95
+ * And of course, macros.
96
+
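Two of the differences listed above (letters as separate variables, and unary minus in "x--3") can be illustrated with a toy tokenizer. This is a minimal sketch, not Ritex's lexer: `tokenize`, `UNARY_CONTEXT` and the `:MINUS` and `:OP` labels are hypothetical, and the context list leaves out the operator symbols that the real parser contributes.

```ruby
# Tokens after which a '-' is treated as unary in this sketch. nil stands for
# "start of input".
UNARY_CONTEXT = [nil, "{", "(", "[", "+", "-", "/", "*", "=", "<", ">", "&"].freeze

# Hypothetical mini-tokenizer: handles only digit runs, single letters, and
# single-character operators. Whitespace is skipped.
def tokenize(input)
  tokens = []
  last = nil
  input.scan(/\d+|[[:alpha:]]|\S/) do |tok|
    type =
      if tok == "-"
        # Decide unary vs. binary from the previous token.
        UNARY_CONTEXT.include?(last) ? :UNARYMINUS : :MINUS
      elsif tok =~ /\A\d+\z/
        :NUMBER
      elsif tok =~ /\A[[:alpha:]]\z/
        :VAR
      else
        :OP
      end
    tokens << [type, tok]
    last = tok
  end
  tokens
end

p tokenize("x--3")
p tokenize("abc")
```

In "x--3" the first minus follows a variable and is binary; the second follows an operator and is unary. "abc" comes out as three one-letter variables.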
@@ -0,0 +1,11 @@
1
+ version 0.1, 9/15/2005
2
+ ----------------------
3
+
4
+ First version. Highly experimental!
5
+
6
+ Unimplemented features:
7
+ * \floatleft, \floatright
8
+ * array options
9
+ * \tensor and \multiscripts
10
+ * macros with 4 or more arguments
11
+ * \bghighlight, \fghighlight, \statusline
@@ -0,0 +1,200 @@
1
+ ## lib/ritex.rb -- contains Ritex::Parser
2
+ ## Author:: William Morgan (mailto: wmorgan-ritex@masanjin.net)
3
+ ## Copyright:: Copyright 2005 William Morgan
4
+ ## License:: GNU GPL version 2
5
+ ##
6
+ ## :title:Ritex: a Ruby itex to mathml converter
7
+ ## :main:README
8
+
9
+ require "ritex/parser"
10
+ require "ritex/lexer"
11
+ require "ritex/mathml/entities"
12
+ require "ritex/mathml/functions"
13
+ require "ritex/mathml/markup"
14
+ require 'racc/parser' # just for Racc::ParseError
15
+
16
+ ## Container module for all Ritex stuff. The entry point is
17
+ ## Ritex::Parser.
18
+ module Ritex
19
+
20
+ ## See #merror=
21
+ class Error < Racc::ParseError; end
22
+
23
+ ## This is not ideal by any means. Until we can call a Proc with an
24
+ ## arbitrary binding (Ruby 1.9?), we will relay all #markup and
25
+ ## #lookup calls within the module to a registered parser, so that the
26
+ ## "functions" in lib/ritex/mathml/functions.rb can be written more easily. Any
27
+ ## better ideas?
28
+ ##
29
+ ## In the meantime, I'd recommend not having more than one parser at
30
+ ## a time going.
31
+ attr_accessor :global_parser
32
+ module_function :global_parser, :global_parser=
33
+
34
+ ## The parser for itex and the main entry point for Ritex. This class
35
+ ## is partially defined here and partially generated by Racc from
36
+ ## lib/parser.y.
37
+ ##
38
+ ## Create the parser with #new. Parse strings with #parse. That's all
39
+ ## there is to it.
40
+ class Parser
41
+ FORMATS = [:mathml]
42
+
43
+ ## If true, Ritex will output a <merror>...</merror> message in the
44
+ ## MathML if an unknown entity is encountered. If false (the default),
45
+ ## Ritex will throw a Ritex::Error.
46
+ attr_accessor :merror
47
+
48
+ ## _format_ is the desired output format and must be in the FORMATS
49
+ ## list. Right now that's just :mathml.
50
+ def initialize format = :mathml
51
+ self.format = format
52
+ @macros = {}
53
+ Ritex.global_parser = self # lame
54
+ @merror = false
55
+ end
56
+
57
+ ## Parse a string. Returns the MathML output in string form. Note
58
+ ## that macro definitions are cumulative and persistent across calls
59
+ ## to #parse. If you don't want this behavior, you must explicitly
60
+ ## call #flush_macros after every #parse call.
61
+ ##
62
+ ## _wrap_ denotes whether you want the output wrapped in the
63
+ ## top-level XML math tag. Unless you're generating these tags
64
+ ## yourself, you want this.
65
+ ##
66
+ ## _inline_ denotes whether you want inline markup versus block or
67
+ ## "display" markup. For mathml output this only has an effect if
68
+ ## _wrap_ is true.
69
+ def parse s, wrap = true, inline = true
70
+ @lex = Lexer.new(self, s)
71
+ r = yyparse @lex, :lex
72
+ r = markup(r, inline ? :math : :displaymath) if wrap && !r.empty?
73
+ r
74
+ end
75
+
76
+ attr_reader :format
77
+ def format= format
78
+ raise ArgumentError, "format must be one of #{FORMATS * ', '}" unless FORMATS.include? format
79
+ @format = format
80
+ end
81
+
82
+ ## Delete all macros
83
+ def flush_macros; @macros = {}; end
84
+
85
+ def markup what, tag, opts=nil #:nodoc:
86
+ case @format
87
+ when :mathml
88
+ # puts "x marking up #{type}, member? #{MathML::MARKUP.member? type}"
89
+ tag, opts =
90
+ case tag
91
+ when String
92
+ [tag, opts]
93
+ when Symbol
94
+ MathML::MARKUP[tag]
95
+ end
96
+ if opts
97
+ "<#{tag} #{opts}>#{what}</#{tag}>"
98
+ else
99
+ "<#{tag}>#{what}</#{tag}>"
100
+ end
101
+ end
102
+ end
103
+
104
+ def lookup sym #:nodoc:
105
+ case @format
106
+ when :mathml
107
+ return error("unknown entity #{sym.inspect}") unless MathML::ENTITIES.member? sym
108
+ MathML::ENTITIES[sym]
109
+ end
110
+ end
111
+
112
+ def funcs #:nodoc:
113
+ case @format
114
+ when :mathml
115
+ MathML::FUNCTIONS
116
+ end
117
+ end
118
+
119
+ def envs #:nodoc:
120
+ case @format
121
+ when :mathml
122
+ MathML::ENVS
123
+ end
124
+ end
125
+
126
+ def macros #:nodoc:
127
+ @macros
128
+ end
129
+
130
+ def op_symbols #:nodoc:
131
+ case @format
132
+ when :mathml
133
+ MathML::OPERATORS.merge(MathML::UNARY_OPERATORS).merge(MathML::MATH_FUNCTIONS)
134
+ end
135
+ end
136
+
137
+ private
138
+ def error e
139
+ if @merror
140
+ "<merror>#{e}</merror>"
141
+ else
142
+ raise Error, e
143
+ end
144
+ end
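The behaviour documented for #merror (embed the message in the output, or raise) can be sketched on its own as follows. `SketchError` and `SketchReporter` are hypothetical stand-ins, not Ritex classes, and the sketch assumes the message is meant to be interpolated into the element.

```ruby
# Hypothetical stand-in for Ritex::Error.
class SketchError < StandardError; end

# Hypothetical stand-in for the parser's error handling.
class SketchReporter
  attr_accessor :merror

  def initialize(merror = false)
    @merror = merror
  end

  # Returns an <merror> element when merror is set; raises otherwise.
  def report(message)
    if @merror
      "<merror>#{message}</merror>"
    else
      raise SketchError, message
    end
  end
end

lenient = SketchReporter.new(true)
puts lenient.report("unknown entity :foo")

strict = SketchReporter.new
begin
  strict.report("unknown entity :foo")
rescue SketchError => e
  warn "parse failed: #{e.message}"
end
```

The lenient mode keeps a page rendering with the problem visible inline; the strict default makes the caller handle the failure.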
145
+
146
+ def safe s
147
+ case @format
148
+ when :mathml
149
+ s.gsub("&", "&amp;").gsub(">", "&gt;").gsub("<", "&lt;")
150
+ end
151
+ end
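A free-standing sketch of the same escaping idea, under the assumption that only text content (not attribute values) needs protecting. `escape_xml_text` is a hypothetical name. The ampersand has to be replaced first; otherwise the ampersands introduced by the later replacements would be escaped a second time.

```ruby
# Hypothetical helper: escapes the three characters that are unsafe in XML
# text content. "&" must be handled before "<" and ">".
def escape_xml_text(s)
  s.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
end

puts escape_xml_text("a < b && c > d")
```

Input that already looks like an entity is escaped again, so `&lt;` becomes `&amp;lt;`; callers should pass raw text, not pre-escaped text.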
152
+
153
+ def join *a
154
+ case @format
155
+ when :mathml
156
+ a.join ""
157
+ end
158
+ end
159
+
160
+ def special name, *a
161
+ if @macros.member? name
162
+ # puts "evaluating macro (arity #{@macros[name].arity}): type #{name.inspect}, #{a.length} args #{a.inspect}"
163
+ res = @macros[name][*a]
164
+ # puts "got #{res}"
165
+ @lex.push res
166
+ ""
167
+ elsif funcs.member? name
168
+ # puts "*** running func #{name}"
169
+ funcs[name][*a]
170
+ elsif envs.member? name
171
+ envs[name][*a]
172
+ else
173
+ error "unknown function, macro or environment #{name}"
174
+ end
175
+ end
176
+
177
+ def define sym, arity, exp
178
+ arity = arity.to_i
179
+ raise Error, "macro arity must be <= 3" unless arity <= 3
180
+ raise Error, "macro arity must be >= 0" unless arity >= 0
181
+
182
+ # puts "defining macro #{sym} with exp #{exp} (arity #{arity})"
183
+ warn "overriding definition for #{sym}" if @macros.member? sym
184
+ @macros[sym] = lambda do |*a|
185
+ raise Error, "expecting #{arity} arguments, got #{a.length}" unless a.length == arity
186
+ # puts "evaluating macro #{sym}, args #{a.inspect}"
187
+ x = (0 ... arity).inject(exp) { |s, i| s.gsub(/\##{i + 1}/) { a[i] } } # block form: backslashes in a[i] are inserted literally
188
+ # puts "macro evals to: #{x.inspect}"
189
+ x
190
+ end
191
+ @macros[sym].instance_eval "def arity; #{arity}; end" # hack!
192
+ ""
193
+ end
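The placeholder substitution at the heart of the macro definition can be tried in isolation. This is a minimal sketch under stated assumptions: `expand_macro` is a hypothetical helper, not a Ritex method, and it uses the block form of `gsub` so that backslashes in an argument are copied literally rather than read as replacement escapes.

```ruby
# Hypothetical helper: replaces #1, #2, ... in a macro body with the
# corresponding arguments, one placeholder at a time.
def expand_macro(body, args)
  args.each_with_index.inject(body) do |text, (arg, i)|
    # "##{i + 1}" is the literal placeholder "#1", "#2", ...
    text.gsub("##{i + 1}") { arg }
  end
end

puts expand_macro('\frac{#1}{#2}', ['\alpha', 'x+1'])
```

Because the placeholders are replaced in order, an argument that itself contains "#2" would be rewritten by the next pass. With at most three arguments there is also no ambiguity between "#1" and a longer placeholder such as "#10".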
194
+
195
+ def warn s
196
+ $stderr.puts "warning: #{s}"
197
+ end
198
+ end
199
+
200
+ end
@@ -0,0 +1,140 @@
1
+ ## lib/ritex/lexer.rb -- contains Ritex::Lexer
2
+ ## Author:: William Morgan (mailto: wmorgan-ritex@masanjin.net)
3
+ ## Copyright:: Copyright 2005 William Morgan
4
+ ## License:: GNU GPL version 2
5
+
6
+ require 'racc/parser' # just for Racc::ParseError
7
+
8
+ module Ritex
9
+
10
+ ## thrown upon lexing errors
11
+ class LexError < Racc::ParseError; end
12
+
13
+ ## The lexer splits the input stream into tokens. These are handed to the
14
+ ## parser. Ritex::Parser takes care of setting up and configuring the
15
+ ## lexer.
16
+ ##
17
+ ## In order to support macros, the lexer maintains a stack of
18
+ ## strings. Pushing a string onto the stack will cause #lex to yield
19
+ ## tokens from that string, until it reaches the end, at which point
20
+ ## it will discard the string and resume yielding tokens from the
21
+ ## previous string.
22
+ ##
23
+ ## The lexer has two states. Normally it ignores all spacing. After
24
+ ## hitting an ENV token it will start returning SPACE tokens for each
25
+ ## space until it hits a '}'.
26
+ ##
27
+ ## The lexer also handles unary minus. It decides whether a '-' is
28
+ ## unary or binary by considering the previous token.
29
+ class Lexer
30
+ TOKENS = '+-\/\*|\.,;:<>=()#&\[\]^_!?~%\'{} ' # passed as themselves
31
+
32
+ ## _s_ is an initial string to push on the stack, or nil.
33
+ def initialize parser, s = nil
34
+ @parser = parser
35
+ @s = []
36
+ push s unless s.nil?
37
+ end
38
+
39
+ ## push an additional string on to the stack.
40
+ def push s; @s.unshift [s, 0]; end
41
+
42
+ ## Yield token and value pairs from the string stack.
43
+ def lex #:yields: token, value
44
+ @lastop = nil
45
+ lex_inner do |sym, val|
46
+ @lastop = val
47
+ yield sym, val
48
+ end
49
+ end
50
+
51
+ ## For debugging purposes.
52
+ def dlex #:nodoc:
53
+ lex do |sym, val|
54
+ puts "** got #{sym.inspect}: [#{val}]"
55
+ yield sym, val
56
+ end
57
+ end
58
+
59
+ private
60
+
61
+ def lex_inner
62
+ state = :normal
63
+
64
+ until @s.empty?
65
+ # puts "- @s length #{@s.length}: #{@s.inspect}"
66
+ s, i = @s.first
67
+ if i >= s.length
68
+ @s.shift
69
+ next
70
+ end
71
+
72
+ # puts "> now have #{s[i .. s.length]}"
73
+ case s[i .. s.length]
74
+ when /\A(\s+)/
75
+ @s.first[1] += $1.length
76
+ yield [:SPACE, $1] if state == :env
77
+ when /\A(\\array)/
78
+ @s.first[1] += $1.length
79
+ yield [:ARRAY, $1]
80
+ when /\A(\\define)/
81
+ @s.first[1] += $1.length
82
+ yield [:DEFINE, $1]
83
+ when /\A(\\left)/
84
+ @s.first[1] += $1.length
85
+ yield [:LEFT, $1]
86
+ when /\A(\\right)/
87
+ @s.first[1] += $1.length
88
+ yield [:RIGHT, $1]
89
+ when /\A-/
90
+ @s.first[1] += 1
91
+ if [[nil, '{', '(', '[', '+', '-', '/', '*', '=', '<', '>', '&'],
92
+ @parser.op_symbols].any? { |x| x.member? @lastop }
93
+ yield [:UNARYMINUS, '-']
94
+ else
95
+ yield ['-', '-']
96
+ end
97
+ when /\A([#{TOKENS}])/
98
+ @s.first[1] += 1
99
+ state = :normal if (state == :env) && ($1 == '}')
100
+ yield [$1, $1]
101
+ when /\A(\\\\)/
102
+ @s.first[1] += $1.length
103
+ yield [:DOUBLEBACK, $1]
104
+ when /\A\\([#{TOKENS}\\\\])/
105
+ @s.first[1] += $1.length + 1
106
+ yield [:SYMBOL, $1]
107
+ when /\A\\([a-zA-Z][a-zA-Z*\d]+)/
108
+ name = $1
109
+ type = :SYMBOL
110
+ # puts "** checking #{name} against specials list #{specs.keys * ' '}, got #{specs[name].inspect}"
111
+ if @parser.funcs.member? name
112
+ proc = @parser.funcs[name]
113
+ type = [:FUNC0, :FUNC1, :FUNC2, :FUNC3][proc.arity]
114
+ raise LexError, "functions of arity '#{proc.arity}' unsupported" if type.nil?
115
+ elsif @parser.envs[name]
116
+ type = :ENV
117
+ state = :env
118
+ elsif @parser.macros.member? name
119
+ proc = @parser.macros[name]
120
+ type = [:MACRO0, :MACRO1, :MACRO2, :MACRO3][proc.arity]
121
+ raise LexError, "macro of arity '#{proc.arity}' unsupported" if type.nil?
122
+ end
123
+ @s.first[1] += $1.length + 1
124
+ yield [type, name]
125
+ when /\A(-?(\d*\.\d+|\d+))/
126
+ @s.first[1] += $1.length
127
+ yield [:NUMBER, $1]
128
+ when /\A(\w)/
129
+ @s.first[1] += $1.length
130
+ yield [:VAR, $1]
131
+ else
132
+ raise LexError, "unlexable at position #{i}: #{s[i .. [s.length, i + 20].min]}"
133
+ end
134
+ end
135
+ yield [false, false]
136
+ end
137
+
138
+ end
139
+
140
+ end
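The string stack described in the lexer's header comment can be sketched without any tokenizing at all. `StackedReader` below is a hypothetical, simplified class that yields single characters; it only illustrates how pushing a string mid-stream interrupts the current one and how reading resumes afterwards.

```ruby
# Hypothetical, simplified reader: yields characters instead of tokens.
class StackedReader
  def initialize(s = nil)
    @stack = []
    push(s) unless s.nil?
  end

  # Put a string on top of the stack, starting at position 0.
  def push(s)
    @stack.unshift([s, 0])
  end

  # Yield characters from the top string; when it is exhausted, drop it and
  # resume the string underneath.
  def each_char
    until @stack.empty?
      s, i = @stack.first
      if i >= s.length
        @stack.shift
        next
      end
      # Advance before yielding: the block may push a new string.
      @stack.first[1] += 1
      yield s[i]
    end
  end
end

reader = StackedReader.new("a!b")
out = []
reader.each_char do |c|
  if c == "!"
    reader.push("XY") # simulate a macro that expands to "XY"
  else
    out << c
  end
end
puts out.join
```

Advancing the position before yielding is the important detail: if the block pushes a new string, the interrupted string must already point past the text that triggered the push.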