RubyGems - ritex - Versions diffs - 0.1 → 0.2 - Mend

ritex 0.1 → 0.2

Files changed (13)

data/README +41 -50
data/ReleaseNotes +12 -4
data/lib/ritex.rb +26 -17
data/lib/ritex/lexer.rb +42 -49
data/lib/ritex/mathml/entities.rb +14 -24
data/lib/ritex/mathml/functions.rb +32 -10
data/lib/ritex/mathml/markup.rb +4 -4
data/lib/ritex/parser.rb +983 -644
data/lib/ritex/parser.y +103 -65
data/test/answer-key.yaml +233 -0
data/test/mathml.rb +117 -74
metadata +56 -45
data/test/itex2mml-key.yaml +0 -292

data/README CHANGED

@@ -1,14 +1,12 @@
 Author::    William Morgan (mailto: wmorgan-ritex@masanjin.net)
-Copyright:: Copyright 2005 William Morgan
+Copyright:: Copyright 2005--2009 William Morgan
 License::   GNU GPL version 2
 = Introduction
-Ritex converts expressions from WebTeX into MathML. WebTeX is an
-adaptation of TeX math syntax for web display.
-Ritex makes inserting math into HTML pages easy. It supports most TeX
-math syntax as well as macros.
+Ritex converts expressions from WebTeX into MathML. WebTeX is an adaptation of
+TeX math syntax for web display. Ritex supports most TeX math syntax, and
+supports macros.
 For example, Ritex turns
   \alpha^\beta
@@ -20,20 +18,18 @@ into
     </msup>
   </math>
-Ritex is based heavily on itex2mml
-(http://pear.math.pitt.edu/mathzilla/itex2mmlItex.html), a popular TeX
-math to MathML convertor--so much so that the default correct answer
-to unit tests is to do whatever itex2mml does!
+= How to get Ritex
-Ritex features several advantages over itex2mml:
+To install, use RubyGems: gem install ritex.
+For examples and git instructions, see the home page: http://masanjin.net/ritex/
-* It's written in Ruby (hey, I consider that an advantage).
-* It supports macros.
-* It handles unary minus better.
-* It's easier to extend.
+= News about Ritex
+See the blog: http://all-thing.net/label/ritex/.
 = Synopsis
+  require 'rubygems'
   require 'ritex'
   p = Ritex::Parser.new
   ARGF.each { |l| puts p.parse(l) }
@@ -48,49 +44,44 @@ Ritex features several advantages over itex2mml:
     end
   end
-= Using Ritex
-Calling Ritex from Ruby is very simple. If the synopsis above isn't
-enough, see the documentation for Ritex::Parser for the gory details.
+See the documentation for Ritex::Parser for gory details.
-= Creating MathML with Ritex
+= Ritex's MathML output
-Ritex parses WebTeX. WebTeX is an adapation of the TeX math syntax
-which is designed for web page display. The WebTeX documentation can
-be found at
-http://stuff.mit.edu/afs/athena/software/webeq/currenthome/docs/webtex/toc.html.
+To be pedantic, Ritex is a WebTeX to MathML converter. WebTeX is an adapation
+of the TeX math syntax which is designed for web page display. WebTeX
+documentation can be found at:
+  http://stuff.mit.edu/afs/athena/software/webeq/currenthome/docs/webtex/toc.html
-If you're familiar with TeX math syntax, you'll feel right at
-home. But there are several important differences between it and
-WebTeX. Most notably:
+If you're familiar with TeX math syntax, it's mostly the same, but there are
+several important differences in WebTeX. Notably:
-* arrays: different \array syntax; no \eqnarray or \align
+* arrays: Use \array syntax; there's no no \eqnarray or \align
 * macro definitions: \define; no \newcommand or \def
-* \left and \right no longer need "invisible" delimiters
+* \left and \right no longer need "invisible" delimiters like "."
 These differences are explained in the WebTeX documentation.
-Ritex is based heavily on Itex2mml. Itex2mml accepts what it calls
-"Itex", an extension of WebTeX which adds a few aliases (like
-\infinity for \infty) and markups (like \underoverset). Ritex supports
-these extensions. Regardless, I've chosen to say that Ritex parses
-WebTeX rather than Itex, mainly because the former includes macros and
-is better documented.
+Ritex also supports many of itex2MML's various extensions to WebTeX, mainly
+consisting of additional aliases (e.g. \infinity for \infty) and markup (e.g.
+\underoverset).
-Itex is described at
-http://pear.math.pitt.edu/mathzilla/itex2mmlItex.html.
+= Comparison with itex2MML
-See the ReleaseNotes for features in WebTeX that are currently
-unimplemented in Ritex.
-= Differences between Ritex and itex2mml
-If you're familiar with itex2mml, there are a few subtle differences
-between the two:
-* A sequence of letters like "abc" is treated as three separate
-  variables and not as one variable. I believe that's the TeX Way (tm).
-* \ (backslash space) is a medium space, not an undefined character.
-* Sequences like "x--3" will correctly mark the second operator as a
-  unary minus.
-* And of course, macros.
+itex2MML is another option for converting LaTeX-like math into MathML. It has
+Ruby bindings. Compared against itex2MML version 1.3.7 (3/7/2009), Ritex has
+several differences:
+* It supports macros.
+* It's written in Ruby.
+* It fixes several output bugs:
+  - Operators like < and > produce <mo> (math operator) tags instead of
+    <mi> (math identifier) tags.
+  - A sequence of letters like "abc" is treated as three separate variables and
+    not as one variable. That's The TeX Way (tm).
+  - \ (backslash space) is a medium space, not an undefined character.
+  - \binom output does not add extra parentheses
+  - Empty delimiters are accepted (e.g. "\left" is sufficient; no need for
+    "\left.") as per WebTeX spec.
+  - \cellopts{} can be elided in arrays as per WebTeX examples.
+* It's slower.

data/ReleaseNotes CHANGED

@@ -1,11 +1,19 @@
+version 0.2, 04/02/2009
+======================
+After almost four years, a new version! Many upgrades and updates.
+Highlights include:
+* array options support
+* better unary minus detection
+* better testing
 version 0.1, 9/15/2005
-----------------------
+======================
 First version. Highly experimental!
-Unimplemented features:
+Unimplemented WebTeX features:
 * \floatleft, \floatright
-* array options
 * \tensor and \multiscripts
 * macros with 4 or more arguments
-* \bghighlight, \fghighlight, \statusline
+* \bghighlight, \fghighlight, \statusline

data/lib/ritex.rb CHANGED

@@ -1,9 +1,9 @@
 ## lib/ritex.rb -- contains Ritex::Parser
 ## Author::    William Morgan (mailto: wmorgan-ritex@masanjin.net)
-## Copyright:: Copyright 2005 William Morgan
+## Copyright:: Copyright 2005-2009 William Morgan
 ## License::   GNU GPL version 2
 ##
-## :title:Ritex: a Ruby itex to mathml converter
+## :title:Ritex: a Ruby WebTeX to MathML converter
 ## :main:README
 require "ritex/parser"
@@ -17,9 +17,6 @@ require 'racc/parser' # just for Racc::ParserError
 ## Ritex::Parser.
 module Ritex
-## See #merror=
-class Error < Racc::ParseError; end
 ## This is not ideal by any means. Until we can call a Proc with an
 ## arbitrary binding (Ruby 1.9?), we will relay all #markup and
 ## #lookup calls within the module to a registered parser, so that the
@@ -31,6 +28,9 @@ class Error < Racc::ParseError; end
 attr_accessor :global_parser
 module_function :global_parser, :global_parser=
+## Thrown by Parser upon errors. See Parser#merror=.
+class Error < StandardError; end
 ## The parser for itex and the main entry point for Ritex.  This class
 ## is partially defined here and partially generated by Racc from
 ## lib/parser.y.
@@ -69,7 +69,7 @@ class Parser
   def parse s, wrap = true, inline = true
     @lex = Lexer.new(self, s)
     r = yyparse @lex, :lex
-    r = markup r, (inline ? :math : :displaymath) if wrap unless r.empty?
+    r = markup r, (inline ? :math : :displaymath) if wrap
     r
   end
@@ -82,18 +82,17 @@ class Parser
   ## Delete all macros
   def flush_macros; @macros = {}; end
-  def markup what, tag, opts=nil #:nodoc:
+  def markup what, tag, opts=[] #:nodoc:
     case @format
     when :mathml
-#      puts "x marking up #{type}, member? #{MathML::MARKUP.member? type}"
-      tag, opts =
-        case tag
+      tag, opts = case tag
         when String
           [tag, opts]
         when Symbol
-          MathML::MARKUP[tag]
+          a, b = MathML::MARKUP[tag]
+          [a, [b, opts].flatten.compact]
         end
-      if opts
+      unless opts.empty?
         "<#{tag} #{opts}>#{what}</#{tag}>"
       else
         "<#{tag}>#{what}</#{tag}>"
@@ -109,6 +108,20 @@ class Parser
     end
   end
+  def token o #:nodoc:
+    case @format
+    when :mathml
+      MathML::TOKENS[o] || o
+    end
+  end
+  def op o, opts=[]
+    case @format
+    when :mathml
+      markup(token(o), "mo", opts)
+    end
+  end
   def funcs #:nodoc:
     case @format
     when :mathml
@@ -170,7 +183,7 @@ private
     elsif envs.member? name
       envs[name][*a]
     else
-      error "unknown function, macro or environment #{name}"
+      error "unknown function, macro or environment #{name.inspect}"
     end
   end
@@ -191,10 +204,6 @@ private
     @macros[sym].instance_eval "def arity; #{arity}; end" # hack!
     ""
   end
-  def warn s
-    $stderr.puts "warning: #{s}"
-  end
 end
 end
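
Reviewer note: the markup change above is easier to follow in isolation. The sketch below is a standalone approximation of the 0.2 behaviour, not the library's code. The MARKUP table here is a hypothetical stand-in with two invented entries (the real one is in lib/ritex/mathml/markup.rb), and the explicit join is an assumption added so the sketch behaves the same on current Ruby versions.

```ruby
# Hypothetical stand-in for MathML::MARKUP. The real table lives in
# lib/ritex/mathml/markup.rb; these two entries are invented for illustration.
MARKUP = {
  :math => ["math", 'xmlns="http://www.w3.org/1998/Math/MathML"'],
  :row  => ["mrow", nil],
}

# Approximates the 0.2 logic: a Symbol tag is looked up in the table and its
# built-in attribute is merged with any caller-supplied options.
def markup(what, tag, opts = [])
  tag, opts = case tag
              when String
                [tag, Array(opts)]
              when Symbol
                name, builtin = MARKUP.fetch(tag)
                [name, [builtin, opts].flatten.compact]
              end
  if opts.empty?
    "<#{tag}>#{what}</#{tag}>"
  else
    # Assumption: join explicitly, because interpolating an Array on
    # Ruby 1.9+ yields its inspect form instead of the concatenated
    # elements that Ruby 1.8 produced.
    "<#{tag} #{opts.join(' ')}>#{what}</#{tag}>"
  end
end

puts markup("x", :row)                        # no attributes at all
puts markup("x", :math, ['display="block"'])  # built-in plus caller option
```

With the 0.1 default of opts=nil, a Symbol tag simply took whatever the table held and discarded the caller's options; the merged list is what lets callers add attributes on top of the built-in ones.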

data/lib/ritex/lexer.rb CHANGED

@@ -1,33 +1,34 @@
 ## lib/ritex/lexer.rb -- contains Ritex::Lexer
 ## Author::    William Morgan (mailto: wmorgan-ritex@masanjin.net)
-## Copyright:: Copyright 2005 William Morgan
+## Copyright:: Copyright 2005--2009 William Morgan
 ## License::   GNU GPL version 2
 require 'racc/parser' # just for Racc::ParseError
 module Ritex
-## thrown upon lexing errors
-class LexError < Racc::ParseError; end
+## Thrown by Lexer upon lexing errors.
+class LexError < StandardError; end
-## The lexer splits input stream into tokens. These are handed to the
+## The lexer splits an input stream into tokens. These are handed to the
 ## parser. Ritex::Parser takes care of setting up and configuring the
 ## lexer.
 ##
-## In order to support macros, the lexer maintains a stack of
-## strings. Pushing a string onto the stack will cause #lex to yield
-## tokens from that string, until it reaches the end, at which point
-## it will discard the string and resume yielding tokens from the
-## previous string.
+## In order to support macros, the lexer maintains a stack of strings.
+## Pushing a string onto the stack will cause #lex to yield tokens from
+## that string, until it reaches the end, at which point it will discard
+## the string and resume yielding tokens from the previous string.
 ##
-## The lexer has two states. Normally it ignores all spacing. After
-## hitting an ENV token it will start returning SPACE tokens for each
-## space until it hits a '}'.
-##
-## The lexer also handles unary minus. It decides whether a '-' is
-## unary or binary by considering the previous token.
+## To handle macros, the lexer is stateful. Normally it ignores all
+## spacing. After hitting an ENV token it will start returning SPACE
+## tokens for each space until it hits a '}'.
 class Lexer
   TOKENS = '+-\/\*|\.,;:<>=()#&\[\]^_!?~%\'{} ' # passed as themselves
+  OPERATOR_TOKENS = ' ,' # things that can be \'d to become operators
+  WORDS = %w(array arrayopts define left right rowopts cellopts colalign rowalign align padding equalcols equalrows rowlines collines
+             frame rowspan colspan) # passed as special tokens
+  WORDS_SEARCH = WORDS.map { |w| [/\A\\(#{Regexp.escape w})\b/, w.upcase.intern] }
   ## _s_ is an initial string to push on the stack, or nil.
   def initialize parser, s = nil
@@ -41,18 +42,21 @@ class Lexer
   ## Yield token and value pairs from the string stack.
   def lex #:yields: token, value
-    @lastop = nil
+    ## actually this function does nothing right now except call
+    ## lex_inner. if we switch to more stateful tokenization this
+    ## might do something more.
     lex_inner do |sym, val|
-      @lastop = val
       yield sym, val
     end
   end
   ## For debugging purposes.
   def dlex #:nodoc:
-    lex do |sym, val|
-      puts "** got #{sym.inspect}: [#{val}]"
-      yield sym, val
+    while true
+      lex do |sym, val|
+        puts "GOT: #{sym} => #{val.inspect}"
+        return unless sym
+      end
     end
   end
@@ -62,38 +66,26 @@ private
     state = :normal
     until @s.empty?
-#    puts "- @s length #{@s.length}: #{@s.inspect}"
       s, i = @s.first
       if i >= s.length
         @s.shift
         next
       end
-#      puts "> now have #{s[i .. s.length]}"
-      case s[i .. s.length]
+      next if WORDS_SEARCH.any? do |regex, token|
+        if s[i .. -1] =~ regex
+          name = $1
+          @s.first[1] += name.length + 1
+          yield [token, name]
+          state = :env if @parser.envs[name]
+          true
+        end
+      end
+      case s[i .. -1]
       when /\A(\s+)/
         @s.first[1] += $1.length
         yield [:SPACE, $1] if state == :env
-      when /\A(\\array)/
-        @s.first[1] += $1.length
-        yield [:ARRAY, $1]
-      when /\A(\\define)/
-        @s.first[1] += $1.length
-        yield [:DEFINE, $1]
-      when /\A(\\left)/
-        @s.first[1] += $1.length
-        yield [:LEFT, $1]
-      when /\A(\\right)/
-        @s.first[1] += $1.length
-        yield [:RIGHT, $1]
-      when /\A-/
-        @s.first[1] += 1
-        if [[nil, '{', '(', '[', '+', '-', '/', '*', '=', '<', '>', '&'],
-            @parser.op_symbols].any? { |x| x.member? @lastop }
-          yield [:UNARYMINUS, '-']
-        else
-          yield ['-', '-']
-        end
       when /\A([#{TOKENS}])/
         @s.first[1] += 1
         state = :normal if (state == :env) && ($1 == '}')
@@ -101,13 +93,15 @@ private
       when /\A(\\\\)/
         @s.first[1] += $1.length
         yield [:DOUBLEBACK, $1]
+      when /\A\\([#{OPERATOR_TOKENS}])/
+        @s.first[1] += $1.length + 1
+        yield [:OPERATOR, $1]
       when /\A\\([#{TOKENS}\\\\])/
         @s.first[1] += $1.length + 1
         yield [:SYMBOL, $1]
-      when /\A\\([a-zA-Z][a-zA-Z*\d]+)/
+      when /\A\\([a-zA-Z][a-zA-Z*\d]*)/
         name = $1
         type = :SYMBOL
-#        puts "** checking #{name} against specials list #{specs.keys * ' '}, got #{specs[name].inspect}"
         if @parser.funcs.member? name
           proc = @parser.funcs[name]
           type = [:FUNC0, :FUNC1, :FUNC2, :FUNC3][proc.arity]
@@ -125,16 +119,15 @@ private
       when /\A(-?(\d+|\d*\.\d+))/
         @s.first[1] += $1.length
         yield [:NUMBER, $1]
-      when /\A(\w)/
+      when /\A([a-zA-Z]+)/
         @s.first[1] += $1.length
         yield [:VAR, $1]
       else
         raise LexError, "unlexable at position #{i}: #{s[i .. [s.length, i + 20].min]}"
       end
     end
-    yield [false, false]
-  end
+    yield [false, false] # done!
+  end
 end
 end
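
Reviewer note: two of the pattern changes in this file are easy to miss. The keyword patterns gained a trailing \b, and the command-name pattern went from + to *. The sketch below reproduces just those patterns outside the lexer to show what each change does to matching. The keyword_token helper and the OLD_/NEW_ constant names are hypothetical, introduced only for this illustration, and WORDS is cut down to four entries.

```ruby
# A subset of the WORDS list from the 0.2 lexer.
WORDS = %w(array define left right)
WORDS_SEARCH = WORDS.map { |w| [/\A\\(#{Regexp.escape w})\b/, w.upcase.intern] }

# Hypothetical helper: returns [token, name] if s starts with a keyword,
# otherwise nil.
def keyword_token(s)
  WORDS_SEARCH.each do |regex, token|
    m = regex.match(s)
    return [token, m[1]] if m
  end
  nil
end

OLD_LEFT = /\A(\\left)/                   # 0.1 pattern: no word boundary
OLD_NAME = /\A\\([a-zA-Z][a-zA-Z*\d]+)/   # 0.1 pattern: at least two characters
NEW_NAME = /\A\\([a-zA-Z][a-zA-Z*\d]*)/   # 0.2 pattern: at least one character

p keyword_token('\left(')          # [:LEFT, "left"]
p keyword_token('\leftarrow')      # nil, because \b fails between "t" and "a"
p !!(OLD_LEFT =~ '\leftarrow')     # true: the old pattern matches the prefix
p !!(OLD_NAME =~ '\P')             # false: single-letter names were rejected
p !!(NEW_NAME =~ '\P')             # true
```

So with the 0.1 patterns a command such as \leftarrow would match the \left case first, and a one-letter command such as \P (which has an entry in entities.rb) would not match the name pattern at all.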

data/lib/ritex/mathml/entities.rb CHANGED

@@ -13,32 +13,22 @@ module Ritex
 ## incorrect because we programmatically modify the globals in this package.
 module MathML
-## Default entities, stolen from
+## Default entities, mostly stolen from
 ## http://www.orcca.on.ca/mathml/texmml/texmml.xml. We overwrite many
 ## of these below.
 DEFAULTS = {
-  "\"" => "<mo>&quot;</mo>",
-  "|" => "<mo>&#x2225;</mo>",
+  "{" => "<mo>{</mo>",
+  "}" => "<mo>}</mo>",
   "Vert" => "<mo>&#x2225;</mo>",
-  "|" => "<mo>&#x2223;</mo>",
   "vert" => "<mo>&#x2223;</mo>",
-  "(" => "<mo>(</mo>",
-  "[" => "<mo>[</mo>",
   "lbrack" => "<mo>[</mo>",
-  "{" => "<mo>{</mo>",
   "lbrace" => "<mo>{</mo>",
-  "<" => "<mo>&lt;</mo>",
-  "/" => "<mo>/</mo>",
   "lfloor" => "<mo>&#x230A;</mo>",
   "lceil" => "<mo>&#x2308;</mo>",
   "langle" => "<mo>&#x2329;</mo>",
   "lgroup" => "<mo>(</mo>",
-  ")" => "<mo>)</mo>",
-  "]" => "<mo>]</mo>",
   "rbrack" => "<mo>]</mo>",
-  "}" => "<mo>}</mo>",
   "rbrace" => "<mo>}</mo>",
-  ">" => "<mo>&gt;</mo>",
   "backslash" => "<mo>\\</mo>",
   "rfloor" => "<mo>&#x230B;</mo>",
   "rceil" => "<mo>&#x2309;</mo>",
@@ -331,13 +321,7 @@ DEFAULTS = {
   "varPsi" => "<mi>&#x03A8;</mi>",
   "varOmega" => "<mi>&#x03A9;</mi>",
   "colon" => "<mo>:</mo>",
-  "*" => "<mo>*</mo>",
-  "#" => "<mo>#</mo>",
-  "$" => "<mo>$</mo>",
-  "%" => "<mo>%</mo>",
   "&" => "<mo>&amp;</mo>",
-  "_" => "<mo>_</mo>",
-  "!" => "<mo>!</mo>",
   "aleph" => "<mo>&#x2135;</mo>",
   "imath" => "<mo>&#x2373;</mo>",
   "jmath" => "<mo>&#x006A;</mo>",
@@ -406,8 +390,6 @@ DEFAULTS = {
   "copyright" => "<mo>&#x00A9;</mo>",
   "P" => "<mo>&#x00B6;</mo>",
   "pounds" => "<mo>&#x00A3;</mo>",
-  "+" => "<mo>+</mo>",
-  "-" => "<mo>-</mo>",
   "pm" => "<mo>&#x00B1;</mo>",
   "mp" => "<mo>&#x00B1;</mo>",
   "times" => "<mo>&#x00D7;</mo>",
@@ -649,7 +631,8 @@ NOTATION = generate "mo", "",  {
 },
 %w(rfloor rceil rang rangle)
-NOTATION["cdots"] = "<mo>&sdot; &sdot; &sdot;</mo>"
+#NOTATION["cdots"] = "<mo>&sdot; &sdot; &sdot;</mo>"
+NOTATION["cdots"] = "<mo>&ctdot;</mo>"
 NOTATION["pmod"] = "&nbsp; mod"
 ## unary operators ("MOB")
@@ -664,7 +647,7 @@ UNARY_OPERATORS = generate "mo", 'lspace="thinmathspace" rspace="thinmathspace"'
   ["bigotimes", "Otimes"],
   ["bigoplus", "Oplus"],
 ]
-UNARY_OPERATORS["lim"] = "<mo lspace=\"thinmathspace\" rspace=\"thinmathspace\">lim</mo>"
+#UNARY_OPERATORS["lim"] = "<mo lspace=\"thinmathspace\" rspace=\"thinmathspace\">lim</mo>"
 ## spaces
 SPACES = {
@@ -680,7 +663,14 @@ SPACES = {
 ## functions
 MATH_FUNCTIONS = {}
-%w(arccos arcsin arctan arg cos cosh cot coth csc deg det dim exp gcd hom inf ker lg liminf linmsup ln log bmod mod max min Pr sec sin sinh sup tan tanh).each { |x| MATH_FUNCTIONS[x] = "<mo lspace=\"0em\" rspace=\"thinmathspace\">#{x}</mo>" }
+%w(arccos arcsin arctan arg cos cosh cot coth csc deg det dim exp gcd hom inf ker lg liminf linmsup ln log bmod mod max min Pr sec sin sinh sup tan tanh).each { |x| MATH_FUNCTIONS[x] = "<mi>#{x}</mi>" }
+TOKENS = {
+  "-" => "&minus;",
+  "&" => "&amp;",
+  ">" => "&gt;",
+  "<" => "&lt;",
+}
 ENTITIES = DEFAULTS.merge(NUMS).merge(GREEK).merge(OPERATORS).merge(NOTATION).merge(UNARY_OPERATORS).merge(SPACES).merge(MATH_FUNCTIONS)
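
Reviewer note: this file drops the per-character DEFAULTS entries ("+", "-", "<", ">" and so on) and adds the small TOKENS table, which Parser#token in lib/ritex.rb consults with a fall-through to the raw character. The sketch below shows that lookup on its own. The op method here is a simplified stand-in that builds the <mo> element directly instead of going through Parser#markup.

```ruby
# Copied from the TOKENS table added in 0.2.
TOKENS = {
  "-" => "&minus;",
  "&" => "&amp;",
  ">" => "&gt;",
  "<" => "&lt;",
}

# Same lookup as Parser#token: escape if listed, otherwise pass through.
def token(o)
  TOKENS[o] || o
end

# Simplified stand-in for Parser#op: wraps the escaped token in <mo>.
def op(o, opts = [])
  attrs = opts.empty? ? "" : " #{opts.join(' ')}"
  "<mo#{attrs}>#{token(o)}</mo>"
end

puts op("<")   # <mo>&lt;</mo>
puts op("-")   # <mo>&minus;</mo>
puts op("+")   # <mo>+</mo>
```

One visible consequence is that a minus sign is now emitted as &minus; and not as an ASCII hyphen, while characters with no table entry keep their literal form.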