RubyGems - ritex - Versions diffs - 0.1 - Mend

ritex 0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (14)

data/README +96 -0
data/ReleaseNotes +11 -0
data/lib/ritex.rb +200 -0
data/lib/ritex/lexer.rb +140 -0
data/lib/ritex/mathml/entities.rb +688 -0
data/lib/ritex/mathml/functions.rb +88 -0
data/lib/ritex/mathml/markup.rb +30 -0
data/lib/ritex/parser.rb +845 -0
data/lib/ritex/parser.y +140 -0
data/test/all.rb +7 -0
data/test/itex2mml-key.yaml +292 -0
data/test/mathml.rb +265 -0
data/test/parser.rb +36 -0
metadata +55 -0

data/README ADDED

@@ -0,0 +1,96 @@
+Author::    William Morgan (mailto: wmorgan-ritex@masanjin.net)
+Copyright:: Copyright 2005 William Morgan
+License::   GNU GPL version 2
+= Introduction
+Ritex converts expressions from WebTeX into MathML. WebTeX is an
+adaptation of TeX math syntax for web display.
+Ritex makes inserting math into HTML pages easy. It supports most TeX
+math syntax as well as macros.
+For example, Ritex turns
+  \alpha^\beta
+into
+  <math xmlns="http://www.w3.org/1998/Math/MathML">
+    <msup>
+      <mi>&alpha;</mi>
+      <mi>&beta;</mi>
+    </msup>
+  </math>
+Ritex is based heavily on itex2mml
+(http://pear.math.pitt.edu/mathzilla/itex2mmlItex.html), a popular TeX
+math to MathML converter, so much so that the expected output in the
+unit tests is, by default, whatever itex2mml produces!
+Ritex has several advantages over itex2mml:
+* It's written in Ruby (hey, I consider that an advantage).
+* It supports macros.
+* It handles unary minus better.
+* It's easier to extend.
+= Synopsis
+  require 'ritex'
+  parser = Ritex::Parser.new
+  ARGF.each { |l| puts parser.parse(l) }
+  ## or ...
+  ARGF.each do |l|
+    begin
+      puts parser.parse(l)
+    rescue Racc::ParseError
+      $stderr.puts "invalid input"
+    end
+  end
+= Using Ritex
+Calling Ritex from Ruby is very simple. If the synopsis above isn't
+enough, see the documentation for Ritex::Parser for the gory details.
+= Creating MathML with Ritex
+Ritex parses WebTeX. WebTeX is an adaptation of the TeX math syntax
+which is designed for web page display. The WebTeX documentation can
+be found at
+http://stuff.mit.edu/afs/athena/software/webeq/currenthome/docs/webtex/toc.html.
+If you're familiar with TeX math syntax, you'll feel right at
+home. But there are several important differences between it and
+WebTeX. Most notably:
+* arrays: different \array syntax; no \eqnarray or \align
+* macro definitions: \define; no \newcommand or \def
+* \left and \right no longer need "invisible" delimiters
+These differences are explained in the WebTeX documentation.
+Ritex is based heavily on itex2mml. itex2mml accepts what it calls
+"Itex", an extension of WebTeX which adds a few aliases (like
+\infinity for \infty) and markup commands (like \underoverset). Ritex
+supports these extensions. Regardless, I've chosen to say that Ritex
+parses WebTeX rather than Itex, mainly because the former includes
+macros and is better documented.
+Itex is described at
+http://pear.math.pitt.edu/mathzilla/itex2mmlItex.html.
+See the ReleaseNotes for features in WebTeX that are currently
+unimplemented in Ritex.
+= Differences between Ritex and itex2mml
+If you're familiar with itex2mml, there are a few subtle differences
+between the two:
+* A sequence of letters like "abc" is treated as three separate
+  variables and not as one variable. I believe that's the TeX Way (tm).
+* \ (backslash space) is a medium space, not an undefined character.
+* Sequences like "x--3" will correctly mark the second operator as a
+  unary minus.
+* And of course, macros.
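The README's two tokenization claims (a letter run like "abc" is several variables, and the second '-' in "x--3" is a unary minus) can be illustrated with a simplified tokenizer. This is a hypothetical sketch, not Ritex's lexer: `classify` and `UNARY_CONTEXT` are illustrative names, the context list is copied from the unary-minus check in lib/ritex/lexer.rb, and the real lexer additionally treats any symbol in its MathML operator tables as a unary context, which this sketch omits.

```ruby
require 'strscan'

# Hypothetical sketch, not Ritex's lexer. A '-' counts as unary when the
# previous token's value is in this list (nil means "start of input").
UNARY_CONTEXT = [nil, '{', '(', '[', '+', '-', '/', '*', '=', '<', '>', '&'].freeze

# Split a small WebTeX-like expression into [type, value] pairs.
def classify(input)
  ss = StringScanner.new(input)
  tokens = []
  last = nil # value of the previous token
  until ss.eos?
    next if ss.skip(/\s+/) # whitespace produces no token
    tok =
      if ss.scan(/-/)
        UNARY_CONTEXT.include?(last) ? [:UNARYMINUS, '-'] : ['-', '-']
      elsif ss.scan(/\d*\.\d+|\d+/) # decimal form first, so "3.14" stays whole
        [:NUMBER, ss.matched]
      elsif ss.scan(/[a-zA-Z]/)     # one letter per variable: "abc" is a, b, c
        [:VAR, ss.matched]
      elsif ss.scan(/[+*\/=<>(){}\[\]&]/)
        [ss.matched, ss.matched]
      else
        raise ArgumentError, "unexpected #{ss.peek(1).inspect} at #{ss.pos}"
      end
    last = tok[1]
    tokens << tok
  end
  tokens
end

p classify("x--3")  # => [[:VAR, "x"], ["-", "-"], [:UNARYMINUS, "-"], [:NUMBER, "3"]]
p classify("abc")   # => [[:VAR, "a"], [:VAR, "b"], [:VAR, "c"]]
```

Because the unary test only looks one token back and whitespace yields no token, "x - -3" and "x--3" classify identically.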

data/ReleaseNotes ADDED

@@ -0,0 +1,11 @@
+version 0.1, 9/15/2005
+----------------------
+First version. Highly experimental!
+Unimplemented features:
+* \floatleft, \floatright
+* array options
+* \tensor and \multiscripts
+* macros with 4 or more arguments
+* \bghighlight, \fghighlight, \statusline

data/lib/ritex.rb ADDED

@@ -0,0 +1,200 @@
+## lib/ritex.rb -- contains Ritex::Parser
+## Author::    William Morgan (mailto: wmorgan-ritex@masanjin.net)
+## Copyright:: Copyright 2005 William Morgan
+## License::   GNU GPL version 2
+##
+## :title:Ritex: a Ruby itex to mathml converter
+## :main:README
+require "ritex/parser"
+require "ritex/lexer"
+require "ritex/mathml/entities"
+require "ritex/mathml/functions"
+require "ritex/mathml/markup"
+require 'racc/parser' # just for Racc::ParseError
+## Container module for all Ritex stuff. The entry point is
+## Ritex::Parser.
+module Ritex
+## See #merror=
+class Error < Racc::ParseError; end
+## This is not ideal by any means. Until we can call a Proc with an
+## arbitrary binding (Ruby 1.9?), we will relay all #markup and
+## #lookup calls within the module to a registered parser, so that the
+## "functions" in lib/ritex/mathml/functions.rb can be written more
+## easily. Any better ideas?
+##
+## In the meantime, I'd recommend not having more than one parser at
+## a time going.
+attr_accessor :global_parser
+module_function :global_parser, :global_parser=
+## The parser for itex and the main entry point for Ritex.  This class
+## is partially defined here and partially generated by Racc from
+## lib/ritex/parser.y.
+##
+## Create the parser with #new. Parse strings with #parse. That's all
+## there is to it.
+class Parser
+  FORMATS = [:mathml]
+  ## If true, Ritex will output a <merror>...</merror> message in the
+  ## MathML if an unknown entity is encountered. If false (the default),
+## Ritex will raise a Ritex::Error.
+  attr_accessor :merror
+  ## _format_ is the desired output format and must be in the FORMATS
+  ## list. Right now that's just :mathml.
+  def initialize format = :mathml
+    self.format = format
+    @macros = {}
+    Ritex.global_parser = self # lame
+    @merror = false
+  end
+  ## Parse a string. Returns the MathML output in string form. Note
+  ## that macro definitions are cumulative and persistent across calls
+  ## to #parse. If you don't want this behavior, you must explicitly
+  ## call #flush_macros after every #parse call.
+  ##
+  ## _wrap_ denotes whether you want the output wrapped in the
+  ## top-level XML math tag. Unless you're generating these tags
+  ## yourself, you want this.
+  ##
+  ## _inline_ denotes whether you want inline markup versus block or
+  ## "display" markup. For mathml output this only has an effect if
+  ## _wrap_ is true.
+  def parse s, wrap = true, inline = true
+    @lex = Lexer.new(self, s)
+    r = yyparse @lex, :lex
+    r = markup(r, inline ? :math : :displaymath) if wrap && !r.empty?
+    r
+  end
+  attr_reader :format
+  def format= format
+    raise ArgumentError, "format must be one of #{FORMATS * ', '}" unless FORMATS.include? format
+    @format = format
+  end
+  ## Delete all macros
+  def flush_macros; @macros = {}; end
+  def markup what, tag, opts=nil #:nodoc:
+    case @format
+    when :mathml
+#      puts "x marking up #{type}, member? #{MathML::MARKUP.member? type}"
+      tag, opts =
+        case tag
+        when String
+          [tag, opts]
+        when Symbol
+          MathML::MARKUP[tag]
+        end
+      if opts
+        "<#{tag} #{opts}>#{what}</#{tag}>"
+      else
+        "<#{tag}>#{what}</#{tag}>"
+      end
+    end
+  end
+  def lookup sym #:nodoc:
+    case @format
+    when :mathml
+      return error("unknown entity #{sym.inspect}") unless MathML::ENTITIES.member? sym
+      MathML::ENTITIES[sym]
+    end
+  end
+  def funcs #:nodoc:
+    case @format
+    when :mathml
+      MathML::FUNCTIONS
+    end
+  end
+  def envs #:nodoc:
+    case @format
+    when :mathml
+      MathML::ENVS
+    end
+  end
+  def macros #:nodoc:
+    @macros
+  end
+  def op_symbols #:nodoc:
+    case @format
+    when :mathml
+      MathML::OPERATORS.merge(MathML::UNARY_OPERATORS).merge(MathML::MATH_FUNCTIONS)
+    end
+  end
+private
+  def error e
+    if @merror
+      "<merror>#{safe e}</merror>"
+    else
+      raise Error, e
+    end
+  end
+  def safe s
+    case @format
+    when :mathml
+      s.gsub("&", "&amp;").gsub(">", "&gt;").gsub("<", "&lt;")
+    end
+  end
+  def join *a
+    case @format
+    when :mathml
+      a.join ""
+    end
+  end
+  def special name, *a
+    if @macros.member? name
+      #      puts "evaluating macro (arity #{@macros[name].arity}): type #{name.inspect}, #{a.length} args #{a.inspect}"
+      res = @macros[name][*a]
+#      puts "got #{res}"
+      @lex.push res
+      ""
+    elsif funcs.member? name
+#      puts "*** running func #{name}"
+      funcs[name][*a]
+    elsif envs.member? name
+      envs[name][*a]
+    else
+      error "unknown function, macro or environment #{name}"
+    end
+  end
+  def define sym, arity, exp
+    arity = arity.to_i
+    raise Error, "macro arity must be <= 3" unless arity <= 3
+    raise Error, "macro arity must be >= 0" unless arity >= 0
+#    puts "defining macro #{sym} with exp #{exp} (arity #{arity})"
+    warn "overriding definition for #{sym}" if @macros.member? sym
+    @macros[sym] = lambda do |*a|
+      raise Error, "expecting #{arity} arguments, got #{a.length}" unless a.length == arity
+#      puts "evaluating macro #{sym}, args #{a.inspect}"
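+      # block form of gsub inserts each argument literally, so TeX
+      # backslashes (\\, \&) in a[i] are not read as gsub escapes
+      x = (0 ... arity).inject(exp) { |s, i| s.gsub(/\##{i + 1}/) { a[i] } }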
+#      puts "macro evals to: #{x.inspect}"
+      x
+    end
+    @macros[sym].instance_eval "def arity; #{arity}; end" # hack!
+    ""
+  end
+  def warn s
+    $stderr.puts "warning: #{s}"
+  end
+end
+end
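Parser#define stores each \define body as a lambda that replaces the placeholders #1 to #3 with the call's arguments, and rejects arities above 3. The sketch below models that bookkeeping outside the parser so it runs without Racc; `MacroTable`, `define` and `expand` are hypothetical names, not part of Ritex's API. It uses the block form of `String#gsub`, which inserts each argument verbatim: with a plain replacement string, Ruby would interpret sequences such as `\\` and `\&` inside the argument, and TeX arguments are full of backslashes.

```ruby
# Hypothetical model of Ritex's macro table; names are illustrative only.
class MacroTable
  MAX_ARITY = 3 # Ritex 0.1 rejects \define with more than 3 arguments

  def initialize
    @macros = {} # name => [arity, body]
  end

  # Record a macro. In the body, #1, #2 and #3 stand for the arguments.
  def define(name, arity, body)
    arity = Integer(arity)
    unless (0..MAX_ARITY).cover?(arity)
      raise ArgumentError, "macro arity must be between 0 and #{MAX_ARITY}"
    end
    @macros[name] = [arity, body]
  end

  # Return the body with each placeholder replaced by its argument.
  def expand(name, *args)
    arity, body = @macros.fetch(name) { raise KeyError, "unknown macro #{name}" }
    unless args.length == arity
      raise ArgumentError, "expecting #{arity} arguments, got #{args.length}"
    end
    # Block form: each argument is inserted literally, so TeX backslashes
    # (e.g. the row separator \\) are not treated as gsub escape sequences.
    (0...arity).inject(body) { |s, i| s.gsub("##{i + 1}") { args[i] } }
  end
end

macros = MacroTable.new
macros.define("norm", 1, '\left\| #1 \right\|')
puts macros.expand("norm", "x")         # prints \left\| x \right\|
puts macros.expand("norm", 'a \\\\ b')  # prints \left\| a \\ b \right\|
```

Like the lambda in Parser#define, this substitutes placeholders one index at a time, so an argument that itself contains "#2" would be rewritten by the next pass.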

data/lib/ritex/lexer.rb ADDED

@@ -0,0 +1,140 @@
+## lib/ritex/lexer.rb -- contains Ritex::Lexer
+## Author::    William Morgan (mailto: wmorgan-ritex@masanjin.net)
+## Copyright:: Copyright 2005 William Morgan
+## License::   GNU GPL version 2
+require 'racc/parser' # just for Racc::ParseError
+module Ritex
+## Raised on lexing errors.
+class LexError < Racc::ParseError; end
+## The lexer splits the input stream into tokens, which are handed to the
+## parser. Ritex::Parser takes care of setting up and configuring the
+## lexer.
+##
+## In order to support macros, the lexer maintains a stack of
+## strings. Pushing a string onto the stack will cause #lex to yield
+## tokens from that string, until it reaches the end, at which point
+## it will discard the string and resume yielding tokens from the
+## previous string.
+##
+## The lexer has two states. Normally it ignores all spacing. After
+## hitting an ENV token it returns a SPACE token for each run of
+## whitespace until it hits a '}'.
+##
+## The lexer also handles unary minus. It decides whether a '-' is
+## unary or binary by considering the previous token.
+class Lexer
+  TOKENS = '+-\/\*|\.,;:<>=()#&\[\]^_!?~%\'{} ' # passed as themselves
+  ## _s_ is an initial string to push on the stack, or nil.
+  def initialize parser, s = nil
+    @parser = parser
+    @s = []
+    push s unless s.nil?
+  end
+  ## push an additional string on to the stack.
+  def push s; @s.unshift [s, 0]; end
+  ## Yield token and value pairs from the string stack.
+  def lex #:yields: token, value
+    @lastop = nil
+    lex_inner do |sym, val|
+      @lastop = val
+      yield sym, val
+    end
+  end
+  ## For debugging purposes.
+  def dlex #:nodoc:
+    lex do |sym, val|
+      puts "** got #{sym.inspect}: [#{val}]"
+      yield sym, val
+    end
+  end
+private
+  def lex_inner
+    state = :normal
+    until @s.empty?
+#    puts "- @s length #{@s.length}: #{@s.inspect}"
+      s, i = @s.first
+      if i >= s.length
+        @s.shift
+        next
+      end
+#      puts "> now have #{s[i .. s.length]}"
+      case s[i .. s.length]
+      when /\A(\s+)/
+        @s.first[1] += $1.length
+        yield [:SPACE, $1] if state == :env
+      when /\A(\\array)/
+        @s.first[1] += $1.length
+        yield [:ARRAY, $1]
+      when /\A(\\define)/
+        @s.first[1] += $1.length
+        yield [:DEFINE, $1]
+      when /\A(\\left)/
+        @s.first[1] += $1.length
+        yield [:LEFT, $1]
+      when /\A(\\right)/
+        @s.first[1] += $1.length
+        yield [:RIGHT, $1]
+      when /\A-/
+        @s.first[1] += 1
+        if [[nil, '{', '(', '[', '+', '-', '/', '*', '=', '<', '>', '&'],
+            @parser.op_symbols].any? { |x| x.member? @lastop }
+          yield [:UNARYMINUS, '-']
+        else
+          yield ['-', '-']
+        end
+      when /\A([#{TOKENS}])/
+        @s.first[1] += 1
+        state = :normal if (state == :env) && ($1 == '}')
+        yield [$1, $1]
+      when /\A(\\\\)/
+        @s.first[1] += $1.length
+        yield [:DOUBLEBACK, $1]
+      when /\A\\([#{TOKENS}\\\\])/
+        @s.first[1] += $1.length + 1
+        yield [:SYMBOL, $1]
+      when /\A\\([a-zA-Z][a-zA-Z*\d]+)/
+        name = $1
+        type = :SYMBOL
+#        puts "** checking #{name} against specials list #{specs.keys * ' '}, got #{specs[name].inspect}"
+        if @parser.funcs.member? name
+          proc = @parser.funcs[name]
+          type = [:FUNC0, :FUNC1, :FUNC2, :FUNC3][proc.arity]
+          raise LexError, "functions of arity '#{proc.arity}' unsupported" if type.nil?
+        elsif @parser.envs[name]
+          type = :ENV
+          state = :env
+        elsif @parser.macros.member? name
+          proc = @parser.macros[name]
+          type = [:MACRO0, :MACRO1, :MACRO2, :MACRO3][proc.arity]
+          raise LexError, "macro of arity '#{proc.arity}' unsupported" if type.nil?
+        end
+        @s.first[1] += name.length + 1
+        yield [type, name]
+      when /\A(-?(\d*\.\d+|\d+))/ # decimal form first, so "3.14" is one NUMBER
+        @s.first[1] += $1.length
+        yield [:NUMBER, $1]
+      when /\A(\w)/
+        @s.first[1] += $1.length
+        yield [:VAR, $1]
+      else
+        raise LexError, "unlexable at position #{i}: #{s[i .. [s.length, i + 20].min]}"
+      end
+    end
+    yield [false, false]
+  end
+end
+end
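Lexer keeps a stack of [string, position] pairs so that a macro's expansion can be spliced into the token stream: #push puts a string on top, and lex_inner discards a string once its position reaches the end and resumes the one beneath. The runnable model below is a hypothetical simplification: `StackLexer` and `each_token` are illustrative names, it recognizes only whitespace, single letters, '+' and argument-free `\name` macros, and it pushes the expansion from inside the lexer, whereas in Ritex the parser does so through Parser#special once a macro's arguments have been parsed.

```ruby
# Hypothetical, simplified model of Ritex::Lexer's string stack.
class StackLexer
  def initialize(macros, s)
    @macros = macros # name => replacement text (argument-free macros only)
    @stack = []      # [string, position] pairs; the first entry is the top
    push(s)
  end

  # Put a string on top of the stack; tokens now come from it first.
  def push(s)
    @stack.unshift([s, 0])
  end

  # Yield [type, value] pairs until every string on the stack is used up.
  def each_token
    until @stack.empty?
      s, i = @stack.first
      if i >= s.length # top string exhausted: drop it, resume the one below
        @stack.shift
        next
      end
      rest = s[i..]
      case rest
      when /\A\s+/
        @stack.first[1] += $&.length # whitespace is skipped
      when /\A\\([a-zA-Z]+)/
        name, len = $1, $&.length
        @stack.first[1] += len # advance past \name before pushing
        raise ArgumentError, "unknown macro \\#{name}" unless @macros.key?(name)
        push(@macros[name])
      when /\A[a-zA-Z]/
        @stack.first[1] += 1
        yield [:VAR, $&]
      when /\A\+/
        @stack.first[1] += 1
        yield ['+', '+']
      else
        raise ArgumentError, "unlexable input: #{rest[0, 20].inspect}"
      end
    end
  end
end

tokens = []
StackLexer.new({ "ab" => "a + b" }, "x + \\ab + y").each_token { |t| tokens << t }
p tokens
# => [[:VAR, "x"], ["+", "+"], [:VAR, "a"], ["+", "+"], [:VAR, "b"], ["+", "+"], [:VAR, "y"]]
```

Advancing past `\name` before pushing matters: otherwise, when the expansion is exhausted, the outer string would resume at the macro name and expand it again forever.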