RubyGems - tagtreescanner - Versions diffs - 0.8.0 - Mend

tagtreescanner 0.8.0

Files changed (9)

data/HISTORY +17 -0
data/Manifest.txt +8 -0
data/README +191 -0
data/Rakefile +18 -0
data/TODO +11 -0
data/lib/tagtreescanner.rb +851 -0
data/test/test_simplemarkup.rb +84 -0
data/test/test_tagtreescanner.rb +104 -0
metadata +63 -0

data/HISTORY ADDED

@@ -0,0 +1,17 @@
+== 0.8.0 / 2007-November-25
+* First release as a gem. Breaks backwards compatibility with older versions.
+* Changed TagTreeScanner::Tag#tag_name to TagTreeScanner::Tag#name
+  * ...because it was dumb to write "tag.tag_name = 'span'"
+* Added a method_missing hack to TagTreeScanner::Tag that delegates
+  to read/write from its attributes hash.
+  * ...because I wanted people to be able to write "tag.href = 'foo'"
+* New TagTreeScanner::Tag#text= method to directly set the contents of
+  a tag, clearing out any other junk.
+== 0.6.1 / 2005-July-5
+* Initial public release
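
Reviewer note: the attribute delegation described in the 0.8.0 entry can be sketched in isolation. This is a minimal sketch, not the gem's class: `SketchTag` is a hypothetical stand-in that keeps only a `name` accessor and an attributes hash, and it assumes string keys as in the `Tag#method_missing` source further down this diff.

```ruby
# Hypothetical stand-in for TagTreeScanner::Tag, reduced to the attribute
# delegation described in the 0.8.0 notes. Not the gem's actual class.
class SketchTag
  attr_accessor :name
  attr_reader :attributes

  def initialize(name, attributes = {})
    @name = name
    @attributes = attributes
  end

  # Unknown writers (names ending in "=") store into the attributes hash;
  # any other unknown message is treated as a read from that hash.
  def method_missing(meth, *args)
    key = meth.to_s
    if key.end_with?('=')
      @attributes[key[0...-1]] = (args.size == 1 ? args[0] : args)
    else
      @attributes[key]
    end
  end
end

tag = SketchTag.new(:a)
tag.href = 'foo'    # no such method, so it lands in the attributes hash
tag.name = 'span'   # a real accessor, so the hash is not touched
puts tag.href       # prints foo
puts tag.name       # prints span
```

One consequence of this design is that a misspelled reader returns `nil` instead of raising `NoMethodError`, and any attribute whose name collides with an existing method (such as `name`) still has to go through the attributes hash directly.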

data/Manifest.txt ADDED

@@ -0,0 +1,8 @@
+HISTORY
+Manifest.txt
+README
+Rakefile
+TODO
+lib/tagtreescanner.rb
+test/test_simplemarkup.rb
+test/test_tagtreescanner.rb

data/README ADDED

@@ -0,0 +1,191 @@
+<b>TagTreeScanner</b>
+Author::     Gavin Kistner  (mailto:phrogz@mac.com)
+Copyright::  Copyright (c)2005-2007 Gavin Kistner
+License::    MIT License
+Version::    0.8.0 (2007-November-24)
+= Overview
+The TagTreeScanner class provides a generic framework for creating a
+nested hierarchy of tags and text (like XML or HTML) by parsing text. An
+example use (and the reason it was written) is to convert a wiki markup
+syntax into HTML.
+= Example Usage
+  require 'tagtreescanner'
+  class SimpleMarkup < TagTreeScanner
+     @root_factory.allows_text = false
+     @tag_genres[ :root ] = [ ]
+     @tag_genres[ :root ] << TagFactory.new( :paragraph,
+        # A line that doesn't have whitespace at the start
+        :open_match => /(?=\S)/, :open_requires_bol => true,
+        # Close when you see a double return
+        :close_match => /\n[ \t]*\n/,
+        :allows_text => true,
+        :allowed_genre => :inline
+     )
+     @tag_genres[ :root ] << TagFactory.new( :preformatted,
+        # Grab all lines that are indented up until a line that isn't
+        :open_match => /((\s+).+?)\n+(?=\S)/m, :open_requires_bol => true,
+        :setup => lambda{ |tag, scanner, tagtree|
+           # Throw the contents I found into the tag
+           # but remove leading whitespace
+           tag << scanner[1].gsub( /^#{scanner[2]}/, '' )
+        },
+        :autoclose => true
+     )
+     @tag_genres[ :inline ] = [ ]
+     @tag_genres[ :inline ] << TagFactory.new( :bold,
+        # An asterisk followed by a letter or number
+        :open_match => /\*(?=[a-z0-9])/i,
+        # Close when I see an asterisk OR a newline coming up
+        :close_match => /\*|(?=\n)/,
+        :allows_text => true,
+        :allowed_genre => :inline
+     )
+     @tag_genres[ :inline ] << TagFactory.new( :italic,
+        # An underscore followed by a letter or number
+        :open_match => /_(?=[a-z0-9])/i,
+        # Close when I see an underscore OR a newline coming up
+        :close_match => /_|(?=\n)/,
+        :allows_text => true,
+        :allowed_genre => :inline
+     )
+  end
+  raw_text = <<ENDINPUT
+  Hello World! You're _soaking in_ my test.
+  This is a *subset* of markup that I allow.
+
+  Hi paragraph two. Yo! A code sample:
+
+    def foo
+      puts "Whee!"
+    end
+
+  _That, as they say, is that._
+
+  ENDINPUT
+  markup = SimpleMarkup.new( raw_text ).to_xml
+  puts markup
+  #=> <paragraph>Hello World! You're <italic>soaking in</italic> my test.
+  #=> This is a <bold>subset</bold> of markup that I allow.</paragraph>
+  #=> <paragraph>Hi paragraph two. Yo! A code sample:</paragraph>
+  #=> <preformatted>def foo
+  #=>   puts "Whee!"
+  #=> end</preformatted>
+  #=> <paragraph><italic>That, as they say, is that.</italic></paragraph>
+= Details
+== TagFactories at 10,000 feet
+Each possible output tag is described by a TagFactory, which specifies
+some or all of the following:
+* The name of the tags it creates <i>(required)</i>
+* The regular expression to look for to start the tag
+* The regular expression to look for to close the tag, or
+* Whether the tag is automatically closed after creation
+* What genre of tags are allowed within the tag
+* Whether the tag supports raw text inside it
+* Code to run when creating a tag
+See the TagFactory class for more information on specifying factories.
+== Genres as a State Machine
+As a new tag is opened, the scanner uses the Tag#allowed_genre property
+of that tag (set by the +allowed_genre+ property on the TagFactory) to
+determine which tags to be looking for. A genre is specified by adding
+an array in the <tt>@tag_genres</tt> hash, whose key is the genre name.
+For example:
+  @tag_genres[ :inline ] = [ ]
+adds a new genre named 'inline', with no tags in it. TagFactory instances
+should be pushed onto this array <b>in the order that they should be looked
+for</b>. For example:
+  @tag_genres[ :inline ] << TagFactory.new( :italic,
+    # see the TagFactory#initialize for options
+  )
+Note that the +close_match+ regular expression of the current tag is
+always checked before looking to open/create any new tags.
+== Consuming Text
+As the text is being parsed, there will (probably) be many cases where
+you have raw text that doesn't close or open any new tags. Whenever the
+scanner reaches this state, it runs the <tt>@text_match</tt> regexp
+against the text to move the pointer ahead. If the current tag has
+<tt>Tag#allows_text?</tt> set to +true+ (through
+<tt>TagFactory#allows_text</tt>), then this text is added as contents of
+the tag. If not, the text is thrown away.
+The safest regular expression consumes only one character at a time:
+  @text_match = /./m
+<b><i>It is vital that your regexp match newlines</i></b> (the 'm')
+<b><i>unless every single one of your tags is set to close upon seeing
+a newline.</i></b>
+Unfortunately, the safest regular expression is also the slowest. If
+speed is an issue, your regexp should strive to eat as many characters as
+possible at once...while ensuring that it doesn't eat characters that
+would signify the start of a new tag.
+For example, setting a regexp like:
+  @text_match = /\w+|./m
+allows the scanner to match a whole word at a time. However, if you have
+a tag factory set to look for "Hvv2vvO" to indicate a subscripted '2',
+the entire string would be eaten as text and the subscript tag would
+never start.
+== Using the Scanner
+As shown in the example above, consumers of your class initialize it by
+passing in the string to be parsed, and then calling #to_xml or #to_html
+on it.
+<i>(This two-step process allows the consumer to run other code after
+the tag parsing, before final conversion. Examples might include
+replacing special command tags with other input, or performing database
+lookups on special wiki-page-link tags and replacing with HTML
+anchors.)</i>
+= Requirements
+TagTreeScanner is built on top of the StringScanner library that is part
+of the standard Ruby installation.
+= License
+(The MIT License)
+Copyright (c) 2005-2007 Gavin Kistner
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+'Software'), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
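
Reviewer note: the README's warning about `@text_match` can be reproduced with StringScanner alone. The sketch below is not how the gem is implemented; `tag_opener_seen?` and `SUBSCRIPT` are hypothetical names, and the loop only imitates the documented order of "try to open a tag, otherwise consume text".

```ruby
require 'strscan'

# Assumed opener for the README's "Hvv2vvO" subscript example.
SUBSCRIPT = /vv(\d)vv/

# Hypothetical helper, not part of the gem: walk +input+, trying +tag_open+
# first and falling back to +text_match+. Returns true if the tag opener
# ever matched, false otherwise.
def tag_opener_seen?(input, tag_open, text_match)
  scanner = StringScanner.new(input)
  until scanner.eos?
    return true if scanner.scan(tag_open)
    # Consume plain text. In this sketch we simply stop when nothing
    # matches, which is what happens at a newline if /m is missing.
    break unless scanner.scan(text_match)
  end
  false
end

puts tag_opener_seen?('Hvv2vvO', SUBSCRIPT, /./m)      # prints true
puts tag_opener_seen?('Hvv2vvO', SUBSCRIPT, /\w+|./m)  # prints false
puts tag_opener_seen?("a\nvv2vv", SUBSCRIPT, /./)      # prints false
puts tag_opener_seen?("a\nvv2vv", SUBSCRIPT, /./m)     # prints true
```

The second call fails because `\w+` swallows the whole of `Hvv2vvO` before the opener is tried at the `v`; the third fails because `/./` without the `m` flag cannot step over the newline.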

data/Rakefile ADDED

@@ -0,0 +1,18 @@
+# -*- ruby -*-
+require 'rubygems'
+require 'hoe'
+require './lib/tagtreescanner.rb'
+Hoe.new('tagtreescanner', TagTreeScanner::VERSION) do |p|
+  p.rubyforge_name = 'tagtreescanner'
+  p.author = 'Gavin Kistner'
+  p.email  = 'phrogz@mac.com'
+  p.url         = ''
+  p.summary = 'Meta library for creating classes that turn custom text markup into XML-like tag hierarchies.'
+  p.description = IO.read( 'README' )[ /= Overview\n(.+?)^=/m, 1 ].rstrip
+  p.changes     = IO.read( 'HISTORY' )[ /^=[^\n]+\n+(.+?)^=/m, 1 ].rstrip
+  p.remote_rdoc_dir = ''
+end
+# vim: syntax=Ruby
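
Reviewer note: the two `String#[]` extractions in the Rakefile are easy to check against inline text. In this sketch the sample strings are abridged stand-ins for README and HISTORY, and `overview_section` and `latest_changes` are hypothetical helper names; only the regular expressions are taken from the Rakefile.

```ruby
# Abridged stand-ins for the real files (assumption: same heading layout).
README_SAMPLE = <<~TEXT
  = Overview
  The TagTreeScanner class provides a generic framework for creating a
  nested hierarchy of tags and text.
  = Example Usage
    require 'tagtreescanner'
TEXT

HISTORY_SAMPLE = <<~TEXT
  == 0.8.0 / 2007-November-25
  * First release as a gem.
  == 0.6.1 / 2005-July-5
  * Initial public release
TEXT

# Everything between "= Overview" and the next line that starts with "=".
def overview_section(text)
  text[/= Overview\n(.+?)^=/m, 1].rstrip
end

# The body of the first "=" section, up to the next line starting with "=".
def latest_changes(text)
  text[/^=[^\n]+\n+(.+?)^=/m, 1].rstrip
end

puts overview_section(README_SAMPLE)
puts latest_changes(HISTORY_SAMPLE)  # prints * First release as a gem.
```

Both patterns need a later line beginning with `=` to terminate the lazy capture. With a single-section HISTORY the lookup returns `nil` and the `.rstrip` call raises `NoMethodError`.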

data/TODO ADDED

@@ -0,0 +1,11 @@
+* Overhaul Tag and TextNode and TagTreeScanner to use a common DOM module
+  like <tt>Phrogz::DOM::OrderedTreeNode</tt>.
+* Allow TagFactories to explicitly specify multiple allowed genres
+  and/or allowed tags, rather than only one genre.
+* Provide a method like inner_html= for parsing and creating tag content.
+  * Useful for batch replacing the contents of a single tag with output from
+    another program, while maintaining the DOM integrity.
+* More unit tests

data/lib/tagtreescanner.rb ADDED

@@ -0,0 +1,851 @@
+# This file covers the TagTreeScanner class, and the extensions to the
+# String class needed by it.
+# Please see the documentation on those classes for more information.
+#
+# Author::     Gavin Kistner  (mailto:phrogz@mac.com)
+# Copyright::  Copyright (c)2005-2007 Gavin Kistner
+# License::    MIT License
+# Version::    0.8.0 (2007-November-24)
+require 'strscan'
+# = Overview
+# The TagTreeScanner class provides a generic framework for creating a
+# nested hierarchy of tags and text (like XML or HTML) by parsing text. An
+# example use (and the reason it was written) is to convert a wiki markup
+# syntax into HTML.
+#
+# = Example Usage
+#   require 'tagtreescanner'
+#
+#   class SimpleMarkup < TagTreeScanner
+#      @root_factory.allows_text = false
+#
+#      @tag_genres[ :root ] = [ ]
+#
+#      @tag_genres[ :root ] << TagFactory.new( :paragraph,
+#         # A line that doesn't have whitespace at the start
+#         :open_match => /(?=\S)/, :open_requires_bol => true,
+#
+#         # Close when you see a double return
+#         :close_match => /\n[ \t]*\n/,
+#         :allows_text => true,
+#         :allowed_genre => :inline
+#      )
+#
+#      @tag_genres[ :root ] << TagFactory.new( :preformatted,
+#         # Grab all lines that are indented up until a line that isn't
+#         :open_match => /((\s+).+?)\n+(?=\S)/m, :open_requires_bol => true,
+#         :setup => lambda{ |tag, scanner, tagtree|
+#            # Throw the contents I found into the tag
+#            # but remove leading whitespace
+#            tag << scanner[1].gsub( /^#{scanner[2]}/, '' )
+#         },
+#         :autoclose => true
+#      )
+#
+#      @tag_genres[ :inline ] = [ ]
+#
+#      @tag_genres[ :inline ] << TagFactory.new( :bold,
+#         # An asterisk followed by a letter or number
+#         :open_match => /\*(?=[a-z0-9])/i,
+#
+#         # Close when I see an asterisk OR a newline coming up
+#         :close_match => /\*|(?=\n)/,
+#         :allows_text => true,
+#         :allowed_genre => :inline
+#      )
+#
+#      @tag_genres[ :inline ] << TagFactory.new( :italic,
+#         # An underscore followed by a letter or number
+#         :open_match => /_(?=[a-z0-9])/i,
+#
+#         # Close when I see an underscore OR a newline coming up
+#         :close_match => /_|(?=\n)/,
+#         :allows_text => true,
+#         :allowed_genre => :inline
+#      )
+#   end
+#
+#   raw_text = <<ENDINPUT
+#   Hello World! You're _soaking in_ my test.
+#   This is a *subset* of markup that I allow.
+#
+#   Hi paragraph two. Yo! A code sample:
+#
+#     def foo
+#       puts "Whee!"
+#     end
+#
+#   _That, as they say, is that._
+#
+#   ENDINPUT
+#
+#   markup = SimpleMarkup.new( raw_text ).to_xml
+#   puts markup
+#
+#
+#   #=> <paragraph>Hello World! You're <italic>soaking in</italic> my test.
+#   #=> This is a <bold>subset</bold> of markup that I allow.</paragraph>
+#   #=> <paragraph>Hi paragraph two. Yo! A code sample:</paragraph>
+#   #=> <preformatted>def foo
+#   #=>   puts "Whee!"
+#   #=> end</preformatted>
+#   #=> <paragraph><italic>That, as they say, is that.</italic></paragraph>
+#
+#
+# = Details
+#
+# == TagFactories at 10,000 feet
+# Each possible output tag is described by a TagFactory, which specifies
+# some or all of the following:
+# * The name of the tags it creates <i>(required)</i>
+# * The regular expression to look for to start the tag
+# * The regular expression to look for to close the tag, or
+# * Whether the tag is automatically closed after creation
+# * What genre of tags are allowed within the tag
+# * Whether the tag supports raw text inside it
+# * Code to run when creating a tag
+#
+# See the TagFactory class for more information on specifying factories.
+#
+# == Genres as a State Machine
+# As a new tag is opened, the scanner uses the Tag#allowed_genre property
+# of that tag (set by the +allowed_genre+ property on the TagFactory) to
+# determine which tags to be looking for. A genre is specified by adding
+# an array in the <tt>@tag_genres</tt> hash, whose key is the genre name.
+# For example:
+#   @tag_genres[ :inline ] = [ ]
+# adds a new genre named 'inline', with no tags in it. TagFactory instances
+# should be pushed onto this array <b>in the order that they should be looked
+# for</b>. For example:
+#   @tag_genres[ :inline ] << TagFactory.new( :italic,
+#     # see the TagFactory#initialize for options
+#   )
+#
+# Note that the +close_match+ regular expression of the current tag is
+# always checked before looking to open/create any new tags.
+#
+# == Consuming Text
+# As the text is being parsed, there will (probably) be many cases where
+# you have raw text that doesn't close or open any new tags. Whenever the
+# scanner reaches this state, it runs the <tt>@text_match</tt> regexp
+# against the text to move the pointer ahead. If the current tag has
+# <tt>Tag#allows_text?</tt> set to +true+ (through
+# <tt>TagFactory#allows_text</tt>), then this text is added as contents of
+# the tag. If not, the text is thrown away.
+#
+# The safest regular expression consumes only one character at a time:
+#   @text_match = /./m
+#
+# <b><i>It is vital that your regexp match newlines</i></b> (the 'm')
+# <b><i>unless every single one of your tags is set to close upon seeing
+# a newline.</i></b>
+#
+# Unfortunately, the safest regular expression is also the slowest. If
+# speed is an issue, your regexp should strive to eat as many characters as
+# possible at once...while ensuring that it doesn't eat characters that
+# would signify the start of a new tag.
+#
+# For example, setting a regexp like:
+#   @text_match = /\w+|./m
+# allows the scanner to match a whole word at a time. However, if you have
+# a tag factory set to look for "Hvv2vvO" to indicate a subscripted '2',
+# the entire string would be eaten as text and the subscript tag would
+# never start.
+#
+# == Using the Scanner
+# As shown in the example above, consumers of your class initialize it by
+# passing in the string to be parsed, and then calling #to_xml or #to_html
+# on it.
+#
+# <i>(This two-step process allows the consumer to run other code after
+# the tag parsing, before final conversion. Examples might include
+# replacing special command tags with other input, or performing database
+# lookups on special wiki-page-link tags and replacing with HTML
+# anchors.)</i>
+class TagTreeScanner
+  VERSION = "0.8.0"
+  # A TagFactory holds the information about a specific kind of tag:
+  # * the name of the tag
+  # * what to look for to open and close the tag
+  # * what genre of tags it may contain
+  # * whether the tag permits raw text
+  # * additional code to run when creating the tag
+  #
+  # See the documentation about the <tt>@tag_genres</tt> hash inside
+  # the TagTreeScanner class for information on how to add factories
+  # for use.
+  #
+  # === Utilizing <tt>:autoclose</tt>
+  # Occasionally you will want to
+  # create a tag and allow no other tags inside it. An example might be
+  # a tag containing preformatted code.
+  #
+  # Rather than opening the tag and slowly spinning through all the
+  # text, the combination of the <tt>:autoclose</tt> and
+  # <tt>:setup</tt> options allow you to create the tag, fill it with
+  # content, and then immediately continue with the parent tag.
+  #
+  # See the #new method for how to use the <tt>:setup</tt>
+  # function, and an example usage.
+  class TagFactory
+    # The type of tag this factory produces.
+    attr_accessor :tag_name
+    # A regexp to match (and consume) that causes a new tag to be started.
+    attr_accessor :open_match
+    # Does the #open_match regexp require beginning of line?
+    attr_accessor :open_requires_bol
+    # The regexp which causes the tag to automatically close.
+    attr_accessor :close_match
+    # Does the #close_match regexp require beginning of line?
+    attr_accessor :close_requires_bol
+    # Should this tag stay open when created, or automatically close?
+    attr_accessor :autoclose
+    # A symbol with the genre of tags that are allowed inside the tag.
+    # <i>(See @tag_genres in the TagTreeScanner documentation.)</i>
+    attr_accessor :allowed_genre
+    # May tags created by this factory have text added to them?
+    attr_accessor :allows_text
+    # __tag_name__:: A symbol with the name of the tag to create
+    # __options__:: A hash including one or more of <tt>:open_match</tt>,
+    # <tt>:open_requires_bol</tt>, <tt>:close_match</tt>,
+    # <tt>:close_requires_bol</tt>, <tt>:autoclose</tt>,
+    # <tt>:allows_text</tt>, <tt>:allowed_genre</tt>, and
+    # <tt>:setup</tt>.
+    #
+    # Due to the way the StringScanner class works, placing a <tt>^</tt>
+    # (beginning of line) marker in your <tt>:open_match</tt> or
+    # <tt>:close_match</tt> regular expressions will not behave as
+    # desired. Instead, set the <tt>:open_requires_bol</tt> and/or
+    # <tt>:close_requires_bol</tt> properties to +true+ if desired.
+    #
+    # A factory should either be set to <tt>:autoclose => true</tt>, or
+    # supply a <tt>:close_match</tt>. (Otherwise, it will never close.)
+    #
+    # Further, a factory should either be set to
+    # <tt>:autoclose => true</tt> or specify an <tt>:allowed_genre</tt>.
+    # <i>(See below for how to efficiently create a tag that cannot
+    # contain other tags.)</i>
+    #
+    # The <tt>:setup</tt> option is used to run code during the tag
+    # creation. The value of this option should be a lambda/Proc that
+    # accepts three parameters:
+    # * the <b>Tag</b> being created
+    # * the <b>StringScanner</b> instance that matched the tag opening
+    # * the <b>TagTreeScanner</b> instance creating the tag.
+    #
+    # === Example:
+    #  # Shove URLs as HTML anchors, without the protocol prefix shown
+    #  @tag_genres[ :inline ] << TagFactory.new( :a,
+    #    :open_match => %r{http://(\S+)},
+    #    :setup => lambda{ |tag, ss, tagtree|
+    #      tag.attributes[ :href ] = ss[0]
+    #      tag << ss[1]
+    #    },
+    #    :autoclose => true
+    #  )
+    def initialize( tag_name, options={} )
+      @tag_name = tag_name
+      [ :open_match, :close_match,
+        :open_requires_bol, :close_requires_bol,
+        :allowed_genre, :autoclose,
+        :allows_text,
+        :setup, :attributes ].each{ |k|
+        self.instance_variable_set( "@#{k}".intern, options[ k ] )
+      }
+    end
+    # Creates and returns a new tag if the supplied _string_scanner_
+    # matches the +open_match+ of this factory.
+    #
+    # Called by TagTreeScanner during initialization.
+    def match( string_scanner, tagtreescanner ) #:nodoc:
+      #puts "Matching #{@open_match.inspect} against #{string_scanner.peek(10)}"
+      return nil unless ( !@open_requires_bol || string_scanner.bol? ) && string_scanner.scan( @open_match )
+      tag = maketag
+      @setup.call( tag, string_scanner, tagtreescanner ) if @setup
+      #puts "...created #{tag}"
+      tag
+    end
+    # Creates a tag from the factory manually
+    def create #:nodoc:
+      tag = maketag
+      @setup.call( tag, nil, nil ) if @setup
+      tag
+    end
+    private
+      # DRY common code
+      def maketag #:nodoc:
+        tag = Tag.new( @tag_name )
+        tag.factory = self
+        tag.attributes = @attributes if @attributes
+        tag
+      end
+  end
+  # Tags are the equivalent of a DOM Element. The majority of tags
+  # are created automatically by a TagFactory, but it may be
+  # necessary to create them directly in order to augment or replace
+  # information in the tag tree.
+  #
+  # A Tag may have one or more attributes, which are pairs of
+  # key/value strings; attributes are output in the HTML or XML
+  # representation of the Tag.
+  #
+  # Each tag also has an <tt>info</tt> hash, which may be used to
+  # keep track of extra bits of information about a tag. <i>Example
+  # usages might be keeping track of the depth of a list item, or the
+  # associated section for a header.</i> Information from the +info+
+  # hash is not output in the HTML or XML representations.
+  class Tag
+    # A symbol with the name of this tag
+    attr_accessor :name
+    # An array of child Tag or TextNode instances
+    attr_accessor :child_tags
+    # A hash of key/value attributes to emit in the XML/HTML
+    # representation
+    attr_accessor :attributes
+    # The TagFactory that created this tag (may be +nil+)
+    attr_accessor :factory
+    # A hash that may be used to store extra information about a Tag
+    attr_accessor :info
+    # The Tag to which this tag is attached (may be +nil+)
+    attr_reader :parent_tag
+    # The Tag or TextNode which immediately follows this tag
+    # (may be +nil+ if this is the last tag of its parent)
+    attr_reader :next_sibling
+    # The Tag or TextNode which immediately precedes this tag
+    # (may be +nil+ if this is the first tag of its parent)
+    attr_reader :previous_sibling
+    # _name_::   A symbol with the name of this tag
+    # _attributes_:: A hash of key/value pairs to store with this tag
+    def initialize( name, attributes={} )
+      @name = name
+      @child_tags = [ ]
+      @attributes = attributes
+      @info = {}
+    end
+    # Allows for setting HTML or XML-like attributes directly without
+    # knowing about the _attributes_ collection. For example:
+    #   tag.href  = 'http://www.google.com'
+    #   tag.class = 'external'
+    # is the same as:
+    #   tag.attributes['href']  = 'http://www.google.com'
+    #   tag.attributes['class'] = 'external'
+    # ...for any attributes (like the above) that don't have the same
+    # name as an existing method or attribute on the Tag class.
+    def method_missing( name, *args )
+      if (name=name.to_s) =~ /=$/
+        @attributes[ name[0...-1] ] = (args.size==1 ? args[0] : args )
+      else
+        @attributes[ name ]
+      end
+    end
+    # Returns the +close_match+ property of the owning TagFactory,
+    # or +nil+ if this tag wasn't created by a factory.
+    def close_match
+      @factory && @factory.close_match
+    end
+    # Returns the +close_requires_bol+ property of the owning TagFactory,
+    # or +nil+ if this tag wasn't created by a factory.
+    def close_requires_bol?
+      @factory && @factory.close_requires_bol
+    end
+    # Returns the +autoclose+ property of the owning TagFactory,
+    # or +nil+ if this tag wasn't created by a factory.
+    def autoclose?
+      @factory && @factory.autoclose
+    end
+    # Returns the +allows_text+ property of the owning TagFactory,
+    # or +true+ if this tag wasn't created by a factory.
+    def allows_text?
+      @factory ? @factory.allows_text : true
+    end
+    # Returns the +allowed_genre+ property of the owning TagFactory,
+    # or +nil+ if this tag wasn't created by a factory.
+    def allowed_genre
+      @factory && @factory.allowed_genre
+    end
+    # _new_child_:: The Tag or TextNode to add as the last child.
+    #
+    # Adds _new_child_ to the end of this tag's +child_tags+ collection.
+    # Returns a reference to _new_child_.
+    #
+    # If _new_child_ is a child of another Tag, it is first removed from
+    # that tag.
+    def append_child( new_child )
+      return new_child if new_child == @child_tags.last
+      insert_after( new_child, @child_tags.last )
+    end
+    # _new_child_:: The Tag or TextNode to add as a child of this tag.
+    # _reference_child_:: The child to place _new_child_ before.
+    #
+    # Adds _new_child_ as a child of this tag, immediately before the
+    # location of _reference_child_. Returns a reference to _new_child_.
+    #
+    # If _reference_child_ is +nil+, the child is added as the last
+    # child of this tag. A RuntimeError is raised if _reference_child_
+    # is not a child of this tag.
+    #
+    # If _new_child_ is a child of another Tag, #remove_child is
+    # automatically invoked to remove it from that tag.
+    def insert_before( new_child, reference_child=nil )
+      return new_child if reference_child ? ( reference_child.previous_sibling == new_child ) : ( new_child == @child_tags.last )
+      insert_after( new_child, reference_child ? reference_child.previous_sibling : @child_tags.last )
+    end
+    # _new_child_:: The Tag or TextNode to add as a child of this tag.
+    # _reference_child_:: The child to place _new_child_ after.
+    #
+    # Adds _new_child_ as a child of this tag, immediately after the
+    # location of _reference_child_. Returns a reference to _new_child_.
+    #
+    # If _reference_child_ is +nil+, the child is added as the first
+    # child of this tag. A RuntimeError is raised if _reference_child_
+    # is not a child of this tag.
+    #
+    # If _new_child_ is a child of another Tag, #remove_child is
+    # automatically invoked to remove it from that tag.
+    def insert_after( new_child, reference_child=nil )
+      #puts "#{self.inspect}#insert_after( #{new_child.inspect}, #{reference_child.inspect} )"
+      return new_child if reference_child ? ( reference_child.next_sibling == new_child ) : ( new_child == @child_tags.first )
+      # Ensure new_child is not an ancestor of self
+      walker = self
+      while walker
+        raise "#{new_child.inspect} cannot be added under #{self.inspect}, because it is an ancestor of it!" if walker==new_child
+        walker = walker.parent_tag
+      end
+      new_child.parent_tag.remove_child( new_child ) if new_child.parent_tag
+      if reference_child
+        new_idx = @child_tags.index( reference_child )
+        raise "#{reference_child.inspect} is not a child of #{self.inspect}" unless new_idx
+        new_idx += 1
+      else
+        new_idx = 0
+      end
+      new_child.parent_tag = self
+      succ = @child_tags[ new_idx ]
+      @child_tags.insert( new_idx, new_child )
+      new_child.previous_sibling = reference_child
+      reference_child.next_sibling = new_child if reference_child
+      new_child.next_sibling = succ
+      succ.previous_sibling = new_child if succ
+      new_child
+    end
+    # _existing_child_:: The Tag or TextNode to remove.
+    #
+    # Removes _existing_child_ from being a child of this tag.
+    # Returns _existing_child_.
+    #
+    # A RuntimeError is raised if _existing_child_ is not a child of
+    # this tag.
+    def remove_child( existing_child )
+      idx = @child_tags.index( existing_child )
+      raise "#{existing_child.inspect} is not a child of #{self.inspect}" unless idx
+      prev, succ = existing_child.previous_sibling, existing_child.next_sibling
+      prev.next_sibling = succ if prev
+      succ.previous_sibling = prev if succ
+      @child_tags.delete_at( idx )
+      existing_child.previous_sibling = existing_child.next_sibling = existing_child.parent_tag = nil
+      existing_child
+    end
+    # _old_child_:: The existing child Tag or TextNode to replace.
+    # _new_child_:: The Tag or TextNode to replace _old_child_.
+    #
+    # Replaces _old_child_ with _new_child_ in this collection.
+    # Returns _old_child_.
+    #
+    # A RuntimeError is raised if _old_child_ is not a child of
+    # this tag.
+    #
+    # If _new_child_ is a child of another Tag, #remove_child is
+    # automatically invoked to remove it from that tag.
+    def replace_child( old_child, new_child )
+      if ( prev = old_child.previous_sibling ) == new_child || old_child.next_sibling == new_child
+        remove_child( old_child )
+      else
+        new_child.parent_tag.remove_child( new_child ) if new_child.parent_tag
+        remove_child( old_child )
+        insert_after( new_child, prev )
+      end
+      old_child
+    end
+    # _new_child_:: The Tag or TextNode to replace this tag.
+    #
+    # Replaces this tag with _new_child_. Returns _new_child_.
+    #
+    # A RuntimeError is raised if this tag is not a child of another tag.
+    #
+    # If _new_child_ is a child of another Tag, #remove_child is
+    # automatically invoked to remove it from that tag.
+    def replace_with( new_child )
+      return new_child if new_child == self
+      raise "#{self.inspect} is not a child of another tag" unless @parent_tag
+      @parent_tag.replace_child( self, new_child )
+      new_child
+    end
+    # _additional_text_:: The text to add to this node.
+    #
+    # Appends _additional_text_ to this tag. If the last item in the
+    # +child_tags+ collection is a TextNode, the text is added to that
+    # item; otherwise, a new TextNode is created with _additional_text_
+    # and added as the last child of this tag.
+    def << ( additional_text )
+      last_child = @child_tags.last
+      if last_child.is_a? TextNode
+        last_child << additional_text
+      else
+        append_child( TextNode.new( additional_text ) )
+      end
+    end
+    # Set the text content of this element to _new_contents_
+    # Removes any child tags (and their text)
+    def text=( new_contents )
+      @child_tags.clear
+      append_child( TextNode.new( new_contents ) )
+    end
+    alias_method :inner_text=, :text=
+    # Returns an HTML representation of this tag and all its descendants.
+    #
+    # This method is the same as #to_xml except that tags without
+    # any +child_tags+ use an explicit close tag, e.g.
+    # <tt><div></div></tt> instead of XML's <tt><div /></tt>
+    def to_html
+      to_xml( false )
+    end
+    # Returns an XML representation of this tag and all its descendants.
+    #
+    # If _empty_tags_collapsed_ is +true+ (the default) then this method
+    # is the same as #to_html except that tags without any +child_tags+
+    # use a single closed tag, e.g.
+    # <tt><div /></tt> instead of HTML's <tt><div></div></tt>
+    #
+    # If _empty_tags_collapsed_ is +false+, this is the same as #to_html.
+    def to_xml( empty_tags_collapsed=true )
+      out = "<#{@name}"
+      @attributes.each{ |k,v| out << " #{k}=\"#{v.to_s.gsub( '"', '&quot;' )}\"" }
+      if empty_tags_collapsed && @child_tags.empty?
+        out << ' />'
+      else
+        out << '>'
+        unless @child_tags.empty?
+          out << "\n" unless self.allows_text?
+          @child_tags.each{ |tag|
+            out << tag.to_xml( empty_tags_collapsed )
+          }
+        end
+        out << "</#{@name}>"
+      end
+      out << "\n" if @parent_tag && !@parent_tag.allows_text?
+      out
+    end
+    # Returns an array of all descendants of this tag whose #name
+    # matches the supplied _name_.
+    def tags_by_name( name )
+      out = []
+      @child_tags.each{ |tag|
+        out << tag if tag.name == name
+        unless tag.child_tags.empty?
+          out.concat( tag.tags_by_name( name ) )
+        end
+      }
+      out
+    end
+    # Returns the text contents of this tag and its descendants.
+    def inner_text
+      @child_tags.inject(''){ |out,tag|
+        out << ( tag.is_a?( TextNode ) ? tag.text : tag.inner_text )
+      }
+    end
+    def inspect #:nodoc:
+      out = "<#{@name}"
+      #out << " @pops=#{@parent_tag ? @parent_tag.name.inspect : 'nil'}"
+      #out << " @prev=#{@previous_sibling ? @previous_sibling.name.inspect : 'nil'}"
+      #out << " @next=#{@next_sibling ? @next_sibling.name.inspect : 'nil'}"
+      @attributes.each{ |k,v| out << " #{k}=\"#{v}\"" }
+      @info.each{ |k,v| out << " @#{k}=>#{v.inspect}" }
+      children = @child_tags.length
+      if children == 1 && TextNode === @child_tags.first
+        out << ">#{@child_tags.first}</#{@name}>"
+      elsif children == 0
+        out << '>'
+      else
+        out << " (#{@child_tags.length} child#{@child_tags.length != 1 ? 'ren' : ''})>"
+      end
+    end
+    # _level_:: The indentation level (tabs) to start at.
+    #
+    # Returns a full-hierarchical representation of this tag and its
+    # descendants. (Used for debugging.)
+    def to_hier( level=0 ) #:nodoc:
+      tabs = "\t" * level
+      out = "#{tabs}<#{@name}"
+      @attributes.each{ |k,v| out << " #{k}=\"#{v}\"" }
+      @info.each{ |k,v| out << " @#{k}=>#{v.inspect}" }
+      if @child_tags.empty?
+        out << " />\n"
+      elsif @child_tags.length == 1 && TextNode === @child_tags.first
+        out << ">#{@child_tags.first}</#{@name}>\n"
+      else
+        out << ">\n"
+        @child_tags.each{ |n| out << n.to_hier(level+1) }
+        out << "#{tabs}</#{@name}>\n"
+      end
+      out
+    end
+    # Returns a copy of this tag and its entire hierarchy.
+    # All descendant tags/text nodes are also cloned.
+    #
+    # The +info+ hash is not preserved.
+    def dup
+      tag = self.class.new( self.name, self.attributes.dup )
+      @child_tags.each{ |tag2| tag.append_child( tag2.dup ) }
+      tag
+    end
+    # :stopdoc:
+    protected
+      attr_writer :previous_sibling, :next_sibling, :parent_tag
+    # :startdoc:
+  end
+  # A TextNode holds raw text inside a Tag. Generally, TextNodes are
+  # created automatically by the Tag#<< method.
+  class TextNode
+    # The Tag or TextNode that comes after this one (may be +nil+)
+    attr_accessor :next_sibling
+    # The Tag or TextNode that comes before this one (may be +nil+)
+    attr_accessor :previous_sibling
+    # The Tag that is a parent of this TextNode (may be +nil+)
+    attr_accessor :parent_tag
+    # A hash which may be used to store 'extra' information
+    attr_accessor :info
+    # The string contents of this text node
+    attr_accessor :text
+    # _text_:: The text to start out with
+    def initialize( text='' )
+      @text = text
+      @info = {}
+    end
+    # _additional_text_:: The text to add
+    #
+    # Appends the provided text to the end of the current text
+    #
+    # Returns the new text value
+    def << ( additional_text )
+      @text << additional_text
+    end
+    # Returns a copy of this text node
+    def dup
+      self.class.new( @text.dup )
+    end
+    def to_hier( level=0 ) #:nodoc:
+      "#{"\t"*level}#{@text.inspect}\n"
+    end
+    def to_s #:nodoc:
+      @text
+    end
+    # Returns the contents of this node, modified to be made XML-safe
+    # by calling String#xmlsafe.
+    def to_xml( *args )
+      @text.xmlsafe
+    end
+  end
+  # RDoc thinks that this stuff applies to instances, not the class
+  # :stopdoc:
+  class << self
+    attr_accessor :tag_genres, :root_factory, :text_match
+  end
+  # :startdoc:
+  # The tag_genres hash maps a genre name onto an array of TagFactories.
+  #
+  # Factories are tested in the order they appear in the genre array;
+  # more important matches are at the top, generic fallback ones
+  # should appear at the end of the list.
+  #
+  # If no factory matches the current input, then text is shoved into the
+  # most recent tag until a new tag start is found, or the closing match
+  # is met. (If the current tag's factory does not have :allows_text set
+  # to true, then the text is simply thrown away until the closing match
+  # or a new tag start is found.)
+  @tag_genres = { }
+  # Settings for the root of your document: what genre is allowed at the
+  # highest level, and should raw text be allowed there?
+  #
+  # Override in your class by setting a class-instance variable as below.
+  @root_factory = TagFactory.new( :root,
+    :allowed_genre => :root,
+    :allows_text => true )
+  # The pattern to consume and shove as text whenever no tag start/close
+  # is found. Eating one character at a time is safest, but slow.
+  # Ensure that this pattern never consumes past the start of a tag,
+  # or else you'll miss it.
+  @text_match = /./m
+  # Scans through _string_to_parse_ and builds a tree of tags based
+  # on the regular expressions and rules set by the TagFactory
+  # instances present in <tt>@tag_genres</tt>.
+  #
+  # After parsing the tree, call #to_xml or #to_html to retrieve
+  # a string representation.
+  def initialize( string_to_parse )
+    current = @root = self.class.root_factory.create
+    tag_genres = self.class.tag_genres
+    text_match = self.class.text_match
+    ss = StringScanner.new( string_to_parse )
+    while !ss.eos?
+      # Keep popping off the current tag until we get to the root,
+      # as long as each tag's close criteria are met
+      while ( current != @root ) && (!current.close_requires_bol? || ss.bol?) && ss.scan( current.close_match )
+        current = current.parent_tag || @root
+      end
+      # No point in continuing if closing out tags consumed the rest of the string
+      break if ss.eos?
+      # Look for a tag to open
+      if factories = tag_genres[ current.allowed_genre ]
+        tag = nil
+        factories.each{ |factory|
+          if tag = factory.match( ss, self )
+            current.append_child( tag )
+            current = tag unless tag.autoclose?
+            break
+          end
+        }
+        # Start at the top of the loop if we found one
+        next if tag
+      end
+      # Couldn't find a valid tag at this spot
+      # so we need to eat some characters
+      consumed = ss.scan( text_match )
+      current << consumed if current.allows_text?
+    end
+  end
+  # Returns an HTML representation of the tag tree.
+  #
+  # This is the same as the #to_xml method except that empty tags use an
+  # explicit close tag, e.g. <tt><div></div></tt> versus <tt><div /></tt>
+  def to_html
+    @root.child_tags.inject(''){ |out, tag| out << tag.to_html }
+  end
+  # Returns an XML representation of the tag tree.
+  #
+  # This method is the same as the #to_html method except that empty tags
+  # do not use an explicit close tag,
+  # e.g. <tt><div /></tt> versus <tt><div></div></tt>
+  def to_xml
+    @root.child_tags.inject(''){ |out, tag| out << tag.to_xml }
+  end
+  # Returns an array of all root-level tags found
+  def tags
+    @root.child_tags
+  end
+  # Returns an array of all tags in the tree whose Tag#name matches
+  # the supplied _name_.
+  def tags_by_name( name )
+    @root.tags_by_name( name )
+  end
+  # Returns a hierarchical representation of the entire tag tree
+  def inspect #:nodoc:
+    @root.to_hier
+  end
+  # When a class inherits from TagTreeScanner, defaults are set for
+  # <tt>@tag_genres</tt>, <tt>@root_factory</tt> and
+  # <tt>@text_match</tt>
+  def self.inherited( child_class ) #:nodoc:
+    child_class.tag_genres = @tag_genres
+    child_class.root_factory = @root_factory
+    child_class.text_match = @text_match
+  end
+end
+# Extensions to the built-in String class
+class String
+  # Returns a copy of the string with all <tt>&</tt>, <tt><</tt> and
+  # <tt>></tt> characters replaced by their equivalent XML entities
+  # (<tt>&amp;</tt>, <tt>&lt;</tt>, and <tt>&gt;</tt>)
+  def xmlsafe
+    self.dup.xmlsafe!
+  end
+  # Modifies the string, replacing all <tt>&</tt>, <tt><</tt> and
+  # <tt>></tt> characters with their equivalent XML entities
+  # (<tt>&amp;</tt>, <tt>&lt;</tt>, and <tt>&gt;</tt>)
+  def xmlsafe!
+    self.gsub!( /&/, '&amp;' )
+    self.gsub!( /</, '&lt;' )
+    self.gsub!( />/, '&gt;' )
+    self
+  end
+end
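
The main loop in `TagTreeScanner#initialize` can be hard to follow from the diff alone. Below is a minimal, self-contained sketch of the same three-step idea (try to close the current tag, try to open a new one, otherwise eat text), reduced to a single bold rule. The method name `mini_scan` and the constants are hypothetical and not part of the gem; the patterns are copied from the `:bold` factory in the test file, and unlike the gem this sketch does not nest tags or escape text.

```ruby
require 'strscan'

# Hypothetical constants for illustration; patterns taken from the :bold factory
BOLD_OPEN  = /\*(?=[a-z0-9])/i
BOLD_CLOSE = /\*|(?=\n)/
TEXT_MATCH = /./m   # same default as @text_match: one character at a time

# Simplified stand-in for the scanner loop (assumption: one flat rule, no nesting)
def mini_scan( string_to_parse )
  out     = +''
  in_bold = false
  ss      = StringScanner.new( string_to_parse )
  until ss.eos?
    if in_bold && ss.scan( BOLD_CLOSE )
      # Step 1: the current tag's close pattern matched, so pop it
      out << '</bold>'
      in_bold = false
    elsif !in_bold && ss.scan( BOLD_OPEN )
      # Step 2: a factory's open pattern matched, so push a new tag
      out << '<bold>'
      in_bold = true
    else
      # Step 3: nothing matched here, so consume some text
      out << ss.scan( TEXT_MATCH )
    end
  end
  out << '</bold>' if in_bold   # close anything left open at end of input
  out
end

puts mini_scan( "a *subset* of markup" )
```

Note that the close pattern may match zero characters (the `(?=\n)` lookahead); the loop still makes progress because the newline is then consumed as text on the next pass.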

data/test/test_simplemarkup.rb ADDED

@@ -0,0 +1,84 @@
+require "test/unit"
+require "../lib/tagtreescanner.rb"
+class SimpleMarkup < TagTreeScanner
+	@root_factory.allows_text = false
+	@tag_genres[ :root ] = [ ]
+	@tag_genres[ :root ] << TagFactory.new( :paragraph,
+		# A line that doesn't have whitespace at the start
+		:open_match => /(?=\S)/, :open_requires_bol => true,
+		# Close when you see a double return
+		:close_match => /\n[ \t]*\n/,
+		:allows_text => true,
+		:allowed_genre => :inline
+	)
+	@tag_genres[ :root ] << TagFactory.new( :preformatted,
+		# Grab all lines that are indented up until a line that isn't
+		:open_match => /((\s+).+?)\n+(?=\S)/m, :open_requires_bol => true,
+		:setup => lambda{ |tag, scanner, tagtree|
+			# Throw the contents I found into the tag
+			# but remove leading whitespace
+			tag << scanner[1].gsub( /^#{scanner[2]}/, '' )
+		},
+		:autoclose => true
+	)
+	@tag_genres[ :inline ] = [ ]
+	@tag_genres[ :inline ] << TagFactory.new( :bold,
+		# An asterisk followed by a letter or number
+		:open_match => /\*(?=[a-z0-9])/i,
+		# Close when I see an asterisk OR a newline coming up
+		:close_match => /\*|(?=\n)/,
+		:allows_text => true,
+		:allowed_genre => :inline
+	)
+	@tag_genres[ :inline ] << TagFactory.new( :italic,
+		# An underscore followed by a letter or number
+		:open_match => /_(?=[a-z0-9])/i,
+		# Close when I see an underscore OR a newline coming up
+		:close_match => /_|(?=\n)/,
+		:allows_text => true,
+		:allowed_genre => :inline
+	)
+end
+class Tag_Test < Test::Unit::TestCase
+  def setup
+  end
+  def test_conversion
+    raw_text = <<-ENDINPUT
+    Hello World! You're _soaking in_ my test.
+    This is a *subset* of markup that I allow.
+    Hi paragraph two. Yo! A code sample:
+      def foo
+        puts "Whee!"
+      end
+    _That, as they say, is that._
+    ENDINPUT
+    markup = SimpleMarkup.new( raw_text ).to_xml
+    p '',markup
+  end
+end
+#=> <paragraph>Hello World! You're <italic>soaking in</italic> my test.
+#=> This is a <bold>subset</bold> of markup that I allow.</paragraph>
+#=> <paragraph>Hi paragraph two. Yo! A code sample:</paragraph>
+#=> <preformatted>def foo
+#=>   puts "Whee!"
+#=> end</preformatted>
+#=> <paragraph><italic>That, as they say, is that.</italic></paragraph>
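
The `:preformatted` rule above relies on two capture groups: group 1 is the whole indented block and group 2 is the leading whitespace of its first line, which the `:setup` lambda strips from every line. The sketch below replays that step with a bare `StringScanner`, outside the gem. The helper name `extract_preformatted` is hypothetical, and `Regexp.escape` is an addition of this sketch rather than something the original code does.

```ruby
require 'strscan'

# Same pattern as the :preformatted factory's :open_match
PRE_OPEN = /((\s+).+?)\n+(?=\S)/m

# Hypothetical helper: returns [unindented_block, remaining_input],
# or nil when the input does not start with an indented block.
def extract_preformatted( source )
  ss = StringScanner.new( source )
  return nil unless ss.scan( PRE_OPEN )
  indent = ss[2]   # leading whitespace of the first line
  body   = ss[1].gsub( /^#{Regexp.escape( indent )}/, '' )
  [ body, ss.rest ]
end

sample = "  def foo\n    puts \"Whee!\"\n  end\nNext paragraph"
body, rest = extract_preformatted( sample )
puts body
puts "remaining: #{rest.inspect}"
```

Because the trailing `(?=\S)` is a lookahead, the first non-indented character is left in the input for the next rule to match.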

data/test/test_tagtreescanner.rb ADDED

@@ -0,0 +1,104 @@
+require "test/unit"
+require "../lib/tagtreescanner"
+class Tag_Test < Test::Unit::TestCase
+  def setup
+  end
+  def test1_tags
+    root = TagTreeScanner::Tag.new( :root, { :is_root => true } )
+    assert_equal( :root, root.name )
+    assert_equal( true, root.attributes[ :is_root ] )
+    assert_nil( root.allowed_genre )
+    assert( root.allows_text? )
+    t1 = TagTreeScanner::Tag.new( :t1 )
+    root.append_child( t1 )
+    assert_equal( 1, root.child_tags.length )
+    assert_equal( t1, root.child_tags.first )
+    t2 = TagTreeScanner::Tag.new( :t2 )
+    root.append_child( t2 )
+    assert_equal( 2, root.child_tags.length )
+    assert_equal( t2, root.child_tags.last )
+    t3 = TagTreeScanner::Tag.new( :t3 )
+    root.insert_before( t3, t2 )
+    assert_equal( 3, root.child_tags.length )
+    assert_equal( [t1,t3,t2], root.child_tags )
+    root.append_child( t1 )
+    assert_equal( [t3,t2,t1], root.child_tags )
+    t1.replace_with( t3 )
+    assert_equal( [t2,t3], root.child_tags )
+    assert_nil( t1.parent_tag )
+    root.insert_before( t1, t2 )
+    assert_equal( [t1,t2,t3], root.child_tags )
+    assert_equal( root, t1.parent_tag )
+    root.append_child( t1 )
+    assert_equal( [t2,t3,t1], root.child_tags )
+    assert_equal( root, t1.parent_tag )
+    assert_nil( t1.next_sibling )
+    assert_nil( t2.previous_sibling )
+    t1.append_child( t3 )
+    assert_equal( [t2,t1], root.child_tags )
+    assert_nil( t3.next_sibling )
+    assert_nil( t3.previous_sibling )
+    assert_equal( t1, t2.next_sibling )
+    assert_equal( t2, t1.previous_sibling )
+    assert_equal( t3, t1.child_tags.first )
+    assert_raise( RuntimeError ){
+      t3.append_child( t1 )
+    }
+    assert_raise( RuntimeError ){
+      t1.append_child( t1 )
+    }
+  end
+  def test2_tags2
+    root = TagTreeScanner::Tag.new( :root )
+    # make a ton of tags...
+    1.upto(100){ |i|
+      root.append_child( TagTreeScanner::Tag.new( "t#{i}".intern ) )
+    }
+    # ...shuffle the hell out of them...
+    500.times{
+      next unless n1 = root.child_tags[ rand( root.child_tags.length ) ]
+      n2 = root.child_tags[ rand( root.child_tags.length ) ]
+      next if n1 == n2
+      case rand(30)
+        when 0
+          root.remove_child( n1 )
+        when 1
+          root.append_child( n1 )
+        when 2
+          root.insert_before( n1, nil )
+        when 3
+          root.insert_after( n1, nil )
+        when 4
+          root.insert_before( n1, n2 )
+        when 5
+          n1.replace_with( n2 )
+        else
+          root.insert_after( n1, n2 )
+      end
+    }
+    # ...and now ensure that they're all properly linked
+    last_tag = nil
+    root.child_tags.each{ |tag|
+      assert_equal( last_tag, tag.previous_sibling )
+      assert_equal( tag, last_tag.next_sibling ) if last_tag
+      assert_equal( root, tag.parent_tag )
+      last_tag = tag
+    }
+    assert_nil( last_tag.next_sibling ) if last_tag
+  end
+end
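
The final loop of `test2_tags2` checks an invariant: each child's `previous_sibling` is the child before it, each `next_sibling` is the child after it, and every child points back at the same parent. A standalone sketch of that check follows, using a hypothetical `Node` struct and `Parent` class rather than the gem's `Tag`, so it can be run without the library.

```ruby
# Hypothetical minimal node; assumes the same three links that Tag maintains
Node = Struct.new( :name, :parent_tag, :previous_sibling, :next_sibling )

# Hypothetical container standing in for a Tag's child list
class Parent
  attr_reader :child_tags

  def initialize
    @child_tags = []
  end

  # Append a node and link it to the previous last child
  def append_child( node )
    last = @child_tags.last
    node.parent_tag       = self
    node.previous_sibling = last
    node.next_sibling     = nil
    last.next_sibling     = node if last
    @child_tags << node
    node
  end
end

# True when every child is linked consistently with its neighbours and parent.
# Identity (equal?) is used so cyclic struct links are never compared by value.
def links_consistent?( parent )
  last_tag = nil
  parent.child_tags.each do |tag|
    return false unless tag.previous_sibling.equal?( last_tag )
    return false if last_tag && !last_tag.next_sibling.equal?( tag )
    return false unless tag.parent_tag.equal?( parent )
    last_tag = tag
  end
  last_tag.nil? || last_tag.next_sibling.nil?
end

root = Parent.new
%i[t1 t2 t3].each { |n| root.append_child( Node.new( n ) ) }
puts links_consistent?( root )
```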

metadata ADDED

@@ -0,0 +1,63 @@
+--- !ruby/object:Gem::Specification
+rubygems_version: 0.9.4
+specification_version: 1
+name: tagtreescanner
+version: !ruby/object:Gem::Version
+  version: 0.8.0
+date: 2007-11-25 00:00:00 -07:00
+summary: Meta library for creating classes that turn custom text markup into XML-like tag hierarchies.
+require_paths:
+- lib
+email: phrogz@mac.com
+homepage:
+rubyforge_project: tagtreescanner
+description: The TagTreeScanner class provides a generic framework for creating a nested hierarchy of tags and text (like XML or HTML) by parsing text. An example use (and the reason it was written) is to convert a wiki markup syntax into HTML.
+autorequire:
+default_executable:
+bindir: bin
+has_rdoc: true
+required_ruby_version: !ruby/object:Gem::Version::Requirement
+  requirements:
+  - - ">"
+    - !ruby/object:Gem::Version
+      version: 0.0.0
+  version:
+platform: ruby
+signing_key:
+cert_chain:
+post_install_message:
+authors:
+- Gavin Kistner
+files:
+- HISTORY
+- Manifest.txt
+- README
+- Rakefile
+- TODO
+- lib/tagtreescanner.rb
+- test/test_simplemarkup.rb
+- test/test_tagtreescanner.rb
+test_files:
+- test/test_simplemarkup.rb
+- test/test_tagtreescanner.rb
+rdoc_options:
+- --main
+- README.txt
+extra_rdoc_files:
+- Manifest.txt
+executables: []
+extensions: []
+requirements: []
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: hoe
+  version_requirement:
+  version_requirements: !ruby/object:Gem::Version::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: 1.3.0
+    version: