RubyGems - tagtreescanner - Versions diffs - 0.8.0 - Mend

tagtreescanner 0.8.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9)

data/HISTORY +17 -0
data/Manifest.txt +8 -0
data/README +191 -0
data/Rakefile +18 -0
data/TODO +11 -0
data/lib/tagtreescanner.rb +851 -0
data/test/test_simplemarkup.rb +84 -0
data/test/test_tagtreescanner.rb +104 -0
metadata +63 -0

data/HISTORY ADDED

@@ -0,0 +1,17 @@
+== 0.8.0 / 2007-November-25
+* First release as a gem. Breaks backwards compatibility with older versions.
+* Changed TagTreeScanner::Tag#tag_name to TagTreeScanner::Tag#name
+  * ...because it was dumb to write "tag.tag_name = 'span'"
+* Added a method_missing hack to TagTreeScanner::Tag that delegates
+  to read/write from its attributes hash.
+  * ...because I wanted people to be able to write "tag.href = 'foo'"
+* New TagTreeScanner::Tag#text= method to directly set the contents of
+  a tag, clearing out any other junk.
+== 0.6.1 / 2005-July-5
+* Initial public release
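
The 0.8.0 entry above mentions a `method_missing` hack that lets callers write `tag.href = 'foo'`. The following is a minimal sketch of that delegation pattern, not the gem's own code: `SketchTag` is a hypothetical stand-in class that keeps only an attributes hash.

```ruby
# Hypothetical stand-in for TagTreeScanner::Tag, reduced to its attributes hash.
class SketchTag
  attr_reader :attributes

  def initialize
    @attributes = {}
  end

  # Unknown "name=" calls write to the hash; any other unknown call reads from it.
  def method_missing(name, *args)
    name = name.to_s
    if name.end_with?('=')
      @attributes[name.chomp('=')] = args.first
    else
      @attributes[name]
    end
  end
end

tag = SketchTag.new
tag.href = 'foo'
p tag.href  #=> "foo"
```

Note that only names not already defined on the object reach `method_missing`, so a reader such as `tag.class` still returns the Ruby class rather than an attribute.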

data/Manifest.txt ADDED

@@ -0,0 +1,8 @@
+HISTORY
+Manifest.txt
+README
+Rakefile
+TODO
+lib/tagtreescanner.rb
+test/test_simplemarkup.rb
+test/test_tagtreescanner.rb

data/README ADDED

@@ -0,0 +1,191 @@
+<b>TagTreeScanner</b>
+Author::     Gavin Kistner  (mailto:phrogz@mac.com)
+Copyright::  Copyright (c)2005-2007 Gavin Kistner
+License::    MIT License
+Version::    0.8.0 (2007-November-24)
+= Overview
+The TagTreeScanner class provides a generic framework for creating a
+nested hierarchy of tags and text (like XML or HTML) by parsing text. An
+example use (and the reason it was written) is to convert a wiki markup
+syntax into HTML.
+= Example Usage
+  require 'tagtreescanner'
+  class SimpleMarkup < TagTreeScanner
+     @root_factory.allows_text = false
+     @tag_genres[ :root ] = [ ]
+     @tag_genres[ :root ] << TagFactory.new( :paragraph,
+        # A line that doesn't have whitespace at the start
+        :open_match => /(?=\S)/, :open_requires_bol => true,
+        # Close when you see a double return
+        :close_match => /\n[ \t]*\n/,
+        :allows_text => true,
+        :allowed_genre => :inline
+     )
+     @tag_genres[ :root ] << TagFactory.new( :preformatted,
+        # Grab all lines that are indented up until a line that isn't
+        :open_match => /((\s+).+?)\n+(?=\S)/m, :open_requires_bol => true,
+        :setup => lambda{ |tag, scanner, tagtree|
+           # Throw the contents I found into the tag
+           # but remove leading whitespace
+           tag << scanner[1].gsub( /^#{scanner[2]}/, '' )
+        },
+        :autoclose => true
+     )
+     @tag_genres[ :inline ] = [ ]
+     @tag_genres[ :inline ] << TagFactory.new( :bold,
+        # An asterisk followed by a letter or number
+        :open_match => /\*(?=[a-z0-9])/i,
+        # Close when I see an asterisk OR a newline coming up
+        :close_match => /\*|(?=\n)/,
+        :allows_text => true,
+        :allowed_genre => :inline
+     )
+     @tag_genres[ :inline ] << TagFactory.new( :italic,
+        # An underscore followed by a letter or number
+        :open_match => /_(?=[a-z0-9])/i,
+        # Close when I see an underscore OR a newline coming up
+        :close_match => /_|(?=\n)/,
+        :allows_text => true,
+        :allowed_genre => :inline
+     )
+  end
+  raw_text = <<ENDINPUT
+  Hello World! You're _soaking in_ my test.
+  This is a *subset* of markup that I allow.
+  Hi paragraph two. Yo! A code sample:
+    def foo
+      puts "Whee!"
+    end
+  _That, as they say, is that._
+  ENDINPUT
+  markup = SimpleMarkup.new( raw_text ).to_xml
+  puts markup
+  #=> <paragraph>Hello World! You're <italic>soaking in</italic> my test.
+  #=> This is a <bold>subset</bold> of markup that I allow.</paragraph>
+  #=> <paragraph>Hi paragraph two. Yo! A code sample:</paragraph>
+  #=> <preformatted>def foo
+  #=>   puts "Whee!"
+  #=> end</preformatted>
+  #=> <paragraph><italic>That, as they say, is that.</italic></paragraph>
+= Details
+== TagFactories at 10,000 feet
+Each possible output tag is described by a TagFactory, which specifies
+some or all of the following:
+* The name of the tags it creates <i>(required)</i>
+* The regular expression to look for to start the tag
+* The regular expression to look for to close the tag, or
+* Whether the tag is automatically closed after creation
+* What genre of tags are allowed within the tag
+* Whether the tag supports raw text inside it
+* Code to run when creating a tag
+See the TagFactory class for more information on specifying factories.
+== Genres as a State Machine
+As a new tag is opened, the scanner uses the Tag#allowed_genre property
+of that tag (set by the +allowed_genre+ property on the TagFactory) to
+determine which tags to be looking for. A genre is specified by adding
+an array in the <tt>@tag_genres</tt> hash, whose key is the genre name.
+For example:
+  @tag_genres[ :inline ] = [ ]
+adds a new genre named 'inline', with no tags in it. TagFactory instances
+should be pushed onto this array <b>in the order that they should be looked
+for</b>. For example:
+  @tag_genres[ :inline ] << TagFactory.new( :italic,
+    # see the TagFactory#initialize for options
+  )
+Note that the +close_match+ regular expression of the current tag is
+always checked before looking to open/create any new tags.
+== Consuming Text
+As the text is being parsed, there will (probably) be many cases where
+you have raw text that doesn't close or open any new tags. Whenever the
+scanner reaches this state, it runs the <tt>@text_match</tt> regexp
+against the text to move the pointer ahead. If the current tag has
+<tt>Tag#allows_text?</tt> set to +true+ (through
+<tt>TagFactory#allows_text</tt>), then this text is added as contents of
+the tag. If not, the text is thrown away.
+The safest regular expression consumes only one character at a time:
+  @text_match = /./m
+<b><i>It is vital that your regexp match newlines</i></b> (the 'm')
+<b><i>unless every single one of your tags is set to close upon seeing
+a newline.</i></b>
+Unfortunately, the safest regular expression is also the slowest. If
+speed is an issue, your regexp should strive to eat as many characters as
+possible at once...while ensuring that it doesn't eat characters that
+would signify the start of a new tag.
+For example, setting a regexp like:
+  @text_match = /\w+|./m
+allows the scanner to match a whole word at a time. However, if you have
+a tag factory set to look for "Hvv2vvO" to indicate a subscripted '2',
+the entire string would be eaten as text and the subscript tag would
+never start.
+== Using the Scanner
+As shown in the example above, consumers of your class initialize it by
+passing in the string to be parsed, and then calling #to_xml or #to_html
+on it.
+<i>(This two-step process allows the consumer to run other code after
+the tag parsing, before final conversion. Examples might include
+replacing special command tags with other input, or performing database
+lookups on special wiki-page-link tags and replacing with HTML
+anchors.)</i>
+= Requirements
+TagTreeScanner is built on top of the StringScanner library that is part
+of the standard Ruby installation.
+= License
+(The MIT License)
+Copyright (c) 2005-2007 Gavin Kistner
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+'Software'), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
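
The "Consuming Text" section of the README above warns that a greedy `@text_match` can swallow the characters that would open a tag. The sketch below reproduces that pitfall with a bare `StringScanner`. It is an illustration under assumptions, not the gem's scanning loop: `SUBSCRIPT_OPEN` and `opener_seen?` are hypothetical names, and the opener regexp is only one guess at how a "Hvv2vvO" subscript tag might be matched.

```ruby
require 'strscan'

# Hypothetical opener for a subscript tag, loosely based on the README's
# "Hvv2vvO" example: two v's followed by a digit.
SUBSCRIPT_OPEN = /vv(?=\d)/

# Walk the input, trying the tag opener first at every position and
# falling back to text_match. Returns true if the opener ever matched.
def opener_seen?(input, text_match)
  scanner = StringScanner.new(input)
  until scanner.eos?
    return true if scanner.scan(SUBSCRIPT_OPEN)
    # Stop if the text regexp cannot advance, to avoid looping forever.
    scanner.scan(text_match) or break
  end
  false
end

p opener_seen?("Hvv2vvO", /./m)      #=> true
p opener_seen?("Hvv2vvO", /\w+|./m)  #=> false
```

With `/./m` the scanner advances one character and gets a chance to see the opener at the second position. With `/\w+|./m` the whole word is consumed in one step, so the opener is never tested.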

data/Rakefile ADDED

@@ -0,0 +1,18 @@
+# -*- ruby -*-
+require 'rubygems'
+require 'hoe'
+require './lib/tagtreescanner.rb'
+Hoe.new('tagtreescanner', TagTreeScanner::VERSION) do |p|
+  p.rubyforge_name = 'tagtreescanner'
+  p.author = 'Gavin Kistner'
+  p.email  = 'phrogz@mac.com'
+  p.url         = ''
+  p.summary = 'Meta library for creating classes that turn custom text markup into XML-like tag hierarchies.'
+  p.description = IO.read( 'README' )[ /= Overview\n(.+?)^=/m, 1 ].rstrip
+  p.changes     = IO.read( 'HISTORY' )[ /^=[^\n]+\n+(.+?)^=/m, 1 ].rstrip
+  p.remote_rdoc_dir = ''
+end
+# vim: syntax=Ruby
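
The `p.description` and `p.changes` lines in the Rakefile above pull a section out of a file with `String#[]`, passing a regexp and a capture-group index. The sketch below applies the same two regexps to inline strings. The `readme` and `history` values are abbreviated, made-up stand-ins for the real files.

```ruby
# Abbreviated stand-ins for the README and HISTORY files (hypothetical contents).
readme  = "= Overview\nA framework for parsing text into tags.\n= Example Usage\n  require 'tagtreescanner'\n"
history = "== 0.8.0 / 2007-November-25\n* First release as a gem.\n== 0.6.1 / 2005-July-5\n* Initial public release\n"

# Everything between "= Overview" and the next line that starts with "=".
description = readme[ /= Overview\n(.+?)^=/m, 1 ].rstrip

# Everything between the first heading line and the next heading line.
changes = history[ /^=[^\n]+\n+(.+?)^=/m, 1 ].rstrip

puts description  #=> A framework for parsing text into tags.
puts changes      #=> * First release as a gem.
```

Both patterns need a following line that begins with `=`. If the file has only one heading, `String#[]` returns `nil` and the chained `rstrip` raises `NoMethodError`.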

data/TODO ADDED

@@ -0,0 +1,11 @@
+* Overhaul Tag and TextNode and TagTreeScanner to use a common DOM module
+  like <tt>Phrogz::DOM::OrderedTreeNode</tt>.
+* Allow TagFactories to explicitly specify multiple allowed genres
+  and/or allowed tags, rather than only one genre.
+* Provide a method like inner_html= for parsing and creating tag content.
+  * Useful for batch replacing the contents of a single tag with output from
+    another program, while maintaining the DOM integrity.
+* More unit tests

data/lib/tagtreescanner.rb ADDED

@@ -0,0 +1,851 @@
+# This file covers the TagTreeScanner class, and the extensions to the
+# String class needed by it.
+# Please see the documentation on those classes for more information.
+#
+# Author::     Gavin Kistner  (mailto:phrogz@mac.com)
+# Copyright::  Copyright (c)2005-2007 Gavin Kistner
+# License::    MIT License
+# Version::    0.8.0 (2007-November-24)
+require 'strscan'
+# = Overview
+# The TagTreeScanner class provides a generic framework for creating a
+# nested hierarchy of tags and text (like XML or HTML) by parsing text. An
+# example use (and the reason it was written) is to convert a wiki markup
+# syntax into HTML.
+#
+# = Example Usage
+#   require 'tagtreescanner'
+#
+#   class SimpleMarkup < TagTreeScanner
+#      @root_factory.allows_text = false
+#
+#      @tag_genres[ :root ] = [ ]
+#
+#      @tag_genres[ :root ] << TagFactory.new( :paragraph,
+#         # A line that doesn't have whitespace at the start
+#         :open_match => /(?=\S)/, :open_requires_bol => true,
+#
+#         # Close when you see a double return
+#         :close_match => /\n[ \t]*\n/,
+#         :allows_text => true,
+#         :allowed_genre => :inline
+#      )
+#
+#      @tag_genres[ :root ] << TagFactory.new( :preformatted,
+#         # Grab all lines that are indented up until a line that isn't
+#         :open_match => /((\s+).+?)\n+(?=\S)/m, :open_requires_bol => true,
+#         :setup => lambda{ |tag, scanner, tagtree|
+#            # Throw the contents I found into the tag
+#            # but remove leading whitespace
+#            tag << scanner[1].gsub( /^#{scanner[2]}/, '' )
+#         },
+#         :autoclose => true
+#      )
+#
+#      @tag_genres[ :inline ] = [ ]
+#
+#      @tag_genres[ :inline ] << TagFactory.new( :bold,
+#         # An asterisk followed by a letter or number
+#         :open_match => /\*(?=[a-z0-9])/i,
+#
+#         # Close when I see an asterisk OR a newline coming up
+#         :close_match => /\*|(?=\n)/,
+#         :allows_text => true,
+#         :allowed_genre => :inline
+#      )
+#
+#      @tag_genres[ :inline ] << TagFactory.new( :italic,
+#         # An underscore followed by a letter or number
+#         :open_match => /_(?=[a-z0-9])/i,
+#
+#         # Close when I see an underscore OR a newline coming up
+#         :close_match => /_|(?=\n)/,
+#         :allows_text => true,
+#         :allowed_genre => :inline
+#      )
+#   end
+#
+#   raw_text = <<ENDINPUT
+#   Hello World! You're _soaking in_ my test.
+#   This is a *subset* of markup that I allow.
+#
+#   Hi paragraph two. Yo! A code sample:
+#
+#     def foo
+#       puts "Whee!"
+#     end
+#
+#   _That, as they say, is that._
+#
+#   ENDINPUT
+#
+#   markup = SimpleMarkup.new( raw_text ).to_xml
+#   puts markup
+#
+#
+#   #=> <paragraph>Hello World! You're <italic>soaking in</italic> my test.
+#   #=> This is a <bold>subset</bold> of markup that I allow.</paragraph>
+#   #=> <paragraph>Hi paragraph two. Yo! A code sample:</paragraph>
+#   #=> <preformatted>def foo
+#   #=>   puts "Whee!"
+#   #=> end</preformatted>
+#   #=> <paragraph><italic>That, as they say, is that.</italic></paragraph>
+#
+#
+# = Details
+#
+# == TagFactories at 10,000 feet
+# Each possible output tag is described by a TagFactory, which specifies
+# some or all of the following:
+# * The name of the tags it creates <i>(required)</i>
+# * The regular expression to look for to start the tag
+# * The regular expression to look for to close the tag, or
+# * Whether the tag is automatically closed after creation
+# * What genre of tags are allowed within the tag
+# * Whether the tag supports raw text inside it
+# * Code to run when creating a tag
+#
+# See the TagFactory class for more information on specifying factories.
+#
+# == Genres as a State Machine
+# As a new tag is opened, the scanner uses the Tag#allowed_genre property
+# of that tag (set by the +allowed_genre+ property on the TagFactory) to
+# determine which tags to be looking for. A genre is specified by adding
+# an array in the <tt>@tag_genres</tt> hash, whose key is the genre name.
+# For example:
+#   @tag_genres[ :inline ] = [ ]
+# adds a new genre named 'inline', with no tags in it. TagFactory instances
+# should be pushed onto this array <b>in the order that they should be looked
+# for</b>. For example:
+#   @tag_genres[ :inline ] << TagFactory.new( :italic,
+#     # see the TagFactory#initialize for options
+#   )
+#
+# Note that the +close_match+ regular expression of the current tag is
+# always checked before looking to open/create any new tags.
+#
+# == Consuming Text
+# As the text is being parsed, there will (probably) be many cases where
+# you have raw text that doesn't close or open any new tags. Whenever the
+# scanner reaches this state, it runs the <tt>@text_match</tt> regexp
+# against the text to move the pointer ahead. If the current tag has
+# <tt>Tag#allows_text?</tt> set to +true+ (through
+# <tt>TagFactory#allows_text</tt>), then this text is added as contents of
+# the tag. If not, the text is thrown away.
+#
+# The safest regular expression consumes only one character at a time:
+#   @text_match = /./m
+#
+# <b><i>It is vital that your regexp match newlines</i></b> (the 'm')
+# <b><i>unless every single one of your tags is set to close upon seeing
+# a newline.</i></b>
+#
+# Unfortunately, the safest regular expression is also the slowest. If
+# speed is an issue, your regexp should strive to eat as many characters as
+# possible at once...while ensuring that it doesn't eat characters that
+# would signify the start of a new tag.
+#
+# For example, setting a regexp like:
+#   @text_match = /\w+|./m
+# allows the scanner to match a whole word at a time. However, if you have
+# a tag factory set to look for "Hvv2vvO" to indicate a subscripted '2',
+# the entire string would be eaten as text and the subscript tag would
+# never start.
+#
+# == Using the Scanner
+# As shown in the example above, consumers of your class initialize it by
+# passing in the string to be parsed, and then calling #to_xml or #to_html
+# on it.
+#
+# <i>(This two-step process allows the consumer to run other code after
+# the tag parsing, before final conversion. Examples might include
+# replacing special command tags with other input, or performing database
+# lookups on special wiki-page-link tags and replacing with HTML
+# anchors.)</i>
+class TagTreeScanner
+  VERSION = "0.8.0"
+  # A TagFactory holds the information about a specific kind of tag:
+  # * the name of the tag
+  # * what to look for to open and close the tag
+  # * what genre of tags it may contain
+  # * whether the tag permits raw text
+  # * additional code to run when creating the tag
+  #
+  # See the documentation about the <tt>@tag_genres</tt> hash inside
+  # the TagTreeScanner class for information on how to add factories
+  # for use.
+  #
+  # === Utilizing <tt>:autoclose</tt>
+  # Occasionally you will want to
+  # create a tag and allow no other tags inside it. An example might be
+  # a tag containing preformatted code.
+  #
+  # Rather than opening the tag and slowly spinning through all the
+  # text, the combination of the <tt>:autoclose</tt> and
+  # <tt>:setup</tt> options allow you to create the tag, fill it with
+  # content, and then immediately continue with the parent tag.
+  #
+  # See the #new method for how to use the <tt>:setup</tt>
+  # function, and an example usage.
+  class TagFactory
+    # The type of tag this factory produces.
+    attr_accessor :tag_name
+    # A regexp to match (and consume) that causes a new tag to be started.
+    attr_accessor :open_match
+    # Does the #open_match regexp require beginning of line?
+    attr_accessor :open_requires_bol
+    # The regexp which causes the tag to automatically close.
+    attr_accessor :close_match
+    # Does the #close_match regexp require beginning of line?
+    attr_accessor :close_requires_bol
+    # Should this tag stay open when created, or automatically close?
+    attr_accessor :autoclose
+    # A symbol with the genre of tags that are allowed inside the tag.
+    # <i>(See @tag_genres in the TagTreeScanner documentation.)</i>
+    attr_accessor :allowed_genre
+    # May tags created by this factory have text added to them?
+    attr_accessor :allows_text
+    # __tag_name__:: A symbol with the name of the tag to create
+    # __options__:: A hash including one or more of <tt>:open_match</tt>,
+    # <tt>:open_requires_bol</tt>, <tt>:close_match</tt>,
+    # <tt>:close_requires_bol</tt>, <tt>:autoclose</tt>,
+    # <tt>:allows_text</tt>, <tt>:allowed_genre</tt>, and
+    # <tt>:setup</tt>.
+    #
+    # Due to the way the StringScanner class works, placing a <tt>^</tt>
+    # (beginning of line) marker in your <tt>:open_match</tt> or
+    # <tt>:close_match</tt> regular expressions will not behave as
+    # desired. Instead, set the <tt>:open_requires_bol</tt> and/or
+    # <tt>:close_requires_bol</tt> properties to +true+ if desired.
+    #
+    # A factory should either be set to <tt>:autoclose => true</tt>, or
+    # supply a <tt>:close_match</tt>. (Otherwise, it will never close.)
+    #
+    # Further, a factory should either be set to
+    # <tt>:autoclose => true</tt> or specify an <tt>:allowed_genre</tt>.
+    # <i>(See below for how to efficiently create a tag that cannot
+    # contain other tags.)</i>
+    #
+    # The <tt>:setup</tt> option is used to run code during the tag
+    # creation. The value of this option should be a lambda/Proc that
+    # accepts three parameters:
+    # * the <b>Tag</b> being created
+    # * the <b>StringScanner</b> instance that matched the tag opening
+    # * the <b>TagTreeScanner</b> instance creating the tag.
+    #
+    # === Example:
+    #  # Shove URLs as HTML anchors, without the protocol prefix shown
+    #  @tag_genres[ :inline ] << TagFactory.new( :a,
+    #    :open_match => %r{http://(\S+)},
+    #    :setup => lambda{ |tag, ss, tagtree|
+    #      tag.attributes[ :href ] = ss[0]
+    #      tag << ss[1]
+    #    },
+    #    :autoclose => true
+    #  )
+    def initialize( tag_name, options={} )
+      @tag_name = tag_name
+      [ :open_match, :close_match,
+        :open_requires_bol, :close_requires_bol,
+        :allowed_genre, :autoclose,
+        :allows_text,
+        :setup, :attributes ].each{ |k|
+        self.instance_variable_set( "@#{k}".intern, options[ k ] )
+      }
+    end
+    # Creates and returns a new tag if the supplied _string_scanner_
+    # matches the +open_match+ of this factory.
+    #
+    # Called by TagTreeScanner during initialization.
+    def match( string_scanner, tagtreescanner ) #:nodoc:
+      #puts "Matching #{@open_match.inspect} against #{string_scanner.peek(10)}"
+      return nil unless ( !@open_requires_bol || string_scanner.bol? ) && string_scanner.scan( @open_match )
+      tag = maketag
+      @setup.call( tag, string_scanner, tagtreescanner ) if @setup
+      #puts "...created #{tag}"
+      tag
+    end
+    # Creates a tag from the factory manually
+    def create #:nodoc:
+      tag = maketag
+      @setup.call( tag, nil, nil ) if @setup
+      tag
+    end
+    private
+      # DRY common code
+      def maketag #:nodoc:
+        tag = Tag.new( @tag_name )
+        tag.factory = self
+        tag.attributes = @attributes if @attributes
+        tag
+      end
+  end
+  # Tags are the equivalent of a DOM Element. The majority of tags
+  # are created automatically by a TagFactory, but it may be
+  # necessary to create them directly in order to augment or replace
+  # information in the tag tree.
+  #
+  # A Tag may have one or more attributes, which are pairs of
+  # key/value strings; attributes are output in the HTML or XML
+  # representation of the Tag.
+  #
+  # Each tag also has an <tt>info</tt> hash, which may be used to
+  # keep track of extra bits of information about a tag. <i>Example
+  # usages might be keeping track of the depth of a list item, or the
+  # associated section for a header.</i> Information from the +info+
+  # hash is not output in the HTML or XML representations.
+  class Tag
+    # A symbol with the name of this tag
+    attr_accessor :name
+    # An array of child Tag or TextNode instances
+    attr_accessor :child_tags
+    # A hash of key/value attributes to emit in the XML/HTML
+    # representation
+    attr_accessor :attributes
+    # The TagFactory that created this tag (may be +nil+)
+    attr_accessor :factory
+    # A hash that may be used to store extra information about a Tag
+    attr_accessor :info
+    # The Tag to which this tag is attached (may be +nil+)
+    attr_reader :parent_tag
+    # The Tag or TextNode which immediately follows this tag
+    # (may be +nil+ if this is the last tag of its parent)
+    attr_reader :next_sibling
+    # The Tag or TextNode which immediately precedes this tag
+    # (may be +nil+ if this is the first tag of its parent)
+    attr_reader :previous_sibling
+    # _name_::   A symbol with the name of this tag
+    # _attributes_:: A hash of key/value pairs to store with this tag
+    def initialize( name, attributes={} )
+      @name = name
+      @child_tags = [ ]
+      @attributes = attributes
+      @info = {}
+    end
+    # Allows for setting HTML or XML-like attributes directly without
+    # knowing about the _attributes_ collection. For example:
+    #   tag.href  = 'http://www.google.com'
+    #   tag.class = 'external'
+    # is the same as:
+    #   tag.attributes['href']  = 'http://www.google.com'
+    #   tag.attributes['class'] = 'external'
+    # ...for any attributes (like the above) that don't have the same
+    # name as an existing method or attribute on the Tag class.
+    def method_missing( name, *args )
+      if (name=name.to_s) =~ /=$/
+        @attributes[ name[0...-1] ] = (args.size==1 ? args[0] : args )
+      else
+        @attributes[ name ]
+      end
+    end
+    # Returns the +close_match+ property of the owning TagFactory,
+    # or +nil+ if this tag wasn't created by a factory.
+    def close_match
+      @factory && @factory.close_match
+    end
+    # Returns the +close_requires_bol+ property of the owning TagFactory,
+    # or +nil+ if this tag wasn't created by a factory.
+    def close_requires_bol?
+      @factory && @factory.close_requires_bol
+    end
+    # Returns the +autoclose+ property of the owning TagFactory,
+    # or +nil+ if this tag wasn't created by a factory.
+    def autoclose?
+      @factory && @factory.autoclose
+    end
+    # Returns the +allows_text+ property of the owning TagFactory,
+    # or +true+ if this tag wasn't created by a factory.
+    def allows_text?
+      @factory ? @factory.allows_text : true
+    end
+    # Returns the +allowed_genre+ property of the owning TagFactory,
+    # or +nil+ if this tag wasn't created by a factory.
+    def allowed_genre
+      @factory && @factory.allowed_genre
+    end
+    # _new_child_:: The Tag or TextNode to add as the last child.
+    #
+    # Adds _new_child_ to the end of this tag's +child_tags+ collection.
+    # Returns a reference to _new_child_.
+    #
+    # If _new_child_ is a child of another Tag, it is first removed from
+    # that tag.
+    def append_child( new_child )
+      return new_child if new_child == @child_tags.last
+      insert_after( new_child, @child_tags.last )
+    end
+    # _new_child_:: The Tag or TextNode to add as a child of this tag.
+    # _reference_child_:: The child to place _new_child_ before.
+    #
+    # Adds _new_child_ as a child of this tag, immediately before the
+    # location of _reference_child_. Returns a reference to _new_child_.
+    #
+    # If _reference_child_ is +nil+, the child is added as the last
+    # child of this tag. A RuntimeError is raised if _reference_child_
+    # is not a child of this tag.
+    #
+    # If _new_child_ is a child of another Tag, #remove_child is
+    # automatically invoked to remove it from that tag.
+    def insert_before( new_child, reference_child=nil )
+      return new_child if reference_child ? ( reference_child.previous_sibling == new_child ) : ( new_child == @child_tags.last )
+      insert_after( new_child, reference_child ? reference_child.previous_sibling : @child_tags.last )
+    end
+    # _new_child_:: The Tag or TextNode to add as a child of this tag.
+    # _reference_child_:: The child to place _new_child_ after.
+    #
+    # Adds _new_child_ as a child of this tag, immediately after the
+    # location of _reference_child_. Returns a reference to _new_child_.
+    #
+    # If _reference_child_ is +nil+, the child is added as the first
+    # child of this tag. A RuntimeError is raised if _reference_child_
+    # is not a child of this tag.
+    #
+    # If _new_child_ is a child of another Tag, #remove_child is
+    # automatically invoked to remove it from that tag.
+    def insert_after( new_child, reference_child=nil )
+      #puts "#{self.inspect}#insert_after( #{new_child.inspect}, #{reference_child.inspect} )"
+      return new_child if reference_child ? ( reference_child.next_sibling == new_child ) : ( new_child == @child_tags.first )
+      # Ensure new_child is not an ancestor of self
+      walker = self
+      while walker
+        raise "#{new_child.inspect} cannot be added under #{self.inspect}, because it is an ancestor of it!" if walker==new_child
+        walker = walker.parent_tag
+      end
+      new_child.parent_tag.remove_child( new_child ) if new_child.parent_tag
+      if reference_child
+        new_idx = @child_tags.index( reference_child )
+        raise "#{reference_child.inspect} is not a child of #{self.inspect}" unless new_idx
+        new_idx += 1
+      else
+        new_idx = 0
+      end
+      new_child.parent_tag = self
+      succ = @child_tags[ new_idx ]
+      @child_tags.insert( new_idx, new_child )
+      new_child.previous_sibling = reference_child
+      reference_child.next_sibling = new_child if reference_child
+      new_child.next_sibling = succ
+      succ.previous_sibling = new_child if succ
+      new_child
+    end
+    # _existing_child_:: The Tag or TextNode to remove.
+    #
+    # Removes _existing_child_ from being a child of this tag.
+    # Returns _existing_child_.
+    #
+    # A RuntimeError is raised if _existing_child_ is not a child of
+    # this tag.
+    def remove_child( existing_child )
+      idx = @child_tags.index( existing_child )
+      raise "#{existing_child.inspect} is not a child of #{self.inspect}" unless idx
+      prev, succ = existing_child.previous_sibling, existing_child.next_sibling
+      prev.next_sibling = succ if prev
+      succ.previous_sibling = prev if succ
+      @child_tags.delete_at( idx )
+      existing_child.previous_sibling = existing_child.next_sibling = existing_child.parent_tag = nil
+      existing_child
+    end
+    # _old_child_:: The existing child Tag or TextNode to replace.
+    # _new_child_:: The Tag or TextNode to replace _old_child_.
+    #
+    # Replaces _old_child_ with _new_child_ in this collection.
+    # Returns _old_child_.
+    #
+    # A RuntimeError is raised if _old_child_ is not a child of
+    # this tag.
+    #
+    # If _new_child_ is a child of another Tag, #remove_child is
+    # automatically invoked to remove it from that tag.
+    def replace_child( old_child, new_child )
+      if ( prev = old_child.previous_sibling ) == new_child || old_child.next_sibling == new_child
+        remove_child( old_child )
+      else
+        new_child.parent_tag.remove_child( new_child ) if new_child.parent_tag
+        remove_child( old_child )
+        insert_after( new_child, prev )
+      end
+      old_child
+    end
+    # _new_child_:: The Tag or TextNode to replace this tag.
+    #
+    # Replaces this tag with _new_child_. Returns _new_child_.
+    #
+    # A RuntimeError is raised if this tag is not a child of another tag.
+    #
+    # If _new_child_ is a child of another Tag, #remove_child is
+    # automatically invoked to remove it from that tag.
+    def replace_with( new_child )
+      return new_child if new_child == self
+      raise "#{self.inspect} is not a child of another tag" unless @parent_tag
+      @parent_tag.replace_child( self, new_child )
+      new_child
+    end
+    # _additional_text_:: The text to add to this node.
+    #
+    # Appends _additional_text_ to this tag. If the last item in the
+    # +child_tags+ collection is a TextNode, the text is added to that
+    # item; otherwise, a new TextNode is created with _additional_text_
+    # and added as the last child of this tag.
+    def << ( additional_text )
+      last_child = @child_tags.last
+      if last_child.is_a? TextNode
+        last_child << additional_text
+      else
+        append_child( TextNode.new( additional_text ) )
+      end
+    end
+    # Set the text content of this element to _new_contents_
+    # Removes any child tags (and their text)
+    def text=( new_contents )
+      @child_tags.clear
+      append_child( TextNode.new( new_contents ) )
+    end
+    alias_method :inner_text=, :text=
+    # Returns an HTML representation of this tag and all its descendants.
+    #
+    # This method is the same as #to_xml except that tags without
+    # any +child_tags+ use an explicit close tag, e.g.
+    # <tt><div></div></tt> instead of XML's <tt><div /></tt>
+    def to_html
+      to_xml( false )
+    end
+    # Returns an XML representation of this tag and all its descendants.
+    #
+    # If _empty_tags_collapsed_ is +true+ (the default) then this method
+    # is the same as #to_html except that tags without any +child_tags+
+    # use a single closed tag, e.g.
+    # <tt><div /></tt> instead of HTML's <tt><div></div></tt>
+    #
+    # If _empty_tags_collapsed_ is +false+, this is the same as #to_html.
+    def to_xml( empty_tags_collapsed=true )
+      out = "<#{@name}"
+      @attributes.each{ |k,v| out << " #{k}=\"#{v.to_s.gsub( '"', '&quot;' )}\"" }
+      if empty_tags_collapsed && @child_tags.empty?
+        out << ' />'
+      else
+        out << '>'
+        unless @child_tags.empty?
+          out << "\n" unless self.allows_text?
+          @child_tags.each{ |tag|
+            out << tag.to_xml( empty_tags_collapsed )
+          }
+        end
+        out << "</#{@name}>"
+      end
+      out << "\n" if @parent_tag && !@parent_tag.allows_text?
+      out
+    end
+    # Returns an array of all descendants of this tag whose #name
+    # matches the supplied _name_.
+    def tags_by_name( name )
+      out = []
+      @child_tags.each{ |tag|
+        out << tag if tag.name == name
+        unless tag.child_tags.empty?
+          out.concat( tag.tags_by_name( name ) )
+        end
+      }
+      out
+    end
+    # Returns the text contents of this tag and its descendants.
+    def inner_text
+      @child_tags.inject(''){ |out,tag|
+        out << ( tag.is_a?( TextNode ) ? tag.text : tag.inner_text )
+      }
+    end
+    def inspect #:nodoc:
+      out = "<#{@name}"
+      #out << " @pops=#{@parent_tag ? @parent_tag.name.inspect : 'nil'}"
+      #out << " @prev=#{@previous_sibling ? @previous_sibling.name.inspect : 'nil'}"
+      #out << " @next=#{@next_sibling ? @next_sibling.name.inspect : 'nil'}"
+      @attributes.each{ |k,v| out << " #{k}=\"#{v}\"" }
+      @info.each{ |k,v| out << " @#{k}=>#{v.inspect}" }
+      children = @child_tags.length
+      if children == 1 && TextNode === @child_tags.first
+        out << ">#{@child_tags.first}</#{@name}>"
+      elsif children == 0
+        out << '>'
+      else
+        out << " (#{@child_tags.length} child#{@child_tags.length != 1 ? 'ren' : ''})>"
+      end
+    end
+    # _level_:: The indentation level (tabs) to start at.
+    #
+    # Returns a full-hierarchical representation of this tag and its
+    # descendants. (Used for debugging.)
+    def to_hier( level=0 ) #:nodoc:
+      tabs = "\t" * level
+      out = "#{tabs}<#{@name}"
+      @attributes.each{ |k,v| out << " #{k}=\"#{v}\"" }
+      @info.each{ |k,v| out << " @#{k}=>#{v.inspect}" }
+      if @child_tags.empty?
+        out << " />\n"
+      elsif @child_tags.length == 1 && TextNode === @child_tags.first
+        out << ">#{@child_tags.first}</#{@name}>\n"
+      else
+        out << ">\n"
+        @child_tags.each{ |n| out << n.to_hier(level+1) }
+        out << "#{tabs}</#{@name}>\n"
+      end
+      out
+    end
+    # Returns a copy of this tag and its entire hierarchy.
+    # All descendant tags/text nodes are also cloned.
+    #
+    # The +info+ hash is not preserved.
+    def dup
+      tag = self.class.new( self.name, self.attributes.dup )
+      @child_tags.each{ |tag2| tag.append_child( tag2.dup ) }
+      tag
+    end
+    # :stopdoc:
+    protected
+      attr_writer :previous_sibling, :next_sibling, :parent_tag
+    # :startdoc:
+  end
+  # A TextNode holds raw text inside a Tag. Generally, TextNodes are
+  # created automatically by the Tag#<< method.
+  class TextNode
+    # The Tag or TextNode that comes after this one (may be +nil+)
+    attr_accessor :next_sibling
+    # The Tag or TextNode that comes before this one (may be +nil+)
+    attr_accessor :previous_sibling
+    # The Tag that is a parent of this TextNode (may be +nil+)
+    attr_accessor :parent_tag
+    # A hash which may be used to store 'extra' information
+    attr_accessor :info
+    # The string contents of this text node
+    attr_accessor :text
+    # _text_:: The text to start out with
+    def initialize( text='' )
+      @text = text
+      @info = {}
+    end
+    # _additional_text_:: The text to add
+    #
+    # Appends the provided text to the end of the current text
+    #
+    # Returns the new text value
+    def << ( additional_text )
+      @text << additional_text
+    end
+    # Returns a copy of this text node
+    def dup
+      self.class.new( @text.dup )
+    end
+    def to_hier( level=0 ) #:nodoc:
+      "#{"\t"*level}#{@text.inspect}\n"
+    end
+    def to_s #:nodoc:
+      @text
+    end
+    # Returns the contents of this node, modified to be made XML-safe
+    # by calling String#xmlsafe.
+    def to_xml( *args )
+      @text.xmlsafe
+    end
+  end
+  # RDoc thinks that this stuff applies to instances, not the class
+  # :stopdoc:
+  class << self
+    attr_accessor :tag_genres, :root_factory, :text_match
+  end
+  # :startdoc:
+  # The tag_genres hash maps a genre name onto an array of TagFactories.
+  #
+  # Factories are tested in the order they appear in the genre array;
+  # more important matches are at the top, generic fallback ones
+  # should appear at the end of the list.
+  #
+  # If no factory matches the current input, then text is shoved into the
+  # most recent tag until a new tag start is found, or the closing match
+  # is met. (If the current tag's factory does not have :allows_text set
+  # to true, then the text is simply thrown away until the closing or
+  # new tag start is found.)
+  @tag_genres = { }
+  # Settings for the root of your document: what genre is allowed at the
+  # highest level, and should raw text be allowed there?
+  #
+  # Override in your class by setting a class-instance variable as below.
+  @root_factory = TagFactory.new( :root,
+    :allowed_genre => :root,
+    :allows_text => true )
+  # The pattern to consume and shove as text whenever no tag start/close
+  # is found. Eating one character at a time is safest, but slow.
+  # Ensure that this pattern never lets you skip over the start of a tag,
+  # or else you'll miss it.
+  @text_match = /./m
+  # Scans through _string_to_parse_ and builds a tree of tags based
+  # on the regular expressions and rules set by the TagFactory
+  # instances present in <tt>@tag_genres</tt>.
+  #
+  # After parsing the tree, call #to_xml or #to_html to retrieve
+  # a string representation.
+  def initialize( string_to_parse )
+    current = @root = self.class.root_factory.create
+    tag_genres = self.class.tag_genres
+    text_match = self.class.text_match
+    ss = StringScanner.new( string_to_parse )
+    while !ss.eos?
+      # Keep popping off the current tag until we get to the root,
+      # as long as its close criteria are met
+      while ( current != @root ) && (!current.close_requires_bol? || ss.bol?) && ss.scan( current.close_match )
+        current = current.parent_tag || @root
+      end
+      # No point in continuing if closing out tags consumed the rest of the string
+      break if ss.eos?
+      # Look for a tag to open
+      if factories = tag_genres[ current.allowed_genre ]
+        tag = nil
+        factories.each{ |factory|
+          if tag = factory.match( ss, self )
+            current.append_child( tag )
+            current = tag unless tag.autoclose?
+            break
+          end
+        }
+        #start at the top of the loop if we found one
+        next if tag
+      end
+      # Couldn't find a valid tag at this spot
+      # so we need to eat some characters
+      consumed = ss.scan( text_match )
+      current << consumed if current.allows_text?
+    end
+  end
+  # Returns an HTML representation of the tag tree.
+  #
+  # This is the same as the #to_xml method except that empty tags use an
+  # explicit close tag, e.g. <tt><div></div></tt> versus <tt><div /></tt>
+  def to_html
+    @root.child_tags.inject(''){ |out, tag| out << tag.to_html }
+  end
+  # Returns an XML representation of the tag tree.
+  #
+  # This method is the same as the #to_html method except that empty tags
+  # do not use an explicit close tag,
+  # e.g. <tt><div /></tt> versus <tt><div></div></tt>
+  def to_xml
+    @root.child_tags.inject(''){ |out, tag| out << tag.to_xml }
+  end
+  # Returns an array of all root-level tags found
+  def tags
+    @root.child_tags
+  end
+  # Returns an array of all tags in the tree whose Tag#name matches
+  # the supplied _name_.
+  def tags_by_name( name )
+    @root.tags_by_name( name )
+  end
+  # Returns a hierarchical representation of the entire tag tree
+  def inspect #:nodoc:
+    @root.to_hier
+  end
+  # When a class inherits from TagTreeScanner, defaults are set for
+  # <tt>@tag_genres</tt>, <tt>@root_factory</tt> and
+  # <tt>@text_match</tt>
+  def self.inherited( child_class ) #:nodoc:
+    child_class.tag_genres = @tag_genres
+    child_class.root_factory = @root_factory
+    child_class.text_match = @text_match
+  end
+end
+# Extensions to the built-in String class
+class String
+  # Returns a copy of the string with all <tt>&</tt>, <tt><</tt> and
+  # <tt>></tt> characters replaced by their equivalent XML entities
+  # (<tt>&amp;</tt>, <tt>&lt;</tt>, and <tt>&gt;</tt>)
+  def xmlsafe
+    self.dup.xmlsafe!
+  end
+  # Modifies the string, replacing all <tt>&</tt>, <tt><</tt> and
+  # <tt>></tt> characters with their equivalent XML entities
+  # (<tt>&amp;</tt>, <tt>&lt;</tt>, and <tt>&gt;</tt>)
+  def xmlsafe!
+    self.gsub!( /&/, '&amp;' )
+    self.gsub!( /</, '&lt;' )
+    self.gsub!( />/, '&gt;' )
+    self
+  end
+end
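
The escaping done by String#xmlsafe and Tag#to_xml above can be illustrated on its own. What follows is a minimal sketch, not the gem's code: `escape_text` and `escape_attr` are hypothetical names, and the sketch assumes attribute values are always wrapped in double quotes. Unlike Tag#to_xml, it also escapes `&`, `<` and `>` inside attribute values.

```ruby
# Hypothetical helpers; they are not part of tagtreescanner.

# Escape text content. The ampersand must be replaced first, otherwise
# the '&' introduced by '&lt;' and '&gt;' would be escaped a second time.
def escape_text( str )
  str.gsub( '&', '&amp;' ).gsub( '<', '&lt;' ).gsub( '>', '&gt;' )
end

# Escape an attribute value that will be wrapped in double quotes.
def escape_attr( value )
  escape_text( value.to_s ).gsub( '"', '&quot;' )
end

puts escape_text( 'a < b && c > d' )
puts %Q{<a title="#{escape_attr( 'say "hi" & go' )}" />}
```

Both helpers return new strings and leave their argument untouched, which matches the non-destructive String#xmlsafe rather than String#xmlsafe!.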

data/test/test_simplemarkup.rb ADDED

@@ -0,0 +1,84 @@
+require "test/unit"
+require "../lib/tagtreescanner.rb"
+class SimpleMarkup < TagTreeScanner
+	@root_factory.allows_text = false
+	@tag_genres[ :root ] = [ ]
+	@tag_genres[ :root ] << TagFactory.new( :paragraph,
+		# A line that doesn't have whitespace at the start
+		:open_match => /(?=\S)/, :open_requires_bol => true,
+		# Close when you see a double return
+		:close_match => /\n[ \t]*\n/,
+		:allows_text => true,
+		:allowed_genre => :inline
+	)
+	@tag_genres[ :root ] << TagFactory.new( :preformatted,
+		# Grab all lines that are indented up until a line that isn't
+		:open_match => /((\s+).+?)\n+(?=\S)/m, :open_requires_bol => true,
+		:setup => lambda{ |tag, scanner, tagtree|
+			# Throw the contents I found into the tag
+			# but remove leading whitespace
+			tag << scanner[1].gsub( /^#{scanner[2]}/, '' )
+		},
+		:autoclose => true
+	)
+	@tag_genres[ :inline ] = [ ]
+	@tag_genres[ :inline ] << TagFactory.new( :bold,
+		# An asterisk followed by a letter or number
+		:open_match => /\*(?=[a-z0-9])/i,
+		# Close when I see an asterisk OR a newline coming up
+		:close_match => /\*|(?=\n)/,
+		:allows_text => true,
+		:allowed_genre => :inline
+	)
+	@tag_genres[ :inline ] << TagFactory.new( :italic,
+		# An underscore followed by a letter or number
+		:open_match => /_(?=[a-z0-9])/i,
+		# Close when I see an underscore OR a newline coming up
+		:close_match => /_|(?=\n)/,
+		:allows_text => true,
+		:allowed_genre => :inline
+	)
+end
+class Tag_Test < Test::Unit::TestCase
+  def setup
+  end
+  def test_conversion
+    raw_text = <<-ENDINPUT
+    Hello World! You're _soaking in_ my test.
+    This is a *subset* of markup that I allow.
+    Hi paragraph two. Yo! A code sample:
+      def foo
+        puts "Whee!"
+      end
+    _That, as they say, is that._
+    ENDINPUT
+    markup = SimpleMarkup.new( raw_text ).to_xml
+    p '',markup
+  end
+end
+#=> <paragraph>Hello World! You're <italic>soaking in</italic> my test.
+#=> This is a <bold>subset</bold> of markup that I allow.</paragraph>
+#=> <paragraph>Hi paragraph two. Yo! A code sample:</paragraph>
+#=> <preformatted>def foo
+#=>   puts "Whee!"
+#=> end</preformatted>
+#=> <paragraph><italic>That, as they say, is that.</italic></paragraph>
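
The order in which the scanner handles input (close the current tag, then try to open a new one, then fall back to eating text) can be sketched without the gem. This is a simplified illustration under stated assumptions, not the library's implementation: `RULES` and `mini_markup` are hypothetical names, there are no genres, no beginning-of-line checks and no autoclose, and the output is a string rather than a tag tree. Only the two inline patterns are borrowed from SimpleMarkup above.

```ruby
require 'strscan'

# Hypothetical, simplified rules: name => [ open pattern, close pattern ].
RULES = {
  :bold   => [ /\*(?=[a-z0-9])/i, /\*|(?=\n)/ ],
  :italic => [ /_(?=[a-z0-9])/i,  /_|(?=\n)/  ]
}

def mini_markup( input )
  ss    = StringScanner.new( input )
  out   = ''.dup
  stack = []   # names of the tags that are currently open
  until ss.eos?
    # 1. Close the innermost open tag when its close pattern matches here.
    #    A zero-width match returns "", which is still truthy in Ruby.
    if !stack.empty? && ss.scan( RULES[ stack.last ][ 1 ] )
      out << "</#{stack.pop}>"
      next
    end
    # 2. Otherwise try each rule's open pattern, in definition order.
    opened = RULES.find{ |name, (open, _close)| ss.scan( open ) }
    if opened
      stack.push( opened.first )
      out << "<#{opened.first}>"
      next
    end
    # 3. Nothing matched: consume a single character as text.
    out << ss.getch
  end
  # Close anything still open at the end of the input.
  out << "</#{stack.pop}>" until stack.empty?
  out
end

puts mini_markup( "This is a *subset* of _markup_." )
```

Consuming one character at a time in step 3 mirrors the default `@text_match` of `/./m`: it is slow, but it can never step past the start of a tag.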

data/test/test_tagtreescanner.rb ADDED

@@ -0,0 +1,104 @@
+require "test/unit"
+require "../lib/tagtreescanner"
+class Tag_Test < Test::Unit::TestCase
+  def setup
+  end
+  def test1_tags
+    root = TagTreeScanner::Tag.new( :root, { :is_root => true } )
+    assert_equal( :root, root.name )
+    assert_equal( true, root.attributes[ :is_root ] )
+    assert_nil( root.allowed_genre )
+    assert( root.allows_text? )
+    t1 = TagTreeScanner::Tag.new( :t1 )
+    root.append_child( t1 )
+    assert_equal( 1, root.child_tags.length )
+    assert_equal( t1, root.child_tags.first )
+    t2 = TagTreeScanner::Tag.new( :t2 )
+    root.append_child( t2 )
+    assert_equal( 2, root.child_tags.length )
+    assert_equal( t2, root.child_tags.last )
+    t3 = TagTreeScanner::Tag.new( :t3 )
+    root.insert_before( t3, t2 )
+    assert_equal( 3, root.child_tags.length )
+    assert_equal( [t1,t3,t2], root.child_tags )
+    root.append_child( t1 )
+    assert_equal( [t3,t2,t1], root.child_tags )
+    t1.replace_with( t3 )
+    assert_equal( [t2,t3], root.child_tags )
+    assert_nil( t1.parent_tag )
+    root.insert_before( t1, t2 )
+    assert_equal( [t1,t2,t3], root.child_tags )
+    assert_equal( root, t1.parent_tag )
+    root.append_child( t1 )
+    assert_equal( [t2,t3,t1], root.child_tags )
+    assert_equal( root, t1.parent_tag )
+    assert_nil( t1.next_sibling )
+    assert_nil( t2.previous_sibling )
+    t1.append_child( t3 )
+    assert_equal( [t2,t1], root.child_tags )
+    assert_nil( t3.next_sibling )
+    assert_nil( t3.previous_sibling )
+    assert_equal( t1, t2.next_sibling )
+    assert_equal( t2, t1.previous_sibling )
+    assert_equal( t3, t1.child_tags.first )
+    assert_raise( RuntimeError ){
+      t3.append_child( t1 )
+    }
+    assert_raise( RuntimeError ){
+      t1.append_child( t1 )
+    }
+  end
+  def test2_tags2
+    root = TagTreeScanner::Tag.new( :root )
+    # make a ton of tags...
+    1.upto(100){ |i|
+      root.append_child( TagTreeScanner::Tag.new( "t#{i}".intern ) )
+    }
+    # ...shuffle the hell out of them...
+    500.times{
+      next unless n1 = root.child_tags[ rand( root.child_tags.length ) ]
+      n2 = root.child_tags[ rand( root.child_tags.length ) ]
+      next if n1 == n2
+      case rand(30)
+        when 0
+          root.remove_child( n1 )
+        when 1
+          root.append_child( n1 )
+        when 2
+          root.insert_before( n1, nil )
+        when 3
+          root.insert_after( n1, nil )
+        when 4
+          root.insert_before( n1, n2 )
+        when 5
+          n1.replace_with( n2 )
+        else
+          root.insert_after( n1, n2 )
+      end
+    }
+    # ...and now ensure that they're all properly linked
+    last_tag = nil
+    root.child_tags.each{ |tag|
+      assert_equal( last_tag, tag.previous_sibling )
+      assert_equal( tag, last_tag.next_sibling ) if last_tag
+      assert_equal( root, tag.parent_tag )
+      last_tag = tag
+    }
+    assert_nil( last_tag.next_sibling ) if last_tag
+  end
+end
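
The invariant that test2_tags2 checks (every child's previous_sibling, next_sibling and parent_tag agree with its position in child_tags) can be demonstrated with a small stand-in class. This is a sketch under assumptions: `Node`, `children`, `parent` and `relink` are hypothetical names, and recomputing every link after each change is a deliberately simple strategy that may differ from how TagTreeScanner::Tag maintains its links.

```ruby
# Hypothetical stand-in for TagTreeScanner::Tag, reduced to the link fields.
class Node
  attr_accessor :name, :parent, :previous_sibling, :next_sibling
  attr_reader :children

  def initialize( name )
    @name     = name
    @children = []
  end

  # Move (or add) +child+ to the end of this node's children.
  def append_child( child )
    child.parent.remove_child( child ) if child.parent
    @children << child
    child.parent = self
    relink
    child
  end

  # Detach +child+ and clear its links.
  def remove_child( child )
    @children.delete( child )
    child.parent = child.previous_sibling = child.next_sibling = nil
    relink
    child
  end

  private

  # Recompute every sibling pointer from the array order.
  def relink
    @children.each_with_index{ |node, i|
      node.previous_sibling = i.zero? ? nil : @children[ i - 1 ]
      node.next_sibling     = @children[ i + 1 ]
    }
  end
end

root = Node.new( :root )
a, b, c = [ :a, :b, :c ].map{ |n| root.append_child( Node.new( n ) ) }
root.append_child( a )   # moves a to the end: b, c, a
p root.children.map{ |n| n.name }
p a.previous_sibling.name
```

Relinking the whole list costs O(n) per operation, which is acceptable for a sketch; its advantage is that the array order is the single source of truth, so the links cannot drift out of sync.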

metadata ADDED

@@ -0,0 +1,63 @@
+--- !ruby/object:Gem::Specification
+rubygems_version: 0.9.4
+specification_version: 1
+name: tagtreescanner
+version: !ruby/object:Gem::Version
+  version: 0.8.0
+date: 2007-11-25 00:00:00 -07:00
+summary: Meta library for creating classes that turn custom text markup into XML-like tag hierarchies.
+require_paths:
+- lib
+email: phrogz@mac.com
+homepage:
+rubyforge_project: tagtreescanner
+description: The TagTreeScanner class provides a generic framework for creating a nested hierarchy of tags and text (like XML or HTML) by parsing text. An example use (and the reason it was written) is to convert a wiki markup syntax into HTML.
+autorequire:
+default_executable:
+bindir: bin
+has_rdoc: true
+required_ruby_version: !ruby/object:Gem::Version::Requirement
+  requirements:
+  - - ">"
+    - !ruby/object:Gem::Version
+      version: 0.0.0
+  version:
+platform: ruby
+signing_key:
+cert_chain:
+post_install_message:
+authors:
+- Gavin Kistner
+files:
+- HISTORY
+- Manifest.txt
+- README
+- Rakefile
+- TODO
+- lib/tagtreescanner.rb
+- test/test_simplemarkup.rb
+- test/test_tagtreescanner.rb
+test_files:
+- test/test_simplemarkup.rb
+- test/test_tagtreescanner.rb
+rdoc_options:
+- --main
+- README.txt
+extra_rdoc_files:
+- Manifest.txt
+executables: []
+extensions: []
+requirements: []
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: hoe
+  version_requirement:
+  version_requirements: !ruby/object:Gem::Version::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: 1.3.0
+    version: