tagtreescanner 0.8.0
- data/HISTORY +17 -0
- data/Manifest.txt +8 -0
- data/README +191 -0
- data/Rakefile +18 -0
- data/TODO +11 -0
- data/lib/tagtreescanner.rb +851 -0
- data/test/test_simplemarkup.rb +84 -0
- data/test/test_tagtreescanner.rb +104 -0
- metadata +63 -0
data/HISTORY
ADDED
@@ -0,0 +1,17 @@
+== 0.8.0 / 2007-November-25
+
+* First release as a gem. Breaks backwards compatibility with older versions.
+
+* Changed TagTreeScanner::Tag#tag_name to TagTreeScanner::Tag#name
+  * ...because it was dumb to write "tag.tag_name = 'span'"
+
+* Added a method_missing hack to TagTreeScanner::Tag that delegates
+  to read/write from its attributes hash.
+  * ...because I wanted people to be able to write "tag.href = 'foo'"
+
+* New TagTreeScanner::Tag#text= method to directly set the contents of
+  a tag, clearing out any other junk.
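The attribute delegation mentioned in the 0.8.0 notes can be illustrated with a small standalone class. This is only a sketch: `AttrBag` is a hypothetical stand-in, not `TagTreeScanner::Tag`, and it mirrors just the idea of routing unknown readers and writers to an attributes hash.

```ruby
# Hypothetical stand-in for a tag; NOT the gem's TagTreeScanner::Tag.
class AttrBag
  attr_accessor :name, :attributes

  def initialize(name, attributes = {})
    @name       = name
    @attributes = attributes
  end

  # Unknown writers (foo=) store into the hash; unknown readers fetch from it.
  def method_missing(meth, *args)
    key = meth.to_s
    if key.end_with?('=')
      @attributes[key.chomp('=')] = (args.size == 1 ? args[0] : args)
    else
      @attributes[key]
    end
  end
end

tag = AttrBag.new(:a)
tag.href = 'foo'   # no href= method exists, so the value lands in the hash
tag.name = :span   # a real accessor wins; the hash is left untouched
```

Real methods take precedence over the fallback, so an attribute that shares its name with an existing method still has to be read through the hash itself.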
+
+== 0.6.1 / 2005-July-5
+
+* Initial public release
data/Manifest.txt
ADDED
data/README
ADDED
@@ -0,0 +1,191 @@
+<b>TagTreeScanner</b>
+
+Author:: Gavin Kistner (mailto:phrogz@mac.com)
+Copyright:: Copyright (c)2005-2007 Gavin Kistner
+License:: MIT License
+Version:: 0.8.0 (2007-November-24)
+
+= Overview
+
+The TagTreeScanner class provides a generic framework for creating a
+nested hierarchy of tags and text (like XML or HTML) by parsing text. An
+example use (and the reason it was written) is to convert a wiki markup
+syntax into HTML.
+
+= Example Usage
+  require 'tagtreescanner'
+
+  class SimpleMarkup < TagTreeScanner
+    @root_factory.allows_text = false
+
+    @tag_genres[ :root ] = [ ]
+
+    @tag_genres[ :root ] << TagFactory.new( :paragraph,
+      # A line that doesn't have whitespace at the start
+      :open_match => /(?=\S)/, :open_requires_bol => true,
+
+      # Close when you see a double return
+      :close_match => /\n[ \t]*\n/,
+      :allows_text => true,
+      :allowed_genre => :inline
+    )
+
+    @tag_genres[ :root ] << TagFactory.new( :preformatted,
+      # Grab all lines that are indented up until a line that isn't
+      :open_match => /((\s+).+?)\n+(?=\S)/m, :open_requires_bol => true,
+      :setup => lambda{ |tag, scanner, tagtree|
+        # Throw the contents I found into the tag
+        # but remove leading whitespace
+        tag << scanner[1].gsub( /^#{scanner[2]}/, '' )
+      },
+      :autoclose => true
+    )
+
+    @tag_genres[ :inline ] = [ ]
+
+    @tag_genres[ :inline ] << TagFactory.new( :bold,
+      # An asterisk followed by a letter or number
+      :open_match => /\*(?=[a-z0-9])/i,
+
+      # Close when I see an asterisk OR a newline coming up
+      :close_match => /\*|(?=\n)/,
+      :allows_text => true,
+      :allowed_genre => :inline
+    )
+
+    @tag_genres[ :inline ] << TagFactory.new( :italic,
+      # An underscore followed by a letter or number
+      :open_match => /_(?=[a-z0-9])/i,
+
+      # Close when I see an underscore OR a newline coming up
+      :close_match => /_|(?=\n)/,
+      :allows_text => true,
+      :allowed_genre => :inline
+    )
+  end
+
+  raw_text = <<ENDINPUT
+  Hello World! You're _soaking in_ my test.
+  This is a *subset* of markup that I allow.
+
+  Hi paragraph two. Yo! A code sample:
+
+    def foo
+      puts "Whee!"
+    end
+
+  _That, as they say, is that._
+
+  ENDINPUT
+
+  markup = SimpleMarkup.new( raw_text ).to_xml
+  puts markup
+
+
+  #=> <paragraph>Hello World! You're <italic>soaking in</italic> my test.
+  #=> This is a <bold>subset</bold> of markup that I allow.</paragraph>
+  #=> <paragraph>Hi paragraph two. Yo! A code sample:</paragraph>
+  #=> <preformatted>def foo
+  #=>   puts "Whee!"
+  #=> end</preformatted>
+  #=> <paragraph><italic>That, as they say, is that.</italic></paragraph>
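To see how the `:preformatted` factory gets by with only an `:open_match` and a `:setup` block, the same regular expression can be run through StringScanner by itself. This is a standalone sketch that uses only the standard `strscan` library; the sample text and the variable names are made up for illustration.

```ruby
require 'strscan'

sample = "  def foo\n    puts \"Whee!\"\n  end\n\nNext paragraph\n"
ss = StringScanner.new(sample)

# Group 1 is the whole indented block; group 2 is the first line's indentation.
# The trailing newlines are consumed, and the lookahead stops the match right
# before the first line that starts with a non-whitespace character.
ss.scan(/((\s+).+?)\n+(?=\S)/m)

# Strip that indentation from every line, as the :setup lambda does.
code = ss[1].gsub(/^#{ss[2]}/, '')
rest = ss.rest
```

Because the opening match already swallowed the whole block, there is nothing left for a close match to find, which is why the factory pairs this with `:autoclose`.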
+
+= Details
+
+== TagFactories at 10,000 feet
+Each possible output tag is described by a TagFactory, which specifies
+some or all of the following:
+* The name of the tags it creates <i>(required)</i>
+* The regular expression to look for to start the tag
+* The regular expression to look for to close the tag, or
+* Whether the tag is automatically closed after creation
+* What genre of tags are allowed within the tag
+* Whether the tag supports raw text inside it
+* Code to run when creating a tag
+
+See the TagFactory class for more information on specifying factories.
+
+== Genres as a State Machine
+As a new tag is opened, the scanner uses the Tag#allowed_genre property
+of that tag (set by the +allowed_genre+ property on the TagFactory) to
+determine which tags to look for. A genre is specified by adding
+an array in the <tt>@tag_genres</tt> hash, whose key is the genre name.
+For example:
+  @tag_genres[ :inline ] = [ ]
+adds a new genre named 'inline', with no tags in it. TagFactory instances
+should be pushed onto this array <b>in the order that they should be looked
+for</b>. For example:
+  @tag_genres[ :inline ] << TagFactory.new( :italic,
+    # see the TagFactory#initialize for options
+  )
+
+Note that the +close_match+ regular expression of the current tag is
+always checked before looking to open/create any new tags.
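The ordering rules above (close first, then the allowed genre's factories in array order, then raw text) can be sketched as a tiny loop over StringScanner. This is not the gem's implementation: `Factory`, `GENRES`, and `scan_events` are hypothetical names, and beginning-of-line checks, `allows_text`, and `:setup` are left out to keep the sketch short.

```ruby
require 'strscan'

# Hypothetical, simplified stand-in for a tag factory (not the gem's class).
Factory = Struct.new(:name, :open_match, :close_match, :allowed_genre)

GENRES = {
  :root   => [ Factory.new(:paragraph, /(?=\S)/, /\n[ \t]*\n/, :inline) ],
  :inline => [
    Factory.new(:bold,   /\*(?=[a-z0-9])/i, /\*|(?=\n)/, :inline),
    Factory.new(:italic, /_(?=[a-z0-9])/i,  /_|(?=\n)/,  :inline)
  ]
}

def scan_events(text)
  ss     = StringScanner.new(text)
  stack  = []   # factories of the currently open tags, innermost last
  events = []
  until ss.eos?
    current = stack.last
    # 1. The current tag's close_match is tried before anything else.
    if current && ss.scan(current.close_match)
      events << [:close, stack.pop.name]
      next
    end
    # 2. Next, the factories of the allowed genre, in array order.
    genre  = current ? current.allowed_genre : :root
    opened = GENRES[genre].find { |f| ss.scan(f.open_match) }
    if opened
      stack << opened
      events << [:open, opened.name]
      next
    end
    # 3. Otherwise one character of raw text is consumed.
    ch = ss.getch
    if events.last && events.last[0] == :text
      events.last[1] += ch
    else
      events << [:text, ch]
    end
  end
  # Close whatever is still open at the end of the input.
  events << [:close, stack.pop.name] until stack.empty?
  events
end

events = scan_events("a *b* c")
```

The genre of the innermost open tag decides which factories are consulted at each position, which is what makes the genre table behave like a state machine.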
+
+== Consuming Text
+As the text is being parsed, there will (probably) be many cases where
+you have raw text that doesn't close or open any new tags. Whenever the
+scanner reaches this state, it runs the <tt>@text_match</tt> regexp
+against the text to move the pointer ahead. If the current tag has
+<tt>Tag#allows_text?</tt> set to +true+ (through
+<tt>TagFactory#allows_text</tt>), then this text is added as contents of
+the tag. If not, the text is thrown away.
+
+The safest regular expression consumes only one character at a time:
+  @text_match = /./m
+
+<b><i>It is vital that your regexp match newlines</i></b> (the 'm')
+<b><i>unless every single one of your tags is set to close upon seeing
+a newline.</i></b>
+
+Unfortunately, the safest regular expression is also the slowest. If
+speed is an issue, your regexp should strive to eat as many characters as
+possible at once...while ensuring that it doesn't eat characters that
+would signify the start of a new tag.
+
+For example, setting a regexp like:
+  @text_match = /\w+|./m
+allows the scanner to match a whole word at a time. However, if you have
+a tag factory set to look for "Hvv2vvO" to indicate a subscripted '2',
+the entire string would be eaten as text and the subscript tag would
+never start.
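The trade-off can be reproduced with StringScanner alone. In this sketch `chunks` is a hypothetical helper and the bare `vv` opener is a made-up stand-in for a tag's `open_match`. Raising when the text regexp cannot advance is a choice made for this sketch, not a statement about what the gem does.

```ruby
require 'strscan'

# open_match is a made-up "vv" marker standing in for a tag opener.
def chunks(text, text_match, open_match = /vv/)
  ss  = StringScanner.new(text)
  out = []
  until ss.eos?
    if ss.scan(open_match)
      out << [:open, ss.matched]
    elsif (piece = ss.scan(text_match))
      out << [:text, piece]
    else
      # Without this guard a text_match that cannot advance would loop forever.
      raise "text_match failed at offset #{ss.pos}"
    end
  end
  out
end

one_char = chunks("Hvv2vvO", /./m)      # sees both "vv" openers
by_word  = chunks("Hvv2vvO", /\w+|./m)  # swallows the whole word as text
```

The last branch also shows why the `m` flag matters: with a plain `/./` the scanner would stall on the first newline that no tag consumes.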
+
+== Using the Scanner
+As shown in the example above, consumers of your class initialize it by
+passing in the string to be parsed, and then calling #to_xml or #to_html
+on it.
+
+<i>(This two-step process allows the consumer to run other code after
+the tag parsing, before final conversion. Examples might include
+replacing special command tags with other input, or performing database
+lookups on special wiki-page-link tags and replacing with HTML
+anchors.)</i>
+
+= Requirements
+TagTreeScanner is built on top of the StringScanner library that is part
+of the standard Ruby installation.
+
+= License
+
+(The MIT License)
+
+Copyright (c) 2005-2007 Gavin Kistner
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+'Software'), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/Rakefile
ADDED
@@ -0,0 +1,18 @@
+# -*- ruby -*-
+
+require 'rubygems'
+require 'hoe'
+require './lib/tagtreescanner.rb'
+
+Hoe.new('tagtreescanner', TagTreeScanner::VERSION) do |p|
+  p.rubyforge_name = 'tagtreescanner'
+  p.author = 'Gavin Kistner'
+  p.email = 'phrogz@mac.com'
+  p.url = ''
+  p.summary = 'Meta library for creating classes that turn custom text markup into XML-like tag hierarchies.'
+  p.description = IO.read( 'README' )[ /= Overview\n(.+?)^=/m, 1 ].rstrip
+  p.changes = IO.read( 'HISTORY' )[ /^=[^\n]+\n+(.+?)^=/m, 1 ].rstrip
+  p.remote_rdoc_dir = ''
+end
+
+# vim: syntax=Ruby
data/TODO
ADDED
@@ -0,0 +1,11 @@
+* Overhaul Tag and TextNode and TagTreeScanner to use a common DOM module
+  like <tt>Phrogz::DOM::OrderedTreeNode</tt>.
+
+* Allow TagFactories to explicitly specify multiple allowed genres
+  and/or allowed tags, rather than only one genre.
+
+* Provide a method like inner_html= for parsing and creating tag content.
+  * Useful for batch replacing the contents of a single tag with output from
+    another program, while maintaining the DOM integrity.
+
+* More unit tests
data/lib/tagtreescanner.rb
ADDED
@@ -0,0 +1,851 @@
+# This file covers the TagTreeScanner class, and the extensions to the
+# String class needed by it.
+# Please see the documentation on those classes for more information.
+#
+# Author:: Gavin Kistner (mailto:phrogz@mac.com)
+# Copyright:: Copyright (c)2005-2007 Gavin Kistner
+# License:: MIT License
+# Version:: 0.8.0 (2007-November-24)
+
+require 'strscan'
+
+# = Overview
+# The TagTreeScanner class provides a generic framework for creating a
+# nested hierarchy of tags and text (like XML or HTML) by parsing text. An
+# example use (and the reason it was written) is to convert a wiki markup
+# syntax into HTML.
+#
+# = Example Usage
+#   require 'tagtreescanner'
+#
+#   class SimpleMarkup < TagTreeScanner
+#     @root_factory.allows_text = false
+#
+#     @tag_genres[ :root ] = [ ]
+#
+#     @tag_genres[ :root ] << TagFactory.new( :paragraph,
+#       # A line that doesn't have whitespace at the start
+#       :open_match => /(?=\S)/, :open_requires_bol => true,
+#
+#       # Close when you see a double return
+#       :close_match => /\n[ \t]*\n/,
+#       :allows_text => true,
+#       :allowed_genre => :inline
+#     )
+#
+#     @tag_genres[ :root ] << TagFactory.new( :preformatted,
+#       # Grab all lines that are indented up until a line that isn't
+#       :open_match => /((\s+).+?)\n+(?=\S)/m, :open_requires_bol => true,
+#       :setup => lambda{ |tag, scanner, tagtree|
+#         # Throw the contents I found into the tag
+#         # but remove leading whitespace
+#         tag << scanner[1].gsub( /^#{scanner[2]}/, '' )
+#       },
+#       :autoclose => true
+#     )
+#
+#     @tag_genres[ :inline ] = [ ]
+#
+#     @tag_genres[ :inline ] << TagFactory.new( :bold,
+#       # An asterisk followed by a letter or number
+#       :open_match => /\*(?=[a-z0-9])/i,
+#
+#       # Close when I see an asterisk OR a newline coming up
+#       :close_match => /\*|(?=\n)/,
+#       :allows_text => true,
+#       :allowed_genre => :inline
+#     )
+#
+#     @tag_genres[ :inline ] << TagFactory.new( :italic,
+#       # An underscore followed by a letter or number
+#       :open_match => /_(?=[a-z0-9])/i,
+#
+#       # Close when I see an underscore OR a newline coming up
+#       :close_match => /_|(?=\n)/,
+#       :allows_text => true,
+#       :allowed_genre => :inline
+#     )
+#   end
+#
+#   raw_text = <<ENDINPUT
+#   Hello World! You're _soaking in_ my test.
+#   This is a *subset* of markup that I allow.
+#
+#   Hi paragraph two. Yo! A code sample:
+#
+#     def foo
+#       puts "Whee!"
+#     end
+#
+#   _That, as they say, is that._
+#
+#   ENDINPUT
+#
+#   markup = SimpleMarkup.new( raw_text ).to_xml
+#   puts markup
+#
+#
+#   #=> <paragraph>Hello World! You're <italic>soaking in</italic> my test.
+#   #=> This is a <bold>subset</bold> of markup that I allow.</paragraph>
+#   #=> <paragraph>Hi paragraph two. Yo! A code sample:</paragraph>
+#   #=> <preformatted>def foo
+#   #=>   puts "Whee!"
+#   #=> end</preformatted>
+#   #=> <paragraph><italic>That, as they say, is that.</italic></paragraph>
+#
+#
+# = Details
+#
+# == TagFactories at 10,000 feet
+# Each possible output tag is described by a TagFactory, which specifies
+# some or all of the following:
+# * The name of the tags it creates <i>(required)</i>
+# * The regular expression to look for to start the tag
+# * The regular expression to look for to close the tag, or
+# * Whether the tag is automatically closed after creation
+# * What genre of tags are allowed within the tag
+# * Whether the tag supports raw text inside it
+# * Code to run when creating a tag
+#
+# See the TagFactory class for more information on specifying factories.
+#
+# == Genres as a State Machine
+# As a new tag is opened, the scanner uses the Tag#allowed_genre property
+# of that tag (set by the +allowed_genre+ property on the TagFactory) to
+# determine which tags to look for. A genre is specified by adding
+# an array in the <tt>@tag_genres</tt> hash, whose key is the genre name.
+# For example:
+#   @tag_genres[ :inline ] = [ ]
+# adds a new genre named 'inline', with no tags in it. TagFactory instances
+# should be pushed onto this array <b>in the order that they should be looked
+# for</b>. For example:
+#   @tag_genres[ :inline ] << TagFactory.new( :italic,
+#     # see the TagFactory#initialize for options
+#   )
+#
+# Note that the +close_match+ regular expression of the current tag is
+# always checked before looking to open/create any new tags.
+#
+# == Consuming Text
+# As the text is being parsed, there will (probably) be many cases where
+# you have raw text that doesn't close or open any new tags. Whenever the
+# scanner reaches this state, it runs the <tt>@text_match</tt> regexp
+# against the text to move the pointer ahead. If the current tag has
+# <tt>Tag#allows_text?</tt> set to +true+ (through
+# <tt>TagFactory#allows_text</tt>), then this text is added as contents of
+# the tag. If not, the text is thrown away.
+#
+# The safest regular expression consumes only one character at a time:
+#   @text_match = /./m
+#
+# <b><i>It is vital that your regexp match newlines</i></b> (the 'm')
+# <b><i>unless every single one of your tags is set to close upon seeing
+# a newline.</i></b>
+#
+# Unfortunately, the safest regular expression is also the slowest. If
+# speed is an issue, your regexp should strive to eat as many characters as
+# possible at once...while ensuring that it doesn't eat characters that
+# would signify the start of a new tag.
+#
+# For example, setting a regexp like:
+#   @text_match = /\w+|./m
+# allows the scanner to match a whole word at a time. However, if you have
+# a tag factory set to look for "Hvv2vvO" to indicate a subscripted '2',
+# the entire string would be eaten as text and the subscript tag would
+# never start.
+#
+# == Using the Scanner
+# As shown in the example above, consumers of your class initialize it by
+# passing in the string to be parsed, and then calling #to_xml or #to_html
+# on it.
+#
+# <i>(This two-step process allows the consumer to run other code after
+# the tag parsing, before final conversion. Examples might include
+# replacing special command tags with other input, or performing database
+# lookups on special wiki-page-link tags and replacing with HTML
+# anchors.)</i>
+class TagTreeScanner
+  VERSION = "0.8.0"
+
+  # A TagFactory holds the information about a specific kind of tag:
+  # * the name of the tag
+  # * what to look for to open and close the tag
+  # * what genre of tags it may contain
+  # * whether the tag permits raw text
+  # * additional code to run when creating the tag
+  #
+  # See the documentation about the <tt>@tag_genres</tt> hash inside
+  # the TagTreeScanner class for information on how to add factories
+  # for use.
+  #
+  # === Utilizing <tt>:autoclose</tt>
+  # Occasionally you will want to
+  # create a tag and allow no other tags inside it. An example might be
+  # a tag containing preformatted code.
+  #
+  # Rather than opening the tag and slowly spinning through all the
+  # text, the combination of the <tt>:autoclose</tt> and
+  # <tt>:setup</tt> options allow you to create the tag, fill it with
+  # content, and then immediately continue with the parent tag.
+  #
+  # See the #new method for how to use the <tt>:setup</tt>
+  # function, and an example usage.
+  class TagFactory
+    # The type of tag this factory produces.
+    attr_accessor :tag_name
+
+    # A regexp to match (and consume) that causes a new tag to be started.
+    attr_accessor :open_match
+
+    # Does the #open_match regexp require beginning of line?
+    attr_accessor :open_requires_bol
+
+    # The regexp which causes the tag to automatically close.
+    attr_accessor :close_match
+
+    # Does the #close_match regexp require beginning of line?
+    attr_accessor :close_requires_bol
+
+    # Should this tag stay open when created, or automatically close?
+    attr_accessor :autoclose
+
+    # A symbol with the genre of tags that are allowed inside the tag.
+    # <i>(See @tag_genres in the TagTreeScanner documentation.)</i>
+    attr_accessor :allowed_genre
+
+    # May tags created by this factory have text added to them?
+    attr_accessor :allows_text
+
+    # __tag_name__:: A symbol with the name of the tag to create
+    # __options__:: A hash including one or more of <tt>:open_match</tt>,
+    #               <tt>:open_requires_bol</tt>, <tt>:close_match</tt>,
+    #               <tt>:close_requires_bol</tt>, <tt>:autoclose</tt>,
+    #               <tt>:allows_text</tt>, <tt>:allowed_genre</tt>, and
+    #               <tt>:setup</tt>.
+    #
+    # Due to the way the StringScanner class works, placing a <tt>^</tt>
+    # (beginning of line) marker in your <tt>:open_match</tt> or
+    # <tt>:close_match</tt> regular expressions will not behave as
+    # desired. Instead, set the <tt>:open_requires_bol</tt> and/or
+    # <tt>:close_requires_bol</tt> properties to +true+ if desired.
+    #
+    # A factory should either be set to <tt>:autoclose => true</tt>, or
+    # supply a <tt>:close_match</tt>. (Otherwise, it will never close.)
+    #
+    # Further, a factory should either be set to
+    # <tt>:autoclose => true</tt> or specify an <tt>:allowed_genre</tt>.
+    # <i>(See below for how to efficiently create a tag that cannot
+    # contain other tags.)</i>
+    #
+    # The <tt>:setup</tt> option is used to run code during the tag
+    # creation. The value of this option should be a lambda/Proc that
+    # accepts three parameters:
+    # * the <b>Tag</b> being created
+    # * the <b>StringScanner</b> instance that matched the tag opening
+    # * the <b>TagTreeScanner</b> instance creating the tag.
+    #
+    # === Example:
+    #   # Shove URLs as HTML anchors, without the protocol prefix shown
+    #   @tag_genres[ :inline ] << TagFactory.new( :a,
+    #     :open_match => %r{http://(\S+)},
+    #     :setup => lambda{ |tag, ss, tagtree|
+    #       tag.attributes[ :href ] = ss[0]
+    #       tag << ss[1]
+    #     },
+    #     :autoclose => true
+    #   )
+    def initialize( tag_name, options={} )
+      @tag_name = tag_name
+      [ :open_match, :close_match,
+        :open_requires_bol, :close_requires_bol,
+        :allowed_genre, :autoclose,
+        :allows_text,
+        :setup, :attributes ].each{ |k|
+        self.instance_variable_set( "@#{k}".intern, options[ k ] )
+      }
+    end
+
+    # Creates and returns a new tag if the supplied _string_scanner_
+    # matches the +open_match+ of this factory.
+    #
+    # Called by TagTreeScanner during initialization.
+    def match( string_scanner, tagtreescanner ) #:nodoc:
+      #puts "Matching #{@open_match.inspect} against #{string_scanner.peek(10)}"
+      return nil unless ( !@open_requires_bol || string_scanner.bol? ) && string_scanner.scan( @open_match )
+      tag = maketag
+      @setup.call( tag, string_scanner, tagtreescanner ) if @setup
+      #puts "...created #{tag}"
+      tag
+    end
+
+    # Creates a tag from the factory manually
+    def create #:nodoc:
+      tag = maketag
+      @setup.call( tag, nil, nil ) if @setup
+      tag
+    end
+
+    private
+      # DRY common code
+      def maketag #:nodoc:
+        tag = Tag.new( @tag_name )
+        tag.factory = self
+        tag.attributes = @attributes if @attributes
+        tag
+      end
+  end
+
+  # Tags are the equivalent of a DOM Element. The majority of tags
+  # are created automatically by a TagFactory, but it may be
+  # necessary to create them directly in order to augment or replace
+  # information in the tag tree.
+  #
+  # A Tag may have one or more attributes, which are pairs of
+  # key/value strings; attributes are output in the HTML or XML
+  # representation of the Tag.
+  #
+  # Each tag also has an <tt>info</tt> hash, which may be used to
+  # keep track of extra bits of information about a tag. <i>Example
+  # usages might be keeping track of the depth of a list item, or the
+  # associated section for a header.</i> Information from the +info+
+  # hash is not output in the HTML or XML representations.
+  class Tag
+    # A symbol with the name of this tag
+    attr_accessor :name
+
+    # An array of child Tag or TextNode instances
+    attr_accessor :child_tags
+
+    # A hash of key/value attributes to emit in the XML/HTML
+    # representation
+    attr_accessor :attributes
+
+    # The TagFactory that created this tag (may be +nil+)
+    attr_accessor :factory
+
+    # A hash that may be used to store extra information about a Tag
+    attr_accessor :info
+
+    # The Tag to which this tag is attached (may be +nil+)
+    attr_reader :parent_tag
+
+    # The Tag or TextNode which immediately follows this tag
+    # (may be +nil+ if this is the last tag of its parent)
+    attr_reader :next_sibling
+
+    # The Tag or TextNode which immediately precedes this tag
+    # (may be +nil+ if this is the first tag of its parent)
+    attr_reader :previous_sibling
+
+    # _name_:: A symbol with the name of this tag
+    # _attributes_:: A hash of key/value pairs to store with this tag
+    def initialize( name, attributes={} )
+      @name = name
+      @child_tags = [ ]
+      @attributes = attributes
+      @info = {}
+    end
+
+    # Allows for setting HTML or XML-like attributes directly without
+    # knowing about the _attributes_ collection. For example:
+    #   tag.href = 'http://www.google.com'
+    #   tag.class = 'external'
+    # is the same as:
+    #   tag.attributes['href'] = 'http://www.google.com'
+    #   tag.attributes['class'] = 'external'
+    # ...for any attributes (like the above) that don't have the same
+    # name as an existing method or attribute on the Tag class.
+    def method_missing( name, *args )
+      if (name=name.to_s) =~ /=$/
+        @attributes[ name[0...-1] ] = (args.size==1 ? args[0] : args )
+      else
+        @attributes[ name ]
+      end
+    end
+
+    # Returns the +close_match+ property of the owning TagFactory,
+    # or +nil+ if this tag wasn't created by a factory.
+    def close_match
+      @factory && @factory.close_match
+    end
+
+    # Returns the +close_requires_bol+ property of the owning TagFactory,
+    # or +nil+ if this tag wasn't created by a factory.
+    def close_requires_bol?
+      @factory && @factory.close_requires_bol
+    end
+
+    # Returns the +autoclose+ property of the owning TagFactory,
+    # or +nil+ if this tag wasn't created by a factory.
+    def autoclose?
+      @factory && @factory.autoclose
+    end
+
+    # Returns the +allows_text+ property of the owning TagFactory,
+    # or +true+ if this tag wasn't created by a factory.
+    def allows_text?
+      @factory ? @factory.allows_text : true
+    end
+
+    # Returns the +allowed_genre+ property of the owning TagFactory,
+    # or +nil+ if this tag wasn't created by a factory.
+    def allowed_genre
+      @factory && @factory.allowed_genre
+    end
+
+    # _new_child_:: The Tag or TextNode to add as the last child.
+    #
+    # Adds _new_child_ to the end of this tag's +child_tags+ collection.
+    # Returns a reference to _new_child_.
+    #
+    # If _new_child_ is a child of another Tag, it is first removed from
+    # that tag.
+    def append_child( new_child )
+      return new_child if new_child == @child_tags.last
+      insert_after( new_child, @child_tags.last )
+    end
+
+    # _new_child_:: The Tag or TextNode to add as a child of this tag.
+    # _reference_child_:: The child to place _new_child_ before.
+    #
+    # Adds _new_child_ as a child of this tag, immediately before the
+    # location of _reference_child_. Returns a reference to _new_child_.
+    #
+    # If _reference_child_ is +nil+, the child is added as the last
+    # child of this tag. A RuntimeError is raised if _reference_child_
+    # is not a child of this tag.
+    #
+    # If _new_child_ is a child of another Tag, #remove_child is
+    # automatically invoked to remove it from that tag.
+    def insert_before( new_child, reference_child=nil )
+      return new_child if reference_child ? ( reference_child.previous_sibling == new_child ) : ( new_child == @child_tags.last )
+      insert_after( new_child, reference_child ? reference_child.previous_sibling : @child_tags.last )
+    end
+
+    # _new_child_:: The Tag or TextNode to add as a child of this tag.
+    # _reference_child_:: The child to place _new_child_ after.
+    #
+    # Adds _new_child_ as a child of this tag, immediately after the
+    # location of _reference_child_. Returns a reference to _new_child_.
+    #
+    # If _reference_child_ is +nil+, the child is added as the first
+    # child of this tag. A RuntimeError is raised if _reference_child_
+    # is not a child of this tag.
+    #
+    # If _new_child_ is a child of another Tag, #remove_child is
+    # automatically invoked to remove it from that tag.
+    def insert_after( new_child, reference_child=nil )
+      #puts "#{self.inspect}#insert_after( #{new_child.inspect}, #{reference_child.inspect} )"
+      return new_child if reference_child ? ( reference_child.next_sibling == new_child ) : ( new_child == @child_tags.first )
+
+      # Ensure new_child is not an ancestor of self
+      walker = self
+      while walker
+        raise "#{new_child.inspect} cannot be added under #{self.inspect}, because it is an ancestor of it!" if walker==new_child
+        walker = walker.parent_tag
+      end
+
+      new_child.parent_tag.remove_child( new_child ) if new_child.parent_tag
+      if reference_child
+        new_idx = @child_tags.index( reference_child )
+        raise "#{reference_child.inspect} is not a child of #{self.inspect}" unless new_idx
+        new_idx += 1
+      else
+        new_idx = 0
+      end
+      new_child.parent_tag = self
+      succ = @child_tags[ new_idx ]
+      @child_tags.insert( new_idx, new_child )
+      new_child.previous_sibling = reference_child
+      reference_child.next_sibling = new_child if reference_child
+      new_child.next_sibling = succ
+      succ.previous_sibling = new_child if succ
+      new_child
+    end
|
465
|
+
|
466
|
+
# _existing_child_:: The Tag or TextNode to remove.
|
467
|
+
#
|
468
|
+
# Removes _existing_child_ from being a child of this tag.
|
469
|
+
# Returns _existing_child_.
|
470
|
+
#
|
471
|
+
# A RuntimeError is raised if _existing_child_ is not a child of
|
472
|
+
# this tag.
|
473
|
+
#
|
474
|
+
# If _new_child_ is a child of another Tag, #remove_child is
|
475
|
+
# automatically invoked to remove it from that tag.
|
476
|
+
def remove_child( existing_child )
|
477
|
+
idx = @child_tags.index( existing_child )
|
478
|
+
raise "#{existing_child.inspect} is not a child of #{self.inspect}" unless idx
|
479
|
+
prev, succ = existing_child.previous_sibling, existing_child.next_sibling
|
480
|
+
prev.next_sibling = succ if prev
|
481
|
+
succ.previous_sibling = prev if succ
|
482
|
+
@child_tags.delete_at( idx )
|
483
|
+
existing_child.previous_sibling = existing_child.next_sibling = existing_child.parent_tag = nil
|
484
|
+
existing_child
|
485
|
+
end
|
486
|
+
|
487
|
+
# _old_child_:: The existing child Tag or TextNode to replace.
|
488
|
+
# _new_child_:: The Tag or TextNode to replace _old_child_.
|
489
|
+
#
|
490
|
+
# Replaces _old_child_ with _new_child_ in this collection.
|
491
|
+
# Returns _old_child_.
|
492
|
+
#
|
493
|
+
# A RuntimeError is raised if _old_child_ is not a child of
|
494
|
+
# this tag.
|
495
|
+
#
|
496
|
+
# If _new_child_ is a child of another Tag, #remove_child is
|
497
|
+
# automatically invoked to remove it from that tag.
|
498
|
+
def replace_child( old_child, new_child )
|
499
|
+
if ( prev = old_child.previous_sibling ) == new_child || old_child.next_sibling == new_child
|
500
|
+
remove_child( old_child )
|
501
|
+
else
|
502
|
+
new_child.parent_tag.remove_child( new_child ) if new_child.parent_tag
|
503
|
+
remove_child( old_child )
|
504
|
+
insert_after( new_child, prev )
|
505
|
+
end
|
506
|
+
old_child
|
507
|
+
end
|
508
|
+
|
509
|
+
# _new_child_:: The Tag or TextNode to replace this tag.
|
510
|
+
#
|
511
|
+
# Replaces this tag with _new_child_. Returns _new_child_.
|
512
|
+
#
|
513
|
+
# A RuntimeError is raised if this tag is not a child of another tag.
|
514
|
+
#
|
515
|
+
# If _new_child_ is a child of another Tag, #remove_child is
|
516
|
+
# automatically invoked to remove it from that tag.
|
517
|
+
def replace_with( new_child )
|
518
|
+
return new_child if new_child == self
|
519
|
+
raise "#{self.inspect} is not a child of another tag" unless @parent_tag
|
520
|
+
@parent_tag.replace_child( self, new_child )
|
521
|
+
new_child
|
522
|
+
end
|
523
|
+
|
524
|
+
# _additional_text_:: The text to add to this node.
|
525
|
+
#
|
526
|
+
# Appends _additional_text_ to this tag. If the last item in the
|
527
|
+
# +child_tags+ collection is a TextNode, the text is added to that
|
528
|
+
# item; otherwise, a new TextNode is created with _additional_text_
|
529
|
+
# and added as the last child of this tag.
|
530
|
+
def << ( additional_text )
|
531
|
+
last_child = @child_tags.last
|
532
|
+
if last_child.is_a? TextNode
|
533
|
+
last_child << additional_text
|
534
|
+
else
|
535
|
+
append_child( TextNode.new( additional_text ) )
|
536
|
+
end
|
537
|
+
end
|
538
|
+
|
539
|
+
# Set the text content of this element to _new_contents_
|
540
|
+
# Removes any child tags (and their text)
|
541
|
+
def text=( new_contents )
|
542
|
+
@child_tags.clear
|
543
|
+
append_child( TextNode.new( new_contents ) )
|
544
|
+
end
|
545
|
+
|
546
|
+
alias_method :inner_text=, :text=
|
547
|
+
|
548
|
+
# Returns an HTML representation of this tag and all its descendants.
|
549
|
+
#
|
550
|
+
# This method is the same as #to_xml except that tags without
|
551
|
+
# any +child_tags+ use an explicit close tag, e.g.
|
552
|
+
# <tt><div></div></tt> instead of XML's <tt><div /></tt>
|
553
|
+
def to_html
|
554
|
+
to_xml( false )
|
555
|
+
end
|
556
|
+
|
557
|
+
# Returns an XML representation of this tag and all its descendants.
|
558
|
+
#
|
559
|
+
# If _empty_tags_collapsed_ is +true+ (the default) then this method
|
560
|
+
# is the same as #to_html except that tags without any +child_tags+
|
561
|
+
# use a single closed tag, e.g.
|
562
|
+
# <tt><div /></tt> instead of HTML's <tt><div></div></tt>
|
563
|
+
#
|
564
|
+
# If _empty_tags_collapsed_ is +false+, this is the same as #to_html.
|
565
|
+
def to_xml( empty_tags_collapsed=true )
|
566
|
+
out = "<#{@name}"
|
567
|
+
@attributes.each{ |k,v| out << " #{k}=\"#{v.to_s.gsub( '"', '&quot;' )}\"" }
|
568
|
+
if empty_tags_collapsed && @child_tags.empty?
|
569
|
+
out << ' />'
|
570
|
+
else
|
571
|
+
out << '>'
|
572
|
+
unless @child_tags.empty?
|
573
|
+
out << "\n" unless self.allows_text?
|
574
|
+
@child_tags.each{ |tag|
|
575
|
+
out << tag.to_xml( empty_tags_collapsed )
|
576
|
+
}
|
577
|
+
end
|
578
|
+
out << "</#{@name}>"
|
579
|
+
end
|
580
|
+
out << "\n" if @parent_tag && !@parent_tag.allows_text?
|
581
|
+
out
|
582
|
+
end
|
583
|
+
|
584
|
+
# Returns an array of all descendants of this tag whose #name
|
585
|
+
# matches the supplied _name_.
|
586
|
+
def tags_by_name( name )
|
587
|
+
out = []
|
588
|
+
@child_tags.each{ |tag|
|
589
|
+
out << tag if tag.name == name
|
590
|
+
unless tag.child_tags.empty?
|
591
|
+
out.concat( tag.tags_by_name( name ) )
|
592
|
+
end
|
593
|
+
}
|
594
|
+
out
|
595
|
+
end
|
596
|
+
|
597
|
+
# Returns the text contents of this tag and its descendants.
|
598
|
+
def inner_text
|
599
|
+
@child_tags.inject(''){ |out,tag|
|
600
|
+
out << ( tag.is_a?( TextNode ) ? tag.text : tag.inner_text )
|
601
|
+
}
|
602
|
+
end
|
603
|
+
|
604
|
+
def inspect #:nodoc:
|
605
|
+
out = "<#{@name}"
|
606
|
+
#out << " @pops=#{@parent_tag ? @parent_tag.name.inspect : 'nil'}"
|
607
|
+
#out << " @prev=#{@previous_sibling ? @previous_sibling.name.inspect : 'nil'}"
|
608
|
+
#out << " @next=#{@next_sibling ? @next_sibling.name.inspect : 'nil'}"
|
609
|
+
@attributes.each{ |k,v| out << " #{k}=\"#{v}\"" }
|
610
|
+
@info.each{ |k,v| out << " @#{k}=>#{v.inspect}" }
|
611
|
+
children = @child_tags.length
|
612
|
+
if children == 1 && TextNode === @child_tags.first
|
613
|
+
out << ">#{@child_tags.first}</#{@name}>"
|
614
|
+
elsif children == 0
|
615
|
+
out << '>'
|
616
|
+
else
|
617
|
+
out << " (#{@child_tags.length} child#{@child_tags.length != 1 ? 'ren' : ''})>"
|
618
|
+
end
|
619
|
+
end
|
620
|
+
|
621
|
+
# _level_:: The indentation level (tabs) to start at.
|
622
|
+
#
|
623
|
+
# Returns a full-hierarchical representation of this tag and its
|
624
|
+
# descendants. (Used for debugging.)
|
625
|
+
def to_hier( level=0 ) #:nodoc:
|
626
|
+
tabs = "\t" * level
|
627
|
+
out = "#{tabs}<#{@name}"
|
628
|
+
@attributes.each{ |k,v| out << " #{k}=\"#{v}\"" }
|
629
|
+
@info.each{ |k,v| out << " @#{k}=>#{v.inspect}" }
|
630
|
+
if @child_tags.empty?
|
631
|
+
out << " />\n"
|
632
|
+
elsif @child_tags.length == 1 && TextNode === @child_tags.first
|
633
|
+
out << ">#{@child_tags.first}</#{@name}>\n"
|
634
|
+
else
|
635
|
+
out << ">\n"
|
636
|
+
@child_tags.each{ |n| out << n.to_hier(level+1) }
|
637
|
+
out << "#{tabs}</#{@name}>\n"
|
638
|
+
end
|
639
|
+
out
|
640
|
+
end
|
641
|
+
|
642
|
+
# Returns a copy of this tag and its entire hierarchy.
|
643
|
+
# All descendant tags/text nodes are also cloned.
|
644
|
+
#
|
645
|
+
# The +info+ hash is not preserved.
|
646
|
+
def dup
|
647
|
+
tag = self.class.new( self.name, self.attributes.dup )
|
648
|
+
@child_tags.each{ |tag2| tag.append_child( tag2.dup ) }
|
649
|
+
tag
|
650
|
+
end
|
651
|
+
|
652
|
+
# :stopdoc:
|
653
|
+
protected
|
654
|
+
attr_writer :previous_sibling, :next_sibling, :parent_tag
|
655
|
+
# :startdoc:
|
656
|
+
|
657
|
+
end
|
658
|
+
|
659
|
+
# A TextNode holds raw text inside a Tag. Generally, TextNodes are
|
660
|
+
# created automatically by the Tag#<< method.
|
661
|
+
class TextNode
|
662
|
+
# The Tag or TextNode that comes after this one (may be +nil+)
|
663
|
+
attr_accessor :next_sibling
|
664
|
+
|
665
|
+
# The Tag or TextNode that comes before this one (may be +nil+)
|
666
|
+
attr_accessor :previous_sibling
|
667
|
+
|
668
|
+
# The Tag that is a parent of this TextNode (may be +nil+)
|
669
|
+
attr_accessor :parent_tag
|
670
|
+
|
671
|
+
# A hash which may be used to store 'extra' information
|
672
|
+
attr_accessor :info
|
673
|
+
|
674
|
+
# The string contents of this text node
|
675
|
+
attr_accessor :text
|
676
|
+
|
677
|
+
# _text_:: The text to start out with
|
678
|
+
def initialize( text='' )
|
679
|
+
@text = text
|
680
|
+
@info = {}
|
681
|
+
end
|
682
|
+
|
683
|
+
# _additional_text_:: The text to add
|
684
|
+
#
|
685
|
+
# Appends the provided text to the end of the current text
|
686
|
+
#
|
687
|
+
# Returns the new text value
|
688
|
+
def << ( additional_text )
|
689
|
+
@text << additional_text
|
690
|
+
end
|
691
|
+
|
692
|
+
# Returns a copy of this text node
|
693
|
+
def dup
|
694
|
+
self.class.new( @text.dup )
|
695
|
+
end
|
696
|
+
|
697
|
+
def to_hier( level=0 ) #:nodoc:
|
698
|
+
"#{"\t"*level}#{@text.inspect}\n"
|
699
|
+
end
|
700
|
+
|
701
|
+
def to_s #:nodoc:
|
702
|
+
@text
|
703
|
+
end
|
704
|
+
|
705
|
+
# Returns the contents of this node, modified to be made XML-safe
|
706
|
+
# by calling String#xmlsafe.
|
707
|
+
def to_xml( *args )
|
708
|
+
@text.xmlsafe
|
709
|
+
end
|
710
|
+
end
|
711
|
+
|
712
|
+
# RDoc thinks that this stuff applies to instances, not the class
|
713
|
+
# :stopdoc:
|
714
|
+
class << self
|
715
|
+
attr_accessor :tag_genres, :root_factory, :text_match
|
716
|
+
end
|
717
|
+
# :startdoc:
|
718
|
+
|
719
|
+
# The tag_genres hash maps a genre name onto an array of TagFactories.
|
720
|
+
#
|
721
|
+
# Factories are tested in the order they appear in the genre array;
|
722
|
+
# more important matches are at the top, generic fallback ones
|
723
|
+
# should appear at the end of the list.
|
724
|
+
#
|
725
|
+
# If no factory matches the current input, then text is shoved into the
|
726
|
+
# most recent tag until a new tag start is found, or the closing match
|
727
|
+
# is met. (If the current tag's factory does not have :allows_text set
|
728
|
+
# to true, then the text is simply thrown away until the closing or
|
729
|
+
# new tag start is found.)
|
730
|
+
@tag_genres = { }
|
731
|
+
|
732
|
+
# Settings for the root of your document: what genre is allowed at the
|
733
|
+
# highest level, and should raw text be allowed there?
|
734
|
+
#
|
735
|
+
# Override in your class by setting a class-instance variable as below.
|
736
|
+
@root_factory = TagFactory.new( :root,
|
737
|
+
:allowed_genre => :root,
|
738
|
+
:allows_text => true )
|
739
|
+
|
740
|
+
# The pattern to consume and shove as text whenever no tag start/close
|
741
|
+
# is found. Eating one character at a time is safest, but slow.
|
742
|
+
# Ensure that this pattern never lets you skip over the start of a tag,
|
743
|
+
# or else you'll miss it.
|
744
|
+
@text_match = /./m
|
745
|
+
|
746
|
+
# Scans through _string_to_parse_ and builds a tree of tags based
|
747
|
+
# on the regular expressions and rules set by the TagFactory
|
748
|
+
# instances present in <tt>@tag_genres</tt>.
|
749
|
+
#
|
750
|
+
# After parsing the tree, call #to_xml or #to_html to retrieve
|
751
|
+
# a string representation.
|
752
|
+
def initialize( string_to_parse )
|
753
|
+
current = @root = self.class.root_factory.create
|
754
|
+
tag_genres = self.class.tag_genres
|
755
|
+
text_match = self.class.text_match
|
756
|
+
|
757
|
+
ss = StringScanner.new( string_to_parse )
|
758
|
+
while !ss.eos?
|
759
|
+
# Keep popping off the current tag until we get to the root,
|
760
|
+
# as long as the closing criteria are met
|
761
|
+
while ( current != @root ) && (!current.close_requires_bol? || ss.bol?) && ss.scan( current.close_match )
|
762
|
+
current = current.parent_tag || @root
|
763
|
+
end
|
764
|
+
|
765
|
+
# No point in continuing if closing out tags consumed the rest of the string
|
766
|
+
break if ss.eos?
|
767
|
+
|
768
|
+
# Look for a tag to open
|
769
|
+
if factories = tag_genres[ current.allowed_genre ]
|
770
|
+
tag = nil
|
771
|
+
factories.each{ |factory|
|
772
|
+
if tag = factory.match( ss, self )
|
773
|
+
current.append_child( tag )
|
774
|
+
current = tag unless tag.autoclose?
|
775
|
+
break
|
776
|
+
end
|
777
|
+
}
|
778
|
+
#start at the top of the loop if we found one
|
779
|
+
next if tag
|
780
|
+
end
|
781
|
+
|
782
|
+
# Couldn't find a valid tag at this spot
|
783
|
+
# so we need to eat some characters
|
784
|
+
consumed = ss.scan( text_match )
|
785
|
+
current << consumed if current.allows_text?
|
786
|
+
end
|
787
|
+
end
|
788
|
+
|
789
|
+
# Returns an HTML representation of the tag tree.
|
790
|
+
#
|
791
|
+
# This is the same as the #to_xml method except that empty tags use an
|
792
|
+
# explicit close tag, e.g. <tt><div></div></tt> versus <tt><div /></tt>
|
793
|
+
def to_html
|
794
|
+
@root.child_tags.inject(''){ |out, tag| out << tag.to_html }
|
795
|
+
end
|
796
|
+
|
797
|
+
# Returns an XML representation of the tag tree.
|
798
|
+
#
|
799
|
+
# This method is the same as the #to_html method except that empty tags
|
800
|
+
# do not use an explicit close tag,
|
801
|
+
# e.g. <tt><div /></tt> versus <tt><div></div></tt>
|
802
|
+
def to_xml
|
803
|
+
@root.child_tags.inject(''){ |out, tag| out << tag.to_xml }
|
804
|
+
end
|
805
|
+
|
806
|
+
# Returns an array of all root-level tags found
|
807
|
+
def tags
|
808
|
+
@root.child_tags
|
809
|
+
end
|
810
|
+
|
811
|
+
# Returns an array of all tags in the tree whose Tag#name matches
|
812
|
+
# the supplied _name_.
|
813
|
+
def tags_by_name( name )
|
814
|
+
@root.tags_by_name( name )
|
815
|
+
end
|
816
|
+
|
817
|
+
# Returns a hierarchical representation of the entire tag tree
|
818
|
+
def inspect #:nodoc:
|
819
|
+
@root.to_hier
|
820
|
+
end
|
821
|
+
|
822
|
+
# When a class inherits from TagTreeScanner, defaults are set for
|
823
|
+
# <tt>@tag_genres</tt>, <tt>@root_factory</tt> and
|
824
|
+
# <tt>@text_match</tt>
|
825
|
+
def self.inherited( child_class ) #:nodoc:
|
826
|
+
child_class.tag_genres = @tag_genres
|
827
|
+
child_class.root_factory = @root_factory
|
828
|
+
child_class.text_match = @text_match
|
829
|
+
end
|
830
|
+
end
|
831
|
+
|
832
|
+
# Extensions to the built-in String class
|
833
|
+
class String
|
834
|
+
|
835
|
+
# Returns a copy of the string with all <tt>&</tt>, <tt><</tt> and
|
836
|
+
# <tt>></tt> characters replaced by their equivalent XML entities
|
837
|
+
# (<tt>&amp;</tt>, <tt>&lt;</tt>, and <tt>&gt;</tt>)
|
838
|
+
def xmlsafe
|
839
|
+
self.dup.xmlsafe!
|
840
|
+
end
|
841
|
+
|
842
|
+
# Modifies the string, replacing all <tt>&</tt>, <tt><</tt> and
|
843
|
+
# <tt>></tt> characters with their equivalent XML entities
|
844
|
+
# (<tt>&amp;</tt>, <tt>&lt;</tt>, and <tt>&gt;</tt>)
|
845
|
+
def xmlsafe!
|
846
|
+
self.gsub!( /&/, '&amp;' )
|
847
|
+
self.gsub!( /</, '&lt;' )
|
848
|
+
self.gsub!( />/, '&gt;' )
|
849
|
+
self
|
850
|
+
end
|
851
|
+
end
|
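String#xmlsafe! above is meant to replace <tt>&</tt>, <tt><</tt> and <tt>></tt> with XML entities, handling the ampersand first so that the ampersands introduced by the later replacements are not escaped a second time. A minimal sketch of the same idea in a single pass follows. The helper name +xml_escape+ is an assumption made for illustration and is not part of tagtreescanner; it also avoids reopening String.

```ruby
# Hypothetical helper (not part of the gem): returns an escaped copy of
# +text+ and leaves the original string untouched.
ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }

def xml_escape( text )
  # A single gsub pass looks each matched character up in the hash, so an
  # ampersand produced by a replacement is never visited again.
  text.gsub( /[&<>]/, ESCAPES )
end

puts xml_escape( 'a < b && c > d' )
```

Because each character is visited once, the order of the hash entries does not matter here, unlike with three sequential gsub! calls.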
@@ -0,0 +1,84 @@
|
|
1
|
+
require "test/unit"
|
2
|
+
require "../lib/tagtreescanner.rb"
|
3
|
+
|
4
|
+
class SimpleMarkup < TagTreeScanner
|
5
|
+
@root_factory.allows_text = false
|
6
|
+
|
7
|
+
@tag_genres[ :root ] = [ ]
|
8
|
+
|
9
|
+
@tag_genres[ :root ] << TagFactory.new( :paragraph,
|
10
|
+
# A line that doesn't have whitespace at the start
|
11
|
+
:open_match => /(?=\S)/, :open_requires_bol => true,
|
12
|
+
|
13
|
+
# Close when you see a double return
|
14
|
+
:close_match => /\n[ \t]*\n/,
|
15
|
+
:allows_text => true,
|
16
|
+
:allowed_genre => :inline
|
17
|
+
)
|
18
|
+
|
19
|
+
@tag_genres[ :root ] << TagFactory.new( :preformatted,
|
20
|
+
# Grab all lines that are indented up until a line that isn't
|
21
|
+
:open_match => /((\s+).+?)\n+(?=\S)/m, :open_requires_bol => true,
|
22
|
+
:setup => lambda{ |tag, scanner, tagtree|
|
23
|
+
# Throw the contents I found into the tag
|
24
|
+
# but remove leading whitespace
|
25
|
+
tag << scanner[1].gsub( /^#{scanner[2]}/, '' )
|
26
|
+
},
|
27
|
+
:autoclose => true
|
28
|
+
)
|
29
|
+
|
30
|
+
@tag_genres[ :inline ] = [ ]
|
31
|
+
|
32
|
+
@tag_genres[ :inline ] << TagFactory.new( :bold,
|
33
|
+
# An asterisk followed by a letter or number
|
34
|
+
:open_match => /\*(?=[a-z0-9])/i,
|
35
|
+
|
36
|
+
# Close when I see an asterisk OR a newline coming up
|
37
|
+
:close_match => /\*|(?=\n)/,
|
38
|
+
:allows_text => true,
|
39
|
+
:allowed_genre => :inline
|
40
|
+
)
|
41
|
+
|
42
|
+
@tag_genres[ :inline ] << TagFactory.new( :italic,
|
43
|
+
# An underscore followed by a letter or number
|
44
|
+
:open_match => /_(?=[a-z0-9])/i,
|
45
|
+
|
46
|
+
# Close when I see an underscore OR a newline coming up
|
47
|
+
:close_match => /_|(?=\n)/,
|
48
|
+
:allows_text => true,
|
49
|
+
:allowed_genre => :inline
|
50
|
+
)
|
51
|
+
end
|
52
|
+
|
53
|
+
class Tag_Test < Test::Unit::TestCase
|
54
|
+
def setup
|
55
|
+
end
|
56
|
+
|
57
|
+
def test_conversion
|
58
|
+
raw_text = <<-ENDINPUT
|
59
|
+
Hello World! You're _soaking in_ my test.
|
60
|
+
This is a *subset* of markup that I allow.
|
61
|
+
|
62
|
+
Hi paragraph two. Yo! A code sample:
|
63
|
+
|
64
|
+
def foo
|
65
|
+
puts "Whee!"
|
66
|
+
end
|
67
|
+
|
68
|
+
_That, as they say, is that._
|
69
|
+
|
70
|
+
ENDINPUT
|
71
|
+
|
72
|
+
markup = SimpleMarkup.new( raw_text ).to_xml
|
73
|
+
p '',markup
|
74
|
+
end
|
75
|
+
end
|
76
|
+
|
77
|
+
|
78
|
+
#=> <paragraph>Hello World! You're <italic>soaking in</italic> my test.
|
79
|
+
#=> This is a <bold>subset</bold> of markup that I allow.</paragraph>
|
80
|
+
#=> <paragraph>Hi paragraph two. Yo! A code sample:</paragraph>
|
81
|
+
#=> <preformatted>def foo
|
82
|
+
#=> puts "Whee!"
|
83
|
+
#=> end</preformatted>
|
84
|
+
#=> <paragraph><italic>That, as they say, is that.</italic></paragraph>
|
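The :bold and :italic factories above rely on the scanning loop in TagTreeScanner#initialize: try to close the current tag, then try each factory's opening pattern in order, and otherwise consume one character as text. The sketch below reproduces only that loop for inline markup, using the standard +strscan+ library. The +RULES+ table and +scan_inline+ method are assumptions made for illustration, not the gem's API, and the sketch emits a flat string rather than a tag tree and does no XML escaping.

```ruby
require 'strscan'

# Hypothetical, simplified rules: [ tag name, opening pattern, closing pattern ].
# The patterns are copied from the :bold and :italic factories above.
RULES = [
  [ :bold,   /\*(?=[a-z0-9])/i, /\*|(?=\n)/ ],
  [ :italic, /_(?=[a-z0-9])/i,  /_|(?=\n)/  ]
]

# Converts inline markup to a tagged string by trying each rule in order
# and falling back to consuming a single character as text.
def scan_inline( source )
  ss    = StringScanner.new( source )
  out   = ''.dup
  stack = []                          # currently open rules, innermost last
  until ss.eos?
    # Close the innermost open tag if its closing pattern matches here
    if !stack.empty? && ss.scan( stack.last[2] )
      out << "</#{stack.pop[0]}>"
      next
    end
    # Otherwise try to open a tag; the first matching rule wins
    if rule = RULES.find{ |_, open, _| ss.scan( open ) }
      out << "<#{rule[0]}>"
      stack << rule
      next
    end
    # No tag starts or ends here, so eat one character as plain text
    out << ss.scan( /./m )
  end
  # Close anything still open at the end of the input
  out << "</#{stack.pop[0]}>" until stack.empty?
  out
end

puts scan_inline( "You're _soaking in_ my *test*." )
```

As in the library, rule order matters: an earlier rule shadows a later one whenever both opening patterns could match at the same position.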
@@ -0,0 +1,104 @@
|
|
1
|
+
require "test/unit"
|
2
|
+
require "../lib/tagtreescanner"
|
3
|
+
|
4
|
+
class Tag_Test < Test::Unit::TestCase
|
5
|
+
def setup
|
6
|
+
end
|
7
|
+
|
8
|
+
def test1_tags
|
9
|
+
root = TagTreeScanner::Tag.new( :root, { :is_root => true } )
|
10
|
+
assert_equal( :root, root.name )
|
11
|
+
assert_equal( true, root.attributes[ :is_root ] )
|
12
|
+
assert_nil( root.allowed_genre )
|
13
|
+
assert( root.allows_text? )
|
14
|
+
|
15
|
+
t1 = TagTreeScanner::Tag.new( :t1 )
|
16
|
+
root.append_child( t1 )
|
17
|
+
assert_equal( 1, root.child_tags.length )
|
18
|
+
assert_equal( t1, root.child_tags.first )
|
19
|
+
|
20
|
+
t2 = TagTreeScanner::Tag.new( :t2 )
|
21
|
+
root.append_child( t2 )
|
22
|
+
assert_equal( 2, root.child_tags.length )
|
23
|
+
assert_equal( t2, root.child_tags.last )
|
24
|
+
|
25
|
+
t3 = TagTreeScanner::Tag.new( :t3 )
|
26
|
+
root.insert_before( t3, t2 )
|
27
|
+
assert_equal( 3, root.child_tags.length )
|
28
|
+
assert_equal( [t1,t3,t2], root.child_tags )
|
29
|
+
|
30
|
+
root.append_child( t1 )
|
31
|
+
assert_equal( [t3,t2,t1], root.child_tags )
|
32
|
+
|
33
|
+
t1.replace_with( t3 )
|
34
|
+
assert_equal( [t2,t3], root.child_tags )
|
35
|
+
assert_nil( t1.parent_tag )
|
36
|
+
|
37
|
+
root.insert_before( t1, t2 )
|
38
|
+
assert_equal( [t1,t2,t3], root.child_tags )
|
39
|
+
assert_equal( root, t1.parent_tag )
|
40
|
+
|
41
|
+
root.append_child( t1 )
|
42
|
+
assert_equal( [t2,t3,t1], root.child_tags )
|
43
|
+
assert_equal( root, t1.parent_tag )
|
44
|
+
assert_nil( t1.next_sibling )
|
45
|
+
assert_nil( t2.previous_sibling )
|
46
|
+
|
47
|
+
t1.append_child( t3 )
|
48
|
+
assert_equal( [t2,t1], root.child_tags )
|
49
|
+
assert_nil( t3.next_sibling )
|
50
|
+
assert_nil( t3.previous_sibling )
|
51
|
+
assert_equal( t1, t2.next_sibling )
|
52
|
+
assert_equal( t2, t1.previous_sibling )
|
53
|
+
assert_equal( t3, t1.child_tags.first )
|
54
|
+
|
55
|
+
assert_raise( RuntimeError ){
|
56
|
+
t3.append_child( t1 )
|
57
|
+
}
|
58
|
+
|
59
|
+
assert_raise( RuntimeError ){
|
60
|
+
t1.append_child( t1 )
|
61
|
+
}
|
62
|
+
end
|
63
|
+
|
64
|
+
def test2_tags2
|
65
|
+
root = TagTreeScanner::Tag.new( :root )
|
66
|
+
# make a ton of tags...
|
67
|
+
1.upto(100){ |i|
|
68
|
+
root.append_child( TagTreeScanner::Tag.new( "t#{i}".intern ) )
|
69
|
+
}
|
70
|
+
|
71
|
+
# ...shuffle the hell out of them...
|
72
|
+
500.times{
|
73
|
+
next unless n1 = root.child_tags[ rand( root.child_tags.length ) ]
|
74
|
+
n2 = root.child_tags[ rand( root.child_tags.length ) ]
|
75
|
+
next if n1 == n2
|
76
|
+
case rand(30)
|
77
|
+
when 0
|
78
|
+
root.remove_child( n1 )
|
79
|
+
when 1
|
80
|
+
root.append_child( n1 )
|
81
|
+
when 2
|
82
|
+
root.insert_before( n1, nil )
|
83
|
+
when 3
|
84
|
+
root.insert_after( n1, nil )
|
85
|
+
when 4
|
86
|
+
root.insert_before( n1, n2 )
|
87
|
+
when 5
|
88
|
+
n1.replace_with( n2 )
|
89
|
+
else
|
90
|
+
root.insert_after( n1, n2 )
|
91
|
+
end
|
92
|
+
}
|
93
|
+
|
94
|
+
# ...and now ensure that they're all properly linked
|
95
|
+
last_tag = nil
|
96
|
+
root.child_tags.each{ |tag|
|
97
|
+
assert_equal( last_tag, tag.previous_sibling )
|
98
|
+
assert_equal( tag, last_tag.next_sibling ) if last_tag
|
99
|
+
assert_equal( root, tag.parent_tag )
|
100
|
+
last_tag = tag
|
101
|
+
}
|
102
|
+
assert_nil( last_tag.next_sibling ) if last_tag
|
103
|
+
end
|
104
|
+
end
|
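The shuffle test above checks that every child's previous_sibling, next_sibling and parent_tag stay consistent after arbitrary moves. A minimal sketch of the bookkeeping that keeps those links in step is given below. The +Node+ class is hypothetical and much smaller than TagTreeScanner::Tag; in particular it omits the ancestor check and the early return for no-op moves that Tag#insert_after performs.

```ruby
# Hypothetical minimal node (not the gem's Tag class).
class Node
  attr_reader   :name, :children
  attr_accessor :parent, :previous_sibling, :next_sibling

  def initialize( name )
    @name, @children = name, []
  end

  # Inserts +child+ after +reference+, or as the first child when
  # +reference+ is nil. A child that already has a parent is detached first.
  def insert_after( child, reference=nil )
    child.parent.remove_child( child ) if child.parent
    idx = reference ? @children.index( reference ) : -1
    raise "#{reference.name} is not a child of #{name}" unless idx
    succ = @children[ idx + 1 ]
    @children.insert( idx + 1, child )
    child.parent           = self
    child.previous_sibling = reference
    child.next_sibling     = succ
    reference.next_sibling = child if reference
    succ.previous_sibling  = child if succ
    child
  end

  # Unlinks +child+ from its neighbours and from this node.
  def remove_child( child )
    idx = @children.index( child )
    raise "#{child.name} is not a child of #{name}" unless idx
    prev, succ = child.previous_sibling, child.next_sibling
    prev.next_sibling     = succ if prev
    succ.previous_sibling = prev if succ
    @children.delete_at( idx )
    child.parent = child.previous_sibling = child.next_sibling = nil
    child
  end
end

root    = Node.new( :root )
a, b, c = Node.new( :a ), Node.new( :b ), Node.new( :c )
root.insert_after( a )      # children: a
root.insert_after( b, a )   # children: a, b
root.insert_after( c, a )   # children: a, c, b
root.insert_after( a, b )   # moves a to the end: c, b, a
p root.children.map{ |n| n.name }
```

Detaching the child before looking up the reference index is what makes moves within the same parent come out right, since the index is computed against the list as it stands after removal.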
metadata
ADDED
@@ -0,0 +1,63 @@
|
|
1
|
+
--- !ruby/object:Gem::Specification
|
2
|
+
rubygems_version: 0.9.4
|
3
|
+
specification_version: 1
|
4
|
+
name: tagtreescanner
|
5
|
+
version: !ruby/object:Gem::Version
|
6
|
+
version: 0.8.0
|
7
|
+
date: 2007-11-25 00:00:00 -07:00
|
8
|
+
summary: Meta library for creating classes that turn custom text markup into XML-like tag hierarchies.
|
9
|
+
require_paths:
|
10
|
+
- lib
|
11
|
+
email: phrogz@mac.com
|
12
|
+
homepage:
|
13
|
+
rubyforge_project: tagtreescanner
|
14
|
+
description: The TagTreeScanner class provides a generic framework for creating a nested hierarchy of tags and text (like XML or HTML) by parsing text. An example use (and the reason it was written) is to convert a wiki markup syntax into HTML.
|
15
|
+
autorequire:
|
16
|
+
default_executable:
|
17
|
+
bindir: bin
|
18
|
+
has_rdoc: true
|
19
|
+
required_ruby_version: !ruby/object:Gem::Version::Requirement
|
20
|
+
requirements:
|
21
|
+
- - ">"
|
22
|
+
- !ruby/object:Gem::Version
|
23
|
+
version: 0.0.0
|
24
|
+
version:
|
25
|
+
platform: ruby
|
26
|
+
signing_key:
|
27
|
+
cert_chain:
|
28
|
+
post_install_message:
|
29
|
+
authors:
|
30
|
+
- Gavin Kistner
|
31
|
+
files:
|
32
|
+
- HISTORY
|
33
|
+
- Manifest.txt
|
34
|
+
- README
|
35
|
+
- Rakefile
|
36
|
+
- TODO
|
37
|
+
- lib/tagtreescanner.rb
|
38
|
+
- test/test_simplemarkup.rb
|
39
|
+
- test/test_tagtreescanner.rb
|
40
|
+
test_files:
|
41
|
+
- test/test_simplemarkup.rb
|
42
|
+
- test/test_tagtreescanner.rb
|
43
|
+
rdoc_options:
|
44
|
+
- --main
|
45
|
+
- README.txt
|
46
|
+
extra_rdoc_files:
|
47
|
+
- Manifest.txt
|
48
|
+
executables: []
|
49
|
+
|
50
|
+
extensions: []
|
51
|
+
|
52
|
+
requirements: []
|
53
|
+
|
54
|
+
dependencies:
|
55
|
+
- !ruby/object:Gem::Dependency
|
56
|
+
name: hoe
|
57
|
+
version_requirement:
|
58
|
+
version_requirements: !ruby/object:Gem::Version::Requirement
|
59
|
+
requirements:
|
60
|
+
- - ">="
|
61
|
+
- !ruby/object:Gem::Version
|
62
|
+
version: 1.3.0
|
63
|
+
version:
|