tagtreescanner 0.8.0

data/HISTORY ADDED
@@ -0,0 +1,17 @@
1
+ == 0.8.0 / 2007-November-25
2
+
3
+ * First release as a gem. Breaks backwards compatibility with older versions.
4
+
5
+ * Changed TagTreeScanner::Tag#tag_name to TagTreeScanner::Tag#name
6
+ * ...because it was dumb to write "tag.tag_name = 'span'"
7
+
8
+ * Added a method_missing hack to TagTreeScanner::Tag that delegates
9
+ to read/write from its attributes hash.
10
+ * ...because I wanted people to be able to write "tag.href = 'foo'"
11
+
12
+ * New TagTreeScanner::Tag#text= method to directly set the contents of
13
+ a tag, clearing out any other junk.
14
+
15
+ == 0.6.1 / 2005-July-5
16
+
17
+ * Initial public release
data/Manifest.txt ADDED
@@ -0,0 +1,8 @@
1
+ HISTORY
2
+ Manifest.txt
3
+ README
4
+ Rakefile
5
+ TODO
6
+ lib/tagtreescanner.rb
7
+ test/test_simplemarkup.rb
8
+ test/test_tagtreescanner.rb
data/README ADDED
@@ -0,0 +1,191 @@
1
+ <b>TagTreeScanner</b>
2
+
3
+ Author:: Gavin Kistner (mailto:phrogz@mac.com)
4
+ Copyright:: Copyright (c)2005-2007 Gavin Kistner
5
+ License:: MIT License
6
+ Version:: 0.8.0 (2007-November-24)
7
+
8
+ = Overview
9
+
10
+ The TagTreeScanner class provides a generic framework for creating a
11
+ nested hierarchy of tags and text (like XML or HTML) by parsing text. An
12
+ example use (and the reason it was written) is to convert a wiki markup
13
+ syntax into HTML.
14
+
15
+ = Example Usage
16
+ require 'tagtreescanner'
17
+
18
+ class SimpleMarkup < TagTreeScanner
19
+ @root_factory.allows_text = false
20
+
21
+ @tag_genres[ :root ] = [ ]
22
+
23
+ @tag_genres[ :root ] << TagFactory.new( :paragraph,
24
+ # A line that doesn't have whitespace at the start
25
+ :open_match => /(?=\S)/, :open_requires_bol => true,
26
+
27
+ # Close when you see a double return
28
+ :close_match => /\n[ \t]*\n/,
29
+ :allows_text => true,
30
+ :allowed_genre => :inline
31
+ )
32
+
33
+ @tag_genres[ :root ] << TagFactory.new( :preformatted,
34
+ # Grab all lines that are indented up until a line that isn't
35
+ :open_match => /((\s+).+?)\n+(?=\S)/m, :open_requires_bol => true,
36
+ :setup => lambda{ |tag, scanner, tagtree|
37
+ # Throw the contents I found into the tag
38
+ # but remove leading whitespace
39
+ tag << scanner[1].gsub( /^#{scanner[2]}/, '' )
40
+ },
41
+ :autoclose => true
42
+ )
43
+
44
+ @tag_genres[ :inline ] = [ ]
45
+
46
+ @tag_genres[ :inline ] << TagFactory.new( :bold,
47
+ # An asterisk followed by a letter or number
48
+ :open_match => /\*(?=[a-z0-9])/i,
49
+
50
+ # Close when I see an asterisk OR a newline coming up
51
+ :close_match => /\*|(?=\n)/,
52
+ :allows_text => true,
53
+ :allowed_genre => :inline
54
+ )
55
+
56
+ @tag_genres[ :inline ] << TagFactory.new( :italic,
57
+ # An underscore followed by a letter or number
58
+ :open_match => /_(?=[a-z0-9])/i,
59
+
60
+ # Close when I see an underscore OR a newline coming up
61
+ :close_match => /_|(?=\n)/,
62
+ :allows_text => true,
63
+ :allowed_genre => :inline
64
+ )
65
+ end
66
+
67
+ raw_text = <<ENDINPUT
68
+ Hello World! You're _soaking in_ my test.
69
+ This is a *subset* of markup that I allow.
70
+
71
+ Hi paragraph two. Yo! A code sample:
72
+
73
+ def foo
74
+ puts "Whee!"
75
+ end
76
+
77
+ _That, as they say, is that._
78
+
79
+ ENDINPUT
80
+
81
+ markup = SimpleMarkup.new( raw_text ).to_xml
82
+ puts markup
83
+
84
+
85
+ #=> <paragraph>Hello World! You're <italic>soaking in</italic> my test.
86
+ #=> This is a <bold>subset</bold> of markup that I allow.</paragraph>
87
+ #=> <paragraph>Hi paragraph two. Yo! A code sample:</paragraph>
88
+ #=> <preformatted>def foo
89
+ #=> puts "Whee!"
90
+ #=> end</preformatted>
91
+ #=> <paragraph><italic>That, as they say, is that.</italic></paragraph>
92
+
93
+ = Details
94
+
95
+ == TagFactories at 10,000 feet
96
+ Each possible output tag is described by a TagFactory, which specifies
97
+ some or all of the following:
98
+ * The name of the tags it creates <i>(required)</i>
99
+ * The regular expression to look for to start the tag
100
+ * The regular expression to look for to close the tag, or
101
+ * Whether the tag is automatically closed after creation
102
+ * What genre of tags are allowed within the tag
103
+ * Whether the tag supports raw text inside it
104
+ * Code to run when creating a tag
105
+
106
+ See the TagFactory class for more information on specifying factories.
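The "regular expression to look for to start the tag" can be pictured with plain StringScanner calls. The sketch below is not the library's code: `try_open` is a hypothetical helper that only mimics how an opener combined with `:open_requires_bol` might be gated on the scanner sitting at the beginning of a line.

```ruby
require 'strscan'

# Hypothetical helper (not part of TagTreeScanner): try to open a tag at
# the scanner's current position. When open_requires_bol is set, the
# opener is only attempted at the beginning of a line.
def try_open(scanner, open_match, open_requires_bol)
  return nil if open_requires_bol && !scanner.bol?
  scanner.scan(open_match)
end

ss = StringScanner.new("a\n  code")
at_start  = try_open(ss, /(?=\S)/, true)  # zero-width match at line start
ss.scan(/a/)                              # now mid-line, just before "\n"
mid_line  = try_open(ss, /\n/, true)      # refused: not at beginning of line
ss.scan(/\n/)                             # consume the newline
next_line = try_open(ss, /[ \t]+/, true)  # allowed again right after "\n"
```

Here `at_start` is an empty string (the lookahead consumes nothing), `mid_line` is `nil`, and `next_line` holds the two leading spaces.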
107
+
108
+ == Genres as a State Machine
109
+ As a new tag is opened, the scanner uses the Tag#allowed_genre property
110
+ of that tag (set by the +allowed_genre+ property on the TagFactory) to
111
+ determine which tags to look for. A genre is specified by adding
112
+ an array in the <tt>@tag_genres</tt> hash, whose key is the genre name.
113
+ For example:
114
+ @tag_genres[ :inline ] = [ ]
115
+ adds a new genre named 'inline', with no tags in it. TagFactory instances
116
+ should be pushed onto this array <b>in the order that they should be looked
117
+ for</b>. For example:
118
+ @tag_genres[ :inline ] << TagFactory.new( :italic,
119
+ # see the TagFactory#initialize for options
120
+ )
121
+
122
+ Note that the +close_match+ regular expression of the current tag is
123
+ always checked before looking to open/create any new tags.
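Why the order of factories within a genre matters can be illustrated without the library. The following is a minimal sketch, assuming a hypothetical `Factory` struct and two made-up openers (`:strong` and `:bold`); it only shows "first matching opener wins" and is not how TagTreeScanner itself is implemented.

```ruby
require 'strscan'

# Hypothetical stand-in for a TagFactory: just a name and an opener.
Factory = Struct.new(:name, :open_match)

# Return the name of the first factory whose opener matches at the
# scanner's current position, or nil if none of them match.
def first_opening(factories, scanner)
  factory = factories.find { |f| scanner.scan(f.open_match) }
  factory && factory.name
end

genres = {
  :inline => [
    Factory.new(:strong, /\*\*/),  # must be tried before the shorter opener
    Factory.new(:bold,   /\*/)
  ]
}

in_order = first_opening(genres[:inline], StringScanner.new("**loud**"))
reversed = first_opening(genres[:inline].reverse, StringScanner.new("**loud**"))
no_match = first_opening(genres[:inline], StringScanner.new("plain"))
```

With the intended order `in_order` is `:strong`; with the array reversed the one-asterisk opener fires first and `reversed` is `:bold`, so the two-asterisk tag would never open.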
124
+
125
+ == Consuming Text
126
+ As the text is being parsed, there will (probably) be many cases where
127
+ you have raw text that doesn't close or open any new tags. Whenever the
128
+ scanner reaches this state, it runs the <tt>@text_match</tt> regexp
129
+ against the text to move the pointer ahead. If the current tag has
130
+ <tt>Tag#allows_text?</tt> set to +true+ (through
131
+ <tt>TagFactory#allows_text</tt>), then this text is added as contents of
132
+ the tag. If not, the text is thrown away.
133
+
134
+ The safest regular expression consumes only one character at a time:
135
+ @text_match = /./m
136
+
137
+ <b><i>It is vital that your regexp match newlines</i></b> (the 'm')
138
+ <b><i>unless every single one of your tags is set to close upon seeing
139
+ a newline.</i></b>
140
+
141
+ Unfortunately, the safest regular expression is also the slowest. If
142
+ speed is an issue, your regexp should strive to eat as many characters as
143
+ possible at once...while ensuring that it doesn't eat characters that
144
+ would signify the start of a new tag.
145
+
146
+ For example, setting a regexp like:
147
+ @text_match = /\w+|./m
148
+ allows the scanner to match a whole word at a time. However, if you have
149
+ a tag factory set to look for "Hvv2vvO" to indicate a subscripted '2',
150
+ the entire string would be eaten as text and the subscript tag would
151
+ never start.
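The trade-off between the two text regexps can be reproduced with StringScanner alone. This is a rough sketch under stated assumptions: `SUBSCRIPT_OPEN` is a hypothetical opener for the "vv2vv" notation, and `scan_tokens` is a toy loop that tries the opener first and otherwise consumes text, which is only an approximation of what the real scanner does.

```ruby
require 'strscan'

# Hypothetical opener for the subscript notation: "vv", a digit, "vv".
SUBSCRIPT_OPEN = /vv(\d)vv/

# Toy scanner loop: try the tag opener at the current position; if it
# does not match, consume raw text using text_match instead.
def scan_tokens(input, text_match)
  ss = StringScanner.new(input)
  tokens = []
  until ss.eos?
    if ss.scan(SUBSCRIPT_OPEN)
      tokens << [:sub, ss[1]]
    else
      tokens << [:text, ss.scan(text_match)]
    end
  end
  tokens
end

greedy  = scan_tokens("Hvv2vvO", /\w+|./m)  # whole word eaten at once
careful = scan_tokens("Hvv2vvO", /./m)      # one character at a time
```

`greedy` ends up as a single text token containing the whole string, while `careful` yields text "H", a subscript "2", and text "O", because the opener gets a chance at every position.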
152
+
153
+ == Using the Scanner
154
+ As shown in the example above, consumers of your class initialize it by
155
+ passing in the string to be parsed, and then calling #to_xml or #to_html
156
+ on it.
157
+
158
+ <i>(This two-step process allows the consumer to run other code after
159
+ the tag parsing, before final conversion. Examples might include
160
+ replacing special command tags with other input, or performing database
161
+ lookups on special wiki-page-link tags and replacing with HTML
162
+ anchors.)</i>
163
+
164
+ = Requirements
165
+ TagTreeScanner is built on top of the StringScanner library that is part
166
+ of the standard Ruby installation.
167
+
168
+ = License
169
+
170
+ (The MIT License)
171
+
172
+ Copyright (c) 2005-2007 Gavin Kistner
173
+
174
+ Permission is hereby granted, free of charge, to any person obtaining
175
+ a copy of this software and associated documentation files (the
176
+ 'Software'), to deal in the Software without restriction, including
177
+ without limitation the rights to use, copy, modify, merge, publish,
178
+ distribute, sublicense, and/or sell copies of the Software, and to
179
+ permit persons to whom the Software is furnished to do so, subject to
180
+ the following conditions:
181
+
182
+ The above copyright notice and this permission notice shall be
183
+ included in all copies or substantial portions of the Software.
184
+
185
+ THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
186
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
187
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
188
+ IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
189
+ CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
190
+ TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
191
+ SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/Rakefile ADDED
@@ -0,0 +1,18 @@
1
+ # -*- ruby -*-
2
+
3
+ require 'rubygems'
4
+ require 'hoe'
5
+ require './lib/tagtreescanner.rb'
6
+
7
+ Hoe.new('tagtreescanner', TagTreeScanner::VERSION) do |p|
8
+ p.rubyforge_name = 'tagtreescanner'
9
+ p.author = 'Gavin Kistner'
10
+ p.email = 'phrogz@mac.com'
11
+ p.url = ''
12
+ p.summary = 'Meta library for creating classes that turn custom text markup into XML-like tag hierarchies.'
13
+ p.description = IO.read( 'README' )[ /= Overview\n(.+?)^=/m, 1 ].rstrip
14
+ p.changes = IO.read( 'HISTORY' )[ /^=[^\n]+\n+(.+?)^=/m, 1 ].rstrip
15
+ p.remote_rdoc_dir = ''
16
+ end
17
+
18
+ # vim: syntax=Ruby
data/TODO ADDED
@@ -0,0 +1,11 @@
1
+ * Overhaul Tag and TextNode and TagTreeScanner to use a common DOM module
2
+ like <tt>Phrogz::DOM::OrderedTreeNode</tt>.
3
+
4
+ * Allow TagFactories to explicitly specify multiple allowed genres
5
+ and/or allowed tags, rather than only one genre.
6
+
7
+ * Provide a method like inner_html= for parsing and creating tag content.
8
+ * Useful for batch replacing the contents of a single tag with output from
9
+ another program, while maintaining the DOM integrity.
10
+
11
+ * More unit tests
data/lib/tagtreescanner.rb ADDED
@@ -0,0 +1,851 @@
1
+ # This file covers the TagTreeScanner class, and the extensions to the
2
+ # String class needed by it.
3
+ # Please see the documentation on those classes for more information.
4
+ #
5
+ # Author:: Gavin Kistner (mailto:phrogz@mac.com)
6
+ # Copyright:: Copyright (c)2005-2007 Gavin Kistner
7
+ # License:: MIT License
8
+ # Version:: 0.8.0 (2007-November-24)
9
+
10
+ require 'strscan'
11
+
12
+ # = Overview
13
+ # The TagTreeScanner class provides a generic framework for creating a
14
+ # nested hierarchy of tags and text (like XML or HTML) by parsing text. An
15
+ # example use (and the reason it was written) is to convert a wiki markup
16
+ # syntax into HTML.
17
+ #
18
+ # = Example Usage
19
+ # require 'tagtreescanner'
20
+ #
21
+ # class SimpleMarkup < TagTreeScanner
22
+ # @root_factory.allows_text = false
23
+ #
24
+ # @tag_genres[ :root ] = [ ]
25
+ #
26
+ # @tag_genres[ :root ] << TagFactory.new( :paragraph,
27
+ # # A line that doesn't have whitespace at the start
28
+ # :open_match => /(?=\S)/, :open_requires_bol => true,
29
+ #
30
+ # # Close when you see a double return
31
+ # :close_match => /\n[ \t]*\n/,
32
+ # :allows_text => true,
33
+ # :allowed_genre => :inline
34
+ # )
35
+ #
36
+ # @tag_genres[ :root ] << TagFactory.new( :preformatted,
37
+ # # Grab all lines that are indented up until a line that isn't
38
+ # :open_match => /((\s+).+?)\n+(?=\S)/m, :open_requires_bol => true,
39
+ # :setup => lambda{ |tag, scanner, tagtree|
40
+ # # Throw the contents I found into the tag
41
+ # # but remove leading whitespace
42
+ # tag << scanner[1].gsub( /^#{scanner[2]}/, '' )
43
+ # },
44
+ # :autoclose => true
45
+ # )
46
+ #
47
+ # @tag_genres[ :inline ] = [ ]
48
+ #
49
+ # @tag_genres[ :inline ] << TagFactory.new( :bold,
50
+ # # An asterisk followed by a letter or number
51
+ # :open_match => /\*(?=[a-z0-9])/i,
52
+ #
53
+ # # Close when I see an asterisk OR a newline coming up
54
+ # :close_match => /\*|(?=\n)/,
55
+ # :allows_text => true,
56
+ # :allowed_genre => :inline
57
+ # )
58
+ #
59
+ # @tag_genres[ :inline ] << TagFactory.new( :italic,
60
+ # # An underscore followed by a letter or number
61
+ # :open_match => /_(?=[a-z0-9])/i,
62
+ #
63
+ # # Close when I see an underscore OR a newline coming up
64
+ # :close_match => /_|(?=\n)/,
65
+ # :allows_text => true,
66
+ # :allowed_genre => :inline
67
+ # )
68
+ # end
69
+ #
70
+ # raw_text = <<ENDINPUT
71
+ # Hello World! You're _soaking in_ my test.
72
+ # This is a *subset* of markup that I allow.
73
+ #
74
+ # Hi paragraph two. Yo! A code sample:
75
+ #
76
+ # def foo
77
+ # puts "Whee!"
78
+ # end
79
+ #
80
+ # _That, as they say, is that._
81
+ #
82
+ # ENDINPUT
83
+ #
84
+ # markup = SimpleMarkup.new( raw_text ).to_xml
85
+ # puts markup
86
+ #
87
+ #
88
+ # #=> <paragraph>Hello World! You're <italic>soaking in</italic> my test.
89
+ # #=> This is a <bold>subset</bold> of markup that I allow.</paragraph>
90
+ # #=> <paragraph>Hi paragraph two. Yo! A code sample:</paragraph>
91
+ # #=> <preformatted>def foo
92
+ # #=> puts "Whee!"
93
+ # #=> end</preformatted>
94
+ # #=> <paragraph><italic>That, as they say, is that.</italic></paragraph>
95
+ #
96
+ #
97
+ # = Details
98
+ #
99
+ # == TagFactories at 10,000 feet
100
+ # Each possible output tag is described by a TagFactory, which specifies
101
+ # some or all of the following:
102
+ # * The name of the tags it creates <i>(required)</i>
103
+ # * The regular expression to look for to start the tag
104
+ # * The regular expression to look for to close the tag, or
105
+ # * Whether the tag is automatically closed after creation
106
+ # * What genre of tags are allowed within the tag
107
+ # * Whether the tag supports raw text inside it
108
+ # * Code to run when creating a tag
109
+ #
110
+ # See the TagFactory class for more information on specifying factories.
111
+ #
112
+ # == Genres as a State Machine
113
+ # As a new tag is opened, the scanner uses the Tag#allowed_genre property
114
+ # of that tag (set by the +allowed_genre+ property on the TagFactory) to
115
+ # determine which tags to look for. A genre is specified by adding
116
+ # an array in the <tt>@tag_genres</tt> hash, whose key is the genre name.
117
+ # For example:
118
+ # @tag_genres[ :inline ] = [ ]
119
+ # adds a new genre named 'inline', with no tags in it. TagFactory instances
120
+ # should be pushed onto this array <b>in the order that they should be looked
121
+ # for</b>. For example:
122
+ # @tag_genres[ :inline ] << TagFactory.new( :italic,
123
+ # # see the TagFactory#initialize for options
124
+ # )
125
+ #
126
+ # Note that the +close_match+ regular expression of the current tag is
127
+ # always checked before looking to open/create any new tags.
128
+ #
129
+ # == Consuming Text
130
+ # As the text is being parsed, there will (probably) be many cases where
131
+ # you have raw text that doesn't close or open any new tags. Whenever the
132
+ # scanner reaches this state, it runs the <tt>@text_match</tt> regexp
133
+ # against the text to move the pointer ahead. If the current tag has
134
+ # <tt>Tag#allows_text?</tt> set to +true+ (through
135
+ # <tt>TagFactory#allows_text</tt>), then this text is added as contents of
136
+ # the tag. If not, the text is thrown away.
137
+ #
138
+ # The safest regular expression consumes only one character at a time:
139
+ # @text_match = /./m
140
+ #
141
+ # <b><i>It is vital that your regexp match newlines</i></b> (the 'm')
142
+ # <b><i>unless every single one of your tags is set to close upon seeing
143
+ # a newline.</i></b>
144
+ #
145
+ # Unfortunately, the safest regular expression is also the slowest. If
146
+ # speed is an issue, your regexp should strive to eat as many characters as
147
+ # possible at once...while ensuring that it doesn't eat characters that
148
+ # would signify the start of a new tag.
149
+ #
150
+ # For example, setting a regexp like:
151
+ # @text_match = /\w+|./m
152
+ # allows the scanner to match a whole word at a time. However, if you have
153
+ # a tag factory set to look for "Hvv2vvO" to indicate a subscripted '2',
154
+ # the entire string would be eaten as text and the subscript tag would
155
+ # never start.
156
+ #
157
+ # == Using the Scanner
158
+ # As shown in the example above, consumers of your class initialize it by
159
+ # passing in the string to be parsed, and then calling #to_xml or #to_html
160
+ # on it.
161
+ #
162
+ # <i>(This two-step process allows the consumer to run other code after
163
+ # the tag parsing, before final conversion. Examples might include
164
+ # replacing special command tags with other input, or performing database
165
+ # lookups on special wiki-page-link tags and replacing with HTML
166
+ # anchors.)</i>
167
+ class TagTreeScanner
168
+ VERSION = "0.8.0"
169
+
170
+ # A TagFactory holds the information about a specific kind of tag:
171
+ # * the name of the tag
172
+ # * what to look for to open and close the tag
173
+ # * what genre of tags it may contain
174
+ # * whether the tag permits raw text
175
+ # * additional code to run when creating the tag
176
+ #
177
+ # See the documentation about the <tt>@tag_genres</tt> hash inside
178
+ # the TagTreeScanner class for information on how to add factories
179
+ # for use.
180
+ #
181
+ # === Utilizing <tt>:autoclose</tt>
182
+ # Occasionally you will want to
183
+ # create a tag and allow no other tags inside it. An example might be
184
+ # a tag containing preformatted code.
185
+ #
186
+ # Rather than opening the tag and slowly spinning through all the
187
+ # text, the combination of the <tt>:autoclose</tt> and
188
+ # <tt>:setup</tt> options allows you to create the tag, fill it with
189
+ # content, and then immediately continue with the parent tag.
190
+ #
191
+ # See the #new method for how to use the <tt>:setup</tt>
192
+ # function, and an example usage.
193
+ class TagFactory
194
+ # The type of tag this factory produces.
195
+ attr_accessor :tag_name
196
+
197
+ # A regexp to match (and consume) that causes a new tag to be started.
198
+ attr_accessor :open_match
199
+
200
+ # Does the #open_match regexp require beginning of line?
201
+ attr_accessor :open_requires_bol
202
+
203
+ # The regexp which causes the tag to automatically close.
204
+ attr_accessor :close_match
205
+
206
+ # Does the #close_match regexp require beginning of line?
207
+ attr_accessor :close_requires_bol
208
+
209
+ # Should this tag stay open when created, or automatically close?
210
+ attr_accessor :autoclose
211
+
212
+ # A symbol with the genre of tags that are allowed inside the tag.
213
+ # <i>(See @tag_genres in the TagTreeScanner documentation.)</i>
214
+ attr_accessor :allowed_genre
215
+
216
+ # May tags created by this factory have text added to them?
217
+ attr_accessor :allows_text
218
+
219
+ # __tag_name__:: A symbol with the name of the tag to create
220
+ # __options__:: A hash including one or more of <tt>:open_match</tt>,
221
+ # <tt>:open_requires_bol</tt>, <tt>:close_match</tt>,
222
+ # <tt>:close_requires_bol</tt>, <tt>:autoclose</tt>,
223
+ # <tt>:allows_text</tt>, <tt>:allowed_genre</tt>, and
224
+ # <tt>:setup</tt>.
225
+ #
226
+ # Due to the way the StringScanner class works, placing a <tt>^</tt>
227
+ # (beginning of line) marker in your <tt>:open_match</tt> or
228
+ # <tt>:close_match</tt> regular expressions will not behave as
229
+ # desired. Instead, set the <tt>:open_requires_bol</tt> and/or
230
+ # <tt>:close_requires_bol</tt> properties to +true+ if desired.
231
+ #
232
+ # A factory should either be set to <tt>:autoclose => true</tt>, or
233
+ # supply a <tt>:close_match</tt>. (Otherwise, it will never close.)
234
+ #
235
+ # Further, a factory should either be set to
236
+ # <tt>:autoclose => true</tt> or specify an <tt>:allowed_genre</tt>.
237
+ # <i>(See below for how to efficiently create a tag that cannot
238
+ # contain other tags.)</i>
239
+ #
240
+ # The <tt>:setup</tt> option is used to run code during the tag
241
+ # creation. The value of this option should be a lambda/Proc that
242
+ # accepts three parameters:
243
+ # * the <b>Tag</b> being created
244
+ # * the <b>StringScanner</b> instance that matched the tag opening
245
+ # * the <b>TagTreeScanner</b> instance creating the tag.
246
+ #
247
+ # === Example:
248
+ # # Shove URLs as HTML anchors, without the protocol prefix shown
249
+ # @tag_genres[ :inline ] << TagFactory.new( :a,
250
+ # :open_match => %r{http://(\S+)},
251
+ # :setup => lambda{ |tag, ss, tagtree|
252
+ # tag.attributes[ :href ] = ss[0]
253
+ # tag << ss[1]
254
+ # },
255
+ # :autoclose => true
256
+ # )
257
+ def initialize( tag_name, options={} )
258
+ @tag_name = tag_name
259
+ [ :open_match, :close_match,
260
+ :open_requires_bol, :close_requires_bol,
261
+ :allowed_genre, :autoclose,
262
+ :allows_text,
263
+ :setup, :attributes ].each{ |k|
264
+ self.instance_variable_set( "@#{k}".intern, options[ k ] )
265
+ }
266
+ end
267
+
268
+ # Creates and returns a new tag if the supplied _string_scanner_
269
+ # matches the +open_match+ of this factory.
270
+ #
271
+ # Called by TagTreeScanner during initialization.
272
+ def match( string_scanner, tagtreescanner ) #:nodoc:
273
+ #puts "Matching #{@open_match.inspect} against #{string_scanner.peek(10)}"
274
+ return nil unless ( !@open_requires_bol || string_scanner.bol? ) && string_scanner.scan( @open_match )
275
+ tag = maketag
276
+ @setup.call( tag, string_scanner, tagtreescanner ) if @setup
277
+ #puts "...created #{tag}"
278
+ tag
279
+ end
280
+
281
+ # Creates a tag from the factory manually
282
+ def create #:nodoc:
283
+ tag = maketag
284
+ @setup.call( tag, nil, nil ) if @setup
285
+ tag
286
+ end
287
+
288
+ private
289
+ # DRY common code
290
+ def maketag #:nodoc:
291
+ tag = Tag.new( @tag_name )
292
+ tag.factory = self
293
+ tag.attributes = @attributes if @attributes
294
+ tag
295
+ end
296
+ end
297
+
298
+ # Tags are the equivalent of a DOM Element. The majority of tags
299
+ # are created automatically by a TagFactory, but it may be
300
+ # necessary to create them directly in order to augment or replace
301
+ # information in the tag tree.
302
+ #
303
+ # A Tag may have one or more attributes, which are pairs of
304
+ # key/value strings; attributes are output in the HTML or XML
305
+ # representation of the Tag.
306
+ #
307
+ # Each tag also has an <tt>info</tt> hash, which may be used to
308
+ # keep track of extra bits of information about a tag. <i>Example
309
+ # usages might be keeping track of the depth of a list item, or the
310
+ # associated section for a header.</i> Information from the +info+
311
+ # hash is not output in the HTML or XML representations.
312
+ class Tag
313
+ # A symbol with the name of this tag
314
+ attr_accessor :name
315
+
316
+ # An array of child Tag or TextNode instances
317
+ attr_accessor :child_tags
318
+
319
+ # A hash of key/value attributes to emit in the XML/HTML
320
+ # representation
321
+ attr_accessor :attributes
322
+
323
+ # The TagFactory that created this tag (may be +nil+)
324
+ attr_accessor :factory
325
+
326
+ # A hash that may be used to store extra information about a Tag
327
+ attr_accessor :info
328
+
329
+ # The Tag to which this tag is attached (may be +nil+)
330
+ attr_reader :parent_tag
331
+
332
+ # The Tag or TextNode which immediately follows this tag
333
+ # (may be +nil+ if this is the last tag of its parent)
334
+ attr_reader :next_sibling
335
+
336
+ # The Tag or TextNode which immediately precedes this tag
337
+ # (may be +nil+ if this is the first tag of its parent)
338
+ attr_reader :previous_sibling
339
+
340
+ # _name_:: A symbol with the name of this tag
341
+ # _attributes_:: A hash of key/value pairs to store with this tag
342
+ def initialize( name, attributes={} )
343
+ @name = name
344
+ @child_tags = [ ]
345
+ @attributes = attributes
346
+ @info = {}
347
+ end
348
+
349
+ # Allows for setting HTML or XML-like attributes directly without
350
+ # knowing about the _attributes_ collection. For example:
351
+ # tag.href = 'http://www.google.com'
352
+ # tag.class = 'external'
353
+ # is the same as:
354
+ # tag.attributes['href'] = 'http://www.google.com'
355
+ # tag.attributes['class'] = 'external'
356
+ # ...for any attributes (like the above) that don't have the same
357
+ # name as an existing method or attribute on the Tag class.
358
+ def method_missing( name, *args )
359
+ if (name=name.to_s) =~ /=$/
360
+ @attributes[ name[0...-1] ] = (args.size==1 ? args[0] : args )
361
+ else
362
+ @attributes[ name ]
363
+ end
364
+ end
365
+
366
+ # Returns the +close_match+ property of the owning TagFactory,
367
+ # or +nil+ if this tag wasn't created by a factory.
368
+ def close_match
369
+ @factory && @factory.close_match
370
+ end
371
+
372
+ # Returns the +close_requires_bol+ property of the owning TagFactory,
373
+ # or +nil+ if this tag wasn't created by a factory.
374
+ def close_requires_bol?
375
+ @factory && @factory.close_requires_bol
376
+ end
377
+
378
+ # Returns the +autoclose+ property of the owning TagFactory,
379
+ # or +nil+ if this tag wasn't created by a factory.
380
+ def autoclose?
381
+ @factory && @factory.autoclose
382
+ end
383
+
384
+ # Returns the +allows_text+ property of the owning TagFactory,
385
+ # or +true+ if this tag wasn't created by a factory.
386
+ def allows_text?
387
+ @factory ? @factory.allows_text : true
388
+ end
389
+
390
+ # Returns the +allowed_genre+ property of the owning TagFactory,
391
+ # or +nil+ if this tag wasn't created by a factory.
392
+ def allowed_genre
393
+ @factory && @factory.allowed_genre
394
+ end
395
+
396
+ # _new_child_:: The Tag or TextNode to add as the last child.
397
+ #
398
+ # Adds _new_child_ to the end of this tag's +child_tags+ collection.
399
+ # Returns a reference to _new_child_.
400
+ #
401
+ # If _new_child_ is a child of another Tag, it is first removed from
402
+ # that tag.
403
+ def append_child( new_child )
404
+ return new_child if new_child == @child_tags.last
405
+ insert_after( new_child, @child_tags.last )
406
+ end
407
+
408
+ # _new_child_:: The Tag or TextNode to add as a child of this tag.
409
+ # _reference_child_:: The child to place _new_child_ before.
410
+ #
411
+ # Adds _new_child_ as a child of this tag, immediately before the
412
+ # location of _reference_child_. Returns a reference to _new_child_.
413
+ #
414
+ # If _reference_child_ is +nil+, the child is added as the last
415
+ # child of this tag. A RuntimeError is raised if _reference_child_
416
+ # is not a child of this tag.
417
+ #
418
+ # If _new_child_ is a child of another Tag, #remove_child is
419
+ # automatically invoked to remove it from that tag.
420
+ def insert_before( new_child, reference_child=nil )
421
+ return new_child if reference_child ? ( reference_child.previous_sibling == new_child ) : ( new_child == @child_tags.last )
422
+ insert_after( new_child, reference_child ? reference_child.previous_sibling : @child_tags.last )
423
+ end
424
+
425
+ # _new_child_:: The Tag or TextNode to add as a child of this tag.
426
+ # _reference_child_:: The child to place _new_child_ after.
427
+ #
428
+ # Adds _new_child_ as a child of this tag, immediately after the
429
+ # location of _reference_child_. Returns a reference to _new_child_.
430
+ #
431
+ # If _reference_child_ is +nil+, the child is added as the first
432
+ # child of this tag. A RuntimeError is raised if _reference_child_
433
+ # is not a child of this tag.
434
+ #
435
+ # If _new_child_ is a child of another Tag, #remove_child is
436
+ # automatically invoked to remove it from that tag.
437
+ def insert_after( new_child, reference_child=nil )
438
+ #puts "#{self.inspect}#insert_after( #{new_child.inspect}, #{reference_child.inspect} )"
439
+ return new_child if reference_child ? ( reference_child.next_sibling == new_child ) : ( new_child == @child_tags.first )
440
+
441
+ # Ensure new_child is not an ancestor of self
442
+ walker = self
443
+ while walker
444
+ raise "#{new_child.inspect} cannot be added under #{self.inspect}, because it is an ancestor of it!" if walker==new_child
445
+ walker = walker.parent_tag
446
+ end
447
+
448
+ new_child.parent_tag.remove_child( new_child ) if new_child.parent_tag
449
+ if reference_child
450
+ new_idx = @child_tags.index( reference_child )
451
+ raise "#{reference_child.inspect} is not a child of #{self.inspect}" unless new_idx
452
+ new_idx += 1
453
+ else
454
+ new_idx = 0
455
+ end
456
+ new_child.parent_tag = self
457
+ succ = @child_tags[ new_idx ]
458
+ @child_tags.insert( new_idx, new_child )
459
+ new_child.previous_sibling = reference_child
460
+ reference_child.next_sibling = new_child if reference_child
461
+ new_child.next_sibling = succ
462
+ succ.previous_sibling = new_child if succ
463
+ new_child
464
+ end
465
+
466
+ # _existing_child_:: The Tag or TextNode to remove.
467
+ #
468
+ # Removes _existing_child_ from being a child of this tag.
469
+ # Returns _existing_child_.
470
+ #
471
+ # A RuntimeError is raised if _existing_child_ is not a child of
472
+ # this tag.
473
+ #
474
+ # The siblings on either side of _existing_child_ are re-linked to each
475
+ # other, and _existing_child_ is left with no parent or siblings.
476
+ def remove_child( existing_child )
477
+ idx = @child_tags.index( existing_child )
478
+ raise "#{existing_child.inspect} is not a child of #{self.inspect}" unless idx
479
+ prev, succ = existing_child.previous_sibling, existing_child.next_sibling
480
+ prev.next_sibling = succ if prev
481
+ succ.previous_sibling = prev if succ
482
+ @child_tags.delete_at( idx )
483
+ existing_child.previous_sibling = existing_child.next_sibling = existing_child.parent_tag = nil
484
+ existing_child
485
+ end
486
+
487
+ # _old_child_:: The existing child Tag or TextNode to replace.
488
+ # _new_child_:: The Tag or TextNode to replace _old_child_.
489
+ #
490
+ # Replaces _old_child_ with _new_child_ in this collection.
491
+ # Returns _old_child_.
492
+ #
493
+ # A RuntimeError is raised if _old_child_ is not a child of
494
+ # this tag.
495
+ #
496
+ # If _new_child_ is a child of another Tag, #remove_child is
497
+ # automatically invoked to remove it from that tag.
498
+ def replace_child( old_child, new_child )
499
+ if ( prev = old_child.previous_sibling ) == new_child || old_child.next_sibling == new_child
500
+ remove_child( old_child )
501
+ else
502
+ new_child.parent_tag.remove_child( new_child ) if new_child.parent_tag
503
+ remove_child( old_child )
504
+ insert_after( new_child, prev )
505
+ end
506
+ old_child
507
+ end
508
+
509
+ # _new_child_:: The Tag or TextNode to replace this tag.
510
+ #
511
+ # Replaces this tag with _new_child_. Returns _new_child_.
512
+ #
513
+ # A RuntimeError is raised if this tag is not a child of another tag.
514
+ #
515
+ # If _new_child_ is a child of another Tag, #remove_child is
516
+ # automatically invoked to remove it from that tag.
517
+ def replace_with( new_child )
518
+ return new_child if new_child == self
519
+ raise "#{self.inspect} is not a child of another tag" unless @parent_tag
520
+ @parent_tag.replace_child( self, new_child )
521
+ new_child
522
+ end
523
+
524
+ # _additional_text_:: The text to add to this node.
525
+ #
526
+ # Appends _additional_text_ to this tag. If the last item in the
527
+ # +child_tags+ collection is a TextNode, the text is added to that
528
+ # item; otherwise, a new TextNode is created with _additional_text_
529
+ # and added as the last child of this tag.
530
+ def << ( additional_text )
531
+ last_child = @child_tags.last
532
+ if last_child.is_a? TextNode
533
+ last_child << additional_text
534
+ else
535
+ append_child( TextNode.new( additional_text ) )
536
+ end
537
+ end
538
+
539
+ # Set the text content of this element to _new_contents_
540
+ # Removes any child tags (and their text)
541
+ def text=( new_contents )
542
+ @child_tags.clear
543
+ append_child( TextNode.new( new_contents ) )
544
+ end
545
+
546
+ alias_method :inner_text=, :text=
547
+
548
+ # Returns an HTML representation of this tag and all its descendants.
549
+ #
550
+ # This method is the same as #to_xml except that tags without
551
+ # any +child_tags+ use an explicit close tag, e.g.
552
+ # <tt><div></div></tt> instead of XML's <tt><div /></tt>
553
+ def to_html
554
+ to_xml( false )
555
+ end
556
+
557
+ # Returns an XML representation of this tag and all its descendants.
558
+ #
559
+ # If _empty_tags_collapsed_ is +true+ (the default) then this method
560
+ # is the same as #to_html except that tags without any +child_tags+
561
+ # use a single closed tag, e.g.
562
+ # <tt><div /></tt> instead of HTML's <tt><div></div></tt>
563
+ #
564
+ # If _empty_tags_collapsed_ is +false+, this is the same as #to_html.
565
+ def to_xml( empty_tags_collapsed=true )
566
+ out = "<#{@name}"
567
+ @attributes.each{ |k,v| out << " #{k}=\"#{v.to_s.gsub( '"', '&quot;' )}\"" }
568
+ if empty_tags_collapsed && @child_tags.empty?
569
+ out << ' />'
570
+ else
571
+ out << '>'
572
+ unless @child_tags.empty?
573
+ out << "\n" unless self.allows_text?
574
+ @child_tags.each{ |tag|
575
+ out << tag.to_xml( empty_tags_collapsed )
576
+ }
577
+ end
578
+ out << "</#{@name}>"
579
+ end
580
+ out << "\n" if @parent_tag && !@parent_tag.allows_text?
581
+ out
582
+ end
583
+
584
+ # Returns an array of all descendants of this tag whose #name
585
+ # matches the supplied _name_.
586
+ def tags_by_name( name )
587
+ out = []
588
+ @child_tags.each{ |tag|
589
+ out << tag if tag.name == name
590
+ unless tag.child_tags.empty?
591
+ out.concat( tag.tags_by_name( name ) )
592
+ end
593
+ }
594
+ out
595
+ end
596
+
597
+ # Returns the text contents of this tag and its descendants.
598
+ def inner_text
599
+ @child_tags.inject(''){ |out,tag|
600
+ out << ( tag.is_a?( TextNode ) ? tag.text : tag.inner_text )
601
+ }
602
+ end
603
+
604
+ def inspect #:nodoc:
605
+ out = "<#{@name}"
606
+ #out << " @pops=#{@parent_tag ? @parent_tag.name.inspect : 'nil'}"
607
+ #out << " @prev=#{@previous_sibling ? @previous_sibling.name.inspect : 'nil'}"
608
+ #out << " @next=#{@next_sibling ? @next_sibling.name.inspect : 'nil'}"
609
+ @attributes.each{ |k,v| out << " #{k}=\"#{v}\"" }
610
+ @info.each{ |k,v| out << " @#{k}=>#{v.inspect}" }
611
+ children = @child_tags.length
612
+ if children == 1 && TextNode === @child_tags.first
613
+ out << ">#{@child_tags.first}</#{@name}>"
614
+ elsif children == 0
615
+ out << '>'
616
+ else
617
+ out << " (#{@child_tags.length} child#{@child_tags.length != 1 ? 'ren' : ''})>"
618
+ end
619
+ end
620
+
621
+ # _level_:: The indentation level (tabs) to start at.
622
+ #
623
+ # Returns a full-hierarchical representation of this tag and its
624
+ # descendants. (Used for debugging.)
625
+ def to_hier( level=0 ) #:nodoc:
626
+ tabs = "\t" * level
627
+ out = "#{tabs}<#{@name}"
628
+ @attributes.each{ |k,v| out << " #{k}=\"#{v}\"" }
629
+ @info.each{ |k,v| out << " @#{k}=>#{v.inspect}" }
630
+ if @child_tags.empty?
631
+ out << " />\n"
632
+ elsif @child_tags.length == 1 && TextNode === @child_tags.first
633
+ out << ">#{@child_tags.first}</#{@name}>\n"
634
+ else
635
+ out << ">\n"
636
+ @child_tags.each{ |n| out << n.to_hier(level+1) }
637
+ out << "#{tabs}</#{@name}>\n"
638
+ end
639
+ out
640
+ end
641
+
642
+ # Returns a copy of this tag and its entire hierarchy.
643
+ # All descendant tags/text nodes are also cloned.
644
+ #
645
+ # The +info+ hash is not preserved.
646
+ def dup
647
+ tag = self.class.new( self.name, self.attributes.dup )
648
+ @child_tags.each{ |tag2| tag.append_child( tag2.dup ) }
649
+ tag
650
+ end
651
+
652
+ # :stopdoc:
653
+ protected
654
+ attr_writer :previous_sibling, :next_sibling, :parent_tag
655
+ # :startdoc:
656
+
657
+ end
658
+
659
+ # A TextNode holds raw text inside a Tag. Generally, TextNodes are
660
+ # created automatically by the Tag#<< method.
661
+ class TextNode
662
+ # The Tag or TextNode that comes after this one (may be +nil+)
663
+ attr_accessor :next_sibling
664
+
665
+ # The Tag or TextNode that comes before this one (may be +nil+)
666
+ attr_accessor :previous_sibling
667
+
668
+ # The Tag that is a parent of this TextNode (may be +nil+)
669
+ attr_accessor :parent_tag
670
+
671
+ # A hash which may be used to store 'extra' information
672
+ attr_accessor :info
673
+
674
+ # The string contents of this text node
675
+ attr_accessor :text
676
+
677
+ # _text_:: The text to start out with
678
+ def initialize( text='' )
679
+ @text = text
680
+ @info = {}
681
+ end
682
+
683
+ # _additional_text_:: The text to add
684
+ #
685
+ # Appends the provided text to the end of the current text
686
+ #
687
+ # Returns the new text value
688
+ def << ( additional_text )
689
+ @text << additional_text
690
+ end
691
+
692
+ # Returns a copy of this text node
693
+ def dup
694
+ self.class.new( @text.dup )
695
+ end
696
+
697
+ def to_hier( level=0 ) #:nodoc:
698
+ "#{"\t"*level}#{@text.inspect}\n"
699
+ end
700
+
701
+ def to_s #:nodoc:
702
+ @text
703
+ end
704
+
705
+ # Returns the contents of this node, modified to be made XML-safe
706
+ # by calling String#xmlsafe.
707
+ def to_xml( *args )
708
+ @text.xmlsafe
709
+ end
710
+ end
711
+
712
+ # RDoc thinks that this stuff applies to instances, not the class
713
+ # :stopdoc:
714
+ class << self
715
+ attr_accessor :tag_genres, :root_factory, :text_match
716
+ end
717
+ # :startdoc:
718
+
719
+ # The tag_genres hash maps a genre name onto an array of TagFactories.
720
+ #
721
+ # Factories are tested in the order they appear in the genre array;
722
+ # more important matches are at the top, generic fallback ones
723
+ # should appear at the end of the list.
724
+ #
725
+ # If no factory matches the current input, then text is shoved into the
726
+ # most recent tag until a new tag start is found, or the closing match
727
+ # is met. (If the current tag's factory does not have :allows_text set
728
+ # to true, then the text is simply thrown away until the closing match or a
729
+ # new tag start is found.)
730
+ @tag_genres = { }
731
+
732
+ # Settings for the root of your document: what genre is allowed at the
733
+ # highest level, and should raw text be allowed there?
734
+ #
735
+ # Override in your class by setting a class-instance variable as below.
736
+ @root_factory = TagFactory.new( :root,
737
+ :allowed_genre => :root,
738
+ :allows_text => true )
739
+
740
+ # The pattern to consume and shove as text whenever no tag start/close
741
+ # is found. Eating one character at a time is safest, but slow.
742
+ # Ensure that this pattern never skips over the start of a tag,
743
+ # or else you'll miss it.
744
+ @text_match = /./m
745
+
746
+ # Scans through _string_to_parse_ and builds a tree of tags based
747
+ # on the regular expressions and rules set by the TagFactory
748
+ # instances present in <tt>@tag_genres</tt>.
749
+ #
750
+ # After parsing the tree, call #to_xml or #to_html to retrieve
751
+ # a string representation.
752
+ def initialize( string_to_parse )
753
+ current = @root = self.class.root_factory.create
754
+ tag_genres = self.class.tag_genres
755
+ text_match = self.class.text_match
756
+
757
+ ss = StringScanner.new( string_to_parse )
758
+ while !ss.eos?
759
+ # Keep popping off the current tag until we get to the root,
760
+ # as long as the end criterion is met
761
+ while ( current != @root ) && (!current.close_requires_bol? || ss.bol?) && ss.scan( current.close_match )
762
+ current = current.parent_tag || @root
763
+ end
764
+
765
+ # No point in continuing if closing out tags consumed the rest of the string
766
+ break if ss.eos?
767
+
768
+ # Look for a tag to open
769
+ if factories = tag_genres[ current.allowed_genre ]
770
+ tag = nil
771
+ factories.each{ |factory|
772
+ if tag = factory.match( ss, self )
773
+ current.append_child( tag )
774
+ current = tag unless tag.autoclose?
775
+ break
776
+ end
777
+ }
778
+ # Start again at the top of the loop if we found one
779
+ next if tag
780
+ end
781
+
782
+ # Couldn't find a valid tag at this spot
783
+ # so we need to eat some characters
784
+ consumed = ss.scan( text_match )
785
+ current << consumed if current.allows_text?
786
+ end
787
+ end
788
+
789
+ # Returns an HTML representation of the tag tree.
790
+ #
791
+ # This is the same as the #to_xml method except that empty tags use an
792
+ # explicit close tag, e.g. <tt><div></div></tt> versus <tt><div /></tt>
793
+ def to_html
794
+ @root.child_tags.inject(''){ |out, tag| out << tag.to_html }
795
+ end
796
+
797
+ # Returns an XML representation of the tag tree.
798
+ #
799
+ # This method is the same as the #to_html method except that empty tags
800
+ # do not use an explicit close tag,
801
+ # e.g. <tt><div /></tt> versus <tt><div></div></tt>
802
+ def to_xml
803
+ @root.child_tags.inject(''){ |out, tag| out << tag.to_xml }
804
+ end
805
+
806
+ # Returns an array of all root-level tags found
807
+ def tags
808
+ @root.child_tags
809
+ end
810
+
811
+ # Returns an array of all tags in the tree whose Tag#name matches
812
+ # the supplied _name_.
813
+ def tags_by_name( name )
814
+ @root.tags_by_name( name )
815
+ end
816
+
817
+ # Returns a hierarchical representation of the entire tag tree
818
+ def inspect #:nodoc:
819
+ @root.to_hier
820
+ end
821
+
822
+ # When a class inherits from TagTreeScanner, defaults are set for
823
+ # <tt>@tag_genres</tt>, <tt>@root_factory</tt> and
824
+ # <tt>@text_match</tt>
825
+ def self.inherited( child_class ) #:nodoc:
826
+ child_class.tag_genres = @tag_genres
827
+ child_class.root_factory = @root_factory
828
+ child_class.text_match = @text_match
829
+ end
830
+ end
831
+
832
+ # Extensions to the built-in String class
833
+ class String
834
+
835
+ # Returns a copy of the string with all <tt>&</tt>, <tt><</tt> and
836
+ # <tt>></tt> characters replaced by their equivalent XML entities
837
+ # (<tt>&amp;</tt>, <tt>&lt;</tt>, and <tt>&gt;</tt>)
838
+ def xmlsafe
839
+ self.dup.xmlsafe!
840
+ end
841
+
842
+ # Modifies the string, replacing all <tt>&</tt>, <tt><</tt> and
843
+ # <tt>></tt> characters with their equivalent XML entities
844
+ # (<tt>&amp;</tt>, <tt>&lt;</tt>, and <tt>&gt;</tt>)
845
+ def xmlsafe!
846
+ self.gsub!( /&/, '&amp;' )
847
+ self.gsub!( /</, '&lt;' )
848
+ self.gsub!( />/, '&gt;' )
849
+ self
850
+ end
851
+ end
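The order of the substitutions in String#xmlsafe! matters: the ampersand has to be replaced first, or the entities produced by the later substitutions would be escaped a second time. The following is a minimal standalone sketch of that ordering, plus the extra step attribute values need. Both helper names (xml_escape, xml_escape_attribute) are hypothetical and are not part of tagtreescanner.

```ruby
# Hypothetical helper, not part of tagtreescanner: escapes text for use
# as XML character data and returns a new string.
def xml_escape( text )
  # '&' is handled first; otherwise the '&' in '&lt;' or '&gt;' would
  # itself be escaped by a later pass.
  text.gsub( '&', '&amp;' ).gsub( '<', '&lt;' ).gsub( '>', '&gt;' )
end

# Hypothetical helper for attribute values: these are wrapped in double
# quotes when serialized, so a literal double quote is escaped as well.
def xml_escape_attribute( value )
  xml_escape( value.to_s ).gsub( '"', '&quot;' )
end

puts xml_escape( 'a < b && c > d' )
puts xml_escape_attribute( 'say "hi" & wave' )
```

Unlike xmlsafe!, this sketch never modifies its argument, which may be preferable when the same string is reused elsewhere.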
@@ -0,0 +1,84 @@
1
+ require "test/unit"
2
+ require "../lib/tagtreescanner.rb"
3
+
4
+ class SimpleMarkup < TagTreeScanner
5
+ @root_factory.allows_text = false
6
+
7
+ @tag_genres[ :root ] = [ ]
8
+
9
+ @tag_genres[ :root ] << TagFactory.new( :paragraph,
10
+ # A line that doesn't have whitespace at the start
11
+ :open_match => /(?=\S)/, :open_requires_bol => true,
12
+
13
+ # Close when you see a double return
14
+ :close_match => /\n[ \t]*\n/,
15
+ :allows_text => true,
16
+ :allowed_genre => :inline
17
+ )
18
+
19
+ @tag_genres[ :root ] << TagFactory.new( :preformatted,
20
+ # Grab all lines that are indented up until a line that isn't
21
+ :open_match => /((\s+).+?)\n+(?=\S)/m, :open_requires_bol => true,
22
+ :setup => lambda{ |tag, scanner, tagtree|
23
+ # Throw the contents I found into the tag
24
+ # but remove leading whitespace
25
+ tag << scanner[1].gsub( /^#{scanner[2]}/, '' )
26
+ },
27
+ :autoclose => true
28
+ )
29
+
30
+ @tag_genres[ :inline ] = [ ]
31
+
32
+ @tag_genres[ :inline ] << TagFactory.new( :bold,
33
+ # An asterisk followed by a letter or number
34
+ :open_match => /\*(?=[a-z0-9])/i,
35
+
36
+ # Close when I see an asterisk OR a newline coming up
37
+ :close_match => /\*|(?=\n)/,
38
+ :allows_text => true,
39
+ :allowed_genre => :inline
40
+ )
41
+
42
+ @tag_genres[ :inline ] << TagFactory.new( :italic,
43
+ # An underscore followed by a letter or number
44
+ :open_match => /_(?=[a-z0-9])/i,
45
+
46
+ # Close when I see an underscore OR a newline coming up
47
+ :close_match => /_|(?=\n)/,
48
+ :allows_text => true,
49
+ :allowed_genre => :inline
50
+ )
51
+ end
52
+
53
+ class Tag_Test < Test::Unit::TestCase
54
+ def setup
55
+ end
56
+
57
+ def test_conversion
58
+ raw_text = <<-ENDINPUT
59
+ Hello World! You're _soaking in_ my test.
60
+ This is a *subset* of markup that I allow.
61
+
62
+ Hi paragraph two. Yo! A code sample:
63
+
64
+ def foo
65
+ puts "Whee!"
66
+ end
67
+
68
+ _That, as they say, is that._
69
+
70
+ ENDINPUT
71
+
72
+ markup = SimpleMarkup.new( raw_text ).to_xml
73
+ p '',markup
74
+ end
75
+ end
76
+
77
+
78
+ #=> <paragraph>Hello World! You're <italic>soaking in</italic> my test.
79
+ #=> This is a <bold>subset</bold> of markup that I allow.</paragraph>
80
+ #=> <paragraph>Hi paragraph two. Yo! A code sample:</paragraph>
81
+ #=> <preformatted>def foo
82
+ #=> puts "Whee!"
83
+ #=> end</preformatted>
84
+ #=> <paragraph><italic>That, as they say, is that.</italic></paragraph>
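The scan loop behind this output follows one pattern: try to close the current tag, then try to open a new one, and otherwise consume one character as text. The sketch below reduces that pattern to a single inline rule (*bold*) using only StringScanner from the standard library. It is an illustration under simplifying assumptions, not the gem's implementation: the method name scan_bold is hypothetical, there is no nesting, and an unclosed tag is closed at the end of input rather than at a newline.

```ruby
require 'strscan'

# Hypothetical, heavily simplified stand-in for the scan loop:
# one inline rule and plain text, emitting markup directly.
def scan_bold( input )
  ss   = StringScanner.new( input )
  out  = +''
  open = false
  until ss.eos?
    if open && ss.scan( /\*/ )
      # The close pattern matched while a tag is open: pop it
      out << '</bold>'
      open = false
    elsif !open && ss.scan( /\*(?=[a-z0-9])/i )
      # The open pattern matched: an asterisk followed by a letter or digit
      out << '<bold>'
      open = true
    else
      # No tag boundary at this position, so eat one character as text
      out << ss.scan( /./m )
    end
  end
  out << '</bold>' if open
  out
end

puts scan_bold( 'This is a *subset* of markup.' )
```

Consuming a single character in the fallback branch mirrors the default text pattern /./m: it is slow, but it cannot step over the start of a tag.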
@@ -0,0 +1,104 @@
1
+ require "test/unit"
2
+ require "../lib/tagtreescanner"
3
+
4
+ class Tag_Test < Test::Unit::TestCase
5
+ def setup
6
+ end
7
+
8
+ def test1_tags
9
+ root = TagTreeScanner::Tag.new( :root, { :is_root => true } )
10
+ assert_equal( :root, root.name )
11
+ assert_equal( true, root.attributes[ :is_root ] )
12
+ assert_nil( root.allowed_genre )
13
+ assert( root.allows_text? )
14
+
15
+ t1 = TagTreeScanner::Tag.new( :t1 )
16
+ root.append_child( t1 )
17
+ assert_equal( 1, root.child_tags.length )
18
+ assert_equal( t1, root.child_tags.first )
19
+
20
+ t2 = TagTreeScanner::Tag.new( :t2 )
21
+ root.append_child( t2 )
22
+ assert_equal( 2, root.child_tags.length )
23
+ assert_equal( t2, root.child_tags.last )
24
+
25
+ t3 = TagTreeScanner::Tag.new( :t3 )
26
+ root.insert_before( t3, t2 )
27
+ assert_equal( 3, root.child_tags.length )
28
+ assert_equal( [t1,t3,t2], root.child_tags )
29
+
30
+ root.append_child( t1 )
31
+ assert_equal( [t3,t2,t1], root.child_tags )
32
+
33
+ t1.replace_with( t3 )
34
+ assert_equal( [t2,t3], root.child_tags )
35
+ assert_nil( t1.parent_tag )
36
+
37
+ root.insert_before( t1, t2 )
38
+ assert_equal( [t1,t2,t3], root.child_tags )
39
+ assert_equal( root, t1.parent_tag )
40
+
41
+ root.append_child( t1 )
42
+ assert_equal( [t2,t3,t1], root.child_tags )
43
+ assert_equal( root, t1.parent_tag )
44
+ assert_nil( t1.next_sibling )
45
+ assert_nil( t2.previous_sibling )
46
+
47
+ t1.append_child( t3 )
48
+ assert_equal( [t2,t1], root.child_tags )
49
+ assert_nil( t3.next_sibling )
50
+ assert_nil( t3.previous_sibling )
51
+ assert_equal( t1, t2.next_sibling )
52
+ assert_equal( t2, t1.previous_sibling )
53
+ assert_equal( t3, t1.child_tags.first )
54
+
55
+ assert_raise( RuntimeError ){
56
+ t3.append_child( t1 )
57
+ }
58
+
59
+ assert_raise( RuntimeError ){
60
+ t1.append_child( t1 )
61
+ }
62
+ end
63
+
64
+ def test2_tags2
65
+ root = TagTreeScanner::Tag.new( :root )
66
+ # make a ton of tags...
67
+ 1.upto(100){ |i|
68
+ root.append_child( TagTreeScanner::Tag.new( "t#{i}".intern ) )
69
+ }
70
+
71
+ # ...shuffle the hell out of them...
72
+ 500.times{
73
+ next unless n1 = root.child_tags[ rand( root.child_tags.length ) ]
74
+ n2 = root.child_tags[ rand( root.child_tags.length ) ]
75
+ next if n1 == n2
76
+ case rand(30)
77
+ when 0
78
+ root.remove_child( n1 )
79
+ when 1
80
+ root.append_child( n1 )
81
+ when 2
82
+ root.insert_before( n1, nil )
83
+ when 3
84
+ root.insert_after( n1, nil )
85
+ when 4
86
+ root.insert_before( n1, n2 )
87
+ when 5
88
+ n1.replace_with( n2 )
89
+ else
90
+ root.insert_after( n1, n2 )
91
+ end
92
+ }
93
+
94
+ # ...and now ensure that they're all properly linked
95
+ last_tag = nil
96
+ root.child_tags.each{ |tag|
97
+ assert_equal( last_tag, tag.previous_sibling )
98
+ assert_equal( tag, last_tag.next_sibling ) if last_tag
99
+ assert_equal( root, tag.parent_tag )
100
+ last_tag = tag
101
+ }
102
+ assert_nil( last_tag.next_sibling ) if last_tag
103
+ end
104
+ end
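The final loop of test2_tags2 checks one invariant: every child points at its parent, and at the neighbours on either side of it in the child list. The standalone sketch below spells that invariant out with a hypothetical minimal Node struct; Node, relink and consistently_linked? are illustrative names and do not exist in tagtreescanner.

```ruby
# Hypothetical minimal node, not the gem's Tag class: just enough state
# to demonstrate the sibling/parent invariant.
Node = Struct.new( :name, :parent, :previous_sibling, :next_sibling )

# Rebuilds the links for an ordered list of children under +parent+.
def relink( parent, children )
  children.each_with_index do |child, i|
    child.parent           = parent
    child.previous_sibling = i.zero? ? nil : children[ i - 1 ]
    child.next_sibling     = children[ i + 1 ]
  end
  children
end

# Returns true if every child points at its parent and its neighbours.
def consistently_linked?( parent, children )
  children.each_with_index.all? do |child, i|
    expected_prev = i.zero? ? nil : children[ i - 1 ]
    child.parent.equal?( parent ) &&
      child.previous_sibling.equal?( expected_prev ) &&
      child.next_sibling.equal?( children[ i + 1 ] )
  end
end

root     = Node.new( :root )
children = relink( root, [ Node.new( :t1 ), Node.new( :t2 ), Node.new( :t3 ) ] )
puts consistently_linked?( root, children )

# Reordering the array without relinking leaves stale pointers behind.
children.reverse!
puts consistently_linked?( root, children )
relink( root, children )
puts consistently_linked?( root, children )
```

Identity comparison (equal?) is used instead of == so that the check does not depend on how a struct with cyclic references compares itself.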
metadata ADDED
@@ -0,0 +1,63 @@
1
+ --- !ruby/object:Gem::Specification
2
+ rubygems_version: 0.9.4
3
+ specification_version: 1
4
+ name: tagtreescanner
5
+ version: !ruby/object:Gem::Version
6
+ version: 0.8.0
7
+ date: 2007-11-25 00:00:00 -07:00
8
+ summary: Meta library for creating classes that turn custom text markup into XML-like tag hierarchies.
9
+ require_paths:
10
+ - lib
11
+ email: phrogz@mac.com
12
+ homepage:
13
+ rubyforge_project: tagtreescanner
14
+ description: The TagTreeScanner class provides a generic framework for creating a nested hierarchy of tags and text (like XML or HTML) by parsing text. An example use (and the reason it was written) is to convert a wiki markup syntax into HTML.
15
+ autorequire:
16
+ default_executable:
17
+ bindir: bin
18
+ has_rdoc: true
19
+ required_ruby_version: !ruby/object:Gem::Version::Requirement
20
+ requirements:
21
+ - - ">"
22
+ - !ruby/object:Gem::Version
23
+ version: 0.0.0
24
+ version:
25
+ platform: ruby
26
+ signing_key:
27
+ cert_chain:
28
+ post_install_message:
29
+ authors:
30
+ - Gavin Kistner
31
+ files:
32
+ - HISTORY
33
+ - Manifest.txt
34
+ - README
35
+ - Rakefile
36
+ - TODO
37
+ - lib/tagtreescanner.rb
38
+ - test/test_simplemarkup.rb
39
+ - test/test_tagtreescanner.rb
40
+ test_files:
41
+ - test/test_simplemarkup.rb
42
+ - test/test_tagtreescanner.rb
43
+ rdoc_options:
44
+ - --main
45
+ - README.txt
46
+ extra_rdoc_files:
47
+ - Manifest.txt
48
+ executables: []
49
+
50
+ extensions: []
51
+
52
+ requirements: []
53
+
54
+ dependencies:
55
+ - !ruby/object:Gem::Dependency
56
+ name: hoe
57
+ version_requirement:
58
+ version_requirements: !ruby/object:Gem::Version::Requirement
59
+ requirements:
60
+ - - ">="
61
+ - !ruby/object:Gem::Version
62
+ version: 1.3.0
63
+ version: