tagtreescanner 0.8.0

data/HISTORY ADDED
@@ -0,0 +1,17 @@
1
+ == 0.8.0 / 2007-November-25
2
+
3
+ * First release as a gem. Breaks backwards compatibility with older versions.
4
+
5
+ * Changed TagTreeScanner::Tag#tag_name to TagTreeScanner::Tag#name
6
+ * ...because it was dumb to write "tag.tag_name = 'span'"
7
+
8
+ * Added a method_missing hack to TagTreeScanner::Tag that delegates
9
+ to read/write from its attributes hash.
10
+ * ...because I wanted people to be able to write "tag.href = 'foo'"
11
+
12
+ * New TagTreeScanner::Tag#text= method to directly set the contents of
13
+ a tag, clearing out any other junk.
14
+
15
+ == 0.6.1 / 2005-July-5
16
+
17
+ * Initial public release
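
The 0.8.0 entries above describe three API changes: `tag_name` became `name`, unknown methods are delegated to the attributes hash, and `text=` replaces a tag's contents. A minimal sketch of that behavior follows. `SketchTag` is a hypothetical stand-in written for illustration, not the library's `TagTreeScanner::Tag`, and storing the text in a plain string is an assumption of the sketch.

```ruby
# Hypothetical stand-in for TagTreeScanner::Tag, written only to illustrate
# the 0.8.0 changelog entries. It is not the library's class.
class SketchTag
  attr_accessor :name, :attributes
  attr_reader :text

  def initialize(name, attributes = {})
    @name       = name
    @attributes = attributes
    @text       = ''
  end

  # Replace the contents outright, as the changelog describes for Tag#text=.
  def text=(new_contents)
    @text = new_contents.to_s
  end

  # Writers such as `tag.href = 'foo'` store into the attributes hash;
  # any other unknown call without arguments reads from it.
  def method_missing(meth, *args)
    key = meth.to_s
    if key.end_with?('=')
      @attributes[key.chomp('=')] = (args.size == 1 ? args[0] : args)
    elsif args.empty?
      @attributes[key]
    else
      super
    end
  end

  def respond_to_missing?(meth, include_private = false)
    meth.to_s.end_with?('=') || @attributes.key?(meth.to_s) || super
  end
end

tag = SketchTag.new(:a)
tag.name = :span        # before 0.8.0 this was tag.tag_name = ...
tag.href = 'foo'        # stored as tag.attributes['href']
tag.text = 'click me'   # replaces whatever the tag held before
```

Only names that are not already methods on the object reach `method_missing`, so an attribute called `name` or `text` still has to be set through the `attributes` hash.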
data/Manifest.txt ADDED
@@ -0,0 +1,8 @@
1
+ HISTORY
2
+ Manifest.txt
3
+ README
4
+ Rakefile
5
+ TODO
6
+ lib/tagtreescanner.rb
7
+ test/test_simplemarkup.rb
8
+ test/test_tagtreescanner.rb
data/README ADDED
@@ -0,0 +1,191 @@
1
+ <b>TagTreeScanner</b>
2
+
3
+ Author:: Gavin Kistner (mailto:phrogz@mac.com)
4
+ Copyright:: Copyright (c)2005-2007 Gavin Kistner
5
+ License:: MIT License
6
+ Version:: 0.8.0 (2007-November-24)
7
+
8
+ = Overview
9
+
10
+ The TagTreeScanner class provides a generic framework for creating a
11
+ nested hierarchy of tags and text (like XML or HTML) by parsing text. An
12
+ example use (and the reason it was written) is to convert a wiki markup
13
+ syntax into HTML.
14
+
15
+ = Example Usage
16
+ require 'tagtreescanner'
17
+
18
+ class SimpleMarkup < TagTreeScanner
19
+ @root_factory.allows_text = false
20
+
21
+ @tag_genres[ :root ] = [ ]
22
+
23
+ @tag_genres[ :root ] << TagFactory.new( :paragraph,
24
+ # A line that doesn't have whitespace at the start
25
+ :open_match => /(?=\S)/, :open_requires_bol => true,
26
+
27
+ # Close when you see a double return
28
+ :close_match => /\n[ \t]*\n/,
29
+ :allows_text => true,
30
+ :allowed_genre => :inline
31
+ )
32
+
33
+ @tag_genres[ :root ] << TagFactory.new( :preformatted,
34
+ # Grab all lines that are indented up until a line that isn't
35
+ :open_match => /((\s+).+?)\n+(?=\S)/m, :open_requires_bol => true,
36
+ :setup => lambda{ |tag, scanner, tagtree|
37
+ # Throw the contents I found into the tag
38
+ # but remove leading whitespace
39
+ tag << scanner[1].gsub( /^#{scanner[2]}/, '' )
40
+ },
41
+ :autoclose => true
42
+ )
43
+
44
+ @tag_genres[ :inline ] = [ ]
45
+
46
+ @tag_genres[ :inline ] << TagFactory.new( :bold,
47
+ # An asterisk followed by a letter or number
48
+ :open_match => /\*(?=[a-z0-9])/i,
49
+
50
+ # Close when I see an asterisk OR a newline coming up
51
+ :close_match => /\*|(?=\n)/,
52
+ :allows_text => true,
53
+ :allowed_genre => :inline
54
+ )
55
+
56
+ @tag_genres[ :inline ] << TagFactory.new( :italic,
57
+ # An underscore followed by a letter or number
58
+ :open_match => /_(?=[a-z0-9])/i,
59
+
60
+ # Close when I see an underscore OR a newline coming up
61
+ :close_match => /_|(?=\n)/,
62
+ :allows_text => true,
63
+ :allowed_genre => :inline
64
+ )
65
+ end
66
+
67
+ raw_text = <<ENDINPUT
68
+ Hello World! You're _soaking in_ my test.
69
+ This is a *subset* of markup that I allow.
70
+
71
+ Hi paragraph two. Yo! A code sample:
72
+
73
+ def foo
74
+ puts "Whee!"
75
+ end
76
+
77
+ _That, as they say, is that._
78
+
79
+ ENDINPUT
80
+
81
+ markup = SimpleMarkup.new( raw_text ).to_xml
82
+ puts markup
83
+
84
+
85
+ #=> <paragraph>Hello World! You're <italic>soaking in</italic> my test.
86
+ #=> This is a <bold>subset</bold> of markup that I allow.</paragraph>
87
+ #=> <paragraph>Hi paragraph two. Yo! A code sample:</paragraph>
88
+ #=> <preformatted>def foo
89
+ #=> puts "Whee!"
90
+ #=> end</preformatted>
91
+ #=> <paragraph><italic>That, as they say, is that.</italic></paragraph>
92
+
93
+ = Details
94
+
95
+ == TagFactories at 10,000 feet
96
+ Each possible output tag is described by a TagFactory, which specifies
97
+ some or all of the following:
98
+ * The name of the tags it creates <i>(required)</i>
99
+ * The regular expression to look for to start the tag
100
+ * The regular expression to look for to close the tag, or
101
+ * Whether the tag is automatically closed after creation
102
+ * What genre of tags are allowed within the tag
103
+ * Whether the tag supports raw text inside it
104
+ * Code to run when creating a tag
105
+
106
+ See the TagFactory class for more information on specifying factories.
107
+
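
The options listed above can be pictured with a stripped-down factory. This is a sketch under stated assumptions: `SketchFactory` and its hash-based tag are hypothetical, and only the opening regexp, the beginning-of-line requirement and the setup code are modelled. The URL pattern mirrors the anchor example in the TagFactory documentation.

```ruby
require 'strscan'

# Hypothetical, stripped-down factory used only to illustrate the list above.
# The real TagFactory accepts more options.
SketchFactory = Struct.new(:tag_name, :open_match, :open_requires_bol, :setup) do
  # Returns a hash describing the new tag, or nil if the opening
  # regexp does not match at the scanner's current position.
  def match(scanner)
    return nil if open_requires_bol && !scanner.bol?
    return nil unless scanner.scan(open_match)

    tag = { name: tag_name, attributes: {}, text: +'' }
    setup&.call(tag, scanner)
    tag
  end
end

link = SketchFactory.new(
  :a,
  %r{http://(\S+)},
  false,
  lambda { |tag, ss|
    tag[:attributes]['href'] = ss[0] # the whole match, protocol included
    tag[:text] << ss[1]              # the first capture, protocol stripped
  }
)

scanner = StringScanner.new('http://example.com/docs and more')
tag = link.match(scanner)
```

After the call, the scanner has moved past the URL, so the remaining text is left for whatever the scanner tries next.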
108
+ == Genres as a State Machine
109
+ As a new tag is opened, the scanner uses the Tag#allowed_genre property
110
+ of that tag (set by the +allowed_genre+ property on the TagFactory) to
111
+ determine which tags to look for. A genre is specified by adding
112
+ an array in the <tt>@tag_genres</tt> hash, whose key is the genre name.
113
+ For example:
114
+ @tag_genres[ :inline ] = [ ]
115
+ adds a new genre named 'inline', with no tags in it. TagFactory instances
116
+ should be pushed onto this array <b>in the order that they should be looked
117
+ for</b>. For example:
118
+ @tag_genres[ :inline ] << TagFactory.new( :italic,
119
+ # see the TagFactory#initialize for options
120
+ )
121
+
122
+ Note that the +close_match+ regular expression of the current tag is
123
+ always checked before looking to open/create any new tags.
124
+
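
The order of checks described above (the current tag's +close_match+ first, then the genre's factories in the order they were pushed, then plain text) can be sketched as a small loop. Everything here is an assumption made for illustration: the `GENRES` table, the `trace` helper and the event list are hypothetical, and every tag is given the same genre.

```ruby
require 'strscan'

# Hypothetical genre table: each genre is an ordered list of
# [tag name, open regexp, close regexp] entries.
GENRES = {
  inline: [
    [:bold,   /\*(?=[a-z0-9])/i, /\*|(?=\n)/],
    [:italic, /_(?=[a-z0-9])/i,  /_|(?=\n)/]
  ]
}.freeze

# Walks the text once and records which open/close events fire.
def trace(text, genre)
  scanner   = StringScanner.new(text)
  open_tags = [] # stack of [name, close regexp]
  events    = []
  until scanner.eos?
    # 1. The current tag's close regexp is tried before anything else.
    if open_tags.any? && scanner.scan(open_tags.last[1])
      events << [:close, open_tags.pop[0]]
      next
    end
    # 2. Then each entry of the genre, in the order it was listed.
    name, _open, close = GENRES[genre].find { |_n, open, _c| scanner.scan(open) }
    if name
      open_tags.push([name, close])
      events << [:open, name]
      next
    end
    # 3. Otherwise consume one character of plain text.
    scanner.getch
  end
  events
end

flat   = trace('*b*c', :inline)
nested = trace('_a *b* c_', :inline)
```

In `'*b*c'` the second asterisk is followed by a letter, so the bold opener would match there too; because the close check runs first, the tag closes instead of nesting.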
125
+ == Consuming Text
126
+ As the text is being parsed, there will (probably) be many cases where
127
+ you have raw text that doesn't close or open any new tags. Whenever the
128
+ scanner reaches this state, it runs the <tt>@text_match</tt> regexp
129
+ against the text to move the pointer ahead. If the current tag has
130
+ <tt>Tag#allows_text?</tt> set to +true+ (through
131
+ <tt>TagFactory#allows_text</tt>), then this text is added as contents of
132
+ the tag. If not, the text is thrown away.
133
+
134
+ The safest regular expression consumes only one character at a time:
135
+ @text_match = /./m
136
+
137
+ <b><i>It is vital that your regexp match newlines</i></b> (the 'm')
138
+ <b><i>unless every single one of your tags is set to close upon seeing
139
+ a newline.</i></b>
140
+
141
+ Unfortunately, the safest regular expression is also the slowest. If
142
+ speed is an issue, your regexp should strive to eat as many characters as
143
+ possible at once...while ensuring that it doesn't eat characters that
144
+ would signify the start of a new tag.
145
+
146
+ For example, setting a regexp like:
147
+ @text_match = /\w+|./m
148
+ allows the scanner to match a whole word at a time. However, if you have
149
+ a tag factory set to look for "Hvv2vvO" to indicate a subscripted '2',
150
+ the entire string would be eaten as text and the subscript tag would
151
+ never start.
152
+
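
The trade-off above can be checked directly with StringScanner. The sketch below assumes a hypothetical subscript opener, `SUBSCRIPT_OPEN`, loosely modelled on the "Hvv2vvO" example; the helper `subscript_seen?` is likewise made up for illustration.

```ruby
require 'strscan'

# Hypothetical subscript opener, modelled on the "Hvv2vvO" example above.
SUBSCRIPT_OPEN = /vv(?=\d)/

# Returns true if the scanner ever stands at a position where the opener
# matches, given that plain text is consumed with text_match.
def subscript_seen?(text, text_match)
  scanner = StringScanner.new(text)
  until scanner.eos?
    return true if scanner.check(SUBSCRIPT_OPEN)
    break unless scanner.scan(text_match) # guard against a regexp that cannot advance
  end
  false
end

one_char   = subscript_seen?('Hvv2vvO', /./m)     # => true
whole_word = subscript_seen?('Hvv2vvO', /\w+|./m) # => false
```

With `/\w+|./m` the first scan swallows the entire word, so the scanner never stops at the "vv" that would have opened the tag.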
153
+ == Using the Scanner
154
+ As shown in the example above, consumers of your class initialize it by
155
+ passing in the string to be parsed, and then calling #to_xml or #to_html
156
+ on it.
157
+
158
+ <i>(This two-step process allows the consumer to run other code after
159
+ the tag parsing, before final conversion. Examples might include
160
+ replacing special command tags with other input, or performing database
161
+ lookups on special wiki-page-link tags and replacing with HTML
162
+ anchors.)</i>
163
+
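
According to the Tag documentation, #to_html is #to_xml with empty-tag collapsing switched off. The sketch below shows only that one difference. `SketchNode` is a hypothetical mini-node, not the library's Tag, and it ignores attributes and the newline handling of the real methods.

```ruby
# Hypothetical mini-node, not the library's Tag: just enough to show the
# one difference between the two output methods.
SketchNode = Struct.new(:name, :children) do
  def to_xml(empty_tags_collapsed = true)
    return "<#{name} />" if empty_tags_collapsed && children.empty?

    inner = children.map { |c| c.is_a?(String) ? c : c.to_xml(empty_tags_collapsed) }.join
    "<#{name}>#{inner}</#{name}>"
  end

  # Same output, but empty tags keep an explicit close tag.
  def to_html
    to_xml(false)
  end
end

tree = SketchNode.new(:p, ['Hi ', SketchNode.new(:br, [])])
xml  = tree.to_xml
html = tree.to_html
```

Because the tree exists before either method is called, code that rewrites or replaces nodes can run between parsing and output.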
164
+ = Requirements
165
+ TagTreeScanner is built on top of the StringScanner library that is part
166
+ of the standard Ruby installation.
167
+
168
+ = License
169
+
170
+ (The MIT License)
171
+
172
+ Copyright (c) 2005-2007 Gavin Kistner
173
+
174
+ Permission is hereby granted, free of charge, to any person obtaining
175
+ a copy of this software and associated documentation files (the
176
+ 'Software'), to deal in the Software without restriction, including
177
+ without limitation the rights to use, copy, modify, merge, publish,
178
+ distribute, sublicense, and/or sell copies of the Software, and to
179
+ permit persons to whom the Software is furnished to do so, subject to
180
+ the following conditions:
181
+
182
+ The above copyright notice and this permission notice shall be
183
+ included in all copies or substantial portions of the Software.
184
+
185
+ THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
186
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
187
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
188
+ IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
189
+ CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
190
+ TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
191
+ SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/Rakefile ADDED
@@ -0,0 +1,18 @@
1
+ # -*- ruby -*-
2
+
3
+ require 'rubygems'
4
+ require 'hoe'
5
+ require './lib/tagtreescanner.rb'
6
+
7
+ Hoe.new('tagtreescanner', TagTreeScanner::VERSION) do |p|
8
+ p.rubyforge_name = 'tagtreescanner'
9
+ p.author = 'Gavin Kistner'
10
+ p.email = 'phrogz@mac.com'
11
+ p.url = ''
12
+ p.summary = 'Meta library for creating classes that turn custom text markup into XML-like tag hierarchies.'
13
+ p.description = IO.read( 'README' )[ /= Overview\n(.+?)^=/m, 1 ].rstrip
14
+ p.changes = IO.read( 'HISTORY' )[ /^=[^\n]+\n+(.+?)^=/m, 1 ].rstrip
15
+ p.remote_rdoc_dir = ''
16
+ end
17
+
18
+ # vim: syntax=Ruby
data/TODO ADDED
@@ -0,0 +1,11 @@
1
+ * Overhaul Tag and TextNode and TagTreeScanner to use a common DOM module
2
+ like <tt>Phrogz::DOM::OrderedTreeNode</tt>.
3
+
4
+ * Allow TagFactories to explicitly specify multiple allowed genres
5
+ and/or allowed tags, rather than only one genre.
6
+
7
+ * Provide a method like inner_html= for parsing and creating tag content.
8
+ * Useful for batch replacing the contents of a single tag with output from
9
+ another program, while maintaining the DOM integrity.
10
+
11
+ * More unit tests
data/lib/tagtreescanner.rb ADDED
@@ -0,0 +1,851 @@
1
+ # This file covers the TagTreeScanner class, and the extensions to the
2
+ # String class needed by it.
3
+ # Please see the documentation on those classes for more information.
4
+ #
5
+ # Author:: Gavin Kistner (mailto:phrogz@mac.com)
6
+ # Copyright:: Copyright (c)2005-2007 Gavin Kistner
7
+ # License:: MIT License
8
+ # Version:: 0.8.0 (2007-November-24)
9
+
10
+ require 'strscan'
11
+
12
+ # = Overview
13
+ # The TagTreeScanner class provides a generic framework for creating a
14
+ # nested hierarchy of tags and text (like XML or HTML) by parsing text. An
15
+ # example use (and the reason it was written) is to convert a wiki markup
16
+ # syntax into HTML.
17
+ #
18
+ # = Example Usage
19
+ # require 'tagtreescanner'
20
+ #
21
+ # class SimpleMarkup < TagTreeScanner
22
+ # @root_factory.allows_text = false
23
+ #
24
+ # @tag_genres[ :root ] = [ ]
25
+ #
26
+ # @tag_genres[ :root ] << TagFactory.new( :paragraph,
27
+ # # A line that doesn't have whitespace at the start
28
+ # :open_match => /(?=\S)/, :open_requires_bol => true,
29
+ #
30
+ # # Close when you see a double return
31
+ # :close_match => /\n[ \t]*\n/,
32
+ # :allows_text => true,
33
+ # :allowed_genre => :inline
34
+ # )
35
+ #
36
+ # @tag_genres[ :root ] << TagFactory.new( :preformatted,
37
+ # # Grab all lines that are indented up until a line that isn't
38
+ # :open_match => /((\s+).+?)\n+(?=\S)/m, :open_requires_bol => true,
39
+ # :setup => lambda{ |tag, scanner, tagtree|
40
+ # # Throw the contents I found into the tag
41
+ # # but remove leading whitespace
42
+ # tag << scanner[1].gsub( /^#{scanner[2]}/, '' )
43
+ # },
44
+ # :autoclose => true
45
+ # )
46
+ #
47
+ # @tag_genres[ :inline ] = [ ]
48
+ #
49
+ # @tag_genres[ :inline ] << TagFactory.new( :bold,
50
+ # # An asterisk followed by a letter or number
51
+ # :open_match => /\*(?=[a-z0-9])/i,
52
+ #
53
+ # # Close when I see an asterisk OR a newline coming up
54
+ # :close_match => /\*|(?=\n)/,
55
+ # :allows_text => true,
56
+ # :allowed_genre => :inline
57
+ # )
58
+ #
59
+ # @tag_genres[ :inline ] << TagFactory.new( :italic,
60
+ # # An underscore followed by a letter or number
61
+ # :open_match => /_(?=[a-z0-9])/i,
62
+ #
63
+ # # Close when I see an underscore OR a newline coming up
64
+ # :close_match => /_|(?=\n)/,
65
+ # :allows_text => true,
66
+ # :allowed_genre => :inline
67
+ # )
68
+ # end
69
+ #
70
+ # raw_text = <<ENDINPUT
71
+ # Hello World! You're _soaking in_ my test.
72
+ # This is a *subset* of markup that I allow.
73
+ #
74
+ # Hi paragraph two. Yo! A code sample:
75
+ #
76
+ # def foo
77
+ # puts "Whee!"
78
+ # end
79
+ #
80
+ # _That, as they say, is that._
81
+ #
82
+ # ENDINPUT
83
+ #
84
+ # markup = SimpleMarkup.new( raw_text ).to_xml
85
+ # puts markup
86
+ #
87
+ #
88
+ # #=> <paragraph>Hello World! You're <italic>soaking in</italic> my test.
89
+ # #=> This is a <bold>subset</bold> of markup that I allow.</paragraph>
90
+ # #=> <paragraph>Hi paragraph two. Yo! A code sample:</paragraph>
91
+ # #=> <preformatted>def foo
92
+ # #=> puts "Whee!"
93
+ # #=> end</preformatted>
94
+ # #=> <paragraph><italic>That, as they say, is that.</italic></paragraph>
95
+ #
96
+ #
97
+ # = Details
98
+ #
99
+ # == TagFactories at 10,000 feet
100
+ # Each possible output tag is described by a TagFactory, which specifies
101
+ # some or all of the following:
102
+ # * The name of the tags it creates <i>(required)</i>
103
+ # * The regular expression to look for to start the tag
104
+ # * The regular expression to look for to close the tag, or
105
+ # * Whether the tag is automatically closed after creation
106
+ # * What genre of tags are allowed within the tag
107
+ # * Whether the tag supports raw text inside it
108
+ # * Code to run when creating a tag
109
+ #
110
+ # See the TagFactory class for more information on specifying factories.
111
+ #
112
+ # == Genres as a State Machine
113
+ # As a new tag is opened, the scanner uses the Tag#allowed_genre property
114
+ # of that tag (set by the +allowed_genre+ property on the TagFactory) to
115
+ # determine which tags to look for. A genre is specified by adding
116
+ # an array in the <tt>@tag_genres</tt> hash, whose key is the genre name.
117
+ # For example:
118
+ # @tag_genres[ :inline ] = [ ]
119
+ # adds a new genre named 'inline', with no tags in it. TagFactory instances
120
+ # should be pushed onto this array <b>in the order that they should be looked
121
+ # for</b>. For example:
122
+ # @tag_genres[ :inline ] << TagFactory.new( :italic,
123
+ # # see the TagFactory#initialize for options
124
+ # )
125
+ #
126
+ # Note that the +close_match+ regular expression of the current tag is
127
+ # always checked before looking to open/create any new tags.
128
+ #
129
+ # == Consuming Text
130
+ # As the text is being parsed, there will (probably) be many cases where
131
+ # you have raw text that doesn't close or open any new tags. Whenever the
132
+ # scanner reaches this state, it runs the <tt>@text_match</tt> regexp
133
+ # against the text to move the pointer ahead. If the current tag has
134
+ # <tt>Tag#allows_text?</tt> set to +true+ (through
135
+ # <tt>TagFactory#allows_text</tt>), then this text is added as contents of
136
+ # the tag. If not, the text is thrown away.
137
+ #
138
+ # The safest regular expression consumes only one character at a time:
139
+ # @text_match = /./m
140
+ #
141
+ # <b><i>It is vital that your regexp match newlines</i></b> (the 'm')
142
+ # <b><i>unless every single one of your tags is set to close upon seeing
143
+ # a newline.</i></b>
144
+ #
145
+ # Unfortunately, the safest regular expression is also the slowest. If
146
+ # speed is an issue, your regexp should strive to eat as many characters as
147
+ # possible at once...while ensuring that it doesn't eat characters that
148
+ # would signify the start of a new tag.
149
+ #
150
+ # For example, setting a regexp like:
151
+ # @text_match = /\w+|./m
152
+ # allows the scanner to match a whole word at a time. However, if you have
153
+ # a tag factory set to look for "Hvv2vvO" to indicate a subscripted '2',
154
+ # the entire string would be eaten as text and the subscript tag would
155
+ # never start.
156
+ #
157
+ # == Using the Scanner
158
+ # As shown in the example above, consumers of your class initialize it by
159
+ # passing in the string to be parsed, and then calling #to_xml or #to_html
160
+ # on it.
161
+ #
162
+ # <i>(This two-step process allows the consumer to run other code after
163
+ # the tag parsing, before final conversion. Examples might include
164
+ # replacing special command tags with other input, or performing database
165
+ # lookups on special wiki-page-link tags and replacing with HTML
166
+ # anchors.)</i>
167
+ class TagTreeScanner
168
+ VERSION = "0.8.0"
169
+
170
+ # A TagFactory holds the information about a specific kind of tag:
171
+ # * the name of the tag
172
+ # * what to look for to open and close the tag
173
+ # * what genre of tags it may contain
174
+ # * whether the tag permits raw text
175
+ # * additional code to run when creating the tag
176
+ #
177
+ # See the documentation about the <tt>@tag_genres</tt> hash inside
178
+ # the TagTreeScanner class for information on how to add factories
179
+ # for use.
180
+ #
181
+ # === Utilizing <tt>:autoclose</tt>
182
+ # Occasionally you will want to
183
+ # create a tag and allow no other tags inside it. An example might be
184
+ # a tag containing preformatted code.
185
+ #
186
+ # Rather than opening the tag and slowly spinning through all the
187
+ # text, the combination of the <tt>:autoclose</tt> and
188
+ # <tt>:setup</tt> options allow you to create the tag, fill it with
189
+ # content, and then immediately continue with the parent tag.
190
+ #
191
+ # See the #new method for how to use the <tt>:setup</tt>
192
+ # function, and an example usage.
193
+ class TagFactory
194
+ # The type of tag this factory produces.
195
+ attr_accessor :tag_name
196
+
197
+ # A regexp to match (and consume) that causes a new tag to be started.
198
+ attr_accessor :open_match
199
+
200
+ # Does the #open_match regexp require beginning of line?
201
+ attr_accessor :open_requires_bol
202
+
203
+ # The regexp which causes the tag to automatically close.
204
+ attr_accessor :close_match
205
+
206
+ # Does the #close_match regexp require beginning of line?
207
+ attr_accessor :close_requires_bol
208
+
209
+ # Should this tag stay open when created, or automatically close?
210
+ attr_accessor :autoclose
211
+
212
+ # A symbol with the genre of tags that are allowed inside the tag.
213
+ # <i>(See @tag_genres in the TagTreeScanner documentation.)</i>
214
+ attr_accessor :allowed_genre
215
+
216
+ # May tags created by this factory have text added to them?
217
+ attr_accessor :allows_text
218
+
219
+ # __tag_name__:: A symbol with the name of the tag to create
220
+ # __options__:: A hash including one or more of <tt>:open_match</tt>,
221
+ # <tt>:open_requires_bol</tt>, <tt>:close_match</tt>,
222
+ # <tt>:close_requires_bol</tt>, <tt>:autoclose</tt>,
223
+ # <tt>:allows_text</tt>, <tt>:allowed_genre</tt>,
224
+ # <tt>:setup</tt>, and <tt>:attributes</tt>.
225
+ #
226
+ # Due to the way the StringScanner class works, placing a <tt>^</tt>
227
+ # (beginning of line) marker in your <tt>:open_match</tt> or
228
+ # <tt>:close_match</tt> regular expressions will not behave as
229
+ # desired. Instead, set the <tt>:open_requires_bol</tt> and/or
230
+ # <tt>:close_requires_bol</tt> properties to +true+ if desired.
231
+ #
232
+ # A factory should either be set to <tt>:autoclose => true</tt>, or
233
+ # supply a <tt>:close_match</tt>. (Otherwise, it will never close.)
234
+ #
235
+ # Further, a factory should either be set to
236
+ # <tt>:autoclose => true</tt> or specify an <tt>:allowed_genre</tt>.
237
+ # <i>(See below for how to efficiently create a tag that cannot
238
+ # contain other tags.)</i>
239
+ #
240
+ # The <tt>:setup</tt> option is used to run code during the tag
241
+ # creation. The value of this option should be a lambda/Proc that
242
+ # accepts three parameters:
243
+ # * the <b>Tag</b> being created
244
+ # * the <b>StringScanner</b> instance that matched the tag opening
245
+ # * the <b>TagTreeScanner</b> instance creating the tag.
246
+ #
247
+ # === Example:
248
+ # # Shove URLs as HTML anchors, without the protocol prefix shown
249
+ # @tag_genres[ :inline ] << TagFactory.new( :a,
250
+ # :open_match => %r{http://(\S+)},
251
+ # :setup => lambda{ |tag, ss, tagtree|
252
+ # tag.attributes[ :href ] = ss[0]
253
+ # tag << ss[1]
254
+ # },
255
+ # :autoclose => true
256
+ # )
257
+ def initialize( tag_name, options={} )
258
+ @tag_name = tag_name
259
+ [ :open_match, :close_match,
260
+ :open_requires_bol, :close_requires_bol,
261
+ :allowed_genre, :autoclose,
262
+ :allows_text,
263
+ :setup, :attributes ].each{ |k|
264
+ self.instance_variable_set( "@#{k}".intern, options[ k ] )
265
+ }
266
+ end
267
+
268
+ # Creates and returns a new tag if the supplied _string_scanner_
269
+ # matches the +open_match+ of this factory.
270
+ #
271
+ # Called by TagTreeScanner during initialization.
272
+ def match( string_scanner, tagtreescanner ) #:nodoc:
273
+ #puts "Matching #{@open_match.inspect} against #{string_scanner.peek(10)}"
274
+ return nil unless ( !@open_requires_bol || string_scanner.bol? ) && string_scanner.scan( @open_match )
275
+ tag = maketag
276
+ @setup.call( tag, string_scanner, tagtreescanner ) if @setup
277
+ #puts "...created #{tag}"
278
+ tag
279
+ end
280
+
281
+ # Creates a tag from the factory manually
282
+ def create #:nodoc:
283
+ tag = maketag
284
+ @setup.call( tag, nil, nil ) if @setup
285
+ tag
286
+ end
287
+
288
+ private
289
+ # DRY common code
290
+ def maketag #:nodoc:
291
+ tag = Tag.new( @tag_name )
292
+ tag.factory = self
293
+ tag.attributes = @attributes if @attributes
294
+ tag
295
+ end
296
+ end
297
+
298
+ # Tags are the equivalent of a DOM Element. The majority of tags
299
+ # are created automatically by a TagFactory, but it may be
300
+ # necessary to create them directly in order to augment or replace
301
+ # information in the tag tree.
302
+ #
303
+ # A Tag may have one or more attributes, which are pairs of
304
+ # key/value strings; attributes are output in the HTML or XML
305
+ # representation of the Tag.
306
+ #
307
+ # Each tag also has an <tt>info</tt> hash, which may be used to
308
+ # keep track of extra bits of information about a tag. <i>Example
309
+ # usages might be keeping track of the depth of a list item, or the
310
+ # associated section for a header.</i> Information from the +info+
311
+ # hash is not output in the HTML or XML representations.
312
+ class Tag
313
+ # A symbol with the name of this tag
314
+ attr_accessor :name
315
+
316
+ # An array of child Tag or TextNode instances
317
+ attr_accessor :child_tags
318
+
319
+ # A hash of key/value attributes to emit in the XML/HTML
320
+ # representation
321
+ attr_accessor :attributes
322
+
323
+ # The TagFactory that created this tag (may be +nil+)
324
+ attr_accessor :factory
325
+
326
+ # A hash that may be used to store extra information about a Tag
327
+ attr_accessor :info
328
+
329
+ # The Tag to which this tag is attached (may be +nil+)
330
+ attr_reader :parent_tag
331
+
332
+ # The Tag or TextNode which immediately follows this tag
333
+ # (may be +nil+ if this is the last tag of its parent)
334
+ attr_reader :next_sibling
335
+
336
+ # The Tag or TextNode which immediately precedes this tag
337
+ # (may be +nil+ if this is the first tag of its parent)
338
+ attr_reader :previous_sibling
339
+
340
+ # _name_:: A symbol with the name of this tag
341
+ # _attributes_:: A hash of key/value pairs to store with this tag
342
+ def initialize( name, attributes={} )
343
+ @name = name
344
+ @child_tags = [ ]
345
+ @attributes = attributes
346
+ @info = {}
347
+ end
348
+
349
+ # Allows for setting HTML or XML-like attributes directly without
350
+ # knowing about the _attributes_ collection. For example:
351
+ # tag.href = 'http://www.google.com'
352
+ # tag.class = 'external'
353
+ # is the same as:
354
+ # tag.attributes['href'] = 'http://www.google.com'
355
+ # tag.attributes['class'] = 'external'
356
+ # ...for any attributes (like the above) that don't have the same
357
+ # name as an existing method or attribute on the Tag class.
358
+ def method_missing( name, *args )
359
+ if (name=name.to_s) =~ /=$/
360
+ @attributes[ name[0...-1] ] = (args.size==1 ? args[0] : args )
361
+ else
362
+ @attributes[ name ]
363
+ end
364
+ end
365
+
366
+ # Returns the +close_match+ property of the owning TagFactory,
367
+ # or +nil+ if this tag wasn't created by a factory.
368
+ def close_match
369
+ @factory && @factory.close_match
370
+ end
371
+
372
+ # Returns the +close_requires_bol+ property of the owning TagFactory,
373
+ # or +nil+ if this tag wasn't created by a factory.
374
+ def close_requires_bol?
375
+ @factory && @factory.close_requires_bol
376
+ end
377
+
378
+ # Returns the +autoclose+ property of the owning TagFactory,
379
+ # or +nil+ if this tag wasn't created by a factory.
380
+ def autoclose?
381
+ @factory && @factory.autoclose
382
+ end
383
+
384
+ # Returns the +allows_text+ property of the owning TagFactory,
385
+ # or +true+ if this tag wasn't created by a factory.
386
+ def allows_text?
387
+ @factory ? @factory.allows_text : true
388
+ end
389
+
390
+ # Returns the +allowed_genre+ property of the owning TagFactory,
391
+ # or +nil+ if this tag wasn't created by a factory.
392
+ def allowed_genre
393
+ @factory && @factory.allowed_genre
394
+ end
395
+
396
+ # _new_child_:: The Tag or TextNode to add as the last child.
397
+ #
398
+ # Adds _new_child_ to the end of this tag's +child_tags+ collection.
399
+ # Returns a reference to _new_child_.
400
+ #
401
+ # If _new_child_ is a child of another Tag, it is first removed from
402
+ # that tag.
403
+ def append_child( new_child )
404
+ return new_child if new_child == @child_tags.last
405
+ insert_after( new_child, @child_tags.last )
406
+ end
407
+
408
+ # _new_child_:: The Tag or TextNode to add as a child of this tag.
409
+ # _reference_child_:: The child to place _new_child_ before.
410
+ #
411
+ # Adds _new_child_ as a child of this tag, immediately before the
412
+ # location of _reference_child_. Returns a reference to _new_child_.
413
+ #
414
+ # If _reference_child_ is +nil+, the child is added as the last
415
+ # child of this tag. A RuntimeError is raised if _reference_child_
416
+ # is not a child of this tag.
417
+ #
418
+ # If _new_child_ is a child of another Tag, #remove_child is
419
+ # automatically invoked to remove it from that tag.
420
+ def insert_before( new_child, reference_child=nil )
421
+ return new_child if reference_child ? ( reference_child.previous_sibling == new_child ) : ( new_child == @child_tags.last )
422
+ insert_after( new_child, reference_child ? reference_child.previous_sibling : @child_tags.last )
423
+ end
424
+
425
+ # _new_child_:: The Tag or TextNode to add as a child of this tag.
426
+ # _reference_child_:: The child to place _new_child_ after.
427
+ #
428
+ # Adds _new_child_ as a child of this tag, immediately after the
429
+ # location of _reference_child_. Returns a reference to _new_child_.
430
+ #
431
+ # If _reference_child_ is +nil+, the child is added as the first
432
+ # child of this tag. A RuntimeError is raised if _reference_child_
433
+ # is not a child of this tag.
434
+ #
435
+ # If _new_child_ is a child of another Tag, #remove_child is
436
+ # automatically invoked to remove it from that tag.
437
+ def insert_after( new_child, reference_child=nil )
438
+ #puts "#{self.inspect}#insert_after( #{new_child.inspect}, #{reference_child.inspect} )"
439
+ return new_child if reference_child ? ( reference_child.next_sibling == new_child ) : ( new_child == @child_tags.first )
440
+
441
+ # Ensure new_child is not an ancestor of self
442
+ walker = self
443
+ while walker
444
+ raise "#{new_child.inspect} cannot be added under #{self.inspect}, because it is an ancestor of it!" if walker==new_child
445
+ walker = walker.parent_tag
446
+ end
447
+
448
+ new_child.parent_tag.remove_child( new_child ) if new_child.parent_tag
449
+ if reference_child
450
+ new_idx = @child_tags.index( reference_child )
451
+ raise "#{reference_child.inspect} is not a child of #{self.inspect}" unless new_idx
452
+ new_idx += 1
453
+ else
454
+ new_idx = 0
455
+ end
456
+ new_child.parent_tag = self
457
+ succ = @child_tags[ new_idx ]
458
+ @child_tags.insert( new_idx, new_child )
459
+ new_child.previous_sibling = reference_child
460
+ reference_child.next_sibling = new_child if reference_child
461
+ new_child.next_sibling = succ
462
+ succ.previous_sibling = new_child if succ
463
+ new_child
464
+ end
465
+
466
+ # _existing_child_:: The Tag or TextNode to remove.
467
+ #
468
+ # Removes _existing_child_ from being a child of this tag.
469
+ # Returns _existing_child_.
470
+ #
471
+ # A RuntimeError is raised if _existing_child_ is not a child of
472
+ # this tag.
473
+ #
474
+ # After removal, the parent and sibling references of
475
+ # _existing_child_ are all reset to +nil+.
476
+ def remove_child( existing_child )
477
+ idx = @child_tags.index( existing_child )
478
+ raise "#{existing_child.inspect} is not a child of #{self.inspect}" unless idx
479
+ prev, succ = existing_child.previous_sibling, existing_child.next_sibling
480
+ prev.next_sibling = succ if prev
481
+ succ.previous_sibling = prev if succ
482
+ @child_tags.delete_at( idx )
483
+ existing_child.previous_sibling = existing_child.next_sibling = existing_child.parent_tag = nil
484
+ existing_child
485
+ end
486
+
487
+ # _old_child_:: The existing child Tag or TextNode to replace.
488
+ # _new_child_:: The Tag or TextNode to replace _old_child_.
489
+ #
490
+ # Replaces _old_child_ with _new_child_ in this collection.
491
+ # Returns _old_child_.
492
+ #
493
+ # A RuntimeError is raised if _old_child_ is not a child of
494
+ # this tag.
495
+ #
496
+ # If _new_child_ is a child of another Tag, #remove_child is
497
+ # automatically invoked to remove it from that tag.
498
+ def replace_child( old_child, new_child )
499
+ if ( prev = old_child.previous_sibling ) == new_child || old_child.next_sibling == new_child
500
+ remove_child( old_child )
501
+ else
502
+ new_child.parent_tag.remove_child( new_child ) if new_child.parent_tag
503
+ remove_child( old_child )
504
+ insert_after( new_child, prev )
505
+ end
506
+ old_child
507
+ end
508
+
509
+ # _new_child_:: The Tag or TextNode to replace this tag.
510
+ #
511
+ # Replaces this tag with _new_child_. Returns _new_child_.
512
+ #
513
+ # A RuntimeError is raised if this tag is not a child of another tag.
514
+ #
515
+ # If _new_child_ is a child of another Tag, #remove_child is
516
+ # automatically invoked to remove it from that tag.
517
+ def replace_with( new_child )
518
+ return new_child if new_child == self
519
+ raise "#{self.inspect} is not a child of another tag" unless @parent_tag
520
+ @parent_tag.replace_child( self, new_child )
521
+ new_child
522
+ end
523
+
524
+ # _additional_text_:: The text to add to this node.
525
+ #
526
+ # Appends _additional_text_ to this tag. If the last item in the
527
+ # +child_tags+ collection is a TextNode, the text is added to that
528
+ # item; otherwise, a new TextNode is created with _additional_text_
529
+ # and added as the last child of this tag.
530
+ def << ( additional_text )
531
+ last_child = @child_tags.last
532
+ if last_child.is_a? TextNode
533
+ last_child << additional_text
534
+ else
535
+ append_child( TextNode.new( additional_text ) )
536
+ end
537
+ end
538
+
539
+ # Set the text content of this element to _new_contents_.
540
+ # Removes any child tags (and their text)
541
+ def text=( new_contents )
542
+ @child_tags.clear
543
+ append_child( TextNode.new( new_contents ) )
544
+ end
545
+
546
+ alias_method :inner_text=, :text=
547
+
548
+ # Returns an HTML representation of this tag and all its descendants.
549
+ #
550
+ # This method is the same as #to_xml except that tags without
551
+ # any +child_tags+ use an explicit close tag, e.g.
552
+ # <tt><div></div></tt> instead of XML's <tt><div /></tt>
553
+ def to_html
554
+ to_xml( false )
555
+ end
556
+
557
+ # Returns an XML representation of this tag and all its descendants.
558
+ #
559
+ # If _empty_tags_collapsed_ is +true+ (the default) then this method
560
+ # is the same as #to_html except that tags without any +child_tags+
561
+ # use a single closed tag, e.g.
562
+ # <tt><div /></tt> instead of HTML's <tt><div></div></tt>
563
+ #
564
+ # If _empty_tags_collapsed_ is +false+, this is the same as #to_html.
565
+ def to_xml( empty_tags_collapsed=true )
566
+ out = "<#{@name}"
567
+ @attributes.each{ |k,v| out << " #{k}=\"#{v.to_s.gsub( '"', '&quot;' )}\"" }
568
+ if empty_tags_collapsed && @child_tags.empty?
569
+ out << ' />'
570
+ else
571
+ out << '>'
572
+ unless @child_tags.empty?
573
+ out << "\n" unless self.allows_text?
574
+ @child_tags.each{ |tag|
575
+ out << tag.to_xml( empty_tags_collapsed )
576
+ }
577
+ end
578
+ out << "</#{@name}>"
579
+ end
580
+ out << "\n" if @parent_tag && !@parent_tag.allows_text?
581
+ out
582
+ end
583
+
584
+ # Returns an array of all descendants of this tag whose #name
585
+ # matches the supplied _name_.
586
+ def tags_by_name( name )
587
+ out = []
588
+ @child_tags.each{ |tag|
589
+ out << tag if tag.name == name
590
+ unless tag.child_tags.empty?
591
+ out.concat( tag.tags_by_name( name ) )
592
+ end
593
+ }
594
+ out
595
+ end
596
+
597
+ # Returns the text contents of this tag and its descendants.
598
+ def inner_text
599
+ @child_tags.inject(''){ |out,tag|
600
+ out << ( tag.is_a?( TextNode ) ? tag.text : tag.inner_text )
601
+ }
602
+ end
603
+
604
+ def inspect #:nodoc:
605
+ out = "<#{@name}"
606
+ #out << " @pops=#{@parent_tag ? @parent_tag.name.inspect : 'nil'}"
607
+ #out << " @prev=#{@previous_sibling ? @previous_sibling.name.inspect : 'nil'}"
608
+ #out << " @next=#{@next_sibling ? @next_sibling.name.inspect : 'nil'}"
609
+ @attributes.each{ |k,v| out << " #{k}=\"#{v}\"" }
610
+ @info.each{ |k,v| out << " @#{k}=>#{v.inspect}" }
611
+ children = @child_tags.length
612
+ if children == 1 && TextNode === @child_tags.first
613
+ out << ">#{@child_tags.first}</#{@name}>"
614
+ elsif children == 0
615
+ out << '>'
616
+ else
617
+ out << " (#{@child_tags.length} child#{@child_tags.length != 1 ? 'ren' : ''})>"
618
+ end
619
+ end
620
+
621
+ # _level_:: The indentation level (tabs) to start at.
622
+ #
623
+ # Returns a full-hierarchical representation of this tag and its
624
+ # descendants. (Used for debugging.)
625
+ def to_hier( level=0 ) #:nodoc:
626
+ tabs = "\t" * level
627
+ out = "#{tabs}<#{@name}"
628
+ @attributes.each{ |k,v| out << " #{k}=\"#{v}\"" }
629
+ @info.each{ |k,v| out << " @#{k}=>#{v.inspect}" }
630
+ if @child_tags.empty?
631
+ out << " />\n"
632
+ elsif @child_tags.length == 1 && TextNode === @child_tags.first
633
+ out << ">#{@child_tags.first}</#{@name}>\n"
634
+ else
635
+ out << ">\n"
636
+ @child_tags.each{ |n| out << n.to_hier(level+1) }
637
+ out << "#{tabs}</#{@name}>\n"
638
+ end
639
+ out
640
+ end
641
+
642
+ # Returns a copy of this tag and its entire hierarchy.
643
+ # All descendant tags/text nodes are also cloned.
644
+ #
645
+ # The +info+ hash is not preserved.
646
+ def dup
647
+ tag = self.class.new( self.name, self.attributes.dup )
648
+ @child_tags.each{ |tag2| tag.append_child( tag2.dup ) }
649
+ tag
650
+ end
651
+
652
+ # :stopdoc:
653
+ protected
654
+ attr_writer :previous_sibling, :next_sibling, :parent_tag
655
+ # :startdoc:
656
+
657
+ end
658
+
659
+ # A TextNode holds raw text inside a Tag. Generally, TextNodes are
660
+ # created automatically by the Tag#<< method.
661
+ class TextNode
662
+ # The Tag or TextNode that comes after this one (may be +nil+)
663
+ attr_accessor :next_sibling
664
+
665
+ # The Tag or TextNode that comes before this one (may be +nil+)
666
+ attr_accessor :previous_sibling
667
+
668
+ # The Tag that is a parent of this TextNode (may be +nil+)
669
+ attr_accessor :parent_tag
670
+
671
+ # A hash which may be used to store 'extra' information
672
+ attr_accessor :info
673
+
674
+ # The string contents of this text node
675
+ attr_accessor :text
676
+
677
+ # _text_:: The text to start out with
678
+ def initialize( text='' )
679
+ @text = text
680
+ @info = {}
681
+ end
682
+
683
+ # _additional_text_:: The text to add
684
+ #
685
+ # Appends the provided text to the end of the current text
686
+ #
687
+ # Returns the new text value
688
+ def << ( additional_text )
689
+ @text << additional_text
690
+ end
691
+
692
+ # Returns a copy of this text node
693
+ def dup
694
+ self.class.new( @text.dup )
695
+ end
696
+
697
+ def to_hier( level=0 ) #:nodoc:
698
+ "#{"\t"*level}#{@text.inspect}\n"
699
+ end
700
+
701
+ def to_s #:nodoc:
702
+ @text
703
+ end
704
+
705
+ # Returns the contents of this node, modified to be made XML-safe
706
+ # by calling String#xmlsafe.
707
+ def to_xml( *args )
708
+ @text.xmlsafe
709
+ end
710
+ end
711
+
712
+ # RDoc thinks that this stuff applies to instances, not the class
713
+ # :stopdoc:
714
+ class << self
715
+ attr_accessor :tag_genres, :root_factory, :text_match
716
+ end
717
+ # :startdoc:
718
+
719
+ # The tag_genres hash maps a genre name onto an array of TagFactories.
720
+ #
721
+ # Factories are tested in the order they appear in the genre array;
722
+ # more important matches are at the top, generic fallback ones
723
+ # should appear at the end of the list.
724
+ #
725
+ # If no factory matches the current input, then text is shoved into the
726
+ # most recent tag until a new tag start is found, or the closing match
727
+ # is met. (If the current tag's factory does not have :allows_text set
728
+ # to true, then the text is simply thrown away until the closing match or a
729
+ # new tag start is found.)
730
+ @tag_genres = { }
731
+
732
+ # Settings for the root of your document: what genre is allowed at the
733
+ # highest level, and should raw text be allowed there?
734
+ #
735
+ # Override in your class by setting a class-instance variable as below.
736
+ @root_factory = TagFactory.new( :root,
737
+ :allowed_genre => :root,
738
+ :allows_text => true )
739
+
740
+ # The pattern to consume and shove as text whenever no tag start/close
741
+ # is found. Eating one character at a time is safest, but slow.
742
+ # Ensure that this pattern never consumes past the start of a tag,
743
+ # or else you'll miss it.
744
+ @text_match = /./m
745
+
746
+ # Scans through _string_to_parse_ and builds a tree of tags based
747
+ # on the regular expressions and rules set by the TagFactory
748
+ # instances present in <tt>@tag_genres</tt>.
749
+ #
750
+ # After parsing the tree, call #to_xml or #to_html to retrieve
751
+ # a string representation.
752
+ def initialize( string_to_parse )
753
+ current = @root = self.class.root_factory.create
754
+ tag_genres = self.class.tag_genres
755
+ text_match = self.class.text_match
756
+
757
+ ss = StringScanner.new( string_to_parse )
758
+ while !ss.eos?
759
+ # Keep popping off the current tag until we get to the root,
760
+ # as long as its close criteria are met
761
+ while ( current != @root ) && (!current.close_requires_bol? || ss.bol?) && ss.scan( current.close_match )
762
+ current = current.parent_tag || @root
763
+ end
764
+
765
+ # No point in continuing if closing out tags consumed the rest of the string
766
+ break if ss.eos?
767
+
768
+ # Look for a tag to open
769
+ if factories = tag_genres[ current.allowed_genre ]
770
+ tag = nil
771
+ factories.each{ |factory|
772
+ if tag = factory.match( ss, self )
773
+ current.append_child( tag )
774
+ current = tag unless tag.autoclose?
775
+ break
776
+ end
777
+ }
778
+ # Start at the top of the loop if we found one
779
+ next if tag
780
+ end
781
+
782
+ # Couldn't find a valid tag at this spot
783
+ # so we need to eat some characters
784
+ consumed = ss.scan( text_match )
785
+ current << consumed if current.allows_text?
786
+ end
787
+ end
788
+
789
+ # Returns an HTML representation of the tag tree.
790
+ #
791
+ # This is the same as the #to_xml method except that empty tags use an
792
+ # explicit close tag, e.g. <tt><div></div></tt> versus <tt><div /></tt>
793
+ def to_html
794
+ @root.child_tags.inject(''){ |out, tag| out << tag.to_html }
795
+ end
796
+
797
+ # Returns an XML representation of the tag tree.
798
+ #
799
+ # This method is the same as the #to_html method except that empty tags
800
+ # do not use an explicit close tag,
801
+ # e.g. <tt><div /></tt> versus <tt><div></div></tt>
802
+ def to_xml
803
+ @root.child_tags.inject(''){ |out, tag| out << tag.to_xml }
804
+ end
805
+
806
+ # Returns an array of all root-level tags found
807
+ def tags
808
+ @root.child_tags
809
+ end
810
+
811
+ # Returns an array of all tags in the tree whose Tag#name matches
812
+ # the supplied _name_.
813
+ def tags_by_name( name )
814
+ @root.tags_by_name( name )
815
+ end
816
+
817
+ # Returns a hierarchical representation of the entire tag tree
818
+ def inspect #:nodoc:
819
+ @root.to_hier
820
+ end
821
+
822
+ # When a class inherits from TagTreeScanner, defaults are set for
823
+ # <tt>@tag_genres</tt>, <tt>@root_factory</tt> and
824
+ # <tt>@text_match</tt>
825
+ def self.inherited( child_class ) #:nodoc:
826
+ child_class.tag_genres = @tag_genres
827
+ child_class.root_factory = @root_factory
828
+ child_class.text_match = @text_match
829
+ end
830
+ end
831
+
832
+ # Extensions to the built-in String class
833
+ class String
834
+
835
+ # Returns a copy of the string with all <tt>&</tt>, <tt><</tt> and
836
+ # <tt>></tt> characters replaced by their equivalent XML entities
837
+ # (<tt>&amp;</tt>, <tt>&lt;</tt>, and <tt>&gt;</tt>)
838
+ def xmlsafe
839
+ self.dup.xmlsafe!
840
+ end
841
+
842
+ # Modifies the string, replacing all <tt>&</tt>, <tt><</tt> and
843
+ # <tt>></tt> characters with their equivalent XML entities
844
+ # (<tt>&amp;</tt>, <tt>&lt;</tt>, and <tt>&gt;</tt>)
845
+ def xmlsafe!
846
+ self.gsub!( /&/, '&amp;' )
847
+ self.gsub!( /</, '&lt;' )
848
+ self.gsub!( />/, '&gt;' )
849
+ self
850
+ end
851
+ end
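The escaping rules above can be tried in isolation. The following is a minimal sketch, not part of the gem: `xml_escape` and `xml_escape_attribute` are hypothetical helper names that repeat the three substitutions of String#xmlsafe! without reopening String. Escaping `&` and `<` inside attribute values is an assumption that goes beyond what Tag#to_xml does (it only replaces double quotes).

```ruby
# Hypothetical helper (not part of tagtreescanner): applies the same three
# substitutions as String#xmlsafe!, but to a copy of the argument.
def xml_escape( text )
  escaped = text.dup
  # The ampersand must be replaced first; otherwise the '&' characters
  # introduced by '&lt;' and '&gt;' would themselves be escaped again.
  escaped.gsub!( /&/, '&amp;' )
  escaped.gsub!( /</, '&lt;' )
  escaped.gsub!( />/, '&gt;' )
  escaped
end

# Hypothetical helper: escapes a value for use inside a double-quoted
# attribute. Assumption: attribute values get the text escaping as well.
def xml_escape_attribute( value )
  xml_escape( value.to_s ).gsub( '"', '&quot;' )
end

puts xml_escape( 'a < b && c > d' )
puts xml_escape_attribute( 'say "hi"' )
```

The substitution order is the design point worth noting: running the ampersand replacement last would turn `&lt;` into `&amp;lt;`.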
@@ -0,0 +1,84 @@
1
+ require "test/unit"
2
+ require "../lib/tagtreescanner.rb"
3
+
4
+ class SimpleMarkup < TagTreeScanner
5
+ @root_factory.allows_text = false
6
+
7
+ @tag_genres[ :root ] = [ ]
8
+
9
+ @tag_genres[ :root ] << TagFactory.new( :paragraph,
10
+ # A line that doesn't have whitespace at the start
11
+ :open_match => /(?=\S)/, :open_requires_bol => true,
12
+
13
+ # Close when you see a double return
14
+ :close_match => /\n[ \t]*\n/,
15
+ :allows_text => true,
16
+ :allowed_genre => :inline
17
+ )
18
+
19
+ @tag_genres[ :root ] << TagFactory.new( :preformatted,
20
+ # Grab all lines that are indented up until a line that isn't
21
+ :open_match => /((\s+).+?)\n+(?=\S)/m, :open_requires_bol => true,
22
+ :setup => lambda{ |tag, scanner, tagtree|
23
+ # Throw the contents I found into the tag
24
+ # but remove leading whitespace
25
+ tag << scanner[1].gsub( /^#{scanner[2]}/, '' )
26
+ },
27
+ :autoclose => true
28
+ )
29
+
30
+ @tag_genres[ :inline ] = [ ]
31
+
32
+ @tag_genres[ :inline ] << TagFactory.new( :bold,
33
+ # An asterisk followed by a letter or number
34
+ :open_match => /\*(?=[a-z0-9])/i,
35
+
36
+ # Close when I see an asterisk OR a newline coming up
37
+ :close_match => /\*|(?=\n)/,
38
+ :allows_text => true,
39
+ :allowed_genre => :inline
40
+ )
41
+
42
+ @tag_genres[ :inline ] << TagFactory.new( :italic,
43
+ # An underscore followed by a letter or number
44
+ :open_match => /_(?=[a-z0-9])/i,
45
+
46
+ # Close when I see an underscore OR a newline coming up
47
+ :close_match => /_|(?=\n)/,
48
+ :allows_text => true,
49
+ :allowed_genre => :inline
50
+ )
51
+ end
52
+
53
+ class Tag_Test < Test::Unit::TestCase
54
+ def setup
55
+ end
56
+
57
+ def test_conversion
58
+ raw_text = <<-ENDINPUT
59
+ Hello World! You're _soaking in_ my test.
60
+ This is a *subset* of markup that I allow.
61
+
62
+ Hi paragraph two. Yo! A code sample:
63
+
64
+ def foo
65
+ puts "Whee!"
66
+ end
67
+
68
+ _That, as they say, is that._
69
+
70
+ ENDINPUT
71
+
72
+ markup = SimpleMarkup.new( raw_text ).to_xml
73
+ p '',markup
74
+ end
75
+ end
76
+
77
+
78
+ #=> <paragraph>Hello World! You're <italic>soaking in</italic> my test.
79
+ #=> This is a <bold>subset</bold> of markup that I allow.</paragraph>
80
+ #=> <paragraph>Hi paragraph two. Yo! A code sample:</paragraph>
81
+ #=> <preformatted>def foo
82
+ #=> puts "Whee!"
83
+ #=> end</preformatted>
84
+ #=> <paragraph><italic>That, as they say, is that.</italic></paragraph>
@@ -0,0 +1,104 @@
1
+ require "test/unit"
2
+ require "../lib/tagtreescanner"
3
+
4
+ class Tag_Test < Test::Unit::TestCase
5
+ def setup
6
+ end
7
+
8
+ def test1_tags
9
+ root = TagTreeScanner::Tag.new( :root, { :is_root => true } )
10
+ assert_equal( :root, root.name )
11
+ assert_equal( true, root.attributes[ :is_root ] )
12
+ assert_nil( root.allowed_genre )
13
+ assert( root.allows_text? )
14
+
15
+ t1 = TagTreeScanner::Tag.new( :t1 )
16
+ root.append_child( t1 )
17
+ assert_equal( 1, root.child_tags.length )
18
+ assert_equal( t1, root.child_tags.first )
19
+
20
+ t2 = TagTreeScanner::Tag.new( :t2 )
21
+ root.append_child( t2 )
22
+ assert_equal( 2, root.child_tags.length )
23
+ assert_equal( t2, root.child_tags.last )
24
+
25
+ t3 = TagTreeScanner::Tag.new( :t3 )
26
+ root.insert_before( t3, t2 )
27
+ assert_equal( 3, root.child_tags.length )
28
+ assert_equal( [t1,t3,t2], root.child_tags )
29
+
30
+ root.append_child( t1 )
31
+ assert_equal( [t3,t2,t1], root.child_tags )
32
+
33
+ t1.replace_with( t3 )
34
+ assert_equal( [t2,t3], root.child_tags )
35
+ assert_nil( t1.parent_tag )
36
+
37
+ root.insert_before( t1, t2 )
38
+ assert_equal( [t1,t2,t3], root.child_tags )
39
+ assert_equal( root, t1.parent_tag )
40
+
41
+ root.append_child( t1 )
42
+ assert_equal( [t2,t3,t1], root.child_tags )
43
+ assert_equal( root, t1.parent_tag )
44
+ assert_nil( t1.next_sibling )
45
+ assert_nil( t2.previous_sibling )
46
+
47
+ t1.append_child( t3 )
48
+ assert_equal( [t2,t1], root.child_tags )
49
+ assert_nil( t3.next_sibling )
50
+ assert_nil( t3.previous_sibling )
51
+ assert_equal( t1, t2.next_sibling )
52
+ assert_equal( t2, t1.previous_sibling )
53
+ assert_equal( t3, t1.child_tags.first )
54
+
55
+ assert_raise( RuntimeError ){
56
+ t3.append_child( t1 )
57
+ }
58
+
59
+ assert_raise( RuntimeError ){
60
+ t1.append_child( t1 )
61
+ }
62
+ end
63
+
64
+ def test2_tags2
65
+ root = TagTreeScanner::Tag.new( :root )
66
+ # make a ton of tags...
67
+ 1.upto(100){ |i|
68
+ root.append_child( TagTreeScanner::Tag.new( "t#{i}".intern ) )
69
+ }
70
+
71
+ # ...shuffle the hell out of them...
72
+ 500.times{
73
+ next unless n1 = root.child_tags[ rand( root.child_tags.length ) ]
74
+ n2 = root.child_tags[ rand( root.child_tags.length ) ]
75
+ next if n1 == n2
76
+ case rand(30)
77
+ when 0
78
+ root.remove_child( n1 )
79
+ when 1
80
+ root.append_child( n1 )
81
+ when 2
82
+ root.insert_before( n1, nil )
83
+ when 3
84
+ root.insert_after( n1, nil )
85
+ when 4
86
+ root.insert_before( n1, n2 )
87
+ when 5
88
+ n1.replace_with( n2 )
89
+ else
90
+ root.insert_after( n1, n2 )
91
+ end
92
+ }
93
+
94
+ # ...and now ensure that they're all properly linked
95
+ last_tag = nil
96
+ root.child_tags.each{ |tag|
97
+ assert_equal( last_tag, tag.previous_sibling )
98
+ assert_equal( tag, last_tag.next_sibling ) if last_tag
99
+ assert_equal( root, tag.parent_tag )
100
+ last_tag = tag
101
+ }
102
+ assert_nil( last_tag.next_sibling ) if last_tag
103
+ end
104
+ end
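The linking invariant that the shuffle test checks (each child's previous sibling, next sibling, and parent agree with its position) can be shown on a much smaller structure. This is a sketch under stated assumptions: `MiniNode` and `links_consistent?` are hypothetical names, and only append and remove are modelled, with appending an already-attached node detaching it first, as the tests above expect of Tag#append_child.

```ruby
# Hypothetical, stripped-down node (not TagTreeScanner::Tag).
class MiniNode
  attr_reader :name, :children
  attr_accessor :parent, :previous_sibling, :next_sibling

  def initialize( name )
    @name     = name
    @children = []
  end

  # Detaches child and relinks its neighbours to each other.
  def remove_child( child )
    index = @children.index { |c| c.equal?( child ) }
    return child unless index
    @children.delete_at( index )
    before = child.previous_sibling
    after  = child.next_sibling
    before.next_sibling    = after  if before
    after.previous_sibling = before if after
    child.parent = child.previous_sibling = child.next_sibling = nil
    child
  end

  # Moves child to the end of this node, detaching it from any old parent.
  def append_child( child )
    child.parent.remove_child( child ) if child.parent
    last = @children.last
    last.next_sibling = child if last
    child.previous_sibling = last
    child.next_sibling     = nil
    child.parent           = self
    @children << child
    child
  end
end

# True when every child's links match its position in parent.children.
def links_consistent?( parent )
  previous = nil
  parent.children.each do |child|
    return false unless child.previous_sibling.equal?( previous )
    return false if previous && !previous.next_sibling.equal?( child )
    return false unless child.parent.equal?( parent )
    previous = child
  end
  previous.nil? || previous.next_sibling.nil?
end

root = MiniNode.new( :root )
a, b, c = [ :a, :b, :c ].map { |n| root.append_child( MiniNode.new( n ) ) }
root.append_child( a )   # re-appending moves :a to the end
p root.children.map( &:name )
p links_consistent?( root )
```

Checking the links in a single pass over the child list, as the test does, catches both stale sibling pointers and nodes that still reference a former parent.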
metadata ADDED
@@ -0,0 +1,63 @@
1
+ --- !ruby/object:Gem::Specification
2
+ rubygems_version: 0.9.4
3
+ specification_version: 1
4
+ name: tagtreescanner
5
+ version: !ruby/object:Gem::Version
6
+ version: 0.8.0
7
+ date: 2007-11-25 00:00:00 -07:00
8
+ summary: Meta library for creating classes that turn custom text markup into XML-like tag hierarchies.
9
+ require_paths:
10
+ - lib
11
+ email: phrogz@mac.com
12
+ homepage:
13
+ rubyforge_project: tagtreescanner
14
+ description: The TagTreeScanner class provides a generic framework for creating a nested hierarchy of tags and text (like XML or HTML) by parsing text. An example use (and the reason it was written) is to convert a wiki markup syntax into HTML.
15
+ autorequire:
16
+ default_executable:
17
+ bindir: bin
18
+ has_rdoc: true
19
+ required_ruby_version: !ruby/object:Gem::Version::Requirement
20
+ requirements:
21
+ - - ">"
22
+ - !ruby/object:Gem::Version
23
+ version: 0.0.0
24
+ version:
25
+ platform: ruby
26
+ signing_key:
27
+ cert_chain:
28
+ post_install_message:
29
+ authors:
30
+ - Gavin Kistner
31
+ files:
32
+ - HISTORY
33
+ - Manifest.txt
34
+ - README
35
+ - Rakefile
36
+ - TODO
37
+ - lib/tagtreescanner.rb
38
+ - test/test_simplemarkup.rb
39
+ - test/test_tagtreescanner.rb
40
+ test_files:
41
+ - test/test_simplemarkup.rb
42
+ - test/test_tagtreescanner.rb
43
+ rdoc_options:
44
+ - --main
45
+ - README.txt
46
+ extra_rdoc_files:
47
+ - Manifest.txt
48
+ executables: []
49
+
50
+ extensions: []
51
+
52
+ requirements: []
53
+
54
+ dependencies:
55
+ - !ruby/object:Gem::Dependency
56
+ name: hoe
57
+ version_requirement:
58
+ version_requirements: !ruby/object:Gem::Version::Requirement
59
+ requirements:
60
+ - - ">="
61
+ - !ruby/object:Gem::Version
62
+ version: 1.3.0
63
+ version: