syndication 0.4.0

@@ -0,0 +1,33 @@
1
+ # = Syndication 0.4
2
+ #
3
+ # As discussed in the README, this is really my fourth attempt at writing
4
+ # RSS parsing code. For the record, I thought I'd list the approaches I
5
+ # tried and abandoned. In a way, that's more interesting than the one I
6
+ # picked...
7
+ #
8
+ # First I used hashes for storage and just looked for matching tags.
9
+ # That approach works, kinda, but it doesn't really understand nested
10
+ # elements at all. As a result, it becomes really hard to deal with Atom
11
+ # feeds, where an <email> element could belong to one of a number of kinds
12
+ # of person. Plus, I wanted a real object-based approach which would be
13
+ # amenable to RDoc documentation.
14
+ #
15
+ # Next I wrote a classic stack-based parser, with a container stack and a
16
+ # text buffer stack. That worked well for RSS; I got it parsing every RSS
17
+ # variant, and even went as far as a test suite. However, as I tried
18
+ # extending it to deal with Atom, I realized that the parser code was
19
+ # becoming hard to follow, as the state machine gained more and more
20
+ # special cases.
21
+ #
22
+ # For a third iteration, I tried to generalize the knowledge represented by the
23
+ # state machine, by placing it in the context stack. That is, I would have a
24
+ # smart stack that knew which XML elements could go inside other elements.
25
+ # Actually, there would have been four context stacks, for containers,
26
+ # attributes, tags and textual data.
27
+ #
28
+ # That design never made it past the paper stage, because I realized that I
29
+ # could move all the knowledge into the classes used to create the objects of
30
+ # the final parse tree. With the new model--the one used in this code--the
31
+ # parser really doesn't know anything about Atom or RSS. It just forwards
32
+ # events to a tree of objects, which construct child objects as appropriate to
33
+ # grow the tree and represent the feed.
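The final design described above can be sketched in a few lines. This is a minimal illustration, not the library's code: the class names (`SketchNode`, `SketchFeed` and so on), the `CHILDREN` table and the hand-written event list are all assumptions made for the example. In the library itself the events would come from REXML's stream parser rather than from an array.

```ruby
# Hypothetical sketch of the event-forwarding design. None of these
# classes belong to the library.
class SketchNode
  # Tags that create child objects, and the class to use for each.
  CHILDREN = {}

  def initialize(parent = nil)
    @parent = parent
    @field = nil    # name of the simple text element being read, if any
    @buffer = nil   # text collected for that element
  end

  # Start tag: either grow the tree or start buffering a text field.
  # Returns the object that should receive the next event.
  def tag_start(tag)
    klass = self.class::CHILDREN[tag]
    if klass
      child = klass.new(self)
      store(tag, child)
      child
    else
      @field = tag
      @buffer = String.new
      self
    end
  end

  def text(s)
    @buffer << s if @buffer
  end

  # End tag: finish a text field, or hand control back to the parent
  # when this object's own element closes.
  def tag_end(tag)
    return @parent unless @field == tag
    store(tag, @buffer)
    @field = @buffer = nil
    self
  end

  # Reflection: call the setter named after the tag, if there is one.
  def store(tag, value)
    setter = "#{tag}="
    send(setter, value) if respond_to?(setter)
  end
end

class SketchPerson < SketchNode
  attr_accessor :name, :email
end

class SketchEntry < SketchNode
  CHILDREN = { 'author' => SketchPerson }
  attr_accessor :title, :author
end

class SketchFeed < SketchNode
  CHILDREN = { 'entry' => SketchEntry, 'author' => SketchPerson }
  attr_accessor :title, :author
  attr_reader :entries

  def entry=(obj)
    (@entries ||= []) << obj
  end
end

# Hand-written events standing in for what a stream parser would emit.
events = [
  [:tag_start, 'title'], [:text, 'Example feed'], [:tag_end, 'title'],
  [:tag_start, 'author'],
  [:tag_start, 'email'], [:text, 'feed@example.org'], [:tag_end, 'email'],
  [:tag_end, 'author'],
  [:tag_start, 'entry'],
  [:tag_start, 'title'], [:text, 'First post'], [:tag_end, 'title'],
  [:tag_start, 'author'],
  [:tag_start, 'email'], [:text, 'writer@example.org'], [:tag_end, 'email'],
  [:tag_end, 'author'],
  [:tag_end, 'entry']
]

# The "parser" knows nothing about the format; it only forwards events.
feed = SketchFeed.new
current = feed
events.each do |kind, arg|
  if kind == :text
    current.text(arg)
  else
    current = current.send(kind, arg)
  end
end

puts feed.title
puts feed.entries.first.author.email
```

Because each `<email>` event is delivered to whichever person object is current, the feed author and the entry author stay separate without any special cases in the driver loop.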
data/README ADDED
@@ -0,0 +1,208 @@
1
+ #
2
+ # = Syndication 0.4
3
+ #
4
+ # This module provides classes for parsing web syndication feeds in RSS and
5
+ # Atom formats.
6
+ #
7
+ # To parse RSS, use Syndication::RSS::Parser.
8
+ #
9
+ # To parse Atom, use Syndication::Atom::Parser.
10
+ #
11
+ # If you want my advice on which to generate, my order of preference would
12
+ # be:
13
+ #
14
+ # 1. Atom 1.0
15
+ # 2. RSS 1.0
16
+ # 3. RSS 2.0
17
+ #
18
+ # My reasoning is simply that I hate having to sniff for HTML (see
19
+ # Syndication::RSS).
20
+ #
21
+ # == License
22
+ #
23
+ # Syndication is Copyright 2005 mathew <meta@pobox.com>, and is licensed
24
+ # under the same terms as Ruby.
25
+ #
26
+ # == Requirements
27
+ #
28
+ # Built and tested using Ruby 1.8.2. Needs only the standard library.
29
+ #
30
+ # == Rationale
31
+ #
32
+ # Ruby already has an RSS library as part of the standard library, so you
33
+ # might be wondering why I decided to write another one.
34
+ #
35
+ # I started out trying to document the standard rss module, but found the
36
+ # code rather impenetrable. It was also difficult to see how it could be made
37
+ # documentable via RDoc.
38
+ #
39
+ # Then I tried writing code to use the standard RSS library, and discovered
40
+ # that it had a number of (what I consider to be) defects:
41
+ #
42
+ # - It doesn't support RSS 2.0 with extensions (such as iTunes podcast feeds),
43
+ # and it wasn't clear to me how to extend it to do so.
44
+ #
45
+ # - It doesn't support RSS 0.9.
46
+ #
47
+ # - It doesn't support Atom.
48
+ #
49
+ # - The API is different depending on what kind of RSS feed you are parsing.
50
+ #
51
+ # I asked around, and discovered that I wasn't the only person dissatisfied
52
+ # with the RSS library. Since fixing the problems would have resulted in
53
+ # breaking existing code that used the RSS module, I opted for an all-new
54
+ # implementation.
55
+ #
56
+ # This is the result. I'm calling it version 0.4, because it's actually my
57
+ # fourth attempt at putting together a clean, simple, universal API for RSS
58
+ # and Atom parsing. (The first three never saw public release.)
59
+ #
60
+ # == Features
61
+ #
62
+ # Here are what I see as the key improvements over the rss module in the
63
+ # Ruby standard library:
64
+ #
65
+ # - Supports all RSS versions, including RSS 0.9, as well as Atom.
66
+ #
67
+ # - Provides a unified API/object model for accessing the decoded data,
68
+ # with no need to know what format the feed is in.
69
+ #
70
+ # - Allows use of extended RSS 2.0 feeds.
71
+ #
72
+ # - Simple API, fully documented.
73
+ #
74
+ # - Test suite with over 220 test assertions.
75
+ #
76
+ # - Commented source code.
77
+ #
78
+ # - Less source code than the standard library rss module.
79
+ #
80
+ # - Faster than the standard library (at least, in my tests, see caveat below).
81
+ #
82
+ # Other features:
83
+ #
84
+ # - Optional support for RSS 1.0 Dublin Core, Syndication and Content modules
85
+ # and Apple iTunes Podcast elements (others to follow).
86
+ #
87
+ # - Content module decodes CDATA-escaped or encoded HTML content for you.
88
+ #
89
+ # - Supports namespaces, and encoded XHTML/HTML in Atom feeds.
90
+ #
91
+ # - Dates decoded to Ruby DateTime objects. Note, however, that this is slow,
92
+ # so parsing is only performed if you ask for the value.
93
+ #
94
+ # - Simple to extend to support your own RSS extensions; uses reflection.
95
+ #
96
+ # - Uses REXML fast stream parsing API for speed.
97
+ #
98
+ # - Non-validating, tries to be as forgiving as possible of structural errors.
99
+ #
100
+ # - Remaps namespace prefixes to standard values if it recognizes the module's
101
+ # URL.
102
+ #
103
+ # In the interests of balance, here are some key disadvantages compared with the
104
+ # standard library RSS support:
105
+ #
106
+ # - No support for _generating_ RSS feeds yet, only for parsing them. If
107
+ # you're using Rails, you can use RXML; if not, you can of course continue
108
+ # to use rss/maker.
109
+ #
110
+ # - Different API, not a drop-in replacement.
111
+ #
112
+ # - No way to choose a different XML parser (yet).
113
+ #
114
+ # - Incomplete support for Atom 0.3 draft. (Anyone still using it?)
115
+ #
116
+ # - No support for base64 data in Atom feeds (yet).
117
+ #
118
+ # - No Japanese documentation.
119
+ #
120
+ # - No XSL output options.
121
+ #
122
+ # - Slower if there are dates in the feed and you ask for their values.
123
+ #
124
+ # == Other options
125
+ #
126
+ # There are, of course, other Ruby RSS/Atom libraries out there. The ones I
127
+ # know about:
128
+ #
129
+ # === simple-rss
130
+ #
131
+ # http://rubyforge.org/projects/simple-rss
132
+ #
133
+ # Pros:
134
+ # - Much smaller than syndication or rss.
135
+ #
136
+ # - Completely non-validating.
137
+ #
138
+ # - Backwards compatible with rss in standard library.
139
+ #
140
+ # Cons:
141
+ # - Doesn't use a real XML parser.
142
+ #
143
+ # - No support for namespaces.
144
+ #
145
+ # - Incomplete Atom support (e.g. can't get name and e-mail of <atom:person>
146
+ # elements as separate fields, you still have to decode XHTML data yourself)
147
+ #
148
+ # - No documentation.
149
+ #
150
+ # For the record, I started work on my library long before simple-rss was
151
+ # announced.
152
+ #
153
+ # === feedtools / feedreader
154
+ #
155
+ # http://rubyforge.org/projects/feedtools/
156
+ #
157
+ # I don't know much about this one.
158
+ #
159
+ # == Design philosophy
160
+ #
161
+ # Here's my design philosophy for this module:
162
+ #
163
+ # - The interface should be via standard Ruby objects and methods; e.g.
164
+ # feed.channel.item[0].title, rather than (say) a dictionary hash.
165
+ #
166
+ # - It should be easier to parse RSS via the module than to hack something
167
+ # together using REXML, even if all you want is a list of titles and URLs.
168
+ #
169
+ # - It should be easy to add support for new RSS extensions without needing
170
+ # to know anything about reflection or other advanced topics. Just define
171
+ # a mixin with a bunch of appropriately-named methods, and you're done.
172
+ #
173
+ # - The code should be simple to understand.
174
+ #
175
+ # - Even so, good complete documentation is extremely important.
176
+ #
177
+ # - Be lenient in what you accept.
178
+ #
179
+ # - Be conservative in what you generate.
180
+ #
181
+ # - Get well-formed feeds parsing reliably, then worry about broken feeds.
182
+ #
183
+ # == Future plans
184
+ #
185
+ # Here are some possible improvements:
186
+ #
187
+ # - RSS and Atom generation. Create objects, then call Syndication::FeedMaker
188
+ # to generate XML in various flavors.
189
+ #
190
+ # - More lenient parsing. The limiting factor right now appears to be REXML,
191
+ # which, although a non-validating parser, does require fairly well-formed
192
+ # XML. (In particular, failure to match tags will cause errors.) Perhaps
193
+ # the answer is to find or build a 'tag soup' parser that implements the
194
+ # REXML stream parsing API?
195
+ #
196
+ # - Faster date parsing. It turns out that when I asked for parsed dates in
197
+ # my test code, the profiler showed Date.parse chewing up 25% of the total
198
+ # CPU time used. A more specific date parser that didn't use heuristics
199
+ # to guess format could cut that down drastically. On the other hand,
200
+ # does it actually matter? Is the date parsing slow enough to be a problem
201
+ # for anyone?
202
+ #
203
+ # == Feedback
204
+ #
205
+ # This is my first public release of this code, so there are doubtless things
206
+ # I could have done better. Comments, suggestions, etc. are welcome; e-mail
207
+ # <meta@pobox.com>.
208
+ #
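The design philosophy above says that supporting a new RSS extension should only take a mixin with appropriately named methods. A minimal sketch of how that could work follows. The class `SketchItem`, the module `SketchPodcast` and the rule that maps a tag such as `itunes:author` to a setter called `itunes_author=` are assumptions made for illustration; the library's own naming rules may differ.

```ruby
# Hypothetical container: stores a value by calling a setter derived
# from the tag name, and silently ignores tags it has no setter for.
class SketchItem
  attr_accessor :title

  def store(tag, value)
    # Assumed naming rule: "itunes:author" becomes "itunes_author=".
    setter = tag.tr(':', '_') + '='
    send(setter, value) if respond_to?(setter)
  end
end

# Hypothetical extension: nothing but appropriately named accessors.
module SketchPodcast
  attr_accessor :itunes_author, :itunes_duration
end

# Reopen the class and mix the extension in.
class SketchItem
  include SketchPodcast
end

item = SketchItem.new
item.store('title', 'Episode 1')
item.store('itunes:author', 'mathew')
item.store('unknown:tag', 'ignored')

puts item.title
puts item.itunes_author
```

The extension author never touches `store`; adding an accessor to the mixin is enough for the corresponding element to be picked up.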
@@ -0,0 +1,21 @@
1
+
2
+ # RSS Syndication example:
3
+ #
4
+ # Output Yahoo news headlines, dated.
5
+
6
+ require 'open-uri'
7
+ require 'syndication/rss'
8
+
9
+ parser = Syndication::RSS::Parser.new
10
+ feed = nil
11
+ open("http://rss.news.yahoo.com/rss/topstories") {|file|
12
+ text = file.read
13
+ feed = parser.parse(text)
14
+ }
15
+ chan = feed.channel
16
+ t = chan.lastbuilddate.strftime("%H:%M on %A %d %B")
17
+ puts "#{chan.title} at #{t}"
18
+ for i in feed.items
19
+ t = i.pubdate.strftime("%d %b")
20
+ puts "#{t}: #{i.title}"
21
+ end
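The example above assumes every date in the feed can be parsed. The README notes that date parsing is slow and only happens when a value is requested. A minimal sketch of that lazy approach, with a fallback to the raw string when parsing fails, might look like this. `SketchItem` and `format_date` are hypothetical names, and the use of `DateTime.rfc2822` (the usual RSS date format) instead of a heuristic parser is an assumption, not the library's method.

```ruby
require 'date'

# Hypothetical item class: the writer stores the raw string, and the
# reader parses it only when the value is actually asked for.
class SketchItem
  attr_writer :pubdate

  def pubdate
    return nil if @pubdate.nil?
    DateTime.rfc2822(@pubdate)
  rescue ArgumentError
    # Unparseable date: hand back the original string instead.
    @pubdate
  end
end

# Format a value that may be a DateTime or a plain String.
def format_date(value)
  value.respond_to?(:strftime) ? value.strftime('%d %b %H:%M') : value.to_s
end

good = SketchItem.new
good.pubdate = 'Sun, 02 Oct 2005 18:30:02 GMT'

bad = SketchItem.new
bad.pubdate = 'some time last week'

puts format_date(good.pubdate)
puts format_date(bad.pubdate)
```

Checking `respond_to?(:strftime)` before formatting keeps a loop over feed items from failing on a single malformed date.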
@@ -0,0 +1,479 @@
1
+ # Provides classes for parsing Atom web syndication feeds.
2
+ # See the Syndication module for documentation.
3
+ #
4
+ # Copyright © mathew <meta@pobox.com> 2005.
5
+ # Licensed under the same terms as Ruby.
6
+
7
+ require 'uri'
8
+ require 'rexml/parsers/streamparser'
9
+ require 'rexml/streamlistener'
10
+ require 'rexml/document'
11
+ require 'date'
12
+ require 'syndication/common'
13
+
14
+ module Syndication
15
+
16
+ # The Atom syndication format is defined at
17
+ # <URL:http://www.ietf.org/internet-drafts/draft-ietf-atompub-format-11.txt>.
18
+ # It is finalized, and should become an RFC soon.
19
+ #
20
+ # For an introduction, see "An overview of the Atom 1.0 Syndication Format"
21
+ # at <URL:http://www-128.ibm.com/developerworks/xml/library/x-atom10.html>
22
+ #
23
+ # For a comparison of Atom and RSS, see
24
+ # <URL:http://www.tbray.org/atom/RSS-and-Atom>
25
+ #
26
+ # To parse Atom feeds, use Syndication::Atom::Parser.
27
+ #
28
+ # The earlier Atom 0.3 format is partially supported; the 'mode' attribute
29
+ # is ignored and assumed to be 'xml' (as for Atom 1.0).
30
+ #
31
+ # Base64 encoded data in Atom 1.0 feeds is not supported (yet).
32
+ module Atom
33
+
34
+ # A value in an Atom feed which might be plain ASCII text, HTML, XHTML,
35
+ # or some random MIME type.
36
+
37
+ # TODO: Implement base64 support
38
+ # See http://ietfreport.isoc.org/all-ids/draft-ietf-atompub-format-11.txt
39
+ # section 4.1.3.3.
40
+
41
+ #:stopdoc:
42
+ # This object has to be handled specially; the parser feeds in all the
43
+ # REXML events, so the object can reconstruct embedded XML/XHTML.
44
+ # (Normally, the parser handles text buffering for a Container and
45
+ # calls store() when the container's element is closed.)
46
+ #:startdoc:
47
+ class Data < Container
48
+ # The decoded data, if the type is not text or XML
49
+ attr_reader :data
50
+
51
+ # Table of entities ripped from the XHTML spec.
52
+ ENTITIES = {
53
+ 'Aacute' => 193, 'aacute' => 225, 'Acirc' => 194,
54
+ 'acirc' => 226, 'acute' => 180, 'AElig' => 198,
55
+ 'aelig' => 230, 'Agrave' => 192, 'agrave' => 224,
56
+ 'amp' => 38, 'Aring' => 197, 'aring' => 229,
57
+ 'Atilde' => 195, 'atilde' => 227, 'Auml' => 196,
58
+ 'auml' => 228, 'brvbar' => 166, 'Ccedil' => 199,
59
+ 'ccedil' => 231, 'cedil' => 184, 'cent' => 162,
60
+ 'copy' => 169, 'curren' => 164, 'deg' => 176,
61
+ 'divide' => 247, 'Eacute' => 201, 'eacute' => 233,
62
+ 'Ecirc' => 202, 'ecirc' => 234, 'Egrave' => 200,
63
+ 'egrave' => 232, 'ETH' => 208, 'eth' => 240,
64
+ 'Euml' => 203, 'euml' => 235, 'frac12' => 189,
65
+ 'frac14' => 188, 'frac34' => 190, 'gt' => 62,
66
+ 'Iacute' => 205, 'iacute' => 237, 'Icirc' => 206,
67
+ 'icirc' => 238, 'iexcl' => 161, 'Igrave' => 204,
68
+ 'igrave' => 236, 'iquest' => 191, 'Iuml' => 207,
69
+ 'iuml' => 239, 'laquo' => 171, 'lt' => 60,
70
+ 'macr' => 175, 'micro' => 181, 'middot' => 183,
71
+ 'nbsp' => 160, 'not' => 172, 'Ntilde' => 209,
72
+ 'ntilde' => 241, 'Oacute' => 211, 'oacute' => 243,
73
+ 'Ocirc' => 212, 'ocirc' => 244, 'Ograve' => 210,
74
+ 'ograve' => 242, 'ordf' => 170, 'ordm' => 186,
75
+ 'Oslash' => 216, 'oslash' => 248, 'Otilde' => 213,
76
+ 'otilde' => 245, 'Ouml' => 214, 'ouml' => 246,
77
+ 'para' => 182, 'plusmn' => 177, 'pound' => 163,
78
+ 'quot' => 34, 'raquo' => 187, 'reg' => 174,
79
+ 'sect' => 167, 'shy' => 173, 'sup1' => 185,
80
+ 'sup2' => 178, 'sup3' => 179, 'szlig' => 223,
81
+ 'THORN' => 222, 'thorn' => 254, 'times' => 215,
82
+ 'Uacute' => 218, 'uacute' => 250, 'Ucirc' => 219,
83
+ 'ucirc' => 251, 'Ugrave' => 217, 'ugrave' => 249,
84
+ 'uml' => 168, 'Uuml' => 220, 'uuml' => 252,
85
+ 'Yacute' => 221, 'yacute' => 253, 'yen' => 165,
86
+ 'yuml' => 255
87
+ }
88
+
89
+ def initialize(parent, tag, attrs = nil)
90
+ @tag = tag
91
+ @parent = parent
92
+ @type = 'text' # the default, as per the standard
93
+ if attrs and attrs['type']
94
+ @type = attrs['type']
95
+ end
96
+ @div_stripped = false
97
+ case @type
98
+ when 'xhtml'
99
+ @xhtml = ''
100
+ when 'html'
101
+ @html = ''
102
+ when 'text'
103
+ @text = ''
104
+ end
105
+ end
106
+
107
+ # Convert a text representation to HTML.
108
+ def text2html(text)
109
+ html = text.gsub('&','&amp;')
110
+ html.gsub!('<','&lt;')
111
+ html.gsub!('>','&gt;')
112
+ return html
113
+ end
114
+
115
+ # Convert an HTML representation to text.
116
+ # This is done by throwing away all tags and converting all entities.
117
+ # Not ideal, but I can't think of a better simple approach.
118
+ def html2text(html)
119
+ text = html.gsub(/<[^>]*>/, '')
120
+ text = text.gsub(/&(\w+);/) {
121
+ ENTITIES[$1] ? [ENTITIES[$1]].pack('U') : ''
122
+ }
123
+ return text
124
+ end
125
+
126
+ # Return value of Data as HTML.
127
+ def html
128
+ return @html if @html
129
+ return @xhtml if @xhtml
130
+ return text2html(@text) if @text
131
+ return nil
132
+ end
133
+
134
+ # Return value of Data as ASCII text.
135
+ # If the field started off as (X)HTML, this is done by ruthlessly
136
+ # discarding markup and entities, so it is highly recommended that you
137
+ # use the XHTML or HTML and convert to text in a more intelligent way.
138
+ def txt
139
+ return @text if @text
140
+ return html2text(@xhtml) if @xhtml
141
+ return html2text(@html) if @html
142
+ return nil
143
+ end
144
+
145
+ # Return value of Data as XHTML.
146
+ def xhtml
147
+ return @xhtml if @xhtml
148
+ return @html if @html
149
+ return text2html(@text) if @text
150
+ return nil
151
+ end
152
+
153
+ # Catch tag start events if we're collecting embedded XHTML.
154
+ def tag_start(tag, attrs = nil)
155
+ if @type == 'xhtml'
156
+ t = tag.sub(/^xhtml:/,'')
157
+ @xhtml += "<#{t}>"
158
+ else
159
+ super
160
+ end
161
+ end
162
+
163
+ # Catch tag end events if we're collecting embedded XHTML.
164
+ def tag_end(endtag, current)
165
+ if @tag == endtag
166
+ if @type == 'xhtml' and !@div_stripped
167
+ @xhtml.sub!(/^\s*<div>\s*/m,'')
168
+ @xhtml.sub!(/\s*<\/div>\s*$/m,'')
169
+ @div_stripped = true
170
+ end
171
+ return @parent
172
+ end
173
+ if @type == 'xhtml'
174
+ t = endtag.sub(/^xhtml:/,'')
175
+ @xhtml += "</#{t}>"
176
+ return self
177
+ else
178
+ super
179
+ end
180
+ end
181
+
182
+ # Store/buffer text in the appropriate internal field.
183
+ def text(s)
184
+ case @type
185
+ when 'xhtml'
186
+ @xhtml += s
187
+ when 'html'
188
+ @html += s
189
+ when 'text'
190
+ @text += s
191
+ end
192
+ end
193
+ end
194
+
195
+ # A Link represents a hypertext link to another object from an Atom feed.
196
+ # Examples include the link with rel=self to the canonical URL of the feed.
197
+ class Link < Container
198
+ attr_accessor :href # The URI of the link.
199
+ attr_accessor :rel # The type of relationship the link expresses.
200
+ attr_accessor :type # The type of object at the other end of the link.
201
+ attr_accessor :title # The title for the link.
202
+ attr_accessor :length # The length of the linked-to object in bytes.
203
+
204
+ def initialize(parent, tag, attrs = nil)
205
+ @tag = tag
206
+ @parent = parent
207
+ if attrs
208
+ attrs.each_pair {|key, value|
209
+ self.store(key, value)
210
+ }
211
+ end
212
+ end
213
+ end
214
+
215
+ # A person, corporation or similar entity within an Atom feed.
216
+ class Person < Container
217
+ attr_accessor :name # Human-readable name of person.
218
+ attr_accessor :uri # URI associated with the person.
219
+ attr_accessor :email # RFC2822 e-mail address of person.
220
+
221
+ # For Atom 0.3 compatibility
222
+ def url=(x)
223
+ @uri = x
224
+ end
225
+ end
226
+
227
+ # A category (keyword) in an Atom feed.
228
+ # For convenience, Category#to_s is the same as Category#label.
229
+ class Category < Container
230
+ # The category itself, possibly encoded.
231
+ attr_accessor :term
232
+ # A human-readable version of Category#term.
233
+ attr_accessor :label
234
+ # URI identifying the categorization scheme.
235
+ attr_accessor :scheme
236
+
237
+ #:stopdoc:
238
+ # parent = parent object
239
+ # tag = XML tag which caused creation of this object
240
+ # attrs = XML attributes as a hash
241
+ def initialize(parent, tag, attrs = nil)
242
+ @tag = tag
243
+ @parent = parent
244
+ if attrs
245
+ attrs.each_pair {|key, value|
246
+ self.store(key, value)
247
+ }
248
+ end
249
+ end
250
+
251
+ alias to_s label
252
+ #:startdoc:
253
+ end
254
+
255
+ # Represents a parsed Atom feed, as returned by Syndication::Atom::Parser.
256
+ class Feed < Container
257
+ # Title of feed as a Syndication::Data object.
258
+ attr_accessor :title
259
+ # Subtitle of feed as a Syndication::Data object.
260
+ attr_accessor :subtitle
261
+ # Last update time, accepts an ISO8601 date/time as per the Atom spec.
262
+ attr_writer :updated
263
+ # Software which generated feed as a String.
264
+ attr_accessor :generator
265
+ # URI of icon to represent channel as a String.
266
+ attr_accessor :icon
267
+ # Globally unique ID of feed as a String.
268
+ attr_accessor :id
269
+ # URI of logo for channel as a String.
270
+ attr_accessor :logo
271
+ # Copyright or other rights information as a String.
272
+ attr_accessor :rights
273
+ # Author of feed as a Syndication::Person object.
274
+ attr_accessor :author
275
+ # Array of Syndication::Entry objects representing the entries in the feed.
276
+ attr_reader :entries
277
+ # Array of Syndication::Category objects representing taxonomic
278
+ # categories for the feed.
279
+ attr_reader :categories
280
+ # Array of Syndication::Person objects representing contributors.
281
+ attr_reader :contributors
282
+ # Array of Syndication::Link objects representing various types of link.
283
+ attr_reader :links
284
+ # Atom 0.3 info element (obsolete)
285
+ attr_accessor :info
286
+
287
+ # For Atom 0.3 compatibility
288
+ def tagline=(x)
289
+ @subtitle = x
290
+ end
291
+
292
+ # For Atom 0.3 compatibility
293
+ def copyright=(x)
294
+ @rights = x
295
+ end
296
+
297
+ # For Atom 0.3 compatibility
298
+ def modified=(x)
299
+ @updated = x
300
+ end
301
+
302
+ # Add a Syndication::Category value to the feed
303
+ def category=(obj)
304
+ if !@categories
305
+ @categories = Array.new
306
+ end
307
+ @categories.push(obj)
308
+ end
309
+
310
+ # Add a Syndication::Entry to the feed
311
+ def entry=(obj)
312
+ if !@entries
313
+ @entries = Array.new
314
+ end
315
+ @entries.push(obj)
316
+ end
317
+
318
+ # Add a Syndication::Person contributor to the feed
319
+ def contributor=(obj)
320
+ if !@contributors
321
+ @contributors = Array.new
322
+ end
323
+ @contributors.push(obj)
324
+ end
325
+
326
+ # Add a Syndication::Link to the feed
327
+ def link=(obj)
328
+ if !@links
329
+ @links = Array.new
330
+ end
331
+ @links.push(obj)
332
+ end
333
+
334
+ # Last update date/time as a DateTime object if it can be parsed,
335
+ # a String otherwise.
336
+ def updated
337
+ parse_date(@updated)
338
+ end
339
+ end
340
+
341
+ # An entry within an Atom feed.
342
+ class Entry < Container
343
+ # Title of entry.
344
+ attr_accessor :title
345
+ # Summary of content.
346
+ attr_accessor :summary
347
+ # Source feed metadata as Feed object.
348
+ attr_accessor :source
349
+ # Last update date/time as DateTime object.
350
+ attr_writer :updated
351
+ # Publication date/time as DateTime object.
352
+ attr_writer :published
353
+ # Author of entry as a Person object.
354
+ attr_accessor :author
355
+ # Copyright or other rights information.
356
+ attr_accessor :rights
357
+ # Content of entry.
358
+ attr_accessor :content
359
+ # Globally unique ID of Entry.
360
+ attr_accessor :id
361
+ # Array of taxonomic categories for entry.
362
+ attr_reader :categories
363
+ # Array of Link objects.
364
+ attr_reader :links
365
+ # Array of Person objects representing contributors.
366
+ attr_reader :contributors
367
+ # Atom 0.3 creation date/time (obsolete)
368
+ attr_writer :created
369
+
370
+ # For Atom 0.3 compatibility
371
+ def modified=(x)
372
+ @updated = x
373
+ end
374
+
375
+ # For Atom 0.3 compatibility
376
+ def issued=(x)
377
+ @published = x
378
+ end
379
+
380
+ # For Atom 0.3 compatibility
381
+ def copyright=(x)
382
+ @rights = x
383
+ end
384
+
385
+ # Add a Category object to the entry
386
+ def category=(obj)
387
+ if !@categories
388
+ @categories = Array.new
389
+ end
390
+ @categories.push(obj)
391
+ end
392
+
393
+ # Add a Person to the entry to represent a contributor
394
+ def contributor=(obj)
395
+ if !@contributors
396
+ @contributors = Array.new
397
+ end
398
+ @contributors.push(obj)
399
+ end
400
+
401
+ # Add a Link to the entry
402
+ def link=(obj)
403
+ if !@links
404
+ @links = Array.new
405
+ end
406
+ @links.push(obj)
407
+ end
408
+
409
+ # The last update DateTime
410
+ def updated
411
+ parse_date(@updated)
412
+ end
413
+
414
+ # The DateTime of publication
415
+ def published
416
+ parse_date(@published)
417
+ end
418
+
419
+ # The DateTime of creation (Atom 0.3, obsolete)
420
+ def created
421
+ parse_date(@created)
422
+ end
423
+ end
424
+
425
+ # A parser for Atom feeds.
426
+ # See Syndication::AbstractParser in common.rb for the abstract class this
427
+ # specializes.
428
+ class Parser < AbstractParser
429
+ include REXML::StreamListener
430
+
431
+ #:stopdoc:
432
+ # A hash of tags which require the creation of new objects, and the class
433
+ # to use for creating the object.
434
+ CLASS_FOR_TAG = {
435
+ 'entry' => Entry,
436
+ 'author' => Person,
437
+ 'contributor' => Person,
438
+ 'title' => Data,
439
+ 'subtitle' => Data,
440
+ 'summary' => Data,
441
+ 'link' => Link,
442
+ 'source' => Feed,
443
+ 'category' => Category
444
+ }
445
+
446
+ # Called when REXML finds a text fragment.
447
+ # For Atom parsing, we need to handle Data objects specially:
448
+ # They need all events passed through verbatim, because
449
+ # they might contain XHTML which will be sent through
450
+ # as REXML events and will need to be reconstructed.
451
+ def text(s)
452
+ if @current_object.kind_of?(Data)
453
+ @current_object.text(s)
454
+ return
455
+ end
456
+ if @textstack.last
457
+ @textstack.last << s
458
+ end
459
+ end
460
+ #:startdoc:
461
+
462
+ # Reset the parser ready to parse a new feed.
463
+ def reset
464
+ # Set up an empty Feed object and make it the current object
465
+ @parsetree = Feed.new(nil)
466
+ # Set up the class-for-tag hash
467
+ @class_for_tag = CLASS_FOR_TAG
468
+ # Everything else is common to both kinds of parser
469
+ super
470
+ end
471
+
472
+ # The most recently parsed feed as a Syndication::Atom::Feed object.
473
+ def feed
474
+ return @parsetree
475
+ end
476
+
477
+ end
478
+ end
479
+ end
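`Data#html2text` above works by discarding tags and converting named entities. A standalone sketch of that approach is shown here, using a small hypothetical table (`SKETCH_ENTITIES`) and a hypothetical function name. Turning each code point into a UTF-8 character with `Array#pack('U')` is an assumption about the intended output encoding.

```ruby
# Hypothetical subset of an entity table: name => Unicode code point.
SKETCH_ENTITIES = {
  'amp' => 38, 'lt' => 60, 'gt' => 62, 'quot' => 34, 'eacute' => 233
}

# Throw away all tags, then replace each named entity with its
# character. Unknown entities are dropped.
def sketch_html2text(html)
  text = html.gsub(/<[^>]*>/, '')
  text.gsub(/&(\w+);/) do
    code = SKETCH_ENTITIES[$1]
    code ? [code].pack('U') : ''
  end
end

plain = sketch_html2text('<p>Caf&eacute; &amp; <b>bar</b> &bogus;</p>')
puts plain
```

The lookup key is the captured entity name (`$1`), not the whole match, and the regular expression captures the full name with `(\w+)` rather than a single character.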