syndication 0.4.0

@@ -0,0 +1,33 @@
+ # = Syndication 0.4
+ #
+ # As discussed in the README, this is really my fourth attempt at writing
+ # RSS parsing code. For the record, I thought I'd list the approaches I
+ # tried and abandoned. In a way, that's more interesting than the one I
+ # picked...
+ #
+ # First I used hashes for storage and just looked for matching tags.
+ # That approach works, kinda, but it doesn't really understand nested
+ # elements at all. As a result, it becomes really hard to deal with Atom
+ # feeds, where an <email> element could belong to one of a number of kinds
+ # of person. Plus, I wanted a real object-based approach which would be
+ # amenable to RDoc documentation.
+ #
+ # Next I wrote a classic stack-based parser, with a container stack and a
+ # text buffer stack. That worked well for RSS; I got it parsing every RSS
+ # variant, and even went as far as a test suite. However, as I tried
+ # extending it to deal with Atom, I realized that the parser code was
+ # becoming hard to follow, as the state machine gained more and more
+ # special cases.
+ #
+ # For a third iteration, I tried to generalize the knowledge represented by the
+ # state machine, by placing it in the context stack. That is, I would have a
+ # smart stack that knew which XML elements could go inside other elements.
+ # Actually, there would have been four context stacks, for containers,
+ # attributes, tags and textual data.
+ #
+ # That design never made it past the paper stage, because I realized that I
+ # could move all the knowledge into the classes used to create the objects of
+ # the final parse tree. With the new model--the one used in this code--the
+ # parser really doesn't know anything about Atom or RSS. It just forwards
+ # events to a tree of objects, which construct child objects as appropriate to
+ # grow the tree and represent the feed.
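The final design described in these notes can be illustrated with a small standalone sketch. It is not the gem's code: the class names (Node, Channel, Item, TinyParser) and the hand-written event list are assumptions made for illustration, and a real parser would receive the events from REXML instead. The point it tries to show is that the parser only forwards events, while each object decides whether a tag creates a child object or fills in a simple field.

```ruby
# Hypothetical sketch of "forward events to a tree of objects".
# None of these classes come from the syndication gem.
class Node
  def initialize(parent, tag)
    @parent = parent
    @tag = tag
    @buffer = nil
  end

  # Map of child tag => class to instantiate; subclasses override this.
  def child_classes
    {}
  end

  # A start tag either creates a child object, which becomes the new
  # current node, or starts buffering text for a simple field.
  def tag_start(tag)
    klass = child_classes[tag]
    if klass
      child = klass.new(self, tag)
      store(tag, child)
      child
    else
      @buffer = ''
      self
    end
  end

  # Collect text while a simple field is open.
  def text(s)
    @buffer += s if @buffer
  end

  # An end tag either closes this node, handing control back to the
  # parent, or stores the buffered text of a simple field.
  def tag_end(tag)
    return @parent if tag == @tag
    store(tag, @buffer) if @buffer
    @buffer = nil
    self
  end

  # Store a value through the setter named after the tag, if one exists.
  def store(tag, value)
    setter = "#{tag}="
    send(setter, value) if respond_to?(setter)
  end
end

class Item < Node
  attr_accessor :title
end

class Channel < Node
  attr_accessor :title
  attr_reader :items

  def child_classes
    { 'item' => Item }
  end

  # Each <item> is appended to a list instead of overwriting a field.
  def item=(obj)
    (@items ||= []) << obj
  end
end

# The parser knows nothing about the feed format; it only forwards events.
class TinyParser
  def initialize(root)
    @current = root
  end

  def feed_events(events)
    events.each do |kind, arg|
      case kind
      when :start then @current = @current.tag_start(arg)
      when :text  then @current.text(arg)
      when :end   then @current = @current.tag_end(arg)
      end
    end
  end
end

channel = Channel.new(nil, 'channel')
TinyParser.new(channel).feed_events([
  [:start, 'title'], [:text, 'News'], [:end, 'title'],
  [:start, 'item'],
  [:start, 'title'], [:text, 'First story'], [:end, 'title'],
  [:end, 'item'],
])
puts channel.title
puts channel.items.first.title
```

Supporting a new kind of element in this sketch means adding a class and one entry in child_classes; the parser itself does not change.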
data/README ADDED
@@ -0,0 +1,208 @@
+ #
+ # = Syndication 0.4
+ #
+ # This module provides classes for parsing web syndication feeds in RSS and
+ # Atom formats.
+ #
+ # To parse RSS, use Syndication::RSS::Parser.
+ #
+ # To parse Atom, use Syndication::Atom::Parser.
+ #
+ # If you want my advice on which to generate, my order of preference would
+ # be:
+ #
+ # 1. Atom 1.0
+ # 2. RSS 1.0
+ # 3. RSS 2.0
+ #
+ # My reasoning is simply that I hate having to sniff for HTML (see
+ # Syndication::RSS).
+ #
+ # == License
+ #
+ # Syndication is Copyright 2005 mathew <meta@pobox.com>, and is licensed
+ # under the same terms as Ruby.
+ #
+ # == Requirements
+ #
+ # Built and tested using Ruby 1.8.2. Needs only the standard library.
+ #
+ # == Rationale
+ #
+ # Ruby already has an RSS library as part of the standard library, so you
+ # might be wondering why I decided to write another one.
+ #
+ # I started out trying to document the standard rss module, but found the
+ # code rather impenetrable. It was also difficult to see how it could be made
+ # documentable via RDoc.
+ #
+ # Then I tried writing code to use the standard RSS library, and discovered
+ # that it had a number of (what I consider to be) defects:
+ #
+ # - It doesn't support RSS 2.0 with extensions (such as iTunes podcast feeds),
+ #   and it wasn't clear to me how to extend it to do so.
+ #
+ # - It doesn't support RSS 0.9.
+ #
+ # - It doesn't support Atom.
+ #
+ # - The API is different depending on what kind of RSS feed you are parsing.
+ #
+ # I asked around, and discovered that I wasn't the only person dissatisfied
+ # with the RSS library. Since fixing the problems would have resulted in
+ # breaking existing code that used the RSS module, I opted for an all-new
+ # implementation.
+ #
+ # This is the result. I'm calling it version 0.4, because it's actually my
+ # fourth attempt at putting together a clean, simple, universal API for RSS
+ # and Atom parsing. (The first three never saw public release.)
+ #
+ # == Features
+ #
+ # Here are what I see as the key improvements over the rss module in the
+ # Ruby standard library:
+ #
+ # - Supports all RSS versions, including RSS 0.9, as well as Atom.
+ #
+ # - Provides a unified API/object model for accessing the decoded data,
+ #   with no need to know what format the feed is in.
+ #
+ # - Allows use of extended RSS 2.0 feeds.
+ #
+ # - Simple API, fully documented.
+ #
+ # - Test suite with over 220 test assertions.
+ #
+ # - Commented source code.
+ #
+ # - Less source code than the standard library rss module.
+ #
+ # - Faster than the standard library (at least in my tests; see caveat below).
+ #
+ # Other features:
+ #
+ # - Optional support for RSS 1.0 Dublin Core, Syndication and Content modules
+ #   and Apple iTunes Podcast elements (others to follow).
+ #
+ # - Content module decodes CDATA-escaped or encoded HTML content for you.
+ #
+ # - Supports namespaces, and encoded XHTML/HTML in Atom feeds.
+ #
+ # - Dates decoded to Ruby DateTime objects. Note, however, that this is slow,
+ #   so parsing is only performed if you ask for the value.
+ #
+ # - Simple to extend to support your own RSS extensions; uses reflection.
+ #
+ # - Uses REXML fast stream parsing API for speed.
+ #
+ # - Non-validating; tries to be as forgiving as possible of structural errors.
+ #
+ # - Remaps namespace prefixes to standard values if it recognizes the module's
+ #   URL.
+ #
+ # In the interests of balance, here are some key disadvantages compared with
+ # the standard library RSS support:
+ #
+ # - No support for _generating_ RSS feeds yet, only for parsing them. If
+ #   you're using Rails, you can use RXML; if not, you can of course continue
+ #   to use rss/maker.
+ #
+ # - Different API, not a drop-in replacement.
+ #
+ # - No way to choose a different XML parser (yet).
+ #
+ # - Incomplete support for Atom 0.3 draft. (Anyone still using it?)
+ #
+ # - No support for base64 data in Atom feeds (yet).
+ #
+ # - No Japanese documentation.
+ #
+ # - No XSL output options.
+ #
+ # - Slower if there are dates in the feed and you ask for their values.
+ #
+ # == Other options
+ #
+ # There are, of course, other Ruby RSS/Atom libraries out there. The ones I
+ # know about:
+ #
+ # === simple-rss
+ #
+ # http://rubyforge.org/projects/simple-rss
+ #
+ # Pros:
+ # - Much smaller than syndication or rss.
+ #
+ # - Completely non-validating.
+ #
+ # - Backwards compatible with rss in standard library.
+ #
+ # Cons:
+ # - Doesn't use a real XML parser.
+ #
+ # - No support for namespaces.
+ #
+ # - Incomplete Atom support (e.g. can't get name and e-mail of <atom:person>
+ #   elements as separate fields; you still have to decode XHTML data yourself).
+ #
+ # - No documentation.
+ #
+ # For the record, I started work on my library long before simple-rss was
+ # announced.
+ #
+ # === feedtools / feedreader
+ #
+ # http://rubyforge.org/projects/feedtools/
+ #
+ # I don't know much about this one.
+ #
+ # == Design philosophy
+ #
+ # Here's my design philosophy for this module:
+ #
+ # - The interface should be via standard Ruby objects and methods; e.g.
+ #   feed.channel.item[0].title, rather than (say) a dictionary hash.
+ #
+ # - It should be easier to parse RSS via the module than to hack something
+ #   together using REXML, even if all you want is a list of titles and URLs.
+ #
+ # - It should be easy to add support for new RSS extensions without needing
+ #   to know anything about reflection or other advanced topics. Just define
+ #   a mixin with a bunch of appropriately-named methods, and you're done.
+ #
+ # - The code should be simple to understand.
+ #
+ # - Even so, good complete documentation is extremely important.
+ #
+ # - Be lenient in what you accept.
+ #
+ # - Be conservative in what you generate.
+ #
+ # - Get well-formed feeds parsing reliably, then worry about broken feeds.
+ #
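The "define a mixin with appropriately-named methods" idea from the list above could look roughly like the sketch below. It is an illustration under assumptions, not the library's API: the Container stand-in, its store method, the rule that turns a prefixed tag such as geo:lat into the method name geo_lat=, and the Geo module itself are all hypothetical.

```ruby
# Hypothetical stand-in for a container class; the real one lives in
# syndication/common and may work differently.
class Container
  # Turn a tag such as "geo:lat" into the setter name "geo_lat=" and
  # call it if the object defines one. Unknown tags are ignored.
  def store(tag, value)
    setter = "#{tag.tr(':', '_')}="
    send(setter, value) if respond_to?(setter)
  end
end

# Hypothetical extension: supporting an imaginary "geo" namespace only
# takes a module with suitably named accessors.
module Geo
  attr_accessor :geo_lat, :geo_long
end

class Item < Container
  attr_accessor :title
  include Geo
end

item = Item.new
item.store('title', 'Somewhere')
item.store('geo:lat', '51.5')
item.store('unknown:tag', 'ignored')
puts item.title
puts item.geo_lat
```

The author of the extension never touches reflection directly; the respond_to? and send calls stay inside the container.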
+ # == Future plans
+ #
+ # Here are some possible improvements:
+ #
+ # - RSS and Atom generation. Create objects, then call Syndication::FeedMaker
+ #   to generate XML in various flavors.
+ #
+ # - More lenient parsing. The limiting factor right now appears to be REXML,
+ #   which, although a non-validating parser, does require fairly well-formed
+ #   XML. (In particular, failure to match tags will cause errors.) Perhaps
+ #   the answer is to find or build a 'tag soup' parser that implements the
+ #   REXML stream parsing API?
+ #
+ # - Faster date parsing. It turns out that when I asked for parsed dates in
+ #   my test code, the profiler showed Date.parse chewing up 25% of the total
+ #   CPU time used. A more specific date parser that didn't use heuristics
+ #   to guess format could cut that down drastically. On the other hand,
+ #   does it actually matter? Is the date parsing slow enough to be a problem
+ #   for anyone?
+ #
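One possible shape for the faster date parsing idea is sketched below. It is an assumption of mine rather than anything in the library: the LazyDate class and the RFC822 constant are hypothetical names, and the sketch assumes RSS-style dates such as "Tue, 10 Jun 2003 04:00:00 +0000". It combines two things mentioned in this README: parsing only when the value is requested, and trying one fixed format with DateTime.strptime before falling back to the heuristic DateTime.parse.

```ruby
require 'date'

# Hypothetical helper, not part of the syndication gem.
class LazyDate
  # Fixed format for RFC 822 style dates with a numeric zone offset.
  RFC822 = '%a, %d %b %Y %H:%M:%S %z'

  def initialize(raw)
    @raw = raw
    @parsed = nil
  end

  # Parse on first access only, then reuse the cached result.
  def value
    @parsed ||= parse
  end

  private

  # Try the fixed format first; fall back to the heuristic parser, and
  # finally to the raw string if nothing can make sense of it.
  def parse
    DateTime.strptime(@raw, RFC822)
  rescue ArgumentError
    begin
      DateTime.parse(@raw)
    rescue ArgumentError
      @raw
    end
  end
end

d = LazyDate.new('Tue, 10 Jun 2003 04:00:00 +0000')
puts d.value.year
```

Whether this is actually faster than Date.parse would need to be measured; the sketch only shows the structure.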
+ # == Feedback
+ #
+ # This is my first public release of this code, so there are doubtless things
+ # I could have done better. Comments, suggestions, etc. are welcome; e-mail
+ # <meta@pobox.com>.
+ #
@@ -0,0 +1,21 @@
+
+ # RSS Syndication example:
+ #
+ # Output Yahoo news headlines, dated.
+
+ require 'open-uri'
+ require 'syndication/rss'
+
+ parser = Syndication::RSS::Parser.new
+ feed = nil
+ open("http://rss.news.yahoo.com/rss/topstories") {|file|
+   text = file.read
+   feed = parser.parse(text)
+ }
+ chan = feed.channel
+ # %H:%M prints hours and minutes.
+ t = chan.lastbuilddate.strftime("%H:%M on %A %d %B")
+ puts "#{chan.title} at #{t}"
+ for i in feed.items
+   t = i.pubdate.strftime("%d %b")
+   puts "#{t}: #{i.title}"
+ end
@@ -0,0 +1,479 @@
+ # Provides classes for parsing Atom web syndication feeds.
+ # See Syndication class for documentation.
+ #
+ # Copyright © mathew <meta@pobox.com> 2005.
+ # Licensed under the same terms as Ruby.
+
+ require 'uri'
+ require 'rexml/parsers/streamparser'
+ require 'rexml/streamlistener'
+ require 'rexml/document'
+ require 'date'
+ require 'syndication/common'
+
+ module Syndication
+
+   # The Atom syndication format is defined at
+   # <URL:http://www.ietf.org/internet-drafts/draft-ietf-atompub-format-11.txt>.
+   # It is finalized, and should become an RFC soon.
+   #
+   # For an introduction, see "An overview of the Atom 1.0 Syndication Format"
+   # at <URL:http://www-128.ibm.com/developerworks/xml/library/x-atom10.html>
+   #
+   # For a comparison of Atom and RSS, see
+   # <URL:http://www.tbray.org/atom/RSS-and-Atom>
+   #
+   # To parse Atom feeds, use Syndication::Atom::Parser.
+   #
+   # The earlier Atom 0.3 format is partially supported; the 'mode' attribute
+   # is ignored and assumed to be 'xml' (as for Atom 1.0).
+   #
+   # Base64 encoded data in Atom 1.0 feeds is not supported (yet).
+   module Atom
+
+     # A value in an Atom feed which might be plain ASCII text, HTML, XHTML,
+     # or some random MIME type.
+
+     # TODO: Implement base64 support
+     # See http://ietfreport.isoc.org/all-ids/draft-ietf-atompub-format-11.txt
+     # section 4.1.3.3.
+
+     #:stopdoc:
+     # This object has to be handled specially; the parser feeds in all the
+     # REXML events, so the object can reconstruct embedded XML/XHTML.
+     # (Normally, the parser handles text buffering for a Container and
+     # calls store() when the container's element is closed.)
+     #:startdoc:
+     class Data < Container
+       # The decoded data, if the type is not text or XML
+       attr_reader :data
+
+       # Table of entities ripped from the XHTML spec.
+       ENTITIES = {
+         'Aacute' => 193, 'aacute' => 225, 'Acirc' => 194,
+         'acirc' => 226, 'acute' => 180, 'AElig' => 198,
+         'aelig' => 230, 'Agrave' => 192, 'agrave' => 224,
+         'amp' => 38, 'Aring' => 197, 'aring' => 229,
+         'Atilde' => 195, 'atilde' => 227, 'Auml' => 196,
+         'auml' => 228, 'brvbar' => 166, 'Ccedil' => 199,
+         'ccedil' => 231, 'cedil' => 184, 'cent' => 162,
+         'copy' => 169, 'curren' => 164, 'deg' => 176,
+         'divide' => 247, 'Eacute' => 201, 'eacute' => 233,
+         'Ecirc' => 202, 'ecirc' => 234, 'Egrave' => 200,
+         'egrave' => 232, 'ETH' => 208, 'eth' => 240,
+         'Euml' => 203, 'euml' => 235, 'frac12' => 189,
+         'frac14' => 188, 'frac34' => 190, 'gt' => 62,
+         'Iacute' => 205, 'iacute' => 237, 'Icirc' => 206,
+         'icirc' => 238, 'iexcl' => 161, 'Igrave' => 204,
+         'igrave' => 236, 'iquest' => 191, 'Iuml' => 207,
+         'iuml' => 239, 'laquo' => 171, 'lt' => 60,
+         'macr' => 175, 'micro' => 181, 'middot' => 183,
+         'nbsp' => 160, 'not' => 172, 'Ntilde' => 209,
+         'ntilde' => 241, 'Oacute' => 211, 'oacute' => 243,
+         'Ocirc' => 212, 'ocirc' => 244, 'Ograve' => 210,
+         'ograve' => 242, 'ordf' => 170, 'ordm' => 186,
+         'Oslash' => 216, 'oslash' => 248, 'Otilde' => 213,
+         'otilde' => 245, 'Ouml' => 214, 'ouml' => 246,
+         'para' => 182, 'plusmn' => 177, 'pound' => 163,
+         'quot' => 34, 'raquo' => 187, 'reg' => 174,
+         'sect' => 167, 'shy' => 173, 'sup1' => 185,
+         'sup2' => 178, 'sup3' => 179, 'szlig' => 223,
+         'THORN' => 222, 'thorn' => 254, 'times' => 215,
+         'Uacute' => 218, 'uacute' => 250, 'Ucirc' => 219,
+         'ucirc' => 251, 'Ugrave' => 217, 'ugrave' => 249,
+         'uml' => 168, 'Uuml' => 220, 'uuml' => 252,
+         'Yacute' => 221, 'yacute' => 253, 'yen' => 165,
+         'yuml' => 255
+       }
+
+       def initialize(parent, tag, attrs = nil)
+         @tag = tag
+         @parent = parent
+         @type = 'text' # the default, as per the standard
+         # attrs defaults to nil, so guard before looking up the type.
+         if attrs && attrs['type']
+           @type = attrs['type']
+         end
+         @div_stripped = false
+         case @type
+         when 'xhtml'
+           @xhtml = ''
+         when 'html'
+           @html = ''
+         when 'text'
+           @text = ''
+         end
+       end
+
+       # Convert a text representation to HTML.
+       def text2html(text)
+         html = text.gsub('&','&amp;')
+         html.gsub!('<','&lt;')
+         html.gsub!('>','&gt;')
+         return html
+       end
+
+       # Convert an HTML representation to text.
+       # This is done by throwing away all tags and converting all entities.
+       # Not ideal, but I can't think of a better simple approach.
+       def html2text(html)
+         text = html.gsub(/<[^>]*>/, '')
+         # Look up each entity by name (without the & and ;) and emit the
+         # matching character as UTF-8; unknown entities are dropped.
+         text = text.gsub(/&(\w+);/) {
+           code = ENTITIES[$1]
+           code ? [code].pack('U') : ''
+         }
+         return text
+       end
+
+       # Return value of Data as HTML.
+       def html
+         return @html if @html
+         return @xhtml if @xhtml
+         return text2html(@text) if @text
+         return nil
+       end
+
+       # Return value of Data as ASCII text.
+       # If the field started off as (X)HTML, this is done by ruthlessly
+       # discarding markup and entities, so it is highly recommended that you
+       # use the XHTML or HTML and convert to text in a more intelligent way.
+       def txt
+         return @text if @text
+         return html2text(@xhtml) if @xhtml
+         return html2text(@html) if @html
+         return nil
+       end
+
+       # Return value of Data as XHTML.
+       def xhtml
+         return @xhtml if @xhtml
+         return @html if @html
+         return text2html(@text) if @text
+         return nil
+       end
+
+       # Catch tag start events if we're collecting embedded XHTML.
+       def tag_start(tag, attrs = nil)
+         if @type == 'xhtml'
+           t = tag.sub(/^xhtml:/,'')
+           @xhtml += "<#{t}>"
+         else
+           super
+         end
+       end
+
+       # Catch tag end events if we're collecting embedded XHTML.
+       def tag_end(endtag, current)
+         if @tag == endtag
+           if @type == 'xhtml' and !@div_stripped
+             @xhtml.sub!(/^\s*<div>\s*/m,'')
+             @xhtml.sub!(/\s*<\/div>\s*$/m,'')
+             @div_stripped = true
+           end
+           return @parent
+         end
+         if @type == 'xhtml'
+           t = endtag.sub(/^xhtml:/,'')
+           @xhtml += "</#{t}>"
+           return self
+         else
+           super
+         end
+       end
+
+       # Store/buffer text in the appropriate internal field.
+       def text(s)
+         case @type
+         when 'xhtml'
+           @xhtml += s
+         when 'html'
+           @html += s
+         when 'text'
+           @text += s
+         end
+       end
+     end
+
+     # A Link represents a hypertext link to another object from an Atom feed.
+     # Examples include the link with rel=self to the canonical URL of the feed.
+     class Link < Container
+       attr_accessor :href # The URI of the link.
+       attr_accessor :rel # The type of relationship the link expresses.
+       attr_accessor :type # The type of object at the other end of the link.
+       attr_accessor :title # The title for the link.
+       attr_accessor :length # The length of the linked-to object in bytes.
+
+       def initialize(parent, tag, attrs = nil)
+         @tag = tag
+         @parent = parent
+         if attrs
+           attrs.each_pair {|key, value|
+             self.store(key, value)
+           }
+         end
+       end
+     end
+
+     # A person, corporation or similar entity within an Atom feed.
+     class Person < Container
+       attr_accessor :name # Human-readable name of person.
+       attr_accessor :uri # URI associated with the person.
+       attr_accessor :email # RFC2822 e-mail address of person.
+
+       # For Atom 0.3 compatibility
+       def url=(x)
+         @uri = x
+       end
+     end
+
+     # A category (keyword) in an Atom feed.
+     # For convenience, Category#to_s is the same as Category#label.
+     class Category < Container
+       # The category itself, possibly encoded.
+       attr_accessor :term
+       # A human-readable version of Category#term.
+       attr_accessor :label
+       # URI to the schema definition.
+       attr_accessor :scheme
+
+       #:stopdoc:
+       # parent = parent object
+       # tag = XML tag which caused creation of this object
+       # attrs = XML attributes as a hash
+       def initialize(parent, tag, attrs = nil)
+         @tag = tag
+         @parent = parent
+         if attrs
+           attrs.each_pair {|key, value|
+             self.store(key, value)
+           }
+         end
+       end
+
+       alias to_s label
+       #:startdoc:
+     end
+
+     # Represents a parsed Atom feed, as returned by Syndication::Atom::Parser.
+     class Feed < Container
+       # Title of feed as a Syndication::Data object.
+       attr_accessor :title
+       # Subtitle of feed as a Syndication::Data object.
+       attr_accessor :subtitle
+       # Last update time, accepts an ISO8601 date/time as per the Atom spec.
+       attr_writer :updated
+       # Software which generated feed as a String.
+       attr_accessor :generator
+       # URI of icon to represent channel as a String.
+       attr_accessor :icon
+       # Globally unique ID of feed as a String.
+       attr_accessor :id
+       # URI of logo for channel as a String.
+       attr_accessor :logo
+       # Copyright or other rights information as a String.
+       attr_accessor :rights
+       # Author of feed as a Syndication::Person object.
+       attr_accessor :author
+       # Array of Syndication::Entry objects representing the entries in the feed.
+       attr_reader :entries
+       # Array of Syndication::Category objects representing taxonomic
+       # categories for the feed.
+       attr_reader :categories
+       # Array of Syndication::Person objects representing contributors.
+       attr_reader :contributors
+       # Array of Syndication::Link objects representing various types of link.
+       attr_reader :links
+       # Atom 0.3 info element (obsolete)
+       attr_accessor :info
+
+       # For Atom 0.3 compatibility
+       def tagline=(x)
+         @subtitle = x
+       end
+
+       # For Atom 0.3 compatibility
+       def copyright=(x)
+         @rights = x
+       end
+
+       # For Atom 0.3 compatibility
+       def modified=(x)
+         @updated = x
+       end
+
+       # Add a Syndication::Category value to the feed
+       def category=(obj)
+         if !@categories
+           @categories = Array.new
+         end
+         @categories.push(obj)
+       end
+
+       # Add a Syndication::Entry to the feed
+       def entry=(obj)
+         if !@entries
+           @entries = Array.new
+         end
+         @entries.push(obj)
+       end
+
+       # Add a Syndication::Person contributor to the feed
+       def contributor=(obj)
+         if !@contributors
+           @contributors = Array.new
+         end
+         @contributors.push(obj)
+       end
+
+       # Add a Syndication::Link to the feed
+       def link=(obj)
+         if !@links
+           @links = Array.new
+         end
+         @links.push(obj)
+       end
+
+       # Last update date/time as a DateTime object if it can be parsed,
+       # a String otherwise.
+       def updated
+         parse_date(@updated)
+       end
+     end
+
+     # An entry within an Atom feed.
+     class Entry < Container
+       # Title of entry.
+       attr_accessor :title
+       # Summary of content.
+       attr_accessor :summary
+       # Source feed metadata as Feed object.
+       attr_accessor :source
+       # Last update date/time as DateTime object.
+       attr_writer :updated
+       # Publication date/time as DateTime object.
+       attr_writer :published
+       # Author of entry as a Person object.
+       attr_accessor :author
+       # Copyright or other rights information.
+       attr_accessor :rights
+       # Content of entry.
+       attr_accessor :content
+       # Globally unique ID of Entry.
+       attr_accessor :id
+       # Array of taxonomic categories for entry.
+       attr_reader :categories
+       # Array of Link objects.
+       attr_reader :links
+       # Array of Person objects representing contributors.
+       attr_reader :contributors
+       # Atom 0.3 creation date/time (obsolete)
+       attr_writer :created
+
+       # For Atom 0.3 compatibility
+       def modified=(x)
+         @updated = x
+       end
+
+       # For Atom 0.3 compatibility
+       def issued=(x)
+         @published = x
+       end
+
+       # For Atom 0.3 compatibility
+       def copyright=(x)
+         @rights = x
+       end
+
+       # Add a Category object to the entry
+       def category=(obj)
+         if !@categories
+           @categories = Array.new
+         end
+         @categories.push(obj)
+       end
+
+       # Add a Person to the entry to represent a contributor
+       def contributor=(obj)
+         if !@contributors
+           @contributors = Array.new
+         end
+         @contributors.push(obj)
+       end
+
+       # Add a Link to the entry
+       def link=(obj)
+         if !@links
+           @links = Array.new
+         end
+         @links.push(obj)
+       end
+
+       # The last update DateTime
+       def updated
+         parse_date(@updated)
+       end
+
+       # The DateTime of publication
+       def published
+         parse_date(@published)
+       end
+
+       # The DateTime of creation (Atom 0.3, obsolete)
+       def created
+         parse_date(@created)
+       end
+     end
+
+     # A parser for Atom feeds.
+     # See Syndication::AbstractParser in common.rb for the abstract class this
+     # specializes.
+     class Parser < AbstractParser
+       include REXML::StreamListener
+
+       #:stopdoc:
+       # A hash of tags which require the creation of new objects, and the class
+       # to use for creating the object.
+       CLASS_FOR_TAG = {
+         'entry' => Entry,
+         'author' => Person,
+         'contributor' => Person,
+         'title' => Data,
+         'subtitle' => Data,
+         'summary' => Data,
+         'link' => Link,
+         'source' => Feed,
+         'category' => Category
+       }
+
+       # Called when REXML finds a text fragment.
+       # For Atom parsing, we need to handle Data objects specially:
+       # They need all events passed through verbatim, because
+       # they might contain XHTML which will be sent through
+       # as REXML events and will need to be reconstructed.
+       def text(s)
+         if @current_object.kind_of?(Data)
+           @current_object.text(s)
+           return
+         end
+         if @textstack.last
+           @textstack.last << s
+         end
+       end
+       #:startdoc:
+
+       # Reset the parser ready to parse a new feed.
+       def reset
+         # Set up an empty Feed object and make it the current object
+         @parsetree = Feed.new(nil)
+         # Set up the class-for-tag hash
+         @class_for_tag = CLASS_FOR_TAG
+         # Everything else is common to both kinds of parser
+         super
+       end
+
+       # The most recently parsed feed as a Syndication::Feed object.
+       def feed
+         return @parsetree
+       end
+
+     end
+   end
+ end
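The way the Data class rebuilds embedded XHTML from stream events can be tried in isolation with the simplified sketch below. The XhtmlCollector class is hypothetical and is not the gem's Data class: it has no parent object, it returns true or false from tag_end instead of the next current object, and, unlike the code above, it also writes attributes back out (without escaping their values), which is an addition made for illustration only.

```ruby
# Hypothetical, simplified collector for embedded XHTML.
class XhtmlCollector
  attr_reader :xhtml

  # tag is the element that owns the content, e.g. 'content'.
  def initialize(tag)
    @tag = tag
    @xhtml = ''
  end

  # Re-emit a start tag, dropping the xhtml: prefix.
  # Attribute values are copied as-is; a real implementation would
  # need to escape them.
  def tag_start(tag, attrs = {})
    name = tag.sub(/^xhtml:/, '')
    attr_text = attrs.map { |key, value| " #{key}=\"#{value}\"" }.join
    @xhtml += "<#{name}#{attr_text}>"
  end

  def text(s)
    @xhtml += s
  end

  # Re-emit an end tag. When the owning element closes, strip the
  # wrapper <div> that Atom requires around XHTML content and return true.
  def tag_end(tag)
    if tag == @tag
      @xhtml = @xhtml.sub(/\A\s*<div[^>]*>\s*/m, '').sub(/\s*<\/div>\s*\z/m, '')
      return true
    end
    @xhtml += "</#{tag.sub(/^xhtml:/, '')}>"
    false
  end
end

c = XhtmlCollector.new('content')
c.tag_start('xhtml:div', 'xmlns:xhtml' => 'http://www.w3.org/1999/xhtml')
c.tag_start('xhtml:p')
c.text('Hello ')
c.tag_start('xhtml:a', 'href' => 'http://example.com/')
c.text('world')
c.tag_end('xhtml:a')
c.tag_end('xhtml:p')
c.tag_end('xhtml:div')
c.tag_end('content')
puts c.xhtml
```

The events are supplied by hand here; in the library they would come from the REXML stream parser by way of Syndication::Atom::Parser.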