syndication 0.4.0
- data/IMPLEMENTATION +33 -0
- data/README +208 -0
- data/examples/yahoo.rb +21 -0
- data/lib/syndication/atom.rb +479 -0
- data/lib/syndication/common.rb +267 -0
- data/lib/syndication/content.rb +37 -0
- data/lib/syndication/dublincore.rb +92 -0
- data/lib/syndication/podcast.rb +85 -0
- data/lib/syndication/rss.rb +326 -0
- data/lib/syndication/syndication.rb +45 -0
- data/test/atomtest.rb +186 -0
- data/test/rsstest.rb +314 -0
- metadata +55 -0
data/IMPLEMENTATION
ADDED
@@ -0,0 +1,33 @@
|
|
1
|
+
# = Syndication 0.4
|
2
|
+
#
|
3
|
+
# As discussed in the README, this is really my fourth attempt at writing
|
4
|
+
# RSS parsing code. For the record, I thought I'd list the approaches I
|
5
|
+
# tried and abandoned. In a way, that's more interesting than the one I
|
6
|
+
# picked...
|
7
|
+
#
|
8
|
+
# First I used hashes for storage and just looked for matching tags.
|
9
|
+
# That approach works, kinda, but it doesn't really understand nested
|
10
|
+
# elements at all. As a result, it becomes really hard to deal with Atom
|
11
|
+
# feeds, where an <email> element could belong to one of a number of kinds
|
12
|
+
# of person. Plus, I wanted a real object-based approach which would be
|
13
|
+
# amenable to RDoc documentation.
|
14
|
+
#
|
15
|
+
# Next I wrote a classic stack-based parser, with a container stack and a
|
16
|
+
# text buffer stack. That worked well for RSS; I got it parsing every RSS
|
17
|
+
# variant, and even went as far as a test suite. However, as I tried
|
18
|
+
# extending it to deal with Atom, I realized that the parser code was
|
19
|
+
# becoming hard to follow, as the state machine gained more and more
|
20
|
+
# special cases.
|
21
|
+
#
|
22
|
+
# For a third iteration, I tried to generalize the knowledge represented by the
|
23
|
+
# state machine, by placing it in the context stack. That is, I would have a
|
24
|
+
# smart stack that knew which XML elements could go inside other elements.
|
25
|
+
# Actually, there would have been four context stacks, for containers,
|
26
|
+
# attributes, tags and textual data.
|
27
|
+
#
|
28
|
+
# That design never made it past the paper stage, because I realized that I
|
29
|
+
# could move all the knowledge into the classes used to create the objects of
|
30
|
+
# the final parse tree. With the new model--the one used in this code--the
|
31
|
+
# parser really doesn't know anything about Atom or RSS. It just forwards
|
32
|
+
# events to a tree of objects, which construct child objects as appropriate to
|
33
|
+
# grow the tree and represent the feed.
|
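The final design described above (a parser that only forwards events to a tree of objects, which build their own children) can be sketched roughly as follows. This is a minimal illustration, not the library's code: `Node`, `Feed`, `Item`, `Dispatcher` and the `CHILDREN` table are hypothetical names, and the events are fed in by hand rather than by REXML.

```ruby
# Hypothetical sketch; none of these classes come from the library itself.
class Node
  attr_reader :parent

  # Maps a child tag name to the class that should represent it.
  CHILDREN = {}

  def initialize(parent = nil, tag = nil)
    @parent = parent
    @tag = tag
  end

  # A tag opened: grow the tree with a new child if this node knows the
  # tag, otherwise remain the current node.
  def tag_start(tag)
    klass = self.class::CHILDREN[tag]
    return self unless klass
    child = klass.new(self, tag)
    store(tag, child)
    child
  end

  # A tag closed: hand control back to the parent if it was this node's
  # own tag, otherwise treat the buffered text as a simple field.
  def tag_end(tag, text)
    return @parent if tag == @tag
    store(tag, text)
    self
  end

  # Store a value through a setter named after the tag, if one exists.
  def store(tag, value)
    setter = "#{tag}="
    send(setter, value) if respond_to?(setter)
  end
end

class Item < Node
  attr_accessor :title
end

class Feed < Node
  CHILDREN = { 'item' => Item }
  attr_accessor :title
  attr_reader :items

  def item=(obj)
    (@items ||= []) << obj
  end
end

# The dispatcher knows nothing about feed formats; it only forwards events
# to whichever node is current and buffers text between tags.
class Dispatcher
  attr_reader :root

  def initialize(root)
    @root = root
    @current = root
    @buffer = String.new
  end

  def tag_start(tag)
    @buffer = String.new
    @current = @current.tag_start(tag)
  end

  def text(s)
    @buffer << s
  end

  def tag_end(tag)
    @current = @current.tag_end(tag, @buffer.strip)
    @buffer = String.new
  end
end

# Hand-fed events standing in for what a stream parser would emit.
d = Dispatcher.new(Feed.new)
d.tag_start('title'); d.text('News'); d.tag_end('title')
d.tag_start('item')
d.tag_start('title'); d.text('Hello'); d.tag_end('title')
d.tag_end('item')
puts d.root.title
puts d.root.items[0].title
```

All knowledge of which elements nest inside which lives in the node classes, so supporting a new container in this sketch means adding a class and one `CHILDREN` entry rather than touching the dispatcher.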
data/README
ADDED
@@ -0,0 +1,208 @@
|
|
1
|
+
#
|
2
|
+
# = Syndication 0.4
|
3
|
+
#
|
4
|
+
# This module provides classes for parsing web syndication feeds in RSS and
|
5
|
+
# Atom formats.
|
6
|
+
#
|
7
|
+
# To parse RSS, use Syndication::RSS::Parser.
|
8
|
+
#
|
9
|
+
# To parse Atom, use Syndication::Atom::Parser.
|
10
|
+
#
|
11
|
+
# If you want my advice on which to generate, my order of preference would
|
12
|
+
# be:
|
13
|
+
#
|
14
|
+
# 1. Atom 1.0
|
15
|
+
# 2. RSS 1.0
|
16
|
+
# 3. RSS 2.0
|
17
|
+
#
|
18
|
+
# My reasoning is simply that I hate having to sniff for HTML (see
|
19
|
+
# Syndication::RSS).
|
20
|
+
#
|
21
|
+
# == License
|
22
|
+
#
|
23
|
+
# Syndication is Copyright 2005 mathew <meta@pobox.com>, and is licensed
|
24
|
+
# under the same terms as Ruby.
|
25
|
+
#
|
26
|
+
# == Requirements
|
27
|
+
#
|
28
|
+
# Built and tested using Ruby 1.8.2. Needs only the standard library.
|
29
|
+
#
|
30
|
+
# == Rationale
|
31
|
+
#
|
32
|
+
# Ruby already has an RSS library as part of the standard library, so you
|
33
|
+
# might be wondering why I decided to write another one.
|
34
|
+
#
|
35
|
+
# I started out trying to document the standard rss module, but found the
|
36
|
+
# code rather impenetrable. It was also difficult to see how it could be made
|
37
|
+
# documentable via RDoc.
|
38
|
+
#
|
39
|
+
# Then I tried writing code to use the standard RSS library, and discovered
|
40
|
+
# that it had a number of (what I consider to be) defects:
|
41
|
+
#
|
42
|
+
# - It doesn't support RSS 2.0 with extensions (such as iTunes podcast feeds),
|
43
|
+
# and it wasn't clear to me how to extend it to do so.
|
44
|
+
#
|
45
|
+
# - It doesn't support RSS 0.9.
|
46
|
+
#
|
47
|
+
# - It doesn't support Atom.
|
48
|
+
#
|
49
|
+
# - The API is different depending on what kind of RSS feed you are parsing.
|
50
|
+
#
|
51
|
+
# I asked around, and discovered that I wasn't the only person dissatisfied
|
52
|
+
# with the RSS library. Since fixing the problems would have resulted in
|
53
|
+
# breaking existing code that used the RSS module, I opted for an all-new
|
54
|
+
# implementation.
|
55
|
+
#
|
56
|
+
# This is the result. I'm calling it version 0.4, because it's actually my
|
57
|
+
# fourth attempt at putting together a clean, simple, universal API for RSS
|
58
|
+
# and Atom parsing. (The first three never saw public release.)
|
59
|
+
#
|
60
|
+
# == Features
|
61
|
+
#
|
62
|
+
# Here is what I see as the key improvements over the rss module in the
|
63
|
+
# Ruby standard library:
|
64
|
+
#
|
65
|
+
# - Supports all RSS versions, including RSS 0.9, as well as Atom.
|
66
|
+
#
|
67
|
+
# - Provides a unified API/object model for accessing the decoded data,
|
68
|
+
# with no need to know what format the feed is in.
|
69
|
+
#
|
70
|
+
# - Allows use of extended RSS 2.0 feeds.
|
71
|
+
#
|
72
|
+
# - Simple API, fully documented.
|
73
|
+
#
|
74
|
+
# - Test suite with over 220 test assertions.
|
75
|
+
#
|
76
|
+
# - Commented source code.
|
77
|
+
#
|
78
|
+
# - Less source code than the standard library rss module.
|
79
|
+
#
|
80
|
+
# - Faster than the standard library (at least, in my tests, see caveat below).
|
81
|
+
#
|
82
|
+
# Other features:
|
83
|
+
#
|
84
|
+
# - Optional support for RSS 1.0 Dublin Core, Syndication and Content modules
|
85
|
+
# and Apple iTunes Podcast elements (others to follow).
|
86
|
+
#
|
87
|
+
# - Content module decodes CDATA-escaped or encoded HTML content for you.
|
88
|
+
#
|
89
|
+
# - Supports namespaces, and encoded XHTML/HTML in Atom feeds.
|
90
|
+
#
|
91
|
+
# - Dates decoded to Ruby DateTime objects. Note, however, that this is slow,
|
92
|
+
# so parsing is only performed if you ask for the value.
|
93
|
+
#
|
94
|
+
# - Simple to extend to support your own RSS extensions, uses reflection.
|
95
|
+
#
|
96
|
+
# - Uses REXML fast stream parsing API for speed.
|
97
|
+
#
|
98
|
+
# - Non-validating, tries to be as forgiving as possible of structural errors.
|
99
|
+
#
|
100
|
+
# - Remaps namespace prefixes to standard values if it recognizes the module's
|
101
|
+
# URL.
|
102
|
+
#
|
103
|
+
# In the interests of balance, here are some key disadvantages compared with the
|
104
|
+
# standard library RSS support:
|
105
|
+
#
|
106
|
+
# - No support for _generating_ RSS feeds yet, only for parsing them. If
|
107
|
+
# you're using Rails, you can use RXML; if not, you can of course continue
|
108
|
+
# to use rss/maker.
|
109
|
+
#
|
110
|
+
# - Different API, not a drop-in replacement.
|
111
|
+
#
|
112
|
+
# - No way to choose a different XML parser (yet).
|
113
|
+
#
|
114
|
+
# - Incomplete support for Atom 0.3 draft. (Anyone still using it?)
|
115
|
+
#
|
116
|
+
# - No support for base64 data in Atom feeds (yet).
|
117
|
+
#
|
118
|
+
# - No Japanese documentation.
|
119
|
+
#
|
120
|
+
# - No XSL output options.
|
121
|
+
#
|
122
|
+
# - Slower if there are dates in the feed and you ask for their values.
|
123
|
+
#
|
124
|
+
# == Other options
|
125
|
+
#
|
126
|
+
# There are, of course, other Ruby RSS/Atom libraries out there. The ones I
|
127
|
+
# know about:
|
128
|
+
#
|
129
|
+
# = simple-rss
|
130
|
+
#
|
131
|
+
# http://rubyforge.org/projects/simple-rss
|
132
|
+
#
|
133
|
+
# Pros:
|
134
|
+
# - Much smaller than syndication or rss.
|
135
|
+
#
|
136
|
+
# - Completely non-validating.
|
137
|
+
#
|
138
|
+
# - Backwards compatible with rss in standard library.
|
139
|
+
#
|
140
|
+
# Cons:
|
141
|
+
# - Doesn't use a real XML parser.
|
142
|
+
#
|
143
|
+
# - No support for namespaces.
|
144
|
+
#
|
145
|
+
# - Incomplete Atom support (e.g. can't get name and e-mail of <atom:person>
|
146
|
+
# elements as separate fields, you still have to decode XHTML data yourself)
|
147
|
+
#
|
148
|
+
# - No documentation.
|
149
|
+
#
|
150
|
+
# For the record, I started work on my library long before simple-rss was
|
151
|
+
# announced.
|
152
|
+
#
|
153
|
+
# = feedtools / feedreader
|
154
|
+
#
|
155
|
+
# http://rubyforge.org/projects/feedtools/
|
156
|
+
#
|
157
|
+
# I don't know much about this one.
|
158
|
+
#
|
159
|
+
# == Design philosophy
|
160
|
+
#
|
161
|
+
# Here's my design philosophy for this module:
|
162
|
+
#
|
163
|
+
# - The interface should be via standard Ruby objects and methods; e.g.
|
164
|
+
# feed.channel.item[0].title, rather than (say) a dictionary hash.
|
165
|
+
#
|
166
|
+
# - It should be easier to parse RSS via the module than to hack something
|
167
|
+
# together using REXML, even if all you want is a list of titles and URLs.
|
168
|
+
#
|
169
|
+
# - It should be easy to add support for new RSS extensions without needing
|
170
|
+
# to know anything about reflection or other advanced topics. Just define
|
171
|
+
# a mixin with a bunch of appropriately-named methods, and you're done.
|
172
|
+
#
|
173
|
+
# - The code should be simple to understand.
|
174
|
+
#
|
175
|
+
# - Even so, good complete documentation is extremely important.
|
176
|
+
#
|
177
|
+
# - Be lenient in what you accept.
|
178
|
+
#
|
179
|
+
# - Be conservative in what you generate.
|
180
|
+
#
|
181
|
+
# - Get well-formed feeds parsing reliably, then worry about broken feeds.
|
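The "define a mixin with appropriately-named methods" idea from the list above might look something like the sketch below. It is only an illustration under stated assumptions: `Storable`, `GeoExtension`, `Article` and the `geo:` prefix are hypothetical names, and the real library's naming rules for setters may differ.

```ruby
# Hypothetical names throughout; this is not the library's own code.
module Storable
  # Turn a tag such as "geo:lat" into the setter name "geo_lat=" and call
  # it if the object defines one. Unknown tags are ignored.
  def store(tag, value)
    setter = tag.downcase.tr(':', '_') + '='
    if respond_to?(setter)
      send(setter, value)
      true
    else
      false
    end
  end
end

# An extension is just a module of accessors named after its tags.
module GeoExtension
  attr_accessor :geo_lat, :geo_long
end

class Article
  include Storable
  attr_accessor :title
end

# Supporting the extension means mixing it in; the storing code is unchanged.
class Article
  include GeoExtension
end

article = Article.new
article.store('title', 'Somewhere')
article.store('geo:lat', '51.5')
article.store('unknown:tag', 'ignored')
puts article.geo_lat
```

The extension author never calls `respond_to?` or `send` directly; the reflection stays inside `store`.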
182
|
+
#
|
183
|
+
# == Future plans
|
184
|
+
#
|
185
|
+
# Here are some possible improvements:
|
186
|
+
#
|
187
|
+
# - RSS and Atom generation. Create objects, then call Syndication::FeedMaker
|
188
|
+
# to generate XML in various flavors.
|
189
|
+
#
|
190
|
+
# - More lenient parsing. The limiting factor right now appears to be REXML,
|
191
|
+
# which although a non-validating parser, does require fairly well-formed
|
192
|
+
# XML. (In particular, failure to match tags will cause errors.) Perhaps
|
193
|
+
# the answer is to find or build a 'tag soup' parser that implements the
|
194
|
+
# REXML stream parsing API?
|
195
|
+
#
|
196
|
+
# - Faster date parsing. It turns out that when I asked for parsed dates in
|
197
|
+
# my test code, the profiler showed Date.parse chewing up 25% of the total
|
198
|
+
# CPU time used. A more specific date parser that didn't use heuristics
|
199
|
+
# to guess format could cut that down drastically. On the other hand,
|
200
|
+
# does it actually matter? Is the date parsing slow enough to be a problem
|
201
|
+
# for anyone?
|
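One way the "more specific date parser" idea above could be approached is to keep the raw string, parse it only on first access, and try a couple of fixed formats instead of heuristics. The sketch below assumes the two formats most often seen in feeds; `LazyDate` and its constants are hypothetical names, and whether this is actually faster than `Date.parse` would need profiling.

```ruby
require 'date'

# Hypothetical helper, not part of the library: holds the raw string and
# parses it only when the value is first requested.
class LazyDate
  RFC822  = '%a, %d %b %Y %H:%M:%S %z'   # typical RSS 2.0 pubDate
  ISO8601 = '%Y-%m-%dT%H:%M:%S%z'        # typical Atom / Dublin Core date

  def initialize(raw)
    @raw = raw
    @parsed = nil
  end

  # Parse on first use and remember the result.
  def value
    @parsed ||= parse(@raw)
  end

  private

  # Try each fixed format in turn; fall back to the original string.
  def parse(str)
    [RFC822, ISO8601].each do |format|
      begin
        return DateTime.strptime(str, format)
      rescue ArgumentError
        next
      end
    end
    str
  end
end

pub = LazyDate.new('Tue, 10 Jun 2003 04:00:00 +0000')
puts pub.value.strftime('%d %b %Y')
```

Returning the original string when nothing matches mirrors the "DateTime if it can be parsed, a String otherwise" behaviour the library documents for its date readers.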
202
|
+
#
|
203
|
+
# == Feedback
|
204
|
+
#
|
205
|
+
# This is my first public release of this code, so there are doubtless things
|
206
|
+
# I could have done better. Comments, suggestions, etc. are welcome; e-mail
|
207
|
+
# <meta@pobox.com>.
|
208
|
+
#
|
data/examples/yahoo.rb
ADDED
@@ -0,0 +1,21 @@
|
|
1
|
+
|
2
|
+
# RSS Syndication example:
|
3
|
+
#
|
4
|
+
# Output Yahoo news headlines, dated.
|
5
|
+
|
6
|
+
require 'open-uri'
|
7
|
+
require 'syndication/rss'
|
8
|
+
|
9
|
+
parser = Syndication::RSS::Parser.new
|
10
|
+
feed = nil
|
11
|
+
open("http://rss.news.yahoo.com/rss/topstories") {|file|
|
12
|
+
text = file.read
|
13
|
+
feed = parser.parse(text)
|
14
|
+
}
|
15
|
+
chan = feed.channel
|
16
|
+
t = chan.lastbuilddate.strftime("%H:%M on %A %d %B")
|
17
|
+
puts "#{chan.title} at #{t}"
|
18
|
+
for i in feed.items
|
19
|
+
t = i.pubdate.strftime("%d %b")
|
20
|
+
puts "#{t}: #{i.title}"
|
21
|
+
end
|
data/lib/syndication/atom.rb
ADDED
@@ -0,0 +1,479 @@
|
|
1
|
+
# Provides classes for parsing Atom web syndication feeds.
|
2
|
+
# See Syndication class for documentation.
|
3
|
+
#
|
4
|
+
# Copyright © mathew <meta@pobox.com> 2005.
|
5
|
+
# Licensed under the same terms as Ruby.
|
6
|
+
|
7
|
+
require 'uri'
|
8
|
+
require 'rexml/parsers/streamparser'
|
9
|
+
require 'rexml/streamlistener'
|
10
|
+
require 'rexml/document'
|
11
|
+
require 'date'
|
12
|
+
require 'syndication/common'
|
13
|
+
|
14
|
+
module Syndication
|
15
|
+
|
16
|
+
# The Atom syndication format is defined at
|
17
|
+
# <URL:http://www.ietf.org/internet-drafts/draft-ietf-atompub-format-11.txt>.
|
18
|
+
# It is finalized, and should become an RFC soon.
|
19
|
+
#
|
20
|
+
# For an introduction, see "An overview of the Atom 1.0 Syndication Format"
|
21
|
+
# at <URL:http://www-128.ibm.com/developerworks/xml/library/x-atom10.html>
|
22
|
+
#
|
23
|
+
# For a comparison of Atom and RSS, see
|
24
|
+
# <URL:http://www.tbray.org/atom/RSS-and-Atom>
|
25
|
+
#
|
26
|
+
# To parse Atom feeds, use Syndication::Atom::Parser.
|
27
|
+
#
|
28
|
+
# The earlier Atom 0.3 format is partially supported; the 'mode' attribute
|
29
|
+
# is ignored and assumed to be 'xml' (as for Atom 1.0).
|
30
|
+
#
|
31
|
+
# Base64 encoded data in Atom 1.0 feeds is not supported (yet).
|
32
|
+
module Atom
|
33
|
+
|
34
|
+
# A value in an Atom feed which might be plain ASCII text, HTML, XHTML,
|
35
|
+
# or some random MIME type.
|
36
|
+
|
37
|
+
# TODO: Implement base64 support
|
38
|
+
# See http://ietfreport.isoc.org/all-ids/draft-ietf-atompub-format-11.txt
|
39
|
+
# section 4.1.3.3.
|
40
|
+
|
41
|
+
#:stopdoc:
|
42
|
+
# This object has to be handled specially; the parser feeds in all the
|
43
|
+
# REXML events, so the object can reconstruct embedded XML/XHTML.
|
44
|
+
# (Normally, the parser handles text buffering for a Container and
|
45
|
+
# calls store() when the container's element is closed.)
|
46
|
+
#:startdoc:
|
47
|
+
class Data < Container
|
48
|
+
# The decoded data, if the type is not text or XML
|
49
|
+
attr_reader :data
|
50
|
+
|
51
|
+
# Table of entities ripped from the XHTML spec.
|
52
|
+
ENTITIES = {
|
53
|
+
'Aacute' => 193, 'aacute' => 225, 'Acirc' => 194,
|
54
|
+
'acirc' => 226, 'acute' => 180, 'AElig' => 198,
|
55
|
+
'aelig' => 230, 'Agrave' => 192, 'agrave' => 224,
|
56
|
+
'amp' => 38, 'Aring' => 197, 'aring' => 229,
|
57
|
+
'Atilde' => 195, 'atilde' => 227, 'Auml' => 196,
|
58
|
+
'auml' => 228, 'brvbar' => 166, 'Ccedil' => 199,
|
59
|
+
'ccedil' => 231, 'cedil' => 184, 'cent' => 162,
|
60
|
+
'copy' => 169, 'curren' => 164, 'deg' => 176,
|
61
|
+
'divide' => 247, 'Eacute' => 201, 'eacute' => 233,
|
62
|
+
'Ecirc' => 202, 'ecirc' => 234, 'Egrave' => 200,
|
63
|
+
'egrave' => 232, 'ETH' => 208, 'eth' => 240,
|
64
|
+
'Euml' => 203, 'euml' => 235, 'frac12' => 189,
|
65
|
+
'frac14' => 188, 'frac34' => 190, 'gt' => 62,
|
66
|
+
'Iacute' => 205, 'iacute' => 237, 'Icirc' => 206,
|
67
|
+
'icirc' => 238, 'iexcl' => 161, 'Igrave' => 204,
|
68
|
+
'igrave' => 236, 'iquest' => 191, 'Iuml' => 207,
|
69
|
+
'iuml' => 239, 'laquo' => 171, 'lt' => 60,
|
70
|
+
'macr' => 175, 'micro' => 181, 'middot' => 183,
|
71
|
+
'nbsp' => 160, 'not' => 172, 'Ntilde' => 209,
|
72
|
+
'ntilde' => 241, 'Oacute' => 211, 'oacute' => 243,
|
73
|
+
'Ocirc' => 212, 'ocirc' => 244, 'Ograve' => 210,
|
74
|
+
'ograve' => 242, 'ordf' => 170, 'ordm' => 186,
|
75
|
+
'Oslash' => 216, 'oslash' => 248, 'Otilde' => 213,
|
76
|
+
'otilde' => 245, 'Ouml' => 214, 'ouml' => 246,
|
77
|
+
'para' => 182, 'plusmn' => 177, 'pound' => 163,
|
78
|
+
'quot' => 34, 'raquo' => 187, 'reg' => 174,
|
79
|
+
'sect' => 167, 'shy' => 173, 'sup1' => 185,
|
80
|
+
'sup2' => 178, 'sup3' => 179, 'szlig' => 223,
|
81
|
+
'THORN' => 222, 'thorn' => 254, 'times' => 215,
|
82
|
+
'Uacute' => 218, 'uacute' => 250, 'Ucirc' => 219,
|
83
|
+
'ucirc' => 251, 'Ugrave' => 217, 'ugrave' => 249,
|
84
|
+
'uml' => 168, 'Uuml' => 220, 'uuml' => 252,
|
85
|
+
'Yacute' => 221, 'yacute' => 253, 'yen' => 165,
|
86
|
+
'yuml' => 255
|
87
|
+
}
|
88
|
+
|
89
|
+
def initialize(parent, tag, attrs = nil)
|
90
|
+
@tag = tag
|
91
|
+
@parent = parent
|
92
|
+
@type = 'text' # the default, as per the standard
|
93
|
+
if attrs and attrs['type']
|
94
|
+
@type = attrs['type']
|
95
|
+
end
|
96
|
+
@div_stripped = false
|
97
|
+
case @type
|
98
|
+
when 'xhtml'
|
99
|
+
@xhtml = ''
|
100
|
+
when 'html'
|
101
|
+
@html = ''
|
102
|
+
when 'text'
|
103
|
+
@text = ''
|
104
|
+
end
|
105
|
+
end
|
106
|
+
|
107
|
+
# Convert a text representation to HTML.
|
108
|
+
def text2html(text)
|
109
|
+
html = text.gsub('&','&amp;')
|
110
|
+
html.gsub!('<','&lt;')
|
111
|
+
html.gsub!('>','&gt;')
|
112
|
+
return html
|
113
|
+
end
|
114
|
+
|
115
|
+
# Convert an HTML representation to text.
|
116
|
+
# This is done by throwing away all tags and converting all entities.
|
117
|
+
# Not ideal, but I can't think of a better simple approach.
|
118
|
+
def html2text(html)
|
119
|
+
text = html.gsub(/<[^>]*>/, '')
|
120
|
+
text = text.gsub(/&(\w+);/) {
|
121
|
+
ENTITIES[$1] ? [ENTITIES[$1]].pack('U') : ''
|
122
|
+
}
|
123
|
+
return text
|
124
|
+
end
|
125
|
+
|
126
|
+
# Return value of Data as HTML.
|
127
|
+
def html
|
128
|
+
return @html if @html
|
129
|
+
return @xhtml if @xhtml
|
130
|
+
return text2html(@text) if @text
|
131
|
+
return nil
|
132
|
+
end
|
133
|
+
|
134
|
+
# Return value of Data as ASCII text.
|
135
|
+
# If the field started off as (X)HTML, this is done by ruthlessly
|
136
|
+
# discarding markup and entities, so it is highly recommended that you
|
137
|
+
# use the XHTML or HTML and convert to text in a more intelligent way.
|
138
|
+
def txt
|
139
|
+
return @text if @text
|
140
|
+
return html2text(@xhtml) if @xhtml
|
141
|
+
return html2text(@html) if @html
|
142
|
+
return nil
|
143
|
+
end
|
144
|
+
|
145
|
+
# Return value of Data as XHTML.
|
146
|
+
def xhtml
|
147
|
+
return @xhtml if @xhtml
|
148
|
+
return @html if @html
|
149
|
+
return text2html(@text) if @text
|
150
|
+
return nil
|
151
|
+
end
|
152
|
+
|
153
|
+
# Catch tag start events if we're collecting embedded XHTML.
|
154
|
+
def tag_start(tag, attrs = nil)
|
155
|
+
if @type == 'xhtml'
|
156
|
+
t = tag.sub(/^xhtml:/,'')
|
157
|
+
@xhtml += "<#{t}>"
|
158
|
+
else
|
159
|
+
super
|
160
|
+
end
|
161
|
+
end
|
162
|
+
|
163
|
+
# Catch tag end events if we're collecting embedded XHTML.
|
164
|
+
def tag_end(endtag, current)
|
165
|
+
if @tag == endtag
|
166
|
+
if @type == 'xhtml' and !@div_stripped
|
167
|
+
@xhtml.sub!(/^\s*<div>\s*/m,'')
|
168
|
+
@xhtml.sub!(/\s*<\/div>\s*$/m,'')
|
169
|
+
@div_stripped = true
|
170
|
+
end
|
171
|
+
return @parent
|
172
|
+
end
|
173
|
+
if @type == 'xhtml'
|
174
|
+
t = endtag.sub(/^xhtml:/,'')
|
175
|
+
@xhtml += "</#{t}>"
|
176
|
+
return self
|
177
|
+
else
|
178
|
+
super
|
179
|
+
end
|
180
|
+
end
|
181
|
+
|
182
|
+
# Store/buffer text in the appropriate internal field.
|
183
|
+
def text(s)
|
184
|
+
case @type
|
185
|
+
when 'xhtml'
|
186
|
+
@xhtml += s
|
187
|
+
when 'html'
|
188
|
+
@html += s
|
189
|
+
when 'text'
|
190
|
+
@text += s
|
191
|
+
end
|
192
|
+
end
|
193
|
+
end
|
194
|
+
|
195
|
+
# A Link represents a hypertext link to another object from an Atom feed.
|
196
|
+
# Examples include the link with rel=self to the canonical URL of the feed.
|
197
|
+
class Link < Container
|
198
|
+
attr_accessor :href # The URI of the link.
|
199
|
+
attr_accessor :rel # The type of relationship the link expresses.
|
200
|
+
attr_accessor :type # The type of object at the other end of the link.
|
201
|
+
attr_accessor :title # The title for the link.
|
202
|
+
attr_accessor :length # The length of the linked-to object in bytes.
|
203
|
+
|
204
|
+
def initialize(parent, tag, attrs = nil)
|
205
|
+
@tag = tag
|
206
|
+
@parent = parent
|
207
|
+
if attrs
|
208
|
+
attrs.each_pair {|key, value|
|
209
|
+
self.store(key, value)
|
210
|
+
}
|
211
|
+
end
|
212
|
+
end
|
213
|
+
end
|
214
|
+
|
215
|
+
# A person, corporation or similar entity within an Atom feed.
|
216
|
+
class Person < Container
|
217
|
+
attr_accessor :name # Human-readable name of person.
|
218
|
+
attr_accessor :uri # URI associated with the person.
|
219
|
+
attr_accessor :email # RFC2822 e-mail address of person.
|
220
|
+
|
221
|
+
# For Atom 0.3 compatibility
|
222
|
+
def url=(x)
|
223
|
+
@uri = x
|
224
|
+
end
|
225
|
+
end
|
226
|
+
|
227
|
+
# A category (keyword) in an Atom feed.
|
228
|
+
# For convenience, Category#to_s is the same as Category#label.
|
229
|
+
class Category < Container
|
230
|
+
# The category itself, possibly encoded.
|
231
|
+
attr_accessor :term
|
232
|
+
# A human-readable version of Category#term.
|
233
|
+
attr_accessor :label
|
234
|
+
# URI to the schema definition.
|
235
|
+
attr_accessor :scheme
|
236
|
+
|
237
|
+
#:stopdoc:
|
238
|
+
# parent = parent object
|
239
|
+
# tag = XML tag which caused creation of this object
|
240
|
+
# attrs = XML attributes as a hash
|
241
|
+
def initialize(parent, tag, attrs = nil)
|
242
|
+
@tag = tag
|
243
|
+
@parent = parent
|
244
|
+
if attrs
|
245
|
+
attrs.each_pair {|key, value|
|
246
|
+
self.store(key, value)
|
247
|
+
}
|
248
|
+
end
|
249
|
+
end
|
250
|
+
|
251
|
+
alias to_s label
|
252
|
+
#:startdoc:
|
253
|
+
end
|
254
|
+
|
255
|
+
# Represents a parsed Atom feed, as returned by Syndication::Atom::Parser.
|
256
|
+
class Feed < Container
|
257
|
+
# Title of feed as a Syndication::Data object.
|
258
|
+
attr_accessor :title
|
259
|
+
# Subtitle of feed as a Syndication::Data object.
|
260
|
+
attr_accessor :subtitle
|
261
|
+
# Last update time, accepts an ISO8601 date/time as per the Atom spec.
|
262
|
+
attr_writer :updated
|
263
|
+
# Software which generated feed as a String.
|
264
|
+
attr_accessor :generator
|
265
|
+
# URI of icon to represent channel as a String.
|
266
|
+
attr_accessor :icon
|
267
|
+
# Globally unique ID of feed as a String.
|
268
|
+
attr_accessor :id
|
269
|
+
# URI of logo for channel as a String.
|
270
|
+
attr_accessor :logo
|
271
|
+
# Copyright or other rights information as a String.
|
272
|
+
attr_accessor :rights
|
273
|
+
# Author of feed as a Syndication::Person object.
|
274
|
+
attr_accessor :author
|
275
|
+
# Array of Syndication::Entry objects representing the entries in the feed.
|
276
|
+
attr_reader :entries
|
277
|
+
# Array of Syndication::Category objects representing taxonomic
|
278
|
+
# categories for the feed.
|
279
|
+
attr_reader :categories
|
280
|
+
# Array of Syndication::Person objects representing contributors.
|
281
|
+
attr_reader :contributors
|
282
|
+
# Array of Syndication::Link objects representing various types of link.
|
283
|
+
attr_reader :links
|
284
|
+
# Atom 0.3 info element (obsolete)
|
285
|
+
attr_accessor :info
|
286
|
+
|
287
|
+
# For Atom 0.3 compatibility
|
288
|
+
def tagline=(x)
|
289
|
+
@subtitle = x
|
290
|
+
end
|
291
|
+
|
292
|
+
# For Atom 0.3 compatibility
|
293
|
+
def copyright=(x)
|
294
|
+
@rights = x
|
295
|
+
end
|
296
|
+
|
297
|
+
# For Atom 0.3 compatibility
|
298
|
+
def modified=(x)
|
299
|
+
@updated = x
|
300
|
+
end
|
301
|
+
|
302
|
+
# Add a Syndication::Category value to the feed
|
303
|
+
def category=(obj)
|
304
|
+
if !@categories
|
305
|
+
@categories = Array.new
|
306
|
+
end
|
307
|
+
@categories.push(obj)
|
308
|
+
end
|
309
|
+
|
310
|
+
# Add a Syndication::Entry to the feed
|
311
|
+
def entry=(obj)
|
312
|
+
if !@entries
|
313
|
+
@entries = Array.new
|
314
|
+
end
|
315
|
+
@entries.push(obj)
|
316
|
+
end
|
317
|
+
|
318
|
+
# Add a Syndication::Person contributor to the feed
|
319
|
+
def contributor=(obj)
|
320
|
+
if !@contributors
|
321
|
+
@contributors = Array.new
|
322
|
+
end
|
323
|
+
@contributors.push(obj)
|
324
|
+
end
|
325
|
+
|
326
|
+
# Add a Syndication::Link to the feed
|
327
|
+
def link=(obj)
|
328
|
+
if !@links
|
329
|
+
@links = Array.new
|
330
|
+
end
|
331
|
+
@links.push(obj)
|
332
|
+
end
|
333
|
+
|
334
|
+
# Last update date/time as a DateTime object if it can be parsed,
|
335
|
+
# a String otherwise.
|
336
|
+
def updated
|
337
|
+
parse_date(@updated)
|
338
|
+
end
|
339
|
+
end
|
340
|
+
|
341
|
+
# An entry within an Atom feed.
|
342
|
+
class Entry < Container
|
343
|
+
# Title of entry.
|
344
|
+
attr_accessor :title
|
345
|
+
# Summary of content.
|
346
|
+
attr_accessor :summary
|
347
|
+
# Source feed metadata as Feed object.
|
348
|
+
attr_accessor :source
|
349
|
+
# Last update date/time as DateTime object.
|
350
|
+
attr_writer :updated
|
351
|
+
# Publication date/time as DateTime object.
|
352
|
+
attr_writer :published
|
353
|
+
# Author of entry as a Person object.
|
354
|
+
attr_accessor :author
|
355
|
+
# Copyright or other rights information.
|
356
|
+
attr_accessor :rights
|
357
|
+
# Content of entry.
|
358
|
+
attr_accessor :content
|
359
|
+
# Globally unique ID of Entry.
|
360
|
+
attr_accessor :id
|
361
|
+
# Array of taxonomic categories for entry.
|
362
|
+
attr_reader :categories
|
363
|
+
# Array of Link objects.
|
364
|
+
attr_reader :links
|
365
|
+
# Array of Person objects representing contributors.
|
366
|
+
attr_reader :contributors
|
367
|
+
# Atom 0.3 creation date/time (obsolete)
|
368
|
+
attr_writer :created
|
369
|
+
|
370
|
+
# For Atom 0.3 compatibility
|
371
|
+
def modified=(x)
|
372
|
+
@updated = x
|
373
|
+
end
|
374
|
+
|
375
|
+
# For Atom 0.3 compatibility
|
376
|
+
def issued=(x)
|
377
|
+
@published = x
|
378
|
+
end
|
379
|
+
|
380
|
+
# For Atom 0.3 compatibility
|
381
|
+
def copyright=(x)
|
382
|
+
@rights = x
|
383
|
+
end
|
384
|
+
|
385
|
+
# Add a Category object to the entry
|
386
|
+
def category=(obj)
|
387
|
+
if !@categories
|
388
|
+
@categories = Array.new
|
389
|
+
end
|
390
|
+
@categories.push(obj)
|
391
|
+
end
|
392
|
+
|
393
|
+
# Add a Person to the entry to represent a contributor
|
394
|
+
def contributor=(obj)
|
395
|
+
if !@contributors
|
396
|
+
@contributors = Array.new
|
397
|
+
end
|
398
|
+
@contributors.push(obj)
|
399
|
+
end
|
400
|
+
|
401
|
+
# Add a Link to the entry
|
402
|
+
def link=(obj)
|
403
|
+
if !@links
|
404
|
+
@links = Array.new
|
405
|
+
end
|
406
|
+
@links.push(obj)
|
407
|
+
end
|
408
|
+
|
409
|
+
# The last update DateTime
|
410
|
+
def updated
|
411
|
+
parse_date(@updated)
|
412
|
+
end
|
413
|
+
|
414
|
+
# The DateTime of publication
|
415
|
+
def published
|
416
|
+
parse_date(@published)
|
417
|
+
end
|
418
|
+
|
419
|
+
# The DateTime of creation (Atom 0.3, obsolete)
|
420
|
+
def created
|
421
|
+
parse_date(@created)
|
422
|
+
end
|
423
|
+
end
|
424
|
+
|
425
|
+
# A parser for Atom feeds.
|
426
|
+
# See Syndication::AbstractParser in common.rb for the abstract class this
|
427
|
+
# specializes.
|
428
|
+
class Parser < AbstractParser
|
429
|
+
include REXML::StreamListener
|
430
|
+
|
431
|
+
#:stopdoc:
|
432
|
+
# A hash of tags which require the creation of new objects, and the class
|
433
|
+
# to use for creating the object.
|
434
|
+
CLASS_FOR_TAG = {
|
435
|
+
'entry' => Entry,
|
436
|
+
'author' => Person,
|
437
|
+
'contributor' => Person,
|
438
|
+
'title' => Data,
|
439
|
+
'subtitle' => Data,
|
440
|
+
'summary' => Data,
|
441
|
+
'link' => Link,
|
442
|
+
'source' => Feed,
|
443
|
+
'category' => Category
|
444
|
+
}
|
445
|
+
|
446
|
+
# Called when REXML finds a text fragment.
|
447
|
+
# For Atom parsing, we need to handle Data objects specially:
|
448
|
+
# They need all events passed through verbatim, because
|
449
|
+
# they might contain XHTML which will be sent through
|
450
|
+
# as REXML events and will need to be reconstructed.
|
451
|
+
def text(s)
|
452
|
+
if @current_object.kind_of?(Data)
|
453
|
+
@current_object.text(s)
|
454
|
+
return
|
455
|
+
end
|
456
|
+
if @textstack.last
|
457
|
+
@textstack.last << s
|
458
|
+
end
|
459
|
+
end
|
460
|
+
#:startdoc:
|
461
|
+
|
462
|
+
# Reset the parser ready to parse a new feed.
|
463
|
+
def reset
|
464
|
+
# Set up an empty Feed object and make it the current object
|
465
|
+
@parsetree = Feed.new(nil)
|
466
|
+
# Set up the class-for-tag hash
|
467
|
+
@class_for_tag = CLASS_FOR_TAG
|
468
|
+
# Everything else is common to both kinds of parser
|
469
|
+
super
|
470
|
+
end
|
471
|
+
|
472
|
+
# The most recently parsed feed as a Syndication::Feed object.
|
473
|
+
def feed
|
474
|
+
return @parsetree
|
475
|
+
end
|
476
|
+
|
477
|
+
end
|
478
|
+
end
|
479
|
+
end
|
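The `Data` class in atom.rb converts between plain text and HTML by escaping markup characters in one direction, and by stripping tags and decoding named entities in the other. A standalone sketch of that round trip is shown here; `escape_text`, `strip_markup` and the reduced `ENTITY_CODEPOINTS` table are hypothetical stand-ins, not the class's actual methods.

```ruby
# Hypothetical standalone helpers; the entity table is a small
# illustrative subset of the XHTML entities.
ENTITY_CODEPOINTS = {
  'amp' => 38, 'lt' => 60, 'gt' => 62, 'quot' => 34,
  'copy' => 169, 'eacute' => 233
}

# Plain text to HTML: escape the ampersand first so the entities produced
# for < and > are not escaped a second time.
def escape_text(text)
  text.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;')
end

# HTML to plain text: drop every tag, then replace each named entity with
# its character, discarding entities that are not in the table.
def strip_markup(html)
  text = html.gsub(/<[^>]*>/, '')
  text.gsub(/&(\w+);/) do
    code = ENTITY_CODEPOINTS[$1]
    code ? [code].pack('U') : ''
  end
end

puts escape_text('a < b & c')
puts strip_markup('<p>caf&eacute; &amp; bar</p>')
```

The lookup key is the entity name captured between `&` and `;`, and the table's integer code point is turned into a UTF-8 string with `pack('U')` before substitution.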