podcatcher 3.1.8

Files changed (5)
  1. checksums.yaml +7 -0
  2. data/MIT-LICENSE +20 -0
  3. data/README.txt +249 -0
  4. data/bin/podcatcher +2535 -0
  5. metadata +53 -0
checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+ metadata.gz: 7a4cb7f4491ccad25c92672d133f6d77a6684361
+ data.tar.gz: 9823d1629cf7e1cbd812958c444f7d273c4aa5d7
+ SHA512:
+ metadata.gz: 4e16316c59fc2fae8d074f100fbd93f419d140ce5d1f07edc9976f8a52a7dcc54cf78fdef4d43f907828b68bd52d7b391460a875ee769c961653df293bfca545
+ data.tar.gz: 58695c4f838a1e7aa688d36ce6b48f4c0b898f7776a66b306f7b1412cc0fb22d2f4bd094912d0601407b3296cc5d4fedd18fe370faa1359fc6b72b0098396fc0
data/MIT-LICENSE ADDED
@@ -0,0 +1,20 @@
+ Copyright 2016 Doga Armangil
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.txt ADDED
@@ -0,0 +1,249 @@
+
+ ARMANGIL'S PODCATCHER
+ =====================
+
+ Armangil's podcatcher is a podcast client for the command line.
+ It can download any type of content enclosed in RSS or Atom files, such as
+ MP3 or other audio content, video and images. A search function for
+ subscribing to feeds is also included. It provides several download
+ strategies, supports BitTorrent, offers cache management, and generates
+ playlists for media player applications.
+
+ As arguments, it accepts feeds (RSS or Atom) or subscription lists
+ (OPML or iTunes PCAST), in the form of filenames or URLs (HTTP or FTP).
+ Alternatively, it accepts one feed or subscription list from the standard
+ input.
+
+ BitTorrent is supported both internally (through the RubyTorrent library)
+ and externally (.torrent files are downloaded, but the user handles
+ them using a BitTorrent application). The latter is currently the most
+ reliable method, as RubyTorrent is still in alpha phase.
+
+ Concurrency is not handled: simultaneous executions of this program should
+ target different directories.
+
+ Visit https://github.com/doga/podcatcher for more information.
+
+ Usage: podcatcher [options] [arguments]
+ Options:
+     -d, --dir DIR
+         Directory for storing application state.
+         Default value is current directory.
+     -D, --cachedir DIR
+         Directory for storing downloaded content.
+         Default value is the 'cache' subdirectory of the state
+         directory (specified by the --dir option).
+         This option is ignored if this directory is inside the state
+         directory, or if the state directory is inside this directory.
+     -s, --size SIZE
+         Size, in megabytes, of the cache directory (specified by the
+         --cachedir option). 0 means unbounded. Default value is 512.
+         This option also sets the upper limit for the amount of content
+         that can be downloaded in one session. Content downloaded
+         during previous sessions may be deleted by podcatcher in order
+         to make room for new content.
+     -e, --[no-]empty
+         Empty the cache directory before downloading content.
+     -p, --[no-]perfeed
+         Create one subdirectory per feed in the cache directory.
+     -S, --strategy S
+         Strategy to use when downloading content:
+         * back_catalog: download any content that has not been
+           downloaded before; prefer recent content to older content
+           (may download more than one content file per feed),
+         * one: download one content file (not already downloaded)
+           for each feed, with a preference for recent content,
+         * all: download all content, with a preference for recent
+           content; even already downloaded content is downloaded once
+           again (may download more than one content file per feed),
+         * chron: download in chronological order any content that has
+           not been downloaded before; this is useful for audiobook
+           podcasts etc (may download more than one content file per
+           feed),
+         * chron_one: download the oldest content of each feed that has
+           not already been downloaded,
+         * chron_all: download all content in chronological order, even
+           if the content has already been downloaded (may download
+           more than one content file per feed),
+         * new: download the most recent content of each feed, if it
+           has not already been downloaded (DEPRECATED: use 'one'
+           instead of 'new'),
+         * cache: generate a playlist for content already in cache.
+         Default value is one.
+     -C, --content REGEXP
+         A regular expression that matches the MIME types of content to
+         be downloaded. Examples: '^video/', '^audio/mpeg$'.
+         Default value is '', which matches any type of content.
+     -l, --language LANG
+         A list of language tags separated by commas.
+         Examples: 'en-us,de', 'fr'. A feed whose language does not
+         match this list is ignored. By default, all feeds are accepted.
+         See http://cyber.law.harvard.edu/rss/languages.html and
+         http://cyber.law.harvard.edu/rss/rss.html#optionalChannelElements
+         for allowed tags.
+     -H, --horizon DATE
+         Do not download content older than the given date. The date
+         has the format yyyy.mm.dd (example: 2007.03.22) or yyyy.mm
+         (equivalent to yyyy.mm.01) or yyyy (equivalent to yyyy.01.01).
+         By default, no horizon is specified.
+     -r, --retries N
+         Try downloading files (content, feeds or subscription lists)
+         at most N times before giving up. Default value is 1.
+     -t, --type TYPE
+         Type of the playlist written to standard output. Accepted
+         values are m3u, smil, pls, asx, tox, xspf.
+         Default value is m3u.
+     -m, --memsize N
+         Remember the last N downloaded content files, and do not
+         download them again. 0 means unbounded. Default value is 1000.
+     -o, --order ORDER
+         The order in which feeds are traversed when downloading
+         content:
+         * random: randomizes the feed order, so that every feed has an
+           equal chance when content is downloaded, even if the cache
+           size is small and the number of feeds is large,
+         * alphabetical: orders feeds alphabetically by using their
+           titles,
+         * sequential: preserves the argument order (and the feed order
+           in subscription lists),
+         * reverse: reverses the feed order.
+         Default value is random.
+     -F, --function FUNCTION
+         Used function:
+         * download: downloads content from specified feeds,
+         * search: generates an OPML subscription list of feeds
+           matching the specified query; the only options relevant for
+           search are -v, -r and -f.
+         Default value is download.
+     -f, --feeds N
+         Do not download more than N feeds (when using the download
+         function), or return the first N relevant feeds (when using
+         the search function). 0 means unbounded. Default value is 1000.
+     -T, --torrentdir DIR
+         Copy torrent files to directory DIR. The handling of torrents
+         through an external BitTorrent client is left to the user. If
+         this option is not used, torrents are handled internally (if
+         RubyTorrent is installed), or else ignored.
+     -U, --uploadrate N
+         Maximum upload rate (kilobytes per second) for the internal
+         BitTorrent client. Unbounded by default.
+     -i, --itemsize N
+         If downloaded content is less than N MB in size (where N is an
+         integer), fetch other content of that same feed until this
+         size is reached. Default value is 0.
+         The intent here is to ensure that podcatcher downloads about
+         as much content from podcasts that frequently post small
+         content (in terms of minutes) as it does from podcasts that
+         post bigger content less frequently. This option was more
+         relevant in the early days of podcasting, when content size
+         varied greatly from one podcast to another. You would rarely
+         need to use this option today.
+     -c, --[no-]cache
+         Generate a playlist for content already in cache.
+         DEPRECATED, use '--strategy cache'.
+     -a, --[no-]asif
+         Do not download content, only download feeds and subscription
+         lists. Useful for testing.
+     -v, --[no-]verbose
+         Run verbosely.
+     -V, --version
+         Display current version and exit.
+     -h, --help
+         Display this message and exit.
+     --[no-]restrictednames
+         In the cache directory, make the names of created
+         subdirectories and files acceptable for restrictive file
+         systems such as VFAT and FAT, which are used on Windows and
+         MP3 player devices. Enabled by default.
+     -A, --arguments FILENAME_OR_URL
+         Read arguments from specified file. Rules:
+         * accepts one argument per line,
+         * ignores empty lines and lines starting with #,
+         * this option may be used several times in one command.
+     -O, --options FILENAME_OR_URL
+         Read options from specified file. The options file uses the
+         YAML format.
+
+ Usage examples:
+
+ podcatcher http://feeds.feedburner.com/Ruby5
+
+ podcatcher -O options.yaml -A feeds.txt
+
+ podcatcher --dir ~/podcasts http://www.npr.org/podcasts.opml
+
+ podcatcher --dir ~/podcasts --strategy cache > cache.m3u
+
+ cat feeds.opml | podcatcher --dir ~/podcasts > latest.m3u
+
+ podcatcher -vd ~/podcasts -s 500 -m 10_000 -t tox feeds.opml > latest.tox
+
+ podcatcher -vF search news http://www.bbc.co.uk/podcasts.opml > bbc_news.opml
+
+ podcatcher -F search -f 12 news http://www.npr.org/podcasts.opml > npr_news.opml
+
+
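The second example above reads options and arguments from files. The rules for the -A arguments file are documented above (one argument per line, # starts a comment); the keys accepted in the -O YAML file are not spelled out here, so the ones below are an assumption based on the long option names, shown only as a plausible sketch:

```yaml
# options.yaml -- ASSUMPTION: keys are presumed to mirror the long
# option names; verify against your installation before relying on them.
dir: ~/podcasts
size: 1024
strategy: back_catalog
type: m3u
verbose: true
```

A matching feeds.txt would simply list one feed or subscription-list URL per line.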
+ Requirements
+ ------------
+ Ruby 1.8.2 or later.
+
+
+ Installation
+ ------------
+ 1. Install the most recent Ruby distribution. Ruby is available on many
+    operating systems such as Windows, MacOS and Linux. A good starting point
+    is http://www.ruby-lang.org/en/ , and for Linux it is worth taking a look
+    at an RPM repository such as http://www.rpmseek.com/ (package name ruby).
+
+ 2. Extract the podcatcher directory from the TGZ file to disk.
+
+ 3. (Optional, for internal BitTorrent support) Download the most recent
+    RubyTorrent release from http://rubyforge.org/projects/rubytorrent/ and
+    add its installation directory to $RUBYLIB (for Linux).
+
+ 4. (Optional, for Linux users) Add the podcatcher/bin subdirectory to $PATH.
+
+
+ Support
+ -------
+ Please use https://github.com/doga/podcatcher for bug reports
+ and feature requests.
+
+ Alternatively, you can send an email to the address listed below.
+
+
+ License
+ -------
+ Armangil's podcatcher is released under the GNU General Public Licence.
+ Please see http://opensource.org/licenses/gpl-license.php for more information.
+
+
+ Author
+ ------
+ Doga Armangil, armangild@yahoo.com
+
+ [November 2014]
data/bin/podcatcher ADDED
@@ -0,0 +1,2535 @@
+ #!/usr/bin/env ruby
+ #:mode=ruby:
+
+ # This program is released under the GNU General Public Licence. Please see
+ # http://opensource.org/licenses/gpl-license.php for more information.
+ # Author: Doga Armangil, armangild@yahoo.com
+
+ PODCATCHER_WEBSITE = 'https://github.com/doga/podcatcher'
+ PODCATCHER_VERSION = '3.1.8'
+
+ # todo: allow files to be selected not only by their MIME type, but also by other attributes. Example: --content '^video/ width:680-1024 height:400'
+ # todo: --proxy option
+ # todo: download at most one enclosure or media:content per rss item
+ # todo: support for --content and --language options in search mode
+ # todo: code refactoring: do not duplicate option handling for 'options' option, factor out conversion between MIME type and file extension, avoid code duplication between implementations of download and search functions
+ # todo: "item search" - search function that generates a feed containing relevant items of feeds (":item" or ":show" ?)
+ # todo: option to specify share ratio for torrents
+ # todo: symlink support in directory (for history, cache etc)
+ # todo: improve playlist generation when using --strategy cache (only include audio and video content)
+ # todo: improve --feeds implementation
+ # todo: resuming of failed media downloads
+ # todo: --subscriptions option (subscription d/l limit)
+ # todo: informative exception messages
+ # todo: only fetch bittorrent metainfo for d/l candidates
+ # todo: option to download shows concurrently
+ # todo: "lock" directory to prevent concurrency issues
+ # todo: option to throttle non-BitTorrent downloads
+ # 3.1.8: make podcatcher a Ruby gem
+ # 3.1.7: move the code repository from rubyforge to github, remove sponsor message, disable voting and checking for updates by default
+ # 3.1.6alpha: fixes a bug whereby a failed content download caused all other content from the same feed to be ignored
+ # 3.1.5: updated --arguments file format (# now comments out line), updated sponsor message
+ # 3.1.4: added publication date to content titles in generated playlists, added better handling of invalid URLs in feeds and subscription lists (such URLs are now simply ignored instead of causing the whole document to be skipped)
+ # 3.1.3: --restrictednames option is now enabled by default, fixed directory name generation bug that allowed '!' character when --perfeed and --restrictednames options were used simultaneously, updated sponsor message
+ # 3.1.2: modified the help text that appears when --help option is used, updated sponsor message
+ # 3.1.1: fixed a bug in verbose mode that caused content to be listed twice if it is declared as both RSS enclosure and Media RSS content, changed the sponsor message
+ # 3.1.0: added support for yyyy and yyyy.mm formats for --horizon parameter
+ # 3.0.0: added the --cachedir option for explicitly specifying cache directory, added --language option for selecting feeds by language, added the --horizon option that prevents the downloading of content older than a given date, added --restrictednames option for using content subdirectory and file names that are acceptable for restrictive filesystems such as VFAT, http://search.yahoo.com/mrss is now accepted as namespace for RSS Media module, fixed a bug in update checking (flash now only appears if podcatcherstats version is newer than current one), fixed a bug that caused votes to be sent for feeds that have file URLs or filenames.
+ # 2.0.1: fixed Yahoo Media RSS module handling bug
+ # 2.0.0: fixed a bug that caused the generation of invalid playlists for feeds containing control characters (such as Ctrl-M) in their title or in the title of one of its entries, added --order option that determines feed order, changed default feed order from 'sequential' to 'random', all content is downloaded by default (not only MP3), changed default cache size to 512MB, added support for the Yahoo Media RSS module (http://search.yahoo.com/mrss), added strategies for downloading content in chronological order (chron_one, chron, chron_all), added -C option that specifies the types of content that are to be received (overrides the default types), added -o option for reading options from a file, added -A option for reading arguments from a file, changed the default download strategy to 'one', added -V alias for --version option, fixed a bug that caused the order of feeds to be ignored in OPML files, fixed a bug that caused downloads of some video files to fail in vodcatcher mode, added --checkforupdate option for informing the user when a new version is available, added --vote option for voting in favour of downloaded podcasts at podcatcherstats.com
+ # 1.3.7: added status code and content type check when downloading a media file using HTTP, removed some debugging comments
+ # 1.3.5: fixed a bug that caused wrong cache filenames to be generated when an HTTP redirection was received from a server, added Operating System and processor information to the User-Agent HTTP header sent to web servers
+ # 1.3.4: fixed the help message
+ # 1.3.3: added the -p option that assigns a separate cache subfolder to each feed
+ # 1.3.2: bug fix
+ # 1.3.1: added robust handling of subscription lists that directly link to media files (such links are now ignored), fixed an OPML generation bug for interrupted searches
+ # 1.3.0: added search function for online podcast directories such as the iPodder podcast directory, added xspf support
+ # 1.2.0: added support for decentralized subscription lists (i.e. subscription lists that point to other subscription lists), fixed a bug that sometimes caused an invalid Referer header to be sent in HTTP requests, added the -f option, added support for Atom feeds that do not list items in reverse chronological order, added support for RSS/Atom feeds as command line arguments, added support for Extended M3U and Extended PLS playlist formats, M3U playlists can now also be generated in vodcatcher mode, m3u is now the default type in vodcatcher mode, added "cache" strategy which deprecates -c option
+ # 1.1.1: added support for iTunes .pcast subscription files
+ # 1.1.0: names of media files downloaded via BitTorrent are now preserved, done some refactoring so that the script can function as a vodcatcher
+ # 1.0.4: added support for RSS feeds that do not list items in reverse chronological order
+ # 1.0.3: fixed an RSS parsing bug that caused enclosures of some feeds to be ignored
+ # 1.0.2: fixed some minor MP3 file naming bugs
+ # 1.0.1: names of downloaded MP3 files are now preserved
+ # 1.0.0: added ATOM support
+ # 0.4.0: added duplicate removal for MP3, RSS/Atom and OPML URLs and pathnames; added the -i option that attempts to increase the listen-time given to podcasts which frequently release short shows
+ # 0.3.2: fixed BitTorrent handling bug
+ # 0.3.1: added robust handling of network exceptions, removed support for Ctrl-C to terminate execution
+ # 0.3.0: added support for opml format used by podcastalley, added podcast title information in playlists, reduced RAM usage by not loading the history file in memory, history file and playlist are now updated after each download
+ # 0.2.1: added support for Ctrl-C to terminate execution; added robust handling of some bad command line arguments; (James Carter patch) fixed the "OPML truncation" issue where a bad RSS feed was considered the last of the list
+ # 0.2.0: added a new download strategy ("one"); added support for more than one OPML argument, fixed some issues
+ # 0.1.7: bug fix
+ # 0.1.6: added internal Bittorrent support, fixed flawed handling of some exceptions
+ # 0.1.5: changed -d option description, added external handling of Bittorrent files
+ # 0.1.4: bug-fix, robust handling of bad //enclosure/@length attributes, handling of relative enclosure URLs
+ # 0.1.3: podcast download strategies (and changed default), download retries
+ # 0.1.2: added TOX playlist support, added HTTP and FTP support for the OPML parameter, done some code clean-up
+ # 0.1.1: fixed RSS parsing issue
+ # 0.1.0: initial version
+
+ require 'uri'
+ require 'open-uri'
+ require 'ostruct'
+ require 'optparse'
+ require 'pathname'
+ require 'date'
+ require 'cgi'
+ require 'yaml'
+ require 'net/http'
+ require 'rexml/document'
+
+ include REXML
+
+ #PODCATCHER_ENV = :development
+ PODCATCHER_ENV = :production
+
+ USER_AGENT = "podcatcher/#{PODCATCHER_VERSION} Ruby/#{RUBY_VERSION} #{RUBY_PLATFORM}"
+ UPDATE_CHECK_INTERVAL = 6 # months
+
+ opt = OpenStruct.new
+ opt.PLAYLIST_TYPES = [:m3u, :smil, :pls, :asx, :tox, :xspf]
+ opt.playlist_type = opt.PLAYLIST_TYPES[0]
+ opt.size = 512
+ opt.content_type = Regexp.new ''
+ opt.DESCRIPTION = <<END
+
+ Armangil's podcatcher is a podcast client for the command line.
+ It can download any type of content enclosed in RSS or Atom files, such as
+ MP3 or other audio content, video and images. A search function for
+ subscribing to feeds is also included. It provides several download
+ strategies, supports BitTorrent, offers cache management, and generates
+ playlists for media player applications.
+
+ As arguments, it accepts feeds (RSS or Atom) or subscription lists
+ (OPML or iTunes PCAST), in the form of filenames or URLs (HTTP or FTP).
+ Alternatively, it accepts one feed or subscription list from the standard
+ input.
+
+ BitTorrent is supported both internally (through the RubyTorrent library)
+ and externally (.torrent files are downloaded, but the user handles
+ them using a BitTorrent application). The latter is currently the most
+ reliable method, as RubyTorrent is still in alpha phase.
+
+ Concurrency is not handled: simultaneous executions of this program should
+ target different directories.
+
+ Visit $website for more information.
+
+ Usage: #{$0} [options] [arguments]
+ END
+
+ opt.DESCRIPTION.gsub! '$website', PODCATCHER_WEBSITE
+
+ opt.dir = Pathname.new Dir.pwd
+ opt.CACHEDIR = 'cache'
+ opt.cachedir = opt.dir + opt.CACHEDIR
+ opt.memsize = 1_000
+ opt.empty = false
+ opt.simulate = false
+ opt.verbose = false
+ opt.STRATEGIES = [:one, :new, :back_catalog, :all, :chron, :chron_one, :chron_all, :cache]
+ opt.strategy = opt.STRATEGIES[0]
+ opt.retries = 1
+ opt.torrent_dir = nil
+ opt.rubytorrent = false
+ opt.upload_rate = nil # e.g. 10
+ opt.itemsize = 0
+ opt.feeds = 1_000
+ opt.FUNCTIONS = [:download, :search]
+ opt.function = opt.FUNCTIONS[0]
+ opt.per_feed = false
+ opt.vote = false
+ opt.check_for_update = false
+ opt.ORDERS = [:random, :sequential, :alphabetical, :reverse]
+ opt.order = opt.ORDERS[0]
+ opt.horizon = nil
+ opt.language = []
+ opt.restricted_names = true
+
+ arguments = []
+
+ option_parser = OptionParser.new do |c|
+   c.banner = opt.DESCRIPTION
+   c.separator ""
+   c.separator "Options:"
+   c.on("-d", "--dir DIR",
+        "Directory for storing application state.",
+        "Default value is current directory.\n") do |e|
+     contained = false
+     # cache directory inside old state directory?
+     statedir = opt.dir
+     cachedir = opt.cachedir
+     loop do
+       if cachedir == statedir
+         contained = true
+         break
+       end
+       break if cachedir.root?
+       cachedir = cachedir.parent
+     end
+     opt.dir = Pathname.new(Dir.pwd) + e
+     # cache directory inside new state directory?
+     unless contained
+       statedir = opt.dir
+       cachedir = opt.cachedir
+       loop do
+         if cachedir == statedir
+           contained = true
+           break
+         end
+         break if cachedir.root?
+         cachedir = cachedir.parent
+       end
+     end
+     # new state directory inside cache directory?
+     unless contained
+       statedir = opt.dir
+       cachedir = opt.cachedir
+       loop do
+         if cachedir == statedir
+           contained = true
+           break
+         end
+         break if statedir.root?
+         statedir = statedir.parent
+       end
+     end
+     opt.dir.mkdir unless opt.dir.exist?
+     exit 1 unless opt.dir.directory?
+     opt.cachedir = opt.dir + opt.CACHEDIR if contained
+   end
+   c.on("-D", "--cachedir DIR",
+        "Directory for storing downloaded content.",
+        "Default value is the '#{opt.CACHEDIR}' subdirectory",
+        "of the state directory (specified by",
+        "the --dir option).",
+        "This option is ignored if this directory",
+        "is inside the state directory, or if the",
+        "state directory is inside this directory.\n") do |e|
+     contained = false
+     # cache directory should be outside state directory
+     statedir = opt.dir
+     cachedir = Pathname.new(Dir.pwd) + e
+     loop do
+       if cachedir == statedir
+         contained = true
+         break
+       end
+       break if cachedir.root?
+       cachedir = cachedir.parent
+     end
+     next if contained
+     # state directory should be outside cache directory
+     statedir = opt.dir
+     cachedir = Pathname.new(Dir.pwd) + e
+     loop do
+       if cachedir == statedir
+         contained = true
+         break
+       end
+       break if statedir.root?
+       statedir = statedir.parent
+     end
+     next if contained
+     # accept cache directory
+     opt.cachedir = Pathname.new(Dir.pwd) + e
+   end
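The --dir and --cachedir handlers above each repeat the same ancestor-walking loop. A minimal standalone sketch of that check (the helper name `inside?` is ours, not part of podcatcher):

```ruby
require 'pathname'

# True if +child+ equals +parent+ or lies anywhere below it.
# Mirrors the loops above: climb from child towards the filesystem
# root, comparing against parent at each step.
def inside?(child, parent)
  child  = Pathname.new(child)
  parent = Pathname.new(parent)
  loop do
    return true if child == parent
    return false if child.root?
    child = child.parent
  end
end

# --cachedir is ignored when either directory contains the other:
inside?('/home/u/podcasts/cache', '/home/u/podcasts')  # => true
inside?('/home/u/podcasts', '/home/u/podcasts/cache')  # => false
```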
+   c.on("-s", "--size SIZE",
+        "Size, in megabytes, of the cache directory",
+        "(specified by the --cachedir option).",
+        "0 means unbounded. Default value is #{opt.size}.",
+        "This option also sets the upper limit for",
+        "the amount of content that can be downloaded",
+        "in one session.",
+        "Content downloaded during previous sessions",
+        "may be deleted by podcatcher in order to",
+        "make room for new content.\n") do |e|
+     opt.size = e.to_i
+     opt.size = nil if opt.size < 1
+   end
+   c.on("-e", "--[no-]empty",
+        "Empty the cache directory before",
+        "downloading content.\n") do |e|
+     opt.empty = e
+   end
+   c.on("-p", "--[no-]perfeed",
+        "Create one subdirectory per feed",
+        "in the cache directory.\n") do |e|
+     opt.per_feed = e
+   end
+   c.on("-S", "--strategy S", opt.STRATEGIES,
+        "Strategy to use when downloading content:",
+        "* back_catalog: download any content that",
+        " has not been downloaded before; prefer",
+        " recent content to older content (may",
+        " download more than one content file per",
+        " feed),",
+        "* one: download one content file (not",
+        " already downloaded) for each feed, with a",
+        " preference for recent content,",
+        "* all: download all content, with a",
+        " preference for recent content; even",
+        " already downloaded content is downloaded",
+        " once again (may download more than one",
+        " content file per feed),",
+        "* chron: download in chronological order",
+        " any content that has not been downloaded",
+        " before; this is useful for audiobook",
+        " podcasts etc (may download more than one",
+        " content file per feed),",
+        "* chron_one: download the oldest content of",
+        " each feed that has not already been",
+        " downloaded,",
+        "* chron_all: download all content in",
+        " chronological order, even if the content",
+        " has already been downloaded (may download",
+        " more than one content file per feed),",
+        "* new: download the most recent content",
+        " of each feed, if it has not already been",
+        " downloaded (DEPRECATED: use 'one' instead",
+        " of 'new'),",
+        "* cache: generate a playlist for content",
+        " already in cache.",
+        "Default value is #{opt.strategy}.\n") do |e|
+     opt.strategy = e if e
+   end
+   c.on("-C", "--content REGEXP",
+        "A regular expression that matches the",
+        "MIME types of content to be downloaded.",
+        "Examples: '^video/', '^audio/mpeg$'.",
+        "Default value is '', which matches any",
+        "type of content.\n") do |e|
+     begin
+       opt.content_type = Regexp.new(e.downcase) if e
+     rescue RegexpError
+       $stderr.puts "Error: ignoring regular expression '#{e}'"
+     end
+   end
+   c.on("-l", "--language LANG",
+        "A list of language tags separated by",
+        "commas. Examples: 'en-us,de', 'fr'.",
+        "A feed whose language does not match",
+        "this list is ignored. By default, all",
+        "feeds are accepted. See",
+        "http://cyber.law.harvard.edu/rss/languages.html",
+        "and",
+        "http://cyber.law.harvard.edu/rss/rss.html#optionalChannelElements",
+        "for allowed tags.\n") do |e|
+     # 'en-us,de' becomes [['en', 'us'], ['de']]
+     opt.language = e.split(',').map { |tag| tag.downcase.split '-' }
+   end
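The handler above turns 'en-us,de' into [['en', 'us'], ['de']]. How podcatcher later compares this against a feed's language element is outside this excerpt, so the prefix match below is our assumption of the intended semantics, not the gem's actual code:

```ruby
# ASSUMPTION: a --language tag such as 'en' is taken to accept any
# feed language that refines it ('en', 'en-us', 'en-gb', ...).
def language_accepted?(feed_lang, wanted)
  return true if wanted.empty?  # no --language option: accept all feeds
  tags = feed_lang.downcase.split('-')
  wanted.any? { |w| tags.first(w.size) == w }
end

wanted = 'en-us,de'.split(',').map { |t| t.downcase.split('-') }
language_accepted?('en-US', wanted)  # => true
language_accepted?('fr', wanted)     # => false
```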
+   c.on("-H", "--horizon DATE",
+        "Do not download content older than",
+        "the given date. The date has the format",
+        "yyyy.mm.dd (example: 2007.03.22) or",
+        "yyyy.mm (equivalent to yyyy.mm.01) or",
+        "yyyy (equivalent to yyyy.01.01).",
+        "#{opt.horizon ? 'Default value is ' + opt.horizon.to_s.split('-').join('.') : 'By default, no horizon is specified'}.\n") do |e|
+     begin
+       date = e.split '.'
+       if (1..3).include? date.size
+         date << '01' while date.size < 3
+         opt.horizon = Date.parse date.join('-')
+       end
+     rescue ArgumentError
+       # ignore unparsable dates; no horizon is set
+     end
+   end
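The date normalisation above (pad missing parts with 01, then parse) can be exercised in isolation; the helper name `parse_horizon` is ours, but the logic is the same as in the handler:

```ruby
require 'date'

# --horizon accepts yyyy, yyyy.mm or yyyy.mm.dd;
# missing month/day parts default to 01, as in the handler above.
def parse_horizon(s)
  parts = s.split('.')
  return nil unless (1..3).include?(parts.size)
  parts << '01' while parts.size < 3
  Date.parse(parts.join('-'))
rescue ArgumentError
  nil  # unparsable dates are silently ignored
end

parse_horizon('2007.03.22')  # => 2007-03-22
parse_horizon('2007.03')     # => 2007-03-01
parse_horizon('2007')        # => 2007-01-01
```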
+   c.on("-r", "--retries N",
+        "Try downloading files (content, feeds",
+        "or subscription lists) at most N times",
+        "before giving up. Default value is #{opt.retries}.\n") do |e|
+     opt.retries = e.to_i unless e.to_i < 1
+   end
+   c.on("-t", "--type TYPE", opt.PLAYLIST_TYPES,
+        "Type of the playlist written to",
+        "standard output. Accepted values are",
+        "#{opt.PLAYLIST_TYPES.join ', '}.",
+        "Default value is #{opt.playlist_type}.\n") do |e|
+     opt.playlist_type = e if e
+   end
+   c.on("-m", "--memsize N",
+        "Remember the last N downloaded content files,",
+        "and do not download them again.",
+        "0 means unbounded. Default value is #{opt.memsize}.\n") do |e|
+     opt.memsize = e.to_i
+     opt.memsize = nil if opt.memsize < 1
+   end
+   c.on("-o", "--order ORDER", opt.ORDERS,
+        "The order in which feeds are traversed",
+        "when downloading content:",
+        "* random: randomizes the feed order,",
+        " so that every feed has an equal chance",
+        " when content is downloaded, even if",
+        " the cache size is small and the number",
+        " of feeds is large,",
+        "* alphabetical: orders feeds",
+        " alphabetically by using their titles,",
+        "* sequential: preserves the argument",
+        " order (and the feed order in",
+        " subscription lists),",
+        "* reverse: reverses the feed order.",
+        "Default value is #{opt.order}.\n") do |e|
+     opt.order = e if e
+   end
+   c.on("-F", "--function FUNCTION", opt.FUNCTIONS,
+        "Used function:",
+        "* download: downloads content from",
+        " specified feeds,",
+        "* search: generates an OPML subscription",
+        " list of feeds matching the specified",
+        " query; the only options relevant for",
+        " search are -v, -r and -f.",
+        "Default value is #{opt.function}.\n") do |e|
+     opt.function = e if e
+   end
+   c.on("-f", "--feeds N",
+        "Do not download more than N feeds",
+        "(when using the download function),",
+        "or return the first N relevant feeds",
+        "(when using the search function).",
+        "0 means unbounded. Default value is #{opt.feeds}.\n") do |e|
+     opt.feeds = e.to_i
+     opt.feeds = nil if opt.feeds < 1
+   end
+   c.on("-T", "--torrentdir DIR",
+        "Copy torrent files to directory DIR.",
+        "The handling of torrents through an",
+        "external BitTorrent client is left to",
+        "the user. If this option is not used,",
+        "torrents are handled internally (if",
+        "RubyTorrent is installed), or else",
+        "ignored.\n") do |e|
+     dir = Pathname.new e
+     opt.torrent_dir = dir if dir.exist? and dir.directory?
+   end
+   c.on("-U", "--uploadrate N",
+        "Maximum upload rate (kilobytes per second)",
+        "for the internal BitTorrent client.",
+        "#{opt.upload_rate ? 'Default value is ' + opt.upload_rate.to_s : 'Unbounded by default'}.\n") do |e|
+     opt.upload_rate = e.to_i unless e.to_i < 1
+   end
+   c.on("-i", "--itemsize N",
+        "If downloaded content is less than N MB in",
+        "size (where N is an integer), fetch other",
+        "content of that same feed until this size",
+        "is reached.",
+        "Default value is #{opt.itemsize}.",
+        "The intent here is to ensure that podcatcher",
+        "downloads about as much content from podcasts",
+        "that frequently post small content (in",
+        "terms of minutes) as it does from podcasts",
+        "that post bigger content less frequently.",
+        "This option was more relevant in the early",
+        "days of podcasting when content size varied",
+        "greatly from one podcast to another. You",
+        "would rarely need to use this option today.\n") do |e|
+     opt.itemsize = e.to_i unless e.to_i < 0
+   end
+   c.on("-c", "--[no-]cache",
+        "Generate a playlist for content",
+        "already in cache.",
+        "DEPRECATED, use '--strategy cache'.\n") do |e|
+     opt.strategy = :cache if e
+   end
+   c.on("-a", "--[no-]asif",
+        "Do not download content, only download",
+        "feeds and subscription lists.",
+        "Useful for testing.\n") do |e|
+     opt.simulate = e
+   end
+   c.on("-v", "--[no-]verbose", "Run verbosely.\n") do |e|
+     opt.verbose = e
+   end
+   c.on("-V", "--version", "Display current version and exit.\n") do
+ puts PODCATCHER_VERSION
455
+ exit
456
+ end
457
+ c.on("-h", "--help", "Display this message and exit.\n") do
458
+ puts c.to_s
459
+ exit
460
+ end
461
+ c.on("--[no-]restrictednames",
462
+ 'In the cache directory, make the names of',
463
+ 'created subdirectories and files acceptable',
464
+ 'for restrictive file systems such as VFAT',
465
+ 'and FAT, which are used on Windows and MP3',
466
+ 'player devices.',
467
+ "Enabled by default.\n") do |e|
468
+ opt.restricted_names = e
469
+ end
470
+ # c.on("--[no-]checkforupdate",
471
+ # "Check once every #{UPDATE_CHECK_INTERVAL} months if a newer ",
472
+ # "version is available and display an ",
473
+ # "informational message. Disabled by default.\n") do |e|
474
+ # opt.check_for_update = e
475
+ # end
476
+ # c.on("--[no-]vote",
477
+ # "Automatically vote for the downloaded",
478
+ # "podcasts at podcatcherstats.com.",
479
+ # "Disabled by default.\n") do |e|
480
+ # opt.vote = e
481
+ # end
482
+ c.on("-A", "--arguments FILENAME_OR_URL",
483
+ "Read arguments from specified file.",
484
+ "Rules:",
485
+ "* accepts one argument per line,",
486
+ "* ignores empty lines and lines starting",
487
+ " with #,",
488
+ "* this option may be used several times",
489
+ " in one command.\n") do |e|
490
+ begin
491
+ open(e) do |f|
492
+ loop do
493
+ line = f.gets
494
+ break unless line
495
+ line = line.chomp.strip
496
+ next if line.length == 0
497
+ next if line =~ /^\s*#/
498
+ arguments << line
499
+ end
500
+ end
501
+ rescue Exception
502
+ $stderr.puts "Error: arguments file could not be read and will be ignored"
503
+ end
504
+ end
505
+ c.on("-O", "--options FILENAME_OR_URL",
506
+ "Read options from specified file.",
507
+ "The options file uses the YAML format.\n") do |e|
508
+ loop do
509
+ options = nil
510
+ begin
511
+ open(e) do |f|
512
+ options = YAML::load(f)
513
+ end
514
+ rescue Exception
515
+ $stderr.puts "Error: options file could not be read and will be ignored"
516
+ end
517
+ break unless options
518
+ break unless options.instance_of? Hash
519
+ options.each() do |option, value|
520
+ case option.downcase
521
+ when 'arguments'
522
+ begin
523
+ open(value) do |f|
524
+ loop do
525
+ line = f.gets
526
+ break unless line
527
+ line = line.chomp.strip
528
+ next if line.length == 0
529
+ arguments << line
530
+ end
531
+ end
532
+ rescue Exception
533
+ $stderr.puts "Error: arguments file could not be read and will be ignored"
534
+ end
535
+ when 'dir'
536
+ contained=false
537
+ #cache directory inside old state directory?
538
+ statedir=opt.dir
539
+ cachedir=opt.cachedir
540
+ loop do
541
+ if cachedir==statedir
542
+ contained=true
543
+ break
544
+ end
545
+ break if cachedir.root?
546
+ cachedir=cachedir.parent
547
+ end
548
+ opt.dir = Pathname.new(Dir.pwd)+value
549
+ #cache directory inside new state directory?
550
+ unless contained
551
+ statedir=opt.dir
552
+ cachedir=opt.cachedir
553
+ loop do
554
+ if cachedir==statedir
555
+ contained=true
556
+ break
557
+ end
558
+ break if cachedir.root?
559
+ cachedir=cachedir.parent
560
+ end
561
+ end
562
+ #new state directory inside cache directory?
563
+ unless contained
564
+ statedir=opt.dir
565
+ cachedir=opt.cachedir
566
+ loop do
567
+ if cachedir==statedir
568
+ contained=true
569
+ break
570
+ end
571
+ break if statedir.root?
572
+ statedir=statedir.parent
573
+ end
574
+ end
575
+ #
576
+ opt.dir.mkdir unless opt.dir.exist?
577
+ exit 1 unless opt.dir.directory?
578
+ if contained
579
+ opt.cachedir = opt.dir + opt.CACHEDIR
580
+ end
581
+ when 'cachedir'
582
+ contained=false
583
+ #cache directory should be outside state directory
584
+ statedir=opt.dir
585
+ cachedir = Pathname.new(Dir.pwd)+value
586
+ loop do
587
+ if cachedir==statedir
588
+ contained=true
589
+ break
590
+ end
591
+ break if cachedir.root?
592
+ cachedir=cachedir.parent
593
+ end
594
+ next if contained
595
+ #state directory should be outside cache directory
596
+ statedir=opt.dir
597
+ cachedir = Pathname.new(Dir.pwd)+value
598
+ loop do
599
+ if cachedir==statedir
600
+ contained=true
601
+ break
602
+ end
603
+ break if statedir.root?
604
+ statedir=statedir.parent
605
+ end
606
+ next if contained
607
+ #accept cache directory
608
+ opt.cachedir=Pathname.new(Dir.pwd)+value
609
+ when 'size'
610
+ if value.instance_of?(Fixnum)
611
+ opt.size = value
612
+ opt.size = nil if opt.size<1
613
+ end
614
+ when 'strategy'
615
+ opt.strategy = value.to_sym if opt.STRATEGIES.detect{|s| value.to_sym == s}
616
+ when 'type'
617
+ opt.playlist_type = value.to_sym if opt.PLAYLIST_TYPES.detect{|s| value.to_sym == s}
618
+ when 'retries'
619
+ opt.retries = value if value.instance_of?(Fixnum) and value>=1
620
+ when 'memsize'
621
+ if value.instance_of?(Fixnum)
622
+ opt.memsize = value
623
+ opt.memsize = nil if opt.memsize<1
624
+ end
625
+ when 'content'
626
+ begin
627
+ opt.content_type = Regexp.new(value.downcase)
628
+ rescue Exception
629
+ $stderr.puts "Error: '#{value.downcase}' is not a valid regular expression and will be ignored"
630
+ end
631
+ when 'language'
632
+ opt.language = value.split ','
633
+ for i in 0...opt.language.size
634
+ opt.language[i].downcase!
635
+ opt.language[i] = opt.language[i].split '-'
636
+ end
637
+ when 'order'
638
+ opt.order = value.to_sym if opt.ORDERS.detect{|s| value.to_sym == s}
639
+ when 'function'
640
+ opt.function = value.to_sym if opt.FUNCTIONS.detect{|s| value.to_sym == s}
641
+ when 'feeds'
642
+ if value.instance_of?(Fixnum)
643
+ opt.feeds = value
644
+ opt.feeds = nil if opt.feeds<1
645
+ end
646
+ when 'horizon'
647
+ begin
648
+ date = value.split '.'
649
+ if (1..3).include? date.size
650
+ while date.size < 3
651
+ date << '01'
652
+ end
653
+ opt.horizon = Date.parse date.join('-')
654
+ end
655
+ rescue ArgumentError
656
+ end
657
+ when 'torrentdir'
658
+ dir = Pathname.new value
659
+ if dir.exist? and dir.directory?
660
+ opt.torrent_dir = dir
661
+ end
662
+ when 'uploadrate'
663
+ opt.upload_rate = value if value.instance_of?(Fixnum) and value>=1
664
+ when 'itemsize'
665
+ opt.itemsize = value if value.instance_of?(Fixnum) and value>=0
666
+ when 'perfeed'
667
+ opt.per_feed = value if value.instance_of?(FalseClass) or value.instance_of?(TrueClass)
668
+ when 'cache'
669
+ opt.strategy = :cache if value.instance_of?(TrueClass)
670
+ when 'empty'
671
+ opt.empty = value if value.instance_of?(FalseClass) or value.instance_of?(TrueClass)
672
+ when 'asif'
673
+ opt.simulate = value if value.instance_of?(FalseClass) or value.instance_of?(TrueClass)
674
+ when 'checkforupdate'
675
+ opt.check_for_update = value if value.instance_of?(FalseClass) or value.instance_of?(TrueClass)
676
+ when 'vote'
677
+ opt.vote = value if value.instance_of?(FalseClass) or value.instance_of?(TrueClass)
678
+ when 'verbose'
679
+ opt.verbose = value if value.instance_of?(FalseClass) or value.instance_of?(TrueClass)
680
+ when 'restrictednames'
681
+ opt.restricted_names = value if value.instance_of?(FalseClass) or value.instance_of?(TrueClass)
682
+ end
683
+ end
684
+ break
685
+ end
686
+ end
687
+ c.separator ""
688
+ c.separator "Usage examples:"
689
+ c.separator ""
690
+ c.separator " #{$0} http://feeds.feedburner.com/Ruby5"
691
+ c.separator ""
692
+ c.separator " #{$0} -O options.yaml -A feeds.txt"
693
+ c.separator ""
694
+ c.separator " #{$0} --dir ~/podcasts http://www.npr.org/podcasts.opml"
695
+ c.separator ""
696
+ c.separator " #{$0} --dir ~/podcasts --strategy cache > cache.m3u"
697
+ c.separator ""
698
+ c.separator " cat feeds.opml | #{$0} --dir ~/podcasts > latest.m3u"
699
+ c.separator ""
700
+ c.separator " #{$0} -vd ~/podcasts -s 500 -m 10_000 -t tox feeds.opml > latest.tox"
701
+ c.separator ""
702
+ c.separator " #{$0} -vF search news http://www.bbc.co.uk/podcasts.opml > bbc_news.opml"
703
+ c.separator ""
704
+ c.separator " #{$0} -F search -f 12 news http://www.npr.org/podcasts.opml > npr_news.opml"
705
+ end
706
+ option_parser.parse!
707
+
708
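For reference, the file consumed by `-O`/`--options` is plain YAML mapping lowercase option names to values, matching the `when` branches in the handler above. A minimal sketch with hypothetical values (only keys the parser recognizes, such as `size`, `type`, `order` and `verbose`, have any effect):

```ruby
require 'yaml'

# Hypothetical options file for -O; keys mirror the 'when' branches in
# the --options handler above, values are examples only.
options_yaml = <<~YAML
  size: 512
  type: m3u
  order: random
  verbose: true
YAML

opts = YAML.load(options_yaml)
# opts is a Hash, e.g. opts['size'] → 512, opts['type'] → "m3u"
```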
+ class Playlist
709
+ def initialize(playlisttype)
710
+ @playlisttype = playlisttype
711
+ @audio_or_video = Regexp.new '^audio/|^video/'
712
+ @size = 0
713
+ end
714
+ def start()
715
+ @str = ""
716
+ case @playlisttype
717
+ when :tox
718
+ @str = "# toxine playlist \n"
719
+ when :m3u
720
+ @str = "#EXTM3U\n"
721
+ when :pls
722
+ @str = "[playlist]\n"
723
+ when :asx
724
+ @str = <<END
725
+ <asx version = "3.0">
726
+ END
727
+ when :smil
728
+ @str = <<END
729
+ <?xml version="1.0"?>
730
+ <!DOCTYPE smil PUBLIC "-//W3C//DTD SMIL 2.0//EN" "http://www.w3.org/2001/SMIL20/SMIL20.dtd">
731
+ <smil xmlns="http://www.w3.org/2001/SMIL20/Language">
732
+ <head></head>
733
+ <body>
734
+ END
735
+ when :xspf
736
+ @doc = Document.new
737
+ @doc.xml_decl.dowrite
738
+ @doc.add_element Element.new("playlist")
739
+ @doc.root.add_attribute "version", "1"
740
+ @doc.root.add_attribute "xmlns", "http://xspf.org/ns/0/"
741
+ @tracklist = Element.new("trackList")
742
+ @doc.root.add_element @tracklist
743
+ end
744
+ print @str
745
+ @str
746
+ end
747
+ def add(content)
748
+ return unless content
749
+ if content.mime
750
+ return unless @audio_or_video =~ content.mime
751
+ end
752
+ @size+=1
753
+ feed_title = content.feed_title
754
+ feed_title = '' unless feed_title
755
+ feed_title = sanitize feed_title
756
+ title = content.title
757
+ title = '' unless title
758
+ title = sanitize title
759
+ title = "#{content.pub_date.strftime('%Y.%m.%d')} - "+title if content.pub_date
760
+ entry = ""
761
+ case @playlisttype
762
+ when :m3u
763
+ feed_title = feed_title.gsub(/,/," ")
764
+ title = title.gsub(/,/," ")
765
+ entry = "#EXTINF:-1,[#{feed_title}] #{title}\n#{content.file.to_s}\n"
766
+ when :pls
767
+ entry = "File#{@size}:#{content.file}\nTitle#{@size}:[#{feed_title}] #{title}\nLength#{@size}:-1\n"
768
+ when :asx
769
+ entry = " <entry><ref href='#{content.file.to_s.gsub(/&/,"&amp;").gsub(/'/,"&apos;").gsub(/"/,"&quot;")}' /></entry>\n"
770
+ when :smil
771
+ entry = " <ref src='#{content.file.to_s.gsub(/&/,"&amp;").gsub(/'/,"&apos;").gsub(/"/,"&quot;")}' />\n"
772
+ when :tox
773
+ entry = "entry { \n\tidentifier = [#{feed_title}] #{title};\n\tmrl = #{content.file};\n};\n"
774
+ when :xspf
775
+ track = Element.new("track")
776
+ @tracklist.add_element track
777
+ track_title = Element.new("title")
778
+ track_title.add_text "[#{feed_title}] #{title}"
779
+ track.add_element track_title
780
+ location = Element.new("location")
781
+ location.add_text fileurl(content.file)
782
+ track.add_element location
783
+ end
784
+ @str += entry
785
+ print entry
786
+ entry
787
+ end
788
+ def finish()
789
+ res = ""
790
+ case @playlisttype
791
+ when :tox
792
+ res = "# end "
793
+ when :asx
794
+ res = <<END
795
+ </asx>
796
+ END
797
+ when :smil
798
+ res = <<END
799
+ </body>
800
+ </smil>
801
+ END
802
+ when :pls
803
+ res = "NumberOfEntries=#{@size}\nVersion=2\n"
804
+ when :xspf
805
+ @doc.write $stdout, 0
806
+ end
807
+ @str += res
808
+ print res
809
+ res
810
+ end
811
+ def to_s()
812
+ if @doc
813
+ @doc.to_s
814
+ else
815
+ @str
816
+ end
817
+ end
818
+ private
819
+ def fileurl(path)
820
+ res = ""
821
+ loop do
822
+ path, base = path.split
823
+ if base.root?
824
+ if base.to_s != "/"
825
+ res = "/"+CGI.escape(base.to_s)+res
826
+ end
827
+ break
828
+ end
829
+ res = "/"+CGI.escape(base.to_s)+res
830
+ end
831
+ "file://"+res
832
+ end
833
+ def sanitize(text) #removes invisible characters from text
834
+ return nil unless text
835
+ res = ''
836
+ text.each_char() do |c|
837
+ case c.ord
838
+ when 0..31, 127 #control chars
839
+ res << ' '
840
+ else
841
+ res << c
842
+ end
843
+ end
844
+ res
845
+ end
846
+ end
847
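As a standalone illustration of the playlist formats handled by the class above, here is a minimal sketch of how the `:m3u` branch of `Playlist#add` assembles an entry (simplified; the struct fields are assumptions mirroring the code above, and this is not the actual class):

```ruby
require 'ostruct'

# Simplified sketch of the :m3u branch of Playlist#add above:
# commas are replaced with spaces because M3U uses the comma as a
# field separator in #EXTINF lines.
def m3u_entry(content)
  feed_title = (content.feed_title || '').gsub(/,/, ' ')
  title = (content.title || '').gsub(/,/, ' ')
  "#EXTINF:-1,[#{feed_title}] #{title}\n#{content.file}\n"
end

content = OpenStruct.new(feed_title: 'Ruby5', title: 'Episode 1',
                         file: '/tmp/ep1.mp3')
entry = m3u_entry(content)
# entry → "#EXTINF:-1,[Ruby5] Episode 1\n/tmp/ep1.mp3\n"
```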
+
848
+ class Update
849
+ def initialize(dir)
850
+ @now = Time.now
851
+ @data = {'last-check' => @now, 'latest-version' => PODCATCHER_VERSION, 'latest-version-description' => ''}
852
+ @server = URI.parse('http://www.podcatcherstats.com/podcatcher/latest_release')
853
+ @server = URI.parse('http://0.0.0.0:3000/podcatcher/latest_release') if PODCATCHER_ENV == :development
854
+ return unless dir
855
+ return unless dir.directory?
856
+ @file = dir + 'updates'
857
+ if @file.exist? and @file.file?
858
+ begin
859
+ data = nil
860
+ @file.open() do |f|
861
+ data = YAML.load f
862
+ end
863
+ if data.instance_of? Hash
864
+ if newer_or_equal? data['latest-version']
865
+ data.each() do |key, value|
866
+ case key
867
+ when 'last-check'
868
+ @data[key] = value if value.instance_of? Time and value < @now
869
+ when 'latest-version'
870
+ @data[key] = value if value.instance_of? String
871
+ when 'latest-version-description'
872
+ @data[key] = value if value.instance_of? String
873
+ end
874
+ end
875
+ end
876
+ end
877
+ rescue Interrupt
878
+ @file.delete
879
+ rescue SystemExit
880
+ exit 1
881
+ rescue Exception
882
+ @file.delete
883
+ end
884
+ end
885
+ save
886
+ exit 1 unless @file.file?
887
+ end
888
+ def check()
889
+ if @now - @data['last-check'] > 60.0 * 60.0 * 24 * 30 * UPDATE_CHECK_INTERVAL
890
+ @data['last-check'] = @now
891
+ begin
892
+ Net::HTTP.start(@server.host, @server.port) do |http|
893
+ resp = http.get(@server.path, {'User-Agent' => USER_AGENT, 'Connection' => 'close'})
894
+ loop do
895
+ break unless resp.code =~ Regexp.new('^2')
896
+ doc = Document.new resp.body
897
+ break unless doc and doc.root and doc.root.name == 'release'
898
+ version = XPath.first doc.root, 'version'
899
+ break unless version
900
+ break unless newer? version.text
901
+ description = XPath.first doc.root, 'description'
902
+ if description
903
+ description = description.text.strip
904
+ else
905
+ description = ''
906
+ end
907
+ @data['latest-version'] = version.text.strip
908
+ @data['latest-version-description'] = description
909
+ save
910
+ break
911
+ end
912
+ # read resp.body
913
+ end
914
+ rescue Interrupt
915
+ rescue SystemExit
916
+ exit 1
917
+ rescue Exception
918
+ end
919
+ end
920
+ flash
921
+ end
922
+ def to_s()
923
+ res = ''
924
+ if @data
925
+ @data.each() do |key, value|
926
+ res+= "#{key}: #{value}\n"
927
+ end
928
+ end
929
+ res
930
+ end
931
+ private
932
+ def flash()
933
+ return unless newer? @data['latest-version'] #if equal? @data['latest-version']
934
+ #constants
935
+ line_length = 70
936
+ p = '**** '
937
+ #
938
+ $stderr.puts ""
939
+ $stderr.puts p+"New release:"
940
+ $stderr.puts p+"Version #{@data['latest-version']} is available at #{PODCATCHER_WEBSITE}."
941
+ if @data['latest-version-description'].size>0
942
+ descr = []
943
+ @data['latest-version-description'].each_line() do |line|
944
+ descr = descr + line.chomp.split(' ')
945
+ end
946
+ line = nil
947
+ descr.each() do |word|
948
+ if line and (line + ' ' + word).size>line_length
949
+ $stderr.puts p+line
950
+ line = nil
951
+ end
952
+ if line
953
+ line += ' '+word
954
+ else
955
+ line = word
956
+ end
957
+
958
+ end
959
+ $stderr.puts p+line if line
960
+ end
961
+ $stderr.puts ""
962
+ end
963
+ def save()
964
+ @file.open('w') do |f|
965
+ YAML.dump @data, f
966
+ end
967
+ end
968
+ def compare_with(version) # Return values: -1: version<installed_version, 0: version==installed_version, 1: version>installed_version
969
+ return -1 unless version
970
+ version = version.strip.split '.'
971
+ for i in 0...version.size
972
+ version[i] = version[i].to_i
973
+ end
974
+ current_version = PODCATCHER_VERSION.strip.split '.'
975
+ for i in 0...current_version.size
976
+ current_version[i] = current_version[i].to_i
977
+ end
978
+ res = 0
979
+ for i in 0...version.size
980
+ break if i>=current_version.size
981
+ if current_version[i]>version[i]
982
+ res = -1
983
+ break
984
+ end
985
+ if current_version[i]<version[i]
986
+ res = 1
987
+ break
988
+ end
989
+ end
990
+ res
991
+ end
992
+ def newer?(version)
993
+ compare_with(version) == 1
994
+ end
995
+ def newer_or_equal?(version)
996
+ compare_with(version) != -1
997
+ end
998
+ def equal?(version)
999
+ compare_with(version) == 0
1000
+ end
1001
+ end
1002
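The segment-wise comparison in `Update#compare_with` can be sketched standalone. This follows the same semantics as the method above, including that segments beyond the shorter version are ignored (so "3.1" and "3.1.8" compare equal); it is an illustration, not the actual method:

```ruby
# Standalone sketch of Update#compare_with: returns -1, 0 or 1 when
# `other` is older than, equal to, or newer than `installed`.
# Segments past the shorter version are ignored, as in the code above.
def compare_versions(installed, other)
  a = installed.strip.split('.').map(&:to_i)
  b = other.strip.split('.').map(&:to_i)
  a.zip(b).each do |x, y|
    break if y.nil?        # other has fewer segments: stop comparing
    return -1 if x > y     # installed is newer
    return 1  if x < y     # other is newer
  end
  0
end

compare_versions('3.1.8', '3.2')    # → 1  (3.2 is newer)
compare_versions('3.1.8', '3.1')    # → 0  (extra segment ignored)
compare_versions('3.1.8', '3.0.9')  # → -1 (3.0.9 is older)
```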
+
1003
+ class Stats
1004
+ def initialize(dir)
1005
+ srand
1006
+ @now = Time.now
1007
+ @data = {'ping-probability' => 1.0}
1008
+ @server = URI.parse('http://www.podcatcherstats.com/podcatcher/ping')
1009
+ @server = URI.parse('http://0.0.0.0:3000/podcatcher/ping') if PODCATCHER_ENV == :development
1010
+ return unless dir
1011
+ return unless dir.directory?
1012
+ @file = dir + 'votes'
1013
+ if @file.exist? and @file.file?
1014
+ data = nil
1015
+ begin
1016
+ @file.open() do |f|
1017
+ data = YAML.load f
1018
+ end
1019
+ rescue Interrupt
1020
+ @file.delete
1021
+ rescue SystemExit
1022
+ exit 1
1023
+ rescue Exception
1024
+ @file.delete
1025
+ end
1026
+ if data.instance_of? Hash
1027
+ # $stderr.puts "votes file read"
1028
+ data.each() do |key, value|
1029
+ case key
1030
+ when 'ping-probability'
1031
+ @data[key] = value unless value<0.0 or 1.0<value
1032
+ when 'last-session'
1033
+ @data[key] = value unless @now<value
1034
+ when 'last-ping'
1035
+ @data[key] = value unless @now<value
1036
+ end
1037
+ end
1038
+ else
1039
+ # $stderr.puts "votes file could not be read"
1040
+ save
1041
+ end
1042
+ end
1043
+ if @data['last-ping']
1044
+ if @data['last-session']
1045
+ @data['last-ping'] = nil if @data['last-session']<@data['last-ping']
1046
+ else
1047
+ @data['last-ping'] = nil
1048
+ end
1049
+ end
1050
+ save unless @file.exist?
1051
+ exit 1 unless @file.file?
1052
+ end
1053
+ def ping(opt, feeds)
1054
+ return unless opt
1055
+ return unless feeds
1056
+ return if opt.simulate
1057
+ #constants
1058
+ max_sent_feeds = 50 #max nb of feed info to be sent
1059
+ #
1060
+ now = Time.now
1061
+ begin
1062
+ loop do
1063
+ break unless opt.vote
1064
+ break unless ping?
1065
+ # $stderr.puts "ping: #{@server}"
1066
+ stats = Document.new
1067
+ stats.add_element 'downloading'
1068
+ #state
1069
+ stats.root.add_element state_element #(opt)
1070
+ #feeds
1071
+ sent_feeds = 0
1072
+ feeds.each() do |feed|
1073
+ if feed.size > 0 and feed[0].feedurl and feed[0].feedurl.size<255 and (not URI.parse(feed[0].feedurl).instance_of?(URI::Generic)) and sent_feeds < max_sent_feeds
1074
+ stats.root.add_element 'feed', {'url' => feed[0].feedurl}
1075
+ sent_feeds += 1
1076
+ end
1077
+ end
1078
+ break unless sent_feeds>0
1079
+ #send
1080
+ stats_str = ''
1081
+ stats.write stats_str
1082
+ if PODCATCHER_ENV != :production
1083
+ $stderr.puts "Sent:"
1084
+ $stderr.puts stats_str
1085
+ end
1086
+ change_state = nil
1087
+ Net::HTTP.start(@server.host, @server.port) do |http|
1088
+ resp = http.request_post @server.path, stats_str, 'User-Agent' => USER_AGENT, 'Content-Type' => 'application/xml', 'Connection' => 'close'
1089
+ if PODCATCHER_ENV != :production
1090
+ $stderr.puts "Received:"
1091
+ $stderr.puts "#{resp.body}"
1092
+ end
1093
+ change resp.body
1094
+ end
1095
+ @data['last-ping'] = now+0
1096
+ break
1097
+ end
1098
+ rescue Interrupt
1099
+ # $stderr.puts "int1 #{$!}"
1100
+ rescue SystemExit
1101
+ exit 1
1102
+ rescue Exception
1103
+ # $stderr.puts "exc #{$!}"
1104
+ end
1105
+ @data['last-session'] = now+0
1106
+ save
1107
+ # $stderr.puts "#{to_s}"
1108
+ end
1109
+ def ping_search(opt, query)
1110
+ return unless opt
1111
+ return unless query
1112
+ return if opt.simulate
1113
+ now = Time.now
1114
+ begin
1115
+ loop do
1116
+ break unless opt.vote
1117
+ break unless ping?
1118
+ # $stderr.puts "ping.."
1119
+ stats = Document.new
1120
+ stats.add_element 'searching', {'query' => query}
1121
+ #state
1122
+ stats.root.add_element state_element
1123
+ #send
1124
+ stats_str = ''
1125
+ stats.write stats_str
1126
+ # $stderr.puts stats_str
1127
+ change_state = nil
1128
+ Net::HTTP.start(@server.host, @server.port) do |http|
1129
+ resp = http.request_post @server.path, stats_str, 'User-Agent' => USER_AGENT, 'Content-Type' => 'application/xml', 'Connection' => 'close'
1130
+ # $stderr.puts "#{resp.body}"
1131
+ change resp.body
1132
+ end
1133
+ @data['last-ping'] = now+0
1134
+ break
1135
+ end
1136
+ rescue Interrupt
1137
+ # $stderr.puts "int1 #{$!}"
1138
+ rescue SystemExit
1139
+ exit 1
1140
+ rescue Exception
1141
+ # $stderr.puts "exc #{$!}"
1142
+ end
1143
+ @data['last-session'] = now+0
1144
+ save
1145
+ # $stderr.puts "#{to_s}"
1146
+ end
1147
+ def to_s()
1148
+ res = ''
1149
+ if @data
1150
+ @data.each() do |key, value|
1151
+ res+= "#{key}: #{value}\n"
1152
+ end
1153
+ end
1154
+ res
1155
+ end
1156
+ private
1157
+ def save()
1158
+ @file.open('w') do |f|
1159
+ YAML.dump @data, f
1160
+ end
1161
+ end
1162
+ def ping?()
1163
+ r = rand
1164
+ # $stderr.puts "random: #{r}, ping-probability: #{@data['ping-probability']}"
1165
+ return r < @data['ping-probability']
1166
+ end
1167
+ def change(doc_str)
1168
+ return unless doc_str
1169
+ begin
1170
+ change_state = Document.new doc_str
1171
+ loop do
1172
+ break unless change_state
1173
+ break unless change_state.root
1174
+ break unless change_state.root.name == 'state'
1175
+ #ping-probability
1176
+ ping = change_state.root.attributes['ping']
1177
+ if ping and ping.size>0
1178
+ ping = ping.to_f
1179
+ unless ping<0.0 or 1.0<ping
1180
+ @data['ping-probability'] = ping
1181
+ end
1182
+ end
1183
+ #
1184
+ break
1185
+ end
1186
+ rescue Interrupt
1187
+ rescue SystemExit
1188
+ exit 1
1189
+ rescue Exception
1190
+ end
1191
+ end
1192
+ def state_element #(opt=nil)
1193
+ state = Element.new 'state'
1194
+ state.add_attribute('ping', @data['ping-probability']) if @data['ping-probability']
1195
+ if @data['last-session']
1196
+ age_in_seconds = @now - @data['last-session'] #Float
1197
+ age_in_days = age_in_seconds/60.0/60.0/24.0
1198
+ state.add_attribute('age', age_in_days)
1199
+ end
1200
+ # return state unless opt
1201
+ # state.add_attribute('strategy', opt.strategy)
1202
+ # state.add_attribute('order', opt.order)
1203
+ # state.add_attribute('cache', opt.size / 1_000_000) if opt.size
1204
+ # state.add_attribute('content', opt.content_type.source) if opt.content_type and opt.content_type.source.size<80
1205
+ state
1206
+ end
1207
+ end
1208
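The feed-URL filter inside `Stats#ping` above only reports a feed when its URL is under 255 characters and parses to a concrete scheme. A standalone sketch of that check (hypothetical helper name; `URI.parse` returns a `URI::Generic` for scheme-less strings, which is why the `instance_of?` test works):

```ruby
require 'uri'

# Sketch of the feed-URL filter used in Stats#ping above: a URL is
# reportable only if it is shorter than 255 chars and parses to a
# concrete URI class (bare strings parse to URI::Generic).
def reportable_feed_url?(url)
  url && url.size < 255 && !URI.parse(url).instance_of?(URI::Generic)
rescue URI::InvalidURIError
  false
end

reportable_feed_url?('http://example.com/feed.xml')  # → true
reportable_feed_url?('feeds.txt')                    # → false
```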
+
1209
+ class History
1210
+ def initialize(dir)
1211
+ @history = dir + "history"
1212
+ @history_old = dir + "history-old"
1213
+ unless @history.exist?
1214
+ @history_old.rename @history if @history_old.exist?
1215
+ end
1216
+ @history.open("w"){|f|}unless @history.exist?
1217
+ exit 1 unless @history.file?
1218
+ @history_old.delete if @history_old.exist?
1219
+ end
1220
+ def mark_old_content(feeds)
1221
+ feeds.each() do |feed|
1222
+ feed.each() do |content|
1223
+ content.in_history = false
1224
+ end
1225
+ end
1226
+ @history.each_line() do |url|
1227
+ url = url.chomp
1228
+ feeds.each() do |feed|
1229
+ feed.each() do |content|
1230
+ next if content.in_history
1231
+ content.in_history = content.url == url
1232
+ end
1233
+ end
1234
+ end
1235
+ end
1236
+ def add(content)
1237
+ begin
1238
+ @history.open("a") do |f|
1239
+ f.puts content.url
1240
+ end
1241
+ rescue Interrupt, SystemExit
1242
+ exit 1
1243
+ rescue Exception
1244
+ $stderr.puts "Error: history file could not be updated"
1245
+ end
1246
+ end
1247
+ def trim(limit)
1248
+ begin
1249
+ history_size = 0
1250
+ @history.each_line() do |url|
1251
+ history_size += 1
1252
+ end
1253
+ if history_size > limit #shrink
1254
+ @history_old.delete if @history_old.exist?
1255
+ @history.rename @history_old
1256
+ @history.open("w") do |f|
1257
+ @history_old.each_line() do |url|
1258
+ f.print(url) if history_size <= limit
1259
+ history_size -= 1
1260
+ end
1261
+ end
1262
+ @history_old.unlink
1263
+ end
1264
+ rescue Interrupt, SystemExit
1265
+ exit 1
1266
+ rescue Exception
1267
+ $stderr.puts "Error: failure during history file clean-up."
1268
+ end if limit
1269
+ end
1270
+ end
1271
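`History#trim` above rewrites the history file keeping only its tail: it counts the lines, then copies a line only once the running count has dropped to the limit. Its effect on the list of URLs can be sketched as (illustration only, not the actual method):

```ruby
# Sketch of History#trim's effect: when the history exceeds `limit`
# entries, the oldest (earliest) lines are dropped and only the most
# recent `limit` URLs survive. A nil limit means unbounded.
def trim_history(urls, limit)
  return urls unless limit
  urls.length > limit ? urls.last(limit) : urls
end

trim_history(%w[a b c d e], 3)  # → ["c", "d", "e"]
```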
+
1272
+ class Cache
1273
+ def initialize(opt)
1274
+ super()
1275
+ @opt = opt
1276
+ @@TORRENT = "application/x-bittorrent"
1277
+ @@MEDIA_RSS_NS = ['http://search.yahoo.com/mrss/']
1278
+ @@MEDIA_RSS_NS << 'http://search.yahoo.com/mrss'
1279
+ @@ATOM_NS = Regexp.new "^http://purl.org/atom/ns#"
1280
+ #history
1281
+ @history = History.new opt.dir
1282
+ #stats
1283
+ @stats = Stats.new opt.dir
1284
+ #cache
1285
+ @cache_dir = opt.cachedir #opt.dir+"cache"
1286
+ @cache_dir.mkdir() unless @cache_dir.exist?
1287
+ exit 1 unless @cache_dir.directory?
1288
+ @cache_dir.each_entry() do |e|
1289
+ e = @cache_dir+e
1290
+ e = e.cleanpath
1291
+ next if e == @cache_dir or e == @cache_dir.parent
1292
+ if e.directory? #feed subfolder
1293
+ e.each_entry() do |e2|
1294
+ e2 = e+e2
1295
+ next if e2.directory?
1296
+ if opt.empty
1297
+ unless opt.simulate or opt.strategy == :cache
1298
+ $stderr.puts "Deleting: #{e2}" if opt.verbose
1299
+ e2.delete
1300
+ end
1301
+ end
1302
+ end
1303
+ e.delete if e.entries.size == 2
1304
+ elsif opt.empty
1305
+ unless opt.simulate or opt.strategy == :cache
1306
+ $stderr.puts "Deleting: #{e}" if opt.verbose
1307
+ e.delete
1308
+ end
1309
+ end
1310
+ end
1311
+ @cache = @cache_dir.entries.collect() do |e|
1312
+ e = @cache_dir+e
1313
+ e = e.cleanpath
1314
+ next if e == @cache_dir or e == @cache_dir.parent
1315
+ if e.file?
1316
+ content = OpenStruct.new
1317
+ content.file = e
1318
+ content.size = e.size
1319
+ content.title = e.to_s
1320
+ content
1321
+ elsif e.directory?
1322
+ e.entries.collect() do |e2|
1323
+ e2 = e+e2
1324
+ if e2.file?
1325
+ content = OpenStruct.new
1326
+ content.file = e2
1327
+ content.size = e2.size
1328
+ content.title = e2.to_s
1329
+ content
1330
+ else
1331
+ nil
1332
+ end
1333
+ end
1334
+ else
1335
+ nil
1336
+ end
1337
+ end
1338
+ @cache.flatten!
1339
+ @cache.compact!
1340
+ @cache.sort!() do |e,e2|
1341
+ e.file.mtime() <=> e2.file.mtime()
1342
+ end
1343
+ end
1344
+ def createplaylist(urls)
1345
+ playlist = Playlist.new @opt.playlist_type
1346
+ if @opt.strategy == :cache
1347
+ playlist.start
1348
+ @cache.reverse!
1349
+ @cache.each() do |content|
1350
+ playlist.add content
1351
+ end
1352
+ playlist.finish
1353
+ return playlist.to_s
1354
+ end
1355
+ playlist.start
1356
+ doc = nil
1357
+ if urls.size == 0
1358
+ $stderr.puts "Reading document from standard input" if @opt.verbose
1359
+ begin
1360
+ xml = ""
1361
+ $stdin.each() do |e|
1362
+ xml += e
1363
+ end
1364
+ doc = OpenStruct.new
1365
+ doc.dom = Document.new(xml)
1366
+ doc = nil unless doc.dom
1367
+ rescue Interrupt, SystemExit
1368
+ exit 1
1369
+ rescue Exception
1370
+ $stderr.puts "Error: unreadable document"
1371
+ doc = nil
1372
+ end
1373
+ end
1374
+ dochistory = []
1375
+ feeds = []
1376
+ urls.uniq!
1377
+ links = urls.collect() do |e|
1378
+ l = OpenStruct.new
1379
+ l.url = e
1380
+ l
1381
+ end
1382
+ loop do
1383
+ break if @opt.feeds and feeds.size >= @opt.feeds
1384
+ while not doc
1385
+ link = links.shift
1386
+ break unless link
1387
+ if dochistory.detect{|e| e == link.url}
1388
+ $stderr.puts "Skipping duplicate: #{link.url}" if @opt.verbose
1389
+ next
1390
+ end
1391
+ $stderr.puts "Fetching: #{link.url}" if @opt.verbose
1392
+ dochistory << link.url
1393
+ begin
1394
+ doc = fetchdoc(link)
1395
+ rescue Interrupt, SystemExit
1396
+ exit 1
1397
+ rescue Exception
1398
+ $stderr.puts "Error: skipping unreadable document"
1399
+ end
1400
+ end
1401
+ break unless doc
1402
+ begin
1403
+ if doc.dom.root.name == "opml"
1404
+ newlinks = []
1405
+ outlines = []
1406
+ doc.dom.elements.each("/opml/body") do |body|
1407
+ body.elements.each() do |e|
1408
+ next unless e.name == 'outline'
1409
+ outlines << e
1410
+ end
1411
+ end
1412
+ while outlines.size>0
1413
+ outline = outlines.shift
1414
+ url = outline.attributes["xmlUrl"]
1415
+ url = outline.attributes["url"] unless url
1416
+ if url
1417
+ begin
1418
+ url = URI.parse(doc.url).merge(url).to_s if doc.url
1419
+ link = OpenStruct.new
1420
+ link.url = url
1421
+ link.referrer = doc.url
1422
+ newlinks << link
1423
+ rescue URI::InvalidURIError
1424
+ end
1425
+ next
1426
+ end
1427
+ new_outlines = []
1428
+ outline.elements.each() do |e|
1429
+ next unless e.name == 'outline'
1430
+ new_outlines << e
+ end
+ outlines = new_outlines + outlines
+ end
+ links = newlinks + links
+ elsif doc.dom.root.name == "pcast"
+ newlinks = []
+ XPath.each(doc.dom,"//link[@rel='feed']") do |outline|
+ url = outline.attributes["href"]
+ next unless url
+ begin
+ url = URI.parse(doc.url).merge(url).to_s if doc.url
+ link = OpenStruct.new
+ link.url = url
+ link.referrer = doc.url
+ newlinks << link
+ rescue URI::InvalidURIError
+ end
+ end
+ links = newlinks + links
+ elsif doc.dom.root.namespace =~ @@ATOM_NS
+ feed = []
+ XPath.each(doc.dom.root,"//*[@rel='enclosure']") do |e2|
+ next unless e2.namespace =~ @@ATOM_NS
+ content = OpenStruct.new
+ XPath.each(e2,"parent::/title/text()") do |node|
+ content.title = ""
+ node.value.each_line() do |e3| #remove line breaks
+ content.title+= e3.chomp+" "
+ end
+ content.title.strip!
+ end
+ XPath.each(e2,"parent::/created/text()") do |node|
+ pub_date = ""
+ node.value.each_line() do |e3| #remove line breaks
+ pub_date+= e3.chomp+" "
+ end
+ begin
+ content.pub_date = DateTime.parse(pub_date.strip, true)
+ rescue Exception
+ end
+ end
+ content.mime = e2.attributes["type"].downcase
+ next if @opt.content_type !~ content.mime and content.mime != @@TORRENT
+ next if content.mime == @@TORRENT and not (@opt.torrent_dir or @opt.rubytorrent)
+ content.feedurl = doc.url
+ begin
+ content.url = URI.parse(content.feedurl).merge(e2.attributes["href"]).to_s if content.feedurl
+ content.size = e2.attributes["length"].to_i
+ content.size = 2 unless content.size and content.size>0
+ content.size = 0 if content.mime == @@TORRENT #not strictly necessary
+ feed << content
+ rescue URI::InvalidURIError
+ end
+ end
+ #sort by date
+ feed.sort!() do |a,b|
+ if a.pub_date
+ if b.pub_date
+ b.pub_date <=> a.pub_date
+ else
+ -1
+ end
+ else
+ if b.pub_date
+ 1
+ else
+ 0
+ end
+ end
+ end
+ feed.each() do |content|
+ $stderr.puts "Enclosure: #{content.url}"
+ end if @opt.verbose
+ #title
+ node = XPath.first(doc.dom,"/feed/title/text()")
+ feed_title = ""
+ node.value.each_line() do |e3| #remove line breaks
+ feed_title += e3.chomp+" "
+ end
+ feed_title.strip!
+ feed.each() do |content|
+ content.feed_title = feed_title
+ end
+ #
+ feeds << feed
+ elsif doc.dom.root.name == "rss"
+ feed = []
+ doc.dom.root.elements.each() do |e| #channel
+ e.elements.each() do |e1| #item
+ title = ''
+ XPath.each(e1,"title/text()") do |node|
+ title = ''
+ node.value.each_line() do |e3| #remove line breaks
+ title+= e3.chomp+" "
+ end
+ title.strip!
+ end
+ pub_date = nil
+ XPath.each(e1,"pubDate/text()") do |node|
+ pub_date = ""
+ node.value.each_line() do |e3| #remove line breaks
+ pub_date+= e3.chomp+" "
+ end
+ begin
+ pub_date = DateTime.parse(pub_date.strip, true)
+ rescue Exception
+ pub_date = nil
+ end
+ end
+ e1.elements.each() do |e2|
+ if e2.name == "enclosure"
+ content = OpenStruct.new
+ content.title = title
+ content.pub_date = pub_date
+ content.mime = e2.attributes["type"].downcase
+ next if @opt.content_type !~ content.mime and content.mime != @@TORRENT
+ next if content.mime == @@TORRENT and not (@opt.torrent_dir or @opt.rubytorrent)
+ content.feedurl = doc.url
+ begin
+ content.url = URI.parse(content.feedurl).merge(e2.attributes["url"]).to_s if content.feedurl
+ content.size = e2.attributes["length"].to_i
+ content.size = 2 unless content.size and content.size>0
+ content.size = 0 if content.mime == @@TORRENT #not strictly necessary
+ feed << content
+ rescue URI::InvalidURIError
+ end
+ elsif @@MEDIA_RSS_NS.include? e2.namespace
+ case e2.name
+ when 'content'
+ content = OpenStruct.new
+ content.title = title
+ content.pub_date = pub_date
+ content.mime = e2.attributes["type"].downcase
+ next if @opt.content_type !~ content.mime and content.mime != @@TORRENT
+ next if content.mime == @@TORRENT and not (@opt.torrent_dir or @opt.rubytorrent)
+ content.feedurl = doc.url
+ begin
+ content.url = URI.parse(content.feedurl).merge(e2.attributes["url"]).to_s if content.feedurl
+ content.size = e2.attributes["fileSize"].to_i
+ content.size = 2 unless content.size and content.size>0
+ content.size = 0 if content.mime == @@TORRENT #not strictly necessary
+ feed << content
+ rescue URI::InvalidURIError
+ end
+ when 'group'
+ e2.elements.each() do |e4|
+ if e4.name == 'content' and @@MEDIA_RSS_NS.include?(e4.namespace)
+ content = OpenStruct.new
+ content.title = title
+ content.pub_date = pub_date
+ content.mime = e4.attributes["type"].downcase
+ next if @opt.content_type !~ content.mime and content.mime != @@TORRENT
+ next if content.mime == @@TORRENT and not (@opt.torrent_dir or @opt.rubytorrent)
+ content.feedurl = doc.url
+ begin
+ content.url = URI.parse(content.feedurl).merge(e4.attributes["url"]).to_s if content.feedurl
+ content.size = e4.attributes["fileSize"].to_i
+ content.size = 2 unless content.size and content.size>0
+ content.size = 0 if content.mime == @@TORRENT #not strictly necessary
+ feed << content
+ rescue URI::InvalidURIError
+ end
+ break
+ end
+ end
+ end
+
+ end
+ end if e1.name == "item"
+ end if e.name == "channel"
+ end
+ #remove duplicates (duplication occurs in particular for content declared as both enclosure and Media RSS content)
+ for i in 0...feed.size
+ content = feed[i]
+ next unless content
+ for j in i+1...feed.size
+ next unless feed[j]
+ feed[j] = nil if feed[j].url == content.url
+ end
+ end
+ feed.compact!
+ #sort by date
+ feed.sort!() do |a,b|
+ if a.pub_date
+ if b.pub_date
+ b.pub_date <=> a.pub_date
+ else
+ -1
+ end
+ else
+ if b.pub_date
+ 1
+ else
+ 0
+ end
+ end
+ end
+ feed.each() do |content|
+ $stderr.puts "Enclosure: #{content.url}"
+ end if @opt.verbose
+ #title
+ node = XPath.first(doc.dom,"//channel/title/text()")
+ feed_title = ""
+ node.value.each_line() do |e3| #remove line breaks
+ feed_title += e3.chomp+" "
+ end
+ feed_title.strip!
+ feed.each() do |content|
+ content.feed_title = feed_title
+ end
+ #language
+ if @opt.language.size > 0
+ loop do
+ node = XPath.first doc.dom, '//channel/language/text()'
+ break unless node
+ break unless node.value
+ feed_lang = node.value.strip.downcase.split '-'
+ break if feed_lang.size == 0
+ langmatch = @opt.language.collect() do |lang|
+ next false if feed_lang.size < lang.size
+ matches = true
+ for i in 0...lang.size
+ next if lang[i] == feed_lang[i]
+ matches = false
+ end
+ matches
+ end
+ feeds << feed if langmatch.include? true
+ break
+ end
+ else
+ feeds << feed
+ end
+ end
+ rescue Interrupt, SystemExit
+ exit 1
+ rescue Exception
+ $stderr.puts "Error: skipping document because of an internal error #{$@}"
+ end
+ doc = nil
+ end
+ #remove content older than the horizon date
+ if @opt.horizon
+ feeds.each() do |feed|
+ for i in 0...feed.size
+ if feed[i].pub_date
+ feed[i] = nil if feed[i].pub_date < @opt.horizon
+ else
+ feed[i] = nil
+ end
+ end
+ feed.compact!
+ end
+ end
+ #apply download strategy
+ @history.mark_old_content feeds
+ if @opt.strategy == :chron or @opt.strategy == :chron_one or @opt.strategy == :chron_all
+ feeds.each() do |feed|
+ feed.reverse!
+ end
+ @opt.strategy = :back_catalog if @opt.strategy == :chron
+ @opt.strategy = :one if @opt.strategy == :chron_one
+ @opt.strategy = :all if @opt.strategy == :chron_all
+ end
+ case @opt.strategy #remove ignored content
+ when :new
+ feeds.each() do |feed|
+ in_hist = nil
+ for i in 0...feed.size
+ if feed[i].in_history
+ in_hist = i
+ break
+ end
+ end
+ feed.slice! in_hist...feed.size if in_hist
+ end
+ when :all
+ else
+ feeds.each() do |feed|
+ for i in 0...feed.size
+ feed[i] = nil if feed[i].in_history
+ end
+ feed.compact!
+ end
+ end
+ if @opt.strategy == :new or @opt.strategy == :one
+ feeds.each() do |feed|
+ itemsize = 0
+ index = nil
+ for i in 0...feed.size
+ itemsize += feed[i].size
+ if itemsize >= @opt.itemsize
+ index = i+1
+ break
+ end
+ end
+ feed.slice! index...feed.size if index
+ end
+ end
+ #feed order
+ case @opt.order
+ when :random
+ srand
+ feeds.sort!() do |a,b|
+ if a.size>0
+ if b.size>0
+ rand(3)-1
+ else
+ -1
+ end
+ else
+ if b.size>0
+ 1
+ else
+ 0
+ end
+ end
+ end
+ when :alphabetical
+ feeds.sort!() do |a,b|
+ if a.size>0
+ if b.size>0
+ a[0].feed_title <=> b[0].feed_title
+ else
+ -1
+ end
+ else
+ if b.size>0
+ 1
+ else
+ 0
+ end
+ end
+ end
+ when :reverse
+ feeds.reverse!
+ end
+ #remove duplicate content
+ feeds.each() do |feed|
+ feed.each() do |content|
+ next unless content
+ dup = false
+ feeds.each() do |f|
+ for i in 0...f.size
+ next unless f[i]
+ if f[i].url == content.url
+ f[i] = nil if dup
+ dup = true
+ end
+ $stderr.puts "Removed duplicate: #{content.url}" unless f[i] or (not @opt.verbose)
+ end
+ end
+ end
+ feed.compact!
+ end
+ #send usage statistics
+ @stats.ping @opt, feeds
+ #fetch torrent metainfo files
+ feeds.each() do |feed|
+ feed.each() do |content|
+ next if content.mime != @@TORRENT
+ content.mime = nil
+ begin
+ $stderr.puts "Fetching torrent metainfo: #{content.url}" if @opt.verbose
+ content.metainfo = RubyTorrent::MetaInfo.from_location content.url
+ content.size = content.metainfo.info.length
+ content.mime = case content.metainfo.info.name.downcase
+ when /\.mp3$/
+ "audio/mpeg"
+ when /\.wma$/
+ "audio/x-ms-wma"
+ when /\.mpg$|\.mpeg$|\.mpe$|\.mpa$|\.mp2$|\.mpv2$/
+ "video/mpeg"
+ when /\.mov$|\.qt$/
+ "video/quicktime"
+ when /\.avi$/
+ "video/x-msvideo"
+ when /\.wmv$/
+ "video/x-ms-wmv"
+ when /\.asf$/
+ "video/x-ms-asf"
+ when /\.m4v$|\.mp4$|\.mpg4$/
+ "video/mp4"
+ else
+ nil
+ end
+ content.url = nil unless content.mime
+ content.url = nil unless (@opt.content_type =~ content.mime)
+ content.url = nil unless content.metainfo.info.single?
+ rescue Interrupt
+ content.url = nil
+ $stderr.puts "Error: unreadable torrent metainfo" if @opt.verbose
+ rescue SystemExit
+ exit 1
+ rescue Exception
+ content.url = nil
+ $stderr.puts "Error: unreadable torrent metainfo" if @opt.verbose
+ end
+ end
+ for i in 0...feed.size
+ feed[i] = nil unless feed[i].url
+ end
+ feed.compact!
+ end
+ #fetch enclosures
+ item = total = 0
+ @cache.each() do |e|
+ total+= e.size
+ end
+ torrents = []
+ torrentfiles = []
+ inc = 1
+ while inc>0
+ inc = 0
+ itemsize = 0
+ feeds.each do |e|
+ #find next enclosure in feed
+ content = e.shift
+ unless content
+ itemsize = 0
+ next
+ end
+ #make place in cache
+ while @opt.size and content.size+inc+total > @opt.size
+ break if @opt.simulate
+ f = @cache.shift
+ break unless f
+ total-= f.size
+ parent = f.file.parent
+ $stderr.puts "Deleting: #{f.file}" if @opt.verbose
+ f.file.delete
+ if parent.parent != @opt.dir and parent.entries.size == 2
+ #delete empty feed subfolder
+ $stderr.puts "Deleting: #{parent}" if @opt.verbose
+ parent.delete
+ end
+ end
+ unless @opt.simulate
+ break if @opt.size and content.size+inc+total > @opt.size
+ end
+ #download
+ 1.upto(@opt.retries) do |i|
+ begin
+ if content.metainfo
+ if @opt.torrent_dir
+ loop do
+ content.file = @opt.torrent_dir+(Time.now.to_f.to_s+".torrent")
+ break unless content.file.exist?
+ sleep 1
+ end
+ $stderr.puts "Copying: #{content.url} to #{content.file}" if @opt.verbose and i == 1
+ if not @opt.simulate
+ if content.feedurl and (content.feedurl =~ %r{^http:} or content.feedurl =~ %r{^ftp:})
+ open(content.url, "User-Agent" => USER_AGENT, "Referer" => content.feedurl) do |fin|
+ content.file.open("wb") do |fout|
+ fin.each_byte() do |b|
+ fout.putc b
+ end
+ end
+ end
+ else
+ open(content.url, "User-Agent" => USER_AGENT) do |fin|
+ content.file.open("wb") do |fout|
+ fin.each_byte() do |b|
+ fout.putc b
+ end
+ end
+ end
+ end
+ end
+ else
+ $stderr.puts "Fetching in background: #{content.url}" if @opt.verbose and i == 1
+ unless @opt.simulate
+ content.file = filename(content, @cache_dir)
+ package = RubyTorrent::Package.new content.metainfo, content.file.to_s
+ bt = RubyTorrent::BitTorrent.new content.metainfo, package, :dlratelim => nil, :ulratelim => @opt.upload_rate, :http_proxy => ENV["http_proxy"]
+ torrents << bt
+ torrentfiles << content
+ end
+ inc+= content.size
+ itemsize+= content.size
+ end
+ else
+ $stderr.puts "Fetching: #{content.url} (#{content.size.to_s} bytes)" if @opt.verbose and i == 1
+ if not @opt.simulate
+ headers = {"User-Agent" => USER_AGENT}
+ headers["Referer"] = content.feedurl if content.feedurl and (content.feedurl =~ %r{^http:} or content.feedurl =~ %r{^ftp:})
+ content.download_url = content.url unless content.download_url
+ open(content.download_url, headers) do |fin|
+ if fin.base_uri.instance_of?(URI::HTTP)
+ if fin.status[0] =~ Regexp.new('^3')
+ content.download_url = fin.meta['location']
+ raise "redirecting"
+ elsif fin.status[0] !~ Regexp.new('^2')
+ raise 'failed'
+ end
+ end
+ # write content to cache
+ content.redirection_url = fin.base_uri.to_s # content.redirection_url is used for finding the correct filename in case of redirection
+ content.redirection_url = nil if content.redirection_url.eql?(content.url)
+ content.file = filename(content, @cache_dir)
+ content.file.open("wb") do |fout|
+ fin.each_byte() do |b|
+ fout.putc b
+ end
+ end
+ end
+ content.size = content.file.size
+ @history.add content
+ end
+ playlist.add(content)
+ inc+= content.size
+ itemsize+= content.size
+ end
+ break
+ rescue Interrupt
+ rescue SystemExit
+ exit 1
+ rescue Exception
+ end
+ $stderr.puts "Attempt #{i} aborted" if @opt.verbose
+ if content.file and i == @opt.retries
+ if content.file.exist?
+ parent = content.file.parent
+ content.file.delete
+ if parent.parent != @opt.dir and parent.entries.size == 2
+ #delete empty feed subfolder
+ parent.delete
+ end
+ end
+ content.file = nil
+ end
+ sleep 5
+ end
+ redo unless content.file # skip unavailable enclosures
+ redo if @opt.itemsize > itemsize
+ itemsize = 0
+ end
+ total+=inc
+ end
+ #shut down torrents
+ if torrents.length > 0
+ $stderr.puts "Fetching torrents (duration: 30min to a couple of hours) " if @opt.verbose
+ bt = torrents[0]
+ completion = torrents.collect() do |e|
+ e.percent_completed
+ end
+ while torrents.length > 0
+ sleep 30*60
+ for i in 0...torrents.length
+ c = torrents[i].percent_completed
+ complete = torrents[i].complete?
+ $stderr.puts "Fetched: #{c}% of #{torrentfiles[i].url} " if @opt.verbose
+ if complete or c == completion[i]
+ begin
+ torrents[i].shutdown
+ rescue SystemExit
+ exit 1
+ rescue Interrupt, Exception
+ end
+ if complete
+ playlist.add(torrentfiles[i])
+ @history.add torrentfiles[i]
+ else
+ $stderr.puts "Aborted: #{torrentfiles[i].url}" if @opt.verbose
+ begin
+ torrentfiles[i].file.delete if torrentfiles[i].file.exist?
+ torrentfiles[i] = nil
+ rescue Interrupt, SystemExit
+ exit 1
+ rescue Exception
+ end
+ end
+ torrents[i] = nil
+ torrentfiles[i] = nil
+ completion[i] = nil
+ next
+ end
+ completion[i] = c
+ end
+ torrents.compact!
+ torrentfiles.compact!
+ completion.compact!
+ end
+ begin
+ bt.shutdown_all
+ rescue Interrupt, SystemExit
+ exit 1
+ rescue Exception
+ end
+ $stderr.puts "BitTorrent stopped" if @opt.verbose
+ end
+ playlist.finish
+ @history.trim(@opt.memsize) unless @opt.simulate or @opt.strategy == :cache
+ playlist.to_s
+ end
+ private
+ def fetchdoc(link)
+ doc = ""
+ 1.upto(@opt.retries) do |i|
+ begin
+ if link.url =~ %r{^http:} or link.url =~ %r{^ftp:}
+ if link.referrer and (link.referrer =~ %r{^http:} or link.referrer =~ %r{^ftp:})
+ open(link.url, "User-Agent" => USER_AGENT, "Referer" => link.referrer) do |f|
+ break if f.content_type.index "audio/"
+ break if f.content_type.index "video/"
+ f.each_line() do |e|
+ doc += e
+ end
+ end
+ else
+ open(link.url, "User-Agent" => USER_AGENT) do |f|
+ break if f.content_type.index "audio/"
+ break if f.content_type.index "video/"
+ f.each_line() do |e|
+ doc += e
+ end
+ end
+ end
+ else
+ open(link.url) do |f|
+ f.each_line() do |e|
+ doc += e
+ end
+ end
+ end
+ break
+ rescue Interrupt
+ rescue SystemExit
+ exit 1
+ rescue Exception
+ end
+ $stderr.puts "Attempt #{i} aborted" if @opt.verbose
+ doc = ""
+ sleep 5
+ end
+ res = OpenStruct.new
+ begin
+ res.dom = Document.new doc
+ rescue Exception
+ end
+ if res.dom
+ res.url = link.url
+ else
+ res = nil
+ end
+ res
+ end
+ def filename(content, dir) #produce filename for content to be downloaded
+ begin #per-feed subfolder
+ if @opt.per_feed and content.feed_title and content.feed_title.size > 0
+ newdir = dir+content.feed_title
+ newdir = dir+content.feed_title.gsub(/[\\\/:*?\"<>|!]/, ' ').gsub(/-+/,'-').gsub(/\s+/,' ').strip if @opt.restricted_names
+ if newdir.exist?
+ if newdir.directory?
+ dir = newdir
+ end
+ else
+ newdir.mkdir
+ dir = newdir
+ end
+ end
+ rescue Exception
+ # $stderr.puts "error: #{$!}"
+ end
+ ext = [""]
+ if content.metainfo
+ begin
+ ext = ["."+content.metainfo.info.name.split(".").reverse[0]]
+ rescue Exception
+ end
+ else
+ ext = case content.mime.downcase
+ when "audio/mpeg"
+ [".mp3"]
+ when "audio/x-mpeg"
+ [".mp3"]
+ when "audio/x-ms-wma"
+ [".wma"]
+ when "audio/x-m4a"
+ [".m4a"]
+ when "video/mpeg"
+ [".mpg",".mpeg",".mpe",".mpa",".mp2",".mpv2"]
+ when "video/quicktime"
+ [".mov",".qt"]
+ when "video/x-msvideo"
+ [".avi"]
+ when "video/x-ms-wmv"
+ [".wmv"]
+ when "video/x-ms-asf"
+ [".asf"]
+ when "video/mp4"
+ [".m4v", ".mp4",".mpg4"]
+ when "video/x-m4v"
+ [".m4v", ".mp4",".mpg4"]
+ else
+ [""]
+ end
+ end
+ #name from url?
+ name = nil
+ begin
+ if content.metainfo
+ name = content.metainfo.info.name
+ name = nil if (dir+name).exist?
+ else
+ urlname = nil
+ urlname = URI.split(content.redirection_url)[5].split("/")[-1] if content.redirection_url
+ urlname = URI.split(content.url)[5].split("/")[-1] unless urlname
+ ext.each() do |e|
+ if e.length == 0 or urlname[-e.length..-1].downcase == e
+ name = urlname
+ name = URI.unescape(name)
+ name = nil if (dir+name).exist?
+ break if name
+ end
+ end
+ end
+ rescue Exception
+ end
+ #unique name?
+ loop do
+ name = Time.now.to_f.to_s+ext[0]
+ break unless (dir+name).exist?
+ sleep 1
+ end unless name
+ dir+name
+ end
+ end
+ class OPML
+ def initialize(title = nil)
+ @doc = Document.new
+ @doc.xml_decl.dowrite
+ @doc.add_element Element.new("opml")
+ @doc.root.add_attribute "version", "1.1"
+ head = Element.new("head")
+ @doc.root.add_element head
+ if title
+ titlee = Element.new("title")
+ titlee.text = title
+ head.add_element titlee
+ end
+ @body = Element.new("body")
+ @doc.root.add_element @body
+ @size = 0
+ end
+ def add(feedurl, text=nil)
+ e = Element.new("outline")
+ e.add_attribute("text", text) if text
+ e.add_attribute "type", "link"
+ e.add_attribute "url", feedurl
+ @body.add_element e
+ @size += 1
+ end
+ def write()
+ @doc.write $stdout, 0
+ end
+ def size()
+ @size
+ end
+ end
+
+ class Query
+ def initialize(opt, query)
+ @@ATOM_NS = Regexp.new '^http://purl.org/atom/ns#'
+ @@ITUNES_NS = 'http://www.itunes.com/dtds/podcast-1.0.dtd'
+ @opt = opt
+ if query
+ @query = query.downcase.split
+ @query = nil if @query.size == 0
+ end
+ @stats = Stats.new opt.dir
+ end
+ def search(urls)
+ res = []
+ begin
+ newpaths = []
+ dochistory = []
+ paths = []
+ if urls.size == 0
+ $stderr.puts "Reading subscriptions from standard input" if @opt.verbose
+ begin
+ xml = ""
+ $stdin.each() do |e|
+ xml += e
+ end
+ path = OpenStruct.new
+ path.doc = Document.new(xml)
+ if path.doc and path.doc.root
+ path.relevance = 0
+ newpaths << path
+ end
+ rescue Interrupt, SystemExit
+ raise
+ rescue Exception
+ $stderr.puts "Error: unreadable subscriptions"
+ end
+ else
+ newpaths = urls.uniq.collect() do |e|
+ path = OpenStruct.new
+ path.url = e
+ path
+ end
+ newpaths = newpaths.collect() do |path|
+ $stderr.puts "Fetching: #{path.url}" if @opt.verbose
+ dochistory << path.url
+ path.doc = fetchdoc(path)
+ if path.doc
+ path.relevance = 0
+ path
+ else
+ $stderr.puts "Skipping unreadable document" if @opt.verbose
+ nil
+ end
+ end
+ newpaths.compact!
+ end
+ #send usage statistics
+ @stats.ping_search @opt, @query.join(' ')
+ #
+ loop do
+ break if @opt.feeds and res.size >= @opt.feeds
+ begin
+ newpaths.sort!() do |path1, path2|
+ path2.relevance <=> path1.relevance
+ end
+ paths = newpaths + paths
+ newpaths = []
+ path = nil
+ loop do
+ path = paths.shift
+ break unless path
+ if path.doc
+ break
+ else
+ if dochistory.detect{|e| e == path.url}
+ $stderr.puts "Skipping duplicate: #{path.url}" if @opt.verbose
+ next
+ end
+ $stderr.puts "Fetching: #{path.url}" if @opt.verbose
+ dochistory << path.url
+ path.doc = fetchdoc(path)
+ if path.doc
+ break
+ end
+ $stderr.puts "Error: skipping unreadable document"
+ end
+ end
+ break unless path
+ if path.doc.root.name == "opml"
+ #doc relevance
+ path.relevance += relevance_of(XPath.first(path.doc, "/opml/head/title/text()"))
+ #outgoing links
+ XPath.each(path.doc,"//outline") do |outline|
+ url = outline.attributes["xmlUrl"]
+ url = outline.attributes["url"] unless url
+ next unless url
+ begin
+ url = URI.parse(path.url).merge(url).to_s if path.url
+ rescue Interrupt, SystemExit
+ raise
+ rescue Exception
+ end
+ newpath = OpenStruct.new
+ newpath.url = url
+ newpath.referrer = path.url
+ #link relevance
+ newpath.relevance = path.relevance
+ XPath.each(outline, "ancestor-or-self::outline") do |e|
+ newpath.relevance += relevance_of(e.attributes["text"])
+ end
+ #
+ newpaths << newpath
+ end
+ elsif path.doc.root.name == "pcast"
+ #outgoing links
+ XPath.each(path.doc,"/pcast/channel") do |channel|
+ link = XPath.first(channel, "link[@rel='feed']")
+ next unless link
+ url = link.attributes["href"]
+ next unless url
+ begin
+ url = URI.parse(path.url).merge(url).to_s if path.url
+ rescue Interrupt, SystemExit
+ raise
+ rescue Exception
+ end
+ newpath = OpenStruct.new
+ newpath.url = url
+ newpath.referrer = path.url
+ #link relevance
+ newpath.relevance = path.relevance
+ newpath.relevance += relevance_of(XPath.first(channel, "title/text()"))
+ newpath.relevance += relevance_of(XPath.first(channel, "subtitle/text()"))
+ #
+ newpaths << newpath
+ end
+ elsif path.doc.root.namespace =~ @@ATOM_NS and path.url
+ #doc relevance
+ title = nil
+ begin
+ XPath.each(path.doc.root,"/*/*") do |e|
+ next unless e.namespace =~ @@ATOM_NS
+ next unless e.name == "title" or e.name == "subtitle"
+ title = e.text if e.name == "title"
+ path.relevance += relevance_of(e.text)
+ end
+ rescue Interrupt, SystemExit
+ raise
+ rescue Exception
+ #$stderr.puts "error: #{$!}"
+ end
+ if path.relevance > 0
+ $stderr.puts "Found: #{title} (relevance: #{path.relevance})" if @opt.verbose
+ if title
+ path.title = ""
+ title.each_line() do |e3| #remove line breaks (title is a String here, not a Text node)
+ path.title+= e3.chomp+" "
+ end
+ path.title.strip!
+ end
+ res << path
+ end
+ elsif path.doc.root.name == "rss" and path.url
+ #doc relevance
+ title = XPath.first(path.doc, "//channel/title/text()")
+ path.relevance += relevance_of(title)
+ path.relevance += relevance_of(XPath.first(path.doc, "//channel/description/text()"))
+ begin
+ XPath.each(path.doc.root,"//channel/*") do |e|
+ next unless e.name == "category"
+ if e.namespace == @@ITUNES_NS
+ XPath.each(e, "descendant-or-self::*") do |e2|
+ next unless e2.name == "category"
+ path.relevance += relevance_of(e2.attributes["text"])
+ end
+ else
+ path.relevance += relevance_of(e.text)
+ end
+ end
+ rescue Interrupt, SystemExit
+ raise
+ rescue Exception
+ #$stderr.puts "error: #{$!}"
+ end
+ if path.relevance > 0
+ $stderr.puts "Found: #{title} (relevance: #{path.relevance})" if @opt.verbose
+ if title
+ path.title = ""
+ title.value.each_line() do |e3| #remove line breaks
+ path.title+= e3.chomp+" "
+ end
+ path.title.strip!
+ end
+ res << path
+ end
+ end
+ rescue Interrupt, SystemExit
+ raise
+ rescue Exception
+ $stderr.puts "Error: skipping unreadable document"
+ end
+ end
+ rescue Interrupt, SystemExit
+ $stderr.puts "Execution interrupted"
+ rescue Exception
+ end
+ result = nil
+ while not result
+ begin
+ res.sort!() do |path1, path2|
+ path2.relevance <=> path1.relevance
+ end
+ opml = OPML.new "Search results for \"#{@query.join(' ')}\""
+ res.each() do |path|
+ opml.add path.url, path.title
+ end
+ result = opml
+ rescue Exception
+ end
+ end
+ result.write
+ result
+ end
+ private
+ def relevance_of(meta)
+ return 0 unless meta
+ unless meta.kind_of? String #Text todo: resolve entities
+ meta = meta.value
+ end
+ meta = meta.downcase
+ meta = meta.split
+ res = 0
+ @query.each() do |e|
+ meta.each() do |e2|
+ res += 1 if e2.index(e)
+ end
+ end
+ res
+ end
+ def fetchdoc(link)
+ doc = ""
+ 1.upto(@opt.retries) do |i|
+ begin
+ if link.url =~ %r{^http:} or link.url =~ %r{^ftp:}
+ if link.referrer and (link.referrer =~ %r{^http:} or link.referrer =~ %r{^ftp:})
+ open(link.url, "User-Agent" => USER_AGENT, "Referer" => link.referrer) do |f|
+ break if f.content_type.index "audio/"
+ break if f.content_type.index "video/"
+ f.each_line() do |e|
+ doc += e
+ end
+ end
+ else
+ open(link.url, "User-Agent" => USER_AGENT) do |f|
+ break if f.content_type.index "audio/"
+ break if f.content_type.index "video/"
+ f.each_line() do |e|
+ doc += e
+ end
+ end
+ end
+ else
+ open(link.url) do |f|
+ f.each_line() do |e|
+ doc += e
+ end
+ end
+ end
+ break
+ rescue Interrupt
+ rescue SystemExit
+ break
+ rescue Exception
+ end
+ $stderr.puts "Attempt #{i} aborted" if @opt.verbose
+ doc = ""
+ sleep 5
+ end
+ res = nil
+ begin
+ res = Document.new doc
+ rescue Exception
+ end
+ res = nil unless res and res.root
+ res
+ end
+ end
+
+ opt.size *= 1_000_000 if opt.size
+ opt.upload_rate *= 1024 if opt.upload_rate
+ opt.itemsize *= 1_000_000
+ arguments = arguments + ARGV
+
+ if opt.check_for_update
+ $stderr.puts "Enabling update check." if opt.verbose
+ end
+
+ if opt.vote
+ $stderr.puts "Enabling the sending of anonymous usage statistics." if opt.verbose
+ end
+
+ begin
+ require "rubytorrent"
+ opt.rubytorrent = true
+ $stderr.puts "RubyTorrent detected." if opt.verbose
+ rescue Interrupt, SystemExit
+ exit 1
+ rescue Exception
+ end
+
+ if opt.function == :download
+ cache = Cache.new opt
+ cache.createplaylist arguments
+ elsif opt.function == :search
+ dir = Query.new opt, arguments.shift
+ dir.search arguments
+ end
+
+ if opt.check_for_update
+ update = Update.new opt.dir
+ update.check
+ end
+
+ if opt.verbose and false
+ $stderr.puts ""
+ $stderr.puts " *********************************************************************"
+ $stderr.puts " **** Qworum - A platform for web-based services (sponsor) ****"
+ $stderr.puts " *********************************************************************"
+ $stderr.puts " **** Sell and buy services: ****"
+ $stderr.puts " **** Host services on your own domain; sell them to websites ****"
+ $stderr.puts " **** or businesses on the service marketplace. ****"
+ $stderr.puts " **** ****"
+ $stderr.puts " **** Build enterprise information systems: ****"
+ $stderr.puts " **** Use Qworum in your information system, and enjoy the ****"
+ $stderr.puts " **** benefits of a powerful SOA technology. ****"
+ $stderr.puts " **** ****"
+ $stderr.puts " **** Learn more at http://www.qworum.com/ ****"
+ $stderr.puts " *********************************************************************"
+ $stderr.puts ""
+ end
+
+ $stderr.puts "End of podcatching session." if opt.verbose
+
+