podcatcher 3.1.8
- checksums.yaml +7 -0
- data/MIT-LICENSE +20 -0
- data/README.txt +249 -0
- data/bin/podcatcher +2535 -0
- metadata +53 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: 7a4cb7f4491ccad25c92672d133f6d77a6684361
+  data.tar.gz: 9823d1629cf7e1cbd812958c444f7d273c4aa5d7
+SHA512:
+  metadata.gz: 4e16316c59fc2fae8d074f100fbd93f419d140ce5d1f07edc9976f8a52a7dcc54cf78fdef4d43f907828b68bd52d7b391460a875ee769c961653df293bfca545
+  data.tar.gz: 58695c4f838a1e7aa688d36ce6b48f4c0b898f7776a66b306f7b1412cc0fb22d2f4bd094912d0601407b3296cc5d4fedd18fe370faa1359fc6b72b0098396fc0
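The checksums.yaml above records SHA1 and SHA512 digests of the gem's two internal archives. As a sketch of how such digests can be recomputed for verification with Ruby's standard Digest library (`gem_digests` is a hypothetical helper, not part of podcatcher, and the path you pass it would be a locally extracted `metadata.gz` or `data.tar.gz`):

```ruby
require 'digest'

# Recompute the two digests that RubyGems stores in checksums.yaml
# for a given file (e.g. the gem's metadata.gz or data.tar.gz).
# This helper is illustrative only; it is not part of the gem.
def gem_digests(path)
  data = File.binread(path)
  {
    'SHA1'   => Digest::SHA1.hexdigest(data),
    'SHA512' => Digest::SHA512.hexdigest(data)
  }
end
```

Comparing the returned hex strings against the values in checksums.yaml detects a corrupted or tampered archive.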
data/MIT-LICENSE
ADDED
@@ -0,0 +1,20 @@
+Copyright 2016 Doga Armangil
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.txt
ADDED
@@ -0,0 +1,249 @@
+
+ARMANGIL'S PODCATCHER
+=====================
+
+Armangil's podcatcher is a podcast client for the command line.
+It can download any type of content enclosed in RSS or Atom files, such as
+MP3 or other audio content, video and images. A search function for
+subscribing to feeds is also included. It provides several download
+strategies, supports BitTorrent, offers cache management, and generates
+playlists for media player applications.
+
+As argument, it accepts feeds (RSS or Atom) or subscription lists
+(OPML or iTunes PCAST), in the form of filenames or URLs (HTTP or FTP).
+Alternatively, it accepts one feed or subscription list from the standard
+input.
+
+BitTorrent is supported both internally (through the RubyTorrent library)
+and externally (.torrent files are downloaded, but the user handles
+them using a BitTorrent application). The latter is currently the most
+reliable method, as RubyTorrent is still in alpha phase.
+
+Concurrency is not handled: simultaneous executions of this program should
+target different directories.
+
+Visit https://github.com/doga/podcatcher for more information.
+
+Usage: podcatcher [options] [arguments]
+
+Options:
+    -d, --dir DIR                    Directory for storing application state.
+                                     Default value is current directory.
+    -D, --cachedir DIR               Directory for storing downloaded content.
+                                     Default value is the 'cache' subdirectory
+                                     of the state directory (specified by
+                                     the --dir option).
+                                     This option is ignored if this directory
+                                     is inside the state directory, or if the
+                                     state directory is inside this directory.
+    -s, --size SIZE                  Size, in megabytes, of the cache directory
+                                     (specified by the --cachedir option).
+                                     0 means unbounded. Default value is 512.
+                                     This option also sets the upper limit for
+                                     the amount of content that can be downloaded
+                                     in one session.
+                                     Content downloaded during previous sessions
+                                     may be deleted by podcatcher in order to
+                                     make place for new content.
+    -e, --[no-]empty                 Empty the cache directory before
+                                     downloading content.
+    -p, --[no-]perfeed               Create one subdirectory per feed
+                                     in the cache directory.
+    -S, --strategy S                 Strategy to use when downloading content:
+                                     * back_catalog: download any content that
+                                       has not been downloaded before; prefer
+                                       recent content to older content (may
+                                       download more than one content file per
+                                       feed),
+                                     * one: download one content file (not
+                                       already downloaded) for each feed, with a
+                                       preference for recent content,
+                                     * all: download all content, with a
+                                       preference for recent content; even
+                                       already downloaded content is downloaded
+                                       once again (may download more than one
+                                       content file per feed),
+                                     * chron: download in chronological order
+                                       any content that has not been downloaded
+                                       before; this is useful for audiobook
+                                       podcasts etc (may download more than one
+                                       content file per feed),
+                                     * chron_one: download the oldest content of
+                                       each feed that has not already been
+                                       downloaded,
+                                     * chron_all: download all content in
+                                       chronological order, even if the content
+                                       has already been downloaded (may download
+                                       more than one content file per feed),
+                                     * new: download the most recent content
+                                       of each feed, if it has not already been
+                                       downloaded (DEPRECATED: use 'one' instead
+                                       of 'new'),
+                                     * cache: generate a playlist for content
+                                       already in cache.
+                                     Default value is one.
+    -C, --content REGEXP             A regular expression that matches the
+                                     MIME types of content to be downloaded.
+                                     Examples: '^video/', '^audio/mpeg$'.
+                                     Default value is '', which matches any
+                                     type of content.
+    -l, --language LANG              A list of language tags separated by
+                                     commas. Examples: 'en-us,de', 'fr'.
+                                     A feed whose language does not match
+                                     this list is ignored. By default, all
+                                     feeds are accepted. See
+                                     http://cyber.law.harvard.edu/rss/languages.html
+                                     and
+                                     http://cyber.law.harvard.edu/rss/rss.html#optionalChannelElements
+                                     for allowed tags.
+    -H, --horizon DATE               Do not download content older than
+                                     the given date. The date has the format
+                                     yyyy.mm.dd (example: 2007.03.22) or
+                                     yyyy.mm (equivalent to yyyy.mm.01) or
+                                     yyyy (equivalent to yyyy.01.01).
+                                     By default, no horizon is specified.
+    -r, --retries N                  Try downloading files (content, feeds
+                                     or subscription lists) at most N times
+                                     before giving up. Default value is 1.
+    -t, --type TYPE                  Type of the playlist written to
+                                     standard output. Accepted values are
+                                     m3u, smil, pls, asx, tox, xspf.
+                                     Default value is m3u.
+    -m, --memsize N                  Remember last N downloaded content,
+                                     and do not download them again.
+                                     0 means unbounded. Default value is 1000.
+    -o, --order ORDER                The order in which feeds are traversed
+                                     when downloading content:
+                                     * random: randomizes the feed order,
+                                       so that every feed has an equal chance
+                                       when content is downloaded, even if
+                                       the cache size is small and the number
+                                       of feeds is big,
+                                     * alphabetical: orders feeds
+                                       alphabetically by using their titles,
+                                     * sequential: preserves the argument
+                                       order (and the feed order in
+                                       subscription lists),
+                                     * reverse: reverses the feed order.
+                                     Default value is random.
+    -F, --function FUNCTION          Used function:
+                                     * download: downloads content from
+                                       specified feeds,
+                                     * search: generates an OPML subscription
+                                       list of feeds matching the specified
+                                       query; the only options relevant for
+                                       search are -v, -r and -f.
+                                     Default value is download.
+    -f, --feeds N                    Do not download more than N feeds
+                                     (when using the download function),
+                                     or return the first N relevant feeds
+                                     (when using the search function).
+                                     0 means unbounded. Default value is 1000.
+    -T, --torrentdir DIR             Copy torrent files to directory DIR.
+                                     The handling of torrents through an
+                                     external BitTorrent client is left to
+                                     the user. If this option is not used,
+                                     torrents are handled internally (if
+                                     RubyTorrent is installed), or else
+                                     ignored.
+    -U, --uploadrate N               Maximum upload rate (kilobytes per second)
+                                     for the internal BitTorrent client.
+                                     Unbounded by default.
+    -i, --itemsize N                 If downloaded content is less than N MB in
+                                     size (where N is an integer), fetch other
+                                     content of that same feed until this size
+                                     is reached.
+                                     Default value is 0.
+                                     The intent here is to ensure that podcatcher
+                                     downloads about as much content from podcasts
+                                     that frequently post small content (in
+                                     terms of minutes) as it does from podcasts
+                                     that post bigger content less frequently.
+                                     This option was more relevant in the early
+                                     days of podcasting when content size varied
+                                     greatly from one podcast to another. You
+                                     would rarely need to use this option today.
+    -c, --[no-]cache                 Generate a playlist for content
+                                     already in cache.
+                                     DEPRECATED, use '--strategy cache'.
+    -a, --[no-]asif                  Do not download content, only download
+                                     feeds and subscription lists.
+                                     Useful for testing.
+    -v, --[no-]verbose               Run verbosely.
+    -V, --version                    Display current version and exit.
+    -h, --help                       Display this message and exit.
+        --[no-]restrictednames       In the cache directory, make the names of
+                                     created subdirectories and files acceptable
+                                     for restrictive file systems such as VFAT
+                                     and FAT, which are used on Windows and MP3
+                                     player devices.
+                                     Enabled by default.
+    -A, --arguments FILENAME_OR_URL  Read arguments from specified file.
+                                     Rules:
+                                     * accepts one argument per line,
+                                     * ignores empty lines and lines starting
+                                       with #,
+                                     * this option may be used several times
+                                       in one command.
+    -O, --options FILENAME_OR_URL    Read options from specified file.
+                                     The options file uses the YAML format.
+
+Usage examples:
+
+podcatcher http://feeds.feedburner.com/Ruby5
+
+podcatcher -O options.yaml -A feeds.txt
+
+podcatcher --dir ~/podcasts http://www.npr.org/podcasts.opml
+
+podcatcher --dir ~/podcasts --strategy cache > cache.m3u
+
+cat feeds.opml | podcatcher --dir ~/podcasts > latest.m3u
+
+podcatcher -vd ~/podcasts -s 500 -m 10_000 -t tox feeds.opml > latest.tox
+
+podcatcher -vF search news http://www.bbc.co.uk/podcasts.opml > bbc_news.opml
+
+podcatcher -F search -f 12 news http://www.npr.org/podcasts.opml > npr_news.opml
+
+
+Requirements
+------------
+Ruby 1.8.2 or later.
+
+
+Installation
+------------
+1. Install the most recent Ruby distribution. Ruby is available on many
+   operating systems such as Windows, MacOS and Linux. A good starting point
+   is http://www.ruby-lang.org/en/ , and for Linux it is worth taking a look
+   at an RPM repository such as http://www.rpmseek.com/ (package name ruby).
+
+2. Extract to disk the podcatcher directory from the TGZ file.
+
+3. (Optional, for internal BitTorrent support) Download the most recent
+   RubyTorrent release from http://rubyforge.org/projects/rubytorrent/ ,
+   add its installation directory to $RUBYLIB (for Linux).
+
+4. (Optional, for Linux users) Add the podcatcher/bin subdirectory to $PATH.
+
+
+Support
+-------
+Please use https://github.com/doga/podcatcher for bug reports
+and feature requests.
+
+Alternatively, you can send me an email to the address listed below.
+
+
+License
+-------
+Armangil's podcatcher is released under the GNU General Public Licence.
+Please see http://opensource.org/licenses/gpl-license.php for more information.
+
+
+Author
+------
+Doga Armangil, armangild@yahoo.com
+
+[November 2014]
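The README's `podcatcher -O options.yaml -A feeds.txt` example relies on two input files whose exact contents it leaves open. The `-A` file format is specified above (one argument per line, blank lines and lines starting with `#` ignored), but the schema of the `-O` YAML file is not documented here; the sketch below is therefore only a guess, with keys assumed to mirror the long option names:

```yaml
# options.yaml -- hypothetical; keys assumed to mirror long option names
dir: ~/podcasts
size: 500
strategy: one
type: m3u
verbose: true
```

A matching `feeds.txt` would then simply list one feed or subscription-list URL per line, e.g. `http://feeds.feedburner.com/Ruby5`, optionally preceded by `#` comment lines.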
data/bin/podcatcher
ADDED
@@ -0,0 +1,2535 @@
+#!/usr/bin/env ruby
+#:mode=ruby:
+
+# This program is released under the GNU General Public Licence. Please see
+# http://opensource.org/licenses/gpl-license.php for more information.
+# Author: Doga Armangil, armangild@yahoo.com
+
+PODCATCHER_WEBSITE = 'https://github.com/doga/podcatcher'
+PODCATCHER_VERSION = '3.1.8'
+
+# todo: allow files to be selected not only by its MIME type, but also other attributes. Example: --content '^video/ width:680-1024 height:400'
+# todo: --proxy option
+# todo: download at most one enclosure or media:content per rss item
+# todo: support for --content and --language options in search mode
+# todo: code refactoring: do not duplicate option handling for 'options' option, factor out conversion between MIME type and file extension, avoid code duplication between implementations of download and search functions
+# todo: "item search" - search function that generates a feed containing relevant items of feeds (":item" or ":show" ?)
+# todo: option to specify share ratio for torrents
+# todo: symlink support in directory (for history, cache etc)
+# todo: improve playlist generation when using --strategy cache (only include audio and video content)
+# todo: improve --feeds implementation
+# todo: resuming of failed media downloads
+# todo: --subscriptions option (subscription d/l limit)
+# todo: informative exception messages
+# todo: only fetch bittorrent metainfo for d/l candidates
+# todo: option to download shows concurrently
+# todo: "lock" directory to prevent concurrency issues
+# todo: option to throttle non-BitTorrent downloads
+# 3.1.8: make podcatcher a Ruby gem
+# 3.1.7: move the code repository from rubyforge to github, remove sponsor message, disable voting and checking for updates by default
+# 3.1.6alpha: fixes a bug whereby a failed content download caused all other content from the same feed to be ignored
+# 3.1.5: updated --arguments file format (# now comments out line), updated sponsor message
+# 3.1.4: added publication date to content titles in generated playlists, added better handling of invalid URLs in feeds and subscription lists (such URLs are now simply ignored instead of causing the whole document to be skipped)
+# 3.1.3: --restrictednames option is now enabled by default, fixed directory name generation bug that allowed '!' character when --perfeed and --restrictednames options were used simultaneously, updated sponsor message
+# 3.1.2: modified the help text that appears when --help option is used, updated sponsor message
+# 3.1.1: fixed a bug in verbose mode that caused content to be listed twice if it is declared as both RSS enclosure and Media RSS content, changed the sponsor message
+# 3.1.0: added support for yyyy and yyyy.mm formats for --horizon parameter
+# 3.0.0: added the --cachedir option for explicitely specifying cache directory, added --language option for selecting feeds by language, added the --horizon option that prevents the downloading of content older than a given date, added --restrictednames option for using content subdirectory and file names that are acceptable for restrictive filesystems such as VFAT, http://search.yahoo.com/mrss is now accepted as namespace for RSS Media module, fixed a bug in update checking (flash now only appears if podcatcherstats version is newer than current one), fixed a bug that caused votes to be sent for feeds that have file URLs or filenames.
+# 2.0.1: fixed Yahoo Media RSS module handling bug
+# 2.0.0: fixed a bug that caused the generation of invalid playlists for feeds containing control characters (such as Ctrl-M) in their title or in the title of one of its entries, added --order option that determines feed order, changed default feed order from 'sequential' to 'random', all content is downloaded by default (not only MP3), changed default cache size to 512MB, added support for the Yahoo Media RSS module (http://search.yahoo.com/mrss), added strategies for downloading content in chronological order (chron_one, chron, chron_all), added -C option that specifies the types of content that are to be received (overrides the default types), added -o option for reading options from a file, added -A option for reading arguments from a file, changed the default download strategy to 'one', added -V alias for --version option, fixed a bug that caused the order of feeds to be ignored in OPML files, fixed a bug that caused downloads of some video files to fail in vodcatcher mode, added --checkforupdate option for informing the user when a new version is available, added --vote option for voting in favour of downloaded podcasts at podcatcherstats.com
+# 1.3.7: added status code and content type check when downloading a media file using HTTP, removed some debugging comments
+# 1.3.5: fixed a bug that caused wrong cache filenames to be generated when an HTTP redirection was received from a server, added Operating System and processor information to the User-Agent HTTP header sent to web servers
+# 1.3.4: fixed the help message
+# 1.3.3: added the -p option that assigns a separate cache subfolder to each feed
+# 1.3.2: bug fix
+# 1.3.1: added robust handling of subscription lists that directly link to media files (such links are now ignored), fixed an OPML generation bug for interrupted searches
+# 1.3.0: added search function for online podcast directories such as the iPodder podcast directory, added xspf support
+# 1.2.0: added support for decentralized subscription lists (i.e. subscription lists that point to other subscription lists), fixed a bug that sometimes caused an invalid Referer header to be sent in HTTP requests, added the -f option, added support for Atom feeds that do not list items in reverse chronological order, added support for RSS/Atom feeds as command line arguments, added support for Extended M3U and Extended PLS playlist formats, M3U playlists can now also be generated in vodcatcher mode, m3u is now the default type in vodcatcher mode, added "cache" strategy which deprecates -c option
+# 1.1.1: added support for iTunes .pcast subscription files
+# 1.1.0: names of media files downloaded via BitTorrent are now preserved, done some refactoring so that the script can function as a vodcatcher
+# 1.0.4: added support for RSS feeds that do not list items in reverse chronological order
+# 1.0.3: fixed an RSS parsing bug that caused enclosures of some feeds to be ignored
+# 1.0.2: fixed some minor MP3 file naming bugs
+# 1.0.1: names of downloaded MP3 files are now preserved
+# 1.0.0: added ATOM support
+# 0.4.0: added duplicate removal for MP3, RSS/Atom and OPML URLs and pathnames; added the -i option that attempts to increase the listen-time given to podcasts which frequently release short shows
+# 0.3.2: fixed BitTorrent handling bug
+# 0.3.1: added robust handling of network exceptions, removed support for Ctrl-C to terminate execution
+# 0.3.0: added support for opml format used by podcastalley, added podcast title information in playlists, reduced RAM usage by not loading the history file in memory, history file and playlist are now updated after each download
+# 0.2.1: added support for Ctrl-C to terminate execution; added robust handling of some bad command line arguments; (James Carter patch) fixed the "OPML truncation" issue where a bad RSS feed was considered the last of the list
+# 0.2.0: added a new download strategy ("one"); added support for more than one OPML argument, fixed some issues
+# 0.1.7: bug fix
+# 0.1.6: added internal Bittorrent support, fixed flawed handling of some exceptions
+# 0.1.5: changed -d option description, added external handling of Bittorrent files
+# 0.1.4: bug-fix, robust handling of bad //enclosure/@length attributes, handling of relative enclosure URLs
+# 0.1.3: podcast download strategies (and changed default), download retries
+# 0.1.2: added TOX playlist support, added HTTP and FTP support for the OPML parameter, done some code clean-up
+# 0.1.1: fixed RSS parsing issue
+# 0.1.0: initial version
+
+require 'uri'
+require 'open-uri'
+require 'ostruct'
+require 'optparse'
+require 'pathname'
+require 'date'
+require 'cgi'
+require 'yaml'
+require 'net/http'
+require 'rexml/document'
+
+include REXML
+
+#PODCATCHER_ENV = :development
+PODCATCHER_ENV = :production
+
+USER_AGENT = "podcatcher/#{PODCATCHER_VERSION} Ruby/#{RUBY_VERSION} #{RUBY_PLATFORM}"
+UPDATE_CHECK_INTERVAL = 6 #months
+
+opt = OpenStruct.new
+opt.PLAYLIST_TYPES = [:m3u, :smil, :pls, :asx, :tox, :xspf]
+opt.playlist_type = opt.PLAYLIST_TYPES[0]
+opt.size = 512
+opt.content_type = Regexp.new ''
+opt.DESCRIPTION = <<END
+
+Armangil's podcatcher is a podcast client for the command line.
+It can download any type of content enclosed in RSS or Atom files, such as
+MP3 or other audio content, video and images. A search function for
+subscribing to feeds is also included. It provides several download
+strategies, supports BitTorrent, offers cache management, and generates
+playlists for media player applications.
+
+As argument, it accepts feeds (RSS or Atom) or subscription lists
+(OPML or iTunes PCAST), in the form of filenames or URLs (HTTP or FTP).
+Alternatively, it accepts one feed or subscription list from the standard
+input.
+
+BitTorrent is supported both internally (through the RubyTorrent library)
+and externally (.torrent files are downloaded, but the user handles
+them using a BitTorrent application). The latter is currently the most
+reliable method, as RubyTorrent is still in alpha phase.
+
+Concurrency is not handled: simultaneous executions of this program should
+target different directories.
+
+Visit $website for more information.
+
+Usage: #{$0} [options] [arguments]
+END
+
+opt.DESCRIPTION.gsub! '$website', PODCATCHER_WEBSITE
+
+opt.dir = Pathname.new Dir.pwd
+opt.CACHEDIR = 'cache'
+opt.cachedir = opt.dir + opt.CACHEDIR
+opt.memsize = 1_000
+opt.empty = false
+opt.simulate = false
+opt.verbose = false
+opt.STRATEGIES = [:one, :new, :back_catalog, :all, :chron, :chron_one, :chron_all, :cache]
+opt.strategy = opt.STRATEGIES[0]
+opt.retries = 1
+opt.torrent_dir = nil
+opt.rubytorrent = false
+opt.upload_rate = nil #10
+opt.itemsize = 0
+opt.feeds = 1_000
+opt.FUNCTIONS = [:download, :search]
+opt.function = opt.FUNCTIONS[0]
+opt.per_feed = false
+opt.vote = false
+opt.check_for_update = false
+opt.ORDERS = [:random, :sequential, :alphabetical, :reverse]
+opt.order = opt.ORDERS[0]
+opt.horizon = nil
+opt.language = []
+opt.restricted_names = true
+
+arguments = []
+
+option_parser = OptionParser.new() do |c|
+  c.banner = opt.DESCRIPTION
+  c.separator ""
+  c.separator "Options:"
+  c.on("-d", "--dir DIR",
+       "Directory for storing application state.",
+       "Default value is current directory.\n") do |e|
+    contained = false
+    # cache directory inside old state directory?
+    statedir = opt.dir
+    cachedir = opt.cachedir
+    loop do
+      if cachedir == statedir
+        contained = true
+        break
+      end
+      break if cachedir.root?
+      cachedir = cachedir.parent
+    end
+    opt.dir = Pathname.new(Dir.pwd) + e
+    # cache directory inside new state directory?
+    unless contained
+      statedir = opt.dir
+      cachedir = opt.cachedir
+      loop do
+        if cachedir == statedir
+          contained = true
+          break
+        end
+        break if cachedir.root?
+        cachedir = cachedir.parent
+      end
+    end
+    # new state directory inside cache directory?
+    unless contained
+      statedir = opt.dir
+      cachedir = opt.cachedir
+      loop do
+        if cachedir == statedir
+          contained = true
+          break
+        end
+        break if statedir.root?
+        statedir = statedir.parent
+      end
+    end
+    #
+    opt.dir.mkdir unless opt.dir.exist?
+    exit 1 unless opt.dir.directory?
+    if contained
+      opt.cachedir = opt.dir + opt.CACHEDIR
+    end
+  end
+  c.on("-D", "--cachedir DIR",
+       "Directory for storing downloaded content.",
+       "Default value is the '#{opt.CACHEDIR}' subdirectory",
+       "of the state directory (specified by",
+       "the --dir option).",
+       "This option is ignored if this directory",
+       "is inside the state directory, or if the",
+       "state directory is inside this directory.\n") do |e|
+    contained = false
+    # cache directory should be outside state directory
+    statedir = opt.dir
+    cachedir = Pathname.new(Dir.pwd) + e
+    loop do
+      if cachedir == statedir
+        contained = true
+        break
+      end
+      break if cachedir.root?
+      cachedir = cachedir.parent
+    end
+    next if contained
+    # state directory should be outside cache directory
+    statedir = opt.dir
+    cachedir = Pathname.new(Dir.pwd) + e
+    loop do
+      if cachedir == statedir
+        contained = true
+        break
+      end
+      break if statedir.root?
+      statedir = statedir.parent
+    end
+    next if contained
+    # accept cache directory
+    opt.cachedir = Pathname.new(Dir.pwd) + e
+  end
+  c.on("-s", "--size SIZE",
+       "Size, in megabytes, of the cache directory",
+       "(specified by the --cachedir option).",
+       "0 means unbounded. Default value is #{opt.size}.",
+       "This option also sets the upper limit for",
+       "the amount of content that can be downloaded",
+       "in one session.",
+       "Content downloaded during previous sessions",
+       "may be deleted by podcatcher in order to",
+       "make place for new content.\n") do |e|
+    opt.size = e.to_i
+    opt.size = nil if opt.size < 1
+  end
+  c.on("-e", "--[no-]empty",
+       "Empty the cache directory before",
+       "downloading content.\n") do |e|
+    opt.empty = e
+  end
+  c.on("-p", "--[no-]perfeed",
+       "Create one subdirectory per feed",
+       "in the cache directory.\n") do |e|
+    opt.per_feed = e
+  end
+  c.on("-S", "--strategy S", opt.STRATEGIES,
+       "Strategy to use when downloading content:",
+       "* back_catalog: download any content that",
+       "  has not been downloaded before; prefer",
+       "  recent content to older content (may",
+       "  download more than one content file per",
+       "  feed),",
+       "* one: download one content file (not",
+       "  already downloaded) for each feed, with a",
+       "  preference for recent content,",
+       "* all: download all content, with a",
+       "  preference for recent content; even",
+       "  already downloaded content is downloaded",
+       "  once again (may download more than one",
+       "  content file per feed),",
+       "* chron: download in chronological order",
+       "  any content that has not been downloaded",
+       "  before; this is useful for audiobook",
+       "  podcasts etc (may download more than one",
+       "  content file per feed),",
+       "* chron_one: download the oldest content of",
+       "  each feed that has not already been",
+       "  downloaded,",
+       "* chron_all: download all content in",
+       "  chronological order, even if the content",
+       "  has already been downloaded (may download",
+       "  more than one content file per feed),",
+       "* new: download the most recent content",
+       "  of each feed, if it has not already been",
+       "  downloaded (DEPRECATED: use 'one' instead",
+       "  of 'new'),",
+       "* cache: generate a playlist for content",
+       "  already in cache.",
+       "Default value is #{opt.strategy}.\n") do |e|
+    opt.strategy = e if e
+  end
+  c.on("-C", "--content REGEXP",
+       "A regular expression that matches the",
+       "MIME types of content to be downloaded.",
+       "Examples: '^video/', '^audio/mpeg$'.",
+       "Default value is '', which matches any",
+       "type of content.\n") do |e|
+    begin
+      opt.content_type = Regexp.new(e.downcase) if e
+    rescue Exception
+      $stderr.puts "Error: ignoring regular expression '#{e}'"
+    end
+  end
+  c.on("-l", "--language LANG",
+       "A list of language tags separated by",
+       "commas. Examples: 'en-us,de', 'fr'.",
+       "A feed whose language does not match",
+       "this list is ignored. By default, all",
+       "feeds are accepted. See",
+       "http://cyber.law.harvard.edu/rss/languages.html",
+       "and",
+       "http://cyber.law.harvard.edu/rss/rss.html#optionalChannelElements",
+       "for allowed tags.\n") do |e|
+    opt.language = e.split ','
+    for i in 0...opt.language.size
+      opt.language[i].downcase!
+      opt.language[i] = opt.language[i].split '-'
+    end
+  end
+  c.on("-H", "--horizon DATE",
+       "Do not download content older than",
+       "the given date. The date has the format",
+       "yyyy.mm.dd (example: 2007.03.22) or",
+       "yyyy.mm (equivalent to yyyy.mm.01) or",
+       "yyyy (equivalent to yyyy.01.01).",
+       "#{opt.horizon ? 'Default value is ' + opt.horizon.to_s.split('-').join('.') : 'By default, no horizon is specified'}.\n") do |e|
+    begin
+      date = e.split '.'
+      if (1..3).include? date.size
+        while date.size < 3
+          date << '01'
+        end
+        opt.horizon = Date.parse date.join('-')
+      end
+    rescue ArgumentError
+    end
+  end
+  c.on("-r", "--retries N",
+       "Try downloading files (content, feeds",
+       "or subscription lists) at most N times",
+       "before giving up. Default value is #{opt.retries}.\n") do |e|
+    opt.retries = e.to_i unless e.to_i < 1
+  end
+  c.on("-t", "--type TYPE", opt.PLAYLIST_TYPES,
+       "Type of the playlist written to",
+       "standard output. Accepted values are",
+       "#{opt.PLAYLIST_TYPES.join ', '}.",
+       "Default value is #{opt.playlist_type}.\n") do |e|
+    opt.playlist_type = e if e
+  end
+  c.on("-m", "--memsize N",
+       "Remember last N downloaded content,",
+       "and do not download them again.",
+       "0 means unbounded. Default value is #{opt.memsize}.\n") do |e|
+    opt.memsize = e.to_i
+    opt.memsize = nil if opt.memsize < 1
+  end
+  c.on("-o", "--order ORDER", opt.ORDERS,
+       "The order in which feeds are traversed",
+       "when downloading content:",
+       "* random: randomizes the feed order,",
+       "  so that every feed has an equal chance",
+       "  when content is downloaded, even if",
+       "  the cache size is small and the number",
+       "  of feeds is big,",
+       "* alphabetical: orders feeds",
+       "  alphabetically by using their titles,",
+       "* sequential: preserves the argument",
+       "  order (and the feed order in",
+       "  subscription lists),",
+       "* reverse: reverses the feed order.",
+       "Default value is #{opt.order}.\n") do |e|
+    opt.order = e if e
+  end
+  c.on("-F", "--function FUNCTION", opt.FUNCTIONS,
+       "Used function:",
+       "* download: downloads content from",
+       "  specified feeds,",
+       "* search: generates an OPML subscription
|
387
|
+
" list of feeds matching the specified",
|
388
|
+
" query; the only options relevant for ",
|
389
|
+
" search are -v, -r and -f.",
|
390
|
+
"Default value is #{opt.function}.\n") do |e|
|
391
|
+
opt.function = e if e
|
392
|
+
end
|
393
|
+
c.on("-f", "--feeds N",
|
394
|
+
"Do not download more than N feeds",
|
395
|
+
"(when using the download function),",
|
396
|
+
"or return the first N relevant feeds",
|
397
|
+
"(when using the search function).",
|
398
|
+
"0 means unbounded. Default value is #{opt.feeds}.\n") do |e|
|
399
|
+
opt.feeds = e.to_i
|
400
|
+
opt.feeds = nil if opt.feeds<1
|
401
|
+
end
|
402
|
+
c.on("-T", "--torrentdir DIR",
|
403
|
+
"Copy torrent files to directory DIR.",
|
404
|
+
"The handling of torrents through an",
|
405
|
+
"external BitTorrent client is left to",
|
406
|
+
"the user. If this option is not used,",
|
407
|
+
"torrents are handled internally (if",
|
408
|
+
"RubyTorrent is installed), or else",
|
409
|
+
"ignored.\n") do |e|
|
410
|
+
dir = Pathname.new e
|
411
|
+
if dir.exist? and dir.directory?
|
412
|
+
opt.torrent_dir = dir
|
413
|
+
end
|
414
|
+
end
|
415
|
+
c.on("-U", "--uploadrate N",
|
416
|
+
"Maximum upload rate (kilobytes per second)",
|
417
|
+
"for the internal BitTorrent client.",
|
418
|
+
"#{opt.upload_rate ? 'Default value is '+opt.upload_rate : 'Unbounded by default'}.\n") do |e|
|
419
|
+
opt.upload_rate = e.to_i unless e.to_i<1
|
420
|
+
end
|
421
|
+
c.on("-i", "--itemsize N",
|
422
|
+
"If downloaded content is less than N MB in",
|
423
|
+
"size (where N is an integer), fetch other",
|
424
|
+
"content of that same feed until this size",
|
425
|
+
"is reached. ",
|
426
|
+
"Default value is #{opt.itemsize}.",
|
427
|
+
"The intent here is to ensure that podcatcher",
|
428
|
+
"downloads about as much content from podcasts",
|
429
|
+
"that frequently post small content (in",
|
430
|
+
"terms of minutes) as it does from podcasts",
|
431
|
+
"that post bigger content less frequently.",
|
432
|
+
"This option was more relevant in the early",
|
433
|
+
"days of podcasting when content size varied",
|
434
|
+
"greatly from one podcast to another. You",
|
435
|
+
"would rarely need to use this option today.\n") do |e|
|
436
|
+
opt.itemsize = e.to_i unless e.to_i<0
|
437
|
+
end
|
438
|
+
c.on("-c", "--[no-]cache",
|
439
|
+
"Generate a playlist for content",
|
440
|
+
"already in cache.",
|
441
|
+
"DEPRECATED, use '--strategy cache'.\n") do |e|
|
442
|
+
opt.strategy = :cache if e
|
443
|
+
end
|
444
|
+
c.on("-a", "--[no-]asif",
|
445
|
+
"Do not download content, only download",
|
446
|
+
"feeds and subscription lists.",
|
447
|
+
"Useful for testing.\n") do |e|
|
448
|
+
opt.simulate = e
|
449
|
+
end
|
450
|
+
c.on("-v", "--[no-]verbose", "Run verbosely.\n") do |e|
|
451
|
+
opt.verbose = e
|
452
|
+
end
|
453
|
+
c.on("-V", "--version", "Display current version and exit.\n") do
|
454
|
+
puts PODCATCHER_VERSION
|
455
|
+
exit
|
456
|
+
end
|
457
|
+
c.on("-h", "--help", "Display this message and exit.\n") do
|
458
|
+
puts c.to_s
|
459
|
+
exit
|
460
|
+
end
|
461
|
+
c.on("--[no-]restrictednames",
|
462
|
+
'In the cache directory, make the names of',
|
463
|
+
'created subdirectories and files acceptable',
|
464
|
+
'for restrictive file systems such as VFAT',
|
465
|
+
'and FAT, which are used on Windows and MP3',
|
466
|
+
'player devices.',
|
467
|
+
"Enabled by default.\n") do |e|
|
468
|
+
opt.restricted_names = e
|
469
|
+
end
|
470
|
+
# c.on("--[no-]checkforupdate",
|
471
|
+
# "Check once every #{UPDATE_CHECK_INTERVAL} months if a newer ",
|
472
|
+
# "version is available and display an ",
|
473
|
+
# "informational message. Disabled by default.\n") do |e|
|
474
|
+
# opt.check_for_update = e
|
475
|
+
# end
|
476
|
+
# c.on("--[no-]vote",
|
477
|
+
# "Automatically vote for the downloaded",
|
478
|
+
# "podcasts at podcatcherstats.com.",
|
479
|
+
# "Disabled by default.\n") do |e|
|
480
|
+
# opt.vote = e
|
481
|
+
# end
|
482
|
+
c.on("-A", "--arguments FILENAME_OR_URL",
|
483
|
+
"Read arguments from specified file.",
|
484
|
+
"Rules:",
|
485
|
+
"* accepts one argument per line,",
|
486
|
+
"* ignores empty lines and lines starting",
|
487
|
+
" with #,",
|
488
|
+
"* this option may be used several times",
|
489
|
+
" in one command.\n") do |e|
|
490
|
+
begin
|
491
|
+
open(e) do |f|
|
492
|
+
loop do
|
493
|
+
line = f.gets
|
494
|
+
break unless line
|
495
|
+
line = line.chomp.strip
|
496
|
+
next if line.length == 0
|
497
|
+
next if line =~ /^\s*#/
|
498
|
+
arguments << line
|
499
|
+
end
|
500
|
+
end
|
501
|
+
rescue Exception
|
502
|
+
$stderr.puts "Error: arguments file could not be read and will be ignored"
|
503
|
+
end
|
504
|
+
end
|
505
|
+
c.on("-O", "--options FILENAME_OR_URL",
|
506
|
+
"Read options from specified file.",
|
507
|
+
"The options file uses the YAML format.\n") do |e|
|
508
|
+
loop do
|
509
|
+
options = nil
|
510
|
+
begin
|
511
|
+
open(e) do |f|
|
512
|
+
options = YAML::load(f)
|
513
|
+
end
|
514
|
+
rescue Exception
|
515
|
+
$stderr.puts "Error: options file could not be read and will be ignored"
|
516
|
+
end
|
517
|
+
break unless options
|
518
|
+
break unless options.instance_of? Hash
|
519
|
+
options.each() do |option, value|
|
520
|
+
case option.downcase
|
521
|
+
when 'arguments'
|
522
|
+
begin
|
523
|
+
open(value) do |f|
|
524
|
+
loop do
|
525
|
+
line = f.gets
|
526
|
+
break unless line
|
527
|
+
line = line.chomp.strip
|
528
|
+
next if line.length == 0
|
529
|
+
arguments << line
|
530
|
+
end
|
531
|
+
end
|
532
|
+
rescue Exception
|
533
|
+
$stderr.puts "Error: arguments file could not be read and will be ignored"
|
534
|
+
end
|
535
|
+
when 'dir'
|
536
|
+
contained=false
|
537
|
+
#cache directory inside old state directory?
|
538
|
+
statedir=opt.dir
|
539
|
+
cachedir=opt.cachedir
|
540
|
+
loop do
|
541
|
+
if cachedir==statedir
|
542
|
+
contained=true
|
543
|
+
break
|
544
|
+
end
|
545
|
+
break if cachedir.root?
|
546
|
+
cachedir=cachedir.parent
|
547
|
+
end
|
548
|
+
opt.dir = Pathname.new(Dir.pwd)+value
|
549
|
+
#cache directory inside new state directory?
|
550
|
+
unless contained
|
551
|
+
statedir=opt.dir
|
552
|
+
cachedir=opt.cachedir
|
553
|
+
loop do
|
554
|
+
if cachedir==statedir
|
555
|
+
contained=true
|
556
|
+
break
|
557
|
+
end
|
558
|
+
break if cachedir.root?
|
559
|
+
cachedir=cachedir.parent
|
560
|
+
end
|
561
|
+
end
|
562
|
+
#new state directory inside cache directory?
|
563
|
+
unless contained
|
564
|
+
statedir=opt.dir
|
565
|
+
cachedir=opt.cachedir
|
566
|
+
loop do
|
567
|
+
if cachedir==statedir
|
568
|
+
contained=true
|
569
|
+
break
|
570
|
+
end
|
571
|
+
break if statedir.root?
|
572
|
+
statedir=statedir.parent
|
573
|
+
end
|
574
|
+
end
|
575
|
+
#
|
576
|
+
opt.dir.mkdir unless opt.dir.exist?
|
577
|
+
exit 1 unless opt.dir.directory?
|
578
|
+
if contained
|
579
|
+
opt.cachedir = opt.dir + opt.CACHEDIR
|
580
|
+
end
|
581
|
+
when 'cachedir'
|
582
|
+
contained=false
|
583
|
+
#cache directory should be outside state directory
|
584
|
+
statedir=opt.dir
|
585
|
+
cachedir = Pathname.new(Dir.pwd)+value
|
586
|
+
loop do
|
587
|
+
if cachedir==statedir
|
588
|
+
contained=true
|
589
|
+
break
|
590
|
+
end
|
591
|
+
break if cachedir.root?
|
592
|
+
cachedir=cachedir.parent
|
593
|
+
end
|
594
|
+
next if contained
|
595
|
+
#state directory should be outside cache directory
|
596
|
+
statedir=opt.dir
|
597
|
+
cachedir = Pathname.new(Dir.pwd)+value
|
598
|
+
loop do
|
599
|
+
if cachedir==statedir
|
600
|
+
contained=true
|
601
|
+
break
|
602
|
+
end
|
603
|
+
break if statedir.root?
|
604
|
+
statedir=statedir.parent
|
605
|
+
end
|
606
|
+
next if contained
|
607
|
+
#accept cache directory
|
608
|
+
opt.cachedir=Pathname.new(Dir.pwd)+value
|
609
|
+
when 'size'
|
610
|
+
if value.instance_of?(Fixnum)
|
611
|
+
opt.size = value
|
612
|
+
opt.size = nil if opt.size<1
|
613
|
+
end
|
614
|
+
when 'strategy'
|
615
|
+
opt.strategy = value.to_sym if opt.STRATEGIES.detect{|s| value.to_sym == s}
|
616
|
+
when 'type'
|
617
|
+
opt.playlist_type = value.to_sym if opt.PLAYLIST_TYPES.detect{|s| value.to_sym == s}
|
618
|
+
when 'retries'
|
619
|
+
opt.retries = value if value.instance_of?(Fixnum) and value>=1
|
620
|
+
when 'memsize'
|
621
|
+
if value.instance_of?(Fixnum)
|
622
|
+
opt.memsize = value
|
623
|
+
opt.memsize = nil if opt.memsize<1
|
624
|
+
end
|
625
|
+
when 'content'
|
626
|
+
begin
|
627
|
+
opt.content_type = Regexp.new(value.downcase)
|
628
|
+
rescue Exception
|
629
|
+
$stderr.puts "Error: '#{value.downcase}' is not a valid regular expression and will be ignored"
|
630
|
+
end
|
631
|
+
when 'language'
|
632
|
+
opt.language = value.split ','
|
633
|
+
for i in 0...opt.language.size
|
634
|
+
opt.language[i].downcase!
|
635
|
+
opt.language[i] = opt.language[i].split '-'
|
636
|
+
end
|
637
|
+
when 'order'
|
638
|
+
opt.order = value.to_sym if opt.ORDERS.detect{|s| value.to_sym == s}
|
639
|
+
when 'function'
|
640
|
+
opt.function = value.to_sym if opt.FUNCTIONS.detect{|s| value.to_sym == s}
|
641
|
+
when 'feeds'
|
642
|
+
if value.instance_of?(Fixnum)
|
643
|
+
opt.feeds = value
|
644
|
+
opt.feeds = nil if opt.feeds<1
|
645
|
+
end
|
646
|
+
when 'horizon'
|
647
|
+
begin
|
648
|
+
date = value.split '.'
|
649
|
+
if (1..3).include? date.size
|
650
|
+
while date.size < 3
|
651
|
+
date << '01'
|
652
|
+
end
|
653
|
+
opt.horizon = Date.parse date.join('-')
|
654
|
+
end
|
655
|
+
rescue ArgumentError
|
656
|
+
end
|
657
|
+
when 'torrentdir'
|
658
|
+
dir = Pathname.new value
|
659
|
+
if dir.exist? and dir.directory?
|
660
|
+
opt.torrent_dir = dir
|
661
|
+
end
|
662
|
+
when 'uploadrate'
|
663
|
+
opt.upload_rate = value if value.instance_of?(Fixnum) and value>=1
|
664
|
+
when 'itemsize'
|
665
|
+
opt.itemsize = value if value.instance_of?(Fixnum) and value>=0
|
666
|
+
when 'perfeed'
|
667
|
+
opt.per_feed = value if value.instance_of?(FalseClass) or value.instance_of?(TrueClass)
|
668
|
+
when 'cache'
|
669
|
+
opt.strategy = :cache if value.instance_of?(TrueClass)
|
670
|
+
when 'empty'
|
671
|
+
opt.empty = value if value.instance_of?(FalseClass) or value.instance_of?(TrueClass)
|
672
|
+
when 'asif'
|
673
|
+
opt.simulate = value if value.instance_of?(FalseClass) or value.instance_of?(TrueClass)
|
674
|
+
when 'checkforupdate'
|
675
|
+
opt.check_for_update = value if value.instance_of?(FalseClass) or value.instance_of?(TrueClass)
|
676
|
+
when 'vote'
|
677
|
+
opt.vote = value if value.instance_of?(FalseClass) or value.instance_of?(TrueClass)
|
678
|
+
when 'verbose'
|
679
|
+
opt.verbose = value if value.instance_of?(FalseClass) or value.instance_of?(TrueClass)
|
680
|
+
when 'restrictednames'
|
681
|
+
opt.restricted_names = value if value.instance_of?(FalseClass) or value.instance_of?(TrueClass)
|
682
|
+
end
|
683
|
+
end
|
684
|
+
break
|
685
|
+
end
|
686
|
+
end
|
687
|
+
c.separator ""
|
688
|
+
c.separator "Usage examples:"
|
689
|
+
c.separator ""
|
690
|
+
c.separator " #{$0} http://feeds.feedburner.com/Ruby5"
|
691
|
+
c.separator ""
|
692
|
+
c.separator " #{$0} -O options.yaml -A feeds.txt"
|
693
|
+
c.separator ""
|
694
|
+
c.separator " #{$0} --dir ~/podcasts http://www.npr.org/podcasts.opml"
|
695
|
+
c.separator ""
|
696
|
+
c.separator " #{$0} --dir ~/podcasts --strategy cache > cache.m3u"
|
697
|
+
c.separator ""
|
698
|
+
c.separator " cat feeds.opml | #{$0} --dir ~/podcasts > latest.m3u"
|
699
|
+
c.separator ""
|
700
|
+
c.separator " #{$0} -vd ~/podcasts -s 500 -m 10_000 -t tox feeds.opml > latest.tox"
|
701
|
+
c.separator ""
|
702
|
+
c.separator " #{$0} -vF search news http://www.bbc.co.uk/podcasts.opml > bbc_news.opml"
|
703
|
+
c.separator ""
|
704
|
+
c.separator " #{$0} -F search -f 12 news http://www.npr.org/podcasts.opml > npr_news.opml"
|
705
|
+
end
|
706
|
+
option_parser.parse!
|
707
|
+
|
708
|
+
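The `-H`/`--horizon` handler above pads a partial `yyyy.mm.dd` date with `'01'` components before parsing it. A standalone sketch of that padding logic, using only the Ruby standard library (the helper name `parse_horizon` is ours, not the script's):

```ruby
require 'date'

# Pad a partial 'yyyy.mm.dd' date the way the --horizon handler does:
# 'yyyy.mm' becomes 'yyyy.mm.01' and 'yyyy' becomes 'yyyy.01.01'.
def parse_horizon(str)
  parts = str.split '.'
  return nil unless (1..3).include? parts.size
  parts << '01' while parts.size < 3
  Date.parse parts.join('-')
rescue ArgumentError
  nil
end
```

Invalid input falls into the same `ArgumentError` rescue the option handler uses, so the horizon is simply left unset.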
class Playlist
  def initialize(playlisttype)
    @playlisttype = playlisttype
    @audio_or_video = Regexp.new '^audio/|^video/'
    @size = 0
  end
  def start()
    @str = ""
    case @playlisttype
    when :tox
      @str = "# toxine playlist \n"
    when :m3u
      @str = "#EXTM3U\n"
    when :pls
      @str = "[playlist]\n"
    when :asx
      @str = <<END
<asx version = "3.0">
END
    when :smil
      @str = <<END
<?xml version="1.0"?>
<!DOCTYPE smil PUBLIC "-//W3C//DTD SMIL 2.0//EN" "http://www.w3.org/2001/SMIL20/SMIL20.dtd">
<smil xmlns="http://www.w3.org/2001/SMIL20/Language">
<head></head>
<body>
END
    when :xspf
      @doc = Document.new
      @doc.xml_decl.dowrite
      @doc.add_element Element.new("playlist")
      @doc.root.add_attribute "version", "1"
      @doc.root.add_attribute "xmlns", "http://xspf.org/ns/0/"
      @tracklist = Element.new("trackList")
      @doc.root.add_element @tracklist
    end
    print @str
    @str
  end
  def add(content)
    return unless content
    if content.mime
      return unless @audio_or_video =~ content.mime
    end
    @size += 1
    feed_title = content.feed_title
    feed_title = '' unless feed_title
    feed_title = sanitize feed_title
    title = content.title
    title = '' unless title
    title = sanitize title
    title = "#{content.pub_date.strftime('%Y.%m.%d')} - " + title if content.pub_date
    entry = ""
    case @playlisttype
    when :m3u
      feed_title = feed_title.gsub(/,/, " ")
      title = title.gsub(/,/, " ")
      entry = "#EXTINF:-1,[#{feed_title}] #{title}\n#{content.file.to_s}\n"
    when :pls
      entry = "File#{@size}:#{content.file}\nTitle#{@size}:[#{feed_title}] #{title}\nLength#{@size}:-1\n"
    when :asx
      entry = " <entry><ref href='#{content.file.to_s.gsub(/&/,"&amp;").gsub(/'/,"&apos;").gsub(/"/,"&quot;")}' /></entry>\n"
    when :smil
      entry = " <ref src='#{content.file.to_s.gsub(/&/,"&amp;").gsub(/'/,"&apos;").gsub(/"/,"&quot;")}' />\n"
    when :tox
      entry = "entry { \n\tidentifier = [#{feed_title}] #{title};\n\tmrl = #{content.file};\n};\n"
    when :xspf
      track = Element.new("track")
      @tracklist.add_element track
      # use a separate variable so the title string is not shadowed
      title_element = Element.new("title")
      title_element.add_text "[#{feed_title}] #{title}"
      track.add_element title_element
      location = Element.new("location")
      location.add_text fileurl(content.file)
      track.add_element location
    end
    @str += entry
    print entry
    entry
  end
  def finish()
    res = ""
    case @playlisttype
    when :tox
      res = "# end "
    when :asx
      res = <<END
</asx>
END
    when :smil
      res = <<END
</body>
</smil>
END
    when :pls
      res = "NumberOfEntries=#{@size}\nVersion=2\n"
    when :xspf
      @doc.write $stdout, 0
    end
    @str += res
    print res
    res
  end
  def to_s()
    if @doc
      @doc.to_s
    else
      @str
    end
  end
  private
  def fileurl(path)
    res = ""
    loop do
      path, base = path.split
      if base.root?
        if base.to_s != "/"
          res = "/" + CGI.escape(base.to_s) + res
        end
        break
      end
      res = "/" + CGI.escape(base.to_s) + res
    end
    "file://" + res
  end
  def sanitize(text) # removes invisible characters from text
    return nil unless text
    res = ''
    text.each_byte() do |c|
      case c
      when 0..31, 127 # control chars
        res << ' '
      else
        res << c
      end
    end
    res
  end
end
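`Playlist#add` renders one textual entry per downloaded item; for `:m3u` it strips commas from the titles (the `#EXTINF` header line is comma-delimited) and follows the header with the file path. A minimal sketch of that entry format, with illustrative titles and paths:

```ruby
# Build one M3U entry the way Playlist#add does for :m3u:
# commas are replaced with spaces, then an #EXTINF header
# precedes the file path.
def m3u_entry(feed_title, title, file)
  feed_title = feed_title.gsub(/,/, " ")
  title = title.gsub(/,/, " ")
  "#EXTINF:-1,[#{feed_title}] #{title}\n#{file}\n"
end

entry = m3u_entry("News, Daily", "Episode 1", "/cache/news/ep1.mp3")
```

Concatenating `start`'s `#EXTM3U` header, one such entry per item, and `finish`'s (empty, for M3U) footer reproduces the playlist the class prints to standard output.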
class Update
  def initialize(dir)
    @now = Time.now
    @data = {'last-check' => @now, 'latest-version' => PODCATCHER_VERSION, 'latest-version-description' => ''}
    @server = URI.parse('http://www.podcatcherstats.com/podcatcher/latest_release')
    @server = URI.parse('http://0.0.0.0:3000/podcatcher/latest_release') if PODCATCHER_ENV == :development
    return unless dir
    return unless dir.directory?
    @file = dir + 'updates'
    if @file.exist? and @file.file?
      begin
        data = nil
        @file.open() do |f|
          data = YAML.load f
        end
        if data.instance_of? Hash
          if newer_or_equal? data['latest-version']
            data.each() do |key, value|
              case key
              when 'last-check'
                @data[key] = value if value.instance_of? Time and value < @now
              when 'latest-version'
                @data[key] = value if value.instance_of? String
              when 'latest-version-description'
                @data[key] = value if value.instance_of? String
              end
            end
          end
        end
      rescue Interrupt
        @file.delete
      rescue SystemExit
        exit 1
      rescue Exception
        @file.delete
      end
    end
    save
    exit 1 unless @file.file?
  end
  def check()
    if @now - @data['last-check'] > 60.0 * 60.0 * 24 * 30 * UPDATE_CHECK_INTERVAL
      @data['last-check'] = @now
      begin
        Net::HTTP.start(@server.host, @server.port) do |http|
          resp = http.get(@server.path, {'User-Agent' => USER_AGENT, 'Connection' => 'close'})
          loop do
            break unless resp.code =~ Regexp.new('^2')
            doc = Document.new resp.body
            break unless doc and doc.root and doc.root.name == 'release'
            version = XPath.first doc.root, 'version'
            break unless version
            break unless newer? version.text
            description = XPath.first doc.root, 'description'
            if description
              description = description.text.strip
            else
              description = ''
            end
            @data['latest-version'] = version.text
            @data['latest-version-description'] = description
            save
            break
          end
          # read resp.body
        end
      rescue Interrupt
      rescue SystemExit
        exit 1
      rescue Exception
      end
    end
    flash
  end
  def to_s()
    res = ''
    if @data
      @data.each() do |key, value|
        res += "#{key}: #{value}\n"
      end
    end
    res
  end
  private
  def flash()
    return unless newer? @data['latest-version'] #if equal? @data['latest-version']
    # constants
    line_length = 70
    p = '**** '
    #
    $stderr.puts ""
    $stderr.puts p + "New release:"
    $stderr.puts p + "Version #{@data['latest-version']} is available at #{PODCATCHER_WEBSITE}."
    if @data['latest-version-description'].size > 0
      descr = []
      @data['latest-version-description'].each_line() do |line|
        descr = descr + line.chomp.split(' ')
      end
      line = nil
      descr.each() do |word|
        if line and (line + ' ' + word).size > line_length
          $stderr.puts p + line
          line = nil
        end
        if line
          line += ' ' + word
        else
          line = word
        end
      end
      $stderr.puts p + line if line
    end
    $stderr.puts ""
  end
  def save()
    @file.open('w') do |f|
      YAML.dump @data, f
    end
  end
  def compare_with(version) # Return values: -1: version<installed_version, 0: version==installed_version, 1: version>installed_version
    return -1 unless version
    version = version.strip.split '.'
    for i in 0...version.size
      version[i] = version[i].to_i
    end
    current_version = PODCATCHER_VERSION.strip.split '.'
    for i in 0...current_version.size
      current_version[i] = current_version[i].to_i
    end
    res = 0
    for i in 0...version.size
      break if i >= current_version.size
      if current_version[i] > version[i]
        res = -1
        break
      end
      if current_version[i] < version[i]
        res = 1
        break
      end
    end
    res
  end
  def newer?(version)
    compare_with(version) == 1
  end
  def newer_or_equal?(version)
    compare_with(version) != -1
  end
  def equal?(version)
    compare_with(version) == 0
  end
end
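`Update#compare_with` reduces to a component-wise integer comparison of dotted version strings, ignoring extra components in the longer string. The same logic as a standalone function (the name and two-argument signature are ours; the original compares against `PODCATCHER_VERSION`):

```ruby
# Compare two dotted version strings the way Update#compare_with does:
# returns -1, 0 or 1 as `version` is older than, equal to, or newer
# than `installed`. Extra components in the longer string are ignored.
def compare_versions(installed, version)
  return -1 unless version
  a = installed.strip.split('.').map(&:to_i)
  b = version.strip.split('.').map(&:to_i)
  b.each_index do |i|
    break if i >= a.size
    return -1 if a[i] > b[i]
    return 1 if a[i] < b[i]
  end
  0
end
```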
class Stats
  def initialize(dir)
    srand
    @now = Time.now
    @data = {'ping-probability' => 1.0}
    @server = URI.parse('http://www.podcatcherstats.com/podcatcher/ping')
    @server = URI.parse('http://0.0.0.0:3000/podcatcher/ping') if PODCATCHER_ENV == :development
    return unless dir
    return unless dir.directory?
    @file = dir + 'votes'
    if @file.exist? and @file.file?
      data = nil
      begin
        @file.open() do |f|
          data = YAML.load f
        end
      rescue Interrupt
        @file.delete
      rescue SystemExit
        exit 1
      rescue Exception
        @file.delete
      end
      if data.instance_of? Hash
        # $stderr.puts "votes file read"
        data.each() do |key, value|
          case key
          when 'ping-probability'
            @data[key] = value unless value < 0.0 or 1.0 < value
          when 'last-session'
            @data[key] = value unless @now < value
          when 'last-ping'
            @data[key] = value unless @now < value
          end
        end
      else
        # $stderr.puts "votes file could not be read"
        save
      end
    end
    if @data['last-ping']
      if @data['last-session']
        @data['last-ping'] = nil if @data['last-session'] < @data['last-ping']
      else
        @data['last-ping'] = nil
      end
    end
    save unless @file.exist?
    exit 1 unless @file.file?
  end
  def ping(opt, feeds)
    return unless opt
    return unless feeds
    return if opt.simulate
    # constants
    max_sent_feeds = 50 # max nb of feed info to be sent
    #
    now = Time.now
    begin
      loop do
        break unless opt.vote
        break unless ping?
        # $stderr.puts "ping: #{@server}"
        stats = Document.new
        stats.add_element 'downloading'
        # state
        stats.root.add_element state_element #(opt)
        # feeds
        sent_feeds = 0
        feeds.each() do |feed|
          if feed.size > 0 and feed[0].feedurl and feed[0].feedurl.size < 255 and (not URI.parse(feed[0].feedurl).instance_of?(URI::Generic)) and sent_feeds < max_sent_feeds
            stats.root.add_element 'feed', {'url' => feed[0].feedurl}
            sent_feeds += 1
          end
        end
        break unless sent_feeds > 0
        # send
        stats_str = ''
        stats.write stats_str
        if PODCATCHER_ENV != :production
          $stderr.puts "Sent:"
          $stderr.puts stats_str
        end
        change_state = nil
        Net::HTTP.start(@server.host, @server.port) do |http|
          resp = http.request_post @server.path, stats_str, 'User-Agent' => USER_AGENT, 'Content-Type' => 'application/xml', 'Connection' => 'close'
          if PODCATCHER_ENV != :production
            $stderr.puts "Received:"
            $stderr.puts "#{resp.body}"
          end
          change resp.body
        end
        @data['last-ping'] = now + 0
        break
      end
    rescue Interrupt
      # $stderr.puts "int1 #{$!}"
    rescue SystemExit
      exit 1
    rescue Exception
      # $stderr.puts "exc #{$!}"
    end
    @data['last-session'] = now + 0
    save
    # $stderr.puts "#{to_s}"
  end
  def ping_search(opt, query)
    return unless opt
    return unless query
    return if opt.simulate
    now = Time.now
    begin
      loop do
        break unless opt.vote
        break unless ping?
        # $stderr.puts "ping.."
        stats = Document.new
        stats.add_element 'searching', {'query' => query}
        # state
        stats.root.add_element state_element
        # send
        stats_str = ''
        stats.write stats_str
        # $stderr.puts stats_str
        change_state = nil
        Net::HTTP.start(@server.host, @server.port) do |http|
          resp = http.request_post @server.path, stats_str, 'User-Agent' => USER_AGENT, 'Content-Type' => 'application/xml', 'Connection' => 'close'
          # $stderr.puts "#{resp.body}"
          change resp.body
        end
        @data['last-ping'] = now + 0
        break
      end
    rescue Interrupt
      # $stderr.puts "int1 #{$!}"
    rescue SystemExit
      exit 1
    rescue Exception
      # $stderr.puts "exc #{$!}"
    end
    @data['last-session'] = now + 0
    save
    # $stderr.puts "#{to_s}"
  end
  def to_s()
    res = ''
    if @data
      @data.each() do |key, value|
        res += "#{key}: #{value}\n"
      end
    end
    res
  end
  private
  def save()
    @file.open('w') do |f|
      YAML.dump @data, f
    end
  end
  def ping?()
    r = rand
    # $stderr.puts "random: #{r}, ping-probability: #{@data['ping-probability']}"
    return r < @data['ping-probability']
  end
  def change(doc_str)
    return unless doc_str
    begin
      change_state = Document.new doc_str
      loop do
        break unless change_state
        break unless change_state.root
        break unless change_state.root.name == 'state'
        # ping-probability
        ping = change_state.root.attributes['ping']
        if ping and ping.size > 0
          ping = ping.to_f
          unless ping < 0.0 or 1.0 < ping
            @data['ping-probability'] = ping
          end
        end
        #
        break
      end
    rescue Interrupt
    rescue SystemExit
      exit 1
    rescue Exception
    end
  end
  def state_element #(opt=nil)
    state = Element.new 'state'
    state.add_attribute('ping', @data['ping-probability']) if @data['ping-probability']
    if @data['last-session']
      age_in_seconds = @now - @data['last-session'] # Float
      age_in_days = age_in_seconds / 60.0 / 60.0 / 24.0
      state.add_attribute('age', age_in_days)
    end
    # return state unless opt
    # state.add_attribute('strategy', opt.strategy)
    # state.add_attribute('order', opt.order)
    # state.add_attribute('cache', opt.size / 1_000_000) if opt.size
    # state.add_attribute('content', opt.content_type.source) if opt.content_type and opt.content_type.source.size<80
    state
  end
end
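`Stats#ping` assembles a small REXML document — a `<downloading>` root holding one `<state>` element and one `<feed url='…'>` element per reported feed — and serializes it into a string before the POST. A sketch of just that payload construction (the feed URL is illustrative, not a real endpoint):

```ruby
require 'rexml/document'
include REXML

# Build the <downloading> payload shape Stats#ping sends:
# a <state> child plus one <feed> element per feed URL.
stats = Document.new
stats.add_element 'downloading'
state = Element.new 'state'
state.add_attribute 'ping', '1.0'
stats.root.add_element state
stats.root.add_element 'feed', {'url' => 'http://example.com/feed.xml'}

# REXML writes to any object responding to <<, so a String works.
stats_str = ''
stats.write stats_str
```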
class History
  def initialize(dir)
    @history = dir + "history"
    @history_old = dir + "history-old"
    unless @history.exist?
      @history_old.rename @history if @history_old.exist?
    end
    @history.open("w") {|f| } unless @history.exist?
    exit 1 unless @history.file?
    @history_old.delete if @history_old.exist?
  end
  def mark_old_content(feeds)
    feeds.each() do |feed|
      feed.each() do |content|
        content.in_history = false
      end
    end
    @history.each_line() do |url|
      url = url.chomp
      feeds.each() do |feed|
        feed.each() do |content|
          next if content.in_history
          content.in_history = content.url == url
        end
      end
    end
  end
  def add(content)
    begin
      @history.open("a") do |f|
        f.puts content.url
      end
    rescue Interrupt, SystemExit
      exit 1
    rescue Exception
      $stderr.puts "Error: history file could not be updated"
    end
  end
  def trim(limit)
    begin
      history_size = 0
      @history.each_line() do |url|
        history_size += 1
      end
      if history_size > limit # shrink
        @history_old.delete if @history_old.exist?
        @history.rename @history_old
        @history.open("w") do |f|
          @history_old.each_line() do |url|
            f.print(url) if history_size <= limit
            history_size -= 1
          end
        end
        @history_old.unlink
      end
    rescue Interrupt, SystemExit
      exit 1
    rescue Exception
      $stderr.puts "Error: failure during history file clean-up."
    end if limit
  end
end
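`History#trim` rewrites the history file keeping only the newest `limit` URLs: it counts the lines, then skips the oldest ones while copying. The same windowing applied to an in-memory list (the helper name is ours):

```ruby
# Keep only the newest `limit` entries, as History#trim does for the
# on-disk file; a nil limit means the history is unbounded.
def trim_history(urls, limit)
  return urls unless limit
  urls.last(limit)
end
```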
class Cache
  def initialize(opt)
    super()
    @opt = opt
    @@TORRENT = "application/x-bittorrent"
    @@MEDIA_RSS_NS = ['http://search.yahoo.com/mrss/']
    @@MEDIA_RSS_NS << 'http://search.yahoo.com/mrss'
    @@ATOM_NS = Regexp.new "^http://purl.org/atom/ns#"
    #history
    @history = History.new opt.dir
    #stats
    @stats = Stats.new opt.dir
    #cache
    @cache_dir = opt.cachedir #opt.dir+"cache"
    @cache_dir.mkdir() unless @cache_dir.exist?
    exit 1 unless @cache_dir.directory?
    @cache_dir.each_entry() do |e|
      e = @cache_dir+e
      e = e.cleanpath
      next if e == @cache_dir or e == @cache_dir.parent
      if e.directory? #feed subfolder
        e.each_entry() do |e2|
          e2 = e+e2
          next if e2.directory?
          if opt.empty
            unless opt.simulate or opt.strategy == :cache
              $stderr.puts "Deleting: #{e2}" if opt.verbose
              e2.delete
            end
          end
        end
        e.delete if e.entries.size == 2
      elsif opt.empty
        unless opt.simulate or opt.strategy == :cache
          $stderr.puts "Deleting: #{e}" if opt.verbose
          e.delete
        end
      end
    end
    @cache = @cache_dir.entries.collect() do |e|
      e = @cache_dir+e
      e = e.cleanpath
      next if e == @cache_dir or e == @cache_dir.parent
      if e.file?
        content = OpenStruct.new
        content.file = e
        content.size = e.size
        content.title = e.to_s
        content
      elsif e.directory?
        e.entries.collect() do |e2|
          e2 = e+e2
          if e2.file?
            content = OpenStruct.new
            content.file = e2
            content.size = e2.size
            content.title = e2.to_s
            content
          else
            nil
          end
        end
      else
        nil
      end
    end
    @cache.flatten!
    @cache.compact!
    @cache.sort!() do |e,e2|
      e.file.mtime() <=> e2.file.mtime()
    end
  end
  def createplaylist(urls)
    playlist = Playlist.new @opt.playlist_type
    if @opt.strategy == :cache
      playlist.start
      @cache.reverse!
      @cache.each() do |content|
        playlist.add content
      end
      playlist.finish
      return playlist.to_s
    end
    playlist.start
    doc = nil
    if urls.size == 0
      $stderr.puts "Reading document from standard input" if @opt.verbose
      begin
        xml = ""
        $stdin.each() do |e|
          xml += e
        end
        doc = OpenStruct.new
        doc.dom = Document.new(xml)
        doc = nil unless doc.dom
      rescue Interrupt, SystemExit
        exit 1
      rescue Exception
        $stderr.puts "Error: unreadable document"
        doc = nil
      end
    end
    dochistory = []
    feeds = []
    urls.uniq!
    links = urls.collect() do |e|
      l = OpenStruct.new
      l.url = e
      l
    end
    loop do
      break if @opt.feeds and feeds.size >= @opt.feeds
      while not doc
        link = links.shift
        break unless link
        if dochistory.detect{|e| e == link.url}
          $stderr.puts "Skipping duplicate: #{link.url}" if @opt.verbose
          next
        end
        $stderr.puts "Fetching: #{link.url}" if @opt.verbose
        dochistory << link.url
        begin
          doc = fetchdoc(link)
        rescue Interrupt, SystemExit
          exit 1
        rescue Exception
          $stderr.puts "Error: skipping unreadable document"
        end
      end
      break unless doc
      begin
        if doc.dom.root.name == "opml"
          newlinks = []
          outlines = []
          doc.dom.elements.each("/opml/body") do |body|
            body.elements.each() do |e|
              next unless e.name == 'outline'
              outlines << e
            end
          end
          while outlines.size>0
            outline = outlines.shift
            url = outline.attributes["xmlUrl"]
            url = outline.attributes["url"] unless url
            if url
              begin
                url = URI.parse(doc.url).merge(url).to_s if doc.url
                link = OpenStruct.new
                link.url = url
                link.referrer = doc.url
                newlinks << link
              rescue URI::InvalidURIError
              end
              next
            end
            new_outlines = []
            outline.elements.each() do |e|
              next unless e.name == 'outline'
              new_outlines << e
            end
            outlines = new_outlines + outlines
          end
          links = newlinks + links
        elsif doc.dom.root.name == "pcast"
          newlinks = []
          XPath.each(doc.dom,"//link[@rel='feed']") do |outline|
            url = outline.attributes["href"]
            next unless url
            begin
              url = URI.parse(doc.url).merge(url).to_s if doc.url
              link = OpenStruct.new
              link.url = url
              link.referrer = doc.url
              newlinks << link
            rescue URI::InvalidURIError
            end
          end
          links = newlinks + links
        elsif doc.dom.root.namespace =~ @@ATOM_NS
          feed = []
          XPath.each(doc.dom.root,"//*[@rel='enclosure']") do |e2|
            next unless e2.namespace =~ @@ATOM_NS
            content = OpenStruct.new
            XPath.each(e2,"parent::/title/text()") do |node|
              content.title = ""
              node.value.each_line() do |e3| #remove line breaks
                content.title+= e3.chomp+" "
              end
              content.title.strip!
            end
            XPath.each(e2,"parent::/created/text()") do |node|
              pub_date = ""
              node.value.each_line() do |e3| #remove line breaks
                pub_date+= e3.chomp+" "
              end
              begin
                content.pub_date = DateTime.parse(pub_date.strip, true)
              rescue Exception
              end
            end
            content.mime = e2.attributes["type"].downcase
            next if @opt.content_type !~ content.mime and content.mime != @@TORRENT
            next if content.mime == @@TORRENT and not (@opt.torrent_dir or @opt.rubytorrent)
            content.feedurl = doc.url
            begin
              content.url = URI.parse(content.feedurl).merge(e2.attributes["href"]).to_s if content.feedurl
              content.size = e2.attributes["length"].to_i
              content.size = 2 unless content.size and content.size>0
              content.size = 0 if content.mime == @@TORRENT #not strictly necessary
              feed << content
            rescue URI::InvalidURIError
            end
          end
          #sort by date
          feed.sort!() do |a,b|
            if a.pub_date
              if b.pub_date
                b.pub_date <=> a.pub_date
              else
                -1
              end
            else
              if b.pub_date
                1
              else
                0
              end
            end
          end
          feed.each() do |content|
            $stderr.puts "Enclosure: #{content.url}"
          end if @opt.verbose
          #title
          node = XPath.first(doc.dom,"/feed/title/text()")
          feed_title = ""
          node.value.each_line() do |e3| #remove line breaks
            feed_title += e3.chomp+" "
          end
          feed_title.strip!
          feed.each() do |content|
            content.feed_title = feed_title
          end
          #
          feeds << feed
        elsif doc.dom.root.name == "rss"
          feed = []
          doc.dom.root.elements.each() do |e| #channel
            e.elements.each() do |e1| #item
              title = ''
              XPath.each(e1,"title/text()") do |node|
                title = ''
                node.value.each_line() do |e3| #remove line breaks
                  title+= e3.chomp+" "
                end
                title.strip!
              end
              pub_date = nil
              XPath.each(e1,"pubDate/text()") do |node|
                pub_date = ""
                node.value.each_line() do |e3| #remove line breaks
                  pub_date+= e3.chomp+" "
                end
                begin
                  pub_date = DateTime.parse(pub_date.strip, true)
                rescue Exception
                  pub_date = nil
                end
              end
              e1.elements.each() do |e2|
                if e2.name == "enclosure"
                  content = OpenStruct.new
                  content.title = title
                  content.pub_date = pub_date
                  content.mime = e2.attributes["type"].downcase
                  next if @opt.content_type !~ content.mime and content.mime != @@TORRENT
                  next if content.mime == @@TORRENT and not (@opt.torrent_dir or @opt.rubytorrent)
                  content.feedurl = doc.url
                  begin
                    content.url = URI.parse(content.feedurl).merge(e2.attributes["url"]).to_s if content.feedurl
                    content.size = e2.attributes["length"].to_i
                    content.size = 2 unless content.size and content.size>0
                    content.size = 0 if content.mime == @@TORRENT #not strictly necessary
                    feed << content
                  rescue URI::InvalidURIError
                  end
                elsif @@MEDIA_RSS_NS.include? e2.namespace
                  case e2.name
                  when 'content'
                    content = OpenStruct.new
                    content.title = title
                    content.pub_date = pub_date
                    content.mime = e2.attributes["type"].downcase
                    next if @opt.content_type !~ content.mime and content.mime != @@TORRENT
                    next if content.mime == @@TORRENT and not (@opt.torrent_dir or @opt.rubytorrent)
                    content.feedurl = doc.url
                    begin
                      content.url = URI.parse(content.feedurl).merge(e2.attributes["url"]).to_s if content.feedurl
                      content.size = e2.attributes["fileSize"].to_i
                      content.size = 2 unless content.size and content.size>0
                      content.size = 0 if content.mime == @@TORRENT #not strictly necessary
                      feed << content
                    rescue URI::InvalidURIError
                    end
                  when 'group'
                    e2.elements.each() do |e4|
                      if e4.name == 'content' and @@MEDIA_RSS_NS.include?(e4.namespace)
                        content = OpenStruct.new
                        content.title = title
                        content.pub_date = pub_date
                        content.mime = e4.attributes["type"].downcase
                        next if @opt.content_type !~ content.mime and content.mime != @@TORRENT
                        next if content.mime == @@TORRENT and not (@opt.torrent_dir or @opt.rubytorrent)
                        content.feedurl = doc.url
                        begin
                          content.url = URI.parse(content.feedurl).merge(e4.attributes["url"]).to_s if content.feedurl
                          content.size = e4.attributes["fileSize"].to_i
                          content.size = 2 unless content.size and content.size>0
                          content.size = 0 if content.mime == @@TORRENT #not strictly necessary
                          feed << content
                        rescue URI::InvalidURIError
                        end
                        break
                      end
                    end
                  end

                end
              end if e1.name == "item"
            end if e.name == "channel"
          end
          #remove duplicates (duplication occurs in particular for content declared as both enclosure and Media RSS content)
          for i in 0...feed.size
            content = feed[i]
            next unless content
            for j in i+1...feed.size
              next unless feed[j]
              feed[j] = nil if feed[j].url == content.url
            end
          end
          feed.compact!
          #sort by date
          feed.sort!() do |a,b|
            if a.pub_date
              if b.pub_date
                b.pub_date <=> a.pub_date
              else
                -1
              end
            else
              if b.pub_date
                1
              else
                0
              end
            end
          end
          feed.each() do |content|
            $stderr.puts "Enclosure: #{content.url}"
          end if @opt.verbose
          #title
          node = XPath.first(doc.dom,"//channel/title/text()")
          feed_title = ""
          node.value.each_line() do |e3| #remove line breaks
            feed_title += e3.chomp+" "
          end
          feed_title.strip!
          feed.each() do |content|
            content.feed_title = feed_title
          end
          #language
          if @opt.language.size > 0
            loop do
              node = XPath.first doc.dom, '//channel/language/text()'
              break unless node
              break unless node.value
              feed_lang = node.value.strip.downcase.split '-'
              break if feed_lang.size == 0
              langmatch = @opt.language.collect() do |lang|
                next false if feed_lang.size < lang.size
                matches = true
                for i in 0...lang.size
                  next if lang[i] == feed_lang[i]
                  matches = false
                end
                matches
              end
              feeds << feed if langmatch.include? true
              break
            end
          else
            feeds << feed
          end
        end
      rescue Interrupt, SystemExit
        exit 1
      rescue Exception
        $stderr.puts "Error: skipping document because of an internal error #{$@}"
      end
      doc = nil
    end
    #remove content older than the horizon date
    if @opt.horizon
      feeds.each() do |feed|
        for i in 0...feed.size
          if feed[i].pub_date
            feed[i] = nil if feed[i].pub_date < @opt.horizon
          else
            feed[i] = nil
          end
        end
        feed.compact!
      end
    end
    #apply download strategy
    @history.mark_old_content feeds
    if @opt.strategy == :chron or @opt.strategy == :chron_one or @opt.strategy == :chron_all
      feeds.each() do |feed|
        feed.reverse!
      end
      @opt.strategy = :back_catalog if @opt.strategy == :chron
      @opt.strategy = :one if @opt.strategy == :chron_one
      @opt.strategy = :all if @opt.strategy == :chron_all
    end
    case @opt.strategy #remove ignored content
    when :new
      feeds.each() do |feed|
        in_hist = nil
        for i in 0...feed.size
          if feed[i].in_history
            in_hist = i
            break
          end
        end
        feed.slice! in_hist...feed.size if in_hist
      end
    when :all
    else
      feeds.each() do |feed|
        for i in 0...feed.size
          feed[i] = nil if feed[i].in_history
        end
        feed.compact!
      end
    end
    if @opt.strategy == :new or @opt.strategy == :one
      feeds.each() do |feed|
        itemsize = 0
        index = nil
        for i in 0...feed.size
          itemsize += feed[i].size
          if itemsize >= @opt.itemsize
            index = i+1
            break
          end
        end
        feed.slice! index...feed.size if index
      end
    end
    #feed order
    case @opt.order
    when :random
      srand
      feeds.sort!() do |a,b|
        if a.size>0
          if b.size>0
            rand(3)-1
          else
            -1
          end
        else
          if b.size>0
            1
          else
            0
          end
        end
      end
    when :alphabetical
      feeds.sort!() do |a,b|
        if a.size>0
          if b.size>0
            a[0].feed_title <=> b[0].feed_title
          else
            -1
          end
        else
          if b.size>0
            1
          else
            0
          end
        end
      end
    when :reverse
      feeds.reverse!
    end
    #remove duplicate content
    feeds.each() do |feed|
      feed.each() do |content|
        next unless content
        dup = false
        feeds.each() do |f|
          for i in 0...f.size
            next unless f[i]
            if f[i].url == content.url
              f[i] = nil if dup
              dup = true
            end
            $stderr.puts "Removed duplicate: #{content.url}" unless f[i] or (not @opt.verbose)
          end
        end
      end
      feed.compact!
    end
    #send usage statistics
    @stats.ping @opt, feeds
    #fetch torrent metainfo files
    feeds.each() do |feed|
      feed.each() do |content|
        next if content.mime != @@TORRENT
        content.mime = nil
        begin
          $stderr.puts "Fetching torrent metainfo: #{content.url}" if @opt.verbose
          content.metainfo = RubyTorrent::MetaInfo.from_location content.url
          content.size = content.metainfo.info.length
          content.mime = case content.metainfo.info.name.downcase
          when /\.mp3$/
            "audio/mpeg"
          when /\.wma$/
            "audio/x-ms-wma"
          when /\.mpg$|\.mpeg$|\.mpe$|\.mpa$|\.mp2$|\.mpv2$/
            "video/mpeg"
          when /\.mov$|\.qt$/
            "video/quicktime"
          when /\.avi$/
            "video/x-msvideo"
          when /\.wmv$/
            "video/x-ms-wmv"
          when /\.asf$/
            "video/x-ms-asf"
          when /\.m4v$|\.mp4$|\.mpg4$/
            "video/mp4"
          else
            nil
          end
          content.url = nil unless content.mime
          content.url = nil unless (@opt.content_type =~ content.mime)
          content.url = nil unless content.metainfo.info.single?
        rescue Interrupt
          content.url = nil
          $stderr.puts "Error: unreadable torrent metainfo" if @opt.verbose
        rescue SystemExit
          exit 1
        rescue Exception
          content.url = nil
          $stderr.puts "Error: unreadable torrent metainfo" if @opt.verbose
        end
      end
      for i in 0...feed.size
        feed[i] = nil unless feed[i].url
      end
      feed.compact!
    end
    #fetch enclosures
    item = total = 0
    @cache.each() do |e|
      total+= e.size
    end
    torrents = []
    torrentfiles = []
    inc = 1
    while inc>0
      inc = 0
      itemsize = 0
      feeds.each do |e|
        #find next enclosure in feed
        content = e.shift
        unless content
          itemsize = 0
          next
        end
        #make place in cache
        while @opt.size and content.size+inc+total > @opt.size
          break if @opt.simulate
          f = @cache.shift
          break unless f
          total-= f.size
          parent = f.file.parent
          $stderr.puts "Deleting: #{f.file}" if @opt.verbose
          f.file.delete
          if parent.parent != @opt.dir and parent.entries.size == 2
            #delete empty feed subfolder
            $stderr.puts "Deleting: #{parent}" if @opt.verbose
            parent.delete
          end
        end
        unless @opt.simulate
          break if @opt.size and content.size+inc+total > @opt.size
        end
        #download
        1.upto(@opt.retries) do |i|
          begin
            if content.metainfo
              if @opt.torrent_dir
                loop do
                  content.file = @opt.torrent_dir+(Time.now.to_f.to_s+".torrent")
                  break unless content.file.exist?
                  sleep 1
                end
                $stderr.puts "Copying: #{content.url} to #{content.file}" if @opt.verbose and i == 1
                if not @opt.simulate
                  if content.feedurl and (content.feedurl =~ %r{^http:} or content.feedurl =~ %r{^ftp:})
                    open(content.url, "User-Agent" => USER_AGENT, "Referer" => content.feedurl) do |fin|
                      content.file.open("wb") do |fout|
                        fin.each_byte() do |b|
                          fout.putc b
                        end
                      end
                    end
                  else
                    open(content.url, "User-Agent" => USER_AGENT) do |fin|
                      content.file.open("wb") do |fout|
                        fin.each_byte() do |b|
                          fout.putc b
                        end
                      end
                    end
                  end
                end
              else
                $stderr.puts "Fetching in background: #{content.url}" if @opt.verbose and i == 1
                unless @opt.simulate
                  content.file = filename(content, @cache_dir)
                  package = RubyTorrent::Package.new content.metainfo, content.file.to_s
                  bt = RubyTorrent::BitTorrent.new content.metainfo, package, :dlratelim => nil, :ulratelim => @opt.upload_rate, :http_proxy => ENV["http_proxy"]
                  torrents << bt
                  torrentfiles << content
                end
                inc+= content.size
                itemsize+= content.size
              end
            else
              $stderr.puts "Fetching: #{content.url} (#{content.size.to_s} bytes)" if @opt.verbose and i == 1
              if not @opt.simulate
                headers = {"User-Agent" => USER_AGENT}
                headers["Referer"] = content.feedurl if content.feedurl and (content.feedurl =~ %r{^http:} or content.feedurl =~ %r{^ftp:})
                content.download_url = content.url unless content.download_url
                open(content.download_url, headers) do |fin|
                  if fin.base_uri.instance_of?(URI::HTTP)
                    if fin.status[0] =~ Regexp.new('^3')
                      content.download_url = fin.meta['location']
                      raise "redirecting"
                    elsif fin.status[0] !~ Regexp.new('^2')
                      raise 'failed'
                    end
                  end
                  # write content to cache
                  content.redirection_url = fin.base_uri.to_s # content.redirection_url is used for finding the correct filename in case of redirection
                  content.redirection_url = nil if content.redirection_url.eql?(content.url)
                  content.file = filename(content, @cache_dir)
                  content.file.open("wb") do |fout|
                    fin.each_byte() do |b|
                      fout.putc b
                    end
                  end
                end
                content.size = content.file.size
                @history.add content
              end
              playlist.add(content)
              inc+= content.size
              itemsize+= content.size
            end
            break
          rescue Interrupt
          rescue SystemExit
            exit 1
          rescue Exception
          end
          $stderr.puts "Attempt #{i} aborted" if @opt.verbose
          if content.file and i == @opt.retries
            if content.file.exist?
              parent = content.file.parent
              content.file.delete
              if parent.parent != @opt.dir and parent.entries.size == 2
                #delete empty feed subfolder
                parent.delete
              end
            end
            content.file = nil
          end
          sleep 5
        end
        redo unless content.file # skip unavailable enclosures
        redo if @opt.itemsize > itemsize
        itemsize = 0
      end
      total+=inc
    end
    #shut down torrents
    if torrents.length > 0
      $stderr.puts "Fetching torrents (duration: 30min to a couple of hours) " if @opt.verbose
      bt = torrents[0]
      completion = torrents.collect() do |e|
        e.percent_completed
      end
      while torrents.length > 0
        sleep 30*60
        for i in 0...torrents.length
          c = torrents[i].percent_completed
          complete = torrents[i].complete?
          $stderr.puts "Fetched: #{c}% of #{torrentfiles[i].url} " if @opt.verbose
          if complete or c == completion[i]
            begin
              torrents[i].shutdown
            rescue SystemExit
              exit 1
            rescue Interrupt, Exception
            end
            if complete
              playlist.add(torrentfiles[i])
              @history.add torrentfiles[i]
            else
              $stderr.puts "Aborted: #{torrentfiles[i].url}" if @opt.verbose
              begin
                torrentfiles[i].file.delete if torrentfiles[i].file.exist?
                torrentfiles[i] = nil
              rescue Interrupt, SystemExit
                exit 1
              rescue Exception
              end
            end
            torrents[i] = nil
            torrentfiles[i] = nil
            completion[i] = nil
            next
          end
          completion[i] = c
        end
        torrents.compact!
        torrentfiles.compact!
        completion.compact!
      end
      begin
        bt.shutdown_all
      rescue Interrupt, SystemExit
        exit 1
      rescue Exception
      end
      $stderr.puts "BitTorrent stopped" if @opt.verbose
    end
    playlist.finish
    @history.trim(@opt.memsize) unless @opt.simulate or @opt.strategy == :cache
    playlist.to_s
  end
  private
  def fetchdoc(link)
    doc = ""
    1.upto(@opt.retries) do |i|
      begin
        if link.url =~ %r{^http:} or link.url =~ %r{^ftp:}
          if link.referrer and (link.referrer =~ %r{^http:} or link.referrer =~ %r{^ftp:})
            open(link.url, "User-Agent" => USER_AGENT, "Referer" => link.referrer) do |f|
              break if f.content_type.index "audio/"
              break if f.content_type.index "video/"
              f.each_line() do |e|
                doc += e
              end
            end
          else
            open(link.url, "User-Agent" => USER_AGENT) do |f|
              break if f.content_type.index "audio/"
              break if f.content_type.index "video/"
              f.each_line() do |e|
                doc += e
              end
            end
          end
        else
          open(link.url) do |f|
            f.each_line() do |e|
              doc += e
            end
          end
        end
        break
      rescue Interrupt
      rescue SystemExit
        exit 1
      rescue Exception
      end
      $stderr.puts "Attempt #{i} aborted" if @opt.verbose
      doc = ""
      sleep 5
    end
    res = OpenStruct.new
    begin
      res.dom = Document.new doc
    rescue Exception
    end
    if res.dom
      res.url = link.url
    else
      res = nil
    end
    res
  end
  def filename(content, dir) #produce filename for content to be downloaded
    begin #per-feed subfolder
      if @opt.per_feed and content.feed_title and content.feed_title.size > 0
        newdir = dir+content.feed_title
        newdir = dir+content.feed_title.gsub(/[\\\/:*?\"<>|!]/, ' ').gsub(/-+/,'-').gsub(/\s+/,' ').strip if @opt.restricted_names
        if newdir.exist?
          if newdir.directory?
            dir = newdir
          end
        else
          newdir.mkdir
          dir = newdir
        end
      end
    rescue Exception
      # $stderr.puts "error: #{$!}"
    end
    ext = [""]
    if content.metainfo
      begin
        ext = ["."+content.metainfo.info.name.split(".").reverse[0]]
      rescue Exception
      end
    else
      ext = case content.mime.downcase
      when "audio/mpeg"
        [".mp3"]
      when "audio/x-mpeg"
        [".mp3"]
      when "audio/x-ms-wma"
        [".wma"]
      when "audio/x-m4a"
        [".m4a"]
      when "video/mpeg"
        [".mpg",".mpeg",".mpe",".mpa",".mp2",".mpv2"]
      when "video/quicktime"
        [".mov",".qt"]
      when "video/x-msvideo"
        [".avi"]
      when "video/x-ms-wmv"
        [".wmv"]
      when "video/x-ms-asf"
        [".asf"]
      when "video/mp4"
        [".m4v", ".mp4",".mpg4"]
      when "video/x-m4v"
        [".m4v", ".mp4",".mpg4"]
      else
        [""]
      end
    end
    #name from url?
    name = nil
    begin
      if content.metainfo
        name = content.metainfo.info.name
        name = nil if (dir+name).exist?
      else
        urlname = nil
        urlname = URI.split(content.redirection_url)[5].split("/")[-1] if content.redirection_url
        urlname = URI.split(content.url)[5].split("/")[-1] unless urlname
        ext.each() do |e|
          if e.length == 0 or urlname[-e.length..-1].downcase == e
            name = urlname
            name = URI.unescape(name)
            name = nil if (dir+name).exist?
            break if name
          end
        end
      end
    rescue Exception
    end
    #unique name?
    loop do
      name = Time.now.to_f.to_s+ext[0]
      break unless (dir+name).exist?
      sleep 1
    end unless name
    dir+name
  end
end
class OPML
  def initialize(title = nil)
    @doc = Document.new
    @doc.xml_decl.dowrite
    @doc.add_element Element.new("opml")
    @doc.root.add_attribute "version", "1.1"
    head = Element.new("head")
    @doc.root.add_element head
    if title
      titlee = Element.new("title")
      titlee.text = title
      head.add_element titlee
    end
    @body = Element.new("body")
    @doc.root.add_element @body
    @size = 0
  end
  def add(feedurl, text=nil)
    e = Element.new("outline")
    e.add_attribute("text", text) if text
    e.add_attribute "type", "link"
    e.add_attribute "url", feedurl
    @body.add_element e
    @size += 1
  end
  def write()
    @doc.write $stdout, 0
  end
  def size()
    @size
  end
end

class Query
  def initialize(opt, query)
    @@ATOM_NS = Regexp.new '^http://purl.org/atom/ns#'
    @@ITUNES_NS = 'http://www.itunes.com/dtds/podcast-1.0.dtd'
    @opt = opt
    if query
      @query = query.downcase.split
      @query = nil if @query.size == 0
    end
    @stats = Stats.new opt.dir
  end
  def search(urls)
    res = []
    begin
      newpaths = []
      dochistory = []
      paths = []
      if urls.size == 0
        $stderr.puts "Reading subscriptions from standard input" if @opt.verbose
        begin
          xml = ""
          $stdin.each() do |e|
            xml += e
          end
          path = OpenStruct.new
          path.doc = Document.new(xml)
          if path.doc and path.doc.root
            path.relevance = 0
            newpaths << path
          end
        rescue Interrupt, SystemExit
          raise
        rescue Exception
          $stderr.puts "Error: unreadable subscriptions"
        end
      else
        newpaths = urls.uniq.collect() do |e|
          path = OpenStruct.new
          path.url = e
          path
        end
        newpaths = newpaths.collect() do |path|
          $stderr.puts "Fetching: #{path.url}" if @opt.verbose
          dochistory << path.url
          path.doc = fetchdoc(path)
          if path.doc
            path.relevance = 0
            path
          else
            $stderr.puts "Skipping unreadable document" if @opt.verbose
            nil
          end
        end
        newpaths.compact!
      end
      #send usage statistics
      @stats.ping_search @opt, @query.join(' ')
      #
      loop do
        break if @opt.feeds and res.size >= @opt.feeds
        begin
          newpaths.sort!() do |path1, path2|
            path2.relevance <=> path1.relevance
          end
          paths = newpaths + paths
          newpaths = []
          path = nil
          loop do
            path = paths.shift
            break unless path
            if path.doc
              break
            else
              if dochistory.detect{|e| e == path.url}
                $stderr.puts "Skipping duplicate: #{path.url}" if @opt.verbose
                next
              end
              $stderr.puts "Fetching: #{path.url}" if @opt.verbose
              dochistory << path.url
              path.doc = fetchdoc(path)
              if path.doc
                break
              end
              $stderr.puts "Error: skipping unreadable document"
            end
          end
          break unless path
          if path.doc.root.name == "opml"
            #doc relevance
            path.relevance += relevance_of(XPath.first(path.doc, "/opml/head/title/text()"))
            #outgoing links
            XPath.each(path.doc,"//outline") do |outline|
              url = outline.attributes["xmlUrl"]
              url = outline.attributes["url"] unless url
              next unless url
              begin
                url = URI.parse(path.url).merge(url).to_s if path.url
              rescue Interrupt, SystemExit
                raise
              rescue Exception
              end
              newpath = OpenStruct.new
              newpath.url = url
              newpath.referrer = path.url
              #link relevance
              newpath.relevance = path.relevance
              XPath.each(outline, "ancestor-or-self::outline") do |e|
                newpath.relevance += relevance_of(e.attributes["text"])
              end
              #
              newpaths << newpath
            end
          elsif path.doc.root.name == "pcast"
            #outgoing links
            XPath.each(path.doc,"/pcast/channel") do |channel|
              link = XPath.first(channel, "link[@rel='feed']")
              next unless link
              url = link.attributes["href"]
|
2311
|
+
next unless url
|
2312
|
+
begin
|
2313
|
+
url = URI.parse(path.url).merge(url).to_s if path.url
|
2314
|
+
rescue Interrupt, SystemExit
|
2315
|
+
raise
|
2316
|
+
rescue Exception
|
2317
|
+
end
|
2318
|
+
newpath = OpenStruct.new
|
2319
|
+
newpath.url = url
|
2320
|
+
newpath.referrer = path.url
|
2321
|
+
#link relevance
|
2322
|
+
newpath.relevance = path.relevance
|
2323
|
+
newpath.relevance += relevance_of(XPath.first(channel, "title/text()"))
|
2324
|
+
newpath.relevance += relevance_of(XPath.first(channel, "subtitle/text()"))
|
2325
|
+
#
|
2326
|
+
newpaths << newpath
|
2327
|
+
end
|
2328
|
+
elsif path.doc.root.namespace =~ @@ATOM_NS and path.url
|
2329
|
+
#doc relevance
|
2330
|
+
title = nil
|
2331
|
+
begin
|
2332
|
+
XPath.each(path.doc.root,"/*/*") do |e|
|
2333
|
+
next unless e.namespace =~ @@ATOM_NS
|
2334
|
+
next unless e.name == "title" or e.name == "subtitle"
|
2335
|
+
title = e.text if e.name == "title"
|
2336
|
+
path.relevance += relevance_of(e.text)
|
2337
|
+
end
|
2338
|
+
rescue Interrupt, SystemExit
|
2339
|
+
raise
|
2340
|
+
rescue Exception
|
2341
|
+
#$stderr.puts "error: #{$!}"
|
2342
|
+
end
|
2343
|
+
if path.relevance > 0
|
2344
|
+
$stderr.puts "Found: #{title} (relevance: #{path.relevance})" if @opt.verbose
|
2345
|
+
if title
|
2346
|
+
path.title = ""
|
2347
|
+
title.value.each_line() do |e3| #remove line breaks
|
2348
|
+
path.title+= e3.chomp+" "
|
2349
|
+
end
|
2350
|
+
path.title.strip!
|
2351
|
+
end
|
2352
|
+
res << path
|
2353
|
+
end
|
2354
|
+
elsif path.doc.root.name = "rss" and path.url
|
2355
|
+
#doc relevance
|
2356
|
+
title = XPath.first(path.doc, "//channel/title/text()")
|
2357
|
+
path.relevance += relevance_of(title)
|
2358
|
+
path.relevance += relevance_of(XPath.first(path.doc, "//channel/description/text()"))
|
2359
|
+
begin
|
2360
|
+
XPath.each(path.doc.root,"//channel/*") do |e|
|
2361
|
+
next unless e.name == "category"
|
2362
|
+
if e.namespace == @@ITUNES_NS
|
2363
|
+
XPath.each(e, "descendant-or-self::*") do |e2|
|
2364
|
+
next unless e2.name == "category"
|
2365
|
+
path.relevance += relevance_of(e2.attributes["text"])
|
2366
|
+
end
|
2367
|
+
else
|
2368
|
+
path.relevance += relevance_of(e.text)
|
2369
|
+
end
|
2370
|
+
end
|
2371
|
+
rescue Interrupt, SystemExit
|
2372
|
+
raise
|
2373
|
+
rescue Exception
|
2374
|
+
#$stderr.puts "error: #{$!}"
|
2375
|
+
end
|
2376
|
+
if path.relevance > 0
|
2377
|
+
$stderr.puts "Found: #{title} (relevance: #{path.relevance})" if @opt.verbose
|
2378
|
+
if title
|
2379
|
+
path.title = ""
|
2380
|
+
title.value.each_line() do |e3| #remove line breaks
|
2381
|
+
path.title+= e3.chomp+" "
|
2382
|
+
end
|
2383
|
+
path.title.strip!
|
2384
|
+
end
|
2385
|
+
res << path
|
2386
|
+
end
|
2387
|
+
end
|
2388
|
+
rescue Interrupt, SystemExit
|
2389
|
+
raise
|
2390
|
+
rescue Exception
|
2391
|
+
$stderr.puts "Error: skipping unreadable document"
|
2392
|
+
end
|
2393
|
+
end
|
2394
|
+
rescue Interrupt, SystemExit
|
2395
|
+
$stderr.puts "Execution interrupted"
|
2396
|
+
rescue Exception
|
2397
|
+
end
|
2398
|
+
result = nil
|
2399
|
+
while not result
|
2400
|
+
begin
|
2401
|
+
res.sort!() do |path1, path2|
|
2402
|
+
path2.relevance <=> path1.relevance
|
2403
|
+
end
|
2404
|
+
opml = OPML.new "Search results for \"#{@query.collect(){|e| "#{e} "}}\""
|
2405
|
+
res.each() do |path|
|
2406
|
+
opml.add path.url, path.title
|
2407
|
+
end
|
2408
|
+
result = opml
|
2409
|
+
rescue Exception
|
2410
|
+
end
|
2411
|
+
end
|
2412
|
+
result.write
|
2413
|
+
result
|
2414
|
+
end
|
2415
|
+
private
|
2416
|
+
def relevance_of(meta)
|
2417
|
+
return 0 unless meta
|
2418
|
+
unless meta.kind_of? String #Text todo: resolve entities
|
2419
|
+
meta = meta.value
|
2420
|
+
end
|
2421
|
+
meta = meta.downcase
|
2422
|
+
meta = meta.split
|
2423
|
+
res = 0
|
2424
|
+
@query.each() do |e|
|
2425
|
+
meta.each() do |e2|
|
2426
|
+
res += 1 if e2.index(e)
|
2427
|
+
end
|
2428
|
+
end
|
2429
|
+
res
|
2430
|
+
end
|
2431
|
+
def fetchdoc(link)
|
2432
|
+
doc = ""
|
2433
|
+
1.upto(@opt.retries) do |i|
|
2434
|
+
begin
|
2435
|
+
if link.url =~ %r{^http:} or link.url =~ %r{^ftp:}
|
2436
|
+
if link.referrer and (link.referrer =~ %r{^http:} or link.referrer =~ %r{^ftp:})
|
2437
|
+
open(link.url, "User-Agent" => USER_AGENT, "Referer" => link.referrer) do |f|
|
2438
|
+
break if f.content_type.index "audio/"
|
2439
|
+
break if f.content_type.index "video/"
|
2440
|
+
f.each_line() do |e|
|
2441
|
+
doc += e
|
2442
|
+
end
|
2443
|
+
end
|
2444
|
+
else
|
2445
|
+
open(link.url, "User-Agent" => USER_AGENT) do |f|
|
2446
|
+
break if f.content_type.index "audio/"
|
2447
|
+
break if f.content_type.index "video/"
|
2448
|
+
f.each_line() do |e|
|
2449
|
+
doc += e
|
2450
|
+
end
|
2451
|
+
end
|
2452
|
+
end
|
2453
|
+
else
|
2454
|
+
open(link.url) do |f|
|
2455
|
+
f.each_line() do |e|
|
2456
|
+
doc += e
|
2457
|
+
end
|
2458
|
+
end
|
2459
|
+
end
|
2460
|
+
break
|
2461
|
+
rescue Interrupt
|
2462
|
+
rescue SystemExit
|
2463
|
+
break
|
2464
|
+
rescue Exception
|
2465
|
+
end
|
2466
|
+
$stderr.puts "Attempt #{i} aborted" if @opt.verbose
|
2467
|
+
doc = ""
|
2468
|
+
sleep 5
|
2469
|
+
end
|
2470
|
+
res = nil
|
2471
|
+
begin
|
2472
|
+
res = Document.new doc
|
2473
|
+
rescue Exception
|
2474
|
+
end
|
2475
|
+
res = nil unless res and res.root
|
2476
|
+
res
|
2477
|
+
end
|
2478
|
+
end
|
2479
|
+
|
2480
|
+
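The relevance scoring at the heart of `Query#search` can be illustrated in isolation. The helper below is a standalone sketch, not the gem's code (the gem's `relevance_of` is an instance method reading `@query`): each query term scores one point for every whitespace-separated metadata word that contains it as a substring, case-insensitively.

```ruby
# Standalone sketch of the substring-based relevance scoring used when
# ranking feeds: each query term contributes one point per metadata word
# that contains it. (Hypothetical signature; the gem reads @query instead.)
def relevance_of(query_terms, meta)
  return 0 unless meta
  words = meta.downcase.split
  score = 0
  query_terms.each do |term|
    words.each do |word|
      # String#index returns a position (truthy, possibly 0) on a match
      score += 1 if word.index(term)
    end
  end
  score
end
```

Because feed, OPML outline, and category titles are all scored this way, a feed whose title repeats a query term outranks one that mentions it once.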
opt.size *= 1_000_000 if opt.size
opt.upload_rate *= 1024 if opt.upload_rate
opt.itemsize *= 1_000_000
arguments = arguments + ARGV

if opt.check_for_update
  $stderr.puts "Enabling update check." if opt.verbose
end

if opt.vote
  $stderr.puts "Enabling the sending of anonymous usage statistics." if opt.verbose
end

begin
  require "rubytorrent"
  opt.rubytorrent = true
  $stderr.puts "RubyTorrent detected." if opt.verbose
rescue Interrupt, SystemExit
  exit 1
rescue Exception
end

if opt.function == :download
  cache = Cache.new opt
  cache.createplaylist arguments
elsif opt.function == :search
  dir = Query.new opt, arguments.shift
  dir.search arguments
end

if opt.check_for_update
  update = Update.new opt.dir
  update.check
end

if opt.verbose and false
  $stderr.puts ""
  $stderr.puts " *********************************************************************"
  $stderr.puts " **** Qworum - A platform for web-based services (sponsor) ****"
  $stderr.puts " *********************************************************************"
  $stderr.puts " **** Sell and buy services: ****"
  $stderr.puts " **** Host services on your own domain; sell them to websites ****"
  $stderr.puts " **** or businesses on the service marketplace. ****"
  $stderr.puts " **** ****"
  $stderr.puts " **** Build enterprise information systems: ****"
  $stderr.puts " **** Use Qworum in your information system, and enjoy the ****"
  $stderr.puts " **** benefits of a powerful SOA technology. ****"
  $stderr.puts " **** ****"
  $stderr.puts " **** Learn more at http://www.qworum.com/ ****"
  $stderr.puts " *********************************************************************"
  $stderr.puts ""
end

$stderr.puts "End of podcatching session." if opt.verbose
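The retry behaviour inlined in `fetchdoc` (up to `@opt.retries` attempts, pausing between failed attempts, giving up with `nil`) follows a pattern that can be sketched generically. `with_retries` below is a hypothetical helper, not part of the gem; the `delay` keyword stands in for the gem's hard-coded `sleep 5`.

```ruby
# Generic retry helper mirroring fetchdoc's retry loop: run the block up
# to `retries` times, sleeping `delay` seconds after each failure, and
# return nil if every attempt fails. Interrupt/SystemExit are re-raised,
# matching the gem's habit of never swallowing them.
def with_retries(retries, delay: 0)
  1.upto(retries) do |i|
    begin
      return yield
    rescue Interrupt, SystemExit
      raise
    rescue Exception
      sleep delay
    end
  end
  nil
end
```

Extracting the loop this way also makes the two fetch paths in `fetchdoc` (with and without a Referer header) shareable, since only the block body differs.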