bwkfanboy 0.0.1

data/LICENSE ADDED
@@ -0,0 +1,22 @@
+ (The MIT License)
+
+ Copyright (c) 2010 Alexander Gromnitsky.
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ 'Software'), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+ IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+ CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.rdoc ADDED
@@ -0,0 +1,88 @@
+ = About
+
+ bwkfanboy is an HTML-to-Atom feed converter that you can use to watch
+ sites that do not provide their own feeds.
+
+ The converter is not a magic tool: you'll need to write a plugin (in
+ Ruby) for each site you want to watch. bwkfanboy provides guidelines and
+ general assistance.
+
+ = Architecture
+
+ == Plugins
+
+ bwkfanboy comes with 1 example plugin that parses a search page of
+ dailyprincetonian.com looking for bwk's articles.
+
+ The plugin is a Ruby class +Page+ that inherits from the Bwkfanboy::Parse
+ parent class, overriding 1 method.
+
+ Plugins can live in the system
+
+   `gem env gemdir`/gems/bwkfanboy-x.y.z/lib/bwkfanboy/plugins
+
+ or the user's home
+
+   ~/.bwkfanboy/plugins
+
+ directories.
+
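The plugin lookup described above can be sketched roughly as follows. This is a minimal illustration, not the gem's own code: `find_plugin` is a hypothetical helper, and the assumption (taken from the order in which bin/bwkfanboy lists its directories) is that the user's home directory is searched before the system one.

```ruby
# Hypothetical helper: return the full path of "<name>.rb" in the first
# directory that contains it, or nil when no directory has such a file.
def find_plugin(name, dirs)
  return nil if name.nil? || name.strip.empty?

  dirs.each do |dir|
    candidate = File.join(dir, "#{name}.rb")
    return candidate if File.file?(candidate) && File.readable?(candidate)
  end
  nil
end

# Example call (the directory list is an assumption for illustration):
#   find_plugin('bwk', [File.expand_path('~/.bwkfanboy/plugins'),
#                       '/path/to/gems/bwkfanboy-x.y.z/lib/bwkfanboy/plugins'])
```

Listing the home directory first means a user's plugin shadows a system plugin with the same name.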
+ == Pipeline
+
+ The program consists of 4 parts:
+
+ 0. The *bwkfanboy* script, which takes 1 parameter: the name of a file in
+    the plugin directories (without the .rb suffix). For example, to get
+    an Atom feed from dailyprincetonian.com, type:
+
+      % bwkfanboy bwk
+
+    and it will load
+    <tt>/usr/local/lib/ruby/gems/1.9/gems/bwkfanboy-0.0.1/lib/bwkfanboy/plugins/bwk.rb</tt>
+    (on my FreeBSD machine), fetch and parse HTML from
+    dailyprincetonian.com, and generate the required feed, dumping it to
+    stdout.
+
+    The script is just a convenient wrapper for 3 separate utils.
+
+ 1. *bwkfanboy_fetch*
+
+    It reads 1 line from stdin: the URL to fetch. The result is
+    dumped to stdout.
+
+ 2. *bwkfanboy_parse*
+
+    It takes 1 parameter: <em>a full path</em> to a plugin file.
+
+    This util reads stdin, expecting XHTML, parses it and dumps
+    the result to stdout as a JSON-formatted object.
+
+ 3. *bwkfanboy_generate*
+
+    It reads stdin, expecting a proper JSON-formatted object.
+
+    The result is an Atom feed dumped to stdout in UTF-8.
+
+ So, without the wrapper, the whole pipeline looks like:
+
+   % echo http://example.org | bwkfanboy_fetch |
+     bwkfanboy_parse /path/to/my/plugin.rb | bwkfanboy_generate
+
+ == Log
+
+ All utils write to the <tt>/tmp/bwkfanboy/USER/log/general.log</tt> file if
+ permissions allow it.
+
+ == HTTP
+
+ There are 2 methods to get an Atom feed via HTTP:
+
+ 1. <tt>web/bwkfanboy.cgi</tt> (from the program tarball), which you may
+    copy to your Apache CGI directory and run. This prevents you from
+    using your HOME directory for your own plugins. The CGI script also
+    requires some manual editing (setting 1 variable in it) before
+    you can start using it.
+
+ 2. The small *bwkfanboy_server* HTTP server. It can run as any user and
+    is thus able to inherit the env variables needed to discover your HOME
+    directory. Read bin/bwkfanboy_server to learn how to operate it.
data/Rakefile ADDED
@@ -0,0 +1,48 @@
+ # -*-ruby-*-
+
+ require 'rake'
+ require 'rake/gempackagetask'
+ require 'rake/clean'
+ require 'rake/rdoctask'
+ require 'rake/testtask'
+
+ spec = Gem::Specification.new() {|i|
+   i.name = "bwkfanboy"
+   i.summary = 'A converter from HTML to Atom feed that you can use to watch sites that do not provide their own feed.'
+   i.version = '0.0.1'
+   i.author = 'Alexander Gromnitsky'
+   i.email = 'alexander.gromnitsky@gmail.com'
+   i.homepage = 'http://github.com/gromnitsky/bwkfanboy'
+   i.platform = Gem::Platform::RUBY
+   i.required_ruby_version = '>= 1.9'
+   i.files = FileList['lib/**/*', 'bin/*', 'doc/*', '[A-Z]*', 'test/**/*']
+
+   i.executables = FileList['bin/*'].gsub(/^bin\//, '')
+   i.default_executable = i.name
+
+   i.has_rdoc = true
+   i.test_files = FileList['test/test_*.rb']
+
+   i.rdoc_options << '-m' << 'Bwkfanboy' << '-x' << 'plugins'
+   i.extra_rdoc_files = FileList['bin/*']
+
+   i.add_dependency('activesupport', '>= 3.0.0')
+   i.add_dependency('nokogiri', '>= 1.4.3')
+   i.add_dependency('open4', '>= 1.0.1')
+   i.add_dependency('jsonschema', '>= 2.0.0')
+ }
+
+ Rake::GemPackageTask.new(spec).define()
+
+ task(default: %(repackage))
+
+ Rake::RDocTask.new('doc') {|i|
+   i.main = "Bwkfanboy"
+   i.rdoc_files = FileList['doc/*', 'lib/**/*.rb', 'bin/*']
+   i.rdoc_files.exclude("lib/**/plugins", "test")
+ }
+
+ Rake::TestTask.new() {|i|
+   i.test_files = FileList['test/test_*.rb']
+   i.verbose = true
+ }
data/TODO ADDED
@@ -0,0 +1,7 @@
+ -*-text-*-
+
+ 0.0.2
+ -----
+
+ - Add plugin listing to bwkfanboy_server.
+ - More tests.
data/bin/bwkfanboy ADDED
@@ -0,0 +1,128 @@
+ #!/usr/bin/env ruby19
+ # -*-ruby-*-
+
+ # This program is executed by bin/bwkfanboy_server to do all the dirty work:
+ # fetch HTML, parse it and generate a pretty Atom feed.
+ #
+ # It is a wrapper which you can also use for such common tasks as listing
+ # all available plugins.
+ #
+ # Type:
+ #
+ #   % bwkfanboy -h
+ #
+ # to get some basic help & read about the Bwkfanboy module.
+
+ require_relative '../lib/bwkfanboy/parser'
+
+ $conf = {
+   prog_name: 'bwkfanboy',
+   prog_ver: '0.0.1',
+   mode: 'pipe',
+   banner: "Usage: #{File.basename($0)} [options] plugin-name"
+ }
+
+ class Plugin # :nodoc: all
+   attr_reader :name, :path
+
+   def initialize(name)
+     @name = name
+     @path = nil
+   end
+
+   def dirs()
+     # try to create user's home plugin directory
+     begin
+       ['~/.bwkfanboy', '~/.bwkfanboy/plugins'].each {|i|
+         Dir.mkdir(File.expand_path(i))
+       }
+     rescue
+       # empty
+     end
+
+     r = []
+     dirs = ['~/.bwkfanboy/plugins', "#{Bwkfanboy::Utils.gem_dir_system}/plugins"]
+     begin
+       # this will fail for user's home directory under Apache CGI
+       # environment
+       dirs.map! {|i| File.expand_path(i) }
+     rescue
+     end
+     dirs.each {|i|
+       if File.readable?(i) then
+         r << i
+       else
+         Bwkfanboy::Utils.warnx("directory #{i} isn't readable");
+       end
+     }
+
+     if r.length == 0 then
+       Bwkfanboy::Utils.errx(1, "no dirs for plugins found: #{dirs.join(' ')}")
+     end
+     return r
+   end
+
+   def load()
+     abort($conf[:banner]) unless (@name && @name !~ /^\s*$/)
+
+     dirs.each {|i|
+       files = Dir.glob("#{i}/*.rb")
+       if (@path = files.index("#{i}/#{@name}.rb")) then
+         @path = files[@path]
+         break
+       end
+     }
+     Bwkfanboy::Utils.errx(1, "no such plugin '#{@name}'") if ! @path
+     Bwkfanboy::Utils.plugin_load(@path, Bwkfanboy::Meta::PLUGIN_CLASS)
+
+     pn = Page.new()
+     pn.check()
+     return pn
+   end
+
+ end # class
+
+ # ----------------------------------------------------------------------
+
+ o = Bwkfanboy::Utils.cl_parse(ARGV, $conf[:banner]) # create OptionParser object
+ o.on('-i', 'Show some info about the plugin') { |i| $conf[:mode] = 'info' }
+ o.on('-l', 'List all plugins') { |i| $conf[:mode] = 'list' }
+ o.on('-p', 'List all plugins paths') { |i| $conf[:mode] = 'path' }
+ o.on('-D', '(ignore this) Use URI_DEBUG const instead of URI in plugins') { |i| $conf[:mode] = 'debug' }
+ Bwkfanboy::Utils.cl_parse(ARGV, $conf[:banner], o) # run cl parser
+
+ plugin = Plugin.new(ARGV[0])
+
+ case $conf[:mode]
+ when 'list'
+   plugin.dirs().each {|i|
+     puts "#{i}:"
+     Dir.glob("#{i}/*.rb").each {|j|
+       puts "\t#{File.basename(j, '.rb')}"
+     }
+   }
+ when 'path'
+   plugin.dirs().each {|i| puts i}
+ when 'info'
+   plugin.load().dump_info
+ else
+   # A pipe mode
+   pn = plugin.load()
+   cmd = "./bwkfanboy_fetch | ./bwkfanboy_parse '#{plugin.path}' | ./bwkfanboy_generate"
+   if Bwkfanboy::Utils.cfg[:verbose] >= 2 then
+     puts ($conf[:mode] != 'debug' ? pn.class::Meta::URI : pn.class::Meta::URI_DEBUG)
+     puts cmd
+     exit 0
+   end
+
+   # go to the directory with current script
+   Dir.chdir(File.dirname(File.expand_path($0)))
+
+   pipe = IO.popen(cmd, 'w+')
+   pipe.puts ($conf[:mode] != 'debug' ? pn.class::Meta::URI : pn.class::Meta::URI_DEBUG)
+   pipe.close_write
+   while line = pipe.gets
+     puts line
+   end
+   pipe.close
+ end
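The pipe mode above writes one line to a shell pipeline, closes the write end, and then reads the result. A minimal sketch of that pattern, assuming a POSIX shell with `tr` and `cat` on the PATH; these two commands are only stand-ins for the three bwkfanboy utilities, and `run_pipeline` is a hypothetical name:

```ruby
# Feed one line of input to a shell pipeline and return everything it prints.
def run_pipeline(cmd, input)
  IO.popen(cmd, 'r+') do |pipe|
    pipe.puts(input)
    pipe.close_write # signal EOF so the first command in the pipeline can finish
    pipe.read
  end
end

# Stand-in pipeline (assumption): upper-case the line, then pass it through.
puts run_pipeline('tr a-z A-Z | cat', 'http://example.org')
```

Closing the write end before reading matters: without it, a command that waits for EOF on stdin would never produce output and the read would block.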
data/bin/bwkfanboy_fetch ADDED
@@ -0,0 +1,30 @@
+ #!/usr/bin/env ruby19
+ # -*-ruby-*-
+
+ # Read stdin for a URI or a full path to a local file, download it (or
+ # read the local file) and print it to stdout.
+
+ require 'open-uri'
+
+ require_relative '../lib/bwkfanboy/utils'
+
+ $conf = { banner: "Usage: #{File.basename($0)} [options] < uri" }
+
+ Bwkfanboy::Utils.cl_parse(ARGV, $conf[:banner], nil, true)
+
+ uri = gets.chomp()
+
+ Bwkfanboy::Utils.veputs(1, "fetching #{uri}\n")
+
+ begin
+   open(uri, "User-Agent" => Bwkfanboy::Meta::USER_AGENT) {|f|
+     if defined?(f.meta) && f.status[0] != '200' then
+       Bwkfanboy::Utils.errx(1, "cannot fetch #{uri} : HTTP response: #{f.status[0]}")
+     end
+     Bwkfanboy::Utils.veputs(1, "charset = #{f.content_type_parse[1][1]}\n") if defined?(f.meta)
+     f.each_line {|i| puts i}
+   }
+ rescue
+   # typically Errno::ENOENT
+   Bwkfanboy::Utils.errx(1, "cannot fetch: #{$!}");
+ end
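The script above relies on open-uri patching `Kernel#open`, which worked on Ruby 1.9 but no longer accepts URLs on Ruby 3.0 and later. A minimal sketch of the same "URI or local path" behaviour for current Rubies, using `URI.open`; the `fetch` helper and the `USER_AGENT` value are hypothetical, not part of the gem:

```ruby
require 'open-uri'

# Hypothetical stand-in for Bwkfanboy::Meta::USER_AGENT.
USER_AGENT = 'example-fetcher/0.1'

# Return the body of an http(s) URI, or the contents of a local file.
def fetch(source)
  if source =~ %r{\Ahttps?://}i
    # URI.open raises OpenURI::HTTPError on non-success responses.
    URI.open(source, 'User-Agent' => USER_AGENT, &:read)
  else
    File.read(source)
  end
end
```

Dispatching on the scheme explicitly also avoids handing arbitrary stdin input to `Kernel#open`, which treats a leading `|` as a command to run.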
data/bin/bwkfanboy_generate ADDED
@@ -0,0 +1,80 @@
+ #!/usr/bin/env ruby19
+ # -*-ruby-*-
+
+ # Read JSON from stdin, generate an Atom feed from it and print the
+ # result to stdout in UTF-8.
+ #
+ # One can validate the JSON by providing the '--check' command line option
+ # (validation is off by default).
+
+ require 'rss/maker'
+ require 'date'
+ require 'json'
+ require 'jsonschema'
+
+ require_relative '../lib/bwkfanboy/utils'
+
+ $conf = {
+   banner: "Usage: #{File.basename($0)} [options] < json",
+   check: false
+ }
+
+ o = Bwkfanboy::Utils.cl_parse(ARGV, $conf[:banner])
+ o.on('--check', 'Validate the input (slow!)') { |i| $conf[:check] = true }
+ Bwkfanboy::Utils.cl_parse(ARGV, $conf[:banner], o) # run cl parser
+
+ begin
+   j = JSON.parse(STDIN.read)
+ rescue
+   Bwkfanboy::Utils.errx(1, "stdin had invalid JSON");
+ end
+
+ # validate the input
+ schema = Bwkfanboy::Utils.gem_dir_system() + '/schema.js'
+ if $conf[:check] then
+   begin
+     JSON::Schema.validate(j, JSON.parse(File.read(schema)))
+   rescue
+     Bwkfanboy::Utils.errx(1, "JSON validation with schema (#{schema}) failed");
+   end
+ end
+
+ feed = RSS::Maker.make("atom") { |maker|
+   maker.channel.id = j['channel']['id']
+   maker.channel.updated = j['channel']['updated']
+   maker.channel.author = j['channel']['author']
+   maker.channel.title = j['channel']['title']
+
+   maker.channel.links.new_link {|i|
+     i.href = j['channel']['link']
+     i.rel = 'alternate'
+     i.type = 'text/html' # eh
+   }
+
+   maker.items.do_sort = true
+
+   j['x_entries'].each { |i|
+     maker.items.new_item do |item|
+       item.links.new_link {|k|
+         k.href = i['link']
+         k.rel = 'alternate'
+         k.type = 'text/html' # only to make happy crappy pr2nntp gateway
+       }
+       item.title = i['title']
+       item.author = i['author']
+       item.updated = i['updated']
+       item.content.type = j['channel']['x_entries_content_type']
+
+       case item.content.type
+       when 'text'
+         item.content.content = i['content']
+       when 'html'
+         item.content.content = i['content']
+       else
+         item.content.xhtml = i['content']
+       end
+     end
+   }
+ }
+
+ puts feed
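Judging only by the keys the script above reads, its input is a JSON object with a `channel` hash and an `x_entries` array. A minimal sketch of building such an object follows; the key names come from the script, while every value (URLs, names, the ISO 8601 timestamp format) is an assumption for illustration, since schema.js is not shown here:

```ruby
require 'json'
require 'time'

stamp = Time.utc(2010, 1, 1).iso8601 # timestamp format is an assumption

feed = {
  'channel' => {
    'id'      => 'http://example.org/',
    'updated' => stamp,
    'author'  => 'Example Author',
    'title'   => 'Example feed',
    'link'    => 'http://example.org/',
    # 'text' and 'html' are handled explicitly; anything else is treated as XHTML
    'x_entries_content_type' => 'text'
  },
  'x_entries' => [
    {
      'link'    => 'http://example.org/articles/1',
      'title'   => 'First article',
      'author'  => 'Example Author',
      'updated' => stamp,
      'content' => 'Plain text summary of the article.'
    }
  ]
}

json = JSON.generate(feed)
puts json
```

A string like this is what one would pipe into the generator's stdin when testing it without a plugin.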
data/bin/bwkfanboy_parse ADDED
@@ -0,0 +1,32 @@
+ #!/usr/bin/env ruby19
+ # -*-ruby-*-
+
+ # Take 1 command line parameter: a full path to a plugin.
+ #
+ # Read HTML from stdin, parse it and print the result to stdout in JSON
+ # format. If the '-vv' command line parameters were given, output will be in
+ # 'key: value' pairs and <em>not</em> in JSON.
+
+ require_relative '../lib/bwkfanboy/parser'
+
+ $conf = {
+   banner: "Usage: #{File.basename($0)} [options] /path/to/my/plugin.rb < html"
+ }
+
+ Bwkfanboy::Utils.cl_parse(ARGV, $conf[:banner], nil, true)
+
+ if ARGV.size == 0 then
+   abort($conf[:banner])
+ else
+   Bwkfanboy::Utils.plugin_load(ARGV[0], Bwkfanboy::Meta::PLUGIN_CLASS)
+ end;
+
+ pn = Page.new()
+ pn.check()
+ pn.parse()
+
+ if Bwkfanboy::Utils.cfg[:verbose] >= 2 then
+   pn.dump()
+ else
+   puts pn.to_json()
+ end
data/bin/bwkfanboy_server ADDED
@@ -0,0 +1,141 @@
+ #!/usr/bin/env ruby19
+ # -*-ruby-*-
+
+ # Start an HTTP server (by default on 127.0.0.1:9042). To get Atom feeds
+ # from it, send a GET request to the URI
+ #
+ #   http://localhost:9042/?p=PLUGIN
+ #
+ # where +PLUGIN+ is the name of a bwkfanboy plugin (without the '.rb' suffix).
+ #
+ # To list all available plugins, point your browser to
+ #
+ #   http://localhost:9042/list
+ #
+ # The server is intended to run as a non-root user from the
+ # <tt>~/.login</tt> file. It can detach from a terminal if you give it
+ # the '-d' command line option.
+ #
+ # For other help, type:
+ #
+ #   bwkfanboy_server -h
+ #
+ # The server maintains 2 logs:
+ #
+ #   /tmp/bwkfanboy/USER/log/bwkfanboy_server.log
+ #   /tmp/bwkfanboy/USER/log/bwkfanboy_server-access.log
+ #
+ # The file with a pid:
+ #
+ #   /tmp/bwkfanboy/USER/bwkfanboy_server.pid
+
+ require 'webrick'
+ require_relative '../lib/bwkfanboy/utils'
+
+ $conf = {
+   addr: '127.0.0.1',
+   port: 9042,
+   converter: "./#{Bwkfanboy::Meta::NAME}",
+   banner: "Usage: #{File.basename($0)} [options]",
+   server_type: WEBrick::SimpleServer,
+   workdir: File.dirname(File.expand_path($0)),
+   pidfile: "#{Bwkfanboy::Meta::DIR_TMP}/#{File.basename($0)}.pid",
+   log: "#{Bwkfanboy::Meta::DIR_LOG}/#{File.basename($0)}.log",
+   alog: "#{Bwkfanboy::Meta::DIR_LOG}/#{File.basename($0)}-access.log",
+   mode: 'pipe'
+ }
+
+ o = Bwkfanboy::Utils.cl_parse(ARGV, $conf[:banner]) # create OptionParser object
+ o.on('-b VAL', 'BindAddress') { |i| $conf[:addr] = i }
+ o.on('-p VAL', 'A port number') { |i| $conf[:port] = i }
+ o.on('-c VAL', "A path to main #{Bwkfanboy::Meta::NAME} executable") { |i| $conf[:converter] = i }
+ o.on('-d', 'Detach from a terminal') {|i| $conf[:server_type] = WEBrick::Daemon }
+ o.on('-D', '(ignore this) Use URI_DEBUG const instead of URI in plugins') { |i| $conf[:mode] = 'debug' }
+ Bwkfanboy::Utils.cl_parse(ARGV, $conf[:banner], o) # run cl parser
+
+ Bwkfanboy::Utils.dir_tmp_create()
+
+ class FeedServlet < WEBrick::HTTPServlet::AbstractServlet # :nodoc: all
+   def do_GET(req, res)
+     if req.query['p'] && req.query['p'] =~ Bwkfanboy::Meta::PLUGIN_NAME
+       res['Content-Type'] = 'application/atom+xml; charset=UTF-8'
+       res['Content-Disposition'] = "inline; filename=\"#{Bwkfanboy::Meta::NAME}-#{req.query['p']}.xml\""
+
+       cmd = "#{$conf[:converter]} #{$conf[:mode] == 'debug' ? '-D' : ''} #{req.query['p']}"
+       r = Bwkfanboy::Utils.cmd_run(cmd)
+       if r[0] != 0 then
+         raise WEBrick::HTTPStatus::InternalServerError.new("Errors in the pipeline:\n\n #{r[1]}")
+       end
+
+       res.body = r[2]
+     else
+       raise WEBrick::HTTPStatus::InternalServerError.new("Parameter 'p' required")
+     end
+   end
+ end
+
+ class FeedListServlet < WEBrick::HTTPServlet::AbstractServlet # :nodoc: all
+   def do_GET(req, res)
+     cmd = "#{$conf[:converter]} -l"
+     r = Bwkfanboy::Utils.cmd_run(cmd)
+     if r[0] != 0 then
+       raise WEBrick::HTTPStatus::InternalServerError.new("Errors:\n\n #{r[1]}")
+     end
+
+     res.body = r[2]
+   end
+ end
+
+ # create temporary files
+ def start_callback()
+   Dir.chdir($conf[:workdir])
+   if ! File.executable?($conf[:converter]) then
+     Bwkfanboy::Utils.errx(1, "Missing executable file '#{$conf[:converter]}'")
+   end
+
+   begin
+     File.open($conf[:pidfile], "w+") {|i| i.puts $$ }
+   rescue
+     Bwkfanboy::Utils.warnx("unable to create a pidfile " + $conf[:pidfile])
+   end
+ end
+
+ # remove temporary files
+ def stop_callback()
+   begin
+     File.unlink $conf[:pidfile]
+   rescue
+     # ignore errors
+   end
+ end
+
+ def log_create(f)
+   begin
+     log = Logger.new(f, 2, Bwkfanboy::Meta::LOG_MAXSIZE)
+   rescue
+     Bwkfanboy::Utils.warnx("cannot open log #{f}");
+     return nil
+   end
+   log.datetime_format = "%H:%M:%S"
+   log
+ end
+
+ # ----------------------------------------------------------------------
+
+ server_log = log_create($conf[:log])
+ access_log = [[ log_create($conf[:alog]), WEBrick::AccessLog::COMBINED_LOG_FORMAT ]]
+
+ s = WEBrick::HTTPServer.new(Port: $conf[:port],
+                             BindAddress: $conf[:addr],
+                             ServerType: $conf[:server_type],
+                             StartCallback: -> {start_callback},
+                             StopCallback: -> {stop_callback},
+                             Logger: server_log,
+                             AccessLog: access_log
+                             )
+ s.mount("/", FeedServlet)
+ s.mount("/list", FeedListServlet)
+ ['TERM', 'INT'].each {|i|
+   trap(i) { s.shutdown }
+ }
+ s.start
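The `log_create` helper above opens a size-rotated log and falls back to `nil` when the file cannot be opened. A self-contained sketch of the same idea using only the standard `logger` library; the `max_bytes` parameter stands in for `Bwkfanboy::Meta::LOG_MAXSIZE`, whose actual value is not shown in this diff:

```ruby
require 'logger'

# Open a log that keeps 2 files and rotates once a file exceeds max_bytes.
# Returns nil (after a warning on stderr) if the file cannot be opened.
def log_create(path, max_bytes)
  log = Logger.new(path, 2, max_bytes)
  log.datetime_format = '%H:%M:%S'
  log
rescue SystemCallError => e
  warn "cannot open log #{path}: #{e.message}"
  nil
end
```

Callers must be prepared for the `nil` return value and skip logging in that case.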
data/doc/README.rdoc ADDED
@@ -0,0 +1,88 @@
+ = About
+
+ bwkfanboy is an HTML-to-Atom feed converter that you can use to watch
+ sites that do not provide their own feeds.
+
+ The converter is not a magic tool: you'll need to write a plugin (in
+ Ruby) for each site you want to watch. bwkfanboy provides guidelines and
+ general assistance.
+
+ = Architecture
+
+ == Plugins
+
+ bwkfanboy comes with 1 example plugin that parses a search page of
+ dailyprincetonian.com looking for bwk's articles.
+
+ The plugin is a Ruby class +Page+ that inherits from the Bwkfanboy::Parse
+ parent class, overriding 1 method.
+
+ Plugins can live in the system
+
+   `gem env gemdir`/gems/bwkfanboy-x.y.z/lib/bwkfanboy/plugins
+
+ or the user's home
+
+   ~/.bwkfanboy/plugins
+
+ directories.
+
+ == Pipeline
+
+ The program consists of 4 parts:
+
+ 0. The *bwkfanboy* script, which takes 1 parameter: the name of a file in
+    the plugin directories (without the .rb suffix). For example, to get
+    an Atom feed from dailyprincetonian.com, type:
+
+      % bwkfanboy bwk
+
+    and it will load
+    <tt>/usr/local/lib/ruby/gems/1.9/gems/bwkfanboy-0.0.1/lib/bwkfanboy/plugins/bwk.rb</tt>
+    (on my FreeBSD machine), fetch and parse HTML from
+    dailyprincetonian.com, and generate the required feed, dumping it to
+    stdout.
+
+    The script is just a convenient wrapper for 3 separate utils.
+
+ 1. *bwkfanboy_fetch*
+
+    It reads 1 line from stdin: the URL to fetch. The result is
+    dumped to stdout.
+
+ 2. *bwkfanboy_parse*
+
+    It takes 1 parameter: <em>a full path</em> to a plugin file.
+
+    This util reads stdin, expecting XHTML, parses it and dumps
+    the result to stdout as a JSON-formatted object.
+
+ 3. *bwkfanboy_generate*
+
+    It reads stdin, expecting a proper JSON-formatted object.
+
+    The result is an Atom feed dumped to stdout in UTF-8.
+
+ So, without the wrapper, the whole pipeline looks like:
+
+   % echo http://example.org | bwkfanboy_fetch |
+     bwkfanboy_parse /path/to/my/plugin.rb | bwkfanboy_generate
+
+ == Log
+
+ All utils write to the <tt>/tmp/bwkfanboy/USER/log/general.log</tt> file if
+ permissions allow it.
+
+ == HTTP
+
+ There are 2 methods to get an Atom feed via HTTP:
+
+ 1. <tt>web/bwkfanboy.cgi</tt> (from the program tarball), which you may
+    copy to your Apache CGI directory and run. This prevents you from
+    using your HOME directory for your own plugins. The CGI script also
+    requires some manual editing (setting 1 variable in it) before
+    you can start using it.
+
+ 2. The small *bwkfanboy_server* HTTP server. It can run as any user and
+    is thus able to inherit the env variables needed to discover your HOME
+    directory. Read bin/bwkfanboy_server to learn how to operate it.