bwkfanboy 0.0.1
- data/LICENSE +22 -0
- data/README.rdoc +88 -0
- data/Rakefile +48 -0
- data/TODO +7 -0
- data/bin/bwkfanboy +128 -0
- data/bin/bwkfanboy_fetch +30 -0
- data/bin/bwkfanboy_generate +80 -0
- data/bin/bwkfanboy_parse +32 -0
- data/bin/bwkfanboy_server +141 -0
- data/doc/README.rdoc +88 -0
- data/doc/plugin.rdoc +118 -0
- data/lib/bwkfanboy/parser.rb +143 -0
- data/lib/bwkfanboy/plugins/bwk.rb +33 -0
- data/lib/bwkfanboy/plugins/freebsd-ports-update.rb +76 -0
- data/lib/bwkfanboy/schema.js +39 -0
- data/lib/bwkfanboy/utils.rb +134 -0
- data/test/plugins/bwk.rb +29 -0
- data/test/plugins/empty.rb +0 -0
- data/test/popen4.sh +4 -0
- data/test/semis/bwk.html +398 -0
- data/test/semis/bwk.json +82 -0
- data/test/test_fetch.rb +34 -0
- data/test/test_generate.rb +30 -0
- data/test/test_parse.rb +32 -0
- data/test/test_server.rb +39 -0
- data/test/ts_utils.rb +21 -0
- data/test/xml-clean.sh +8 -0
- metadata +158 -0
data/LICENSE
ADDED
@@ -0,0 +1,22 @@
+(The MIT License)
+
+Copyright (c) 2010 Alexander Gromnitsky.
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+'Software'), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.rdoc
ADDED
@@ -0,0 +1,88 @@
+= About
+
+bwkfanboy is an HTML to Atom feed converter that you can use to watch
+sites that do not provide their own feeds.
+
+The converter is not a magic tool: you'll need to write a plugin (in
+Ruby) for each site you want to watch. bwkfanboy provides guidelines and
+general assistance.
+
+= Architecture
+
+== Plugins
+
+bwkfanboy comes with 1 example plugin that parses a search page of
+dailyprincetonian.com looking for bwk's articles.
+
+The plugin is a Ruby class +Page+ that inherits from the Bwkfanboy::Parse
+parent class, overriding 1 method.
+
+Plugins can live in the system
+
+  `gem env gemdir`/gems/bwkfanboy-x.y.z/lib/bwkfanboy/plugins
+
+or the user's home
+
+  ~/.bwkfanboy/plugins
+
+directories.
+
+== Pipeline
+
+The program consists of 4 parts:
+
+0. The *bwkfanboy* script, which takes 1 parameter: the name of a file in
+   the plugin directories (without the .rb suffix). For example, to get
+   an Atom feed from dailyprincetonian.com you type:
+
+     % bwkfanboy bwk
+
+   and it will load the
+   <tt>/usr/local/lib/ruby/gems/1.9/gems/bwkfanboy-0.0.1/lib/bwkfanboy/plugins/bwk.rb</tt>
+   file on my FreeBSD machine, fetch and parse HTML from
+   dailyprincetonian.com and generate the required feed, dumping it to
+   stdout.
+
+   The script is just a convenient wrapper for 3 separate utils.
+
+1. *bwkfanboy_fetch*
+
+   It reads 1 line from stdin for the URL to fetch from. The result will
+   be dumped to stdout.
+
+2. *bwkfanboy_parse*
+
+   It takes 1 parameter: <em>a full path</em> to a plugin file.
+
+   This util reads stdin expecting it to be XHTML, parses it and dumps
+   the result to stdout as a JSON-formatted object.
+
+3. *bwkfanboy_generate*
+
+   It reads stdin expecting it to be a proper JSON-formatted object.
+
+   The result will be an Atom feed dumped to stdout in UTF-8.
+
+So, without the wrapper all this together looks like:
+
+  % echo http://example.org | bwkfanboy_fetch |
+    bwkfanboy_parse /path/to/my/plugin.rb | bwkfanboy_generate
+
+== Log
+
+All utils write to the <tt>/tmp/bwkfanboy/USER/log/general.log</tt> file if
+permissions allow it.
+
+== HTTP
+
+There are 2 methods to get an Atom feed via HTTP:
+
+1. <tt>web/bwkfanboy.cgi</tt> (from the program tarball), which you may
+   copy to your Apache cgi directory and run. This prohibits you from
+   using the HOME directory for your own plugins. Also, the cgi script
+   requires some manual editing (setting 1 variable in it) before you
+   can even start using it.
+
+2. A small *bwkfanboy_server* HTTP server. It can run as any user and
+   thus is able to inherit env variables for discovering your HOME
+   directory. Read bin/bwkfanboy_server to know how to operate it.
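One way to picture the pipeline the README describes is three processes chained by pipes, with the URL written to the first one's stdin and the feed read from the last one's stdout. The sketch below is only an illustration under stated assumptions: `STAGES` holds hypothetical one-line Ruby stand-ins for bwkfanboy_fetch, bwkfanboy_parse and bwkfanboy_generate, and `run_pipeline` is an invented name that is not part of the gem. Only the stdin-to-stdout contract is taken from the README.

```ruby
require 'open3'
require 'rbconfig'

RUBY = RbConfig.ruby # path to the running Ruby interpreter

# Hypothetical stand-ins for the three utilities. Each one reads stdin and
# writes stdout, which is the only contract the README describes.
STAGES = [
  # "fetch": pretend the URL itself is the downloaded page body
  [RUBY, '-e', 'print "<p>" + STDIN.gets.chomp + "</p>"'],
  # "parse": pull the text out of the markup and emit a JSON object
  [RUBY, '-rjson', '-e', 'print JSON.generate("text" => STDIN.read[%r{<p>(.*)</p>}, 1])'],
  # "generate": turn the JSON object into a (fake, non-Atom) XML string
  [RUBY, '-rjson', '-e', 'print "<feed>" + JSON.parse(STDIN.read)["text"] + "</feed>"']
].freeze

# Write +uri+ to the first stage and return whatever the last stage prints.
def run_pipeline(uri, stages)
  Open3.pipeline_rw(*stages) do |first_stdin, last_stdout, _wait_threads|
    first_stdin.puts(uri)
    first_stdin.close # signal EOF so the first stage can finish
    last_stdout.read
  end
end

puts run_pipeline('http://example.org', STAGES)
```

With the real utilities installed, the array of stages would presumably name the three executables instead, with the plugin path passed as the argument of the parse stage.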
data/Rakefile
ADDED
@@ -0,0 +1,48 @@
+# -*-ruby-*-
+
+require 'rake'
+require 'rake/gempackagetask'
+require 'rake/clean'
+require 'rake/rdoctask'
+require 'rake/testtask'
+
+spec = Gem::Specification.new() {|i|
+  i.name = "bwkfanboy"
+  i.summary = 'A converter from HTML to Atom feed that you can use to watch sites that do not provide its own feed.'
+  i.version = '0.0.1'
+  i.author = 'Alexander Gromnitsky'
+  i.email = 'alexander.gromnitsky@gmail.com'
+  i.homepage = 'http://github.com/gromnitsky/bwkfanboy'
+  i.platform = Gem::Platform::RUBY
+  i.required_ruby_version = '>= 1.9'
+  i.files = FileList['lib/**/*', 'bin/*', 'doc/*', '[A-Z]*', 'test/**/*']
+
+  i.executables = FileList['bin/*'].gsub(/^bin\//, '')
+  i.default_executable = i.name
+
+  i.has_rdoc = true
+  i.test_files = FileList['test/test_*.rb']
+
+  i.rdoc_options << '-m' << 'Bwkfanboy' << '-x' << 'plugins'
+  i.extra_rdoc_files = FileList['bin/*']
+
+  i.add_dependency('activesupport', '>= 3.0.0')
+  i.add_dependency('nokogiri', '>= 1.4.3')
+  i.add_dependency('open4', '>= 1.0.1')
+  i.add_dependency('jsonschema', '>= 2.0.0')
+}
+
+Rake::GemPackageTask.new(spec).define()
+
+task(default: %(repackage))
+
+Rake::RDocTask.new('doc') {|i|
+  i.main = "Bwkfanboy"
+  i.rdoc_files = FileList['doc/*', 'lib/**/*.rb', 'bin/*']
+  i.rdoc_files.exclude("lib/**/plugins", "test")
+}
+
+Rake::TestTask.new() {|i|
+  i.test_files = FileList['test/test_*.rb']
+  i.verbose = true
+}
data/TODO
ADDED
data/bin/bwkfanboy
ADDED
@@ -0,0 +1,128 @@
+#!/usr/bin/env ruby19
+# -*-ruby-*-
+
+# This program is executed by bin/bwkfanboy_server to do all dirty work:
+# fetch HTML, parse it and generate a pretty Atom feed.
+#
+# It is a wrapper which you can utilize for such common tasks as listing
+# all available plugins.
+#
+# Type:
+#
+#  % bwkfanboy -h
+#
+# to get some basic help & read about Bwkfanboy module.
+
+require_relative '../lib/bwkfanboy/parser'
+
+$conf = {
+  prog_name: 'bwkfanboy',
+  prog_ver: '0.0.1',
+  mode: 'pipe',
+  banner: "Usage: #{File.basename($0)} [options] plugin-name"
+}
+
+class Plugin # :nodoc: all
+  attr_reader :name, :path
+
+  def initialize(name)
+    @name = name
+    @path = nil
+  end
+
+  def dirs()
+    # try to create user's home plugin directory
+    begin
+      ['~/.bwkfanboy', '~/.bwkfanboy/plugins'].each {|i|
+        Dir.mkdir(File.expand_path(i))
+      }
+    rescue
+      # empty
+    end
+
+    r = []
+    dirs = ['~/.bwkfanboy/plugins', "#{Bwkfanboy::Utils.gem_dir_system}/plugins"]
+    begin
+      # this will fail for user's home directory under Apache CGI
+      # environment
+      dirs.map! {|i| File.expand_path(i) }
+    rescue
+    end
+    dirs.each {|i|
+      if File.readable?(i) then
+        r << i
+      else
+        Bwkfanboy::Utils.warnx("directory #{i} isn't readable");
+      end
+    }
+
+    if r.length == 0 then
+      Bwkfanboy::Utils.errx(1, "no dirs for plugins found: #{dirs.join(' ')}")
+    end
+    return r
+  end
+
+  def load()
+    abort($conf[:banner]) unless (@name && @name !~ /^\s*$/)
+
+    dirs.each {|i|
+      files = Dir.glob("#{i}/*.rb")
+      if (@path = files.index("#{i}/#{@name}.rb")) then
+        @path = files[@path]
+        break
+      end
+    }
+    Bwkfanboy::Utils.errx(1, "no such plugin '#{@name}'") if ! @path
+    Bwkfanboy::Utils.plugin_load(@path, Bwkfanboy::Meta::PLUGIN_CLASS)
+
+    pn = Page.new()
+    pn.check()
+    return pn
+  end
+
+end # class
+
+# ----------------------------------------------------------------------
+
+o = Bwkfanboy::Utils.cl_parse(ARGV, $conf[:banner]) # create OptionParser object
+o.on('-i', 'Show some info about the plugin') { |i| $conf[:mode] = 'info' }
+o.on('-l', 'List all plugins') { |i| $conf[:mode] = 'list' }
+o.on('-p', 'List all plugins paths') { |i| $conf[:mode] = 'path' }
+o.on('-D', '(ignore this) Use URI_DEBUG const instead URI in plugins') { |i| $conf[:mode] = 'debug' }
+Bwkfanboy::Utils.cl_parse(ARGV, $conf[:banner], o) # run cl parser
+
+plugin = Plugin.new(ARGV[0])
+
+case $conf[:mode]
+when 'list'
+  plugin.dirs().each {|i|
+    puts "#{i}:"
+    Dir.glob("#{i}/*.rb").each {|j|
+      puts "\t#{File.basename(j, '.rb')}"
+    }
+  }
+when 'path'
+  plugin.dirs().each {|i| puts i}
+when 'info'
+  plugin.load().dump_info
+else
+  # A pipe mode
+  pn = plugin.load()
+  cmd = "./bwkfanboy_fetch | ./bwkfanboy_parse '#{plugin.path}' | ./bwkfanboy_generate"
+  if Bwkfanboy::Utils.cfg[:verbose] >= 2 then
+    puts ($conf[:mode] != 'debug' ? pn.class::Meta::URI : pn.class::Meta::URI_DEBUG)
+    puts cmd
+    exit 0
+  end
+
+  # go to the directory with current script
+  Dir.chdir(File.dirname(File.expand_path($0)))
+
+  pipe = IO.popen(cmd, 'w+')
+  pipe.puts ($conf[:mode] != 'debug' ? pn.class::Meta::URI : pn.class::Meta::URI_DEBUG)
+  pipe.close_write
+  while line = pipe.gets
+    puts line
+  end
+  pipe.close
+end
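The lookup in `Plugin#load` above boils down to "walk the plugin directories in order and take the first one that contains NAME.rb". A minimal stand-alone sketch of that idea follows. `find_plugin` is a hypothetical helper, not a method of the gem, and the directory names are made up for the demonstration.

```ruby
require 'tmpdir'
require 'fileutils'

# Return the full path of NAME.rb from the first directory in DIRS that
# contains it, or nil when no directory does. (Hypothetical helper.)
def find_plugin(name, dirs)
  dirs.each do |dir|
    candidate = File.join(dir, "#{name}.rb")
    return candidate if File.file?(candidate)
  end
  nil
end

Dir.mktmpdir do |root|
  # Two throwaway directories standing in for the home and system locations.
  home_dir   = File.join(root, 'home_plugins')
  system_dir = File.join(root, 'system_plugins')
  [home_dir, system_dir].each { |d| FileUtils.mkdir_p(d) }
  FileUtils.touch(File.join(system_dir, 'bwk.rb'))

  puts find_plugin('bwk', [home_dir, system_dir])         # path inside system_plugins
  puts find_plugin('missing', [home_dir, system_dir]).inspect
end
```

Because the home directory is listed first, a plugin placed there would shadow a system plugin of the same name, which matches the order of the `dirs` array in the script.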
data/bin/bwkfanboy_fetch
ADDED
@@ -0,0 +1,30 @@
+#!/usr/bin/env ruby19
+# -*-ruby-*-
+
+# Read stdin for a URI or a full path to the local file, download it (or
+# read the local file) and print to stdout.
+
+require 'open-uri'
+
+require_relative '../lib/bwkfanboy/utils'
+
+$conf = { banner: "Usage: #{File.basename($0)} [options] < uri" }
+
+Bwkfanboy::Utils.cl_parse(ARGV, $conf[:banner], nil, true)
+
+uri = gets.chomp()
+
+Bwkfanboy::Utils.veputs(1, "fetching #{uri}\n")
+
+begin
+  open(uri, "User-Agent" => Bwkfanboy::Meta::USER_AGENT) {|f|
+    if defined?(f.meta) && f.status[0] != '200' then
+      Bwkfanboy::Utils.errx(1, "cannot fetch #{uri} : HTTP response: #{f.status[0]}")
+    end
+    Bwkfanboy::Utils.veputs(1, "charset = #{f.content_type_parse[1][1]}\n") if defined?(f.meta)
+    f.each_line {|i| puts i}
+  }
+rescue
+  # typically Errno::ENOENT
+  Bwkfanboy::Utils.errx(1, "cannot fetch: #{$!}");
+end
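The fetch script relies on `open` from open-uri accepting either a URL or a local path, which was fine for the Ruby 1.9 it targets. On Ruby 3.0 and later, `Kernel#open` no longer handles URLs and `URI.open` has to be called explicitly. A hedged sketch of the same "URL or local file" behaviour for current Rubies is shown below. `read_source` and its `user_agent` default are assumptions for illustration, not names from the gem.

```ruby
require 'open-uri'
require 'tempfile'

# Read LOCATION and return its contents as a string.
# http(s) locations go through URI.open; anything else is treated as a
# local file path. (Hypothetical helper; the user agent value is made up.)
def read_source(location, user_agent: 'example-agent/0.0')
  if location =~ %r{\Ahttps?://}i
    URI.open(location, 'User-Agent' => user_agent) { |f| f.read }
  else
    File.read(location)
  end
end

# Local-file demonstration; the network branch is not exercised here.
Tempfile.create('page') do |f|
  f.write("<html>hello</html>\n")
  f.flush
  puts read_source(f.path)
end
```

Splitting on the scheme keeps a mistyped path from being sent to the network layer and makes the local branch easy to test offline.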
data/bin/bwkfanboy_generate
ADDED
@@ -0,0 +1,80 @@
+#!/usr/bin/env ruby19
+# -*-ruby-*-
+
+# Read stdin for JSON, generate from it an Atom feed and print the
+# result to stdout in UTF-8.
+#
+# One can validate the JSON by providing '--check' command line option
+# (by default the validating is off).
+
+require 'rss/maker'
+require 'date'
+require 'json'
+require 'jsonschema'
+
+require_relative '../lib/bwkfanboy/utils'
+
+$conf = {
+  banner: "Usage: #{File.basename($0)} [options] < json",
+  check: false
+}
+
+o = Bwkfanboy::Utils.cl_parse(ARGV, $conf[:banner])
+o.on('--check', 'Validate the input (slow!)') { |i| $conf[:check] = true }
+Bwkfanboy::Utils.cl_parse(ARGV, $conf[:banner], o) # run cl parser
+
+begin
+  j = JSON.parse(STDIN.read)
+rescue
+  Bwkfanboy::Utils.errx(1, "stdin had invalid JSON");
+end
+
+# validate the input
+schema = Bwkfanboy::Utils.gem_dir_system() + '/schema.js'
+if $conf[:check] then
+  begin
+    JSON::Schema.validate(j, JSON.parse(File.read(schema)))
+  rescue
+    Bwkfanboy::Utils.errx(1, "JSON validation with schema (#{schema}) failed");
+  end
+end
+
+feed = RSS::Maker.make("atom") { |maker|
+  maker.channel.id = j['channel']['id']
+  maker.channel.updated = j['channel']['updated']
+  maker.channel.author = j['channel']['author']
+  maker.channel.title = j['channel']['title']
+
+  maker.channel.links.new_link {|i|
+    i.href = j['channel']['link']
+    i.rel = 'alternate'
+    i.type = 'text/html' # eh
+  }
+
+  maker.items.do_sort = true
+
+  j['x_entries'].each { |i|
+    maker.items.new_item do |item|
+      item.links.new_link {|k|
+        k.href = i['link']
+        k.rel = 'alternate'
+        k.type = 'text/html' # only to make happy crappy pr2nntp gateway
+      }
+      item.title = i['title']
+      item.author = i['author']
+      item.updated = i['updated']
+      item.content.type = j['channel']['x_entries_content_type']
+
+      case item.content.type
+      when 'text'
+        item.content.content = i['content']
+      when 'html'
+        item.content.content = i['content']
+      else
+        item.content.xhtml = i['content']
+      end
+    end
+  }
+}
+
+puts feed
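Judging only from the keys the generator reads (`channel` with `id`, `updated`, `author`, `title`, `link`, `x_entries_content_type`, and `x_entries` items with `link`, `title`, `author`, `updated`, `content`), its input can be pictured as the object below. This is a sketch, not the gem's schema.js: the sample values are invented, and `missing_keys` is a hypothetical helper that merely checks key presence.

```ruby
require 'json'

CHANNEL_KEYS = %w[id updated author title link x_entries_content_type].freeze
ENTRY_KEYS   = %w[link title author updated content].freeze

# List every key the generator would look up but DOC does not contain.
def missing_keys(doc)
  channel = doc.fetch('channel', {})
  missing = CHANNEL_KEYS.reject { |k| channel.key?(k) }.map { |k| "channel.#{k}" }
  doc.fetch('x_entries', []).each_with_index do |entry, n|
    ENTRY_KEYS.each { |k| missing << "x_entries[#{n}].#{k}" unless entry.key?(k) }
  end
  missing
end

# Invented sample data, shaped after the lookups in bwkfanboy_generate.
SAMPLE = {
  'channel' => {
    'id' => 'http://example.org/', 'updated' => '2010-01-01T00:00:00Z',
    'author' => 'somebody', 'title' => 'Example', 'link' => 'http://example.org/',
    'x_entries_content_type' => 'text'
  },
  'x_entries' => [
    { 'link' => 'http://example.org/1', 'title' => 'First', 'author' => 'somebody',
      'updated' => '2010-01-01T00:00:00Z', 'content' => 'Hello' }
  ]
}.freeze

doc = JSON.parse(JSON.generate(SAMPLE)) # round-trip, as if it arrived on stdin
puts missing_keys(doc).inspect
```

A presence check like this is much cheaper than full schema validation, which the script itself marks as slow, but it says nothing about value types or date formats.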
data/bin/bwkfanboy_parse
ADDED
@@ -0,0 +1,32 @@
+#!/usr/bin/env ruby19
+# -*-ruby-*-
+
+# Take 1 command line parameter: a full path to a plugin.
+#
+# Read stdin for a HTML, parse it and print the result to stdout in JSON
+# format. If '-vv' command line parameters were given, output will be in
+# 'key: value' pairs and <em>not</em> in JSON.
+
+require_relative '../lib/bwkfanboy/parser'
+
+$conf = {
+  banner: "Usage: #{File.basename($0)} [options] /path/to/my/plugin.rb < html"
+}
+
+Bwkfanboy::Utils.cl_parse(ARGV, $conf[:banner], nil, true)
+
+if ARGV.size == 0 then
+  abort($conf[:banner])
+else
+  Bwkfanboy::Utils.plugin_load(ARGV[0], Bwkfanboy::Meta::PLUGIN_CLASS)
+end;
+
+pn = Page.new()
+pn.check()
+pn.parse()
+
+if Bwkfanboy::Utils.cfg[:verbose] >= 2 then
+  pn.dump()
+else
+  puts pn.to_json()
+end
data/bin/bwkfanboy_server
ADDED
@@ -0,0 +1,141 @@
+#!/usr/bin/env ruby19
+# -*-ruby-*-
+
+# Start an HTTP server (by default on 127.0.0.1:9042). To get Atom feeds
+# from it, initiate GET request with URI
+#
+#  http://localhost:9042/?p=PLUGIN
+#
+# where +PLUGIN+ is a name of a bwkfanboy's plugin (without '.rb' suffix).
+#
+# To list all available plugins, point your browser to
+#
+#  http://localhost:9042/list
+#
+# The server is intended to run from a non-root user from
+# <tt>~/.login</tt> file. It can detach from a terminal if you give it
+# '-d' command line option.
+#
+# For other help, type:
+#
+#  bwkfanboy_server -h
+#
+# The server maintains 2 logs:
+#
+#  /tmp/bwkfanboy/USER/log/bwkfanboy_server.log
+#  /tmp/bwkfanboy/USER/log/bwkfanboy_server-access.log
+#
+# The file with a pid:
+#
+#  /tmp/bwkfanboy/USER/bwkfanboy_server.pid
+
+require 'webrick'
+require_relative '../lib/bwkfanboy/utils'
+
+$conf = {
+  addr: '127.0.0.1',
+  port: 9042,
+  converter: "./#{Bwkfanboy::Meta::NAME}",
+  banner: "Usage: #{File.basename($0)} [options]",
+  server_type: WEBrick::SimpleServer,
+  workdir: File.dirname(File.expand_path($0)),
+  pidfile: "#{Bwkfanboy::Meta::DIR_TMP}/#{File.basename($0)}.pid",
+  log: "#{Bwkfanboy::Meta::DIR_LOG}/#{File.basename($0)}.log",
+  alog: "#{Bwkfanboy::Meta::DIR_LOG}/#{File.basename($0)}-access.log",
+  mode: 'pipe'
+}
+
+o = Bwkfanboy::Utils.cl_parse(ARGV, $conf[:banner]) # create OptionParser object
+o.on('-b VAL', 'BindAddress') { |i| $conf[:addr] = i }
+o.on('-p VAL', 'A port number') { |i| $conf[:port] = i }
+o.on('-c VAL', "A path to main #{Bwkfanboy::Meta::NAME} executable") { |i| $conf[:converter] = i }
+o.on('-d', 'Detach from a terminal') {|i| $conf[:server_type] = WEBrick::Daemon }
+o.on('-D', '(ignore this) Use URI_DEBUG const instead URI in plugins') { |i| $conf[:mode] = 'debug' }
+Bwkfanboy::Utils.cl_parse(ARGV, $conf[:banner], o) # run cl parser
+
+Bwkfanboy::Utils.dir_tmp_create()
+
+class FeedServlet < WEBrick::HTTPServlet::AbstractServlet # :nodoc: all
+  def do_GET(req, res)
+    if req.query['p'] && req.query['p'] =~ Bwkfanboy::Meta::PLUGIN_NAME
+      res['Content-Type'] = 'application/atom+xml; charset=UTF-8'
+      res['Content-Disposition'] = "inline; filename=\"#{Bwkfanboy::Meta::NAME}-#{req.query['p']}.xml\""
+
+      cmd = "#{$conf[:converter]} #{$conf[:mode] == 'debug' ? '-D' : ''} #{req.query['p']}"
+      r = Bwkfanboy::Utils.cmd_run(cmd)
+      if r[0] != 0 then
+        raise WEBrick::HTTPStatus::InternalServerError.new("Errors in the pipeline:\n\n #{r[1]}")
+      end
+
+      res.body = r[2]
+    else
+      raise WEBrick::HTTPStatus::InternalServerError.new("Parameter 'p' required")
+    end
+  end
+end
+
+class FeedListServlet < WEBrick::HTTPServlet::AbstractServlet # :nodoc: all
+  def do_GET(req, res)
+    cmd = "#{$conf[:converter]} -l"
+    r = Bwkfanboy::Utils.cmd_run(cmd)
+    if r[0] != 0 then
+      raise WEBrick::HTTPStatus::InternalServerError.new("Errors:\n\n #{r[1]}")
+    end
+
+    res.body = r[2]
+  end
+end
+
+# create temporary files
+def start_callback()
+  Dir.chdir($conf[:workdir])
+  if ! File.executable?($conf[:converter]) then
+    Bwkfanboy::Utils.errx(1, "Missing executable file '#{$conf[:converter]}'")
+  end
+
+  begin
+    File.open($conf[:pidfile], "w+") {|i| i.puts $$ }
+  rescue
+    Bwkfanboy::Utils.warnx("unable to create a pidfile " + $conf[:pidfile])
+  end
+end
+
+# remove temporary files
+def stop_callback()
+  begin
+    File.unlink $conf[:pidfile]
+  rescue
+    # ignore errors
+  end
+end
+
+def log_create(f)
+  begin
+    log = Logger.new(f, 2, Bwkfanboy::Meta::LOG_MAXSIZE)
+  rescue
+    Bwkfanboy::Utils.warnx("cannot open log #{f}");
+    return nil
+  end
+  log.datetime_format = "%H:%M:%S"
+  log
+end
+
+# ----------------------------------------------------------------------
+
+server_log = log_create($conf[:log])
+access_log = [[ log_create($conf[:alog]), WEBrick::AccessLog::COMBINED_LOG_FORMAT ]]
+
+s = WEBrick::HTTPServer.new(Port: $conf[:port],
+                            BindAddress: $conf[:addr],
+                            ServerType: $conf[:server_type],
+                            StartCallback: -> {start_callback},
+                            StopCallback: -> {stop_callback},
+                            Logger: server_log,
+                            AccessLog: access_log
+                            )
+s.mount("/", FeedServlet)
+s.mount("/list", FeedListServlet)
+['TERM', 'INT'].each {|i|
+  trap(i) { s.shutdown }
+}
+s.start
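On the client side, a feed reader only needs the URL form documented in the server's header comment, `http://localhost:9042/?p=PLUGIN`. A small sketch of building that URL safely follows. `feed_uri` is a hypothetical helper, and its defaults simply mirror the `addr` and `port` values in `$conf`.

```ruby
require 'uri'

# Build the feed URL for PLUGIN on a bwkfanboy_server instance.
# (Hypothetical helper; host and port default to the server's defaults.)
def feed_uri(plugin, host: '127.0.0.1', port: 9042)
  URI::HTTP.build(host: host, port: port, path: '/',
                  query: URI.encode_www_form(p: plugin))
end

puts feed_uri('bwk')
puts feed_uri('freebsd-ports-update', host: 'localhost')

# With a server actually running, the feed could then be fetched with
# something like:
#   require 'net/http'
#   xml = Net::HTTP.get(feed_uri('bwk'))
```

Going through `URI.encode_www_form` rather than string concatenation keeps unusual plugin names from producing a malformed query string, although the server itself still rejects names that do not match its `PLUGIN_NAME` pattern.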
data/doc/README.rdoc
ADDED
@@ -0,0 +1,88 @@
+= About
+
+bwkfanboy is an HTML to Atom feed converter that you can use to watch
+sites that do not provide their own feeds.
+
+The converter is not a magic tool: you'll need to write a plugin (in
+Ruby) for each site you want to watch. bwkfanboy provides guidelines and
+general assistance.
+
+= Architecture
+
+== Plugins
+
+bwkfanboy comes with 1 example plugin that parses a search page of
+dailyprincetonian.com looking for bwk's articles.
+
+The plugin is a Ruby class +Page+ that inherits from the Bwkfanboy::Parse
+parent class, overriding 1 method.
+
+Plugins can live in the system
+
+  `gem env gemdir`/gems/bwkfanboy-x.y.z/lib/bwkfanboy/plugins
+
+or the user's home
+
+  ~/.bwkfanboy/plugins
+
+directories.
+
+== Pipeline
+
+The program consists of 4 parts:
+
+0. The *bwkfanboy* script, which takes 1 parameter: the name of a file in
+   the plugin directories (without the .rb suffix). For example, to get
+   an Atom feed from dailyprincetonian.com you type:
+
+     % bwkfanboy bwk
+
+   and it will load the
+   <tt>/usr/local/lib/ruby/gems/1.9/gems/bwkfanboy-0.0.1/lib/bwkfanboy/plugins/bwk.rb</tt>
+   file on my FreeBSD machine, fetch and parse HTML from
+   dailyprincetonian.com and generate the required feed, dumping it to
+   stdout.
+
+   The script is just a convenient wrapper for 3 separate utils.
+
+1. *bwkfanboy_fetch*
+
+   It reads 1 line from stdin for the URL to fetch from. The result will
+   be dumped to stdout.
+
+2. *bwkfanboy_parse*
+
+   It takes 1 parameter: <em>a full path</em> to a plugin file.
+
+   This util reads stdin expecting it to be XHTML, parses it and dumps
+   the result to stdout as a JSON-formatted object.
+
+3. *bwkfanboy_generate*
+
+   It reads stdin expecting it to be a proper JSON-formatted object.
+
+   The result will be an Atom feed dumped to stdout in UTF-8.
+
+So, without the wrapper all this together looks like:
+
+  % echo http://example.org | bwkfanboy_fetch |
+    bwkfanboy_parse /path/to/my/plugin.rb | bwkfanboy_generate
+
+== Log
+
+All utils write to the <tt>/tmp/bwkfanboy/USER/log/general.log</tt> file if
+permissions allow it.
+
+== HTTP
+
+There are 2 methods to get an Atom feed via HTTP:
+
+1. <tt>web/bwkfanboy.cgi</tt> (from the program tarball), which you may
+   copy to your Apache cgi directory and run. This prohibits you from
+   using the HOME directory for your own plugins. Also, the cgi script
+   requires some manual editing (setting 1 variable in it) before you
+   can even start using it.
+
+2. A small *bwkfanboy_server* HTTP server. It can run as any user and
+   thus is able to inherit env variables for discovering your HOME
+   directory. Read bin/bwkfanboy_server to know how to operate it.