bwkfanboy 0.0.1
- data/LICENSE +22 -0
- data/README.rdoc +88 -0
- data/Rakefile +48 -0
- data/TODO +7 -0
- data/bin/bwkfanboy +128 -0
- data/bin/bwkfanboy_fetch +30 -0
- data/bin/bwkfanboy_generate +80 -0
- data/bin/bwkfanboy_parse +32 -0
- data/bin/bwkfanboy_server +141 -0
- data/doc/README.rdoc +88 -0
- data/doc/plugin.rdoc +118 -0
- data/lib/bwkfanboy/parser.rb +143 -0
- data/lib/bwkfanboy/plugins/bwk.rb +33 -0
- data/lib/bwkfanboy/plugins/freebsd-ports-update.rb +76 -0
- data/lib/bwkfanboy/schema.js +39 -0
- data/lib/bwkfanboy/utils.rb +134 -0
- data/test/plugins/bwk.rb +29 -0
- data/test/plugins/empty.rb +0 -0
- data/test/popen4.sh +4 -0
- data/test/semis/bwk.html +398 -0
- data/test/semis/bwk.json +82 -0
- data/test/test_fetch.rb +34 -0
- data/test/test_generate.rb +30 -0
- data/test/test_parse.rb +32 -0
- data/test/test_server.rb +39 -0
- data/test/ts_utils.rb +21 -0
- data/test/xml-clean.sh +8 -0
- metadata +158 -0
data/doc/plugin.rdoc
ADDED
@@ -0,0 +1,118 @@
+= HOWTO Write a \Plugin
+
+First of all, look at the examples provided with bwkfanboy. They are
+intended to be 100% working because I wrote them for myself.
+
+Basically, all you need is to write a class named _Page_ that
+inherits from Bwkfanboy::Parse, override the #myparse method in
+it and write a simple module named _Meta_ inside your _Page_
+class.
+
+== Skeleton
+
+Here is a skeleton of a plugin:
+
+  require 'nokogiri'
+
+  class Page < Bwkfanboy::Parse
+    module Meta
+      URI = 'http://example.org/news'
+      ENC = 'UTF-8'
+      VERSION = 1
+      COPYRIGHT = '(c) 2010 John Doe'
+      TITLE = "News from example.org"
+      CONTENT_TYPE = 'html'
+    end
+
+    def myparse()
+      # read stdin and parse it
+      doc = Nokogiri::HTML(STDIN, nil, Meta::ENC)
+      doc.xpath("XPATH QUERY").each {|i|
+        t = clean(i.xpath("XPATH QUERY").text())
+        l = clean(i.xpath("XPATH QUERY").text())
+        u = date(i.xpath("XPATH QUERY").text())
+        a = clean(i.xpath("XPATH QUERY").text())
+        c = clean(i.xpath("XPATH QUERY").text())
+
+        self << { title: t, link: l, updated: u, author: a, content: c }
+      }
+    end
+  end
+
+As you see, we are using Nokogiri for HTML parsing. You are not
+required to use it; take whatever parser you like. Nokogiri
+is nice because it's able to read broken HTML and search through
+it via XPath. If you would like to use, for example, REXML, beware
+that it accepts only strict XML, so you may need to clean the HTML with
+an external utility such as Tidy.
+
+Bwkfanboy loads a plugin from a single file as ordinary Ruby code. This
+means that the plugin can contain *any* Ruby code, but it doesn't mean
+that it should.
+
+=== \Meta
+
+Module _Meta_ can contain only constants, and *all* constants listed in
+the skeleton are required.
+
+* <tt>URI</tt>--can be an <tt>http(s)://</tt> or <tt>ftp://</tt> URL
+  or just a path to a file on your local machine, such as
+  <tt>/home/bob/huzza.html</tt>. This is the source that
+  bwkfanboy will transform into the Atom feed.
+
+* <tt>ENC</tt>--the character encoding of the document at URI.
+
+* <tt>VERSION</tt>--the version of the plugin.
+
+* <tt>COPYRIGHT</tt>--some boring string.
+
+* <tt>TITLE</tt>--a short description of the future feed. It'll be
+  used later in the resulting XML.
+
+* <tt>CONTENT_TYPE</tt>--one of +xhtml+, +html+ or +text+. This is a
+  very important constant because it determines the format in which
+  entries will be placed in the feed. Usually it's safe to use +html+.
+
+=== myparse
+
+In the #myparse method, read stdin. Its contents are the raw
+HTML you want to parse. The general idea:
+
+* An Atom feed must contain at least 1 entry, so look in the HTML for
+  some crap that you can break into 5 pieces: the title of the entry, a
+  link to it, a date for the entry, the author of the entry and its
+  contents.
+
+* After you scan and grab 1 entry, create a hash and add it to
+  _self_ as in the skeleton:
+
+    self << { title: t, link: l, updated: u, author: a, content: c }
+
+  Here the variables _t_, _l_, _u_, _a_ and _c_ contain the actual
+  values of the 5 pieces of the entry. The names of the hash keys are
+  important, of course: don't invent your own.
+
+* There will probably be more crap in the HTML that you can use to
+  construct other entries. Keep parsing and adding entries.
+
+* While scanning, use the 2 helper methods for cleaning each
+  piece: \#clean, which collapses runs of whitespace into single spaces, and #date, which
+  parses a string and returns a date in ISO 8601 format. You may
+  override the #date method if you like.
+
+== How to test all this
+
+To test how well your plugin works, save the HTML page to a file
+and type:
+
+  % bwkfanboy_parse -vv path/to/a/plugin.rb < saved_page.html
+
+to see the result as plain text, or
+
+  % bwkfanboy_parse -v path/to/a/plugin.rb < saved_page.html
+
+as pretty JSON.
+
+<tt>bwkfanboy_parse</tt> exits with status 0 if no errors occurred, or >= 1 if
+your plugin code has errors. N.B.: the output of
+<tt>bwkfanboy_parse</tt> is always in UTF-8.
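The plugin contract described in the HOWTO (a _Page_ subclass whose #myparse cleans 5 pieces and pushes a hash with fixed keys onto _self_) can be tried without Nokogiri or the gem installed. This is a minimal sketch, not bwkfanboy code: `MiniParse` is a hypothetical stand-in for Bwkfanboy::Parse that copies only `clean`, `date` and a simplified `<<` from parser.rb, and `LinePage` is a hypothetical plugin for a made-up `date | title | link | author` line format. It reads from an IO argument rather than STDIN so it can be fed a StringIO.

```ruby
require 'date'
require 'stringio'

# Hypothetical stand-in for Bwkfanboy::Parse: it copies only the helpers a
# plugin calls (clean and date) plus a simplified version of <<.
class MiniParse
  attr_reader :entries

  def initialize
    @entries = []
  end

  # Collapse runs of whitespace into single spaces, as Parse#clean does.
  def clean(s)
    s.gsub(/\s+/, ' ').strip
  end

  # Return an ISO 8601 string; fall back to the current time, as Parse#date does.
  def date(s)
    DateTime.parse(clean(s)).iso8601
  rescue ArgumentError
    DateTime.now.iso8601
  end

  # Simplified Parse#<<: every one of the 5 keys must be present.
  def <<(t)
    %i[title link updated author content].each do |k|
      raise "missing '#{k}'" if t[k].nil?
    end
    @entries << t
    self
  end
end

# Hypothetical plugin for a made-up "date | title | link | author" format.
class LinePage < MiniParse
  def myparse(io)
    io.each_line do |line|
      d, t, l, a = line.split('|')
      next if a.nil? # not an entry: fewer than 4 fields
      # the made-up format has no body, so the title doubles as content
      self << { title: clean(t), link: clean(l), updated: date(d),
                author: clean(a), content: clean(t) }
    end
  end
end

page = LinePage.new
page.myparse(StringIO.new("2010-10-04 12:00 |  Hello,   world | http://example.org/1 | John Doe\n" \
                          "not an entry\n"))
page.entries.each { |e| puts "#{e[:updated]}  #{e[:title]}  <#{e[:link]}>" }
```

The second input line is skipped because it does not split into 4 fields, which mirrors the advice to keep scanning for more "crap" and add only what forms a complete entry.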
data/lib/bwkfanboy/parser.rb
ADDED
@@ -0,0 +1,143 @@
+require 'json'
+require 'date'
+
+require_relative 'utils'
+
+# :include: ../../doc/README.rdoc
+module Bwkfanboy
+
+  # :include: ../../doc/plugin.rdoc
+  class Parse
+    ENTRIES_MAX = 64
+
+    def initialize()
+      @entries = []
+    end
+
+    # Invokes #myparse & checks if it has grabbed something.
+    def parse()
+      @entries = []
+      begin
+        myparse()
+      rescue
+        @entries = []
+        Utils.errx(1, "parsing failed: #{$!}\n\nBacktrace:\n\n#{$!.backtrace.join("\n")}")
+      end
+      Utils.errx(1, "plugin return no output") if @entries.length == 0
+    end
+
+    # Prints entries in 'key: value' formatted strings. Intended for
+    # debugging.
+    def dump()
+      @entries.each {|i|
+        puts "title : " + i[:title]
+        puts "link : " + i[:link]
+        puts "updated : " + i[:updated]
+        puts "author : " + i[:author]
+        puts "content : " + i[:content]
+        puts ""
+      }
+    end
+
+    def to_json()
+      # guess the time of the most recent entry
+      u = DateTime.parse() # January 1, 4713 BCE
+      @entries.each {|i|
+        t = DateTime.parse(i[:updated])
+        u = t if t > u
+      }
+
+      m = get_meta()
+      j = {
+        channel: {
+          updated: u,
+          id: m::URI,
+          author: Meta::NAME, # just a placeholder
+          title: m::TITLE,
+          link: m::URI,
+          x_entries_content_type: m::CONTENT_TYPE
+        },
+        x_entries: @entries
+      }
+      Utils::cfg[:verbose] >= 1 ? JSON.pretty_generate(j) : JSON.generate(j)
+    end
+
+    # After loading a plugin, one can do basic validation of the
+    # plugin's class with the help of this method.
+    def check
+      m = get_meta()
+      begin
+        [:URI, :ENC, :VERSION, :COPYRIGHT, :TITLE, :CONTENT_TYPE].each {|i|
+          fail "#{m}::#{i} not defined or empty" if (! m.const_defined?(i) || m.const_get(i) =~ /^\s*$/)
+        }
+      rescue
+        Utils.errx(1, "incomplete plugin: #{$!}")
+      end
+    end
+
+    # Prints plugin's meta information.
+    def dump_info()
+      m = get_meta()
+      puts "Version : #{m::VERSION}"
+      puts "Copyright : #{m::COPYRIGHT}"
+      puts "Title : #{m::TITLE}"
+      puts "URI : #{m::URI}"
+    end
+
+    protected
+
+    # This *must* be overridden in the child.
+    def myparse()
+      raise "plugin isn't finished yet"
+    end
+
+    # Tries to parse _s_ as a date string. Returns the result in ISO 8601
+    # format.
+    def date(s)
+      begin
+        DateTime.parse(clean(s)).iso8601()
+      rescue
+        Utils.vewarnx(2, "#{s} is unparsable; date is set to current")
+        DateTime.now().iso8601()
+      end
+    end
+
+    # Returns true once the number of entries has reached ENTRIES_MAX.
+    def toobig?
+      return true if @entries.length >= ENTRIES_MAX
+      return false
+    end
+
+    def <<(t)
+      if toobig? then
+        Utils.warnx("reached max number of entries (#{ENTRIES_MAX})")
+        return @entries
+      end
+
+      %w(updated author link).each { |i|
+        fail "unable to extract '#{i}'" if ! t.key?(i.to_sym) || t[i.to_sym] == nil || t[i.to_sym].empty?
+      }
+      %w(title content).each { |i|
+        fail "missing '#{i}'" if ! t.key?(i.to_sym) || t[i.to_sym] == nil
+      }
+      # redundant unless the user has redefined the date() method
+      if t[:updated] !~ /\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\+\d{2}:\d{2}/ then
+        fail "'#{t[:updated]}' isn't in iso8601 format"
+      end
+      @entries << t
+    end
+
+    private
+
+    def clean(s)
+      s.gsub(/\s+/, ' ').strip()
+    end
+
+    def get_meta()
+      Utils.errx(1, "incomplete plugin: no #{self.class}::Meta module") if (! defined?(self.class::Meta) || ! self.class::Meta.is_a?(Module))
+      self.class::Meta
+    end
+
+  end # class
+
+end # module
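Two details in parser.rb are easy to miss. First, `#<<` checks `updated` against a pattern that only allows a `+hh:mm` offset, so an ISO 8601 string with a negative offset is rejected; `DateTime.now.iso8601`, the fallback in `#date`, produces such a string on machines west of UTC. Second, `#to_json` uses the latest `updated` value as the channel date. The sketch below reproduces both with the standard `date` library only; the `[+-]` pattern is a hypothetical relaxation for illustration, not what bwkfanboy 0.0.1 ships.

```ruby
require 'date'

# Pattern copied from Parse#<< in parser.rb: it only allows a "+hh:mm" offset.
PLUS_ONLY = /\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\+\d{2}:\d{2}/
# Hypothetical relaxed pattern that also accepts "-hh:mm" offsets.
ANY_SIGN  = /\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2}/

utc  = DateTime.new(2010, 10, 4, 12, 0, 0, '+00:00').iso8601
west = DateTime.new(2010, 10, 4, 12, 0, 0, '-05:00').iso8601

puts utc                                                            # 2010-10-04T12:00:00+00:00
puts west                                                           # 2010-10-04T12:00:00-05:00
puts "plus-only: #{!!(utc =~ PLUS_ONLY)} #{!!(west =~ PLUS_ONLY)}"  # true false
puts "any-sign:  #{!!(utc =~ ANY_SIGN)} #{!!(west =~ ANY_SIGN)}"    # true true

# Parse#to_json uses the most recent :updated value as the channel date;
# the same reduction written with Enumerable#max:
entries = [{ updated: utc }, { updated: west }]
latest = entries.map { |e| DateTime.parse(e[:updated]) }.max
puts latest.iso8601  # 2010-10-04T12:00:00-05:00, i.e. 17:00 UTC
```

DateTime comparison accounts for the offset, so 12:00 at `-05:00` counts as later than 12:00 at `+00:00`.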
data/lib/bwkfanboy/plugins/bwk.rb
ADDED
@@ -0,0 +1,33 @@
+# A simple plugin that parses the listing of bwk's articles from
+# dailyprincetonian.com.
+
+require 'nokogiri'
+
+class Page < Bwkfanboy::Parse
+  module Meta
+    URI = 'http://www.dailyprincetonian.com/advanced_search/?author=Brian+Kernighan'
+    URI_DEBUG = '/home/alex/lib/software/alex/bwkfanboy/test/semis/bwk.html'
+    ENC = 'UTF-8'
+    VERSION = 1
+    COPYRIGHT = '(c) 2010 Alexander Gromnitsky'
+    TITLE = "Brian Kernighan's articles from Daily Princetonian"
+    CONTENT_TYPE = 'html'
+  end
+
+  def myparse()
+    url = "http://www.dailyprincetonian.com"
+
+    doc = Nokogiri::HTML(STDIN, nil, Meta::ENC)
+    doc.xpath("//div[@class='article_item']").each {|i|
+      t = clean(i.xpath("h2/a").children.text())
+      fail 'unable to extract link' if (link = clean(i.xpath("h2/a")[0].attributes['href'].value()).empty?)
+      link = clean(i.xpath("h2/a")[0].attributes['href'].value())
+      l = url + link + "print"
+      u = date(i.xpath("h2").children[1].text())
+      a = clean(i.xpath("div/span/a[1]").children.text())
+      c = clean(i.xpath("div[@class='summary']").text())
+
+      self << { title: t, link: l, updated: u, author: a, content: c }
+    }
+  end
+end
data/lib/bwkfanboy/plugins/freebsd-ports-update.rb
ADDED
@@ -0,0 +1,76 @@
+require 'digest/md5'
+
+class Page < Bwkfanboy::Parse
+  module Meta
+    URI = '/usr/ports/UPDATING'
+    URI_DEBUG = URI
+    ENC = 'ASCII'
+    VERSION = 1
+    COPYRIGHT = '(c) 2010 Alexander Gromnitsky'
+    TITLE = "News from FreeBSD ports"
+    CONTENT_TYPE = 'text'
+  end
+
+  def myadd(ready, t, l, u, a, c)
+    return true if ! ready
+    return false if toobig?
+    self << { title: t, link: l, updated: u, author: a, content: c.rstrip } if ready
+    return true
+  end
+
+  def clean(t)
+    t = t[2..-1] if t[0] != "\t"
+    return '' if t == nil
+    return t
+  end
+
+  def myparse()
+    re_u = /^(\d{8}):$/
+    re_t1 = /^ {2}AFFECTS:\s+(.+)$/
+    re_t2 = /^\s+(.+)$/
+    re_a = /^ {2}AUTHOR:\s+(.+)$/
+
+    ready = false
+    mode = nil
+    t = l = u = a = c = nil
+    while line = STDIN.gets
+      line.rstrip!
+
+      if line =~ re_u then
+        # add a new entry
+        break if ! myadd(ready, t, l, u, a, c)
+        ready = true
+        u = date($1)
+        l = $1 # partial, see below
+        t = a = c = nil
+        next
+      end
+
+      if ready then
+        if line =~ re_t1 then
+          mode = 'title'
+          t = $1
+          c = clean($&) + "\n"
+          # link should be unique
+          l = "file://#{Meta::URI}\##{l}-#{Digest::MD5.hexdigest($1)}"
+        elsif line =~ re_a
+          mode = 'author'
+          a = $1
+          c += clean($&) + "\n"
+        elsif line =~ re_t2 && mode == 'title'
+          t += ' ' + $1
+          c += clean($&) + "\n"
+        else
+          # content
+          c += clean(line) + "\n"
+          mode = nil
+        end
+      end
+
+      # skipping the preamble
+    end
+
+    # add last entry
+    myadd(ready, t, l, u, a, c)
+  end
+end
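The state machine in this plugin's `myparse` is easier to follow against a concrete input. Below is a simplified sketch that reuses the plugin's regular expressions and its unique-link scheme on a made-up UPDATING excerpt. The sample text is illustrative, not real ports data, and the `entries` array of hashes is a hypothetical stand-in for `self <<`; the sketch collects only the date, title, author and link, and ignores content and ENTRIES_MAX.

```ruby
require 'digest/md5'

# Regular expressions copied from the plugin's myparse.
RE_U  = /^(\d{8}):$/
RE_T1 = /^ {2}AFFECTS:\s+(.+)$/
RE_T2 = /^\s+(.+)$/
RE_A  = /^ {2}AUTHOR:\s+(.+)$/
URI_PATH = '/usr/ports/UPDATING' # Meta::URI in the plugin

# Made-up excerpt in the layout the regexes expect (not real ports data).
sample = [
  "Preamble text before the first entry is skipped.",
  "",
  "20101004:",
  "  AFFECTS: users of www/example and",
  "    www/example-devel",
  "  AUTHOR: someone@example.org",
  "",
  "  Some upgrade instructions.",
].join("\n")

entries = []
mode = nil
sample.each_line do |line|
  line = line.rstrip
  if line =~ RE_U
    # a "YYYYMMDD:" line starts a new entry
    entries << { date: $1, title: nil, author: nil, link: nil }
    mode = nil
  elsif entries.empty?
    next # still in the preamble
  elsif line =~ RE_T1
    mode = :title
    cur = entries.last
    cur[:title] = $1
    # same scheme as the plugin: date plus MD5 of the first AFFECTS line
    cur[:link] = "file://#{URI_PATH}##{cur[:date]}-#{Digest::MD5.hexdigest($1)}"
  elsif line =~ RE_A
    mode = :author
    entries.last[:author] = $1
  elsif mode == :title && line =~ RE_T2
    entries.last[:title] += ' ' + $1 # indented continuation of AFFECTS
  else
    mode = nil # ordinary content line
  end
end

entries.each { |e| puts e.values.join(' | ') }
```

Hashing the AFFECTS text is what keeps links distinct when several entries share the same date.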
data/lib/bwkfanboy/schema.js
ADDED
@@ -0,0 +1,39 @@
+{
+  "type": "object",
+  "properties": {
+    "channel": {
+      "type": "object",
+      "properties": {
+        "updated": {
+          "type": "string",
+          "format": "date-time"
+        },
+        "id": { "type": "string" },
+        "author": { "type": "string" },
+        "title": { "type": "string" },
+        "link": { "type": "string" },
+        "x_entries_content_type": {
+          "type": "string",
+          "enum": ["text", "html", "xhtml"]
+        }
+      }
+    },
+    "x_entries": {
+      "type": "array",
+      "minItems": 1,
+      "items": {
+        "type": "object",
+        "properties": {
+          "title": { "type": "string" },
+          "link": { "type": "string" },
+          "updated": {
+            "type": "string",
+            "format": "date-time"
+          },
+          "author": { "type": "string" },
+          "content": { "type": "string" }
+        }
+      }
+    }
+  }
+}
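This schema describes the same shape that `Parse#to_json` builds: a `channel` object plus a non-empty `x_entries` array. The sketch below is not a JSON Schema validator. It is a hand-written check, using only the stdlib `json` and `date` libraries, of the constraints the schema states: string types, the `text`/`html`/`xhtml` enum, `minItems: 1`, and `date-time` strings (approximated here with `DateTime.iso8601`, which is more lenient than RFC 3339). The sample document and the helper names `date_time?`, `string_errors` and `schema_errors` are made up for illustration.

```ruby
require 'json'
require 'date'

# Made-up document in the shape Parse#to_json emits (values are illustrative).
doc = {
  channel: {
    updated: '2010-10-04T12:00:00+00:00',
    id: 'http://example.org/news',
    author: 'bwkfanboy',
    title: 'News from example.org',
    link: 'http://example.org/news',
    x_entries_content_type: 'html'
  },
  x_entries: [
    { title: 'Hello', link: 'http://example.org/1',
      updated: '2010-10-04T12:00:00+00:00', author: 'John Doe', content: 'Hi' }
  ]
}

# "date-time" approximated with DateTime.iso8601 (more lenient than RFC 3339).
def date_time?(s)
  DateTime.iso8601(s)
  true
rescue ArgumentError
  false
end

# Keys that are present must hold strings (schema.js marks none as required).
def string_errors(obj, where, keys)
  keys.select { |k| obj.key?(k) && !obj[k].is_a?(String) }
      .map { |k| "#{where}.#{k} is not a string" }
end

# Hand-written checks for the constraints in schema.js; NOT a general validator.
def schema_errors(h)
  errs = []
  ch = h.fetch('channel', {})
  errs += string_errors(ch, 'channel', %w[updated id author title link x_entries_content_type])
  ct = ch['x_entries_content_type']
  errs << 'channel.x_entries_content_type not in enum' if ct && !%w[text html xhtml].include?(ct)
  errs << 'channel.updated is not a date-time' if ch['updated'].is_a?(String) && !date_time?(ch['updated'])
  list = h.fetch('x_entries', nil)
  return errs if list.nil?
  return errs << 'x_entries is not an array' unless list.is_a?(Array)
  errs << 'x_entries has fewer than 1 item (minItems)' if list.empty?
  list.each_with_index do |e, i|
    errs += string_errors(e, "x_entries[#{i}]", %w[title link updated author content])
    errs << "x_entries[#{i}].updated is not a date-time" if e['updated'].is_a?(String) && !date_time?(e['updated'])
  end
  errs
end

parsed = JSON.parse(JSON.generate(doc)) # round-trip: symbol keys become strings
p schema_errors(parsed)
p schema_errors(parsed.merge('x_entries' => []))
```

The round-trip through `JSON.generate`/`JSON.parse` matters: `Parse#to_json` builds the hash with symbol keys, but any consumer of the JSON sees string keys.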
data/lib/bwkfanboy/utils.rb
ADDED
@@ -0,0 +1,134 @@
+require 'optparse'
+require 'logger'
+
+require 'open4'
+require 'active_support/core_ext/module/attribute_accessors'
+
+module Bwkfanboy
+  module Meta
+    NAME = 'bwkfanboy'
+    VERSION = '0.0.1'
+    USER_AGENT = "#{NAME}/#{VERSION} (#{RUBY_PLATFORM}; N; #{Encoding.default_external.name}; #{RUBY_ENGINE}; rv:#{RUBY_VERSION}.#{RUBY_PATCHLEVEL})"
+    PLUGIN_CLASS = 'Page'
+    DIR_TMP = "/tmp/#{Meta::NAME}/#{ENV['USER']}"
+    DIR_LOG = "#{DIR_TMP}/log"
+    LOG_MAXSIZE = 64*1024
+    PLUGIN_NAME = /^[a-zA-Z0-9_-]+$/
+  end
+
+  module Utils
+    mattr_accessor :cfg, :log
+
+    self.cfg = Hash.new()
+    cfg[:verbose] = 0
+    cfg[:log] = "#{Meta::DIR_LOG}/general.log"
+
+    def self.warnx(t)
+      m = File.basename($0) +" warning: "+ t + "\n";
+      $stderr.print(m);
+      log.warn(m.chomp) if log
+    end
+
+    def self.errx(ec, t)
+      m = File.basename($0) +" error: "+ t + "\n"
+      $stderr.print(m);
+      log.error(m.chomp) if log
+      exit(ec)
+    end
+
+    def self.veputs(level, t)
+      if cfg[:verbose] >= level then
+        # p log
+        log.info(t.chomp) if log
+        print(t)
+      end
+    end
+
+    def self.vewarnx(level, t)
+      warnx(t) if cfg[:verbose] >= level
+    end
+
+    # Logs, pidfiles and other temporary stuff go here.
+    def self.dir_tmp_create()
+      if ! File.writable?(Meta::DIR_TMP) then
+        begin
+          t = '/'
+          Meta::DIR_TMP.split('/')[1..-1].each {|i|
+            t += i + '/'
+            Dir.mkdir(t) if ! Dir.exists?(t)
+          }
+        rescue
+          warnx("cannot create/open directory #{Meta::DIR_TMP} for writing")
+        end
+      end
+    end
+
+    def self.log_start()
+      dir_tmp_create()
+      begin
+        Dir.mkdir(Meta::DIR_LOG) if ! File.writable?(Meta::DIR_LOG)
+        log = Logger.new(cfg[:log], 2, Meta::LOG_MAXSIZE)
+      rescue
+        warnx("cannot open log #{cfg[:log]}");
+        return nil
+      end
+      log.level = Logger::DEBUG
+      log.datetime_format = "%H:%M:%S"
+      log.info("#{$0} starting")
+      log
+    end
+    self.log = log_start()
+
+    # Loads (via <tt>require()</tt>) Ruby code from _path_ (the full path to
+    # the file). <em>class_name</em> is the name of the class to check
+    # for existence after the plugin has been loaded successfully.
+    def self.plugin_load(path, class_name)
+      begin
+        require(path)
+        # TODO get rid of eval()
+        fail "class #{class_name} isn't defined" if (! eval("defined?#{class_name}") || ! eval(class_name).is_a?(Class) )
+      rescue LoadError
+        errx(1, "cannot load plugin '#{path}'");
+      rescue Exception
+        errx(1, "plugin '#{path}' has errors: #{$!}\n\nBacktrace:\n\n#{$!.backtrace.join("\n")}")
+      end
+    end
+
+    # Parses command line options. _arr_ is an array of options (usually
+    # +ARGV+). _banner_ is a help string that describes what your
+    # program does.
+    #
+    # If _o_ is non-nil, parses _arr_ with it immediately. Otherwise creates
+    # an +OptionParser+ and returns it unparsed if _simple_ is false, or parses
+    # _arr_ with it if _simple_ is true. See the <tt>bwkfanboy</tt> script for examples.
+    def self.cl_parse(arr, banner, o = nil, simple = false)
+      if ! o then
+        o = OptionParser.new
+        o.banner = banner
+        o.on('-v', 'Be more verbose.') { |i| Bwkfanboy::Utils.cfg[:verbose] += 1 }
+        return o if ! simple
+      end
+
+      begin
+        o.parse!(arr)
+      rescue
+        Bwkfanboy::Utils.errx(1, $!.to_s)
+      end
+    end
+
+    # used in CGI and WEBrick examples
+    def self.cmd_run(cmd)
+      pid, stdin, stdout, stderr = Open4::popen4(cmd)
+      ignored, status = Process::waitpid2(pid)
+      [status.exitstatus, stderr.read, stdout.read]
+    end
+
+    def self.gem_dir_system
+      t = ["#{File.dirname(File.expand_path($0))}/../lib/#{Meta::NAME}",
+           "#{Gem.dir}/gems/#{Meta::NAME}-#{Meta::VERSION}/lib/#{Meta::NAME}"]
+      t.each {|i| return i if File.readable?(i) }
+      raise "both paths are invalid: #{t}"
+    end
+
+  end # utils
+end
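`Utils.cmd_run` depends on the open4 gem and calls `Process.waitpid2` before reading the child's stdout and stderr. If a child writes more than a pipe buffer's worth of output, it can block on that write while the parent blocks waiting for it to exit. Ruby's standard library provides `Open3.capture3`, which reads both streams while waiting for the process. The sketch below (`cmd_run3` is a hypothetical name) keeps `cmd_run`'s `[exit status, stderr, stdout]` return order; it is an alternative under those assumptions, not what bwkfanboy 0.0.1 does.

```ruby
require 'open3'

# Hypothetical stdlib-only variant of Utils.cmd_run. Open3.capture3 runs the
# command (through the shell when given a single string), reads stdout and
# stderr concurrently and waits for it, so a chatty child cannot stall.
def cmd_run3(cmd)
  stdout, stderr, status = Open3.capture3(cmd)
  [status.exitstatus, stderr, stdout]
end

code, err, out = cmd_run3('echo hello; echo oops 1>&2; exit 3')
puts "exit=#{code} stdout=#{out.inspect} stderr=#{err.inspect}"
# exit=3 stdout="hello\n" stderr="oops\n"
```

Keeping the return order unchanged would let existing callers destructure the result the same way.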
data/test/plugins/bwk.rb
ADDED
@@ -0,0 +1,29 @@
+require 'nokogiri'
+
+class Page < Bwkfanboy::Parse
+  module Meta
+    URI = "html/bwk.html"
+    ENC = 'UTF-8'
+    VERSION = 1
+    COPYRIGHT = '(c) 2010 Alexander Gromnitsky'
+    TITLE = "Brian Kernighan's articles from Daily Princetonian"
+    CONTENT_TYPE = 'html'
+  end
+
+  def myparse()
+    url = "http://www.dailyprincetonian.com"
+
+    doc = Nokogiri::HTML(STDIN, nil, Meta::ENC)
+    doc.xpath("//div[@class='article_item']").each {|i|
+      t = clean(i.xpath("h2/a").children.text())
+      fail 'unable to extract link' if (link = clean(i.xpath("h2/a")[0].attributes['href'].value()).empty?)
+      link = clean(i.xpath("h2/a")[0].attributes['href'].value())
+      l = url + link + "print"
+      u = date(i.xpath("h2").children[1].text())
+      a = clean(i.xpath("div/span/a[1]").children.text())
+      c = clean(i.xpath("div[@class='summary']").text())
+
+      self << { title: t, link: l, updated: u, author: a, content: c }
+    }
+  end
+end
data/test/plugins/empty.rb
File without changes
data/test/popen4.sh
ADDED