google 1.0.3 → 1.0.4
- data/LICENSE.md +10 -0
- data/README.md +87 -0
- data/google.gemspec +19 -0
- data/lib/google/display_serp.rb +53 -0
- data/lib/google/grab.rb +26 -0
- data/lib/google/input.rb +51 -0
- data/lib/google/pipe-view.rb +20 -0
- data/lib/google/request.rb +20 -0
- data/lib/google/search.rb +26 -0
- data/lib/google/utils.rb +51 -0
- data/lib/google.rb +8 -8
- data/lib/reverse-markdown/reverse_markdown.rb +297 -0
- data/lib/trollop/lib/trollop.rb +791 -0
- data/lib/trollop/test/test_trollop.rb +1094 -0
- metadata +14 -17
data/LICENSE.md
ADDED
@@ -0,0 +1,10 @@
+[The MIT License](http://www.opensource.org/licenses/mit-license.php)
+=====================================================================
+
+Copyright (c) 2012 Kerrick Long
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,87 @@
+google
+======
+
+## The power of Google Search in your command line.
+
+The CLI displays results from the [Google Web Search API](https://developers.google.com/web-search/), lets you page through the results, and lets you choose a page to view, all without leaving the command line.
+
+[Hosted on RubyGems](https://rubygems.org/gems/google).
+
+## Installation
+
+    gem install google # Requires Ruby :)
+
+## Usage Examples
+
+Typing `google --help` lists all the available options. They are also listed here.
+
+    The google gem is a simple tool to search Google via a CLI.
+    Usage:
+        google [options] "my search query string here"
+    where [options] are:
+        --page, -p <i>: Start by showing the <i>th result page. (Default: 1)
+        --size, -s <i>: Show <i> results on each SERP. Must be between 1 and 8. (Default: 4)
+        --result, -r <i>: Skip the SERP and show the <i>th result.
+        --lucky, -l: I'm feeling lucky! Skip the SERP and show the first result. (Alias to --result 1)
+        --no-readability, -e: Filter the results through readability to get rid of extra content. (Default: true)
+        --no-markdown, -m: Change the results from raw HTML to markdown. (Default: true)
+        --version, -v: Print the version and exit.
+        --help, -h: Show this information and exit.
+
+### I'm Feeling Lucky into Less
+
+    google -l 'how to install nvidia drivers debian squeeze' | less
+
+### Search on a single site
+
+    google 'site:randsinrepose.com NADD'
+
+### Vanity Search
+
+    google '"Kerrick Long" -inurl:kerrick -inurl:kerricklong'
+
+## Features
+
+* You have access to all the [search operators](http://support.google.com/websearch/bin/answer.py?hl=en&answer=136861) you've come to know and love in Google Search.
+
+* If using the `--lucky` or `--result` flags, you can pipe the results into any other Unix command.
+
+* Results are filtered through [Readability](http://www.readability.com/), so you only get the relevant content. (Shoutout to [Ruby Readability](https://github.com/iterationlabs/ruby-readability)!)
+
+* Results are shown in [markdown](http://daringfireball.net/projects/markdown/) for a good balance between legibility and document hierarchy. (Shoutout to [reverse_markdown](https://github.com/xijo/reverse_markdown)!)
+
+* Results pages are formatted to look like a Google SERP, including colors, domains, descriptions, and bold search matches. (Shoutout to [formatador](https://github.com/geemus/formatador)!)
+
+* You can pipe a result into another Unix command! At the prompt, type the number of the result, followed by a pipe as normal. Note that utilities such as `less` and `more`, which need to control the display, don't work.
+
+\> 3 | espeak -a 200 -v en-us
+
+## Supported Ruby Versions
+
+This library has only been tested against Ruby 1.9.3. Other versions might work, but they haven't been tested.
+
+## Acknowledgements
+
+This gem is [Powered by Google](http://www.google.com).
+
+### Copyright
+
+Copyright (c) 2012 Kerrick Long. See [LICENSE](https://github.com/Kerrick/google/blob/master/LICENSE.md) for details.
+
+### Dependencies
+
+* [Ruby Readability](https://github.com/iterationlabs/ruby-readability) (gem)
+
+* [Nokogiri](http://nokogiri.org/)
+
+* [guess_html_encoding](https://github.com/iterationlabs/guess_html_encoding)
+
+* [JSON Implementation for Ruby](http://flori.github.com/json/) (gem)
+
+* [Formatador](https://github.com/geemus/formatador) (gem)
+
+* [HTML Entities for Ruby](http://htmlentities.rubyforge.org/) (gem)
+
+* [Reverse Markdown](https://github.com/xijo/reverse_markdown) (lib)
+
+* [Trollop](http://trollop.rubyforge.org/) (lib)
data/google.gemspec
ADDED
@@ -0,0 +1,19 @@
+Gem::Specification.new do |s|
+  s.name = "google"
+  s.version = "1.0.4"
+  s.executables << 'google'
+  s.add_runtime_dependency "json", ["~> 1"]
+  s.add_runtime_dependency "htmlentities", ["~> 4"]
+  s.add_runtime_dependency "formatador", ["~> 0.2"]
+  s.add_runtime_dependency "ruby-readability", ["~> 0.5"]
+  s.date = "2012-05-23"
+  s.summary = "Google Search on the command line"
+  s.description = "A ruby gem to give you the power of Google Search in your command line."
+  s.authors = ["Kerrick Long"]
+  s.email = "me@kerricklong.com"
+  s.files = ["lib/google.rb"]
+  s.files += %w(LICENSE.md README.md google.gemspec)
+  s.files += Dir.glob("lib/**/*.rb")
+  s.files += Dir.glob("bin/**/*")
+  s.homepage = "http://kerrick.github.com/google/"
+end
data/lib/google/display_serp.rb
ADDED
@@ -0,0 +1,53 @@
+require 'formatador'
+require 'htmlentities'
+
+class Google
+  def display_serp info
+    results = info[:results]
+    query_strings = info[:query_strings]
+    coder = HTMLEntities.new
+    current_page = results['responseData']['cursor']['currentPageIndex']+1
+    max_result = query_strings[:start] + query_strings[:rsz]
+    estimated_results = results['responseData']['cursor']['resultCount']
+    result_array = results['responseData']['results']
+
+    attribution = "\n"
+    attribution << ' ' * (max_result.to_s.length + 2)
+    attribution << "[yellow]Powered by Google[/]"
+    Formatador.display_line attribution
+
+    result_array.each_with_index do | result, i |
+      this_num = (i + query_strings[:start] + 1).to_s
+
+      serp_title = "\n#{' ' * (max_result.to_s.length - this_num.length)}"
+      serp_title << "[bold][blue]#{this_num}. "
+      serp_title << "[normal]#{result["titleNoFormatting"]}[/]\n"
+      serp_url = ' ' * max_result.to_s.length
+      serp_url << "[green]#{result["url"]}[/]\n"
+      serp_desc = ' ' * max_result.to_s.length
+      serp_desc << result["content"].gsub(/<b>/, "[bold]")
+        .gsub(/<\/b>/, "[/]").squeeze(" ")
+
+      Formatador.display_line coder.decode(
+        Utils::wrap(serp_title, :prefix => max_result.to_s.length + 2)
+      )
+      Formatador.display_line coder.decode(
+        Utils::wrap(serp_url, :prefix => max_result.to_s.length + 2)
+      )
+      Formatador.display_line coder.decode(
+        Utils::wrap(serp_desc, :prefix => max_result.to_s.length + 2)
+      )
+    end
+
+    metadata = ''
+    metadata << "\n#{' ' * (max_result.to_s.length + 2)}"
+    metadata << "[yellow]Displaying results "
+    metadata << "#{query_strings[:start] + 1} through "
+    metadata << "#{max_result} of "
+    metadata << "#{estimated_results} "
+    metadata << "(Page #{current_page})"
+    Formatador.display_line metadata
+
+    input info, result_array
+  end
+end
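This standalone sketch illustrates two formatting rules in `display_serp`. First, the API's `<b>` highlights become Formatador `[bold]`/`[/]` markup, and runs of spaces are squeezed to one. Second, result numbers are right-aligned to the width of the largest number on the page. `format_snippet` and `numbered_title` are hypothetical helper names. Entity decoding, which `display_serp` performs with the htmlentities gem, is left out.

```ruby
# Hypothetical helper: <b>...</b> highlights become Formatador markup,
# then runs of spaces collapse to a single space.
def format_snippet(html)
  html.gsub(/<b>/, "[bold]").gsub(/<\/b>/, "[/]").squeeze(" ")
end

# Hypothetical helper: pad the number on the left so that, for example,
# " 9." and "10." line up when the page ends at result 12.
def numbered_title(num, max_result, title)
  padding = " " * (max_result.to_s.length - num.to_s.length)
  "#{padding}#{num}. #{title}"
end

puts format_snippet("Learn <b>Ruby</b>  fast")   # prints "Learn [bold]Ruby[/] fast"
puts numbered_title(9, 12, "First")             # prints " 9. First"
puts numbered_title(10, 12, "Second")           # prints "10. Second"
```

The padding is computed from `max_result` rather than from the actual result count, just as in `display_serp`. As a result, every row on a page shares the same left edge.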
data/lib/google/grab.rb
ADDED
@@ -0,0 +1,26 @@
+require 'open-uri'
+require 'ruby-readability'
+require_relative '../reverse-markdown/reverse_markdown'
+
+class Google
+  def grab url
+    source = open(url).read
+
+    if @opts[:readability]
+      content = Readability::Document.new(source,
+        :tags => %w[div p a pre code h1 h2 h3 h4 h5 h6 blockquote ul ol li],
+        :attributes => %w[href],
+        :remove_empty_nodes => true).content
+    else
+      content = source
+    end
+
+    if @opts[:markdown]
+      output = ReverseMarkdown.new.parse_string(content)
+    else
+      output = content
+    end
+
+    output
+  end
+end
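`grab` passes the fetched page through two optional stages: Readability extraction, then ReverseMarkdown conversion. Each stage is switched on or off by an option. The sketch below keeps that control flow but injects the stages as lambdas, so it runs without either library. `transform` and the two toy regex stages are hypothetical stand-ins, not the real extractors.

```ruby
# Apply the optional stages in the same order as grab:
# extraction first, then markdown conversion.
def transform(source, opts, readability:, markdown:)
  content = opts[:readability] ? readability.call(source) : source
  opts[:markdown] ? markdown.call(content) : content
end

# Toy stand-ins for Readability and ReverseMarkdown (assumptions for the demo).
strip_nav = ->(html) { html.gsub(%r{<nav>.*?</nav>}m, "") }
to_md     = ->(html) { html.gsub(%r{<h1>(.*?)</h1>}, "# \\1") }

page = "<nav>menu</nav><h1>Title</h1>"
puts transform(page, { readability: true, markdown: true },
               readability: strip_nav, markdown: to_md)   # prints "# Title"
```

Because extraction runs first, the markdown stage only ever sees the trimmed content. Turning off both options returns the raw source unchanged, which matches the `else` branches in `grab`.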
data/lib/google/input.rb
ADDED
@@ -0,0 +1,51 @@
+require 'formatador'
+
+class Google
+  def input info, result_array
+    prompt = "\n[yellow]"
+    prompt << "Enter N or P for Next or Previous page, E or Q to quit, "
+    prompt << "or a number to see that result."
+
+    Formatador.display Utils::wrap(prompt)
+
+    Formatador.display "\n[bold]>[/] "
+    choice = STDIN.gets.chomp + ' '
+
+    case
+    when choice[0].downcase == 'e' || choice[0].downcase == 'q'
+      exit
+    when choice[0].downcase == 'n'
+      results = request :q => @query,
+        :rsz => @opts[:size],
+        :start => info[:query_strings][:start] + @opts[:size]
+      display_serp results
+    when choice[0].downcase == 'p'
+      if info[:query_strings][:start] < 1
+        Formatador.display Utils::wrap("[yellow]! Already at page one.")
+        input info, result_array
+      else
+        results = request :q => @query,
+          :rsz => @opts[:size],
+          :start => info[:query_strings][:start] - @opts[:size]
+        display_serp results
+      end
+    when choice[0].match(/\d/)
+      /(\d+)(\s*\|(.*))*/.match(choice) do | str |
+        # result_array holds only this page, so index relative to its first result
+        num = str[1].to_i - 1 - info[:query_strings][:start]
+        if num >= 0 && num < result_array.size
+          if str[3].nil?
+            view result_array[num]['url']
+          else
+            pipe str[3].strip, result_array[num]['url']
+          end
+        else
+          Formatador.display Utils::wrap("[yellow]! Result not on this page.")
+          input info, result_array
+        end
+      end
+    else
+      input info, result_array
+    end
+  end
+end
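The numeric branch of `input` splits an entry such as `3 | espeak -a 200` into a result number and an optional shell command. It then maps that global result number onto the current page's results. The sketch below isolates both steps. `parse_choice` and `page_index` are hypothetical names, and the regex is a simplified variant rather than the gem's own.

```ruby
# Split "N" or "N | command" into [N, command-or-nil]; anything else gives nil.
def parse_choice(choice)
  m = /\A\s*(\d+)\s*(?:\|(.*))?\z/.match(choice.strip)
  return nil unless m
  command = m[2] && m[2].strip
  [m[1].to_i, (command.nil? || command.empty?) ? nil : command]
end

# start is the 0-based offset of the first result shown (display_serp
# numbers rows from start + 1); page_size is how many results the page holds.
def page_index(result_number, start, page_size)
  offset = result_number - 1 - start
  offset >= 0 && offset < page_size ? offset : nil
end

p parse_choice("3 | espeak -a 200 -v en-us")  # => [3, "espeak -a 200 -v en-us"]
p parse_choice("7")                           # => [7, nil]
p page_index(6, 4, 4)                         # => 1 (second result on page two)
p page_index(2, 4, 4)                         # => nil (not on this page)
```

Bounding the index by the number of results actually returned, rather than by the requested page size, also covers a short final page.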
data/lib/google/pipe-view.rb
ADDED
@@ -0,0 +1,20 @@
+class Google
+  def pipe command, url
+    # puts "You want to pipe the data."
+    # puts "URL: #{url}"
+    # puts "Command: #{command}"
+    text = grab(url)
+
+    IO.popen(command, 'w+') do | process |
+      process.write(text)
+      process.close_write
+      puts process.read
+    end
+  end
+
+  def view url
+    # puts "You want to view the data."
+    # puts "URL: #{url}"
+    Formatador.display Utils::wrap(grab(url))
+  end
+end
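`pipe` follows a write-then-read pattern: it writes the page text to the child's stdin, closes that end so the child sees EOF, and then reads back whatever the child printed. Here is a minimal sketch of the same pattern. As an assumption for portability, the child is a small Ruby one-liner instead of a user command such as `espeak`. `pipe_through` is a hypothetical name.

```ruby
require 'rbconfig'

# Write all of text to the child's stdin, close it so the child sees EOF,
# then return everything the child wrote to stdout.
def pipe_through(argv, text)
  IO.popen(argv, 'r+') do |process|
    process.write(text)
    process.close_write
    process.read
  end
end

# Stand-in for a user command: upper-cases whatever arrives on stdin.
upcase = [RbConfig.ruby, '-e', 'print STDIN.read.upcase']
puts pipe_through(upcase, "hello from google\n")   # prints "HELLO FROM GOOGLE"
```

Passing an argv array skips the shell. The gem instead passes the user's command as one string, which Ruby runs through the shell when it contains shell metacharacters. Writing everything before reading works for filters that consume their input first. A filter that streams large output while input is still arriving could fill the pipe buffer and block both processes.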
data/lib/google/request.rb
ADDED
@@ -0,0 +1,20 @@
+require 'open-uri'
+require 'json'
+
+class Google
+  def request query_strings
+    @api_url = "http://ajax.googleapis.com/ajax/services/search/web?v=1.0"
+    @api_url << "&rsz=#{query_strings[:rsz]}"
+    @api_url << "&start=#{query_strings[:start]}"
+    @api_url << "&q=#{query_strings[:q]}"
+
+    results = JSON.parse(open(@api_url).read)
+
+    if results['responseStatus'].to_i != 200
+      Trollop::die Utils::wrap("Google Search API Status " +
+        "#{results['responseStatus']}. Details:\n" +
+        "#{results['responseDetails']}\nTry again in a moment")
+    end
+    {results: results, query_strings: query_strings}
+  end
+end
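`request` builds its URL by string concatenation and relies on the query having been escaped earlier with `URI.escape`, which was removed in Ruby 3.0. The sketch below assembles the same URL with `URI.encode_www_form`, which percent-encodes each value. With this approach, the raw, unescaped query is what you pass in. The endpoint and parameter names are copied from `request.rb`; `api_url` is a hypothetical helper.

```ruby
require 'uri'

# Endpoint copied from request.rb.
API_BASE = "http://ajax.googleapis.com/ajax/services/search/web"

# encode_www_form percent-encodes every key and value (spaces become "+"),
# so q must be the user's raw query, not an already-escaped one.
def api_url(q, rsz, start)
  params = { v: "1.0", rsz: rsz, start: start, q: q }
  "#{API_BASE}?#{URI.encode_www_form(params)}"
end

puts api_url("ruby gems & more", 4, 8)
# http://ajax.googleapis.com/ajax/services/search/web?v=1.0&rsz=4&start=8&q=ruby+gems+%26+more
```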
data/lib/google/search.rb
ADDED
@@ -0,0 +1,26 @@
+require 'uri'
+
+class Google
+
+  def initialize query, opts
+    @query = URI.escape(query)
+    @opts = opts
+
+    @opts[:size] = 8 if opts[:size] > 7
+    @opts[:size] = 1 if opts[:size] <= 1
+  end
+
+  def search
+    if @opts[:result]
+      results = request :q => @query,
+        :rsz => 1,
+        :start => @opts[:result]
+      view results[:results]['responseData']['results'][0]['url']
+    else
+      results = request :q => @query,
+        :rsz => @opts[:size],
+        :start => ((@opts[:page] - 1) * @opts[:size])
+      display_serp results
+    end
+  end
+end
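`Google#initialize` and `Google#search` contain a small piece of paging arithmetic. The page size is clamped to the 1 to 8 range the README documents. A 1-based page number is then converted to the 0-based offset passed as `start` (`display_serp` numbers rows from `start + 1`). The hypothetical helper below reproduces just that calculation.

```ruby
# Clamp the page size to 1..8, then turn a 1-based page number into the
# 0-based offset of that page's first result.
def search_offsets(page, size)
  size = [[size, 1].max, 8].min   # same effect as the two guard clauses
  { rsz: size, start: (page - 1) * size }
end

p search_offsets(1, 4).values    # => [4, 0]
p search_offsets(3, 4).values    # => [4, 8]
p search_offsets(2, 20).values   # => [8, 8]
```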
data/lib/google/utils.rb
ADDED
@@ -0,0 +1,51 @@
+class Utils
+  # From the trollop rubygem
+  # File lib/trollop.rb, line 644
+  def self.wrap_line str, opts={}
+    prefix = opts[:prefix] || 0
+    width = opts[:width] || (self.width - 1)
+    start = 0
+    ret = []
+    until start > str.length
+      nextt =
+        if start + width >= str.length
+          str.length
+        else
+          x = str.rindex(/\s/, start + width)
+          x = str.index(/\s/, start) if x && x < start
+          x || str.length
+        end
+      ret << (ret.empty? ? "" : " " * prefix) + str[start ... nextt]
+      start = nextt + 1
+    end
+    ret
+  end
+
+  # From the trollop rubygem
+  # File lib/trollop.rb, line 505
+  def self.wrap str, opts={} # :nodoc:
+    if str == ""
+      [""]
+    else
+      str.split("\n").map { |s| self.wrap_line s, opts }.flatten(1).join("\n")
+    end
+  end
+
+  # From the trollop rubygem
+  # File lib/trollop.rb, line 489
+  def self.width #:nodoc:
+    @@width ||= if $stdout.tty?
+      begin
+        require 'curses'
+        Curses::init_screen
+        x = Curses::cols
+        Curses::close_screen
+        x
+      rescue Exception
+        80
+      end
+    else
+      80
+    end
+  end
+end
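`Utils.wrap_line`, copied from Trollop, is a greedy word wrapper. Each line ends at the last whitespace within `width` characters. When a word is longer than the width, it falls back to the next whitespace. Continuation lines are indented by `prefix` spaces. The standalone copy below takes `width` as an argument instead of probing the terminal through curses, so its behaviour can be checked directly. That signature change is an assumption made for the example.

```ruby
# Greedy wrap: break at the last whitespace that fits in `width`,
# indenting every line after the first by `prefix` spaces.
def wrap_line(str, width, prefix = 0)
  start = 0
  lines = []
  until start > str.length
    stop =
      if start + width >= str.length
        str.length
      else
        x = str.rindex(/\s/, start + width)          # last break that fits
        x = str.index(/\s/, start) if x && x < start # overlong word: next break
        x || str.length
      end
    lines << (lines.empty? ? "" : " " * prefix) + str[start...stop]
    start = stop + 1                                  # skip the break character
  end
  lines
end

p wrap_line("the quick brown fox jumps", 10, 2)
# => ["the quick", "  brown fox", "  jumps"]
```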
data/lib/google.rb
CHANGED
@@ -1,11 +1,11 @@
-
-
-
-
-
-
-
-
+require_relative 'trollop/lib/trollop'
+require_relative 'google/utils'
+require_relative 'google/search'
+require_relative 'google/request'
+require_relative 'google/display_serp'
+require_relative 'google/input'
+require_relative 'google/grab'
+require_relative 'google/pipe-view'
 
 opts = Trollop::options do
   version "google v1.0.3 (c) 2012 Kerrick Long http://kerrick.github.com/google"
data/lib/reverse-markdown/reverse_markdown.rb
ADDED
@@ -0,0 +1,297 @@
+require 'rexml/document'
+require 'benchmark'
+include REXML
+include Benchmark
+
+# reverse markdown for ruby
+# author: JO
+# e-mail: xijo@gmx.de
+# date: 14.7.2009
+# version: 0.1
+# license: GPL
+# taken from https://github.com/xijo/reverse-markdown/raw/master/reverse_markdown.rb
+
+# TODO
+# - ol numbering is buggy, in fact doesn't matter for markdown code
+# -
+
+class ReverseMarkdown
+
+  # set basic variables:
+  # - @li_counter: numbering list item (li) tags in an ordered list (ol)
+  # - @links: hold the links for adding them to the bottom of the @output
+  #   this means 'reference style', please take a look at http://daringfireball.net/projects/markdown/syntax#link
+  # - @output: fancy markdown code in here!
+  # - @indent: control indentation level for nested lists
+  # - @errors: appearing errors, like unknown tags, go into this array
+  def initialize()
+    @li_counter = 0
+    @links = []
+    @output = ""
+    @indent = 0
+    @errors = []
+  end
+
+  # Invokes the HTML parsing by using a string. Returns the markdown code in @output.
+  # To guarantee well-formed xml for REXML a <root> element will be added, but has no effect.
+  # After parsing all elements, the 'reference style'-links will be inserted.
+  def parse_string(string)
+    doc = Document.new("<root>\n"+string+"\n</root>")
+    parse_element(doc.root, :none)
+    insert_links()
+    @output
+  end
+
+  # Parsing an element and its children (recursive) and writing its markdown code to @output
+  # 1. do indent for nested list items
+  # 2. add the markdown opening tag for this element
+  # 3a. if element only contains text, handle it like a text node
+  # 3b. if element is a container handle its children, which may be text- or element nodes
+  # 4. finally add the markdown ending tag for this element
+  def parse_element(element, parent)
+    name = element.name.to_sym
+    # 1.
+    @output << indent() if name.eql?(:li)
+    # 2.
+    @output << opening(element, parent)
+
+    # 3a.
+    if (element.has_text? and element.children.size < 2)
+      @output << text_node(element, parent)
+    end
+
+    # 3b.
+    if element.has_elements?
+      element.children.each do |child|
+        # increase indent if nested list
+        @indent += 1 if element.name=~/(ul|ol)/ and parent.eql?(:li)
+
+        if child.node_type.eql?(:element)
+          parse_element(child, element.name.to_sym)
+        else
+          if parent.eql?(:blockquote)
+            @output << child.to_s.gsub("\n ", "\n>")
+          else
+            @output << child.to_s
+          end
+        end
+
+        # decrease indent if end of nested list
+        @indent -= 1 if element.name=~/(ul|ol)/ and parent.eql?(:li)
+      end
+    end
+
+    # 4.
+    @output << ending(element, parent)
+  end
+
+  # Returns opening markdown tag for the element. Its parent matters sometimes!
+  def opening(type, parent)
+    case type.name.to_sym
+    when :h1
+      "# "
+    when :li
+      parent.eql?(:ul) ? " - " : " "+(@li_counter+=1).to_s+". "
+    when :ol
+      @li_counter = 0
+      ""
+    when :ul
+      ""
+    when :h2
+      "## "
+    when :h3
+      "### "
+    when :h4
+      "#### "
+    when :h5
+      "##### "
+    when :h6
+      "###### "
+    when :em
+      "*"
+    when :strong
+      "**"
+    when :blockquote
+      # remove leading newline
+      type.children.first.value = ""
+      "> "
+    when :code
+      parent.eql?(:pre) ? "    " : "`"
+    when :a
+      "["
+    when :img
+      "!["
+    when :hr
+      "----------\n\n"
+    when :root
+      ""
+    else
+      @errors << "unknown start tag: "+type.name.to_s
+      ""
+    end
+  end
+
+  # Returns the closing markdown tag, like opening()
+  def ending(type, parent)
+    case type.name.to_sym
+    when :h1
+      " #\n\n"
+    when :h2
+      " ##\n\n"
+    when :h3
+      " ###\n\n"
+    when :h4
+      " ####\n\n"
+    when :h5
+      " #####\n\n"
+    when :h6
+      " ######\n\n"
+    when :p
+      parent.eql?(:root) ? "\n\n" : "\n"
+    when :ol
+      parent.eql?(:li) ? "" : "\n"
+    when :ul
+      parent.eql?(:li) ? "" : "\n"
+    when :em
+      "*"
+    when :strong
+      "**"
+    when :li
+      ""
+    when :blockquote
+      ""
+    when :code
+      parent.eql?(:pre) ? "" : "`"
+    when :a
+      @links << type.attribute('href').to_s
+      "][" + @links.size.to_s + "] "
+    when :img
+      @links << type.attribute('src').to_s
+      "" + type.attribute('alt').to_s + "][" + @links.size.to_s + "] "
+      "#{type.attribute('alt')}][#{@links.size}] "
+    when :root
+      ""
+    else
+      @errors << " unknown end tag: "+type.name.to_s
+      ""
+    end
+  end
+
+  # Perform indent: two spaces, @indent times - quite simple! :)
+  def indent
+    str = ""
+    @indent.times do
+      str << "  "
+    end
+    str
+  end
+
+  # Return the content of element, which should be just text.
+  # If it's a code block, indent it by 4 spaces.
+  # For block quotation add a leading '>'
+  def text_node(element, parent)
+    if element.name.to_sym.eql?(:code) and parent.eql?(:pre)
+      element.text.gsub("\n","\n    ") << "\n"
+    elsif parent.eql?(:blockquote)
+      element.text.gsub("\n ","\n>") # gsub, not gsub!: gsub! returns nil when nothing matches
+    else
+      element.text
+    end
+  end
+
+  # Insert the mentioned reference style links.
+  def insert_links
+    @output << "\n"
+    @links.each_index do |index|
+      @output << " [#{index+1}]: #{@links[index]}\n"
+    end
+  end
+
+  # Print out all errors that occurred and have been written to @errors.
+  def print_errors
+    @errors.each do |error|
+      puts error
+    end
+  end
+
+  # Perform a benchmark on a given string n-times.
+  def speed_benchmark(string, n)
+    initialize()
+    bm(15) do |test|
+      test.report("reverse markdown:") { n.times do; parse_string(string); initialize(); end; }
+    end
+  end
+
+end
+
+if __FILE__ == $0
+
+  # Example HTML Code for parsing
+  example = <<-EOF
+This text, though not within an element, should also be shown.
+
+<h2>heading 1.1</h2>
+
+<p>text *italic* and **bold**.</p>
+
+<pre><code>text *italic* and **bold**.
+sdfsdff
+sdfsd
+sdf sdfsdf
+</code></pre>
+
+<blockquote>
+<p>text <em>italic</em> and <strong>bold</strong>. sdfsdff
+sdfsd sdf sdfsdf</p>
+</blockquote>
+
+<p>asdasd <code>sdfsdfsdf</code> asdad <a href="http://www.bla.de">link text</a></p>
+
+<p><a href="http://www.bla.de">link <strong>text</strong></a></p>
+
+<ol>
+<li>List item</li>
+<li>List <em>item</em>
+<ol><li>List item</li>
+<li>dsfdsf
+<ul><li>dfwe</li>
+<li>dsfsdfsdf</li></ul></li>
+<li>lidsf <img src="http://www.dfgdfg.de/dsf.jpe" alt="item" title="" /></li></ol></li>
+<li>sdfsdfsdf
+<ul><li>sdfsdfsdf</li>
+<li>sdfsdfsdf <strong>sdfsdf</strong></li></ul></li>
+</ol>
+
+<blockquote>
+<p>Lorem ipsum dolor sit amet, consetetur
+voluptua. At vero eos et accusam et
+justo duo dolores et ea rebum. Stet
+clita kasd gubergren, no sea takimata
+sanctus est Lorem ipsum dolor sit
+amet. <em>italic</em></p>
+</blockquote>
+
+<hr />
+
+<blockquote>
+<p>Lorem ipsum dolor sit amet, consetetur
+sadipscing elitr, sed diam nonumy
+eirmod tempor invidunt ut labore et
+dolore magna aliquyam erat, sed</p>
+</blockquote>
+
+This should also be shown, even if it's not wrapped in an element.
+
+<p>nur ein text! nur eine maschine!</p>
+
+This text should not be invisible!
+  EOF
+
+  r = ReverseMarkdown.new
+
+  puts r.parse_string(example)
+
+  #r.print_errors
+
+  #r.speed_benchmark(example, 100)
+end
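ReverseMarkdown converts each `<a>` into a reference-style link in three steps: `opening` emits `[`, `ending` records the `href` and emits `][n] `, and `insert_links` appends the numbered definitions after the body. The sketch below isolates that technique with REXML and handles only `<p>` and `<a>`. `links_to_markdown` is a hypothetical name, and its output spacing is simplified rather than identical to the library's.

```ruby
require 'rexml/document'

# Collect each <a href> while emitting "[text][n]", then append the
# numbered "[n]: url" definitions after the paragraphs.
def links_to_markdown(html)
  doc = REXML::Document.new("<root>#{html}</root>")
  links = []
  paragraphs = doc.root.elements.to_a('p').map do |para|
    para.children.map do |node|
      if node.is_a?(REXML::Element) && node.name == 'a'
        links << node.attributes['href']
        "[#{node.text}][#{links.size}]"
      else
        node.to_s   # plain text (or other markup) passes through unchanged
      end
    end.join
  end
  refs = links.each_with_index.map { |href, i| "[#{i + 1}]: #{href}" }
  "#{paragraphs.join("\n\n")}\n\n#{refs.join("\n")}\n"
end

puts links_to_markdown('<p>See <a href="http://a.example">docs</a> and ' \
                       '<a href="http://b.example">code</a>.</p>')
# See [docs][1] and [code][2].
#
# [1]: http://a.example
# [2]: http://b.example
```

Numbering follows document order, because each `href` is pushed when its closing tag is reached. This mirrors how `ending` appends to `@links`.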