mercury_parser 0.0.1
- checksums.yaml +7 -0
- checksums.yaml.gz.sig +2 -0
- data.tar.gz.sig +2 -0
- data/.gitignore +10 -0
- data/.rspec +2 -0
- data/.travis.yml +12 -0
- data/.yardopts +1 -0
- data/Gemfile +17 -0
- data/README.md +87 -0
- data/Rakefile +11 -0
- data/lib/mercury_parser.rb +25 -0
- data/lib/mercury_parser/api/content.rb +21 -0
- data/lib/mercury_parser/article.rb +12 -0
- data/lib/mercury_parser/client.rb +21 -0
- data/lib/mercury_parser/configuration.rb +37 -0
- data/lib/mercury_parser/connection.rb +39 -0
- data/lib/mercury_parser/error.rb +60 -0
- data/lib/mercury_parser/request.rb +38 -0
- data/lib/mercury_parser/version.rb +3 -0
- data/mercury_parser.gemspec +30 -0
- data/spec/mercury_parser/api/content_spec.rb +4 -0
- data/spec/mercury_parser/article_spec.rb +4 -0
- data/spec/mercury_parser/client_spec.rb +62 -0
- data/spec/mercury_parser/error_spec.rb +7 -0
- data/spec/mercury_parser_spec.rb +14 -0
- data/spec/spec_helper.rb +10 -0
- metadata +166 -0
- metadata.gz.sig +0 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
---
SHA1:
  metadata.gz: 5cf8578d07811493b1de6baad3ddda00253d7185
  data.tar.gz: bc312548d3bb45dc59f7d947df13f903305646f3
SHA512:
  metadata.gz: 91875c69c130eb3ff1df8fb94a757c55ca3c6115f1c7a0c37430cc5f3200d8e008775bd4d1d68b4fe2129e1f320196320eba71ea60da6bca2d07bc9107b97fdc
  data.tar.gz: f258abab55bb8663208cb7b9f4b4a04cc08dd66a9dc98523ec2924f665d9da2b8d663b39a3c67fe151545843dd555fc611f6b47ae7924865cbb206d4f642f8b6
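checksums.yaml records SHA1 and SHA512 digests for metadata.gz and data.tar.gz. As a minimal sketch of how such an entry could be checked by hand, the snippet below compares a SHA512 hex digest with an expected value using Ruby's standard Digest library. The helper name `checksum_matches?` and the sample bytes are illustrative assumptions; in practice the bytes would come from something like `File.binread('data.tar.gz')`.

```ruby
require 'digest'

# Hypothetical helper (not part of the gem): true when the SHA512 hex digest
# of the given bytes equals the expected value from checksums.yaml.
def checksum_matches?(bytes, expected_hex)
  Digest::SHA512.hexdigest(bytes) == expected_hex.downcase
end

# Stand-in data so the sketch runs without the real archive.
sample_bytes = "example archive bytes"
expected     = Digest::SHA512.hexdigest(sample_bytes)

puts checksum_matches?(sample_bytes, expected)
puts checksum_matches?(sample_bytes + "tampered", expected)
```

A SHA512 hex digest is 128 characters long, which matches the length of the two SHA512 entries listed above.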
checksums.yaml.gz.sig
ADDED
data.tar.gz.sig
ADDED
data/.gitignore
ADDED
data/.rspec
ADDED
data/.travis.yml
ADDED
data/.yardopts
ADDED
@@ -0,0 +1 @@
--markup markdown
data/Gemfile
ADDED
data/README.md
ADDED
@@ -0,0 +1,87 @@
# Mercury Parser
A tiny Ruby wrapper for [Mercury's Web Parser](https://mercury.postlight.com/web-parser/)

[![Gem Version](https://badge.fury.io/rb/mercury_parser.png)](http://badge.fury.io/rb/mercury_parser)
[![Code Climate](https://codeclimate.com/github/moisesnarvaez/mercury_parser.png)](https://codeclimate.com/github/moisesnarvaez/mercury_parser)
[![Dependency Status](https://gemnasium.com/moisesnarvaez/mercury_parser.png)](https://gemnasium.com/moisesnarvaez/mercury_parser)
[![Build Status](https://travis-ci.org/moisesnarvaez/mercury_parser.png)](https://travis-ci.org/moisesnarvaez/mercury_parser)

## Installation
Add this line to your application's Gemfile:

    gem 'mercury_parser'

And then execute:

    bundle install

## Configuration

Set the API key:

```ruby
MercuryParser.api_key = MERCURY_API_KEY
```

Make sure to set `MERCURY_API_KEY` in your environment variables. You can get an API key by contacting Mercury's team directly; more information is available on their [web parser page](https://mercury.postlight.com/web-parser/).

Multiple tokens or multithreaded usage:

```ruby
client = MercuryParser::Client.new(api_key: MERCURY_API_KEY)
```

## Usage

### Parse

Parse a webpage and return its main content:

```ruby
article = MercuryParser.parse("https://trackchanges.postlight.com/building-awesome-cms-f034344d8ed")
=> #<MercuryParser::Article title="Building Awesome CMS", content="<div><div class=\"section-content\"><div class=\"section-inner sectionLayout--insetColumn\"><figure id=\"1b95\" class=\"graf graf--figure graf-after--h3\"><div class=\"aspectRatioPlaceholder is-locked\"><img class=\"graf-image\" src=\"https://d262ilb51hltx0.cloudfront.net/max/800/1*zo51eqdjJ_XSU0D8Vm8P9A.png\"></div></figure><p id=\"c21b\" class=\"graf graf--p graf-after--figure\"><a href=\"https://github.com/postlight/awesome-cms\" class=\"markup--anchor markup--p-anchor\">Awesome CMS</a> is…an awesome list of awesome CMSes. It’s on GitHub, so anyone can add to it via a pull request. Here are some notes on how and why it came to be.</p><p id=\"2a96\" class=\"graf graf--p graf-after--h3\">GitHub has a <a href=\"https://help.github.com/articles/search-syntax/\" class=\"markup--anchor markup--p-anchor\">set of powerful commands</a> for narrowing search results. In seeking out modern content management tools, I used queries like this:</p><p id=\"5c79\" class=\"graf graf--p graf-after--p\"><a href=\"https://github.com/search?o=desc&q=cms+OR+%22content+management%22+OR+admin+pushed%3A%3E2016-01-01+stars%3A%3E50&ref=searchresults&s=stars&type=Repositories&utf8=✓\" class=\"markup--anchor markup--p-anchor\">cms OR “content management” OR admin pushed:>2016–01–01 stars:>50</a></p><p id=\"7d38\" class=\"graf graf--p graf-after--p\">Sorting by stars, I worked my way backwards. I was able to quickly spot relevant CMS projects. I also started to notice some trends.</p><ul class=\"postList\"><li id=\"8671\" class=\"graf graf--li graf-after--p\">Modern and popular content management systems are written in PHP, JavaScript, Python, and Ruby. There are also a few content management systems written in .NET (C#), but they are much less popular on GitHub.</li><li id=\"a406\" class=\"graf graf--li graf-after--li\">Headless content management systems are gaining popularity. 
Simply presenting the UI for users to edit content, and relying on the end user to create the user-facing site by ingesting the API. <a href=\"http://getdirectus.com/\" class=\"markup--anchor markup--li-anchor\">Directus</a> and <a href=\"https://www.cloudcms.com/\" class=\"markup--anchor markup--li-anchor\">Cloud CMS</a> are headless CMS options.</li><li id=\"e133\" class=\"graf graf--li graf-after--li\">Static content management systems don’t host pages for you. Instead they help generate your CMS, using static files. <a href=\"https://github.com/netlify/netlify-cms\" class=\"markup--anchor markup--li-anchor\">Netlify CMS</a>, <a href=\"https://respondcms.com/\" class=\"markup--anchor markup--li-anchor\">Respond CMS</a>, and <a href=\"https://www.getlektor.com/\" class=\"markup--anchor markup--li-anchor\">Lektor</a> are a few of the options in the static CMS space.</li></ul><p id=\"3bfc\" class=\"graf graf--p graf-after--h3\">I knew the list of all popular content management systems would be huge. I didn’t want to put that data into Markdown directly, as it would be difficult to maintain and to augment with extra data (stars on GitHub, last push date, tags, etc).</p><p id=\"4bcb\" class=\"graf graf--p graf-after--p\">Instead, I opted to store the data in <a href=\"https://github.com/toml-lang/toml\" class=\"markup--anchor markup--p-anchor\">TOML</a>, a human-friendly configuration file language. You can view all of the data that powers Awesome CMS in the <a href=\"https://github.com/postlight/awesome-cms/tree/97216ef432963d4dfb2238340e2ebf9a4127fb1e/data\" class=\"markup--anchor markup--p-anchor\">data folder</a>. 
Here’s WordPress’ entry in that file:</p><pre id=\"4771\" class=\"graf graf--pre graf-after--p\">[[cms]]<br>name = "WordPress"<br>description = "WordPress is a free and open-source content management system (CMS) based on PHP and MySQL."<br>url = "https://wordpress.org"<br>github_repo = "WordPress/WordPress"<br>awesome_repo = "miziomon/awesome-wordpress"<br>language = "php"</pre><p id=\"4703\" class=\"graf graf--p graf-after--pre\">I process this file using JavaScript in <a href=\"https://github.com/postlight/awesome-cms/blob/97216ef432963d4dfb2238340e2ebf9a4127fb1e/scripts/generateReadme.js\" class=\"markup--anchor markup--p-anchor\">generateReadme.js</a>. It handles processing the TOML, fetching information from GitHub, and generating the final README.md file using the <a href=\"https://github.com/postlight/awesome-cms/blob/master/README.md.hbs\" class=\"markup--anchor markup--p-anchor\">Handlebars template</a>. I’m scraping GitHub for star counts because GitHub’s API only allows for 60 requests an hour for authenticated users. We want to make it as easy as possible for anyone to contribute. 
Requiring users to generate a GitHub authentication token to generate the README wasn’t an option.</p><p id=\"73aa\" class=\"graf graf--p graf-after--p\">By storing the data in TOML at generating the README.md using JavaScript, I’ve essentially created an incredibly light-weight, GitHub backed, static CMS to power Awesome CMS.</p><figure id=\"7c3e\" class=\"graf graf--figure graf-after--p graf--last\"><div class=\"aspectRatioPlaceholder is-locked\"><img class=\"graf-image\" src=\"https://d262ilb51hltx0.cloudfront.net/max/800/1*Y69yr0JgwOaLzACB0ZXDGw.gif\"></div><figcaption class=\"imageCaption\">I heard you like content management systems</figcaption></figure></div></div></div>", author="Jeremy Mack", date_published="2016-10-03T12:48:58.385Z", lead_image_url="https://d262ilb51hltx0.cloudfront.net/max/1200/1*zo51eqdjJ_XSU0D8Vm8P9A.png", dek=nil, next_page_url=nil, url="https://trackchanges.postlight.com/building-awesome-cms-f034344d8ed", domain="trackchanges.postlight.com", excerpt="Awesome CMS is…an awesome list of awesome CMSes. It’s on GitHub, so anyone can add to it via a pull request.", word_count=397, direction="ltr", total_pages=1, rendered_pages=1>

article.title
article.content
article.author
article.date_published
article.lead_image_url
article.dek
article.next_page_url
article.url
article.domain
article.excerpt
article.word_count
article.direction
article.total_pages
article.rendered_pages
```

## Contributing

1. Fork it
2. [Create a topic branch](http://learn.github.com/p/branching.html)
3. Add specs for your unimplemented modifications
4. Run `bundle exec rspec`. If specs pass, return to step 3.
5. Implement your modifications
6. Run `bundle exec rspec`. If specs fail, return to step 5.
7. Commit your changes and push
8. [Submit a pull request](http://help.github.com/send-pull-requests/)

## Inspiration
Based on: [ReadabilityParserGem](https://github.com/phildionne/readability_parser)

## Author
[Moises Narvaez](http://www.moisesnarvaez.com)

## Copyright
Copyright (c) 2016 Moises Narvaez

## License
MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
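The README's configuration snippets refer to a bare `MERCURY_API_KEY`, while the accompanying text says the key should live in an environment variable. A minimal sketch of reading it from the environment follows. The helper name `mercury_api_key` is hypothetical and not part of the gem, and the final assignment is shown only as a comment because it needs the gem loaded.

```ruby
# Hypothetical helper (not part of mercury_parser): read the key from the
# environment and fail early with a clear message when it is missing.
def mercury_api_key(env = ENV)
  env.fetch('MERCURY_API_KEY') do
    raise KeyError, 'MERCURY_API_KEY is not set'
  end
end

# With the gem loaded, the key could then be assigned as the README describes:
#   MercuryParser.api_key = mercury_api_key

# A plain Hash stands in for ENV so the sketch runs anywhere.
puts mercury_api_key({ 'MERCURY_API_KEY' => 'demo-key' })
```

Failing at lookup time keeps a missing key from surfacing later as a less obvious error on the first request.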
data/Rakefile
ADDED
data/lib/mercury_parser.rb
ADDED
@@ -0,0 +1,25 @@
require 'mercury_parser/configuration'
require 'mercury_parser/client'

module MercuryParser
  extend Configuration

  class << self
    # Alias for MercuryParser::Client.new
    #
    # @return [MercuryParser::Client]
    def new(options = {})
      MercuryParser::Client.new(options)
    end

    # Delegate to MercuryParser::Client
    def method_missing(method, *args, &block)
      return super unless new.respond_to?(method)
      new.send(method, *args, &block)
    end

    def respond_to?(method, include_private = false)
      new.respond_to?(method, include_private) || super(method, include_private)
    end
  end
end # MercuryParser
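The module above forwards unknown module-level calls to a freshly built client through `method_missing`. The sketch below reproduces that delegation pattern with stand-in names (`DemoClient` and `DemoParser` are hypothetical, not the gem's classes) so it runs without the gem. One deliberate difference: it defines `respond_to_missing?` instead of overriding `respond_to?`, which is the hook Ruby provides for keeping `respond_to?` in step with `method_missing`.

```ruby
# Stand-in for MercuryParser::Client; only the delegation pattern is shown.
class DemoClient
  def initialize(options = {})
    @options = options
  end

  def parse(url)
    "parsed #{url}"
  end
end

module DemoParser
  class << self
    # Build a new client, mirroring the module-level `new` alias.
    def new(options = {})
      DemoClient.new(options)
    end

    # Forward unknown module-level calls to a fresh client instance.
    def method_missing(method, *args, &block)
      return super unless new.respond_to?(method)
      new.public_send(method, *args, &block)
    end

    # Keep respond_to? consistent with what method_missing will accept.
    def respond_to_missing?(method, include_private = false)
      new.respond_to?(method, include_private) || super
    end
  end
end

puts DemoParser.parse("https://example.com/post")
```

Each delegated call builds a new client, so no state is shared between calls beyond the module-level configuration.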
data/lib/mercury_parser/api/content.rb
ADDED
@@ -0,0 +1,21 @@
module MercuryParser
  module API
    module Content

      # Parse a webpage and return its main content
      # Returns a MercuryParser::Article object
      #
      # Optionally pass the ID of an article as `id => "id"` in `options` to return the content for a specific DOM node
      # You can also pass a `max_pages` integer to set the maximum number of pages to parse and combine. Default is 25.
      #
      # @param url [String] The URL of an article to return the content for
      # @return [MercuryParser::Article]
      def parse(url, options = {})
        params = { url: url }
        response = get('', params.merge(options))

        MercuryParser::Article.new(response)
      end
    end # Content
  end # API
end
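`parse` merges the article URL with any extra options before issuing the GET request. The gem leaves query encoding to its HTTP layer, so the following is only a sketch of what the merged parameters could look like once serialised, using the standard library's `URI.encode_www_form`. The helper name `build_query` is hypothetical.

```ruby
require 'uri'

# Hypothetical helper mirroring `params.merge(options)` in `parse`:
# the article URL is always sent, and extra options are merged on top.
def build_query(url, options = {})
  params = { url: url }
  URI.encode_www_form(params.merge(options))
end

query = build_query("https://example.com/a b", max_pages: 2)
puts query

# Decoding recovers the original pairs (all values come back as strings).
puts URI.decode_www_form(query).inspect
```

Because `Hash#merge` lets the argument win on duplicate keys, an option named `url` would override the positional URL.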
data/lib/mercury_parser/client.rb
ADDED
@@ -0,0 +1,21 @@
require 'mercury_parser/connection'
require 'mercury_parser/request'
require 'mercury_parser/api/content'
require 'mercury_parser/article'

module MercuryParser
  class Client
    attr_accessor *Configuration::VALID_CONFIG_KEYS

    def initialize(options = {})
      options = MercuryParser.options.merge(options)
      Configuration::VALID_OPTIONS_KEYS.each do |key|
        send("#{key}=", options[key])
      end
    end

    include MercuryParser::Connection
    include MercuryParser::Request
    include MercuryParser::API::Content
  end # Client
end
data/lib/mercury_parser/configuration.rb
ADDED
@@ -0,0 +1,37 @@
require 'mercury_parser/version'

module MercuryParser
  module Configuration
    VALID_CONNECTION_KEYS = [:api_endpoint, :user_agent].freeze
    VALID_OPTIONS_KEYS = [:api_key].freeze
    VALID_CONFIG_KEYS = VALID_CONNECTION_KEYS + VALID_OPTIONS_KEYS

    DEFAULT_API_ENDPOINT = "https://mercury.postlight.com/parser"
    DEFAULT_USER_AGENT = "MercuryParser Ruby Gem #{MercuryParser::VERSION}".freeze
    DEFAULT_API_TOKEN = nil

    attr_accessor *VALID_CONFIG_KEYS

    def self.extended(base)
      base.reset!
    end

    # Convenience method to allow configuration options to be set in a block
    def configure
      yield self
    end

    def options
      Hash[ * VALID_CONFIG_KEYS.map { |key| [key, send(key)] }.flatten ]
    end

    def reset!
      self.api_endpoint = DEFAULT_API_ENDPOINT
      self.user_agent = DEFAULT_USER_AGENT

      self.api_key = DEFAULT_API_TOKEN

      return true
    end
  end # Configuration
end
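The configuration module relies on the `extended` hook: extending a module with it installs the accessors as module-level methods and immediately applies the defaults through `reset!`. The sketch below shows the same pattern in isolation. `DemoConfiguration`, `DemoService`, and the `example.invalid` endpoint are placeholders, not names from the gem.

```ruby
module DemoConfiguration
  KEYS = [:api_endpoint, :api_key].freeze
  DEFAULT_API_ENDPOINT = "https://example.invalid/parser".freeze

  attr_accessor(*KEYS)

  # Called by Ruby when another object extends this module.
  def self.extended(base)
    base.reset!
  end

  # Yield the configurable object so options can be set in a block.
  def configure
    yield self
  end

  # Snapshot of the current settings as a Hash.
  def options
    KEYS.map { |key| [key, public_send(key)] }.to_h
  end

  # Restore every setting to its default.
  def reset!
    self.api_endpoint = DEFAULT_API_ENDPOINT
    self.api_key = nil
    true
  end
end

module DemoService
  extend DemoConfiguration
end

DemoService.configure { |config| config.api_key = "1234" }
puts DemoService.options.inspect
```

Building the Hash with `map { ... }.to_h` avoids the `flatten` step used in the original `options`, which would misbehave if a setting ever held an Array value.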
data/lib/mercury_parser/connection.rb
ADDED
@@ -0,0 +1,39 @@
require 'faraday'
require 'faraday_middleware'

module MercuryParser
  module Connection

    # Instantiate a Faraday::Connection
    # @private
    private

    # Returns a Faraday::Connection object
    #
    # @return [Faraday::Connection]
    def connection(options = {})
      options = {
        :url => MercuryParser.api_endpoint
      }.merge(options)

      connection = Faraday.new(options) do |c|
        # encode request params as "www-form-urlencoded"
        c.use Faraday::Request::UrlEncoded

        c.use FaradayMiddleware::FollowRedirects, limit: 3

        # raise exceptions on 40x, 50x responses
        c.use Faraday::Response::RaiseError

        c.response :xml, :content_type => /\bxml$/
        c.response :json, :content_type => /\bjson$/

        c.adapter Faraday.default_adapter
      end

      connection.headers[:user_agent] = MercuryParser.user_agent

      connection
    end
  end # Connection
end
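The connection registers its XML and JSON response parsers with the patterns `/\bxml$/` and `/\bjson$/`. The sketch below only illustrates which media-type strings those two patterns accept. The helper `parser_for` is hypothetical, and stripping the `; charset=...` parameters before matching is an assumption made here, since how the middleware normalises the header is not shown in this diff.

```ruby
JSON_TYPE = /\bjson$/
XML_TYPE  = /\bxml$/

# Hypothetical helper: drop any media-type parameters (an assumption, see
# above) and report which of the two patterns matches.
def parser_for(content_type)
  media_type = content_type.to_s.split(';', 2).first.to_s.strip
  case media_type
  when JSON_TYPE then :json
  when XML_TYPE  then :xml
  else :none
  end
end

[
  'application/json',
  'application/vnd.api+json; charset=utf-8',
  'text/xml',
  'text/html'
].each do |type|
  puts "#{type} -> #{parser_for(type)}"
end
```

The `\b` anchor means suffixed types such as `application/vnd.api+json` still match, while a type that merely ends in the letters "json" without a word boundary does not.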
data/lib/mercury_parser/error.rb
ADDED
@@ -0,0 +1,60 @@
require 'multi_json'

module MercuryParser
  class Error < StandardError

    # Raised when Mercury returns a 4xx or 500 HTTP status code
    class ClientError < Error

      # Creates a new error from an HTTP environment
      #
      # @param response [Hash]
      # @return [MercuryParser::Error::ClientError]
      def initialize(error = nil)
        parsed_error = parse_error(error)
        http_error = error.response[:status].to_i

        if ERROR_MAP.has_key?(http_error)
          raise ERROR_MAP[http_error].new(parsed_error[:messages])
        else
          super
        end
      end


      private

      def parse_error(error)
        MultiJson.load(error.response[:body], :symbolize_keys => true)
      end
    end # ClientError

    class ConfigurationError < MercuryParser::Error; end

    # Raised when there's an error in Faraday
    class RequestError < MercuryParser::Error; end

    # Raised when MercuryParser returns a 400 HTTP status code
    class BadRequest < MercuryParser::Error; end

    # Raised when MercuryParser returns a 401 HTTP status code
    class UnauthorizedRequest < MercuryParser::Error; end

    # Raised when MercuryParser returns a 403 HTTP status code
    class Forbidden < MercuryParser::Error; end

    # Raised when MercuryParser returns a 404 HTTP status code
    class NotFound < MercuryParser::Error; end

    # Raised when MercuryParser returns a 500 HTTP status code
    class InternalServerError < MercuryParser::Error; end

    ERROR_MAP = {
      400 => MercuryParser::Error::BadRequest,
      401 => MercuryParser::Error::UnauthorizedRequest,
      403 => MercuryParser::Error::Forbidden,
      404 => MercuryParser::Error::NotFound,
      500 => MercuryParser::Error::InternalServerError
    }
  end # Error
end
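`ERROR_MAP` ties HTTP status codes to specific error classes, and `ClientError#initialize` raises the mapped class from inside the constructor. The sketch below shows the same lookup as a small factory method instead, which is a different design from the file above and is offered only as an illustration. All names (`DemoError`, `for_status`) are hypothetical.

```ruby
module DemoError
  class Base < StandardError; end
  class BadRequest < Base; end
  class NotFound < Base; end
  class InternalServerError < Base; end

  ERROR_MAP = {
    400 => BadRequest,
    404 => NotFound,
    500 => InternalServerError
  }.freeze

  # Build the error mapped to the status, falling back to Base for
  # statuses that have no dedicated class.
  def self.for_status(status, message = nil)
    ERROR_MAP.fetch(status.to_i, Base).new(message)
  end
end

error = DemoError.for_status("404", "no such article")
puts error.class
puts error.message
```

Returning the error object and letting the caller `raise` it keeps the constructor free of control flow, at the cost of one extra call at the rescue site.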
data/lib/mercury_parser/request.rb
ADDED
@@ -0,0 +1,38 @@
require 'mercury_parser/error'

module MercuryParser
  module Request

    # Performs a HTTP Get request
    def get(path, params={})
      request(:get, path, params)
    end


    private

    # Returns a Faraday::Response object
    #
    # @return [Faraday::Response]
    def request(method, path, params = {})
      raise MercuryParser::Error::ConfigurationError.new("Please configure MercuryParser.api_key first") if api_key.nil?

      connection_options = {}
      begin
        response = connection(connection_options).send(method) do |req|
          req.url(path, params)
          req.headers['Content-Type'] = 'application/json'
          req.headers['x-api-key'] = api_key
        end
      rescue Faraday::Error::ClientError => error
        if error.is_a?(Faraday::Error::ClientError)
          raise MercuryParser::Error::ClientError.new(error)
        else
          raise MercuryParser::Error::RequestError.new(error)
        end
      end

      response.body
    end
  end # Request
end
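`request` refuses to run without an API key, puts the parameters in the query string, and sends the key in an `x-api-key` header. As a rough sketch of the resulting HTTP request, the snippet below builds (but never sends) an equivalent request with the standard library's Net::HTTP rather than Faraday. The helper name `build_parser_request` is hypothetical, and the endpoint is the default from configuration.rb.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: assemble the GET request without sending it.
def build_parser_request(endpoint, api_key, article_url)
  raise ArgumentError, "api_key is required" if api_key.nil?

  uri = URI(endpoint)
  uri.query = URI.encode_www_form(url: article_url)

  request = Net::HTTP::Get.new(uri)
  request['Content-Type'] = 'application/json'
  request['x-api-key'] = api_key
  request
end

request = build_parser_request(
  "https://mercury.postlight.com/parser",
  "1234",
  "https://example.com/post"
)
puts request.path
puts request['x-api-key']
```

Nothing here touches the network, so the sketch is safe to run without a real key.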
data/mercury_parser.gemspec
ADDED
@@ -0,0 +1,30 @@
# -*- encoding: utf-8 -*-
lib = File.expand_path('../lib', __FILE__)
$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)

require 'mercury_parser/version'

Gem::Specification.new do |gem|
  gem.name = "mercury_parser"
  gem.version = MercuryParser::VERSION
  gem.authors = ["Moises Narvaez"]
  gem.email = ["MoisesNarvaez@gmail.com"]
  gem.description = %q{A tiny ruby wrapper for Mercury's content parser api}
  gem.summary = %q{Interact with the article parsing featureset of Mercury. This means grabbing an article's content based on a URL.}
  gem.homepage = "https://github.com/moisesnarvaez/mercury_parser"
  gem.licenses = "MIT"

  gem.files = `git ls-files`.split($/)
  gem.executables = gem.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
  gem.test_files = gem.files.grep(%r{^(test|spec|features)/})
  gem.require_paths = ["lib"]

  gem.cert_chain = ['certs/gem-public_cert.pem']
  gem.signing_key = File.expand_path("~/.gem/gem-private_key.pem") if $0 =~ /gem\z/

  gem.add_dependency "faraday", "~> 0.9"
  gem.add_dependency "faraday_middleware", "~> 0.9"
  gem.add_dependency "hashie", "~> 3.2"
  gem.add_dependency "multi_xml", "~> 0.5"
  gem.add_dependency "multi_json", "~> 1.10"
end
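Every runtime dependency in the gemspec uses the pessimistic operator `~>`. A short sketch with RubyGems' own `Gem::Requirement` shows which versions a constraint such as `"~> 0.9"` admits. The candidate version numbers are arbitrary examples, not releases claimed to exist.

```ruby
require 'rubygems'

# "~> 0.9" allows 0.9 and later, but stays below 1.0.
requirement = Gem::Requirement.new("~> 0.9")

%w[0.8.11 0.9.0 0.15.4 1.0.0].each do |candidate|
  allowed = requirement.satisfied_by?(Gem::Version.new(candidate))
  puts "#{candidate}: #{allowed}"
end
```

With a two-segment constraint only the last segment may grow, so `"~> 1.10"` likewise admits any 1.x release from 1.10 upward and rejects 2.0.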
data/spec/mercury_parser/client_spec.rb
ADDED
@@ -0,0 +1,62 @@
require 'spec_helper'

describe MercuryParser::Client do

  after do
    MercuryParser.reset!
  end

  context "with module configuration" do
    before do
      MercuryParser.configure do |config|
        MercuryParser::Configuration::VALID_CONFIG_KEYS.each do |key|
          config.send("#{key}=", key)
        end
      end
    end

    it "inherits the module configuration" do
      MercuryParser::Configuration::VALID_CONFIG_KEYS.each do |key|
        expect(MercuryParser.send(:"#{key}")).to eq(key)
      end
    end
  end

  context "with class configuration" do
    before do
      @configuration = {
        api_key: '1234'
      }
    end

    it "overrides the module configuration after initialization" do
      MercuryParser.configure do |config|
        @configuration.each do |key, value|
          config.send("#{key}=", value)
        end
      end

      MercuryParser::Configuration::VALID_OPTIONS_KEYS.each do |key|
        expect(MercuryParser.send(:"#{key}")).to eq(@configuration[key])
      end
    end
  end

  describe "#connection" do
    it "looks like Faraday connection" do
      expect(subject.send(:connection)).to respond_to(:run_request)
    end
  end

  describe "#request" do
    before { MercuryParser.api_key = '1234' }

    it "catches Faraday connection errors" do
      skip
    end

    it "catches Mercury Parser API errors" do
      skip
    end
  end
end
data/spec/spec_helper.rb
ADDED
metadata
ADDED
@@ -0,0 +1,166 @@
--- !ruby/object:Gem::Specification
name: mercury_parser
version: !ruby/object:Gem::Version
  version: 0.0.1
platform: ruby
authors:
- Moises Narvaez
autorequire:
bindir: bin
cert_chain:
- |
  -----BEGIN CERTIFICATE-----
  MIIDhTCCAm2gAwIBAgIBATANBgkqhkiG9w0BAQUFADBEMRYwFAYDVQQDDA1tb2lz
  ZXNuYXJ2YWV6MRUwEwYKCZImiZPyLGQBGRYFZ21haWwxEzARBgoJkiaJk/IsZAEZ
  FgNjb20wHhcNMTYxMTIxMjIxNzAxWhcNMTcxMTIxMjIxNzAxWjBEMRYwFAYDVQQD
  DA1tb2lzZXNuYXJ2YWV6MRUwEwYKCZImiZPyLGQBGRYFZ21haWwxEzARBgoJkiaJ
  k/IsZAEZFgNjb20wggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQDCgs6b
  ZV5GnDnYnY5ia8b3macpMzPabZZk0DAD7VdBr1yaabN8MwzVqfo8NK8DRHyS0gAc
  QWnO/WD0IG41aad3DdlZqLxg6MburvmontSzcvsCsnmdSoqbFMWBiKmiEhIvVHWs
  a2x/7nUUPCiEQ/zoA4xNVhLSPAizF8jgWXIwtWAUWG1gGqmsdy45Ox5tqb/trh7y
  7QkNZjYy9xGelTPuIOutoD7247+UWFjUyyG/g3wNaEQUVI3RDQRWzOVDKJtHxCo7
  WXbjtw2r3LS/F4MW5M+637hid780yNrIxkiqDs59Lkt51WFEQVnZxoVXtD5dci0I
  PMiTtPYeVA1aq9tFAgMBAAGjgYEwfzAJBgNVHRMEAjAAMAsGA1UdDwQEAwIEsDAd
  BgNVHQ4EFgQUPqahq7ETJWDvF7WBCFXj13ak+/wwIgYDVR0RBBswGYEXbW9pc2Vz
  bmFydmFlekBnbWFpbC5jb20wIgYDVR0SBBswGYEXbW9pc2VzbmFydmFlekBnbWFp
  bC5jb20wDQYJKoZIhvcNAQEFBQADggEBAKzCnByaDkYUyeFpSAaOaoXHymMKjd6S
  dI+ESmQYPzZmOFzUrIeImKvfwNS5AENpxyO1TF/M4LtYtTi5TTu/bcr0tloXYq+Z
  M+nOFlum82y5F4ndXk/mdT+bxKxK/VH9jI47N/eC9aKQCAtTkKISKKhHFBprN0Yx
  LexaioJTNCUIRtR6RUS3vSmXcma1Z19Z6mkHT4W4ljianFiEce/jubJPqNYlQkGZ
  ypccthPoC9Hj/J31ykMMe6GK9Kvjh9J9X/fcV1Zy8vaE1uOa5D1r1PsZeFL7UQwl
  4KzyFooXeRThwYgBIr55pffGE/pBC+q8diOD3EDZMXL0E2YGnHsH98s=
  -----END CERTIFICATE-----
date: 2016-11-21 00:00:00.000000000 Z
dependencies:
- !ruby/object:Gem::Dependency
  name: faraday
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: '0.9'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: '0.9'
- !ruby/object:Gem::Dependency
  name: faraday_middleware
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: '0.9'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: '0.9'
- !ruby/object:Gem::Dependency
  name: hashie
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: '3.2'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: '3.2'
- !ruby/object:Gem::Dependency
  name: multi_xml
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: '0.5'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: '0.5'
- !ruby/object:Gem::Dependency
  name: multi_json
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: '1.10'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: '1.10'
description: A tiny ruby wrapper for Mercury's content parser api
email:
- MoisesNarvaez@gmail.com
executables: []
extensions: []
extra_rdoc_files: []
files:
- ".gitignore"
- ".rspec"
- ".travis.yml"
- ".yardopts"
- Gemfile
- README.md
- Rakefile
- lib/mercury_parser.rb
- lib/mercury_parser/api/content.rb
- lib/mercury_parser/article.rb
- lib/mercury_parser/client.rb
- lib/mercury_parser/configuration.rb
- lib/mercury_parser/connection.rb
- lib/mercury_parser/error.rb
- lib/mercury_parser/request.rb
- lib/mercury_parser/version.rb
- mercury_parser.gemspec
- spec/mercury_parser/api/content_spec.rb
- spec/mercury_parser/article_spec.rb
- spec/mercury_parser/client_spec.rb
- spec/mercury_parser/error_spec.rb
- spec/mercury_parser_spec.rb
- spec/spec_helper.rb
homepage: https://github.com/moisesnarvaez/mercury_parser
licenses:
- MIT
metadata: {}
post_install_message:
rdoc_options: []
require_paths:
- lib
required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      version: '0'
required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      version: '0'
requirements: []
rubyforge_project:
rubygems_version: 2.6.8
signing_key:
specification_version: 4
summary: Interact with the article parsing featureset of Mercury. This means grabbing
  an article's content based on a URL.
test_files:
- spec/mercury_parser/api/content_spec.rb
- spec/mercury_parser/article_spec.rb
- spec/mercury_parser/client_spec.rb
- spec/mercury_parser/error_spec.rb
- spec/mercury_parser_spec.rb
- spec/spec_helper.rb
metadata.gz.sig
ADDED
Binary file