linkey 1.4.0 → 1.5.0
- checksums.yaml +4 -4
- data/README.md +26 -26
- data/lib/linkey.rb +28 -19
- data/lib/linkey/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 86d840ae1eee23f4d83fb1fac2ab24412283e2ff
+  data.tar.gz: eb990de7ca319e5842e0b86928031d73123ac6e1
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: b77a591f2eef8bf76e5b1ecb519b97792b78d482b6811328603a89b1bba61a1179f3fe767f597d80e432c75434c07cb3072beb35b9494122f7fe4bb1925c2dd1
+  data.tar.gz: 86acdb19a4e504cd65d3fd8f901e2e7f76fbb80658c41b3242b9602cb10dd7b99443721641bf2512e97519201a143f58a66c5416a3a4738deaa1b372edd58b6d
data/README.md
CHANGED
@@ -1,49 +1,45 @@
-
-Linkey
-=====
+# Linkey

-
+[![gem_version.png](https://img.shields.io/gem/v/linkey.svg)](https://rubygems.org/gems/linkey) [![gem_downloads.png](https://img.shields.io/gem/dt/linkey.svg)](https://rubygems.org/gems/linkey) [![travis.png](https://img.shields.io/travis/DaveBlooman/linkey/master.svg)](https://travis-ci.org/DaveBlooman/linkey) [![code_climate.png](https://img.shields.io/codeclimate/github/DaveBlooman/linkey.svg)](https://codeclimate.com/github/DaveBlooman/linkey)
+
+**Link checker for BBC News & World Services sites.**

 The idea is to quickly check a page for broken links by doing a status check on all the relative URL's on the page.

 There are 4 parts to this tool, the URL, the base URL, the regex and the filename.

-
-
-
-
+* **URL** is the page that you want to check for broken links, e.g `www.bbc.co.uk/news/uk-29928282`
+* **Base URL** is used with the relative URL from the regex to create a full URL, e.g `www.bbc.co.uk`
+* **Regex** is the point of the URL that you want to keep from the regex, e.g `bbc.co.uk/news/uk`, specifying `/news` would create `/news/uk`.
+* **Filename** is markdown (.md) file where all the page links are stored, this can be useful for manual checks, e.g `file.md`

-##
+## Installation

-
-gem install linkey
-```
+gem install linkey

-##
+## Usage

-
+### Command Line

 ```
-linkey check
+linkey check <url> <base_url> <regex> <filename>
 ```
-
+
+**Examples**

 ```
 linkey check http://www.bbc.co.uk/arabic http://www.bbc.co.uk /arabic arabic.md
 ```
-Another

 ```
 linkey check http://www.theguardian.com/technology/2014/feb/15/year-of-code-needs-reboot-teachers http://theguardian.com /technology news.md
 ```
-Output
+**Output**

-Once running, you'll see either a 200 with
-
-
-`Status is NOT GOOD for URL`
+Once running, you'll see either a 200 with `Status is 200 for <URL>` or `Status is NOT GOOD for <URL>`.
+
+### Script It

-## Script it
 ```ruby
 require 'linkey'

@@ -58,7 +54,8 @@ status = Linkey::CheckResponse.new(url, base, reg, filename)
 page.capture_links
 status.check_links
 ```
-
+
+### From a File

 If you have a lot of URLs that you want to check all the time using from a file is an alternative option. This will utilise the smoke option, then point to a YAML file with the extension. In some situations, we are deploying applications that we don't want public facing, so ensuring they 404 is essential. There is a status code option to allow a specific status code to be set against a group of URL's, ensuring builds fail if the right code conditions are met.

@@ -66,10 +63,13 @@ If you have a lot of URLs that you want to check all the time using from a file
 linkey smoke test.yaml
 ```

-Example
+Example YAML Config:
+
 ```yaml
 base: 'http://www.bbc.co.uk'

+concurrency: 100
+
 headers:
 -
 X-content-override: 'https://example.com'
@@ -81,7 +81,7 @@ paths:
 - /news/uk
 ```

-
+Via a Ruby script:

 ```ruby
 require 'linkey'
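The README text in this diff says the regex selects which relative links to keep and the base URL turns them into full URLs. A minimal sketch of that idea follows. It is not linkey's implementation: `full_urls` is a hypothetical helper, the newline-separated input is an assumption, and only the pattern shape is taken from the `scan` method shown in `data/lib/linkey.rb`.

```ruby
# Hypothetical helper, not part of linkey: keep the relative links that
# start with the given prefix and join each one onto the base URL.
def full_urls(page_links, base, prefix)
  # Same pattern shape as the scan method in this diff: a line that starts
  # with the quoted prefix, optionally followed by more characters.
  pattern = /^#{Regexp.quote(prefix)}(?:|.+)?$/
  page_links.scan(pattern).map { |path| "#{base}#{path}" }
end

# Assumed input: one relative link per line.
links = "/news/uk\n/sport/football\n/news/world-europe\n"
puts full_urls(links, "http://www.bbc.co.uk", "/news")
```

With the prefix `/news`, only the two `/news/...` lines are kept and prefixed with the base URL; the `/sport/football` line is skipped.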
data/lib/linkey.rb
CHANGED
@@ -28,7 +28,7 @@ module Linkey

     def scan(page_links)
       urls = page_links.scan(/^#{Regexp.quote(reg)}(?:|.+)?$/)
-      Getter.new(urls, base).check
+      Getter.new(urls, base, 100, 200, {}).check
     end
   end

@@ -50,29 +50,40 @@ module Linkey
   end

   class Checker
+    attr_accessor :config
+
     def initialize(config)
-      @
+      @config = YAML.load(File.open("#{config}"))
     end

     def base
-
+      config["base"]
+    end
+
+    def concurrency
+      config["concurrency"] ? config["concurrency"] : 100
+    end
+
+    def status_code
+      config["status_code"] ? config["status_code"] : 200
     end

     def smoke
-      urls =
-      options =
+      urls = config["paths"]
+      options = config["headers"]
       headers = Hash[*options]
-
-      Getter.new(urls, base, { :headers => headers }, status_code).check
+      Getter.new(urls, base, concurrency, status_code, { :headers => headers }).check
     end
   end

   class Getter
-    def initialize(paths, base,
-      @paths
-      @base
-      @headers
-      @status
+    def initialize(paths, base, concurrency, status, headers)
+      @paths = paths
+      @base = base
+      @headers = headers
+      @status = status
+
+      @hydra = Typhoeus::Hydra.new(:max_concurrency => concurrency)
     end

     def check
@@ -82,22 +93,20 @@ module Linkey
         begin
           Typhoeus::Request.new(url(path), options).tap do |req|
             req.on_complete { |r| parse_response(r, status) }
-
+            hydra.queue req
           end
-        rescue
-          puts "Error with URL #{path}, please check config"
+        rescue URI::InvalidURIError
+          puts "Error with URL #{path}, please check config."
         end
       end

-
+      hydra.run
       check_for_broken
     end

     private

-    attr_reader :base, :headers, :paths, :status
-
-    HYDRA = Typhoeus::Hydra.new(:max_concurrency => 100)
+    attr_reader :base, :headers, :paths, :status, :concurrency, :hydra

     def check_for_broken
       puts "Checking"
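The `Checker` changes above read `concurrency` and `status_code` from the YAML config and fall back to 100 and 200 when the keys are absent. The sketch below illustrates that fallback in isolation. `ConfigSketch` is a hypothetical stand-in, not the gem's class: it parses a YAML string rather than opening a file, so it runs without the gem, a config file, or network access. The indentation of the `headers` entry is also an assumption.

```ruby
require 'yaml'

# Hypothetical stand-in for Linkey::Checker (an assumption for illustration).
# It takes YAML text directly instead of a path to a config file.
class ConfigSketch
  def initialize(yaml_text)
    @config = YAML.load(yaml_text)
  end

  def base
    @config["base"]
  end

  # Same fallback as the ternary in the diff, written with ||.
  def concurrency
    @config["concurrency"] || 100
  end

  def status_code
    @config["status_code"] || 200
  end

  # Mirrors Hash[*options] from the smoke method in the diff.
  def headers
    Hash[*@config["headers"]]
  end
end

sketch = ConfigSketch.new(<<~YAML)
  base: 'http://www.bbc.co.uk'
  headers:
    -
      X-content-override: 'https://example.com'
  paths:
    - /news
YAML

puts sketch.concurrency # no concurrency key above, so the default applies
puts sketch.status_code # no status_code key above, so the default applies
```

Because the limit now comes from the config, each `Getter` can build its own `Typhoeus::Hydra` with that value, which is what replaces the fixed `HYDRA` constant removed in this diff.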
data/lib/linkey/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: linkey
 version: !ruby/object:Gem::Version
-  version: 1.
+  version: 1.5.0
 platform: ruby
 authors:
 - Dave Blooman
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2015-
+date: 2015-04-21 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: thor