spn2 0.1.1 → 0.1.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 744fc63b21023c96ff40a72561d80f8e277be132b369c10b7196f273178177d8
- data.tar.gz: 26c923d7ac606c16c0f56e98cff3497878771899a7fc23947c57d864316de4a7
+ metadata.gz: e968c75da93882e48ac210e17bb599497971ee1f065035758279529b23c53f1b
+ data.tar.gz: 9fa1b6f6125d9347d2418254b8e8838572fd8a4efcf0c6bbabc258f3784c097b
  SHA512:
- metadata.gz: ca70afc978bd3766ebdc4234575ea1ca5e19202757c353bd917b6b822293764fbde204ad3e6753188eaab0c2123abf2b71ae8fee6c827df41fb836662f90f788
- data.tar.gz: 23a245ea4c896b034da511fa7dab95d81fda3cd8cec778c1e0be27eddf4f49e46452efb7e10feeb59af0b7ad02ee95fc352f2cbc34c8e79f245c5a8efd283b52
+ metadata.gz: bacfeda95f8a40e132496cb69078e8229767de20c99615db7da4bb1d5f35a9a4c7e3e688a1fe3d31d6dd9e5f026d0d6fc8e328aae90915de55e6810db675732f
+ data.tar.gz: e62fe104f074cb9ab5e4a85b89050e8dc0f198613dd9ff62fd56d3c645428e080699e14d57b8b6a655b68ed3be00704ff67ba5e9a58508083fe0f961cdc1ac55
data/.rubocop.yml CHANGED
@@ -8,6 +8,3 @@ AllCops:

  Style/HashSyntax:
  Enabled: false # yuk Ruby 3.1
-
- Style/NumericLiterals:
- Enabled: false
data/CHANGELOG.md CHANGED
@@ -1,5 +1,3 @@
- ## [Unreleased]
-
  ## [0.1.0] - 2022-06-29

  - Initial release
@@ -8,3 +6,8 @@

  - Add error handling
  - Add ability to add opts to Spn2.save
+
+ ## [0.1.2] - 2022-07-02
+
+ - Add user_status
+ - Add status calls for multiple job_ids and outlinks
data/Guardfile CHANGED
@@ -4,4 +4,5 @@ guard :minitest do
  watch(%r{^test/(.*)/?test_(.*)\.rb$})
  watch(%r{^test/test_helper\.rb$}) { 'test' }
  watch(%r{^lib/(.*/)?([^/]+)\.rb$}) { |m| "test/lib/#{m[1]}test_#{m[2]}.rb" }
+ watch(%r{^lib/spn2\.rb}) { 'test/lib/spn2' }
  end
data/README.md CHANGED
@@ -1,3 +1,4 @@
+ [![Ruby Style Guide](https://img.shields.io/badge/code_style-rubocop-brightgreen.svg)](https://github.com/rubocop/rubocop)
  [![Gem Version](https://badge.fury.io/rb/spn2.svg)](https://badge.fury.io/rb/spn2)

  # Spn2
@@ -37,19 +38,28 @@ Save (capture) a url in the Wayback Machine. This method returns the job_id in a
  ```rb
  > Spn2.save(url: 'example.com') # returns a job_id

- => {job_id: 'spn2-9c17e047f58f9220a7008d4f18152fee4d111d14'} # json may include "url" and "message" keys too
+ => {"url"=>"http://example.com","job_id"=>"spn2-9c17e047f58f9220a7008d4f18152fee4d111d14"} # may include a "message" key too
  ```
  Various options are available, as detailed in the [specification](https://docs.google.com/document/d/1Nsv52MvSjbLb2PCpHlat0gkzw0EvtSgpKHu4mk0MnrA/edit) in the section "Capture request". These may be passed like so:
  ```rb
  > Spn2.save(url: 'example.com', opts: { capture_all: 1, capture_outlinks: 1 })

- => {url: 'example.com', job_id: 'spn2-9c17e047f58f9220a7008d4f18152fee4d111d14'}
+ => {"url"=>"http://example.com","job_id"=>"spn2-9c17e047f58f9220a7008d4f18152fee4d111d14"}
  ```
+ Page save errors will raise an error and look like this:
+ ```rb
+ => {"status"=>"error", "status_ext"=>"error:too-many-daily-captures", "message"=>"This URL has been already captured 10 times today.
+ Please try again tomorrow. Please email us at \"info@archive.org\" if you would like to discuss this more."} (Spn2::Spn2ErrorFailedCapture)
+ ```
+ The key "status_ext" contains an explanatory message - see the API [specification](https://docs.google.com/document/d/1Nsv52MvSjbLb2PCpHlat0gkzw0EvtSgpKHu4mk0MnrA/edit).
+
+
+
  ### View the status of a job

  Use the job_id.
  ```rb
- > Spn2.status(job_id: 'spn2-9c17e047f58f9220a7008d4f18152fee4d111d14')
+ > Spn2.status_job_id(job_id: 'spn2-9c17e047f58f9220a7008d4f18152fee4d111d14')

  => {"counters"=>{"outlinks"=>1, "embeds"=>2}, "job_id"=>"spn2-9c17e047f58f9220a7008d4f18152fee4d111d14",
  "original_url"=>"http://example.com/", "resources"=>["http://example.com/", "http://example.com/favicon.ico"],
@@ -58,6 +68,35 @@ Use the job_id.
  ```
  "status" => "success" is what you are looking for.

+ Care is advised for domains/urls which are frequently saved into the Wayback Machine as the job_id is merely "spn2-" followed by a hash of the url\*. A status request will show the status of _the most recent capture by anyone_ of the url in question.
+
+ \* Usually an sha1 hash of the url in the form http://\<domain\>/\<path\>/ e.g:
+ ```sh
+ $ echo "http://example.com/"|tr -d "\n"|shasum
+ 9c17e047f58f9220a7008d4f18152fee4d111d14 -
+ ```
+
+ The status of a comma-separated list of job_id's can be obtained with:
+ ```rb
+ > Spn2.status_job_ids(job_ids: 'spn2-9c17e047f58f9220a7008d4f18152fee4d111d14,spn2-...')
+
+ => [.. # an array of status hashes
+ ```
+
+ Finally, the status of any outlinks captured by using the save option `capture_outlinks: 1` is available by supplying the parent job_id to:
+ ```rb
+ > Spn2.status_job_id_outlinks(job_id: 'spn2-cce034d987e1d72d8cbf1770bcf99024fe20dddf')
+
+ => [.. # an array of outlink job status hashes
+ ```
+ ### User status
+
+ Information about the user is available via:
+ ```rb
+ > Spn2.user_status
+ => {"daily_captures_limit"=>100000, "available"=>8, "processing"=>0, "daily_captures"=>10}
+ ```
+
  ### System status

  The status of Wayback Machine itself is available.
@@ -67,11 +106,19 @@ The status of Wayback Machine itself is available.
  ```
  ### Error handling

- To fascilitate graceful error handling, a full list of all error classes is provided by:
+ To facilitate graceful error handling, a full list of all error classes is provided by:
  ```rb
  > Spn2.error_classes
  => [Spn2::Spn2Error, Spn2::Spn2ErrorBadAuth,.. ..]
  ```
+ ## Testing
+
+ Just run `bundle exec rake` to run the test suite.
+
+ Valid API keys must be held in SPN2_ACCESS_KEY and SPN2_SECRET_KEY for testing. Go to https://archive.org/account/s3.php to set up API keys if you need them. If you have your live keys stored in these env vars just do:
+
+ `export SPN2_ACCESS_KEY=<valid access test key> && export SPN2_SECRET_KEY=<valid secret test key>` immediately before the above command.
+
  ## Development

  ~~After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.~~
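The README note added in this diff says a job_id is usually "spn2-" followed by a SHA-1 hash of the url in the form `http://<domain>/<path>/`. A minimal Ruby sketch of that derivation follows. The helper name `guess_job_id` is hypothetical and not part of the gem, and the sketch assumes the url has already been normalized to that form.

```ruby
require 'digest/sha1'

# Pattern copied from JOB_ID_REGEXP in lib/spn2.rb
JOB_ID_REGEXP = /^(spn2-([a-f]|\d){40})$/

# Hypothetical helper (not in the gem): derive the likely job_id for a url
# that is already in the form http://<domain>/<path>/
def guess_job_id(normalized_url)
  "spn2-#{Digest::SHA1.hexdigest(normalized_url)}"
end

job_id = guess_job_id('http://example.com/')
puts job_id
puts job_id.match?(JOB_ID_REGEXP)
```

Since the README only says the hash is "usually" derived this way, the result is best treated as a guess and checked against the job_id that `Spn2.save` actually returns.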
data/lib/spn2/version.rb CHANGED
@@ -1,5 +1,5 @@
  # frozen_string_literal: true

  module Spn2
- VERSION = '0.1.1'
+ VERSION = '0.1.2'
  end
data/lib/spn2.rb CHANGED
@@ -1,5 +1,6 @@
  # frozen_string_literal: true

+ require 'date'
  require 'json'
  require 'nokogiri'

@@ -7,20 +8,22 @@ require_relative 'curlable'

  # Design decison to not use a class as only 'state' is in 2 env vars
  module Spn2
- extend Curlable # for system_status
- include Curlable
+ extend Curlable

+ BAD_AUTH_MSG = 'You need to be logged in to use Save Page Now.'
  ERROR_CODES = [502].freeze

  class Spn2Error < StandardError; end
  class Spn2ErrorBadAuth < Spn2Error; end
- class Spn2ErrorBadAuth < Spn2Error; end
- class Spn2ErrorBadResponse < Spn2Error; end
+ class Spn2ErrorFailedCapture < Spn2Error; end
  class Spn2ErrorInvalidOption < Spn2Error; end
+ class Spn2ErrorMissingKeys < Spn2Error; end
+ class Spn2ErrorNoOutlinks < Spn2Error; end
+ class Spn2ErrorTooManyRequests < Spn2Error; end
+ class Spn2ErrorUnknownResponse < Spn2Error; end
  class Spn2ErrorUnknownResponseCode < Spn2Error; end
  ERROR_CODES.each { |i| Spn2.const_set("Spn2Error#{i}", Class.new(Spn2Error)) }

- BAD_AUTH_MSG = 'You need to be logged in to use Save Page Now.'
  ESSENTIAL_STATUS_KEYS = %w[job_id resources status].freeze
  JOB_ID_REGEXP = /^(spn2-([a-f]|\d){40})$/
  WEB_ARCHIVE = 'https://web.archive.org'
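The `ERROR_CODES.each` line in the hunk above defines one error class per listed HTTP code at load time, and `parse_error_code_from_page_title` later looks the class up by name. A self-contained sketch of that pattern is shown here, using a stand-in module `Demo` whose names are hypothetical and not the gem's API.

```ruby
# Stand-in module for illustration only; these names are not the gem's API.
module Demo
  ERROR_CODES = [502].freeze

  class Error < StandardError; end
  class ErrorUnknownResponseCode < Error; end

  # Defines Demo::Error502, plus one class for each further code in the list
  ERROR_CODES.each { |code| const_set("Error#{code}", Class.new(Error)) }

  # String#to_i reads the leading digits of a title such as "502 Bad Gateway"
  def self.raise_for_title(title)
    code = title.to_i
    raise const_get("Error#{code}") if ERROR_CODES.include?(code)

    raise ErrorUnknownResponseCode
  end
end

begin
  Demo.raise_for_title('502 Bad Gateway')
rescue Demo::Error => e
  puts e.class # Demo::Error502
end
```

Because every generated class inherits from the base error, a caller can rescue the base class to catch all of them or a specific class to handle one code.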
@@ -35,37 +38,58 @@ module Spn2
  end

  def access_key
- ENV.fetch('SPN2_ACCESS_KEY', nil)
+ ENV.fetch('SPN2_ACCESS_KEY')
  end

  def secret_key
- ENV.fetch('SPN2_SECRET_KEY', nil)
+ ENV.fetch('SPN2_SECRET_KEY')
  end

  def system_status
  json get(url: "#{WEB_ARCHIVE}/save/status/system") # no auth
  end

+ def user_status
+ json auth_get(url: "#{WEB_ARCHIVE}/save/status/user?t=#{DateTime.now.strftime('%Q').to_i}")
+ end
+
  def save(url:, opts: {})
  raise Spn2ErrorInvalidOption, "One or more invalid options: #{opts}" unless options_valid?(opts)

- hash = json(auth_post(url: "#{WEB_ARCHIVE}/save/#{url}", params: { url: url }.merge(opts)))
- raise Spn2ErrorBadAuth, hash.inspect if hash['message']&.== BAD_AUTH_MSG
+ json = json(auth_post(url: "#{WEB_ARCHIVE}/save/#{url}", params: { url: url }.merge(opts)))
+ raise Spn2ErrorBadAuth, json.inspect if json['message']&.== BAD_AUTH_MSG

- raise Spn2ErrorBadResponse, "Bad response: #{hash.inspect}" unless hash['job_id']
+ raise Spn2ErrorFailedCapture, json.inspect unless json['job_id']

- hash
+ json
  end
  alias capture save

- def status(job_id:)
- hash = json(auth_get(url: "#{WEB_ARCHIVE}/save/status/#{job_id}"))
- raise Spn2ErrorBadAuth, hash.inspect if hash['message']&.== BAD_AUTH_MSG
+ def status_job_id(job_id:)
+ json = json(auth_post(url: "#{WEB_ARCHIVE}/save/status", params: { job_id: job_id }))
+ raise Spn2ErrorBadAuth, json.inspect if json['message']&.== BAD_AUTH_MSG

- raise Spn2ErrorBadResponse, "Bad response: #{hash.inspect}" unless (ESSENTIAL_STATUS_KEYS - hash.keys).empty?
+ raise Spn2ErrorMissingKeys, json.inspect unless (ESSENTIAL_STATUS_KEYS - json.keys).empty?

- hash
+ json
  end
+ alias status status_job_id
+
+ def status_job_ids(job_ids:)
+ json = json(auth_post(url: "#{WEB_ARCHIVE}/save/status", params: { job_ids: job_ids }))
+ raise Spn2Error, json.inspect unless json.is_a? Array
+
+ json
+ end
+ alias statuses status_job_ids
+
+ def status_job_id_outlinks(job_id:)
+ json = json(auth_post(url: "#{WEB_ARCHIVE}/save/status", params: { job_id_outlinks: job_id }))
+ raise Spn2ErrorNoOutlinks, json.inspect unless json.is_a? Array
+
+ json
+ end
+ alias status_outlinks status_job_id_outlinks

  private

@@ -85,19 +109,29 @@ module Spn2
  { Authorization: "LOW #{Spn2.access_key}:#{Spn2.secret_key}" }
  end

- def json(html_string)
- JSON.parse(doc = Nokogiri::HTML(html_string))
- rescue JSON::ParserError # an html response
- raise Spn2ErrorBadResponse, "No title in: #{html_string}" unless (title = doc.title)
+ def doc(html_string)
+ Nokogiri::HTML html_string
+ end

- parse_error_code_from_page_title(title)
+ def json(html_string)
+ JSON.parse(doc = doc(html_string))
+ rescue JSON::ParserError # an html response & therefore an error
+ parse_error_code_from_page_title(doc.title) if doc.title
+ parse_error_from_page_body(html_string) # if no title parse body
  end

- def parse_error_code_from_page_title(string)
- code = string.to_i
+ def parse_error_code_from_page_title(title_string)
+ code = title_string.to_i
  raise Spn2.const_get("Spn2Error#{code}") if ERROR_CODES.include? code

- raise Spn2ErrorUnknownResponseCode, string
+ raise Spn2ErrorUnknownResponseCode
+ end
+
+ def parse_error_from_page_body(html_string)
+ h1 = doc(html_string).xpath('//h1')
+ raise Spn2ErrorTooManyRequests if !h1.empty? && h1.text == 'Too Many Requests'
+
+ raise Spn2ErrorUnknownResponse, html_string # fall through
  end

  def options_valid?(opts)
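One behavioural change in this file is easy to miss: `access_key` and `secret_key` now call `ENV.fetch` without the `nil` default, so an unset variable raises `KeyError` instead of returning `nil`. The sketch below shows the difference with a made-up variable name (an assumption for illustration), so that real credentials are never touched.

```ruby
# Hypothetical variable name, chosen so the sketch never touches real credentials
name = 'SPN2_DEMO_UNSET_KEY'
ENV.delete(name)

# 0.1.1 style: a missing variable quietly yields nil
old_style = ENV.fetch(name, nil)
puts old_style.inspect

# 0.1.2 style: a missing variable raises KeyError
new_style_error =
  begin
    ENV.fetch(name)
    nil
  rescue KeyError => e
    e
  end
puts new_style_error.class
```

Code that previously tolerated a missing key may therefore need to set both variables up front or rescue `KeyError` around calls into the gem.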
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: spn2
  version: !ruby/object:Gem::Version
- version: 0.1.1
+ version: 0.1.2
  platform: ruby
  authors:
  - MatzFan
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2022-06-30 00:00:00.000000000 Z
+ date: 2022-07-02 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: curb
@@ -80,6 +80,20 @@ dependencies:
  - - "~>"
  - !ruby/object:Gem::Version
  version: '5.16'
+ - !ruby/object:Gem::Dependency
+ name: minitest-parallel_fork
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '1.2'
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '1.2'
  - !ruby/object:Gem::Dependency
  name: rake
  requirement: !ruby/object:Gem::Requirement
@@ -136,7 +150,7 @@ dependencies:
  - - "~>"
  - !ruby/object:Gem::Version
  version: '0.6'
- description: Atomate the process of saving web pages to archive.org
+ description: Automate the process of saving web pages to archive.org
  email:
  executables: []
  extensions: []
@@ -153,7 +167,7 @@ files:
  - lib/spn2.rb
  - lib/spn2/version.rb
  - sig/spn2.rbs
- homepage: https://gitlab.com/matxfan/spn2
+ homepage: https://gitlab.com/matzfan/spn2
  licenses:
  - MIT
  metadata:
@@ -180,5 +194,5 @@ requirements: []
  rubygems_version: 3.3.17
  signing_key:
  specification_version: 4
- summary: Gem for the Save Page Now API of the Wayback Machine
+ summary: Gem for the Save Page Now 2 API of the Wayback Machine
  test_files: []