arx 0.3.2 → 1.1.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +61 -0
- data/README.md +81 -14
- data/Rakefile +1 -1
- data/arx.gemspec +4 -3
- data/lib/arx/cleaner.rb +44 -5
- data/lib/arx/entities/paper.rb +22 -13
- data/lib/arx/query/query.rb +34 -36
- data/lib/arx/query/validate.rb +3 -1
- data/lib/arx/version.rb +3 -3
- data/lib/arx.rb +17 -6
- metadata +23 -9
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 71eb1bee2ff468ea9327736e613e40b414c2b630d4f21b148da648500af3a47e
|
|
4
|
+
data.tar.gz: 8592a476d3abbeedfe2bef11637498b551fd478d643c5985a7c6a119aa79ca80
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: c9d708bd3f7d244f8557da0c9dba42d66c9a344a8f0316463879e76cc385fcf895c4c4089a5340543e166d771958d975fac685d27908755be0575e97f70625c2
|
|
7
|
+
data.tar.gz: 382f4ec892499c9e3b062693755aaa4af6c86790a830aaee39a2435d8af902d7e6bdcfc91712eaa539ef3e066c6a6ae04c151327951477194b0ca722848b0d2c
|
data/CHANGELOG.md
CHANGED
|
@@ -1,3 +1,64 @@
|
|
|
1
|
+
# 1.1.0
|
|
2
|
+
|
|
3
|
+
#### Major changes
|
|
4
|
+
|
|
5
|
+
- Change `bundler` requirement to `>= 1.17` in `arx.gemspec`. ([#53](https://github.com/eonu/arx/pull/53))
|
|
6
|
+
- Remove `Arx.find` alias of `Arx.search`. ([#57](https://github.com/eonu/arx/pull/57))
|
|
7
|
+
- Add `Query#group` for subquery grouping support. ([#59](https://github.com/eonu/arx/pull/59))
|
|
8
|
+
|
|
9
|
+
#### Minor changes
|
|
10
|
+
|
|
11
|
+
- Add contributing guidelines (`CONTRIBUTING.md`). ([#48](https://github.com/eonu/arx/pull/48))
|
|
12
|
+
- Add issue templates to `./github/ISSUE_TEMPLATE` for ([#49](https://github.com/eonu/arx/pull/49), [#54](https://github.com/eonu/arx/pull/54), [#55](https://github.com/eonu/arx/pull/55)):
|
|
13
|
+
- **Error or warning**<br>For reporting an error or warning generated by Arx.
|
|
14
|
+
- **Unexpected or incorrect functionality**<br>For reporting something that doesn't seem to be working correctly or is unexpected.
|
|
15
|
+
- **Improvement to an existing feature**<br>For suggesting an improvement to a feature already offered by Arx.
|
|
16
|
+
- **Suggesting a new feature**<br>For proposing a new feature to Arx that would be beneficial.
|
|
17
|
+
- Add a pull request template at `./github/PULL_REQUEST_TEMPLATE.md`. ([#49](https://github.com/eonu/arx/pull/49))
|
|
18
|
+
- Remove issue templates from `CONTRIBUTING.md`. ([#49](https://github.com/eonu/arx/pull/49))
|
|
19
|
+
- Remove `LICENSE` from YARD documentation (remove from `.yardopts`). ([#50](https://github.com/eonu/arx/pull/50))
|
|
20
|
+
- Add RVM ruby version `2.6` to `.travis.yml`. ([#53](https://github.com/eonu/arx/pull/53))
|
|
21
|
+
- Add contributor code-of-conduct (`CODE_OF_CONDUCT.md`). ([#56](https://github.com/eonu/arx/pull/56))
|
|
22
|
+
- Thank Scholastica in `README.md`. ([#58](https://github.com/eonu/arx/pull/58))
|
|
23
|
+
- Add `bin/console` for gem debugging. ([#60](https://github.com/eonu/arx/pull/60))
|
|
24
|
+
- Modify `gem:debug` rake task to run `bin/console`. ([#60](https://github.com/eonu/arx/pull/60))
|
|
25
|
+
|
|
26
|
+
# 1.0.1
|
|
27
|
+
|
|
28
|
+
#### Major changes
|
|
29
|
+
|
|
30
|
+
- Add cases to handle `nil` query returns. ([#45](https://github.com/eonu/arx/pull/45))
|
|
31
|
+
- Add support for the `coveralls` gem (`.coveralls.yml` configuration file). ([#42](https://github.com/eonu/arx/pull/42))
|
|
32
|
+
|
|
33
|
+
#### Minor changes
|
|
34
|
+
|
|
35
|
+
- Add code coverage badge to `README.md`. ([#42](https://github.com/eonu/arx/pull/42))
|
|
36
|
+
- Remove documentation badge from top of `README.md`. ([#42](https://github.com/eonu/arx/pull/42))
|
|
37
|
+
- Change author email from `ed@mail.eonu.net` to `ed@eonu.net`. ([#43](https://github.com/eonu/arx/pull/43))
|
|
38
|
+
- Change `ends_with_connective?` to `end_with_connective?` to follow typical Ruby patterns. ([#44](https://github.com/eonu/arx/pull/44))
|
|
39
|
+
- Add `/coverage/` directory to `.gitignore`. ([#45](https://github.com/eonu/arx/pull/45))
|
|
40
|
+
- Remove version numbers from paper identifiers in error message in `README.md`. ([#46](https://github.com/eonu/arx/pull/46))
|
|
41
|
+
|
|
42
|
+
# 1.0.0
|
|
43
|
+
|
|
44
|
+
#### Major changes
|
|
45
|
+
|
|
46
|
+
- Change `Query` connective instance methods ([#38](https://github.com/eonu/arx/pull/38)):
|
|
47
|
+
- `#&` -> `#and`
|
|
48
|
+
- `#|` -> `#or`
|
|
49
|
+
- `#!` -> `#and_not`
|
|
50
|
+
- Split version number from paper identifier in `Paper` (add `version` key-word argument to `#id` and `#url`, and add `#version`). ([#39](https://github.com/eonu/arx/pull/39))
|
|
51
|
+
- Add `Cleaner.extract_id` and `Cleaner.extract_version`. ([#39](https://github.com/eonu/arx/pull/39))
|
|
52
|
+
- Make `Query#add_connective` always return `self`. ([#40](https://github.com/eonu/arx/pull/40))
|
|
53
|
+
- Redefine `Arx.search` to use `Paper.parse`'s `search` key-word argument. ([#40](https://github.com/eonu/arx/pull/40))
|
|
54
|
+
- Implement all tests. ([#40](https://github.com/eonu/arx/pull/40))
|
|
55
|
+
|
|
56
|
+
#### Minor changes
|
|
57
|
+
|
|
58
|
+
- Change declared regular expression literals from `%r""` to standard `//`. ([#39](https://github.com/eonu/arx/pull/39))
|
|
59
|
+
- Remove `#extract_id` from `Query` and use `Cleaner.extract_id` instead. ([#39](https://github.com/eonu/arx/pull/39))
|
|
60
|
+
- Redefine `Paper#revision?` to use the new `#version` instead of `#updated_at` and `#published_at`. ([#39](https://github.com/eonu/arx/pull/39))
|
|
61
|
+
|
|
1
62
|
# 0.3.2
|
|
2
63
|
|
|
3
64
|
#### Major changes
|
data/README.md
CHANGED
|
@@ -7,8 +7,8 @@
|
|
|
7
7
|
[](https://github.com/eonu/arx/blob/master/LICENSE)
|
|
8
8
|
|
|
9
9
|
[](https://codeclimate.com/github/eonu/arx/maintainability)
|
|
10
|
-
[](https://www.rubydoc.info/github/eonu/arx/master/toplevel)
|
|
11
10
|
[](https://travis-ci.com/eonu/arx)
|
|
11
|
+
[](https://coveralls.io/github/eonu/arx?branch=feature%2Fcoveralls)
|
|
12
12
|
|
|
13
13
|
**A Ruby interface for querying academic papers on the arXiv search API.**
|
|
14
14
|
|
|
@@ -24,6 +24,23 @@ Although [Scholastica](https://github.com/scholastica) offer a great [Ruby gem](
|
|
|
24
24
|
|
|
25
25
|
*Arx is a gem that allows for quick and easy querying of the arXiv search API, without having to worry about manually writing your own search query strings or parsing the resulting XML query response to find the data you need.*
|
|
26
26
|
|
|
27
|
+
## Example
|
|
28
|
+
|
|
29
|
+
Suppose we wish to search for:
|
|
30
|
+
|
|
31
|
+
> Papers in the `cs.FL` (Formal Languages and Automata Theory) category whose title contains `"Buchi Automata"`, not authored by `Tomáš Babiak`, sorted by submission date (latest first).
|
|
32
|
+
|
|
33
|
+
This query can be executed with the following code:
|
|
34
|
+
|
|
35
|
+
```ruby
|
|
36
|
+
require 'arx'
|
|
37
|
+
|
|
38
|
+
papers = Arx(sort_by: :date_submitted) do |query|
|
|
39
|
+
query.category('cs.FL')
|
|
40
|
+
query.title('Buchi Automata').and_not.author('Tomáš Babiak')
|
|
41
|
+
end
|
|
42
|
+
```
|
|
43
|
+
|
|
27
44
|
## Features
|
|
28
45
|
|
|
29
46
|
- Ruby classes `Arx::Paper`, `Arx::Author` and `Arx::Category` that wrap the resulting Atom XML query result from the search API.
|
|
@@ -43,6 +60,10 @@ $ gem install arx
|
|
|
43
60
|
|
|
44
61
|
The documentation for Arx is hosted on [](https://www.rubydoc.info/github/eonu/arx/master/toplevel).
|
|
45
62
|
|
|
63
|
+
## Contributing
|
|
64
|
+
|
|
65
|
+
All contributions to Arx are greatly appreciated. Contribution guidelines can be found [here](/CONTRIBUTING.md).
|
|
66
|
+
|
|
46
67
|
## Usage
|
|
47
68
|
|
|
48
69
|
Before you start using Arx, you'll have to ensure that the gem is required (either in your current working file, or shell such as [IRB](https://en.wikipedia.org/wiki/Interactive_Ruby_Shell)):
|
|
@@ -178,16 +199,49 @@ q.author('Dominik Edelmann')
|
|
|
178
199
|
q.category('math.NA')
|
|
179
200
|
```
|
|
180
201
|
|
|
181
|
-
To change the logical connective used to chain subqueries, use the
|
|
202
|
+
To change the logical connective used to chain subqueries, use the `and`, `or`, `and_not` instance methods between the subquery calls:
|
|
182
203
|
|
|
183
204
|
```ruby
|
|
184
205
|
# Papers authored by "Eleonora Andreotti" in neither the "Numerical Analysis" (math.NA) nor the "Combinatorics" (math.CO) category.
|
|
185
206
|
q = Arx::Query.new
|
|
186
207
|
q.author('Eleonora Andreotti')
|
|
187
|
-
q
|
|
208
|
+
q.and_not
|
|
188
209
|
q.category('math.NA', 'math.CO', connective: :or)
|
|
189
210
|
```
|
|
190
211
|
|
|
212
|
+
#### Grouping subqueries
|
|
213
|
+
|
|
214
|
+
Sometimes you'll have a query that requires nested or grouped logic, using parentheses. This can be done using the `Arx::Query#group` method.
|
|
215
|
+
|
|
216
|
+
This method accepts a block and basically parenthesises the result of whichever methods were called within the block.
|
|
217
|
+
|
|
218
|
+
For example, this will allow the last query from the previous section to be written as:
|
|
219
|
+
|
|
220
|
+
```ruby
|
|
221
|
+
# Papers authored by "Eleonora Andreotti" in neither the "Numerical Analysis" (math.NA) nor the "Combinatorics" (math.CO) category.
|
|
222
|
+
q = Arx::Query.new
|
|
223
|
+
q.author('Eleonora Andreotti')
|
|
224
|
+
q.and_not
|
|
225
|
+
q.group do
|
|
226
|
+
q.category('math.NA').or.category('math.CO')
|
|
227
|
+
end
|
|
228
|
+
```
|
|
229
|
+
|
|
230
|
+
Another more complicated example with two grouped subqueries:
|
|
231
|
+
|
|
232
|
+
```ruby
|
|
233
|
+
# Papers whose title contains "Buchi Automata", either authored by "Tomáš Babiak", or in the "Formal Languages and Automata Theory (cs.FL)" category and not the "Computational Complexity (cs.CC)" category.
|
|
234
|
+
q = Arx::Query.new
|
|
235
|
+
q.title('Buchi Automata')
|
|
236
|
+
q.group do
|
|
237
|
+
q.author('Tomáš Babiak')
|
|
238
|
+
q.or
|
|
239
|
+
q.group do
|
|
240
|
+
q.category('cs.FL').and_not.category('cs.CC')
|
|
241
|
+
end
|
|
242
|
+
end
|
|
243
|
+
```
|
|
244
|
+
|
|
191
245
|
### Running search queries
|
|
192
246
|
|
|
193
247
|
Search queries can be executed with the `Arx()` method (alias of `Arx.search`). This method accepts the same parameters as the `Arx::Query` initializer, including the list of IDs.
|
|
@@ -202,9 +256,7 @@ Calling the `Arx()` method with a block allows for the construction and executio
|
|
|
202
256
|
# Papers in the cs.FL category whose title contains "Buchi Automata", not authored by Tomáš Babiak
|
|
203
257
|
results = Arx(sort_by: :date_submitted) do |query|
|
|
204
258
|
query.category('cs.FL')
|
|
205
|
-
query.title('Buchi Automata')
|
|
206
|
-
query.!()
|
|
207
|
-
query.author('Tomáš Babiak')
|
|
259
|
+
query.title('Buchi Automata').and_not.author('Tomáš Babiak')
|
|
208
260
|
end
|
|
209
261
|
|
|
210
262
|
results.size #=> 18
|
|
@@ -220,9 +272,7 @@ The `Arx()` method accepts a predefined `Arx::Query` object through the `query`
|
|
|
220
272
|
# Papers in the cs.FL category whose title contains "Buchi Automata", not authored by Tomáš Babiak
|
|
221
273
|
q = Arx::Query.new(sort_by: :date_submitted)
|
|
222
274
|
q.category('cs.FL')
|
|
223
|
-
q.title('Buchi Automata')
|
|
224
|
-
q.!()
|
|
225
|
-
q.author('Tomáš Babiak')
|
|
275
|
+
q.title('Buchi Automata').and_not.author('Tomáš Babiak')
|
|
226
276
|
|
|
227
277
|
results = Arx(query: q)
|
|
228
278
|
results.size #=> 18
|
|
@@ -259,9 +309,18 @@ paper = Arx('1809.09415')
|
|
|
259
309
|
#=> #<Arx::Paper:0x00007fb657b59bd0>
|
|
260
310
|
|
|
261
311
|
paper.id
|
|
312
|
+
#=> "1809.09415"
|
|
313
|
+
paper.id(version: true)
|
|
262
314
|
#=> "1809.09415v1"
|
|
263
315
|
paper.url
|
|
316
|
+
#=> "http://arxiv.org/abs/1809.09415"
|
|
317
|
+
paper.url(version: true)
|
|
264
318
|
#=> "http://arxiv.org/abs/1809.09415v1"
|
|
319
|
+
paper.version
|
|
320
|
+
#=> 1
|
|
321
|
+
paper.revision?
|
|
322
|
+
#=> false
|
|
323
|
+
|
|
265
324
|
paper.title
|
|
266
325
|
#=> "On finitely ambiguous Büchi automata"
|
|
267
326
|
paper.summary
|
|
@@ -280,20 +339,18 @@ paper.published_at
|
|
|
280
339
|
#=> #<DateTime: 2018-09-25T11:40:39+00:00 ((2458387j,42039s,0n),+0s,2299161j)>
|
|
281
340
|
paper.updated_at
|
|
282
341
|
#=> #<DateTime: 2018-09-25T11:40:39+00:00 ((2458387j,42039s,0n),+0s,2299161j)>
|
|
283
|
-
paper.revision?
|
|
284
|
-
#=> false
|
|
285
342
|
|
|
286
343
|
# Paper's comment
|
|
287
344
|
paper.comment?
|
|
288
345
|
#=> false
|
|
289
346
|
paper.comment
|
|
290
|
-
#=> Arx::Error::MissingField (arXiv paper 1809.
|
|
347
|
+
#=> Arx::Error::MissingField (arXiv paper 1809.09415 is missing the `comment` metadata field)
|
|
291
348
|
|
|
292
349
|
# Paper's journal reference
|
|
293
350
|
paper.journal?
|
|
294
351
|
#=> false
|
|
295
352
|
paper.journal
|
|
296
|
-
#=> Arx::Error::MissingField (arXiv paper 1809.
|
|
353
|
+
#=> Arx::Error::MissingField (arXiv paper 1809.09415 is missing the `journal` metadata field)
|
|
297
354
|
|
|
298
355
|
# Paper's PDF URL
|
|
299
356
|
paper.pdf?
|
|
@@ -339,4 +396,14 @@ category.name
|
|
|
339
396
|
#=> "cond-mat"
|
|
340
397
|
category.full_name
|
|
341
398
|
#=> "Condensed Matter"
|
|
342
|
-
```
|
|
399
|
+
```
|
|
400
|
+
|
|
401
|
+
# Thanks
|
|
402
|
+
|
|
403
|
+
A large portion of this library is based on the brilliant work done by [Scholastica](https://github.com/scholastica) in their [`arxiv`](https://github.com/scholastica/arxiv) gem for retrieving individual papers from arXiv through the search API.
|
|
404
|
+
|
|
405
|
+
Arx was created mostly due to the seemingly inactive nature of Scholastica's repository. Additionally, it would have been infeasible to contribute such large changes to an already well-established gem, especially since https://scholasticahq.com/ appears to be dependent upon this gem.
|
|
406
|
+
|
|
407
|
+
---
|
|
408
|
+
|
|
409
|
+
Nevertheless, a special thanks goes out to Scholastica for providing the influence for Arx.
|
data/Rakefile
CHANGED
data/arx.gemspec
CHANGED
|
@@ -6,7 +6,7 @@ Gem::Specification.new do |spec|
|
|
|
6
6
|
spec.name = 'arx'
|
|
7
7
|
spec.version = Arx::VERSION
|
|
8
8
|
spec.authors = ['Edwin Onuonga']
|
|
9
|
-
spec.email = ['ed@
|
|
9
|
+
spec.email = ['ed@eonu.net']
|
|
10
10
|
spec.homepage = 'https://github.com/eonu/arx'
|
|
11
11
|
|
|
12
12
|
spec.summary = %q{A Ruby interface for querying academic papers on the arXiv search API.}
|
|
@@ -21,10 +21,11 @@ Gem::Specification.new do |spec|
|
|
|
21
21
|
spec.add_runtime_dependency 'nokogiri', '~> 1.10'
|
|
22
22
|
spec.add_runtime_dependency 'nokogiri-happymapper', '~> 0.8'
|
|
23
23
|
|
|
24
|
-
spec.add_development_dependency 'bundler', '
|
|
24
|
+
spec.add_development_dependency 'bundler', '>= 1.17'
|
|
25
25
|
spec.add_development_dependency 'rake', '~> 12.3'
|
|
26
|
-
spec.add_development_dependency 'thor', '~> 0.
|
|
26
|
+
spec.add_development_dependency 'thor', '~> 0.19.4'
|
|
27
27
|
spec.add_development_dependency 'rspec', '~> 3.7'
|
|
28
|
+
spec.add_development_dependency 'coveralls', '0.8.22'
|
|
28
29
|
|
|
29
30
|
spec.metadata = {
|
|
30
31
|
'source_code_uri' => spec.homepage,
|
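Reviewer note: the gemspec hunk above uses three different constraint styles (`>=`, `~>`, and an exact pin). The following is a minimal sketch of how RubyGems evaluates each of them, using the requirement strings from the new side of the hunk. The candidate version numbers and the `satisfied?` helper are illustrative assumptions, not part of the package.

```ruby
require 'rubygems'

# Requirement strings as they appear in the 1.1.0 gemspec.
bundler   = Gem::Requirement.new('>= 1.17')
thor      = Gem::Requirement.new('~> 0.19.4')
coveralls = Gem::Requirement.new('0.8.22') # a bare version means "= 0.8.22"

# Hypothetical helper: does `version` satisfy `requirement`?
def satisfied?(requirement, version)
  requirement.satisfied_by?(Gem::Version.new(version))
end

puts satisfied?(bundler, '2.0.1')     # open-ended lower bound accepts a newer major
puts satisfied?(thor, '0.19.5')       # pessimistic: >= 0.19.4 and < 0.20
puts satisfied?(thor, '0.20.0')       # outside the pessimistic range
puts satisfied?(coveralls, '0.8.23')  # exact pin rejects any other version
```

The `>=` form is the only one of the three with no upper bound, which is consistent with the changelog entry about the `bundler` requirement.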
data/lib/arx/cleaner.rb
CHANGED
|
@@ -4,11 +4,50 @@ module Arx
|
|
|
4
4
|
# @private
|
|
5
5
|
class Cleaner
|
|
6
6
|
|
|
7
|
-
#
|
|
8
|
-
|
|
9
|
-
|
|
10
|
-
|
|
11
|
-
|
|
7
|
+
# arXiv paper URL prefix format
|
|
8
|
+
URL_PREFIX = /^(https?\:\/\/)?(www.)?arxiv\.org\/abs\//
|
|
9
|
+
|
|
10
|
+
class << self
|
|
11
|
+
|
|
12
|
+
# Cleans strings.
|
|
13
|
+
# @param string [String] The string to clean (newline/return characters and repeated spaces are removed).
|
|
14
|
+
# @return [String] The cleaned string.
|
|
15
|
+
def clean(string)
|
|
16
|
+
string.gsub(/\r\n|\r|\n/, ' ').strip.squeeze ' '
|
|
17
|
+
end
|
|
18
|
+
|
|
19
|
+
# Attempt to extract an arXiv identifier from a string such as a URL.
|
|
20
|
+
#
|
|
21
|
+
# @param string [String] The string to extract the ID from.
|
|
22
|
+
# @param version [Boolean] Whether or not to include the paper's version.
|
|
23
|
+
# @return [String] The extracted ID.
|
|
24
|
+
def extract_id(string, version: false)
|
|
25
|
+
if version == !!version
|
|
26
|
+
if string.is_a? String
|
|
27
|
+
trimmed = /#{URL_PREFIX}.+\/?$/.match?(string) ? string.gsub(/(#{URL_PREFIX})|(\/$)/, '') : string
|
|
28
|
+
raise ArgumentError.new("Couldn't extract arXiv identifier from: #{string}") unless Validate.id? trimmed
|
|
29
|
+
version ? trimmed : trimmed.sub(/v[0-9]+$/, '')
|
|
30
|
+
else
|
|
31
|
+
raise TypeError.new("Expected `string` to be a String, got: #{string.class}")
|
|
32
|
+
end
|
|
33
|
+
else
|
|
34
|
+
raise TypeError.new("Expected `version` to be boolean (TrueClass or FalseClass), got: #{version.class}")
|
|
35
|
+
end
|
|
36
|
+
end
|
|
37
|
+
|
|
38
|
+
# Attempt to extract a version number from an arXiv identifier.
|
|
39
|
+
#
|
|
40
|
+
# @param string [String] The arXiv identifier to extract the version number from.
|
|
41
|
+
# @return [String] The extracted version number.
|
|
42
|
+
def extract_version(string)
|
|
43
|
+
reversed = extract_id(string, version: true).reverse
|
|
44
|
+
|
|
45
|
+
if /^[0-9]+v/.match? reversed
|
|
46
|
+
reversed.partition('v').first.reverse.to_i
|
|
47
|
+
else
|
|
48
|
+
raise ArgumentError.new("Couldn't extract version number from identifier: #{string}")
|
|
49
|
+
end
|
|
50
|
+
end
|
|
12
51
|
end
|
|
13
52
|
end
|
|
14
53
|
end
|
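Reviewer note: the new `Cleaner.extract_id` and `Cleaner.extract_version` methods can be exercised without the gem using a standalone sketch like the one below. `ArxSketch` is a hypothetical namespace, the identifier pattern is a simplified stand-in for `Validate.id?` (new-style identifiers only), and the boolean type check on `version` is omitted. The sketch also escapes the dot in `www\.` and anchors with `\A`/`\z`, which differs slightly from the regular expression in the hunk.

```ruby
# Standalone sketch; it does not load or call the arx gem.
module ArxSketch
  URL_PREFIX = %r{\A(https?://)?(www\.)?arxiv\.org/abs/}
  NEW_ID     = /\A\d{4}\.\d{4,5}(v\d+)?\z/ # simplified stand-in for Validate.id?

  module_function

  # Strip the URL prefix and a trailing slash, then optionally drop the version.
  def extract_id(string, version: false)
    raise TypeError, "Expected `string` to be a String, got: #{string.class}" unless string.is_a?(String)

    trimmed = string.sub(URL_PREFIX, '').chomp('/')
    raise ArgumentError, "Couldn't extract arXiv identifier from: #{string}" unless NEW_ID.match?(trimmed)

    version ? trimmed : trimmed.sub(/v\d+\z/, '')
  end

  # Return the integer after the trailing "v", or raise if there is none.
  def extract_version(string)
    match = extract_id(string, version: true).match(/v(\d+)\z/)
    raise ArgumentError, "Couldn't extract version number from identifier: #{string}" unless match

    match[1].to_i
  end
end

puts ArxSketch.extract_id('https://arxiv.org/abs/1809.09415v1/')  # 1809.09415
puts ArxSketch.extract_id('1809.09415v1', version: true)          # 1809.09415v1
puts ArxSketch.extract_version('arxiv.org/abs/1809.09415v2')      # 2
```

The hunk finds the version by reversing the string and partitioning on `v`; the sketch uses a capture group instead, which should give the same integer for well-formed identifiers.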
data/lib/arx/entities/paper.rb
CHANGED
|
@@ -13,18 +13,33 @@ module Arx
|
|
|
13
13
|
# @example
|
|
14
14
|
# 1705.01662v1
|
|
15
15
|
# cond-mat/0211034
|
|
16
|
+
# @param version [Boolean] Whether or not to include the paper's version.
|
|
16
17
|
# @return [String] The paper's identifier.
|
|
17
|
-
def id
|
|
18
|
-
@id
|
|
18
|
+
def id(version: false)
|
|
19
|
+
Cleaner.extract_id @id, version: version
|
|
19
20
|
end
|
|
20
21
|
|
|
21
22
|
# The URL of the paper on the arXiv website.
|
|
22
23
|
# @example
|
|
23
24
|
# http://arxiv.org/abs/1705.01662v1
|
|
24
25
|
# http://arxiv.org/abs/cond-mat/0211034
|
|
26
|
+
# @param version [Boolean] Whether or not to include the paper's version.
|
|
25
27
|
# @return [String] The paper's arXiv URL.
|
|
26
|
-
def url
|
|
27
|
-
|
|
28
|
+
def url(version: false)
|
|
29
|
+
"http://arxiv.org/abs/#{id version: version}"
|
|
30
|
+
end
|
|
31
|
+
|
|
32
|
+
# The version of the paper.
|
|
33
|
+
# @return [Integer] The paper's version.
|
|
34
|
+
def version
|
|
35
|
+
Cleaner.extract_version @id
|
|
36
|
+
end
|
|
37
|
+
|
|
38
|
+
# Whether the paper is a revision or not.
|
|
39
|
+
# @note A paper is a revision if its {version} is greater than 1.
|
|
40
|
+
# @return [Boolean]
|
|
41
|
+
def revision?
|
|
42
|
+
version > 1
|
|
28
43
|
end
|
|
29
44
|
|
|
30
45
|
# @!method updated_at
|
|
@@ -58,13 +73,6 @@ module Arx
|
|
|
58
73
|
# @return [Array<Category>]
|
|
59
74
|
has_many :categories, Category, tag: 'category'
|
|
60
75
|
|
|
61
|
-
# Whether the paper is a revision or not.
|
|
62
|
-
# @note A paper is a revision if {updated_at} differs from {published_at}.
|
|
63
|
-
# @return [Boolean]
|
|
64
|
-
def revision?
|
|
65
|
-
@published_at != @updated_at
|
|
66
|
-
end
|
|
67
|
-
|
|
68
76
|
# @!method summary
|
|
69
77
|
# The summary (or abstract) of the paper.
|
|
70
78
|
# @return [String]
|
|
@@ -152,9 +160,10 @@ module Arx
|
|
|
152
160
|
end
|
|
153
161
|
|
|
154
162
|
inspector *%i[
|
|
155
|
-
id url
|
|
163
|
+
id url version revision?
|
|
164
|
+
title summary authors
|
|
156
165
|
primary_category categories
|
|
157
|
-
published_at updated_at
|
|
166
|
+
published_at updated_at
|
|
158
167
|
comment? comment
|
|
159
168
|
journal? journal
|
|
160
169
|
pdf? pdf_url
|
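Reviewer note: the behavioural change in `Paper` is that `#id` and `#url` drop the version suffix by default, and `#revision?` now depends on `#version` rather than on timestamps. A minimal sketch of that contract follows. `PaperSketch` is a hypothetical stand-in for `Arx::Paper`, and it assumes the raw identifier is already a bare ID carrying a `vN` suffix (the real class delegates to `Cleaner`, which raises when the suffix is absent).

```ruby
# Hypothetical stand-in for Arx::Paper; models only the identifier-derived methods.
class PaperSketch
  def initialize(raw_id)
    @id = raw_id # assumed to look like "1809.09415v3"
  end

  # Identifier, without the version suffix unless requested.
  def id(version: false)
    version ? @id : @id.sub(/v\d+\z/, '')
  end

  # Abstract-page URL built from the identifier.
  def url(version: false)
    "http://arxiv.org/abs/#{id(version: version)}"
  end

  # Integer version taken from the trailing "vN" (0 if absent in this sketch).
  def version
    @id[/v(\d+)\z/, 1].to_i
  end

  # A paper counts as a revision once its version exceeds 1.
  def revision?
    version > 1
  end
end

paper = PaperSketch.new('1809.09415v3')
puts paper.id                   # 1809.09415
puts paper.url(version: true)   # http://arxiv.org/abs/1809.09415v3
puts paper.revision?            # true
```

Callers that relied on the 0.3.2 behaviour of `paper.id` returning the versioned identifier would need to pass `version: true` after upgrading.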
data/lib/arx/query/query.rb
CHANGED
|
@@ -22,13 +22,6 @@ module Arx
|
|
|
22
22
|
and_not: 'ANDNOT'
|
|
23
23
|
}
|
|
24
24
|
|
|
25
|
-
# Logical connective method names.
|
|
26
|
-
CONNECTIVE_METHODS = {
|
|
27
|
-
'&': :and,
|
|
28
|
-
'!': :and_not,
|
|
29
|
-
'|': :or
|
|
30
|
-
}
|
|
31
|
-
|
|
32
25
|
# Supported fields for the search queries made to the arXiv search API.
|
|
33
26
|
# @see https://arxiv.org/help/prep arXiv metadata fields
|
|
34
27
|
# @see https://arxiv.org/help/api/user-manual#query_details arXiv user manual (query details)
|
|
@@ -73,31 +66,30 @@ module Arx
|
|
|
73
66
|
|
|
74
67
|
ids.flatten!
|
|
75
68
|
unless ids.empty?
|
|
76
|
-
ids.map!
|
|
77
|
-
Validate.ids ids
|
|
69
|
+
ids.map! &Cleaner.method(:extract_id)
|
|
78
70
|
@query << "&#{PARAMS[:id_list]}=#{ids * ','}"
|
|
79
71
|
end
|
|
80
72
|
|
|
81
73
|
yield self if block_given?
|
|
82
74
|
end
|
|
83
75
|
|
|
84
|
-
# @!method
|
|
76
|
+
# @!method and
|
|
85
77
|
# Logical conjunction (+AND+) of subqueries.
|
|
86
78
|
# @see https://arxiv.org/help/api/user-manual#query_details arXiv user manual
|
|
87
79
|
# @return [self]
|
|
88
80
|
|
|
89
|
-
# @!method
|
|
81
|
+
# @!method and_not
|
|
90
82
|
# Logical negated conjunction (+ANDNOT+) of subqueries.
|
|
91
83
|
# @see https://arxiv.org/help/api/user-manual#query_details arXiv user manual
|
|
92
84
|
# @return [self]
|
|
93
85
|
|
|
94
|
-
# @!method
|
|
86
|
+
# @!method or
|
|
95
87
|
# Logical disjunction (+OR+) of subqueries.
|
|
96
88
|
# @see https://arxiv.org/help/api/user-manual#query_details arXiv user manual
|
|
97
89
|
# @return [self]
|
|
98
90
|
|
|
99
|
-
|
|
100
|
-
define_method(
|
|
91
|
+
CONNECTIVES.keys.each do |connective|
|
|
92
|
+
define_method(connective) { add_connective connective }
|
|
101
93
|
end
|
|
102
94
|
|
|
103
95
|
# @!method title(*values, exact: true, connective: :and)
|
|
@@ -181,6 +173,20 @@ module Arx
|
|
|
181
173
|
end
|
|
182
174
|
end
|
|
183
175
|
|
|
176
|
+
# Creates a nested subquery (grouped statements with parentheses).
|
|
177
|
+
#
|
|
178
|
+
# @return [self]
|
|
179
|
+
def group
|
|
180
|
+
add_connective :and unless end_with_connective?
|
|
181
|
+
@query << (search_query? ? '+' : "&#{PARAMS[:search_query]}=")
|
|
182
|
+
|
|
183
|
+
@query << CGI.escape('(')
|
|
184
|
+
yield
|
|
185
|
+
@query << CGI.escape(')')
|
|
186
|
+
|
|
187
|
+
self
|
|
188
|
+
end
|
|
189
|
+
|
|
184
190
|
# Returns the query string.
|
|
185
191
|
#
|
|
186
192
|
# @return [String]
|
|
@@ -196,8 +202,9 @@ module Arx
|
|
|
196
202
|
# @param connective [Symbol] The symbol of the logical connective to add.
|
|
197
203
|
# @return [self]
|
|
198
204
|
def add_connective(connective)
|
|
199
|
-
|
|
200
|
-
|
|
205
|
+
if search_query?
|
|
206
|
+
@query << "+#{CONNECTIVES[connective]}" unless end_with_connective? || start_of_group?
|
|
207
|
+
end
|
|
201
208
|
self
|
|
202
209
|
end
|
|
203
210
|
|
|
@@ -205,13 +212,10 @@ module Arx
|
|
|
205
212
|
#
|
|
206
213
|
# @param subquery [String] The subquery to add.
|
|
207
214
|
def add_subquery(subquery)
|
|
215
|
+
add_connective :and unless end_with_connective?
|
|
216
|
+
|
|
208
217
|
if search_query?
|
|
209
|
-
|
|
210
|
-
@query << "+#{subquery}"
|
|
211
|
-
else
|
|
212
|
-
add_connective :and
|
|
213
|
-
@query << "+#{subquery}"
|
|
214
|
-
end
|
|
218
|
+
@query << (start_of_group? ? "#{subquery}" : "+#{subquery}")
|
|
215
219
|
else
|
|
216
220
|
@query << "&#{PARAMS[:search_query]}=#{subquery}"
|
|
217
221
|
end
|
|
@@ -229,10 +233,17 @@ module Arx
|
|
|
229
233
|
#
|
|
230
234
|
# @see CONNECTIVES
|
|
231
235
|
# @return [Boolean]
|
|
232
|
-
def
|
|
236
|
+
def end_with_connective?
|
|
233
237
|
CONNECTIVES.values.any? &@query.method(:end_with?)
|
|
234
238
|
end
|
|
235
239
|
|
|
240
|
+
# Whether the query string ends in a start-of-group character '('.
|
|
241
|
+
#
|
|
242
|
+
# @return [Boolean]
|
|
243
|
+
def start_of_group?
|
|
244
|
+
@query.end_with? CGI.escape('(')
|
|
245
|
+
end
|
|
246
|
+
|
|
236
247
|
# Parenthesizes a string with CGI-escaped parentheses.
|
|
237
248
|
#
|
|
238
249
|
# @param string [String] The string to parenthesize.
|
|
@@ -248,18 +259,5 @@ module Arx
|
|
|
248
259
|
def enquote(string)
|
|
249
260
|
CGI.escape("\"") + string + CGI.escape("\"")
|
|
250
261
|
end
|
|
251
|
-
|
|
252
|
-
# Attempt to extract an ID from an arXiv URL.
|
|
253
|
-
#
|
|
254
|
-
# @param url [String] The URL to extract the ID from.
|
|
255
|
-
# @return [String] The extracted ID if successful, otherwise the original string.
|
|
256
|
-
def extract_id(url)
|
|
257
|
-
prefix = %r"^(https?\:\/\/)?(www.)?arxiv\.org\/abs\/"
|
|
258
|
-
if %r"#{prefix}.*$".match? url
|
|
259
|
-
url.sub(prefix, '').sub(%r"\/$", '')
|
|
260
|
-
else
|
|
261
|
-
url
|
|
262
|
-
end
|
|
263
|
-
end
|
|
264
262
|
end
|
|
265
263
|
end
|
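Reviewer note: the interaction between `group`, `add_connective`, `add_subquery`, and `start_of_group?` is easier to follow on a reduced model. The sketch below mirrors the control flow in the hunk but is not the gem's code: `QuerySketch` and its `field` helper are hypothetical, the `search_query=` parameter prefix and quoting are left out, and only the `search_query` fragment is built.

```ruby
require 'cgi'

# Reduced, hypothetical model of the connective/grouping logic.
class QuerySketch
  CONNECTIVES = { and: 'AND', or: 'OR', and_not: 'ANDNOT' }.freeze

  def initialize
    @query = +''
  end

  # Defines #and, #or and #and_not, each appending its connective.
  CONNECTIVES.each_key do |connective|
    define_method(connective) { add_connective(connective) }
  end

  # Hypothetical field helper: appends "prefix:value" as a subquery.
  def field(prefix, value)
    add_subquery("#{prefix}:#{CGI.escape(value)}")
    self
  end

  # Wraps whatever the block appends in escaped parentheses.
  def group
    add_connective(:and) unless end_with_connective?
    @query << '+' unless @query.empty?
    @query << CGI.escape('(')
    yield
    @query << CGI.escape(')')
    self
  end

  def to_s
    @query
  end

  private

  # Appends a connective unless one is already pending or a group just opened.
  def add_connective(connective)
    @query << "+#{CONNECTIVES[connective]}" unless @query.empty? || end_with_connective? || start_of_group?
    self
  end

  # Defaults to AND between subqueries; no "+" separator right after "(".
  def add_subquery(subquery)
    add_connective(:and) unless end_with_connective?
    @query << '+' unless @query.empty? || start_of_group?
    @query << subquery
  end

  def end_with_connective?
    CONNECTIVES.values.any? { |c| @query.end_with?(c) }
  end

  def start_of_group?
    @query.end_with?(CGI.escape('('))
  end
end

q = QuerySketch.new
q.field('au', 'Andreotti').and_not
q.group { q.field('cat', 'math.NA').or.field('cat', 'math.CO') }
puts q.to_s # au:Andreotti+ANDNOT+%28cat:math.NA+OR+cat:math.CO%29
```

The `start_of_group?` guard is what prevents a stray `+AND` or `+` from being emitted immediately after the opening parenthesis. As in the hunk, `end_with_connective?` is a plain suffix check, so a value that happens to end in `AND` or `OR` would be mistaken for a pending connective.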
data/lib/arx/query/validate.rb
CHANGED
|
@@ -94,7 +94,9 @@ module Arx
|
|
|
94
94
|
# @see NEW_IDENTIFIER_FORMAT
|
|
95
95
|
# @see OLD_IDENTIFIER_FORMAT
|
|
96
96
|
def id?(id)
|
|
97
|
-
NEW_IDENTIFIER_FORMAT.match?
|
|
97
|
+
return true if NEW_IDENTIFIER_FORMAT.match? id
|
|
98
|
+
return true if OLD_IDENTIFIER_FORMAT.match?(id) && Arx::CATEGORIES.keys.include?(id.split('/').first)
|
|
99
|
+
false
|
|
98
100
|
end
|
|
99
101
|
end
|
|
100
102
|
end
|
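Reviewer note: the rewritten `id?` accepts a new-style identifier outright, and accepts a legacy identifier only when its archive prefix is a known category. A small sketch of that two-step check follows, using the two patterns that this release adds to `lib/arx.rb`. `IdSketch` and `KNOWN_ARCHIVES` are hypothetical; the latter is a three-entry stand-in for the keys of `Arx::CATEGORIES`.

```ruby
# Hypothetical namespace; patterns copied from the 1.1.0 side of the diff.
module IdSketch
  NEW_IDENTIFIER_FORMAT = /^\d{4}\.\d{4,5}(v\d+)?$/
  OLD_IDENTIFIER_FORMAT = /^[a-z]+(\-[a-z]+)?\/\d{7}(v\d+)?$/

  # Illustrative subset only; the gem checks against Arx::CATEGORIES.keys.
  KNOWN_ARCHIVES = %w[math cond-mat hep-th].freeze

  # True for a new-style ID, or a legacy ID whose archive prefix is known.
  def self.id?(id)
    return true if NEW_IDENTIFIER_FORMAT.match?(id)
    return true if OLD_IDENTIFIER_FORMAT.match?(id) && KNOWN_ARCHIVES.include?(id.split('/').first)

    false
  end
end

puts IdSketch.id?('1705.01662v1')     # true
puts IdSketch.id?('cond-mat/0211034') # true
puts IdSketch.id?('foo/0211034')      # false: well-formed, but unknown archive
```

The category lookup matters because the legacy pattern alone would accept any lowercase word before the slash.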
data/lib/arx/version.rb
CHANGED
data/lib/arx.rb
CHANGED
|
@@ -32,7 +32,7 @@ module Arx
|
|
|
32
32
|
# 1705.01662v1
|
|
33
33
|
# 1412.0135
|
|
34
34
|
# 0706.0001v2
|
|
35
|
-
NEW_IDENTIFIER_FORMAT =
|
|
35
|
+
NEW_IDENTIFIER_FORMAT = /^\d{4}\.\d{4,5}(v\d+)?$/
|
|
36
36
|
|
|
37
37
|
# The legacy arXiv paper identifier scheme (before 1 April 2007).
|
|
38
38
|
#
|
|
@@ -40,7 +40,7 @@ module Arx
|
|
|
40
40
|
# @example
|
|
41
41
|
# math/0309136v1
|
|
42
42
|
# cond-mat/0211034
|
|
43
|
-
OLD_IDENTIFIER_FORMAT =
|
|
43
|
+
OLD_IDENTIFIER_FORMAT = /^[a-z]+(\-[a-z]+)?\/\d{7}(v\d+)?$/
|
|
44
44
|
|
|
45
45
|
class << self
|
|
46
46
|
|
|
@@ -59,12 +59,23 @@ module Arx
|
|
|
59
59
|
yield query if block_given?
|
|
60
60
|
|
|
61
61
|
document = Nokogiri::XML(open ENDPOINT + query.to_s + '&max_results=10000').remove_namespaces!
|
|
62
|
-
results = Paper.parse(document, single:
|
|
63
|
-
|
|
64
|
-
|
|
62
|
+
results = Paper.parse(document, single: ids.size == 1)
|
|
63
|
+
|
|
64
|
+
if results.is_a? Paper
|
|
65
|
+
raise Error::MissingPaper.new(ids.first) if results.title.empty?
|
|
66
|
+
elsif results.is_a? Array
|
|
67
|
+
results.reject! {|paper| paper.title.empty?}
|
|
68
|
+
elsif results.nil?
|
|
69
|
+
if ids.size == 1
|
|
70
|
+
raise Error::MissingPaper.new(ids.first)
|
|
71
|
+
else
|
|
72
|
+
results = []
|
|
73
|
+
end
|
|
74
|
+
end
|
|
75
|
+
|
|
76
|
+
results
|
|
65
77
|
end
|
|
66
78
|
|
|
67
|
-
alias_method :find, :search
|
|
68
79
|
alias_method :get, :search
|
|
69
80
|
end
|
|
70
81
|
end
|
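Reviewer note: the new post-processing in `Arx.search` distinguishes three shapes of parse result (a single paper, an array, or `nil`). The sketch below isolates that branching so it can be read and tested on its own. `PaperStub`, `MissingPaperSketch`, and `normalize_results` are hypothetical names standing in for `Arx::Paper`, `Arx::Error::MissingPaper`, and the inline logic in the hunk. Unlike the hunk, the array branch returns a filtered copy instead of mutating in place.

```ruby
# Hypothetical stand-ins for Arx::Paper and Arx::Error::MissingPaper.
PaperStub = Struct.new(:title)
class MissingPaperSketch < StandardError; end

# Mirrors the branching added to Arx.search for single, multiple, and nil results.
def normalize_results(results, ids)
  case results
  when PaperStub
    # A single paper with an empty title is treated as "not found".
    raise MissingPaperSketch, ids.first if results.title.empty?

    results
  when Array
    # Drop placeholder entries that came back without a title.
    results.reject { |paper| paper.title.empty? }
  when nil
    # Nothing parsed: an error for a single-ID lookup, otherwise an empty list.
    raise MissingPaperSketch, ids.first if ids.size == 1

    []
  end
end

found = normalize_results([PaperStub.new('On finitely ambiguous Büchi automata'), PaperStub.new('')], [])
puts found.size                        # 1
puts normalize_results(nil, []).empty? # true
```

With this shape, a search that matches nothing yields an empty array, while a lookup by a single ID that matches nothing raises.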
metadata
CHANGED
|
@@ -1,14 +1,14 @@
|
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
|
2
2
|
name: arx
|
|
3
3
|
version: !ruby/object:Gem::Version
|
|
4
|
-
version:
|
|
4
|
+
version: 1.1.0
|
|
5
5
|
platform: ruby
|
|
6
6
|
authors:
|
|
7
7
|
- Edwin Onuonga
|
|
8
8
|
autorequire:
|
|
9
9
|
bindir: bin
|
|
10
10
|
cert_chain: []
|
|
11
|
-
date: 2019-
|
|
11
|
+
date: 2019-04-24 00:00:00.000000000 Z
|
|
12
12
|
dependencies:
|
|
13
13
|
- !ruby/object:Gem::Dependency
|
|
14
14
|
name: nokogiri
|
|
@@ -42,16 +42,16 @@ dependencies:
|
|
|
42
42
|
name: bundler
|
|
43
43
|
requirement: !ruby/object:Gem::Requirement
|
|
44
44
|
requirements:
|
|
45
|
-
- - "
|
|
45
|
+
- - ">="
|
|
46
46
|
- !ruby/object:Gem::Version
|
|
47
|
-
version: '
|
|
47
|
+
version: '1.17'
|
|
48
48
|
type: :development
|
|
49
49
|
prerelease: false
|
|
50
50
|
version_requirements: !ruby/object:Gem::Requirement
|
|
51
51
|
requirements:
|
|
52
|
-
- - "
|
|
52
|
+
- - ">="
|
|
53
53
|
- !ruby/object:Gem::Version
|
|
54
|
-
version: '
|
|
54
|
+
version: '1.17'
|
|
55
55
|
- !ruby/object:Gem::Dependency
|
|
56
56
|
name: rake
|
|
57
57
|
requirement: !ruby/object:Gem::Requirement
|
|
@@ -72,14 +72,14 @@ dependencies:
|
|
|
72
72
|
requirements:
|
|
73
73
|
- - "~>"
|
|
74
74
|
- !ruby/object:Gem::Version
|
|
75
|
-
version:
|
|
75
|
+
version: 0.19.4
|
|
76
76
|
type: :development
|
|
77
77
|
prerelease: false
|
|
78
78
|
version_requirements: !ruby/object:Gem::Requirement
|
|
79
79
|
requirements:
|
|
80
80
|
- - "~>"
|
|
81
81
|
- !ruby/object:Gem::Version
|
|
82
|
-
version:
|
|
82
|
+
version: 0.19.4
|
|
83
83
|
- !ruby/object:Gem::Dependency
|
|
84
84
|
name: rspec
|
|
85
85
|
requirement: !ruby/object:Gem::Requirement
|
|
@@ -94,9 +94,23 @@ dependencies:
|
|
|
94
94
|
- - "~>"
|
|
95
95
|
- !ruby/object:Gem::Version
|
|
96
96
|
version: '3.7'
|
|
97
|
+
- !ruby/object:Gem::Dependency
|
|
98
|
+
name: coveralls
|
|
99
|
+
requirement: !ruby/object:Gem::Requirement
|
|
100
|
+
requirements:
|
|
101
|
+
- - '='
|
|
102
|
+
- !ruby/object:Gem::Version
|
|
103
|
+
version: 0.8.22
|
|
104
|
+
type: :development
|
|
105
|
+
prerelease: false
|
|
106
|
+
version_requirements: !ruby/object:Gem::Requirement
|
|
107
|
+
requirements:
|
|
108
|
+
- - '='
|
|
109
|
+
- !ruby/object:Gem::Version
|
|
110
|
+
version: 0.8.22
|
|
97
111
|
description:
|
|
98
112
|
email:
|
|
99
|
-
- ed@
|
|
113
|
+
- ed@eonu.net
|
|
100
114
|
executables: []
|
|
101
115
|
extensions: []
|
|
102
116
|
extra_rdoc_files: []
|