math_metadata_lookup 0.1.4 → 0.2.0
- data/README.md +35 -21
- data/Rakefile +44 -4
- data/TODO +2 -2
- data/bin/math_metadata_lookup +5 -5
- data/lib/math_metadata_lookup/article.rb +25 -1
- data/lib/math_metadata_lookup/author.rb +5 -0
- data/lib/math_metadata_lookup/entity.rb +1 -1
- data/lib/math_metadata_lookup/lookup.rb +1 -1
- data/lib/math_metadata_lookup/reference.rb +23 -1
- data/lib/math_metadata_lookup/result.rb +30 -1
- data/lib/math_metadata_lookup/site.rb +42 -9
- data/lib/math_metadata_lookup/sites/cedram.rb +158 -0
- data/lib/math_metadata_lookup/sites/dmlcz.rb +57 -0
- data/lib/math_metadata_lookup/sites/mr.rb +3 -1
- data/lib/math_metadata_lookup/sites/numdam.rb +93 -0
- data/lib/math_metadata_lookup/sites/zbl.rb +2 -1
- data/lib/math_metadata_lookup/tools.rb +2 -1
- data/math_metadata_lookup.gemspec +4 -3
- metadata +23 -13
data/README.md
CHANGED
@@ -2,9 +2,18 @@ About
|
|
2
2
|
-----
|
3
3
|
|
4
4
|
This utility search mathematical reviews sites and fetches metadata about articles.
|
5
|
-
It returns results as one of text, xml, html, yaml or ruby formats.
|
5
|
+
It returns results as one of text, xml, html, yaml, json or ruby formats.
|
6
6
|
It can work with LaTeX accent notation.
|
7
7
|
|
8
|
+
Sites
|
9
|
+
=====
|
10
|
+
|
11
|
+
* MathSciNet
|
12
|
+
* Zentralblatt
|
13
|
+
* Numdam
|
14
|
+
* Cedram
|
15
|
+
* DmlCZ
|
16
|
+
|
8
17
|
|
9
18
|
Installation
|
10
19
|
------------
|
@@ -18,7 +27,7 @@ then you can install ``math_metadata_lookup`` using rubygems:
|
|
18
27
|
Command line usage example
|
19
28
|
--------------------------
|
20
29
|
|
21
|
-
To get full help run it without any argument
|
30
|
+
To get full help run it without any argument:
|
22
31
|
|
23
32
|
math_metadata_lookup
|
24
33
|
|
@@ -26,10 +35,12 @@ Fetching metadata about an article:
|
|
26
35
|
|
27
36
|
math_metadata_lookup.rb article -t "Sobolev embeddings with variable exponent. II"
|
28
37
|
|
29
|
-
Returns list of articles:
|
30
|
-
|
38
|
+
Returns list of articles as html:
|
39
|
+
|
40
|
+
bin/math_metadata_lookup.rb article -t "Sobolev embeddings" -a "Rakosnik" -a "Edmunds" -f html
|
31
41
|
|
32
42
|
Searching for authors:
|
43
|
+
|
33
44
|
bin/math_metadata_lookup.rb author -a "Vesely, Jiri"
|
34
45
|
|
35
46
|
|
@@ -39,11 +50,13 @@ Usage from ruby
|
|
39
50
|
require 'rubygems'
|
40
51
|
require 'math_metadata_lookup'
|
41
52
|
|
42
|
-
# initialize search engine to look only
|
53
|
+
# initialize search engine to look only into Mathematical Reviews database
|
43
54
|
l = MathMetadata::Lookup.new( :sites => [:mr], :verbose => false )
|
44
55
|
|
45
|
-
# fetch result
|
56
|
+
# fetch first site from search result
|
46
57
|
mr_result = l.article( :title => "Sobolev embeddings with variable exponent. II" ).first
|
58
|
+
|
59
|
+
# get first article from the :mr site
|
47
60
|
article = mr_result[:result].first
|
48
61
|
|
49
62
|
# print out article authors separated with semicolon
|
@@ -55,27 +68,28 @@ Resources
|
|
55
68
|
|
56
69
|
Content of the resource directory:
|
57
70
|
|
58
|
-
*
|
71
|
+
* ``math_metadata_lookup.js``: contains function ``toggle_references( id )``. It can toggle visibility of references in html document. By default are all references visible. If you set in css class references attribute display to none it will be hidden by default.
|
59
72
|
|
60
73
|
|
61
|
-
Function reference
|
62
|
-
|
74
|
+
Function reference for MathMetadata::Lookup class
|
75
|
+
-------------------------------------------------
|
63
76
|
|
64
|
-
#
|
77
|
+
#article( hash )
|
65
78
|
|
66
79
|
Hash arguments are:
|
67
80
|
|
68
81
|
* article id is known
|
69
|
-
* **:id**
|
82
|
+
* **:id** String
|
83
|
+
|
70
84
|
* article id is unknown
|
71
|
-
* **:title**
|
85
|
+
* **:title** String
|
72
86
|
* **:authors** Array of Strings
|
73
|
-
* **:year**
|
87
|
+
* **:year** String
|
74
88
|
|
75
89
|
Returns instance of class Result.
|
76
90
|
|
77
91
|
|
78
|
-
#
|
92
|
+
#author( hash )
|
79
93
|
|
80
94
|
Search for authors "name forms".
|
81
95
|
|
@@ -86,27 +100,27 @@ Hash arguments are:
|
|
86
100
|
Returns instance of class Result.
|
87
101
|
|
88
102
|
|
89
|
-
#
|
103
|
+
#heuristic( hash )
|
90
104
|
|
91
105
|
Returns only one best match from each site where similarity is higher then threshold.
|
92
|
-
It runs article
|
106
|
+
It runs article search with first two words from title and only surnames from author names.
|
93
107
|
The result of search is sorted by similarity and articles with similarity less then threshold are deleted.
|
94
108
|
Similarity is count as weighted average from title, authors and year using Levenshtein distance method.
|
95
109
|
The Levenshtein distance function is run on full given title and full given names.
|
96
110
|
|
97
111
|
Hash arguments are:
|
98
112
|
|
99
|
-
* **:title**
|
100
|
-
* **:
|
101
|
-
* **:year**
|
113
|
+
* **:title** String
|
114
|
+
* **:authors** Array of Strings
|
115
|
+
* **:year** String
|
102
116
|
* **:threshold** Float. Range: 0.0...1.0. Default: 0.6
|
103
117
|
|
104
118
|
Returns instance of class Result.
|
105
119
|
|
106
120
|
|
107
|
-
#
|
121
|
+
#reference( hash )
|
108
122
|
|
109
|
-
Parse reference string and run heuristic.
|
123
|
+
Parse reference string and run heuristic. It expects that authors are separated by colon.
|
110
124
|
|
111
125
|
Hash arguments are:
|
112
126
|
|
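Note on the `#heuristic` description in the README above: the scoring it describes (a weighted average of Levenshtein-based similarities for title, authors and year, filtered by a threshold) can be sketched roughly as follows. This is only an illustration, not the gem's implementation: the weights, the normalisation by the longer string, and the names `levenshtein`, `string_similarity`, `default_weights`, `similarity` and `best_match` are all assumptions introduced here.

```ruby
# Levenshtein edit distance between two strings (row-by-row dynamic programming).
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Turn the distance into a similarity in 0.0..1.0 (1.0 means identical).
# Normalising by the longer string is an assumption of this sketch.
def string_similarity(a, b)
  a = a.to_s.downcase
  b = b.to_s.downcase
  longest = [a.length, b.length].max
  return 1.0 if longest.zero?
  1.0 - levenshtein(a, b).to_f / longest
end

# Hypothetical weights; the real weighting is not shown in this diff.
def default_weights
  { :title => 0.5, :authors => 0.3, :year => 0.2 }
end

# Weighted average over title, authors and year.
def similarity(query, candidate, weights = default_weights)
  weights.inject(0.0) do |sum, (field, weight)|
    q = [query[field]].flatten.join('; ')
    c = [candidate[field]].flatten.join('; ')
    sum + weight * string_similarity(q, c)
  end
end

# Keep candidates at or above the threshold and return the most similar one,
# or nil when nothing qualifies.
def best_match(query, candidates, threshold = 0.6)
  scored = candidates.map { |c| [similarity(query, c), c] }
  scored = scored.select { |score, _| score >= threshold }
  best = scored.max_by { |score, _| score }
  best && best.last
end
```

With these assumed weights, the title dominates the score, so a title typo costs more than a wrong year; the `-d` / `:threshold` value then decides whether the top candidate is kept at all.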
data/Rakefile
CHANGED
@@ -2,16 +2,23 @@
|
|
2
2
|
# vi: fenc=utf-8:expandtab:ts=2:sw=2:sts=2
|
3
3
|
#
|
4
4
|
# @author: Petr Kovar <pejuko@gmail.com>
|
5
|
-
$KCODE='UTF8'
|
5
|
+
$KCODE='UTF8' if RUBY_VERSION < "1.9"
|
6
6
|
|
7
7
|
require 'rake/gempackagetask'
|
8
8
|
require 'rake/clean'
|
9
9
|
|
10
|
-
CLEAN << "coverage" << "pkg" << "README.html" << "CHANGELOG.html" << '*.rbc'
|
10
|
+
CLEAN << "coverage" << "pkg" << "README.html" << "CHANGELOG.html" << '*.rbc' << "html/" << "yardoc/"
|
11
11
|
|
12
12
|
task :default => [:doc, :gem]
|
13
13
|
|
14
|
-
|
14
|
+
task :gem do |t|
|
15
|
+
load 'math_metadata_lookup.gemspec'
|
16
|
+
builder = Gem::Builder.new @spec
|
17
|
+
builder.build
|
18
|
+
end
|
19
|
+
|
20
|
+
|
21
|
+
docs = []
|
15
22
|
|
16
23
|
begin
|
17
24
|
require 'bluecloth'
|
@@ -27,11 +34,44 @@ begin
|
|
27
34
|
File.open(htmlfile, "w") { |f| f << md.to_html }
|
28
35
|
end
|
29
36
|
|
30
|
-
task :doc => [:readme]
|
31
37
|
|
32
38
|
task :readme do |t|
|
33
39
|
build_document("README.md")
|
34
40
|
end
|
35
41
|
|
42
|
+
docs << :readme
|
43
|
+
|
36
44
|
rescue LoadError
|
37
45
|
end
|
46
|
+
|
47
|
+
|
48
|
+
begin
|
49
|
+
|
50
|
+
require 'rake/rdoctask'
|
51
|
+
|
52
|
+
Rake::RDocTask.new do |rd|
|
53
|
+
rd.main = "README.md"
|
54
|
+
rd.rdoc_files.include("README.md", "lib/**/*.rb", "bin/*")
|
55
|
+
end
|
56
|
+
|
57
|
+
docs << :rdoc
|
58
|
+
|
59
|
+
rescue LoadError
|
60
|
+
end
|
61
|
+
|
62
|
+
|
63
|
+
begin
|
64
|
+
|
65
|
+
require 'yard'
|
66
|
+
|
67
|
+
YARD::Rake::YardocTask.new do |t|
|
68
|
+
t.files = ['README.md', 'lib/**/*.rb', 'bin/*'] # optional
|
69
|
+
t.options = ['--output-dir=yardoc'] # optional
|
70
|
+
end
|
71
|
+
|
72
|
+
docs << :yard
|
73
|
+
|
74
|
+
rescue LoadError
|
75
|
+
end
|
76
|
+
|
77
|
+
task :doc => docs
|
data/TODO
CHANGED
data/bin/math_metadata_lookup
CHANGED
@@ -39,22 +39,22 @@ def print_help
|
|
39
39
|
--threshold, -d <0.0...1.0> -- default: 0.6
|
40
40
|
|
41
41
|
|
42
|
-
reference -- parse reference string and run heuristic
|
42
|
+
reference -- parse reference string and run heuristic; it expects that authors are separated by colon
|
43
43
|
|
44
44
|
--reference, -r <string with reference> -- parse the string to get title, authors and year
|
45
45
|
--threshold, -d <0.0...1.0> -- default: 0.6
|
46
46
|
|
47
47
|
|
48
48
|
common options:
|
49
|
-
--site, -s <mr,zbl>
|
50
|
-
--format, -f <text|html|xml|ruby|yaml> -- output format, default: text
|
49
|
+
--site, -s <mr,zbl,dmlcz,cedram,numdam> -- repeatable, sites to search on, default: all
|
50
|
+
--format, -f <text|html|xml|ruby|yaml|json> -- output format, default: text
|
51
51
|
--verbose, -v
|
52
52
|
|
53
53
|
|
54
54
|
Examples:
|
55
55
|
|
56
56
|
#{$0} article -t \"Sobolev embeddings with variable exponent. II\"
|
57
|
-
#{$0} article -t \"Sobolev embeddings\" -a
|
57
|
+
#{$0} article -t \"Sobolev embeddings\" -a Rakosnik -a Edmunds -f html
|
58
58
|
#{$0} author -a \"Vesely, Jiri\"
|
59
59
|
#{$0} reference -r \"Kufner, A., John, O., and Fučík, S.: Function Spaces, Noordhoff, Leyden, and Academia, Prague, 1977\" -d 0.4
|
60
60
|
"
|
@@ -129,6 +129,6 @@ end
|
|
129
129
|
case $options[:format].to_sym
|
130
130
|
when :ruby
|
131
131
|
pp result
|
132
|
-
when :yaml, :html, :xml, :text
|
132
|
+
when :yaml, :html, :xml, :text, :json
|
133
133
|
puts result.format($options[:format])
|
134
134
|
end
|
data/lib/math_metadata_lookup/article.rb
CHANGED
@@ -1,5 +1,19 @@
|
|
1
1
|
module MathMetadata
|
2
2
|
|
3
|
+
# == Attributes
|
4
|
+
#
|
5
|
+
# * :id [String]
|
6
|
+
# * :similarity [String]
|
7
|
+
# * :publication [String]
|
8
|
+
# * :title [String]
|
9
|
+
# * :authors [Array of Strings]
|
10
|
+
# * :year [String]
|
11
|
+
# * :language [String]
|
12
|
+
# * :msc [Array of Strings]
|
13
|
+
# * :pages [String]
|
14
|
+
# * :issn [Array of Strings]
|
15
|
+
# * :keywords [Array of Strings]
|
16
|
+
# * :references [Array of MathMetadata::Reference]
|
3
17
|
class Article < Entity
|
4
18
|
|
5
19
|
def ==(article)
|
@@ -27,6 +41,11 @@ module MathMetadata
|
|
27
41
|
end
|
28
42
|
|
29
43
|
|
44
|
+
def to_json(*args)
|
45
|
+
@metadata.to_json(*args)
|
46
|
+
end
|
47
|
+
|
48
|
+
|
30
49
|
def to_text
|
31
50
|
result = ""
|
32
51
|
result += %~Id: #{@metadata[:id]}
|
@@ -39,7 +58,8 @@ Language: #{@metadata[:language]}
|
|
39
58
|
MSC: #{[@metadata[:msc]].flatten.join("; ")}
|
40
59
|
Pages: #{@metadata[:range]}
|
41
60
|
ISSN: #{@metadata[:issn].join('; ')}
|
42
|
-
Keywords: #{@metadata[:keywords].join('; ')}
|
61
|
+
Keywords: #{@metadata[:keywords].join('; ')}
|
62
|
+
Publisher: #{@metadata[:publisher]}~
|
43
63
|
@metadata[:references].to_a.each_with_index do |ref, idx|
|
44
64
|
a = ref.article
|
45
65
|
result += %~
|
@@ -79,6 +99,7 @@ Ref.: #{idx+1}. #{[a[:authors]].flatten.join("; ")}: #{a[:title]}~
|
|
79
99
|
<keyword>#{::CGI.escapeHTML keyword.to_s}</keyword>~
|
80
100
|
end
|
81
101
|
result += %~
|
102
|
+
<publisher>#{::CGI.escapeHTML @metadata[:publisher].to_s}</publisher>
|
82
103
|
<references>
|
83
104
|
~
|
84
105
|
@metadata[:references].to_a.each_with_index do |reference, idx|
|
@@ -97,6 +118,7 @@ Ref.: #{idx+1}. #{[a[:authors]].flatten.join("; ")}: #{a[:title]}~
|
|
97
118
|
<publication>#{::CGI.escapeHTML ref[:publication].to_s}</publication>
|
98
119
|
<publisher>#{::CGI.escapeHTML ref[:publisher].to_s}</publisher>
|
99
120
|
<year>#{::CGI.escapeHTML ref[:year].to_s}</year>
|
121
|
+
<range>#{::CGI.escapeHTML ref[:range].to_s}</range>
|
100
122
|
</reference>
|
101
123
|
~
|
102
124
|
end
|
@@ -121,6 +143,7 @@ Ref.: #{idx+1}. #{[a[:authors]].flatten.join("; ")}: #{a[:title]}~
|
|
121
143
|
<span class="label">Pages:</span> <span class="pages">#{::CGI.escapeHTML @metadata[:range].to_s}</span><br />
|
122
144
|
<span class="label">ISSN:</span> <span class="issn">#{::CGI.escapeHTML @metadata[:issn].to_a.join('; ')}</span><br />
|
123
145
|
<span class="label">Keywords:</span> <span class="keywords">#{::CGI.escapeHTML @metadata[:keywords].to_a.join('; ')}</span><br />
|
146
|
+
<span class="label">Publisher:</span> <span class="publisher">#{::CGI.escapeHTML @metadata[:publisher].to_s}</span><br />
|
124
147
|
~
|
125
148
|
if @metadata[:references].to_a.size > 0
|
126
149
|
result += %~
|
@@ -137,6 +160,7 @@ Ref.: #{idx+1}. #{[a[:authors]].flatten.join("; ")}: #{a[:title]}~
|
|
137
160
|
<span class="label">Publication:</span> <span class="publication">#{::CGI.escapeHTML ref[:publication].to_s}</span><br />
|
138
161
|
<span class="label">Publisher:</span> <span class="publisher">#{::CGI.escapeHTML ref[:publisher].to_s}</span><br />
|
139
162
|
<span class="label">Year:</span> <span class="year">#{::CGI.escapeHTML ref[:year].to_s}</span><br />
|
163
|
+
<span class="label">Pages:</span> <span class="range">#{::CGI.escapeHTML ref[:range].to_s}</span><br />
|
140
164
|
<span class="label">Id:</span> <span class="id">#{::CGI.escapeHTML ref[:id].to_s}</span><br />
|
141
165
|
</div>
|
142
166
|
~
|
data/lib/math_metadata_lookup/lookup.rb
CHANGED
@@ -36,7 +36,7 @@ module MathMetadata
|
|
36
36
|
end
|
37
37
|
|
38
38
|
|
39
|
-
#
|
39
|
+
# returns best result for each site
|
40
40
|
def heuristic( args={} )
|
41
41
|
opts = {:threshold => 0.6, :authors => []}.merge(args)
|
42
42
|
result = Result.new
|
data/lib/math_metadata_lookup/reference.rb
CHANGED
@@ -3,6 +3,10 @@
|
|
3
3
|
|
4
4
|
module MathMetadata
|
5
5
|
|
6
|
+
# == Attributes
|
7
|
+
#
|
8
|
+
# * :source [String] original string
|
9
|
+
# * :article [MathMetadata::Article] parsed metadata
|
6
10
|
class Reference
|
7
11
|
|
8
12
|
# 1=authors, 2=title, 3=publication, 4=year, 5=range
|
@@ -27,6 +31,8 @@ module MathMetadata
|
|
27
31
|
ARTICLE_REFERENCE_10_RE = %r{([^:]+):\s*(.*?),\s*(.*?)\s*.*?}mi
|
28
32
|
# 1=authors, 2=title, 3=place, 4=year
|
29
33
|
ARTICLE_REFERENCE_11_RE = %r{([^:]+):\s*(.*),(.*?)\s+(\d{4})}mi
|
34
|
+
# 1=authors, 2=title
|
35
|
+
# ARTICLE_REFERENCE_12_RE = %r{([^:]+):\s*([^,]+),.*}mi
|
30
36
|
|
31
37
|
|
32
38
|
attr_accessor :source, :article, :suffix, :number, :reg
|
@@ -43,7 +49,20 @@ module MathMetadata
|
|
43
49
|
end
|
44
50
|
|
45
51
|
|
46
|
-
def
|
52
|
+
def to_json(*args)
|
53
|
+
{
|
54
|
+
:number => @number,
|
55
|
+
:source => @source,
|
56
|
+
:article => @article
|
57
|
+
}.to_json(*args)
|
58
|
+
end
|
59
|
+
|
60
|
+
|
61
|
+
def self.parse( ref_str )
|
62
|
+
str = ref_str.dup
|
63
|
+
if ref_str =~ %r~^\s*(?:\[.*?\]|\(.*?\)|\d+[\.]?)\s*(.*)~mi
|
64
|
+
str = $1
|
65
|
+
end
|
47
66
|
article = Article.new
|
48
67
|
rnumber = 0
|
49
68
|
suffix = nil
|
@@ -86,6 +105,9 @@ module MathMetadata
|
|
86
105
|
when 11
|
87
106
|
# 1=authors, 2=title, 3=place, 4=year
|
88
107
|
found = [$1, $2, nil, $4, nil, nil, $3]
|
108
|
+
# when 12
|
109
|
+
# # 1=authors, 2=title, 3=place, 4=year
|
110
|
+
# found = [$1, $2, nil, nil, nil, nil, nil]
|
89
111
|
end
|
90
112
|
rnumber = j
|
91
113
|
break
|
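Note on the new prefix handling in `Reference.parse` above: before matching the reference patterns, the added code drops a leading label such as `[12]`, `(a)` or `3.`. The sketch below isolates that step using the same regular expression as the diff; the helper name `strip_reference_label` is hypothetical and not part of the gem.

```ruby
# Remove a leading "[12]", "(a)" or "3." style label from a reference string.
# The pattern is copied from the diff; the method name is made up for this sketch.
def strip_reference_label(ref_str)
  label_re = %r~^\s*(?:\[.*?\]|\(.*?\)|\d+[\.]?)\s*(.*)~mi
  ref_str =~ label_re ? $1 : ref_str.dup
end

puts strip_reference_label("[12] Kufner, A., John, O.: Function Spaces, 1977")
```

One caveat worth keeping in mind: because `\d+[\.]?` matches any leading digits, a reference that happens to begin with a number (a year, for instance) would lose it as well.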
data/lib/math_metadata_lookup/result.rb
CHANGED
@@ -2,13 +2,37 @@
|
|
2
2
|
# vi: fenc=utf-8:expandtab:ts=2:sw=2:sts=2
|
3
3
|
|
4
4
|
require 'ya2yaml'
|
5
|
+
require 'json'
|
5
6
|
|
6
7
|
module MathMetadata
|
7
8
|
|
9
|
+
# == Example of Result class
|
10
|
+
#
|
11
|
+
# # initialize search engine
|
12
|
+
# l = MathMetadata::Lookup.new(:verbose => false)
|
13
|
+
#
|
14
|
+
# # result containing articles (l.article, l.heuristic, l.reference)
|
15
|
+
# result = l.article(:title => 'sobolev')
|
16
|
+
# result.each do |site|
|
17
|
+
# next unless site[:result]
|
18
|
+
# site[:result].each do |article|
|
19
|
+
# puts "#{site[:name]}: #{article[:title]} (#{article[:authors].join('; ')})"
|
20
|
+
# end
|
21
|
+
# end
|
22
|
+
#
|
23
|
+
# # result containing authors (l.author)
|
24
|
+
# result = l.author(:name => 'Vesely, Jiri')
|
25
|
+
# result.each do |site|
|
26
|
+
# next unless site[:result]
|
27
|
+
# site[:result].each do |author|
|
28
|
+
# puts "#{site[:name]}: #{author[:preferred]} (#{author[:forms].join('; ')})"
|
29
|
+
# end
|
30
|
+
# end
|
31
|
+
|
8
32
|
class Result
|
9
33
|
include Enumerable
|
10
34
|
|
11
|
-
FORMATS = [:ruby, :yaml, :xml, :html, :text]
|
35
|
+
FORMATS = [:ruby, :yaml, :json, :xml, :html, :text]
|
12
36
|
|
13
37
|
def initialize( meta=[] )
|
14
38
|
@metadata = meta
|
@@ -75,6 +99,11 @@ module MathMetadata
|
|
75
99
|
end
|
76
100
|
|
77
101
|
|
102
|
+
def to_json
|
103
|
+
@metadata.to_json
|
104
|
+
end
|
105
|
+
|
106
|
+
|
78
107
|
def to_array
|
79
108
|
@metadata
|
80
109
|
end
|
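Note on the JSON output added in this release: `Article#to_json` and `Reference#to_json` both accept `*args` and delegate to a hash. That matters because Ruby's `json` library passes its generator state to `to_json` when it meets a custom object nested inside an array or hash. The stand-in classes below (`SketchArticle`, `SketchReference`) are hypothetical and only mimic the shape shown in the diff.

```ruby
require 'json'

# Stand-in for MathMetadata::Article: serialises its metadata hash.
class SketchArticle
  def initialize(metadata)
    @metadata = metadata
  end

  # Accept and forward the generator state so nesting works.
  def to_json(*args)
    @metadata.to_json(*args)
  end
end

# Stand-in for MathMetadata::Reference: number, source string and parsed article.
class SketchReference
  def initialize(number, source, article)
    @number = number
    @source = source
    @article = article
  end

  def to_json(*args)
    { :number => @number, :source => @source, :article => @article }.to_json(*args)
  end
end

article = SketchArticle.new({ :title => "Function Spaces", :year => "1977" })
reference = SketchReference.new(1, "Kufner, A.: Function Spaces, 1977", article)
puts [reference].to_json
```

By contrast, `Result#to_json` and `Site#to_json` in this diff take no arguments, so they would likely raise an `ArgumentError` if such an object were ever nested inside another structure being serialised; called directly, as `result.format` presumably does, they are fine.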
data/lib/math_metadata_lookup/site.rb
CHANGED
@@ -11,7 +11,7 @@ module MathMetadata
|
|
11
11
|
SITES = []
|
12
12
|
|
13
13
|
# Abstract class. Inherit in your sites definition.
|
14
|
-
class Site
|
14
|
+
class Site
|
15
15
|
|
16
16
|
def initialize( opts={} )
|
17
17
|
@options = { :verbose => true }.merge(opts)
|
@@ -45,7 +45,7 @@ module MathMetadata
|
|
45
45
|
page = fetch_article(opts)
|
46
46
|
articles = []
|
47
47
|
|
48
|
-
return
|
48
|
+
return articles unless page
|
49
49
|
|
50
50
|
if list_of_articles?(page)
|
51
51
|
articles = get_article_list(page)
|
@@ -54,10 +54,13 @@ module MathMetadata
|
|
54
54
|
articles << a unless a[:title].to_s.strip.empty?
|
55
55
|
end
|
56
56
|
|
57
|
-
return
|
57
|
+
#return [] if articles.size == 0
|
58
58
|
articles
|
59
59
|
end
|
60
60
|
|
61
|
+
def to_json
|
62
|
+
""
|
63
|
+
end
|
61
64
|
|
62
65
|
protected
|
63
66
|
|
@@ -112,6 +115,7 @@ module MathMetadata
|
|
112
115
|
def author_name_forms( name )
|
113
116
|
forms = []
|
114
117
|
|
118
|
+
return forms if self.class::AUTHOR_URL.empty?
|
115
119
|
page = fetch_author name
|
116
120
|
forms = get_author_m page, 2, 1
|
117
121
|
|
@@ -145,16 +149,18 @@ module MathMetadata
|
|
145
149
|
def get_article( page, opts={} )
|
146
150
|
a = Article.new( {
|
147
151
|
:id => get_article_id(page),
|
152
|
+
:language => get_article_language(page),
|
148
153
|
:authors => get_article_author_s(page),
|
149
154
|
:msc => get_article_msc(page),
|
150
155
|
:publication => get_article_publication(page),
|
156
|
+
:publisher => get_article_publisher(page),
|
151
157
|
:range => MathMetadata.normalize_range(get_article_range(page)),
|
152
158
|
:year => get_article_year(page),
|
153
159
|
:keywords => get_article_keyword_s(page),
|
154
160
|
:issn => get_article_issn_s(page)
|
155
161
|
} )
|
156
162
|
|
157
|
-
a.title
|
163
|
+
a.title = get_article_title(page)
|
158
164
|
a.title = a.title.to_s.gsub(/<\/span>/,'')
|
159
165
|
a.references = get_article_references(page) if opts[:references]
|
160
166
|
|
@@ -166,23 +172,50 @@ module MathMetadata
|
|
166
172
|
articles = []
|
167
173
|
page.scan(self.class::ARTICLE_ENTRY_RE).each do |match|
|
168
174
|
a = article(:id => match[0]).first
|
175
|
+
next unless a
|
169
176
|
articles << a unless a[:title].to_s.strip.empty?
|
170
177
|
end
|
171
178
|
articles
|
172
179
|
end
|
173
180
|
|
174
181
|
|
182
|
+
# select first n words from string which are longer then 2 characters
|
175
183
|
def nwords(s)
|
176
|
-
|
184
|
+
n = @options[:nwords].to_i
|
185
|
+
words = 0
|
186
|
+
s.split(" ").inject("") do |res,word|
|
187
|
+
if words < n
|
188
|
+
res << " " << word
|
189
|
+
words += 1 if word.size > 2
|
190
|
+
end
|
191
|
+
res.strip
|
192
|
+
end
|
177
193
|
end
|
178
194
|
|
179
195
|
|
180
|
-
def
|
196
|
+
def normalize_page(page, args={})
|
181
197
|
opts = {:entities => true}.merge(args)
|
182
|
-
|
198
|
+
|
199
|
+
if page =~ %r{<meta.*charset=([^"\s]+)}i
|
200
|
+
if page.respond_to?(:force_encoding)
|
201
|
+
page.force_encoding($1)
|
202
|
+
page.encode!("utf-8")
|
203
|
+
else
|
204
|
+
page = Iconv.new('utf-8', $1).iconv(page)
|
205
|
+
end
|
206
|
+
end
|
207
|
+
if page and opts[:entities]
|
208
|
+
coder = HTMLEntities.new
|
209
|
+
page = coder.decode(page)
|
210
|
+
end
|
211
|
+
|
212
|
+
end
|
213
|
+
|
214
|
+
|
215
|
+
def fetch_page( url, args={} )
|
183
216
|
puts "fetching #{url}" if @options[:verbose]
|
184
217
|
page = URI.parse(url).read
|
185
|
-
page =
|
218
|
+
page = normalize_page page, args
|
186
219
|
|
187
220
|
page
|
188
221
|
end
|
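Note on `nwords` in the hunk above: it keeps words from the start of the title until `n` words longer than two characters have been collected; shorter words are kept but not counted. A standalone sketch of the same behaviour, with `n` passed explicitly instead of read from `@options[:nwords]` (the name `first_words` is hypothetical):

```ruby
# Keep leading words until n "long" words (more than 2 characters) were taken.
# Short words met along the way are kept but do not count towards n.
def first_words(text, n)
  counted = 0
  picked = []
  text.split(" ").each do |word|
    break if counted >= n
    picked << word
    counted += 1 if word.size > 2
  end
  picked.join(" ")
end

puts first_words("A note on Sobolev embeddings with variable exponent", 2)
```

Passing `n` as an argument makes the helper easy to test in isolation; the method in the diff reads the limit from the site options instead.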
@@ -197,7 +230,7 @@ module MathMetadata
|
|
197
230
|
|
198
231
|
|
199
232
|
def join_article_authors( authors )
|
200
|
-
authors.collect { |author| URI.escape MathMetadata.normalize_name(author) }.join('
|
233
|
+
authors.collect { |author| URI.escape MathMetadata.normalize_name(author) }.join(';%20') || ''
|
201
234
|
end
|
202
235
|
|
203
236
|
def fetch_article( args={} )
|
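Note on `normalize_page` in site.rb: the new method reads the charset from a `<meta>` tag and converts the page to UTF-8. The sketch below shows that idea for Ruby 1.9+ only (no `Iconv` fallback, no entity decoding). It is an assumption-laden variant, not the gem's code: the name `page_to_utf8` is hypothetical, the bytes are scanned in binary form so that invalid sequences cannot break the regexp match, and the charset pattern is deliberately stricter than the one in the diff.

```ruby
# Detect the charset declared in a <meta> tag and transcode the page to UTF-8.
# Falls back to labelling the bytes as UTF-8 when no charset is declared.
def page_to_utf8(raw)
  bytes = raw.b # binary copy; safe to match even with non-UTF-8 bytes
  if bytes =~ /<meta[^>]*charset=["']?([\w-]+)/i
    # force_encoding raises ArgumentError for an unknown charset name
    bytes.force_encoding($1).encode("utf-8")
  else
    bytes.force_encoding("utf-8")
  end
end

html = "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=iso-8859-1\">caf\xE9"
puts page_to_utf8(html)
```

The pattern in the diff (`[^"\s]+`) would also capture a trailing `>` or `'` in markup such as `<meta charset=utf-8>`, which is why this sketch restricts the capture to word characters and hyphens.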
data/lib/math_metadata_lookup/sites/cedram.rb
ADDED
@@ -0,0 +1,158 @@
|
|
1
|
+
# -*-: coding: utf-8 -*-
|
2
|
+
# vi: fenc=utf-8:expandtab:ts=2:sw=2:sts=2
|
3
|
+
|
4
|
+
require 'net/http'
|
5
|
+
|
6
|
+
module MathMetadata
|
7
|
+
|
8
|
+
# NUMDAM
|
9
|
+
# http://numdam.org/
|
10
|
+
class CEDRAM < Site
|
11
|
+
ID = :cedram
|
12
|
+
NAME = "CEDRAM"
|
13
|
+
URL = "http://cedram.org/"
|
14
|
+
|
15
|
+
|
16
|
+
# AUTHOR_URL % "Author, Name"
|
17
|
+
AUTHOR_URL = %~~
|
18
|
+
|
19
|
+
AUTHORS_RE = %r{}mi
|
20
|
+
AUTHOR_RE = %r{}mi
|
21
|
+
|
22
|
+
|
23
|
+
ARTICLE_ID_URL = "http://aif.cedram.org/aif-bin/item?id=%s"
|
24
|
+
ARTICLE_ID_AFST_URL = "http://afst.cedram.org/afst-bin/item?id=%s"
|
25
|
+
ARTICLE_URL = "http://www.cedram.org/cedram-bin/search?ti=%s&au=%s&py1=%s&lang=en"
|
26
|
+
# ARTICLE_URL = "http://www.cedram.org/cedram-bin/search"
|
27
|
+
|
28
|
+
LIST_OF_ARTICLES_RE = %r{matches(.*?)</td>}mi
|
29
|
+
ARTICLE_ENTRY_RE = %r{<a href=".*?id=([^"]+)">Details</a>}mi
|
30
|
+
ARTICLE_ID_RE = %r{<input type="hidden" name="id" value="([^"]+)" />}mi
|
31
|
+
ARTICLE_TITLE_RE = %r{<span class="atitle">(.*?)</span>}mi
|
32
|
+
ARTICLE_LANGUAGE_RE = %r{xxxxxxxxxxxxxxx}mi
|
33
|
+
ARTICLE_AUTHORS_RE = %r{<hr />(<a.*?)<br />}mi
|
34
|
+
ARTICLE_AUTHOR_RE = %r{<a.*?>(.*?)</a>}mi
|
35
|
+
ARTICLE_MSCS_RE = %r{Class. Math.:(.*?)<br}mi
|
36
|
+
ARTICLE_MSC_RE = %r{([^,]+),?\s*}mi
|
37
|
+
ARTICLE_PUBLICATION_RE = %r{(<span class="jtitle">.*?</span>,\s*<a href="http://aif.cedram.org:80/aif-bin/get\?id=[^"]+">\d+</a>\s*no\.\s*<a href="http://aif.cedram.org:80/aif-bin/browse\?id=[^"]+">\d+</a>\s*\(<a href="http://aif.cedram.org:80/aif-bin/get\?id=[^"]+">\d+</a>\))}mi
|
38
|
+
ARTICLE_PUBLISHER_RE = %r{xxxxxxxxxxxxxxxxx}mi
|
39
|
+
ARTICLE_RANGE_RE = %r{(\d+\-\d+)\s*<br\s*/>\s*Article}mi
|
40
|
+
ARTICLE_YEAR_RE = %r{<span class="jtitle">.*?</span>,\s*<a href="http://aif.cedram.org:80/aif-bin/get\?id=[^"]+">\d+</a>\s*no\.\s*<a href="http://aif.cedram.org:80/aif-bin/browse\?id=[^"]+">\d+</a>\s*\(<a href="http://aif.cedram.org:80/aif-bin/get\?id=[^"]+">(\d+)</a>\)}mi
|
41
|
+
ARTICLE_ISSNS_RE = %r{xxxxxxxxxxxxxxxxx}mi
|
42
|
+
ARTICLE_ISSN_RE = %r{xxxxxxxxxxxxxxxxx}mi
|
43
|
+
ARTICLE_KEYWORDS_RE = %r{Keywords:(.*?)<div}mi
|
44
|
+
ARTICLE_KEYWORD_RE = %r{([^,]+),?\s*}mi
|
45
|
+
ARTICLE_REFERENCES_RE = %r{<P>\s*<B>\s*Bibliography\s*</B>\s*</P>\s*</DIV>\s*(.*?)\s*</td>}mi
|
46
|
+
ARTICLE_REFERENCE_RE = %r{\[\d+\]\s*(.*?)(?:(?:<BR>)|(?:<br\s*/>)|(?:</div>))}mi
|
47
|
+
|
48
|
+
def get_article_publication( page )
|
49
|
+
page =~ ARTICLE_PUBLICATION_RE
|
50
|
+
return nil unless $1
|
51
|
+
$1.gsub(/<.*?>/, '')
|
52
|
+
end
|
53
|
+
|
54
|
+
def get_article_references( page )
|
55
|
+
references = []
|
56
|
+
|
57
|
+
refs = get_article_reference_s page
|
58
|
+
|
59
|
+
i = 0;
|
60
|
+
refs.each do |r|
|
61
|
+
i+=1
|
62
|
+
r.gsub!(/\r|\n/, ' ')
|
63
|
+
r.gsub!(/ +/,' ')
|
64
|
+
ref = Reference.new nil, i
|
65
|
+
ref.source = r
|
66
|
+
ref.article = Article.new
|
67
|
+
|
68
|
+
r =~ %r{<span class="(?:atitle|btitre)">(.*?)</span>}im
|
69
|
+
ref.article.title = $1.to_s.strip
|
70
|
+
|
71
|
+
ref.article.authors = []
|
72
|
+
r.split(%r{<span class="bauteur">\s?(.*?)\s*</span>}).each do |a|
|
73
|
+
next if a.strip.empty? or a.strip == "-" or [',', '&'].include?(a.strip[0,1])
|
74
|
+
author = a.gsub /<.*?>/, ''
|
75
|
+
ref.article.authors << author
|
76
|
+
end
|
77
|
+
|
78
|
+
r =~ %r{<span class="(?:brevue|bjournal)">(.*?)</span>}mi
|
79
|
+
ref.article.publication = $1.strip if $1
|
80
|
+
|
81
|
+
=begin
|
82
|
+
r =~ %r{<bediteur>(.*?)</bediteur>}mi
|
83
|
+
ref.article.publisher = $1.strip if $1
|
84
|
+
|
85
|
+
r =~ %r{<blieued>(.*?)</blieued>}mi
|
86
|
+
ref.article.place = $1.strip if $1
|
87
|
+
=end
|
88
|
+
r =~ %r{(\d\d\d\d)}mi
|
89
|
+
ref.article.year = $1.strip if $1
|
90
|
+
|
91
|
+
r =~ %r{(\d+)\-(\d+)}mi
|
92
|
+
ref.article.range = "#{$1.strip}-#{$2.strip}" if $1
|
93
|
+
|
94
|
+
references << ref
|
95
|
+
end
|
96
|
+
|
97
|
+
references
|
98
|
+
end
|
99
|
+
|
100
|
+
|
101
|
+
def fetch_article( args={} )
|
102
|
+
opts = {:id => nil, :title => "", :year => "", :authors => []}.merge(args)
|
103
|
+
url = ""
|
104
|
+
unless opts[:id].to_s.strip.empty?
|
105
|
+
pattern = self.class::ARTICLE_ID_URL
|
106
|
+
pattern = self.class::ARTICLE_ID_AFST_URL if opts[:id] =~ /^AFST/mi
|
107
|
+
url = pattern % URI.escape(opts[:id].to_s.strip)
|
108
|
+
end
|
109
|
+
form = {'submit' => " Start search "}
|
110
|
+
if opts[:id].to_s.strip.empty?
|
111
|
+
author = join_article_authors opts[:authors]
|
112
|
+
title = opts[:title]
|
113
|
+
title = '' if not title.kind_of?(String)
|
114
|
+
title = MathMetadata.normalize_text(title)
|
115
|
+
title = nwords(title) if @options[:nwords]
|
116
|
+
|
117
|
+
form['ti'] = title
|
118
|
+
form['au'] = author unless author.empty?
|
119
|
+
form['py1'] = opts[:year].to_s
|
120
|
+
form['py2'] = ""
|
121
|
+
form['pages'] = ""
|
122
|
+
form['bibitems_text'] = ""
|
123
|
+
form['maxdocs'] = "300"
|
124
|
+
form['format'] = "short"
|
125
|
+
form['ti_op'] = "and"
|
126
|
+
form["au_op"] = "and"
|
127
|
+
form["bibitems.text_op"] = "and"
|
128
|
+
|
129
|
+
url = self.class::ARTICLE_URL % [URI.escape(title), author, opts[:year].to_s]
|
130
|
+
# url = self.class::ARTICLE_URL
|
131
|
+
else
|
132
|
+
return fetch_page(url, opts)
|
133
|
+
end
|
134
|
+
|
135
|
+
uri = URI.parse(url)
|
136
|
+
puts uri if opts[:verbose]
|
137
|
+
req = Net::HTTP::Post.new(uri.path, {
|
138
|
+
'Host' => "www.cedram.org",
|
139
|
+
'User-Agent'=> "Mozilla/5.0 (X11; Linux x86_64; rv:2.0.1) Gecko/20110429 Firefox/4.0.1",
|
140
|
+
'Accept' => "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
|
141
|
+
'Accept-Language' => "cs,en;q=0.7,en-us;q=0.3",
|
142
|
+
'Accept-Encoding' => "gzip, deflated",
|
143
|
+
'Accept-Charset' => "UTF-8,*",
|
144
|
+
'Keep-Alive' => "115",
|
145
|
+
'Connection' => "keep-alive",
|
146
|
+
'Referer' => "http://www.cedram.org/cedram-bin/search",
|
147
|
+
'Content-Type' => "application/x-www-form-urlencoded",
|
148
|
+
})
|
149
|
+
req.set_form_data(form)
|
150
|
+
http = Net::HTTP.new(uri.host, uri.port)
|
151
|
+
resp = http.request(req)
|
152
|
+
page = normalize_page resp.body
|
153
|
+
page
|
154
|
+
end
|
155
|
+
|
156
|
+
end # MRev
|
157
|
+
|
158
|
+
end
|
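Note on `CEDRAM#fetch_article` above: when no id is given, the search is sent as a form POST through `Net::HTTP`. The sketch below only builds such a request and inspects it, without opening a connection. The URL and the `ti` / `py1` field names come from the diff; the helper name `build_search_request` and the trimmed header set are assumptions.

```ruby
require 'net/http'
require 'uri'

# Build (but do not send) a form-encoded POST for the CEDRAM search page.
def build_search_request(title, year)
  uri = URI.parse("http://www.cedram.org/cedram-bin/search")
  req = Net::HTTP::Post.new(uri.path, { 'Referer' => uri.to_s })
  # set_form_data URL-encodes the fields and sets the Content-Type header
  req.set_form_data({ 'ti' => title, 'py1' => year.to_s })
  req
end

req = build_search_request("Sobolev embeddings", 2005)
puts req.body
```

Since `set_form_data` already sets `Content-Type: application/x-www-form-urlencoded`, that header does not need to be passed by hand. Sending `Accept-Encoding` manually, as the diff does, also turns off `Net::HTTP`'s automatic decompression, so a compressed response body would have to be inflated by the caller.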
data/lib/math_metadata_lookup/sites/dmlcz.rb
ADDED
@@ -0,0 +1,57 @@
|
|
1
|
+
# -*-: coding: utf-8 -*-
|
2
|
+
# vi: fenc=utf-8:expandtab:ts=2:sw=2:sts=2
|
3
|
+
|
4
|
+
module MathMetadata
|
5
|
+
|
6
|
+
# Czech Digital Mathematics Library
|
7
|
+
# http://dml.cz/
|
8
|
+
# does not support author search
|
9
|
+
class DMLCZ < Site
|
10
|
+
ID = :dmlcz
|
11
|
+
NAME = "DMLCZ"
|
12
|
+
URL = "http://dml.cz/"
|
13
|
+
|
14
|
+
|
15
|
+
# AUTHOR_URL % "Author, Name"
|
16
|
+
AUTHOR_URL = %~~
|
17
|
+
|
18
|
+
AUTHORS_RE = %r{}mi
|
19
|
+
AUTHOR_RE = %r{}mi
|
20
|
+
|
21
|
+
|
22
|
+
ARTICLE_ID_URL = "http://dml.cz/handle/10338.dmlcz/%s?show=full"
|
23
|
+
ARTICLE_URL = "http://dml.cz/advanced-search?num_search_field=10&results_per_page=100&scope=%%2F&field1=title&query1=%s%s&conjunction2=AND&field2=year&query2=%s&submit=Go"
|
24
|
+
|
25
|
+
LIST_OF_ARTICLES_RE = %r{<ul class="bibliolist">(.*?)</ul>}mi
|
26
|
+
ARTICLE_ENTRY_RE = %r{<li>.*?href="/handle/10338.dmlcz/(\d+)".*?</li>}mi
|
27
|
+
#ARTICLE_ENTRY_RE = %r{<div class="headlineText">\s*<a href="/mathscinet/search/publdoc.html[^"]+">\s*<strong>\s*([^< ]+)\s*</strong>\s*<strong>}mi
|
28
|
+
|
29
|
+
ARTICLE_ID_RE = %r{<meta\s*name="citation_id"\s*content="(\d+)"\s*/>}mi
|
30
|
+
ARTICLE_TITLE_RE = %r{<meta\s*name="dc.Title"\s*content="([^"]+)"\s*/>}mi
|
31
|
+
ARTICLE_LANGUAGE_RE = %r{<meta\s*name="citation_language"\s*content="([^"]+)"\s*/>}mi
|
32
|
+
ARTICLE_AUTHORS_RE = %r{<meta\s*name="citation_authors"\s*content="([^"]+)" />}mi
|
33
|
+
ARTICLE_AUTHOR_RE = %r{([^;]+);?\s*}mi
|
34
|
+
ARTICLE_MSCS_RE = %r{<table\s*xmlns:fn="http://www.w3.org/2003/11/xpath-functions"\s*class="dml_detail_view">(.*?)</table>}mi
|
35
|
+
ARTICLE_MSC_RE = %r{<tr>\s*<td\s*class="label">\s*MSC:\s*</td>\s*<td\s*class="value">\s*([^< ]+)\s*</td>\s*</tr>}mi
|
36
|
+
ARTICLE_PUBLICATION_RE = %r{<meta\s*name="citation_journal_title"\s*content="([^"]+)"\s*/>}mi
|
37
|
+
ARTICLE_PUBLISHER_RE = %r{<meta\s*name="citation_publisher"\s*content="([^"]+)"\s*/>}mi
|
38
|
+
ARTICLE_RANGE_RE = %r{<tr>\s*<td class="label">\s*Pages:\s*</td>\s*<td\s*class="value">([^ <]+)</td>\s*</tr>}mi
|
39
|
+
ARTICLE_YEAR_RE = %r{<meta\s*name="citation_year"\s*content="([^"]+)"\s*/>}mi
|
40
|
+
ARTICLE_ISSNS_RE = %r{<head>(.*?)</head>}mi
|
41
|
+
ARTICLE_ISSN_RE = %r{<meta\s*name="citation_issn"\s*content="([^"]+)"\s*/>}mi
|
42
|
+
ARTICLE_KEYWORDS_RE = %r{<head>(.*?)</head>}mi
|
43
|
+
ARTICLE_KEYWORD_RE = %r{<meta\s*name="citation_keywords"\s*content="([^"]+)"\s*/>}mi
|
44
|
+
ARTICLE_REFERENCES_RE = %r{<table\s*xmlns:fn="http://www.w3.org/2003/11/xpath-functions"\s*class="dml_detail_view">(.*?)</table>}mi
|
45
|
+
ARTICLE_REFERENCE_RE = %r{<tr>\s*<td class="label">Reference:\s*</td>\s*<td class="value">\s*\[[^\]]+\]\s*([^<]+)</td>\s*</tr>}mi
|
46
|
+
|
47
|
+
def join_article_authors( authors )
|
48
|
+
i = 2
|
49
|
+
authors.collect { |author|
|
50
|
+
i += 1
|
51
|
+
"&conjunction#{i}=AND&field#{i}=author&query#{i}=#{URI.escape MathMetadata.normalize_name(author)}"
|
52
|
+
}.join("&")
|
53
|
+
end
|
54
|
+
|
55
|
+
end # MRev
|
56
|
+
|
57
|
+
end
|
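Note on `DMLCZ#join_article_authors` above: each author becomes one more numbered `conjunctionN` / `fieldN` / `queryN` triple in the advanced-search URL, starting at 3 because fields 1 and 2 carry the title and the year. A sketch of the same construction follows, written with `URI.encode_www_form_component` since `URI.escape` is no longer available in Ruby 3.0 and later. The name `author_query_params` is hypothetical, and unlike the diff this version does not emit a leading `&`, so the caller has to add one when appending to the URL.

```ruby
require 'uri'

# Build "conjunctionN=AND&fieldN=author&queryN=..." for every author,
# numbering from first_index (3 in the DML-CZ URL template of the diff).
def author_query_params(authors, first_index = 3)
  authors.each_with_index.map do |author, offset|
    i = first_index + offset
    "conjunction#{i}=AND&field#{i}=author&query#{i}=" +
      URI.encode_www_form_component(author)
  end.join("&")
end

puts author_query_params(["Rakosnik", "Edmunds"])
```

The version in the diff prefixes every triple with `&` and then joins with `&`, which leaves `&&` between authors; most servers tolerate the empty parameter, but the form above avoids it.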
data/lib/math_metadata_lookup/sites/mr.rb
CHANGED
@@ -27,12 +27,14 @@ module MathMetadata
|
|
27
27
|
ARTICLE_ENTRY_RE = %r{<div class="headlineText">\s*<a href="/mathscinet/search/publdoc.html[^"]+">\s*<strong>\s*([^< ]+)\s*</strong>\s*<strong>}mi
|
28
28
|
|
29
29
|
ARTICLE_ID_RE = %r{<strong>(.*?)</strong>}mi
|
30
|
-
ARTICLE_TITLE_RE = %r{<span class="title">(?:<span class="searchHighlight">)?(.*?)</span
|
30
|
+
ARTICLE_TITLE_RE = %r{<span class="title">(?:<span class="searchHighlight">)?(.*?)</span>}mi
|
31
|
+
ARTICLE_LANGUAGE_RE = %r{<span class="sumlang">\s*\(?(.*?)\)?\s*</span>?}mi
|
31
32
|
ARTICLE_AUTHORS_RE = %r{<br />(<a href="/mathscinet/search/publications.html[^"]*">.*?</a>)<br />}mi
|
32
33
|
ARTICLE_AUTHOR_RE = %r{<a href="/mathscinet/search/publications.html[^"]*">(.*?)</a>}mi
|
33
34
|
ARTICLE_MSCS_RE = %r{<a href="/mathscinet/search/mscdoc.html\?code=[^"]*">(.*?)</a>}mi
|
34
35
|
ARTICLE_MSC_RE = %r{([^, ]+)}mi
|
35
36
|
ARTICLE_PUBLICATION_RE = %r{<a href="/mathscinet/search/journaldoc\.html\?cn=[^"]*">\s*<em>(.*?)</em>\s*</a>}mi
|
37
|
+
ARTICLE_PUBLISHER_RE = %r{xxxxxxxxxxxxxxxxxx}mi
|
36
38
|
ARTICLE_RANGE_RE = %r{(\d+–\d+)}mi
|
37
39
|
ARTICLE_YEAR_RE = %r{<a href="/mathscinet/search/publications\.html[^"]*">\s*\(?(\d{4})\)?, </a>}mi
|
38
40
|
ARTICLE_ISSNS_RE = %r{(ISSN.*?)<br>}mi
|
data/lib/math_metadata_lookup/sites/numdam.rb
ADDED
@@ -0,0 +1,93 @@
|
|
1
|
+
# -*-: coding: utf-8 -*-
|
2
|
+
# vi: fenc=utf-8:expandtab:ts=2:sw=2:sts=2
|
3
|
+
|
4
|
+
require 'rexml/document'
|
5
|
+
|
6
|
+
module MathMetadata
|
7
|
+
|
8
|
+
# NUMDAM
|
9
|
+
# http://numdam.org/
|
10
|
+
class NUMDAM < Site
|
11
|
+
ID = :numdam
|
12
|
+
NAME = "NUMDAM"
|
13
|
+
URL = "http://numdam.org/"
|
14
|
+
|
15
|
+
|
16
|
+
# AUTHOR_URL % "Author, Name"
|
17
|
+
AUTHOR_URL = %~~
|
18
|
+
|
19
|
+
AUTHORS_RE = %r{}mi
|
20
|
+
AUTHOR_RE = %r{}mi
|
21
|
+
|
22
|
+
|
23
|
+
ARTICLE_ID_URL = "http://www.numdam.org/numdam-bin/item?id=%s"
|
24
|
+
ARTICLE_URL = "http://www.numdam.org/numdam-bin/search?bibitems.au_op=and&bibitems.text_op=and&ti=%s&au=%s&ti_op=and&Index1.y=0&Index1.x=0&bibitems.ti_op=and&au_op=and&py1=%s"
|
25
|
+
|
26
|
+
LIST_OF_ARTICLES_RE = %r{<P>\s*<DIV\s+align="center">.*?</DIV>\s*</P>\s*(.*?)\s*<P>\s*<DIV\s+align="center">.*?</DIV>\s*</P>}mi
|
27
|
+
ARTICLE_ENTRY_RE = %r{<a href="http://www.numdam.org:80/numdam-bin/item\?id=([^"]+)">Full entry</a>}mi
|
28
|
+
|
29
|
+
ARTICLE_ID_RE = %r{<P>stable URL: http://www.numdam.org/item\?id=([^<]+)</P>}mi
|
30
|
+
ARTICLE_TITLE_RE = %r{<SPAN class="atitle">(.*?)</SPAN>}mi
|
31
|
+
ARTICLE_LANGUAGE_RE = %r{xxxxxxxxxxxxxxx}mi
|
32
|
+
ARTICLE_AUTHORS_RE = %r{<head>\s*(.*?)\s*</head>}mi
|
33
|
+
ARTICLE_AUTHOR_RE = %r{<meta content="([^"]+)" name="DC.creator">}mi
|
34
|
+
ARTICLE_MSCS_RE = %r{Class\. Math\.:\s*(.*?)\s*<br>}mi
|
35
|
+
ARTICLE_MSC_RE = %r{([^,]+)(?:,\s*)?}mi
|
36
|
+
ARTICLE_PUBLICATION_RE = %r{<SPAN class="jtitle">(.*?)</SPAN>}mi
|
37
|
+
ARTICLE_PUBLISHER_RE = %r{xxxxxxxxxxxxxxxxx}mi
|
38
|
+
ARTICLE_RANGE_RE = %r{(\d+\-\d+)\s*<BR>\s*Full text}mi
|
39
|
+
ARTICLE_YEAR_RE = %r{py=(\d+)}mi
|
40
|
+
ARTICLE_ISSNS_RE = %r{xxxxxxxxxxxxxxxxx}mi
|
41
|
+
ARTICLE_ISSN_RE = %r{xxxxxxxxxxxxxxxxx}mi
|
42
|
+
ARTICLE_KEYWORDS_RE = %r{xxxxxxxxxxxxxxxxx}mi
|
43
|
+
ARTICLE_KEYWORD_RE = %r{xxxxxxxxxxxxxxxxx}mi
|
44
|
+
ARTICLE_REFERENCES_RE = %r{<P>\s*<B>\s*Bibliography\s*</B>\s*</P>\s*</DIV>\s*(.*?\s*</td>)}mi
|
45
|
+
ARTICLE_REFERENCE_RE = %r{\[\d+\](.*?)(?:<BR>|</td>)}mi
|
46
|
+
|
47
|
+
def get_article_references( page )
|
48
|
+
references = []
|
49
|
+
|
50
|
+
refs = get_article_reference_s page
|
51
|
+
|
52
|
+
i = 0
|
53
|
+
refs.each do |r|
|
54
|
+
i+=1
|
55
|
+
ref = Reference.new nil, i
|
56
|
+
ref.source = r.gsub(/ +/,' ')
|
57
|
+
ref.article = Article.new
|
58
|
+
|
59
|
+
r =~ %r{<span class="atitle">(.*?)</span>}im
|
60
|
+
ref.article.title = $1.to_s.strip
|
61
|
+
|
62
|
+
ref.article.authors = []
|
63
|
+
r.split(%r{<span class="bauteur">\s?(.*?)\s*</span>}).each do |a|
|
64
|
+
a.strip!
|
65
|
+
next if a.empty? || a == "-" || [',', '&'].include?(a[0,1]) || a.downcase == 'and'
|
66
|
+
author = a.gsub(/<.*?>/, '') # parentheses avoid Ruby's "ambiguous first argument" warning
|
67
|
+
ref.article.authors << author
|
68
|
+
end
|
69
|
+
|
70
|
+
r =~ %r{<span class="brevue">(.*?)</span>}mi
|
71
|
+
ref.article.publication = $1.strip if $1
|
72
|
+
|
73
|
+
r =~ %r{<bediteur>(.*?)</bediteur>}mi
|
74
|
+
ref.article.publisher = $1.strip if $1
|
75
|
+
|
76
|
+
r =~ %r{<blieued>(.*?)</blieued>}mi
|
77
|
+
ref.article.place = $1.strip if $1
|
78
|
+
|
79
|
+
r =~ %r{<bannee>(.*?)</bannee>}mi
|
80
|
+
ref.article.year = $1.strip if $1
|
81
|
+
|
82
|
+
r =~ %r{<bpagedeb>(\d+)</bpagedeb>-<bpagefin>(\d+)</bpagefin>}mi
|
83
|
+
ref.article.range = "#{$1.strip}-#{$2.strip}" if $1
|
84
|
+
|
85
|
+
references << ref
|
86
|
+
end
|
87
|
+
|
88
|
+
references
|
89
|
+
end
|
90
|
+
|
91
|
+
end # NUMDAM
|
92
|
+
|
93
|
+
end
|
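The reference parsing added in numdam.rb can be tried out in isolation. The following is a minimal sketch, not the gem's code: the two regexes are copied from the diff, but the method names (`sample_bibliography`, `parse_reference`, `extract_references`) and the HTML sample are invented for illustration, and authors are collected with `scan` rather than the `split`-and-filter loop used in the diff.

```ruby
# Sketch only: regexes come from the NUMDAM diff; everything else is hypothetical.

# Invented HTML fragment shaped like a NUMDAM bibliography block.
def sample_bibliography
  <<-HTML
[1] <span class="bauteur">A. Author</span>, <span class="atitle">First title</span><BR>
[2] <span class="bauteur">B. Writer</span> and <span class="bauteur">C. Third</span>, <span class="atitle">Second title</span></td>
  HTML
end

# Pull the title and the author names out of one raw reference string.
def parse_reference(raw)
  title = raw[%r{<span class="atitle">(.*?)</span>}im, 1].to_s.strip
  authors = raw.scan(%r{<span class="bauteur">\s?(.*?)\s*</span>}im)
               .flatten
               .map { |a| a.gsub(/<.*?>/, '').strip }
               .reject(&:empty?)
  { :title => title, :authors => authors }
end

# Split the bibliography into entries with ARTICLE_REFERENCE_RE from the diff.
def extract_references(html)
  reference_re = %r{\[\d+\](.*?)(?:<BR>|</td>)}mi
  html.scan(reference_re).flatten.map { |raw| parse_reference(raw) }
end

extract_references(sample_bibliography).each do |ref|
  puts "#{ref[:authors].join(', ')}: #{ref[:title]}"
end
```

Using `scan` keeps only the text inside the `bauteur` spans, so separators such as "," or "and" never need to be filtered out. Whether that matches the markup of every real NUMDAM page is an assumption.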
@@ -26,7 +26,8 @@ module MathMetadata
|
|
26
26
|
ARTICLE_ENTRY_RE = %r{<span[^>]*?>\s*<a href="\?q=an:([^\&]+)\&format=complete">[^<]+</a>\s*<b>}mi
|
27
27
|
|
28
28
|
ARTICLE_ID_RE = %r{<a href="\?q=an:.*?complete">(.*?)</a>}mi
|
29
|
-
ARTICLE_TITLE_RE = %r{</a><br>(.*?)\.</b>\s*\(
|
29
|
+
ARTICLE_TITLE_RE = %r{</a><br>(.*?)\.</b>\s*\(.*?\)<br>}mi
|
30
|
+
ARTICLE_LANGUAGE_RE = %r{</a><br>.*?\.</b>\s*\((.*?)\)<br>}mi
|
30
31
|
ARTICLE_AUTHORS_RE = %r{<br><b>(<a href="\?q=[^"]*">.*?</a>)<br>}mi
|
31
32
|
ARTICLE_AUTHOR_RE = %r{<a href="\?q=[^"]*">(.*?)</a>}mi
|
32
33
|
ARTICLE_MSCS_RE = %r{<dd>(.*?)</dd>}mi
|
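The hunk above completes `ARTICLE_TITLE_RE` and adds `ARTICLE_LANGUAGE_RE`, which captures the parenthesised language that follows the title. A small sketch of how the two patterns split one entry, assuming an invented HTML line (`sample_entry`) and a hypothetical helper name (`title_and_language`):

```ruby
# Sketch only: the two regexes are copied from the zbl.rb diff;
# the helper names and the sample markup are hypothetical.

# Invented fragment shaped like a Zentralblatt result entry.
def sample_entry
  '<a href="?q=an:1234.56789&format=complete">1234.56789</a><br>An example title.</b> (English)<br>'
end

# Apply both patterns to the same HTML and return the captured groups.
def title_and_language(html)
  title_re    = %r{</a><br>(.*?)\.</b>\s*\(.*?\)<br>}mi
  language_re = %r{</a><br>.*?\.</b>\s*\((.*?)\)<br>}mi
  { :title => html[title_re, 1], :language => html[language_re, 1] }
end

result = title_and_language(sample_entry)
puts "#{result[:title]} [#{result[:language]}]"
```

Both patterns anchor on the same surrounding markup and differ only in which part is captured, so they stay in step as long as that markup is stable.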
@@ -44,7 +44,7 @@ module MathMetadata
|
|
44
44
|
|
45
45
|
|
46
46
|
def normalize_mscs( mscs )
|
47
|
-
mscs.map{|m| m.split(/,|;/) }.flatten.map{|m| m =~ /\s*\(?([^\s\)\(]+)\)?\s*/; $1}
|
47
|
+
mscs.map{|m| m.split(/,|;/) }.flatten.map{|m| m.gsub(/<.*?>/,'')}.map{|m| m =~ /\s*\(?([^\s\)\(]+)\)?\s*/; $1}
|
48
48
|
end
|
49
49
|
|
50
50
|
|
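The changed line in `normalize_mscs` adds a tag-stripping step before the MSC code is captured. A standalone sketch of the same pipeline, assuming an invented input string; it uses `String#[]` with a capture index instead of `=~` and `$1`, which should behave the same way:

```ruby
# Sketch only: the logic follows the new normalize_mscs line in the diff;
# the sample input is invented.
def normalize_mscs(mscs)
  mscs.map { |m| m.split(/,|;/) }                      # one chunk per code
      .flatten
      .map { |m| m.gsub(/<.*?>/, '') }                 # drop HTML tags (the step added in 0.2.0)
      .map { |m| m[/\s*\(?([^\s\)\(]+)\)?\s*/, 1] }    # keep the bare code, without parentheses
end

p normalize_mscs(['<a href="#">35J60</a>, (46E35); 47H05'])
```

Because splitting happens before the tags are removed, a tag whose attributes contain a comma or semicolon would be cut in two. The sample above avoids that case.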
@@ -80,6 +80,7 @@ module MathMetadata
|
|
80
80
|
str = remove_punctuation(str)
|
81
81
|
str.gsub!(%r{\W+}, ' ')
|
82
82
|
str.gsub!(%r{(?:the|a|of|)\s+}i, ' ')
|
83
|
+
str.gsub!(%r{\s+}, ' ')
|
83
84
|
str.strip
|
84
85
|
end
|
85
86
|
|
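The added `str.gsub!(%r{\s+}, ' ')` line matters because the stop-word substitution before it replaces each match with a space and so leaves runs of spaces behind. A minimal sketch of that effect, assuming a hypothetical method name (`normalize_sketch`) and leaving out the `remove_punctuation` call, whose body is not shown in this hunk:

```ruby
# Sketch only: the three substitutions are copied from the tools.rb hunk;
# the method name and the collapse switch are hypothetical.
def normalize_sketch(str, collapse = true)
  out = str.gsub(%r{\W+}, ' ')                    # non-word characters become spaces
  out = out.gsub(%r{(?:the|a|of|)\s+}i, ' ')      # stop words are replaced by a space
  out = out.gsub(%r{\s+}, ' ') if collapse        # the line added in 0.2.0
  out.strip
end

puts normalize_sketch('On the theory of a space', false).inspect
puts normalize_sketch('On the theory of a space').inspect
```

Without the final substitution the result still contains consecutive spaces, which would make two otherwise identical titles compare as different strings.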
@@ -6,25 +6,26 @@
|
|
6
6
|
require 'rubygems'
|
7
7
|
require 'find'
|
8
8
|
|
9
|
-
spec = Gem::Specification.new do |s|
|
9
|
+
@spec = Gem::Specification.new do |s|
|
10
10
|
s.platform = Gem::Platform::RUBY
|
11
11
|
s.summary = "Search mathematical reviews sites and fetches metadata about articles."
|
12
12
|
s.homepage = "http://github.com/pejuko/math_metadata_lookup"
|
13
13
|
s.email = "pejuko@gmail.com"
|
14
14
|
s.authors = ["Petr Kovar"]
|
15
15
|
s.name = 'math_metadata_lookup'
|
16
|
-
s.version = '0.
|
16
|
+
s.version = '0.2.0'
|
17
17
|
s.date = Time.now.strftime("%Y-%m-%d")
|
18
18
|
s.add_dependency('unicode')
|
19
19
|
s.add_dependency('unidecoder')
|
20
20
|
s.add_dependency('ya2yaml')
|
21
|
+
s.add_dependency('json')
|
21
22
|
s.require_path = 'lib'
|
22
23
|
s.files = ["bin/math_metadata_lookup", "README.md", "math_metadata_lookup.gemspec", "TODO", "Rakefile"]
|
23
24
|
s.files += Dir["lib/**/*.rb", "resources/*"]
|
24
25
|
s.executables = ["math_metadata_lookup"]
|
25
26
|
s.description = <<EOF
|
26
27
|
This utility/library search mathematical reviews sites and fetches metadata about articles.
|
27
|
-
It can return results as one of text, xml, html, yaml or ruby formats.
|
28
|
+
It can return results as one of text, xml, html, yaml, json or ruby formats.
|
28
29
|
EOF
|
29
30
|
end
|
30
31
|
|
metadata
CHANGED
@@ -1,13 +1,12 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: math_metadata_lookup
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
|
5
|
-
prerelease:
|
4
|
+
prerelease: false
|
6
5
|
segments:
|
7
6
|
- 0
|
8
|
-
-
|
9
|
-
-
|
10
|
-
version: 0.
|
7
|
+
- 2
|
8
|
+
- 0
|
9
|
+
version: 0.2.0
|
11
10
|
platform: ruby
|
12
11
|
authors:
|
13
12
|
- Petr Kovar
|
@@ -15,7 +14,7 @@ autorequire:
|
|
15
14
|
bindir: bin
|
16
15
|
cert_chain: []
|
17
16
|
|
18
|
-
date: 2011-
|
17
|
+
date: 2011-06-10 00:00:00 +02:00
|
19
18
|
default_executable:
|
20
19
|
dependencies:
|
21
20
|
- !ruby/object:Gem::Dependency
|
@@ -26,7 +25,6 @@ dependencies:
|
|
26
25
|
requirements:
|
27
26
|
- - ">="
|
28
27
|
- !ruby/object:Gem::Version
|
29
|
-
hash: 3
|
30
28
|
segments:
|
31
29
|
- 0
|
32
30
|
version: "0"
|
@@ -40,7 +38,6 @@ dependencies:
|
|
40
38
|
requirements:
|
41
39
|
- - ">="
|
42
40
|
- !ruby/object:Gem::Version
|
43
|
-
hash: 3
|
44
41
|
segments:
|
45
42
|
- 0
|
46
43
|
version: "0"
|
@@ -54,15 +51,27 @@ dependencies:
|
|
54
51
|
requirements:
|
55
52
|
- - ">="
|
56
53
|
- !ruby/object:Gem::Version
|
57
|
-
hash: 3
|
58
54
|
segments:
|
59
55
|
- 0
|
60
56
|
version: "0"
|
61
57
|
type: :runtime
|
62
58
|
version_requirements: *id003
|
59
|
+
- !ruby/object:Gem::Dependency
|
60
|
+
name: json
|
61
|
+
prerelease: false
|
62
|
+
requirement: &id004 !ruby/object:Gem::Requirement
|
63
|
+
none: false
|
64
|
+
requirements:
|
65
|
+
- - ">="
|
66
|
+
- !ruby/object:Gem::Version
|
67
|
+
segments:
|
68
|
+
- 0
|
69
|
+
version: "0"
|
70
|
+
type: :runtime
|
71
|
+
version_requirements: *id004
|
63
72
|
description: |
|
64
73
|
This utility/library search mathematical reviews sites and fetches metadata about articles.
|
65
|
-
It can return results as one of text, xml, html, yaml or ruby formats.
|
74
|
+
It can return results as one of text, xml, html, yaml, json or ruby formats.
|
66
75
|
|
67
76
|
email: pejuko@gmail.com
|
68
77
|
executables:
|
@@ -83,8 +92,11 @@ files:
|
|
83
92
|
- lib/math_metadata_lookup/tools.rb
|
84
93
|
- lib/math_metadata_lookup/lookup.rb
|
85
94
|
- lib/math_metadata_lookup/result.rb
|
95
|
+
- lib/math_metadata_lookup/sites/numdam.rb
|
86
96
|
- lib/math_metadata_lookup/sites/mr.rb
|
87
97
|
- lib/math_metadata_lookup/sites/zbl.rb
|
98
|
+
- lib/math_metadata_lookup/sites/cedram.rb
|
99
|
+
- lib/math_metadata_lookup/sites/dmlcz.rb
|
88
100
|
- lib/math_metadata_lookup/reference.rb
|
89
101
|
- lib/math_metadata_lookup/entity.rb
|
90
102
|
- lib/math_metadata_lookup/author.rb
|
@@ -103,7 +115,6 @@ required_ruby_version: !ruby/object:Gem::Requirement
|
|
103
115
|
requirements:
|
104
116
|
- - ">="
|
105
117
|
- !ruby/object:Gem::Version
|
106
|
-
hash: 3
|
107
118
|
segments:
|
108
119
|
- 0
|
109
120
|
version: "0"
|
@@ -112,14 +123,13 @@ required_rubygems_version: !ruby/object:Gem::Requirement
|
|
112
123
|
requirements:
|
113
124
|
- - ">="
|
114
125
|
- !ruby/object:Gem::Version
|
115
|
-
hash: 3
|
116
126
|
segments:
|
117
127
|
- 0
|
118
128
|
version: "0"
|
119
129
|
requirements: []
|
120
130
|
|
121
131
|
rubyforge_project:
|
122
|
-
rubygems_version: 1.
|
132
|
+
rubygems_version: 1.3.7
|
123
133
|
signing_key:
|
124
134
|
specification_version: 3
|
125
135
|
summary: Search mathematical reviews sites and fetches metadata about articles.
|