logstash-filter-ezproxy 0.1.0

Files changed (39)
  1. checksums.yaml +7 -0
  2. data/CHANGELOG.md +2 -0
  3. data/CONTRIBUTORS +10 -0
  4. data/DEVELOPER.md +2 -0
  5. data/Gemfile +5 -0
  6. data/LICENSE +7 -0
  7. data/README.md +86 -0
  8. data/lib/logstash/filters/dawsonera.rb +41 -0
  9. data/lib/logstash/filters/ebscohost.rb +116 -0
  10. data/lib/logstash/filters/emerald.rb +96 -0
  11. data/lib/logstash/filters/ezproxy.rb +93 -0
  12. data/lib/logstash/filters/jstor.rb +112 -0
  13. data/lib/logstash/filters/lexisnexis.rb +37 -0
  14. data/lib/logstash/filters/sage.rb +39 -0
  15. data/lib/logstash/filters/sciencedirect.rb +171 -0
  16. data/lib/logstash/filters/tandf.rb +55 -0
  17. data/lib/logstash/filters/wiley.rb +202 -0
  18. data/logstash-filter-ezproxy.gemspec +21 -0
  19. data/spec/filters/dawsonera/dawsonera.2014-09-03.csv +4 -0
  20. data/spec/filters/dawsonera/dawsonera_spec.rb +15 -0
  21. data/spec/filters/ebscohost/ebscohost.2014-08-21.csv +13 -0
  22. data/spec/filters/ebscohost/ebscohost_spec.rb +22 -0
  23. data/spec/filters/emerald/emerald.2015-08-11.csv +15 -0
  24. data/spec/filters/emerald/emerald_spec.rb +17 -0
  25. data/spec/filters/ezproxy_spec.rb +53 -0
  26. data/spec/filters/jstor/jstor.2013-10-03.csv +18 -0
  27. data/spec/filters/jstor/jstor_spec.rb +20 -0
  28. data/spec/filters/lexisnexis/lexisnexis.2013-05-17.csv +2 -0
  29. data/spec/filters/lexisnexis/lexisnexis_spec.rb +15 -0
  30. data/spec/filters/sage/sage_spec.rb +16 -0
  31. data/spec/filters/sage/sagej.2016-12-05.csv +6 -0
  32. data/spec/filters/sciencedirect/sciencedirect_spec.rb +17 -0
  33. data/spec/filters/sciencedirect/sd.2013-01-09.csv +28 -0
  34. data/spec/filters/tandf/tandf.2015-03-25.csv +9 -0
  35. data/spec/filters/tandf/tandf_spec.rb +17 -0
  36. data/spec/filters/wiley/wiley.2013-04-15.csv +28 -0
  37. data/spec/filters/wiley/wiley_spec.rb +19 -0
  38. data/spec/spec_helper.rb +2 -0
  39. metadata +130 -0
checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+   metadata.gz: d55b67f72348d7a1e6b0afa2fa3df83ba7c09967
+   data.tar.gz: ca1a3ec43e516224aec41ac687f095e052fa53ed
+ SHA512:
+   metadata.gz: 00f3455d7f27aa70148ac3d952b486b4dae1119bed7258bac2431ef7e0959b77ae4410fe231b07d550679f590514d48d6ce22e195b59d4c968bbe58dc01c1cbf
+   data.tar.gz: b510124c44d2df0adddbebe35b8dd418689435652fc972475c0917c254adc14ac785a547b956546fc406e0b4d4343324c4d58929c31c023a1b13ff00c9d3f9b2
data/CHANGELOG.md ADDED
@@ -0,0 +1,2 @@
+ ## 0.1.0
+ - Plugin created with the logstash plugin generator
data/CONTRIBUTORS ADDED
@@ -0,0 +1,10 @@
+ The following is a list of people who have contributed ideas, code, bug
+ reports, or in general have helped logstash along its way.
+
+ Contributors:
+ * Dom Belcher - dominic.belcher@gmail.com
+
+ Note: If you've sent us patches, bug reports, or otherwise contributed to
+ Logstash, and you aren't on the list above and want to be, please let us know
+ and we'll make sure you're here. Contributions from folks like you are what make
+ open source awesome.
data/DEVELOPER.md ADDED
@@ -0,0 +1,2 @@
+ # logstash-filter-ezproxy
+ Example filter plugin. This should help bootstrap your effort to write your own filter plugin!
data/Gemfile ADDED
@@ -0,0 +1,5 @@
+ source 'https://rubygems.org'
+ gemspec
+ # gem "logstash", :github => "elastic/logstash", :branch => "6.1"
+
+ gem 'rspec', '~> 3.0'
data/LICENSE ADDED
@@ -0,0 +1,7 @@
+ Copyright 2018 Lancaster University
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,86 @@
+ # Logstash Plugin
+
+ This is a plugin for [Logstash](https://github.com/elastic/logstash).
+
+ It is fully free and fully open source. The license is Apache 2.0, meaning you are pretty much free to use it however you want in whatever way.
+
+ ## Documentation
+
+ Logstash provides infrastructure to automatically generate documentation for this plugin. We use the asciidoc format to write documentation, so any comments in the source code will first be converted into asciidoc and then into html. All plugin documentation is placed under one [central location](http://www.elastic.co/guide/en/logstash/current/).
+
+ - For formatting code or config examples, you can use the asciidoc `[source,ruby]` directive
+ - For more asciidoc formatting tips, see the excellent reference here https://github.com/elastic/docs#asciidoc-guide
+
+ ## Need Help?
+
+ Need help? Try #logstash on freenode IRC or the https://discuss.elastic.co/c/logstash discussion forum.
+
+ ## Developing
+
+ ### 1. Plugin Development and Testing
+
+ #### Code
+ - To get started, you'll need JRuby with the Bundler gem installed.
+
+ - Create a new plugin or clone an existing one from the GitHub [logstash-plugins](https://github.com/logstash-plugins) organization. We also provide [example plugins](https://github.com/logstash-plugins?query=example).
+
+ - Install dependencies
+ ```sh
+ bundle install
+ ```
+
+ #### Test
+
+ - Update your dependencies
+
+ ```sh
+ bundle install
+ ```
+
+ - Run tests
+
+ ```sh
+ bundle exec rspec
+ ```
+
+ ### 2. Running your unpublished Plugin in Logstash
+
+ #### 2.1 Run in a local Logstash clone
+
+ - Edit the Logstash `Gemfile` and add the local plugin path, for example:
+ ```ruby
+ gem "logstash-filter-awesome", :path => "/your/local/logstash-filter-awesome"
+ ```
+ - Install the plugin
+ ```sh
+ bin/logstash-plugin install --no-verify
+ ```
+ - Run Logstash with your plugin
+ ```sh
+ bin/logstash -e 'filter {awesome {}}'
+ ```
+ At this point any modifications to the plugin code will be applied to this local Logstash setup. After modifying the plugin, simply rerun Logstash.
+
+ #### 2.2 Run in an installed Logstash
+
+ You can use the same **2.1** method to run your plugin in an installed Logstash by editing its `Gemfile` and pointing the `:path` to your local plugin development directory, or you can build the gem and install it using:
+
+ - Build your plugin gem
+ ```sh
+ gem build logstash-filter-awesome.gemspec
+ ```
+ - Install the plugin from the Logstash home
+ ```sh
+ bin/logstash-plugin install /your/local/plugin/logstash-filter-awesome.gem
+ ```
+ - Start Logstash and proceed to test the plugin
+
+ ## Contributing
+
+ All contributions are welcome: ideas, patches, documentation, bug reports, complaints, and even something you drew up on a napkin.
+
+ Programming is not a required skill. Whatever you've seen about open source and maintainers or community members saying "send patches or die" - you will not see that here.
+
+ It is more important to the community that you are able to contribute.
+
+ For more information about contributing, see the [CONTRIBUTING](https://github.com/elastic/logstash/blob/master/CONTRIBUTING.md) file.
data/lib/logstash/filters/dawsonera.rb ADDED
@@ -0,0 +1,41 @@
+
+ require 'uri'
+ require 'cgi'
+
+ module DawsonEra
+   def DawsonEra.parse (input)
+
+     uri = URI(URI.unescape(input))
+
+     path = uri.path
+     params = {}
+     if (uri.query)
+       params = CGI::parse(uri.query)
+     end
+
+     data = {
+       "provider" => "dawsonera"
+     }
+
+     if (match = /^(\/abstract\/([0-9]+))$/.match(path))
+       data['rtype'] = 'ABS'
+       data['mime'] = 'MISC'
+       data['online_identifier'] = match[2]
+       data['unit_id'] = match[1]
+
+     elsif (match = /^(\/readonline\/([0-9]+))$/.match(path))
+       data['rtype'] = 'BOOK'
+       data['mime'] = 'MISC'
+       data['online_identifier'] = match[2]
+       data['unit_id'] = match[1]
+
+     elsif ((match = /^(\/download\/drm\/[0-9]+\/([0-9]+))$/.match(path)))
+       data['rtype'] = 'BOOK'
+       data['mime'] = 'PDF'
+       data['online_identifier'] = match[2]
+       data['unit_id'] = match[1]
+     end
+
+     return data;
+   end
+ end
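The module above maps DawsonEra path patterns onto abstract, read-online, and DRM-download records. As a quick illustration, here is a minimal sketch of calling `DawsonEra.parse` directly; the URL and identifier are hypothetical, and it assumes a Ruby where `URI.unescape` is still available (as in the JRuby that Logstash bundles):

```ruby
require_relative 'dawsonera'

# Hypothetical abstract-page URL; the numeric identifier is made up.
metadata = DawsonEra.parse('https://www.dawsonera.com/abstract/9781134093762')

metadata['provider']          # => "dawsonera"
metadata['rtype']             # => "ABS"
metadata['mime']              # => "MISC"
metadata['online_identifier'] # => "9781134093762"
metadata['unit_id']           # => "/abstract/9781134093762"
```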
data/lib/logstash/filters/ebscohost.rb ADDED
@@ -0,0 +1,116 @@
+
+ require 'uri'
+ require 'cgi'
+
+ module Ebscohost
+
+   @openUrlFields = {
+     'issn' => 'print_identifier',
+     'isbn' => 'print_identifier',
+     'volume' => 'vol',
+     'issue' => 'num',
+     'spage' => 'first_page',
+     'title' => 'publication_title',
+     'id' => 'unit_id'
+   }
+
+   def Ebscohost.parse (input)
+
+     uri = URI(URI.unescape(input))
+
+     path = uri.path
+     params = {}
+     if (uri.query)
+       params = CGI::parse(uri.query)
+     end
+
+     data = {
+       "provider" => "ebscohost"
+     }
+
+
+     if ((match = /^\/(ehost|eds)\/([a-z]+)(?:\/[a-z]+)?$/i.match(path)))
+       category = match[2].downcase
+
+       if (match[1].downcase == 'eds')
+         data['platform_name'] = 'EBSCO Discovery Service'
+       end
+
+
+       case (category)
+       when 'results', 'resultsadvanced'
+         data['rtype'] = 'TOC'
+         data['mime'] = 'MISC'
+
+       when 'ebookviewer'
+         data['rtype'] = 'BOOK'
+         data['mime'] = 'PDF'
+
+       when 'pdfviewer'
+         data['rtype'] = 'ARTICLE'
+         data['mime'] = 'PDF'
+
+       when 'search'
+         data['rtype'] = 'SEARCH'
+         data['mime'] = 'MISC'
+
+       when 'detail'
+         data['rtype'] = 'REF'
+         data['mime'] = 'HTML'
+
+         if (uri.fragment)
+           hashedUrl = uri.fragment
+           query = CGI::parse(hashedUrl)
+
+           if (query.key?('AN'))
+             data['unit_id'] = query['AN'][0]
+           end
+         end
+       end
+
+
+
+     elsif ((match = /^\/pdf[a-z0-9_]*\/pdf\/\S+\/([a-z0-9]+)\.pdf$/i.match(path)))
+       data['rtype'] = 'ARTICLE'
+       data['mime'] = 'PDF'
+       data['unit_id'] = match[1]
+
+     elsif (path.downcase === '/contentserver.asp')
+       data['rtype'] = 'ARTICLE'
+       data['mime'] = 'PDF'
+
+       if (params.key?('K'))
+         data['unit_id'] = params['K'][0]
+       end
+
+     elsif (path.downcase == '/openurl')
+       data['rtype'] = 'OPENURL'
+       data['mime'] = 'HTML'
+
+       params.each do |key, value|
+
+         if (@openUrlFields.key?(key))
+           data[@openUrlFields[key]] = value[0]
+         end
+       end
+
+
+
+       if (params.key?('pages'))
+         pagesMatch = /^(\d+)-(\d+)$/.match(params['pages'][0]);
+         if (pagesMatch)
+           data['first_page'] = pagesMatch[1]
+           data['last_page'] = pagesMatch[2]
+         end
+       end
+
+
+
+       if (data['unit_id'] && data['unit_id'].downcase.start_with?('doi:'))
+         data['doi'] = data['unit_id'] = data['unit_id'][4..-1]
+       end
+     end
+
+     return data
+   end
+ end
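A sketch of the `/openurl` branch above with an invented request (every identifier below is a placeholder, not real data):

```ruby
require_relative 'ebscohost'

# Invented OpenURL-style request; the ISSN and DOI are placeholders.
url = 'https://search.ebscohost.com/openurl' \
      '?issn=1234-5678&volume=32&issue=1&spage=45&pages=45-67&id=doi:10.1234/example.5678'
metadata = Ebscohost.parse(url)

metadata['rtype']            # => "OPENURL"
metadata['mime']             # => "HTML"
metadata['print_identifier'] # => "1234-5678"
metadata['vol']              # => "32"
metadata['num']              # => "1"
metadata['first_page']       # => "45"  (also covered by the pages=45-67 range)
metadata['last_page']        # => "67"
metadata['doi']              # => "10.1234/example.5678"  ("doi:" prefix stripped)
metadata['unit_id']          # => "10.1234/example.5678"
```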
data/lib/logstash/filters/emerald.rb ADDED
@@ -0,0 +1,96 @@
+
+ require 'uri'
+ require 'cgi'
+
+ module Emerald
+   def Emerald.parse (input)
+
+     uri = URI(URI.unescape(input))
+
+     path = uri.path
+     params = {}
+     if (uri.query)
+       params = CGI::parse(uri.query)
+     end
+
+     data = {
+       "provider" => "emerald"
+     }
+
+     if ((match = /^\/series\/([a-z]+)$/.match(path)))
+       data['rtype'] = 'BOOKSERIE'
+       data['mime'] = 'MISC'
+       data['title_id'] = match[1]
+       data['unit_id'] = 'series/' + match[1]
+     elsif ((match = /^\/doi\/([a-z]+)\/([0-9]{2}\.[0-9]{4,5})\/(([A-Z]{1})([0-9]+)([-])([0-9]+)[(]([0-9]{4})[)]([0-9]+))$/.match(path)))
+
+       if (match[1] === 'abs')
+         data['rtype'] = 'ABS'
+         data['mime'] = 'MISC'
+       elsif (match[1] === 'book')
+         data['rtype'] = 'BOOKSERIE'
+         data['mime'] = 'MISC'
+       elsif (match[1] === 'full')
+         data['mime'] = 'HTML'
+         data['rtype'] = 'ARTICLE'
+       elsif (match[1] === 'pdfplus')
+         data['mime'] = 'PDFPLUS'
+         data['rtype'] = 'ARTICLE'
+       else
+         data['rtype'] = 'ARTICLE'
+         data['mime'] = 'MISC'
+       end
+
+       data['publication_date'] = match[8]
+       data['title_id'] = match[5] + match[6] + match[7]
+       data['unit_id'] = data['doi'] = match[2] + '/' + match[3]
+     elsif ((match = /^\/loi\/([a-z]+)$/.match(path)))
+
+       data['mime'] = 'MISC'
+       data['title_id'] = match[1]
+       data['unit_id'] = 'loi/' + match[1]
+     elsif ((match = /^\/toc\/([a-z]+)\/([0-9]+)\/([0-9]+)/.match(path)))
+       data['rtype'] = 'TOC'
+       data['mime'] = 'MISC'
+       data['title_id'] = match[1]
+       data['unit_id'] = match[1] + '/' + match[2] + '/' + match[3]
+     elsif ((match = /^\/doi\/([a-z]+)\/([0-9]{2}\.[0-9]{4,5})\/(([A-Z]+)([-])([0-9]+)([-])([0-9]+)([-])([0-9]+))$/.match(path)))
+
+       if (match[1] === 'abs')
+         data['rtype'] = 'ABS'
+         data['mime'] = 'MISC'
+       elsif (match[1] === 'full')
+         data['mime'] = 'HTML'
+         data['rtype'] = 'ARTICLE'
+       elsif (match[1] === 'pdfplus')
+         data['mime'] = 'PDFPLUS'
+         data['rtype'] = 'ARTICLE'
+       else
+         data['rtype'] = 'ARTICLE'
+       end
+
+       data['title_id'] = match[4]
+       data['unit_id'] = data['doi'] = match[2] + '/' + match[3]
+     elsif ((match = /^\/doi\/([a-z]+)\/([0-9]{2}\.[0-9]{4,5})\/([0-9]+)$/.match(path)))
+
+       if (match[1] === 'abs')
+         data['rtype'] = 'ABS'
+         data['mime'] = 'MISC'
+       elsif (match[1] === 'full')
+         data['mime'] = 'HTML'
+         data['rtype'] = 'ARTICLE'
+       elsif (match[1] === 'pdfplus')
+         data['mime'] = 'PDFPLUS'
+         data['rtype'] = 'ARTICLE'
+       else
+         data['rtype'] = 'ARTICLE'
+       end
+
+       data['title_id'] = match[3]
+       data['unit_id'] = data['doi'] = match[2] + '/' + match[3]
+     end
+
+
+     return data;
+   end
+ end
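A similar sketch for the second `/doi/` branch above; the host and the 10.1108 prefix are Emerald's, but the article identifier is invented for illustration:

```ruby
require_relative 'emerald'

# Illustrative full-text DOI URL; the suffix JD-01-2018-0001 is made up.
metadata = Emerald.parse('http://www.emeraldinsight.com/doi/full/10.1108/JD-01-2018-0001')

metadata['rtype']    # => "ARTICLE"
metadata['mime']     # => "HTML"
metadata['title_id'] # => "JD"
metadata['doi']      # => "10.1108/JD-01-2018-0001"
metadata['unit_id']  # => "10.1108/JD-01-2018-0001"
```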
data/lib/logstash/filters/ezproxy.rb ADDED
@@ -0,0 +1,93 @@
+ # encoding: utf-8
+ require "logstash/filters/base"
+ require "logstash/namespace"
+ require_relative "./jstor"
+ require_relative "./lexisnexis"
+ require_relative "./sage"
+ require_relative "./wiley"
+ require_relative "./sciencedirect"
+ require_relative "./dawsonera"
+ require_relative "./tandf"
+ require_relative "./emerald"
+ require_relative "./ebscohost"
+ require 'uri'
+ require 'cgi'
+
+ # This filter parses an EZproxy request URL (taken from the event field
+ # named by the `url` option) and attaches provider-specific usage
+ # metadata to the event as the "request_metadata" field.
+ #
+ class LogStash::Filters::Ezproxy < LogStash::Filters::Base
+
+   # Setting the config_name here is required. This is how you
+   # configure this filter from your Logstash config.
+   #
+   # filter {
+   #   ezproxy {
+   #     url => "message"
+   #   }
+   # }
+   #
+   config_name "ezproxy"
+
+   # The event field containing the URL to be parsed by the filter
+   config :url, :validate => :string, :required => true
+
+   # hosts = {
+   #   "www.jstor.org" => Jstor::parse
+   # }
+
+
+   public
+   def register
+     # Add instance variables
+   end # def register
+
+   public
+   def filter(event)
+     url = event.get(@url)
+     data = {}
+     uri = URI(URI::extract(url)[0])
+
+     # if (uri.host == "ezproxy.lancs.ac.uk")
+     #   if (uri.query)
+     #     puts uri
+     #     params = CGI::parse(uri.query)
+     #     if params.key?('url')
+     #       uri = URI(params['url'][0])
+     #     elsif params.key?('qurl')
+     #       uri = URI(params['qurl'][0])
+     #     end
+     #     event.tag("requested_host_ezproxy")
+     #     event.set("requested_host", uri.host)
+     #   end
+     # end
+
+     case
+     when uri.host.include?("www.jstor.org")
+       data = Jstor::parse(uri.to_s)
+     when uri.host.include?("www.lexisnexis.com")
+       data = LexisNexis::parse(uri.to_s)
+     when uri.host.include?("journals.sagepub.com")
+       data = Sage::parse(uri.to_s)
+     when uri.host.include?("onlinelibrary.wiley.com")
+       data = Wiley::parse(uri.to_s)
+     when uri.host.include?("www.sciencedirect.com")
+       data = ScienceDirect::parse(uri.to_s)
+     when uri.host.include?("www.dawsonera.com")
+       data = DawsonEra::parse(uri.to_s)
+     when uri.host.include?("www.tandfonline.com")
+       data = TandF::parse(uri.to_s)
+     when uri.host.include?("www.emeraldinsight.com")
+       data = Emerald::parse(uri.to_s)
+     when uri.host.include?("ebscohost.com")
+       data = Ebscohost::parse(uri.to_s)
+     end
+     event.set("request_metadata", data)
+     event.tag("ezproxy_parse_success")
+
+
+     # filter_matched should go in the last line of our successful code
+     filter_matched(event)
+   end # def filter
+ end # class LogStash::Filters::Ezproxy
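To close, a hedged sketch of how the filter could be exercised from RSpec, in the style of the spec files listed above; the `request` field name and the event contents are assumptions for illustration, not taken from the shipped specs:

```ruby
require "logstash/devutils/rspec/spec_helper"
require "logstash/filters/ezproxy"

describe LogStash::Filters::Ezproxy do
  # "request" is an assumed event field holding the raw request URL.
  subject { described_class.new("url" => "request") }

  before { subject.register }

  it "tags the event and attaches provider metadata" do
    event = LogStash::Event.new("request" => "https://www.dawsonera.com/abstract/9781134093762")
    subject.filter(event)

    expect(event.get("tags")).to include("ezproxy_parse_success")
    expect(event.get("request_metadata")["provider"]).to eq("dawsonera")
    expect(event.get("request_metadata")["rtype"]).to eq("ABS")
  end
end
```

Inside a running pipeline, the same behaviour comes from a `filter { ezproxy { url => "..." } }` block pointing at whichever event field carries the URL.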