logstash-input-multirss 0.1.0

@@ -0,0 +1,7 @@
+ ---
+ SHA256:
+   metadata.gz: 328bb4fc51f7439c362bdde86eeaa3fd5921d24bf782e42463924def6a98118e
+   data.tar.gz: 62a56620434f89a81493352754d570a346e065ebafddfc2bc2fb7f6641d074fa
+ SHA512:
+   metadata.gz: c8a7ddab9008630b748a318af49bbbd77a5d92d97f45b45ea2f43f60623fd6f6556c2bc1d773bd73475a867a373280e222a789e3c4f2d3eca32cbb7e5cb1c6a0
+   data.tar.gz: a5ee270023b86ab495290916b4f4be34f5f8f2411f2b34ba3e4bf32ce031e854e07fb600977e8d6c24577b352272819820e8d188baca4211a649735ba1ee6e68
@@ -0,0 +1,2 @@
+ ## 0.1.0
+ - Plugin created with the logstash plugin generator
@@ -0,0 +1,10 @@
+ The following is a list of people who have contributed ideas, code, bug
+ reports, or in general have helped logstash along its way.
+
+ Contributors:
+ * -
+
+ Note: If you've sent us patches, bug reports, or otherwise contributed to
+ Logstash, and you aren't on the list above and want to be, please let us know
+ and we'll make sure you're here. Contributions from folks like you are what make
+ open source awesome.
@@ -0,0 +1,2 @@
+ # logstash-input-multirss
+ Example input plugin. This should help bootstrap your effort to write your own input plugin!
data/Gemfile ADDED
@@ -0,0 +1,3 @@
+ source 'https://rubygems.org'
+ gemspec
+
data/LICENSE ADDED
@@ -0,0 +1,11 @@
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
@@ -0,0 +1,86 @@
+ # Logstash Plugin
+
+ This is a plugin for [Logstash](https://github.com/elastic/logstash).
+
+ It is fully free and fully open source. The license is Apache 2.0, meaning you are pretty much free to use it however you want.
+
+ ## Documentation
+
+ Logstash provides infrastructure to automatically generate documentation for this plugin. We use the asciidoc format to write documentation, so any comments in the source code are first converted into asciidoc and then into HTML. All plugin documentation is placed under one [central location](http://www.elastic.co/guide/en/logstash/current/).
+
+ - For formatting code or config examples, you can use the asciidoc `[source,ruby]` directive
+ - For more asciidoc formatting tips, see the excellent reference at https://github.com/elastic/docs#asciidoc-guide
+
+ ## Need Help?
+
+ Need help? Try #logstash on freenode IRC or the https://discuss.elastic.co/c/logstash discussion forum.
+
+ ## Developing
+
+ ### 1. Plugin Development and Testing
+
+ #### Code
+ - To get started, you'll need JRuby with the Bundler gem installed.
+
+ - Create a new plugin or clone an existing one from the GitHub [logstash-plugins](https://github.com/logstash-plugins) organization. We also provide [example plugins](https://github.com/logstash-plugins?query=example).
+
+ - Install dependencies
+ ```sh
+ bundle install
+ ```
+
+ #### Test
+
+ - Update your dependencies
+
+ ```sh
+ bundle install
+ ```
+
+ - Run tests
+
+ ```sh
+ bundle exec rspec
+ ```
+
+ ### 2. Running your unpublished Plugin in Logstash
+
+ #### 2.1 Run in a local Logstash clone
+
+ - Edit Logstash `Gemfile` and add the local plugin path, for example:
+ ```ruby
+ gem "logstash-filter-awesome", :path => "/your/local/logstash-filter-awesome"
+ ```
+ - Install the plugin
+ ```sh
+ bin/logstash-plugin install --no-verify
+ ```
+ - Run Logstash with your plugin
+ ```sh
+ bin/logstash -e 'filter {awesome {}}'
+ ```
+ At this point any modifications to the plugin code will be applied to this local Logstash setup. After modifying the plugin, simply rerun Logstash.
+
+ #### 2.2 Run in an installed Logstash
+
+ You can use the same **2.1** method to run your plugin in an installed Logstash by editing its `Gemfile` and pointing the `:path` to your local plugin development directory, or you can build the gem and install it using:
+
+ - Build your plugin gem
+ ```sh
+ gem build logstash-filter-awesome.gemspec
+ ```
+ - Install the plugin from the Logstash home
+ ```sh
+ bin/logstash-plugin install /your/local/plugin/logstash-filter-awesome.gem
+ ```
+ - Start Logstash and proceed to test the plugin
+
+ ## Contributing
+
+ All contributions are welcome: ideas, patches, documentation, bug reports, complaints, and even something you drew up on a napkin.
+
+ Programming is not a required skill. Whatever you've seen about open source and maintainers or community members saying "send patches or die", you will not see that here.
+
+ It is more important to the community that you are able to contribute.
+
+ For more information about contributing, see the [CONTRIBUTING](https://github.com/elastic/logstash/blob/master/CONTRIBUTING.md) file.
@@ -0,0 +1,137 @@
+ # encoding: utf-8
+ require "logstash/inputs/base"
+ require "logstash/namespace"
+ require "stud/interval"
+ require "openssl"
+ require "mechanize"
+ require "rss"
+
+ class LogStash::Inputs::Multirss < LogStash::Inputs::Base
+   config_name "multirss"
+
+   default :codec, "plain"
+
+   # The list of pages to scan for RSS feed links (URLs ending in "xml").
+   config :rss_list, :validate => :array, :required => true
+
+   # Seconds to wait between crawls (passed to Stud.stoppable_sleep).
+   config :interval, :validate => :number, :default => 200
+
+   # Links whose href contains any of these strings are skipped.
+   config :blacklist, :validate => :array, :default => ['http://fusion.google.com/','yahoo.com','live.com','netvibes.com','bloglines.com']
+
+   public
+   def register
+     @urls = []
+     @agent = Mechanize.new
+     # SSL certificate verification is disabled for every request made by this agent.
+     @agent.agent.http.verify_mode = OpenSSL::SSL::VERIFY_NONE
+   end # def register
+
+   def run(queue)
+     # we can abort the loop if stop? becomes true
+     while !stop?
+       @rss_list.each do |rss|
+         @actual_rss = rss
+         @logger.info("Reading parent page", :url => @actual_rss)
+         begin
+           page = @agent.get(@actual_rss)
+           page.links.each do |link|
+             href = link.href
+             # Keep links that end in "xml" and are not blacklisted (href may be nil).
+             if href && href.end_with?("xml") && not_include_blacklist(link)
+               @urls << href
+             end
+           end
+         rescue => e
+           @logger.warn("Failed to get feed list", :url => @actual_rss, :exception => e)
+         end
+
+         @urls.uniq.each do |link|
+           begin
+             # Reuse the agent configured in register (same SSL settings).
+             response = @agent.get(link)
+             handle_response(response, queue)
+             @logger.info("Read child feed", :url => link)
+           rescue => e
+             @logger.warn("Failed to get feed", :url => link, :exception => e)
+           end
+         end
+
+         @urls.clear
+       end
+
+       Stud.stoppable_sleep(@interval) { stop? }
+     end # loop
+   end # def run
+
+   def stop
+     # nothing to do in this case so it is not necessary to define stop
+     # examples of common "stop" tasks:
+     #  * close sockets (unblocking blocking reads/accepts)
+     #  * cleanup temporary files
+     #  * terminate spawned threads
+   end
+
+   # True when the link's href contains none of the blacklisted strings.
+   def not_include_blacklist(link)
+     @blacklist.none? { |entry| link.href.include?(entry) }
+   end
+
+   def handle_response(response, queue)
+     body = response.body
+     begin
+       feed = RSS::Parser.parse(body)
+       # Bodies that are not a recognized feed may parse to nil; skip them.
+       return if feed.nil?
+       feed.items.each do |item|
+         if has_enclosure?(item)
+           @logger.debug("Skipping item with an enclosure field")
+           next
+         end
+         handle_rss_response(queue, item)
+       end
+     rescue RSS::MissingTagError => e
+       @logger.error("Invalid RSS feed", :exception => e)
+     rescue RSS::TooMuchTagError => e
+       @logger.error("Too many instances of a tag in feed (e.g. several enclosures in one item)", :exception => e)
+     rescue => e
+       @logger.error("Unknown error while parsing the feed", :exception => e)
+     end
+   end
+
+   def handle_rss_response(queue, item)
+     # The plain codec stores the decoded description in the "message" field.
+     @codec.decode(item.description.to_s) do |event|
+       event.set("Feed", @actual_rss)
+       event.set("published", item.pubDate)
+       event.set("title", item.title)
+       event.set("link", item.link)
+       event.set("author", item.author)
+       decorate(event)
+       queue << event
+     end
+   end
+
+   # Not every item class responds to #enclosure (RSS 1.0 items do not).
+   def has_enclosure?(item)
+     item.respond_to?(:enclosure) && !item.enclosure.nil?
+   end
+ end # class LogStash::Inputs::Multirss
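The link-selection step in `run` (keep hrefs ending in "xml", drop blacklisted ones, de-duplicate) can be exercised without Mechanize or Logstash. The sketch below is a standalone approximation under stated assumptions: `feed_link?` and `select_feed_links` are hypothetical helpers that work on plain href strings instead of `Mechanize::Page::Link` objects, and the sample URLs are made up.

```ruby
# Hypothetical standalone version of the link filter used in multirss.rb.
DEFAULT_BLACKLIST = ['http://fusion.google.com/', 'yahoo.com', 'live.com',
                     'netvibes.com', 'bloglines.com'].freeze

# True for hrefs that end in "xml" and contain no blacklisted substring.
def feed_link?(href, blacklist = DEFAULT_BLACKLIST)
  return false if href.nil?                  # links without an href are skipped
  return false unless href.end_with?("xml")  # same test as chars.last(3).join == "xml"
  blacklist.none? { |entry| href.include?(entry) }
end

# Filters a list of hrefs and removes duplicates, like @urls.uniq in run.
def select_feed_links(hrefs, blacklist = DEFAULT_BLACKLIST)
  hrefs.select { |href| feed_link?(href, blacklist) }.uniq
end

hrefs = [
  "https://example.com/news/rss.xml",
  "https://example.com/news/rss.xml",
  "https://add.my.yahoo.com/rss?url=https://example.com/feed.xml",
  "https://example.com/about.html",
  nil
]
p select_feed_links(hrefs) # prints ["https://example.com/news/rss.xml"]
```

The blacklist is a substring match, so a subscribe link on a blacklisted domain is dropped even when it wraps a valid feed URL, as the Yahoo entry above shows.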
@@ -0,0 +1,51 @@
+ # encoding: utf-8
+ require "logstash/inputs/base"
+ require "logstash/namespace"
+ require "stud/interval"
+ require "socket" # for Socket.gethostname
+
+ # Generate a repeating message.
+ #
+ # This plugin is intended only as an example.
+
+ class LogStash::Inputs::Multirss < LogStash::Inputs::Base
+   config_name "multirss"
+
+   # If undefined, Logstash will complain, even if codec is unused.
+   default :codec, "plain"
+
+   # The message string to use in the event.
+   config :message, :validate => :string, :default => "Hello World!"
+
+   # Set how frequently messages should be sent.
+   #
+   # The default, `1`, means send a message every second.
+   config :interval, :validate => :number, :default => 1
+
+   public
+   def register
+     @host = Socket.gethostname
+   end # def register
+
+   def run(queue)
+     # we can abort the loop if stop? becomes true
+     while !stop?
+       event = LogStash::Event.new("message" => @message, "host" => @host)
+       decorate(event)
+       queue << event
+       # because the sleep interval can be big, when shutdown happens
+       # we want to be able to abort the sleep
+       # Stud.stoppable_sleep will frequently evaluate the given block
+       # and abort the sleep(@interval) if the return value is true
+       Stud.stoppable_sleep(@interval) { stop? }
+     end # loop
+   end # def run
+
+   def stop
+     # nothing to do in this case so it is not necessary to define stop
+     # examples of common "stop" tasks:
+     #  * close sockets (unblocking blocking reads/accepts)
+     #  * cleanup temporary files
+     #  * terminate spawned threads
+   end
+ end # class LogStash::Inputs::Multirss
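The comments in this example explain why the plugins sleep with `Stud.stoppable_sleep` instead of a plain `sleep(@interval)`: with the 200-second default in `multirss.rb`, a plain sleep would delay shutdown by minutes. A minimal sketch of the idea in plain Ruby, assuming a 0.1 s polling slice; `stoppable_sleep_sketch` is a hypothetical name and this is not Stud's actual implementation.

```ruby
# Sleep for up to `duration` seconds, waking every `slice` seconds to ask the
# block whether a stop was requested. Returns true if interrupted, false if
# the full duration elapsed.
def stoppable_sleep_sketch(duration, slice = 0.1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + duration
  loop do
    return true if block_given? && yield            # stop requested
    remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
    return false if remaining <= 0                  # slept the full duration
    sleep([slice, remaining].min)
  end
end

stop_requested = false
worker = Thread.new { stoppable_sleep_sketch(200) { stop_requested } }
sleep 0.3
stop_requested = true   # plays the role of stop? turning true on shutdown
puts worker.value       # prints true, long before 200 seconds elapse
```

The worst-case shutdown delay is one slice, so a smaller slice reacts faster at the cost of more wake-ups.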
@@ -0,0 +1,117 @@
+ # encoding: utf-8
+ require "logstash/inputs/base"
+ require "logstash/namespace"
+ require "stud/interval"
+ require "openssl"
+ require "mechanize"
+ require "rss"
+
+ class LogStash::Inputs::Crawler < LogStash::Inputs::Base
+   config_name "multirss"
+
+   # If undefined, Logstash will complain, even if codec is unused.
+   default :codec, "plain"
+
+   # The pages to scan for RSS feed links.
+   config :urls, :validate => :array, :required => true
+
+   # Seconds to wait between crawls (passed to Stud.stoppable_sleep).
+   config :interval, :validate => :number, :default => 86400
+
+   # Domains to exclude
+   config :blacklist, :validate => :array, :default => ['http://fusion.google.com/','yahoo.com','live.com','netvibes.com']
+
+   public
+   def register
+     # Feed links found on the pages; kept separate so the @urls setting is not overwritten.
+     @new_urls = []
+     @logger.info("Starting multi-rss input")
+     @agent = Mechanize.new
+     @agent.agent.http.verify_mode = OpenSSL::SSL::VERIFY_NONE
+   end # def register
+
+   def run(queue)
+     # we can abort the loop if stop? becomes true
+     while !stop?
+       @urls.each do |url|
+         @url = url
+         page = @agent.get(url)
+         page.links.each do |link|
+           href = link.href
+           if href && href.end_with?("xml") && valid_link?(link)
+             @logger.debug("Found feed link", :href => href)
+             @new_urls << href
+           end
+         end
+
+         @new_urls.uniq.each do |link|
+           response = @agent.get(link)
+           handle_response(response, queue)
+         end
+         @new_urls.clear
+       end # urls.each
+       Stud.stoppable_sleep(@interval) { stop? }
+     end # loop
+   end # def run
+
+   def stop
+     # nothing to do in this case so it is not necessary to define stop
+     # examples of common "stop" tasks:
+     #  * close sockets (unblocking blocking reads/accepts)
+     #  * cleanup temporary files
+     #  * terminate spawned threads
+   end
+
+   def valid_link?(link)
+     @blacklist.each do |black_link|
+       if link.href.include?(black_link)
+         return false
+       end
+     end
+     return true
+   end
+
+   def handle_response(response, queue)
+     body = response.body
+     begin
+       feed = RSS::Parser.parse(body)
+       feed.items.each do |item|
+         # Put each item into an event
+         @logger.debug("Item", :item => item.author)
+         handle_rss_response(queue, item)
+       end
+     rescue RSS::MissingTagError => e
+       @logger.error("Invalid RSS feed", :exception => e)
+     rescue => e
+       @logger.error("Unknown error while parsing the feed", :exception => e)
+     end
+   end
+
+   def handle_rss_response(queue, item)
+     @codec.decode(item.description) do |event|
+       event.set("Feed", @url)
+       event.set("published", item.pubDate)
+       event.set("title", item.title)
+       event.set("link", item.link)
+       event.set("author", item.author)
+       decorate(event)
+       queue << event
+     end
+   end
+
+ end # class LogStash::Inputs::Crawler
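Both `handle_rss_response` implementations copy the same item fields onto each event, and the plain codec puts the decoded description in the `message` field. In `multirss.rb`, `Feed` is set from `@actual_rss`, i.e. the `rss_list` page being scanned rather than the XML feed the item came from. A sketch of that mapping in plain Ruby, assuming a hypothetical `FeedItem` Struct in place of an RSS item and a Hash in place of `LogStash::Event`:

```ruby
# Hypothetical stand-in for an RSS 2.0 item (field names match the RSS gem's item accessors used above).
FeedItem = Struct.new(:title, :link, :description, :pubDate, :author)

# Builds the fields one event would carry; a Hash replaces LogStash::Event here.
def item_to_fields(feed_url, item)
  {
    "message"   => item.description.to_s, # to_s guards against a missing description
    "Feed"      => feed_url,              # the scanned page, not the item's own feed
    "published" => item.pubDate,
    "title"     => item.title,
    "link"      => item.link,
    "author"    => item.author
  }
end

item = FeedItem.new("Hello", "https://example.com/hello", "<p>Hi</p>",
                    Time.utc(2018, 8, 6), "felix@example.com")
fields = item_to_fields("https://example.com/feeds.html", item)
puts fields["message"] # prints <p>Hi</p>
puts fields["title"]   # prints Hello
```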
@@ -0,0 +1,27 @@
+ Gem::Specification.new do |s|
+   s.name = 'logstash-input-multirss'
+   s.version = '0.1.0'
+   s.licenses = ['Apache-2.0']
+   s.summary = 'Simple multi rss plugin'
+   s.description = 'This plugin needs the initial URLs to be set.'
+   s.homepage = 'https://github.com/felixramirezgarcia/logstash-input-multirss'
+   s.authors = ['Felix R G']
+   s.email = 'felixramirezgarcia@correo.ugr.es'
+   s.require_paths = ['lib']
+
+   # Files
+   s.files = Dir['lib/**/*','spec/**/*','vendor/**/*','*.gemspec','*.md','CONTRIBUTORS','Gemfile','LICENSE','NOTICE.TXT']
+   # Tests
+   s.test_files = s.files.grep(%r{^(test|spec|features)/})
+
+   # Special flag to let us know this is actually a logstash plugin
+   s.metadata = { "logstash_plugin" => "true", "logstash_group" => "input" }
+
+   # Gem dependencies
+   s.add_runtime_dependency "logstash-core"
+   s.add_runtime_dependency 'logstash-codec-plain'
+   s.add_runtime_dependency 'stud', '>= 0.0.22'
+   s.add_development_dependency 'logstash-devutils', '>= 0.0.16'
+   s.add_runtime_dependency "mechanize"
+   s.add_runtime_dependency "nokogiri"
+ end
@@ -0,0 +1,11 @@
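+ # encoding: utf-8
+ require "logstash/devutils/rspec/spec_helper"
+ require "logstash/inputs/multirss"
+
+ describe LogStash::Inputs::Multirss do
+
+   it_behaves_like "an interruptible input plugin" do
+     # rss_list is required; the placeholder URL is expected to fail fast,
+     # so the test only exercises start-up and interruption.
+     let(:config) { { "rss_list" => ["http://127.0.0.1:1/"], "interval" => 100 } }
+   end
+
+ end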
metadata ADDED
@@ -0,0 +1,141 @@
+ --- !ruby/object:Gem::Specification
+ name: logstash-input-multirss
+ version: !ruby/object:Gem::Version
+   version: 0.1.0
+ platform: ruby
+ authors:
+ - Felix R G
+ autorequire:
+ bindir: bin
+ cert_chain: []
+ date: 2018-08-06 00:00:00.000000000 Z
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   name: logstash-core
+   prerelease: false
+   type: :runtime
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   name: logstash-codec-plain
+   prerelease: false
+   type: :runtime
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: 0.0.22
+   name: stud
+   prerelease: false
+   type: :runtime
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: 0.0.22
+ - !ruby/object:Gem::Dependency
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: 0.0.16
+   name: logstash-devutils
+   prerelease: false
+   type: :development
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: 0.0.16
+ - !ruby/object:Gem::Dependency
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   name: mechanize
+   prerelease: false
+   type: :runtime
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   name: nokogiri
+   prerelease: false
+   type: :runtime
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+ description: This plugin needs the initial URLs to be set.
+ email: felixramirezgarcia@correo.ugr.es
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - CHANGELOG.md
+ - CONTRIBUTORS
+ - DEVELOPER.md
+ - Gemfile
+ - LICENSE
+ - README.md
+ - lib/logstash/inputs/multirss.rb
+ - lib/logstash/inputs/multirss.rb.bk
+ - lib/logstash/inputs/multirss.rb.bk2
+ - logstash-input-multirss.gemspec
+ - spec/inputs/multirss_spec.rb
+ homepage: https://github.com/felixramirezgarcia/logstash-input-multirss
+ licenses:
+ - Apache-2.0
+ metadata:
+   logstash_plugin: 'true'
+   logstash_group: input
+ post_install_message:
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ requirements: []
+ rubyforge_project:
+ rubygems_version: 2.6.13
+ signing_key:
+ specification_version: 4
+ summary: Simple multi rss plugin
+ test_files:
+ - spec/inputs/multirss_spec.rb