logstash-input-multirss 1.1.0 → 1.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/README.md +21 -34
- data/lib/logstash/inputs/multirss.rb +106 -31
- data/logstash-input-multirss.gemspec +8 -2
- metadata +26 -4
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: e1cb1975f468ee41df996eab224de416ee1c1389d4508ab153eba11e3d9ec601
+  data.tar.gz: 66adf18634c47c4a2bf3aebc31b071e112bf54b7f1eb40a34e0f47ecbebcedad
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: '0401495068a213d7d689859c390b3ec1dd9a95e82c6e3af42d49a39780770eb9558c2558c9cf3b1951d1ba76ee7462d568c39384488b48b16f1ac6cf30219d39'
+  data.tar.gz: a65c76f4dce2c18bf85074328c30c1dc5ffbfaecd127bae27a1664cb1e2ac94c13a79bcd4c77a97c03741c62bdb458b3a506f9b9d8737f8a54c6e1a4488e9d1f
data/README.md
CHANGED
@@ -4,7 +4,9 @@ This is a plugin for [Logstash](https://github.com/elastic/logstash).
 
 It is fully free and fully open source. The license is Apache 2.0, meaning you are pretty much free to use it however you want in whatever way.
 
-#
+# Developing
+
+## Install
 
 U can install the plugin from https://rubygems.org/gems/logstash-input-multirss , or build it yuouself in a logstash service or container with :
 
@@ -16,15 +18,28 @@ ruby -S gem build logstash-input-multirss.gemspec
 
 logstash-plugin install logstash-input-multirss-[nº_version].gem
 
-
+### Pipeline Example
 
 You can see a example in https://github.com/felixramirezgarcia/logstash-input-multirss/blob/master/example-pipeline.conf
 
 The difference between the attributes multi_feed and one_feed is that the multi_feed is the URI of the parent address where several rss (xml) are found. For the case where you want to explore only one of those links you can use the one_feed attribute. A visual example can be seen by visiting the following links:
 
 Father (multi_feed) => http://rss.elmundo.es/rss/
+
 Son (one_feed) => http://estaticos.elmundo.es/elmundo/rss/portada.xml
 
+All the params are :
+
+1) multi_feed => [array] URI parent with more rss links inside , something like this: http://rss.elmundo.es/rss/
+
+2) one_feed => [array] childs URIS with XML content inside , something like this: http://estaticos.elmundo.es/elmundo/rss/portada.xml
+
+3) blacklist => [array] strings , links, text ... what you dont want explored
+
+4) Interval => [int] Set the Stoppable_sleep interval for the pipe
+
+5) keywords => [array] If you use this parameter will only compile those news that contain in any of its attributes a word from this array
+
 ## Documentation
 
 Logstash provides infrastructure to automatically generate documentation for this plugin. We use the asciidoc format to write documentation so any comments in the source code will be first converted into asciidoc and then into html. All plugin documentation are placed under one [central location](http://www.elastic.co/guide/en/logstash/current/).
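The README text in this hunk separates a parent page (`multi_feed`) that lists many feeds from a single child feed (`one_feed`), and describes `blacklist` as entries that should not be explored. A minimal sketch of that relationship follows. It is not the plugin's code: `child_feeds` is a hypothetical helper, the `example.com` URLs are made up, and the substring match is assumed from the `link.href.include?` check visible in `not_include_blacklist` later in this diff.

```ruby
# Hypothetical helper, not part of the plugin: given the hrefs found on a
# parent (multi_feed) page, keep the ones that no blacklist entry matches.
# Matching is a plain, case-sensitive substring test.
def child_feeds(hrefs, blacklist)
  hrefs.reject do |href|
    blacklist.any? { |entry| href.include?(entry) }
  end
end

parent_links = [
  "http://estaticos.elmundo.es/elmundo/rss/portada.xml",
  "http://example.com/rss/sports.xml", # made-up URL for illustration
  "http://example.com/login"           # made-up URL for illustration
]

# Each surviving href plays the role of a one_feed entry.
feeds = child_feeds(parent_links, ["login"])
puts feeds
```

With an empty blacklist every link on the parent page would be treated as a candidate child feed.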
@@ -36,37 +51,9 @@ Logstash provides infrastructure to automatically generate documentation for thi
 
 Need help? Try #logstash on freenode IRC or the https://discuss.elastic.co/c/logstash discussion forum.
 
-
-
-### 1. Plugin Developement and Testing
-
-#### Code
-- To get started, you'll need JRuby with the Bundler gem installed.
-
-- Create a new plugin or clone and existing from the GitHub [logstash-plugins](https://github.com/logstash-plugins) organization. We also provide [example plugins](https://github.com/logstash-plugins?query=example).
-
-- Install dependencies
-```sh
-bundle install
-```
-
-#### Test
-
-- Update your dependencies
-
-```sh
-bundle install
-```
-
-- Run tests
-
-```sh
-bundle exec rspec
-```
-
-### 2. Running your unpublished Plugin in Logstash
+# Running your unpublished Plugin in Logstash
 
-
+## Run in a local Logstash clone
 
 - Edit Logstash `Gemfile` and add the local plugin path, for example:
 ```ruby
@@ -82,9 +69,9 @@ bin/logstash -e 'filter {awesome {}}'
 ```
 At this point any modifications to the plugin code will be applied to this local Logstash setup. After modifying the plugin, simply rerun Logstash.
 
-
+## Run in an installed Logstash
 
-You can use the same
+You can use the same method to run your plugin in an installed Logstash by editing its `Gemfile` and pointing the `:path` to your local plugin development directory or you can build the gem and install it using:
 
 - Build your plugin gem
 ```sh
data/lib/logstash/inputs/multirss.rb
CHANGED
@@ -7,31 +7,37 @@ require "uri"
 require "mechanize"
 require "rss"
 require "nokogiri"
+require "fileutils"
 
 # if you want to debug it you just have to uncomment the puts and build the gem with
 # ruby -S gem build logstash-input-multirss.gemspec
-# and install the gem in a logstash service with
+# and install the gem in a logstash service or container with
 # logstash-plugin install logstash-input-multirss-x.x.x.gem
 
 class LogStash::Inputs::Multirss < LogStash::Inputs::Base
-  config_name "multirss"
+  config_name "multirss" #Plugin name
 
-  default :codec, "plain"
+  default :codec, "plain" #Codec
 
-  # The rss array list to use in the pipe
-  config :multi_feed, :validate => :array, :
+  # The rss parent array list to use in the pipe (link with a lot rss links inside)
+  config :multi_feed, :validate => :array, :default => []
 
-  # The rss array list to use in the pipe
+  # The rss childs array list to use in the pipe (simple rss link)
   config :one_feed, :validate => :array, :default => []
 
   #Set de interval for stoppable_sleep
   config :interval, :validate => :number, :default => 3600
 
-  #Set de black list to forget read
+  #Set de black list to forget read and get content
   config :blacklist, :validate => :array, :default => []
 
+  #Set de keywords to ONLY get content whit it
+  config :keywords, :validate => :array, :default => []
+
   public
-
+
+  def register #initialize
+    #Mechanize agent
     @agent = Mechanize.new
     @agent.agent.http.verify_mode = OpenSSL::SSL::VERIFY_NONE
   end # def register
@@ -41,11 +47,13 @@ class LogStash::Inputs::Multirss < LogStash::Inputs::Base
     # we can abort the loop if stop? becomes true
     urls = []
 
+    #Don't stop, keep going.
     while !stop?
-
-
-
-
+
+      manage_tempdir
+
+      @multi_feed.each do |rss| #get the father's children
+        #puts "Read parent: " + rss
         begin
           page = @agent.get(rss)
           page.links.each do |link|
@@ -62,11 +70,9 @@ class LogStash::Inputs::Multirss < LogStash::Inputs::Base
         links.each do |link|
           begin
             response_link(link,queue)
-
-            #puts str
+            #puts "Read clidren: " + link
           rescue
-
-            #puts str
+            #puts "Fail to get " + link + " children"
             next
           end # end begin
         end # end each links
@@ -82,17 +88,23 @@ class LogStash::Inputs::Multirss < LogStash::Inputs::Base
       all_links.each do |link|
         begin
           response_link(link,queue)
-
-          #puts str
+          #puts "Read clidren: " + link
         rescue
-
-          #puts str
+          #puts "Fail to get " + link
           next
         end # begin
       end # all_links loop
 
       urls.clear
 
+      # Remove the tempfiles
+      if (File::directory?(@d))
+        ENV.delete("TMPDIR")
+        FileUtils.rm_rf @d
+        #puts "Remove temp dir"
+      end
+
+      #Stoppable_sleep interval
       Stud.stoppable_sleep(@interval) { stop? }
     end # end while
   end # end def run
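Each pass of the loop in this hunk ends with `Stud.stoppable_sleep(@interval) { stop? }`, which waits for the interval but lets the plugin shut down early. The general idea can be sketched in plain Ruby as below. This is an illustration only and not Stud's implementation: the `stoppable_sleep` method, its `step` parameter and the `:stopped` / `:elapsed` return values are all hypothetical.

```ruby
# Hypothetical stand-in for the idea behind a stoppable sleep: wait up to
# `seconds`, but wake every `step` seconds and return early as soon as the
# given block reports that a stop was requested.
def stoppable_sleep(seconds, step = 0.1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + seconds
  loop do
    return :stopped if yield
    remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
    return :elapsed if remaining <= 0
    sleep([step, remaining].min)
  end
end

# Usage: the block plays the role of `stop?`; here it asks to stop on the
# third poll, long before the 5 second interval runs out.
polls = 0
result = stoppable_sleep(5, 0.01) { (polls += 1) >= 3 }
puts result
```

Polling in short slices is what keeps a long interval (the plugin's default is 3600 seconds) from delaying shutdown.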
@@ -129,19 +141,53 @@ class LogStash::Inputs::Multirss < LogStash::Inputs::Base
 
   def link_rss_response(queue, item)
     event = LogStash::Event.new()
-
-
-
-
-
-
-
-
-
-
-
+
+    if @keywords.size.to_s.to_i > 0 # "Have keywords
+      haskey = false
+
+      item.element_children.each do |x|
+        if include_keywords(x.inner_html.to_s)
+          #puts "--------------Finded notice with the keyword---------------"
+          haskey = true
+        end
+      end # end loop
+
+      if haskey == true
+        item.element_children.each do |x|
+          #puts "The notice " + x.name + " is " + x.inner_html.to_s
+          if x.inner_html.to_s.chars.first(9).join == "<![CDATA["
+            eve = LogStash::Event.new( x.name => x.inner_html.to_s[9..x.inner_html.to_s.length-4] )
+            event.append( eve )
+          else
+            eve = LogStash::Event.new( x.name => x.inner_html.to_s )
+            event.append( eve )
+          end # end if else
+        end # end loop
+      elsif haskey == false # havent haskey
+        event = nil
+      end # if haskey
+
+    else # havent keywords!
+      #puts "Havent keywords, go to get all items"
+      item.element_children.each do |x|
+        if x.inner_html.to_s.chars.first(9).join == "<![CDATA["
+          eve = LogStash::Event.new( x.name => x.inner_html.to_s[9..x.inner_html.to_s.length-4])
+          event.append( eve )
+        else
+          eve = LogStash::Event.new( x.name => x.inner_html.to_s )
+          event.append( eve )
+        end # end if
+      end # end loop
+    end # end if have keywords
+
+    if event != nil
+      decorate(event)
+      queue << event
+    end # end if
+
   end # def link_rss_response
 
+
   def not_include_blacklist(link)
     for i in 0..@blacklist.length-1
       if link.href.include?(@blacklist[i])
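The new `link_rss_response` does two things: it drops an item when keywords are configured and none of the item's fields contains one, and it unwraps `<![CDATA[ ... ]]>` around field values (the `[9..length-4]` slice). The sketch below restates that logic on plain hashes so it can run without Logstash or Nokogiri. It is not the plugin's code: `strip_cdata`, `build_fields` and the sample item are hypothetical, and unlike the diff it also checks for the closing `]]>` before slicing.

```ruby
CDATA_OPEN  = "<![CDATA["
CDATA_CLOSE = "]]>"

# Hypothetical helper: return the text inside a CDATA wrapper, or the
# text unchanged when it is not wrapped.
def strip_cdata(text)
  if text.start_with?(CDATA_OPEN) && text.end_with?(CDATA_CLOSE)
    text[CDATA_OPEN.length...-CDATA_CLOSE.length]
  else
    text
  end
end

# Hypothetical helper. `fields` maps element names to their raw inner text.
# Returns the cleaned fields, or nil when keywords are configured and no
# field contains any of them (case-sensitive substring match, as in the diff).
def build_fields(fields, keywords)
  unless keywords.empty?
    matched = fields.values.any? do |value|
      keywords.any? { |keyword| value.include?(keyword) }
    end
    return nil unless matched
  end
  fields.transform_values { |value| strip_cdata(value) }
end

item = {
  "title" => "<![CDATA[Election results]]>",
  "link"  => "http://example.com/a" # made-up URL for illustration
}

p build_fields(item, ["Election"])
p build_fields(item, ["football"])
```

Expressing the CDATA handling once, rather than in both the keyword and no-keyword branches, is a design choice of this sketch; the diff repeats the branch in both places.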
@@ -152,4 +198,33 @@ class LogStash::Inputs::Multirss < LogStash::Inputs::Base
   end # def not_include_blacklist
 
 
+  def include_keywords(key)
+    for i in 0..@keywords.length-1
+      if key.include?(@keywords[i])
+        return true
+      end # end if
+    end # end for
+    return false
+  end # def include_keywords
+
+
+  def manage_tempdir
+    #set the tempfile to openUri output
+    @d = "#{Dir.home}/.tmp"
+    #if exists
+    if (File::directory?(@d))
+      #puts "Dir exists , removed and create again"
+      ENV.delete("TMPDIR")
+      FileUtils.rm_rf @d
+      #create new
+      Dir.mkdir @d #create in /usr/share/logstash
+      ENV["TMPDIR"] = @d
+    else
+      Dir.mkdir @d #create in /usr/share/logstash
+      ENV["TMPDIR"] = @d
+      #puts "Dir no exist , created...."
+    end
+  end
+
+
 end # class LogStash::Inputs::Crawler
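`manage_tempdir` recreates a fixed `.tmp` directory under the home directory on every pass and points `TMPDIR` at it, and the `run` loop removes it again afterwards. A different technique, sketched here with Ruby's standard `Dir.mktmpdir` rather than the plugin's approach, gives each pass its own uniquely named scratch directory and cleans it up automatically. The `with_scratch_tmpdir` helper and the `multirss-` prefix are hypothetical.

```ruby
require "tmpdir"

# Hypothetical helper: run the block with TMPDIR pointing at a fresh,
# uniquely named scratch directory. Dir.mktmpdir removes the directory when
# its block finishes; the previous TMPDIR value is restored even on error.
def with_scratch_tmpdir
  previous = ENV["TMPDIR"]
  Dir.mktmpdir("multirss-") do |dir|
    ENV["TMPDIR"] = dir
    begin
      yield dir
    ensure
      if previous.nil?
        ENV.delete("TMPDIR")
      else
        ENV["TMPDIR"] = previous
      end
    end
  end
end

# Usage: anything written under the scratch directory disappears afterwards.
scratch = with_scratch_tmpdir do |dir|
  File.write(File.join(dir, "feed.xml"), "<rss/>")
  dir
end
puts File.directory?(scratch)
```

Compared with a fixed path, a unique directory avoids deleting files that another process may have placed in the same location.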
data/logstash-input-multirss.gemspec
CHANGED
@@ -1,9 +1,14 @@
 Gem::Specification.new do |s|
   s.name = 'logstash-input-multirss'
-  s.version = '1.
+  s.version = '1.2.0'
   s.licenses = ['Apache-2.0']
   s.summary = 'Simple multi rss plugin'
-  s.description = 'This plugin
+  s.description = 'This plugin get the feed rss content (being able to use keywords to get the feed) , the params are:
+1) multi_feed => [array] URI parent with more rss links inside , something like this: http://rss.elmundo.es/rss/
+2) one_feed => [array] (optionally) childs URIS with XML content inside , something like this: http://estaticos.elmundo.es/elmundo/rss/portada.xml
+3) blacklist => [array] (optionally) strings , links, text ... what you dont want explored
+4) Interval => [int] Set the Stoppable_sleep interval for the pipe
+5) keywords => [array] if you use this parameter will only compile those news that contain in any of its attributes a word from this array'
   s.homepage = 'https://github.com/felixramirezgarcia/logstash-input-multirss'
   s.authors = ['Felix R G']
   s.email = 'felixramirezgarcia@correo.ugr.es'
@@ -24,4 +29,5 @@ Gem::Specification.new do |s|
   s.add_development_dependency 'logstash-devutils', '>= 0.0.16'
   s.add_runtime_dependency "mechanize"
   s.add_runtime_dependency "nokogiri"
+  s.add_runtime_dependency "fileutils"
 end
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: logstash-input-multirss
 version: !ruby/object:Gem::Version
-  version: 1.
+  version: 1.2.0
 platform: ruby
 authors:
 - Felix R G
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2018-
+date: 2018-09-06 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   requirement: !ruby/object:Gem::Requirement
@@ -94,8 +94,30 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
-
-
+- !ruby/object:Gem::Dependency
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  name: fileutils
+  prerelease: false
+  type: :runtime
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+description: "This plugin get the feed rss content (being able to use keywords to\
+  \ get the feed) , the params are: \n 1) multi_feed => [array]\
+  \ URI parent with more rss links inside , something like this: http://rss.elmundo.es/rss/\
+  \ \n 2) one_feed => [array] (optionally) childs URIS with\
+  \ XML content inside , something like this: http://estaticos.elmundo.es/elmundo/rss/portada.xml\
+  \ \n 3) blacklist => [array] (optionally) strings , links,\
+  \ text ... what you dont want explored\n 4) Interval => [int]\
+  \ Set the Stoppable_sleep interval for the pipe\n 5) keywords\
+  \ => [array] if you use this parameter will only compile those news that contain\
+  \ in any of its attributes a word from this array"
 email: felixramirezgarcia@correo.ugr.es
 executables: []
 extensions: []